CN116939254A - Video stream transmission method, device, computer equipment and storage medium - Google Patents
Video stream transmission method, device, computer equipment and storage medium
- Publication number
- CN116939254A (application number CN202310878027.XA / CN202310878027A)
- Authority
- CN
- China
- Prior art keywords
- angle information
- prediction model
- information prediction
- terminal
- target
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/24—Monitoring of processes or resources, e.g. monitoring of server load, available bandwidth, upstream requests
- H04N21/2407—Monitoring of transmitted content, e.g. distribution time, number of downloads
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
- G06N3/0442—Recurrent networks, e.g. Hopfield networks characterised by memory or gating, e.g. long short-term memory [LSTM] or gated recurrent units [GRU]
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/2343—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs involving reformatting operations of video signals for distribution or compliance with end-user requests or end-user device requirements
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/238—Interfacing the downstream path of the transmission network, e.g. adapting the transmission rate of a video stream to network bandwidth; Processing of multiplex streams
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Computational Linguistics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Artificial Intelligence (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The application relates to the technical field of artificial intelligence, and provides a video stream transmission method and apparatus, a computer device, a storage medium, and a computer program product, which can improve the efficiency and accuracy of video stream transmission. The method comprises: receiving model parameters of a target view angle information prediction model sent by a server, the parameters being obtained by the server aggregating the model parameters of the base view angle information prediction models pre-trained on the individual terminals; updating the parameters of the base view angle information prediction model pre-trained on the terminal with the received parameters to obtain the target view angle information prediction model; determining, through the target view angle information prediction model, the predicted view angle information of the object to which the terminal belongs at the next time; and sending the predicted view angle information to the server, which determines the target video stream corresponding to the predicted view angle information and sends it to the terminal.
Description
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a video streaming method, apparatus, computer device, storage medium, and computer program product.
Background
With the development of video technology, requirements for video quality continue to rise. Improving playback smoothness and reducing playback latency therefore calls for efficient video stream transmission.
In the conventional approach, the server transmits the complete video stream of every frame to the terminal. When the video stream carries a large amount of data, this consumes substantial memory and bandwidth, making video stream transmission inefficient.
Disclosure of Invention
In view of the foregoing, it is desirable to provide a video stream transmission method, apparatus, computer device, computer-readable storage medium, and computer program product that can improve the efficiency of video stream transmission.
In a first aspect, the present application provides a video stream transmission method. The method is applied to the terminal and comprises the following steps:
receiving model parameters of a target view angle information prediction model sent by a server, the model parameters being obtained by the server aggregating the model parameters of the base view angle information prediction models pre-trained on the individual terminals;
updating the model parameters of the base view angle information prediction model pre-trained on the terminal with the model parameters of the target view angle information prediction model to obtain the target view angle information prediction model;
determining, through the target view angle information prediction model, the predicted view angle information of the object to which the terminal belongs at the next time;
sending the predicted view angle information to the server, the server being configured to determine a target video stream corresponding to the predicted view angle information and send the target video stream to the terminal.
In one embodiment, the base view angle information prediction model pre-trained on the terminal is obtained by training as follows:
inputting the locally stored historical view angle information of a sample object at a sample time, together with the historical device information at that sample time, into the base view angle information prediction model to be trained on the terminal, to obtain the predicted view angle information of the sample object at the time following the sample time;
training the base view angle information prediction model to be trained on the terminal according to the difference between the predicted view angle information of the sample object at the next time and the real view angle information of the sample object at the next time, to obtain the base view angle information prediction model pre-trained on the terminal.
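The local training step above can be sketched as follows. This is a minimal stand-in, not the patent's implementation: a one-step linear predictor trained by stochastic gradient descent replaces the LSTM, the history is a plain list of yaw angles, and device information is omitted; the loss is the squared difference between the predicted and the real next-time view angle, as the embodiment describes.

```python
def train_base_model(history, lr=0.1, epochs=2000):
    """Fit a toy one-step predictor: angle[t+1] ~ w * angle[t] + b.

    Stand-in for the base view angle information prediction model: the
    training signal is the squared difference between the predicted and
    the real next-time view angle, as in the embodiment above.
    history: view angles (here, yaw values) at consecutive sample times.
    """
    w, b = 1.0, 0.0
    pairs = list(zip(history[:-1], history[1:]))  # (current, real next)
    for _ in range(epochs):
        for x, y in pairs:
            err = (w * x + b) - y  # gradient of 0.5 * err ** 2
            w -= lr * err * x
            b -= lr * err
    return w, b

# A head slowly panning right: each next angle is the previous + 2 deg.
angles = [0.0, 2.0, 4.0, 6.0, 8.0]
w, b = train_base_model([a / 10.0 for a in angles])  # scaled for stable SGD
```

In the patent, the analogous learned quantities are the model's weight and bias parameters; those are what the terminal later reports to the server.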
In one embodiment, before receiving the model parameters of the target view angle information prediction model sent by the server, the method further includes:
extracting the model parameters of the base view angle information prediction model pre-trained on the terminal, the model parameters comprising at least weight parameters and bias parameters;
sending the extracted model parameters to the server.
In one embodiment, after the predicted view angle information is sent to the server, the method further includes:
acquiring the real view angle information of the object to which the terminal belongs at the next time;
playing, on the terminal, the target video corresponding to the target video stream when the predicted view angle information matches the real view angle information.
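The embodiment does not define what "matched" means. One plausible reading, sketched below with an assumed angular tolerance, treats the prediction as a hit when the predicted and real viewing directions are within a few degrees of each other, wrap-around at 360 degrees included:

```python
def view_matches(predicted_deg, actual_deg, tolerance_deg=15.0):
    """Treat the prediction as a hit when the angular distance between the
    predicted and real viewing directions is within a tolerance.

    Angles are yaw values in degrees; the wrap-around at 360 is handled so
    that 359 and 1 count as 2 degrees apart. The tolerance is an assumed
    threshold; the patent only says the two angles must match.
    """
    diff = abs(predicted_deg - actual_deg) % 360.0
    distance = min(diff, 360.0 - diff)
    return distance <= tolerance_deg

# The terminal plays the received target stream only on a match.
view_matches(350.0, 5.0)   # 15 degrees apart, within tolerance
view_matches(90.0, 180.0)  # 90 degrees apart, outside tolerance
```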
In one embodiment, updating the model parameters of the base view angle information prediction model pre-trained on the terminal with the model parameters of the target view angle information prediction model to obtain the target view angle information prediction model includes:
splitting the model parameters of the target view angle information prediction model into the weight parameters and the bias parameters of the target view angle information prediction model;
replacing the weight parameters of the base view angle information prediction model pre-trained on the terminal with the weight parameters of the target view angle information prediction model, and replacing its bias parameters with the bias parameters of the target view angle information prediction model, to obtain the target view angle information prediction model.
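A minimal sketch of this split-and-replace step, assuming parameters travel as a flat dict and using a hypothetical naming convention in which keys ending in ".b" are bias parameters (the patent does not fix a serialization format):

```python
def split_parameters(model_params):
    """Split a flat parameter dict into weight and bias parts.

    Assumes the hypothetical convention that keys ending in ".b" are
    biases and everything else is a weight.
    """
    weights = {k: v for k, v in model_params.items() if not k.endswith(".b")}
    biases = {k: v for k, v in model_params.items() if k.endswith(".b")}
    return weights, biases

def apply_global_parameters(local_model, global_params):
    """Overwrite the local base model's weight and bias parameters with
    the server-aggregated values, yielding the target prediction model."""
    weights, biases = split_parameters(global_params)
    local_model.update(weights)
    local_model.update(biases)
    return local_model

local = {"lstm.w": [0.1, 0.2], "lstm.b": [0.0]}
global_params = {"lstm.w": [0.3, 0.4], "lstm.b": [0.05]}
target_model = apply_global_parameters(dict(local), global_params)
# target_model now carries the server-aggregated weights and biases
```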
In a second aspect, the present application provides a video stream transmission method. The method is applied to a server and comprises the following steps:
aggregating the model parameters of the base view angle information prediction models pre-trained on the individual terminals to obtain the model parameters of a target view angle information prediction model;
sending the model parameters of the target view angle information prediction model to the corresponding terminal, the terminal being configured to update the model parameters of its pre-trained base view angle information prediction model with them to obtain the target view angle information prediction model, determine through that model the predicted view angle information of the object to which the terminal belongs at the next time, and send the predicted view angle information to the server;
determining a target video stream corresponding to the predicted view angle information, and sending the target video stream to the terminal.
In one embodiment, aggregating the model parameters of the base view angle information prediction models pre-trained on the individual terminals to obtain the model parameters of the target view angle information prediction model includes:
splitting the model parameters of each terminal's pre-trained base view angle information prediction model into its weight parameters and bias parameters;
aggregating the weight parameters of the pre-trained base models across the terminals to obtain the weight parameters of the target view angle information prediction model;
aggregating the bias parameters of the pre-trained base models across the terminals to obtain the bias parameters of the target view angle information prediction model;
taking the aggregated weight parameters and bias parameters together as the model parameters of the target view angle information prediction model.
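The patent leaves the aggregation rule open; a common choice for this kind of federated setup is element-wise parameter averaging (FedAvg-style). A sketch under that assumption, with each terminal's weight and bias parameters arriving as flat lists:

```python
def aggregate(param_sets):
    """Element-wise average of model parameters collected from terminals.

    param_sets: list of dicts, one per terminal, each of the form
    {"weights": [...], "biases": [...]} from that terminal's locally
    trained base view angle information prediction model.
    """
    n = len(param_sets)
    agg = {}
    for key in ("weights", "biases"):
        length = len(param_sets[0][key])
        agg[key] = [sum(p[key][i] for p in param_sets) / n
                    for i in range(length)]
    return agg

# Parameters reported by three terminals (illustrative values only).
terminals = [
    {"weights": [0.2, 0.4], "biases": [0.1]},
    {"weights": [0.4, 0.6], "biases": [0.3]},
    {"weights": [0.6, 0.8], "biases": [0.2]},
]
global_params = aggregate(terminals)  # element-wise means across terminals
```

Weighting each terminal by its number of training samples, as the original FedAvg algorithm does, would be a natural refinement; the plain mean keeps the sketch minimal.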
In a third aspect, the present application further provides a video stream transmission apparatus. The apparatus is applied to the terminal and comprises:
a parameter receiving module, configured to receive the model parameters of the target view angle information prediction model sent by the server, the parameters being obtained by the server aggregating the model parameters of the base view angle information prediction models pre-trained on the individual terminals;
a parameter updating module, configured to update the model parameters of the base view angle information prediction model pre-trained on the terminal with the model parameters of the target view angle information prediction model, to obtain the target view angle information prediction model;
a view angle determining module, configured to determine, through the target view angle information prediction model, the predicted view angle information of the object to which the terminal belongs at the next time;
an information sending module, configured to send the predicted view angle information to the server, the server being configured to determine a target video stream corresponding to the predicted view angle information and send it to the terminal.
In a fourth aspect, the present application further provides a video stream transmission apparatus. Applied to a server, the apparatus comprises:
a parameter processing module, configured to aggregate the model parameters of the base view angle information prediction models pre-trained on the individual terminals to obtain the model parameters of the target view angle information prediction model;
a parameter sending module, configured to send the model parameters of the target view angle information prediction model to the corresponding terminal, the terminal being configured to update its pre-trained base model with them to obtain the target view angle information prediction model, determine the predicted view angle information of the object to which the terminal belongs at the next time, and send it to the server;
an information determining module, configured to determine a target video stream corresponding to the predicted view angle information and send it to the terminal.
In a fifth aspect, the present application further provides a computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the method of the first aspect.
In a sixth aspect, the present application further provides a computer device comprising a memory storing a computer program and a processor that, when executing the computer program, implements the steps of the method of the second aspect.
In a seventh aspect, the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method of the first aspect.
In an eighth aspect, the present application further provides a computer-readable storage medium storing a computer program that, when executed by a processor, implements the steps of the method of the second aspect.
In a ninth aspect, the present application further provides a computer program product comprising a computer program that, when executed by a processor, implements the steps of the method of the first aspect.
In a tenth aspect, the present application further provides a computer program product comprising a computer program that, when executed by a processor, implements the steps of the method of the second aspect.
In the above video stream transmission method, apparatus, computer device, storage medium, and computer program product, the terminal receives the model parameters of the target view angle information prediction model sent by the server, which the server obtains by aggregating the model parameters of the base view angle information prediction models pre-trained on the individual terminals; the terminal updates its pre-trained base model with these parameters to obtain the target view angle information prediction model, determines through it the predicted view angle information of the object to which the terminal belongs at the next time, and sends that information to the server; the server then determines the corresponding target video stream and sends it to the terminal.
In this scheme, aggregating the locally trained model parameters on the server yields accurate global model parameters; distributing them back gives each terminal a more accurate and efficient local prediction model; that model predicts the object's view angle at the next time more accurately and quickly; and because the server only has to send the portion of the video stream matching the predicted view angle, both the efficiency and the accuracy of video stream transmission improve.
Drawings
FIG. 1 is an application environment diagram of a video stream transmission method in one embodiment;
FIG. 2 is a flow chart of a video stream transmission method in one embodiment;
FIG. 3 is a flow chart of a video stream transmission method in another embodiment;
FIG. 4 is an application environment diagram of a video stream transmission method in another embodiment;
FIG. 5 is a block diagram of a video stream transmission apparatus in one embodiment;
FIG. 6 is a block diagram of a video stream transmission apparatus in another embodiment;
FIG. 7 is an internal structure diagram of a computer device in one embodiment;
FIG. 8 is an internal structure diagram of a computer device in another embodiment.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
It should be noted that, the user information (including but not limited to user equipment information, user personal information, etc.) and the data (including but not limited to data for analysis, stored data, presented data, etc.) related to the present application are information and data authorized by the user or sufficiently authorized by each party, and the collection, use and processing of the related data need to comply with the related laws and regulations and standards of the related country and region.
The video stream transmission method provided by the application can be applied to the environment shown in FIG. 1, which includes a terminal and a server in communication connection. Specifically, the server aggregates the model parameters of the base view angle information prediction models pre-trained on the individual terminals to obtain the model parameters of the target view angle information prediction model, and sends them to the corresponding terminal; the terminal receives these parameters, updates its pre-trained base model with them to obtain the target view angle information prediction model, determines through that model the predicted view angle information of the object to which the terminal belongs at the next time, and sends the predicted view angle information to the server; the server determines the target video stream corresponding to the predicted view angle information and sends it to the terminal. The terminal may be, but is not limited to, a personal computer, notebook computer, smartphone, tablet computer, Internet-of-Things device, or portable wearable device; the Internet-of-Things device may be a smart television, smart in-vehicle device, or the like, and the portable wearable device may be a head-mounted device or the like. The server may be implemented as a stand-alone server or as a cluster of servers.
In one embodiment, as shown in FIG. 2, a video stream transmission method is provided. This embodiment is illustrated by applying the method to a terminal and includes the following steps:
Step S201, receiving the model parameters of the target view angle information prediction model sent by the server; these model parameters are obtained by the server aggregating the model parameters of the base view angle information prediction models pre-trained on the individual terminals.
In this step, the model parameters of the target view angle information prediction model are the global parameters of a global model used to predict view angle information; the target model may be, for example, a neural network such as an LSTM (long short-term memory) model. The view angle information may be an FOV (field of view), for instance the FOV of a VR (virtual reality) live video. The server may be a media server, and the terminal may be a VR device, for example one with live-streaming software installed. The base view angle information prediction model pre-trained on the terminal is a view angle information prediction model trained locally on the terminal, which may likewise be a neural network such as an LSTM model.
Specifically, each terminal sends the model parameters of its pre-trained base view angle information prediction model to the server; the server receives these parameters, aggregates them to obtain the model parameters of the target view angle information prediction model, and sends the aggregated parameters to the corresponding terminals (for example, to every terminal); each terminal then receives the model parameters of the target view angle information prediction model.
Step S202, updating model parameters in the basic view angle information prediction model trained in advance on the terminal by using model parameters of the target view angle information prediction model to obtain the target view angle information prediction model.
Specifically, the terminal updates the model parameters in the local pre-trained basic view angle information prediction model on the terminal by using the model parameters of the target view angle information prediction model to obtain an updated view angle information prediction model, wherein the updated view angle information prediction model is the target view angle information prediction model.
Step S203, determining predicted view angle information of the object to which the terminal belongs at the next time by using the target view angle information prediction model.
In this step, the object to which the terminal belongs may be an object that is using the terminal, for example, a user who is using the terminal to watch VR video (such as VR live video); the next time may refer to the next time or the next time period; the predicted view information of the next time of the object may refer to the predicted view of the object at the next time.
Specifically, the terminal acquires the current view angle information of the object to which it belongs at the current time, and the current device information at the current time. Here, the current time may be the current moment, or the current moment together with a period of time before it; the current view angle information may be the recorded view angle of the object; and the current device information may be the recorded network information of the terminal, optionally together with its buffer information. The terminal inputs the current view angle information and the current device information into the target view angle information prediction model, and obtains the predicted view angle information of the object at the next time output by the model.
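As an illustrative sketch (not the claimed model), the prediction step above can be outlined as follows. A simple linear extrapolation of the two most recent view angles, scaled by device state, stands in for the LSTM; the feature names, ranges, and scaling rule are assumptions.

```python
def predict_next_fov(fov_history, network_quality, buffer_level):
    """Predict the next-time FOV (yaw, pitch) from recent view angles
    plus current device information.

    fov_history     -- list of (yaw, pitch) tuples, most recent last
    network_quality -- 0.0 (poor) .. 1.0 (good), recorded network info
    buffer_level    -- 0.0 (empty) .. 1.0 (full), recorded buffer info
    """
    (y0, p0), (y1, p1) = fov_history[-2], fov_history[-1]
    # Linear extrapolation of the head movement between the two samples.
    dy, dp = y1 - y0, p1 - p0
    # As in the text: a poor network/buffer state suggests a slower
    # future FOV movement speed, a good state a faster one.
    speed = 0.5 + 0.5 * min(network_quality, buffer_level)
    return (y1 + dy * speed, p1 + dp * speed)
```

The prediction is returned to the caller, which forwards it to the server as in step S204.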
Step S204, the predicted viewing angle information is sent to a server; the server is used for determining a target video stream corresponding to the predicted viewing angle information and sending the target video stream to the terminal.
In this step, the target video stream corresponding to the predicted view angle information may be a video stream corresponding to the predicted view angle information in a video (e.g., VR live video), and the target video stream may be a partial video stream in the complete video stream of each frame.
Specifically, the terminal sends the predicted viewing angle information to the server; and the server receives the predicted view angle information sent by the terminal, determines a target video stream corresponding to the predicted view angle information from the complete video stream, and sends the target video stream to the terminal.
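The idea of sending only the partial video stream covering the predicted view angle can be sketched as follows, assuming the full 360° frame is split into fixed-width yaw tiles. The tile width and FOV span are illustrative values, not taken from the text.

```python
def tiles_for_fov(pred_yaw, fov_span=90, tile_width=30):
    """Return indices of yaw tiles overlapping [yaw - span/2, yaw + span/2]."""
    lo = pred_yaw - fov_span / 2
    hi = pred_yaw + fov_span / 2
    tiles = set()
    n_tiles = 360 // tile_width
    for i in range(n_tiles):
        t_lo, t_hi = i * tile_width, (i + 1) * tile_width
        # Compare at three offsets so the interval test works across
        # the 0/360 wrap-around of the viewing circle.
        for offset in (-360, 0, 360):
            if t_lo + offset < hi and t_hi + offset > lo:
                tiles.add(i)
    return sorted(tiles)
```

The server would then stream only the listed tiles instead of the complete per-frame video stream.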
In the video stream sending method, model parameters of a target visual angle information prediction model sent by a server are received; model parameters of the target visual angle information prediction model are obtained by aggregating model parameters in the basic visual angle information prediction model trained in advance on each terminal through a server; updating model parameters in a basic visual angle information prediction model trained in advance on a terminal by using model parameters of a target visual angle information prediction model to obtain the target visual angle information prediction model; determining predicted view angle information of the object to which the terminal belongs at the next time through a target view angle information prediction model; transmitting the predicted viewing angle information to a server; the server is used for determining a target video stream corresponding to the predicted viewing angle information and sending the target video stream to the terminal. 
According to the scheme, model parameters in a basic visual angle information prediction model trained in advance on each terminal are aggregated through a server to obtain model parameters of a target visual angle information prediction model, so that accurate global model parameters are obtained; model parameters of the target visual angle information prediction model are sent to corresponding terminals; the terminal receives model parameters of the target visual angle information prediction model sent by the server, updates the model parameters in the basic visual angle information prediction model trained in advance on the terminal by using the model parameters of the target visual angle information prediction model to obtain the target visual angle information prediction model, so that a more accurate and efficient local target visual angle information prediction model is obtained; the target visual angle information prediction model is used for determining the predicted visual angle information of the next time of the object to which the terminal belongs, so that the predicted visual angle of the next time of the object can be obtained more accurately and rapidly; transmitting the predicted viewing angle information to a server; the server determines a target video stream corresponding to the predicted viewing angle information, and sends the target video stream to the terminal, so that only part of the video stream needs to be sent to the terminal, thereby being beneficial to improving the efficiency and accuracy of video stream sending.
In one embodiment, the pre-trained base view information prediction model on the terminal is obtained by training in the following manner, and specifically includes the following contents: the method comprises the steps of inputting historical view angle information of sample time and historical equipment information of sample time of a sample object stored locally by a terminal into a basic view angle information prediction model to be trained on the terminal to obtain predicted view angle information of the sample object at the next time of the sample time; and training the basic visual angle information prediction model to be trained on the terminal according to the difference between the predicted visual angle information of the sample object at the next time and the real visual angle information of the sample object at the next time to obtain the basic visual angle information prediction model trained in advance on the terminal.
In this embodiment, the historical view information of the sample time of the sample object may refer to a view of the sample object at the sample time, where the sample object may be a sample user who uses the terminal to watch VR video (such as VR live video), and the sample time may refer to a historical time (where the sample time may be a historical moment or a historical period); the historical device information of the sample time may be device information of the terminal at the sample time and may include network information and buffer information of the terminal; the predicted view angle information of the sample object at the next time of the sample time may be predicted view angle information of the sample object at the next time or the next period of the sample time; the real perspective information of the next time of the sample object may be the real perspective of the sample object at the next time of the sample time.
Specifically, the terminal inputs the historical view angle information of the sample time of the sample object and the historical equipment information of the sample time which are locally stored by the terminal into a basic view angle information prediction model to be trained on the terminal, so as to obtain the predicted view angle information of the sample object at the next time of the sample time, which is output by the basic view angle information prediction model to be trained; according to the difference between the predicted view angle information of the sample object at the next time and the real view angle information of the sample object at the next time, training a basic view angle information prediction model to be trained on the terminal to obtain a trained basic view angle information prediction model, and taking the trained basic view angle information prediction model as a pre-trained basic view angle information prediction model on the terminal.
The terminal inputs the locally stored historical view angle information of the sample object at the sample time (such as viewing behavior data, which may include the FOV, the FOV holding time, and the FOV movement preference) and the historical device information at the sample time (such as the network state information and buffer state information of the terminal) into the basic view angle information prediction model to be trained on the terminal. The model to be trained analyzes the historical view angle information and the historical device information: for example, from the historical view angle information it can predict the future FOV holding time and the future FOV movement direction, and from the network state information and buffer state information it can predict the movement speed of the future FOV. When the network and buffer states are poor, slow loading or stalling is likely, and a slower future FOV movement speed is predicted; when they are good, a faster movement speed is predicted. The model to be trained then outputs the predicted view angle information of the sample object at the next time after the sample time. The terminal trains the model to be trained using the difference between the predicted view angle information at the next time and the real view angle information of the sample object at the next time, judges whether the trained model satisfies an error condition (for example, that the difference is smaller than a set error threshold), and takes the trained basic view angle information prediction model that satisfies the error condition as the pre-trained basic view angle information prediction model on the terminal.
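The local training procedure can be sketched as follows. A one-parameter linear model trained by gradient descent stands in for the LSTM; the error-threshold stopping rule follows the text, while the learning rate and threshold values are illustrative assumptions.

```python
def train_base_model(samples, lr=0.01, error_threshold=1e-3, max_epochs=1000):
    """samples: list of (history_angle, true_next_angle) pairs.
    Fits next = w * history by gradient descent on squared error."""
    w = 0.0
    for _ in range(max_epochs):
        loss = 0.0
        for x, y_true in samples:
            y_pred = w * x
            err = y_pred - y_true   # difference between predicted and
            loss += err * err       # real view angle information
            w -= lr * 2 * err * x   # gradient step on squared error
        if loss / len(samples) < error_threshold:
            break                   # error condition satisfied
    return w
```

Once the stopping condition is met, the resulting model plays the role of the pre-trained basic view angle information prediction model on the terminal.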
According to the technical scheme, the basic visual angle information prediction model to be trained on the terminal is trained, so that the basic visual angle information prediction model which is trained in advance on the terminal is more efficient and accurate, and the efficiency and the accuracy of video stream transmission are improved.
In one embodiment, the step S201 further includes a step of sending the model parameters before receiving the model parameters of the target viewing angle information prediction model sent by the server, and specifically includes the following steps: extracting model parameters of a pre-trained basic visual angle information prediction model on a terminal; the model parameters at least comprise weight parameters and deviation parameters; and sending model parameters in the pre-trained basic visual angle information prediction model on the terminal to a server.
Specifically, the weight parameter and the deviation parameter of the basic visual angle information prediction model pre-trained on the terminal are extracted from the basic visual angle information prediction model pre-trained on the terminal, and the weight parameter and the deviation parameter of the basic visual angle information prediction model pre-trained on the terminal are used as the model parameters of the basic visual angle information prediction model pre-trained on the terminal; model parameters in a basic visual angle information prediction model trained in advance on the terminal are sent to a server; and the server receives model parameters in the pre-trained basic visual angle information prediction model sent by the terminal.
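A minimal sketch of the extraction step: only the weight and deviation (bias) parameters are pulled out of the local model for upload, so the raw training data never leave the device. The flat-dictionary model representation is an assumption for illustration.

```python
def extract_model_parameters(model):
    """Return only the {weights, biases} payload to be sent to the server."""
    return {
        "weights": list(model["weights"]),  # copy, do not alias
        "biases": list(model["biases"]),
    }

# Hypothetical local model; training samples stay on-device.
local_model = {
    "weights": [0.2, -0.5],
    "biases": [0.1],
    "training_samples": ["(kept on-device)"],
}
upload_payload = extract_model_parameters(local_model)
```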
According to the technical scheme provided by the embodiment, the model parameters of the basic visual angle information prediction model trained in advance on the terminal are sent to the server, so that the data privacy of an object is protected, the transmission overhead is reduced, the distribution efficiency of a network is improved, and the video stream sending efficiency is improved.
In one embodiment, the step S204 further includes a step of playing the target video after the predicted viewing angle information is sent to the server, which specifically includes the following steps: acquiring real visual angle information of an object to which the terminal belongs at the next time; and playing the target video corresponding to the target video stream on the terminal under the condition that the predicted view angle information is matched with the real view angle information.
In this embodiment, the real view angle information of the next time of the object to which the terminal belongs may be the real view angle (real view angle) of the next time of the object to which the terminal belongs.
Specifically, the terminal acquires the real view angle information of the object to which it belongs at the next time, judges whether the predicted view angle information matches the real view angle information (for example, whether the two are identical or coincident), and, when they match, plays the target video corresponding to the target video stream on the terminal.
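The matching judgment can be sketched as follows. The angular tolerance is an assumed parameter: the text only requires the two view angles to be identical or coincident, so any concrete threshold is illustrative.

```python
def angle_diff(a, b):
    """Smallest absolute difference between two angles in degrees."""
    d = abs(a - b) % 360
    return min(d, 360 - d)

def fov_matches(predicted, real, tolerance=5.0):
    """predicted/real: (yaw, pitch) pairs in degrees.
    True when both components coincide within the tolerance."""
    return (angle_diff(predicted[0], real[0]) <= tolerance
            and angle_diff(predicted[1], real[1]) <= tolerance)
```

When `fov_matches` returns true, the pre-fetched target video stream can be played directly.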
According to the technical scheme provided by the embodiment, the target video corresponding to the target video stream is played under the condition that the predicted view angle information matches the real view angle information. Because the server sends the target video stream to the terminal in advance, the delay of video stream sending is reduced, the playback delay of the corresponding video is reduced, and the playback fluency of the corresponding video is improved.
In one embodiment, in step S202, the model parameters in the basic view angle information prediction model trained in advance on the terminal are updated by using the model parameters of the target view angle information prediction model to obtain the target view angle information prediction model, which specifically includes the following contents: dividing model parameters of the target visual angle information prediction model to obtain weight parameters and deviation parameters of the target visual angle information prediction model; and replacing the weight parameter in the basic view angle information prediction model which is trained in advance on the terminal with the weight parameter of the target view angle information prediction model, and replacing the deviation parameter in the basic view angle information prediction model which is trained in advance on the terminal with the deviation parameter of the target view angle information prediction model to obtain the target view angle information prediction model.
In this embodiment, the model parameters of the target view angle information prediction model may include a weight parameter and a bias parameter of the target view angle information prediction model.
Specifically, the terminal divides the model parameters of the target view angle information prediction model to obtain the weight parameters of the target view angle information prediction model and the deviation parameters of the target view angle information prediction model; it then replaces the weight parameters in the basic view angle information prediction model pre-trained on the terminal with the weight parameters of the target view angle information prediction model and, simultaneously, replaces the deviation parameters in the pre-trained basic view angle information prediction model with the deviation parameters of the target view angle information prediction model, obtaining a view angle information prediction model with updated parameters, which is confirmed as the target view angle information prediction model.
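The parameter-replacement step can be sketched as follows: the global weights and deviations received from the server overwrite the local model's, yielding the target prediction model. The dictionary-based model representation is an assumption for illustration.

```python
def apply_global_parameters(local_model, global_params):
    """Split the global parameters into weights/biases and swap them
    into a copy of the local model, keeping all other fields."""
    updated = dict(local_model)
    updated["weights"] = list(global_params["weights"])
    updated["biases"] = list(global_params["biases"])
    return updated

local = {"weights": [1.0], "biases": [2.0], "arch": "lstm"}
global_params = {"weights": [9.0], "biases": [8.0]}
target_model = apply_global_parameters(local, global_params)
```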
According to the technical scheme provided by the embodiment, the model parameters in the basic visual angle information prediction model trained in advance on the terminal are correspondingly replaced by the model parameters of the target visual angle information prediction model, so that the accuracy of updating the model parameters is improved, and the accuracy of transmitting the video stream is improved.
In one embodiment, as shown in fig. 3, a video streaming method is provided, and this embodiment is illustrated by applying the method to a server. In this embodiment, the method includes the steps of:
step S301, model parameters in the basic visual angle information prediction model trained in advance on each terminal are aggregated to obtain model parameters of the target visual angle information prediction model.
Specifically, the server receives model parameters in the pre-trained basic view angle information prediction model of each terminal sent by each terminal, performs aggregation processing on the model parameters in the pre-trained basic view angle information prediction model of each terminal to obtain aggregated model parameters, and takes the aggregated model parameters as the model parameters of the target view angle information prediction model.
Step S302, model parameters of a target visual angle information prediction model are sent to a corresponding terminal; the terminal is used for updating the model parameters in the basic view angle information prediction model trained in advance on the terminal by utilizing the model parameters of the target view angle information prediction model to obtain the target view angle information prediction model, determining the predicted view angle information of the object to which the terminal belongs at the next time through the target view angle information prediction model, and sending the predicted view angle information to the server.
Step S303, a target video stream corresponding to the predicted viewing angle information is determined, and the target video stream is sent to the terminal.
Specifically, the server receives the predicted viewing angle information sent by the corresponding terminal, determines a target video stream corresponding to the predicted viewing angle information, and sends the target video stream to the corresponding terminal.
In the video stream sending method, model parameters in a basic visual angle information prediction model trained in advance on each terminal are aggregated to obtain model parameters of a target visual angle information prediction model; model parameters of the target visual angle information prediction model are sent to corresponding terminals; the terminal is used for updating the model parameters in the basic visual angle information prediction model trained in advance on the terminal by utilizing the model parameters of the target visual angle information prediction model to obtain a target visual angle information prediction model, determining the predicted visual angle information of the object to which the terminal belongs at the next time through the target visual angle information prediction model, and sending the predicted visual angle information to the server; and determining a target video stream corresponding to the predicted viewing angle information, and transmitting the target video stream to the terminal. 
According to the scheme, model parameters in a basic visual angle information prediction model trained in advance on each terminal are aggregated through a server to obtain model parameters of a target visual angle information prediction model, so that accurate global model parameters are obtained; model parameters of the target visual angle information prediction model are sent to corresponding terminals; the terminal receives model parameters of the target visual angle information prediction model sent by the server, updates the model parameters in the basic visual angle information prediction model trained in advance on the terminal by using the model parameters of the target visual angle information prediction model to obtain the target visual angle information prediction model, so that a more accurate and efficient local target visual angle information prediction model is obtained; the target visual angle information prediction model is used for determining the predicted visual angle information of the next time of the object to which the terminal belongs, so that the predicted visual angle of the next time of the object can be obtained more accurately and rapidly; transmitting the predicted viewing angle information to a server; the server determines a target video stream corresponding to the predicted viewing angle information, and sends the target video stream to the terminal, so that only part of the video stream needs to be sent to the terminal, thereby being beneficial to improving the efficiency and accuracy of video stream sending.
In one embodiment, in step S301, model parameters in a basic view angle information prediction model trained in advance on each terminal are aggregated to obtain model parameters of a target view angle information prediction model, which specifically includes the following contents: dividing model parameters in the pre-trained basic visual angle information prediction model on each terminal to obtain weight parameters and deviation parameters of the pre-trained basic visual angle information prediction model on each terminal; the weight parameters of the basic visual angle information prediction model trained in advance on each terminal are aggregated to obtain the weight parameters of the target visual angle information prediction model; aggregating deviation parameters of a basic visual angle information prediction model trained in advance on each terminal to obtain deviation parameters of a target visual angle information prediction model; and identifying the weight parameters of the target visual angle information prediction model and the deviation parameters of the target visual angle information prediction model as model parameters of the target visual angle information prediction model.
In this embodiment, the model parameters in the pre-trained base view angle information prediction model on the terminal may include weight parameters and bias parameters of the pre-trained base view angle information prediction model on the terminal.
Specifically, the server divides the model parameters of the pre-trained basic visual angle information prediction model on each terminal to obtain the weight parameters and the deviation parameters of each of those models. It performs aggregation processing on the weight parameters of the pre-trained basic visual angle information prediction models on all terminals (for example, the aggregation processing may be taking the mean value) to obtain aggregated weight parameters, which serve as the weight parameters of the target visual angle information prediction model; it likewise aggregates the deviation parameters to obtain aggregated deviation parameters, which serve as the deviation parameters of the target visual angle information prediction model. The weight parameters of the target visual angle information prediction model and the deviation parameters of the target visual angle information prediction model are then combined into the model parameters of the target visual angle information prediction model.
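The mean-value aggregation the text gives as an example can be sketched as a FedAvg-style element-wise average of each terminal's weight vector and deviation vector. Plain Python lists stand in for model tensors.

```python
def fed_avg(client_params):
    """client_params: list of {"weights": [...], "biases": [...]},
    one entry per terminal. Returns the element-wise mean of each."""
    n = len(client_params)
    agg_w = [sum(c["weights"][j] for c in client_params) / n
             for j in range(len(client_params[0]["weights"]))]
    agg_b = [sum(c["biases"][j] for c in client_params) / n
             for j in range(len(client_params[0]["biases"]))]
    return {"weights": agg_w, "biases": agg_b}
```

The returned dictionary corresponds to the model parameters of the target visual angle information prediction model sent back to the terminals.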
According to the technical scheme provided by the embodiment, the aggregated weight parameters and the aggregated deviation parameters are used as the model parameters of the target visual angle information prediction model, so that the more accurate model parameters of the target visual angle information prediction model can be obtained, and the accuracy of video stream transmission can be improved.
The following describes a video stream sending method provided by the present application with an embodiment, where the method is applied to a terminal and a server for illustration, and the main steps include:
the method comprises the steps that firstly, a terminal inputs historical view angle information of sample time and historical equipment information of sample time of a sample object stored locally by the terminal into a basic view angle information prediction model to be trained on the terminal, and prediction view angle information of the sample object at the next time of the sample time is obtained.
And secondly, training the basic visual angle information prediction model to be trained on the terminal according to the difference between the predicted visual angle information of the sample object at the next time and the real visual angle information of the sample object at the next time by the terminal to obtain the basic visual angle information prediction model pre-trained on the terminal.
Thirdly, the terminal extracts model parameters of a basic visual angle information prediction model trained in advance on the terminal; the model parameters at least comprise weight parameters and deviation parameters; and sending model parameters in the pre-trained basic visual angle information prediction model on the terminal to a server.
Fourth, the server divides the model parameters in the pre-trained basic visual angle information prediction model on each terminal to obtain the weight parameters and deviation parameters of the pre-trained basic visual angle information prediction model on each terminal; aggregates the weight parameters of the pre-trained basic visual angle information prediction models on the terminals to obtain the weight parameters of the target visual angle information prediction model; aggregates the deviation parameters of the pre-trained basic visual angle information prediction models on the terminals to obtain the deviation parameters of the target visual angle information prediction model; and identifies the weight parameters and deviation parameters of the target visual angle information prediction model as its model parameters. The server then sends the model parameters of the target visual angle information prediction model to the corresponding terminals.
Fifthly, the terminal receives model parameters of a target visual angle information prediction model sent by a server; dividing model parameters of the target visual angle information prediction model to obtain weight parameters and deviation parameters of the target visual angle information prediction model; and replacing the weight parameter in the basic view angle information prediction model which is trained in advance on the terminal with the weight parameter of the target view angle information prediction model, and replacing the deviation parameter in the basic view angle information prediction model which is trained in advance on the terminal with the deviation parameter of the target view angle information prediction model to obtain the target view angle information prediction model.
Sixthly, the terminal determines the predicted view angle information of the object to which the terminal belongs at the next time through a target view angle information prediction model; the predicted viewing angle information is transmitted to a server.
Seventh, the server determines a target video stream corresponding to the predicted viewing angle information, and transmits the target video stream to the terminal.
Eighth step, the terminal obtains the real visual angle information of the object of the terminal at the next time; and playing the target video corresponding to the target video stream on the terminal under the condition that the predicted view angle information is matched with the real view angle information.
According to the technical scheme provided by the embodiment, model parameters in the basic visual angle information prediction model trained in advance on each terminal are aggregated through the server, so that model parameters of the target visual angle information prediction model are obtained, and accurate global model parameters are obtained; model parameters of the target visual angle information prediction model are sent to corresponding terminals; the terminal receives model parameters of the target visual angle information prediction model sent by the server, updates the model parameters in the basic visual angle information prediction model trained in advance on the terminal by using the model parameters of the target visual angle information prediction model to obtain the target visual angle information prediction model, so that a more accurate and efficient local target visual angle information prediction model is obtained; the target visual angle information prediction model is used for determining the predicted visual angle information of the next time of the object to which the terminal belongs, so that the predicted visual angle of the next time of the object can be obtained more accurately and rapidly; transmitting the predicted viewing angle information to a server; the server determines a target video stream corresponding to the predicted viewing angle information, and sends the target video stream to the terminal, so that only part of the video stream needs to be sent to the terminal, thereby being beneficial to improving the efficiency and accuracy of video stream sending.
The video stream sending method provided by the application is further illustrated with an application example, in which the method is applied to a terminal and a server, as shown in fig. 4. The main steps include:
In a first step, a terminal (e.g., one of a plurality of client participants) collects FOV-prediction-related datasets on its local device. Such data may include the object's viewing history, viewing behavior data (FOV, FOV holding time, FOV movement preference), network status, buffer status, and the like.
Secondly, the terminal trains a local FOV prediction model (the basic visual angle information prediction model to be trained) on the local device with the collected data using the LSTM algorithm, obtains the pre-trained basic visual angle information prediction model on the terminal, and uploads the weight parameters and deviation parameters of that model to the server.
Due to the privacy-protection property of federated learning, the training data of each terminal never leaves the local device; only the local model weight parameters (which can be expressed as w_t) and deviation parameters (which can be expressed as b_t) are uploaded to the server for parameter aggregation.
In the third step, after receiving the local model parameters uploaded by each terminal, the server aggregates the parameters through the FedAvg algorithm (a federated learning algorithm) to obtain global model parameters. Specifically, suppose that in the k-th round of iteration (k being a numerical value), the model parameters of the i-th client (terminal, i being a numerical value) are (w_{i,k}, b_{i,k}). The weight and deviation parameters of the global model after the k-th iteration (the k-th round of training) can then be expressed as

w_k = (1/N) Σ_{i=1}^{N} w_{i,k} and b_k = (1/N) Σ_{i=1}^{N} b_{i,k},

where N is the number of terminals, w_{i,k} is the weight parameter of the model on the i-th terminal in the k-th round of training, and b_{i,k} is the deviation parameter of the model on the i-th terminal in the k-th round of training. The global model is updated with the parameters w_k and b_k. In this way the global model can extract the required information from the decentralized client data without exposing that data. The above steps are repeated until the global model converges.
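The FedAvg aggregation in the third step can be sketched as a plain element-wise mean over the uploaded (w_{i,k}, b_{i,k}) pairs. Equal client weighting is assumed here for simplicity; production FedAvg typically weights each client by its local dataset size:

```python
# Minimal FedAvg-style server aggregation sketch (equal client weighting assumed).
def fedavg(client_params):
    """client_params: list of (weights, bias) tuples, one per terminal.
    weights is a flat list of floats; bias is a single float."""
    n = len(client_params)
    dim = len(client_params[0][0])
    # Element-wise mean of the weight vectors across all terminals.
    w_global = [sum(w[j] for w, _ in client_params) / n for j in range(dim)]
    # Mean of the bias (deviation) parameters across all terminals.
    b_global = sum(b for _, b in client_params) / n
    return w_global, b_global

# Example: three terminals upload their local (weight, bias) parameters.
params = [([1.0, 2.0], 0.5), ([3.0, 4.0], 1.5), ([5.0, 6.0], 2.5)]
w_k, b_k = fedavg(params)
```

The resulting (w_k, b_k) are the global parameters the server returns to each terminal in the fourth step.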
In the fourth step, the server takes the aggregated global model parameters as the updated model parameters and returns them to each terminal.
In the fifth step, the terminal updates its local LSTM model with the updated model parameters returned by the server, for use in the next round of local prediction.
In the sixth step, the terminal uses the locally updated model to predict the FOV of the object at the next time, obtaining a predicted FOV value that can be expressed as (y_{t+1}, p_{t+1}), where y_{t+1} is a first angle value of the predicted FOV at the next time and p_{t+1} is a second angle value of the predicted FOV at the next time. The prediction result (the predicted FOV value of the object at the next time) is then uploaded to the server.
In the seventh step, the server, according to the predicted FOV value (y_{t+1}, p_{t+1}), issues the corresponding video stream to the terminal in advance.
In the eighth step, when the terminal recognizes that the FOV predicted value matches the object's true FOV value, it plays the video stream, thereby reducing latency and improving the object's viewing experience.
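The "predicted value coincides with the true value" check in the eighth step could, for example, be an angular-tolerance comparison. The 5-degree threshold and the yaw wrap-around handling below are illustrative assumptions, not specified by the application:

```python
def fov_matches(pred, actual, tol_deg=5.0):
    """pred and actual are (yaw, pitch) pairs in degrees.
    The yaw difference is wrapped into [-180, 180] so that, e.g.,
    359 degrees and 1 degree are treated as 2 degrees apart."""
    dy = (pred[0] - actual[0] + 180.0) % 360.0 - 180.0  # wrapped yaw delta
    dp = pred[1] - actual[1]                            # pitch delta
    return abs(dy) <= tol_deg and abs(dp) <= tol_deg
```

Under such a check, the prefetched target video stream is played only when both angle components fall within the tolerance.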
The server may include a global prediction module, the local prediction module may be disposed in a gateway or a terminal, the server may be in communication connection with the gateway, and the gateway may be in communication connection with one or more terminals.
The video (e.g., a live video) can be encoded with MCTS (motion-constrained tile sets) and packetized in a corresponding format by the server (media server) into M × N tiles (e.g., M tiles long and N tiles wide, where M and N are numerical values; a tile can be understood as an image region) for subsequent view-based transmission, using (y, p) to represent the FOV selection of an object (where y is a first angle value, e.g., a yaw angle, and p is a second angle value, e.g., a tilt angle).
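Mapping an FOV selection (y, p) onto the M × N tile grid can be sketched as follows, assuming an equirectangular tile layout with yaw in [0, 360) and tilt in [-90, 90]; the grid dimensions and the layout itself are illustrative assumptions:

```python
def fov_to_tile(yaw_deg, pitch_deg, m_cols=8, n_rows=4):
    """Map an FOV centre (yaw in degrees, tilt/pitch in degrees) to a
    (row, col) tile index on an equirectangular m_cols x n_rows grid."""
    col = int((yaw_deg % 360.0) / 360.0 * m_cols)       # yaw -> column
    row = int((pitch_deg + 90.0) / 180.0 * n_rows)      # tilt -> row
    row = min(row, n_rows - 1)                          # tilt == +90 lands on last row
    return row, col
```

The server would then send, ahead of time, the tile containing the predicted FOV centre (in practice, usually that tile plus a margin of neighbouring tiles) rather than the full panorama.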
The technical scheme provided by the application example realizes protection of the privacy of the data of the object, reduces the network bandwidth required by data uploading, improves the viewing experience of the object, reduces the time delay, reduces the transmission overhead, improves the distribution efficiency of the network, and improves the efficiency and accuracy of video stream transmission.
It should be understood that, although the steps in the flowcharts of the embodiments described above are shown sequentially as indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the order of execution is not strictly limited, and the steps may be performed in other orders. Moreover, at least some of the steps in the flowcharts of the above embodiments may include multiple sub-steps or stages, which are not necessarily performed at the same time but may be performed at different times; the order of these sub-steps or stages is not necessarily sequential, and they may be performed in turn or alternately with at least some of the other steps, sub-steps, or stages.
Based on the same inventive concept, the embodiment of the application also provides a video stream sending device for realizing the video stream sending method. The implementation of the solution provided by the device is similar to the implementation described in the above method, so the specific limitation in the embodiments of the video streaming device or devices provided below may refer to the limitation of the video streaming method hereinabove, and will not be repeated herein.
In one embodiment, as shown in fig. 5, a video stream transmission apparatus is provided, and applied to a terminal, the apparatus 500 may include:
the parameter receiving module 501 is configured to receive a model parameter of a target view angle information prediction model sent by a server; model parameters of the target visual angle information prediction model are obtained by aggregation processing of model parameters in the basic visual angle information prediction model trained in advance on each terminal through a server.
And the parameter updating module 502 is configured to update the model parameters in the pre-trained basic view angle information prediction model on the terminal by using the model parameters of the target view angle information prediction model to obtain the target view angle information prediction model.
The view angle determining module 503 is configured to determine, according to the target view angle information prediction model, predicted view angle information of the object to which the terminal belongs at a next time.
An information sending module 504, configured to send the predicted viewing angle information to a server; the server is used for determining a target video stream corresponding to the predicted viewing angle information and sending the target video stream to the terminal.
In one embodiment, the apparatus 500 further comprises: the model training module is used for inputting the historical view angle information of the sample time and the historical equipment information of the sample time of the sample object stored locally by the terminal into a basic view angle information prediction model to be trained on the terminal to obtain the predicted view angle information of the sample object at the next time of the sample time; and training the basic visual angle information prediction model to be trained on the terminal according to the difference between the predicted visual angle information of the sample object at the next time and the real visual angle information of the sample object at the next time to obtain the basic visual angle information prediction model trained in advance on the terminal.
In one embodiment, the apparatus 500 further comprises: the parameter extraction module is used for extracting model parameters of a basic visual angle information prediction model trained in advance on the terminal; the model parameters at least comprise weight parameters and deviation parameters; and sending model parameters in the pre-trained basic visual angle information prediction model on the terminal to a server.
In one embodiment, the apparatus 500 further comprises: the video playing module is used for acquiring real visual angle information of an object to which the terminal belongs at the next time; and playing the target video corresponding to the target video stream on the terminal under the condition that the predicted view angle information is matched with the real view angle information.
In one embodiment, the parameter updating module 502 is further configured to divide model parameters of the target view angle information prediction model to obtain a weight parameter and a deviation parameter of the target view angle information prediction model; and replacing the weight parameter in the basic view angle information prediction model which is trained in advance on the terminal with the weight parameter of the target view angle information prediction model, and replacing the deviation parameter in the basic view angle information prediction model which is trained in advance on the terminal with the deviation parameter of the target view angle information prediction model to obtain the target view angle information prediction model.
In one embodiment, as shown in fig. 6, a video streaming apparatus is provided, applied to a server, the apparatus 600 may include:
and the parameter processing module 601 is configured to aggregate model parameters in the basic view angle information prediction model trained in advance on each terminal, so as to obtain model parameters of the target view angle information prediction model.
The parameter sending module 602 is configured to send model parameters of the target view angle information prediction model to a corresponding terminal; the terminal is used for updating the model parameters in the basic view angle information prediction model trained in advance on the terminal by utilizing the model parameters of the target view angle information prediction model to obtain the target view angle information prediction model, determining the predicted view angle information of the object to which the terminal belongs at the next time through the target view angle information prediction model, and sending the predicted view angle information to the server.
The information determining module 603 is configured to determine a target video stream corresponding to the predicted viewing angle information, and send the target video stream to the terminal.
In one embodiment, the parameter processing module 601 is further configured to divide model parameters in the pre-trained base view angle information prediction model on each terminal, so as to obtain weight parameters and deviation parameters of the pre-trained base view angle information prediction model on each terminal; the weight parameters of the basic visual angle information prediction model trained in advance on each terminal are aggregated to obtain the weight parameters of the target visual angle information prediction model; aggregating deviation parameters of a basic visual angle information prediction model trained in advance on each terminal to obtain deviation parameters of a target visual angle information prediction model; and identifying the weight parameters of the target visual angle information prediction model and the deviation parameters of the target visual angle information prediction model as model parameters of the target visual angle information prediction model.
The respective modules in the video streaming transmission apparatus described above may be implemented in whole or in part by software, hardware, and a combination thereof. The above modules may be embedded in hardware or may be independent of a processor in the computer device, or may be stored in software in a memory in the computer device, so that the processor may call and execute operations corresponding to the above modules.
In one embodiment, a computer device is provided, which may be a terminal, and the internal structure of which may be as shown in fig. 7. The computer device includes a processor, a memory, an input/output interface, a communication interface, a display unit, and an input means. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface, the display unit and the input device are connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for carrying out wired or wireless communication with an external terminal, and the wireless mode can be realized through WIFI, a mobile cellular network, NFC (near field communication) or other technologies. The computer program is executed by a processor to implement a video streaming method. The display unit of the computer device is used for forming a visual picture, and can be a display screen, a projection device or a virtual reality imaging device. The display screen can be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer equipment can be a touch layer covered on the display screen, can also be a key, a track ball or a touch pad arranged on the shell of the computer equipment, and can also be an external keyboard, a touch pad or a mouse and the like.
In one embodiment, a computer device is provided, which may be a server, and the internal structure of which may be as shown in fig. 8. The computer device includes a processor, a memory, an Input/Output interface (I/O) and a communication interface. The processor, the memory and the input/output interface are connected through a system bus, and the communication interface is connected to the system bus through the input/output interface. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer programs, and a database. The internal memory provides an environment for the operation of the operating system and computer programs in the non-volatile storage media. The database of the computer device is used for storing data such as video streams. The input/output interface of the computer device is used to exchange information between the processor and the external device. The communication interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a video streaming method.
It will be appreciated by persons skilled in the art that the structures shown in fig. 7 and 8 are block diagrams of only portions of structures associated with the present inventive arrangements and are not limiting of the computer device to which the present inventive arrangements are applied, and that a particular computer device may include more or fewer components than shown, or may combine certain components, or have a different arrangement of components.
In an embodiment, there is also provided a computer device comprising a memory and a processor, the memory having stored therein a computer program, the processor implementing the steps of the method embodiments described above when the computer program is executed.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored which, when executed by a processor, carries out the steps of the method embodiments described above.
In an embodiment, a computer program product is provided, comprising a computer program which, when executed by a processor, implements the steps of the method embodiments described above.
Those skilled in the art will appreciate that implementing all or part of the above described methods may be accomplished by way of a computer program stored on a non-transitory computer readable storage medium, which when executed, may comprise the steps of the embodiments of the methods described above. Any reference to memory, database, or other medium used in embodiments provided herein may include at least one of non-volatile and volatile memory. The nonvolatile Memory may include Read-Only Memory (ROM), magnetic tape, floppy disk, flash Memory, optical Memory, high density embedded nonvolatile Memory, resistive random access Memory (ReRAM), magnetic random access Memory (Magnetoresistive Random Access Memory, MRAM), ferroelectric Memory (Ferroelectric Random Access Memory, FRAM), phase change Memory (Phase Change Memory, PCM), graphene Memory, and the like. Volatile memory can include random access memory (Random Access Memory, RAM) or external cache memory, and the like. By way of illustration, and not limitation, RAM can be in the form of a variety of forms, such as static random access memory (Static Random Access Memory, SRAM) or dynamic random access memory (Dynamic Random Access Memory, DRAM), and the like. The databases referred to in the embodiments provided herein may include at least one of a relational database and a non-relational database. The non-relational database may include, but is not limited to, a blockchain-based distributed database, and the like. The processor referred to in the embodiments provided in the present application may be a general-purpose processor, a central processing unit, a graphics processor, a digital signal processor, a programmable logic unit, a data processing logic unit based on quantum computing, or the like, but is not limited thereto.
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination contains no contradiction, it should be considered to be within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application and are described in detail, but they should not thereby be construed as limiting the scope of the application. It should be noted that several variations and modifications can be made by those skilled in the art without departing from the concept of the application, all of which fall within the protection scope of the application. Accordingly, the protection scope of the application shall be determined by the appended claims.
Claims (12)
1. A video streaming transmission method, applied to a terminal, the method comprising:
model parameters of a target visual angle information prediction model sent by a server are received; model parameters of the target visual angle information prediction model are obtained by the server through aggregation processing of model parameters in the basic visual angle information prediction model trained in advance on each terminal;
Updating model parameters in a basic visual angle information prediction model trained in advance on the terminal by using the model parameters of the target visual angle information prediction model to obtain the target visual angle information prediction model;
determining predicted view angle information of the object to which the terminal belongs at the next time through the target view angle information prediction model;
transmitting the predicted viewing angle information to the server; the server is used for determining a target video stream corresponding to the predicted viewing angle information and sending the target video stream to the terminal.
2. The method of claim 1, wherein the pre-trained base view information prediction model on the terminal is trained by:
the method comprises the steps of inputting historical view angle information of sample time of a sample object and historical equipment information of the sample time which are locally stored by the terminal into a basic view angle information prediction model to be trained on the terminal, and obtaining predicted view angle information of the sample object at the next time of the sample time;
and training a basic visual angle information prediction model to be trained on the terminal according to the difference between the predicted visual angle information of the sample object at the next time and the real visual angle information of the sample object at the next time, so as to obtain a basic visual angle information prediction model trained in advance on the terminal.
3. The method according to claim 2, further comprising, before receiving the model parameters of the target viewing angle information prediction model transmitted by the server:
extracting model parameters of a pre-trained basic visual angle information prediction model on the terminal; the model parameters at least comprise weight parameters and deviation parameters;
and sending model parameters in a pre-trained basic visual angle information prediction model on the terminal to the server.
4. The method of claim 1, further comprising, after transmitting the predicted viewing angle information to the server:
acquiring real visual angle information of an object to which the terminal belongs at the next time;
and playing the target video corresponding to the target video stream on the terminal under the condition that the predicted view angle information is matched with the real view angle information.
5. The method according to claim 1, wherein updating the model parameters in the basic view information prediction model trained in advance on the terminal by using the model parameters of the target view information prediction model to obtain the target view information prediction model comprises:
Dividing model parameters of the target visual angle information prediction model to obtain weight parameters and deviation parameters of the target visual angle information prediction model;
and replacing the weight parameters in the pre-trained basic view angle information prediction model on the terminal with the weight parameters of the target view angle information prediction model, and replacing the deviation parameters in the pre-trained basic view angle information prediction model on the terminal with the deviation parameters of the target view angle information prediction model to obtain the target view angle information prediction model.
6. A video streaming method, applied to a server, comprising:
performing aggregation processing on model parameters in a basic visual angle information prediction model trained in advance on each terminal to obtain model parameters of a target visual angle information prediction model;
sending the model parameters of the target visual angle information prediction model to a corresponding terminal; the terminal is used for updating the model parameters in the basic view angle information prediction model trained in advance on the terminal by utilizing the model parameters of the target view angle information prediction model to obtain the target view angle information prediction model, determining the predicted view angle information of the object to which the terminal belongs at the next time through the target view angle information prediction model, and sending the predicted view angle information to the server;
And determining a target video stream corresponding to the predicted viewing angle information, and sending the target video stream to the terminal.
7. The method according to claim 6, wherein the aggregating the model parameters in the pre-trained base view information prediction model at each terminal to obtain the model parameters of the target view information prediction model includes:
dividing model parameters in a pre-trained basic visual angle information prediction model on each terminal to obtain weight parameters and deviation parameters of the pre-trained basic visual angle information prediction model on each terminal;
the weight parameters of the basic visual angle information prediction model trained in advance on each terminal are subjected to aggregation treatment to obtain the weight parameters of the target visual angle information prediction model;
performing aggregation processing on deviation parameters of a basic visual angle information prediction model pre-trained on each terminal to obtain deviation parameters of the target visual angle information prediction model;
and identifying the weight parameters of the target visual angle information prediction model and the deviation parameters of the target visual angle information prediction model as model parameters of the target visual angle information prediction model.
8. A video stream transmission apparatus, characterized by being applied to a terminal, comprising:
the parameter receiving module is used for receiving model parameters of the target visual angle information prediction model sent by the server; model parameters of the target visual angle information prediction model are obtained by the server through aggregation processing of model parameters in the basic visual angle information prediction model trained in advance on each terminal;
the parameter updating module is used for updating the model parameters in the basic visual angle information prediction model trained in advance on the terminal by utilizing the model parameters of the target visual angle information prediction model to obtain the target visual angle information prediction model;
the view angle determining module is used for determining predicted view angle information of the next time of the object to which the terminal belongs through the target view angle information prediction model;
the information sending module is used for sending the predicted viewing angle information to the server; the server is used for determining a target video stream corresponding to the predicted viewing angle information and sending the target video stream to the terminal.
9. A video streaming transmission apparatus, applied to a server, comprising:
The parameter processing module is used for carrying out aggregation processing on model parameters in the basic visual angle information prediction model trained in advance on each terminal to obtain model parameters of the target visual angle information prediction model;
the parameter sending module is used for sending the model parameters of the target visual angle information prediction model to the corresponding terminals; the terminal is used for updating the model parameters in the basic view angle information prediction model trained in advance on the terminal by utilizing the model parameters of the target view angle information prediction model to obtain the target view angle information prediction model, determining the predicted view angle information of the object to which the terminal belongs at the next time through the target view angle information prediction model, and sending the predicted view angle information to the server;
and the information determining module is used for determining a target video stream corresponding to the predicted viewing angle information and sending the target video stream to the terminal.
10. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any of claims 1 to 7 when the computer program is executed.
11. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
12. A computer program product comprising a computer program, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310878027.XA CN116939254A (en) | 2023-07-17 | 2023-07-17 | Video stream transmission method, device, computer equipment and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310878027.XA CN116939254A (en) | 2023-07-17 | 2023-07-17 | Video stream transmission method, device, computer equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN116939254A true CN116939254A (en) | 2023-10-24 |
Family
ID=88393561
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310878027.XA Pending CN116939254A (en) | 2023-07-17 | 2023-07-17 | Video stream transmission method, device, computer equipment and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116939254A (en) |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |