CN110809127B - Video call method and device based on deep imitation learning - Google Patents


Info

Publication number
CN110809127B
CN110809127B (application number CN201910960211.2A)
Authority
CN
China
Prior art keywords
transmission
code rate
time slot
transmission time
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910960211.2A
Other languages
Chinese (zh)
Other versions
CN110809127A (en)
Inventor
周安福
张欢欢
马若暄
苏光远
张新宇
马华东
陈虓将
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201910960211.2A
Publication of CN110809127A
Application granted
Publication of CN110809127B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/044 Network management architectures or arrangements comprising hierarchical management structures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Abstract

The embodiment of the invention provides a video call method and device based on deep imitation learning. The method includes: for a current transmission time slot of a video call, acquiring transmission information of the previous transmission time slot, the transmission information including transport layer information and application layer information; inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot, the code rate optimization network model being trained on a training set that includes the real transmission information and the real transmission code rate of each transmission time slot in a sample video call; and sending video call data to a receiving end based on the transmission code rate of the current transmission time slot. The method and the device can determine an appropriate transmission code rate in a video call in real time and improve video call quality.

Description

Video call method and device based on deep imitation learning
Technical Field
The embodiment of the invention relates to the technical field of communications, and in particular to a video call method and device based on deep imitation learning.
Background
With the development of communication technology, real-time video calls have become an indispensable part of people's lives. Mobile wireless network applications such as crowdsourced live streaming, cloud video games, robotics, and remote vehicle operation are continuously driving the growth of video call traffic.
However, the quality of existing video calls is still not satisfactory; for example, during a video call, problems such as image blurring, frame loss, and stuttering may occur.
The main reason for the low quality of existing video calls is that the application layer and the transport layer are not coordinated, so an appropriate bit rate for data transmission cannot be determined. Specifically, the transport layer typically updates its network capacity estimate at millisecond granularity to respond to network changes as dynamically as possible, while the video codec at the application layer can only change the video bit rate over much larger time intervals, so the video encoder cannot adjust its rate in real time to follow the data transmission rate of the transport layer.
Therefore, in existing video call technology, because the application layer and the transport layer are not coordinated, an appropriate transmission code rate cannot be determined, and the video call quality is not high.
Disclosure of Invention
The embodiment of the invention aims to provide a video call method and device based on deep imitation learning, so as to determine an appropriate transmission code rate in a video call in real time and improve video call quality. The specific technical scheme is as follows:
In order to achieve the above object, an embodiment of the present invention provides a video call method based on deep imitation learning, where the method includes:
for a current transmission time slot of a video call, acquiring transmission information of the previous transmission time slot; the transmission information includes: transport layer information and application layer information;
inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information and real transmission code rate of each transmission time slot in the sample video call;
and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot.
Optionally, the transmission layer information includes a packet loss rate and an inter-packet delay, and the application layer information includes a transmission code rate and throughput.
Optionally, the obtaining the transmission information of the previous transmission timeslot includes:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput of the last transmission time slot based on the feedback information.
Optionally, the code rate optimization network model is trained according to the following method:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
and if so, determining the current neural network model as a code rate optimization network model.
Optionally, the loss function is:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
In order to achieve the above object, an embodiment of the present invention further provides a video call device based on deep imitation learning, where the device includes:
the acquisition module is used for acquiring, for the current transmission time slot of the video call, the transmission information of the last transmission time slot; the transmission information includes: transport layer information and application layer information;
the input module is used for inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information and real transmission code rate of each transmission time slot in the sample video call;
and the sending module is used for sending the video call data to the receiving end based on the transmission code rate of the current transmission time slot.
Optionally, the transmission layer information includes a packet loss rate and an inter-packet delay, and the application layer information includes a transmission code rate and throughput.
Optionally, the obtaining module is specifically configured to:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput data of the last transmission time slot based on the feedback information.
Optionally, the apparatus further includes a training module, where the training module is configured to train the rate optimization network model according to the following steps:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
and if so, determining the current neural network model as a code rate optimization network model.
Optionally, the loss function is:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
In order to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any method step when executing the program stored in the memory.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above method steps.
By applying the video call method based on deep imitation learning provided by the embodiment of the invention, for the current transmission time slot of a video call, the sending end acquires the transmission information of the previous transmission time slot, the transmission information including transport layer information and application layer information; inputs the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; and sends the video call data to the receiving end based on the transmission code rate of the current transmission time slot. The code rate optimization network model is obtained by pre-training on the real transmission information and the real transmission code rate of each transmission time slot in a sample video call. In this way, the historical transmission information of the application layer and the transport layer in the video call can be fused to determine the optimal transmission code rate, which solves the problem in existing video calls that an appropriate transmission code rate cannot be determined because the application layer and the transport layer are not coordinated, and thus improves video call quality.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a video call method based on deep imitation learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a video call based on deep imitation learning according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a training code rate optimization network model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a neural network model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a video call device based on deep imitation learning according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the technical problem in existing video call technology that, because the application layer and the transport layer are not coordinated, an appropriate transmission code rate cannot be determined and video call quality is therefore low, the embodiments of the invention provide a video call method and device based on deep imitation learning, an electronic device, and a computer-readable storage medium.
For ease of understanding, the following description will first describe an application scenario of the embodiment of the present invention.
The embodiment of the invention can be applied to a video call scene, the video call is essentially a process that two parties send data to each other, and in the video call, one party can be regarded as a sending end, and the other party can be regarded as a receiving end. The sending end sends video call data to the receiving end in each transmission time slot, and therefore video call is achieved. The video call method based on deep simulation learning provided by the embodiment of the invention can be applied to a sending end of video call.
Referring to fig. 1, a video call method based on deep imitation learning according to an embodiment of the present invention may include the following steps:
s101: and acquiring the transmission information of the last transmission time slot aiming at the current transmission time slot, wherein the transmission information comprises transmission layer information and application layer information.
In the embodiment of the present invention, the time interval between adjacent transmission timeslots may be set according to actual situations, for example, the time interval is set to be 1 second.
In the embodiment of the invention, during the video call, the sending end can determine the appropriate transmission code rate to be adopted in the current transmission time slot based on the transmission information of the last transmission time slot. The appropriate transmission code rate can be understood as the maximum transmission code rate that does not cause network congestion. The transmission code rate is positively correlated with the transmission rate of the video data; that is, the higher the determined transmission code rate, the higher the transmission rate of the video data.
Wherein the transmission information may include transmission layer information and application layer information, and those skilled in the art can understand that the transmission layer information represents information related to data transmission and the application layer information represents data related to a video codec.
S102: inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: the transmission information and the real transmission code rate of each transmission time slot in the sample video call;
in the embodiment of the invention, the proper transmission code rate to be adopted by the current transmission time slot can be determined according to the neural network model. Specifically, the transmission information of the last transmission time slot is input into a code rate optimization network model, the code rate optimization network model is trained in advance according to the real transmission information of each transmission time slot in the sample video call and the real transmission code rate, and the transmission code rate suitable for the current transmission time slot can be output.
In one embodiment of the present invention, the transmission layer information may include a packet loss rate and an inter-packet delay, and the application layer information may include a transmission code rate and a throughput.
As an example, assuming the current transmission time slot is t+1, the input to the code rate optimization network model can be represented by four sequences from the last transmission time slot: the packet loss rate sequence, the inter-packet delay sequence, the transmission code rate sequence (which can also be understood as the data transmission rate of the last transmission time slot), and the throughput sequence.
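For ease of understanding, the following is a minimal Python sketch of how the four input sequences might be packed into a single model input; the function name and array layout are illustrative assumptions rather than the notation of the original figures.

```python
import numpy as np

def build_state(loss_rates, delays, bitrates, throughputs):
    """Stack the four per-slot history sequences (assumed to have equal length)
    into one input array for the code rate optimization network."""
    return np.stack([
        np.asarray(loss_rates, dtype=np.float32),    # packet loss rate sequence
        np.asarray(delays, dtype=np.float32),        # inter-packet delay sequence
        np.asarray(bitrates, dtype=np.float32),      # transmission code rate sequence
        np.asarray(throughputs, dtype=np.float32),   # throughput sequence
    ])  # shape: (4, sequence_length)
```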
In an embodiment of the present invention, obtaining the transmission information of the last transmission timeslot may include the following steps:
step 11: acquiring a transmission code rate output by a code rate optimization network model in the last transmission time slot;
because the transmission code rate of the last transmission time slot is also output by the code rate optimization network model, the transmission code rate output by the code rate optimization network model in the last transmission time slot can be directly obtained aiming at the transmission code rate in the transmission information, namely the transmission code rate of the last transmission time slot in the video call.
Step 12: and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput data of the last transmission time slot based on the feedback information.
In the embodiment of the invention, the sending end can determine the packet loss rate, the inter-packet delay and the throughput data of the last transmission time slot based on the feedback information of the receiving end.
Specifically, after receiving the video call data of the previous transmission time slot, the receiving end may generate feedback information according to the receiving condition of the video call data and feed the feedback information back to the sending end. The feedback information may be an Acknowledgement Character (ACK) or the like.
Taking ACK as an example, the communication protocol specifies that after receiving data, the receiving end needs to feed back ACK information to the sending end, and the ACK information generally has a fixed format. Therefore, the sending end can determine the transmission information of the data in the last transmission time slot according to the ACK information fed back by the receiving end, wherein the transmission information comprises the packet loss rate of a transmission layer, the inter-packet delay, the throughput of an application layer and the like.
As just one example, the sending end may also determine the packet loss rate, the inter-packet delay, and the throughput data of the last transmission timeslot by using other manners, which is not limited in the embodiment of the present invention.
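As a hedged illustration only, the following Python sketch shows one way a sending end might derive the packet loss rate, inter-packet delay, and throughput of the last transmission time slot from per-packet acknowledgements; the record fields and the exact computations are assumptions for illustration, not details fixed by the embodiment.

```python
def summarize_last_slot(sent_packets, ack_records, slot_seconds):
    """Derive last-slot statistics from hypothetical ACK records.

    sent_packets: list of (seq, size_bytes) sent in the last slot
    ack_records:  list of (seq, recv_time) reported back by the receiver
    """
    acked = dict(ack_records)
    lost = sum(1 for seq, _ in sent_packets if seq not in acked)
    loss_rate = lost / max(len(sent_packets), 1)

    recv_times = sorted(acked.values())
    gaps = [b - a for a, b in zip(recv_times, recv_times[1:])]
    inter_packet_delay = sum(gaps) / len(gaps) if gaps else 0.0

    delivered_bytes = sum(size for seq, size in sent_packets if seq in acked)
    throughput_bps = delivered_bytes * 8 / slot_seconds

    return loss_rate, inter_packet_delay, throughput_bps
```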
S103: and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot.
In S102, the transmission code rate of the current transmission time slot output by the code rate optimization network model is the optimal transmission code rate of the current transmission time slot, and may also be understood as the maximum transmission code rate that does not cause network congestion for the video call data to be transmitted in the current time slot.
Further, the sending end may send the video call data to the receiving end based on the determined transmission code rate of the current transmission time slot. Specifically, the encoder at the sending end may encode and convert the data stream based on the determined transmission code rate. In addition, the sending end may map the determined transmission code rate to a data transmission rate of the transport layer, thereby sending the video call data to the receiving end at the determined data transmission rate.
For ease of understanding, fig. 2 is a schematic diagram of a video call based on deep emulation learning according to an embodiment of the present invention, which is further described below with reference to fig. 2. As shown in fig. 2, the sending end obtains the transmission information of the previous transmission time slot, including the packet loss rate, the inter-packet delay, the throughput, and the transmission code rate, and inputs the transmission information into the code rate optimization network model to obtain the transmission code rate of the current time slot. And then sending the video call data to the receiving end based on the determined transmission code rate of the current transmission time slot. The receiving end generates feedback information based on the receiving condition of the video call data and feeds the feedback information back to the sending end, and the sending end determines the packet loss rate, the inter-packet delay and the throughput in the transmission information based on the feedback information of the receiving end.
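The per-slot control loop of fig. 2 can be summarized with the following sketch; the model, encoder, transport, and history objects and their method names are placeholders assumed for illustration, not interfaces defined by the embodiment.

```python
def sender_slot_loop(model, encoder, transport, history):
    """Run once per transmission time slot (e.g. once per second)."""
    while transport.call_active():
        state = history.last_slot_state()       # loss rate, delay, code rate, throughput
        bitrate = model.predict(state)          # code rate for the current slot
        encoder.set_target_bitrate(bitrate)     # application layer follows the rate
        transport.set_send_rate(bitrate)        # transport layer pacing follows the same rate
        feedback = transport.send_slot(encoder.encoded_frames())
        history.update(bitrate, feedback)       # becomes the input of the next slot
```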
By applying the video call method based on deep simulation learning provided by the embodiment of the invention, a sending end of a video call acquires transmission information of a previous transmission time slot aiming at a current transmission time slot of the video call; the transmission information includes: transport layer information and application layer information; inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot. The code rate optimization network model is obtained by pre-training according to the real transmission information and the real transmission code rate of each transmission time slot in the sample video call. Therefore, the historical transmission information of the application layer and the transmission layer in the video call can be fused to determine the optimal transmission code rate, so that the problem that the proper transmission code rate cannot be determined due to the fact that the application layer and the transmission layer are not coordinated in the existing video call is solved, and the video call quality is improved.
In an embodiment of the present invention, the rate optimization network model may be trained according to the following method, see fig. 3, including the following steps:
s301: and acquiring a preset neural network model and a training set.
In the embodiment of the invention, the preset neural network model can comprise a convolution layer and a full connection layer; the training set comprises transmission information of each transmission time slot in the sample video call, the sample video call can be a video call collected in actual live broadcast, and the transmission information of each transmission time slot can comprise transmission code rate, packet loss rate, inter-packet delay and throughput of each time slot in the video call.
For convenience of understanding, the structure and operation of the neural network model provided by the embodiment of the present invention are described below.
As an example, fig. 4 is a schematic diagram of a neural network model provided by an embodiment of the present invention. Four independent convolutional layers are used to extract features from the input, which consists of the packet loss rate sequence, the inter-packet delay sequence, the transmission code rate sequence, and the throughput sequence. Each convolutional layer may use a kernel of size 3 x 3 and 64 filters to extract features.
After convolution, the result is passed through an activation function; the embodiment of the present invention may use, but is not limited to, activation functions such as the Rectified Linear Unit (ReLU) and the Leaky ReLU. Taking the Leaky ReLU as an example, this activation function keeps a non-zero gradient throughout the training stage, which effectively avoids vanishing gradients and can shorten training time.
Further, zero padding, as commonly used in deep learning, can be applied to normalize the 4 input components to the same size, which accelerates training and produces feature maps of uniform size, thereby facilitating feature fusion.
The 4 groups of feature maps are fused and then input into the fully connected layers. As an example, as shown in fig. 4, the fused 64 feature maps are passed through the ReLU activation function and input to the next layer; then 128 feature maps are passed through the ReLU activation function and input to the next layer; then 256 feature maps are passed through the ReLU activation function and input to the next layer. Finally, the result, namely the transmission code rate of the current time slot, is output by a normalized softmax distribution function in the fully connected layer.
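For ease of understanding, the following PyTorch sketch reproduces the described shape of the model (four independent convolutions with 64 filters, Leaky ReLU, feature fusion, fully connected layers of 128 and 256 units, and a softmax output). The sequence length, the use of 1-D convolutions, and the discretization of the code rate into a fixed set of levels are illustrative assumptions, not the exact configuration of fig. 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RateOptimizationNet(nn.Module):
    """Four independent convolutions over the loss-rate, delay, code-rate and
    throughput histories, feature fusion, then fully connected layers ending
    in a softmax over a discrete set of candidate code rates."""

    def __init__(self, seq_len=8, num_rate_levels=10):
        super().__init__()
        # one 1-D convolution (kernel size 3, 64 filters) per input component
        self.convs = nn.ModuleList(
            nn.Conv1d(1, 64, kernel_size=3, padding=1) for _ in range(4)
        )
        self.fc1 = nn.Linear(4 * 64 * seq_len, 128)
        self.fc2 = nn.Linear(128, 256)
        self.out = nn.Linear(256, num_rate_levels)

    def forward(self, x):  # x: (batch, 4, seq_len)
        feats = [F.leaky_relu(conv(x[:, i:i + 1, :])) for i, conv in enumerate(self.convs)]
        fused = torch.cat(feats, dim=1).flatten(1)   # fuse the four groups of feature maps
        h = F.relu(self.fc1(fused))
        h = F.relu(self.fc2(h))
        return F.softmax(self.out(h), dim=-1)        # distribution over candidate code rates
```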
S302: inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
in the embodiment of the present invention, the training process of the neural network model may be performed in batch, that is, the transmission information of the preset number of first transmission time slots is input into the neural network model each time, and the neural network model may output the transmission code rate of the preset number of second transmission time slots through the above operation. And the first transmission time slot is the last transmission time slot of the second transmission time slot.
For example, if the transmission information of 10 first transmission time slots is input each time and the 10 first transmission time slots are denoted t1 to t10, then the transmission code rates of 10 second transmission time slots t2 to t11 can be output, where transmission time slot t1 is the last transmission time slot of transmission time slot t2, transmission time slot t2 is the last transmission time slot of transmission time slot t3, and so on.
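In other words, each training example pairs the transmission information of one transmission time slot with the real code rate of the next. A minimal sketch of this pairing, assuming per-slot records with hypothetical field names:

```python
def make_training_pairs(slots):
    """slots: per-slot records from a sample video call; field names are hypothetical."""
    inputs, targets = [], []
    for first, second in zip(slots[:-1], slots[1:]):
        inputs.append(first["transmission_info"])   # first transmission time slot
        targets.append(second["real_bitrate"])      # real code rate of the second time slot
    return inputs, targets
```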
S303: and determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function.
In the embodiment of the invention, the transmission code rate output by the neural network model and the real transmission code rate contained in the data set can be input into a preset loss function, and the loss value aiming at the transmission code rate is determined.
In the embodiment of the present invention, the loss value may be obtained by using, but is not limited to, the Mean Squared Error (MSE) formula as the loss function.
S304: determining whether the neural network model converges according to the loss value; if not, executing step S305; if so, executing step S306.
When the loss value does not exceed the preset loss threshold, the neural network model may be considered to have converged. In addition, the maximum number of iterations may also be preset, and when the maximum number of iterations is reached, the neural network model may also be considered to have converged, which is not limited.
S305: and adjusting the parameter values in the neural network model, and returning to execute the step S302.
S306: and determining the current neural network model as a code rate optimization network model.
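A compact sketch of steps S301 to S306 under the above assumptions; the optimizer, learning rate, loss threshold, and maximum iteration count are illustrative choices, not values fixed by the embodiment.

```python
import torch

def train_rate_model(model, loader, loss_fn, max_iters=10000, loss_threshold=1e-3):
    """loader yields (states, expert_rates) batches; loss_fn is the imitation loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for step, (states, expert_rates) in enumerate(loader):
        predicted = model(states)                 # S302: predicted rate distribution
        loss = loss_fn(predicted, expert_rates)   # S303: loss against the real code rates
        if loss.item() < loss_threshold or step >= max_iters:
            break                                 # S304/S306: treat the model as converged
        optimizer.zero_grad()                     # S305: adjust the parameter values
        loss.backward()
        optimizer.step()
    return model
```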
It can be seen that, in the embodiment of the present invention, unlike the prior art in which the transmission rate blindly follows the transport layer's estimate of network capacity, the neural network model is trained on historical transmission information from video calls. The trained neural network model can output an appropriate transmission code rate, and transmitting video data at this rate can improve the quality of the video call.
In an embodiment of the invention, in order to train a code rate optimization network model better suited to video calls, and thereby determine transmission code rates better suited to video calls and further improve video call quality, the loss function can be improved.
Specifically, in the embodiment of the present invention, the loss function may be designed around two aspects: suppressing an excessively large transmission code rate and maintaining the smoothness of the transmission code rate. These are described separately below.
In the first aspect, an excessively large transmission code rate easily causes network congestion and seriously affects video call quality. Therefore, in the process of training the code rate optimization network model, cases in which the output transmission code rate is higher than the real transmission code rate are restrained. Specifically, the following weight function may be preset:
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein w(s) represents the weight function, πθ(s) represents the transmission code rate output by the neural network model, π*(s) represents the real transmission code rate in the sample video call, and C represents a preset constant that reflects the extra penalty for an excessively high transmission code rate.
The first loss of the transmission code rate of the current transmission slot output by the neural network model can be expressed as:
l(πθ(st),π*(st))=w(s)×H(πθ(st),π*(st))
wherein H(πθ(st), π*(st)) represents the cross-entropy loss.
It can be seen that, after each training iteration, if the transmission code rate output by the neural network model is greater than the real transmission code rate, the preset constant C is additionally added when the weight function value is calculated, and the corresponding loss value is larger; if the output transmission code rate is not greater than the real transmission code rate, the preset constant C is not added, and the corresponding loss value is smaller. In the end, the trained code rate optimization network model can effectively reduce the output of excessively large transmission code rates, avoid as much as possible the network congestion they cause, and further improve video call quality.
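A minimal sketch of this overshoot penalty, assuming the weight equals 1 when the output code rate does not exceed the real code rate and 1 + C when it does, and assuming the code rate is chosen from a discrete table of levels; these conventions are illustrative assumptions.

```python
import math

def weighted_imitation_loss(probs, expert_index, rate_table, C=2.0):
    """First loss l = w(s) * H(pi_theta, pi*): cross-entropy against the expert
    rate level, with an extra penalty C whenever the model's most likely rate
    exceeds the real (expert) rate.  probs is a list of softmax probabilities."""
    predicted_index = max(range(len(probs)), key=lambda i: probs[i])
    w = 1.0 + C if rate_table[predicted_index] > rate_table[expert_index] else 1.0
    cross_entropy = -math.log(probs[expert_index] + 1e-9)
    return w * cross_entropy
```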
In the second aspect, video call quality is also affected if the transmission code rate varies over a wide range during a call. Therefore, in the training process of the code rate optimization network model, the smoothness of the video may be further considered. Specifically, a second loss of the transmission code rate, namely a loss related to the smoothness of the video, may be defined as:
||πθ(st) - φ(t, k)||
wherein st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, and k represents a preset number of time slots whose value can be set according to the actual situation; for example, setting k to 3 means considering the transmission code rates of the three transmission time slots before the current transmission time slot.
Therefore, during training, the closer the transmission code rate of the current transmission time slot output by the neural network model is to the weighted value of the transmission code rates of the k transmission time slots before the current transmission time slot, the smoother the transmission code rate, and the smaller the value of ||πθ(st) - φ(t, k)||, i.e., the smaller the value of the loss function. In the end, the trained code rate optimization network model can output a relatively smooth transmission code rate, avoid as much as possible the impact caused by large variations in the transmission code rate, and further improve video call quality.
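A minimal sketch of the smoothness term, taking φ(t, k) as a simple average of the k most recent transmission code rates; the exact weighting used by the embodiment may differ.

```python
def smoothness_loss(predicted_rate, previous_rates, k=3):
    """Second loss ||pi_theta(s_t) - phi(t, k)||, with phi taken here as the
    mean of the k most recent transmission code rates."""
    recent = previous_rates[-k:]
    phi = sum(recent) / len(recent) if recent else predicted_rate
    return abs(predicted_rate - phi)
```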
It should be noted that the improvements to the loss function in the above two aspects may be used separately or in superposition, which is not limited by the embodiment of the present invention.
If the improvements to the loss function in the two aspects are used in superposition, the final loss function can be expressed as:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
Based on the same inventive concept, according to the above video call method embodiment based on deep imitation learning, the embodiment of the present invention further provides a video call device based on deep imitation learning, referring to fig. 5, which may include the following modules:
an obtaining module 501, configured to obtain, for a current transmission timeslot of a video call, transmission information of a previous transmission timeslot; the transmission information includes: transmitting code rate, packet loss rate, inter-packet delay and throughput data;
an input module 502, configured to input the transmission information into a code rate optimization network model to obtain a transmission code rate of a current transmission timeslot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information of each transmission time slot in the sample video call;
a sending module 503, configured to send video call data to the receiving end based on the transmission code rate of the current transmission timeslot.
In an embodiment of the present invention, the obtaining module 501 may be specifically configured to:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput data of the last transmission time slot based on the feedback information.
In an embodiment of the present invention, the apparatus may further include a training module, where the training module is configured to train the rate optimization network model according to the following steps:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
and if so, determining the current neural network model as a code rate optimization network model.
In one embodiment of the invention, the loss function may be:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
By applying the video call device based on deep simulation learning provided by the embodiment of the invention, a sending end of a video call acquires transmission information of a previous transmission time slot aiming at a current transmission time slot of the video call; the transmission information includes: transport layer information and application layer information; inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot. The code rate optimization network model is obtained by pre-training according to the real transmission information and the real transmission code rate of each transmission time slot in the sample video call. Therefore, the historical transmission information of the application layer and the transmission layer in the video call can be fused to determine the optimal transmission code rate, so that the problem that the proper transmission code rate cannot be determined due to the fact that the application layer and the transmission layer are not coordinated in the existing video call is solved, and the video call quality is improved.
Based on the same inventive concept, according to the above video call method embodiment based on deep emulation learning, an embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 complete mutual communication via the communication bus 604,
a memory 603 for storing a computer program;
the processor 601 is configured to implement the following steps when executing the program stored in the memory 603:
for a current transmission time slot of a video call, acquiring transmission information of the previous transmission time slot; the transmission information includes: transport layer information and application layer information;
inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information and real transmission code rate of each transmission time slot in the sample video call;
and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
By applying the electronic equipment provided by the embodiment of the invention, the sending end of the video call acquires the transmission information of the last transmission time slot aiming at the current transmission time slot of the video call; the transmission information includes: transport layer information and application layer information; inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot. The code rate optimization network model is obtained by pre-training according to the real transmission information and the real transmission code rate of each transmission time slot in the sample video call. Therefore, the historical transmission information of the application layer and the transmission layer in the video call can be fused to determine the optimal transmission code rate, so that the problem that the proper transmission code rate cannot be determined due to the fact that the application layer and the transmission layer are not coordinated in the existing video call is solved, and the video call quality is improved.
Based on the same inventive concept, according to the above-mentioned video call method based on deep imitation learning, in another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when being executed by a processor, implements any of the video call method steps based on deep imitation learning shown in fig. 1 to 4.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the video call apparatus embodiment, the electronic device embodiment and the computer storage medium embodiment based on the deep emulation learning, since they are substantially similar to the video call method embodiment based on the deep emulation learning, the description is relatively simple, and relevant points can be referred to the partial description of the video call method embodiment based on the deep emulation learning.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A video call method based on deep imitation learning, which is characterized by comprising the following steps:
for a current transmission time slot of a video call, acquiring transmission information of the previous transmission time slot; the transmission information includes: transport layer information and application layer information;
inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information and real transmission code rate of each transmission time slot in the sample video call;
sending video call data to a receiving end based on the transmission code rate of the current transmission time slot;
the code rate optimization network model is trained according to the following method:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
if yes, determining the current neural network model as a code rate optimization network model;
the loss function is:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
2. The method of claim 1, wherein the transmission layer information comprises a packet loss rate and an inter-packet delay, and wherein the application layer information comprises a transmission code rate and a throughput.
3. The method of claim 2, wherein obtaining the transmission information of the last transmission slot comprises:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput of the last transmission time slot based on the feedback information.
4. A video call apparatus based on deep imitation learning, the apparatus comprising:
the acquisition module is used for acquiring, for the current transmission time slot of the video call, the transmission information of the last transmission time slot; the transmission information includes: transport layer information and application layer information;
the input module is used for inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is obtained by training on a training set, wherein the training set comprises: the real transmission information and the real transmission code rate of each transmission time slot in a sample video call;
the sending module is used for sending the video call data to the receiving end based on the transmission code rate of the current transmission time slot;
the device further comprises a training module, wherein the training module is used for training the code rate optimization network model according to the following steps:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rates of the preset number of second transmission time slots; each first transmission time slot is the transmission time slot immediately preceding its corresponding second transmission time slot;
determining a loss value for the transmission code rate according to the obtained transmission code rates of the second transmission time slots, the real transmission code rates in the transmission information of the transmission time slots in the sample video call, and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
if yes, determining the current neural network model as a code rate optimization network model;
the loss function is:

[loss function formulas filed as image equations FDA0002824861410000031, FDA0002824861410000032, FDA0002824861410000033 and FDA0002824861410000034; not reproduced in the text record]

wherein the quantity denoted by image FDA0002824861410000035 represents a total loss value of the transmission code rate of the current transmission time slot output by the neural network model, s_t represents the current transmission time slot, π_θ(s_t) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(s_t) represents the real transmission code rate of the current transmission time slot contained in the training set, l(π_θ(s_t), π*(s_t)) represents a first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(π_θ(s_t), π*(s_t)) represents a cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, |π_θ(s_t) - φ(t, k)| represents a second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighting of the transmission code rates of the k transmission time slots before the current transmission time slot, s_{t-i} represents the historical transmission time slot i transmission time slots before the current transmission time slot, and k represents the preset number of time slots.
5. The apparatus of claim 4, wherein the transport layer information comprises a packet loss rate and an inter-packet delay, and wherein the application layer information comprises a transmission code rate and a throughput.
6. The apparatus of claim 5, wherein the obtaining module is specifically configured to:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end for the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput of the last transmission time slot based on the feedback information.
7. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 3 when executing a program stored in the memory.
8. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-3.
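Taken together, the method claims amount to a per-slot control step at the sender: update a sliding window of per-slot transmission information with the last slot's feedback, run the trained model, and send the current slot's video call data at the predicted rate. The sketch below is illustrative only; it reuses BITRATE_LEVELS, the RateNet-style model and build_transmission_info from the earlier sketches, and the hypothetical helpers make_history and select_rate_for_slot leave the actual feedback collection and encoder control to the client.

    import collections
    import torch

    def make_history(feature_dim=4, history_len=5):
        # Sliding window holding the most recent per-slot transmission information,
        # initialized with zeros before any feedback has arrived.
        return collections.deque([[0.0] * feature_dim for _ in range(history_len)],
                                 maxlen=history_len)

    def select_rate_for_slot(model, history, feedback, last_rate_kbps):
        # One control step of the claimed method: append the last slot's
        # transmission information, run the code rate optimization network model,
        # and return the code rate for the current transmission time slot.
        history.append(build_transmission_info(feedback, last_rate_kbps))
        features = torch.tensor([sum(history, [])], dtype=torch.float32)
        with torch.no_grad():
            level = model(features).argmax(dim=-1).item()
        return float(BITRATE_LEVELS[level])

A client would call select_rate_for_slot once per transmission time slot, hand the returned rate to its video encoder, and transmit that slot's data at that rate before collecting the next receiver report.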
CN201910960211.2A 2019-10-10 2019-10-10 Video call method and device based on deep simulation learning Active CN110809127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910960211.2A CN110809127B (en) 2019-10-10 2019-10-10 Video call method and device based on deep simulation learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910960211.2A CN110809127B (en) 2019-10-10 2019-10-10 Video call method and device based on deep simulation learning

Publications (2)

Publication Number Publication Date
CN110809127A CN110809127A (en) 2020-02-18
CN110809127B true CN110809127B (en) 2021-03-19

Family

ID=69488132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910960211.2A Active CN110809127B (en) 2019-10-10 2019-10-10 Video call method and device based on deep simulation learning

Country Status (1)

Country Link
CN (1) CN110809127B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111343412B (en) * 2020-03-31 2021-08-17 联想(北京)有限公司 Image processing method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106656629A (en) * 2017-01-13 2017-05-10 南京理工大学 Prediction method for stream media playing quality
CN107276910A (en) * 2017-06-07 2017-10-20 上海迪爱斯通信设备有限公司 The real-time adjusting apparatus of video code rate and system, video server
CN108063961A (en) * 2017-12-22 2018-05-22 北京联合网视文化传播有限公司 A kind of self-adaption code rate video transmission method and system based on intensified learning
CN109218744A (en) * 2018-10-17 2019-01-15 华中科技大学 A kind of adaptive UAV Video of bit rate based on DRL spreads transmission method
CN109981225A (en) * 2019-04-12 2019-07-05 广州视源电子科技股份有限公司 A kind of code rate predictor method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI581578B (en) * 2010-02-26 2017-05-01 新力股份有限公司 Encoder and encoding method providing incremental redundancy
CN102802089B (en) * 2012-09-13 2014-12-31 浙江大学 Shifting video code rate regulation method based on experience qualitative forecast
CN106937073B (en) * 2015-12-29 2019-10-15 展讯通信(上海)有限公司 Video calling code rate adjustment method, device and mobile terminal based on VoLTE
CN109561310B (en) * 2017-09-26 2022-09-16 腾讯科技(深圳)有限公司 Video coding processing method, device, equipment and storage medium
CN110072119B (en) * 2019-04-11 2020-04-10 西安交通大学 Content-aware video self-adaptive transmission method based on deep learning network

Also Published As

Publication number Publication date
CN110809127A (en) 2020-02-18

Similar Documents

Publication Publication Date Title
CN1878049B (en) Method of controlling transmission rate by using error correction packets and communication apparatus using the same
DE102019135861B3 (en) Latency reduction based on packet failure prediction
CN107342848A (en) A kind of adaptive code stream transmission method, device and equipment
Fang et al. Reinforcement learning for bandwidth estimation and congestion control in real-time communications
CN111629210A (en) Data processing method and device and electronic equipment
CN109818714B (en) Dynamic FEC method, device, computer terminal and computer readable storage medium
CN106488243A (en) A kind of many description screen content method for video coding
CN101854308A (en) Self-adaptation realizing method of high-tone quality service network of VoIP system
CN106507024A (en) A kind of self-adaption code rate method of adjustment and device
JP7356581B2 (en) Information processing methods, devices, equipment and computer readable storage media
CN105142002A (en) Audio/video live broadcasting method and device as well as control method and device
CN110809127B (en) Video call method and device based on deep simulation learning
CN110224728A (en) Multi-beam satellite system robust pre-coding method based on outage probability constraint
CN106789427A (en) A kind of transmission volume computational methods
CN103312469B (en) Confirmation in multicast retransmission represents system of selection and device
CN107749827A (en) Method for controlling network congestion, apparatus and system based on network state classification
CN104202257A (en) Satellite network congestion control method based on bandwidth estimation
DE112014004437T5 (en) System and method for matching audio frame generation with LTE transmission capabilities
CN101656807B (en) Networking telephone sending terminal and voice control method thereof
CN103414886A (en) Network video transmission method
CN107295667A (en) A kind of access-in resource method of adjustment and device
Huang et al. Airtime fair distributed cross-layer congestion control for real-time video over WLAN
Li et al. Fountain Coded Streaming for SAGIN With Learning-Based Pause-and-Listen
Mikhailenko et al. Analysis of the adaptive neural network router
Thorsager et al. Generative Network Layer for Communication Systems with Artificial Intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant