CN110809127B - Video call method and device based on deep imitation learning - Google Patents


Info

Publication number
CN110809127B
CN110809127B (application number CN201910960211.2A)
Authority
CN
China
Prior art keywords
transmission
code rate
time slot
transmission time
network model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910960211.2A
Other languages
Chinese (zh)
Other versions
CN110809127A (en)
Inventor
周安福
张欢欢
马若暄
苏光远
张新宇
马华东
陈虓将
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201910960211.2A
Publication of CN110809127A
Application granted
Publication of CN110809127B
Legal status: Active
Anticipated expiration

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N7/00 Television systems
    • H04N7/14 Systems for two-way working
    • H04N7/141 Systems for two-way working between two video terminals, e.g. videophone
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/04 Network management architectures or arrangements
    • H04L41/044 Network management architectures or arrangements comprising hierarchical management structures
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L41/00 Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
    • H04L41/14 Network analysis or design
    • H04L41/145 Network analysis or design involving simulating, designing, planning or modelling of a network

Abstract

The embodiment of the invention provides a video call method and device based on deep imitation learning. The method includes: for a current transmission time slot of a video call, acquiring transmission information of the previous transmission time slot, the transmission information including transport layer information and application layer information; inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot, the code rate optimization network model being trained on a training set that includes the real transmission information and the real transmission code rate of each transmission time slot in a sample video call; and sending video call data to a receiving end based on the transmission code rate of the current transmission time slot. The method and the device can determine an appropriate transmission code rate in a video call in real time and improve video call quality.

Description

Video call method and device based on deep imitation learning
Technical Field
The embodiment of the invention relates to the technical field of communications, and in particular to a video call method and device based on deep imitation learning.
Background
With the development of communication technology, real-time video calls have become an indispensable part of people's lives. Mobile wireless network applications such as crowdsourced live streaming, cloud video games, robotics, and remote vehicle operation are continuously driving the growth of video call traffic.
However, the quality of existing video calls is still not satisfactory; for example, during a video call, problems such as image blurring, frame loss, and stuttering may occur.
The main reason for the low quality of existing video calls is that the application layer and the transport layer are not coordinated, so an appropriate bit rate for data transmission cannot be determined. Specifically, the transport layer typically updates its network capacity estimate at millisecond granularity to respond to network changes as dynamically as possible, while the video codec at the application layer can only change the video bit rate over much larger time intervals, so the video encoder cannot adjust its rate in real time to follow the data transmission rate of the transport layer.
Therefore, in existing video call technology, because the application layer and the transport layer are not coordinated, an appropriate transmission code rate cannot be determined, and the video call quality is not high.
Disclosure of Invention
The embodiment of the invention aims to provide a video call method and device based on deep imitation learning, so as to determine an appropriate transmission code rate in a video call in real time and improve video call quality. The specific technical scheme is as follows:
In order to achieve the above object, an embodiment of the present invention provides a video call method based on deep imitation learning, where the method includes:
for a current transmission time slot of a video call, acquiring transmission information of the previous transmission time slot; the transmission information includes: transport layer information and application layer information;
inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information and real transmission code rate of each transmission time slot in the sample video call;
and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot.
Optionally, the transmission layer information includes a packet loss rate and an inter-packet delay, and the application layer information includes a transmission code rate and throughput.
Optionally, the obtaining the transmission information of the previous transmission timeslot includes:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput of the last transmission time slot based on the feedback information.
Optionally, the code rate optimization network model is trained according to the following method:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
and if so, determining the current neural network model as a code rate optimization network model.
Optionally, the loss function is:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
In order to achieve the above object, an embodiment of the present invention further provides a video call device based on deep imitation learning, where the device includes:
the acquisition module is used for acquiring, for the current transmission time slot of the video call, the transmission information of the last transmission time slot; the transmission information includes: transport layer information and application layer information;
the input module is used for inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information and real transmission code rate of each transmission time slot in the sample video call;
and the sending module is used for sending the video call data to the receiving end based on the transmission code rate of the current transmission time slot.
Optionally, the transmission layer information includes a packet loss rate and an inter-packet delay, and the application layer information includes a transmission code rate and throughput.
Optionally, the obtaining module is specifically configured to:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput data of the last transmission time slot based on the feedback information.
Optionally, the apparatus further includes a training module, where the training module is configured to train the rate optimization network model according to the following steps:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
and if so, determining the current neural network model as a code rate optimization network model.
Optionally, the loss function is:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
In order to achieve the above object, an embodiment of the present invention further provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, where the processor, the communication interface, and the memory complete communication with each other through the communication bus;
a memory for storing a computer program;
and the processor is used for realizing any method step when executing the program stored in the memory.
To achieve the above object, an embodiment of the present invention further provides a computer-readable storage medium, in which a computer program is stored, and the computer program, when executed by a processor, implements any of the above method steps.
By applying the video call method based on deep imitation learning provided by the embodiment of the invention, for the current transmission time slot of a video call, the sending end acquires the transmission information of the previous transmission time slot, the transmission information including transport layer information and application layer information; inputs the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; and sends the video call data to the receiving end based on the transmission code rate of the current transmission time slot. The code rate optimization network model is obtained by pre-training on the real transmission information and the real transmission code rate of each transmission time slot in a sample video call. In this way, the historical transmission information of the application layer and the transport layer in the video call can be fused to determine the optimal transmission code rate, which solves the problem in existing video calls that an appropriate transmission code rate cannot be determined because the application layer and the transport layer are not coordinated, and thus improves video call quality.
Of course, it is not necessary for any product or method of practicing the invention to achieve all of the above-described advantages at the same time.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a video call method based on deep imitation learning according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a video call based on deep imitation learning according to an embodiment of the present invention;
FIG. 3 is a schematic flow chart of a training code rate optimization network model according to an embodiment of the present invention;
FIG. 4 is a schematic structural diagram of a neural network model according to an embodiment of the present invention;
fig. 5 is a schematic structural diagram of a video call device based on deep imitation learning according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In order to solve the technical problem in existing video call technology that, because the application layer and the transport layer are not coordinated, an appropriate transmission code rate cannot be determined and video call quality is therefore low, the embodiments of the invention provide a video call method and device based on deep imitation learning, an electronic device, and a computer-readable storage medium.
For ease of understanding, the following description will first describe an application scenario of the embodiment of the present invention.
The embodiment of the invention can be applied to a video call scene, the video call is essentially a process that two parties send data to each other, and in the video call, one party can be regarded as a sending end, and the other party can be regarded as a receiving end. The sending end sends video call data to the receiving end in each transmission time slot, and therefore video call is achieved. The video call method based on deep simulation learning provided by the embodiment of the invention can be applied to a sending end of video call.
Referring to fig. 1, a video call method based on deep imitation learning according to an embodiment of the present invention may include the following steps:
s101: and acquiring the transmission information of the last transmission time slot aiming at the current transmission time slot, wherein the transmission information comprises transmission layer information and application layer information.
In the embodiment of the present invention, the time interval between adjacent transmission timeslots may be set according to actual situations, for example, the time interval is set to be 1 second.
In the embodiment of the invention, during the video call, the sending end can determine the appropriate transmission code rate to be adopted in the current transmission time slot based on the transmission information of the last transmission time slot. The appropriate transmission code rate can be understood as the maximum transmission code rate that does not cause network congestion. The transmission code rate is positively correlated with the transmission rate of the video data; that is, the higher the determined transmission code rate, the higher the transmission rate of the video data.
Wherein the transmission information may include transmission layer information and application layer information, and those skilled in the art can understand that the transmission layer information represents information related to data transmission and the application layer information represents data related to a video codec.
S102: inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: the transmission information and the real transmission code rate of each transmission time slot in the sample video call;
in the embodiment of the invention, the proper transmission code rate to be adopted by the current transmission time slot can be determined according to the neural network model. Specifically, the transmission information of the last transmission time slot is input into a code rate optimization network model, the code rate optimization network model is trained in advance according to the real transmission information of each transmission time slot in the sample video call and the real transmission code rate, and the transmission code rate suitable for the current transmission time slot can be output.
In one embodiment of the present invention, the transmission layer information may include a packet loss rate and an inter-packet delay, and the application layer information may include a transmission code rate and a throughput.
As an example, assuming the current transmission time slot is t+1, the input to the code rate optimization network model can be represented by four sequences from the last transmission time slot: the packet loss rate sequence, the inter-packet delay sequence, the transmission code rate sequence (which can also be understood as the data transmission rate of the last transmission time slot), and the throughput sequence.
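For ease of understanding, the following is a minimal Python sketch of how the four input sequences might be packed into a single model input; the function name and array layout are illustrative assumptions rather than the notation of the original figures.

```python
import numpy as np

def build_state(loss_rates, delays, bitrates, throughputs):
    """Stack the four per-slot history sequences (assumed to have equal length)
    into one input array for the code rate optimization network."""
    return np.stack([
        np.asarray(loss_rates, dtype=np.float32),    # packet loss rate sequence
        np.asarray(delays, dtype=np.float32),        # inter-packet delay sequence
        np.asarray(bitrates, dtype=np.float32),      # transmission code rate sequence
        np.asarray(throughputs, dtype=np.float32),   # throughput sequence
    ])  # shape: (4, sequence_length)
```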
In an embodiment of the present invention, obtaining the transmission information of the last transmission timeslot may include the following steps:
step 11: acquiring a transmission code rate output by a code rate optimization network model in the last transmission time slot;
because the transmission code rate of the last transmission time slot is also output by the code rate optimization network model, the transmission code rate output by the code rate optimization network model in the last transmission time slot can be directly obtained aiming at the transmission code rate in the transmission information, namely the transmission code rate of the last transmission time slot in the video call.
Step 12: and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput data of the last transmission time slot based on the feedback information.
In the embodiment of the invention, the sending end can determine the packet loss rate, the inter-packet delay and the throughput data of the last transmission time slot based on the feedback information of the receiving end.
Specifically, after receiving the video call data of the previous transmission time slot, the receiving end may generate feedback information according to the receiving condition of the video call data and feed the feedback information back to the sending end. The feedback information may be an Acknowledgement Character (ACK) or the like.
Taking ACK as an example, the communication protocol specifies that after receiving data, the receiving end needs to feed back ACK information to the sending end, and the ACK information generally has a fixed format. Therefore, the sending end can determine the transmission information of the data in the last transmission time slot according to the ACK information fed back by the receiving end, wherein the transmission information comprises the packet loss rate of a transmission layer, the inter-packet delay, the throughput of an application layer and the like.
As just one example, the sending end may also determine the packet loss rate, the inter-packet delay, and the throughput data of the last transmission timeslot by using other manners, which is not limited in the embodiment of the present invention.
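As a hedged illustration only, the following Python sketch shows one way a sending end might derive the packet loss rate, inter-packet delay, and throughput of the last transmission time slot from per-packet acknowledgements; the record fields and the exact computations are assumptions for illustration, not details fixed by the embodiment.

```python
def summarize_last_slot(sent_packets, ack_records, slot_seconds):
    """Derive last-slot statistics from hypothetical ACK records.

    sent_packets: list of (seq, size_bytes) sent in the last slot
    ack_records:  list of (seq, recv_time) reported back by the receiver
    """
    acked = dict(ack_records)
    lost = sum(1 for seq, _ in sent_packets if seq not in acked)
    loss_rate = lost / max(len(sent_packets), 1)

    recv_times = sorted(acked.values())
    gaps = [b - a for a, b in zip(recv_times, recv_times[1:])]
    inter_packet_delay = sum(gaps) / len(gaps) if gaps else 0.0

    delivered_bytes = sum(size for seq, size in sent_packets if seq in acked)
    throughput_bps = delivered_bytes * 8 / slot_seconds

    return loss_rate, inter_packet_delay, throughput_bps
```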
S103: and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot.
In S102, the transmission code rate of the current transmission time slot output by the code rate optimization network model is the optimal transmission code rate of the current transmission time slot, and may also be understood as the maximum transmission code rate that does not cause network congestion for the video call data to be transmitted in the current time slot.
Further, the sending end may send the video call data to the receiving end based on the determined transmission code rate of the current transmission time slot. Specifically, the encoder at the sending end may encode and convert the data stream based on the determined transmission code rate. In addition, the sending end may map the determined transmission code rate to a data transmission rate of the transport layer, thereby sending the video call data to the receiving end at the determined data transmission rate.
For ease of understanding, fig. 2 is a schematic diagram of a video call based on deep emulation learning according to an embodiment of the present invention, which is further described below with reference to fig. 2. As shown in fig. 2, the sending end obtains the transmission information of the previous transmission time slot, including the packet loss rate, the inter-packet delay, the throughput, and the transmission code rate, and inputs the transmission information into the code rate optimization network model to obtain the transmission code rate of the current time slot. And then sending the video call data to the receiving end based on the determined transmission code rate of the current transmission time slot. The receiving end generates feedback information based on the receiving condition of the video call data and feeds the feedback information back to the sending end, and the sending end determines the packet loss rate, the inter-packet delay and the throughput in the transmission information based on the feedback information of the receiving end.
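The per-slot control loop of fig. 2 can be summarized with the following sketch; the model, encoder, transport, and history objects and their method names are placeholders assumed for illustration, not interfaces defined by the embodiment.

```python
def sender_slot_loop(model, encoder, transport, history):
    """Run once per transmission time slot (e.g. once per second)."""
    while transport.call_active():
        state = history.last_slot_state()       # loss rate, delay, code rate, throughput
        bitrate = model.predict(state)          # code rate for the current slot
        encoder.set_target_bitrate(bitrate)     # application layer follows the rate
        transport.set_send_rate(bitrate)        # transport layer pacing follows the same rate
        feedback = transport.send_slot(encoder.encoded_frames())
        history.update(bitrate, feedback)       # becomes the input of the next slot
```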
By applying the video call method based on deep simulation learning provided by the embodiment of the invention, a sending end of a video call acquires transmission information of a previous transmission time slot aiming at a current transmission time slot of the video call; the transmission information includes: transport layer information and application layer information; inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot. The code rate optimization network model is obtained by pre-training according to the real transmission information and the real transmission code rate of each transmission time slot in the sample video call. Therefore, the historical transmission information of the application layer and the transmission layer in the video call can be fused to determine the optimal transmission code rate, so that the problem that the proper transmission code rate cannot be determined due to the fact that the application layer and the transmission layer are not coordinated in the existing video call is solved, and the video call quality is improved.
In an embodiment of the present invention, the rate optimization network model may be trained according to the following method, see fig. 3, including the following steps:
s301: and acquiring a preset neural network model and a training set.
In the embodiment of the invention, the preset neural network model can comprise a convolution layer and a full connection layer; the training set comprises transmission information of each transmission time slot in the sample video call, the sample video call can be a video call collected in actual live broadcast, and the transmission information of each transmission time slot can comprise transmission code rate, packet loss rate, inter-packet delay and throughput of each time slot in the video call.
For convenience of understanding, the structure and operation of the neural network model provided by the embodiment of the present invention are described below.
As an example, fig. 4 is a schematic diagram of a neural network model provided by an embodiment of the present invention. Four independent convolutional layers are used to extract features from the input, which consists of the packet loss rate sequence, the inter-packet delay sequence, the transmission code rate sequence, and the throughput sequence. Each convolutional layer may use a kernel of size 3 x 3 and 64 filters to extract features.
After convolution, the result is passed through an activation function; the embodiment of the present invention may use, but is not limited to, activation functions such as the Rectified Linear Unit (ReLU) and the Leaky ReLU. Taking the Leaky ReLU as an example, this activation function keeps a non-zero gradient throughout the training stage, which effectively avoids vanishing gradients and can shorten training time.
Further, zero padding, as commonly used in deep learning, can be applied to normalize the 4 input components to the same size, which accelerates training and produces feature maps of uniform size, thereby facilitating feature fusion.
The 4 groups of feature maps are fused and then input into the fully connected layers. As an example, as shown in fig. 4, the fused 64 feature maps are passed through the ReLU activation function and input to the next layer; then 128 feature maps are passed through the ReLU activation function and input to the next layer; then 256 feature maps are passed through the ReLU activation function and input to the next layer. Finally, the result, namely the transmission code rate of the current time slot, is output by a normalized softmax distribution function in the fully connected layer.
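For ease of understanding, the following PyTorch sketch reproduces the described shape of the model (four independent convolutions with 64 filters, Leaky ReLU, feature fusion, fully connected layers of 128 and 256 units, and a softmax output). The sequence length, the use of 1-D convolutions, and the discretization of the code rate into a fixed set of levels are illustrative assumptions, not the exact configuration of fig. 4.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RateOptimizationNet(nn.Module):
    """Four independent convolutions over the loss-rate, delay, code-rate and
    throughput histories, feature fusion, then fully connected layers ending
    in a softmax over a discrete set of candidate code rates."""

    def __init__(self, seq_len=8, num_rate_levels=10):
        super().__init__()
        # one 1-D convolution (kernel size 3, 64 filters) per input component
        self.convs = nn.ModuleList(
            nn.Conv1d(1, 64, kernel_size=3, padding=1) for _ in range(4)
        )
        self.fc1 = nn.Linear(4 * 64 * seq_len, 128)
        self.fc2 = nn.Linear(128, 256)
        self.out = nn.Linear(256, num_rate_levels)

    def forward(self, x):  # x: (batch, 4, seq_len)
        feats = [F.leaky_relu(conv(x[:, i:i + 1, :])) for i, conv in enumerate(self.convs)]
        fused = torch.cat(feats, dim=1).flatten(1)   # fuse the four groups of feature maps
        h = F.relu(self.fc1(fused))
        h = F.relu(self.fc2(h))
        return F.softmax(self.out(h), dim=-1)        # distribution over candidate code rates
```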
S302: inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
in the embodiment of the present invention, the training process of the neural network model may be performed in batch, that is, the transmission information of the preset number of first transmission time slots is input into the neural network model each time, and the neural network model may output the transmission code rate of the preset number of second transmission time slots through the above operation. And the first transmission time slot is the last transmission time slot of the second transmission time slot.
For example, if the transmission information of 10 first transmission time slots is input each time and the 10 first transmission time slots are denoted t1 to t10, then the transmission code rates of 10 second transmission time slots t2 to t11 can be output, where transmission time slot t1 is the last transmission time slot of transmission time slot t2, transmission time slot t2 is the last transmission time slot of transmission time slot t3, and so on.
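In other words, each training example pairs the transmission information of one transmission time slot with the real code rate of the next. A minimal sketch of this pairing, assuming per-slot records with hypothetical field names:

```python
def make_training_pairs(slots):
    """slots: per-slot records from a sample video call; field names are hypothetical."""
    inputs, targets = [], []
    for first, second in zip(slots[:-1], slots[1:]):
        inputs.append(first["transmission_info"])   # first transmission time slot
        targets.append(second["real_bitrate"])      # real code rate of the second time slot
    return inputs, targets
```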
S303: and determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function.
In the embodiment of the invention, the transmission code rate output by the neural network model and the real transmission code rate contained in the data set can be input into a preset loss function, and the loss value aiming at the transmission code rate is determined.
In the embodiment of the present invention, the loss value may be obtained by using, but is not limited to, the Mean Squared Error (MSE) formula as the loss function.
S304: determining whether the neural network model converges according to the loss value; if not, executing step S305; if so, executing step S306.
When the loss value does not exceed the preset loss threshold, the neural network model may be considered to have converged. In addition, the maximum number of iterations may also be preset, and when the maximum number of iterations is reached, the neural network model may also be considered to have converged, which is not limited.
S305: and adjusting the parameter values in the neural network model, and returning to execute the step S302.
S306: and determining the current neural network model as a code rate optimization network model.
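A compact sketch of steps S301 to S306 under the above assumptions; the optimizer, learning rate, loss threshold, and maximum iteration count are illustrative choices, not values fixed by the embodiment.

```python
import torch

def train_rate_model(model, loader, loss_fn, max_iters=10000, loss_threshold=1e-3):
    """loader yields (states, expert_rates) batches; loss_fn is the imitation loss."""
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)
    for step, (states, expert_rates) in enumerate(loader):
        predicted = model(states)                 # S302: predicted rate distribution
        loss = loss_fn(predicted, expert_rates)   # S303: loss against the real code rates
        if loss.item() < loss_threshold or step >= max_iters:
            break                                 # S304/S306: treat the model as converged
        optimizer.zero_grad()                     # S305: adjust the parameter values
        loss.backward()
        optimizer.step()
    return model
```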
It can be seen that, in the embodiment of the present invention, unlike the prior art in which the transmission rate blindly follows the transport layer's estimate of network capacity, the neural network model is trained on historical transmission information from video calls. The trained neural network model can output an appropriate transmission code rate, and transmitting video data at this rate can improve the quality of the video call.
In an embodiment of the invention, in order to train a code rate optimization network model better suited to video calls, and thereby determine transmission code rates better suited to video calls and further improve video call quality, the loss function can be improved.
Specifically, in the embodiment of the present invention, the loss function may be designed around two aspects: suppressing an excessively large transmission code rate and maintaining the smoothness of the transmission code rate. These are described separately below.
In the first aspect, an excessively large transmission code rate easily causes network congestion and seriously affects video call quality. Therefore, in the process of training the code rate optimization network model, cases in which the output transmission code rate is higher than the real transmission code rate are restrained. Specifically, the following weight function may be preset:
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein w(s) represents the weight function, πθ(s) represents the transmission code rate output by the neural network model, π*(s) represents the real transmission code rate in the sample video call, and C represents a preset constant that reflects the extra penalty for an excessively high transmission code rate.
The first loss of the transmission code rate of the current transmission slot output by the neural network model can be expressed as:
l(πθ(st),π*(st))=w(s)×H(πθ(st),π*(st))
wherein H(πθ(st), π*(st)) represents the cross-entropy loss.
It can be seen that, after each training iteration, if the transmission code rate output by the neural network model is greater than the real transmission code rate, the preset constant C is additionally added when the weight function value is calculated, and the corresponding loss value is larger; if the output transmission code rate is not greater than the real transmission code rate, the preset constant C is not added, and the corresponding loss value is smaller. In the end, the trained code rate optimization network model can effectively reduce the output of excessively large transmission code rates, avoid as much as possible the network congestion they cause, and further improve video call quality.
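A minimal sketch of this overshoot penalty, assuming the weight equals 1 when the output code rate does not exceed the real code rate and 1 + C when it does, and assuming the code rate is chosen from a discrete table of levels; these conventions are illustrative assumptions.

```python
import math

def weighted_imitation_loss(probs, expert_index, rate_table, C=2.0):
    """First loss l = w(s) * H(pi_theta, pi*): cross-entropy against the expert
    rate level, with an extra penalty C whenever the model's most likely rate
    exceeds the real (expert) rate.  probs is a list of softmax probabilities."""
    predicted_index = max(range(len(probs)), key=lambda i: probs[i])
    w = 1.0 + C if rate_table[predicted_index] > rate_table[expert_index] else 1.0
    cross_entropy = -math.log(probs[expert_index] + 1e-9)
    return w * cross_entropy
```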
In the second aspect, video call quality is also affected if the transmission code rate varies over a wide range during a call. Therefore, in the training process of the code rate optimization network model, the smoothness of the video may be further considered. Specifically, a second loss of the transmission code rate, namely a loss related to the smoothness of the video, may be defined as:
||πθ(st) - φ(t, k)||
wherein st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, and k represents a preset number of time slots whose value can be set according to the actual situation; for example, setting k to 3 means considering the transmission code rates of the three transmission time slots before the current transmission time slot.
Therefore, during training, the closer the transmission code rate of the current transmission time slot output by the neural network model is to the weighted value of the transmission code rates of the k transmission time slots before the current transmission time slot, the smoother the transmission code rate, and the smaller the value of ||πθ(st) - φ(t, k)||, i.e., the smaller the value of the loss function. In the end, the trained code rate optimization network model can output a relatively smooth transmission code rate, avoid as much as possible the impact caused by large variations in the transmission code rate, and further improve video call quality.
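A minimal sketch of the smoothness term, taking φ(t, k) as a simple average of the k most recent transmission code rates; the exact weighting used by the embodiment may differ.

```python
def smoothness_loss(predicted_rate, previous_rates, k=3):
    """Second loss ||pi_theta(s_t) - phi(t, k)||, with phi taken here as the
    mean of the k most recent transmission code rates."""
    recent = previous_rates[-k:]
    phi = sum(recent) / len(recent) if recent else predicted_rate
    return abs(predicted_rate - phi)
```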
It should be noted that the improvements to the loss function in the above two aspects may be used separately or in superposition, which is not limited by the embodiment of the present invention.
If the improvements to the loss function in the two aspects are used in superposition, the final loss function can be expressed as:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
Based on the same inventive concept, according to the above video call method embodiment based on deep imitation learning, the embodiment of the present invention further provides a video call device based on deep imitation learning, referring to fig. 5, which may include the following modules:
an obtaining module 501, configured to obtain, for a current transmission timeslot of a video call, transmission information of a previous transmission timeslot; the transmission information includes: transmitting code rate, packet loss rate, inter-packet delay and throughput data;
an input module 502, configured to input the transmission information into a code rate optimization network model to obtain a transmission code rate of a current transmission timeslot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information of each transmission time slot in the sample video call;
a sending module 503, configured to send video call data to the receiving end based on the transmission code rate of the current transmission timeslot.
In an embodiment of the present invention, the obtaining module 501 may be specifically configured to:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput data of the last transmission time slot based on the feedback information.
In an embodiment of the present invention, the apparatus may further include a training module, where the training module is configured to train the rate optimization network model according to the following steps:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
and if so, determining the current neural network model as a code rate optimization network model.
In one embodiment of the invention, the loss function may be:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
By applying the video call device based on deep simulation learning provided by the embodiment of the invention, a sending end of a video call acquires transmission information of a previous transmission time slot aiming at a current transmission time slot of the video call; the transmission information includes: transport layer information and application layer information; inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot. The code rate optimization network model is obtained by pre-training according to the real transmission information and the real transmission code rate of each transmission time slot in the sample video call. Therefore, the historical transmission information of the application layer and the transmission layer in the video call can be fused to determine the optimal transmission code rate, so that the problem that the proper transmission code rate cannot be determined due to the fact that the application layer and the transmission layer are not coordinated in the existing video call is solved, and the video call quality is improved.
Based on the same inventive concept, according to the above video call method embodiment based on deep emulation learning, an embodiment of the present invention further provides an electronic device, as shown in fig. 6, including a processor 601, a communication interface 602, a memory 603, and a communication bus 604, where the processor 601, the communication interface 602, and the memory 603 complete mutual communication via the communication bus 604,
a memory 603 for storing a computer program;
the processor 601 is configured to implement the following steps when executing the program stored in the memory 603:
for a current transmission time slot of a video call, acquiring transmission information of the previous transmission time slot; the transmission information includes: transport layer information and application layer information;
inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information and real transmission code rate of each transmission time slot in the sample video call;
and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot.
The communication bus mentioned in the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, etc. For ease of illustration, only one thick line is shown, but this does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic equipment and other equipment.
The Memory may include a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as at least one disk Memory. Optionally, the memory may also be at least one memory device located remotely from the processor.
The processor may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
By applying the electronic equipment provided by the embodiment of the invention, the sending end of the video call acquires the transmission information of the last transmission time slot aiming at the current transmission time slot of the video call; the transmission information includes: transport layer information and application layer information; inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; and sending the video call data to a receiving end based on the transmission code rate of the current transmission time slot. The code rate optimization network model is obtained by pre-training according to the real transmission information and the real transmission code rate of each transmission time slot in the sample video call. Therefore, the historical transmission information of the application layer and the transmission layer in the video call can be fused to determine the optimal transmission code rate, so that the problem that the proper transmission code rate cannot be determined due to the fact that the application layer and the transmission layer are not coordinated in the existing video call is solved, and the video call quality is improved.
Based on the same inventive concept, according to the above-mentioned video call method based on deep imitation learning, in another embodiment provided by the present invention, a computer-readable storage medium is further provided, in which a computer program is stored, and the computer program, when being executed by a processor, implements any of the video call method steps based on deep imitation learning shown in fig. 1 to 4.
It is noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
All the embodiments in the present specification are described in a related manner, and the same and similar parts among the embodiments may be referred to each other, and each embodiment focuses on the differences from the other embodiments. In particular, as for the video call apparatus embodiment, the electronic device embodiment and the computer storage medium embodiment based on the deep emulation learning, since they are substantially similar to the video call method embodiment based on the deep emulation learning, the description is relatively simple, and relevant points can be referred to the partial description of the video call method embodiment based on the deep emulation learning.
The above description is only for the preferred embodiment of the present invention, and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (8)

1. A video call method based on deep imitation learning, which is characterized by comprising the following steps:
for a current transmission time slot of a video call, acquiring transmission information of the previous transmission time slot; the transmission information includes: transport layer information and application layer information;
inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is a model obtained by training according to a training set, wherein the training set comprises: real transmission information and real transmission code rate of each transmission time slot in the sample video call;
sending video call data to a receiving end based on the transmission code rate of the current transmission time slot;
the code rate optimization network model is trained according to the following method:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rate of the preset number of second transmission time slots; the first transmission time slot is a last transmission time slot of the second transmission time slot;
determining a loss value aiming at the transmission code rate according to the obtained transmission code rate of the second transmission time slot, the real transmission code rate in the transmission information of each transmission time slot in the sample video call and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
if yes, determining the current neural network model as a code rate optimization network model;
the loss function is:
L(st) = l(πθ(st), π*(st)) + λ||πθ(st) - φ(t, k)||
l(πθ(st), π*(st)) = w(s) × H(πθ(st), π*(st))
w(s) = 1 + C, if πθ(s) > π*(s); w(s) = 1, otherwise
wherein L(st) represents the total loss value for the transmission code rate of the current transmission time slot output by the neural network model, st represents the current transmission time slot, πθ(st) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(st) represents the real transmission code rate of the current transmission time slot contained in the training set, l(πθ(st), π*(st)) represents the first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(πθ(st), π*(st)) represents the cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, ||πθ(st) - φ(t, k)|| represents the second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighted value of the transmission code rates of the k transmission time slots preceding the current transmission time slot, st-i represents the historical transmission time slot that is i transmission time slots before the current transmission time slot, and k represents a preset number of time slots.
2. The method of claim 1, wherein the transmission layer information comprises a packet loss rate and an inter-packet delay, and wherein the application layer information comprises a transmission code rate and a throughput.
3. The method of claim 2, wherein obtaining the transmission information of the last transmission slot comprises:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end aiming at the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput of the last transmission time slot based on the feedback information.
4. A video call apparatus based on deep imitation learning, the apparatus comprising:
the acquisition module is used for acquiring, for the current transmission time slot of the video call, the transmission information of the last transmission time slot; the transmission information includes: transport layer information and application layer information;
the input module is used for inputting the transmission information into a code rate optimization network model to obtain the transmission code rate of the current transmission time slot; the code rate optimization network model is obtained by training on a training set, wherein the training set comprises: the real transmission information and the real transmission code rate of each transmission time slot in a sample video call;
the sending module is used for sending the video call data to the receiving end based on the transmission code rate of the current transmission time slot;
the device further comprises a training module, wherein the training module is used for training the code rate optimization network model according to the following steps:
acquiring a preset neural network model and the training set;
inputting the transmission information of a preset number of first transmission time slots into the neural network model to obtain the transmission code rates of the preset number of second transmission time slots; each first transmission time slot is the transmission time slot immediately preceding its corresponding second transmission time slot;
determining a loss value for the transmission code rate according to the obtained transmission code rates of the second transmission time slots, the real transmission code rates in the transmission information of the transmission time slots in the sample video call, and a preset loss function;
determining whether the neural network model converges according to the loss value;
if not, adjusting parameter values in the neural network model, and returning to the step of inputting the transmission information of a preset number of first transmission time slots into the neural network model;
if yes, determining the current neural network model as a code rate optimization network model;
the loss function is:

[loss function formulas filed as image equations FDA0002824861410000031, FDA0002824861410000032, FDA0002824861410000033 and FDA0002824861410000034; not reproduced in the text record]

wherein the quantity denoted by image FDA0002824861410000035 represents a total loss value of the transmission code rate of the current transmission time slot output by the neural network model, s_t represents the current transmission time slot, π_θ(s_t) represents the transmission code rate of the current transmission time slot output by the neural network model, π*(s_t) represents the real transmission code rate of the current transmission time slot contained in the training set, l(π_θ(s_t), π*(s_t)) represents a first loss of the transmission code rate of the current transmission time slot output by the neural network model, w(s) represents a weight function, H(π_θ(s_t), π*(s_t)) represents a cross-entropy loss, λ represents a preset superposition weight, C represents a preset constant, |π_θ(s_t) - φ(t, k)| represents a second loss of the transmission code rate of the current transmission time slot output by the neural network model, φ(t, k) represents a weighting of the transmission code rates of the k transmission time slots before the current transmission time slot, s_{t-i} represents the historical transmission time slot i transmission time slots before the current transmission time slot, and k represents the preset number of time slots.
5. The apparatus of claim 4, wherein the transport layer information comprises a packet loss rate and an inter-packet delay, and wherein the application layer information comprises a transmission code rate and a throughput.
6. The apparatus of claim 5, wherein the obtaining module is specifically configured to:
acquiring the transmission code rate output by the code rate optimization network model in the last transmission time slot;
and acquiring feedback information of the receiving end for the last transmission time slot, and determining the packet loss rate, the inter-packet delay and the throughput of the last transmission time slot based on the feedback information.
7. An electronic device, characterized by comprising a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with one another through the communication bus;
a memory for storing a computer program;
a processor for implementing the method steps of any of claims 1 to 3 when executing a program stored in the memory.
8. A computer-readable storage medium, characterized in that a computer program is stored in the computer-readable storage medium, which computer program, when being executed by a processor, carries out the method steps of any one of the claims 1-3.
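Taken together, the method claims amount to a per-slot control step at the sender: update a sliding window of per-slot transmission information with the last slot's feedback, run the trained model, and send the current slot's video call data at the predicted rate. The sketch below is illustrative only; it reuses BITRATE_LEVELS, the RateNet-style model and build_transmission_info from the earlier sketches, and the hypothetical helpers make_history and select_rate_for_slot leave the actual feedback collection and encoder control to the client.

    import collections
    import torch

    def make_history(feature_dim=4, history_len=5):
        # Sliding window holding the most recent per-slot transmission information,
        # initialized with zeros before any feedback has arrived.
        return collections.deque([[0.0] * feature_dim for _ in range(history_len)],
                                 maxlen=history_len)

    def select_rate_for_slot(model, history, feedback, last_rate_kbps):
        # One control step of the claimed method: append the last slot's
        # transmission information, run the code rate optimization network model,
        # and return the code rate for the current transmission time slot.
        history.append(build_transmission_info(feedback, last_rate_kbps))
        features = torch.tensor([sum(history, [])], dtype=torch.float32)
        with torch.no_grad():
            level = model(features).argmax(dim=-1).item()
        return float(BITRATE_LEVELS[level])

A client would call select_rate_for_slot once per transmission time slot, hand the returned rate to its video encoder, and transmit that slot's data at that rate before collecting the next receiver report.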
CN201910960211.2A 2019-10-10 2019-10-10 Video call method and device based on deep simulation learning Active CN110809127B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910960211.2A CN110809127B (en) 2019-10-10 2019-10-10 Video call method and device based on deep simulation learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910960211.2A CN110809127B (en) 2019-10-10 2019-10-10 Video call method and device based on deep simulation learning

Publications (2)

Publication Number Publication Date
CN110809127A CN110809127A (en) 2020-02-18
CN110809127B true CN110809127B (en) 2021-03-19

Family

ID=69488132

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910960211.2A Active CN110809127B (en) 2019-10-10 2019-10-10 Video call method and device based on deep simulation learning

Country Status (1)

Country Link
CN (1) CN110809127B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111343412B (en) * 2020-03-31 2021-08-17 联想(北京)有限公司 Image processing method and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106656629A (en) * 2017-01-13 2017-05-10 南京理工大学 Prediction method for stream media playing quality
CN107276910A (en) * 2017-06-07 2017-10-20 上海迪爱斯通信设备有限公司 The real-time adjusting apparatus of video code rate and system, video server
CN108063961A (en) * 2017-12-22 2018-05-22 北京联合网视文化传播有限公司 A kind of self-adaption code rate video transmission method and system based on intensified learning
CN109218744A (en) * 2018-10-17 2019-01-15 华中科技大学 A kind of adaptive UAV Video of bit rate based on DRL spreads transmission method
CN109981225A (en) * 2019-04-12 2019-07-05 广州视源电子科技股份有限公司 A kind of code rate predictor method, device, equipment and storage medium

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI581578B (en) * 2010-02-26 2017-05-01 新力股份有限公司 Encoder and encoding method providing incremental redundancy
CN102802089B (en) * 2012-09-13 2014-12-31 浙江大学 Shifting video code rate regulation method based on experience qualitative forecast
CN106937073B (en) * 2015-12-29 2019-10-15 展讯通信(上海)有限公司 Video calling code rate adjustment method, device and mobile terminal based on VoLTE
CN109561310B (en) * 2017-09-26 2022-09-16 腾讯科技(深圳)有限公司 Video coding processing method, device, equipment and storage medium
CN110072119B (en) * 2019-04-11 2020-04-10 西安交通大学 Content-aware video self-adaptive transmission method based on deep learning network

Also Published As

Publication number Publication date
CN110809127A (en) 2020-02-18

Similar Documents

Publication Publication Date Title
CN1878049B (en) Method of controlling transmission rate by using error correction packets and communication apparatus using the same
DE102019135861B3 (en) Latency reduction based on packet failure prediction
CN107342848A (en) A kind of adaptive code stream transmission method, device and equipment
Fang et al. Reinforcement learning for bandwidth estimation and congestion control in real-time communications
CN111629210A (en) Data processing method and device and electronic equipment
CN109818714B (en) Dynamic FEC method, device, computer terminal and computer readable storage medium
CN106488243A (en) A kind of many description screen content method for video coding
CN101854308A (en) Self-adaptation realizing method of high-tone quality service network of VoIP system
CN106507024A (en) A kind of self-adaption code rate method of adjustment and device
JP7356581B2 (en) Information processing methods, devices, equipment and computer readable storage media
CN105142002A (en) Audio/video live broadcasting method and device as well as control method and device
CN110809127B (en) Video call method and device based on deep simulation learning
CN110224728A (en) Multi-beam satellite system robust pre-coding method based on outage probability constraint
CN106789427A (en) A kind of transmission volume computational methods
CN103312469B (en) Confirmation in multicast retransmission represents system of selection and device
CN107749827A (en) Method for controlling network congestion, apparatus and system based on network state classification
CN104202257A (en) Satellite network congestion control method based on bandwidth estimation
DE112014004437T5 (en) System and method for matching audio frame generation with LTE transmission capabilities
CN101656807B (en) Networking telephone sending terminal and voice control method thereof
CN103414886A (en) Network video transmission method
CN107295667A (en) A kind of access-in resource method of adjustment and device
Huang et al. Airtime fair distributed cross-layer congestion control for real-time video over WLAN
Li et al. Fountain Coded Streaming for SAGIN With Learning-Based Pause-and-Listen
Mikhailenko et al. Analysis of the adaptive neural network router
Thorsager et al. Generative Network Layer for Communication Systems with Artificial Intelligence

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant