CN114785757A

CN114785757A - Multipath transmission control method for real-time session service

Info

Publication number: CN114785757A
Application number: CN202210329683.XA
Authority: CN
Inventors: 雷为民; 聂金雨; 张伟; 赵莹月; 刘壮志
Original assignee: Northeastern University China
Current assignee: Northeastern University China
Priority date: 2022-03-31
Filing date: 2022-03-31
Publication date: 2022-07-22
Anticipated expiration: 2042-03-31
Also published as: CN114785757B

Abstract

The invention discloses a multipath transmission control method for real-time session services. First, corresponding buffers, a distributor, a controller and a recombiner are configured at the sending end and the receiving end. Application data is then placed in a total sending buffer and passed to a load distributor, which divides the application data into a plurality of data packets and encapsulates them. The encapsulated data packets are distributed to a plurality of sub-path sending sub-buffers according to a load distribution strategy. A multipath congestion controller then controls the data sending rate of each sub-path through a multipath congestion control strategy based on online learning, and the data packets in each sub-path sending sub-buffer are transmitted to the corresponding sub-path receiving sub-buffer. Finally, a sub-stream recombiner moves the data packets from the sub-path receiving sub-buffers into a sub-stream recombination buffer, reassembles them in order, and delivers them to the application layer. The method achieves the high throughput required by the CBR service while also achieving a smaller round-trip delay.

Description

Multipath transmission control method for real-time session service

Technical Field

The invention relates to the technical field of transmission optimization, in particular to a multi-path transmission control method for real-time session services.

Background

Network video has become a new growth hotspot for internet services; video traffic accounts for 82% of total IP traffic by 2022, as reported in Cisco's "VNI global fixed network and mobile internet traffic prediction (2017-. Current network video service modes include video on demand, entertainment and sales live streaming, short-video apps, online education and the like, and most of this video is "streaming video". Unlike "streaming video" such as entertainment and content services, "session video" mainly serves as a "videophone" and emphasizes real-time communication. Applications such as the "video call" of WeChat and the "video conference" of DingTalk are gradually becoming important tools for communication. How to improve a user's quality of experience (QoE) in video calls under the existing network environment has become an important research direction for video services.

The quality of experience of current session video services depends excessively on the resource scheduling of the network infrastructure, which creates a technical barrier for these services. After years of infrastructure development, the access network, which caused the bandwidth bottleneck in the early internet, now has sufficient resources, and the bottleneck has gradually shifted to the backbone network and the edge network. This opens another possible way to optimize session video services: using a multipath transmission mechanism that reschedules existing network resources and aggregates several available links, so that the strict service requirements of high bandwidth and low delay can be met on a best-effort IP network while the utilization of network resources is also improved.

One of the key research topics in multipath transmission control is the multipath congestion control mechanism. In a "best-effort" network, a router always accepts arriving data packets and forwards them according to its own routing table; when packets arrive faster than the router can forward them, the router buffers them in its memory for later processing. However, the router's memory is limited, and once the buffer is full, new data packets are discarded. It is therefore important to control the sending rate of data in the network with a congestion control mechanism. The congestion control algorithms commonly used in current networks keep enlarging the congestion window to fully utilize the bottleneck bandwidth, which increases queuing delay and is clearly detrimental to delay-sensitive real-time session services. In addition, multipath transmission control must solve two further problems: load distribution across sub-paths, and sub-stream reassembly across sub-paths. A multipath transmission control method that fully utilizes bandwidth resources while minimizing delay is therefore required.

In conclusion, multipath transmission control is an important part in improving the QoE experience of the user for the real-time conversational service, and a reasonable transmission control mechanism can effectively improve the transmission quality. Meanwhile, considering that the service is essentially a CBR (Constant Bit Rate) service and the service requirement changes from "high throughput and low delay" to "minimum delay meeting a certain throughput", the invention provides a multipath transmission control method more suitable for real-time session services, and the method comprises multipath congestion control, sub-path load distribution and sub-flow recombination mechanism for sub-path to ensure the service requirement.

Disclosure of Invention

Aiming at the defects of the prior art, the invention provides a multi-path transmission control method facing to real-time conversation services.

In order to solve the technical problems, the technical scheme adopted by the invention is as follows: a multipath transmission control method facing to real-time conversation service includes the following steps:

step 1: setting a total sending buffer, a load distributor, a multipath congestion controller and a plurality of sub-path sending sub-buffers at the sending end; setting, at the receiving end, a plurality of sub-path receiving sub-buffers corresponding to the sending end, a sub-stream recombiner and a sub-stream recombination buffer;

step 2: the total sending buffer at the sending end receives application data from the application layer and passes it to the load distributor;

step 3: after receiving the application data, the load distributor divides the data into a plurality of data packets and encapsulates the data packets;

the specific method is as follows: the application data is grouped by video frame, each video frame forms one group and is encapsulated into a plurality of data packets, and each data packet header comprises a sequence number, a group identifier and an intra-group identifier.

Step 4: the load distributor acquires the state information of each sub-path, calculates the load distribution ratio and the path cost of each sub-path by using a load distribution strategy, and distributes the encapsulated data packets to the plurality of sub-path sending sub-buffers;

further, the state information of a sub-path includes: the packet loss rate ρ_i, the network delay r_i, the data size of the sub-path sending sub-buffer, and the sending rate.

Further, the network delay r_i comprises the path transmission delay, the router queuing delay and the data processing delay of each buffer, and is obtained as follows:

an RTT probe packet is sent to obtain the round-trip time of the current sub-path; the average round-trip time RTT_a is taken, and r_i is approximated as half of RTT_a, i.e.

r_i＝RTT_a/2

Further, the process of calculating the load distribution ratio and the path cost of each sub-path by using the load distribution policy and distributing the encapsulated data packet to the plurality of sub-path transmission sub-buffers is as follows:

step 4.1: calculating the total delay d_i of each sub-path:

where r_i is the network delay of the sub-path, Q_i represents the size of the data in the current sub-path sending sub-buffer, sr_i represents the current sub-path data sending rate, and ρ_i represents the packet loss rate of the current sub-path;

step 4.2: calculating the load distribution ratio α_i of each sub-path:

where d_i is the total delay of each sub-path, n is the number of sub-paths, and the load distribution ratio α_i indicates the proportion of the total data that the current sub-path is planned to carry;

step 4.3: calculating the path cost sp_i of each sub-path:

where N represents the total amount of data to be transmitted, and Q_i+α_i·N represents the sum of the data already in the current sub-path sending sub-buffer and the planned distributed data amount;

step 4.4: each time, the load distributor distributes a data packet from the total sending buffer to the sending sub-buffer of the sub-path with the minimum path cost sp_i, until the amount of data distributed to that sub-path reaches its planned amount or its sp_i is no longer the minimum.

Step 5: the multipath congestion controller acquires the throughput and delay information of each sub-path, controls the data sending rate of each sub-path through a multipath congestion control strategy based on online learning, and transmits the data packets in each sub-path sending sub-buffer to the corresponding sub-path receiving sub-buffer;

further, the specific process of the multi-path congestion control strategy based on online learning is as follows:

step 5.1: each sub-path firstly runs a slow start algorithm in the traditional TCP Reno algorithm, and when a congestion window exceeds a slow start threshold or packet loss occurs, the sub-path permanently exits from a slow start algorithm stage and enters into an AIMD stage;

step 5.2: modeling the adjustment of the congestion window cwnd in the AIMD stage of each sub-path as a multi-armed bandit (MAB) model from online learning, and defining the update rule of each sub-path congestion window as follows:

The 5 candidate values of β are modeled as the 5 arms of the MAB model. When a sub-path becomes congested, i.e. packet loss occurs, the sub-path is triggered to select a new action, i.e. a new β value; the interval between two such selections is therefore defined as an "action update round". Within each action update round, every time one RTT elapses, the β value selected for that round is used to update the congestion window cwnd of the current sub-path according to formula (4);

step 5.3: when a congestion event occurs, before the next "action update round" is started, a new β value is selected to update the window, and a "window down" process needs to be performed on the window, where the processed window is as follows:

cwnd_new＝0.85*cwnd_old (5)

where cwnd_new is the congestion window size after processing, and cwnd_old is the congestion window size when the congestion event occurs;

step 5.4: defining a reward function R adapted to the CBR service:

where throughput_i represents the throughput of each sub-path, r_i is the network delay of each sub-path, and throughput_a represents the constant bandwidth value required by the CBR service;

step 5.5: establishing a Q table in which beta values are associated with the reward functions R in each sub path, wherein each beta value in the Q table corresponds to a reward value; meanwhile, since the β value of each action will affect cwnd update in an action update round, and an action update round contains multiple RTTs, the reward value of the β value selected in the round needs to be updated in each action update round in real time, and the update rule of the reward is as follows:

R(m，β)＝(1-σ)·R(m-1，β)+σ·reward(m，β) (7)

wherein, R (m, beta) represents an expected reward value for updating operation by adopting the beta value in the current RTT, R (m-1, beta) represents a historical reward value for updating operation by adopting the beta value in the last RTT, reward (m, beta) represents a real-time reward value for updating operation by adopting the beta value in the current RTT, and the calculation mode is as step 5.4; sigma is an attenuation coefficient and represents the influence of the historical reward value on the current reward value;

step 5.6: the β value for each action update round of each sub-path is selected with an ε-greedy mechanism, which comprises an exploration phase and an exploitation phase: the exploration phase selects a β value at random, and the exploitation phase selects the β value with the largest reward value in the Q table.

Step 6: the sub-stream recombiner at the receiving end moves the data packets from the receiving sub-buffer of each sub-path into the sub-stream recombination buffer;

step 7: the sub-stream recombiner reassembles the data packets in the sub-stream recombination buffer in order and then delivers them to the application layer, and the specific process is as follows:

step 7.1: the sub-stream recombiner uses the group identifier and the intra-group identifier in each data packet to reassemble the data packets in the sub-stream recombination buffer in order;

step 7.2: for a group with a lost data packet, if the video decoding time of the group has been exceeded, all data packets of the group are discarded; if the video decoding time has not been exceeded, reassembly is performed after the lost data packet arrives.

The beneficial effects of the above technical scheme are as follows: the method models the update of the congestion window in the AIMD stage of each sub-path as a multi-armed bandit problem, designs a reward function adapted to CBR (Constant Bit Rate) services, and dynamically adjusts the update value of the sub-path congestion window through online learning. Unlike traditional congestion control algorithms that target maximum throughput, the strategy targets "the minimum delay while meeting a certain throughput". Load distribution is performed according to minimum delay, and the out-of-order arrival of data packets caused by multipath transmission is resolved at the receiving end. Analysis of the experimental results shows that, compared with the traditional TCP congestion control algorithm, the invention obtains the high throughput required by the CBR service while achieving a smaller round-trip delay, making it a multipath transmission control mechanism better suited to real-time session services.

Drawings

Fig. 1 is a flowchart of a multipath transmission control method for real-time conversational services according to an embodiment of the present invention;

fig. 2 is a working model diagram of a multipath transmission control method for real-time conversational services according to an embodiment of the present invention;

FIG. 3 is a diagram of a processed packet structure according to an embodiment of the present invention;

FIG. 4 is a flow chart of load distribution in an embodiment of the present invention;

FIG. 5 is a schematic diagram of a sub-stream reassembly process according to an embodiment of the present invention;

FIG. 6 is a multi-path network topology diagram established by simulation in an embodiment of the present invention;

FIG. 7 is a graph comparing the results of simulated throughput in an embodiment of the present invention;

fig. 8 is a comparison diagram of the simulation average one-way delay effect in the embodiment of the present invention.

Detailed Description

The following detailed description of the present invention is provided in connection with the accompanying drawings and examples. The following examples are intended to illustrate the invention, but are not intended to limit the scope of the invention.

As shown in fig. 1, the multipath transmission control method for the real-time conversational service in this embodiment is as follows.

The multipath transmission control method is mainly aimed at real-time session services such as real-time video calls, real-time video conferences and real-time video teaching. These services are essentially CBR services: once a certain throughput is met, a smaller delay is pursued. In this embodiment, fig. 2 shows the working model of the multipath transmission control method for real-time session services, which mainly comprises three parts, namely load distribution, multipath congestion control and sub-stream reassembly, controlled by the load distributor, the multipath congestion controller and the sub-stream recombiner respectively. Load distribution solves the scheduling of the data load in a multipath scenario, multipath congestion control solves the pacing of data sending at the sending end of each sub-path, and sub-stream reassembly solves the in-order reassembly of the data packets of each sub-path at the receiving end.

Step 3: after receiving the application data, the load distributor divides the data into a plurality of data packets and encapsulates the data packets;

the specific method is as follows: the application data is grouped by video frame, each video frame forms one group and is encapsulated into a plurality of data packets, and each data packet header comprises a sequence number, a group identifier and an intra-group identifier.

In conventional TCP retransmission, the retransmitted data packet carries the same sequence number as the original data packet, which causes the retransmission ambiguity problem and affects the sampled RTT to some extent. To avoid retransmission ambiguity, a 32-bit sequence number is added to the header of the original data, and the sequence number of a retransmitted data packet is the sequence number of the original data packet plus one, so that the RTT can be calculated accurately. In this embodiment, so that data packets can be reassembled in order at the receiving end, a 16-bit group identifier and an 8-bit intra-group identifier are added after the 32-bit sequence number. The structure of the processed data packet is shown in fig. 3.
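The header layout described above can be illustrated with a minimal Python sketch. The field order (32-bit sequence number, 16-bit group identifier, 8-bit intra-group identifier) follows the text; the network byte order, the 1200-byte payload size, and the function names `encapsulate`, `decapsulate` and `packetize_frame` are assumptions made for illustration, and retransmission numbering is not modeled.

```python
import struct

# Assumed wire layout: 32-bit sequence number, 16-bit group (video frame)
# identifier, 8-bit intra-group identifier, network byte order, no padding.
HEADER = struct.Struct("!IHB")


def encapsulate(seq: int, group_id: int, intra_id: int, payload: bytes) -> bytes:
    """Prefix a payload with the 7-byte header."""
    return HEADER.pack(seq, group_id, intra_id) + payload


def decapsulate(packet: bytes):
    """Split a packet into (seq, group_id, intra_id, payload)."""
    seq, group_id, intra_id = HEADER.unpack_from(packet)
    return seq, group_id, intra_id, packet[HEADER.size:]


def packetize_frame(frame: bytes, group_id: int, first_seq: int,
                    max_payload: int = 1200):
    """Split one video frame (one group) into encapsulated packets.

    max_payload is a hypothetical per-packet payload size.
    """
    chunks = [frame[i:i + max_payload] for i in range(0, len(frame), max_payload)]
    return [encapsulate(first_seq + k, group_id, k, chunk)
            for k, chunk in enumerate(chunks)]
```

With an 8-bit intra-group identifier, a single group can address at most 256 packets, so a sender following this sketch would need a payload size large enough to keep each frame within that limit.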

Step 4: the load distributor acquires the state information of each sub-path, calculates the load distribution ratio and the path cost of each sub-path by using a load distribution strategy, and distributes the encapsulated data packets to the plurality of sub-path sending sub-buffers;

an RTT probe packet is sent to obtain the round-trip time of the current sub-path; the average round-trip time RTT_a is taken, and r_i is approximated as half of RTT_a, i.e.

r_i＝RTT_a/2

In this embodiment, the packet loss rate ρ_i is calculated as follows:

when the connection is initially established, the sub-path packet loss rate ρ_i is set to 0; after transmission starts, the proportion of data packets that have not been acknowledged among a certain number of sent data packets is taken as the packet loss rate:

ρ_i＝(Δsent－Δacked)/Δsent

where Δsent is the number of data packets sent within a period of time, and Δacked is the number of those data packets acknowledged by the peer;

the data size of the sub-path sending sub-buffer can be obtained directly, and the data sending rate of the sub-path can be obtained from the multipath congestion controller.
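The two measured state values can be sketched in Python as follows. This is a minimal illustration under the definitions in the text (network delay as half of the average probe RTT, loss rate as the unacknowledged share of sent packets); the function names and the choice of seconds as the time unit are assumptions.

```python
def network_delay(rtt_samples):
    """Approximate r_i as half of the average probe RTT (RTT_a / 2)."""
    rtt_a = sum(rtt_samples) / len(rtt_samples)
    return rtt_a / 2


def packet_loss_rate(sent: int, acked: int) -> float:
    """Share of sent packets that were not acknowledged.

    Returns 0.0 before any data has been sent, matching the initial value
    used when the connection is established.
    """
    if sent == 0:
        return 0.0
    return (sent - acked) / sent
```

For example, probe RTTs of 0.20 s, 0.22 s and 0.18 s give a network delay of about 0.10 s, and 190 acknowledgements for 200 sent packets give a loss rate of 0.05.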

Further, the process of calculating the load distribution ratio and the path cost of each sub-path by using the load distribution policy and distributing the encapsulated packet to the multiple sub-path transmission sub-buffers is as follows:

step 4.1: calculating the total delay d_i of each sub-path:

where r_i is the network delay of the sub-path, Q_i represents the size of the data in the current sub-path sending sub-buffer, sr_i represents the current sub-path data sending rate, and ρ_i represents the packet loss rate of the current sub-path;

The total delay d_i reflects the time needed to transmit the data already queued on the current sub-path. The smaller its value, the stronger the carrying capacity of the sub-path, and the more data should be distributed to it to match the requirements of real-time session services.

Step 4.2: calculating the load distribution ratio α_i of each sub-path:

step 4.3: calculating the path cost sp_i of each sub-path:

step 4.4: each time, the load distributor distributes a data packet from the total sending buffer to the sending sub-buffer of the sub-path with the minimum path cost sp_i, until the amount of data distributed to that sub-path reaches its planned amount or its sp_i is no longer the minimum.

The load distribution flow of this embodiment is shown in fig. 4. Note that when selecting the sub-path with the minimum path cost sp_i for data distribution, sub-paths that have already received their planned amount of data should be excluded.
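The greedy distribution loop of steps 4.1 to 4.4 can be sketched in Python. The formulas for d_i, α_i and sp_i are not reproduced in this text, so the forms used here are assumptions chosen only to match the stated inputs: the delay is taken as r_i plus the time to drain the queue at a loss-discounted rate, α_i as inverse-delay weights, and the path cost as the same delay evaluated on the data queued so far. The class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class SubPath:
    r: float    # network delay r_i in seconds (assumed > 0)
    q: float    # bytes already in the sending sub-buffer, Q_i
    sr: float   # sending rate sr_i in bytes per second
    rho: float  # packet loss rate rho_i, in [0, 1)

    def delay(self, queued: float) -> float:
        # ASSUMED form: network delay plus the time needed to drain
        # `queued` bytes at the loss-discounted sending rate.
        return self.r + queued / (self.sr * (1.0 - self.rho))


def distribute(paths, packet_sizes):
    """Return (sub-path index chosen for each packet, alpha per sub-path)."""
    total = sum(packet_sizes)                      # N
    d = [p.delay(p.q) for p in paths]              # step 4.1 (assumed form)
    inv = [1.0 / x for x in d]
    alpha = [x / sum(inv) for x in inv]            # step 4.2 (assumed form)
    quota = [a * total for a in alpha]             # planned amount alpha_i * N
    sent = [0.0] * len(paths)
    choice = []
    for size in packet_sizes:
        # Exclude sub-paths that already received their planned amount.
        candidates = [i for i in range(len(paths)) if sent[i] < quota[i]]
        if not candidates:                         # guard against rounding
            candidates = list(range(len(paths)))
        # Steps 4.3 and 4.4: pick the sub-path with the minimum running cost.
        best = min(candidates,
                   key=lambda i: paths[i].delay(paths[i].q + sent[i]))
        sent[best] += size
        choice.append(best)
    return choice, alpha
```

Because the cost is re-evaluated for every packet, a sub-path stops receiving data as soon as either its planned amount is reached or another sub-path becomes cheaper, which is the stopping rule described in step 4.4.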

After the data packets reach a sub-path, the congestion window cwnd in the AIMD stage is not adjusted in the traditional TCP congestion control manner; instead, the adjustment is modeled as a multi-armed bandit (MAB) problem from online learning. In a typical MAB problem, a gambler pulls the arms of a slot machine many times in order to maximize the cumulative reward. Accordingly, the congestion window adjustment is no longer fixed each time but has several options. The congestion controller obtains the throughput and delay information of each sub-path, and balances the "exploration" and "exploitation" stages by setting a reward function adapted to real-time session services and using an ε-greedy strategy, so that transmission achieves the goal of "minimum delay under a certain throughput". The specific process is given in step 5 below.

Step 5.2: modeling the adjustment of the congestion window cwnd in the AIMD stage of each sub-path as a multi-armed bandit (MAB) model from online learning, and defining the update rule of each sub-path congestion window as follows:

The 5 candidate values of β are modeled as the 5 arms of the MAB model. When a sub-path becomes congested, i.e. packet loss occurs, the sub-path is triggered to select a new action, i.e. a new β value; the interval between two such selections is therefore defined as an "action update round". Within each action update round, every time one RTT elapses, the β value selected for that round is used to update the congestion window cwnd of the current sub-path according to formula (4);

cwnd_new＝0.85*cwnd_old (5)

where throughput_i represents the throughput of each sub-path, r_i is the network delay of each sub-path, and throughput_a represents the constant bandwidth value required by the CBR service;

In this embodiment, the real-time session service is treated as a CBR (constant bit rate) service, whose throughput requirement is constant. Once the constant service bandwidth required by the CBR service is reached, increasing the throughput further does not improve the QoE of the communicating parties. On the contrary, if high throughput is still pursued, the load in the network increases, which increases queuing and delay and degrades the QoE of both parties. This is also why traditional congestion control algorithms that pursue maximum throughput are ill-suited to this service. Therefore, a constraint term γ is added to the reward function R in the present invention.

The constraint term γ constrains the reward function as follows: the closer the sum of the sub-path throughputs is to throughput_a, the larger the value of the reward function R; the farther away, the smaller the reward value. Therefore, once the sum of the sub-path throughputs approaches the constant bandwidth required by the CBR service, any further increase or decrease in throughput lowers the reward value, and the only way to raise the reward is to reduce the delay. Meanwhile, the main body of the reward function still motivates the algorithm to compete for bandwidth, so that its bandwidth is not preempted by other flows.

Step 5.5: establishing a Q table in which the beta value is associated with the reward function R in each sub-path, which is shown in table 1 below in the present embodiment;

Table 1: Q table associating β values with the reward function R

β	1	2	3	4	5
R	R(m,1)	R(m,2)	R(m,3)	R(m,4)	R(m,5)

Each β value in the Q table corresponds to a reward value. Because the β value of an action affects every cwnd update within an action update round, and an action update round spans multiple RTTs, the reward value of the β value selected for the round must be updated in real time throughout the round. The reward update rule is as follows:

R(m，β)＝(1-σ)·R(m-1，β)+σ·reward(m，β) (7)

where R(m, β) represents the expected reward value of updating with the β value in the current RTT, R(m-1, β) represents the historical reward value of updating with the β value in the previous RTT, and reward(m, β) represents the real-time reward value of updating with the β value in the current RTT, calculated as in step 5.4; σ is a decay coefficient that weights the real-time reward against the historical reward value, and the larger its value, the smaller the influence of the historical reward. In this embodiment, the decay coefficient is set to 0.8, meaning that the reward update emphasizes the real-time reward. Note that all reward values must be initialized to 0 when the algorithm is initialized; the reward values in the Q table are then updated continuously as the algorithm runs and rewards are calculated.
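The update rule of formula (7) is an exponentially weighted average and can be sketched in a few lines of Python. The dictionary-based Q table keyed by the arm index 1 to 5 and the names `new_q_table` and `update_reward` are assumptions for illustration; σ = 0.8 and the zero initialization come from the embodiment.

```python
SIGMA = 0.8  # decay coefficient used in the embodiment


def new_q_table(n_arms: int = 5):
    """Q table with one reward value per β arm, all initialized to 0."""
    return {arm: 0.0 for arm in range(1, n_arms + 1)}


def update_reward(q_table, beta, realtime_reward, sigma=SIGMA):
    """Formula (7): R(m, β) = (1 - σ)·R(m-1, β) + σ·reward(m, β)."""
    q_table[beta] = (1.0 - sigma) * q_table[beta] + sigma * realtime_reward
    return q_table[beta]
```

Starting from 0, a real-time reward of 10 moves the stored value to 8, and a following reward of 5 moves it to 0.2·8 + 0.8·5 = 5.6, which shows how strongly σ = 0.8 favors the most recent RTT.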

Step 5.6: the β value for each action update round of each sub-path is selected with an ε-greedy mechanism, which comprises an exploration phase and an exploitation phase: the exploration phase selects a β value at random, and the exploitation phase selects the β value with the largest reward value in the Q table.

A classic problem of MAB in online learning is the exploration-exploitation ("EE") dilemma, where one E stands for Exploration and the other for Exploitation. To maximize the cumulative reward, a gambler pulling the arms generally has two options:

Option 1: keep pulling the arm with the largest current reward value, i.e. exploitation;

Option 2: try pulling arms that have not been selected, in order to discover arms that may yield more reward, i.e. exploration.

Online learning is a dynamic problem, so exploration and exploitation must be well balanced. The algorithm of the invention faces the same problem, and solves it as follows:

step 5.6.1: defining the "exploration" phase as randomly selecting a value of β; defining the "exploitation" phase as selecting the value of β in the Q table that maximizes the reward value;

step 5.6.2: the "exploration" and "exploitation" phases are balanced by an ε-greedy strategy with ε set to 0.3, i.e. the algorithm executes the "exploration" phase with 30% probability and the "exploitation" phase with 70% probability.
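The ε-greedy selection, together with the window-down step of formula (5) that precedes each new action update round, can be sketched in Python as follows. ε = 0.3 and the factor 0.85 come from the text; the dictionary Q table keyed by arm index, the function names, and the injectable `rng` argument are assumptions for illustration.

```python
import random

EPSILON = 0.3       # exploration probability used in the embodiment
WINDOW_DOWN = 0.85  # factor of formula (5)


def select_beta(q_table, epsilon=EPSILON, rng=random):
    """Choose the β arm for the next action update round."""
    if rng.random() < epsilon:
        # Exploration: pick any arm uniformly at random.
        return rng.choice(list(q_table))
    # Exploitation: pick the arm with the largest reward value.
    return max(q_table, key=q_table.get)


def on_congestion(cwnd, q_table, epsilon=EPSILON, rng=random):
    """On packet loss: shrink the window, then choose β for the new round."""
    return WINDOW_DOWN * cwnd, select_beta(q_table, epsilon, rng)
```

Passing a seeded `random.Random` instance as `rng` makes runs repeatable, which is convenient when comparing settings of ε in simulation.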

Step 6: the sub-stream recombiner at the receiving end sends the data packets in the sub-buffer received by each sub-path to the sub-stream recombination buffer;

Step 7.2: for a group with a lost data packet, if the video decoding time of the group has been exceeded, all data packets of the group are discarded; if the video decoding time has not been exceeded, reassembly is performed after the lost data packet arrives.

In this embodiment, fig. 5 is a schematic diagram of a sub-stream reassembly process.
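The reassembly rule of steps 7.1 and 7.2 can be sketched in Python. The packet header described in this document does not carry the number of packets in a group, so the `expected` mapping below is a hypothetical side input, as are the per-group `deadlines` and the function name; identifier wraparound is ignored.

```python
def reassemble(groups, expected, deadlines, now):
    """Deliver complete groups in order; drop incomplete groups past deadline.

    groups:    {group_id: {intra_id: payload}} packets received so far
    expected:  {group_id: packet count of the group} (hypothetical input)
    deadlines: {group_id: video decoding deadline, in seconds}
    Returns (delivered, dropped); handled groups are removed from `groups`.
    """
    delivered, dropped = [], []
    for gid in sorted(groups):  # sorted() copies the keys, so deletion is safe
        packets = groups[gid]
        if len(packets) == expected[gid]:
            # Step 7.1: in-order reassembly by intra-group identifier.
            frame = b"".join(packets[k] for k in sorted(packets))
            delivered.append((gid, frame))
            del groups[gid]
        elif now > deadlines[gid]:
            # Step 7.2: a packet is missing and the decoding time has passed.
            dropped.append(gid)
            del groups[gid]
        # Otherwise keep the group and wait for the missing packet.
    return delivered, dropped
```

Calling the function periodically, or whenever a packet is moved into the recombination buffer, would deliver frames as soon as they complete while bounding how long an incomplete frame is held.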

To demonstrate the effectiveness of the method, it is verified through a simulation experiment. The network topology shown in fig. 6 is built on the ns-3 simulation platform; the bandwidths of the two paths are set to 3 Mbps and 1 Mbps respectively, the one-way transmission delay is 100 ms, and each simulation run lasts 300 seconds. End-to-end throughput, one-way delay and algorithm fairness are used as performance indexes, and Cubic, the congestion control algorithm most commonly used in current networks, is compared against the multipath transmission control of the invention. In fig. 6, node n0 communicates with node n4 using the multipath transmission control method of the invention, with the CBR required bandwidth set to 2.5 Mbps, and node n1 communicates with node n5 using the Cubic algorithm. Fig. 7 shows the comparison: flow1 is the throughput curve of the multipath transmission control method of the invention, and flow2 is the throughput curve of the Cubic algorithm. The throughput of the multipath transmission control method converges at about 2.5 Mbps and that of the Cubic algorithm at about 1.5 Mbps, which shows the benefit of multipath aggregate transmission and also shows that the method is fair to the Cubic algorithm.

To demonstrate the low-delay characteristic of the method, the simulation experiment is run 5 times and the average one-way transmission delay of each run is recorded. The results are shown in fig. 8, where flow1 represents the average delay of the multipath transmission control method of the invention over the 5 experiments and flow2 represents the average delay of the Cubic algorithm over the 5 experiments. The multipath transmission control method of the invention has the lower delay and meets the service requirement of real-time session services.

Claims

1. A multipath transmission control method facing to real-time session service is characterized in that the method comprises the following steps:

step 7: the sub-stream recombiner reassembles the data packets in the sub-stream recombination buffer in order and then delivers them to the application layer.

2. The method for controlling multipath transmission of real-time conversational services according to claim 1, wherein the method in step 3 is: the application data is grouped according to video frames, each video frame is a group and is packaged into a plurality of data packets, and each data packet head comprises a sequence number, a group identifier and an intra-group identifier.

3. The method for controlling multipath transmission of real-time conversational services according to claim 1, wherein the state information of the sub-path includes: the packet loss rate ρ_i, the network delay r_i, the data size of the sub-path sending sub-buffer, and the sending rate.

4. The method for controlling multipath transmission of real-time conversational services according to claim 3, wherein the network delay r_i comprises the path transmission delay, the router queuing delay and the data processing delay of each buffer, and is calculated as follows:

sending RTT probe packets to obtain the round-trip time of the current sub-path, and taking the average round-trip time RTT_a; r_i is approximated as one half of RTT_a, i.e. r_i＝RTT_a/2.
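The estimate r_i＝RTT_a/2 can be sketched in a few lines of Python. The function name `estimate_network_delay` and the use of milliseconds are assumptions; how many probe samples are averaged is not stated in the claim, so the sketch simply averages whatever samples it is given.

```python
from statistics import mean
from typing import Sequence


def estimate_network_delay(rtt_samples_ms: Sequence[float]) -> float:
    """Return r_i = RTT_a / 2, where RTT_a is the mean of the probe RTT samples."""
    if not rtt_samples_ms:
        raise ValueError("at least one RTT sample is required")
    rtt_a = mean(rtt_samples_ms)  # average round-trip time of the sub-path
    return rtt_a / 2.0            # one-way delay approximated as half the RTT
```

For example, probe samples of 190 ms, 200 ms and 210 ms give RTT_a = 200 ms and therefore r_i = 100 ms.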

5. The method for controlling multipath transmission for real-time conversational services according to claim 3, wherein the load distribution ratio and the path cost of each sub-path are calculated by using a load distribution strategy, and the process of distributing the encapsulated data packet to the plurality of sub-path transmission sub-buffers is as follows:

step 4.1: calculating the total delay d_i of each sub-path:

wherein r_i is the network delay of the sub-path, Q_i represents the size of the data in the current sub-path sending sub-buffer, sr_i indicates the current sub-path data sending rate, and ρ_i represents the packet loss rate of the current sub-path;

step 4.2: calculating the load distribution ratio α_i of each sub-path:

wherein d_i is the total delay of each sub-path, n is the number of sub-paths, and the load distribution ratio α_i indicates the proportion of the total data that the current sub-path is planned to distribute;

step 4.3: calculating the path cost sp_i of each sub-path:

step 4.4: each time, the load distributor distributes a data packet from the total sending buffer to the sending sub-buffer of the sub-path with the minimum path cost sp_i, until the data distributed to that sub-path reaches its planned distribution amount or its sp_i is no longer the minimum.
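The selection loop of step 4.4 can be sketched as follows. This is only an illustration under stated assumptions: the formulas for d_i, α_i and sp_i are not reproduced in this text, so the ratios `alpha` are taken as given inputs and the path cost is supplied by the caller through a hypothetical `cost` callback. The function name `distribute` and the fallback used when every planned amount is exhausted are also assumptions.

```python
from typing import Callable, List, Sequence


def distribute(packets: Sequence[int],
               alpha: Sequence[float],
               cost: Callable[[int, List[int]], float]) -> List[List[int]]:
    """Assign each packet to the sub-path whose current cost is smallest.

    alpha[i]      : planned share of sub-path i (assumed to sum to 1)
    cost(i, sent) : placeholder for sp_i given the per-path packet counts so far;
                    the patent's own cost formula is NOT reproduced here
    """
    n = len(alpha)
    planned = [a * len(packets) for a in alpha]  # planned amount per sub-path
    sent = [0] * n
    sub_buffers: List[List[int]] = [[] for _ in range(n)]
    for pkt in packets:
        # Only sub-paths below their planned amount are eligible; if rounding
        # leaves none eligible, fall back to all sub-paths (an assumption).
        eligible = [i for i in range(n) if sent[i] < planned[i]] or list(range(n))
        best = min(eligible, key=lambda i: cost(i, sent))
        sub_buffers[best].append(pkt)
        sent[best] += 1
    return sub_buffers
```

Re-evaluating the cost before every packet covers both stop conditions in step 4.4: a sub-path stops receiving packets once it reaches its planned amount or once another sub-path becomes cheaper.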

6. The method for controlling multipath transmission for real-time conversational services according to claim 1, wherein the specific process of the multipath congestion control strategy based on online learning is as follows:

cwnd_new＝0.85*cwnd_old (5)

wherein cwnd_new is the congestion window size after processing, and cwnd_old is the congestion window size when the congestion event occurs;
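Formula (5) is a multiplicative decrease of the congestion window and can be sketched directly. The function name `on_congestion_event` and the lower bound `min_cwnd` are assumptions added so that the window never collapses to zero; the source specifies only the factor 0.85.

```python
DECREASE_FACTOR = 0.85  # factor from formula (5)


def on_congestion_event(cwnd_old: float, min_cwnd: float = 1.0) -> float:
    """Return cwnd_new = 0.85 * cwnd_old, floored at min_cwnd (the floor is an assumption)."""
    return max(min_cwnd, DECREASE_FACTOR * cwnd_old)
```

A window of 100 packets is therefore reduced to about 85 packets after one congestion event.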

step 5.4: defining a reward function R adapted to CBR traffic:

wherein throughput_i represents the throughput of each sub-path, r_i is the network delay of each sub-path, and throughput_a represents the constant bandwidth value required by the CBR service;

step 5.5: establishing, in each sub-path, a Q table that associates β values with the reward function R, wherein each β value in the Q table corresponds to a reward value; meanwhile, since the β value of each action affects the cwnd update within one action update round, and one action update round comprises a plurality of RTTs, the reward value of the β value selected in the round needs to be updated in real time within each action update round, and the reward update rule is as follows:

R(m,β)＝(1-σ)·R(m-1,β)+σ·reward(m,β) (7)

wherein R(m,β) represents the expected reward value of the update operation using the β value in the current RTT, R(m-1,β) represents the historical reward value of the update operation using the β value in the previous RTT, and reward(m,β) represents the real-time reward value of the update operation using the β value in the current RTT, calculated as in step 5.4; σ is an attenuation coefficient that controls how strongly the historical reward value influences the current reward value;
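Update rule (7) is an exponentially weighted average kept per β value, which can be sketched as a small table. The class name `RewardTable`, the initial reward of 0.0, the default σ of 0.2 and the candidate β values used in the test are all assumptions; the real-time reward passed to `update` stands in for the reward function of step 5.4, which is not reproduced here.

```python
from typing import Dict, Iterable


class RewardTable:
    """Per-sub-path table mapping each candidate beta value to its expected reward."""

    def __init__(self, betas: Iterable[float], sigma: float = 0.2) -> None:
        self.sigma = sigma  # attenuation coefficient (value is an assumption)
        # Initial expected reward of 0.0 for every beta is an assumption.
        self.values: Dict[float, float] = {beta: 0.0 for beta in betas}

    def update(self, beta: float, reward: float) -> float:
        """Apply R(m, beta) = (1 - sigma) * R(m-1, beta) + sigma * reward(m, beta)."""
        previous = self.values[beta]
        self.values[beta] = (1.0 - self.sigma) * previous + self.sigma * reward
        return self.values[beta]
```

Calling `update` once per RTT for the β value selected in the current action update round keeps its entry current, while the entries of the other β values are left unchanged.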

7. The method for controlling multipath transmission of real-time conversational services according to claim 2, wherein the procedure of step 7 is as follows: