CN114866461A

CN114866461A - RTC (real-time communication) streaming media adaptive transmission method, apparatus, device and storage medium

Info

Publication number: CN114866461A
Application number: CN202210471454.1A
Authority: CN
Inventors: 田昌; 刘莉
Original assignee: Jitter Technology Shenzhen Co ltd
Current assignee: Jitter Technology Shenzhen Co ltd
Priority date: 2022-04-28
Filing date: 2022-04-28
Publication date: 2022-08-05

Abstract

The application provides an RTC streaming media adaptive transmission method, apparatus, device and storage medium. The method comprises: acquiring state parameters of all streaming media nodes in a target RTC transmission scenario, and establishing a streaming media transmission policy model based on Markov decision process theory and a preset user experience quality condition; and, based on the streaming media transmission policy model, solving for the optimal transmission policy of the target streaming media in the target RTC transmission scenario through policy iteration. The application seeks the optimal solution from the perspective of intelligent dynamic programming, so as to achieve the lowest delay and the best transmission effect.

Description

RTC (real-time communication) streaming media adaptive transmission method, apparatus, device and storage medium

Technical Field

The present application relates to the field of communications technologies, and in particular, to an RTC streaming media adaptive transmission method and apparatus, a computer device, and a computer-readable storage medium.

Background

At present, the delay in RTC transmission is often large and depends on which streaming media nodes are selected; delay and jitter are especially large for long-distance transmission. The common method in the industry is static path planning: a static routing table is configured, streaming media nodes are selected according to the static routes during RTC transmission, and the static routing table is updated at intervals. Alternatively, a path is searched among a limited number of selected nodes. Because of the limitations of these methods, the resulting path is not the optimal path, and it cannot adapt to real-time RTC path conditions to achieve the minimum delay.

Disclosure of Invention

In view of the foregoing problems in the prior art, embodiments of the present application provide an RTC streaming media adaptive transmission method, apparatus, computer device, and computer-readable storage medium.

In a first aspect, the present application provides an RTC streaming media adaptive transmission method, including:

acquiring state parameters of all streaming media nodes in a target RTC transmission scene, and establishing a streaming media transmission strategy model based on a Markov decision process theory and a preset user experience quality condition;

and based on the streaming media transmission strategy model, solving the optimal transmission strategy of the target streaming media in the target RTC transmission scene through strategy iteration, and transmitting the target streaming media based on the optimal transmission strategy.

In some embodiments, the obtaining the state parameters of all streaming media nodes in the target RTC transmission scenario includes:

acquiring original state data of all streaming media nodes in a target RTC transmission scene, wherein the original state data comprises the number of users, bandwidth and CPU (Central processing Unit) resources of each streaming media node;

and normalizing the number of users, the bandwidth and the CPU resource of each streaming media node, and determining the state parameter corresponding to each streaming media node according to preset different weight coefficients.

In some embodiments, the obtaining state parameters of all streaming media nodes in a target RTC transmission scenario, and establishing a streaming media transmission policy model based on a markov decision process theory and a preset user experience quality condition includes:

defining a four-tuple (S, A, Psa, R) based on a Markov decision process theory according to the state parameters of all streaming media nodes under the target RTC transmission scene;

constructing a condition function F according to the preset user experience quality condition;

constructing an optimal state value function and an optimal action value function based on a Bellman equation;

wherein S represents the state set of all streaming media nodes, with s_i ∈ S, where s_i represents the state parameter of the i-th streaming media node in the target RTC transmission scenario; A represents the action set, with a_i ∈ A, where a_i represents the action by which the streaming media node at the i-th step selects the next streaming media node; Psa represents the state transition probability of transitioning to the next state after taking action a_i in the current state s_i; R represents the return function for node state transitions of the policy to be transmitted, based on path selection; F is the return value of the state transition when the policy to be transmitted takes the action that enters the absorbing state.

In some embodiments, the conditional function F comprises:

taking the last streaming media node of the policy to be transmitted as the absorbing state; when the policy to be transmitted takes the action that enters the absorbing state, if the delay of the policy to be transmitted is smaller than a preset delay threshold and the code rate of the policy to be transmitted is smaller than a preset code rate threshold, the return value of the state transition is 0; otherwise, the return value is -1.

In some embodiments, based on the streaming media transmission policy model, solving an optimal transmission policy in the target RTC transmission scenario through policy iteration includes:

aiming at a plurality of strategy selections of an initial node under a target RTC transmission scene, establishing a strategy estimation formula based on a Bellman equation;

performing policy estimation on all states of the current policy according to the policy estimation formula to update a state value function of the current policy;

changing an action for the state of the initial node by the current strategy through a strategy improvement principle to enable the action value function of the current strategy to be larger than the corresponding state value function, and traversing the states and all actions of all streaming media nodes through a greedy algorithm to carry out strategy improvement to obtain a new strategy;

and performing strategy estimation and strategy improvement on the new strategy, and obtaining a maximum state value function and a corresponding optimal transmission strategy through iterative computation.

In some embodiments, the policy estimation formula is as follows:

wherein p(s'|s, π(s)) represents the probability of transitioning to state s' after the action a corresponding to the current policy π(s) is executed in the current node state s; r(s'|s, π(s)) represents the return function for transitioning to state s' after the action a corresponding to the current policy π(s) is executed in the current node state s; γ represents the discount factor; the action a corresponding to π(s) has multiple possibilities, each denoted π(a|s).

In some embodiments, the performing policy estimation and policy improvement on the new policy by iterative computation to obtain a maximum state value function and a corresponding optimal transmission policy includes:

and when the calculated change value of the current state value function is smaller than a preset threshold value, determining that the current state value function is the maximum state value function, and determining that the strategy corresponding to the maximum state value function is the optimal transmission strategy.

In a second aspect, the present application provides an RTC streaming media adaptive transmission apparatus, including:

the modeling module is used for acquiring state parameters of all streaming media nodes in a target RTC transmission scene and establishing a streaming media transmission strategy model based on a Markov decision process theory and a preset user experience quality condition;

and the path planning module is used for solving the optimal transmission strategy of the target streaming media in the target RTC transmission scene through strategy iteration based on the streaming media transmission strategy model, and transmitting the target streaming media based on the optimal transmission strategy.

In a third aspect, an embodiment of the present application provides a computer device, including:

at least one processor, at least one memory, and a communication interface; wherein,

the processor, the memory and the communication interface are communicated with each other;

the memory stores program instructions executable by the processor, and the processor calls the program instructions to execute the RTC streaming media adaptive transmission method provided by any of the various implementations of the first aspect.

In a fourth aspect, embodiments of the present application provide a computer-readable storage medium storing computer instructions that, when executed on a computer device, cause the computer device to perform the RTC streaming media adaptive transmission method provided in any one of the various implementations of the first aspect.

In the embodiments of the present application, user experience quality conditions are added on top of Markov decision process theory to construct the streaming media transmission policy model, which makes it easier to find more effective policies, satisfies the user's real-time experience and viewing quality, and reduces delay. Then, based on the streaming media transmission policy model, the optimal transmission policy of the target streaming media in the target RTC transmission scenario is solved through policy iteration, which improves the adaptability of RTC transmission and reduces transmission delay.

Drawings

In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, a brief description will be given below to the drawings required for the description of the embodiments or the prior art, and it is obvious that the drawings in the following description are some embodiments of the present application, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.

Fig. 1 is a flowchart of an RTC streaming media adaptive transmission method according to an embodiment of the present application;

Fig. 2 is a flowchart of step S101 of an RTC streaming media adaptive transmission method according to an embodiment of the present application;

Fig. 3 is a flowchart of step S101 of an RTC streaming media adaptive transmission method according to another embodiment of the present application;

Fig. 4 is a flowchart of step S102 of an RTC streaming media adaptive transmission method according to an embodiment of the present application;

Fig. 5 is a schematic diagram of an RTC streaming media adaptive transmission apparatus according to an embodiment of the present application;

Fig. 6 is a schematic diagram of a computer device according to an embodiment of the present application.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application. In addition, the technical features of the various embodiments or individual embodiments provided in this application may be arbitrarily combined with each other to form a feasible technical solution, and such combination is not limited by the sequence of steps and/or the structural composition mode, but must be based on the realization of the capability of a person skilled in the art, and when the technical solution combination is contradictory or cannot be realized, the technical solution combination should be considered to be absent and not within the protection scope of the present application.

Referring to fig. 1, the RTC streaming media adaptive transmission method according to the embodiment of the present application may include the following steps:

S101, acquiring state parameters of all streaming media nodes in a target RTC transmission scenario, and establishing a streaming media transmission policy model based on Markov decision process theory and a preset user experience quality condition.

It should be noted that the target RTC transmission scenario may be a streaming media transmission scenario based on an RTC (Real-Time Communication) protocol, such as live streaming, in which the media to be transmitted is sent from a source node to a target node over the RTC protocol. The preset user experience quality condition refers to QoE (Quality of Experience) index conditions, which include a delay threshold and a code rate threshold. The streaming media nodes, source node, target node and initial node mentioned in this embodiment all refer to streaming media servers.

In some embodiments, referring to fig. 2, the obtaining of the state parameters of all streaming media nodes in the target RTC transmission scenario in step S101 may include:

S201, acquiring original state data of all streaming media nodes in the target RTC transmission scenario, wherein the original state data comprises the number of users, the bandwidth and the CPU (central processing unit) resources of each streaming media node;

S202, normalizing the number of users, the bandwidth and the CPU resources of each streaming media node, and determining the state parameter corresponding to each streaming media node according to different preset weight coefficients.

For the normalized number of users, bandwidth and CPU resources, different weight coefficients may be set according to actual conditions, for example 0.3, 0.3 and 0.4, or 0.4, 0.4 and 0.2, and the state parameter corresponding to each streaming media node is calculated from them.
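Steps S201 and S202 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the text does not say which normalization is used, so min-max scaling is assumed here, and the names `NodeStats`, `min_max` and `state_parameters`, as well as the default weights, are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    """Raw measurements of one streaming media node (field names are illustrative)."""
    users: float
    bandwidth: float
    cpu: float


def min_max(values):
    """Scale numbers to [0, 1]; min-max scaling is an assumption, not stated in the source."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All nodes report the same value, so the metric carries no information.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def state_parameters(nodes, weights=(0.3, 0.3, 0.4)):
    """Return one weighted state parameter per node (weights are placeholder values)."""
    w_users, w_bw, w_cpu = weights
    users = min_max([n.users for n in nodes])
    bandwidth = min_max([n.bandwidth for n in nodes])
    cpu = min_max([n.cpu for n in nodes])
    return [
        w_users * u + w_bw * b + w_cpu * c
        for u, b, c in zip(users, bandwidth, cpu)
    ]
```

Each metric is scaled across all nodes before weighting, so the result is comparable between nodes regardless of the original units.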

In some embodiments, referring to fig. 3, step S101 may include the following steps:

S301, defining a four-tuple (S, A, Psa, R) based on Markov decision process theory according to the state parameters of all streaming media nodes in the target RTC transmission scenario;

S302, constructing a condition function F according to the preset user experience quality condition;

S303, constructing an optimal state value function and an optimal action value function based on the Bellman equation;

wherein S represents the state set of all streaming media nodes, with s_i ∈ S, where s_i represents the state parameter of the streaming media node at the i-th step in the target RTC transmission scenario; A represents the action set, with a_i ∈ A, where a_i represents the action by which the streaming media node at the i-th step selects the next streaming media node; Psa represents the state transition probability of transitioning to the next state after taking action a_i in the current state s_i; R represents the return function for node state transitions of the policy to be transmitted, based on path selection; F is the return value of the state transition when the policy to be transmitted takes the action that enters the absorbing state.

In some embodiments, the optimal state value function V^*(s) and the optimal action value function Q^*(s, a) are represented as follows:

In the formulas, p(s'|s, π(s)) represents the probability of transitioning to state s' after the action a corresponding to the current policy π(s) is executed in the current node state s; r(s'|s, π(s)) represents the return function for transitioning to state s' after the action a corresponding to the current policy π(s) is executed in the current node state s, and r(s'|s, π(s)) can also be written as r(s'|s, a); γ represents the discount factor; s_0 represents the initial state and a_0 represents the initial action; V^π(s') represents the state value function of the state s' of the subsequent streaming media node; Q^*(s', a') represents the maximum action value function after action a' is performed in the state s' of the subsequent streaming media node. The above expressions for the optimal state value function V^*(s) and the optimal action value function Q^*(s, a) are called the Bellman optimality equations.

Based on Markov decision process theory, it is necessary to find a policy π that maximizes the state value function and the action value function for any initial state s_0.
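For reference, the textbook form of the Bellman optimality equations, written with the symbols p, r and γ defined above and the notation r(s'|s, a), is sketched below. Whether the application's own formulas match this form term by term is an assumption.

```latex
\begin{aligned}
V^{*}(s) &= \max_{a \in A} \sum_{s'} p(s' \mid s, a)\,
            \bigl[\, r(s' \mid s, a) + \gamma\, V^{*}(s') \,\bigr] \\
Q^{*}(s, a) &= \sum_{s'} p(s' \mid s, a)\,
            \bigl[\, r(s' \mid s, a) + \gamma \max_{a'} Q^{*}(s', a') \,\bigr]
\end{aligned}
```

The two functions are linked by V^*(s) = max_a Q^*(s, a), so a policy that picks the action with the largest Q^*(s, a) in every state attains V^*(s).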

In some embodiments, the condition function F comprises: taking the last streaming media node of the policy to be transmitted as the absorbing state; when the policy to be transmitted takes the action that enters the absorbing state, if the delay of the policy to be transmitted is smaller than a preset delay threshold and the code rate of the policy to be transmitted is smaller than a preset code rate threshold, the return value of the state transition is 0; otherwise, the return value is -1.

It should be noted that, when searching for a policy in this embodiment, two QoE index conditions, namely the delay threshold and the code rate threshold, are incorporated into the streaming media transmission policy model and used to determine the return value of the state transition when the policy to be transmitted enters the absorbing state. A return value of -1 indicates that the policy to be transmitted is an invalid policy, and a return value of 0 indicates that it is a valid policy. As a result, the optimal transmission policy that is found has low delay and can satisfy the user's real-time experience and viewing quality.

In a specific embodiment, for a domestic RTC transmission scenario, the delay threshold may be set to 90-100 ms and the code rate threshold to 1.8-2.2 Mbps; for an overseas RTC transmission scenario, the delay threshold may be set to 200-300 ms and the code rate threshold to 1.8-2.2 Mbps. The specific values of the delay threshold and the code rate threshold may be set according to actual conditions.
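The condition function F described above can be sketched as a small function. The name `condition_reward`, the parameter names and the default thresholds are hypothetical; the defaults are picked from within the domestic ranges quoted above, and the code rate unit is assumed to be Mbps.

```python
def condition_reward(delay_ms, bitrate_mbps,
                     delay_threshold_ms=100.0, bitrate_threshold_mbps=2.0):
    """Return value F for the action that enters the absorbing (last) node.

    0  -> both QoE conditions hold, so the candidate path is a valid policy
    -1 -> at least one condition fails, so the candidate path is invalid
    Threshold defaults are illustrative assumptions, not prescribed values.
    """
    meets_delay = delay_ms < delay_threshold_ms
    meets_bitrate = bitrate_mbps < bitrate_threshold_mbps
    return 0 if (meets_delay and meets_bitrate) else -1
```

For an overseas scenario the same function could be called with `delay_threshold_ms` set somewhere in the 200-300 ms range.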

S102, based on the streaming media transmission strategy model, solving an optimal transmission strategy of the target streaming media in the target RTC transmission scene through strategy iteration, and transmitting the target streaming media based on the optimal transmission strategy; the target streaming media can be an audio-video resource for live viewing by a user.

In some embodiments, referring to fig. 4, step S102 may include:

S401, for the multiple policy choices of the initial node in the target RTC transmission scenario, establishing a policy estimation formula based on the Bellman equation;

S402, performing policy estimation on all states of the current policy according to the policy estimation formula, so as to update the state value function of the current policy;

S403, according to the policy improvement principle, changing the action that the current policy takes in the state of the initial node so that the action value function of the current policy is larger than the corresponding state value function, and traversing the states of all streaming media nodes and all actions with a greedy algorithm to perform policy improvement and obtain a new policy;

S404, performing policy estimation and policy improvement on the new policy, and obtaining the maximum state value function and the corresponding optimal transmission policy through iterative computation.
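Steps S401 to S404 follow the shape of tabular policy iteration, which can be sketched as below. This is a generic illustration under stated assumptions, not the applicant's code: the graph `EXAMPLE_GRAPH`, its node names and its rewards (taken here as negative link delay) are hypothetical, transitions are deterministic for brevity, and the return value of the condition function F is left out.

```python
# P[state][action] -> list of (probability, next_state, reward).
# An empty dict marks the absorbing state (the last streaming media node).
EXAMPLE_GRAPH = {
    "src": {"b": [(1.0, "b", -10.0)], "a": [(1.0, "a", -30.0)]},
    "a": {"dst": [(1.0, "dst", -10.0)]},
    "b": {"dst": [(1.0, "dst", -50.0)]},
    "dst": {},
}


def evaluate_policy(P, policy, gamma=0.9, tol=1e-6):
    """Policy estimation (S402): sweep all states until the values stop changing."""
    V = {s: 0.0 for s in P}
    while True:
        delta = 0.0
        for s in P:
            if not P[s]:  # absorbing state keeps value 0
                continue
            v_new = sum(p * (r + gamma * V[s2]) for p, s2, r in P[s][policy[s]])
            delta = max(delta, abs(v_new - V[s]))
            V[s] = v_new  # overwrite the stored value in place
        if delta < tol:
            return V


def improve_policy(P, V, gamma=0.9):
    """Policy improvement (S403): greedily pick the action with the largest action value."""
    new_policy = {}
    for s in P:
        if not P[s]:
            continue
        q = {
            a: sum(p * (r + gamma * V[s2]) for p, s2, r in outcomes)
            for a, outcomes in P[s].items()
        }
        new_policy[s] = max(q, key=q.get)
    return new_policy


def policy_iteration(P, gamma=0.9):
    """Alternate estimation and improvement (S404) until the policy is stable."""
    policy = {s: next(iter(P[s])) for s in P if P[s]}  # arbitrary initial policy
    while True:
        V = evaluate_policy(P, policy, gamma)
        new_policy = improve_policy(P, V, gamma)
        if new_policy == policy:
            return policy, V
        policy = new_policy
```

Calling `policy_iteration(EXAMPLE_GRAPH)` starts from the route through node "b" and switches to the route through node "a", whose discounted return is higher in this toy graph.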

In some embodiments, the policy estimation formula is specified as follows:

The policy estimation formula means that, in the current node state s, if the next streaming media node is selected according to the current policy π, the action a corresponding to π(s) has multiple possibilities, each denoted π(a|s). In the formula, p(s'|s, π(s)) represents the probability of transitioning to state s' after the action a corresponding to the current policy π(s) is executed in the current node state s; r(s'|s, π(s)) represents the return function for transitioning to state s' after the action a corresponding to the current policy π(s) is executed in the current node state s; γ represents the discount factor.
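Using only the symbols defined above, the standard Bellman expectation equation for the state value function of a policy π can be written as below. It is offered as a sketch of what a policy estimation formula with these symbols usually looks like; agreement with the application's own formula is an assumption.

```latex
V^{\pi}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,
             \bigl[\, r(s' \mid s, a) + \gamma\, V^{\pi}(s') \,\bigr]
```

When the policy is deterministic, π(a|s) equals 1 for the single action a = π(s), and the outer sum reduces to one term.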

In this embodiment, an iterative method is used for policy estimation, and the (k + 1) th iteration may be represented as:

In the iterative process, every state is scanned once per iteration. In the (k+1)-th iteration, any larger V^π(s) that is obtained is assigned directly to V_{k+1}(s). Specifically, an array may be used to store the state value functions, and each time a new, larger state value is obtained, the old, smaller state value is overwritten. The array may take the form [V_{k+1}(s_1), V_{k+1}(s_2), V_{k+1}(s_3), ..., V_{k+1}(s_n)]. Through this iterative process, the policy estimation formula is driven continuously toward convergence.
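A common way to write the (k+1)-th sweep of iterative policy estimation, consistent with the symbols used in this section, is shown below as a sketch; the application's exact expression is not assumed.

```latex
V_{k+1}(s) = \sum_{a} \pi(a \mid s) \sum_{s'} p(s' \mid s, a)\,
             \bigl[\, r(s' \mid s, a) + \gamma\, V_{k}(s') \,\bigr]
```

With a single array that is overwritten in place, as described above, some of the V_k(s') on the right-hand side have already been replaced by their V_{k+1} values when a later state is processed.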

In some embodiments, step S404 includes: and when the change value of the current state value function obtained by calculation is smaller than a preset threshold value, determining that the current state value function is the maximum state value function and the strategy corresponding to the maximum state value function is the optimal transmission strategy, so that iteration can be exited and the operation amount is reduced. In this embodiment, the change value of the current state value function may refer to a change rate, and the threshold value may be set according to an actual situation, for example, 1% to 5%.
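The exit condition can be sketched as a relative-change test on the stored state values. The function name `has_converged`, the default threshold of 1% and the small `eps` guard against division by zero are illustrative assumptions.

```python
def has_converged(old_values, new_values, rel_threshold=0.01, eps=1e-12):
    """Return True when every state value changed by less than rel_threshold.

    The change is measured as a rate relative to the previous value;
    eps only avoids dividing by zero when a previous value is 0.
    """
    for old, new in zip(old_values, new_values):
        change_rate = abs(new - old) / max(abs(old), eps)
        if change_rate >= rel_threshold:
            return False
    return True
```

A looser threshold such as 0.05 ends the iteration earlier at the cost of a less precise value function.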

The implementation basis of the various embodiments of the present application is realized by a programmed process performed by a device having a processor function. Therefore, in engineering practice, the technical solutions and functions thereof of the embodiments of the present application can be packaged into various modules.

Based on this reality, on the basis of the foregoing embodiments, embodiments of the present application provide an apparatus for adaptive RTC streaming, where the apparatus is configured to execute the method for adaptive RTC streaming in the foregoing method embodiments. Referring to fig. 5, the apparatus for adaptive transmission of RTC streaming media includes:

the modeling module 501 is configured to acquire state parameters of all streaming media nodes in a target RTC transmission scene, and establish a streaming media transmission policy model based on a markov decision process theory and preset user experience quality conditions;

and the path planning module 502 is configured to solve an optimal transmission strategy of the target streaming media in the target RTC transmission scene through strategy iteration based on the streaming media transmission strategy model, and transmit the target streaming media based on the optimal transmission strategy.

For specific limitations of each module of the RTC streaming media adaptive transmission apparatus, reference may be made to the above limitations on the RTC streaming media adaptive transmission method, which are not repeated here. In addition, all or part of the modules in the RTC streaming media adaptive transmission apparatus may be implemented by software, hardware, or a combination thereof. The modules may be embedded in, or independent of, a processor in the computer device in hardware form, or may be stored in a memory of the computer device in software form, so that the processor can call them and execute the operations corresponding to each module.

Referring to fig. 6, the present embodiment further provides a computer device, which may be a computing device such as a mobile terminal, a desktop computer, a notebook, a palmtop computer, and a server. The computer device comprises a processor 601, a memory 602 and a display 603. FIG. 6 shows some of the components of a computer device, but it is understood that not all of the shown components are required to be implemented, and that more or fewer components may be implemented instead.

The memory 602 may, in some embodiments, be an internal storage unit of the computer device, such as a hard disk or internal memory of the computer device. In other embodiments, the memory 602 may be an external storage device of the computer device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash memory card (Flash Card) provided on the computer device. Further, the memory 602 may include both an internal storage unit and an external storage device of the computer device. The memory 602 is used for storing application software installed on the computer device and various data, such as the program code installed on the computer device. The memory 602 may also be used to temporarily store data that has been output or is to be output. In one embodiment, the memory 602 stores an RTC streaming media adaptive transmission program 604.

The processor 601 may, in some embodiments, be a Central Processing Unit (CPU), a microprocessor or another data processing chip, and is used to execute program code stored in the memory 602 or to process data, for example to execute the RTC streaming media adaptive transmission method.

The display 603 may, in some embodiments, be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an OLED (Organic Light-Emitting Diode) touch panel, or the like. The display 603 is used for displaying information of the computer device and for displaying a visualized user interface. The components 601 to 603 of the computer device communicate with each other via a system bus.

In one embodiment, when the processor 601 executes the RTC streaming media adaptive transmission program 604 in the memory 602, the following steps are implemented:

The present embodiment further provides a computer-readable storage medium, on which an RTC streaming media adaptive transmission program is stored, and when executed by a processor, the RTC streaming media adaptive transmission program implements the following steps:

It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing the relevant hardware; the program can be stored in a non-volatile computer-readable storage medium and, when executed, can include the processes of the method embodiments described above.

Any reference to memory, storage, database, or other medium used in the embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), Programmable ROM (PROM), Electrically Programmable ROM (EPROM), Electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchronous Link DRAM (SLDRAM), Rambus Direct RAM (RDRAM), Direct Rambus Dynamic RAM (DRDRAM), and Rambus Dynamic RAM (RDRAM).

It will be evident to those skilled in the art that the present application is not limited to the details of the foregoing illustrative embodiments, and that the present application may be embodied in other specific forms without departing from the spirit or essential attributes thereof. The present embodiments are therefore to be considered in all respects as illustrative and not restrictive, the scope of the application being indicated by the appended claims rather than by the foregoing description, and all changes which come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims

1. An adaptive transmission method for RTC streaming media, comprising:

2. The RTC streaming media adaptive transmission method according to claim 1, wherein the acquiring the state parameters of all streaming media nodes in the target RTC transmission scenario includes:

3. The RTC streaming media adaptive transmission method according to claim 1, wherein the acquiring the state parameters of all streaming media nodes in the target RTC transmission scene and establishing a streaming media transmission policy model based on a Markov decision process theory and a preset user experience quality condition comprises:

4. The RTC streaming adaptive transmission method according to claim 3, characterized in that the conditional function F comprises:

5. The RTC streaming media adaptive transmission method according to claim 2, wherein the step of solving the optimal transmission strategy in the target RTC transmission scenario through strategy iteration based on the streaming media transmission strategy model comprises:

changing an action for the state of the initial node by the current strategy according to a strategy improvement principle, so that an action value function of the current strategy is larger than a corresponding state value function, and traversing the states and all actions of all streaming media nodes by a greedy algorithm to carry out strategy improvement so as to obtain a new strategy;

6. The RTC streaming adaptive transmission method according to claim 5, characterized in that the policy estimation formula comprises:

p(s'|s, π(s)) represents the probability of transitioning to state s' after the action a corresponding to the current policy π(s) is executed in the current node state s; r(s'|s, π(s)) represents the return function for transitioning to state s' after the action a corresponding to the current policy π(s) is executed in the current node state s; γ represents the discount factor; the action a corresponding to π(s) has multiple possibilities, each denoted π(a|s).

7. The RTC streaming media adaptive transmission method according to claim 5, wherein the performing policy estimation and policy modification on the new policy through iterative computation to obtain a maximum state value function and a corresponding optimal transmission policy comprises:

8. An apparatus for adaptive transmission of RTC streaming media, comprising:

9. A computer device, comprising:

the memory stores program instructions executable by the processor, the processor calling the program instructions to perform the RTC streaming adaptive transmission method of any of claims 1 to 7.

10. A computer readable storage medium storing computer instructions which, when run on a computer device, cause the computer device to perform the RTC streaming adaptive transmission method of any one of claims 1 to 7.