CN116527613A - Audio parameter adjustment method, device, apparatus and storage medium - Google Patents


Info

Publication number
CN116527613A
CN116527613A
Authority
CN
China
Prior art keywords
audio
length
destination
callback
parameter
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202310363189.XA
Other languages
Chinese (zh)
Inventor
瞿伟
任晋奎
张献涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba China Co Ltd
Original Assignee
Alibaba China Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba China Co Ltd filed Critical Alibaba China Co Ltd
Priority to CN202310363189.XA
Publication of CN116527613A
Legal status: Pending

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 49/00: Packet switching elements
    • H04L 49/90: Buffering arrangements
    • H04L 49/9005: Buffering arrangements using dynamic buffer space allocation
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/16: Sound input; Sound output
    • G06F 3/165: Management of the audio stream, e.g. setting of volume, audio stream path
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 1/00: Arrangements for detecting or preventing errors in the information received
    • H04L 1/0001: Systems modifying transmission characteristics according to link quality, e.g. power backoff
    • H04L 1/0006: Systems modifying transmission characteristics according to link quality by adapting the transmission format
    • H04L 1/0007: Systems modifying transmission characteristics according to link quality by adapting the transmission format by modifying the frame length
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00: Arrangements for monitoring or testing data switching networks
    • H04L 43/08: Monitoring or testing based on specific metrics, e.g. QoS, energy consumption or environmental parameters
    • H04L 43/0852: Delays
    • H04L 43/087: Jitter
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 43/00: Arrangements for monitoring or testing data switching networks
    • H04L 43/16: Threshold monitoring
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 67/00: Network arrangements or protocols for supporting network services or applications
    • H04L 67/01: Protocols
    • H04L 67/08: Protocols specially adapted for terminal emulation, e.g. Telnet

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Environmental & Geological Engineering (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The embodiment of the application provides an audio parameter adjustment method, apparatus, device, and storage medium. In the embodiment, the audio quality parameter of the destination is determined from the actual delay duration of audio frames travelling from the source to the destination within a set time period, together with the callback time interval at which the destination calls back audio frames from the audio buffer within that period, thereby quantizing the audio quality of the destination. The audio stream parameters of the destination are then adjusted according to this audio quality parameter, so that the adjustment is automatic, which helps improve the efficiency of audio parameter adjustment. Moreover, because the audio stream parameters are adjusted on the basis of the measured audio quality parameter, the audio quality of the destination is also safeguarded to a certain extent.

Description

Audio parameter adjustment method, device, apparatus and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a method, an apparatus, a device, and a storage medium for adjusting audio parameters.
Background
With the development of cloud computing, it has become a mainstream trend to move program modules that originally resided on terminal devices to the server side, thereby lightening the terminal devices. Products such as cloud desktops and cloud applications have been derived on this basis.
Audio in a cloud desktop or cloud application is transmitted over the network, so its quality is strongly affected by network fluctuation and by software and hardware conditions of the client such as central processing unit (CPU) scheduling. The audio stream parameters also affect audio quality to some extent. In existing schemes the audio stream parameters are adjusted manually by an administrator or user; there is no guarantee that an adjustment improves rather than degrades audio quality, so multiple rounds of adjustment are usually needed and the adjustment efficiency is low.
Disclosure of Invention
Various aspects of the present application provide an audio parameter adjustment method, apparatus, device, and storage medium, which are used to adjust audio stream parameters automatically and help improve the efficiency of audio stream parameter adjustment.
An embodiment of the application provides an audio parameter adjustment method, which comprises the following steps:
acquiring the actual delay duration of audio frames from a source to a destination within a set time period;
acquiring the callback time interval at which the destination calls back audio frames from the audio buffer within the set time period;
determining an audio quality parameter of the destination according to the actual delay duration and the callback time interval; and
adjusting the audio stream parameters of the destination according to the audio quality parameter.
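The four steps above might be sketched as follows. This is an illustration only: the function names, the choice of jitter statistics, the way the two jitter terms combine, and the threshold-based adjustment rule are all assumptions, not the patent's prescribed implementation.

```python
import statistics

def audio_quality_parameter(actual_delays_ms, theoretical_delay_ms,
                            callback_intervals_ms, frame_duration_ms):
    """Quantify destination audio quality from transmission and scheduling jitter.

    Transmission jitter: spread of (actual delay - theoretical delay).
    Scheduling jitter: spread of (callback interval - frame duration).
    """
    transmission_jitter = statistics.pstdev(
        d - theoretical_delay_ms for d in actual_delays_ms)
    scheduling_jitter = statistics.pstdev(
        i - frame_duration_ms for i in callback_intervals_ms)
    # Illustrative combination only; the text does not fix how the two combine.
    return transmission_jitter + scheduling_jitter

def adjust_stream_parameters(buffer_frames, quality_parameter, threshold=10.0):
    """Hypothetical adjustment rule: enlarge the audio buffer when jitter is high."""
    if quality_parameter > threshold:
        return buffer_frames + 1
    return max(1, buffer_frames - 1)
```

With perfectly steady delays and callback intervals the quality parameter is zero and the buffer can safely shrink; rising jitter pushes it above the threshold and the buffer grows.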
An embodiment of the application further provides an audio parameter adjustment method, which comprises the following steps:
acquiring the actual delay duration of audio frames from a server to a client within a set time period, wherein the server and the client are a cloud desktop server and a cloud desktop client, or a cloud application server and a cloud desktop client;
acquiring the callback time interval at which the client calls back audio frames from the audio buffer within the set time period;
determining an audio quality parameter of the client according to the actual delay duration and the callback time interval; and
adjusting the audio stream parameters of the client according to the audio quality parameter.
An embodiment of the application further provides an audio parameter adjustment apparatus, which comprises:
an acquisition module for acquiring the actual delay duration of audio frames from a source to a destination within a set time period, and for acquiring the callback time interval at which the destination calls back audio frames from the audio buffer within the set time period;
a determining module for determining an audio quality parameter of the destination according to the actual delay duration and the callback time interval; and
an adjusting module for adjusting the audio stream parameters of the destination according to the audio quality parameter.
An embodiment of the application further provides an audio parameter adjustment apparatus, which comprises:
an acquisition module for acquiring the actual delay duration of audio frames from a server to a client within a set time period, and for acquiring the callback time interval at which the client calls back audio frames from the audio buffer within the set time period, wherein the server and the client are a cloud desktop server and a cloud desktop client, or a cloud application server and a cloud desktop client;
a determining module for determining an audio quality parameter of the client according to the actual delay duration and the callback time interval; and
an adjusting module for adjusting the audio stream parameters of the client according to the audio quality parameter.
An embodiment of the application further provides a computing device, comprising a memory and a processor, wherein the memory is used for storing a computer program;
the processor is coupled to the memory and executes the computer program so as to perform the steps of the audio parameter adjustment methods described above.
An embodiment of the application further provides a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the audio parameter adjustment methods described above.
In the embodiment of the application, the audio quality parameter of the destination is determined from the actual delay duration of audio frames travelling from the source to the destination within a set time period, together with the callback time interval at which the destination calls back audio frames from the audio buffer within that period, thereby quantizing the audio quality of the destination. The audio stream parameters of the destination are then adjusted according to this audio quality parameter, so that the adjustment is automatic, which helps improve the efficiency of audio parameter adjustment. Moreover, because the audio stream parameters are adjusted on the basis of the measured audio quality parameter, the audio quality of the destination is also safeguarded to a certain extent.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and, together with the description, serve to explain the application; they do not constitute an undue limitation of the application. In the drawings:
FIGS. 1 and 2 are schematic diagrams illustrating the structure of a data processing system according to an embodiment of the present application;
fig. 3a and fig. 3b are schematic flow diagrams of an audio parameter adjustment method according to an embodiment of the present application;
FIG. 4 is a schematic diagram of a computing device according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of an audio parameter adjusting apparatus according to an embodiment of the present application.
Detailed Description
For the purposes, technical solutions and advantages of the present application, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.
In existing schemes, audio stream parameters are adjusted manually by an administrator or user; there is no guarantee that an adjustment improves rather than degrades audio quality, so multiple rounds of adjustment are usually needed and the efficiency is low. To solve this technical problem, in some embodiments of the present application the audio quality parameter of the destination is determined from the actual delay duration of audio frames from the source to the destination within a set time period and the callback time interval at which the destination calls back audio frames from the audio buffer within that period, thereby quantizing the audio quality of the destination. The audio stream parameters of the destination are then adjusted according to this audio quality parameter, so that the adjustment is automatic, which helps improve the efficiency of audio parameter adjustment. Moreover, because the audio stream parameters are adjusted on the basis of the measured audio quality parameter, the audio quality of the destination is also safeguarded to a certain extent.
The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.
It should be noted that: like reference numerals denote like objects in the following figures and embodiments, and thus once an object is defined in one figure or embodiment, further discussion thereof is not necessary in the subsequent figures and embodiments.
Fig. 1 and fig. 2 are schematic structural diagrams of a data processing system according to an embodiment of the present application. As shown in fig. 1 and fig. 2, the data processing system may include a source 10 and a destination 20. The source is the device that provides audio; the destination is the device that receives the audio and plays it.
The implementations of the source 10 and the destination 20 vary with the application scenario. In a cloud desktop or cloud application scenario, the source 10 may be the server of a cloud desktop or cloud application, and the destination 20 may be a client of the cloud desktop or cloud application. In an audio-video call scenario, the source and destination may be implemented as the terminals participating in the call, with the roles changing as the sound source changes: during a call, the terminal of the current speaker may be the source, and the terminal of the listener may be the destination.
In this embodiment, in a cloud desktop or cloud application scenario, applications running in a cloud environment are collectively referred to as cloud-side applications. A cloud-side application is an application deployed in a cloud environment; it may be a cloud application (Cloud Application) or an application running in a cloud desktop. Cloud applications embody cloud computing technology at the application layer: they convert the traditional "install locally, run locally" mode of software use into an "use on demand" service. Without installing the corresponding software, a client can connect to and control a remote service cluster through the Internet or a local area network and complete its business logic or computing tasks.
A cloud desktop is based on virtualization of computer hardware resources: a physical machine is virtualized into multiple virtual computers, on which an unmodified desktop operating system can run directly. The cloud desktop also provides an operating system supporting para-virtualization technology, so that virtualized applications can run directly on the desktop operating system. An application running in a cloud desktop is an application running in that desktop operating system. It may be an application with a user interface (UI), such as an instant messaging, online shopping, social, or mailbox application; of course, it may also be an application without a UI, such as a background script or a system service running in the cloud desktop.
In the embodiment of the application, the client of the cloud desktop or cloud application is an electronic device used by a user that has communication and related capabilities; it may be, for example, a mobile phone, a tablet computer, a personal computer, a wearable device, or an Internet of Things (IoT) device. The client may also be a thin client such as a set-top box. A client typically includes at least one processing unit and at least one memory, and may further include basic components such as a network card chip, an Input/Output (IO) bus, audio/video components, and a display component.
In the embodiment of the application, the server of a cloud-side application is a computer device capable of providing cloud-side application services, and generally has the capability of bearing and guaranteeing those services. The server can respond to service requests from clients and provide the user with services related to the cloud-side application. It may be a single server device, a cloud server array, or a virtual machine (VM) running in a cloud server array; it may also be another computing device with the corresponding service capability, for example a terminal device such as a computer running a service program.
For a cloud desktop system, the server deploys the cloud desktop application program and the other application programs running within the cloud desktop. The number of applications running within the cloud desktop may be one or more, where "more" means two or above; multiple applications may be deployed on the same physical machine or on different physical machines. For a cloud application system, the application program of the cloud application is deployed on the server, and there may likewise be one or more such programs. The embodiment of the present application does not limit the implementation form of the cloud application: it may be a web page, client software, an application (APP), an applet, or the like.
In a cloud desktop or cloud application scenario, the server may store the audio data of the cloud-side application and send it to a client in response to the client's request. In an audio-video call scenario, the source collects the audio data of its user through a pickup (microphone) and sends it to the other terminals in the call, i.e., the destinations; the number of destinations may be one or more, where "more" means two or above. Fig. 1 and fig. 2 illustrate, without limitation, a data processing system implemented as a cloud desktop or cloud application system.
For the destination 20 that receives the audio data, network jitter may prevent audio frames from arriving at the speaker 202 of the destination 20 on time, causing audio stuttering or even interruption. To alleviate such stuttering caused by network jitter and the like, the destination 20 may set up an audio buffer 201. The audio buffer 201 may be located in the memory of the destination 20 and stores the audio frames sent by the source 10; it may be a first-in first-out (FIFO) audio buffer queue. The amount of audio frame data stored in the audio buffer 201 is at most equal to, and in general equal to, the capacity of the audio buffer 201.
Based on the audio buffer 201, the destination 20 may obtain the audio stream from the source 10, store it in the audio buffer 201, and then deliver it uniformly to the speaker 202 of the destination 20 for playing by means of a callback function. For example, in a cloud desktop or cloud application scenario, the server may send the currently requested audio data to the client as an audio stream in response to the client's request; the client stores the received audio frames in the audio buffer 201 and then delivers them uniformly to the client's speaker 202 for playing by means of the callback function.
Specifically, the audio thread in the destination 20 obtains the audio frames to be played through the callback function: the audio thread continuously hands an empty buffer queue (Buffer) to the callback function, the callback function fills the buffer queue with audio data, and the filled buffer queue is then handed to the playing system of the destination 20 for playing.
Based on the audio buffer 201 described above, as shown in fig. 1, the destination 20 may obtain the audio frames sent by the source and store them in the audio buffer 201. Further, the audio thread of the destination 20 obtains the audio frames to be played from the audio buffer through the callback function and delivers them to the speaker 202 of the destination for playing. This is the local scheduling process of the destination.
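The local scheduling process just described can be sketched minimally as below, assuming a FIFO queue as the audio buffer and using a plain method call in place of the real callback mechanism; an actual implementation would run the callback on a dedicated audio thread and hand filled buffers to the playing system.

```python
from collections import deque

class AudioBuffer:
    """FIFO audio buffer holding frames received from the source."""

    def __init__(self, capacity):
        self.frames = deque(maxlen=capacity)

    def push(self, frame):
        # Called when a frame arrives from the network.
        self.frames.append(frame)

    def callback(self):
        """Fill the empty buffer handed in by the audio thread, in FIFO order."""
        return self.frames.popleft() if self.frames else None  # None = underrun

buf = AudioBuffer(capacity=8)
for i in range(3):
    buf.push(f"frame-{i}")
played = [buf.callback() for _ in range(4)]
# Frames come back in FIFO order; once drained, the callback signals an underrun.
```

The underrun case (returning `None`) is exactly the stuttering situation the buffer is meant to absorb: the callback fires but no frame has arrived in time.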
In the embodiment of the application, the quality of the audio stream mainly reflects the fluency with which the destination plays the audio. The inventors of the present application found through research that the audio stream quality at the destination 20 is mainly affected by network jitter and by the audio scheduling jitter local to the destination. Network jitter, also referred to as audio transmission jitter, describes the degree of variation of the delay of audio from the source to the destination. The audio scheduling jitter local to the destination describes the degree of variation of the delay with which the destination schedules audio frames from the audio buffer to the speaker for playing.
Because audio transmission jitter and the destination's audio scheduling jitter are the main factors affecting the audio stream quality at the destination, the audio stream parameters of the destination can be adjusted according to the audio transmission jitter parameter and the audio scheduling jitter parameter. An audio stream parameter is a parameter that affects the audio quality, specifically the audio playing quality, of the destination; in this embodiment it may be implemented as a parameter affecting the destination's audio transmission jitter and audio scheduling jitter. The audio stream quality can thus be ensured by adjusting the audio stream parameters of the destination.
Based on the above analysis, since audio transmission jitter describes the variation of the delay of audio from the source to the destination, the destination 20 may, as shown in fig. 2, acquire the actual delay duration of audio frames from the source 10 to the destination 20 within a set time period in order to determine the audio quality of the destination. Likewise, since audio scheduling jitter describes the variation of the delay with which the destination schedules audio from the audio buffer to the speaker for playing, the destination 20 may also obtain the callback time interval at which it calls back audio frames from the audio buffer 201 within the set time period. The callback time interval is the time interval between two adjacent audio frames called back by the destination.
The embodiment of the present application does not limit the specific duration of the set time period. It may be a set parameter adjustment period on the order of seconds (e.g., 1, 2, 5, or 30 seconds), minutes (e.g., 5, 10, or 30 minutes), or hours (e.g., 1, 2, or 6 hours), but is not limited thereto.
Further, the audio quality parameter of the destination may be determined according to the actual delay time of the audio frame from the source 10 to the destination 20 in the set period of time and the callback time interval of the destination callback audio frame from the audio buffer 201 in the set period of time.
In the embodiment of the present application, the specific implementation manner of determining the audio quality parameter of the destination terminal according to the actual delay duration and the callback time interval is not limited.
In some embodiments, the audio transmission jitter parameter may be determined based on the actual delay duration of audio frames from the source 10 to the destination 20 within the set time period. Ideally, the actual delay of an audio frame from the source 10 to the destination 20 is fixed and equal to a preset theoretical delay duration, which in turn equals the playing duration of the audio frame at a set playing speed. The set playing speed may be the original audio speed, i.e., a default playing speed that is typically an empirical value. The degree to which the actual delay duration fluctuates around the theoretical delay duration reflects the audio transmission jitter. Accordingly, the audio transmission jitter parameter may be determined from the actual delay durations of audio frames from the source 10 to the destination 20 within the set time period and the preset theoretical delay duration.
Alternatively, the difference between the actual delay duration of each audio frame from the source 10 to the destination 20 within the set time period and the preset theoretical delay duration may be calculated, and the audio transmission jitter parameter determined from these differences.
Specifically, the variance or standard deviation of the differences between the actual and theoretical delay durations may be calculated as the audio transmission jitter parameter; alternatively, the mean of these differences within the set time period may be used as the audio transmission jitter parameter.
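The three statistics mentioned (variance, standard deviation, or mean of the delay differences) could each be computed as below. This is an illustrative sketch with made-up delay values in milliseconds, not data from the patent.

```python
import statistics

actual_delays = [22.0, 25.0, 19.0, 24.0]  # measured source-to-destination delays (ms)
theoretical_delay = 20.0                   # preset theoretical delay (ms)

# Differences between each actual delay and the theoretical delay.
diffs = [d - theoretical_delay for d in actual_delays]

jitter_variance = statistics.pvariance(diffs)  # variance of the differences
jitter_stddev = statistics.pstdev(diffs)       # standard deviation of the differences
jitter_mean = statistics.fmean(diffs)          # mean difference, the simpler alternative
```

Any one of the three values can serve as the audio transmission jitter parameter; the variance and standard deviation emphasize spread, while the mean emphasizes systematic lateness.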
In other embodiments, since the actual delay of an audio frame from the source 10 to the destination 20 is ideally constant, the degree of variation among the actual delay durations themselves can also reflect the audio transmission jitter. Accordingly, the mean of the actual delay durations of audio frames from the source 10 to the destination 20 within the set time period may be calculated, and the mean squared deviation of the actual delay durations from that mean used as the audio transmission jitter parameter.
The embodiments above illustrate, without limitation, how the audio transmission jitter parameter may be determined from the actual delay duration of audio frames from the source 10 to the destination 20 within the set time period.
Since the audio scheduling jitter parameter of the destination is also one of the main parameters of the audio quality parameter of the destination, the audio scheduling jitter parameter of the destination can be determined according to the callback time interval of the destination for callback audio frames from the audio buffer 201 in the set time period.
In particular, in an ideal case, the callback time interval of the audio frame is equal to the duration of the audio frame. The duration of the audio frame may be a play duration of the audio frame at a set play speed. In the actual process, due to the change of the performance of the processor of the destination end, the callback time interval of the audio frame may change, and the degree of change of the callback time interval of the audio frame may reflect the audio scheduling jitter of the destination end. Based on the difference value between the callback time interval of the destination terminal callback audio frame from the audio buffer area and the preset duration of the audio frame in the set time period can be calculated; and determining an audio scheduling jitter parameter according to the difference between the callback time interval of the destination terminal callback audio frame from the audio buffer and the preset duration of the audio frame in the set time period.
Specifically, according to the difference between the callback time interval of the destination end callback audio frame from the audio buffer and the preset duration of the audio frame in the set time period, the variance or standard deviation between the callback time interval and the duration of the audio frame can be calculated and used as the audio scheduling jitter parameter. Or, a mean value of a difference value between a callback time interval of the destination terminal callback audio frame from the audio buffer and a preset duration of the audio frame in a set time period can be calculated and used as an audio scheduling jitter parameter.
In other embodiments, ideally, the callback time interval at which the destination terminal calls back audio frames from the audio buffer is fixed, and the degree of variation among the actual callback time intervals can reflect the jitter condition of the audio scheduling. Based on this, the mean value of the callback time intervals at which the destination terminal calls back audio frames from the audio buffer in the set time period can be calculated; and the mean squared deviation of the callback time intervals from this mean can be calculated as the audio scheduling jitter parameter.
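The scheduling-jitter calculations described above can be sketched in code. The following Python snippet is illustrative only; the function names are assumptions of this description, not part of the present application. It computes the variance-style, mean-difference-style, and mean-squared-deviation-style estimates from a window of callback intervals:

```python
from statistics import mean

def scheduling_jitter_variance(intervals_ms, frame_duration_ms):
    # mean squared difference between each callback interval and the preset frame duration
    return mean((t - frame_duration_ms) ** 2 for t in intervals_ms)

def scheduling_jitter_mean_diff(intervals_ms, frame_duration_ms):
    # mean absolute difference between each callback interval and the preset frame duration
    return mean(abs(t - frame_duration_ms) for t in intervals_ms)

def scheduling_jitter_mse(intervals_ms):
    # mean squared deviation of the callback intervals from their own mean
    m = mean(intervals_ms)
    return mean((t - m) ** 2 for t in intervals_ms)
```

With perfectly regular 20 ms callbacks all three estimates are zero; any deviation of an interval from the preset frame duration (or from the window mean) raises the estimate.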
The method for calculating the audio scheduling jitter parameter shown in the above embodiment is only exemplary and not limited thereto.
According to the manners shown in the above embodiments, the audio quality parameters of the destination terminal can be determined. Because the audio stream parameters of the destination terminal affect the audio quality, the audio stream parameters can be adjusted with the audio quality parameters as a feedback condition. Based on this, the destination terminal 20 can adjust its audio stream parameters according to its audio quality parameters, so as to realize automatic adjustment of the audio stream parameters of the destination terminal, which helps to improve the audio parameter adjustment efficiency. On the other hand, adjusting the audio stream parameters of the destination terminal based on the audio quality parameters helps to guarantee the audio quality of the destination terminal.
In the embodiment of the present application, the specific implementation content of the audio stream parameters is not limited. In some embodiments, the following can be learned from the playing process of the audio at the destination terminal: the larger the capacity of the audio buffer 201, the more audio frames it can store, the smaller the time jitter between different audio frames obtained by the destination terminal from the audio buffer 201 through the callback function, and the smoother the audio playing perceived by the user of the destination terminal. Therefore, the amount of audio frame data stored in the audio buffer 201 has a certain influence on the audio quality of the destination terminal. However, the more audio frames the audio buffer 201 buffers (i.e., the greater the amount of audio frame data), the greater the delay before the destination terminal 20 first plays the audio. For example, in a cloud desktop or cloud application scenario, the more audio frames buffered in the audio buffer of a client, the less the client is affected by network jitter when it calls back audio frames from the audio buffer 201, and the higher the smoothness of audio playing perceived by the user; at the same time, the more audio frames the audio buffer 201 buffers, the greater the delay before the client plays the first frame of the requested audio data.
Based on the above analysis, the capacity of the audio buffer 201 has a certain influence on the audio playing quality. Therefore, in order to improve the audio playing quality of the destination terminal, the length of the audio buffer can be used as an audio stream parameter of the destination terminal. The length of the audio buffer determines the number of audio frames that the audio buffer can store. In the embodiment of the present application, at a set play speed, the amount of audio data stored in the audio buffer corresponds one-to-one to the play duration of the audio. Based on this, the length of the audio buffer may be represented as a time length (duration), and the capacity of the audio buffer is equal to the length (duration) of the audio buffer multiplied by the set play speed.
At the set play speed, the data amount of an audio frame corresponds to the duration of the audio frame. The longer the duration of the audio frames of the destination terminal, the less the playing quality of the audio frames is affected by network jitter. Based on this, the length of the audio frames called back by the destination terminal can also be used as an audio stream parameter of the destination terminal. The length of an audio frame may likewise be represented as a duration, and the data amount of an audio frame is equal to the duration of the audio frame multiplied by the set play speed.
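The correspondence between frame duration and frame data amount at a set play speed can be checked with simple arithmetic. The sketch below assumes an illustrative 48 kHz, 16-bit, stereo stream; these figures are examples of this description, not values fixed by the present application:

```python
def frame_bytes(duration_ms, sample_rate=48000, channels=2, bytes_per_sample=2):
    """Data amount of an audio frame = duration x byte rate at a set play speed.
    The 48 kHz / stereo / 16-bit defaults are illustrative assumptions."""
    byte_rate = sample_rate * channels * bytes_per_sample  # bytes per second
    return int(duration_ms / 1000 * byte_rate)
```

Under these assumed parameters the byte rate is 192 000 bytes/s, so a 20 ms frame carries 3 840 bytes, and the buffer capacity scales in the same way with its length expressed as a duration.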
For the embodiment in which the audio stream parameters of the destination terminal include the length of the audio buffer of the destination terminal and the duration of the audio frame, the destination terminal 20 adjusting the audio stream parameters according to the audio quality parameters may be implemented as: adjusting the length of the audio buffer according to the audio transmission jitter parameter determined by the above embodiments; and adjusting the length of the audio frames called back by the destination terminal according to the audio scheduling jitter parameter determined by the above embodiments.
In the embodiment of the present application, the specific implementation of adjusting the length of the audio buffer according to the audio transmission jitter parameter, and of adjusting the length of the audio frames called back by the destination terminal according to the audio scheduling jitter parameter, is not limited. The following is an exemplary description in connection with several embodiments.
In some embodiments, an audio transmission jitter threshold may be preset. The audio transmission jitter threshold is an empirical value obtained from a priori knowledge, and serves as a threshold for judging audio transmission stability. If the audio transmission jitter parameter in the set time period is smaller than or equal to the audio transmission jitter threshold, it is determined that the audio transmission is stable in the set time period, and the length of the audio buffer can be appropriately reduced, thereby reducing the delay of playing audio at the destination terminal. If the audio transmission jitter parameter in the set time period is larger than the audio transmission jitter threshold, it is determined that the audio transmission jitter is large and the audio transmission is unstable in the set time period, and the length of the audio buffer needs to be increased, thereby reducing the influence of the audio transmission jitter.
Based on this, if the audio transmission jitter parameter in the set time period is smaller than or equal to the set audio transmission jitter threshold, the length of the audio buffer can be reduced. Optionally, the length of the audio buffer may be reduced by a set first length on the basis of the original length. The set first length can be flexibly set according to the actual adjustment requirements.
Accordingly, if the audio transmission jitter parameter in the set time period is larger than the set audio transmission jitter threshold, the length of the audio buffer can be increased.
In some embodiments, the length of the audio buffer may be increased by a set second length on the basis of the original length. The set second length can be flexibly set according to the actual adjustment requirements, and may be the same as or different from the first length. Optionally, the first length is less than the second length.
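The threshold-based shrink/grow rule described above can be sketched as follows. The function name, step sizes, and lower bound are illustrative assumptions of this description, not limitations of the present application:

```python
def adjust_buffer_length(current_ms, jitter_param, jitter_threshold,
                         shrink_ms=10, grow_ms=20, min_ms=20):
    """Shrink the buffer when transmission is stable, grow it when unstable.
    shrink_ms is the 'first length', grow_ms the 'second length' (assumed values)."""
    if jitter_param <= jitter_threshold:
        # transmission is stable: shrink to cut first-play delay (never below a floor)
        return max(min_ms, current_ms - shrink_ms)
    # transmission is unstable: grow to absorb jitter
    return current_ms + grow_ms
```

Note the asymmetric step sizes (shrink slowly, grow quickly), matching the optional condition that the first length is less than the second length.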
In other embodiments, the number of audio frames transmitted from the source terminal to the destination terminal in the set time period may be obtained; the increment length of the audio buffer is determined according to the audio transmission jitter parameter in the set time period and the number of audio frames transmitted from the source terminal to the destination terminal in the set time period; and the length of the audio buffer is increased by the increment length on the basis of the original length to obtain the enlarged length of the audio buffer.
For example, according to the audio transmission jitter parameter in the set time period and the number of audio frames transmitted from the source terminal to the destination terminal in the set time period, the audio transmission jitter mean of the audio frames in the set time period can be determined as the increment length of the audio buffer; and the length of the audio buffer is increased by the audio transmission jitter mean on the basis of the original length to obtain the adjusted length of the audio buffer.
For the embodiment in which the variance between the actual delay duration and the theoretical delay duration of the audio frames from the source terminal to the destination terminal in the set time period is used as the audio transmission jitter parameter, the audio transmission jitter mean of the audio frames in the set time period, i.e., the increment length of the audio buffer, can be expressed as sqrt(N/S). Accordingly, the enlarged length of the audio buffer can be expressed as:
b1 = b0 + sqrt(N/S)   (1)
In formula (1), b1 represents the enlarged length of the audio buffer; N represents the audio transmission jitter parameter in the set time period, namely the variance between the actual delay duration and the theoretical delay duration of the audio frames from the source terminal to the destination terminal in the set time period; S represents the number of audio frames transmitted from the source terminal to the destination terminal in the set time period; and b0 represents the original length of the audio buffer.
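A minimal sketch of formula (1) follows, under the assumption that the increment length is sqrt(N/S); the printed formula is not fully legible in this text, so this reconstruction (and the function name) is an assumption of this description:

```python
from math import sqrt

def enlarged_buffer_length(b0, variance_n, frame_count_s):
    """b1 = b0 + sqrt(N / S): grow the buffer by the per-frame transmission-jitter
    mean derived from the window variance N over S transmitted frames (assumed form)."""
    return b0 + sqrt(variance_n / frame_count_s)
```

For instance, a window with variance N = 16 over S = 4 frames would grow a 100 ms buffer by 2 ms under this reading.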
In other embodiments, an audio scheduling jitter threshold of the destination terminal may be preset. The audio scheduling jitter threshold is an empirical value obtained from a priori knowledge, and serves as a threshold for judging audio scheduling stability. If the audio scheduling jitter parameter in the set time period is smaller than or equal to the audio scheduling jitter threshold, it is determined that the audio scheduling is stable in the set time period, and the length of the audio frames called back by the destination terminal can be appropriately reduced, thereby reducing the delay of playing audio at the destination terminal. If the audio scheduling jitter parameter in the set time period is larger than the audio scheduling jitter threshold, it indicates that the audio frames called back locally by the destination terminal are too short, so that the local processing speed cannot keep up with the callback speed of the audio frames; in this case, the length of the locally called-back audio frames can be appropriately increased to improve audio playing stability.
Based on the above analysis, if the audio scheduling jitter parameter in the set time period is smaller than or equal to the set audio scheduling jitter threshold, the length of the audio frames called back by the destination terminal can be reduced. Optionally, the length of the audio frames called back by the destination terminal may be reduced by a set third length on the basis of the original length. The set third length can be flexibly set according to the actual adjustment requirements.
Correspondingly, if the audio scheduling jitter parameter in the set time period is larger than the set audio scheduling jitter threshold, the length of the audio frames called back by the destination terminal can be increased.
In some embodiments, the length of the audio frames called back by the destination terminal may be increased by a set fourth length on the basis of the original length. The set fourth length can be flexibly set according to the actual adjustment requirements, and may be the same as or different from the third length. Optionally, the third length is less than the fourth length.
In other embodiments, the number of audio frames called back by the destination terminal in the set time period may be obtained; the increment length of the audio frames called back by the destination terminal is determined according to the audio scheduling jitter parameter in the set time period and the number of audio frames called back by the destination terminal in the set time period; and the length of the audio frames called back by the destination terminal is increased by the increment length on the basis of the original length to obtain the enlarged length of the audio frames called back by the destination terminal.
For example, according to the audio scheduling jitter parameter in the set time period and the number of audio frames called back by the destination terminal in the set time period, the audio scheduling jitter mean of the audio frames in the set time period can be determined as the increment length of the audio frames called back by the destination terminal; and the length of the audio frames called back by the destination terminal is increased by the audio scheduling jitter mean on the basis of the original length to obtain the enlarged length of the audio frames called back by the destination terminal.
For the embodiment in which the variance between the callback time interval of the destination terminal and the duration of the audio frame in the set time period is used as the audio scheduling jitter parameter, the audio scheduling jitter mean of the audio frames in the set time period can be expressed as sqrt(L/R). Accordingly, the enlarged length of the audio frames called back by the destination terminal can be expressed as:
l1 = l0 + sqrt(L/R)   (2)
In formula (2), l1 represents the enlarged length of the audio frames called back by the destination terminal; L represents the audio scheduling jitter parameter in the set time period, namely the variance between the callback time interval of the destination terminal and the duration of the audio frame in the set time period; R represents the number of audio frames called back by the destination terminal in the set time period; and l0 represents the original length of the audio frames called back by the destination terminal.
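A minimal sketch of formula (2) follows, under the assumption that the increment length is sqrt(L/R); as with formula (1), the printed formula is not fully legible in this text, so this reconstruction (and the function name) is an assumption of this description:

```python
from math import sqrt

def enlarged_frame_length(l0, variance_l, callback_count_r):
    """l1 = l0 + sqrt(L / R): grow the callback frame length by the per-frame
    scheduling-jitter mean derived from the window variance L over R callbacks
    (assumed form)."""
    return l0 + sqrt(variance_l / callback_count_r)
```

For instance, a window with variance L = 9 over R = 9 callbacks would grow a 20 ms callback frame by 1 ms under this reading.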
The length of the audio buffer of the destination and the adjustment manner of the audio frame length of the destination callback shown in the above embodiments are only illustrative, but not limiting.
In addition to the data processing system provided in the above embodiments, the embodiments of the present application also provide an audio parameter adjustment method. The audio parameter adjustment method provided by the embodiment of the application can be applied to an audio receiving end (namely a destination end), and can also be applied to other computing devices, such as edge nodes near the audio receiving end in an edge cloud system, and the like. The following describes an exemplary audio parameter adjustment method provided in the embodiment of the present application.
Fig. 3a is a flowchart illustrating an audio parameter adjustment method according to an embodiment of the present application. As shown in fig. 3a, the audio parameter adjustment method mainly includes:
301. Acquire the actual delay duration of audio frames from the source terminal to the destination terminal in a set time period.
302. Acquire the callback time interval at which the destination terminal calls back audio frames from the audio buffer in the set time period.
303. Determine the audio quality parameters of the destination terminal according to the actual delay duration and the callback time interval.
304. Adjust the audio stream parameters of the destination terminal according to the audio quality parameters.
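Steps 301-304 can be sketched end to end as follows. This is an illustrative simplification (fixed adjustment step, variance-style jitter estimates); all names and step values are assumptions of this description, not limitations of the present application:

```python
from statistics import mean

def adjust_audio_stream_params(actual_delays_ms, theoretical_delay_ms,
                               callback_intervals_ms, frame_duration_ms,
                               buffer_len_ms, frame_len_ms,
                               tx_threshold, sched_threshold, step_ms=10):
    # 301/303: transmission jitter = mean squared deviation from the theoretical delay
    tx_jitter = mean((d - theoretical_delay_ms) ** 2 for d in actual_delays_ms)
    # 302/303: scheduling jitter = mean squared deviation from the preset frame duration
    sched_jitter = mean((t - frame_duration_ms) ** 2 for t in callback_intervals_ms)
    # 304: threshold feedback on both stream parameters
    buffer_len_ms += -step_ms if tx_jitter <= tx_threshold else step_ms
    frame_len_ms += -step_ms if sched_jitter <= sched_threshold else step_ms
    return buffer_len_ms, frame_len_ms
```

Stable windows shrink both parameters to cut playback delay; jittery windows grow them to absorb the instability, mirroring the feedback loop of steps 301-304.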
For a destination terminal receiving audio data, audio frames may fail to arrive at the speaker of the destination terminal on time due to network jitter, causing audio stuttering or even breaking up. In order to alleviate audio stuttering caused by network jitter and the like, the destination terminal can be provided with an audio buffer. For a description of the audio buffer, reference may be made to the relevant content of the above embodiments.
Based on the audio buffer, the destination terminal can acquire the audio stream from the source terminal, store it in the audio buffer, and then send it at a uniform pace to the speaker of the destination terminal for playing by means of a callback function.
Based on the audio buffer, the destination terminal can acquire the audio frames sent by the source terminal and store them in the audio buffer. Further, the audio thread of the destination terminal can acquire the audio frames to be played from the audio buffer through a callback function, and transmit the audio frames to be played to the speaker of the destination terminal for playing. This process is the local scheduling process of the destination terminal.
In the embodiment of the present application, the quality of the audio stream mainly reflects the fluency with which the destination terminal plays the audio. The inventor of the present application found through research that the audio stream quality of the destination terminal is mainly affected by network jitter and by the audio scheduling jitter local to the destination terminal. Because the audio transmission jitter and the audio scheduling jitter of the destination terminal are the main factors affecting the audio stream quality of the destination terminal, the audio stream parameters of the destination terminal can be adjusted according to the audio transmission jitter parameter and the audio scheduling jitter parameter.
Based on the above analysis, since the audio transmission jitter describes the degree of variation of the delay of audio from the source terminal to the destination terminal, in order to determine the audio quality of the destination terminal, the actual delay duration of audio frames from the source terminal to the destination terminal in the set time period may be acquired in step 301. Likewise, since the audio scheduling jitter describes the delay with which the destination terminal schedules audio from the audio buffer to the speaker for playing, in step 302, the callback time interval at which the destination terminal calls back audio frames from the audio buffer in the set time period may also be acquired.
Further, in step 303, the audio quality parameter of the destination terminal may be determined according to the actual delay time of the audio frame from the source terminal to the destination terminal in the set period of time and the callback time interval of the destination terminal for callback of the audio frame from the audio buffer in the set period of time.
In the embodiment of the present application, the specific implementation manner of determining the audio quality parameter of the destination terminal according to the actual delay duration and the callback time interval is not limited.
In some embodiments, the audio transmission jitter parameter may be determined according to the actual delay duration of audio frames from the source terminal to the destination terminal in the set time period. In particular, in an ideal case, the actual delay duration of an audio frame from the source terminal to the destination terminal is fixed and equal to a preset theoretical delay duration. The degree to which the actual delay duration fluctuates relative to the theoretical delay duration can reflect the audio transmission jitter. Based on this, the audio transmission jitter parameter may be determined according to the actual delay duration of the audio frames from the source terminal to the destination terminal in the set time period and the preset theoretical delay duration.
Optionally, the difference between the actual delay duration of the audio frames from the source terminal to the destination terminal in the set time period and the preset theoretical delay duration can be calculated; and the audio transmission jitter parameter can be determined according to this difference.
Specifically, according to the difference between the actual delay duration and the theoretical delay duration of the audio frames in the set time period, the variance or standard deviation of the difference can be calculated as the audio transmission jitter parameter. Alternatively, the mean value of the difference between the actual delay duration and the theoretical delay duration in the set time period can be calculated as the audio transmission jitter parameter.
In other embodiments, the actual delay duration of audio frames from the source terminal to the destination terminal is ideally fixed, and the degree of variation among the actual delay durations can reflect the jitter condition of the audio transmission. Based on this, the mean value of the actual delay durations of the audio frames from the source terminal to the destination terminal in the set time period can be calculated; and the mean squared deviation of the actual delay durations from this mean can be calculated as the audio transmission jitter parameter.
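The two transmission-jitter estimates described above can be sketched as follows; the function names are illustrative assumptions of this description:

```python
from statistics import mean

def tx_jitter_vs_theoretical(actual_delays_ms, theoretical_delay_ms):
    # mean squared difference between each actual delay and the preset theoretical delay
    return mean((d - theoretical_delay_ms) ** 2 for d in actual_delays_ms)

def tx_jitter_vs_mean(actual_delays_ms):
    # mean squared deviation of the actual delays from their own mean
    m = mean(actual_delays_ms)
    return mean((d - m) ** 2 for d in actual_delays_ms)
```

The first variant needs the preset theoretical delay; the second needs only the observed delays, and the two agree whenever the mean of the actual delays equals the theoretical delay.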
The specific embodiments for determining the audio transmission jitter parameters according to the actual delay time of the audio frame from the source end to the destination end in the set time period are only exemplary and not limited.
Because the audio scheduling jitter parameter of the destination terminal is also one of the main parameters of the audio quality parameter of the destination terminal, the audio scheduling jitter parameter of the destination terminal can be determined according to the callback time interval of the destination terminal for callback of the audio frame from the audio buffer in the set time period.
Specifically, the difference between the callback time interval at which the destination terminal calls back audio frames from the audio buffer in the set time period and the preset duration of the audio frame can be calculated; and the audio scheduling jitter parameter can be determined according to this difference.
Specifically, according to the difference between the callback time interval and the preset duration of the audio frame in the set time period, the variance or standard deviation of the difference can be calculated as the audio scheduling jitter parameter. Alternatively, the mean value of the difference between the callback time interval and the preset duration of the audio frame in the set time period can be calculated as the audio scheduling jitter parameter.
In other embodiments, ideally, the callback time interval at which the destination terminal calls back audio frames from the audio buffer is fixed, and the degree of variation among the actual callback time intervals can reflect the jitter condition of the audio scheduling. Based on this, the mean value of the callback time intervals at which the destination terminal calls back audio frames from the audio buffer in the set time period can be calculated; and the mean squared deviation of the callback time intervals from this mean can be calculated as the audio scheduling jitter parameter.
The method for calculating the audio scheduling jitter parameter shown in the above embodiment is only exemplary and not limited thereto.
According to the manners shown in the above embodiments, the audio quality parameters of the destination terminal can be determined. Because the audio stream parameters of the destination terminal affect the audio quality, the audio stream parameters can be adjusted with the audio quality parameters as a feedback condition. Based on this, in step 304, the audio stream parameters of the destination terminal can be adjusted according to the audio quality parameters of the destination terminal, so as to realize automatic adjustment of the audio stream parameters of the destination terminal.
In this embodiment, the audio quality parameters of the destination terminal are determined according to the actual delay duration of audio frames from the source terminal to the destination terminal in the set time period and the callback time interval at which the destination terminal calls back audio frames from the audio buffer in that period, thereby quantizing the audio quality of the destination terminal. Further, the audio stream parameters of the destination terminal are adjusted according to its audio quality parameters, which realizes automatic adjustment of the audio stream parameters and helps to improve the audio parameter adjustment efficiency. On the other hand, adjusting the audio stream parameters based on the audio quality parameters guarantees, to a certain extent, the audio quality of the destination terminal.
In the embodiment of the present application, the specific implementation content of the audio stream parameters is not limited. The capacity of the audio buffer has a certain impact on the audio playback quality. Based on this, in order to improve the audio playing quality of the destination, the length of the audio buffer area can be used as the audio stream parameter of the destination. The length of the audio buffer is used to determine the number of audio frames that the audio buffer can store.
At the set play speed, the data amount of an audio frame corresponds to the duration of the audio frame. The longer the duration of the audio frames of the destination terminal, the less the playing quality of the audio frames is affected by network jitter. Based on this, the duration of the audio frames of the destination terminal may also be used as an audio stream parameter of the destination terminal.
For the embodiment in which the audio stream parameters of the destination terminal include the length of the audio buffer of the destination terminal and the duration of the audio frame, adjusting the audio stream parameters of the destination terminal according to the audio quality parameters may be implemented as: adjusting the length of the audio buffer according to the audio transmission jitter parameter determined by the above embodiments; and adjusting the length of the audio frames called back by the destination terminal according to the audio scheduling jitter parameter determined by the above embodiments.
In the embodiment of the present application, the specific implementation of adjusting the length of the audio buffer according to the audio transmission jitter parameter, and of adjusting the length of the audio frames called back by the destination terminal according to the audio scheduling jitter parameter, is not limited. The following is an exemplary description in connection with several embodiments.
In some embodiments, an audio transmission jitter threshold may be preset. The audio transmission jitter threshold is an empirical value obtained from a priori knowledge, and serves as a threshold for judging audio transmission stability. If the audio transmission jitter parameter in the set time period is smaller than or equal to the audio transmission jitter threshold, it is determined that the audio transmission is stable in the set time period, and the length of the audio buffer can be appropriately reduced, thereby reducing the delay of playing audio at the destination terminal. If the audio transmission jitter parameter in the set time period is larger than the audio transmission jitter threshold, it is determined that the audio transmission jitter is large and the audio transmission is unstable in the set time period, and the length of the audio buffer needs to be increased, thereby reducing the influence of the audio transmission jitter.
Based on this, if the audio transmission jitter parameter in the set time period is smaller than or equal to the set audio transmission jitter threshold, the length of the audio buffer can be reduced. Optionally, the length of the audio buffer may be reduced by a set first length on the basis of the original length. The set first length can be flexibly set according to the actual adjustment requirements.
Accordingly, if the audio transmission jitter parameter in the set time period is larger than the set audio transmission jitter threshold, the length of the audio buffer can be increased.
In some embodiments, the length of the audio buffer may be increased by a set second length on the basis of the original length. The set second length can be flexibly set according to the actual adjustment requirements, and may be the same as or different from the first length. Optionally, the first length is less than the second length.
In other embodiments, the number of audio frames transmitted from the source terminal to the destination terminal in the set time period may be obtained; the increment length of the audio buffer is determined according to the audio transmission jitter parameter in the set time period and the number of audio frames transmitted from the source terminal to the destination terminal in the set time period; and the length of the audio buffer is increased by the increment length on the basis of the original length to obtain the enlarged length of the audio buffer.
For example, according to the audio transmission jitter parameter in the set time period and the number of audio frames transmitted from the source terminal to the destination terminal in the set time period, the audio transmission jitter mean of the audio frames in the set time period can be determined as the increment length of the audio buffer; and the length of the audio buffer is increased by the audio transmission jitter mean on the basis of the original length to obtain the adjusted length of the audio buffer.
In other embodiments, an audio scheduling jitter threshold of the destination terminal may be preset. The audio scheduling jitter threshold is an empirical value obtained from a priori knowledge, and serves as a threshold for judging audio scheduling stability. If the audio scheduling jitter parameter in the set time period is smaller than or equal to the audio scheduling jitter threshold, it is determined that the audio scheduling is stable in the set time period, and the length of the audio frames called back by the destination terminal can be appropriately reduced, thereby reducing the delay of playing audio at the destination terminal. If the audio scheduling jitter parameter in the set time period is larger than the audio scheduling jitter threshold, it indicates that the audio frames called back locally by the destination terminal are too short, so that the local processing speed cannot keep up with the callback speed of the audio frames; in this case, the length of the locally called-back audio frames can be appropriately increased to improve audio playing stability.
Based on the above analysis, if the audio scheduling jitter parameter in the set time period is smaller than or equal to the set audio scheduling jitter threshold, the length of the audio frames called back by the destination terminal can be reduced. Optionally, the length of the audio frames called back by the destination terminal may be reduced by a set third length on the basis of the original length. The set third length can be flexibly set according to the actual adjustment requirements.
Correspondingly, if the audio scheduling jitter parameter in the set time period is larger than the set audio scheduling jitter threshold, the length of the audio frames of the destination callback can be increased.
In some embodiments, the length of the audio frames of the destination callback may be increased by a set fourth length on the basis of the original length. The fourth length can be flexibly set according to actual adjustment requirements. The third length may be the same as or different from the fourth length. Optionally, the third length is less than the fourth length.
In other embodiments, the number of audio frames of the destination callback in the set time period may be obtained; the increment length for the audio frames of the destination callback is determined according to the audio scheduling jitter parameter in the set time period and the number of audio frames of the destination callback in the set time period; and the length of the audio frames of the destination callback is increased by the increment length on the basis of the original length, so as to obtain the enlarged length of the audio frames of the destination callback.
For example, according to the audio scheduling jitter parameter in the set time period and the number of audio frames of the destination callback in the set time period, the average audio scheduling jitter per frame in the set time period is determined as the increment length for the audio frames of the destination callback; the length of the audio frames of the destination callback is then increased by this audio scheduling jitter average on the basis of the original length, so as to obtain the enlarged length of the audio frames of the destination callback.
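The two-sided rule of the preceding paragraphs (shrink by a fixed third length when scheduling is stable, otherwise grow by the mean per-callback scheduling jitter) might be sketched as below; the signature, threshold semantics, and unit choices are assumptions for illustration.

```python
def adjust_callback_frame(frame_len_ms: float,
                          sched_jitter_ms: float,
                          jitter_threshold_ms: float,
                          third_length_ms: float,
                          callback_count: int) -> float:
    """Adjust the destination's callback audio-frame length.

    Shrinks the frame by a fixed amount when scheduling is stable,
    otherwise grows it by the mean per-callback scheduling jitter.
    """
    if sched_jitter_ms <= jitter_threshold_ms:
        # stable scheduling: a shorter frame lowers playback delay
        return max(frame_len_ms - third_length_ms, 0.0)
    # unstable scheduling: enlarge by the scheduling-jitter mean
    increment = sched_jitter_ms / callback_count if callback_count else 0.0
    return frame_len_ms + increment
```

With a 20 ms frame, a 5 ms threshold, and a 5 ms third length, stable scheduling yields a 15 ms frame, while 50 ms of jitter over 10 callbacks yields a 25 ms frame.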
The audio parameter adjustment method provided by the embodiments of the present application is suitable for application scenes with high requirements on audio real-time performance. For example, the method can be applied to a cloud desktop or cloud application scene. In a cloud desktop or cloud application scene, the source end is the server side of the cloud desktop or of the cloud application, and the destination end is a cloud desktop client or a cloud application client. The server side may store audio data of the cloud-side application and send the audio data to the client in response to a request from the client. In an audio/video call scene, the source end can collect the audio data of the source-end user through a sound pickup and send the audio data to the other ends in the call (namely, the destination ends). Taking a cloud desktop or cloud application scene as an example, the audio parameter adjustment method provided by the embodiments of the present application is exemplarily described below.
Fig. 3b is a flowchart illustrating another audio parameter adjustment method according to an embodiment of the present application. The method can be suitable for cloud desktops or cloud application clients, and can also be suitable for other computing devices, such as edge nodes near one side of the clients in an edge cloud system. As shown in fig. 3b, the method mainly includes:
31. Acquire the actual delay duration of audio frames from the server side to the client within a set time period, where the server side and the client are respectively a cloud desktop server side and a cloud desktop client, or a cloud application server side and a cloud application client.
32. Acquire the callback time interval at which the client calls back audio frames from the audio buffer within the set time period.
33. Determine the audio quality parameter of the client according to the actual delay duration and the callback time interval.
34. Adjust the audio stream parameters of the client according to the audio quality parameters.
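Steps 31–34 can be sketched as a single feedback pass. This is a hypothetical sketch: variance is used as the jitter measure (the application also allows standard deviation or mean of the differences), and the fixed decrements `first_len_ms` and `second_len_ms` are placeholder values, not from the application.

```python
from statistics import pvariance
from typing import List, Tuple

def adjustment_pass(buffer_len_ms: float, frame_len_ms: float,
                    delays_ms: List[float], theoretical_delay_ms: float,
                    intervals_ms: List[float], frame_duration_ms: float,
                    tx_threshold: float, sched_threshold: float,
                    first_len_ms: float = 10.0,
                    second_len_ms: float = 5.0) -> Tuple[float, float]:
    # steps 31/32: delays_ms and intervals_ms are the window's measurements
    tx_jitter = pvariance([d - theoretical_delay_ms for d in delays_ms])      # step 33
    sched_jitter = pvariance([i - frame_duration_ms for i in intervals_ms])
    # step 34: threshold feedback on both audio stream parameters
    if tx_jitter <= tx_threshold:
        buffer_len_ms = max(buffer_len_ms - first_len_ms, 0.0)
    else:
        buffer_len_ms += tx_jitter / len(delays_ms)       # mean-jitter increment
    if sched_jitter <= sched_threshold:
        frame_len_ms = max(frame_len_ms - second_len_ms, 0.0)
    else:
        frame_len_ms += sched_jitter / len(intervals_ms)
    return buffer_len_ms, frame_len_ms
```

When both jitter measures fall below their thresholds, the pass shrinks the buffer and the callback frame; when a measure exceeds its threshold, the corresponding parameter grows by the per-frame jitter mean.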
For a client receiving audio data in a cloud desktop or cloud application scene, audio frames may fail to arrive at the client's speaker on time due to network jitter, and audio stuttering or even drop-out may occur. To mitigate the audio stuttering caused by network jitter and the like, the client may set up an audio buffer. For a description of the audio buffer, reference may be made to the relevant content of the above embodiments.
Since the audio transmission jitter describes the extent of the variation in the delay of the audio from the server side to the client, in order to determine the audio quality of the client, in step 31, the actual delay duration of audio frames from the cloud desktop server side to the cloud desktop client within the set time period may be obtained; or the actual delay duration of audio frames from the cloud application server side to the cloud application client within the set time period may be obtained. Also, since the audio scheduling jitter describes the variation in the delay with which the client plays audio from the audio buffer to the speaker, in step 32, the callback time interval at which the client calls back audio frames from the audio buffer within the set time period may also be obtained.
Further, in step 33, the audio quality parameter of the client may be determined according to the actual delay time of the audio frame from the server to the client in the set period of time and the callback time interval of the client callback of the audio frame from the audio buffer in the set period of time.
In this embodiment of the present application, the specific implementation manner of determining the audio quality parameter of the client according to the actual delay duration and the callback time interval may refer to the above description of determining the relevant content of the audio quality parameter of the destination according to the actual delay duration and the callback time interval, which is not described herein again.
Because the audio stream parameters of the client affect the audio quality, the audio stream parameters of the client can be adjusted by taking the audio quality parameters as feedback conditions. Based on this, in step 34, the audio stream parameters of the client may be adjusted according to the audio quality parameters of the client, so as to implement automatic adjustment of the audio stream parameters of the client.
In this embodiment, the audio quality parameter of the client is determined according to the actual delay duration of audio frames from the server side to the client within the set time period and the callback time interval at which the client calls back audio frames from the audio buffer within that period, thereby quantifying the audio quality of the client. Further, the audio stream parameters of the client are adjusted according to the audio quality parameters of the client, realizing automatic adjustment of the audio stream parameters of the client and helping to improve the efficiency of audio parameter adjustment. On the other hand, since the audio stream parameters of the client are adjusted based on the audio quality parameters, the audio quality of the client is guaranteed to a certain extent.
In the embodiment of the present application, the specific implementation content of the audio stream parameters is not limited. The capacity of the audio buffer has a certain impact on the audio playback quality. Based on this, in order to improve the audio playing quality of the client, the length of the audio buffer may be used as the audio stream parameter of the client. The length of the audio buffer is used to determine the number of audio frames that the audio buffer can store.
At a set playback speed, the data amount of an audio frame corresponds to the duration of the audio frame. The longer the duration of the client's audio frames, the less the playback quality of the audio frames is affected by network jitter. Based on this, the duration of the client's audio frames may also be used as an audio stream parameter of the client.
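The correspondence between frame duration and data amount can be illustrated for uncompressed PCM audio; the 48 kHz stereo 16-bit format below is an illustrative default, not specified by the present application.

```python
def pcm_frame_bytes(duration_ms: int,
                    sample_rate_hz: int = 48_000,
                    channels: int = 2,
                    bytes_per_sample: int = 2) -> int:
    """Data amount of one PCM audio frame at a fixed playback format.

    At a fixed sample rate, channel count, and sample width, the byte
    size of a frame is proportional to its duration.
    """
    samples_per_channel = sample_rate_hz * duration_ms // 1000
    return samples_per_channel * channels * bytes_per_sample
```

For example, a 20 ms frame at this format occupies 3840 bytes, twice the 1920 bytes of a 10 ms frame.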
In embodiments where the audio stream parameters of the client include the length of the audio buffer and the duration of the client's audio frames, adjusting the audio stream parameters of the client according to the audio quality parameter may be implemented as: adjusting the length of the audio buffer according to the audio transmission jitter parameter determined in the above embodiments; and adjusting the length of the audio frames of the client callback according to the audio scheduling jitter parameter determined in the above embodiments.
In the embodiment of the application, the length of the audio buffer area is adjusted according to the audio transmission jitter parameter; and adjusting the length of the audio frame of the client callback according to the audio scheduling jitter parameter, which can be referred to the related content of the above embodiment and will not be described herein.
It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may all be the same device, or the methods may be executed by different devices. For example, the execution subject of steps 301 and 302 may be device A; for another example, the execution subject of step 301 may be device A, and the execution subject of step 302 may be device B; and so on.
In addition, some of the flows described in the above embodiments and the drawings include a plurality of operations appearing in a specific order, but it should be clearly understood that these operations may be performed out of the order in which they appear herein or performed in parallel. Sequence numbers of operations such as 301 and 302 are merely used to distinguish the operations, and the sequence numbers themselves do not represent any order of execution. Moreover, the flows may include more or fewer operations, and these operations may be performed sequentially or in parallel.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the above-described respective audio parameter adjustment methods.
Fig. 4 is a schematic structural diagram of a computing device according to an embodiment of the present application. As shown in fig. 4, the computing device mainly includes: a memory 40a and a processor 40b. The memory 40a is used for storing a computer program.
The processor 40b is coupled to the memory 40a for executing a computer program for: acquiring the actual delay time length of an audio frame from a source end to a destination end in a set time period; obtaining a callback time interval of a destination terminal callback audio frame from an audio buffer area within a set time period; determining an audio quality parameter of the destination according to the actual delay time length and the callback time interval; and adjusting the audio stream parameters of the destination terminal according to the audio quality parameters.
In some embodiments, the audio quality parameters include: audio transmission jitter parameters and audio scheduling jitter parameters of the destination. Accordingly, the processor 40b is specifically configured to, when determining the audio quality parameter of the destination according to the actual delay duration and the callback time interval: determining an audio transmission jitter parameter according to the actual delay time length and the preset theoretical delay time length; and determining the audio scheduling jitter parameter of the destination terminal according to the callback time interval and the duration of the audio frame.
Further, the processor 40b is specifically configured to, when determining the audio transmission jitter parameter according to the actual delay duration and the preset theoretical delay duration: calculating a difference between the actual delay time and the theoretical delay time; and determining an audio transmission jitter parameter according to the difference between the actual delay time length and the theoretical delay time length.
Optionally, the processor 40b is specifically configured to, when determining the audio transmission jitter parameter according to a difference between the actual delay period and the theoretical delay period: calculating the variance or standard deviation between the actual delay time length and the theoretical delay time length according to the difference between the actual delay time length and the theoretical delay time length, and taking the variance or standard deviation as an audio transmission jitter parameter; or, calculating the average value of the difference between the actual delay time length and the theoretical delay time length as an audio transmission jitter parameter.
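The alternatives just listed — variance, standard deviation, or mean of the differences between actual and theoretical delay — might be sketched as below; the function name and the `measure` selector are assumptions for illustration.

```python
from statistics import fmean, pstdev, pvariance
from typing import List

def transmission_jitter(actual_ms: List[float], theoretical_ms: float,
                        measure: str = "variance") -> float:
    """Audio transmission jitter parameter from per-frame delay differences.

    measure selects one of the three permitted statistics:
    "variance", "stddev", or "mean" of the differences.
    """
    diffs = [a - theoretical_ms for a in actual_ms]
    if measure == "variance":
        return pvariance(diffs)
    if measure == "stddev":
        return pstdev(diffs)
    return fmean(diffs)  # mean of the differences
```

The scheduling jitter parameter is computed analogously, with the callback time intervals in place of the actual delays and the frame duration in place of the theoretical delay.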
In other embodiments, the processor 40b is specifically configured to, when determining the audio scheduling jitter parameter of the destination according to the callback time interval and the duration of the audio frame: calculating a difference between the callback time interval and the duration of the audio frame; and determining the audio scheduling jitter parameter according to the difference value between the callback time interval and the duration of the audio frame.
Further, the processor 40b is specifically configured to, when determining the audio scheduling jitter parameter according to the difference between the callback time interval and the duration of the audio frame: calculate, according to the difference between the callback time interval and the duration of the audio frame, the variance or standard deviation of that difference as the audio scheduling jitter parameter; or calculate the average value of the difference between the callback time interval and the duration of the audio frame as the audio scheduling jitter parameter.
In some embodiments of the present application, the audio stream parameters include: the length of the audio buffer and the length of the audio frame local to the destination. Accordingly, the processor 40b is specifically configured to, when adjusting the audio stream parameters of the destination according to the audio quality parameters: adjusting the length of an audio buffer area according to the audio transmission jitter parameters; and adjusting the length of the audio frame of the destination callback according to the audio scheduling jitter parameter.
Further, the processor 40b is specifically configured to, when adjusting the length of the audio buffer according to the audio transmission jitter parameter: if the audio transmission jitter parameter is smaller than the set audio transmission jitter threshold value, reducing the length of the audio buffer; if the audio transmission jitter parameter is greater than the audio transmission jitter threshold, the length of the audio buffer is increased.
Optionally, the processor 40b is specifically configured to, when reducing the length of the audio buffer: reduce the length of the audio buffer by a set first length on the basis of the original length.
And/or, the processor 40b is specifically configured to, when the length of the audio buffer is adjusted to be greater: acquiring a first number of audio frames transmitted from a source terminal to a destination terminal in a set time period; determining a first increment length corresponding to the audio buffer area according to the audio transmission jitter parameter and the first quantity; and increasing the length of the audio buffer area by the first increment length based on the original length.
Further, the processor 40b is specifically configured to, when determining the first increment length corresponding to the audio buffer according to the audio transmission jitter parameter and the first number: calculate, according to the audio transmission jitter parameter and the first number, the average audio transmission jitter corresponding to each audio frame in the set time period as the first increment length.
Optionally, the processor 40b is specifically configured to, when adjusting the length of the audio frame of the destination callback according to the audio scheduling jitter parameter: if the audio scheduling jitter parameter is smaller than the set audio scheduling jitter threshold, reduce the length of the audio frame called back by the destination end; and if the audio scheduling jitter parameter is greater than the audio scheduling jitter threshold, increase the length of the audio frame called back by the destination end.
Optionally, the processor 40b is specifically configured to, when reducing the length of the audio frame of the destination call-back: and reducing the length of the audio frame called back by the destination end by a set second length on the basis of the original length.
And/or, the processor 40b is specifically configured to, when the length of the audio frame of the destination callback is adjusted to be greater: acquiring a second number of audio frames recalled from the destination end in a set time period; determining a second increment length corresponding to the audio frame of the destination callback according to the audio scheduling jitter parameter and the second quantity; and increasing the length of the audio frame of the destination callback by the second increment length on the basis of the original length.
Optionally, the processor 40b is specifically configured to, when determining the second increment length corresponding to the audio frame of the destination callback according to the audio scheduling jitter parameter and the second number: calculate, according to the audio scheduling jitter parameter and the second number, the average audio scheduling jitter corresponding to each audio frame in the set time period as the second increment length.
In some embodiments, the computing device is implemented as the destination end described above. Accordingly, the computing device may further include: a communication component 40c and an audio component 40d. The processor 40b is further configured to: acquire, through the communication component 40c, an audio frame sent by the source end; store the audio frame sent by the source end into the audio buffer; acquire, through a callback function, an audio frame to be played from the audio buffer by using an audio thread of the destination end; and transmit the audio frame to be played to a speaker in the audio component 40d for playback.
In some embodiments of the present application, the processor 40b is further configured to: acquiring the actual delay time length of an audio frame from a server to a client in a set time period; the server side and the client side are a cloud desktop server side and a cloud desktop client side, or a cloud application server side and a cloud application client side; acquiring a callback time interval of the client callback audio frame from the audio buffer area within a set time period; determining an audio quality parameter of the client according to the actual delay time length and the callback time interval; and adjusting the audio stream parameters of the client according to the audio quality parameters. The specific implementation manner of each operation step in this embodiment may refer to the relevant content of the foregoing embodiment, which is not described herein again.
In some alternative implementations, as shown in fig. 4, the computing device may further include: power supply assembly 40e, display assembly 40f, etc. Only a portion of the components are schematically shown in fig. 4, which does not mean that the computing device must contain all of the components shown in fig. 4, nor that the computing device can only include the components shown in fig. 4.
According to the computing device provided by this embodiment, the audio quality parameter of the destination end can be determined according to the actual delay duration of audio frames from the source end to the destination end within the set time period and the callback time interval at which the destination end calls back audio frames from the audio buffer within that period, thereby quantifying the audio quality of the destination end. Further, the audio stream parameters of the destination end are adjusted according to the audio quality parameters of the destination end, realizing automatic adjustment of the audio stream parameters of the destination end and helping to improve the efficiency of audio parameter adjustment. On the other hand, since the audio stream parameters of the destination end are adjusted based on the audio quality parameters, the audio quality of the destination end is guaranteed to a certain extent.
Fig. 5 is a schematic structural diagram of an audio parameter adjusting apparatus according to an embodiment of the present application. As shown in fig. 5, the audio parameter adjusting apparatus includes: the acquisition module 50a, the determination module 50b and the adjustment module 50c.
The acquiring module 50a is configured to acquire an actual delay time length of the audio frame from the source end to the destination end in a set period of time; and acquiring a callback time interval of the destination terminal callback audio frame from the audio buffer in the set time period.
The determining module 50b is configured to determine an audio quality parameter of the destination according to the actual delay duration and the callback time interval.
The adjusting module 50c is configured to adjust the audio stream parameter of the destination according to the audio quality parameter.
In some embodiments, the audio quality parameters include: audio transmission jitter parameters and audio scheduling jitter parameters of the destination. Accordingly, the determining module 50b is specifically configured to, when determining the audio quality parameter of the destination according to the actual delay duration and the callback time interval: determining an audio transmission jitter parameter according to a difference value between the actual delay time length and a preset theoretical delay time length; and determining the audio scheduling jitter parameter of the destination terminal according to the difference value between the callback time interval and the duration of the audio frame.
Optionally, the determining module 50b is specifically configured to, when determining the audio transmission jitter parameter according to a difference between the actual delay period and the theoretical delay period: calculating the variance or standard deviation between the actual delay time length and the theoretical delay time length according to the difference between the actual delay time length and the theoretical delay time length, and taking the variance or standard deviation as an audio transmission jitter parameter; or, calculating the average value of the difference between the actual delay time length and the theoretical delay time length as an audio transmission jitter parameter.
Optionally, the determining module 50b is specifically configured to, when determining the audio scheduling jitter parameter according to the difference between the callback time interval and the duration of the audio frame: calculate, according to the difference between the callback time interval and the duration of the audio frame, the variance or standard deviation of that difference as the audio scheduling jitter parameter; or calculate the average value of the difference between the callback time interval and the duration of the audio frame as the audio scheduling jitter parameter.
In other embodiments, the audio stream parameters include: the length of the audio buffer and the length of the audio frame local to the destination. Accordingly, the adjusting module 50c is specifically configured to, when adjusting the audio stream parameter of the destination according to the audio quality parameter: adjusting the length of an audio buffer area according to the audio transmission jitter parameters; and adjusting the length of the audio frame of the callback of the destination terminal according to the audio scheduling jitter parameter.
Further, the adjusting module 50c is specifically configured to, when adjusting the length of the audio buffer according to the audio transmission jitter parameter: if the audio transmission jitter parameter is smaller than the set audio transmission jitter threshold value, reducing the length of the audio buffer; if the audio transmission jitter parameter is greater than the audio transmission jitter threshold, the length of the audio buffer is increased.
Optionally, the adjusting module 50c is specifically configured to, when reducing the length of the audio buffer: reduce the length of the audio buffer by a set first length on the basis of the original length.
The adjusting module 50c is specifically configured to, when the length of the audio buffer is adjusted to be greater: acquiring a first number of audio frames transmitted from a source terminal to a destination terminal in a set time period; determining a first increment length corresponding to the audio buffer area according to the audio transmission jitter parameter and the first quantity; the length of the audio buffer is increased by a first increment length based on the original length.
Optionally, the adjusting module 50c is specifically configured to, when adjusting the length of the audio frame of the destination callback according to the audio scheduling jitter parameter: if the audio scheduling jitter parameter is smaller than the set audio scheduling jitter threshold, reduce the length of the audio frame called back by the destination end; and if the audio scheduling jitter parameter is greater than the audio scheduling jitter threshold, increase the length of the audio frame called back by the destination end.
Further, the adjusting module 50c is specifically configured to, when reducing the length of the audio frame of the destination callback: and reducing the length of the audio frame called back by the destination end by a set second length on the basis of the original length.
The adjusting module 50c is specifically configured to, when increasing the length of the audio frame of the destination callback: acquire a second number of audio frames called back by the destination end within the set time period; determine, according to the audio scheduling jitter parameter and the second number, a second increment length corresponding to the audio frames called back by the destination end; and increase the length of the audio frames called back by the destination end by the second increment length on the basis of the original length.
In some embodiments of the present application, the obtaining module 50a is configured to obtain an actual delay duration of an audio frame from a server to a client in a set period of time, where the server and the client are a cloud desktop server and a cloud desktop client, or are a cloud application server and a cloud application client; and acquiring a callback time interval of the callback of the client from the audio buffer to the audio frame in the set time period.
A determining module 50b, configured to determine an audio quality parameter of the client according to the actual delay duration and the callback time interval.
The adjusting module 50c is configured to adjust the audio stream parameters of the client according to the audio quality parameters.
The specific implementations of the obtaining module 50a acquiring the actual delay duration of audio frames from the cloud desktop or cloud application server side to the cloud desktop or cloud application client within the set time period and acquiring the callback time interval at which the client calls back audio frames from the audio buffer within the set time period, of the determining module 50b determining the audio quality parameter of the client according to the actual delay duration and the callback time interval, and of the adjusting module 50c adjusting the audio stream parameters of the client according to the audio quality parameter may refer to the relevant content of the foregoing embodiments and are not described herein again.
According to the audio parameter adjusting apparatus provided by this embodiment, the audio quality parameter of the destination end can be determined according to the actual delay duration of audio frames from the source end to the destination end within the set time period and the callback time interval at which the destination end calls back audio frames from the audio buffer within that period, thereby quantifying the audio quality of the destination end. Further, the audio stream parameters of the destination end are adjusted according to the audio quality parameters of the destination end, realizing automatic adjustment of the audio stream parameters of the destination end and helping to improve the efficiency of audio parameter adjustment. On the other hand, since the audio stream parameters of the destination end are adjusted based on the audio quality parameters, the audio quality of the destination end is guaranteed to a certain extent.
In embodiments of the present application, the memory is used to store a computer program and may be configured to store various other data to support operations on the device where it resides. The processor may execute the computer program stored in the memory to implement the corresponding control logic. The memory may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as Static Random-Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, or a magnetic or optical disk.
In the embodiments of the present application, the processor may be any hardware processing device capable of executing the above-described method logic. Optionally, the processor may be a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), or a Microcontroller Unit (MCU); a programmable device such as a Field-Programmable Gate Array (FPGA), a Programmable Array Logic device (PAL), a General Array Logic device (GAL), or a Complex Programmable Logic Device (CPLD); an Application-Specific Integrated Circuit (ASIC) chip; an Advanced RISC Machines (ARM) processor; or a System on Chip (SoC); but is not limited thereto.
In embodiments of the present application, the communication component is configured to facilitate wired or wireless communication between the device where it resides and other devices. The device where the communication component resides may access a wireless network based on a communication standard, such as Wireless Fidelity (WiFi), 2G, 3G, 4G, 5G, or a combination thereof. In one exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In one exemplary embodiment, the communication component may also be implemented based on Near Field Communication (NFC) technology, Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra-Wideband (UWB) technology, Bluetooth (BT) technology, or other technologies.
In embodiments of the present application, the display component may include a liquid crystal display (Liquid Crystal Display, LCD) and a touch panel (Touch Panel, TP). If the display component includes a touch panel, it may be implemented as a touch screen to receive input signals from a user. The touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may sense not only the boundary of a touch or swipe action, but also the duration and pressure associated with the touch or swipe operation.
In embodiments of the present application, the power component is configured to provide power to the various components of the device in which it resides. The power component may include a power management system, one or more power sources, and other components associated with generating, managing, and distributing power for that device.
In embodiments of the present application, the audio component may be configured to output and/or input audio signals. For example, the audio component may include a microphone (MIC) configured to receive external audio signals when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, or a voice recognition mode. The received audio signal may be further stored in the memory or transmitted via the communication component. In some embodiments, the audio component further includes a speaker for outputting audio signals. For example, for a device with voice interaction functionality, voice interaction with a user may be accomplished through the audio component.
It should be noted that the user information (including but not limited to user equipment information, user personal information, etc.) and data (including but not limited to data used for analysis, stored data, presented data, etc.) involved in the present application are information and data authorized by the user or fully authorized by all parties. The collection, use, and processing of such data must comply with the relevant laws, regulations, and standards of the relevant countries and regions, and corresponding operation entries are provided for the user to choose to authorize or refuse.
It should be further noted that, the descriptions of "first" and "second" herein are used to distinguish between different messages, devices, modules, etc., and do not represent a sequence, nor do they limit that "first" and "second" are different types.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, magnetic disk storage, CD-ROM (Compact Disc Read-Only Memory), optical storage, etc.) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (or systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (e.g., CPUs, etc.), input/output interfaces, network interfaces, and memory.
The memory may include volatile memory in a computer-readable medium, such as random-access memory (Random-Access Memory, RAM), and/or non-volatile memory, such as read-only memory (Read-Only Memory, ROM) or flash memory (Flash RAM). Memory is an example of a computer-readable medium.
The storage medium of the computer is a readable storage medium, which may also be referred to as a readable medium. Readable storage media include permanent and non-permanent, removable and non-removable media, and may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (Phase-Change Memory, PRAM), static random-access memory (Static Random-Access Memory, SRAM), dynamic random-access memory (Dynamic Random-Access Memory, DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (Electrically Erasable Programmable Read-Only Memory, EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (Digital Versatile Disc, DVD) or other optical storage, magnetic cassettes, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media (transitory media), such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims (14)

1. An audio parameter adjustment method, comprising:
acquiring the actual delay time length of an audio frame from a source end to a destination end in a set time period;
acquiring a callback time interval at which the destination end calls back audio frames from an audio buffer in the set time period;
determining audio quality parameters of the destination end according to the actual delay time length and the callback time interval;
and adjusting audio stream parameters of the destination end according to the audio quality parameters.
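Read as an algorithm, the four steps of claim 1 can be sketched in a few lines of Python. This is only an illustrative sketch, not the patent's implementation: the variance-based quality measure is one of the options named in the dependent claims, the unit-step adjustments stand in for the adjustment rules of claims 6 and 8, and every function and parameter name here is hypothetical.

```python
from statistics import pvariance

def adjust_audio_parameters(delays_ms, callback_intervals_ms,
                            theoretical_delay_ms, frame_duration_ms,
                            buffer_len, frame_len, thresholds):
    """One pass of the claimed method over a set time period."""
    # Steps 1-2 (measurement) are assumed done by the caller:
    # delays_ms[i] is the actual source-to-destination delay of frame i;
    # callback_intervals_ms[j] is the gap between successive callbacks
    # with which the destination pulls frames from the audio buffer.

    # Step 3: audio quality parameters (variance variant of claims 3-4).
    transmission_jitter = pvariance([d - theoretical_delay_ms
                                     for d in delays_ms])
    scheduling_jitter = pvariance([i - frame_duration_ms
                                   for i in callback_intervals_ms])

    # Step 4: adjust the destination's audio stream parameters
    # (threshold logic of claims 6 and 8; the unit steps are assumptions).
    if transmission_jitter < thresholds["transmission"]:
        buffer_len = max(1, buffer_len - 1)   # low jitter: shrink buffer
    elif transmission_jitter > thresholds["transmission"]:
        buffer_len += 1                        # high jitter: grow buffer
    if scheduling_jitter < thresholds["scheduling"]:
        frame_len = max(1, frame_len - 1)      # shrink callback frame
    elif scheduling_jitter > thresholds["scheduling"]:
        frame_len += 1                         # grow callback frame
    return buffer_len, frame_len
```

A period with widely varying frame delays thus grows the buffer (trading latency for smoothness), while perfectly regular callbacks let the callback frame length shrink.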
2. The method of claim 1, wherein the audio quality parameters comprise: an audio transmission jitter parameter and an audio scheduling jitter parameter of the destination end; and the determining the audio quality parameters of the destination end according to the actual delay time length and the callback time interval comprises:
determining the audio transmission jitter parameter according to the difference between the actual delay time length and a preset theoretical delay time length;
and determining the audio scheduling jitter parameter of the destination end according to the difference between the callback time interval and the duration of the audio frame.
3. The method of claim 2, wherein said determining the audio transmission jitter parameter based on the difference between the actual delay period and the theoretical delay period comprises:
calculating, according to the difference between the actual delay time length and the theoretical delay time length, a variance or standard deviation of the difference as the audio transmission jitter parameter;
or,
calculating an average value of the difference between the actual delay time length and the theoretical delay time length as the audio transmission jitter parameter.
4. The method of claim 2, wherein determining the audio scheduling jitter parameter based on a difference between the callback time interval and the duration of the audio frame comprises:
calculating, according to the difference between the callback time interval and the duration of the audio frame, a variance or standard deviation of the difference as the audio scheduling jitter parameter;
or,
calculating an average value of the difference between the callback time interval and the duration of the audio frame as the audio scheduling jitter parameter.
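Claims 3 and 4 allow three interchangeable statistics over the same differences: variance, standard deviation, or mean. A small Python sketch of the three variants (the function name and `mode` parameter are my own, not the patent's):

```python
from statistics import mean, pstdev, pvariance

def jitter_parameter(samples, reference, mode="variance"):
    """Dispersion of (sample - reference), per claims 3-4.

    For the audio transmission jitter parameter, `samples` are the actual
    delay time lengths and `reference` is the theoretical delay time
    length; for the audio scheduling jitter parameter, `samples` are the
    callback time intervals and `reference` is the audio frame duration.
    """
    diffs = [s - reference for s in samples]
    if mode == "variance":
        return pvariance(diffs)
    if mode == "stdev":
        return pstdev(diffs)
    if mode == "mean":
        return mean(diffs)  # "average value of the difference" variant
    raise ValueError(f"unknown mode: {mode}")
```

Note that the mean of the signed differences can cancel to zero even under heavy jitter; the variance and standard-deviation variants do not, which may be why the claims list them first.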
5. The method of any of claims 2-4, wherein the audio stream parameters comprise: the length of the audio buffer and the length of the audio frame called back locally by the destination end; and the adjusting the audio stream parameters of the destination end according to the audio quality parameters comprises:
adjusting the length of the audio buffer according to the audio transmission jitter parameter;
and adjusting the length of the audio frame called back by the destination end according to the audio scheduling jitter parameter.
6. The method of claim 5, wherein adjusting the length of the audio buffer based on the audio transmission jitter parameter comprises:
if the audio transmission jitter parameter is less than a set audio transmission jitter threshold, reducing the length of the audio buffer;
and if the audio transmission jitter parameter is greater than the audio transmission jitter threshold, increasing the length of the audio buffer.
7. The method of claim 6, wherein said reducing the length of the audio buffer comprises:
reducing the length of the audio buffer by a set first length from its original length;
and/or,
the increasing the length of the audio buffer comprises:
acquiring a first number of audio frames transmitted from the source end to the destination end in the set time period;
determining a first increment length corresponding to the audio buffer according to the audio transmission jitter parameter and the first number;
and increasing the length of the audio buffer by the first increment length from its original length.
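Claims 6-7 combine a threshold test with two adjustment rules: a fixed decrement (the "set first length") and an increment derived from the transmission jitter parameter and the number of frames transmitted in the period. The claims do not fix the increment formula, so the `gain`-scaled product below, like every name here, is purely an illustrative assumption:

```python
def adjust_buffer_length(buffer_len, transmission_jitter,
                         jitter_threshold, frame_count,
                         first_length=1, gain=0.01):
    """Adjust the audio buffer length per the pattern of claims 6-7."""
    if transmission_jitter < jitter_threshold:
        # Claim 7, first branch: shrink by a set first length.
        return max(0, buffer_len - first_length)
    if transmission_jitter > jitter_threshold:
        # Claim 7, second branch: grow by a first increment length
        # derived from the jitter parameter and the frame count
        # (the exact combination is an assumption).
        increment = max(1, round(gain * transmission_jitter * frame_count))
        return buffer_len + increment
    return buffer_len  # exactly at threshold: leave unchanged
```

Claims 8-9 apply the same pattern to the length of the audio frame called back by the destination end, substituting a second length and a second increment length derived from the scheduling jitter parameter and the callback count.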
8. The method of claim 5, wherein the adjusting the length of the audio frame called back by the destination end according to the audio scheduling jitter parameter comprises:
if the audio scheduling jitter parameter is less than a set audio scheduling jitter threshold, reducing the length of the audio frame called back by the destination end;
and if the audio scheduling jitter parameter is greater than the audio scheduling jitter threshold, increasing the length of the audio frame called back by the destination end.
9. The method of claim 8, wherein the reducing the length of the audio frame called back by the destination end comprises:
reducing the length of the audio frame called back by the destination end by a set second length from its original length;
and/or,
the increasing the length of the audio frame called back by the destination end comprises:
acquiring a second number of audio frames called back by the destination end in the set time period;
determining a second increment length corresponding to the audio frame called back by the destination end according to the audio scheduling jitter parameter and the second number;
and increasing the length of the audio frame called back by the destination end by the second increment length from its original length.
10. An audio parameter adjustment method, comprising:
acquiring the actual delay time length of an audio frame from a server to a client in a set time period; the server side and the client side are a cloud desktop server side and a cloud desktop client side, or a cloud application server side and a cloud desktop client side;
acquiring a callback time interval at which the client calls back audio frames from an audio buffer in the set time period;
determining an audio quality parameter of the client according to the actual delay time length and the callback time interval;
and adjusting the audio stream parameters of the client according to the audio quality parameters.
11. An audio parameter adjustment apparatus, comprising:
the acquisition module is used for acquiring an actual delay time length of an audio frame from a source end to a destination end in a set time period, and acquiring a callback time interval at which the destination end calls back audio frames from an audio buffer in the set time period;
the determining module is used for determining an audio quality parameter of the destination end according to the actual delay time length and the callback time interval;
and the adjusting module is used for adjusting audio stream parameters of the destination end according to the audio quality parameter.
12. An audio parameter adjustment apparatus, comprising:
the acquisition module is used for acquiring an actual delay time length of an audio frame from a server side to a client side in a set time period, and acquiring a callback time interval at which the client calls back audio frames from an audio buffer in the set time period; the server side and the client side are a cloud desktop server side and a cloud desktop client side, or a cloud application server side and a cloud desktop client side;
the determining module is used for determining an audio quality parameter of the client according to the actual delay time length and the callback time interval;
and the adjusting module is used for adjusting audio stream parameters of the client according to the audio quality parameter.
13. A computing device, comprising: a memory and a processor; wherein the memory is used for storing a computer program;
the processor, coupled to the memory, is configured to execute the computer program to perform the steps in the method of any of claims 1-10.
14. A computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps in the method of any of claims 1-10.
CN202310363189.XA 2023-04-04 2023-04-04 Audio parameter adjustment method, device, apparatus and storage medium Pending CN116527613A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310363189.XA CN116527613A (en) 2023-04-04 2023-04-04 Audio parameter adjustment method, device, apparatus and storage medium


Publications (1)

Publication Number Publication Date
CN116527613A true CN116527613A (en) 2023-08-01

Family

ID=87395015

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310363189.XA Pending CN116527613A (en) 2023-04-04 2023-04-04 Audio parameter adjustment method, device, apparatus and storage medium

Country Status (1)

Country Link
CN (1) CN116527613A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117714588A * 2023-11-06 2024-03-15 Honor Device Co., Ltd. Stutter suppression method and electronic device


Similar Documents

Publication Publication Date Title
US11366632B2 (en) User interface for screencast applications
US11677801B2 (en) Capture, recording, and streaming of media content
CN112822502B (en) Live broadcast jitter removal intelligent caching and live broadcast method, equipment and storage medium
US8922713B1 (en) Audio and video synchronization
US11259063B2 (en) Method and system for setting video cover
US10432543B2 (en) Dual jitter buffers
US20240121455A1 (en) Method, apparatus, electronic device and storage medium for video bitrate switching
CN116527613A (en) Audio parameter adjustment method, device, apparatus and storage medium
CN109379548B (en) Multimedia recording method, device, terminal and storage medium
CN110798458A (en) Data synchronization method, device, equipment and computer readable storage medium
JP2013009332A5 (en)
US9350796B2 (en) Method and device for receiving multimedia data
EP3149912B1 (en) Communication device and data processing method
WO2022193141A1 (en) Multimedia file playing method and related apparatus
US10826838B2 (en) Synchronized jitter buffers to handle codec switches
US20230297324A1 (en) Audio Control Method, System, and Electronic Device
CN115278288B (en) Display processing method and device, computer equipment and readable storage medium
US11546675B2 (en) Methods, systems, and media for streaming video content using adaptive buffers
CN114390335B (en) Method for playing audio and video online, electronic equipment and storage medium
US20230113372A1 (en) Systems and methods for managing collaboration between network devices over a communications nework
CN117177000A (en) Screen projection method and electronic equipment
CN115798439A (en) Audio data acquisition method and electronic equipment
KR20210076727A (en) Apparatus and method for playing contents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination