WO2023050994A1 - Audio control method and apparatus, device, and computer readable storage medium - Google Patents
- Publication number
- WO2023050994A1 (PCT/CN2022/108334)
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- audio
- data
- network
- control instruction
- decoding
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L65/00—Network arrangements, protocols or services for supporting real-time applications in data packet communication
- H04L65/80—Responding to QoS
Definitions
- The embodiments of the present application relate to, but are not limited to, the field of network communication technologies, and in particular relate to an audio control method, apparatus, device, and computer-readable storage medium.
- Network communication systems often carry audio communication.
- To ensure audio quality at the playback end, the network is generally optimized in various ways, such as optimizing the network topology or sending redundant packets at the sending end.
- However, optimizing the network topology has poor operability, and sending redundant packets requires support from the sender; moreover, under congestion the redundant packets make network conditions even more complicated, which degrades audio quality.
- Embodiments of the present application provide an audio control method, apparatus, device, and computer-readable storage medium.
- In a first aspect, an embodiment of the present application provides an audio control method, including: acquiring network feature data during network processing and audio feature data during audio processing; inputting the network feature data and the audio feature data into a classifier model to obtain several audio control instructions; and controlling, according to the audio control instructions, the corresponding audio processing operations in the corresponding audio processing processes.
- An embodiment of the present application further provides an audio control apparatus, configured to execute the audio control method described in the first aspect above.
- An embodiment of the present application also provides an audio control device, including a memory, a processor, and a computer program stored in the memory and executable on the processor; when the processor executes the computer program, the audio control method described in the first aspect above is implemented.
- An embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being used to execute the audio control method described in the first aspect above.
- FIG. 1 is a schematic flowchart of an audio control method provided by an embodiment of the present application;
- FIG. 2 is a schematic flowchart of obtaining audio control instructions provided by an embodiment of the present application;
- FIG. 3 is a schematic flowchart of obtaining an audio stretching control instruction or an audio compression control instruction provided by an embodiment of the present application;
- FIG. 4 is a schematic flowchart of obtaining a stop audio decoding control instruction or an audio decoding average speed control instruction provided by an embodiment of the present application;
- FIG. 5 is a schematic flowchart for audio stretching data provided by an embodiment of the present application;
- FIG. 6 is a schematic flowchart for audio compression data provided by an embodiment of the present application;
- FIG. 7 is a schematic flowchart of obtaining a stop audio decoding control instruction or an audio decoding average speed control instruction provided by another embodiment of the present application;
- FIG. 8 is a schematic structural diagram of an audio control system provided by an embodiment of the present application.
- For example, the network can be optimized by optimizing the network topology or by sending redundant packets at the sending end, so as to ensure audio quality at the playback end.
- However, optimizing the network topology has poor operability, while sending redundant packets requires support from the sender and, under congestion, makes network conditions worse, which degrades audio quality.
- Embodiments of the present application provide an audio control method, apparatus, device, and computer-readable storage medium that can effectively ensure audio quality and have good operability.
- The embodiments of the present application relate specifically to real-time audio reception and playback, including but not limited to application scenarios such as video conferencing, video playback, real-time voice chat, and lianmai (live-stream co-hosting, where several users join a stream by voice).
- FIG. 1 is a schematic flowchart of the audio control method provided by an embodiment of the present application.
- Step S100: acquire network feature data during network processing and audio feature data during audio processing;
- Step S200: input the network feature data and the audio feature data into the classifier model to obtain several audio control instructions;
- Step S300: control, according to the audio control instructions, the corresponding audio processing operations in the corresponding audio processing processes.
- The embodiment of the present application extracts the corresponding feature data from the network processing process and the audio processing process, so that the classifier model can process the feature data and produce classification results, namely several audio control instructions. These instructions are then issued adaptively to control the corresponding audio processing operations in the corresponding audio processing processes, which effectively ensures audio quality with good operability.
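The three steps S100 to S300 can be sketched as a single control cycle. This is a minimal Python sketch, assuming hypothetical callables `acquire_network`, `acquire_audio` and `classify` and a dictionary of per-instruction handlers; none of these names come from the application.

```python
from typing import Callable, Dict, Iterable, List, Sequence


def control_cycle(
    acquire_network: Callable[[], Sequence[float]],  # hypothetical feature source
    acquire_audio: Callable[[], Sequence[float]],    # hypothetical feature source
    classify: Callable[[List[float]], Iterable[str]],  # stands in for the classifier model
    handlers: Dict[str, Callable[[], None]],         # hypothetical instruction handlers
) -> List[str]:
    """Run S100 -> S200 -> S300 once and return the issued instructions."""
    # S100: collect network feature data and audio feature data.
    features = list(acquire_network()) + list(acquire_audio())
    # S200: the classifier model maps the features to audio control instructions.
    instructions = list(classify(features))
    # S300: each instruction drives the audio processing stage it belongs to.
    for instruction in instructions:
        handlers[instruction]()
    return instructions
```

Keeping the classifier behind a plain callable means the support vector machine, or any of the alternative classifiers mentioned below, can be swapped in without touching the control loop.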
- the classifier model in the embodiment of the present application may adopt a support vector machine (Support Vector Machine, SVM).
- A support vector machine performs binary classification of feature data (such as network feature data and audio feature data) through supervised learning; it is a generalized linear classifier.
- In this embodiment, a support vector machine is used as the classifier model to classify the network feature data and audio feature data through supervised learning.
- A support vector machine typically uses one of three kernel functions: a linear kernel, a polynomial kernel, or a Gaussian kernel. The linear kernel has few parameters and iterates quickly, and it works well for supervised classification. This embodiment therefore uses a linear-kernel support vector machine as the classifier model, which gives a shorter training time and higher accuracy.
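A linear-kernel SVM can be trained directly in the primal by minimizing the L2-regularized hinge loss. The Python sketch below does this with plain full-batch subgradient descent. The step size, regularization weight and epoch count are illustrative assumptions, not values from the application, and the labels are the binary +1/-1 classes that a single SVM separates; the four instruction categories would need a multi-class scheme such as one-vs-rest training of four such models.

```python
from typing import List, Sequence, Tuple


def train_linear_svm(
    X: Sequence[Sequence[float]],
    y: Sequence[int],
    lam: float = 0.01,   # regularization weight (assumed)
    lr: float = 0.1,     # step size (assumed)
    epochs: int = 200,   # number of full-batch passes (assumed)
) -> Tuple[List[float], float]:
    """Minimize lam/2*||w||^2 + mean(max(0, 1 - y*(w.x + b))); labels are +1/-1."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    b = 0.0
    for _ in range(epochs):
        gw = [lam * wj for wj in w]  # gradient of the regularizer
        gb = 0.0
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            if margin < 1:  # hinge term is active: subgradient is -y*x (and -y for b)
                for j in range(d):
                    gw[j] -= yi * xi[j] / n
                gb -= yi / n
        w = [wj - lr * gj for wj, gj in zip(w, gw)]
        b -= lr * gb
    return w, b


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Sign of the linear decision function w.x + b."""
    return 1 if sum(wj * xj for wj, xj in zip(w, x)) + b >= 0 else -1
```

In practice a library implementation such as scikit-learn's `LinearSVC` would normally replace this loop; the sketch only shows why the linear kernel is cheap: each iteration is a handful of dot products over the six features.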
- Other classifiers can also replace the support vector machine in the embodiments of the present application, including but not limited to deep belief networks, linear classifiers, and random forests.
- The classifier model needs to be constructed before the network feature data and the audio feature data are input into it.
- The classifier model can be constructed through offline training and online training.
- In offline training, feature training data such as network feature training data and audio feature training data are collected from different platforms to construct training samples, which are stored as data. The training samples are then input into a high-performance machine to train the classifier model. In offline training, unsupervised clustering need not be performed on the feature training vectors corresponding to the training samples.
- A static buffer queue can also be designed at the playback end to counter network impairments such as network jitter and packet loss.
- Although a static buffer queue at the playback end can absorb large jitter, an unreasonably set queue length increases delay and degrades real-time performance, which in turn affects audio quality.
- The network feature data in this embodiment of the present application includes network cache queue data, network jitter data, and network packet loss rate data;
- the network processing process includes a network cache queue process and a network prediction process.
- The network cache queue data is obtained from the network real-time transport protocol (RTP) packet data through the network cache operation in the network cache queue process; the network jitter data and the network packet loss rate data are both obtained by applying the network prediction operation in the network prediction process to the RTP packet data after the network cache operation.
- Encoded audio data is packaged to form RTP packet data.
- The RTP packet data can serve as the initial input data of the audio control method of this embodiment.
- The network cache queue process performs the corresponding network cache operation on the RTP packet data, i.e., the initial input data, thereby buffering it.
- Each received RTP packet can be sorted by the time at which it enters the network cache queue process, which yields serial number data for each packet; this serial number data is then used to calculate the network jitter data and the network packet loss rate data.
- The network cache queue data may be network cache queue length data; for example, its size may be expressed in bytes.
- The network cache queue data may also be the number of RTP packets cached in the network cache queue process.
- The corresponding network cache operation is performed to obtain the network cache queue data.
- The RTP packet data is then taken out of the network cache queue.
- The RTP packet data enters the network prediction process, where the corresponding network prediction operation yields the network jitter data and the network packet loss rate data.
- The audio feature data in this embodiment of the present application includes audio decoding rate data, audio buffer queue data, and audio consumption rate data;
- the audio processing process includes an audio decoding process, an audio buffering process, and an audio consumption process.
- The audio decoding rate data is obtained from the RTP packet data through the network cache operation in the network cache queue process and then the audio decoding operation in the audio decoding process;
- the audio buffer queue data is obtained from the audio decoding data through the audio buffering operation in the audio buffering process;
- the audio consumption rate data is obtained from the audio decoding data through the audio buffering operation in the audio buffering process and then the audio consumption operation in the audio consumption process.
- In the embodiment of the present application, after the network cache operation in the network cache queue process, the RTP packet data is passed to the audio decoding process, where the audio decoding operation yields the audio decoding rate data and the audio decoding data. The audio decoding data then undergoes the audio buffering operation in the audio buffering process, which yields the audio buffer queue data; after that buffering operation, the audio consumption operation in the audio consumption process yields the audio consumption rate data.
- The audio consumption in this embodiment of the present application may be audio playback.
- The audio buffer queue data may be audio buffer queue length data; for example, its size may be expressed in bytes.
- The audio buffer queue data may also be the number of buffered units of audio decoding data in the audio buffering process.
- The network buffer queue data is denoted $L_{net}$, and the audio buffer queue data is denoted $L_{audio}$.
- An audio codec generally uses the audio data of a 20 ms unit of time as its basic processing unit.
- Audio data of a 20 ms unit of time is encoded by the audio encoding operation in the audio encoding process; the encoded audio data is packaged into RTP packet data, which enters the network cache queue process of network processing as the initial input data.
- In this embodiment, the RTP packet data passes through the network buffering operation in the network buffering queue process and then the audio decoding operation in the audio decoding process to obtain the audio decoding data.
- The audio decoding data may likewise consist of audio data in 20 ms units of time.
- After the network cache operation in the network cache queue process, the RTP packet data undergoes the network prediction operation in the network prediction process.
- The network prediction operation may calculate the network cache queue data, network jitter data, and network packet loss rate data from the time difference between two adjacent RTP packets entering the network cache queue, the number of RTP packets cached in the network cache queue process, and the serial number data of each RTP packet in the network cache queue process. The network cache queue data, network jitter data, and network packet loss rate data are then fed back to the audio control process, which in this embodiment executes steps S200 and S300.
- $Net_{jitter} = T_{current} - T_{last}$;
- where $T_{current}$ is the time at which the current RTP packet enters the network cache queue process, and $T_{last}$ is the time at which the previous RTP packet entered the network cache queue process.
- Each received RTP packet can be sorted by the time at which it enters the network cache queue, which yields the serial number data of each RTP packet.
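The three network features can be tracked with a small queue object. This is a minimal Python sketch under stated assumptions: the class and field names are hypothetical; the network cache queue data is measured as a packet count (one of the options described above); the jitter is exactly the arrival-time difference $T_{current} - T_{last}$ given above, not the smoothed interarrival jitter of RFC 3550; and the packet loss rate is estimated from gaps in the RTP sequence numbers, a common approach that the application does not spell out. Sequence-number wraparound at 16 bits is ignored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class NetworkCacheQueue:
    """Hypothetical network cache queue that derives L_net, Net_jitter and Net_lost."""

    cached: List[int] = field(default_factory=list)  # RTP sequence numbers, arrival order
    last_arrival_ms: Optional[float] = None
    net_jitter_ms: float = 0.0
    received: int = 0
    lowest_seq: Optional[int] = None
    highest_seq: Optional[int] = None

    def push(self, seq: int, arrival_ms: float) -> None:
        """Network cache operation: enqueue one RTP packet and update statistics."""
        if self.last_arrival_ms is not None:
            # Net_jitter = T_current - T_last
            self.net_jitter_ms = arrival_ms - self.last_arrival_ms
        self.last_arrival_ms = arrival_ms
        self.cached.append(seq)
        self.received += 1
        self.lowest_seq = seq if self.lowest_seq is None else min(self.lowest_seq, seq)
        self.highest_seq = seq if self.highest_seq is None else max(self.highest_seq, seq)

    def pop(self) -> int:
        """Take the oldest packet out of the queue, e.g. to hand it to the decoder."""
        return self.cached.pop(0)

    @property
    def l_net(self) -> int:
        """Network cache queue data, measured here as the number of cached packets."""
        return len(self.cached)

    @property
    def net_lost(self) -> float:
        """Loss rate estimated from sequence-number gaps (assumed method)."""
        if self.lowest_seq is None or self.highest_seq is None:
            return 0.0
        expected = self.highest_seq - self.lowest_seq + 1
        return max(0.0, 1.0 - self.received / expected)
```

For example, packets 1, 2, 4, 5 arriving at 0, 20, 45 and 60 ms give a queue length of 4, a latest jitter value of 15 ms, and a loss estimate of 1 - 4/5 = 0.2, since packet 3 never arrived.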
- the audio feature data in this embodiment includes audio decoding rate data, audio buffer queue data, and audio consumption rate data.
- The audio decoding rate data is denoted $V_{decode}$, and
- the audio consumption rate data is denoted $V_{play}$.
- $V_{decode} = T_{audio1} / (T_2 - T_1)$, where $T_{audio1}$ is the decoding time, in ms, of the audio decoding data in a preset first calculation period during the audio decoding process; $T_2$ is the end time of the first calculation period and $T_1$ is its start time. Here $audio1$ denotes the data length of the audio decoding data in the preset first calculation period during the audio decoding process.
- $V_{play} = T_{audio2} / (T_4 - T_3)$, where $T_{audio2}$ is the consumption time, in ms, of the audio consumption data in a preset second calculation period during the audio consumption process; $T_4$ is the end time of the second calculation period and $T_3$ is its start time. Here $audio2$ denotes the length of the audio consumption data in the preset second calculation period during the audio consumption process.
- The audio consumption data may be audio stretching data, audio compression data, or audio decoding data. The first calculation period and the second calculation period may be the same or different, which is not limited herein.
- For example, the audio consumption rate data may be 0.8; in other embodiments, it can also be expressed as 80%.
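Both rates are the same ratio applied to different stages: milliseconds of audio decoded (for $V_{decode}$) or consumed (for $V_{play}$) divided by the milliseconds of wall-clock time in the calculation period. A minimal Python sketch; the function name and the example numbers are illustrative assumptions.

```python
def audio_rate(audio_ms: float, start_ms: float, end_ms: float) -> float:
    """Return T_audio / (T_end - T_start) for one calculation period.

    audio_ms: duration of audio decoded (V_decode) or consumed (V_play) in the period.
    start_ms, end_ms: wall-clock start and end of the calculation period.
    """
    period_ms = end_ms - start_ms
    if period_ms <= 0:
        raise ValueError("calculation period must have positive length")
    return audio_ms / period_ms


# Hypothetical one-second period: 50 frames of 20 ms decoded, 40 frames played.
v_decode = audio_rate(50 * 20, 1000.0, 2000.0)  # 1.0
v_play = audio_rate(40 * 20, 1000.0, 2000.0)    # 0.8
```

A value of 0.8 here means 800 ms of audio was consumed during 1000 ms of wall-clock time, which matches the 0.8 (80%) form of the audio consumption rate data.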
- A feature vector F is constructed from the network feature data and audio feature data acquired within a preset calculation period.
- The feature vector $F[N] = [L_{net}, L_{audio}, Net_{jitter}, Net_{lost}, V_{decode}, V_{play}]$, where $Net_{lost}$ denotes the network packet loss rate data, is fed back to the audio control process; that is, the network feature data and audio feature data are input into the classifier model to obtain several audio control instructions, and the corresponding audio processing operations in the corresponding audio processing processes are controlled according to those instructions.
- The trained classifier model contains w and b; inputting the feature vector F into the classifier model outputs the category label C, i.e., the audio control instruction.
- $C = \mathrm{sigmoid}(w \cdot F + b)$, where w is the weight matrix, b is the offset, F is the feature vector, sigmoid is the activation function, and C is the category label.
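A single sigmoid output separates only two classes, so the four instruction categories need some multi-class scheme. The Python sketch below assumes one-vs-rest: one hypothetical (w, b) pair per instruction category, with the category whose $\mathrm{sigmoid}(w \cdot F + b)$ score is highest chosen as C. The weights in the example are invented purely for illustration; the feature order follows F above, with queue lengths counted in 20 ms frames (also an assumption).

```python
import math
from typing import Dict, Sequence, Tuple

Model = Tuple[Sequence[float], float]  # (w, b) for one instruction category


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def classify(F: Sequence[float], models: Dict[str, Model]) -> Tuple[str, float]:
    """Return (C, score): the category whose sigmoid(w.F + b) is largest."""
    scores = {
        label: sigmoid(sum(wi * fi for wi, fi in zip(w, F)) + b)
        for label, (w, b) in models.items()
    }
    best = max(scores, key=scores.get)
    return best, scores[best]


# Feature order: [L_net, L_audio, Net_jitter, Net_lost, V_decode, V_play].
# Illustrative weights only, not trained values.
models = {
    "audio_stretch": ([0, -1, 0, 0, 0, 0], 3.0),
    "audio_compression": ([0, 0, 0, 0, 0, 0], -5.0),
    "stop_audio_decoding": ([0, 1, 0, 0, 0, 0], -6.0),
    "average_speed_decoding": ([0, 0, 0, 0, 0, 0], 0.0),
}
```

With these toy weights, a long audio buffer ($L_{audio}$ = 12 frames) scores highest for stop audio decoding, and an empty buffer scores highest for audio stretching.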
- Step S200 includes but is not limited to:
- Step S210: input the network buffer queue data, network jitter data, network packet loss rate data, audio decoding rate data, audio buffer queue data, and audio consumption rate data into the classifier model to obtain an audio decoding control instruction and an audio pitch-and-speed control instruction.
- the network feature data includes network jitter data and network packet loss rate data
- the audio feature data includes audio decoding rate data, audio buffer queue data, and audio consumption rate data.
- The embodiment of the present application models the network feature data in the network processing process and the audio feature data in the audio processing process: inputting the network feature data and audio feature data into the trained classifier model yields several audio control instructions, namely an audio decoding control instruction and an audio pitch-and-speed control instruction.
- the corresponding audio processing operation in the corresponding audio processing process is controlled according to the audio control instruction.
- the embodiments of the present application can adapt to different network conditions, ensure audio effects, and further ensure the effectiveness and real-time performance of video communication.
- By extracting the feature data of each link in the network processing process and the audio processing process, the embodiment of the present application obtains the network cache queue data, network jitter data, network packet loss rate data, audio decoding rate data, audio buffer queue data, and audio consumption rate data.
- The embodiment of the present application avoids manually setting thresholds and formulating rules; instead, a classifier model performs supervised classification of the feature data, which facilitates adaptive issuing of audio control instructions and reduces delay.
- Controlling the corresponding audio processing operation in the corresponding audio processing process includes but is not limited to:
- controlling, according to the audio decoding control instruction, the corresponding audio decoding operation in the corresponding audio decoding process; and
- controlling, according to the audio pitch-and-speed control instruction, the corresponding audio pitch-and-speed operation in the corresponding audio pitch-and-speed process.
- The audio pitch-and-speed process may include audio stretching and audio compression.
- The embodiment of the present application makes reasonable use of the feature data of each link in the network processing process and the audio processing process.
- In this process, no threshold needs to be set manually; the feature data is processed and analyzed by the classifier model.
- The corresponding audio processing operations in the corresponding audio processing processes are controlled adaptively, which gives good operability.
- The audio pitch-and-speed control instruction can be obtained through at least one of the following steps:
- Step S201: when the data length of the audio buffer queue data is less than the first audio consumption data length corresponding to the first audio consumption rate, the audio pitch-and-speed control instruction is an audio stretching control instruction; or
- Step S202: when the data length of the audio buffer queue data is greater than the first audio data length corresponding to the first time interval threshold and less than the second audio data length corresponding to the second time interval threshold, or is greater than the second audio consumption data length corresponding to the second audio consumption rate and less than the third audio consumption data length corresponding to the third audio consumption rate, the audio pitch-and-speed control instruction is an audio compression control instruction, where the second audio consumption data length is greater than the first audio consumption data length.
- The network feature data and audio feature data are input into the classifier model to obtain the corresponding audio control instructions.
- Here, the audio pitch-and-speed control instruction can be an audio stretching control instruction or an audio compression control instruction.
- Steps S201 and S202 are only one classification condition for the audio stretching control instruction or the audio compression control instruction.
- The audio stretching control instruction or the audio compression control instruction can also be obtained from other feature data, which is not repeated here.
- The audio decoding control instruction can be obtained through at least one of the following steps:
- Step S203: when the data length of the audio buffer queue data is greater than the second audio data length corresponding to the second time interval threshold, or greater than the third audio consumption data length corresponding to the third audio consumption rate, the audio decoding control instruction is a stop audio decoding control instruction; or
- Step S204: obtain the first total number of executions of the audio stretching control instruction and the second total number of executions of the audio compression control instruction within a preset time; when no stop audio decoding control instruction occurs within the preset time and the absolute value of the difference between the first and second total numbers of executions is less than a preset threshold, the audio decoding control instruction is an audio decoding average speed control instruction.
- the network feature data and audio feature data are input into the classifier model, and the corresponding audio control instructions can be obtained.
- Here, the audio decoding control instruction can be a stop audio decoding control instruction or an audio decoding average speed control instruction.
- Steps S203 and S204 are only one classification condition for the stop audio decoding control instruction or the audio decoding average speed control instruction.
- The stop audio decoding control instruction or the audio decoding average speed control instruction can also be obtained from other feature data, which is not described in detail here.
- the audio control instruction in this embodiment includes an audio stretch control instruction, an audio compression control instruction, a stop audio decoding control instruction, and an audio decoding average speed control instruction.
- A sending time interval threshold $T_{control}$ may be preset, so that an audio control instruction is issued adaptively each time an interval of $T_{control}$ elapses. $T_{control}$ may correspond to the data length of one RTP packet, i.e., the transmission time of RTP packet data of a preset data length.
- Each item of feature training data, such as network feature training data and audio feature training data, needs a corresponding category label in order to train a better classifier model.
- Inputting the feature vector F into the classifier model outputs a category label corresponding to one of the four categories of audio control instructions above (the audio stretching control instruction, the audio compression control instruction, the stop audio decoding control instruction, and the audio decoding average speed control instruction), thereby achieving control with good operability.
- The audio decoding control instruction corresponding to the category label is then the stop audio decoding control instruction.
- The audio pitch-and-speed control instruction corresponding to the category label is an audio stretching control instruction.
- The audio pitch-and-speed control instruction is an audio compression control instruction, where the second audio consumption data length is greater than the first audio consumption data length.
- The audio decoding control instruction is an audio decoding average speed control instruction; this indicates that the total numbers of executions of the audio stretching control instruction and the audio compression control instruction within the preset time are roughly equal.
- $T_{control}$ may correspond to the data length of one RTP packet;
- the first time interval threshold $4 \times T_{control}$ may correspond to the data length of 4 RTP packets, i.e., the first audio data length;
- the second time interval threshold $6 \times T_{control}$ may correspond to the data length of 6 RTP packets, i.e., the second audio data length.
- The first audio consumption rate $V_{play}'$ is the audio consumption rate corresponding to the first audio consumption data within a preset unit time during the audio consumption process, where the first audio consumption data has the first audio consumption data length. Accordingly, the second audio consumption rate $4 \times V_{play}'$ corresponds to 4 times the first audio consumption data in the preset unit time, i.e., 4 times the first audio consumption data length, which is the second audio consumption data length; and the third audio consumption rate $6 \times V_{play}'$ corresponds to 6 times the first audio consumption data in the preset unit time, i.e., 6 times the first audio consumption data length, which is the third audio consumption data length.
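With the concrete thresholds above, the conditions of steps S201 to S204 can be written as a rule-based labeling function for building training samples. This Python sketch rests on several assumptions: all lengths are expressed in milliseconds of audio, the parameter names are hypothetical, and because the conditions can overlap, the checking order (stop decoding, then compression, then stretching, then average speed) is chosen here rather than stated in the application.

```python
from typing import Optional


def label_sample(
    l_audio_ms: float,       # audio buffer queue length, ms of audio
    t_control_ms: float,     # T_control: audio length of one RTP packet
    play_unit_ms: float,     # first audio consumption data length (at V_play')
    n_stretch: int,          # stretching executions within the preset time
    n_compress: int,         # compression executions within the preset time
    stop_issued: bool,       # was stop decoding issued within the preset time?
    balance_threshold: int,  # preset threshold for S204
) -> Optional[str]:
    """Assign a training label from the S201-S204 conditions (precedence assumed)."""
    # S203: more than 6*T_control, or more than 6x the first consumption length.
    if l_audio_ms > 6 * t_control_ms or l_audio_ms > 6 * play_unit_ms:
        return "stop_audio_decoding"
    # S202: strictly between 4x and 6x, by either measure.
    if (4 * t_control_ms < l_audio_ms < 6 * t_control_ms
            or 4 * play_unit_ms < l_audio_ms < 6 * play_unit_ms):
        return "audio_compression"
    # S201: less than one unit of consumption is buffered.
    if l_audio_ms < play_unit_ms:
        return "audio_stretch"
    # S204: no stop within the window, and stretching/compression roughly balanced.
    if not stop_issued and abs(n_stretch - n_compress) < balance_threshold:
        return "average_speed_decoding"
    return None  # none of these conditions produced a label
```

For example, with $T_{control}$ and the first consumption length both taken as one 20 ms frame, 10 ms of buffered audio is labeled for stretching, 100 ms for compression, and 140 ms for stopping decoding.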
- The category labels above, i.e., the audio control instructions, are assigned according to rules or classification conditions; in some embodiments, the labeled samples are therefore highly repetitive.
- Unsupervised clustering, for example with the k-means clustering algorithm, is performed on the input samples within the same category, which facilitates the extraction of feature data.
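Clustering the samples that share one label can reveal groups of near-duplicate feature vectors. The sketch below is plain Lloyd's k-means in Python with a deterministic initialization (the first k samples). That initialization is an assumption made here for reproducibility; the application names k-means but does not specify k or the initialization.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Point], k: int, max_iter: int = 100) -> Tuple[List[Point], List[int]]:
    """Lloyd's algorithm: return (centroids, cluster index of each point)."""
    if not 0 < k <= len(points):
        raise ValueError("k must be between 1 and the number of points")
    # Deterministic initialization (assumption): the first k samples.
    centroids: List[Point] = [tuple(p) for p in points[:k]]
    assignment: List[int] = []
    for _ in range(max_iter):
        # Assignment step: nearest centroid by Euclidean distance.
        new_assignment = [
            min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points
        ]
        if new_assignment == assignment:
            break  # converged: assignments no longer change
        assignment = new_assignment
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignment) if a == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, assignment
```

Applied to the six-element feature vectors of one category, each resulting centroid could stand in for a group of repetitive samples.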
- When the audio decoding data in the audio buffering process is about to become insufficient for the audio consumption process, i.e., insufficient for the audio consumption operation, an audio stretching control instruction is issued.
- An audio compression control instruction is issued.
- A stop audio decoding control instruction or an audio compression control instruction is issued.
- step S300 includes, but is not limited to:
- Step S310: when the audio pitch-shifting and speed-changing control instruction is an audio stretching control instruction, control, according to the audio stretching control instruction, the audio decoding data to undergo the audio stretching operation in the audio stretching process to obtain audio stretching data;
- Step S320: control the audio stretching data to undergo the audio buffering operation in the audio buffering process and then the audio consumption operation in the audio consumption process.
- the audio stretching control instruction controls the audio decoding data to undergo the audio stretching operation in the audio stretching process to obtain the audio stretching data, which then undergoes the audio buffering operation in the audio buffering process followed by the audio consumption operation in the audio consumption process.
- the audio processing process includes an audio compression process; step S300 includes, but is not limited to:
- Step S330: when the audio pitch-shifting and speed-changing control instruction is an audio compression control instruction, control, according to the audio compression control instruction, the audio decoding data to undergo the audio compression operation in the audio compression process to obtain audio compression data;
- Step S340: control the audio compression data to undergo the audio buffering operation in the audio buffering process and then the audio consumption operation in the audio consumption process.
- the audio compression control instruction controls the audio decoding data to undergo the audio compression operation in the audio compression process to obtain the audio compression data, which then undergoes the audio buffering operation in the audio buffering process followed by the audio consumption operation in the audio consumption process.
- the audio decoding data in the embodiments of the present application includes silent data, voiced data, and unprocessed decoded data.
- the audio pitch-shifting and speed-changing process includes an audio stretching process and an audio compression process.
- the audio stretching data, i.e., the silent stretching data and the voiced stretching data
- the audio compression data, i.e., the silent compression data and the voiced compression data
- audio compression control command after which the audio compressed data and the unprocessed decoded data are processed by audio
- the audio consuming operation in the audio consuming process is performed.
- controllable delivery of audio pitch-shifting and speed-changing control instructions, such as audio stretching control instructions or audio compression control instructions
- the audio decoding data can undergo audio pitch shifting and speed changing
- the audio pitch-shifted and speed-changed data, that is, the audio stretching data or the audio compression data
- Step S300 includes at least one of the following:
- Step S350: when the audio decoding control instruction is a stop-audio-decoding control instruction, control, according to that instruction, the network real-time transport protocol (RTP) packet data to stop the audio decoding operation in the stop-audio-decoding process; or
- Step S360: when the audio decoding control instruction is an audio decoding average-speed control instruction, control, according to that instruction, the network RTP packet data to undergo the average-speed audio decoding operation in the average-speed audio decoding process.
- according to the stop-audio-decoding control instruction, the embodiments of the present application control the network RTP packet data to stop the audio decoding operation in the audio decoding process. This keeps the audio buffering operation in the audio buffering process running normally and prevents data from piling up in the audio buffering process, which reduces delay.
- the embodiments of the present application control the average-speed audio decoding operation applied to the network RTP packet data, ensuring that audio decoding proceeds at a uniform rate and thereby yielding better audio quality.
- the embodiments of the present application can address problems such as an improperly set buffer queue length, untimely audio stretching or compression, and large audio delay.
- in the audio control process, the classifier model classifies the network feature data and audio feature data to obtain audio control instructions, thereby effectively reducing delay while balancing delay against audio quality.
- the embodiment of the present application also provides an audio control device, configured to implement the audio control method described in the first aspect above.
- the embodiment of the present application also provides an audio control system, including:
- An audio control device configured to execute the audio control method as described in the first aspect above;
- It also includes a network buffer queue device, a network prediction device, an audio decoding device, an audio buffer queue device, an audio pitch-shifting and speed-changing device, and an audio consumption device;
- the network buffer queue device, the audio decoding device, the audio buffer queue device, and the audio consumption device are connected in sequence, and each of them is also connected to the audio control device; the network prediction device is connected to both the network buffer queue device and the audio control device; and the audio pitch-shifting and speed-changing device is connected to both the audio buffer queue device and the audio control device.
- Network buffer queue device: stores and forwards the network RTP packet data to absorb network jitter. It corresponds to the network buffer queue process and performs the network buffering operation; the network buffer queue data is obtained after the network buffer queue device performs this operation on the RTP packet data.
- Network prediction device: models the network RTP packet data and obtains network prediction data, such as network jitter data and network packet loss rate data, through a network prediction operation, for example by computing statistics of network conditions over the previous cycle. It corresponds to the network prediction process and performs the network prediction operation; after the network buffer queue device has performed the network buffering operation on the RTP packet data, the network prediction device performs the network prediction operation to obtain the network jitter data and network packet loss rate data.
- Audio decoding device: performs the audio decoding operation on the network RTP packet data to obtain audio decoding rate data and audio decoding data. It corresponds to the audio decoding process; after the network buffer queue device has performed the network buffering operation on the RTP packet data, the audio decoding device performs the audio decoding operation to obtain the audio decoding rate data and audio decoding data.
- Audio buffer queue device: stores and forwards the audio decoding data produced by the audio decoding device, allowing the audio buffer queue to resize adaptively. It corresponds to the audio buffering process and performs the audio buffering operation on the audio decoding data obtained after the audio decoding device has decoded the RTP packet data.
- Audio pitch-shifting and speed-changing device: applies different pitch-shifting and speed-changing operations, such as audio stretching or audio compression, to different kinds of audio decoding data, such as silent data and voiced data, to preserve audio quality.
- the audio pitch-shifting and speed-changing device corresponds to the audio pitch-shifting and speed-changing process and performs the audio pitch-shifting and speed-changing operations.
- the audio pitch-shifting and speed-changing operation includes an audio stretching operation and an audio compression operation; correspondingly, the audio pitch-shifting and speed-changing device further includes an audio stretching module and an audio compression module.
- according to the audio stretching control instruction, the audio control device controls the audio stretching module to stretch the audio decoding data into audio stretching data, and then controls the audio stretching data to undergo the audio buffering operation in the audio buffer queue device followed by the audio consumption operation in the audio consumption device. Alternatively, according to the audio compression control instruction, the audio control device controls the audio compression module to compress the audio decoding data into audio compression data, and then controls the audio compression data to undergo the audio buffering operation in the audio buffer queue device followed by the audio consumption operation in the audio consumption device.
- Audio consumption device: obtains the audio consumption (playback) rules and playback congestion of different platforms, such as audio consumption rate data, and feeds them back to the audio control device. It corresponds to the audio consumption process and performs the audio consumption operation; in some embodiments it may be an audio playback device. After the audio buffer queue device has performed the audio buffering operation on the audio decoding data, the audio consumption device performs the audio consumption operation to obtain the audio consumption rate data.
- Audio control device: issues audio control instructions, for example a stop-audio-decoding control instruction or an audio decoding average-speed control instruction to control whether the audio decoding device decodes, and an audio stretching control instruction or an audio compression control instruction to control whether the audio pitch-shifting and speed-changing device stretches or compresses audio.
- the audio control device also includes a classifier module.
- the classifier module can learn from the feature data in a supervised manner and output the category label corresponding to the feature data, that is, the corresponding audio control instruction, so that control adapts to changing conditions.
- the embodiments of the present application adapt to network processing conditions and audio processing conditions with few parameters and little manual involvement; a classifier module such as a trained classifier model can be fine-tuned, and iteration is fast.
- the embodiment of the third aspect of the present application also provides an audio control device, the audio control device includes: a memory, a processor, and a computer program stored in the memory and operable on the processor.
- the processor and memory can be connected by a bus or other means.
- memory can be used to store non-transitory software programs and non-transitory computer-executable programs.
- the memory may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage devices.
- the memory may include memory located remotely from the processor, which remote memory may be connected to the processor via a network. Examples of the aforementioned networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
- the non-transitory software programs and instructions required to implement the audio control method of the first-aspect embodiment are stored in the memory and, when executed by the processor, perform the audio control method of the above embodiments, for example, the method steps shown in the figures described above.
- the device embodiments described above are only illustrative, and the units described as separate components may or may not be physically separated, that is, they may be located in one place, or may be distributed to multiple network units. Part or all of the modules can be selected according to actual needs to achieve the purpose of the solution of this embodiment.
- an embodiment of the present application also provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or controller, for example the processor in the above device embodiment, cause the processor to perform the audio control method of the above embodiments, for example, method steps S100 to S300 in FIG. 1, method step S210 in FIG. 3, method steps S203 to S204 in FIG. 4, method steps S310 to S320 in FIG. 5, method steps S330 to S340 in FIG. 6, and method steps S350 to S360 in FIG. 7.
- the embodiments of the present application include: obtaining network feature data in the network processing process and audio feature data in the audio processing process; inputting the network feature data and audio feature data into the classifier model to obtain several audio control instructions by classification; and then controlling, according to the audio control instructions, the corresponding audio processing operation in the corresponding audio processing process.
- the embodiments of the present application extract features from the feature data of the network processing process and the audio processing process, so that the classifier model can process the feature data and produce a classification result, namely several audio control instructions; these instructions are then issued adaptively to control the corresponding audio processing operations in the corresponding audio processing processes, which effectively safeguards audio quality and is easy to apply in practice.
- Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and that can be accessed by a computer.
- communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
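The threshold rules above (4*T_control and 6*T_control, plus the comparison of decoding against consumption) can be read as a rule-based labeler, the kind of thing that might generate category labels for training the classifier. The sketch below is a minimal illustration under stated assumptions: buffered audio is measured in milliseconds, `packet_ms` stands for T (the audio carried by one RTP packet), and the exact decision order (stretch below 4*T, stop decoding and compress above 6*T, compress in between only while decoding outpaces playback, otherwise decode at an even pace) is our reading of the text, not a confirmed specification. `Instruction` and `label_instructions` are hypothetical names.

```python
from enum import Enum


class Instruction(Enum):
    # Hypothetical labels mirroring the control instructions named in the text.
    STRETCH = "audio_stretch"
    COMPRESS = "audio_compress"
    STOP_DECODING = "stop_audio_decoding"
    AVERAGE_SPEED_DECODING = "audio_decoding_average_speed"


def label_instructions(buffered_ms, packet_ms, decode_rate, play_rate):
    """Return the control instructions for one observation (assumed rules).

    buffered_ms: audio currently held in the audio buffer, in milliseconds.
    packet_ms:   T, the audio duration carried by one RTP packet (assumed).
    decode_rate, play_rate: audio decoded / consumed per unit time.
    """
    low = 4 * packet_ms   # first time interval threshold, 4*T
    high = 6 * packet_ms  # second time interval threshold, 6*T
    if buffered_ms < low:
        # Buffer about to run dry: slow playback down by stretching.
        return [Instruction.STRETCH]
    if buffered_ms > high:
        # Buffer piling up: pause decoding and speed playback up.
        return [Instruction.STOP_DECODING, Instruction.COMPRESS]
    if decode_rate > play_rate:
        # Inside the band but still growing: compress.
        return [Instruction.COMPRESS]
    # Assumed default: keep decoding at an even pace.
    return [Instruction.AVERAGE_SPEED_DECODING]
```

For example, with 20 ms packets a 30 ms buffer yields a stretch instruction, while a 200 ms buffer yields stop-decoding plus compression. In the patent's scheme a trained classifier replaces such hand-written rules at run time; rules like these are only one plausible way to produce the labels.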
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Multimedia (AREA)
- Molecular Biology (AREA)
- Computational Linguistics (AREA)
- Biophysics (AREA)
- Biomedical Technology (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Data Exchanges In Wide-Area Networks (AREA)
Abstract
An audio control method and apparatus, a device, and a computer readable storage medium. The method comprises: obtaining network feature data in a network processing process and audio feature data in an audio processing process (S100); inputting the network feature data and the audio feature data into a classifier model to obtain multiple audio control instructions (S200); and controlling a corresponding audio processing operation in a corresponding audio processing process according to the audio control instructions (S300).
Description
Cross-Reference to Related Applications
This application is based on and claims priority to Chinese patent application No. 202111145232.2, filed on September 28, 2021, the entire contents of which are incorporated herein by reference.
The embodiments of the present application relate to, but are not limited to, the field of network communication technologies, and in particular to an audio control method, apparatus, device, and computer-readable storage medium.
Network communication systems often carry audio. In some cases, various methods are used to optimize the network, such as optimizing the network topology or having the sender transmit redundant packets, to safeguard audio quality at the playback end. However, optimizing the network topology offers little controllability and is heavily affected by objective factors, so it is hard to apply in practice; sending redundant packets requires support from the sender and, under congestion, makes network conditions even worse, degrading audio quality.
Summary
The following is an overview of the subject matter described in detail herein. This overview is not intended to limit the scope of the claims.
Embodiments of the present application provide an audio control method, apparatus, device, and computer-readable storage medium.
In a first aspect, an embodiment of the present application provides an audio control method, including: acquiring network feature data in a network processing process and audio feature data in an audio processing process; inputting the network feature data and the audio feature data into a classifier model to obtain several audio control instructions; and controlling, according to the audio control instructions, the corresponding audio processing operation in the corresponding audio processing process.
In a second aspect, an embodiment of the present application further provides an audio control apparatus configured to execute the audio control method according to the first aspect.
In a third aspect, an embodiment of the present application further provides an audio control device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor, when executing the computer program, implements the audio control method according to the first aspect.
In a fourth aspect, an embodiment of the present application further provides a computer-readable storage medium storing computer-executable instructions for executing the audio control method according to the first aspect.
Additional features and advantages of the application will be set forth in the description that follows and will in part be apparent from the description or learned by practicing the application. The objectives and other advantages of the application can be realized and obtained by the structures particularly pointed out in the description, claims, and drawings.
The accompanying drawings provide a further understanding of the technical solutions of the present application and constitute part of the description. Together with the embodiments of the application, they serve to explain, not to limit, the technical solutions of the present invention.
FIG. 1 is a schematic flowchart of an audio control method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of obtaining audio control instructions according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of an audio stretching control instruction or an audio compression control instruction according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a stop-audio-decoding control instruction or an audio decoding average-speed control instruction according to an embodiment of the present application;
FIG. 5 is a schematic flowchart of audio stretching data according to an embodiment of the present application;
FIG. 6 is a schematic flowchart of audio compression data according to an embodiment of the present application;
FIG. 7 is a schematic flowchart of a stop-audio-decoding control instruction or an audio decoding average-speed control instruction according to another embodiment of the present application;
FIG. 8 is a schematic structural diagram of an audio control system according to an embodiment of the present application.
To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here only explain the application and do not limit it.
It should be noted that although functional modules are divided in the device schematic and a logical order is shown in the flowcharts, in some cases the steps shown or described may be performed with a module division different from that of the device, or in an order different from that of the flowcharts. The terms "first", "second", and the like in the description, claims, and drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence.
In some cases, various methods are used to optimize the network, for example to reduce network jitter and packet loss, such as optimizing the network topology or having the sender transmit redundant packets, to safeguard audio quality at the playback end. However, optimizing the network topology offers little controllability and is heavily affected by objective factors, so it is hard to apply in practice; sending redundant packets requires the sender to support it under congestion, yet doing so under congestion makes network conditions even worse, degrading audio quality.
Based on this, embodiments of the present application provide an audio control method, apparatus, device, and computer-readable storage medium that effectively safeguard audio quality and are easy to apply in practice.
It can be understood that the embodiments of the present application specifically concern real-time audio reception and playback, in application scenarios including but not limited to video conferencing, video playback, real-time voice chat, and live-stream co-hosting (lianmai).
The embodiments of the present application are further described below with reference to the accompanying drawings.
An embodiment of the first aspect of the present application provides an audio control method, as shown in FIG. 1, which is a schematic flowchart of the audio control method according to an embodiment of the present application.
The audio control method of the embodiments of the present application includes, but is not limited to, the following steps:
Step S100: acquire network feature data in the network processing process and audio feature data in the audio processing process;
Step S200: input the network feature data and audio feature data into the classifier model to obtain several audio control instructions;
Step S300: according to the audio control instructions, control the corresponding audio processing operation in the corresponding audio processing process.
It can be understood that the embodiments of the present application extract features from the feature data of the network processing process and the audio processing process, so that the classifier model can process that feature data and produce a classification result, namely several audio control instructions. These audio control instructions are then issued adaptively to control the corresponding audio processing operations in the corresponding audio processing processes, which effectively safeguards audio quality and is easy to apply in practice.
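Steps S100 to S300 amount to a small control loop: gather features, classify, dispatch. The sketch below is a hedged illustration of that loop, not the application's implementation. `FeatureSnapshot`, its field names, `control_step`, and the `handlers` mapping are hypothetical; the classifier is any callable that maps a feature vector to a list of instruction names.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class FeatureSnapshot:
    # Network feature data (hypothetical field names).
    net_queue_len: int   # packets waiting in the network buffer queue
    jitter_ms: float     # from the network prediction operation
    loss_rate: float     # fraction of packets lost, 0..1
    # Audio feature data (hypothetical field names).
    decode_rate: float   # audio decoded per unit time
    buffered_ms: float   # audio held in the audio buffer queue
    play_rate: float     # audio consumed per unit time

    def as_vector(self) -> List[float]:
        return [float(self.net_queue_len), self.jitter_ms, self.loss_rate,
                self.decode_rate, self.buffered_ms, self.play_rate]


def control_step(snapshot: FeatureSnapshot,
                 classify: Callable[[List[float]], List[str]],
                 handlers: Dict[str, Callable[[], None]]) -> List[str]:
    """Run S100 -> S200 -> S300 once and return the issued instructions."""
    features = snapshot.as_vector()     # S100: gathered feature data
    instructions = classify(features)   # S200: classifier model
    for name in instructions:           # S300: control the matching operation
        handlers[name]()
    return instructions
```

Keeping the classifier behind a plain callable means a linear SVM, or any of the alternative classifiers the text mentions, can be swapped in without touching the dispatch logic.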
It can be understood that the classifier model in the embodiments of the present application may be a support vector machine (SVM). A support vector machine is a generalized linear classifier that performs binary classification of feature data (such as network feature data and audio feature data) through supervised learning.
The embodiments of the present application use a support vector machine as the classifier model to classify network feature data and audio feature data through supervised learning. Support vector machines commonly use one of three kernel functions: linear, polynomial, or Gaussian. Because a linear kernel has few parameters and iterates quickly, it works well for supervised classification; this embodiment therefore uses a linear-kernel support vector machine as the classifier model, which offers shorter training time and higher accuracy.
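A linear-kernel SVM minimizes L2-regularized hinge loss. As a self-contained sketch, the code below trains one with the Pegasos stochastic subgradient method; that solver is our choice for illustration and is not named in the text (in practice a library such as scikit-learn's `LinearSVC` would typically be used). It is binary, as the text describes; issuing several instruction types would need, for example, one classifier per instruction (one-vs-rest). The `lam` and `epochs` defaults are assumptions.

```python
import random
from typing import List, Sequence


def train_linear_svm(xs: Sequence[Sequence[float]], ys: Sequence[int],
                     lam: float = 0.01, epochs: int = 200,
                     seed: int = 0) -> List[float]:
    """Pegasos-style training of a linear SVM (hinge loss + L2 penalty).

    ys must be +1 / -1. A constant 1.0 feature is appended so the bias is
    learned as an ordinary weight (and is therefore also regularized).
    """
    rng = random.Random(seed)
    data = [(list(x) + [1.0], y) for x, y in zip(xs, ys)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for x, y in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = y * sum(wi * xi for wi, xi in zip(w, x))
            # Shrink step from the L2 regularizer.
            w = [(1 - eta * lam) * wi for wi in w]
            if margin < 1:
                # Hinge loss is active: move toward the correct side.
                w = [wi + eta * y * xi for wi, xi in zip(w, x)]
    return w


def predict(w: List[float], x: Sequence[float]) -> int:
    """Sign of the linear decision function w . [x, 1]."""
    score = sum(wi * xi for wi, xi in zip(w, list(x) + [1.0]))
    return 1 if score >= 0 else -1
```

The feature vectors here would be the concatenated network and audio features; in a real system they should be scaled to comparable ranges first, since a linear model is sensitive to feature magnitude.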
It can be understood that other classifiers, including but not limited to deep belief networks, linear classifiers, and random forests, may replace the support vector machine in the embodiments of the present application.
It can be understood that, in the embodiments of the present application, the classifier model must be built before the network feature data and audio feature data are input into it.
The classifier model can be built by offline training and online training.
For example, in offline training, feature training data such as network feature training data and audio feature training data are collected from different platforms to construct training samples, which are stored. The training samples are then fed to a high-performance machine to train the classifier model. In offline training, unsupervised clustering need not be applied to the feature training vectors corresponding to the training samples.
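Where clustering is applied, as in the k-means step mentioned for samples within the same category, plain Lloyd's k-means is enough to group near-duplicate feature vectors. This is a minimal sketch: centroids start at the first k points, a deterministic simplification (k-means++ initialization is the usual choice), and how the resulting clusters feed feature extraction is not specified in the text.

```python
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int,
           iters: int = 50) -> Tuple[List[List[float]], List[int]]:
    """Lloyd's k-means; returns (centroids, label per point)."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break  # assignments stable: converged
        labels = new_labels
    return centroids, labels
```

For instance, feature vectors gathered under near-identical network conditions would land in the same cluster, so one representative per cluster can stand in for many repeated samples.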
In online training, the offline-trained classifier model is iteratively fine-tuned according to the network processing and audio processing conditions of different platforms, making it more robust. Online training requires a dedicated sub-thread for iterative fine-tuning, and the interval between two online training rounds must not be too short.
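The two constraints on online training, a dedicated sub-thread and a minimum interval between rounds, can be enforced with a small gate object. The sketch assumes a hypothetical `fine_tune(batch)` callback that updates the offline-trained model in place; the 60-second default interval is an arbitrary placeholder, and the injectable clock exists only to make the gate testable.

```python
import threading
import time
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[Sequence[float], int]


class OnlineTuner:
    """Runs fine-tuning at most once every min_interval_s seconds."""

    def __init__(self, fine_tune: Callable[[List[Sample]], None],
                 min_interval_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._fine_tune = fine_tune      # hypothetical model-update callback
        self._min_interval = min_interval_s
        self._clock = clock
        self._last: Optional[float] = None
        self._pending: List[Sample] = []
        self._lock = threading.Lock()

    def add_sample(self, x: Sequence[float], label: int) -> None:
        """Called from the audio path; only appends under the lock."""
        with self._lock:
            self._pending.append((x, label))

    def maybe_tune(self) -> bool:
        """Called periodically by the dedicated tuning sub-thread."""
        with self._lock:
            now = self._clock()
            if not self._pending:
                return False
            if self._last is not None and now - self._last < self._min_interval:
                return False  # too soon after the previous round
            batch, self._pending = self._pending, []
            self._last = now
        self._fine_tune(batch)  # heavy work runs outside the lock
        return True
```

A daemon `threading.Thread` that calls `maybe_tune()` in a sleep loop would play the role of the sub-thread; releasing the lock before fine-tuning keeps `add_sample` on the audio path from blocking during training.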
可以理解的是,在一些情形下,在音频通信中,为了保障播放端的音频质量,可以采用播放端设计静态缓冲队列的方法来优化网络,例如优化网络抖动和丢包的情况。通过播放端设计静态缓冲队列的方法,虽然可以抵抗住大抖动,但是在缓冲队列长度设置不合理的情况下,会使得延时加大,导致实时性变差,从而影响音频效果。It can be understood that in some cases, in audio communication, in order to ensure the audio quality of the playback end, the method of designing a static buffer queue at the playback end can be used to optimize the network, such as optimizing network jitter and packet loss. Through the method of designing a static buffer queue on the playback side, although it can resist large jitters, if the length of the buffer queue is set unreasonably, it will increase the delay, resulting in poor real-time performance, thereby affecting the audio effect.
Accordingly, in this embodiment the network feature data includes network cache queue data, network jitter data, and network packet loss rate data, and the network processing process includes a network cache queue process and a network prediction process.
The network cache queue data is obtained by applying the network caching operation of the network cache queue process to the network Real-time Transport Protocol (RTP) packet data. The network jitter data and the network packet loss rate data are both obtained by first applying that network caching operation to the RTP packet data and then applying the network prediction operation of the network prediction process.
In this embodiment, encoded audio data is packetized into RTP packet data, which serves as the initial input of the audio control method. As the RTP packet data passes through the network cache queue process, the corresponding network caching operation is applied to it, buffering this initial input. For example, the network cache queue process may sort the received RTP packets by the time at which each entered the queue, assigning each packet a sequence number; these sequence numbers are later used to compute the network jitter data and the network packet loss rate data.
The network cache queue data may be network cache queue length data, whose size may, for example, be expressed in bytes. The network cache queue data corresponds to the number of RTP packets currently buffered in the network cache queue process.
When RTP packet data enters the network cache queue process, the corresponding network caching operation yields the network cache queue data. After that caching operation, the RTP packet data is taken out of the queue and enters the network prediction process, where the corresponding network prediction operation yields the network jitter data and the network packet loss rate data. Using a dynamic buffer queue in this way effectively reduces latency and improves real-time performance.
The audio feature data in this embodiment includes audio decoding rate data, audio buffer queue data, and audio consumption rate data, and the audio processing process includes an audio decoding process, an audio buffering process, and an audio consumption process.
The audio decoding rate data is obtained by applying the network caching operation of the network cache queue process to the RTP packet data and then the audio decoding operation of the audio decoding process. The audio buffer queue data is obtained by decoding the RTP packet data in the audio decoding process to obtain decoded audio data and then applying the audio buffering operation of the audio buffering process to the decoded audio data. The audio consumption rate data is obtained by applying the audio buffering operation to the decoded audio data and then the audio consumption operation of the audio consumption process.
In other words, after the network caching operation, the RTP packet data is passed to the audio decoding process, whose audio decoding operation yields both the audio decoding rate data and the decoded audio data. The decoded audio data then undergoes the audio buffering operation, which yields the audio buffer queue data, and, after buffering, the audio consumption operation, which yields the audio consumption rate data.
The audio consumption in this embodiment may be audio playback.
The audio buffer queue data may be audio buffer queue length data, whose size may, for example, be expressed in bytes. The audio buffer queue data corresponds to the number of buffered units of decoded audio data in the audio buffering process.
The network cache queue data is denoted $L_{net}$, and the audio buffer queue data is denoted $L_{audio}$.
In one embodiment, audio encoding and decoding typically use 20 ms of audio as the basic processing unit. For example, each 20 ms unit of audio data is encoded by the audio encoding operation of the audio encoding process, and the encoded data is packetized into RTP packet data, which enters the network cache queue process of the network processing process as the initial input. After the network caching operation, the RTP packet data is decoded by the audio decoding operation of the audio decoding process to obtain decoded audio data, which likewise corresponds to 20 ms of audio.
After the network caching operation, the RTP packet data undergoes the network prediction operation of the network prediction process. For example, this operation may compute the network cache queue data, the network jitter data, and the network packet loss rate data from the arrival-time difference between two adjacent RTP packets entering the network cache queue, the number of RTP packets buffered in the queue, and the sequence number of each buffered RTP packet. These three quantities are then fed back to the audio control process, that is, to the execution of steps S200 and S300.
The time difference between two adjacent RTP packets entering the network cache queue process is defined as $Net_{jitter}$, which serves as the network jitter data of the RTP packet data:

$Net_{jitter} = T_{current} - T_{last}$

where $T_{current}$ is the time at which the current RTP packet entered the network cache queue process and $T_{last}$ is the time at which the previous RTP packet entered it.
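A minimal sketch of the jitter definition above, assuming arrival times are recorded in milliseconds in the order the packets entered the network cache queue; the function name and input format are hypothetical. It follows the document's raw inter-arrival difference, which is not the smoothed interarrival jitter estimator defined in RFC 3550.

```python
def inter_arrival_jitter(arrival_times_ms):
    """Return Net_jitter = T_current - T_last for every packet after the first.

    arrival_times_ms: times (ms) at which consecutive RTP packets entered the
    network cache queue, in arrival order (hypothetical input format).
    """
    # Pair each arrival with the one before it and take the difference.
    return [current - last
            for last, current in zip(arrival_times_ms, arrival_times_ms[1:])]
```

For example, `inter_arrival_jitter([0, 20, 45, 60])` gives `[20, 25, 15]`; with packets carrying 20 ms of audio each, deviations from 20 ms reflect network jitter.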
The network cache queue process may sort the received RTP packets by the time at which each entered the queue, yielding a sequence number for each RTP packet.
The network packet loss rate data is denoted $Net_{lost}$. It is obtained by dividing the number of RTP packets buffered in the network cache queue process by the difference between the maximum and minimum sequence numbers of the buffered RTP packets:

$Net_{lost} = \frac{num(rtp)}{index(rtp)_{max} - index(rtp)_{min}}$

where rtp denotes RTP packet data, $num(rtp)$ is the number of RTP packets buffered in the network cache queue process, $index(rtp)_{max}$ is the maximum sequence number of the buffered RTP packets, and $index(rtp)_{min}$ is the minimum sequence number of the buffered RTP packets.
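The formula above can be sketched as follows, assuming the sequence numbers of the buffered packets are available as a list; the function names are hypothetical and 16-bit sequence-number wraparound is ignored. Taken literally, the formula is a count of buffered packets over the sequence-number span: with no loss it is slightly above 1 (five consecutive packets give 5/4 = 1.25), and it falls as packets go missing. For comparison only, the sketch also computes the conventional loss fraction, 1 minus received over expected.

```python
def net_lost(seq_numbers):
    """Net_lost = num(rtp) / (index(rtp)_max - index(rtp)_min), as defined above.

    seq_numbers: sequence numbers of the RTP packets currently buffered.
    Wraparound of 16-bit RTP sequence numbers is ignored in this sketch.
    """
    span = max(seq_numbers) - min(seq_numbers)
    if span == 0:
        raise ValueError("need at least two distinct sequence numbers")
    return len(seq_numbers) / span


def loss_fraction(seq_numbers):
    """Conventional loss fraction, 1 - received / expected, for comparison."""
    expected = max(seq_numbers) - min(seq_numbers) + 1
    return 1 - len(seq_numbers) / expected
```

With packet 3 missing from `[0, 1, 2, 4, 5]`, `net_lost` gives 5/5 = 1.0 while `loss_fraction` gives 1/6.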
The audio feature data in this embodiment includes audio decoding rate data, audio buffer queue data, and audio consumption rate data. The audio decoding rate data is denoted $V_{decode}$, and the audio consumption rate data is denoted $V_{play}$.
$V_{decode} = \frac{T_{audio1}}{T_2 - T_1}$

where $T_{audio1}$ is the decoding time, in ms, corresponding to the audio data decoded during a preset first calculation period of the audio decoding process, $T_2$ is the end time of the first calculation period, and $T_1$ is its start time. Here audio1 denotes the length of the audio data decoded during the first calculation period.
$V_{play} = \frac{T_{audio2}}{T_4 - T_3}$

where $T_{audio2}$ is the consumption time, in ms, corresponding to the audio consumption data during a preset second calculation period of the audio consumption process, $T_4$ is the end time of the second calculation period, and $T_3$ is its start time. Here audio2 denotes the length of the audio consumption data within the second calculation period; the audio consumption data may be stretched audio data, compressed audio data, or decoded audio data. The first and second calculation periods may be the same or different. For example, if the second calculation period is 100 ms and the audio consumption data of a given length corresponds to a consumption time of 80 ms, the audio consumption rate data is 80/100 = 0.8, which other embodiments may express as 80%.
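Both rates share the same form, an audio time divided by the length of the calculation period, so one helper covers $V_{decode}$ and $V_{play}$. This is a minimal sketch assuming all times are in milliseconds; the name `audio_rate` is hypothetical.

```python
def audio_rate(audio_time_ms, period_start_ms, period_end_ms):
    """V = T_audio / (T_end - T_start): audio time handled per unit of wall-clock time.

    Use it for V_decode (T_audio1, T_1, T_2) or V_play (T_audio2, T_3, T_4).
    """
    period = period_end_ms - period_start_ms
    if period <= 0:
        raise ValueError("calculation period must be positive")
    return audio_time_ms / period
```

The worked example above becomes `audio_rate(80, 0, 100)`, which evaluates to 0.8.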
The network cache queue data, network jitter data, network packet loss rate data, audio decoding rate data, audio buffer queue data, and audio consumption rate data are all instantaneous values. Within a preset calculation period, the network feature data and the audio feature data are collected and assembled into a feature vector

$F[N] = [L_{net}, L_{audio}, Net_{jitter}, Net_{lost}, V_{decode}, V_{play}]$

which is fed back to the audio control process: the network feature data and audio feature data are input into the classifier model to obtain audio control instructions, and the corresponding audio processing operations in the corresponding audio processing processes are controlled according to those instructions.
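A minimal sketch of one feature snapshot and its flattening into F, assuming the six quantities are plain floats sampled once per calculation period; the class and field names are hypothetical. The field order matches the vector above.

```python
from dataclasses import astuple, dataclass


@dataclass
class FeatureSnapshot:
    """One set of instantaneous features sampled in a calculation period."""
    l_net: float       # L_net: network cache queue data
    l_audio: float     # L_audio: audio buffer queue data
    net_jitter: float  # Net_jitter: inter-arrival difference, ms
    net_lost: float    # Net_lost: packet loss feature as defined above
    v_decode: float    # V_decode: audio decoding rate
    v_play: float      # V_play: audio consumption rate

    def to_vector(self):
        """F = [L_net, L_audio, Net_jitter, Net_lost, V_decode, V_play]."""
        return list(astuple(self))
```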
The trained classifier model contains the parameters w and b. Inputting the feature vector F into the classifier model outputs a class label C, which is the audio control instruction: $C = \mathrm{sigmoid}(wF + b)$, where w is the feature (weight) matrix, b is the offset, F is the feature vector, sigmoid is the activation function, and C is the class label. Predicting the audio control instruction to be issued from the feature vector in this way reduces latency and thereby safeguards audio quality.
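The classification step can be sketched in NumPy as follows. This is a minimal sketch assuming one sigmoid output per instruction class and choosing the highest-scoring class; the source gives only C = sigmoid(wF + b), so the argmax step, the label order in the hypothetical `LABELS` tuple, and the (4, 6) shape of `W` are assumptions.

```python
import numpy as np

# Hypothetical label order; the source names the four instructions but not their indices.
LABELS = ("stretch", "compress", "stop_decoding", "uniform_decoding")


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def predict_instruction(W, b, F):
    """Compute C = sigmoid(W @ F + b) and return (best label, per-class scores).

    W: (4, 6) trained weights, b: (4,) offsets, F: length-6 feature vector.
    """
    scores = sigmoid(W @ np.asarray(F, dtype=float) + b)
    return LABELS[int(np.argmax(scores))], scores
```

In use, `W` and `b` would come from the offline training and online fine-tuning described earlier; here they are simply arguments.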
Referring to FIG. 2, step S200 includes, but is not limited to:
Step S210: input the network cache queue data, network jitter data, network packet loss rate data, audio decoding rate data, audio buffer queue data, and audio consumption rate data into the classifier model to obtain an audio decoding control instruction and an audio pitch-and-speed control instruction.
Here, the network feature data includes the network jitter data and the network packet loss rate data, and the audio feature data includes the audio decoding rate data, the audio buffer queue data, and the audio consumption rate data.
This embodiment models the network feature data of the network processing process and the audio feature data of the audio processing process: these data are input into the trained classifier model, which classifies them into audio control instructions, namely audio decoding control instructions and audio pitch-and-speed control instructions. The corresponding audio processing operations are then controlled according to these instructions without degrading the audio experience, such as audio consumption (playback) quality. The embodiment can thus adapt to varying network conditions, preserve audio quality, and ensure the effectiveness and real-time performance of video communication.
By extracting the feature data corresponding to each stage of the network processing process and the audio processing process, namely the network cache queue data, network jitter data, network packet loss rate data, audio decoding rate data, audio buffer queue data, and audio consumption rate data, this embodiment avoids manually set thresholds and hand-crafted rules. Instead, the classifier model performs supervised classification of the feature data so that audio control instructions are issued adaptively, reducing latency.
Controlling the corresponding audio processing operations according to the audio control instructions includes, but is not limited to:
controlling the corresponding audio decoding operation of the audio decoding process according to the audio decoding control instruction; and/or
controlling the corresponding audio pitch-and-speed operation of the audio pitch-and-speed process according to the audio pitch-and-speed control instruction.
The audio pitch-and-speed process may include an audio stretching process and an audio compression process.
This embodiment makes effective use of the feature data from each stage of the network processing and audio processing processes without requiring thresholds to be set: the classifier model processes the feature data and adaptively controls the corresponding audio processing operations, which makes the approach easy to apply in practice.
Referring to FIG. 3, in some embodiments the audio pitch-and-speed control instruction may be obtained through at least one of the following steps:
Step S201: when the data length of the audio buffer queue data is less than the first audio consumption data length corresponding to the first audio consumption rate, the audio pitch-and-speed control instruction is an audio stretching control instruction; or
Step S202: when the data length of the audio buffer queue data is greater than the first audio data length corresponding to the first time interval threshold and less than the second audio data length corresponding to the second time interval threshold, or is greater than the second audio consumption data length corresponding to the second audio consumption rate and less than the third audio consumption data length corresponding to the third audio consumption rate, the audio pitch-and-speed control instruction is an audio compression control instruction, where the second audio consumption data length is greater than the first audio consumption data length.
Under these classification conditions, inputting the network feature data and the audio feature data into the classifier model yields the corresponding audio control instruction; in this case the audio pitch-and-speed control instruction may be an audio stretching control instruction or an audio compression control instruction.
Steps S201 and S202 are only one set of classification conditions for the audio stretching and audio compression control instructions; other embodiments may derive these instructions from other feature data, which is not detailed here.
Referring to FIG. 4, in some embodiments the audio decoding control instruction may be obtained through at least one of the following steps:
Step S203: when the data length of the audio buffer queue data is greater than the second audio data length corresponding to the second time interval threshold, or greater than the third audio consumption data length corresponding to the third audio consumption rate, the audio decoding control instruction is a stop-audio-decoding control instruction; or
Step S204: obtain the first total number of executions of the audio stretching control instruction and the second total number of executions of the audio compression control instruction within a preset time; when no stop-audio-decoding control instruction occurs within the preset time and the absolute difference between the first and second totals is less than a preset threshold, the audio decoding control instruction is a uniform-speed audio decoding control instruction.
Under these classification conditions, inputting the network feature data and the audio feature data into the classifier model yields the corresponding audio control instruction; in this case the audio decoding control instruction may be a stop-audio-decoding control instruction or a uniform-speed audio decoding control instruction.
Steps S203 and S204 are only one set of classification conditions for the stop-audio-decoding and uniform-speed audio decoding control instructions; other embodiments may derive these instructions from other feature data, which is not detailed here.
The audio control instructions in this embodiment therefore comprise the audio stretching control instruction, the audio compression control instruction, the stop-audio-decoding control instruction, and the uniform-speed audio decoding control instruction.
For issuing audio control instructions, an issuing interval threshold $T_{control}$ may be preset: each time one such interval elapses, an audio control instruction is issued adaptively. $T_{control}$ may correspond to the data length of one RTP packet, that is, to the transmission time of an RTP packet of a preset data length.
For the classifier model of this embodiment, each item of feature training data, such as network feature training data and audio feature training data, needs a corresponding class label in addition to its feature training vector in order to train a good classifier model. Once the classifier model is built, inputting the feature vector F produces a class label corresponding to one of the four audio control instructions above (audio stretching, audio compression, stop audio decoding, and uniform-speed audio decoding), which achieves control in a practical way.
When the data length of the audio buffer queue data is greater than the second audio data length corresponding to the second time interval threshold $6T_{control}$, or greater than the third audio consumption data length corresponding to the third audio consumption rate $6V_{play}'$, the audio decoding control instruction for this class label is the stop-audio-decoding control instruction.
When the data length of the audio buffer queue data is less than the first audio consumption data length corresponding to the first audio consumption rate $V_{play}'$, the audio pitch-and-speed control instruction for this class label is the audio stretching control instruction.
When the data length of the audio buffer queue data is greater than the first audio data length corresponding to the first time interval threshold $4T_{control}$ and less than the second audio data length corresponding to the second time interval threshold $6T_{control}$, or is greater than the second audio consumption data length corresponding to the second audio consumption rate $4V_{play}'$ and less than the third audio consumption data length corresponding to the third audio consumption rate $6V_{play}'$, the audio pitch-and-speed control instruction for this class label is the audio compression control instruction, where the second audio consumption data length is greater than the first audio consumption data length.
Obtain the first total number of executions $n_{draw}$ of the audio stretching control instruction and the second total number of executions $n_{compress}$ of the audio compression control instruction within a preset time. When no stop-audio-decoding control instruction occurs within the preset time (for example, among the current $n_{control}$ audio control instructions) and the absolute difference between the two totals is less than a preset threshold $\Delta n$, that is, $|n_{draw} - n_{compress}| < \Delta n$, the audio decoding control instruction for this label is the uniform-speed audio decoding control instruction. This indicates that the audio stretching and audio compression control instructions were executed a roughly equal number of times within the preset time.
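The four labeling rules above can be sketched as a single function that assigns a training label to one sample. This is a minimal sketch under several assumptions: every "data length" is expressed as milliseconds of audio, `play_len_ms` stands in for the first audio consumption data length, the rules are checked in the order stop, compress, stretch, uniform (the source does not state a priority), and `None` is returned when no rule fires. All names are hypothetical.

```python
def label_sample(l_audio_ms, t_control_ms, play_len_ms, recent_labels, delta_n):
    """Assign one of the four instruction labels, or None, to a feature sample.

    l_audio_ms:    audio buffer queue length, in ms of audio (assumption)
    t_control_ms:  issuing interval T_control, in ms
    play_len_ms:   first audio consumption data length, in ms (assumption)
    recent_labels: labels of the last n_control issued instructions
    delta_n:       threshold on |n_draw - n_compress|
    """
    # Step S203: buffer above 6*T_control or 6*V_play' -> stop decoding.
    if l_audio_ms > 6 * t_control_ms or l_audio_ms > 6 * play_len_ms:
        return "stop_decoding"
    # Step S202: buffer within (4x, 6x) of either reference -> compress.
    if (4 * t_control_ms < l_audio_ms < 6 * t_control_ms
            or 4 * play_len_ms < l_audio_ms < 6 * play_len_ms):
        return "compress"
    # Step S201: buffer shorter than one consumption unit -> stretch.
    if l_audio_ms < play_len_ms:
        return "stretch"
    # Step S204: no recent stop, and stretch/compress counts roughly balanced.
    n_draw = recent_labels.count("stretch")
    n_compress = recent_labels.count("compress")
    if "stop_decoding" not in recent_labels and abs(n_draw - n_compress) < delta_n:
        return "uniform_decoding"
    return None
```

Checking stop first means a sample that satisfies both the stop condition on one reference and the compression band on the other is labeled stop; a different priority is equally consistent with the text.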
Since $T_{control}$ may correspond to the data length of one RTP packet, the first time interval threshold $4T_{control}$ may correspond to the data length of 4 RTP packets, that is, the first audio data length, and the second time interval threshold $6T_{control}$ may correspond to the data length of 6 RTP packets, that is, the second audio data length.
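As a worked example, assume each RTP packet carries one 20 ms unit of audio, the basic processing unit mentioned earlier, so that $T_{control} = 20$ ms (this pairing is an assumption for illustration). The two time interval thresholds then become:

```latex
\begin{aligned}
T_{control}    &= 20\ \text{ms} \\
4\,T_{control} &= 80\ \text{ms}  \quad (\text{first audio data length, 4 packets}) \\
6\,T_{control} &= 120\ \text{ms} \quad (\text{second audio data length, 6 packets})
\end{aligned}
```

Under this assumption, an audio buffer holding between 80 ms and 120 ms of audio falls in the compression band of step S202, and more than 120 ms meets the stop-audio-decoding condition of step S203.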
The first audio consumption rate $V_{play}'$ denotes the audio consumption rate corresponding to the first audio consumption data within a preset unit time of the audio consumption process, where the first audio consumption data corresponds to the first audio consumption data length. Accordingly, the second audio consumption rate $4V_{play}'$ denotes the rate corresponding to 4 times the first audio consumption data within the preset unit time, which corresponds to 4 times the first audio consumption data length, that is, the second audio consumption data length; and the third audio consumption rate $6V_{play}'$ denotes the rate corresponding to 6 times the first audio consumption data within the preset unit time, which corresponds to 6 times the first audio consumption data length, that is, the third audio consumption data length.
It can be understood that when the audio decoding average speed control instruction is issued, no audio stretching, audio compression, or stop audio decoding control instruction is present.
It can be understood that the category labels above, i.e. the audio control instructions, are assigned according to rules/classification conditions. In some embodiments, the labeled samples are highly repetitive. To reduce the amount of feature data and ease the data-processing burden on the classifier model, unsupervised clustering, for example with the k-means clustering algorithm (k-means), is applied to the input samples within each category, which also facilitates feature extraction. Unsupervised clustering balances the number of input samples across the four category labels and makes the input samples more representative.
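The per-category clustering step can be sketched with plain Lloyd's k-means written in pure Python so it has no library dependency. Representing each category by at most `k` centroids is one assumed way to balance the classes; the source does not state the value of `k` or whether centroids or the nearest real samples are kept, and in practice the features would be normalized first because they have different units.

```python
import random


def _sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, iters=20, seed=0):
    """Plain Lloyd's k-means; returns k centroids as lists of floats."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: _sq_dist(p, centroids[j]))
            clusters[nearest].append(p)
        for j, members in enumerate(clusters):
            if members:  # keep the previous centroid if a cluster empties
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return centroids


def balance_by_class(samples_by_label, k):
    """Replace each category's samples with at most k representative
    centroids, so every label contributes a similar number of samples."""
    return {label: kmeans(points, min(k, len(points)))
            for label, points in samples_by_label.items()}


# Usage: a repetitive "stretch" category is reduced to two representatives.
data = {"stretch": [(0, 0), (0, 1), (10, 10), (10, 11)], "stop": [(5, 5)]}
print(balance_by_class(data, k=2))
```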
It can be understood that when the audio consumption rate data is greater than the audio decoding rate data, an audio stretching control instruction is issued.
In one embodiment, during the network prediction process, when network jitter and packet loss are pronounced, for example when both the network jitter data and the network packet loss rate data exceed set values, and the audio decoding data in the audio buffering process is about to become insufficient for the audio consumption process (i.e., insufficient to perform the audio consumption operation), an audio stretching control instruction is issued.
In one embodiment, when the audio consumption rate data fed back from the audio consumption process is less than the audio decoding rate data, an audio compression control instruction is issued.
In one embodiment, when network conditions are good and the audio decoding rate is high, a stop audio decoding control instruction or an audio compression control instruction is issued.
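The rate-based heuristics above can be expressed as a small rule function that proposes candidate instructions, for example when generating training labels. This is a sketch under assumptions: the parameter names are hypothetical, "about to become insufficient" is modeled as buffered data falling below the amount needed next, and the "good network, fast decoding" case is folded into the compression branch, leaving the stop-decoding alternative to the buffer-length thresholds.

```python
def rate_rules(consume_rate, decode_rate, jitter, loss_rate, buffered, needed,
               jitter_limit, loss_limit):
    """Candidate audio control instructions suggested by the rate heuristics.

    buffered     -- audio decoding data currently held in the audio buffer
    needed       -- data the audio consumption process will need next
    jitter_limit -- set value for network jitter
    loss_limit   -- set value for the packet loss rate
    """
    candidates = set()
    if consume_rate > decode_rate:
        # Playback outpaces decoding: stretch to fill the gap.
        candidates.add("stretch")
    elif consume_rate < decode_rate:
        # Decoding outpaces playback: compress to drain the backlog.
        candidates.add("compress")
    if jitter > jitter_limit and loss_rate > loss_limit and buffered < needed:
        # Poor network and the buffer is about to run dry.
        candidates.add("stretch")
    return candidates


# Usage: playback is faster than decoding, network is healthy.
print(rate_rules(50, 40, 0, 0.0, 10, 5, jitter_limit=30, loss_limit=0.05))
```

Because the rules are independent, the returned set can contain both labels when the conditions conflict; how such conflicts are resolved is not specified in the source.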
Referring to FIG. 5, it can be understood that the audio processing process includes an audio stretching process; step S300 includes, but is not limited to:
Step S310: when the audio pitch and speed change control instruction is an audio stretching control instruction, controlling, according to the audio stretching control instruction, the audio decoding data to undergo the audio stretching operation of the audio stretching process to obtain audio stretching data;
Step S320: controlling the audio stretching data to undergo the audio buffering operation of the audio buffering process and then the audio consumption operation of the audio consumption process.
It can be understood that the audio stretching control instruction causes the audio decoding data to undergo the audio stretching operation, producing audio stretching data, which then passes through the audio buffering operation and, after that, the audio consumption operation.
Referring to FIG. 6, it can be understood that the audio processing process includes an audio compression process; step S300 includes, but is not limited to:
Step S330: when the audio pitch and speed change control instruction is an audio compression control instruction, controlling, according to the audio compression control instruction, the audio decoding data to undergo the audio compression operation of the audio compression process to obtain audio compression data;
Step S340: controlling the audio compression data to undergo the audio buffering operation of the audio buffering process and then the audio consumption operation of the audio consumption process.
It can be understood that the audio compression control instruction causes the audio decoding data to undergo the audio compression operation, producing audio compression data, which then passes through the audio buffering operation and, after that, the audio consumption operation.
It can be understood that, in some embodiments, the audio decoding data of the embodiments of the present application includes silent data, voiced data, and unprocessed decoded data, and the audio pitch and speed change process includes an audio stretching process and an audio compression process.
The audio stretching control instruction causes the silent data and voiced data to undergo the audio stretching operation, producing audio stretching data (i.e., silent stretched data and voiced stretched data); the audio stretching data and the unprocessed decoded data then undergo the audio buffering operation followed by the audio consumption operation.
Alternatively, the audio compression control instruction causes the silent data and voiced data to undergo the audio compression operation, producing audio compression data (i.e., silent compressed data and voiced compressed data); the audio compression data and the unprocessed decoded data then undergo the audio buffering operation followed by the audio consumption operation.
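The segment-aware flow described above (stretch or compress silent and voiced data, pass unprocessed decoded data through, then buffer and consume) can be sketched as follows. The frame-repeat/frame-drop `time_scale` function is a deliberately crude stand-in for a real time-scale modification algorithm such as WSOLA, and the stretch/compress factors, segment-kind strings and function names are all hypothetical.

```python
from collections import deque


def time_scale(frames, factor):
    """Naive time-scale stand-in: repeat frames (factor > 1) or drop frames
    (factor < 1). A production system would use a method such as WSOLA."""
    if not frames:
        return []
    n_out = max(1, round(len(frames) * factor))
    return [frames[min(int(i / factor), len(frames) - 1)] for i in range(n_out)]


# Hypothetical stretch/compress factors; the source does not give values.
FACTORS = {"stretch": 1.5, "compress": 0.5}


def buffer_segments(segments, instruction, audio_buffer):
    """Apply the instruction to silent and voiced segments only, pass
    unprocessed decoded data through, then enqueue everything."""
    factor = FACTORS.get(instruction, 1.0)
    for kind, frames in segments:
        if kind in ("silent", "voiced") and factor != 1.0:
            frames = time_scale(frames, factor)
        audio_buffer.extend(frames)  # audio buffering operation


def consume(audio_buffer, n):
    """Audio consumption operation: pop up to n frames for playback."""
    return [audio_buffer.popleft() for _ in range(min(n, len(audio_buffer)))]


# Usage: stretch a silent segment, keep unprocessed data as-is, then play.
buf = deque()
buffer_segments([("silent", [0, 0]), ("unprocessed", [7])], "stretch", buf)
print(consume(buf, 2), list(buf))
```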
It can be understood that, by issuing audio pitch and speed change control instructions (such as audio stretching or audio compression control instructions) in a controlled manner according to network processing and audio processing conditions, the embodiments of the present application make the audio decoding data undergo the corresponding pitch and speed change operation, yielding audio pitch and speed change data (i.e., audio stretching data or audio compression data). This preserves audio quality while reducing latency.
Referring to FIG. 7, it can be understood that the audio decoding process includes a stop audio decoding process and an audio decoding average speed process; step S300 includes at least one of the following:
Step S350: when the audio decoding control instruction is a stop audio decoding control instruction, controlling, according to the stop audio decoding control instruction, the RTP packet data to undergo the stop audio decoding operation of the stop audio decoding process; or
Step S360: when the audio decoding control instruction is an audio decoding average speed control instruction, controlling, according to the audio decoding average speed control instruction, the RTP packet data to undergo the audio decoding average speed operation of the audio decoding average speed process.
By controlling, according to the stop audio decoding control instruction, the RTP packet data to undergo the stop audio decoding operation, the embodiments of the present application ensure that the audio buffering operation proceeds normally and prevent data from piling up in the audio buffer, thereby reducing latency.
By controlling, according to the audio decoding average speed control instruction, the RTP packet data to undergo the audio decoding average speed operation, the embodiments of the present application keep decoding proceeding at a steady rate, which maintains good audio quality.
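Steps S350 and S360 can be pictured as a per-tick decoder stage. In this sketch, under assumed semantics, "stop" leaves packets in the network buffer queue, while "average speed" decodes a fixed number of packets per control tick; the tick model, `packets_per_tick` and the identity `decode` callable are hypothetical placeholders for a real codec.

```python
from collections import deque


def decode_tick(rtp_queue, audio_buffer, instruction, packets_per_tick=1,
                decode=lambda packet: packet):
    """One control tick of a hypothetical decoder stage.

    "stop": decode nothing, leaving packets in the network buffer queue so the
    audio buffer does not pile up. Any other instruction (average speed):
    decode a fixed number of packets so decoding proceeds at a steady rate.
    `decode` maps one RTP payload to a list of PCM frames (identity here).
    Returns the number of packets decoded.
    """
    if instruction == "stop":
        return 0
    n = min(packets_per_tick, len(rtp_queue))
    for _ in range(n):
        audio_buffer.extend(decode(rtp_queue.popleft()))
    return n


# Usage: one stopped tick, then one average-speed tick of two packets.
rtp = deque([[1, 2], [3, 4], [5, 6]])
pcm = deque()
decode_tick(rtp, pcm, "stop")
decode_tick(rtp, pcm, "average_speed", packets_per_tick=2)
print(list(pcm), len(rtp))
```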
It can be understood that the embodiments of the present application address problems such as unreasonable buffer queue length settings, untimely audio stretching or compression, and large audio latency. With the audio control method of the embodiments, the classifier model in the audio control process classifies the network feature data and audio feature data to obtain audio control instructions, effectively reducing latency while balancing latency against audio quality.
The embodiments of the present application further provide an audio control device configured to perform the audio control method described in the first aspect above.
Referring to FIG. 8, in some embodiments, the embodiments of the present application further provide an audio control system, including:
an audio control device configured to perform the audio control method described in the first aspect above;
and further including a network buffer queue device, a network prediction device, an audio decoding device, an audio buffer queue device, an audio pitch and speed change device, and an audio consumption device;
wherein the network buffer queue device, the audio decoding device, the audio buffer queue device, and the audio consumption device are connected in sequence and are each connected to the audio control device; the network prediction device is connected to both the network buffer queue device and the audio control device; and the audio pitch and speed change device is connected to both the audio buffer queue device and the audio control device.
It can be understood that the network buffer queue device stores and forwards RTP packet data to absorb network jitter. It corresponds to the network buffer queue process and performs the network buffering operation; network buffer queue data is obtained after the RTP packet data undergoes this operation.
Network prediction device: models the RTP packet data and, through the network prediction operation (for example, by collecting statistics on network conditions over the previous period), obtains network prediction data such as network jitter data and network packet loss rate data. It corresponds to the network prediction process; the RTP packet data undergoes the network buffering operation in the network buffer queue device and then the network prediction operation in the network prediction device, yielding the network jitter data and network packet loss rate data.
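The source only says that jitter and loss are obtained from statistics over the previous period. One standard way to compute them, swapped in here as an assumption rather than the document's own method, is the RFC 3550 interarrival-jitter estimator together with a loss rate derived from the span of RTP sequence numbers.

```python
def network_stats(packets):
    """Jitter and loss rate over one statistics period.

    packets -- (sequence_number, rtp_timestamp, arrival_time) tuples in arrival
               order; timestamps and arrival times share one clock unit.
    Jitter uses the RFC 3550 interarrival estimator J += (|D| - J) / 16.
    Sequence-number wraparound is not handled, for brevity.
    Returns (jitter, loss_rate).
    """
    if not packets:
        return 0.0, 0.0
    seqs = [seq for seq, _, _ in packets]
    expected = max(seqs) - min(seqs) + 1
    loss_rate = 1 - len(set(seqs)) / expected
    jitter = 0.0
    for (_, ts0, arr0), (_, ts1, arr1) in zip(packets, packets[1:]):
        d = (arr1 - arr0) - (ts1 - ts0)  # difference in relative transit time
        jitter += (abs(d) - jitter) / 16
    return jitter, loss_rate


# Usage: packet 3 is missing and packet 4 arrives 20 clock units late.
print(network_stats([(1, 0, 0), (2, 160, 160), (4, 480, 500)]))
```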
Audio decoding device: performs the audio decoding operation on the RTP packet data to obtain audio decoding rate data and audio decoding data. It corresponds to the audio decoding process; the RTP packet data undergoes the network buffering operation in the network buffer queue device and then the audio decoding operation in the audio decoding device, yielding the audio decoding rate data and audio decoding data.
Audio buffer queue device: stores and forwards the audio decoding data produced by the audio decoding device, allowing the audio buffer queue to expand adaptively. It corresponds to the audio buffering process; after the audio decoding device decodes the RTP packet data into audio decoding data, that data undergoes the audio buffering operation in the audio buffer queue device.
Audio pitch and speed change device: applies different pitch and speed change operations, such as audio stretching or audio compression, to audio decoding data such as silent data and voiced data, thereby preserving audio quality. It corresponds to the audio pitch and speed change process. Since the pitch and speed change operations include audio stretching and audio compression, the device correspondingly includes an audio stretching module and an audio compression module.
For example, the audio control device controls, according to the audio stretching control instruction, the audio decoding data to undergo the audio stretching operation in the audio stretching module to obtain audio stretching data, and further controls the audio stretching data to undergo the audio buffering operation in the audio buffer queue device followed by the audio consumption operation in the audio consumption device. Alternatively, the audio control device controls, according to the audio compression control instruction, the audio decoding data to undergo the audio compression operation in the audio compression module to obtain audio compression data, and further controls the audio compression data to undergo the audio buffering operation in the audio buffer queue device followed by the audio consumption operation in the audio consumption device.
Audio consumption device: learns the audio consumption/playback pattern, for example audio consumption rate data, for different platforms and playback congestion conditions, and feeds it back to the audio control device. It corresponds to the audio consumption process; in some embodiments it may be an audio playback device. The audio consumption rate data is obtained after the audio decoding data undergoes the audio buffering operation in the audio buffer queue device and then the audio consumption operation in the audio consumption device.
Audio control device: issues audio control instructions, for example a stop audio decoding control instruction or an audio decoding average speed control instruction to control whether the audio decoding device decodes, and an audio stretching or audio compression control instruction to make the audio pitch and speed change device perform the stretching or compression operation.
The audio control device further includes a classifier module. For the four audio control instructions, the classifier module is trained by supervised learning on the feature data and then classifies it, outputting the category label for the feature data, i.e. the corresponding audio control instruction, so that control adapts automatically. The embodiments of the present application adjust adaptively to network processing and audio processing conditions with few parameters and little manual involvement; fine-tuning can be performed on the classifier module, for example on a trained classifier model, so iteration is fast.
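This section does not fix the classifier model, so the sketch below uses a nearest-centroid classifier purely as a dependency-free stand-in to show the supervised train/predict/fine-tune loop over the six features listed in the claims. The feature names, the class and its methods are hypothetical; a real deployment would normalize the features (they have different units) and would likely use a stronger model.

```python
# Feature order follows the six inputs listed in the claims; names are hypothetical.
FEATURES = ("net_queue", "jitter", "loss_rate",
            "decode_rate", "audio_queue", "consume_rate")


class NearestCentroid:
    """Stand-in classifier: predicts the label whose mean feature vector is
    closest. partial_fit can be called again with new samples to fine-tune."""

    def __init__(self):
        self.sums, self.counts = {}, {}

    def partial_fit(self, X, y):
        for x, label in zip(X, y):
            total = self.sums.setdefault(label, [0.0] * len(x))
            for i, value in enumerate(x):
                total[i] += value
            self.counts[label] = self.counts.get(label, 0) + 1
        return self

    def predict(self, x):
        def sq_dist(label):
            n = self.counts[label]
            return sum((v - s / n) ** 2 for v, s in zip(x, self.sums[label]))
        return min(self.sums, key=sq_dist)


# Usage: train on two labeled samples, then classify a new feature vector.
clf = NearestCentroid().partial_fit(
    [[0, 0, 0, 10, 1, 20], [0, 0, 0, 20, 30, 10]], ["stretch", "compress"])
print(clf.predict([0, 0, 0, 11, 2, 19]))
```

Keeping running sums rather than fixed centroids is what makes incremental fine-tuning cheap here: new labeled samples simply update the per-label totals.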
In addition, an embodiment of the third aspect of the present application provides an audio control device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor.
The processor and the memory may be connected by a bus or by other means.
As a non-transitory computer-readable storage medium, the memory can store non-transitory software programs and non-transitory computer-executable programs. In addition, the memory may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory may include memory located remotely from the processor and connected to it via a network. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The non-transitory software programs and instructions required to implement the audio control method of the first-aspect embodiment are stored in the memory and, when executed by the processor, perform the audio control method of the above embodiments, for example method steps S100 to S300 in FIG. 1, step S210 in FIG. 2, steps S201 to S202 in FIG. 3, steps S203 to S204 in FIG. 4, steps S310 to S320 in FIG. 5, steps S330 to S340 in FIG. 6, and steps S350 to S360 in FIG. 7 described above.
The device embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the objectives of the solution of this embodiment.
In addition, an embodiment of the present application provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor or controller (for example, a processor in the device embodiment above), cause the processor to perform the audio control method of the above embodiments, for example method steps S100 to S300 in FIG. 1, step S210 in FIG. 2, steps S201 to S202 in FIG. 3, steps S203 to S204 in FIG. 4, steps S310 to S320 in FIG. 5, steps S330 to S340 in FIG. 6, and steps S350 to S360 in FIG. 7 described above.
The embodiments of the present application include: acquiring network feature data from the network processing process and audio feature data from the audio processing process; inputting the network feature data and audio feature data into a classifier model to classify them into several audio control instructions; and controlling, according to the audio control instructions, the corresponding audio processing operations in the corresponding audio processing processes. With this arrangement, features are extracted separately from the network processing process and the audio processing process so that the classifier model can process them and output a classification result, i.e. several audio control instructions, which are then issued adaptively to control the corresponding audio processing operations. This effectively preserves audio quality and offers good operability.
Those of ordinary skill in the art will understand that all or some of the steps and systems in the methods disclosed above may be implemented as software, firmware, hardware, or an appropriate combination thereof. Some or all physical components may be implemented as software executed by a processor such as a central processing unit, digital signal processor, or microprocessor, as hardware, or as an integrated circuit such as an application-specific integrated circuit. Such software may be distributed on computer-readable media, which may include computer storage media (or non-transitory media) and communication media (or transitory media). As is well known to those of ordinary skill in the art, the term computer storage media includes volatile and non-volatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules, or other data. Computer storage media include, but are not limited to, RAM, ROM, EEPROM, flash memory or other memory technology, CD-ROM, digital versatile disks (DVD) or other optical disk storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.
In addition, as is well known to those of ordinary skill in the art, communication media typically embody computer-readable instructions, data structures, program modules, or other data in a modulated data signal such as a carrier wave or other transport mechanism, and may include any information delivery media.
The above is a detailed description of several embodiments of the present application, but the present application is not limited to these embodiments. Those skilled in the art may make various equivalent modifications or substitutions without departing from the spirit of the present application, and all such equivalent modifications or substitutions fall within the scope defined by the claims of the present application.
Claims (12)
- An audio control method, comprising: acquiring network feature data from a network processing process and audio feature data from an audio processing process; inputting the network feature data and the audio feature data into a classifier model to obtain several audio control instructions; and controlling, according to the audio control instructions, the corresponding audio processing operation in the corresponding audio processing process.
- The method according to claim 1, wherein the network feature data includes network buffer queue data, network jitter data, and network packet loss rate data, and the network processing process includes a network buffer queue process and a network prediction process; wherein the network buffer queue data is obtained from Real-time Transport Protocol (RTP) packet data through a network buffering operation in the network buffer queue process; and the network jitter data and the network packet loss rate data are each obtained from the RTP packet data through the network buffering operation in the network buffer queue process followed by a network prediction operation in the network prediction process.
- The method according to claim 2, wherein the audio feature data includes audio decoding rate data, audio buffer queue data, and audio consumption rate data, and the audio processing process includes an audio decoding process, an audio buffering process, and an audio consumption process; wherein the audio decoding rate data is obtained from the RTP packet data through the network buffering operation in the network buffer queue process followed by an audio decoding operation in the audio decoding process; the audio buffer queue data is obtained by first obtaining audio decoding data from the RTP packet data through the audio decoding operation in the audio decoding process and then passing the audio decoding data through an audio buffering operation in the audio buffering process; and the audio consumption rate data is obtained from the audio decoding data through the audio buffering operation in the audio buffering process followed by an audio consumption operation in the audio consumption process.
- The method according to claim 3, wherein inputting the network feature data and the audio feature data into the classifier model to obtain several audio control instructions comprises: inputting the network buffer queue data, the network jitter data, the network packet loss rate data, the audio decoding rate data, the audio buffer queue data, and the audio consumption rate data into the classifier model to obtain an audio decoding control instruction and an audio pitch and speed change control instruction.
- The method according to claim 4, wherein the audio pitch and speed change control instruction is obtained by at least one of the following steps: when the data length of the audio buffer queue data is less than a first audio consumption data length corresponding to a first audio consumption rate, the audio pitch and speed change control instruction is an audio stretching control instruction; or when the data length of the audio buffer queue data is greater than a first audio data length corresponding to a first time interval threshold and less than a second audio data length corresponding to a second time interval threshold, or the data length of the audio buffer queue data is greater than a second audio consumption data length corresponding to a second audio consumption rate and less than a third audio consumption data length corresponding to a third audio consumption rate, the audio pitch and speed change control instruction is an audio compression control instruction, wherein the second audio consumption data length is greater than the first audio consumption data length.
- The method according to claim 5, wherein the audio decoding control instruction is obtained by at least one of the following steps: when the data length of the audio buffer queue data is greater than the second audio data length corresponding to the second time interval threshold or greater than the third audio consumption data length corresponding to the third audio consumption rate, the audio decoding control instruction is a stop audio decoding control instruction; or acquiring a first total number of executions of the audio stretching control instruction and a second total number of executions of the audio compression control instruction within a preset time, and when no stop audio decoding control instruction exists within the preset time and the absolute value of the difference between the first total number of executions and the second total number of executions is less than a preset threshold, the audio decoding control instruction is an audio decoding average speed control instruction.
- The method according to claim 5, wherein the audio processing process comprises an audio stretching process; and controlling, according to the audio control instruction, the corresponding audio processing operation in the corresponding audio processing process comprises: when the audio pitch-and-speed adjustment control instruction is the audio stretching control instruction, controlling, according to the audio stretching control instruction, the audio decoded data to undergo an audio stretching operation in the audio stretching process to obtain audio stretched data; and controlling the audio stretched data to undergo the audio buffering operation in the audio buffering process and then an audio consumption operation in the audio consumption process.
- The method according to any one of claims 5 to 7, wherein the audio processing process comprises an audio compression process; and controlling, according to the audio control instruction, the corresponding audio processing operation in the corresponding audio processing process comprises: when the audio pitch-and-speed adjustment control instruction is the audio compression control instruction, controlling, according to the audio compression control instruction, the audio decoded data to undergo an audio compression operation in the audio compression process to obtain audio compressed data; and controlling the audio compressed data to undergo the audio buffering operation in the audio buffering process and then an audio consumption operation in the audio consumption process.
- The method according to claim 6, wherein the audio decoding process comprises a stop-audio-decoding process and a uniform-rate audio decoding process; and controlling, according to the audio control instruction, the corresponding audio processing operation in the corresponding audio processing process comprises at least one of the following: when the audio decoding control instruction is the stop-audio-decoding control instruction, controlling, according to the stop-audio-decoding control instruction, the network real-time transport protocol (RTP) packet data to undergo a stop-audio-decoding operation in the stop-audio-decoding process; or when the audio decoding control instruction is the uniform-rate audio decoding control instruction, controlling, according to the uniform-rate audio decoding control instruction, the network RTP packet data to undergo a uniform-rate audio decoding operation in the uniform-rate audio decoding process.
- An audio control apparatus, configured to perform the audio control method according to any one of claims 1 to 9.
- An audio control device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the audio control method according to any one of claims 1 to 9.
- A computer-readable storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the audio control method according to any one of claims 1 to 9.
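The buffer-length conditions in the claims defining the audio pitch-and-speed adjustment control instruction and the audio decoding control instruction can be illustrated with a minimal Python sketch. It is not the applicant's implementation. The claims do not say how a time interval threshold or an audio consumption rate is converted into a data length, how overlapping conditions are prioritized, or how executions are counted, so every name here (`Thresholds`, `ms_to_samples`, `pitch_speed_instruction`, `DecodingController`), the sample-based units, and the sliding-window counter are hypothetical assumptions.

```python
from collections import deque
from dataclasses import dataclass
from enum import Enum, auto
from typing import Deque, Optional, Tuple


class Instr(Enum):
    STRETCH = auto()           # audio stretching control instruction
    COMPRESS = auto()          # audio compression control instruction
    STOP_DECODING = auto()     # stop-audio-decoding control instruction
    UNIFORM_DECODING = auto()  # uniform-rate audio decoding control instruction


@dataclass
class Thresholds:
    # Hypothetical: every threshold is already expressed as a buffer length in samples.
    first_consumption_len: int   # from the first audio consumption rate
    second_consumption_len: int  # from the second rate (must exceed the first)
    third_consumption_len: int   # from the third audio consumption rate
    first_interval_len: int      # from the first time interval threshold
    second_interval_len: int     # from the second time interval threshold


def ms_to_samples(ms: float, sample_rate: int) -> int:
    """Assumed mapping from a time threshold (ms) to a buffer length (samples)."""
    return int(ms * sample_rate / 1000)


def pitch_speed_instruction(buf_len: int, t: Thresholds) -> Optional[Instr]:
    """Choose stretching or compression from the buffer queue length."""
    if buf_len < t.first_consumption_len:
        return Instr.STRETCH
    in_interval_band = t.first_interval_len < buf_len < t.second_interval_len
    in_rate_band = t.second_consumption_len < buf_len < t.third_consumption_len
    if in_interval_band or in_rate_band:
        return Instr.COMPRESS
    return None  # no pitch-and-speed adjustment


class DecodingController:
    """Stop decoding, or switch to uniform-rate decoding when stretch and
    compress executions are balanced within a sliding window (hypothetical)."""

    def __init__(self, t: Thresholds, window_s: float, max_diff: int):
        self.t = t
        self.window_s = window_s  # the "preset time" (hypothetical value)
        self.max_diff = max_diff  # the "preset threshold" (hypothetical value)
        self.events: Deque[Tuple[float, Instr]] = deque()

    def record(self, now: float, instr: Instr) -> None:
        """Log an executed instruction with its timestamp in seconds."""
        self.events.append((now, instr))

    def _prune(self, now: float) -> None:
        # Drop events older than the window.
        while self.events and now - self.events[0][0] > self.window_s:
            self.events.popleft()

    def decide(self, now: float, buf_len: int) -> Optional[Instr]:
        self._prune(now)
        if buf_len > self.t.second_interval_len or buf_len > self.t.third_consumption_len:
            self.record(now, Instr.STOP_DECODING)
            return Instr.STOP_DECODING
        kinds = [k for _, k in self.events]
        if Instr.STOP_DECODING in kinds:
            return None  # a stop occurred within the preset time
        n_stretch = kinds.count(Instr.STRETCH)
        n_compress = kinds.count(Instr.COMPRESS)
        if abs(n_stretch - n_compress) < self.max_diff:
            return Instr.UNIFORM_DECODING
        return None
```

For example, with hypothetical values at 48 kHz (first, second, and third consumption lengths of 960, 1920, and 4800 samples; time interval thresholds of 60 ms and 200 ms, i.e. 2880 and 9600 samples), a 500-sample buffer yields `STRETCH`, a 3000-sample buffer yields `COMPRESS`, and a 6000-sample buffer triggers `STOP_DECODING` because it exceeds the third consumption length. Stop decisions are recorded, so uniform-rate decoding is only re-enabled after they fall out of the window.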
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111145232.2A CN115883527A (en) | 2021-09-28 | 2021-09-28 | Audio control method, device, equipment and computer readable storage medium |
CN202111145232.2 | 2021-09-28 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2023050994A1 (en) | 2023-04-06 |
Family ID: 85763668
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
PCT/CN2022/108334 WO2023050994A1 (en) | 2021-09-28 | 2022-07-27 | Audio control method and apparatus, device, and computer readable storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115883527A (en) |
WO (1) | WO2023050994A1 (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN117789734B (en) * | 2024-02-28 | 2024-05-28 | 腾讯科技(深圳)有限公司 | Audio processing method, device, computer equipment and storage medium |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105071897A (en) * | 2015-07-03 | 2015-11-18 | 东北大学 | Multipath redundant transmission method for network real-time audio conversation media data |
CN107920176A (en) * | 2017-11-19 | 2018-04-17 | 天津光电安辰信息技术股份有限公司 | Sound quality optimization device for a voice communication system |
US20200112600A1 (en) * | 2017-11-30 | 2020-04-09 | Logmein, Inc. | Managing jitter buffer length for improved audio quality |
CN113409801A (en) * | 2021-08-05 | 2021-09-17 | 云从科技集团股份有限公司 | Noise processing method, system, medium, and apparatus for real-time audio stream playback |
- 2021-09-28: CN202111145232.2A (CN115883527A), active, pending
- 2022-07-27: PCT/CN2022/108334 (WO2023050994A1), active, application filing
Also Published As
Publication number | Publication date |
---|---|
CN115883527A (en) | 2023-03-31 |
Similar Documents
Publication | Title |
---|---|
WO2016015670A1 | Audio stream decoding method and device |
US11323136B2 | Method and apparatus for processing a received sequence of data packets by removing unsuitable error correction packets from the sequence |
WO2023050994A1 | Audio control method and apparatus, device, and computer readable storage medium |
US9420022B2 | Media requests to counter latency and minimize network bursts |
WO2014099319A1 | Audio processing apparatus and audio processing method |
US10862767B2 | Data packet prediction |
US20080232442A1 | Method of transmitting data in a communication system |
US20180315431A1 | Audio frame labeling to achieve unequal error protection for audio frames of unequal importance |
US11356739B2 | Video playback method, terminal apparatus, and storage medium |
WO2023065642A1 | Corpus screening method, intention recognition model optimization method, device, and storage medium |
CN113259657A | DPPO code rate self-adaptive control system and method based on video quality fraction |
Sani et al. | SMASH: A supervised machine learning approach to adaptive video streaming over HTTP |
CN117834944A | Method, device, electronic equipment and storage medium for adaptive video semantic communication |
CN107979482B | Information processing method, device, sending end, jitter removal end and receiving end |
Grassucci et al. | Enhancing Semantic Communication with Deep Generative Models: An Overview |
CN116962179A | Network transmission optimization method and device, computer readable medium and electronic equipment |
CN113572703B | Online traffic service classification method based on FPGA |
CN112702624B | Method, system, medium and device for optimizing short video playing efficiency |
US20210327440A1 | Audio packet loss concealment method, device and Bluetooth receiver |
CN116095395A | Method and device for adjusting buffer length, electronic equipment and storage medium |
CN104934040A | Duration adjustment method and device for audio signal |
CN105450543B | Voice data transmission method |
US11238236B2 | Summarization of group chat threads |
CN111063347A | Real-time voice recognition method, server and client |
US20230094815A1 | System and method for ultra low latency live streaming based on user datagram protocol |
Legal Events
Code | Title | Description |
---|---|---|
NENP | Non-entry into the national phase | Ref country code: DE |
122 | EP: PCT application non-entry in European phase | Ref document number: 22874390; Country of ref document: EP; Kind code of ref document: A1 |