WO2018099143A1 - Method and apparatus for processing audio data - Google Patents


Info

Publication number
WO2018099143A1
Authority
WO
WIPO (PCT)
Prior art keywords
algorithm
adjustment coefficient
audio processing
type information
audio data
Prior art date
Application number
PCT/CN2017/098350
Other languages
English (en)
French (fr)
Inventor
刘泽新 (LIU Zexin)
李海婷 (LI Haiting)
苗磊 (MIAO Lei)
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2018099143A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 - Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/012 - Comfort noise or silence coding
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0216 - Noise filtering characterised by the method used for estimating noise
    • G10L2021/02082 - Noise filtering the noise being echo, reverberation of the speech
    • G10L25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 - Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 - Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information

Definitions

  • the present invention relates to the field of wireless communication technologies, and in particular, to a method and apparatus for processing audio data.
  • the use of mobile terminals is becoming more and more popular, and people can perform voice communication through mobile terminals.
  • During voice communication, the user at the transmitting end can speak or play music; the transmitting end can detect the corresponding audio data and send the detected audio data to the receiving end. After receiving the audio data, the receiving end can play it through components such as headphones or speakers, so that the user at the receiving end can hear the corresponding audio.
  • the transmitting end and the receiving end can process the audio data through a preset audio processing algorithm to improve voice communication quality.
  • the audio processing algorithm may be a 3A algorithm, that is, an AEC (Adaptive Echo Cancellation) algorithm, an ANS (Automatic Noise Suppression) algorithm, and an AGC (Automatic Gain Control) algorithm. Based on the 3A algorithm, the noise of the audio data can be reduced, echo can be eliminated, and the output signal can be given a certain energy and stability;
  • the audio processing algorithm can be a JBM (Jitter Buffer Management) algorithm; based on the JBM algorithm, a relatively continuous and stable signal output can still be ensured when the network jitters.
  • However, the audio effect of the audio data may be deteriorated after the above processing. For example, if the audio data is the audio of a piece of music, the noise reduction process performed by the ANS algorithm will seriously degrade the sound of the music, resulting in poor communication quality.
  • an embodiment of the present invention provides a method and apparatus for processing audio data.
  • the technical solution is as follows:
  • a method of processing audio data comprising:
  • if it is determined to adjust the target audio processing algorithm, the target audio processing algorithm is adjusted, and the audio data is processed based on the adjusted target audio processing algorithm;
  • if it is determined not to adjust the target audio processing algorithm, the audio data is processed based on the target audio processing algorithm.
  • In the process of voice communication, whether to adjust the target audio processing algorithm may be determined based on the type information of the audio data and the target audio processing algorithm, so that when certain types of audio data are processed, the target audio processing algorithm is adjusted to achieve a better processing result and improve voice communication quality.
  • the adjusting of the target audio processing algorithm includes:
  • the parameter value of the target parameter is adjusted based on the adjustment coefficient.
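As a concrete illustration, the adjustment can be sketched as scaling the parameter value by the coefficient. This is a hypothetical reading: the patent does not fix the arithmetic, and `adjust_parameter` is an invented name.

```python
def adjust_parameter(parameter_value: float, adjustment_coefficient: float) -> float:
    """Scale a target parameter (e.g. a noise parameter or an attenuation
    gain factor) by a preset adjustment coefficient.

    Multiplicative scaling is an assumption; the text only states that the
    parameter value is adjusted based on the coefficient.
    """
    return parameter_value * adjustment_coefficient
```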
  • an implementation manner of adjusting an audio processing algorithm is provided.
  • the target parameter includes an intermediate parameter in a process performed based on the target audio processing algorithm.
  • the target audio processing algorithm includes an automatic noise suppression ANS algorithm
  • the intermediate parameter includes a noise parameter of noise determined based on the ANS algorithm and the audio data.
  • the target audio processing algorithm includes an automatic gain control AGC algorithm
  • the intermediate parameter includes an attenuation gain factor determined based on the AGC algorithm and the audio data.
  • the target audio processing algorithm includes an adaptive echo cancellation AEC algorithm
  • the intermediate parameter includes an echo parameter of the echo determined based on the AEC algorithm and the audio data.
  • the target parameter includes an initial parameter in a process performed based on the target audio processing algorithm.
  • the target audio processing algorithm includes a jitter buffer management JBM algorithm
  • the initial parameter includes a buffer depth of the audio data
  • the target audio processing algorithm includes a time scale adjustment TSM algorithm
  • the initial parameters include a stretching parameter or a compression parameter of the audio data.
  • the determining, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm includes:
  • if the target audio processing algorithm is an ANS algorithm and the type information is the non-voice active frame type, it is determined to adjust the ANS algorithm; if the type information is the voice active frame type, it is determined not to adjust the ANS algorithm;
  • if the target audio processing algorithm is an ANS algorithm and the type information is the music type, it is determined to adjust the ANS algorithm; if the type information is the voice type, it is determined not to adjust the ANS algorithm;
  • if the target audio processing algorithm is an AGC algorithm and the type information is the non-voice active frame type, it is determined to adjust the AGC algorithm; if the type information is the voice active frame type, it is determined not to adjust the AGC algorithm;
  • if the target audio processing algorithm is an AGC algorithm and the type information is the music type, it is determined to adjust the AGC algorithm; if the type information is the voice type, it is determined not to adjust the AGC algorithm;
  • if the target audio processing algorithm is an AEC algorithm and the type information is the non-voice active frame type, it is determined to adjust the AEC algorithm; if the type information is the voice active frame type, it is determined not to adjust the AEC algorithm;
  • if the target audio processing algorithm is an AEC algorithm and the type information is the music type, it is determined to adjust the AEC algorithm; if the type information is the voice type, it is determined not to adjust the AEC algorithm;
  • if the target audio processing algorithm is a JBM algorithm and the type information is the non-voice active frame type, it is determined to adjust the JBM algorithm; if the type information is the voice active frame type, it is determined not to adjust the JBM algorithm; or
  • if the target audio processing algorithm is a TSM algorithm and the type information is the voice active frame type, it is determined to adjust the TSM algorithm; if the type information is the non-voice active frame type, it is determined not to adjust the TSM algorithm.
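The decision rules above amount to a lookup from (algorithm, type information) to an adjust/no-adjust flag. A minimal sketch, with type labels chosen here only for illustration:

```python
# Hypothetical decision table mirroring the rules listed above.
# The string labels ("non_voice_active", etc.) are assumptions.
ADJUST_RULES = {
    ("ANS", "non_voice_active"): True, ("ANS", "voice_active"): False,
    ("ANS", "music"): True,            ("ANS", "voice"): False,
    ("AGC", "non_voice_active"): True, ("AGC", "voice_active"): False,
    ("AGC", "music"): True,            ("AGC", "voice"): False,
    ("AEC", "non_voice_active"): True, ("AEC", "voice_active"): False,
    ("AEC", "music"): True,            ("AEC", "voice"): False,
    ("JBM", "non_voice_active"): True, ("JBM", "voice_active"): False,
    ("TSM", "voice_active"): True,     ("TSM", "non_voice_active"): False,
}

def should_adjust(algorithm: str, type_info: str) -> bool:
    """Return True when the target algorithm should be adjusted for this type."""
    return ADJUST_RULES.get((algorithm, type_info), False)
```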
  • a method of processing audio data, comprising: determining type information of the audio data to be processed, determining an adjustment coefficient according to the type information, adjusting a parameter value of a target parameter by using the adjustment coefficient, and processing the audio data based on the adjusted parameter value of the target parameter.
  • The category information of the audio signal to be processed may be determined first, then the adjustment coefficient for adjusting the audio signal is determined according to the category information, the audio signal is processed according to the target audio processing algorithm and the adjustment coefficient, and the processed audio signal is output. In this way, different audio processing can be performed for different types of audio signals, thereby improving the quality of voice communication.
  • the target parameter includes an intermediate parameter in a process performed based on the target audio processing algorithm.
  • the target audio processing algorithm includes an automatic noise suppression ANS algorithm
  • the intermediate parameter includes a noise parameter of noise determined based on the ANS algorithm and the audio data.
  • the target audio processing algorithm includes an automatic gain control AGC algorithm
  • the intermediate parameter includes an attenuation gain factor determined based on the AGC algorithm and the audio data.
  • the target audio processing algorithm includes an adaptive echo cancellation AEC algorithm
  • the intermediate parameter includes an echo parameter of the echo determined based on the AEC algorithm and the audio data.
  • the adjusting, by using the adjustment coefficient, the parameter value of the target parameter includes:
  • if the target audio processing algorithm is an ANS algorithm and the type information is the voice active frame type, the noise parameter of the noise is adjusted based on a preset first adjustment coefficient; if the type information is the non-voice active frame type, the noise parameter of the noise is adjusted based on a preset second adjustment coefficient, where the first adjustment coefficient is smaller than the second adjustment coefficient;
  • if the target audio processing algorithm is an ANS algorithm and the type information is the voice type, the noise parameter of the noise is adjusted based on a preset third adjustment coefficient; if the type information is the music type, the noise parameter of the noise is adjusted based on a preset fourth adjustment coefficient;
  • if the target audio processing algorithm is an AEC algorithm and the type information is the voice active frame type, the echo parameter of the echo is adjusted based on a preset fifth adjustment coefficient; if the type information is the non-voice active frame type, the echo parameter of the echo is adjusted based on a preset sixth adjustment coefficient, where the fifth adjustment coefficient is smaller than the sixth adjustment coefficient;
  • if the target audio processing algorithm is an AEC algorithm and the type information is the voice type, the echo parameter of the echo is adjusted based on a preset seventh adjustment coefficient; if the type information is the music type, the echo parameter of the echo is adjusted based on a preset eighth adjustment coefficient, where the seventh adjustment coefficient is greater than the eighth adjustment coefficient;
  • if the target audio processing algorithm is an AGC algorithm and the type information is the voice active frame type, the attenuation gain factor is adjusted based on a preset ninth adjustment coefficient; if the type information is the non-voice active frame type, the attenuation gain factor is adjusted based on a preset tenth adjustment coefficient, where the ninth adjustment coefficient is greater than the tenth adjustment coefficient;
  • if the target audio processing algorithm is an AGC algorithm and the type information is the voice type, the attenuation gain factor is adjusted based on a preset eleventh adjustment coefficient; if the type information is the music type, the attenuation gain factor is adjusted based on a preset twelfth adjustment coefficient, where the eleventh adjustment coefficient is greater than the twelfth adjustment coefficient.
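One way to read the coefficient branches above is as a table keyed by algorithm and type information. The numeric values below are placeholders chosen only to satisfy the stated orderings (e.g. first smaller than second, ninth greater than tenth), and the multiplicative application is an assumption:

```python
# Placeholder coefficients; only their relative ordering comes from the text.
COEFFICIENTS = {
    ("ANS", "voice_active"): 0.5,      # first coefficient
    ("ANS", "non_voice_active"): 1.0,  # second (first < second)
    ("AEC", "voice_active"): 0.5,      # fifth
    ("AEC", "non_voice_active"): 1.0,  # sixth (fifth < sixth)
    ("AGC", "voice_active"): 1.0,      # ninth
    ("AGC", "non_voice_active"): 0.5,  # tenth (ninth > tenth)
}

def adjusted_value(algorithm: str, type_info: str, parameter_value: float) -> float:
    """Apply the preset coefficient multiplicatively (an assumed operation)."""
    return parameter_value * COEFFICIENTS[(algorithm, type_info)]
```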
  • the target parameter includes an initial parameter in a process performed based on the target audio processing algorithm.
  • the target audio processing algorithm includes a jitter buffer management JBM algorithm
  • the initial parameter includes a buffer depth of the audio data
  • the target audio processing algorithm includes a time scale adjustment TSM algorithm
  • the initial parameters include a stretching parameter or a compression parameter of the audio data.
  • the adjusting, by using the adjustment coefficient, the parameter value of the target parameter includes:
  • if the target audio processing algorithm is a JBM algorithm and the type information is the voice active frame type, the buffer depth is adjusted based on a preset thirteenth adjustment coefficient; if the type information is the non-voice active frame type, the buffer depth is adjusted based on a preset fourteenth adjustment coefficient, where the thirteenth adjustment coefficient is greater than the fourteenth adjustment coefficient;
  • if the target audio processing algorithm is a TSM algorithm and the type information is the voice active frame type, the stretching parameter or the compression parameter is adjusted based on a preset fifteenth adjustment coefficient; if the type information is the non-voice active frame type, the stretching parameter or the compression parameter is adjusted based on a preset sixteenth adjustment coefficient, where the fifteenth adjustment coefficient is smaller than the sixteenth adjustment coefficient; or
  • if the target audio processing algorithm is a TSM algorithm and the type information is the voice type, the stretching parameter or the compression parameter is adjusted based on a preset seventeenth adjustment coefficient; if the type information is the music type, the stretching parameter or the compression parameter is adjusted based on a preset eighteenth adjustment coefficient, where the seventeenth adjustment coefficient is greater than the eighteenth adjustment coefficient.
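For the JBM branch above, a buffer-depth adjustment might be sketched as follows. The coefficient values and the rounding rule are assumptions; only the ordering (the voice-active coefficient exceeds the non-voice-active one) is taken from the text:

```python
def adjusted_buffer_depth(type_info: str, base_depth_frames: int) -> int:
    """Hypothetical JBM buffer-depth adjustment.

    Coefficient values are placeholders; the text only requires the
    voice-active (thirteenth) coefficient to be greater than the
    non-voice-active (fourteenth) one.
    """
    coefficient = {"voice_active": 1.5, "non_voice_active": 1.0}[type_info]
    # Keep at least one frame of buffering after scaling.
    return max(1, round(base_depth_frames * coefficient))
```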
  • an apparatus for processing audio data, comprising a processor, a network interface, a memory, and a bus, the memory and the network interface being respectively connected to the processor through the bus; the processor is configured to execute instructions stored in the memory, and by executing the instructions the processor implements the method of processing audio data provided by the first aspect or any possible implementation of the first aspect.
  • an embodiment of the present invention provides an apparatus for processing audio data, where the apparatus includes at least one unit, and the at least one unit is configured to implement the method of processing audio data provided by the first aspect or any possible implementation of the first aspect.
  • a fifth aspect provides an apparatus for processing audio data, the apparatus comprising a processor, a network interface, a memory, and a bus, wherein the memory and the network interface are respectively connected to the processor through the bus; the processor is configured to execute instructions stored in the memory, and by executing the instructions the processor implements the method of processing audio data provided by the second aspect or any possible implementation of the second aspect.
  • an embodiment of the present invention provides an apparatus for processing audio data, where the apparatus includes at least one unit, and the at least one unit is configured to implement the method of processing audio data provided by the second aspect or any possible implementation of the second aspect.
  • an embodiment of the present invention provides a computer storage medium storing a computer program, where the following steps are implemented when the computer program is executed by a processor:
  • if it is determined to adjust the target audio processing algorithm, the target audio processing algorithm is adjusted, and the audio data is processed based on the adjusted target audio processing algorithm;
  • if it is determined not to adjust the target audio processing algorithm, the audio data is processed based on the target audio processing algorithm.
  • an embodiment of the present invention provides a computer storage medium storing a computer program, where the following steps are implemented when the computer program is executed by a processor:
  • determining type information of the audio data to be processed, determining an adjustment coefficient according to the type information, adjusting a parameter value of a target parameter by using the adjustment coefficient, and processing the audio data based on the adjusted parameter value of the target parameter.
  • In the process of voice communication, whether to adjust the target audio processing algorithm is determined based on the type information of the audio data and the target audio processing algorithm, so that when certain types of audio data are processed, the target audio processing algorithm is adjusted to achieve a better processing effect and improve the quality of voice communication.
  • FIG. 1 is a system frame diagram provided by an embodiment of the present invention.
  • FIG. 2 is a schematic diagram of transmitting audio data according to an embodiment of the present invention.
  • FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a method for processing audio data according to an embodiment of the present invention.
  • FIG. 5 is a flowchart of a method for processing audio data according to an embodiment of the present invention.
  • FIG. 6 is a flowchart of a method for processing audio data according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of an apparatus for processing audio data according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of an apparatus for processing audio data according to an embodiment of the present invention.
  • the embodiment of the invention provides a method for processing audio data, and the execution body of the method is a terminal.
  • the terminal may be a transmitting end for transmitting audio data during a voice communication process, or may be a receiving end for receiving audio data.
  • the transmitting end may detect audio data through an input device such as a microphone; the audio data may be a user's voice, a piece of music, or other audio data.
  • the transmitting end may encode the audio data and then send the encoded audio data to the receiving end through the network; after receiving the encoded audio data, the receiving end may decode the audio data and then play the decoded audio data.
  • FIG. 1 is a system framework diagram provided by an embodiment of the present invention, including a transmitting end, a receiving end, and a network.
  • an audio processing algorithm may be pre-stored in the terminal to process the audio data.
  • the audio processing algorithm can be 3A algorithm, namely AEC (Adaptive Echo Cancellation) algorithm, ANS (Automatic Noise Suppression) algorithm and AGC (Automatic Gain Control) algorithm.
  • the audio processing algorithm can be a JBM (Jitter Buffer Management) algorithm; based on the JBM algorithm, buffered audio data can be sent during time periods in which no audio data is received, improving the continuity of the call. The audio processing algorithm may also be a TSM (Time Scale Modification) algorithm; based on the TSM algorithm, the audio data can be stretched or compressed, so that the audio data is adjusted to audio data of a target duration, improving the continuity of the call. For example, if, due to the network, the duration of the audio data received by the terminal for a certain frame is less than one frame, the received audio data can be stretched by the TSM algorithm into audio data of one frame duration; conversely, when the audio data received by the terminal is longer than one frame, it can be compressed by the TSM algorithm into audio data of one frame duration.
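The stretch/compress step can be illustrated with a toy resampler. This is not the patent's TSM (practical TSM methods such as WSOLA preserve pitch); it is only a sketch of mapping a frame's samples to a target length:

```python
def resize_frame(samples: list[float], target_len: int) -> list[float]:
    """Linearly interpolate a frame of samples to a target length.

    A simplified stand-in for time-scale modification: stretching when
    target_len exceeds len(samples), compressing when it is smaller.
    """
    if target_len <= 0 or not samples:
        return []
    if len(samples) == 1 or target_len == 1:
        return [samples[0]] * target_len
    out = []
    step = (len(samples) - 1) / (target_len - 1)
    for i in range(target_len):
        pos = i * step
        lo = int(pos)
        hi = min(lo + 1, len(samples) - 1)
        frac = pos - lo
        # Blend the two neighbouring samples around the fractional position.
        out.append(samples[lo] * (1 - frac) + samples[hi] * frac)
    return out
```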
  • The process of transmitting audio data between the transmitting end and the receiving end may be as follows: after detecting the audio data, the transmitting end may process the audio data through the 3A algorithm, encode the processed audio data, and send the encoded audio data to the receiving end through the wireless network. The receiving end may process the received audio data through the JBM algorithm and/or the TSM algorithm, decode the processed audio data, process the decoded audio data through the 3A algorithm, and output the processed audio data through an output device (such as a headphone or a speaker), so that the user at the receiving end can hear the audio data. The transmission between the transmitting end and the receiving end is shown in FIG. 2.
  • the terminal may be the foregoing sending end or receiving end.
  • the terminal 10 includes a transceiver 1011 and a memory 1012.
  • the terminal may further include a processor 1013.
  • the memory 1012 and the network interface 1014 are respectively connected to the processor 1013; the memory 1012 is configured to store program code, which includes computer operation instructions; the processor 1013 and the transceiver 1011 are configured to execute the program code stored in the memory 1012 to implement the related processing of the audio data, and can interact with the base station or other terminals through the network interface 1014.
  • Processor 1013 includes one or more processing cores.
  • the processor 1013 executes the following method of processing audio data by running a software program and a unit.
  • the terminal may also include components such as bus 1015.
  • the memory 1012 and the network interface 1014 are respectively connected to the processor 1013 and the transceiver 1011 via the bus 1015.
  • Memory 1012 can be used to store software programs and units. Specifically, the memory 1012 can store the operating system 10121 and the application unit 10122 required for at least one function.
  • the operating system 10121 can be an operating system such as Real Time eXecutive (RTX), LINUX, UNIX, WINDOWS, or OS X.
  • FIG. 4 is a flowchart of a method for processing audio data according to an exemplary embodiment of the present invention, which may be used in the system framework shown in FIG. 1. As shown in FIG. 4, the method for processing audio data may include:
  • Step 401: Acquire audio data to be processed.
  • the audio data may be an audio signal obtained by the terminal detection or decoding process, or may be an audio code stream obtained by the encoding process.
  • the type information may be information indicating a type of the audio data, and the type of the audio data may include a voice activity frame and a non-voice activity frame, and the voice activity frame may include a voice type and a music type.
  • the terminal can obtain the audio data to be processed.
  • the terminal can detect the audio data through an input device (such as a microphone), and use the detected audio data as the audio data to be processed.
  • the terminal may receive the audio code stream sent by the transmitting end through the receiving component and use the received audio code stream as the audio data to be processed, or may use audio data that has undergone some processing, such as decoding processing or algorithm processing, as the audio data to be processed.
  • Step 402: Determine the target audio processing algorithm to be used and the type information of the audio data.
  • The target audio processing algorithm to be used may be determined according to the stage the audio data is at in the voice communication process: if the audio data to be processed is the audio data detected by the transmitting end, the target audio processing algorithm may be the 3A algorithm; if the audio data to be processed is the audio data decoded by the receiving end, the target audio processing algorithm may be the 3A algorithm; if the audio data to be processed is the audio data received by the receiving end, the target audio processing algorithm may be the JBM algorithm or the TSM algorithm.
  • the type information of the audio data may also be determined.
  • the terminal may determine the type information of the audio data according to an existing audio classification algorithm, and the corresponding processing may be as follows: determine the feature value of the audio data according to a pre-stored audio classification algorithm, and determine the type information of the audio data according to the feature value of the audio data.
  • The audio classification algorithm for classifying the audio data may be pre-stored in the terminal. After acquiring the audio data to be processed, the terminal may calculate the feature value of the audio data according to the pre-stored audio classification algorithm, and then determine the type information of the audio data according to the feature value.
  • the audio classification algorithm may use an audio classification algorithm in the prior art, such as a VAD (Voice Activity Detection) algorithm and a voice music classification algorithm. Based on the VAD algorithm, it can be determined whether the audio data is a voice activity frame or a non-voice activity frame; based on the voice music classifier, it can be further determined whether the audio data of the voice activity frame type is a voice type or a music type.
  • the type information of the audio data may be determined according to the feature value.
  • The terminal may determine whether the feature value is greater than a preset classification threshold. If the feature value is greater than the preset classification threshold, the first type information may be used as the type information of the audio data; if the feature value is smaller than the preset classification threshold, the second type information may be used as the type information of the audio data. For example, if the preset classification threshold is 0.5 and the feature value of the audio data is 0.8, the type information of the audio data is 1, indicating that the audio data is a voice-type signal; if the feature value of the audio data is 0.2, the type information of the audio data is 0, indicating that the audio data is a music-type signal.
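The threshold rule in this example is a one-liner; the mapping of 1 to voice and 0 to music follows the example above:

```python
def classify(feature_value: float, threshold: float = 0.5) -> int:
    """Threshold classification as in the worked example: feature values
    above the threshold map to type 1 (voice), values below to type 0 (music)."""
    return 1 if feature_value > threshold else 0
```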
  • the terminal may also obtain type information of the audio data from the codec.
  • the terminal may use a codec with a signal classification function, and an audio classification algorithm may be stored in the codec.
  • the codec can determine the feature value of the audio data according to the pre-stored audio classification algorithm, and then determine the type information of the audio data according to the feature value; the specific process is similar to the above and is not described again.
  • the codec can store the determined type information for subsequent processing.
  • when the terminal first processes the audio data through the audio processing algorithm and then performs encoding or decoding, the terminal may obtain the type information from the codec as the type information of the current frame of audio data.
  • In some cases, the type information stored in the codec is the type information obtained by the codec from analyzing the audio data input in the previous frame. The type information then has a one-frame delay relative to the audio data; however, since a speech signal can be understood as a quasi-periodic, slowly varying signal, this delay can be ignored.
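The one-frame delay described here means frame n is processed with the type obtained from frame n-1. A sketch (the initial type and the function name are assumptions):

```python
def types_with_one_frame_delay(frame_types: list, initial: str = "voice") -> list:
    """Shift classifier outputs by one frame: the type applied to frame n
    is the type the codec produced for frame n-1."""
    applied = []
    previous = initial
    for current in frame_types:
        applied.append(previous)
        previous = current
    return applied
```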
  • Step 403: Determine whether to adjust the target audio processing algorithm based on the type information of the audio data and the target audio processing algorithm.
  • the type information of the audio data may include a voice activity frame type and a non-voice activity frame type, wherein the voice activity frame type may include a music type and a voice type.
  • The terminal can classify the audio signals according to different requirements. For example, the audio data can be classified into the voice active frame type and the non-voice active frame type, or the audio data can first be divided into non-voice active frames and voice active frames, with the audio data of the voice active frame type further classified into the voice type or the music type, which is not limited in this embodiment.
  • the terminal may determine the type information that needs to be adjusted corresponding to the target audio processing algorithm according to the pre-stored audio processing algorithm and the type information corresponding to the adjustment (referred to as target type information). If the type information of the audio data to be processed is the target type information, it is determined to adjust the target audio processing algorithm; otherwise, it is determined that the target audio processing algorithm is not adjusted.
  • If the target audio processing algorithm is the ANS algorithm and the type information is the non-voice active frame type, it is determined to adjust the ANS algorithm; if the type information is the voice active frame type, it is determined not to adjust the ANS algorithm.
  • the terminal further determines that the audio data is a music type or a voice type
  • the target audio processing algorithm is an ANS algorithm
  • the type information is a music type, it is determined to adjust the ANS algorithm; if the type information is a voice type, the pair is judged
  • the ANS algorithm does not adjust.
• if the target audio processing algorithm is the AGC algorithm and the type information is the non-voice activity frame type, it is determined to adjust the AGC algorithm; if the type information is the voice activity frame type, it is determined not to adjust the AGC algorithm.
• if the terminal further determines whether the audio data is of the music type or the voice type, then for the AGC algorithm: if the type information is the music type, it is determined to adjust the AGC algorithm; if the type information is the voice type, it is determined not to adjust the AGC algorithm.
• if the target audio processing algorithm is the AEC algorithm and the type information is the non-voice activity frame type, it is determined to adjust the AEC algorithm; if the type information is the voice activity frame type, it is determined not to adjust the AEC algorithm.
• further, the terminal may determine whether the audio data is of the music type or the voice type. If the target audio processing algorithm is the AEC algorithm and the type information is the music type, it is determined to adjust the AEC algorithm; if the type information is the voice type, it is determined not to adjust the AEC algorithm.
• if the target audio processing algorithm is the JBM algorithm and the type information is the non-voice activity frame type, it is determined to adjust the JBM algorithm; if the type information is the voice activity frame type, it is determined not to adjust the JBM algorithm.
• if the target audio processing algorithm is the TSM algorithm and the type information is the voice activity frame type, it is determined to adjust the TSM algorithm; if the type information is the non-voice activity frame type, it is determined not to adjust the TSM algorithm.
  • the terminal may further determine that the audio data is a music type or a voice type, and the audio data of the music type and the audio data of the voice type may be adjusted to different degrees, and will be described in detail later.
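• the per-algorithm decisions above can be summarized as a small lookup table. The sketch below is illustrative only: the algorithm names and frame-type labels are hypothetical identifiers, not terms defined by this embodiment, and it covers only the coarse voice/non-voice classification.

```python
# Hypothetical decision table for step 403, built from the description above;
# the string labels are illustrative, not identifiers from the embodiment.
ADJUST_WHEN = {
    "ANS": {"non_voice_active"},  # filter more noise in non-voice frames
    "AGC": {"non_voice_active"},  # skip gain adjustment in non-voice frames
    "AEC": {"non_voice_active"},  # filter more echo in non-voice frames
    "JBM": {"non_voice_active"},  # shrink the jitter buffer in non-voice frames
    "TSM": {"voice_active"},      # reduce stretching/compression of voice frames
}

def should_adjust(algorithm: str, type_info: str) -> bool:
    """Decide whether the target audio processing algorithm is to be
    adjusted for audio data of the given type."""
    return type_info in ADJUST_WHEN.get(algorithm, set())
```

When the terminal further distinguishes voice from music inside voice-active frames, the same table shape applies with voice/music labels.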
  • Step 404 If it is determined that the target audio processing algorithm is adjusted, the target audio processing algorithm is adjusted, and the audio data is processed based on the adjusted target audio processing algorithm.
  • the target audio processing algorithm may be adjusted according to an adjustment strategy of the pre-stored audio processing algorithm, and the audio data is processed based on the adjusted target audio processing algorithm. Further, the processed audio data can be output.
  • the terminal may output the processed audio data, so that the codec acquires the processed audio data, and performs encoding processing on the processed audio data.
• the terminal may perform the above processing before decoding; correspondingly, the terminal may output the processed audio data to the codec, so that the codec obtains the processed audio data and performs decoding processing on it. The terminal may also perform the above processing after decoding; correspondingly, the terminal may output the processed audio data through an output component (such as a headphone or a speaker) so that the user can hear the audio.
  • Step 405 If it is determined that the target audio processing algorithm is not adjusted, the audio data is processed based on the target audio processing algorithm.
  • the audio data may be directly processed based on the target audio processing algorithm stored in the terminal.
  • This embodiment provides a specific processing procedure for the terminal to adjust the audio processing algorithm. As shown in FIG. 5, the following steps may be included:
  • Step 501 Determine an adjustment coefficient based on the type information.
  • the adjustment coefficient may be determined based on the type information of the audio data.
• the number of adjustment coefficients may be one or more.
• the manner in which the terminal determines the adjustment coefficient based on the type information can vary. This embodiment provides two feasible ways, as follows:
• Manner 1: Determine the adjustment coefficient corresponding to the type information of the audio data to be processed according to the pre-stored correspondence between type information and adjustment coefficients.
  • the correspondence between the type information and the adjustment coefficient may be pre-stored in the terminal, and the correspondence may be established according to an audio processing algorithm, and different audio processing algorithms may establish different correspondences.
  • the terminal may obtain the correspondence between the type information and the adjustment coefficient corresponding to the target audio processing algorithm.
• for example, if the target audio processing algorithm is an ANS algorithm, the adjustment coefficient corresponding to the non-voice activity frame type may be 0, and the adjustment coefficient corresponding to the music type may be 0.3.
  • the terminal may determine an adjustment coefficient corresponding to the type information according to the obtained correspondence relationship, so as to perform subsequent processing.
• Manner 2: Use the feature value of the type information as the adjustment coefficient of the audio data.
• the terminal may also use the feature value of the determined type information as the adjustment coefficient. For example, if the target audio processing algorithm is an ANS algorithm and the determined feature value of the type information is 0.8, then 0.8 may be used as the adjustment coefficient; if the feature value of the type information is 0.2, then 0.2 can be used as the adjustment coefficient.
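• the two manners can be sketched as follows. The function names are hypothetical, and the example correspondence reuses the illustrative ANS coefficients given above (non-voice 0, music 0.3).

```python
def coefficient_from_table(type_info, correspondence):
    """Manner 1: look up the adjustment coefficient in the correspondence
    pre-stored for the target audio processing algorithm."""
    return correspondence[type_info]

def coefficient_from_feature(feature_value):
    """Manner 2: use the feature value of the type information directly
    as the adjustment coefficient."""
    return feature_value

# Example correspondence for an ANS-style algorithm, using the values above.
ans_correspondence = {"non_voice_active": 0.0, "music": 0.3}
```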
• Step 502 Determine, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted.
  • the terminal may further determine the target parameter corresponding to the target audio processing algorithm according to the correspondence between the audio processing algorithm and the parameter to be adjusted, so as to perform subsequent processing.
  • the target parameter may include an intermediate parameter in the algorithm processing process based on the target audio processing algorithm.
  • the target audio processing algorithm includes an ANS algorithm, and the intermediate parameters may include noise parameters of the noise determined based on the ANS algorithm and the audio data.
• the terminal may determine the noise corresponding to the audio data based on the ANS algorithm and the audio data to be processed, so as to subsequently adjust the noise parameter of the noise. If the audio data is adjusted in the time domain, the noise parameter of the noise may be the noise value of the noise; if the audio data is adjusted in the frequency domain, the noise parameter of the noise may be the spectral coefficient and/or spectral amplitude of the noise.
  • the target audio processing algorithm includes an AGC algorithm, and the intermediate parameters include an attenuation gain factor determined based on the AGC algorithm and the audio data.
• the terminal may determine a signal gain value of the current frame according to the energy/amplitude of the audio data of the current frame (i.e., the audio data to be processed) and the energy/amplitude of the audio data before the current frame. The gain value may reflect the change of the energy/amplitude of the current frame relative to that of the preceding audio data. The attenuation gain factor corresponding to the audio data of the current frame can then be determined according to the gain value, and the audio data to be processed can be amplified or attenuated by the attenuation gain factor so that the energy of the output audio data does not suddenly become large or small.
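• as a rough illustration of this intermediate parameter, an attenuation gain factor can be derived from the energy of the current frame relative to the recent level. The smoothing rule below is an assumption made for the sketch, not the embodiment's exact formula.

```python
def attenuation_gain_factor(curr_energy, prev_energy, smoothing=0.5):
    """Compute an amplitude gain that pulls the current frame's energy toward
    the recent energy level, so the output does not suddenly become louder or
    quieter. The linear smoothing toward prev_energy is an assumption."""
    if curr_energy <= 0:
        return 1.0
    target_energy = prev_energy + smoothing * (curr_energy - prev_energy)
    return (target_energy / curr_energy) ** 0.5  # energy ratio -> amplitude gain

# A frame four times louder than the recent level is attenuated (gain < 1);
# a much quieter frame is amplified (gain > 1).
loud_gain = attenuation_gain_factor(curr_energy=4.0, prev_energy=1.0)
quiet_gain = attenuation_gain_factor(curr_energy=0.25, prev_energy=1.0)
```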
  • the target audio processing algorithm includes an AEC algorithm, and the intermediate parameters include echo parameters of the echo determined based on the AEC algorithm and the audio data.
  • the terminal may determine an echo of the audio data to be processed according to the AEC algorithm, so as to subsequently adjust the echo parameters of the echo.
  • the echo parameter can be the echo value of the echo.
  • the target parameters may also include initial parameters in the algorithm processing based on the target audio processing algorithm. This embodiment provides several examples, as follows:
  • the target audio processing algorithm may include a JBM algorithm, and the initial parameters may include a buffer depth of the audio data.
• the receiving end can buffer the received audio data in real time and then output the buffered audio data with the earliest receiving time, so that the receiving end can output buffered audio data during periods in which no audio data is received, improving the continuity of voice communication.
  • the cache depth may be the number of frames of audio data buffered by the terminal during the call.
• the target audio processing algorithm may include a TSM algorithm, and the initial parameters may include stretching parameters or compression parameters of the audio data.
• the receiving end may stretch or compress the received audio data to adjust the playback duration of the audio data. For example, when the received voice is less than one frame but needs to be output as one frame, the received audio data may be stretched based on the stretching parameter; when the received voice is more than one frame but needs to be output as one frame, the received audio data can be compressed based on the compression parameter. For the specific processing, refer to the prior art, which is not described in this embodiment.
  • the stretching parameter can be used to indicate the degree of stretching of the audio data, such as the target stretching time
  • the compression parameter can be used to indicate the degree of compression of the audio data, such as the target compression time.
  • Step 503 Adjust the parameter value of the target parameter based on the adjustment coefficient.
  • the target audio processing algorithm may be adjusted by multiplying the parameter value of the target parameter by the adjustment coefficient.
• if the target audio processing algorithm is the ANS algorithm and the type information is the non-voice activity frame type, the noise parameter of the noise can be multiplied by a larger adjustment coefficient so that the adjusted noise is larger than the calculated noise. In this way, for audio data of the voice activity frame type, the noise can be filtered out in a normal manner, improving speech intelligibility during voice communication without attenuating the speech signal; and for audio data of the non-voice activity frame type, more noise can be filtered out, so that the user does not hear noise when no one is talking.
• if the type information is the music type, the noise parameter of the noise can be multiplied by a smaller adjustment coefficient so that the adjusted noise is smaller than the calculated noise. In this way, for voice type audio data, noise can be filtered out in a normal manner, thereby improving speech intelligibility during voice communication; and for music type audio data, relatively less noise can be filtered out, thereby optimizing the sound effect of music playback.
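• for a frequency-domain ANS adjustment, the scaling can be sketched as below. The spectral-subtraction step and all names are illustrative assumptions about how an ANS-style algorithm might consume the scaled noise estimate; they are not the embodiment's actual suppression rule.

```python
def adjust_noise(noise_mags, coeff):
    """Scale the estimated noise magnitude spectrum by the adjustment
    coefficient (step 503 for an ANS-style algorithm)."""
    return [m * coeff for m in noise_mags]

def spectral_subtract(signal_mags, noise_mags):
    """Toy noise suppression: subtract the (adjusted) noise estimate from
    the signal magnitudes, flooring at zero."""
    return [max(s - n, 0.0) for s, n in zip(signal_mags, noise_mags)]

sig = [1.0, 0.8, 0.5]
noise = [0.2, 0.2, 0.2]
more_suppressed = spectral_subtract(sig, adjust_noise(noise, 1.5))  # non-voice frames
less_suppressed = spectral_subtract(sig, adjust_noise(noise, 0.3))  # music frames
```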
• if the target audio processing algorithm is the AGC algorithm and the type information is the non-voice activity frame type, the attenuation gain factor may be multiplied by an adjustment coefficient of 0. In this way, for audio data of the voice activity frame type, gain adjustment is performed in the normal manner to keep the volume consistent during voice communication; and for audio data of the non-voice activity frame type, gain adjustment may be skipped, thereby saving processing resources.
• if the type information is the music type, the attenuation gain factor can be multiplied by a small adjustment coefficient to obtain a smaller attenuation gain factor. In this way, for voice type audio data, normal gain adjustment can be performed to keep the volume consistent during voice communication; and for music type audio data, the gain adjustment range can be reduced so that the energy of each frame of the original audio remains basically the same, improving the fidelity of music playback.
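• a minimal sketch of multiplying the attenuation gain factor by the adjustment coefficient, following the text above. Treating a product of 0 as "skip the gain stage" (unity gain) is an assumption about what "not performing gain adjustment" means.

```python
def scaled_gain(attenuation_gain, coeff):
    """Multiply the AGC attenuation gain factor by the adjustment coefficient.
    A coefficient of 0 disables gain adjustment (assumed to mean unity gain);
    a smaller coefficient yields a smaller attenuation gain factor for music
    frames, per the description above."""
    g = attenuation_gain * coeff
    return g if g > 0 else 1.0
```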
• if the target audio processing algorithm is the AEC algorithm and the type information is the non-voice activity frame type, the echo parameter of the echo can be multiplied by a larger adjustment coefficient so that the adjusted echo is larger than the calculated echo. In this way, for audio data of the voice activity frame type, the echo can be filtered out in a normal manner to improve speech intelligibility during voice communication without attenuating the speech signal; and for audio data of the non-voice activity frame type, more echo can be filtered out, so that the user does not hear echo when no one is talking.
• if the type information is the music type, the echo parameter of the echo may be multiplied by a smaller adjustment coefficient so that the adjusted echo is smaller than the calculated echo. In this way, for voice type audio data, echo can be filtered out in a normal manner to improve speech intelligibility during voice communication; and for music type audio data, relatively less echo can be filtered out to avoid filtering out useful signals in the audio data.
• if the target audio processing algorithm is the JBM algorithm and the type information is the non-voice activity frame type, the buffer depth of the audio data can be multiplied by a smaller adjustment coefficient so that the adjusted buffer depth is smaller than the default buffer depth in the JBM algorithm.
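• the buffer-depth scaling can be sketched directly. Rounding to a whole frame count and flooring at one frame are assumptions of this sketch, not stated by the embodiment.

```python
def adjusted_buffer_depth(default_depth_frames, coeff):
    """Multiply the JBM default buffer depth by the adjustment coefficient,
    yielding a smaller depth (fewer buffered frames) when coeff < 1."""
    return max(1, round(default_depth_frames * coeff))
```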
• if the target audio processing algorithm is the TSM algorithm and the type information is the voice activity frame type, the parameter value of the stretching parameter or the compression parameter may be multiplied by a smaller adjustment coefficient, so that the adjusted parameter value is smaller than the parameter value preset in the TSM algorithm. In this way, the degree of stretching or compression of the audio data of voice activity frames can be reduced, preventing the user from hearing a change in pitch, while normal TSM processing of the audio data of non-voice activity frames can mitigate the failure to output voice in time, or the output of too much voice, caused by packet loss due to network jitter.
• if the type of the audio data is further determined to be the music type, the parameter value of the stretching parameter or the compression parameter can be multiplied by an even smaller adjustment coefficient, so that the adjusted parameter value is smaller than the parameter value of the stretching parameter or compression parameter corresponding to voice type audio data.
• in the embodiment of the present invention, during voice communication, it may be determined, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm, so that when certain types of audio data are processed, the target audio processing algorithm is adjusted to achieve a better processing result and improve voice communication quality.
  • the embodiment of the present invention further provides a method for processing audio data.
  • the method for processing audio data may include:
  • Step 601 Acquire audio data to be processed.
• For the processing of this step, refer to step 401 above; details are not described herein again.
• Step 602 Determine the target audio processing algorithm to be used and the type information of the audio data.
• For the processing of this step, refer to step 402 above; details are not described herein again.
• Step 603 Determine an adjustment coefficient based on the type information.
• For the processing of this step, refer to step 501 above; details are not described herein again.
• Step 604 Determine, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted.
• For the processing of this step, refer to step 502 above; details are not described herein again.
  • the target parameter includes an intermediate parameter in the algorithm processing process based on the target audio processing algorithm; or the target parameter may also include an initial parameter in the algorithm processing process based on the target audio processing algorithm.
  • Step 605 Adjust the parameter value of the target parameter based on the adjustment coefficient.
  • the target audio processing algorithm may be adjusted by multiplying the parameter value of the target parameter by the adjustment coefficient.
• for different audio processing algorithms, the adjustment coefficients used are different. This embodiment describes the adjustment of different audio processing algorithms, as follows:
  • the intermediate parameter may be a noise parameter of the noise determined based on the ANS algorithm and the audio data.
• if the audio data is adjusted in the time domain, the noise parameter of the noise may be the noise value of the noise; if the audio data is adjusted in the frequency domain, the noise parameter of the noise may be the spectral coefficient and/or spectral amplitude of the noise.
• if the type information is the voice activity frame type, the noise parameter of the noise is adjusted based on the preset first adjustment coefficient; if the type information is the non-voice activity frame type, the noise parameter of the noise is adjusted based on the preset second adjustment coefficient, and the first adjustment coefficient is smaller than the second adjustment coefficient.
• for example, type information 1 indicates that the type of the audio data is a voice activity frame, and the corresponding first adjustment coefficient is 0.7; type information 2 indicates that the type is a non-voice activity frame, and the corresponding second adjustment coefficient is 1. If the audio data is a voice-activity-frame-type signal, the noise parameter of the noise can be multiplied by 0.7 to obtain the adjusted noise; if the audio data is a non-voice-activity-frame-type signal, the noise parameter of the noise can be multiplied by 1 to obtain the adjusted noise.
• in this way, for audio data of the voice activity frame type, the noise can be filtered out in a normal manner, improving speech intelligibility during voice communication without attenuating the speech signal; and for audio data of the non-voice activity frame type, more noise can be filtered out, so that the user does not hear noise when no one is talking.
• if the type information is the voice type, the noise parameter of the noise is adjusted based on the preset third adjustment coefficient; if the type information is the music type, the noise parameter of the noise is adjusted based on the preset fourth adjustment coefficient. The third adjustment coefficient is greater than the fourth adjustment coefficient, and the third adjustment coefficient may be less than or equal to the second adjustment coefficient.
• for example, if the type information of the voice type is 1, the corresponding third adjustment coefficient is 0.7; the type information of the music type is 0, and the corresponding fourth adjustment coefficient is 0.3. If the audio data is a voice type signal, the noise parameter of the noise can be multiplied by 0.7 to obtain the adjusted noise; if the audio data is a music type signal, the noise parameter of the noise can be multiplied by 0.3 to obtain the adjusted noise. In this way, for voice type audio data, relatively more noise can be filtered out, thereby improving speech intelligibility during voice communication; and for music type audio data, relatively less noise can be filtered out, thereby optimizing the sound effect of music playback.
  • the target audio processing algorithm includes an adaptive echo cancellation AEC algorithm, and the intermediate parameters include echo parameters of the echo determined based on the AEC algorithm and the audio data.
• the echo parameter of the echo may be the echo value of the echo.
• if the type information is the voice activity frame type, the echo parameters of the echo are adjusted based on the preset fifth adjustment coefficient; if the type information is the non-voice activity frame type, the echo parameters of the echo are adjusted based on the preset sixth adjustment coefficient, and the fifth adjustment coefficient is smaller than the sixth adjustment coefficient.
• for example, type information 1 indicates that the type of the audio data is a voice activity frame, and the corresponding fifth adjustment coefficient is 0.7; type information 2 indicates a non-voice activity frame, and the corresponding sixth adjustment coefficient is 1. If the audio data is a voice-activity-frame-type signal, the echo parameter of the echo can be multiplied by 0.7 to obtain the adjusted echo; if the audio data is a non-voice-activity-frame-type signal, the echo parameter of the echo can be multiplied by 1 to obtain the adjusted echo.
• in this way, for audio data of the voice activity frame type, the echo can be filtered out in a normal manner to improve speech intelligibility during voice communication without attenuating the speech signal; and for audio data of the non-voice activity frame type, more echo can be filtered out, so that the user does not hear echo when no one is talking.
• if the type information is the voice type, the echo parameter of the echo is adjusted based on the preset seventh adjustment coefficient; if the type information is the music type, the echo parameter of the echo is adjusted based on the preset eighth adjustment coefficient. The seventh adjustment coefficient may be greater than the eighth adjustment coefficient, and the seventh adjustment coefficient may be smaller than the sixth adjustment coefficient.
• for example, if the type information of the voice type is 1, the corresponding seventh adjustment coefficient may be 0.7; the type information of the music type is 0, and the corresponding eighth adjustment coefficient may be 0.3. If the audio data is a voice type signal, the echo parameter of the echo can be multiplied by 0.7 to obtain the adjusted echo; if the audio data is a music type signal, the echo parameter of the echo can be multiplied by 0.3 to obtain the adjusted echo. In this way, for voice type audio data, relatively more echo can be filtered out to improve speech intelligibility during voice communication; and for music type audio data, relatively less echo can be filtered out to avoid filtering out useful signals in the audio data, thereby optimizing the sound effect of music playback.
  • the target audio processing algorithm includes an automatic gain control AGC algorithm, and the intermediate parameter may include an attenuation gain factor determined based on the AGC algorithm and the audio data.
• if the type information is the voice activity frame type, the attenuation gain factor is adjusted based on the preset ninth adjustment coefficient; if the type information is the non-voice activity frame type, the attenuation gain factor is adjusted based on the preset tenth adjustment coefficient, and the ninth adjustment coefficient is greater than the tenth adjustment coefficient.
• for example, type information 1 indicates that the type of the audio data is a voice activity frame, and the corresponding ninth adjustment coefficient is 1; type information 2 indicates a non-voice activity frame, and the corresponding tenth adjustment coefficient is 0. If the audio data is a voice-activity-frame-type signal, the attenuation gain factor can be multiplied by 1 to obtain the adjusted attenuation gain factor; if it is a non-voice-activity-frame-type signal, the attenuation gain factor can be multiplied by 0.
• in this way, for audio data of the voice activity frame type, the gain adjustment can be performed in a normal manner to keep the volume consistent during voice communication; and for audio data of the non-voice activity frame type, the gain adjustment can be omitted, thereby saving processing resources.
• if the type information is the voice type, the attenuation gain factor is adjusted based on the preset eleventh adjustment coefficient; if the type information is the music type, the attenuation gain factor is adjusted based on the preset twelfth adjustment coefficient. The eleventh adjustment coefficient is greater than the twelfth adjustment coefficient, and the twelfth adjustment coefficient may be greater than the tenth adjustment coefficient.
• for example, if the type information of the voice type is 1, the corresponding eleventh adjustment coefficient is 0.7; the type information of the music type is 0, and the corresponding twelfth adjustment coefficient is 0.3. If the audio data is a voice type signal, the adjusted attenuation gain factor is obtained by multiplying the attenuation gain factor by 0.7; if the audio data is a music type signal, the attenuation gain factor can be multiplied by 0.3. In this way, for voice type audio data, appropriate gain adjustment can be performed to keep the volume consistent during voice communication; and for music type audio data, the gain adjustment range can be reduced so that the energy of each frame of the original audio remains basically the same, improving the fidelity of music playback.
  • the target audio processing algorithm includes a JBM algorithm, and the initial parameters include the buffer depth of the audio data.
• if the type information is the voice activity frame type, the buffer depth is adjusted based on the preset thirteenth adjustment coefficient; if the type information is the non-voice activity frame type, the buffer depth is adjusted based on the preset fourteenth adjustment coefficient, and the thirteenth adjustment coefficient is greater than the fourteenth adjustment coefficient.
• for example, type information 1 indicates that the type of the audio data is a voice activity frame, and the corresponding thirteenth adjustment coefficient is 1; type information 2 indicates a non-voice activity frame, and the corresponding fourteenth adjustment coefficient is 0.5.
• assuming the default buffer depth in the JBM algorithm is 10 frames: if the audio data is a voice-activity-frame-type signal, the adjustment coefficient may be 1, and multiplying the buffer depth by 1 leaves the adjusted buffer depth at 10 frames; if the audio data is a non-voice-activity-frame-type signal, the adjustment coefficient may be 0.5, and multiplying the buffer depth by 0.5 gives an adjusted buffer depth of 5 frames.
• when the audio data is processed based on the JBM algorithm, there is a certain delay between the transmitting end and the receiving end. Based on the above processing, for audio data of non-voice activity frames, the receiving end only needs to buffer less audio data, thereby reducing the delay between the transmitting end and the receiving end and improving the user experience.
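• the delay saving can be illustrated directly: the jitter buffer contributes roughly depth × frame-duration of end-to-end delay, so halving the depth for non-voice-active frames halves that contribution. The 20 ms frame duration below is a typical value assumed for illustration, not one stated by the embodiment.

```python
FRAME_MS = 20  # typical speech frame duration; assumed for this illustration

def buffering_delay_ms(depth_frames, frame_ms=FRAME_MS):
    """Approximate delay contributed by the jitter buffer: one frame
    duration per buffered frame."""
    return depth_frames * frame_ms

full_delay = buffering_delay_ms(10)   # default depth of 10 frames
reduced_delay = buffering_delay_ms(5)  # depth halved for non-voice frames
```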
  • the target audio processing algorithm includes a TSM algorithm, and the initial parameters include stretching parameters or compression parameters of the audio data.
• if the type information is the voice activity frame type, the stretching parameter or the compression parameter is adjusted based on the preset fifteenth adjustment coefficient; if the type information is the non-voice activity frame type, the stretching parameter or the compression parameter is adjusted based on the preset sixteenth adjustment coefficient, and the fifteenth adjustment coefficient is smaller than the sixteenth adjustment coefficient.
• for example, type information 1 indicates that the type of the audio data is a voice activity frame, and the corresponding fifteenth adjustment coefficient is 0; type information 2 indicates a non-voice activity frame, and the corresponding sixteenth adjustment coefficient is 1. If the audio data is a voice-activity-frame-type signal, the adjustment coefficient may be 0, and the parameter value of the stretching parameter or the compression parameter in the TSM algorithm is multiplied by 0, that is, the audio data may not be stretched or compressed, ensuring that the call sound does not change in pitch.
• if the audio data is a non-voice-activity-frame-type signal, the adjustment coefficient may be 1; the parameter value of the stretching parameter or the compression parameter in the TSM algorithm is multiplied by 1, and the audio data is then processed according to the adjusted TSM algorithm to obtain the processed audio data.
• in this way, the degree of stretching or compression of the audio data of voice activity frames can be reduced, preventing the user from hearing a change in pitch, while normal TSM processing of the audio data of non-voice activity frames can mitigate the failure to output voice in time, or the output of too much voice, caused by packet loss due to network jitter.
• if the type information is the voice type, the stretching parameter or the compression parameter is adjusted based on the preset seventeenth adjustment coefficient; if the type information is the music type, the stretching parameter or the compression parameter is adjusted based on the preset eighteenth adjustment coefficient. The seventeenth adjustment coefficient is greater than the eighteenth adjustment coefficient, and the seventeenth adjustment coefficient may be smaller than the sixteenth adjustment coefficient.
• for example, the type information of the voice type may be 1, and the corresponding seventeenth adjustment coefficient may be 0.7; the type information of the music type is 0, and the corresponding eighteenth adjustment coefficient may be 0. If the audio data is a voice type signal, the adjustment coefficient may be 0.7: the parameter value of the stretching parameter or the compression parameter is multiplied by 0.7, and the audio data is then processed. If the audio data is a music type signal, the adjustment coefficient may be 0: the parameter value of the stretching parameter or the compression parameter is multiplied by 0, and the audio data is then processed.
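• the TSM example above maps to a per-type coefficient table; the dictionary and function names below are illustrative, and the millisecond preset is an assumed unit for the stretch/compression parameter.

```python
# Coefficients from the example above: voice frames keep 0.7 of the preset
# stretch, music frames are not time-scaled at all (coefficient 0).
TSM_COEFF = {"voice": 0.7, "music": 0.0}

def adjusted_stretch_ms(preset_stretch_ms, audio_type):
    """Scale the TSM stretching (or compression) parameter by the per-type
    adjustment coefficient."""
    return preset_stretch_ms * TSM_COEFF[audio_type]
```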
  • Step 606 Perform algorithm processing on the audio data based on the adjusted parameter values of the target parameter.
• For the processing of this step, refer to the related description of step 404 above; details are not described herein again.
• in the embodiment of the present invention, the type information of the audio data to be processed may be determined first, an adjustment coefficient for adjusting the audio data may then be determined according to the type information, and the audio data may then be processed according to the target audio processing algorithm and the adjustment coefficient and the processed audio data output. In this way, different audio processing can be performed for different types of audio data, thereby improving the quality of voice communication.
  • FIG. 7 is a structural block diagram of an apparatus for processing audio data according to an embodiment of the present invention.
  • the apparatus may be implemented as part or all of a terminal by software, hardware, or a combination of both.
• the apparatus includes an acquisition unit 701, a determination unit 702, a judgment unit 703, an adjustment unit 704, and a processing unit 705.
  • the obtaining unit 701 is configured to perform step 401 and its alternatives in the foregoing embodiments.
  • the determining unit 702 is configured to perform step 402 and its alternatives in the above embodiments.
  • the determining unit 703 is configured to perform step 403 and its alternatives in the above embodiment.
  • the adjusting unit 704 is configured to perform step 404 and its alternatives in the above embodiments.
  • the processing unit 705 is configured to perform step 405 and its alternatives in the above embodiments.
  • In the embodiments of the present invention, during voice communication, whether to adjust the target audio processing algorithm may be judged based on the type information of the audio data and the target audio processing algorithm, so that when certain types of audio data are processed, the target audio processing algorithm is adjusted to achieve a better processing effect and improve voice communication quality.
  • FIG. 8 is a structural block diagram of an apparatus for processing audio data according to an embodiment of the present invention.
  • the apparatus may be implemented as part or all of a terminal by software, hardware, or a combination of both.
  • the apparatus includes an acquiring unit 801, a determining unit 802, an adjusting unit 803, and a processing unit 804.
  • the acquiring unit 801 is configured to perform step 601 and its alternatives in the foregoing embodiments.
  • the determining unit 802 is configured to perform steps 602 to 604 and their alternatives in the foregoing embodiments.
  • the adjusting unit 803 is configured to perform step 605 and its alternatives in the foregoing embodiments.
  • the processing unit 804 is configured to perform step 606 and its alternatives in the foregoing embodiments.
  • In the embodiments of the present invention, during voice communication, the type information of the audio data to be processed may be determined first; the adjustment coefficient for adjusting the audio data is then determined according to the type information; and the audio data is then processed according to the target audio processing algorithm and the adjustment coefficient, and the processed audio data is output, so that different audio processing can be performed for different types of audio data, thereby improving the quality of voice communication.
  • A person skilled in the art may understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium.
  • The storage medium mentioned may be a read-only memory, a magnetic disk, an optical disc, or the like.


Abstract

A method and apparatus for processing audio data, the method including: acquiring audio data to be processed (401); determining a target audio processing algorithm to be used and type information of the audio data (402); judging, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm (403); if it is judged that the target audio processing algorithm is to be adjusted, adjusting the target audio processing algorithm and processing the audio data based on the adjusted target audio processing algorithm (404); and if it is judged that the target audio processing algorithm is not to be adjusted, processing the audio data based on the target audio processing algorithm (405). The method and apparatus can improve voice communication quality.

Description

Method and apparatus for processing audio data
This application claims priority to Chinese Patent Application No. 201611080131.0, filed with the Chinese Patent Office on November 30, 2016 and entitled "Method and apparatus for processing audio data", which is incorporated herein by reference in its entirety.
Technical Field
The present invention relates to the field of wireless communication technologies, and in particular, to a method and apparatus for processing audio data.
Background
With the development of communication technologies, mobile terminals are used ever more widely, and people can perform voice communication through them. During voice communication, a user at the transmitting end may speak or play music; the transmitting end detects the corresponding audio data and sends it to the receiving end. After receiving the audio data, the receiving end can play it through a component such as an earphone or a loudspeaker, so that the user at the receiving end hears the corresponding audio.
Owing to the network environment, audio data may suffer noise interference, delay, echo, or loss. The transmitting end and the receiving end may therefore process the audio data with preset audio processing algorithms to improve voice communication quality. For example, the audio processing algorithms may be the 3A algorithms, namely the AEC (Adaptive Echo Cancellation) algorithm, the ANS (Automatic Noise Suppression) algorithm, and the AGC (Automatic Gain Control) algorithm; based on the 3A algorithms, noise in the audio data can be reduced, echo can be canceled, and the output signal can be kept at a certain, stable energy. As another example, the audio processing algorithm may be the JBM (Jitter Buffer Management) algorithm; based on the JBM algorithm, a relatively continuous and stable signal output can still be guaranteed under network jitter.
Because the above technical solution processes all audio data in a voice communication session with the above audio processing algorithms, some audio data sounds worse after such processing. For example, when the audio data is a piece of music, noise reduction by the ANS algorithm severely degrades the sound of the music, resulting in poor communication quality.
Summary
To solve the problem of poor communication quality, embodiments of the present invention provide a method and apparatus for processing audio data. The technical solutions are as follows:
According to a first aspect, a method for processing audio data is provided, the method including:
acquiring audio data to be processed;
determining a target audio processing algorithm to be used and type information of the audio data;
judging, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm;
if it is judged that the target audio processing algorithm is to be adjusted, adjusting the target audio processing algorithm, and processing the audio data based on the adjusted target audio processing algorithm; and
if it is judged that the target audio processing algorithm is not to be adjusted, processing the audio data based on the target audio processing algorithm.
In the embodiments of the present invention, during voice communication, whether to adjust the target audio processing algorithm can be judged based on the type information of the audio data and the target audio processing algorithm, so that when certain types of audio data are processed, the target audio processing algorithm is adjusted to achieve a better processing effect and improve voice communication quality.
In a possible implementation, the adjusting the target audio processing algorithm includes:
determining an adjustment coefficient based on the type information;
determining, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted; and
adjusting the parameter value of the target parameter based on the adjustment coefficient.
This embodiment of the present invention provides an implementation of adjusting the audio processing algorithm.
In another possible implementation, the target parameter includes an intermediate parameter used during processing based on the target audio processing algorithm.
In another possible implementation, the target audio processing algorithm includes an automatic noise suppression (ANS) algorithm, and the intermediate parameter includes a noise parameter of noise determined based on the ANS algorithm and the audio data.
In another possible implementation, the target audio processing algorithm includes an automatic gain control (AGC) algorithm, and the intermediate parameter includes an attenuation gain factor determined based on the AGC algorithm and the audio data.
In another possible implementation, the target audio processing algorithm includes an adaptive echo cancellation (AEC) algorithm, and the intermediate parameter includes an echo parameter of an echo determined based on the AEC algorithm and the audio data.
In another possible implementation, the target parameter includes an initial parameter used during processing based on the target audio processing algorithm.
In another possible implementation, the target audio processing algorithm includes a jitter buffer management (JBM) algorithm, and the initial parameter includes a buffer depth of audio data.
In another possible implementation, the target audio processing algorithm includes a time scale modification (TSM) algorithm, and the initial parameter includes a stretch parameter or a compression parameter of audio data.
In another possible implementation, the judging, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm includes:
when the target audio processing algorithm is the ANS algorithm, if the type information is a non-voice-activity-frame type, judging that the ANS algorithm is to be adjusted, or if the type information is a voice-activity-frame type, judging that the ANS algorithm is not to be adjusted;
when the target audio processing algorithm is the ANS algorithm, if the type information is a music type, judging that the ANS algorithm is to be adjusted, or if the type information is a speech type, judging that the ANS algorithm is not to be adjusted;
when the target audio processing algorithm is the AGC algorithm, if the type information is the non-voice-activity-frame type, judging that the AGC algorithm is to be adjusted, or if the type information is the voice-activity-frame type, judging that the AGC algorithm is not to be adjusted;
when the target audio processing algorithm is the AGC algorithm, if the type information is the music type, judging that the AGC algorithm is to be adjusted, or if the type information is the speech type, judging that the AGC algorithm is not to be adjusted;
when the target audio processing algorithm is the AEC algorithm, if the type information is the non-voice-activity-frame type, judging that the AEC algorithm is to be adjusted, or if the type information is the voice-activity-frame type, judging that the AEC algorithm is not to be adjusted;
when the target audio processing algorithm is the AEC algorithm, if the type information is the music type, judging that the AEC algorithm is to be adjusted, or if the type information is the speech type, judging that the AEC algorithm is not to be adjusted;
when the target audio processing algorithm is the JBM algorithm, if the type information is the non-voice-activity-frame type, judging that the JBM algorithm is to be adjusted, or if the type information is the voice-activity-frame type, judging that the JBM algorithm is not to be adjusted; or
when the target audio processing algorithm is the TSM algorithm, if the type information is the voice-activity-frame type, judging that the TSM algorithm is to be adjusted, or if the type information is the non-voice-activity-frame type, judging that the TSM algorithm is not to be adjusted.
According to a second aspect, a method for processing audio data is provided, the method including:
acquiring audio data to be processed;
determining a target audio processing algorithm to be used and type information of the audio data;
determining an adjustment coefficient based on the type information;
determining, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted;
adjusting the parameter value of the target parameter based on the adjustment coefficient; and
processing the audio data based on the adjusted parameter value of the target parameter.
In the embodiments of the present invention, during voice communication, the category information of the audio signal to be processed may be determined first; an adjustment coefficient for adjusting the audio signal is then determined according to the category information; the audio signal is then processed according to the target audio processing algorithm and the adjustment coefficient, and the processed audio signal is output. In this way, different audio processing can be performed for different categories of audio signals, improving voice communication quality.
In a possible implementation, the target parameter includes an intermediate parameter used during processing based on the target audio processing algorithm.
In another possible implementation, the target audio processing algorithm includes an automatic noise suppression (ANS) algorithm, and the intermediate parameter includes a noise parameter of noise determined based on the ANS algorithm and the audio data.
In another possible implementation, the target audio processing algorithm includes an automatic gain control (AGC) algorithm, and the intermediate parameter includes an attenuation gain factor determined based on the AGC algorithm and the audio data.
In another possible implementation, the target audio processing algorithm includes an adaptive echo cancellation (AEC) algorithm, and the intermediate parameter includes an echo parameter of an echo determined based on the AEC algorithm and the audio data.
In another possible implementation, the adjusting the parameter value of the target parameter based on the adjustment coefficient includes:
when the target audio processing algorithm is the ANS algorithm, if the type information is a voice-activity-frame type, adjusting the noise parameter of the noise based on a preset first adjustment coefficient, or if the type information is a non-voice-activity-frame type, adjusting the noise parameter of the noise based on a preset second adjustment coefficient, where the first adjustment coefficient is less than the second adjustment coefficient;
when the target audio processing algorithm is the ANS algorithm, if the type information is a speech type, adjusting the noise parameter of the noise based on a preset third adjustment coefficient, or if the type information is a music type, adjusting the noise parameter of the noise based on a preset fourth adjustment coefficient, where the third adjustment coefficient is greater than the fourth adjustment coefficient;
when the target audio processing algorithm is the AEC algorithm, if the type information is the voice-activity-frame type, adjusting the echo parameter of the echo based on a preset fifth adjustment coefficient, or if the type information is the non-voice-activity-frame type, adjusting the echo parameter of the echo based on a preset sixth adjustment coefficient, where the fifth adjustment coefficient is less than the sixth adjustment coefficient;
when the target audio processing algorithm is the AEC algorithm, if the type information is the speech type, adjusting the echo parameter of the echo based on a preset seventh adjustment coefficient, or if the type information is the music type, adjusting the echo parameter of the echo based on a preset eighth adjustment coefficient, where the seventh adjustment coefficient is greater than the eighth adjustment coefficient;
when the target audio processing algorithm is the AGC algorithm, if the type information is the voice-activity-frame type, adjusting the attenuation gain factor based on a preset ninth adjustment coefficient, or if the type information is the non-voice-activity-frame type, adjusting the attenuation gain factor based on a preset tenth adjustment coefficient, where the ninth adjustment coefficient is greater than the tenth adjustment coefficient; or
when the target audio processing algorithm is the AGC algorithm, if the type information is the speech type, adjusting the attenuation gain factor based on a preset eleventh adjustment coefficient, or if the type information is the music type, adjusting the attenuation gain factor based on a preset twelfth adjustment coefficient, where the eleventh adjustment coefficient is greater than the twelfth adjustment coefficient.
In another possible implementation, the target parameter includes an initial parameter used during processing based on the target audio processing algorithm.
In another possible implementation, the target audio processing algorithm includes a jitter buffer management (JBM) algorithm, and the initial parameter includes a buffer depth of audio data.
In another possible implementation, the target audio processing algorithm includes a time scale modification (TSM) algorithm, and the initial parameter includes a stretch parameter or a compression parameter of audio data.
In another possible implementation, the adjusting the parameter value of the target parameter based on the adjustment coefficient includes:
when the target audio processing algorithm is the JBM algorithm, if the type information is a voice-activity-frame type, adjusting the buffer depth based on a preset thirteenth adjustment coefficient, or if the type information is a non-voice-activity-frame type, adjusting the buffer depth based on a preset fourteenth adjustment coefficient, where the thirteenth adjustment coefficient is greater than the fourteenth adjustment coefficient;
when the target audio processing algorithm is the TSM algorithm, if the type information is the voice-activity-frame type, adjusting the stretch parameter or compression parameter based on a preset fifteenth adjustment coefficient, or if the type information is the non-voice-activity-frame type, adjusting the stretch parameter or compression parameter based on a preset sixteenth adjustment coefficient, where the fifteenth adjustment coefficient is less than the sixteenth adjustment coefficient; or
when the target audio processing algorithm is the TSM algorithm, if the type information is a speech type, adjusting the stretch parameter or compression parameter based on a preset seventeenth adjustment coefficient, or if the type information is a music type, adjusting the stretch parameter or compression parameter based on a preset eighteenth adjustment coefficient, where the seventeenth adjustment coefficient is greater than the eighteenth adjustment coefficient.
According to a third aspect, an apparatus for processing audio data is provided, including a processor, a network interface, a memory, and a bus, where the memory and the network interface are each connected to the processor through the bus; the processor is configured to execute instructions stored in the memory, and by executing the instructions, the processor implements the method for processing audio data provided by the first aspect or any possible implementation of the first aspect.
According to a fourth aspect, an embodiment of the present invention provides an apparatus for processing audio data, including at least one unit, the at least one unit being configured to implement the method for processing audio data provided by the first aspect or any possible implementation of the first aspect.
According to a fifth aspect, an apparatus for processing audio data is provided, including a processor, a network interface, a memory, and a bus, where the memory and the network interface are each connected to the processor through the bus; the processor is configured to execute instructions stored in the memory, and by executing the instructions, the processor implements the method for processing audio data provided by the second aspect or any possible implementation of the second aspect.
According to a sixth aspect, an embodiment of the present invention provides an apparatus for processing audio data, including at least one unit, the at least one unit being configured to implement the method for processing audio data provided by the second aspect or any possible implementation of the second aspect.
According to a seventh aspect, an embodiment of the present invention provides a computer storage medium storing a computer program, and when executed by a processor, the computer program implements the following steps:
acquiring audio data to be processed;
determining a target audio processing algorithm to be used and type information of the audio data;
judging, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm;
if it is judged that the target audio processing algorithm is to be adjusted, adjusting the target audio processing algorithm, and processing the audio data based on the adjusted target audio processing algorithm; and
if it is judged that the target audio processing algorithm is not to be adjusted, processing the audio data based on the target audio processing algorithm.
According to an eighth aspect, an embodiment of the present invention provides a computer storage medium storing a computer program, and when executed by a processor, the computer program implements the following steps:
acquiring audio data to be processed;
determining a target audio processing algorithm to be used and type information of the audio data;
determining an adjustment coefficient based on the type information;
determining, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted;
adjusting the parameter value of the target parameter based on the adjustment coefficient; and
processing the audio data based on the adjusted parameter value of the target parameter.
The technical effects obtained by the third, fourth, and seventh aspects of the embodiments of the present invention are similar to those obtained by the corresponding technical means of the first aspect, and the technical effects obtained by the fifth, sixth, and eighth aspects are similar to those obtained by the corresponding technical means of the second aspect; details are not repeated here.
In the embodiments of the present invention, during voice communication, whether to adjust the target audio processing algorithm can be judged based on the type information of the audio data and the target audio processing algorithm, so that when certain types of audio data are processed, the target audio processing algorithm is adjusted to achieve a better processing effect and improve voice communication quality.
Brief Description of the Drawings
FIG. 1 is a system framework diagram according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of transmitting audio data according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of a terminal according to an embodiment of the present invention;
FIG. 4 is a flowchart of a method for processing audio data according to an embodiment of the present invention;
FIG. 5 is a flowchart of a method for processing audio data according to an embodiment of the present invention;
FIG. 6 is a flowchart of a method for processing audio data according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of an apparatus for processing audio data according to an embodiment of the present invention;
FIG. 8 is a schematic structural diagram of an apparatus for processing audio data according to an embodiment of the present invention.
Detailed Description
To make the objectives, technical solutions, and advantages of this application clearer, the following further describes the implementations of this application in detail with reference to the accompanying drawings.
An embodiment of the present invention provides a method for processing audio data, performed by a terminal. The terminal may be the transmitting end that sends audio data during voice communication, or the receiving end that receives audio data. During voice communication, the transmitting end may detect audio data through an input device such as a microphone; the audio data may be the user's speech, a piece of music, or other audio data. After detecting the audio data, the transmitting end may encode it and send the encoded audio data to the receiving end over the network; after receiving the encoded audio data, the receiving end may decode it and play the decoded audio data. FIG. 1 is a system framework diagram according to an embodiment of the present invention, including a transmitting end, a receiving end, and a network.
To improve voice communication quality, audio processing algorithms may be pre-stored in the terminal to process audio data. The audio processing algorithms may be the 3A algorithms, namely the AEC (Adaptive Echo Cancellation) algorithm, the ANS (Automatic Noise Suppression) algorithm, and the AGC (Automatic Gain Control) algorithm; based on the 3A algorithms, echo in the audio data can be canceled, its noise reduced, and the stability of the signal output improved. The audio processing algorithm may be the JBM (Jitter Buffer Management) algorithm; based on the JBM algorithm, buffered audio data can be sent during periods when no audio data is received, improving call continuity. The audio processing algorithm may also be the TSM (Time Scale Modification) algorithm; based on the TSM algorithm, audio data can be stretched or compressed to a target duration, improving call continuity. For example, if, owing to the network, the duration of the audio data received by the terminal within a frame is less than one frame, the received audio data can be stretched to one frame by the TSM algorithm; or if the duration exceeds one frame, the received audio data can be compressed to one frame by the TSM algorithm. The process of transmitting audio data between the transmitting end and the receiving end may be as follows: after detecting audio data, the transmitting end may process it with the 3A algorithms, encode the processed audio data, and send the encoded audio data to the receiving end over the wireless communication network. After receiving the encoded audio data, the receiving end may process it with the JBM algorithm and/or the TSM algorithm, decode the processed audio data, process the decoded audio data with the 3A algorithms, and output the processed audio data through an output device (such as an earphone or a loudspeaker) so that the user at the receiving end can hear the audio data. FIG. 2 is a schematic diagram of transmitting audio data between the transmitting end and the receiving end.
Referring to FIG. 3, which shows a terminal according to an exemplary embodiment of the present invention, the terminal may be the above transmitting end or receiving end. The terminal 10 includes a transceiver 1011 and a memory 1012, and may further include a processor 1013 and a network interface 1014. The memory 1012 and the network interface 1014 are each connected to the processor 1013; the memory 1012 is configured to store program code including computer operation instructions, and the processor 1013 and the transceiver 1011 are configured to execute the program code stored in the memory 1012 to implement the related processing of audio data, and may interact with a base station or other terminals through the network interface 1014.
The processor 1013 includes one or more processing cores. The processor 1013 performs the following method for processing audio data by running software programs and units.
In a possible design, the terminal may further include components such as a bus 1015, through which the memory 1012 and the network interface 1014 are connected to the processor 1013 and the transceiver 1011.
The memory 1012 may be configured to store software programs and units. Specifically, the memory 1012 may store an operating system 10121 and an application program unit 10122 required by at least one function. The operating system 10121 may be an operating system such as a real-time operating system (Real Time eXecutive, RTX), LINUX, UNIX, WINDOWS, or OS X.
FIG. 4 is a flowchart of a method for processing audio data according to an exemplary embodiment of the present invention. The method may be used in the system framework shown in FIG. 1. As shown in FIG. 4, the method for processing audio data may include:
Step 401: Acquire audio data to be processed.
The audio data may be an audio signal detected by the terminal or obtained through decoding, or an audio bitstream obtained through encoding. The type information is information indicating the type of the audio data; the types of audio data may include voice activity frames and non-voice-activity frames, and voice activity frames may include a speech type and a music type.
During implementation, the terminal may acquire audio data to be processed. When the terminal is the transmitting end, it may detect audio data through an input device (such as a microphone) and use the detected audio data as the audio data to be processed. When the terminal is the receiving end, it may receive, through a receiving component, the audio bitstream sent by the transmitting end and use the received bitstream as the audio data to be processed; alternatively, it may use audio data that has undergone some processing, such as decoding or processing by some algorithm, as the audio data to be processed.
Step 402: Determine a target audio processing algorithm to be used and type information of the audio data.
During implementation, after acquiring the audio data to be processed, the terminal may determine the target audio processing algorithm to be used according to the stage of the voice communication at which the audio data is located. For example, if the audio data to be processed is audio data detected by the transmitting end, the target audio processing algorithm may be a 3A algorithm; if it is audio data decoded by the receiving end, the target audio processing algorithm may be a 3A algorithm; if it is audio data received by the receiving end, the target audio processing algorithm may be the JBM algorithm or the TSM algorithm.
In addition, after acquiring the audio data to be processed, the terminal may further determine the type information of the audio data. The terminal may determine the type information according to an existing audio classification algorithm, and the corresponding processing may be as follows: determining a feature value of the audio data according to a pre-stored audio classification algorithm, and determining the type information of the audio data according to the feature value.
During implementation, the terminal may pre-store an audio classification algorithm for classifying audio data. After acquiring the audio data to be processed, the terminal may calculate a feature value of the audio data according to the pre-stored audio classification algorithm and then determine the type information of the audio data according to the feature value. The audio classification algorithm may be an existing one, such as a VAD (Voice Activity Detection) algorithm or a speech-music classification algorithm. Based on the VAD algorithm, it can be determined whether the audio data is a voice activity frame or a non-voice-activity frame; based on a speech-music classifier, it can be further determined whether voice-activity-frame audio data is of the speech type or the music type.
After calculating the feature value of the audio data, the terminal may determine the type information of the audio data according to the feature value. The terminal may judge whether the feature value is greater than a preset classification threshold; if the feature value is greater than the preset classification threshold, the first type information may be used as the type information of the audio data, and if the feature value is less than the preset classification threshold, the second type information may be used as the type information of the audio data. For example, with a preset classification threshold of 0.5, if the feature value of the audio data is 0.8, its type information is 1, indicating a speech-type signal; if the feature value is 0.2, its type information is 0, indicating a music-type signal.
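The thresholding step above can be sketched as follows. This is an illustrative reconstruction, not code from the patent; the threshold 0.5 and the labels 1 and 0 are simply the example values given in the text, and the feature extraction itself (VAD / speech-music classifier) is not shown:

```python
SPEECH_TYPE = 1  # example type information for a speech-type signal
MUSIC_TYPE = 0   # example type information for a music-type signal

def type_information(feature_value, threshold=0.5):
    """Map a classifier feature value to type information (step 402).

    Values above the preset classification threshold yield the first
    type information; values below it yield the second.
    """
    return SPEECH_TYPE if feature_value > threshold else MUSIC_TYPE
```

With the example values from the text, a feature value of 0.8 yields type information 1 (speech) and 0.2 yields 0 (music).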
Alternatively, the terminal may obtain the type information of the audio data from the codec.
During implementation, the terminal may use a codec with a signal classification function, in which an audio classification algorithm may be stored. When audio data is input to the codec, the codec may determine its feature value according to the pre-stored audio classification algorithm and then determine its type information according to the feature value; the specific process is similar to that described above and is not repeated here. The codec may store the determined type information for subsequent processing.
Because the terminal may first process the audio data with an audio processing algorithm and then encode or decode it, the terminal may obtain the type information from the codec as the type information of the current frame of audio data. The type information stored in the codec is that obtained by the codec's analysis of the previous frame of input audio data, so in this case the type information lags the audio data by one frame; however, because a speech signal can be understood as a quasi-periodic, slowly varying signal, this delay is negligible.
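This reuse of the codec's stored classification can be sketched as a toy model (all names are hypothetical; the patent does not define an API). The label read for the current frame is simply the one stored when the previous frame was analyzed, which is the one-frame delay discussed above:

```python
class CodecTypeInfo:
    """Toy model of a codec that stores the type information of the
    most recently analyzed frame, read one frame later by the
    audio-processing path."""

    def __init__(self, initial_type=0):
        self._stored = initial_type  # label of the last analyzed frame

    def analyze(self, frame_type):
        # The codec classifies each input frame and stores the label;
        # returns what a reader would have seen before this frame.
        previous = self._stored
        self._stored = frame_type
        return previous

    def read(self):
        # The audio-processing path reads the stored label as the
        # current frame's type information (one-frame delay).
        return self._stored
```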
Step 403: Judge, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm.
During implementation, the type information of audio data may include a voice-activity-frame type and a non-voice-activity-frame type, where the voice-activity-frame type may include a music type and a speech type. Based on different requirements, the terminal may classify audio signals at different levels. For example, audio data may be divided into the voice-activity-frame type and non-voice-activity frames; or audio data may first be divided into non-voice-activity frames and the voice-activity-frame type, and voice-activity-frame audio data may then be further classified into the speech type or the music type. This embodiment imposes no limitation.
After determining the target audio processing algorithm and the type information of the audio data, the terminal may determine, according to a pre-stored correspondence between audio processing algorithms and type information requiring adjustment, the type information requiring adjustment that corresponds to the target audio processing algorithm (which may be called target type information). If the type information of the audio data to be processed is the target type information, it is judged that the target audio processing algorithm is to be adjusted; otherwise, it is judged that the target audio processing algorithm is not to be adjusted. This embodiment describes the judgment for several commonly used audio processing algorithms, as follows:
1. When the target audio processing algorithm is the ANS algorithm, if the type information is the non-voice-activity-frame type, it is judged that the ANS algorithm is to be adjusted; if the type information is the voice-activity-frame type, it is judged that the ANS algorithm is not to be adjusted.
Where the terminal further determines whether the audio data is of the music type or the speech type: when the target audio processing algorithm is the ANS algorithm, if the type information is the music type, it is judged that the ANS algorithm is to be adjusted; if the type information is the speech type, it is judged that the ANS algorithm is not to be adjusted.
2. When the target audio processing algorithm is the AGC algorithm, if the type information is the non-voice-activity-frame type, it is judged that the AGC algorithm is to be adjusted; if the type information is the voice-activity-frame type, it is judged that the AGC algorithm is not to be adjusted.
Where the terminal further determines whether the audio data is of the music type or the speech type: when the target audio processing algorithm is the AGC algorithm, if the type information is the music type, it is judged that the AGC algorithm is to be adjusted; if the type information is the speech type, it is judged that the AGC algorithm is not to be adjusted.
3. When the target audio processing algorithm is the AEC algorithm, if the type information is the non-voice-activity-frame type, it is judged that the AEC algorithm is to be adjusted; if the type information is the voice-activity-frame type, it is judged that the AEC algorithm is not to be adjusted.
The terminal may determine whether the audio data is of the music type or the speech type: when the target audio processing algorithm is the AEC algorithm, if the type information is the music type, it is judged that the AEC algorithm is to be adjusted; if the type information is the speech type, it is judged that the AEC algorithm is not to be adjusted.
4. When the target audio processing algorithm is the JBM algorithm, if the type information is the non-voice-activity-frame type, it is judged that the JBM algorithm is to be adjusted; if the type information is the voice-activity-frame type, it is judged that the JBM algorithm is not to be adjusted.
5. When the target audio processing algorithm is the TSM algorithm, if the type information is the voice-activity-frame type, it is judged that the TSM algorithm is to be adjusted; if the type information is the non-voice-activity-frame type, it is judged that the TSM algorithm is not to be adjusted.
For the TSM algorithm, the terminal may further determine whether the audio data is of the music type or the speech type; music-type audio data and speech-type audio data may be adjusted to different degrees, which will be described in detail later.
Step 404: If it is judged that the target audio processing algorithm is to be adjusted, adjust the target audio processing algorithm, and process the audio data based on the adjusted target audio processing algorithm.
During implementation, if the terminal judges that the target audio processing algorithm is to be adjusted, it may adjust the algorithm according to a pre-stored adjustment policy for audio processing algorithms, process the audio data based on the adjusted target audio processing algorithm, and then output the processed audio data. When the terminal is the transmitting end, it may output the processed audio data so that the codec obtains and encodes it. When the terminal is the receiving end, it may perform the above processing before decoding, in which case it outputs the processed audio data to the codec so that the codec obtains and decodes it; or it may perform the above processing after decoding, in which case it outputs the processed audio data through an output component (such as an earphone or a loudspeaker) so that the user can hear the audio. The specific process by which the terminal adjusts the audio processing algorithm is described in detail later.
Step 405: If it is judged that the target audio processing algorithm is not to be adjusted, process the audio data based on the target audio processing algorithm.
During implementation, if the terminal judges that the target audio processing algorithm is not to be adjusted, it may process the audio data directly based on the target audio processing algorithm stored in the terminal.
This embodiment provides the specific process by which the terminal adjusts the audio processing algorithm. As shown in FIG. 5, it may include the following steps:
Step 501: Determine an adjustment coefficient based on the type information.
During implementation, after judging that the target audio processing algorithm is to be adjusted, the terminal may determine an adjustment coefficient based on the type information of the audio data. There may be one or more adjustment coefficients. The terminal may determine the adjustment coefficient based on the type information in a variety of ways; this embodiment provides two feasible ways, as follows:
Way 1: Determine, according to a pre-stored correspondence between type information and adjustment coefficients, the adjustment coefficient corresponding to the type information of the audio data to be processed.
During implementation, the terminal may pre-store a correspondence between type information and adjustment coefficients. The correspondence may be established per audio processing algorithm, and different audio processing algorithms may have different correspondences. After obtaining the target audio processing algorithm, the terminal may obtain the correspondence between type information and adjustment coefficients for that algorithm. For example, if the target audio processing algorithm is the ANS algorithm, the adjustment coefficient corresponding to the non-voice-activity-frame type may be 0, and that corresponding to the music type may be 0.3. After determining the type information of the audio data, the terminal may determine the corresponding adjustment coefficient according to the obtained correspondence for subsequent processing.
Way 2: Use the feature value of the type information as the adjustment coefficient of the audio data.
During implementation, the terminal may also use the determined feature value of the type information as the adjustment coefficient. For example, if the target audio processing algorithm is the ANS algorithm and the determined feature value is 0.8, 0.8 may be used as the adjustment coefficient; if the determined feature value is 0.2, 0.2 may be used as the adjustment coefficient.
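The two ways of determining the coefficient can be sketched together. This is illustrative only; the table entries (0 for non-voice-activity frames and 0.3 for music under ANS) are the example values from the text, and all names are hypothetical:

```python
# Way 1: a per-algorithm lookup table from type information to coefficient.
ANS_COEFFICIENTS = {"non_voice_activity": 0.0, "music": 0.3}

def coefficient_from_table(type_info, table, default=1.0):
    """Look up the adjustment coefficient; types with no entry are
    treated here as unadjusted (coefficient 1)."""
    return table.get(type_info, default)

def coefficient_from_feature(feature_value):
    """Way 2: use the classifier's feature value itself as the coefficient."""
    return feature_value
```

The per-algorithm table mirrors the pre-stored correspondence described above; a different table would be selected for AGC, AEC, JBM, or TSM.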
Step 502: Determine, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted.
During implementation, different audio processing algorithms have different parameters to adjust. The terminal may determine the target parameter corresponding to the target audio processing algorithm according to a correspondence between audio processing algorithms and parameters to be adjusted, for subsequent processing.
The target parameter may include an intermediate parameter used during algorithm processing based on the target audio processing algorithm. This embodiment provides several examples, as follows:
1. The target audio processing algorithm includes the ANS algorithm, and the intermediate parameter may include a noise parameter of noise determined based on the ANS algorithm and the audio data.
During implementation, the terminal may determine, based on the ANS algorithm and the audio data to be processed, the noise corresponding to the audio data, so that the noise parameter of the noise can subsequently be adjusted. If the audio data is adjusted in the time domain, the noise parameter of the noise may be the noise value; if the audio data is adjusted in the frequency domain, the noise parameter may be the spectral coefficients and/or spectral amplitude of the noise.
2. The target audio processing algorithm includes the AGC algorithm, and the intermediate parameter includes an attenuation gain factor determined based on the AGC algorithm and the audio data.
During implementation, the terminal may determine a signal gain value of the current frame according to the energy/amplitude of the current frame of audio data (namely, the audio data to be processed) and that of the audio data preceding the current frame. The gain value reflects the change of the current frame's energy/amplitude relative to that of the preceding audio data; the attenuation gain factor corresponding to the current frame may then be determined according to the gain value, and the audio data to be processed may then be amplified or attenuated through the attenuation gain factor so that the energy of the output audio data does not suddenly increase or decrease.
3. The target audio processing algorithm includes the AEC algorithm, and the intermediate parameter includes an echo parameter of an echo determined based on the AEC algorithm and the audio data.
During implementation, the terminal may determine the echo of the audio data to be processed according to the AEC algorithm so that the echo parameter of the echo can subsequently be adjusted. The echo parameter may be the echo value of the echo.
The target parameter may also include an initial parameter used during algorithm processing based on the target audio processing algorithm. This embodiment provides several examples, as follows:
1. The target audio processing algorithm may include the JBM algorithm, and the initial parameter may include a buffer depth of audio data.
During implementation, based on the JBM algorithm, the receiving end may buffer received audio data in real time and then output the earliest-received buffered audio data. In this way, during periods when no audio data is received, the receiving end can output buffered audio data, improving the continuity of voice communication. The buffer depth may be the number of frames of audio data buffered by the terminal during the call.
2. The target audio processing algorithm may include the TSM algorithm, and the initial parameter may include a stretch parameter or a compression parameter of audio data.
During implementation, based on the TSM algorithm, the receiving end may stretch or compress received audio data to adjust its playback duration. For example, when the speech received by the receiving end is less than one frame but needs to be output as one frame, the received audio data may be stretched based on the stretch parameter; when the received speech exceeds one frame but needs to be output as one frame, the received audio data may be compressed based on the compression parameter. For the specific process, reference may be made to the prior art, which is not repeated in this embodiment. The stretch parameter may indicate the degree of stretching of the audio data, such as a target stretch duration; the compression parameter may indicate the degree of compression, such as a target compression duration.
Step 503: Adjust the parameter value of the target parameter based on the adjustment coefficient.
During implementation, after determining the adjustment coefficient and the target parameter whose parameter value needs to be adjusted, the terminal may multiply the parameter value of the target parameter by the adjustment coefficient, thereby adjusting the target audio processing algorithm. This embodiment describes the adjustment of different audio processing algorithms, as follows:
When the target audio processing algorithm is the ANS algorithm: if the audio data is of the non-voice-activity-frame type, the noise parameter of the noise may be multiplied by a larger adjustment coefficient so that the adjusted noise is greater than the calculated noise. In this way, for voice-activity-frame audio data, noise is filtered out in the normal way, improving speech clarity during voice communication without weakening the speech signal; for non-voice-activity-frame audio data, more noise is filtered out, preventing the user from hearing background noise when no one is speaking.
If the audio data is of the music type, the noise parameter of the noise may be multiplied by a smaller adjustment coefficient so that the adjusted noise is less than the calculated noise. In this way, for speech-type audio data, noise is filtered out in the normal way, improving speech clarity during voice communication; for music-type audio data, relatively less noise is filtered out, preserving the sound quality of the music.
When the target audio processing algorithm is the AGC algorithm: if the audio data is of the non-voice-activity-frame type, the attenuation gain factor may be multiplied by an adjustment coefficient of 0. In this way, for voice-activity-frame audio data, gain adjustment is performed in the normal way, keeping the volume consistent during voice communication; for non-voice-activity-frame audio data, no gain adjustment is performed, saving processing resources.
If the audio data is of the music type, the attenuation gain factor may be multiplied by a small adjustment coefficient to obtain a smaller attenuation gain factor. In this way, for speech-type audio data, normal gain adjustment keeps the volume consistent during voice communication; for music-type audio data, the magnitude of gain adjustment is reduced so that the energy of each frame of the original audio remains largely unchanged, improving the fidelity of music playback.
When the target audio processing algorithm is the AEC algorithm: if the audio data is of the non-voice-activity-frame type, the echo parameter of the echo may be multiplied by a larger adjustment coefficient so that the adjusted echo is greater than the calculated echo. In this way, for voice-activity-frame audio data, echo is filtered out in the normal way, improving speech clarity during voice communication without weakening the speech signal; for non-voice-activity-frame audio data, more echo is filtered out, preventing the user from hearing noise when no one is speaking.
If the audio data is of the music type, the echo parameter of the echo may be multiplied by a smaller adjustment coefficient so that the adjusted echo is less than the calculated echo. In this way, for speech-type audio data, echo is filtered out in the normal way, improving speech clarity during voice communication; for music-type audio data, relatively less echo is filtered out, avoiding filtering out useful signal components from the audio data and preserving the sound quality of the music.
When the target audio processing algorithm is the JBM algorithm: if the audio data is of the non-voice-activity-frame type, the buffer depth of the audio data may be multiplied by a smaller adjustment coefficient so that the adjusted buffer depth is less than the preset buffer depth of the JBM algorithm. When audio data is processed based on the JBM algorithm, there is a certain delay between the transmitting end and the receiving end; with the above adjustment, for non-voice-activity-frame audio data, the receiving end can buffer less audio data, reducing the delay between the transmitting end and the receiving end and improving user experience.
When the target audio processing algorithm is the TSM algorithm: if the audio data is of the voice-activity-frame type, the parameter value of the stretch parameter or compression parameter may be multiplied by a smaller adjustment coefficient so that the adjusted parameter value is less than the value preset in the TSM algorithm. This reduces the degree of stretching or compression of voice-activity-frame audio data, preventing the user from hearing pitch-shifted audio; meanwhile, performing normal TSM processing on non-voice-activity-frame audio data reduces cases where, during network jitter, packet loss causes speech to be output late or in excess.
If the audio data is of the speech type, the parameter value of the stretch parameter or compression parameter may be multiplied by a smaller adjustment coefficient so that the adjusted parameter value is less than the value preset in the TSM algorithm; if the audio data is of the music type, the parameter value may be multiplied by an even smaller adjustment coefficient so that the adjusted parameter value is less than that corresponding to speech-type audio data. In this way, speech-type audio data is moderately stretched or compressed, which to some extent reduces cases where packet loss during network jitter causes speech to be output late or in excess; music-type audio data, which demands higher pitch accuracy, is stretched or compressed to a lesser degree or not at all, preserving the sound quality of the music.
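In every case above, step 503 reduces to multiplying the target parameter's value by the chosen coefficient. A minimal sketch (illustrative only; a real ANS or AEC path would scale a whole set of spectral noise or echo parameters rather than a single scalar):

```python
def adjust_parameter(value, coefficient):
    """Scale one target-parameter value by the adjustment coefficient."""
    return value * coefficient

def adjust_spectrum(values, coefficient):
    """Scale a list of spectral noise/echo parameters by the coefficient."""
    return [v * coefficient for v in values]
```

For example, multiplying the attenuation gain factor by 0 disables AGC adjustment for a non-voice-activity frame, while multiplying noise spectral amplitudes by 0.3 filters out less noise for music-type audio data.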
In the embodiments of the present invention, during voice communication, whether to adjust the target audio processing algorithm can be judged based on the type information of the audio data and the target audio processing algorithm, so that when certain types of audio data are processed, the target audio processing algorithm is adjusted to achieve a better processing effect and improve voice communication quality.
An embodiment of the present invention further provides a method for processing audio data. As shown in FIG. 6, the method for processing audio data may include:
Step 601: Acquire audio data to be processed.
For the processing of this step, refer to step 401 above; details are not repeated here.
Step 602: Determine a target audio processing algorithm to be used and type information of the audio data.
For the processing of this step, refer to step 402 above; details are not repeated here.
Step 603: Determine an adjustment coefficient based on the type information.
For the processing of this step, refer to step 501 above; details are not repeated here.
Step 604: Determine, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted.
For the processing of this step, refer to step 502 above; details are not repeated here.
The target parameter includes an intermediate parameter used during algorithm processing based on the target audio processing algorithm; alternatively, the target parameter may include an initial parameter used during algorithm processing based on the target audio processing algorithm.
Step 605: Adjust the parameter value of the target parameter based on the adjustment coefficient.
During implementation, after determining the adjustment coefficient and the target parameter whose parameter value needs to be adjusted, the terminal may multiply the parameter value of the target parameter by the adjustment coefficient, thereby adjusting the target audio processing algorithm. When different types of audio data are processed, the adjustment coefficients used to adjust each audio processing algorithm differ. This embodiment describes the adjustment of different audio processing algorithms, as follows:
1. Where the target audio processing algorithm includes the ANS algorithm, the intermediate parameter may be a noise parameter of noise determined based on the ANS algorithm and the audio data.
If the audio data is adjusted in the time domain, the noise parameter of the noise may be the noise value; if the audio data is adjusted in the frequency domain, the noise parameter may be the spectral coefficients and/or spectral amplitude of the noise.
When the ANS algorithm is adjusted: if the type information is the voice-activity-frame type, the noise parameter of the noise is adjusted based on a preset first adjustment coefficient; if the type information is the non-voice-activity-frame type, the noise parameter of the noise is adjusted based on a preset second adjustment coefficient, the first adjustment coefficient being less than the second adjustment coefficient.
For example, type information 1 indicates that the audio data is a voice activity frame, with a corresponding first adjustment coefficient of 0.7; type information 2 indicates a non-voice-activity frame, with a corresponding second adjustment coefficient of 1. If the audio data is a voice-activity-frame signal, the noise parameter of the noise may be multiplied by 0.7 to obtain the adjusted noise. If the audio data is a non-voice-activity-frame signal, the noise parameter of the noise may be multiplied by 1 to obtain the adjusted noise. In this way, for voice-activity-frame audio data, noise is filtered out in the normal way, improving speech clarity during voice communication without weakening the speech signal; for non-voice-activity-frame audio data, more noise is filtered out, preventing the user from hearing background noise when no one is speaking.
If the type information is the speech type, the noise parameter of the noise is adjusted based on a preset third adjustment coefficient; if the type information is the music type, the noise parameter of the noise is adjusted based on a preset fourth adjustment coefficient; the third adjustment coefficient is greater than the fourth adjustment coefficient, and the third adjustment coefficient may be less than or equal to the second adjustment coefficient.
For example, the type information of the speech type is 1, with a corresponding third adjustment coefficient of 0.7; the type information of the music type is 0, with a corresponding fourth adjustment coefficient of 0.3. If the audio data is a speech-type signal, the noise parameter of the noise may be multiplied by 0.7 to obtain the adjusted noise; if the audio data is a music-type signal, the noise parameter of the noise may be multiplied by 0.3 to obtain the adjusted noise. In this way, for speech-type audio data, relatively more noise is filtered out, improving speech clarity during voice communication; for music-type audio data, relatively less noise is filtered out, preserving the sound quality of the music.
2. Where the target audio processing algorithm includes the adaptive echo cancellation (AEC) algorithm, the intermediate parameter includes an echo parameter of an echo determined based on the AEC algorithm and the audio data.
The echo parameter of the echo may be the echo value.
When the AEC algorithm is adjusted: if the type information is the voice-activity-frame type, the echo parameter of the echo is adjusted based on a preset fifth adjustment coefficient; if the type information is the non-voice-activity-frame type, the echo parameter of the echo is adjusted based on a preset sixth adjustment coefficient, the fifth adjustment coefficient being less than the sixth adjustment coefficient.
For example, type information 1 indicates that the audio data is a voice activity frame, with a corresponding fifth adjustment coefficient of 0.7; type information 2 indicates a non-voice-activity frame, with a corresponding sixth adjustment coefficient of 1. If the audio data is a voice-activity-frame signal, the echo parameter of the echo may be multiplied by 0.7 to obtain the adjusted echo. If the audio data is a non-voice-activity-frame signal, the echo parameter of the echo may be multiplied by 1 to obtain the adjusted echo. In this way, for voice-activity-frame audio data, echo is filtered out in the normal way, improving speech clarity during voice communication without weakening the speech signal; for non-voice-activity-frame audio data, more echo is filtered out, preventing the user from hearing noise when no one is speaking.
If the type information is the speech type, the echo parameter of the echo is adjusted based on a preset seventh adjustment coefficient; if the type information is the music type, the echo parameter of the echo is adjusted based on a preset eighth adjustment coefficient; the seventh adjustment coefficient may be greater than the eighth adjustment coefficient, and the seventh adjustment coefficient may be less than the sixth adjustment coefficient.
For example, the type information of the speech type is 1, with a corresponding seventh adjustment coefficient of 0.7; the type information of the music type is 0, with a corresponding eighth adjustment coefficient of 0.3. If the audio data is a speech-type signal, the echo parameter of the echo may be multiplied by 0.7 to obtain the adjusted echo. If the audio data is a music-type signal, the echo parameter of the echo may be multiplied by 0.3 to obtain the adjusted echo. In this way, for speech-type audio data, relatively more echo is filtered out, improving speech clarity during voice communication; for music-type audio data, relatively less echo is filtered out, avoiding filtering out useful signal components from the audio data and preserving the sound quality of the music.
3. Where the target audio processing algorithm includes the automatic gain control (AGC) algorithm, the intermediate parameter may include an attenuation gain factor determined based on the AGC algorithm and the audio data.
When the AGC algorithm is adjusted: if the type information is the voice-activity-frame type, the attenuation gain factor is adjusted based on a preset ninth adjustment coefficient; if the type information is the non-voice-activity-frame type, the attenuation gain factor is adjusted based on a preset tenth adjustment coefficient, the ninth adjustment coefficient being greater than the tenth adjustment coefficient.
For example, type information 1 indicates that the audio data is a voice activity frame, with a corresponding ninth adjustment coefficient of 1; type information 2 indicates a non-voice-activity frame, with a corresponding tenth adjustment coefficient of 0. If the audio data is a voice-activity-frame signal, the attenuation gain factor may be multiplied by 1 to obtain the adjusted attenuation gain factor. If the audio data is a non-voice-activity-frame signal, the attenuation gain factor may be multiplied by 0 to obtain the adjusted attenuation gain factor. In this way, for voice-activity-frame audio data, gain adjustment is performed in the normal way, keeping the volume consistent during voice communication; for non-voice-activity-frame audio data, no gain adjustment is performed, saving processing resources.
If the type information is the speech type, the attenuation gain factor is adjusted based on a preset eleventh adjustment coefficient; if the type information is the music type, the attenuation gain factor is adjusted based on a preset twelfth adjustment coefficient; the eleventh adjustment coefficient is greater than the twelfth adjustment coefficient, and the twelfth adjustment coefficient may be greater than the tenth adjustment coefficient.
For example, the type information of the speech type is 1, with a corresponding eleventh adjustment coefficient of 0.7; the type information of the music type is 0, with a corresponding twelfth adjustment coefficient of 0.3. If the audio data is a speech-type signal, the attenuation gain factor may be multiplied by 0.7 to obtain the adjusted attenuation gain factor. If the audio data is a music-type signal, the attenuation gain factor may be multiplied by 0.3 to obtain the adjusted attenuation gain factor. In this way, for speech-type audio data, appropriate gain adjustment keeps the volume consistent during voice communication; for music-type audio data, the magnitude of gain adjustment is reduced so that the energy of each frame of the original audio remains largely unchanged, improving the fidelity of music playback.
4. The target audio processing algorithm includes the JBM algorithm, and the initial parameter includes a buffer depth of audio data.
When the JBM algorithm is adjusted: if the type information is the voice-activity-frame type, the buffer depth is adjusted based on a preset thirteenth adjustment coefficient; if the type information is the non-voice-activity-frame type, the buffer depth is adjusted based on a preset fourteenth adjustment coefficient, the thirteenth adjustment coefficient being greater than the fourteenth adjustment coefficient.
For example, type information 1 indicates that the audio data is a voice activity frame, with a corresponding thirteenth adjustment coefficient of 1; type information 2 indicates a non-voice-activity frame, with a corresponding fourteenth adjustment coefficient of 0.5; and the buffer depth in the JBM algorithm is 10 frames. If the audio data is a voice-activity-frame signal, the adjustment coefficient is determined to be 1, and the buffer depth in the JBM algorithm is multiplied by 1, that is, the adjusted buffer depth in the JBM algorithm is 10 frames. If the audio data is a non-voice-activity-frame signal, the adjustment coefficient is determined to be 0.5, and the buffer depth in the JBM algorithm is multiplied by 0.5, that is, the adjusted buffer depth in the JBM algorithm is 5 frames. When audio data is processed based on the JBM algorithm, there is a certain delay between the transmitting end and the receiving end; with the above processing, for non-voice-activity-frame audio data, the receiving end can buffer less audio data, reducing the delay between the transmitting end and the receiving end and improving user experience.
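The numeric example above can be sketched as follows; the preset depth of 10 frames and the coefficients 1 and 0.5 are the illustrative values from the text:

```python
PRESET_BUFFER_DEPTH = 10  # frames, the JBM example value above

def adjusted_buffer_depth(is_voice_activity_frame, depth=PRESET_BUFFER_DEPTH):
    """Apply the thirteenth/fourteenth adjustment coefficient to the
    JBM buffer depth, halving it for non-voice-activity frames."""
    coefficient = 1.0 if is_voice_activity_frame else 0.5
    return int(depth * coefficient)
```

A smaller depth for non-voice-activity frames buffers fewer frames at the receiving end, which is what reduces the end-to-end delay discussed above.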
5. The target audio processing algorithm includes the TSM algorithm, and the initial parameter includes a stretch parameter or a compression parameter of audio data.
When the TSM algorithm is adjusted: if the type information is the voice-activity-frame type, the stretch parameter or compression parameter is adjusted based on a preset fifteenth adjustment coefficient; if the type information is the non-voice-activity-frame type, the stretch parameter or compression parameter is adjusted based on a preset sixteenth adjustment coefficient, the fifteenth adjustment coefficient being less than the sixteenth adjustment coefficient.
For example, type information 1 indicates that the audio data is a voice activity frame, with a corresponding fifteenth adjustment coefficient of 0; type information 2 indicates a non-voice-activity frame, with a corresponding sixteenth adjustment coefficient of 1. If the audio data is a voice-activity-frame signal, the adjustment coefficient is determined to be 0, and the parameter value of the stretch parameter or compression parameter in the TSM algorithm is multiplied by 0; that is, the audio data is neither stretched nor compressed, ensuring that the call audio is not pitch-shifted. If the audio data is a non-voice-activity-frame signal, the adjustment coefficient is determined to be 1, the parameter value of the stretch parameter or compression parameter in the TSM algorithm is multiplied by 1, and the audio data is then processed according to the adjusted TSM algorithm to obtain the processed audio data. In this way, the degree of stretching or compression of voice-activity-frame audio data is reduced, preventing the user from hearing pitch-shifted audio; meanwhile, performing normal TSM processing on non-voice-activity-frame audio data reduces cases where, during network jitter, packet loss causes speech to be output late or in excess.
If the type information is the speech type, the stretch parameter or compression parameter is adjusted based on a preset seventeenth adjustment coefficient; if the type information is the music type, the stretch parameter or compression parameter is adjusted based on a preset eighteenth adjustment coefficient; the seventeenth adjustment coefficient is greater than the eighteenth adjustment coefficient, and the seventeenth adjustment coefficient may be less than the sixteenth adjustment coefficient.
For example, the type information of the speech type may be 1, with a corresponding seventeenth adjustment coefficient of 0.7; the type information of the music type is 0, with a corresponding eighteenth adjustment coefficient of 0. If the audio data is a speech-type signal, the adjustment coefficient is determined to be 0.7, the parameter value of the stretch parameter or compression parameter is multiplied by 0.7, and the audio data is then processed. If the audio data is a music-type signal, the adjustment coefficient is determined to be 0, the parameter value of the stretch parameter or compression parameter is multiplied by 0, and the audio data is then processed. In this way, speech-type audio data is moderately stretched or compressed, which to some extent reduces cases where packet loss during network jitter causes speech to be output late or in excess; music-type audio data, which demands higher pitch accuracy, is not stretched or compressed, preserving the sound quality of the music.
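The TSM example can be sketched as follows. The coefficients 0.7 for speech and 0 for music are the example values from the text, and `target_duration` is a hypothetical stand-in for the stretch or compression parameter's value:

```python
TSM_COEFFICIENTS = {"speech": 0.7, "music": 0.0}

def adjusted_tsm_parameter(target_duration, type_info):
    """Scale the stretch/compression parameter by the per-type coefficient.

    A coefficient of 0 means music is neither stretched nor compressed,
    preserving its pitch; speech is time-scaled to a reduced degree."""
    return target_duration * TSM_COEFFICIENTS[type_info]
```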
Step 606: Process the audio data based on the adjusted parameter value of the target parameter.
For the processing of this step, refer to the related description of step 404 above; details are not repeated here.
In the embodiments of the present invention, during voice communication, the type information of the audio data to be processed may be determined first; an adjustment coefficient for adjusting the audio data is then determined according to the type information; the audio data is then processed according to the target audio processing algorithm and the adjustment coefficient, and the processed audio data is output. In this way, different audio processing can be performed for different types of audio data, improving voice communication quality.
FIG. 7 is a structural block diagram of an apparatus for processing audio data according to an embodiment of the present invention. The apparatus may be implemented as part or all of a terminal by software, hardware, or a combination of both.
The apparatus includes an acquiring unit 701, a determining unit 702, a judging unit 703, an adjusting unit 704, and a processing unit 705.
The acquiring unit 701 is configured to perform step 401 and its alternatives in the foregoing embodiments.
The determining unit 702 is configured to perform step 402 and its alternatives in the foregoing embodiments.
The judging unit 703 is configured to perform step 403 and its alternatives in the foregoing embodiments.
The adjusting unit 704 is configured to perform step 404 and its alternatives in the foregoing embodiments.
The processing unit 705 is configured to perform step 405 and its alternatives in the foregoing embodiments.
In the embodiments of the present invention, during voice communication, whether to adjust the target audio processing algorithm can be judged based on the type information of the audio data and the target audio processing algorithm, so that when certain types of audio data are processed, the target audio processing algorithm is adjusted to achieve a better processing effect and improve voice communication quality.
FIG. 8 is a structural block diagram of an apparatus for processing audio data according to an embodiment of the present invention. The apparatus may be implemented as part or all of a terminal by software, hardware, or a combination of both.
The apparatus includes an acquiring unit 801, a determining unit 802, an adjusting unit 803, and a processing unit 804.
The acquiring unit 801 is configured to perform step 601 and its alternatives in the foregoing embodiments.
The determining unit 802 is configured to perform steps 602 to 604 and their alternatives in the foregoing embodiments.
The adjusting unit 803 is configured to perform step 605 and its alternatives in the foregoing embodiments.
The processing unit 804 is configured to perform step 606 and its alternatives in the foregoing embodiments.
In the embodiments of the present invention, during voice communication, the type information of the audio data to be processed may be determined first; an adjustment coefficient for adjusting the audio data is then determined according to the type information; the audio data is then processed according to the target audio processing algorithm and the adjustment coefficient, and the processed audio data is output. In this way, different audio processing can be performed for different types of audio data, improving voice communication quality.
In this application, the term "and/or" merely describes an association between associated objects and indicates that three relationships may exist. For example, A and/or B may indicate three cases: only A exists, both A and B exist, or only B exists. In addition, the character "/" in this document generally indicates an "or" relationship between the associated objects.
A person of ordinary skill in the art may understand that all or some of the steps of the above embodiments may be implemented by hardware, or by a program instructing relevant hardware; the program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, or improvement made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (42)

  1. A method for processing audio data, wherein the method comprises:
    acquiring audio data to be processed;
    determining a target audio processing algorithm to be used and type information of the audio data;
    judging, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm;
    if it is judged that the target audio processing algorithm is to be adjusted, adjusting the target audio processing algorithm, and processing the audio data based on the adjusted target audio processing algorithm; and
    if it is judged that the target audio processing algorithm is not to be adjusted, processing the audio data based on the target audio processing algorithm.
  2. The method according to claim 1, wherein the adjusting the target audio processing algorithm comprises:
    determining an adjustment coefficient based on the type information;
    determining, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted; and
    adjusting the parameter value of the target parameter based on the adjustment coefficient.
  3. The method according to claim 2, wherein the target parameter comprises an intermediate parameter used during processing based on the target audio processing algorithm.
  4. The method according to claim 3, wherein the target audio processing algorithm comprises an automatic noise suppression (ANS) algorithm, and the intermediate parameter comprises a noise parameter of noise determined based on the ANS algorithm and the audio data.
  5. The method according to claim 3 or 4, wherein the target audio processing algorithm comprises an automatic gain control (AGC) algorithm, and the intermediate parameter comprises an attenuation gain factor determined based on the AGC algorithm and the audio data.
  6. The method according to any one of claims 3 to 5, wherein the target audio processing algorithm comprises an adaptive echo cancellation (AEC) algorithm, and the intermediate parameter comprises an echo parameter of an echo determined based on the AEC algorithm and the audio data.
  7. The method according to claim 2, wherein the target parameter comprises an initial parameter used during processing based on the target audio processing algorithm.
  8. The method according to claim 7, wherein the target audio processing algorithm comprises a jitter buffer management (JBM) algorithm, and the initial parameter comprises a buffer depth of audio data.
  9. The method according to claim 7 or 8, wherein the target audio processing algorithm comprises a time scale modification (TSM) algorithm, and the initial parameter comprises a stretch parameter or a compression parameter of audio data.
  10. The method according to any one of claims 1 to 9, wherein the judging, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm comprises:
    when the target audio processing algorithm is the ANS algorithm, if the type information is a non-voice-activity-frame type, judging that the ANS algorithm is to be adjusted, or if the type information is a voice-activity-frame type, judging that the ANS algorithm is not to be adjusted;
    when the target audio processing algorithm is the ANS algorithm, if the type information is a music type, judging that the ANS algorithm is to be adjusted, or if the type information is a speech type, judging that the ANS algorithm is not to be adjusted;
    when the target audio processing algorithm is the AGC algorithm, if the type information is the non-voice-activity-frame type, judging that the AGC algorithm is to be adjusted, or if the type information is the voice-activity-frame type, judging that the AGC algorithm is not to be adjusted;
    when the target audio processing algorithm is the AGC algorithm, if the type information is the music type, judging that the AGC algorithm is to be adjusted, or if the type information is the speech type, judging that the AGC algorithm is not to be adjusted;
    when the target audio processing algorithm is the AEC algorithm, if the type information is the non-voice-activity-frame type, judging that the AEC algorithm is to be adjusted, or if the type information is the voice-activity-frame type, judging that the AEC algorithm is not to be adjusted;
    when the target audio processing algorithm is the AEC algorithm, if the type information is the music type, judging that the AEC algorithm is to be adjusted, or if the type information is the speech type, judging that the AEC algorithm is not to be adjusted;
    when the target audio processing algorithm is the JBM algorithm, if the type information is the non-voice-activity-frame type, judging that the JBM algorithm is to be adjusted, or if the type information is the voice-activity-frame type, judging that the JBM algorithm is not to be adjusted; or
    when the target audio processing algorithm is the TSM algorithm, if the type information is the voice-activity-frame type, judging that the TSM algorithm is to be adjusted, or if the type information is the non-voice-activity-frame type, judging that the TSM algorithm is not to be adjusted.
  11. A method for processing audio data, wherein the method comprises:
    acquiring audio data to be processed;
    determining a target audio processing algorithm to be used and type information of the audio data;
    determining an adjustment coefficient based on the type information;
    determining, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted;
    adjusting the parameter value of the target parameter based on the adjustment coefficient; and
    processing the audio data based on the adjusted parameter value of the target parameter.
  12. The method according to claim 11, wherein the target parameter comprises an intermediate parameter used during processing based on the target audio processing algorithm.
  13. The method according to claim 12, wherein the target audio processing algorithm comprises an automatic noise suppression (ANS) algorithm, and the intermediate parameter comprises a noise parameter of noise determined based on the ANS algorithm and the audio data.
  14. The method according to claim 12 or 13, wherein the target audio processing algorithm comprises an automatic gain control (AGC) algorithm, and the intermediate parameter comprises an attenuation gain factor determined based on the AGC algorithm and the audio data.
  15. The method according to any one of claims 12 to 14, wherein the target audio processing algorithm comprises an adaptive echo cancellation (AEC) algorithm, and the intermediate parameter comprises an echo parameter of an echo determined based on the AEC algorithm and the audio data.
  16. The method according to any one of claims 13 to 15, wherein the adjusting the parameter value of the target parameter based on the adjustment coefficient comprises:
    when the target audio processing algorithm is the ANS algorithm, if the type information is a voice-activity-frame type, adjusting the noise parameter of the noise based on a preset first adjustment coefficient, or if the type information is a non-voice-activity-frame type, adjusting the noise parameter of the noise based on a preset second adjustment coefficient, wherein the first adjustment coefficient is less than the second adjustment coefficient;
    when the target audio processing algorithm is the ANS algorithm, if the type information is a speech type, adjusting the noise parameter of the noise based on a preset third adjustment coefficient, or if the type information is a music type, adjusting the noise parameter of the noise based on a preset fourth adjustment coefficient, wherein the third adjustment coefficient is greater than the fourth adjustment coefficient;
    when the target audio processing algorithm is the AEC algorithm, if the type information is the voice-activity-frame type, adjusting the echo parameter of the echo based on a preset fifth adjustment coefficient, or if the type information is the non-voice-activity-frame type, adjusting the echo parameter of the echo based on a preset sixth adjustment coefficient, wherein the fifth adjustment coefficient is less than the sixth adjustment coefficient;
    when the target audio processing algorithm is the AEC algorithm, if the type information is the speech type, adjusting the echo parameter of the echo based on a preset seventh adjustment coefficient, or if the type information is the music type, adjusting the echo parameter of the echo based on a preset eighth adjustment coefficient, wherein the seventh adjustment coefficient is greater than the eighth adjustment coefficient;
    when the target audio processing algorithm is the AGC algorithm, if the type information is the voice-activity-frame type, adjusting the attenuation gain factor based on a preset ninth adjustment coefficient, or if the type information is the non-voice-activity-frame type, adjusting the attenuation gain factor based on a preset tenth adjustment coefficient, wherein the ninth adjustment coefficient is greater than the tenth adjustment coefficient; or
    when the target audio processing algorithm is the AGC algorithm, if the type information is the speech type, adjusting the attenuation gain factor based on a preset eleventh adjustment coefficient, or if the type information is the music type, adjusting the attenuation gain factor based on a preset twelfth adjustment coefficient, wherein the eleventh adjustment coefficient is greater than the twelfth adjustment coefficient.
  17. The method according to any one of claims 11 to 16, wherein the target parameter comprises an initial parameter used during processing based on the target audio processing algorithm.
  18. The method according to claim 17, wherein the target audio processing algorithm comprises a jitter buffer management (JBM) algorithm, and the initial parameter comprises a buffer depth of audio data.
  19. The method according to claim 17 or 18, wherein the target audio processing algorithm comprises a time scale modification (TSM) algorithm, and the initial parameter comprises a stretch parameter or a compression parameter of audio data.
  20. The method according to claim 18 or 19, wherein the adjusting the parameter value of the target parameter based on the adjustment coefficient comprises:
    when the target audio processing algorithm is the JBM algorithm, if the type information is a voice-activity-frame type, adjusting the buffer depth based on a preset thirteenth adjustment coefficient, or if the type information is a non-voice-activity-frame type, adjusting the buffer depth based on a preset fourteenth adjustment coefficient, wherein the thirteenth adjustment coefficient is greater than the fourteenth adjustment coefficient;
    when the target audio processing algorithm is the TSM algorithm, if the type information is the voice-activity-frame type, adjusting the stretch parameter or compression parameter based on a preset fifteenth adjustment coefficient, or if the type information is the non-voice-activity-frame type, adjusting the stretch parameter or compression parameter based on a preset sixteenth adjustment coefficient, wherein the fifteenth adjustment coefficient is less than the sixteenth adjustment coefficient; or
    when the target audio processing algorithm is the TSM algorithm, if the type information is a speech type, adjusting the stretch parameter or compression parameter based on a preset seventeenth adjustment coefficient, or if the type information is a music type, adjusting the stretch parameter or compression parameter based on a preset eighteenth adjustment coefficient, wherein the seventeenth adjustment coefficient is greater than the eighteenth adjustment coefficient.
  21. An apparatus for processing audio data, wherein the apparatus comprises:
    an acquiring unit, configured to acquire audio data to be processed;
    a determining unit, configured to determine a target audio processing algorithm to be used and type information of the audio data;
    a judging unit, configured to judge, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm;
    an adjusting unit, configured to: if it is judged that the target audio processing algorithm is to be adjusted, adjust the target audio processing algorithm, and process the audio data based on the adjusted target audio processing algorithm; and
    a processing unit, configured to: if it is judged that the target audio processing algorithm is not to be adjusted, process the audio data based on the target audio processing algorithm.
  22. The apparatus according to claim 21, wherein the adjusting unit is configured to:
    determine an adjustment coefficient based on the type information;
    determine, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted; and
    adjust the parameter value of the target parameter based on the adjustment coefficient.
  23. The apparatus according to claim 22, wherein the target parameter comprises an intermediate parameter used during processing based on the target audio processing algorithm.
  24. The apparatus according to claim 23, wherein the target audio processing algorithm comprises an automatic noise suppression (ANS) algorithm, and the intermediate parameter comprises a noise parameter of noise determined based on the ANS algorithm and the audio data.
  25. The apparatus according to claim 23 or 24, wherein the target audio processing algorithm comprises an automatic gain control (AGC) algorithm, and the intermediate parameter comprises an attenuation gain factor determined based on the AGC algorithm and the audio data.
  26. The apparatus according to any one of claims 23 to 25, wherein the target audio processing algorithm comprises an adaptive echo cancellation (AEC) algorithm, and the intermediate parameter comprises an echo parameter of an echo determined based on the AEC algorithm and the audio data.
  27. The apparatus according to claim 22, wherein the target parameter comprises an initial parameter used during processing based on the target audio processing algorithm.
  28. The apparatus according to claim 27, wherein the target audio processing algorithm comprises a jitter buffer management (JBM) algorithm, and the initial parameter comprises a buffer depth of audio data.
  29. The apparatus according to claim 27 or 28, wherein the target audio processing algorithm comprises a time scale modification (TSM) algorithm, and the initial parameter comprises a stretch parameter or a compression parameter of audio data.
  30. The apparatus according to any one of claims 21 to 29, wherein the judging unit is configured to:
    when the target audio processing algorithm is the ANS algorithm, if the type information is a non-voice-activity-frame type, judge that the ANS algorithm is to be adjusted, or if the type information is a voice-activity-frame type, judge that the ANS algorithm is not to be adjusted;
    when the target audio processing algorithm is the ANS algorithm, if the type information is a music type, judge that the ANS algorithm is to be adjusted, or if the type information is a speech type, judge that the ANS algorithm is not to be adjusted;
    when the target audio processing algorithm is the AGC algorithm, if the type information is the non-voice-activity-frame type, judge that the AGC algorithm is to be adjusted, or if the type information is the voice-activity-frame type, judge that the AGC algorithm is not to be adjusted;
    when the target audio processing algorithm is the AGC algorithm, if the type information is the music type, judge that the AGC algorithm is to be adjusted, or if the type information is the speech type, judge that the AGC algorithm is not to be adjusted;
    when the target audio processing algorithm is the AEC algorithm, if the type information is the non-voice-activity-frame type, judge that the AEC algorithm is to be adjusted, or if the type information is the voice-activity-frame type, judge that the AEC algorithm is not to be adjusted;
    when the target audio processing algorithm is the AEC algorithm, if the type information is the music type, judge that the AEC algorithm is to be adjusted, or if the type information is the speech type, judge that the AEC algorithm is not to be adjusted;
    when the target audio processing algorithm is the JBM algorithm, if the type information is the non-voice-activity-frame type, judge that the JBM algorithm is to be adjusted, or if the type information is the voice-activity-frame type, judge that the JBM algorithm is not to be adjusted; or
    when the target audio processing algorithm is the TSM algorithm, if the type information is the voice-activity-frame type, judge that the TSM algorithm is to be adjusted, or if the type information is the non-voice-activity-frame type, judge that the TSM algorithm is not to be adjusted.
  31. An apparatus for processing audio data, wherein the apparatus comprises:
    an acquiring unit, configured to acquire audio data to be processed;
    a determining unit, configured to determine a target audio processing algorithm to be used and type information of the audio data;
    the determining unit being further configured to determine an adjustment coefficient based on the type information;
    the determining unit being further configured to determine, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted;
    an adjusting unit, configured to adjust the parameter value of the target parameter based on the adjustment coefficient; and
    a processing unit, configured to process the audio data based on the adjusted parameter value of the target parameter.
  32. The apparatus according to claim 31, wherein the target parameter comprises an intermediate parameter used during processing based on the target audio processing algorithm.
  33. The apparatus according to claim 32, wherein the target audio processing algorithm comprises an automatic noise suppression (ANS) algorithm, and the intermediate parameter comprises a noise parameter of noise determined based on the ANS algorithm and the audio data.
  34. The apparatus according to claim 32 or 33, wherein the target audio processing algorithm comprises an automatic gain control (AGC) algorithm, and the intermediate parameter comprises an attenuation gain factor determined based on the AGC algorithm and the audio data.
  35. The apparatus according to any one of claims 32 to 34, wherein the target audio processing algorithm comprises an adaptive echo cancellation (AEC) algorithm, and the intermediate parameter comprises an echo parameter of an echo determined based on the AEC algorithm and the audio data.
  36. The apparatus according to any one of claims 33 to 35, wherein the adjusting unit is configured to:
    when the target audio processing algorithm is the ANS algorithm, if the type information is a voice-activity-frame type, adjust the noise parameter of the noise based on a preset first adjustment coefficient, or if the type information is a non-voice-activity-frame type, adjust the noise parameter of the noise based on a preset second adjustment coefficient, wherein the first adjustment coefficient is less than the second adjustment coefficient;
    when the target audio processing algorithm is the ANS algorithm, if the type information is a speech type, adjust the noise parameter of the noise based on a preset third adjustment coefficient, or if the type information is a music type, adjust the noise parameter of the noise based on a preset fourth adjustment coefficient, wherein the third adjustment coefficient is greater than the fourth adjustment coefficient;
    when the target audio processing algorithm is the AEC algorithm, if the type information is the voice-activity-frame type, adjust the echo parameter of the echo based on a preset fifth adjustment coefficient, or if the type information is the non-voice-activity-frame type, adjust the echo parameter of the echo based on a preset sixth adjustment coefficient, wherein the fifth adjustment coefficient is less than the sixth adjustment coefficient;
    when the target audio processing algorithm is the AEC algorithm, if the type information is the speech type, adjust the echo parameter of the echo based on a preset seventh adjustment coefficient, or if the type information is the music type, adjust the echo parameter of the echo based on a preset eighth adjustment coefficient, wherein the seventh adjustment coefficient is greater than the eighth adjustment coefficient;
    when the target audio processing algorithm is the AGC algorithm, if the type information is the voice-activity-frame type, adjust the attenuation gain factor based on a preset ninth adjustment coefficient, or if the type information is the non-voice-activity-frame type, adjust the attenuation gain factor based on a preset tenth adjustment coefficient, wherein the ninth adjustment coefficient is greater than the tenth adjustment coefficient; or
    when the target audio processing algorithm is the AGC algorithm, if the type information is the speech type, adjust the attenuation gain factor based on a preset eleventh adjustment coefficient, or if the type information is the music type, adjust the attenuation gain factor based on a preset twelfth adjustment coefficient, wherein the eleventh adjustment coefficient is greater than the twelfth adjustment coefficient.
  37. The apparatus according to any one of claims 31 to 36, wherein the target parameter comprises an initial parameter used during processing based on the target audio processing algorithm.
  38. The apparatus according to claim 37, wherein the target audio processing algorithm comprises a jitter buffer management (JBM) algorithm, and the initial parameter comprises a buffer depth of audio data.
  39. The apparatus according to claim 37 or 38, wherein the target audio processing algorithm comprises a time scale modification (TSM) algorithm, and the initial parameter comprises a stretch parameter or a compression parameter of audio data.
  40. The apparatus according to claim 38 or 39, wherein the adjusting unit is configured to:
    when the target audio processing algorithm is the JBM algorithm, if the type information is a voice-activity-frame type, adjust the buffer depth based on a preset thirteenth adjustment coefficient, or if the type information is a non-voice-activity-frame type, adjust the buffer depth based on a preset fourteenth adjustment coefficient, wherein the thirteenth adjustment coefficient is greater than the fourteenth adjustment coefficient;
    when the target audio processing algorithm is the TSM algorithm, if the type information is the voice-activity-frame type, adjust the stretch parameter or compression parameter based on a preset fifteenth adjustment coefficient, or if the type information is the non-voice-activity-frame type, adjust the stretch parameter or compression parameter based on a preset sixteenth adjustment coefficient, wherein the fifteenth adjustment coefficient is less than the sixteenth adjustment coefficient; or
    when the target audio processing algorithm is the TSM algorithm, if the type information is a speech type, adjust the stretch parameter or compression parameter based on a preset seventeenth adjustment coefficient, or if the type information is a music type, adjust the stretch parameter or compression parameter based on a preset eighteenth adjustment coefficient, wherein the seventeenth adjustment coefficient is greater than the eighteenth adjustment coefficient.
  41. A computer storage medium, wherein the storage medium stores a computer program, and when executed by a processor, the computer program implements the following steps:
    acquiring audio data to be processed;
    determining a target audio processing algorithm to be used and type information of the audio data;
    judging, based on the type information of the audio data and the target audio processing algorithm, whether to adjust the target audio processing algorithm;
    if it is judged that the target audio processing algorithm is to be adjusted, adjusting the target audio processing algorithm, and processing the audio data based on the adjusted target audio processing algorithm; and
    if it is judged that the target audio processing algorithm is not to be adjusted, processing the audio data based on the target audio processing algorithm.
  42. A computer storage medium, wherein the storage medium stores a computer program, and when executed by a processor, the computer program implements the following steps:
    acquiring audio data to be processed;
    determining a target audio processing algorithm to be used and type information of the audio data;
    determining an adjustment coefficient based on the type information;
    determining, based on the target audio processing algorithm, a target parameter whose parameter value needs to be adjusted;
    adjusting the parameter value of the target parameter based on the adjustment coefficient; and
    processing the audio data based on the adjusted parameter value of the target parameter.
PCT/CN2017/098350 2016-11-30 2017-08-21 Method and apparatus for processing audio data WO2018099143A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201611080131.0 2016-11-30
CN201611080131.0A CN108133712B (zh) 2016-11-30 2016-11-30 Method and apparatus for processing audio data

Publications (1)

Publication Number Publication Date
WO2018099143A1 true WO2018099143A1 (zh) 2018-06-07

Family

ID=62242769

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/098350 WO2018099143A1 (zh) 2016-11-30 2017-08-21 Method and apparatus for processing audio data

Country Status (2)

Country Link
CN (1) CN108133712B (zh)
WO (1) WO2018099143A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113113046A (zh) * 2021-04-14 2021-07-13 杭州朗和科技有限公司 Performance detection method and apparatus for audio processing, storage medium, and electronic device

Families Citing this family (4)

Publication number Priority date Publication date Assignee Title
CN111402910B (zh) * 2018-12-17 2023-09-01 华为技术有限公司 Method and device for echo cancellation
CN111883171B (zh) * 2020-04-08 2023-09-22 珠海市杰理科技股份有限公司 Audio signal processing method and system, audio processing chip, and Bluetooth device
CN114003193B (zh) * 2020-07-28 2023-10-17 宏碁股份有限公司 Electronic device and sound mode adjustment method
CN114006890B (zh) * 2021-10-26 2024-02-06 深圳Tcl新技术有限公司 Data transmission method and device, storage medium, and terminal device

Citations (6)

Publication number Priority date Publication date Assignee Title
WO2004036551A1 (en) * 2002-10-14 2004-04-29 Widerthan.Com Co., Ltd. Preprocessing of digital audio data for mobile audio codecs
CN1964187A (zh) * 2005-11-11 2007-05-16 鸿富锦精密工业(深圳)有限公司 Volume management system and method
CN101009099A (zh) * 2007-01-26 2007-08-01 北京中星微电子有限公司 Digital automatic gain control method and apparatus
CN101060315A (zh) * 2006-04-21 2007-10-24 鸿富锦精密工业(深圳)有限公司 Volume management system and method
CN102985967A (zh) * 2010-11-02 2013-03-20 谷歌公司 Adaptive audio transcoding
CN104200810A (zh) * 2014-08-29 2014-12-10 无锡中星微电子有限公司 Automatic gain control apparatus and method

Family Cites Families (14)

Publication number Priority date Publication date Assignee Title
JP4110734B2 (ja) * 2000-11-27 2008-07-02 沖電気工業株式会社 Quality control apparatus for voice packet communication
CN101404160B (zh) * 2008-11-21 2011-05-04 北京科技大学 Speech noise reduction method based on audio recognition
CN103634439B (zh) * 2012-08-21 2016-12-21 佛山市爱翔电器有限公司 Noise reduction processing system
US20150179181A1 (en) * 2013-12-20 2015-06-25 Microsoft Corporation Adapting audio based upon detected environmental acoustics
JP6233103B2 (ja) * 2014-03-05 2017-11-22 富士通株式会社 Speech synthesis apparatus, speech synthesis method, and speech synthesis program
US20150327035A1 (en) * 2014-05-12 2015-11-12 Intel Corporation Far-end context dependent pre-processing
CN105336339B (zh) * 2014-06-03 2019-05-03 华为技术有限公司 Method and apparatus for processing a speech/audio signal
EP2960899A1 (en) * 2014-06-25 2015-12-30 Thomson Licensing Method of singing voice separation from an audio mixture and corresponding apparatus
JP2016035501A (ja) * 2014-08-01 2016-03-17 富士通株式会社 Speech encoding apparatus, speech encoding method, speech encoding computer program, speech decoding apparatus, speech decoding method, and speech decoding computer program
DE102015204253B4 (de) * 2015-03-10 2016-11-10 Sivantos Pte. Ltd. Method for frequency-dependent noise suppression of an input signal, and hearing aid
US9489963B2 (en) * 2015-03-16 2016-11-08 Qualcomm Technologies International, Ltd. Correlation-based two microphone algorithm for noise reduction in reverberation
JP6511897B2 (ja) * 2015-03-24 2019-05-15 株式会社Jvcケンウッド Noise reduction apparatus, noise reduction method, and program
CN106157963B (zh) * 2015-04-08 2019-10-15 质音通讯科技(深圳)有限公司 Noise reduction processing method and apparatus for audio signals, and electronic device
CN105654962B (zh) * 2015-05-18 2020-01-10 宇龙计算机通信科技(深圳)有限公司 Signal processing method, apparatus, and electronic device

Patent Citations (6)

Publication number Priority date Publication date Assignee Title
WO2004036551A1 (en) * 2002-10-14 2004-04-29 Widerthan.Com Co., Ltd. Preprocessing of digital audio data for mobile audio codecs
CN1964187A (zh) * 2005-11-11 2007-05-16 鸿富锦精密工业(深圳)有限公司 Volume management system and method
CN101060315A (zh) * 2006-04-21 2007-10-24 鸿富锦精密工业(深圳)有限公司 Volume management system and method
CN101009099A (zh) * 2007-01-26 2007-08-01 北京中星微电子有限公司 Digital automatic gain control method and apparatus
CN102985967A (zh) * 2010-11-02 2013-03-20 谷歌公司 Adaptive audio transcoding
CN104200810A (zh) * 2014-08-29 2014-12-10 无锡中星微电子有限公司 Automatic gain control apparatus and method

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN113113046A (zh) * 2021-04-14 2021-07-13 杭州朗和科技有限公司 Performance detection method and apparatus for audio processing, storage medium, and electronic device
CN113113046B (zh) * 2021-04-14 2024-01-19 杭州网易智企科技有限公司 Performance detection method and apparatus for audio processing, storage medium, and electronic device

Also Published As

Publication number Publication date
CN108133712A (zh) 2018-06-08
CN108133712B (zh) 2021-02-12

Similar Documents

Publication Publication Date Title
WO2018099143A1 (zh) Method and apparatus for processing audio data
US10186276B2 (en) Adaptive noise suppression for super wideband music
KR101970370B1 (ko) Techniques for processing audio signals
KR101852892B1 (ko) Speech recognition method, speech recognition apparatus, and electronic device
JP5085556B2 (ja) Echo cancellation arrangement
KR102317686B1 (ko) Speech signal processing method and apparatus adaptive to noisy environments
US20120263317A1 (en) Systems, methods, apparatus, and computer readable media for equalization
CA2766196C (en) Apparatus, method and computer program for controlling an acoustic signal
JP2008543194A (ja) Audio signal gain control apparatus and method
US20130329895A1 (en) Microphone occlusion detector
WO2018018705A1 (zh) Voice call method, apparatus, and terminal
US11849274B2 (en) Systems, apparatus, and methods for acoustic transparency
US9491545B2 (en) Methods and devices for reverberation suppression
US9185506B1 (en) Comfort noise generation based on noise estimation
JP2008197200A (ja) Apparatus and method for automatic intelligibility adjustment
AU2007349607A1 (en) Method of transmitting data in a communication system
KR20180040716A (ko) Signal processing method and apparatus for sound quality enhancement
WO2013078677A1 (zh) Method and device for adaptively adjusting sound effects
US20180151187A1 (en) Audio Signal Processing
US9392365B1 (en) Psychoacoustic hearing and masking thresholds-based noise compensator system
JP2022547860A (ja) Method for enhancing context-adaptive speech intelligibility
US20240029755A1 (en) Intelligent speech or dialogue enhancement
CA3228059A1 (en) Method and device for limiting of output synthesis distortion in a sound codec
AU2012200349A1 (en) Method of transmitting data in a communication system

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17876026

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17876026

Country of ref document: EP

Kind code of ref document: A1