WO2022218252A1 - Method and device for acquiring noise loudness in an audio signal, and electronic device - Google Patents


Info

Publication number
WO2022218252A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise
power spectrum
audio frame
target
observation window
Application number
PCT/CN2022/086095
Other languages
English (en)
French (fr)
Inventor
吴晨晨 (Wu Chenchen)
Original Assignee
维沃移动通信有限公司 (Vivo Mobile Communication Co., Ltd.)
Application filed by 维沃移动通信有限公司 (Vivo Mobile Communication Co., Ltd.)
Publication of WO2022218252A1 (Chinese-language publication)

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals

Definitions

  • the present application belongs to the technical field of audio signal processing, and in particular relates to a method, device and electronic device for acquiring the loudness of noise in an audio signal.
  • a speech enhancement algorithm can remove most of the interfering audio (i.e., noise) in a call, so it is of great significance for improving the call quality of electronic devices such as mobile phones.
  • noise estimation is one of the crucial steps in speech enhancement.
  • noise estimation refers to estimating the loudness of the noise (i.e., the power spectrum of the noise) in the audio signal generated and transmitted during a voice call.
  • Accurate noise estimation of audio signals is a prerequisite for ensuring the effect of speech enhancement.
  • a deficiency in the related art is that the degree of deviation between the estimated value of the loudness of the noise in the audio signal and the true value is relatively large.
  • the purpose of the embodiments of the present application is to provide a method, device and electronic device for acquiring the loudness of noise in an audio signal, which can solve the technical problem of large deviation of the results of noise estimation when the minimum value tracking method is used for noise estimation.
  • an embodiment of the present application provides a method for acquiring noise loudness in an audio signal, including: acquiring N subband power spectra of N audio frames in a target audio signal; obtaining, according to M target power spectra in each of the N subband power spectra, a noise power spectrum estimate corresponding to each of the N audio frames; performing smooth update processing on the noise power spectrum estimate; and performing compensation and correction processing on the processed noise power spectrum estimate to obtain the noise loudness of the target audio signal; where N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2.
  • an embodiment of the present application provides a device for acquiring the loudness of noise in an audio signal, including: an acquisition module, configured to acquire N subband power spectra of N audio frames in a target audio signal; an estimation module, configured to obtain, according to the M target power spectra in each of the N subband power spectra acquired by the acquisition module, the noise power spectrum estimate corresponding to each of the N audio frames; an update module, configured to perform smooth update processing on the noise power spectrum estimate obtained by the estimation module; and a correction module, configured to perform compensation and correction processing on the noise power spectrum estimate processed by the update module to obtain the noise loudness of the target audio signal; where N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2.
  • an embodiment of the present application provides an electronic device including a processor, a memory, and a program or instruction stored in the memory and executable on the processor; when the program or instruction is executed by the processor, the steps of the method of the first aspect are implemented.
  • an embodiment of the present application provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or instruction is executed by a processor, the steps of the method of the first aspect are implemented.
  • an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run programs or instructions to implement the method of the first aspect.
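  • in the embodiments of the present application, after the N subband power spectra of the N audio frames in the target audio signal are acquired, the noise power spectrum estimate corresponding to each of the N audio frames can be obtained according to the M target power spectra in each of the N subband power spectra.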
  • the noise power spectrum estimate is smoothed and updated, and the processed noise power spectrum estimate is compensated and corrected to obtain the noise loudness of the target audio signal.
  • N is an integer greater than or equal to 1
  • M is an integer greater than or equal to 2.
  • the acquisition method provided in the embodiment of the present application performs statistics on two or more target power spectra and obtains the noise power spectrum estimate from them.
  • the method provided by the embodiment of the present application can effectively reduce the deviation between the noise estimation (that is, the noise loudness obtained by estimation) and the true value of the noise, thereby improving the voice enhancement effect and voice call quality of the electronic device.
  • FIG. 1 is the first flowchart of the steps of the method for acquiring noise loudness in an audio signal according to an embodiment of the present application.
  • FIG. 2 is the second flowchart of the steps of the method for acquiring noise loudness in an audio signal according to an embodiment of the present application.
  • FIG. 3 is the third flowchart of the steps of the method for acquiring noise loudness in an audio signal according to an embodiment of the present application.
  • FIG. 5 is the first schematic diagram of the composition of an apparatus for acquiring noise loudness in an audio signal according to an embodiment of the present application.
  • FIG. 6 is the second schematic diagram of the composition of the apparatus for acquiring noise loudness in an audio signal according to an embodiment of the present application.
  • FIG. 8 is the second schematic diagram of the composition of the electronic device according to an embodiment of the present application.
  • the terms “first”, “second” and the like in the description and claims of the present application are used to distinguish similar objects, not to describe a specific order or sequence. It should be understood that data so used are interchangeable under appropriate circumstances, so that the embodiments of the present application can be practiced in sequences other than those illustrated or described herein. In addition, the objects distinguished by “first”, “second”, etc. are usually of one type, and the number of objects is not limited.
  • the first object may be one or more than one.
  • “and/or” in the description and claims indicates at least one of the connected objects, and the character “/” generally indicates an “or” relationship between the associated objects.
  • to achieve speech enhancement, it is necessary to estimate the loudness of the noise in the call (i.e., to perform noise estimation).
  • the methods for implementing noise estimation in the related art may include the following:
  • the first is to perform noise estimation with a time-recursive averaging algorithm. This method cannot follow changes in the ambient background noise during speech segments, so its noise estimate is not timely.
  • the second method is to realize noise estimation based on histogram.
  • the statistical steps of this method are carried out within a fixed window length and need to be repeatedly calculated in all frequency bands. Therefore, this method has the disadvantage of a large amount of calculation.
  • the third is to perform noise estimation by minimum tracking (minima-tracking algorithms), which obtains a rough estimate of the noise level by tracking the minimum power spectral band (i.e., the minimum spectral power) of each audio frame.
  • this method has low delay and a reasonable amount of computation, so it is a relatively ideal noise estimation method.
  • the disadvantage of the minimum value tracking method is that the deviation of the estimated noise value from the true value of the noise is relatively large. Therefore, how to reduce the deviation degree of noise estimation using the minimum value tracking method is a technical problem to be solved urgently by those skilled in the art.
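For comparison, the textbook form of minimum tracking can be sketched as follows. This is a minimal Python illustration, not the related-art implementations the application refers to: it assumes each frame's subband powers are given as a list of floats, and `window` (a hypothetical parameter) is the number of past frames over which the minimum is taken in each subband.

```python
from collections import deque


def minimum_tracking(frames, window):
    """Per-subband minimum over the most recent `window` frames.

    frames: list of frames, each a list of subband power values.
    window: hypothetical look-back length in frames.
    Returns one noise estimate (one value per subband) for each frame.
    """
    if window < 1:
        raise ValueError("window must be at least 1")
    history = deque(maxlen=window)  # the oldest frame drops out automatically
    estimates = []
    for frame in frames:
        history.append(frame)
        # zip(*history) groups the stored values of each subband together
        estimates.append([min(band) for band in zip(*history)])
    return estimates


print(minimum_tracking([[3.0, 5.0], [1.0, 6.0], [4.0, 2.0], [5.0, 7.0]], window=2))
# [[3.0, 5.0], [1.0, 5.0], [1.0, 2.0], [4.0, 2.0]]
```

Because each estimate is a single minimum, it tends to sit below the average noise level, which is one source of the deviation described above.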
  • an embodiment of the present application provides a method for acquiring the loudness of noise in an audio signal.
  • the execution body of the method may be an acquisition device, and the acquisition device may be an electronic device, or a functional module and/or functional entity in the electronic device capable of implementing the method.
  • the functional modules and/or functional entities of the acquisition method may be specifically determined according to actual usage requirements, which are not limited in the embodiments of the present application.
  • the following description takes, as an example, the case in which the acquisition device is an electronic device, that is, the execution body of the acquisition method is an electronic device.
  • the electronic device in the embodiment of the present application may be a mobile phone, a tablet computer, a notebook computer, a handheld computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or another type of electronic device, which is not limited in the embodiments of the present application.
  • an embodiment of the present application provides a method for acquiring noise loudness in an audio signal, and the acquiring method includes the following S101 to S104:
  • S101: the acquisition device acquires N subband power spectra of N audio frames in the target audio signal.
  • N is an integer greater than or equal to 1.
  • the above-mentioned voice call audio signal may be collected and obtained through a main microphone (Microphone, Mic) of the electronic device, or may be jointly collected and obtained through the main microphone and the auxiliary microphone of the electronic device.
  • it can be determined according to actual use requirements, which is not limited in the embodiments of the present application.
  • the above-mentioned target audio signal may be a voice audio signal transmitted by an electrical signal, or may be an audio signal transmitted based on an Internet protocol.
  • the above-mentioned target audio signal is an audio signal in a voice call audio signal, and the target audio signal may include both a voice audio signal and a noise audio signal.
  • the target audio signal may be a part or all of the audio signal in the voice call audio signal. Specifically, it can be determined according to actual use requirements, which is not limited in the embodiment of the present application.
  • noise audio signals may include noise audio signals generated due to a noisy surrounding environment during a voice call, and may also include noise audio signals caused by unsatisfactory communication transmission quality during the transmission of the target audio signal.
  • the above-mentioned noise audio signal may be a steady-state noise audio signal.
  • the steady-state noise audio signal may be a noise audio signal whose repetition frequency is greater than a preset frequency (e.g., 10 Hz), or a noise audio signal whose sound level fluctuates by no more than a preset amount (e.g., 3 dB) during the measurement.
  • N audio frames may be N continuous audio frames, or may be N non-consecutive audio frames.
  • the N audio frames may be some or all of the audio frames in the target audio signal. Specifically, it can be determined according to actual use requirements, which is not limited in the embodiment of the present application.
  • each of the above N audio frames may respectively correspond to one subband power spectrum, and the N audio frames may correspond to N subband power spectra.
  • each subband power spectrum in the above N subband power spectra may be the logarithmic value (i.e., the log value) of the subband spectral power.
  • each subband power spectrum in the above N subband power spectra can be used to characterize how the audio signal power in the respective subband spectrum is distributed over frequency.
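The log subband power spectrum of one frame might be computed as below. This is a sketch under assumptions the application does not state: the frame's per-bin powers (for example squared FFT magnitudes) are already available, subbands are given by hypothetical bin boundaries `band_edges`, and the log value is expressed in dB as 10·log10 of the summed band power, with a small `floor` to avoid log(0).

```python
import math


def subband_log_power(bin_powers, band_edges, floor=1e-12):
    """Return the log (dB) power of each subband of one audio frame.

    bin_powers: per-bin power values of the frame (e.g. |FFT|^2).
    band_edges: hypothetical increasing bin indices; subband k covers bins
                band_edges[k] up to, but not including, band_edges[k + 1].
    The 10*log10 (dB) convention is an assumption.
    """
    powers = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        band_power = sum(bin_powers[lo:hi])
        powers.append(10.0 * math.log10(max(band_power, floor)))
    return powers


print(subband_log_power([1.0, 9.0, 10.0, 90.0], band_edges=[0, 2, 4]))  # [10.0, 20.0]
```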
  • S102: the acquisition device obtains, according to the M target power spectra in each of the N subband power spectra, the noise power spectrum estimate corresponding to each of the N audio frames.
  • M is an integer greater than or equal to 2.
  • the above-mentioned M target power spectra are power spectra used to obtain the estimated noise power spectrum corresponding to each of the N audio frames.
  • the specific number of the above-mentioned M target power spectrums may be determined according to the number of N audio frames, and may also be determined according to the power value of each subband power spectrum in the N subband power spectrums. Specifically, it can be determined according to actual use requirements, which is not limited in the embodiment of the present application.
  • the noise power spectrum estimation can be obtained according to the M target power spectra.
  • the principle of minimum tracking is to assume that even during speech activity, the noisy speech power of a single frequency band generally decays to the power level of the noise.
  • the electronic device in the embodiment of the present application can obtain the estimation of the noise power spectrum corresponding to each audio frame in the N audio frames according to the M target power spectra in each subband power spectrum in the N subband power spectra.
  • the M target power spectra are power spectra that fall within a preset power spectrum interval.
  • the M target power spectrums may be all or part of the power spectrums that fall within the preset power spectrum range.
  • the preset power spectrum interval is an interval containing the relatively small (lowest) power values in each subband power spectrum.
  • the preset power spectrum interval includes at least two power spectra.
  • the preset power spectrum interval may be the set of the first P% or the first Q power spectra when the power spectra in each subband power spectrum are arranged in ascending order (from low to high).
  • the preset power spectrum interval may be the set of the last R% or the last S power spectra when the power spectra in each subband power spectrum are arranged in descending order.
  • P and R are respectively positive numbers
  • Q and S are respectively positive integers.
  • for example, the preset power spectrum interval may be the set of the first 10% or 25% of the power spectra in each subband power spectrum arranged in ascending order, or the set of the first 50 or 80 power spectra in that order; this can be determined according to actual use requirements and is not limited in the embodiment of the present application.
  • alternatively, the preset power spectrum interval may be the set of the last 5% or 8% of the power spectra in each subband power spectrum arranged in descending order, or the set of the last 30 or 20 power spectra in that order; this can be determined according to actual use requirements and is not limited in the embodiment of the present application.
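Selecting the M target power spectra from one subband power spectrum could look like the sketch below. It reads the preset interval as "the lowest `fraction` share or the lowest `count` values after an ascending sort"; both parameter names are hypothetical, and the clamp to at least two values reflects the requirement M ≥ 2.

```python
import math


def select_target_powers(values, fraction=None, count=None):
    """Return the M smallest power values (M >= 2), sorted ascending.

    Exactly one of the hypothetical parameters must be given:
    fraction (e.g. 0.1 for the lowest 10%) or count (e.g. 50 for the
    lowest 50 values).
    """
    if len(values) < 2:
        raise ValueError("need at least two power values")
    if (fraction is None) == (count is None):
        raise ValueError("give exactly one of fraction or count")
    ordered = sorted(values)  # ascending: smallest powers first
    if count is None:
        count = math.ceil(fraction * len(ordered))
    count = max(2, min(count, len(ordered)))  # enforce 2 <= M <= len(values)
    return ordered[:count]


print(select_target_powers([5, 1, 4, 2, 3, 9, 8, 7, 6, 10], fraction=0.25))  # [1, 2, 3]
```

Taking the first values of an ascending sort selects the same kind of set as taking the last values of a descending sort, which is why the application can phrase the interval either way.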
  • S102 includes the following S1021 to S1022:
  • S1021: the acquisition device assigns a weight value to each of the M target power spectra.
  • the weight values assigned to the target power spectra may be equal or unequal. Specifically, this can be determined according to actual use requirements and is not limited in the embodiment of the present application.
  • S1022: the acquisition device obtains the noise power spectrum estimate according to the M target power spectra and the weight values.
  • the electronic device obtains an estimate of the noise power spectrum through the weighted average calculation formula according to the M target power spectrums and the weight values.
  • the purpose of the electronic device assigning weight values to each of the M target power spectra respectively is to introduce a weight allocation mechanism in the process of obtaining the noise power spectrum estimation. On this basis, the electronic device obtains an estimate of the noise power spectrum according to the M target power spectra and weight values. Therefore, by reasonably assigning the weights occupied by each target power spectrum in the M target power spectra, the electronic device can further reduce the degree of deviation of the noise power spectrum estimation relative to the true value of the noise power spectrum.
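The application specifies a weighted average but not the weights. A minimal sketch, assuming non-negative weights normalized by their sum and equal weights by default; the example weighting that favours the smallest powers is purely illustrative.

```python
def weighted_noise_estimate(target_powers, weights=None):
    """Weighted average of the M target power spectra.

    The weights are an assumption: the application does not specify them,
    so equal weights are used when none are given.
    """
    if not target_powers:
        raise ValueError("need at least one target power")
    if weights is None:
        weights = [1.0] * len(target_powers)
    if len(weights) != len(target_powers):
        raise ValueError("one weight per target power is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(p * w for p, w in zip(target_powers, weights)) / total


print(weighted_noise_estimate([1.0, 2.0, 3.0]))                   # 2.0
print(weighted_noise_estimate([1.0, 2.0, 3.0], [3.0, 1.0, 0.0]))  # 1.25
```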
  • S103: the acquisition device performs smooth update processing on the noise power spectrum estimate.
  • the electronic device uses a smoothing factor to perform smooth update processing on the noise power spectrum estimate.
  • S104: the acquisition device performs compensation and correction processing on the processed noise power spectrum estimate to obtain the noise loudness of the target audio signal.
  • the electronic device uses a compensation correction factor to perform compensation and correction processing on the noise power spectrum estimation.
  • the purpose of the smooth update process and the compensation correction process is to smooth and compensate the noise power spectrum estimate, so that the noise power spectrum estimate is closer to the true value of the noise power spectrum.
  • the values of the smoothing factor and the compensation correction factor may be specifically determined according to actual usage requirements, which are not limited in this embodiment of the present application.
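The application does not give the smoothing or compensation formulas. One common choice, used here purely as an assumption, is first-order recursive smoothing in which the smoothing factor `alpha` weights the previous estimate, followed by a compensation step that, because the power spectra are log values, is modelled as adding a hypothetical offset `correction_db` (equivalent to multiplying a linear-domain power by a constant factor).

```python
def smooth_update(previous, current, alpha):
    """First-order recursive smoothing (an assumed form of the smooth update)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * previous + (1.0 - alpha) * current


def compensate(estimate_db, correction_db):
    """Assumed log-domain compensation: add a hypothetical constant dB offset."""
    return estimate_db + correction_db


smoothed = smooth_update(previous=10.0, current=20.0, alpha=0.75)  # 12.5
print(compensate(smoothed, correction_db=1.5))  # 14.0
```

A larger `alpha` makes the estimate change more slowly; `alpha = 1.0` freezes it.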
  • the electronic device in the embodiment of the present application obtains, according to the M target power spectra in each of the N subband power spectra, the noise power spectrum estimate corresponding to each of the N audio frames, and then applies smooth update processing and compensation and correction processing to this estimate to obtain the noise loudness of the target audio signal.
  • in the related art, the method for obtaining the noise power spectrum estimate (i.e., the method for obtaining the noise loudness) still has the defect of a large deviation between the estimated noise value and the true value.
  • the acquisition method provided in the embodiment of the present application performs statistics on two or more target power spectra and obtains the noise power spectrum estimate from them. Therefore, compared with the related art, in which noise estimation is performed by tracking the minimum power spectrum band of each audio frame, the method provided by the embodiments of the present application can effectively reduce the deviation between the noise estimate (that is, the estimated noise loudness) and the true value of the noise, thereby improving the call quality of the electronic device.
  • the target audio signal includes a plurality of long observation windows and a plurality of short observation windows that alternate with each other.
  • a long observation window among the plurality of long observation windows is adjacent to two short observation windows among the plurality of short observation windows. That is to say, the lengths of the observation windows of the target audio signal alternate.
  • the length of a long observation window is greater than the length of a short observation window.
  • the specific lengths of the long observation window and the short observation window may be determined according to actual usage requirements, which are not limited in the embodiment of the present application.
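One way to lay out alternating long and short observation windows over a sequence of frames is sketched below. Starting with a long window and the frame-count lengths `long_len` and `short_len` are assumptions; the application leaves the lengths open.

```python
def alternating_windows(num_frames, long_len, short_len):
    """Split frame indices 0..num_frames-1 into alternating long/short windows.

    long_len and short_len are hypothetical lengths in frames; starting with
    a long window is an assumption. Returns (kind, start, end) tuples with
    end exclusive; the last window may be truncated at the end of the signal.
    """
    if not 0 < short_len < long_len:
        raise ValueError("require 0 < short_len < long_len")
    windows = []
    start, use_long = 0, True
    while start < num_frames:
        length = long_len if use_long else short_len
        end = min(start + length, num_frames)
        windows.append(("long" if use_long else "short", start, end))
        start, use_long = end, not use_long
    return windows


print(alternating_windows(20, long_len=8, short_len=3))
```

For 20 frames this yields `[('long', 0, 8), ('short', 8, 11), ('long', 11, 19), ('short', 19, 20)]`, so every interior long window has a short window on each side.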
  • the acquisition method provided by this embodiment of the present application may further include the following step S105:
  • S105: the acquisition device determines the transition level (jump level) of noise in the target audio signal according to the effective value and the signal-to-noise ratio of each audio frame in the short observation window.
  • the effective value of each audio frame may be the root mean square (Root Mean Square) of spectral values of each audio frame.
  • the effective value of each audio frame may be the square root of the average of the squares of the spectral values of each audio frame.
  • the signal-to-noise ratio (Signal Noise Ratio, SNR) of each audio frame may be a ratio between the power of the speech signal and the power of the noise signal in each audio frame.
  • the signal-to-noise ratio of each audio frame can be represented by decibels. The higher the signal-to-noise ratio of each audio frame, the less noise in the audio frame.
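The effective value and signal-to-noise ratio defined above can be computed as follows. The sketch takes a frame's spectral values as a list and expresses SNR in dB as 10·log10 of the speech-to-noise power ratio; how the speech and noise powers are separated in practice is outside its scope and is assumed to be done elsewhere.

```python
import math


def effective_value(spectral_values):
    """Root mean square of a frame's spectral values."""
    if not spectral_values:
        raise ValueError("frame has no spectral values")
    return math.sqrt(sum(v * v for v in spectral_values) / len(spectral_values))


def snr_db(speech_power, noise_power):
    """Signal-to-noise ratio in decibels (higher means less noise)."""
    if speech_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(speech_power / noise_power)


print(effective_value([3.0, 4.0]))  # about 3.536
print(snr_db(100.0, 1.0))           # 20.0
```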
  • the effective value and signal-to-noise ratio of each audio frame can characterize the noise in the audio frame. Therefore, according to the effective value and signal-to-noise ratio of each audio frame, the change of noise in the target audio signal (that is, the transition level of noise in the target audio signal) can be determined.
  • transition level of the noise in the target audio signal can be used to measure the change of the loudness of the noise.
  • the transition level of noise in the target audio signal may include one level, or may include multiple levels.
  • the jump level may include a first-level jump and a second-level jump, and may also include a third-level jump and a fourth-level jump.
  • the lower the jump level the greater the degree of change in noise.
  • the jump parameter value of the first-level jump is greater than the jump parameter value of the second-level jump.
  • the jump parameter value can be characterized by the decibel change value or the decibel change rate of the audio signal. For example, when the decibel change value of the audio signal of the first-level jump is greater than the decibel change value of the audio signal of the second-level jump, or the decibel change rate of the audio signal of the first-level jump is higher than the decibel change rate of the audio signal of the second-level jump , then the jump parameter value of the first-level jump is greater than the jump parameter value of the second-level jump.
  • the above-mentioned jump parameter values may be characterized by numerical values, levels, or percentages. It can be understood that the above audio signal is an audio signal of an audio frame without speech. That is, the above-mentioned audio signal is an audio signal that can characterize the noise level of the environment where the user of the electronic device is located.
  • when the jump level is a first-level jump, it can be understood that the user of the electronic device has entered a noticeably noisier or noticeably quieter environment.
  • when the jump level is a first-level jump, it can also be understood that the call quality of the user of the electronic device has noticeably improved or noticeably degraded.
  • the electronic device in the embodiment of the present application can accurately determine the transition level of noise in the target audio signal according to the effective value and the signal-to-noise ratio of each audio frame in the short observation window. Furthermore, the electronic device may determine or select, according to the jump level, a smooth update processing method suitable for that jump level, so as to further reduce the deviation between the noise estimate and the true value of the noise.
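Jump-level classification might be sketched like this, using the decibel change of speech-free (noise-only) audio between two windows as the jump parameter. The thresholds are hypothetical and listed from largest to smallest so that level 1 is the largest change; returning 0 for "no jump" is an added convention not found in the application.

```python
def jump_level(previous_noise_db, current_noise_db, thresholds_db=(10.0, 3.0)):
    """Classify the change in background-noise level.

    thresholds_db (hypothetical) must be in descending order: a change of at
    least thresholds_db[0] is a first-level jump, at least thresholds_db[1] a
    second-level jump, and so on. Returns 0 if no threshold is reached.
    """
    change = abs(current_noise_db - previous_noise_db)  # louder or quieter
    for level, threshold in enumerate(thresholds_db, start=1):
        if change >= threshold:
            return level
    return 0


print(jump_level(40.0, 55.0), jump_level(40.0, 45.0), jump_level(40.0, 41.0))  # 1 2 0
```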
  • S103 includes the following S1031 to S1032:
  • S1031: the acquisition device performs a first smooth update process on the noise power spectrum estimate.
  • the first smooth update process may be the smooth update process corresponding to a first-level jump.
  • when the jump level is a first-level jump, it can be understood that the noise in the target audio signal has become noticeably stronger or weaker.
  • the above S1031 includes the following S1031a to S1031b:
  • S1031a: the acquisition device obtains a first smoothing factor according to the audio frame information of the audio frames in the short observation window.
  • the audio frame information includes: signal-to-noise ratio information and voice existence probability information.
  • the above signal-to-noise ratio information represents the ratio between the power of the speech signal of the audio frame and the power of the noise signal in the short observation window of the target audio signal.
  • the above-mentioned speech existence probability information may be information representing the probability that speech is present in the audio frames in the short observation window of the target audio signal.
  • the voice existence probability information may be obtained by means of neural network-based voice activity detection (Neural Network Voice Activity Detection, NNVAD).
  • S1031b: the acquisition device uses the first smoothing factor to perform the first smooth update process on the noise power spectrum estimate.
  • the reason why the obtaining apparatus in this embodiment of the present application uses S1031a to S1031b to perform the first smooth update process is as follows.
  • when noise loudness estimation is performed by minimum value tracking, the noise estimate is updated in both speech segments and non-speech segments.
  • as a result, the noise power spectrum estimate is easily affected by the high signal-to-noise ratio of speech segments and is pushed upward.
  • because the signal-to-noise ratio of the audio signal in a speech segment is high, the noise estimate for that segment deviates more from the true value of the noise and tends to lie above it.
  • the electronic device of the embodiment of the present application adopts S1031a to S1031b: it combines the voice existence probability information and the signal-to-noise ratio information of the audio signal to obtain the first smoothing factor, and uses that factor to perform the first smooth update process. Therefore, the electronic device according to the embodiment of the present application can significantly reduce the deviation between the noise estimate and the true value of the noise.
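The application does not say how SNR and voice existence probability are combined into the first smoothing factor. The sketch below is one hypothetical mapping, similar in spirit to the speech-presence-controlled recursive averaging used in MCRA-type noise estimators: the more likely a frame contains speech, or the higher its SNR, the closer the factor gets to `alpha_max`, so speech frames barely move the noise estimate. All constants and names are illustrative.

```python
def first_smoothing_factor(snr_db, speech_prob, alpha_min=0.7, alpha_max=0.99,
                           snr_low_db=0.0, snr_high_db=20.0):
    """Hypothetical mapping from short-window SNR and speech probability
    to a smoothing factor; the constants are illustrative assumptions."""
    if not 0.0 <= speech_prob <= 1.0:
        raise ValueError("speech_prob must lie in [0, 1]")
    # Normalise SNR to [0, 1]: 0 at or below snr_low_db, 1 at or above snr_high_db.
    snr_weight = (snr_db - snr_low_db) / (snr_high_db - snr_low_db)
    snr_weight = min(max(snr_weight, 0.0), 1.0)
    speech_confidence = max(speech_prob, snr_weight)
    return alpha_min + (alpha_max - alpha_min) * speech_confidence


def first_smoothing_update(previous_db, current_db, snr_db, speech_prob):
    """Recursive smoothing using the speech-aware first smoothing factor."""
    alpha = first_smoothing_factor(snr_db, speech_prob)
    return alpha * previous_db + (1.0 - alpha) * current_db
```

In a noise-only frame (`snr_db=-5.0`, `speech_prob=0.0`) the factor is 0.7 and the estimate moves 30% of the way toward the new value; in a confident speech frame the factor is about 0.99 and the estimate moves only about 1%.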
  • S1032: the acquisition device performs a second smooth update process on the noise power spectrum estimate.
  • the second smooth update process may be the smooth update process corresponding to a second-level jump.
  • when the jump level is a second-level jump, the noise in the target audio signal becomes stronger or weaker to a lesser, less obvious extent.
  • S1032 includes the following S1032a to S1032c:
  • S1032a: the acquisition device obtains an initial smoothing factor by fitting, according to the noise power spectrum estimate corresponding to the audio frames in the first long observation window and the noise power spectrum estimate corresponding to the audio frames in the first short observation window.
  • the first long observation window and the first short observation window are two adjacent observation windows of alternating lengths.
  • S1032b: the acquisition device performs superposition fitting on the initial smoothing factor using the audio frame information of the audio frames in the first long observation window, to obtain a second smoothing factor.
  • the audio frame information includes: signal-to-noise ratio information and voice existence probability information.
  • the above signal-to-noise ratio represents the ratio between the power of the speech signal of the audio frame and the power of the noise signal in the short observation window of the target audio signal.
  • the above-mentioned speech existence probability information may be information representing the probability of speech existence in the audio frame in the short observation window of the target audio signal.
  • the voice presence probability information may be obtained by means of neural network-based voice activity detection (Neural Network Voice Activity Detection, NNVAD).
  • S1032c: the acquisition device uses the second smoothing factor to perform the second smooth update process on the noise power spectrum estimate.
  • when the jump level is a second-level jump, the electronic device may consider that its user has entered a noise field environment that has changed only slightly (that is, the noise field around the user has changed relatively insignificantly). In this situation, the electronic device can first count the M target power spectra (that is, the power spectra in the small value interval) of each audio frame in the long observation window, and then obtain the noise power spectrum estimate in the long observation window.
  • an initial smoothing factor is then calculated by fitting.
  • next, the electronic device counts the effective value and signal-to-noise ratio of each audio frame in the long observation window, calculates the signal-to-noise ratio of the signal in the long observation window, combines it with the voice existence probability information of the audio segment in the long observation window to obtain the second smoothing factor by superposition fitting, and uses that factor to update the current noise power spectrum estimate.
  • in this way, the electronic device in this embodiment of the present application can determine or select a smoothing factor suited to the noise field environment for the update processing, so as to reduce the deviation between the noise estimate and the true value of the noise.
  • a method for acquiring the loudness of noise in an audio signal can be implemented through the following S201 to S220:
  • S201: the acquisition apparatus calculates the subband power spectrum of the signal collected by the primary microphone.
  • the subband power spectrum is a logarithmic power spectrum.
  • S202: the acquisition apparatus calculates the effective value, the signal-to-noise ratio, the small value interval, and the weight value distribution of the current audio frame, and obtains the noise power spectrum estimate.
  • the small value interval is the small value interval of the N subband power spectra, that is, the M target power spectra that fall within the preset power spectrum interval.
  • S203: the acquisition apparatus updates the effective value, the signal-to-noise ratio, the noise power spectrum estimate, and the speech frame identifier in the long observation window.
  • S204: the acquisition apparatus updates the effective value, the signal-to-noise ratio, the noise power spectrum estimate, and the speech frame identifier in the short observation window.
  • the speech frame identifiers in S203 and S204 indicate speech presence probability information.
  • the jump level of the noise can be understood as the magnitude of a sudden change in the background noise.
  • a first-level jump means the noise becomes much louder or much quieter, for example by 10 dB or more.
  • a second-level jump means the noise becomes only slightly louder or quieter, for example by less than 10 dB.
  • if the audio signal attribute of the current observation window changes, the attribute of the current observation window is recorded, and an identifier of the change in the steady-state noise state of the audio signal segment is output.
  • S206: the acquisition apparatus determines whether the jump level is a first-level jump.
  • if the determination result is yes, step S207 is executed; if no, step S215 is executed.
  • the jump level is determined according to the noise state identifier.
  • S207: the acquisition apparatus determines whether the background noise has increased.
  • if the determination result is yes, step S208 is performed; if no, step S210 is performed.
  • S208: the acquisition apparatus collects statistics on the effective values and signal-to-noise ratios in the short observation window, determines the range of the small value interval and the weight coefficients according to their distribution, and obtains the noise power spectrum estimate of the signal segment in the short observation window.
  • the range of the small value interval can be understood as the range of the M target power spectra.
  • S209: the acquisition apparatus collects statistics on the speech presence probability information and signal-to-noise ratio information of each audio frame in the short observation window, and adaptively selects a smoothing factor for the steady-state noise power spectrum estimate according to the audio signal characteristics of the current observation window.
  • S210: the acquisition apparatus determines whether the background noise has decreased by less than 20 dB.
  • if the determination result is yes, step S211 is performed; if no, step S214 is performed.
  • S211: the acquisition apparatus collects statistics on the effective values and signal-to-noise ratios in the short observation window, determines the range of the frame small value interval and the weight coefficients according to their distribution, and obtains the noise power spectrum estimate of the signal segment in the short observation window.
  • S212: the acquisition apparatus collects statistics on the speech presence probability information and signal-to-noise ratio information of each audio frame in the short observation window, and smooths the smoothing factor of the steady-state noise power spectrum estimate according to the audio signal characteristics of the current observation window.
  • S213: the acquisition apparatus updates the noise level of the current window in combination with the noise power spectrum estimate of the previous observation window.
  • S214: the acquisition apparatus collects statistics on the effective value and signal-to-noise ratio of each audio frame in the short observation window, selects the small value intervals of some audio frames, and assigns weight coefficients to obtain the noise power spectrum estimate in the short observation window.
  • S215: the acquisition apparatus collects statistics on the small value interval of each audio frame in the long observation window to obtain the noise power spectrum estimate in the long observation window.
  • S216: the acquisition apparatus calculates a smoothing factor by fitting the noise power spectrum estimate of the current observation window and that of the previous observation window.
  • S217: the acquisition apparatus collects statistics on the effective value and signal-to-noise ratio of each audio frame in the long observation window to obtain the signal-to-noise ratio of the signal segment in the observation window.
  • S218: the acquisition apparatus combines the speech presence probability information of the signal segment in the long observation window with the signal-to-noise ratio information of the speech signal segment, determines the attribute of the speech signal segment, and superimposes them to generate an aggregated smoothing window coefficient.
  • S219: the acquisition apparatus updates the noise level of the current observation window in combination with the noise power spectrum estimate of the previous observation window.
  • S220: the acquisition apparatus outputs the current noise power spectrum estimate through the compensation correction function.
  • when the background noise undergoes a first-level jump, the noise level is clearly stronger or weaker than the previous noise level.
  • the distribution of the RMS values and signal-to-noise ratios is calculated, and the small value intervals of some audio frames are selected to obtain an initial estimate of the noise power spectrum of the signal segment within the short window. Then, according to the SNR distribution of the signal in the short observation window and the speech probability of each audio frame, the smoothing factor of the noise power spectrum estimate is adaptively selected, and the noise power spectrum estimate of the current signal segment is updated again.
  • the same estimation method as above is adopted.
  • when the noise power in the short observation window drops to a low level (for example, by more than 20 dB),
  • the small value intervals of some audio frames are combined with weight coefficients to obtain the noise power spectrum estimate in the short window.
  • the jump level can be considered a second-level jump (that is, a relatively slight jump).
  • the electronic device first collects statistics on the small value interval of each audio frame in the long observation window to obtain the noise power spectrum estimate in the long observation window. It then calculates the smoothing factor by fitting the noise power spectrum estimate of the current observation window and that of the previous observation window.
  • the electronic device can reduce the degree of deviation between the estimated noise power spectrum and the true value of the noise power spectrum, so as to accurately estimate the loudness of noise in the audio signal, thereby improving the call quality of the electronic device.
  • the embodiment of the present application also provides an apparatus 200 for acquiring the loudness of noise in an audio signal.
  • the acquiring apparatus 200 may be an apparatus, or may be a component, an integrated circuit, or a chip in a terminal.
  • the obtaining apparatus 200 may be a mobile electronic device or a non-mobile electronic device.
  • the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, or a personal digital assistant (PDA).
  • the non-mobile electronic device may be a server, network attached storage (NAS), a personal computer (PC), a television (TV), a teller machine, a self-service machine, or the like, which is not specifically limited in the embodiments of the present application.
  • the obtaining apparatus 200 in this embodiment of the present application may be an apparatus having an operating system.
  • the operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
  • the obtaining apparatus 200 provided in this embodiment of the present application can implement each process implemented by the method embodiments in FIG. 1 to FIG. 4 , and in order to avoid repetition, details are not repeated here.
  • the obtaining apparatus 200 includes: an obtaining module 210, configured to obtain N subband power spectra of N audio frames in the target audio signal.
  • the estimation module 220 is configured to obtain a noise power spectrum estimate corresponding to each of the N audio frames according to the M target power spectra in each of the N subband power spectra acquired by the acquisition module 210.
  • the update module 230 is configured to perform smooth update processing on the noise power spectrum estimation obtained by the estimation module 220 .
  • the correction module 240 is configured to perform compensation and correction processing on the noise power spectrum estimation processed by the update module 230 to obtain the noise loudness of the target audio signal.
  • N is an integer greater than or equal to 1.
  • M is an integer greater than or equal to 2.
  • the M target power spectra are power spectra that fall within a preset power spectrum interval.
  • the M target power spectra may be all or some of the power spectra that fall within the preset power spectrum interval.
  • the preset power spectrum interval is the set of the first P% or first Q power spectra of each subband power spectrum sorted from low to high, or the set of the last R% or last S power spectra of each subband power spectrum sorted from high to low.
  • the target audio signal includes alternating long observation windows and short observation windows.
  • the acquiring apparatus 200 further includes: a determining module 250 .
  • the determination module 250 is configured to determine the jump level of the noise in the target audio signal according to the effective value and the signal-to-noise ratio of each audio frame in the short observation window, before the update module 230 performs smoothing update processing on the noise power spectrum estimate obtained by the estimation module 220.
  • the update module 230 is specifically configured to: perform a first smoothing update process on the noise power spectrum estimate obtained by the estimation module 220 when the jump level is a first-level jump; and perform a second smoothing update process on the noise power spectrum estimate obtained by the estimation module 220 when the jump level is a second-level jump.
  • the jump parameter value of the first-level jump is greater than the jump parameter value of the second-level jump.
  • the updating module 230 is specifically configured to: obtain the first smoothing factor according to the audio frame information of the audio frame in the short observation window.
  • the noise power spectrum estimation obtained by the estimation module 220 is subjected to a first smoothing update process by using the first smoothing factor.
  • the audio frame information includes signal-to-noise ratio information and speech presence probability information.
  • the update module 230 is specifically configured to: obtain an initial smoothing factor by fitting the noise power spectrum estimate corresponding to the audio frames in the first long observation window and the noise power spectrum estimate corresponding to the audio frames in the first short observation window; perform superposition fitting on the initial smoothing factor according to the audio frame information of the audio frames in the first long observation window to obtain the second smoothing factor; and perform a second smoothing update process on the noise power spectrum estimate obtained by the estimation module 220 using the second smoothing factor.
  • the audio frame information includes signal-to-noise ratio information and speech presence probability information, and the first long observation window and the first short observation window are adjacent observation windows.
  • the estimation module 220 is specifically configured to: assign a weight value to each of the M target power spectra, and obtain the noise power spectrum estimate according to the M target power spectra and the weight values.
  • the acquisition apparatus 200 provided in this embodiment of the present application collects statistics over two or more target power spectra and obtains the noise power spectrum estimate from them.
  • the acquisition apparatus 200 provided in this embodiment of the present application can effectively reduce the deviation between the noise estimate (that is, the noise loudness obtained by estimation) and the true noise value.
  • an embodiment of the present application further provides an electronic device 100, including a processor 110, a memory 109, and a program or instructions stored in the memory 109 and executable on the processor 110. When the program or instructions are executed by the processor 110, each process of the acquisition method according to any embodiment of the present application is implemented, with the same technical effect. To avoid repetition, details are not repeated here.
  • the electronic devices in the embodiments of the present application include the above-mentioned mobile electronic devices and non-mobile electronic devices.
  • FIG. 8 is a schematic diagram of a hardware structure of an electronic device 100 implementing an embodiment of the present application.
  • the electronic device 100 includes, but is not limited to, a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.
  • the audio output unit 103 is used as an acquisition module to acquire N subband power spectra of N audio frames in the target audio signal.
  • the processor 110, serving as the estimation module, the update module, and the correction module, is configured to: obtain, according to the M target power spectra in each of the N subband power spectra obtained by the acquisition module, the noise power spectrum estimate corresponding to each of the N audio frames; perform smoothing update processing on the noise power spectrum estimate obtained by the estimation module; and perform compensation and correction processing on the noise power spectrum estimate processed by the update module to obtain the noise loudness of the target audio signal.
  • N is an integer greater than or equal to 1.
  • M is an integer greater than or equal to 2.
  • the M target power spectra are power spectra that fall within a preset power spectrum interval.
  • the M target power spectra may be all or some of the power spectra that fall within the preset power spectrum interval.
  • the preset power spectrum interval is the set of the first P% or first Q power spectra of each subband power spectrum sorted from low to high, or the set of the last R% or last S power spectra of each subband power spectrum sorted from high to low.
  • the target audio signal includes alternating long observation windows and short observation windows.
  • the processor 110 also serves as a determination module, configured to determine the jump level of the noise in the target audio signal according to the effective value and the signal-to-noise ratio of each audio frame in the short observation window, before the update module performs smoothing update processing on the noise power spectrum estimate obtained by the estimation module.
  • the update module is specifically configured to: perform a first smoothing update process on the noise power spectrum estimate obtained by the estimation module when the jump level is a first-level jump; and perform a second smoothing update process on the noise power spectrum estimate obtained by the estimation module when the jump level is a second-level jump.
  • the jump parameter value of the first-level jump is greater than the jump parameter value of the second-level jump.
  • the processor 110 is used as an update module, which is specifically configured to: obtain the first smoothing factor according to the audio frame information of the audio frame in the short observation window.
  • the noise power spectrum estimation obtained by the estimation module 220 is subjected to a first smoothing update process by using the first smoothing factor.
  • the audio frame information includes signal-to-noise ratio information and speech presence probability information.
  • the processor 110 is specifically configured to: obtain an initial smoothing factor by fitting the noise power spectrum estimate corresponding to the audio frames in the first long observation window and the noise power spectrum estimate corresponding to the audio frames in the first short observation window; perform superposition fitting on the initial smoothing factor according to the audio frame information of the audio frames in the first long observation window to obtain the second smoothing factor; and perform a second smoothing update process on the noise power spectrum estimate obtained by the estimation module using the second smoothing factor.
  • the audio frame information includes signal-to-noise ratio information and speech presence probability information, and the first long observation window and the first short observation window are adjacent observation windows.
  • the processor 110, serving as the estimation module, is specifically configured to: assign a weight value to each of the M target power spectra, and obtain the noise power spectrum estimate according to the M target power spectra and the weight values.
  • the electronic device 100 provided in this embodiment of the present application collects statistics over two or more target power spectra and obtains the noise power spectrum estimate from them.
  • the electronic device 100 provided in this embodiment of the present application can effectively reduce the deviation between the noise estimate (that is, the noise loudness obtained by estimation) and the true noise value.
  • the electronic device 100 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 110 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system.
  • the structure of the electronic device shown in FIG. 8 does not constitute a limitation on the electronic device.
  • the electronic device may include more or fewer components than shown, combine some components, or use a different arrangement of components, which is not repeated here.
  • the input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042; the graphics processing unit 1041 processes image data of still pictures or video obtained by an image capture device (such as a camera).
  • the display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like.
  • the user input unit 107 includes a touch panel 1071 and other input devices 1072 .
  • the touch panel 1071 is also called a touch screen.
  • the touch panel 1071 may include two parts, a touch detection device and a touch controller.
  • Other input devices 1072 may include, but are not limited to, physical keyboards, function keys (such as volume control keys, switch keys, etc.), trackballs, mice, and joysticks, which are not described herein again.
  • Memory 109 may be used to store software programs as well as various data including, but not limited to, application programs and operating systems.
  • the processor 110 may integrate an application processor and a modem processor. The application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 110.
  • an embodiment of the present application further provides a readable storage medium storing a program or instructions. When the program or instructions are executed by a processor, each process of the foregoing acquisition method embodiment is implemented, with the same technical effect. To avoid repetition, details are not repeated here.
  • the processor is the processor in the electronic device in the above embodiment.
  • the readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk, and the like.
  • an embodiment of the present application further provides a chip, including a processor and a communication interface coupled to the processor. The processor is configured to run a program or instructions to implement each process of the foregoing method embodiments, with the same technical effect. To avoid repetition, details are not repeated here.
  • the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, a system-on-chip, or the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

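A method, an apparatus, an electronic device, a readable storage medium, a computer product, and a chip for acquiring the loudness of noise in an audio signal. The method includes: acquiring N subband power spectra of N audio frames in a target audio signal (S101); obtaining, according to M target power spectra in each of the N subband power spectra, a noise power spectrum estimate corresponding to each of the N audio frames (S102); performing smoothing update processing on the noise power spectrum estimate (S103); and performing compensation and correction processing on the processed noise power spectrum estimate to obtain the noise loudness of the target audio signal (S104). N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2.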

Description

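Method, Apparatus, and Electronic Device for Acquiring Noise Loudness in an Audio Signal
CROSS-REFERENCE TO RELATED APPLICATIONS
This application claims priority to Chinese Patent Application No. 202110395202.0, filed in China on April 13, 2021, the entire contents of which are incorporated herein by reference.
TECHNICAL FIELD
This application belongs to the technical field of audio signal processing, and specifically relates to a method, an apparatus, and an electronic device for acquiring noise loudness in an audio signal.
BACKGROUND
With the development of technology, electronic devices capable of making calls have come into wide use. Speech enhancement algorithms can remove most of the interfering audio (that is, noise) in a call, and are therefore of great significance for improving the call quality of such electronic devices.
Noise estimation is one of the most critical steps in speech enhancement. Noise estimation refers to estimating the loudness of the noise (that is, the power spectrum of the noise) in the audio signal generated and transmitted during a voice call. Accurate noise estimation of the audio signal is a prerequisite for effective speech enhancement.
A shortcoming of the related art is that the estimated loudness of the noise in the audio signal deviates considerably from its true value.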
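SUMMARY
An objective of the embodiments of this application is to provide a method, an apparatus, and an electronic device for acquiring noise loudness in an audio signal, which can solve the technical problem that the noise estimate deviates considerably from the true value when noise is estimated with the minimum tracking method.
According to a first aspect, an embodiment of this application provides a method for acquiring noise loudness in an audio signal, including: acquiring N subband power spectra of N audio frames in a target audio signal; obtaining, according to M target power spectra in each of the N subband power spectra, a noise power spectrum estimate corresponding to each of the N audio frames; performing smoothing update processing on the noise power spectrum estimate; and performing compensation and correction processing on the processed noise power spectrum estimate to obtain the noise loudness of the target audio signal; where N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2.
According to a second aspect, an embodiment of this application provides an apparatus for acquiring noise loudness in an audio signal, including: an acquisition module, configured to acquire N subband power spectra of N audio frames in a target audio signal; an estimation module, configured to obtain, according to M target power spectra in each of the N subband power spectra acquired by the acquisition module, a noise power spectrum estimate corresponding to each of the N audio frames; an update module, configured to perform smoothing update processing on the noise power spectrum estimate obtained by the estimation module; and a correction module, configured to perform compensation and correction processing on the noise power spectrum estimate processed by the update module to obtain the noise loudness of the target audio signal; where N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2.
According to a third aspect, an embodiment of this application provides an electronic device, including a processor, a memory, and a program or instructions stored in the memory and executable on the processor, where the program or instructions, when executed by the processor, implement the steps of the method according to the first aspect.
According to a fourth aspect, an embodiment of this application provides a readable storage medium storing a program or instructions, where the program or instructions, when executed by a processor, implement the steps of the method according to the first aspect.
According to a fifth aspect, an embodiment of this application provides a chip, including a processor and a communication interface coupled to the processor, where the processor is configured to run a program or instructions to implement the method according to the first aspect.
In the embodiments of this application, after the N subband power spectra of the N audio frames in the target audio signal are acquired, a noise power spectrum estimate corresponding to each of the N audio frames can be obtained according to the M target power spectra in each of the N subband power spectra. Finally, smoothing update processing is performed on the noise power spectrum estimate, and compensation and correction processing is performed on the processed estimate to obtain the noise loudness of the target audio signal. In the acquisition method provided by the embodiments of this application, N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2. In other words, the acquisition method collects statistics over two or more target power spectra and obtains the noise power spectrum estimate from them. The method can effectively reduce the deviation between the noise estimate (that is, the noise loudness obtained by estimation) and the true noise value, thereby improving the speech enhancement effect and the voice call quality of the electronic device.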
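BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a first flowchart of the steps of the method for acquiring noise loudness in an audio signal according to an embodiment of this application;
FIG. 2 is a second flowchart of the steps of the method for acquiring noise loudness in an audio signal according to an embodiment of this application;
FIG. 3 is a third flowchart of the steps of the method for acquiring noise loudness in an audio signal according to an embodiment of this application;
FIG. 4 is a fourth flowchart of the steps of the method for acquiring noise loudness in an audio signal according to an embodiment of this application;
FIG. 5 is a first schematic diagram of the composition of the apparatus for acquiring noise loudness in an audio signal according to an embodiment of this application;
FIG. 6 is a second schematic diagram of the composition of the apparatus for acquiring noise loudness in an audio signal according to an embodiment of this application;
FIG. 7 is a first schematic diagram of the composition of the electronic device according to an embodiment of this application;
FIG. 8 is a second schematic diagram of the composition of the electronic device according to an embodiment of this application.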
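DETAILED DESCRIPTION
The following clearly describes the technical solutions in the embodiments of this application with reference to the accompanying drawings in the embodiments of this application. Apparently, the described embodiments are some rather than all of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application shall fall within the protection scope of this application.
The terms "first", "second", and the like in the specification and claims of this application are used to distinguish between similar objects, not to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments of this application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first", "second", and the like are usually of one type, and their number is not limited; for example, there may be one first object or multiple first objects. In addition, "and/or" in the specification and claims indicates at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The method, apparatus, and electronic device for acquiring noise loudness in an audio signal provided by the embodiments of this application are described in detail below through specific embodiments and their application scenarios, with reference to the accompanying drawings.
With the development of technology, electronic devices capable of making calls, such as mobile phones, personal computers, and smart watches, have come into wide use. To improve call quality, it is necessary to use speech enhancement algorithms to remove or filter the interfering audio (that is, noise) in a call.
Speech enhancement requires estimating the loudness of the noise in a call (that is, noise estimation). Noise estimation methods in the related art include the following:
The first performs noise estimation with a time-recursive averaging algorithm. This method cannot handle changes in the ambient background noise during speech segments, so its noise estimate is not updated in time.
The second performs noise estimation based on histograms. Its statistics are collected within a fixed window length and must be recomputed repeatedly over all frequency bands, so the method is computationally expensive.
The third performs noise estimation through minimum tracking (Minima-tracking Algorithms). By tracking the minimum power spectral band of each audio frame (that is, the minimum of the spectral power), this method yields a rough estimate of the noise level. With low latency and a reasonable computational load, it is a relatively ideal noise estimation method.
However, the minimum tracking method has the shortcoming that its noise estimate deviates relatively far from the true noise value. Therefore, how to reduce this deviation when estimating noise with the minimum tracking method is a technical problem that those skilled in the art urgently need to solve.
To solve the foregoing technical problem, an embodiment of this application provides a method for acquiring noise loudness in an audio signal. The method may be executed by an acquisition apparatus, which may be an electronic device, or a functional module and/or functional entity in an electronic device capable of implementing the acquisition method; this may be determined according to actual use requirements and is not limited in the embodiments of this application. To describe the acquisition method more clearly, the following method embodiments take the case where the acquisition apparatus is an electronic device (that is, the acquisition method is executed by an electronic device) as an example.
The electronic device in the embodiments of this application may be a mobile phone, a tablet computer, a notebook computer, a palmtop computer, an in-vehicle electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), or the like, or may be another type of electronic device, which is not limited in the embodiments of this application.
The method for acquiring noise loudness in an audio signal provided by the embodiments of this application is described in detail below by way of individual embodiments.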
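As shown in FIG. 1, an embodiment of this application provides a method for acquiring noise loudness in an audio signal. The acquisition method includes the following S101 to S104:
S101. The acquisition apparatus acquires N subband power spectra of N audio frames in a target audio signal.
Optionally, N is an integer greater than or equal to 1.
Optionally, the voice call audio signal may be collected by the primary microphone (Microphone, Mic) of the electronic device, or jointly by the primary and secondary microphones of the electronic device. This may be determined according to actual use requirements and is not limited in the embodiments of this application.
Optionally, the target audio signal may be a voice audio signal transmitted as an electrical signal, or an audio signal transmitted over the Internet Protocol.
It can be understood that the target audio signal is an audio signal within the voice call audio signal, and may include both speech audio signals and noise audio signals.
Optionally, the target audio signal may be part or all of the voice call audio signal. This may be determined according to actual use requirements and is not limited in the embodiments of this application.
It can be understood that the noise audio signal may include noise produced by a noisy surrounding environment during the voice call, and may also include noise caused by unsatisfactory communication transmission quality during transmission of the target audio signal.
Optionally, the noise audio signal may be a steady-state noise audio signal. A steady-state noise audio signal may be a noise audio signal whose repetition frequency is greater than a preset frequency (for example, 10 Hz), or a noise audio signal whose sound level fluctuates by no more than a preset number of decibels (for example, 3 dB) during measurement.
It can be understood that the N audio frames may be N consecutive audio frames or N non-consecutive audio frames.
Optionally, the N audio frames may be some or all of the audio frames in the target audio signal. This may be determined according to actual use requirements and is not limited in the embodiments of this application.
It can be understood that each of the N audio frames may correspond to one subband power spectrum, so that the N audio frames correspond to N subband power spectra.
Optionally, each of the N subband power spectra may be the logarithm (log value) of the subband spectral power.
It can be understood that each of the N subband power spectra can characterize how the power of the audio signal in the corresponding subband spectrum is distributed over frequency.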
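S102. The acquisition apparatus obtains, according to M target power spectra in each of the N subband power spectra, a noise power spectrum estimate corresponding to each of the N audio frames.
Optionally, M is an integer greater than or equal to 2.
It can be understood that the M target power spectra are the power spectra used to obtain the noise power spectrum estimate corresponding to each of the N audio frames.
Optionally, the specific number M of target power spectra may be determined according to the number N of audio frames, or according to the power values of each of the N subband power spectra. This may be determined according to actual use requirements and is not limited in the embodiments of this application.
Optionally, the noise power spectrum estimate may be obtained from the M target power spectra based on the principle and algorithm of minimum tracking. Specifically, minimum tracking assumes that, even during speech activity, the power of noisy speech in an individual frequency band frequently decays to the power level of the noise. Accordingly, the electronic device in the embodiments of this application can obtain the noise power spectrum estimate corresponding to each of the N audio frames according to the M target power spectra in each of the N subband power spectra.
Optionally, the M target power spectra are power spectra that fall within a preset power spectrum interval. The M target power spectra may be all or some of the power spectra that fall within the preset power spectrum interval.
It can be understood that, following the basic principle of minimum tracking, the preset power spectrum interval is the interval of relatively small (or the smallest) power values in each subband power spectrum.
It can be understood that the preset power spectrum interval contains at least two power spectra.
Optionally, the preset power spectrum interval may be the set of the first P% or the first Q power spectra of each subband power spectrum sorted in ascending order.
Optionally, the preset power spectrum interval may be the set of the last R% or the last S power spectra of each subband power spectrum sorted in descending order.
It can be understood that P and R are positive numbers, and Q and S are positive integers.
For example, the preset power spectrum interval may be the set of the first 10% or 25% of the power spectra of each subband power spectrum sorted in ascending order, or the set of the first 50 or first 80 power spectra sorted in ascending order. This may be determined according to actual use requirements and is not limited in the embodiments of this application.
As another example, the preset power spectrum interval may be the set of the last 5% or 8% of the power spectra of each subband power spectrum sorted in descending order, or the set of the last 30 or last 20 power spectra sorted in descending order. This may be determined according to actual use requirements and is not limited in the embodiments of this application.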
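A minimal Python sketch of the preset power spectrum interval follows, assuming each frame's subband power spectrum is available as a list of log-power values. The function name `select_target_spectra` and its `percent`/`count` parameters (standing for P% and Q) are illustrative, and rounding P% up to a whole number of values is an assumption. Taking the first P% in ascending order keeps the same values as taking the last R% in descending order when P equals R, so only the ascending form is shown.

```python
import math


def select_target_spectra(subband_powers, percent=None, count=None):
    """Return the M smallest values of one frame's subband power spectrum.

    The values are sorted in ascending order and the first `percent` %
    (P%) or the first `count` (Q) entries are kept. At least two values
    are kept whenever available, since M >= 2.
    """
    if (percent is None) == (count is None):
        raise ValueError("specify exactly one of percent or count")
    ordered = sorted(subband_powers)
    if percent is not None:
        m = math.ceil(len(ordered) * percent / 100.0)
    else:
        m = count
    m = min(max(2, m), len(ordered))  # enforce M >= 2 without overrunning
    return ordered[:m]


# Example: the smallest 25% of ten subband values.
print(select_target_spectra([5, 1, 4, 2, 3, 9, 8, 7, 6, 0], percent=25))  # [0, 1, 2]
```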
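Further optionally, as shown in FIG. 2, S102 includes the following S1021 to S1022:
S1021. The acquisition apparatus assigns a weight value to each of the M target power spectra.
Optionally, the weight values assigned to the target power spectra may be equal or unequal. This may be determined according to actual use requirements and is not limited in the embodiments of this application.
S1022. The acquisition apparatus obtains the noise power spectrum estimate according to the M target power spectra and the weight values.
It can be understood that the electronic device obtains the noise power spectrum estimate from the M target power spectra and the weight values using a weighted average formula.
The purpose of assigning a weight value to each of the M target power spectra is to introduce a weight allocation mechanism into the process of obtaining the noise power spectrum estimate. On this basis, the electronic device obtains the noise power spectrum estimate according to the M target power spectra and the weight values. By allocating the weights of the M target power spectra appropriately, the electronic device can further reduce the deviation of the noise power spectrum estimate from the true noise power spectrum.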
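S1021 and S1022 amount to a weighted average of the M target power spectra. The sketch below assumes the target power spectra are scalar log-power values of one frame and leaves the weighting scheme to the caller, because the application states only that the weights may be equal or unequal; the function name is hypothetical.

```python
def weighted_noise_estimate(target_spectra, weights=None):
    """Weighted average of the M target power spectra (S1021-S1022).

    With weights=None every target power spectrum receives the same weight.
    """
    if len(target_spectra) < 2:
        raise ValueError("M must be at least 2")
    if weights is None:
        weights = [1.0] * len(target_spectra)
    if len(weights) != len(target_spectra):
        raise ValueError("one weight per target power spectrum is required")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(w * p for w, p in zip(weights, target_spectra)) / total


# Equal weights versus weights that favour the smallest value.
print(weighted_noise_estimate([0.0, 1.0, 2.0]))          # 1.0
print(weighted_noise_estimate([0.0, 4.0], [3.0, 1.0]))   # 1.0
```

Giving the smallest values larger weights keeps the result close to a pure minimum while still averaging over more than one value; this is only one possible choice, not necessarily the one used in the application.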
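S103. The acquisition apparatus performs smoothing update processing on the noise power spectrum estimate.
Optionally, the electronic device performs the smoothing update processing on the noise power spectrum estimate using a smoothing factor.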
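The application does not give the smoothing formula. A common choice, assumed here, is first-order recursive (exponential) smoothing applied per subband, in which the smoothing factor `alpha` weights the previous estimate. The function name is hypothetical.

```python
def smooth_update(previous_estimate, current_estimate, alpha):
    """First-order recursive smoothing, applied per subband.

    new = alpha * previous + (1 - alpha) * current
    An alpha close to 1 holds the old estimate; an alpha close to 0
    follows the newly computed estimate.
    """
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if len(previous_estimate) != len(current_estimate):
        raise ValueError("estimates must cover the same subbands")
    return [alpha * old + (1.0 - alpha) * new
            for old, new in zip(previous_estimate, current_estimate)]


print(smooth_update([10.0, 20.0], [20.0, 40.0], alpha=0.5))  # [15.0, 30.0]
```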
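S104. The acquisition apparatus performs compensation and correction processing on the processed noise power spectrum estimate to obtain the noise loudness of the target audio signal.
Optionally, the electronic device performs the compensation and correction processing on the noise power spectrum estimate using a compensation correction factor.
It can be understood that the smoothing update processing and the compensation and correction processing serve to smooth and compensate the noise power spectrum estimate so that it is closer to the true noise power spectrum. The values of the smoothing factor and the compensation correction factor may be determined according to actual use requirements and are not limited in the embodiments of this application.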
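An estimate built from the smallest power values lies below the mean noise power, which is why a compensation step follows the smoothing. As an assumption (the application does not define the compensation correction factor), the sketch below treats it as a multiplicative bias factor in the linear power domain; for log-power values in dB, that becomes a constant additive offset. The name `compensate` and the factor used in the example are hypothetical.

```python
import math


def compensate(log_estimate_db, bias_factor):
    """Apply a multiplicative bias correction to a dB-domain noise estimate.

    Multiplying linear power by bias_factor equals adding
    10 * log10(bias_factor) dB to each log-power value.
    """
    if bias_factor <= 0:
        raise ValueError("bias_factor must be positive")
    offset_db = 10.0 * math.log10(bias_factor)
    return [value + offset_db for value in log_estimate_db]


print(compensate([-40.0, -30.0], bias_factor=10.0))  # [-30.0, -20.0]
```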
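The electronic device in the embodiments of this application obtains the noise power spectrum estimate corresponding to each of the N audio frames according to the M target power spectra in each of the N subband power spectra, and then obtains the noise loudness of the target audio signal from the noise power spectrum estimate through smoothing update processing and compensation and correction processing. It should be noted that the noise power spectrum estimation method in the related art (that is, the noise loudness acquisition method) roughly estimates the noise level in a voice call by tracking the minimum power spectral band of each audio frame. However, even with an added compensation factor, the related-art method still suffers from a large deviation between the noise estimate and the true value. To overcome this defect, the acquisition method provided by the embodiments of this application collects statistics over two or more target power spectra and obtains the noise power spectrum estimate from them. Therefore, compared with the related art, which estimates noise by tracking the minimum power spectral band of each audio frame, the method provided by the embodiments of this application can effectively reduce the deviation between the noise estimate (that is, the noise loudness obtained by estimation) and the true noise value, thereby improving the call quality of the electronic device.
Optionally, the target audio signal includes alternating long observation windows and short observation windows.
For example, each of the multiple long observation windows is adjacent to two of the multiple short observation windows. That is, the lengths of the observation windows of the target audio signal alternate between long and short.
It can be understood that a long observation window is longer than a short observation window. The specific lengths of the long and short observation windows may be determined according to actual use requirements and are not limited in the embodiments of this application.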
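The alternating layout can be pictured by partitioning frame indices. The window lengths are not specified in the application; the lengths in the example (8 frames long, 2 frames short) and the function name are hypothetical, and a trailing window is simply truncated.

```python
def split_alternating_windows(num_frames, long_len, short_len, start_with_long=True):
    """Split frame indices 0..num_frames-1 into alternating long/short windows.

    Returns (kind, start, stop) tuples with stop exclusive; the final
    window may be shorter than its nominal length.
    """
    if short_len <= 0 or long_len <= short_len:
        raise ValueError("need long_len > short_len > 0")
    windows = []
    start = 0
    is_long = start_with_long
    while start < num_frames:
        length = long_len if is_long else short_len
        stop = min(start + length, num_frames)
        windows.append(("long" if is_long else "short", start, stop))
        start = stop
        is_long = not is_long  # lengths alternate long, short, long, ...
    return windows


print(split_alternating_windows(20, long_len=8, short_len=2))
# [('long', 0, 8), ('short', 8, 10), ('long', 10, 18), ('short', 18, 20)]
```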
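For example, as shown in FIG. 3, before S103, the acquisition method provided in the embodiments of this application may further include the following step S105:
S105. The acquisition apparatus determines the jump level of the noise in the target audio signal according to the effective value and the signal-to-noise ratio of each audio frame in the short observation window.
Optionally, the effective value of each audio frame may be the root mean square (RMS) of the spectral values of the audio frame. In other words, the effective value of each audio frame may be the square root of the mean of the squares of its spectral values.
Optionally, the signal-to-noise ratio (Signal Noise Ratio, SNR) of each audio frame may be the ratio of the power of the speech signal to the power of the noise signal in the audio frame. The SNR of each audio frame may be expressed in decibels. The higher the SNR of an audio frame, the weaker the noise in that audio frame.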
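The two per-frame quantities used in S105 follow directly from these definitions. In a real system the speech and noise powers are not observed separately and would themselves have to be estimated; the sketch simply takes them as inputs, and the function names are illustrative.

```python
import math


def frame_rms(spectral_values):
    """Effective value: square root of the mean of the squared values."""
    if not spectral_values:
        raise ValueError("empty frame")
    return math.sqrt(sum(v * v for v in spectral_values) / len(spectral_values))


def snr_db(speech_power, noise_power):
    """Ratio of speech power to noise power, expressed in decibels."""
    if speech_power <= 0 or noise_power <= 0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(speech_power / noise_power)


print(frame_rms([2.0, 2.0, 2.0, 2.0]))  # 2.0
print(snr_db(100.0, 1.0))               # 20.0
```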
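The effective value and SNR of each audio frame characterize the noise in that frame. Therefore, from the effective values and SNRs of the audio frames, the variation of the noise in the target audio signal (that is, the jump level of the noise in the target audio signal) can be determined.
It can be understood that the jump level of the noise in the target audio signal can be used to measure how the noise loudness changes.
Optionally, the jump level of the noise in the target audio signal may include one level or multiple levels.
Optionally, the jump levels used by the electronic device may include a first-level jump and a second-level jump, and may further include a third-level jump and a fourth-level jump.
Optionally, a lower jump level indicates a greater degree of noise change.
It can be understood that the jump parameter value of a first-level jump is greater than that of a second-level jump.
It should be noted that the jump parameter value may be characterized by the decibel change or the decibel change rate of the audio signal. For example, if the decibel change of the audio signal in a first-level jump is greater than that in a second-level jump, or the decibel change rate in a first-level jump is higher than that in a second-level jump, the jump parameter value of the first-level jump is greater than that of the second-level jump. The jump parameter value may be expressed as a numerical value, a level, or a percentage. The audio signal referred to here is the audio signal of audio frames in which no speech is present, that is, an audio signal that reflects how noisy the environment of the user of the electronic device is.
For example, a first-level jump can be understood as the user of the electronic device having entered a noticeably noisier or noticeably quieter environment.
For example, a first-level jump can also be understood as the call quality of the user of the electronic device having noticeably improved or noticeably degraded.
It can be understood that the electronic device in the embodiments of this application can accurately determine the jump level of the noise in the target audio signal according to the effective value and SNR of each audio frame in the short observation window. The electronic device can then determine or select a smoothing update mode suited to that jump level, further reducing the deviation between the noise estimate and the true noise value.
Optionally, as shown in FIG. 3, when the jump level of the noise in the target audio signal is determined through S105, S103 includes the following S1031 to S1032: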
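S1031. When the jump level is a first-level jump, the acquisition apparatus performs first smoothing update processing on the noise power spectrum estimate.
It can be understood that the first smoothing update processing may be the smoothing update processing corresponding to a first-level jump.
Optionally, a first-level jump can be understood as the noise in the target audio signal having become noticeably stronger or weaker.
Further optionally, S1031 includes the following S1031a to S1031b:
S1031a. The acquisition apparatus obtains a first smoothing factor according to the audio frame information of the audio frames in the short observation window.
Optionally, in the embodiments of this application, the audio frame information includes signal-to-noise ratio information and speech presence probability information.
It can be understood that the SNR information characterizes the ratio of the power of the speech signal to the power of the noise signal of the audio frames in the short observation window of the target audio signal.
It can be understood that the speech presence probability information may be information characterizing the probability that speech is present in the audio frames in the short observation window of the target audio signal.
Optionally, in the embodiments of this application, the speech presence probability information may be obtained through neural-network-based voice activity detection (Neural Network Voice Activity Detection, NNVAD).
S1031b. The acquisition apparatus performs the first smoothing update processing on the noise power spectrum estimate using the first smoothing factor.
The acquisition apparatus in the embodiments of this application uses S1031a to S1031b for the first smoothing update processing for the following reason. When noise loudness is estimated by minimum tracking, the noise estimate is updated in both speech segments and non-speech segments. However, in speech segments the noise power spectrum estimate is easily affected by the high SNR and is forced upward. In other words, because the audio signal in speech segments has a high SNR, the noise estimate there deviates considerably and lies above the true noise value. Therefore, the electronic device in the embodiments of this application uses S1031a to S1031b to obtain the first smoothing factor by combining the speech presence probability information and SNR information of the audio signal, and performs the first smoothing update processing with the obtained first smoothing factor. In this way, the electronic device can significantly reduce the deviation between the noise estimate and the true noise value.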
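The application states only that the first smoothing factor is derived from the speech presence probability and SNR of the frames in the short observation window. One way to meet the stated goal (keeping speech from pushing the estimate upward) is borrowed from MCRA-style noise estimators, in which the smoothing factor rises toward 1 as speech becomes more likely: alpha = alpha_base + (1 - alpha_base) * p. In the sketch below, p is the larger of the mean speech presence probability and a clipped SNR weight; this mapping, the default constants, and the function name are all assumptions.

```python
def first_smoothing_factor(speech_probs, snrs_db, alpha_base=0.85,
                           snr_low_db=0.0, snr_high_db=20.0):
    """Hypothetical first smoothing factor for one short observation window.

    A higher speech presence probability or SNR pushes the factor toward 1,
    so the noise estimate is held instead of being dragged upward by speech.
    """
    if not speech_probs or len(speech_probs) != len(snrs_db):
        raise ValueError("need one probability and one SNR per frame")
    mean_prob = sum(speech_probs) / len(speech_probs)
    mean_snr = sum(snrs_db) / len(snrs_db)
    # Map the mean SNR linearly onto [0, 1] between the two dB limits.
    snr_weight = (mean_snr - snr_low_db) / (snr_high_db - snr_low_db)
    snr_weight = min(1.0, max(0.0, snr_weight))
    presence = max(mean_prob, snr_weight)
    return alpha_base + (1.0 - alpha_base) * presence


print(first_smoothing_factor([0.0, 0.0], [0.0, 0.0]))  # 0.85 (noise only: track freely)
print(first_smoothing_factor([0.9, 1.0], [25.0, 30.0]))  # close to 1: hold the estimate
```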
S1032、获取装置在跳变级别为二级跳变的情况下,对噪声功率谱估计进行第二平滑更新处理。
可以理解,第二平滑更新处理可以为与二级跳变对应的平滑更新处理。
可选地,本申请实施例中,在跳变级别为二级跳变的情况下,则可以理解为目标音频信号中的噪声变强或变弱的程度较小或较为不明显
进一步可选地,S1032包括下述的S1032a至S1032c:
S1032a、获取装置根据第一长观察窗口中音频帧对应的噪声功率谱估计和第一短观察窗口中音频帧对应的噪声功率谱估计,拟合得到初始平滑因子。
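It can be understood that the first long observation window and the first short observation window are two adjacent observation windows in the alternating sequence of long and short windows.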
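S1032b. The acquisition apparatus performs superimposed fitting on the initial smoothing factor using audio frame information of the audio frames in the first long observation window, to obtain a second smoothing factor.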
Optionally, the audio frame information includes SNR information and speech presence probability information.
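It can be understood that the SNR characterizes the ratio of the speech signal power to the noise signal power of the audio frames in the short observation window of the target audio signal.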
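It can be understood that the speech presence probability information may be information that characterizes the probability that speech is present in the audio frames in the short observation window of the target audio signal.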
Optionally, the speech presence probability information may be obtained by neural-network-based voice activity detection (Neural Network Voice Activity Detection, NNVAD).
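S1032c. The acquisition apparatus performs the second smoothing update processing on the noise power spectrum estimate using the second smoothing factor.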
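The acquisition apparatus of the embodiments of this application performs the second smoothing update processing through S1032a to S1032c for the following reason. When the jump level is a level-2 jump, the electronic device may consider that its user has entered a slightly changing noise field (that is, the noise field around the user changes relatively little). In this case, the electronic device may first collect statistics on the M target power spectra (that is, the power spectra within the small-value interval) of each audio frame in the long observation window, thereby obtaining the noise power spectrum estimate for the long observation window. Next, it fits an initial smoothing factor from the noise estimate of the current observation window and the noise power spectrum estimate of the previous observation window (that is, the first long observation window and the first short observation window). Finally, it collects statistics on the effective value and SNR of each audio frame in the long observation window, computes the SNR of the signal in the long observation window, combines this with the speech presence probability information of the audio signal segment in the long observation window to obtain the second smoothing factor by superimposed fitting, and updates the current noise power spectrum estimate. Thus, when the user of the electronic device enters a slightly changing noise field, the electronic device of the embodiments of this application can determine or select a smoothing factor suited to that noise field for the update, reducing the deviation between the noise estimate and the true noise value.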
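Steps S1032a to S1032c can be sketched in the same hedged way. The source does not specify the fitting. This version assumes that both noise estimates are positive linear powers, that the initial factor depends on how far apart the long-window and short-window estimates are (a small change keeps the estimate stable, a large change lets it adapt faster), and that the "superimposed fitting" pulls the factor toward `alpha_max` in proportion to how speech-like the long window is. All parameter names and values are hypothetical.

```python
import math


def initial_smoothing_factor(noise_long, noise_short,
                             alpha_min=0.8, alpha_max=0.99, ref_db=10.0):
    """Hypothetical fit: a smaller long/short estimate difference gives a larger alpha."""
    change_db = abs(10.0 * math.log10(noise_short / noise_long))
    closeness = 1.0 - min(change_db / ref_db, 1.0)
    return alpha_min + (alpha_max - alpha_min) * closeness


def second_smoothing_factor(alpha0, long_snr_db, long_speech_prob,
                            alpha_max=0.99, snr_ref_db=20.0):
    """Hypothetical superposition of long-window SNR and speech presence information on alpha0."""
    snr_weight = min(max(long_snr_db / snr_ref_db, 0.0), 1.0)
    prob_weight = min(max(long_speech_prob, 0.0), 1.0)
    speech_weight = max(snr_weight, prob_weight)
    # Move alpha0 toward alpha_max when the long window is likely to contain speech.
    return alpha0 + (alpha_max - alpha0) * speech_weight


alpha0 = initial_smoothing_factor(noise_long=1.0, noise_short=2.0)  # about 3 dB apart
alpha2 = second_smoothing_factor(alpha0, long_snr_db=5.0, long_speech_prob=0.2)
print(alpha0, alpha2)
```

The resulting `alpha2` would then drive the same kind of recursive-averaging update as in the level-1 case.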
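As a further example, as shown in FIG. 4, the method for acquiring noise loudness in an audio signal may be implemented through the following S201 to S220: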
S201. The acquisition apparatus computes the subband power spectra of the signal captured by the primary microphone.
The subband power spectra are logarithmic power spectra.
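S202. The acquisition apparatus computes the effective value, SNR, small-value interval, and weight allocation of the current audio frame to obtain the noise power spectrum estimate.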
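Here, the small-value interval is the small-value interval of the N subband power spectra, that is, the M target power spectra that fall within the preset power spectrum interval.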
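S203. The acquisition apparatus updates the effective values, SNRs, noise power spectrum estimates, and speech frame identifiers within the long observation window.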
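S204. The acquisition apparatus updates the effective values, SNRs, noise power spectrum estimates, and speech frame identifiers within the short observation window.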
The speech frame identifiers in S203 and S204 indicate speech presence probability information.
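S205. The acquisition apparatus has the state variables within the short observation window vote, to determine the audio signal attribute of the current observation window and the jump level of the noise.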
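Here, the jump level of the noise can be understood as the magnitude of an abrupt change in the background noise. A level-1 jump means the noise has become much louder or much quieter, for example by 10 dB or more. A level-2 jump means the noise has become only slightly louder or quieter, for example by less than 10 dB.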
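If the audio signal attribute of the current observation window has changed, that attribute is recorded, and an identifier indicating a change in the steady-state noise state of the audio signal segment is output.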
S206. The acquisition apparatus determines whether the jump level is a level-1 jump.
If yes, step S207 is executed; if no, step S215 is executed.
The jump level is determined from the noise state identifier.
S207. The acquisition apparatus determines whether the background noise has risen.
If yes, step S208 is executed; if no, step S210 is executed.
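S208. The acquisition apparatus collects statistics on the effective values and SNRs within the short observation window, decides from their distribution the range of the small-value interval to use and the weight coefficients to allocate, and obtains the noise power spectrum estimate of the signal in the short observation window.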
Here, the range of the small-value interval can be understood as the range of the M target power spectra.
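S209. The acquisition apparatus collects statistics on the speech presence probability information and SNR information of each audio frame in the short observation window and, according to the audio signal characteristics of the current observation window, adaptively selects the smoothing factor for the steady-state noise power spectrum estimate.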
S210. The acquisition apparatus determines whether the background noise has dropped by less than 20 dB.
If yes, step S211 is executed; if no, step S214 is executed.
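S211. The acquisition apparatus collects statistics on the effective values and SNRs within the short observation window, decides from their distribution the range of the per-frame small-value interval to use and the weight coefficients to allocate, and obtains the noise power spectrum estimate of the signal in the short observation window.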
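S212. The acquisition apparatus collects statistics on the speech presence probability information and SNR information of each audio frame in the short observation window and, according to the audio signal characteristics of the current observation window, determines the smoothing factor used to smooth the steady-state noise power spectrum estimate.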
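S213. The acquisition apparatus updates the noise level of the current window in combination with the noise power spectrum estimate of the previous observation window.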
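S214. The acquisition apparatus collects statistics on the effective value and SNR of each audio frame in the short observation window, selects the small-value intervals of some of the audio frames, and allocates weight coefficients, to obtain the noise power spectrum estimate in the short observation window.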
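S215. The acquisition apparatus collects statistics on the small-value interval of each audio frame in the long observation window to obtain the noise power spectrum estimate in the long observation window.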
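S216. The acquisition apparatus fits a smoothing factor from the noise power spectrum estimate of the current observation window and that of the previous observation window.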
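S217. The acquisition apparatus collects statistics on the effective value and SNR of each audio frame in the long observation window to obtain the SNR of the signal segment in the observation window.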
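S218. The acquisition apparatus combines the speech presence probability information of the signal segment in the long observation window with the SNR information of the speech signal segment, determines the attribute of the speech signal segment, and generates aggregated smoothing window coefficients by superposition.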
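S219. The acquisition apparatus updates the noise level of the current observation window in combination with the noise power spectrum estimate of the previous observation window.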
S220. The acquisition apparatus outputs the current noise power spectrum estimate through a compensation correction function.
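It can be understood that if the background noise undergoes a level-1 jump, the noise level is noticeably stronger or weaker than the previous noise level. When the device enters a background noise field that has suddenly become stronger, the distribution of effective values and SNRs is collected from the state variables of the audio signal in the short observation window, the small-value intervals of some of the audio frames are selected, and an initial noise power spectrum estimate for the signal segment in the short window is obtained. Then, according to the SNR distribution of the signal in the short observation window and the speech probability of each audio frame, the smoothing factor for the noise power spectrum estimate is adaptively selected, and the noise power spectrum estimate of the current signal segment is updated again. When the device enters a background noise field that has suddenly become weaker (for example, the background noise drops by less than 20 dB), the same estimation method is used. When the noise power in the short observation window drops by a large amount (for example, by more than 20 dB), it is assumed that the user of the electronic device has entered a relatively quiet environment; according to the distribution of effective values and SNRs in the short observation window, new small-value intervals of some of the audio frames are selected and weight coefficients are applied, to obtain the noise power spectrum estimate in the short window.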
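It can be understood that if the background noise does not trigger the level-1 jump identifier, the jump level can be considered a level-2 jump (that is, a relatively slight jump). Upon entering a noise field with a level-2 jump, the electronic device first collects statistics on the small-value interval of each audio frame in the long observation window to obtain the noise power spectrum estimate in the long observation window. It fits a smoothing factor from the noise power spectrum estimate of the current observation window and that of the previous observation window. It then collects statistics on the effective value and SNR of each audio frame in the long observation window, computes the SNR of the audio signal in the long observation window, combines it with the speech presence probability information of the audio signal segment in the long observation window and the fitted smoothing factor by superposition to generate aggregated smoothing window coefficients, and finally updates the current noise power spectrum estimate.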
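The branching of steps S206 to S219 can be summarized as a small dispatch function. This is only a sketch of the control flow described above: it returns labels of the step ranges rather than performing the estimation, and its inputs (a jump level, a rise flag, and the size of a drop in dB) are assumed to come from the voting in S205. The function name `update_branch` is hypothetical.

```python
def update_branch(jump_level, noise_rising, drop_db=0.0, drop_limit_db=20.0):
    """Return which group of steps in FIG. 4 handles the current observation window."""
    if jump_level != 1:
        # Level-2 jump: long-window estimate and aggregated smoothing (S215 to S219).
        return "S215-S219"
    if noise_rising:
        # Sudden rise: short-window estimate with an adaptive smoothing factor (S208, S209).
        return "S208-S209"
    if drop_db < drop_limit_db:
        # Drop of less than 20 dB: same estimation as a rise, then update from the previous window.
        return "S211-S213"
    # Drop of 20 dB or more: new small-value interval and weights only (S214).
    return "S214"


for args in [(2, False), (1, True), (1, False, 12.0), (1, False, 25.0)]:
    print(args, update_branch(*args))
```

A drop of exactly 20 dB goes to S214 here, because S210 only answers "yes" for drops strictly less than 20 dB.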
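Through S201 to S220, the electronic device can reduce the deviation between the noise power spectrum estimate and the true noise power spectrum, so as to estimate the noise loudness in the audio signal accurately and thereby improve the call quality of the electronic device.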
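An embodiment of this application further provides an apparatus 200 for acquiring noise loudness in an audio signal. The acquisition apparatus 200 may be a standalone apparatus, or a component, integrated circuit, or chip in a terminal. The acquisition apparatus 200 may be a mobile electronic device or a non-mobile electronic device. For example, the mobile electronic device may be a mobile phone, tablet computer, notebook computer, palmtop computer, in-vehicle electronic device, wearable device, ultra-mobile personal computer (UMPC), netbook, or personal digital assistant (PDA); the non-mobile electronic device may be a server, network attached storage (NAS), personal computer (PC), television (TV), teller machine, or self-service machine. This is not specifically limited in the embodiments of this application.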
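The acquisition apparatus 200 in the embodiments of this application may be an apparatus having an operating system. The operating system may be the Android operating system, the iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of this application.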
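The acquisition apparatus 200 provided in the embodiments of this application can implement each process implemented in the method embodiments of FIG. 1 to FIG. 4. To avoid repetition, details are not described here again.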
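As shown in FIG. 5, the acquisition apparatus 200 includes: an acquisition module 210, configured to acquire N subband power spectra of N audio frames in a target audio signal; an estimation module 220, configured to obtain, from M target power spectra in each of the N subband power spectra acquired by the acquisition module 210, a noise power spectrum estimate corresponding to each of the N audio frames; an update module 230, configured to perform smoothing update processing on the noise power spectrum estimate obtained by the estimation module 220; and a correction module 240, configured to perform compensation correction processing on the noise power spectrum estimate processed by the update module 230 to obtain the noise loudness of the target audio signal. N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2.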
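What the acquisition module produces, the logarithmic subband power spectra mentioned in step S201, can be illustrated by grouping per-bin powers into subbands. The band edges, the use of per-bin powers from an earlier FFT, and the floor value are assumptions; the source does not define the subband layout.

```python
import math


def subband_log_power(bin_powers, band_edges, floor=1e-12):
    """Sum per-bin power inside each [start, end) band and convert it to dB."""
    if len(band_edges) < 2:
        raise ValueError("need at least two band edges")
    spectrum_db = []
    for start, end in zip(band_edges[:-1], band_edges[1:]):
        band_power = sum(bin_powers[start:end])
        # The (assumed) floor keeps log10 defined for silent bands.
        spectrum_db.append(10.0 * math.log10(max(band_power, floor)))
    return spectrum_db


# Four bins grouped into two subbands: powers 10 and 100 give 10 dB and 20 dB.
print(subband_log_power([1.0, 9.0, 100.0, 0.0], [0, 2, 4]))
```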
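Optionally, in the embodiments of this application, the M target power spectra are power spectra that fall within a preset power spectrum interval. The M target power spectra may be all or some of the power spectra that fall within the preset power spectrum interval. The preset power spectrum interval is the set of the first P% or first Q power spectra of each subband power spectrum sorted in ascending order, or the set of the last R% or last S power spectra of each subband power spectrum sorted in descending order, where P and R are positive numbers and Q and S are positive integers.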
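Selecting the M target power spectra can be sketched as follows. Keeping the first P% (or Q) values in ascending order selects the same set as keeping the last R% (or S) values in descending order when the percentages or counts match, so one function covers both descriptions. The rounding rule (ceiling) and the clamp that keeps at least two values, reflecting M ≥ 2, are assumptions.

```python
import math


def select_target_spectra(power_values, p_percent=None, q=None):
    """Return the M smallest power values: the lowest q values or the lowest p_percent."""
    if len(power_values) < 2:
        raise ValueError("at least two power values are required (M >= 2)")
    ordered = sorted(power_values)  # ascending: low to high
    if q is not None:
        m = q
    elif p_percent is not None:
        m = math.ceil(len(ordered) * p_percent / 100.0)  # assumed rounding rule
    else:
        raise ValueError("pass either p_percent or q")
    m = max(2, min(m, len(ordered)))  # keep 2 <= M <= number of values
    return ordered[:m]


print(select_target_spectra([5.0, 1.0, 4.0, 2.0, 3.0], p_percent=40))  # [1.0, 2.0]
```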
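Optionally, in the embodiments of this application, the target audio signal includes alternating long observation windows and short observation windows. As shown in FIG. 6, the acquisition apparatus 200 further includes a determination module 250. The determination module 250 is configured to determine, before the update module 230 performs smoothing update processing on the noise power spectrum estimate obtained by the estimation module 220, the jump level of the noise in the target audio signal according to the effective value and SNR of each audio frame in the short observation window. The update module 230 is specifically configured to: perform first smoothing update processing on the noise power spectrum estimate obtained by the estimation module 220 when the jump level is a level-1 jump; and perform second smoothing update processing on the noise power spectrum estimate obtained by the estimation module 220 when the jump level is a level-2 jump. The jump parameter value of a level-1 jump is greater than that of a level-2 jump.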
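Optionally, in the embodiments of this application, the update module 230 is specifically configured to: obtain a first smoothing factor according to the audio frame information of the audio frames in the short observation window; and perform first smoothing update processing on the noise power spectrum estimate obtained by the estimation module 220 using the first smoothing factor. The audio frame information includes SNR information and speech presence probability information.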
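Optionally, in the embodiments of this application, the update module 230 is specifically configured to: fit an initial smoothing factor from the noise power spectrum estimate corresponding to the audio frames in a first long observation window and the noise power spectrum estimate corresponding to the audio frames in a first short observation window; perform superimposed fitting on the initial smoothing factor using the audio frame information of the audio frames in the first long observation window, to obtain a second smoothing factor; and perform second smoothing update processing on the noise power spectrum estimate obtained by the estimation module 220 using the second smoothing factor. The audio frame information includes SNR information and speech presence probability information, and the first long observation window and the first short observation window are adjacent observation windows.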
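Optionally, in the embodiments of this application, the estimation module 220 is specifically configured to: assign a weight value to each of the M target power spectra; and obtain the noise power spectrum estimate from the M target power spectra and the weight values.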
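The weighted combination can be sketched as a normalized weighted mean. The source does not give the weights; as an illustrative assumption, the hypothetical `decreasing_weights` helper below favors the smallest values, in line with the small-value-interval idea, using linearly decreasing weights.

```python
def weighted_noise_estimate(target_spectra, weights):
    """Normalized weighted mean of the M target power spectra."""
    if len(target_spectra) != len(weights) or not weights:
        raise ValueError("need one weight per target power spectrum")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    return sum(s * w for s, w in zip(target_spectra, weights)) / total


def decreasing_weights(m):
    """Hypothetical weights m, m-1, ..., 1: the smallest value gets the largest weight."""
    return [float(m - i) for i in range(m)]


targets = [1.0, 2.0, 3.0]  # e.g. the output of a lowest-P% selection, in ascending order
print(weighted_noise_estimate(targets, decreasing_weights(3)))  # (3 + 4 + 3) / 6, about 1.667
```

Averaging several low values rather than taking only the single minimum is what the description credits with reducing the bias of plain minimum tracking.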
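The acquisition apparatus 200 provided in the embodiments of this application collects statistics on two or more target power spectra and obtains the noise power spectrum estimate from them. Compared with related techniques that estimate noise by tracking the minimum power spectrum band of each audio frame, the acquisition apparatus 200 can effectively reduce the deviation between the noise estimate (that is, the noise loudness obtained by estimation) and the true noise value.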
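As shown in FIG. 7, an embodiment of this application further provides an electronic device 100, including a processor 110, a memory 109, and a program or instructions stored in the memory 109 and executable on the processor 110. When executed by the processor 110, the program or instructions implement each process of the acquisition method in any embodiment of this application and achieve the same technical effects. To avoid repetition, details are not described here again.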
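It should be noted that the electronic device in the embodiments of this application includes the mobile electronic devices and non-mobile electronic devices described above.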
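FIG. 8 is a schematic diagram of the hardware structure of an electronic device 100 implementing the embodiments of this application.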
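The electronic device 100 includes, but is not limited to, components such as a radio frequency unit 101, a network module 102, an audio output unit 103, an input unit 104, a sensor 105, a display unit 106, a user input unit 107, an interface unit 108, a memory 109, and a processor 110.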
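The audio output unit 103, acting as the acquisition module, is configured to acquire N subband power spectra of N audio frames in a target audio signal. The processor 110, acting as the estimation module, update module, and correction module, is configured to obtain, from M target power spectra in each of the N subband power spectra acquired by the acquisition module, a noise power spectrum estimate corresponding to each of the N audio frames; to perform smoothing update processing on the noise power spectrum estimate obtained by the estimation module; and to perform compensation correction processing on the noise power spectrum estimate processed by the update module to obtain the noise loudness of the target audio signal. N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2.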
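Optionally, in the embodiments of this application, the M target power spectra are power spectra that fall within a preset power spectrum interval. The M target power spectra may be all or some of the power spectra that fall within the preset power spectrum interval. The preset power spectrum interval is the set of the first P% or first Q power spectra of each subband power spectrum sorted in ascending order, or the set of the last R% or last S power spectra of each subband power spectrum sorted in descending order, where P and R are positive numbers and Q and S are positive integers.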
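Optionally, in the embodiments of this application, the target audio signal includes alternating long observation windows and short observation windows, and the processor 110 further acts as the determination module, configured to determine, before the update module performs smoothing update processing on the noise power spectrum estimate obtained by the estimation module, the jump level of the noise in the target audio signal according to the effective value and SNR of each audio frame in the short observation window. The update module is specifically configured to: perform first smoothing update processing on the noise power spectrum estimate obtained by the estimation module when the jump level is a level-1 jump; and perform second smoothing update processing on the noise power spectrum estimate obtained by the estimation module when the jump level is a level-2 jump. The jump parameter value of a level-1 jump is greater than that of a level-2 jump.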
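Optionally, in the embodiments of this application, the processor 110, acting as the update module, is specifically configured to: obtain a first smoothing factor according to the audio frame information of the audio frames in the short observation window; and perform first smoothing update processing on the noise power spectrum estimate obtained by the estimation module using the first smoothing factor. The audio frame information includes SNR information and speech presence probability information.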
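Optionally, in the embodiments of this application, the processor 110 is specifically configured to: fit an initial smoothing factor from the noise power spectrum estimate corresponding to the audio frames in a first long observation window and the noise power spectrum estimate corresponding to the audio frames in a first short observation window; perform superimposed fitting on the initial smoothing factor using the audio frame information of the audio frames in the first long observation window, to obtain a second smoothing factor; and perform second smoothing update processing on the noise power spectrum estimate obtained by the estimation module using the second smoothing factor. The audio frame information includes SNR information and speech presence probability information, and the first long observation window and the first short observation window are adjacent observation windows.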
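Optionally, in the embodiments of this application, the processor 110, acting as the estimation module, is specifically configured to: assign a weight value to each of the M target power spectra; and obtain the noise power spectrum estimate from the M target power spectra and the weight values.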
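The electronic device 100 provided in the embodiments of this application collects statistics on two or more target power spectra and obtains the noise power spectrum estimate from them. Compared with related techniques that estimate noise by tracking the minimum power spectrum band of each audio frame, the electronic device 100 can effectively reduce the deviation between the noise estimate (that is, the noise loudness obtained by estimation) and the true noise value.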
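A person skilled in the art can understand that the electronic device 100 may further include a power supply (such as a battery) that supplies power to the components. The power supply may be logically connected to the processor 110 through a power management system, so that functions such as charging, discharging, and power consumption management are implemented through the power management system. The electronic device structure shown in FIG. 8 does not constitute a limitation on the electronic device; the electronic device may include more or fewer components than shown, combine certain components, or use a different component arrangement. Details are not described here again.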
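It should be understood that, in the embodiments of this application, the input unit 104 may include a graphics processing unit (GPU) 1041 and a microphone 1042. The graphics processing unit 1041 processes image data of still pictures or videos obtained by an image capture apparatus (such as a camera) in a video capture mode or an image capture mode. The display unit 106 may include a display panel 1061, which may be configured in the form of a liquid crystal display, an organic light-emitting diode display, or the like. The user input unit 107 includes a touch panel 1071 and other input devices 1072. The touch panel 1071, also called a touchscreen, may include two parts: a touch detection apparatus and a touch controller. The other input devices 1072 may include, but are not limited to, a physical keyboard, function keys (such as volume control keys and on/off keys), a trackball, a mouse, and a joystick, which are not described in detail here. The memory 109 may be used to store software programs and various data, including but not limited to application programs and an operating system. The processor 110 may integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, application programs, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 110.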
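An embodiment of this application further provides a readable storage medium storing a program or instructions. When executed by a processor, the program or instructions implement each process of the foregoing acquisition method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.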
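The processor is the processor in the electronic device in the foregoing embodiments. The readable storage medium includes a computer-readable storage medium, such as a computer read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk, or an optical disc.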
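An embodiment of this application further provides a chip. The chip includes a processor and a communication interface coupled to the processor, and the processor is configured to run a program or instructions to implement each process of the foregoing method embodiments and achieve the same technical effects. To avoid repetition, details are not described here again.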
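It should be understood that the chip mentioned in the embodiments of this application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.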
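It should be noted that, in this specification, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, method, article, or apparatus that includes a list of elements includes not only those elements but also other elements not expressly listed, or further includes elements inherent to such a process, method, article, or apparatus. Without further limitation, an element preceded by "includes a ..." does not preclude the presence of additional identical elements in the process, method, article, or apparatus that includes the element. In addition, it should be noted that the scope of the methods and apparatuses in the implementations of this application is not limited to performing functions in the order shown or discussed, and may also include performing functions in a substantially simultaneous manner or in reverse order depending on the functions involved. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to some examples may be combined in other examples.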
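From the description of the foregoing implementations, a person skilled in the art can clearly understand that the methods of the foregoing embodiments may be implemented by software plus a necessary general-purpose hardware platform, or by hardware, but in many cases the former is the preferred implementation. Based on such an understanding, the technical solutions of this application essentially, or the part contributing to the prior art, may be embodied in the form of a computer software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for instructing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of this application.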
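The embodiments of this application are described above with reference to the accompanying drawings, but this application is not limited to the foregoing specific implementations, which are merely illustrative rather than restrictive. Inspired by this application, a person of ordinary skill in the art may derive many other forms without departing from the purpose of this application and the protection scope of the claims, all of which fall within the protection of this application.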

Claims (17)

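  1. A method for acquiring noise loudness in an audio signal, comprising:
    acquiring N subband power spectra of N audio frames in a target audio signal;
    obtaining, from M target power spectra in each of the N subband power spectra, a noise power spectrum estimate corresponding to each of the N audio frames;
    performing smoothing update processing on the noise power spectrum estimate; and
    performing compensation correction processing on the processed noise power spectrum estimate to obtain the noise loudness of the target audio signal;
    wherein N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2.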
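  2. The acquisition method according to claim 1, wherein the M target power spectra are power spectra falling within a preset power spectrum interval; the preset power spectrum interval is the set of the first P% or first Q power spectra of each subband power spectrum sorted in ascending order, or the preset power spectrum interval is the set of the last R% or last S power spectra of each subband power spectrum sorted in descending order, wherein P and R are each positive numbers, and Q and S are each positive integers.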
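  3. The acquisition method according to claim 1, wherein the target audio signal comprises alternating long observation windows and short observation windows, and before the performing smoothing update processing on the noise power spectrum estimate, the acquisition method further comprises:
    determining a jump level of noise in the target audio signal according to an effective value and a signal-to-noise ratio of each audio frame in the short observation window;
    the performing smoothing update processing on the noise power spectrum estimate comprises:
    in a case that the jump level is a level-1 jump, performing first smoothing update processing on the noise power spectrum estimate;
    in a case that the jump level is a level-2 jump, performing second smoothing update processing on the noise power spectrum estimate;
    wherein a jump parameter value of the level-1 jump is greater than a jump parameter value of the level-2 jump.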
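  4. The acquisition method according to claim 3, wherein the performing first smoothing update processing on the noise power spectrum estimate comprises:
    obtaining a first smoothing factor according to audio frame information of audio frames in the short observation window;
    performing the first smoothing update processing on the noise power spectrum estimate using the first smoothing factor;
    wherein the audio frame information comprises signal-to-noise ratio information and speech presence probability information.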
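  5. The acquisition method according to claim 3, wherein the performing second smoothing update processing on the noise power spectrum estimate comprises:
    fitting an initial smoothing factor according to a noise power spectrum estimate corresponding to audio frames in a first long observation window and a noise power spectrum estimate corresponding to audio frames in a first short observation window;
    performing superimposed fitting on the initial smoothing factor using audio frame information of the audio frames in the first long observation window, to obtain a second smoothing factor;
    performing the second smoothing update processing on the noise power spectrum estimate using the second smoothing factor;
    wherein the audio frame information comprises signal-to-noise ratio information and speech presence probability information, and the first long observation window and the first short observation window are adjacent observation windows.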
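  6. The acquisition method according to any one of claims 1 to 5, wherein the obtaining, from M target power spectra in each of the N subband power spectra, a noise power spectrum estimate corresponding to each of the N audio frames comprises:
    assigning a weight value to each of the M target power spectra;
    obtaining the noise power spectrum estimate according to the M target power spectra and the weight values.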
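  7. An apparatus for acquiring noise loudness in an audio signal, comprising:
    an acquisition module, configured to acquire N subband power spectra of N audio frames in a target audio signal;
    an estimation module, configured to obtain, from M target power spectra in each of the N subband power spectra acquired by the acquisition module, a noise power spectrum estimate corresponding to each of the N audio frames;
    an update module, configured to perform smoothing update processing on the noise power spectrum estimate obtained by the estimation module;
    a correction module, configured to perform compensation correction processing on the noise power spectrum estimate processed by the update module, to obtain the noise loudness of the target audio signal;
    wherein N is an integer greater than or equal to 1, and M is an integer greater than or equal to 2.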
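  8. The acquisition apparatus according to claim 7, wherein the M target power spectra are power spectra falling within a preset power spectrum interval; the preset power spectrum interval is the set of the first P% or first Q power spectra of each subband power spectrum sorted in ascending order, or the preset power spectrum interval is the set of the last R% or last S power spectra of each subband power spectrum sorted in descending order, wherein P and R are each positive numbers, and Q and S are each positive integers.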
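  9. The acquisition apparatus according to claim 7, wherein the target audio signal comprises alternating long observation windows and short observation windows, and the acquisition apparatus further comprises:
    a determination module, configured to determine, before the update module performs smoothing update processing on the noise power spectrum estimate obtained by the estimation module, a jump level of noise in the target audio signal according to an effective value and a signal-to-noise ratio of each audio frame in the short observation window;
    the update module is specifically configured to:
    in a case that the jump level is a level-1 jump, perform first smoothing update processing on the noise power spectrum estimate obtained by the estimation module;
    in a case that the jump level is a level-2 jump, perform second smoothing update processing on the noise power spectrum estimate obtained by the estimation module;
    wherein a jump parameter value of the level-1 jump is greater than a jump parameter value of the level-2 jump.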
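  10. The acquisition apparatus according to claim 9, wherein the update module is specifically configured to:
    obtain a first smoothing factor according to audio frame information of audio frames in the short observation window;
    perform the first smoothing update processing on the noise power spectrum estimate obtained by the estimation module using the first smoothing factor;
    wherein the audio frame information comprises signal-to-noise ratio information and speech presence probability information.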
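  11. The acquisition apparatus according to claim 9, wherein the update module is specifically configured to:
    fit an initial smoothing factor according to a noise power spectrum estimate corresponding to audio frames in a first long observation window and a noise power spectrum estimate corresponding to audio frames in a first short observation window;
    perform superimposed fitting on the initial smoothing factor using audio frame information of the audio frames in the first long observation window, to obtain a second smoothing factor;
    perform the second smoothing update processing on the noise power spectrum estimate obtained by the estimation module using the second smoothing factor;
    wherein the audio frame information comprises signal-to-noise ratio information and speech presence probability information, and the first long observation window and the first short observation window are adjacent observation windows.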
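  12. The acquisition apparatus according to any one of claims 7 to 11, wherein the estimation module is specifically configured to:
    assign a weight value to each of the M target power spectra;
    obtain the noise power spectrum estimate according to the M target power spectra and the weight values.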
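  13. An electronic device, comprising a processor, a memory, and a program or instructions stored in the memory and executable on the processor, wherein when the program or instructions are executed by the processor, the steps of the acquisition method according to any one of claims 1 to 6 are implemented.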
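  14. A readable storage medium, storing a program or instructions, wherein when the program or instructions are executed by a processor, the steps of the acquisition method according to any one of claims 1 to 6 are implemented.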
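  15. A computer software product, wherein the computer software product is executed by at least one processor to implement the acquisition method according to any one of claims 1 to 6.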
  16. An electronic device, configured to perform the acquisition method according to any one of claims 1 to 6.
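  17. A chip, comprising a processor and a communication interface, wherein the communication interface is coupled to the processor, and the processor is configured to run a program or instructions to implement the acquisition method according to any one of claims 1 to 6.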
PCT/CN2022/086095 2021-04-13 2022-04-11 Method and apparatus for acquiring noise loudness in an audio signal, and electronic device WO2022218252A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110395202.0 2021-04-13
CN202110395202.0A CN113270107B (zh) 2021-04-13 2021-04-13 Method and apparatus for acquiring noise loudness in an audio signal, and electronic device

Publications (1)

Publication Number Publication Date
WO2022218252A1 true WO2022218252A1 (zh) 2022-10-20

Family

ID=77228737

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/086095 WO2022218252A1 (zh) 2021-04-13 2022-04-11 Method and apparatus for acquiring noise loudness in an audio signal, and electronic device

Country Status (2)

Country Link
CN (1) CN113270107B (zh)
WO (1) WO2022218252A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
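CN113270107B (zh) * 2021-04-13 2024-02-06 vivo Mobile Communication Co., Ltd. Method and apparatus for acquiring noise loudness in an audio signal, and electronic device
CN113707146A (zh) * 2021-08-31 2021-11-26 Beijing Dajia Internet Information Technology Co., Ltd. Information interaction method and information interaction apparatus
CN114112006A (zh) * 2021-11-26 2022-03-01 Zhongke Chuanqi (Suzhou) Technology Co., Ltd. Noise monitoring method and apparatus, and electronic device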

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
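CN102187388A (zh) * 2008-10-15 2011-09-14 Qualcomm Incorporated Method and apparatus for noise estimation
CN104471855A (zh) * 2012-07-12 2015-03-25 DTS, Inc. Loudness control with noise detection and loudness drop detection
CN105142067A (zh) * 2014-05-26 2015-12-09 Dolby Laboratories Licensing Corporation Audio signal loudness control
CN109643554A (zh) * 2018-11-28 2019-04-16 Shenzhen Goodix Technology Co., Ltd. Adaptive speech enhancement method and electronic device
CN111933165A (zh) * 2020-07-30 2020-11-13 Southwest China Institute of Electronic Technology (10th Research Institute of China Electronics Technology Group Corporation) Fast estimation method for abrupt noise
CN112133322A (zh) * 2020-10-19 2020-12-25 Nantong Saiyang Electronics Co., Ltd. Speech enhancement method based on an IMCRA algorithm optimized by noise classification
WO2021007841A1 (zh) * 2019-07-18 2021-01-21 Shenzhen Goodix Technology Co., Ltd. Noise estimation method, noise estimation apparatus, speech processing chip, and electronic device
CN113270107A (zh) * 2021-04-13 2021-08-17 vivo Mobile Communication Co., Ltd. Method and apparatus for acquiring noise loudness in an audio signal, and electronic device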

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7428490B2 (en) * 2003-09-30 2008-09-23 Intel Corporation Method for spectral subtraction in speech enhancement
KR101295727B1 (ko) * 2010-11-30 2013-08-16 Transono Inc. Apparatus and method for adaptive noise estimation
WO2018217059A1 (en) * 2017-05-25 2018-11-29 Samsung Electronics Co., Ltd. Method and electronic device for managing loudness of audio signal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
YU YAO, HEMING ZHAO: "Improved of Noise Estimation Algorithm based on Minimum Statistic", COMPUTER ENGINEERING AND APPLICATIONS, HUABEI JISUAN JISHU YANJIUSUO, CN, vol. 49, no. 4, 30 April 2013 (2013-04-30), CN , pages 134 - 137, XP055977333, ISSN: 1002-8331, DOI: 10.3778/j.issn.1002-8331.1107-0168 *

Also Published As

Publication number Publication date
CN113270107B (zh) 2024-02-06
CN113270107A (zh) 2021-08-17

Similar Documents

Publication Publication Date Title
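WO2022218252A1 (zh) Method and apparatus for acquiring noise loudness in an audio signal, and electronic device
JP6361156B2 (ja) Noise estimation apparatus, method, and program
EP2828856A2 (en) Harmonicity estimation, audio classification, pitch determination and noise estimation
JP6339896B2 (ja) Noise suppression apparatus and noise suppression method
EP2987314B1 (en) Echo suppression
WO2021007841A1 (zh) Noise estimation method, noise estimation apparatus, speech processing chip, and electronic device
CN105027540A (zh) Echo suppression
CN112292844A (zh) Double-talk detection method, double-talk detection apparatus, and echo cancellation system
CN110136733B (zh) Method and apparatus for dereverberating an audio signal
CN108806715B (zh) Method and system for evaluating noise reduction performance
CN111261148A (zh) Speech model training method, speech enhancement processing method, and related device
WO2022218254A1 (zh) Speech signal enhancement method and apparatus, and electronic device
CN106297818B (zh) Method and apparatus for obtaining a denoised speech signal
WO2022143522A1 (zh) Audio signal processing method and apparatus, and electronic device
CN111161748A (zh) Double-talk state detection method and apparatus, and electronic device
CN112669878B (zh) Method and apparatus for calculating a sound gain value, and electronic device
WO2024041512A1 (zh) Audio noise reduction method and apparatus, electronic device, and readable storage medium
CN112289337A (zh) Method and apparatus for filtering out residual noise after machine-learning speech enhancement
CN112997249B (zh) Speech processing method and apparatus, storage medium, and electronic device
CN113205824B (zh) Sound signal processing method and apparatus, storage medium, chip, and related device
CN1276896A (zh) Method for denoising a digital speech signal
WO2021042538A1 (zh) Audio processing method and apparatus, and computer storage medium
CN111782859A (zh) Audio visualization method and apparatus, and storage medium
CN113763976A (zh) Audio signal noise reduction method and apparatus, readable medium, and electronic device
CN111863006A (zh) Audio signal processing method, audio signal processing apparatus, and earphone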

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 22787478

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE