WO2016078439A1 - Method and apparatus for voice processing - Google Patents


Info

Publication number
WO2016078439A1
Authority
WO
WIPO (PCT)
Prior art keywords
frequency
spectrum
user equipment
slope
module
Application number
PCT/CN2015/085209
Other languages
English (en)
French (fr)
Inventor
郭李
仇存收
刘立
田立生
常青
王金鑫
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Publication of WO2016078439A1 (Chinese-language publication)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/90 Pitch determination of speech signals

Definitions

  • the present invention relates to the field of communications, and in particular, to a voice processing method and apparatus.
  • in voice communication equipment or high-quality recording equipment, technologies such as voice encoding and decoding, voice pre- and post-processing, speech synthesis, and speech recognition are required. All of these speech processing techniques require the speech signal to be divided into frames for processing, and harmonic detection is a key technology in speech processing.
  • existing harmonic detection technology mainly uses the autocorrelation method, which calculates the autocorrelation function of the speech signal and determines the harmonics from the positions where its peaks appear.
  • however, applying the autocorrelation method to harmonic detection is susceptible to interference from speech formants, resulting in a high false-positive rate for harmonics.
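The prior-art autocorrelation approach described above can be sketched as follows. This is a minimal illustration of the baseline technique the invention seeks to improve on, not the patented method; the 60 Hz to 400 Hz search range and the function names are assumptions chosen for illustration.

```python
import math

def autocorrelation(frame, max_lag):
    """r[k] = sum_n x[n] * x[n + k] for k = 0..max_lag (unnormalized)."""
    n = len(frame)
    return [sum(frame[i] * frame[i + k] for i in range(n - k))
            for k in range(max_lag + 1)]

def autocorr_pitch(frame, fs, f_min=60.0, f_max=400.0):
    """Prior-art style estimate: fs divided by the lag of the largest
    autocorrelation value inside the plausible pitch range.

    f_min / f_max are assumed search limits, not values from the patent.
    The frame must be longer than fs / f_min samples.
    """
    lag_min = int(fs / f_max)
    lag_max = int(fs / f_min)
    r = autocorrelation(frame, lag_max)
    best_lag = max(range(lag_min, lag_max + 1), key=lambda k: r[k])
    return fs / best_lag

# A pure 200 Hz tone sampled at 8 kHz has a period of 40 samples.
fs = 8000
tone = [math.sin(2 * math.pi * 200 * i / fs) for i in range(400)]
print(autocorr_pitch(tone, fs))
```

On a clean tone this works well; the weakness the text points out appears on real speech, where formant resonances add autocorrelation peaks that do not correspond to the pitch period.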
  • the embodiments of the present invention provide a voice processing method and apparatus, which are used to solve the problem of a high harmonic misjudgment rate in prior-art voice processing.
  • a first aspect of the present invention provides a method for voice processing, including:
  • the user equipment windows and frames the acquired voice signal;
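  • the user equipment emphasizes the high-frequency harmonic components in the windowed and framed voice signal;
  • the user equipment obtains, by a fast Fourier transform (FFT), the spectrum of the voice signal after the high-frequency harmonic components are emphasized;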
  • the user equipment calculates a slope of each frequency point in the spectrum
  • the user equipment determines a center frequency point of the voice signal according to the slope, and determines a harmonic according to the center frequency point.
  • the method further includes:
  • the user equipment counts the number of harmonics and determines whether it is greater than a preset threshold; if so, it determines that the voice signal contains speech.
  • the method further includes:
  • the user equipment determines a pitch frequency by calculating a frequency difference between adjacent harmonics.
  • the calculating, by the user equipment, of the slope of each frequency point in the spectrum includes:
  • the determining, by the user equipment, of the center frequency point of the voice signal according to the slope includes:
  • the user equipment obtains a rising edge and a falling edge of the center frequency point according to the slope, and determines the center frequency point of the voice signal from the rising edge and the falling edge.
  • the determining, by the user equipment, of the pitch frequency by calculating the frequency differences between adjacent harmonics includes:
  • the user equipment calculates the frequency differences between adjacent harmonics, counts how often each difference occurs, and determines the most frequently occurring difference as the pitch frequency.
  • before the user equipment calculates the slope of each frequency point in the spectrum, the method further includes:
  • the user equipment calculates the log spectrum $X_{HE}(t,f)$ of the high-energy component in the speech signal, where, in the expression for this log spectrum, max denotes the maximum value, $X_{STFT}(t,f)$ is the spectrum of the speech signal, and $S_{NN}(t,f)$ is the estimated spectrum of the background noise.
  • the emphasizing, by the user equipment, of the high-frequency harmonic components in the windowed and framed speech signal includes:
  • the user equipment uses a low-order high-pass filter to emphasize the high-frequency harmonic components in the windowed and framed speech signals.
  • a second aspect of the present invention provides a device for voice processing, including:
  • a windowing and framing module, configured to window and frame the acquired voice signal;
  • a weighting module, configured to emphasize, after the windowing and framing module windows and frames the acquired voice signal, the high-frequency harmonic components in the windowed and framed signal;
  • an acquiring module, configured to obtain by FFT, after the weighting module emphasizes the high-frequency harmonic components, the spectrum of the emphasized voice signal;
  • a first calculating module, configured to calculate the slope of each frequency point in the spectrum after the acquiring module obtains the spectrum of the emphasized voice signal;
  • a first determining module, configured to determine, after the first calculating module calculates the slope of each frequency point in the spectrum, a center frequency point of the voice signal according to the slope, and to determine a harmonic according to the center frequency point.
  • the device further includes:
  • a statistics module, configured to count the number of harmonics after the first determining module determines the center frequency points of the voice signal and the harmonics corresponding to them;
  • a judging module configured to determine, after the statistics module counts the number of the harmonics, whether the number of the harmonics is greater than a preset threshold
  • a second determining module, configured to determine that the voice signal contains speech when the judging module determines that the number of harmonics is greater than the preset threshold.
  • the device further includes:
  • a second calculating module configured to calculate a frequency difference between adjacent harmonics
  • a third determining module configured to determine a pitch frequency according to a frequency difference of adjacent harmonics calculated by the second calculating module.
  • the first determining module is specifically configured to obtain a rising edge and a falling edge of the center frequency point according to the slope, and to determine the center frequency point of the voice signal from the rising edge and the falling edge.
  • the third determining module is specifically configured to determine, from the frequency differences between adjacent harmonics, the most frequently occurring difference, and to determine that difference as the pitch frequency.
  • the device further includes:
  • a third calculation module, configured to calculate the log spectrum $X_{HE}(t,f)$ of the high-energy component in the speech signal, where, in the expression for this log spectrum:
  • max is the maximum value
  • $X_{STFT}(t,f)$ is the spectrum of the speech signal
  • $S_{NN}(t,f)$ is the estimated spectrum of the background noise.
  • the weighting module is specifically configured to use a low-order high-pass filter to emphasize high-frequency harmonic components in the windowed and framed speech signals.
  • a third aspect of the present invention provides an apparatus for voice processing, including a processor
  • the processor is configured to perform the following steps:
  • a center frequency point of the speech signal is determined according to the slope, and a harmonic is determined according to the center frequency point.
  • the processor is further configured to perform the following steps:
  • the pitch frequency is determined by calculating the frequency difference of adjacent harmonics.
  • the user equipment windows and frames the acquired speech signal and then emphasizes the high-frequency harmonic components in the windowed and framed signal so that the harmonic energy becomes more uniform; it then obtains, by a fast Fourier transform (FFT), the spectrum of the emphasized speech signal, calculates the slope of each frequency point in the spectrum, determines the center frequency points of the speech signal according to the slope, and determines the harmonics according to the center frequency points.
  • with this technical solution, harmonic determination is not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing.
  • FIG. 1 is a schematic diagram of an embodiment of a method for voice processing according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of another embodiment of a method for voice processing according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram of another embodiment of a method for voice processing according to an embodiment of the present invention.
  • FIG. 4 is a schematic diagram of an embodiment of an apparatus for voice processing according to an embodiment of the present invention.
  • FIG. 5 is a schematic diagram of another embodiment of an apparatus for voice processing according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of another embodiment of an apparatus for voice processing according to an embodiment of the present invention.
  • the embodiments of the present invention provide a speech processing method and apparatus, which are used to solve the problem of a high harmonic misjudgment rate in prior-art speech processing, improve the accuracy of speech discrimination, and improve the quality of speech processing.
  • the technical solution of the present invention can be applied to various communication systems, such as the Global System for Mobile Communications (GSM), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), General Packet Radio Service (GPRS), and Long Term Evolution (LTE).
  • GSM Global System for Mobile Communications
  • CDMA Code Division Multiple Access
  • WCDMA Wideband Code Division Multiple Access
  • GPRS General Packet Radio Service
  • LTE Long Term Evolution
  • user equipment (UE), which may also be called a mobile terminal, mobile user equipment, or the like, can communicate with one or more core networks via a radio access network (RAN).
  • the user equipment may be a mobile terminal, such as a mobile phone (or "cellular" phone) or a computer with a mobile terminal, for example a portable, pocket-sized, handheld, computer-built-in, or in-vehicle mobile device, which exchanges voice and/or data with the radio access network.
  • the base station may be a base transceiver station (BTS) in GSM or CDMA, a NodeB in WCDMA, or an evolved Node B (eNB or e-NodeB) in LTE.
  • BTS Base Transceiver Station
  • NodeB base station
  • eNB evolved base station
  • e-NodeB evolutional Node B
  • when an existing single voice feature parameter (or a combination of several feature parameters) is used for voice presence detection, its weak immunity to noise and acoustic interference leads to a high false-positive rate, while applying the autocorrelation method to fundamental-frequency and harmonic detection is easily disturbed by speech formants, resulting in misjudgment of the pitch frequency.
  • a speech processing method is therefore provided that solves the problem of a high harmonic misjudgment rate in prior-art speech processing, performs speech presence detection, and determines harmonics and the pitch frequency simultaneously, which is a new technical solution.
  • an embodiment of a method for voice processing in an embodiment of the present invention includes:
  • the user equipment windows and frames the acquired voice signal.
  • windowing the voice signal is a necessary step. Because the user equipment can only process signals of finite length, the original signal $X(t)$ is truncated to a duration T (the sampling time), giving a finite-length signal $X_T(t)$ that is then processed further; this truncation is called windowing. A Hamming window can be used to window the speech signal to reduce the influence of the Gibbs effect. Because speech is non-stationary, speech processing divides it into many consecutive frames, each about 20 ms to 30 ms long, and the speech signal is treated as stationary within each frame.
  • the voice signal acquired by the user equipment may be received from the base station or captured by the user equipment itself; this is not specifically limited herein.
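The windowing and framing step can be sketched as below. The 25 ms frame length falls within the 20 ms to 30 ms range given above; the 10 ms hop (frame shift) and the function names are assumptions, since the text does not specify them.

```python
import math

def hamming(n):
    """Hamming window w[i] = 0.54 - 0.46 * cos(2*pi*i / (n - 1)), i = 0..n-1."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_and_window(signal, fs, frame_ms=25.0, hop_ms=10.0):
    """Split `signal` into overlapping frames and apply a Hamming window to each.

    frame_ms: frame length (the text gives 20-30 ms).
    hop_ms: frame shift; 10 ms is an assumed, commonly used value.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        segment = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
    return frames
```

For 8 kHz speech these defaults give 200-sample frames advanced by 80 samples, so each frame overlaps its neighbour by 60%.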
  • the user equipment emphasizes the high-frequency harmonic components in the windowed and framed voice signal;
  • emphasizing the high-frequency harmonic components of the speech signal raises the peaks of the high-frequency harmonics, making them more prominent and making the harmonic energy more uniform.
  • the user equipment obtains, by a fast Fourier transform (FFT), the spectrum of the voice signal after the high-frequency harmonic components are emphasized;
  • the FFT transforms the time-domain speech signal into its frequency spectrum.
  • FFT Fast Fourier Transform
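The FFT step can be sketched with a textbook recursive radix-2 FFT in plain Python, kept dependency-free for illustration; a production implementation would normally call an optimized library routine instead. The FFT size `n_fft = 256` is an assumption, since the text does not give one.

```python
import cmath
import math

def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("FFT length must be a power of two")
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out

def magnitude_spectrum(frame, n_fft=256):
    """Zero-pad one windowed frame to n_fft samples (assumed size) and
    return |X[k]| for the non-negative frequency bins k = 0..n_fft/2."""
    if len(frame) > n_fft:
        raise ValueError("frame is longer than n_fft")
    padded = list(frame) + [0.0] * (n_fft - len(frame))
    spectrum = fft(padded)
    return [abs(c) for c in spectrum[: n_fft // 2 + 1]]
```

Bin k corresponds to frequency k * fs / n_fft, so with 8 kHz audio and n_fft = 256 the bins are 31.25 Hz apart.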
  • the user equipment calculates a slope of each frequency point in the spectrum.
  • the slope at each frequency point is obtained by taking the derivative of the spectrum along the frequency axis.
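On a discrete spectrum, the derivative along the frequency axis can be approximated with finite differences. The sketch below uses central differences in the interior and one-sided differences at the two ends; this difference scheme is an illustrative choice rather than one stated in the text, and the slope is expressed per frequency bin.

```python
def spectral_slope(spectrum):
    """Finite-difference slope of a spectrum along the frequency axis, per bin.

    Central difference inside the spectrum, one-sided difference at the edges
    (an illustrative scheme, not taken from the patent).
    """
    n = len(spectrum)
    if n < 2:
        return [0.0] * n
    slope = []
    for k in range(n):
        lo = max(k - 1, 0)        # one-sided at the lower edge
        hi = min(k + 1, n - 1)    # one-sided at the upper edge
        slope.append((spectrum[hi] - spectrum[lo]) / (hi - lo))
    return slope
```

A spectral peak shows up as positive slope on its lower side and negative slope on its upper side.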
  • the user equipment determines a center frequency of the voice signal according to the slope, and determines a harmonic according to the center frequency point;
  • for example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, the center frequency point of the speech signal can be determined to be at 200 Hz, and one harmonic is determined from that center frequency point.
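The rising-edge/falling-edge rule in the example above can be sketched as a search for places where the slope changes from positive to non-positive, with the center placed at the linearly interpolated zero crossing of the slope. The interpolation and the hypothetical `min_slope` guard are assumptions added for illustration; with the example values (+1 at 180 Hz, -1 at 220 Hz) the interpolated center is 200 Hz, matching the text.

```python
def center_frequencies(slope, freqs, min_slope=0.0):
    """Return center frequencies where the slope goes from rising (> min_slope)
    to falling (<= 0) between consecutive frequency points.

    slope[k] is the spectral slope at frequency freqs[k]. The center is placed
    at the linearly interpolated zero crossing of the slope (an assumption).
    min_slope >= 0 is a hypothetical guard that ignores very shallow edges.
    """
    centers = []
    for k in range(len(slope) - 1):
        rising, falling = slope[k], slope[k + 1]
        if rising > min_slope and falling <= 0:
            frac = rising / (rising - falling)  # lies in (0, 1]
            centers.append(freqs[k] + frac * (freqs[k + 1] - freqs[k]))
    return centers

# The example from the text: +1 at 180 Hz, -1 at 220 Hz.
print(center_frequencies([0.5, 1.0, -1.0, -0.5], [140.0, 180.0, 220.0, 260.0]))
```

Each returned center frequency corresponds to one detected harmonic.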
  • in this embodiment, the user equipment windows and frames the acquired voice signal, emphasizes the high-frequency harmonic components in the windowed and framed signal so that the harmonic energy becomes more uniform, and then obtains the spectrum by FFT and determines the harmonics from the slope of the spectrum.
  • this harmonic determination process is not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing.
  • referring to FIG. 2, another embodiment of the method for voice processing in the embodiments of the present invention includes:
  • the user equipment windows and frames the acquired voice signal.
  • windowing the voice signal is a necessary step. Because the user equipment can only process signals of finite length, the original signal $X(t)$ is truncated to a duration T (the sampling time), giving a finite-length signal $X_T(t)$ that is then processed further; this truncation is called windowing. A Hamming window can be used to window the speech signal to reduce the influence of the Gibbs effect. Because speech is non-stationary, speech processing divides it into many consecutive frames, each about 20 ms to 30 ms long, and the speech signal is treated as stationary within each frame.
  • the voice signal acquired by the user equipment may be received from the base station or captured by the user equipment itself; this is not specifically limited herein.
  • the user equipment emphasizes the high-frequency harmonic components in the windowed and framed voice signal;
  • emphasizing the high-frequency harmonic components of the speech signal raises the peaks of the high-frequency harmonics, making them more prominent and making the harmonic energy more uniform.
  • the user equipment obtains, by FFT, the spectrum of the voice signal after the high-frequency harmonic components are emphasized;
  • the FFT transforms the time-domain speech signal into its frequency spectrum.
  • the user equipment calculates a slope of each frequency point in the spectrum.
  • the slope at each frequency point is obtained by taking the derivative of the spectrum along the frequency axis.
  • the user equipment determines a center frequency point of the voice signal according to the slope, and determines a harmonic according to the center frequency point.
  • for example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, the center frequency point of the speech signal can be determined to be at 200 Hz, and one harmonic is determined from that center frequency point.
  • the user equipment determines whether the number of harmonics is greater than a preset threshold, and if so, performs step 208;
  • for example, the preset threshold may be 15; this is not specifically limited herein.
  • the user equipment determines a pitch frequency by calculating a frequency difference of adjacent harmonics.
  • when a person speaks, the voice signal can be classified into two types, unvoiced and voiced, according to whether the vocal cords vibrate.
  • voiced sounds carry most of the energy of speech. Voiced sounds show significant periodicity in the time domain, whereas unvoiced sounds resemble white noise and have no obvious periodicity.
  • when a voiced sound is produced, the airflow through the glottis makes the vocal cords vibrate in an oscillatory manner, producing a quasi-periodic train of excitation pulses.
  • the frequency of such vocal cord vibration is called the fundamental frequency.
  • the pitch frequency is related to the length, thickness, toughness, and stiffness of an individual's vocal cords and to the individual's pronunciation habits, and it largely reflects the characteristics of the individual. In addition, the pitch frequency varies with the speaker's gender and age: in general, male speakers have lower pitch frequencies, while female speakers and children have higher pitch frequencies.
  • in this embodiment, the user equipment windows and frames the acquired voice signal, emphasizes the high-frequency harmonic components in the windowed and framed signal so that the harmonic energy becomes more uniform, and then obtains the spectrum by FFT.
  • as a result, voice presence detection is not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing, and speech presence detection, harmonic determination, and pitch-frequency determination are performed simultaneously.
  • a specific embodiment of a method for voice processing in an embodiment of the present invention includes:
  • the user equipment windows and frames the acquired voice signal.
  • windowing the voice signal is a necessary step. Because the user equipment can only process signals of finite length, the original signal $X(t)$ is truncated to a duration T (the sampling time), giving a finite-length signal $X_T(t)$ that is then processed further; this truncation is called windowing. A Hamming window can be used to window the speech signal to reduce the influence of the Gibbs effect. Because speech is non-stationary, speech processing divides it into many consecutive frames, each about 20 ms to 30 ms long, and the speech signal is treated as stationary within each frame.
  • the user equipment uses a low-order high-pass filter to emphasize high-frequency harmonic components in the windowed and framed speech signals;
  • the high-pass filter removes low-frequency noise and emphasizes the high-frequency harmonic components of the voice signal, raising the peaks of the high-frequency harmonics so that they become more prominent and the harmonic energy becomes more uniform.
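The text does not give the filter coefficients. A common low-order choice, assumed here purely for illustration, is the first-order pre-emphasis filter y[n] = x[n] - a * x[n-1] with a = 0.97. Its gain is 1 - a = 0.03 at DC and 1 + a = 1.97 at the Nyquist frequency, so low-frequency content is attenuated and the higher harmonics gain relative weight.

```python
def pre_emphasis(frame, a=0.97):
    """First-order high-pass (pre-emphasis) filter: y[n] = x[n] - a * x[n-1].

    The coefficient a = 0.97 is a conventional value, not one taken from the
    patent. The first sample has no predecessor in the frame and passes through.
    """
    if not frame:
        return []
    out = [frame[0]]
    for n in range(1, len(frame)):
        out.append(frame[n] - a * frame[n - 1])
    return out
```

A constant (DC) input is reduced to about 3% of its level after the first sample, while an input alternating at the Nyquist rate is nearly doubled.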
  • the user equipment obtains, by FFT, the spectrum of the voice signal after the high-frequency harmonic components are emphasized;
  • the FFT transforms the time-domain speech signal into its frequency spectrum.
  • the user equipment calculates a log spectrum of high energy components in the voice signal
  • the user equipment calculates the logarithmic spectrum $X_{HE}(t,f)$ of the high-energy component in the speech signal; in the expression for this log spectrum:
  • max is the maximum value
  • $X_{STFT}(t,f)$ is the spectrum of the speech signal
  • $S_{NN}(t,f)$ is the estimated spectrum of the background noise.
  • calculating the log spectrum of the high-energy component in the speech signal makes it easier to calculate the slope of each frequency point in the spectrum.
  • the user equipment uses a Sobel operator to calculate the slope of each frequency point in the spectrum.
  • A can be …; this is not specifically limited.
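Because the operator A itself is not reproduced in the text above, the sketch below assumes the standard 3x3 Sobel kernel oriented to differentiate along the frequency axis of a time-frequency log spectrum (rows are frames, columns are frequency bins). It smooths over neighbouring frames with 1-2-1 weights while differentiating across neighbouring bins. Edge replication and the division by 8, which makes a ramp rising by 1 per bin yield a slope of 1, are illustrative choices.

```python
# Assumed operator: the standard 3x3 Sobel kernel, differentiating along
# frequency (columns) and smoothing across neighbouring frames (rows).
SOBEL_FREQ = (
    (-1, 0, 1),
    (-2, 0, 2),
    (-1, 0, 1),
)

def sobel_slope(log_spec):
    """Slope along the frequency axis of a time-frequency log spectrum.

    log_spec[t][f] is the log magnitude of frame t at frequency bin f.
    Borders are handled by replicating the nearest frame/bin; the result is
    divided by 8 so that a ramp rising by 1 per bin gives a slope of 1.
    """
    n_t, n_f = len(log_spec), len(log_spec[0])
    out = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            acc = 0.0
            for dt in (-1, 0, 1):
                tt = min(max(t + dt, 0), n_t - 1)      # replicate edge frames
                for df in (-1, 0, 1):
                    ff = min(max(f + df, 0), n_f - 1)  # replicate edge bins
                    acc += SOBEL_FREQ[dt + 1][df + 1] * log_spec[tt][ff]
            out[t][f] = acc / 8.0
    return out
```

Averaging over three frames is what makes this slope smoother than one taken from a single spectral line.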
  • the user equipment determines a center frequency point of the voice signal according to the slope, and determines a harmonic according to the center frequency point;
  • the user equipment obtains a rising edge and a falling edge of the center frequency point according to the slope and determines the center frequency point of the voice signal from them. For example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, the center frequency point of the speech signal can be determined to be at 200 Hz, and one harmonic is determined from that center frequency point.
  • the user equipment counts the number of harmonics.
  • the user equipment determines whether the number of harmonics is greater than a preset threshold, and if so, performs step 209;
  • for example, the preset threshold may be 15; this is not specifically limited herein.
  • the user equipment calculates the frequency differences between adjacent harmonics, counts how often each difference occurs, and determines the most frequently occurring difference as the pitch frequency.
  • for example, suppose the pitch frequency of a male voice is about 200 Hz and the frequency differences between adjacent harmonics are 180, 190, 200, 200, 210, 190, and 200 Hz. Counting shows that 200 Hz occurs most often (three times), so the pitch frequency of the speech is determined to be 200 Hz.
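Comparing the harmonic count against the preset threshold and taking the most frequent adjacent-harmonic spacing as the pitch can be sketched as below. Quantizing the spacings to a `resolution_hz` grid before counting is an assumption (measured spacings rarely match exactly); the default threshold of 15 comes from the example in the text, and the function names are hypothetical.

```python
from collections import Counter

def has_voice(harmonic_freqs, threshold=15):
    """Speech is judged present when more than `threshold` harmonics were found."""
    return len(harmonic_freqs) > threshold

def pitch_from_harmonics(harmonic_freqs, resolution_hz=10):
    """Most frequent spacing between adjacent harmonics, quantized to
    resolution_hz (an assumed bin width). Returns None for < 2 harmonics."""
    ordered = sorted(harmonic_freqs)
    if len(ordered) < 2:
        return None
    spacings = [
        round((hi - lo) / resolution_hz) * resolution_hz
        for lo, hi in zip(ordered, ordered[1:])
    ]
    pitch, _count = Counter(spacings).most_common(1)[0]
    return pitch

# Harmonics whose spacings reproduce the example: 180, 190, 200, 200, 210, 190, 200.
harmonics = [200, 380, 570, 770, 970, 1180, 1370, 1570]
print(pitch_from_harmonics(harmonics))
```

With these harmonic frequencies the spacings are exactly the sequence in the example above, and the function returns 200. In the method, the pitch is computed only after the threshold check has found speech to be present.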
  • in this embodiment, the user equipment windows and frames the acquired voice signal, uses a low-order high-pass filter to emphasize the high-frequency harmonic components in the windowed and framed signal so that the high-frequency harmonics become more prominent, and obtains the spectrum of the emphasized speech signal by FFT.
  • if the number of harmonics is greater than the preset threshold, the voice signal is determined to contain speech; the frequency differences between adjacent harmonics are then calculated, their occurrences are counted, and the most frequently occurring difference is determined as the pitch frequency. With this technical solution, harmonic determination, pitch-frequency determination, and voice presence detection are not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing, and speech presence detection and the determination of harmonics and pitch frequency can be performed simultaneously.
  • an embodiment of the apparatus 400 for voice processing in the embodiment of the present invention includes:
  • a windowing and framing module 401, configured to window and frame the acquired voice signal;
  • windowing the voice signal is a necessary step. Because the user equipment can only process signals of finite length, the original signal $X(t)$ is truncated to a duration T (the sampling time), giving a finite-length signal $X_T(t)$ that is then processed further; this truncation is called windowing. A Hamming window can be used to window the speech signal to reduce the influence of the Gibbs effect. Because speech is non-stationary, speech processing divides it into many consecutive frames, each about 20 ms to 30 ms long, and the speech signal is treated as stationary within each frame.
  • the voice signal acquired by the user equipment may be received from the base station or captured by the user equipment itself; this is not specifically limited herein.
  • a weighting module 402, configured to emphasize, after the windowing and framing module 401 windows and frames the acquired voice signal, the high-frequency harmonic components in the windowed and framed speech signal;
  • emphasizing the high-frequency harmonic components of the speech signal raises the peaks of the high-frequency harmonics, making them more prominent and making the harmonic energy more uniform.
  • an acquiring module 403, configured to obtain by FFT, after the weighting module 402 emphasizes the high-frequency harmonic components, the spectrum of the emphasized voice signal;
  • the FFT transforms the time-domain speech signal into its frequency spectrum.
  • a first calculating module 404, configured to calculate the slope of each frequency point in the spectrum after the acquiring module 403 obtains the spectrum of the emphasized voice signal;
  • the slope at each frequency point is obtained by taking the derivative of the spectrum along the frequency axis.
  • a first determining module 405, configured to determine, after the first calculating module 404 calculates the slope of each frequency point in the spectrum, a center frequency point of the voice signal according to the slope, and to determine a harmonic according to the center frequency point;
  • for example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, the center frequency point of the speech signal can be determined to be at 200 Hz, and one harmonic is determined from that center frequency point.
  • in this embodiment, the user equipment windows and frames the acquired voice signal, emphasizes the high-frequency harmonic components in the windowed and framed signal so that the harmonic energy becomes more uniform, and then obtains the spectrum by FFT and determines the harmonics from the slope of the spectrum.
  • this harmonic determination process is not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing.
  • another embodiment of the apparatus 500 for voice processing in the embodiment of the present invention includes:
  • a windowing and framing module 501, configured to window and frame the acquired voice signal;
  • a weighting module 502, configured to emphasize, after the windowing and framing module 501 windows and frames the acquired voice signal, the high-frequency harmonic components in the windowed and framed speech signal;
  • the weighting module is specifically configured to use a low-order high-pass filter to emphasize high-frequency harmonic components in the windowed and framed speech signals.
  • the high-pass filter removes low-frequency noise and emphasizes the high-frequency harmonic components of the voice signal, raising the peaks of the high-frequency harmonics so that they become more prominent and the harmonic energy becomes more uniform.
  • an acquiring module 503, configured to obtain by FFT, after the weighting module 502 emphasizes the high-frequency harmonic components, the spectrum of the emphasized voice signal;
  • the FFT transforms the time-domain speech signal into its frequency spectrum.
  • a third calculating module 504, configured to calculate, after the acquiring module 503 obtains the spectrum of the emphasized voice signal, the log spectrum of the high-energy component in the voice signal, where, in the expression for this log spectrum:
  • max is the maximum value
  • $X_{STFT}(t,f)$ is the spectrum of the speech signal
  • $S_{NN}(t,f)$ is the estimated spectrum of the background noise.
  • calculating the log spectrum of the high-energy component in the speech signal makes it easier to calculate the slope of each frequency point in the spectrum.
  • a first calculating module 505, configured to calculate the slope of each frequency point in the spectrum after the third calculating module 504 calculates the log spectrum of the high-energy component in the voice signal;
  • A can be …; this is not specifically limited.
  • a slope calculated in this way is smoother and more accurate than one calculated from a single spectral line.
  • a first determining module 506, configured to determine, after the first calculating module 505 calculates the slope of each frequency point in the spectrum, a center frequency point of the voice signal according to the slope, and to determine a harmonic according to the center frequency point;
  • the first determining module is specifically configured to obtain a rising edge and a falling edge of the center frequency point according to the slope, and to determine the center frequency point of the voice signal from the rising edge and the falling edge.
  • for example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, the center frequency point of the speech signal can be determined to be at 200 Hz, and one harmonic is determined from that center frequency point.
  • a statistics module 507, configured to count the number of harmonics after the first determining module 506 determines the center frequency points of the voice signal and the harmonics corresponding to them;
  • a judging module 508, configured to determine, after the statistics module 507 counts the number of harmonics, whether the number of harmonics is greater than a preset threshold;
  • for example, the preset threshold may be 15; this is not specifically limited herein.
  • a second determining module 509, configured to determine that the voice signal contains speech when the judging module 508 determines that the number of harmonics is greater than the preset threshold;
  • a second calculating module 510, configured to calculate the frequency differences between adjacent harmonics after the second determining module 509 determines that the voice signal contains speech;
  • the third determining module 511 is configured to determine a pitch frequency according to a frequency difference of adjacent harmonics calculated by the second calculating module 510.
  • the third determining module is specifically configured to determine, according to a frequency difference of the adjacent harmonics, a frequency difference that is the most frequently occurring, and determine the frequency difference that is the most frequently occurring as the pitch frequency.
  • the pitch frequency of the male voice is about 200 Hz
  • the frequency difference distribution of the adjacent harmonics is: 180, 190, 200, 200, 210, 190, 200, wherein the most frequent occurrence is 200 Hz.
  • the frequency difference that is most frequently counted by counting is 200 Hz, that is, the pitch frequency of the speech is determined to be 200 Hz.
  • the user equipment performs windowing and framing on the acquired voice signal, then uses a low-order high-pass filter to emphasize the high-frequency harmonic components in the windowed and framed voice signal so that the high-frequency harmonics are strengthened, and obtains, by FFT, the spectrum of the voice signal with the emphasized high-frequency harmonic components.
  • when the number of harmonics is greater than a preset threshold, it is determined that speech is present in the voice signal; the frequency differences between adjacent harmonics are then calculated, and the most frequent difference is determined as the pitch frequency. This process is not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing, and allows speech presence detection, harmonic determination, and pitch frequency determination to be performed together.
  • The embodiments of FIG. 4 and FIG. 5 describe the specific structure of voice processing from the perspective of functional modules.
  • The specific structure of voice processing is described below from the hardware perspective with reference to the embodiment of FIG. 6:
  • FIG. 6 is a schematic structural diagram of a voice processing apparatus 600 according to an embodiment of the present invention, which may include at least one processor 601 (such as a CPU, Central Processing Unit), at least one network interface or other communication interface, a memory 602, at least one communication bus, at least one input device 603, at least one output device 604, and an uninterruptible power supply (UPS) 605.
  • the at least one communication bus is used to implement connection and communication among these components.
  • the processor 601 is configured to execute executable modules, such as computer programs, stored in the memory 602.
  • the memory 602 may include a high-speed random access memory (RAM), and may also include a non-volatile memory, such as at least one disk memory.
  • the communication connection between the system gateway and at least one other network element is implemented through at least one network interface (wired or wireless), which may use the Internet, a wide area network, a local area network, a metropolitan area network, and the like.
  • program instructions are stored in the memory 602, and the program instructions may be executed by the processor 601.
  • the processor 601 specifically performs the following steps:
  • a center frequency point of the speech signal is determined according to the slope, and a harmonic is determined according to the center frequency point.
  • the processor 601 can also perform the following steps:
  • the pitch frequency is determined by calculating the frequency difference of adjacent harmonics.
  • the processor performs windowing and framing on the voice signal, emphasizes the high-frequency harmonic components in the windowed and framed voice signal so that the harmonic energy becomes uniform, and then obtains, by FFT, the spectrum of the voice signal with the emphasized high-frequency harmonic components;
  • it calculates the slope of each frequency point in the spectrum, determines the center frequency points of the voice signal according to the slopes, determines the harmonics according to the center frequency points, counts the number of harmonics, and, when the number of harmonics is greater than a preset threshold, determines that speech is present in the voice signal.
  • the pitch frequency is determined by calculating the frequency differences between adjacent harmonics. Determining the harmonics, the pitch frequency, and the presence of speech in this way is not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing, and allows speech presence detection, harmonic determination, and pitch frequency determination to be performed together.
  • the disclosed apparatus and method may be implemented in other manners.
  • the device embodiments described above are merely illustrative.
  • the division into units is merely a division by logical function.
  • multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed.
  • the mutual coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection through some interface, device or unit, and may be in an electrical, mechanical or other form.
  • the units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above integrated unit can be implemented in the form of hardware or in the form of a software functional unit.
  • If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium.
  • the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be implemented in the form of a software product stored in a storage medium.
  • the software product includes instructions for a computer device, which may be a personal computer, a server, a network device, or the like.
  • the foregoing storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disc, and the like.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)
  • Telephonic Communication Services (AREA)

Abstract

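A voice processing method and apparatus, used to solve the problem of a high misjudgment rate in prior-art voice processing. The method includes: user equipment performs windowing and framing on an acquired voice signal (101); emphasizes the high-frequency harmonic components in the windowed and framed voice signal (102); obtains, by using a fast Fourier transform (FFT), the spectrum of the voice signal with the emphasized high-frequency harmonic components (103); calculates the slope of each frequency point in the spectrum (104); and determines the center frequency points of the voice signal according to the slopes and determines the harmonics according to the center frequency points (105).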

Description

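Voice processing method and apparatus
Technical Field
The present invention relates to the field of communications, and in particular, to a voice processing method and apparatus.
Background
In voice communication equipment or high-quality recording equipment, technologies such as speech encoding and decoding, speech pre- and post-processing, speech synthesis, and speech recognition are required. All of these speech processing technologies need to divide the speech signal into frames and then process it frame by frame, and harmonic detection is a key technology in speech processing.
Existing harmonic detection mainly uses the autocorrelation method: the autocorrelation function of the speech signal is calculated, the positions where peaks occur are detected, and the harmonics are determined from them.
However, harmonic detection using the autocorrelation method is easily disturbed by speech formants, which leads to a high harmonic misjudgment rate.
Summary
Embodiments of the present invention provide a voice processing method and apparatus to solve the problem of a high harmonic misjudgment rate in prior-art voice processing.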
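A first aspect of the present invention provides a voice processing method, including:
user equipment performs windowing and framing on an acquired voice signal;
the user equipment emphasizes high-frequency harmonic components in the windowed and framed voice signal;
the user equipment obtains, by using a fast Fourier transform (FFT), a spectrum of the voice signal with the emphasized high-frequency harmonic components;
the user equipment calculates a slope of each frequency point in the spectrum; and
the user equipment determines a center frequency point of the voice signal according to the slope, and determines a harmonic according to the center frequency point.
With reference to the first aspect, in a first possible implementation, after the determining a harmonic according to the center frequency point, the method further includes:
the user equipment counts the number of harmonics and determines whether the number of harmonics is greater than a preset threshold; if so, it determines that speech is present in the voice signal.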
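With reference to the first aspect, in a second possible implementation, after the determining a harmonic according to the center frequency point, the method further includes:
the user equipment determines a pitch frequency by calculating the frequency differences between adjacent harmonics.
With reference to the first aspect, in a third possible implementation, the calculating, by the user equipment, of the slope of each frequency point in the spectrum includes:
the user equipment calculates the slope g of each frequency point in the spectrum by using a Sobel operator, where g = A*B, A is the Sobel operator, and B is the matrix of the spectrum.
With reference to the first aspect, in a fourth possible implementation, the determining, by the user equipment, of the center frequency point of the voice signal according to the slope includes:
the user equipment obtains a rising edge and a falling edge of the center frequency point according to the slope, and determines the center frequency point of the voice signal by using the rising edge and the falling edge.
With reference to the first aspect, in a fifth possible implementation, the determining, by the user equipment, of the pitch frequency by calculating the frequency differences between adjacent harmonics includes:
the user equipment calculates the frequency differences between adjacent harmonics, finds the frequency difference that occurs most often, and determines that frequency difference as the pitch frequency.
With reference to the first aspect, in a sixth possible implementation, before the user equipment calculates the slope of each frequency point in the spectrum, the method further includes: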
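the user equipment calculates a log spectrum X_HE(t,f) of the high-energy component in the voice signal, where the log spectrum of the high-energy component is given by
[Formula: Figure PCTCN2015085209-appb-000001]
where max denotes the maximum operator, X_STFT(t,f) is the spectrum of the voice signal, and S_NN(t,f) is the calculated spectrum of the background noise.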
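With reference to the first aspect, in a seventh possible implementation, the emphasizing, by the user equipment, of the high-frequency harmonic components in the windowed and framed voice signal includes:
the user equipment emphasizes the high-frequency harmonic components in the windowed and framed voice signal by using a low-order high-pass filter.
A second aspect of the present invention provides a voice processing apparatus, including:
a windowing and framing module, configured to perform windowing and framing on an acquired voice signal;
an emphasis module, configured to: after the windowing and framing module performs windowing and framing on the acquired voice signal, emphasize high-frequency harmonic components in the windowed and framed voice signal;
an acquisition module, configured to: after the emphasis module emphasizes the high-frequency harmonic components, obtain, by using an FFT, the spectrum of the voice signal with the emphasized high-frequency harmonic components;
a first calculation module, configured to: after the acquisition module obtains the spectrum of the voice signal with the emphasized high-frequency harmonic components, calculate the slope of each frequency point in the spectrum; and
a first determining module, configured to: after the first calculation module calculates the slope of each frequency point in the spectrum, determine a center frequency point of the voice signal according to the slope, and determine a harmonic according to the center frequency point.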
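With reference to the second aspect, in a first possible implementation, the apparatus further includes:
a statistics module, configured to count the number of harmonics after the first determining module determines the center frequency point of the voice signal and determines the harmonics according to the center frequency point;
a judging module, configured to determine, after the statistics module counts the number of harmonics, whether the number of harmonics is greater than a preset threshold; and
a second determining module, configured to determine that speech is present in the voice signal when the judging module determines that the number of harmonics is greater than the preset threshold.
With reference to the second aspect, in a second possible implementation, the apparatus further includes:
a second calculation module, configured to calculate the frequency differences between adjacent harmonics; and
a third determining module, configured to determine a pitch frequency according to the frequency differences between adjacent harmonics calculated by the second calculation module.
With reference to the second aspect, in a third possible implementation,
the first calculation module is specifically configured to calculate the slope g of each frequency point in the spectrum by using a Sobel operator, where g = A*B, A is the Sobel operator, and B is the matrix of the spectrum.
With reference to the second aspect, in a fourth possible implementation,
the first determining module is specifically configured to obtain a rising edge and a falling edge of the center frequency point according to the slope, and determine the center frequency point of the voice signal by using the rising edge and the falling edge.
With reference to the second aspect, in a fifth possible implementation,
the third determining module is specifically configured to find, among the frequency differences between adjacent harmonics, the frequency difference that occurs most often, and determine that frequency difference as the pitch frequency.
With reference to the second aspect, in a sixth possible implementation, the apparatus further includes: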
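a third calculation module, configured to calculate the log spectrum X_HE(t,f) of the high-energy component in the voice signal, where the log spectrum of the high-energy component is given by
[Formula: Figure PCTCN2015085209-appb-000002]
where max denotes the maximum operator, X_STFT(t,f) is the spectrum of the voice signal, and S_NN(t,f) is the calculated spectrum of the background noise.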
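With reference to the second aspect, in a seventh possible implementation,
the emphasis module is specifically configured to emphasize the high-frequency harmonic components in the windowed and framed voice signal by using a low-order high-pass filter.
A third aspect of the present invention provides a voice processing apparatus, including a processor,
where the processor is configured to perform the following steps:
performing windowing and framing on an acquired voice signal;
emphasizing high-frequency harmonic components in the windowed and framed voice signal;
obtaining, by using an FFT, the spectrum of the voice signal with the emphasized high-frequency harmonic components;
calculating the slope of each frequency point in the spectrum; and
determining a center frequency point of the voice signal according to the slope, and determining a harmonic according to the center frequency point.
With reference to the third aspect, in a first possible implementation,
the processor is further configured to perform the following steps:
counting the number of harmonics, determining whether the number of harmonics is greater than a preset threshold, and if so, determining that speech is present in the voice signal; and
determining a pitch frequency by calculating the frequency differences between adjacent harmonics.
With the foregoing technical solutions, the user equipment performs windowing and framing on the acquired voice signal, emphasizes the high-frequency harmonic components in the windowed and framed voice signal so that the energy of the harmonics becomes uniform, obtains by FFT the spectrum of the voice signal with the emphasized high-frequency harmonic components, calculates the slope of each frequency point in the spectrum, determines the center frequency points of the voice signal according to the slopes, and determines the harmonics according to the center frequency points. Harmonics determined in this way are not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing.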
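Brief Description of Drawings
FIG. 1 is a schematic diagram of an embodiment of a voice processing method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of another embodiment of the voice processing method according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of another embodiment of the voice processing method according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of an embodiment of a voice processing apparatus according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of another embodiment of the voice processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of another embodiment of the voice processing apparatus according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention provide a voice processing method and apparatus to solve the problem of a high harmonic misjudgment rate in prior-art voice processing, improve the accuracy of speech discrimination, and improve the quality of speech processing.
The following clearly and completely describes the technical solutions in the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.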
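The technical solutions of the present invention may be applied to various communication systems, for example, GSM, Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), General Packet Radio Service (GPRS), and Long Term Evolution (LTE).
User equipment (UE), also referred to as a mobile terminal, mobile user equipment, or the like, may communicate with one or more core networks through a radio access network (RAN). The user equipment may be a mobile terminal, such as a mobile phone (or "cellular" phone) or a computer having a mobile terminal; for example, it may be a portable, pocket-sized, handheld, computer built-in, or vehicle-mounted mobile apparatus that exchanges voice and/or data with the radio access network.
A base station may be a base transceiver station (BTS) in GSM or CDMA, a NodeB in WCDMA, or an evolved NodeB (eNB or e-NodeB) in LTE; this is not limited in the present invention.
In existing approaches, speech presence detection, pitch frequency detection, and harmonic detection are handled separately. Speech presence detection based on a single speech feature parameter (or a combination of several) has weak resistance to noise interference, which leads to a high misjudgment rate; and pitch frequency and harmonic detection using the autocorrelation method are easily disturbed by speech formants, causing misjudgment of the pitch frequency.
According to the embodiments of the present invention, a voice processing method is provided that solves the problem of a high harmonic misjudgment rate in prior-art voice processing and performs speech presence detection, harmonic determination, and pitch frequency determination together; it is a technical solution based on an entirely new approach.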
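Referring to FIG. 1, an embodiment of the voice processing method according to the embodiments of the present invention includes:
101. The user equipment performs windowing and framing on the acquired voice signal.
In this embodiment of the present invention, windowing the voice signal is a necessary step. Because the user equipment can process only signals of finite length, the original signal X(t) must be truncated to a duration T (the sampling time), that is, made finite, becoming X_T(t) before further processing; this process is windowing. A Hamming window may be used to window the voice signal to reduce the influence of the Gibbs effect. A voice signal is non-stationary, so it must be divided into frames for processing: the signal is split into a sequence of successive frames about 20 ms to 30 ms long, and within each frame the voice signal is treated as stationary.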
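A minimal Python sketch of this step, assuming a 25 ms frame (within the 20 ms to 30 ms range stated above) and a 10 ms hop between successive frames. The hop length and the function names `hamming` and `frame_and_window` are illustrative assumptions, not taken from the patent.

```python
import math


def hamming(n):
    """Hamming window of length n: w[k] = 0.54 - 0.46 * cos(2*pi*k / (n - 1))."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def frame_and_window(x, fs, frame_ms=25.0, hop_ms=10.0):
    """Split signal x (sampled at fs Hz) into overlapping frames and window each one.

    frame_ms follows the 20-30 ms range in the text; hop_ms is an assumed overlap.
    Trailing samples that do not fill a whole frame are dropped.
    """
    frame_len = int(round(fs * frame_ms / 1000.0))
    hop = int(round(fs * hop_ms / 1000.0))
    w = hamming(frame_len)
    frames = []
    for start in range(0, len(x) - frame_len + 1, hop):
        segment = x[start:start + frame_len]
        frames.append([s * wk for s, wk in zip(segment, w)])
    return frames


# Example: 1 s of a 200 Hz tone at 8 kHz -> 200-sample frames every 80 samples.
fs = 8000
x = [math.sin(2 * math.pi * 200 * n / fs) for n in range(fs)]
frames = frame_and_window(x, fs)
print(len(frames), len(frames[0]))  # 98 200
```

The tapered window ends (0.08 at both edges) are what reduce the truncation ripple attributed above to the Gibbs effect.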
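It should be noted that the voice signal acquired by the user equipment may be obtained from a base station or detected by the user equipment itself; this is not specifically limited herein.
102. The user equipment emphasizes the high-frequency harmonic components in the windowed and framed voice signal.
In this embodiment of the present invention, because the high-frequency harmonics of a voice signal have relatively weak energy, the high-frequency harmonic components are emphasized, that is, the peaks of the high-frequency harmonics are raised to strengthen them, so that the energy of the harmonics becomes uniform.
103. The user equipment obtains, by using a fast Fourier transform (FFT), the spectrum of the voice signal with the emphasized high-frequency harmonic components.
In this embodiment of the present invention, the time-domain voice signal is transformed into its spectrum by using the fast Fourier transform (FFT).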
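For illustration, the sketch below computes a one-sided magnitude spectrum with a textbook recursive radix-2 FFT, zero-padding the frame to a power of two. It is a plain-Python stand-in for whatever FFT routine an implementation would actually use; `fft` and `magnitude_spectrum` are hypothetical names.

```python
import cmath
import math


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    out = [0j] * n
    for k in range(n // 2):
        twiddle = cmath.exp(-2j * math.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out


def magnitude_spectrum(frame, nfft=None):
    """Return |X[k]| for k = 0..nfft/2 of a zero-padded frame.

    Bin k corresponds to frequency k * fs / nfft.
    """
    if nfft is None:
        nfft = 1
        while nfft < len(frame):
            nfft *= 2
    if nfft < len(frame) or nfft & (nfft - 1):
        raise ValueError("nfft must be a power of two >= len(frame)")
    padded = list(frame) + [0.0] * (nfft - len(frame))
    spectrum = fft(padded)
    return [abs(c) for c in spectrum[: nfft // 2 + 1]]


# A cosine completing exactly 4 cycles in 64 samples peaks at bin 4.
tone = [math.cos(2 * math.pi * 4 * n / 64) for n in range(64)]
mag = magnitude_spectrum(tone)
print(mag.index(max(mag)))  # 4
```

Only bins 0 to nfft/2 are kept because the spectrum of a real signal is conjugate-symmetric.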
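104. The user equipment calculates the slope of each frequency point in the spectrum.
In this embodiment of the present invention, the slope of each frequency point is obtained by calculating the derivative along the frequency axis of the spectrum.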
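A minimal sketch of the per-bin slope, assuming the derivative is approximated by a central difference along the frequency axis (one-sided at the two ends); the slope is expressed per bin, not per hertz. This is a simpler stand-in for the Sobel-based slope described in step 305, and `spectral_slope` is a hypothetical name.

```python
def spectral_slope(spectrum):
    """Approximate d|X|/df at every bin of a one-frame spectrum.

    Interior bins use the central difference (X[k+1] - X[k-1]) / 2;
    the first and last bins use one-sided differences.
    """
    n = len(spectrum)
    if n < 2:
        return [0.0] * n
    slope = [0.0] * n
    slope[0] = spectrum[1] - spectrum[0]
    slope[-1] = spectrum[-1] - spectrum[-2]
    for k in range(1, n - 1):
        slope[k] = (spectrum[k + 1] - spectrum[k - 1]) / 2.0
    return slope


print(spectral_slope([0.0, 1.0, 2.0, 1.0, 0.0]))  # [1.0, 1.0, 0.0, -1.0, -1.0]
```

A spectral peak shows up as positive slopes on its low-frequency side and negative slopes on its high-frequency side, which is what the next step relies on.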
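105. The user equipment determines the center frequency points of the voice signal according to the slopes, and determines the harmonics according to the center frequency points.
In this embodiment of the present invention, for example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, it can be determined that 200 Hz is a center frequency point of the voice signal, and one harmonic is determined from each center frequency point.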
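The rule in this example (a positive slope followed by a negative slope brackets a peak) can be sketched as follows. Placing the center at the linear zero crossing of the slope between the two frequency points is an assumption that happens to reproduce the 200 Hz result of the example; `harmonic_centers` and its `min_rise` parameter are hypothetical.

```python
def harmonic_centers(freqs, slope, min_rise=0.0):
    """Return the center frequency of each spectral peak found from the slope.

    A peak is a rising edge (slope > min_rise) immediately followed by a
    falling edge (slope <= 0). The center is placed where the slope crosses
    zero, by linear interpolation between the two frequency points.
    min_rise (>= 0) can be raised to ignore small ripples caused by noise.
    """
    centers = []
    for k in range(1, len(slope)):
        s_rise, s_fall = slope[k - 1], slope[k]
        if s_rise > min_rise and s_fall <= 0:
            frac = s_rise / (s_rise - s_fall)  # in (0, 1]
            centers.append(freqs[k - 1] + frac * (freqs[k] - freqs[k - 1]))
    return centers


# The example from the text: +1 at 180 Hz, -1 at 220 Hz -> one harmonic at 200 Hz.
print(harmonic_centers([180.0, 220.0], [1.0, -1.0]))  # [200.0]
```

Each returned center counts as one harmonic, so `len(...)` of the result is the harmonic count used in the speech-presence decision.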
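In this embodiment of the present invention, the user equipment performs windowing and framing on the acquired voice signal, emphasizes the high-frequency harmonic components in the windowed and framed voice signal so that the energy of the harmonics is uniform, obtains by FFT the spectrum of the voice signal with the emphasized high-frequency harmonic components, calculates the slope of each frequency point in the spectrum, determines the center frequency points of the voice signal according to the slopes, and determines the harmonics according to the center frequency points. Harmonics determined in this way are not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing.
Referring to FIG. 2, on the basis of the foregoing embodiment, another embodiment of the voice processing method according to the embodiments of the present invention includes:
201. The user equipment performs windowing and framing on the acquired voice signal.
In this embodiment of the present invention, windowing the voice signal is a necessary step. Because the user equipment can process only signals of finite length, the original signal X(t) must be truncated to a duration T (the sampling time), that is, made finite, becoming X_T(t) before further processing; this process is windowing. A Hamming window may be used to window the voice signal to reduce the influence of the Gibbs effect. A voice signal is non-stationary, so it must be divided into frames for processing: the signal is split into a sequence of successive frames about 20 ms to 30 ms long, and within each frame the voice signal is treated as stationary.
It should be noted that the voice signal acquired by the user equipment may be obtained from a base station or detected by the user equipment itself; this is not specifically limited herein.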
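202. The user equipment emphasizes the high-frequency harmonic components in the windowed and framed voice signal.
In this embodiment of the present invention, because the high-frequency harmonics of a voice signal have relatively weak energy, the high-frequency harmonic components are emphasized, that is, the peaks of the high-frequency harmonics are raised to strengthen them, so that the energy of the harmonics becomes uniform.
203. The user equipment obtains, by FFT, the spectrum of the voice signal with the emphasized high-frequency harmonic components.
In this embodiment of the present invention, the time-domain voice signal is transformed into its spectrum by FFT.
204. The user equipment calculates the slope of each frequency point in the spectrum.
In this embodiment of the present invention, the slope of each frequency point is obtained by calculating the derivative along the frequency axis of the spectrum.
205. The user equipment determines the center frequency points of the voice signal according to the slopes, and determines the harmonics according to the center frequency points.
In this embodiment of the present invention, for example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, it can be determined that 200 Hz is a center frequency point of the voice signal, and one harmonic is determined from each center frequency point.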
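206. The user equipment counts the number of harmonics.
207. The user equipment determines whether the number of harmonics is greater than a preset threshold; if so, step 208 is performed.
208. When the number of harmonics is greater than the preset threshold, it is determined that speech is present in the voice signal.
In this embodiment of the present invention, the preset threshold may be 15; this is not specifically limited herein.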
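A minimal sketch of steps 206 to 208, using the example threshold of 15 from the text. The comparison is strict ("greater than"), so exactly 15 harmonics does not count as speech. The function name `detect_speech` is hypothetical.

```python
def detect_speech(harmonic_centers_hz, threshold=15):
    """Count the detected harmonics and report whether speech is present.

    Returns (count, is_speech), where is_speech is True only when the count
    is strictly greater than the preset threshold.
    """
    count = len(harmonic_centers_hz)
    return count, count > threshold


# 16 harmonics of a 200 Hz pitch (200, 400, ..., 3200 Hz) -> speech present.
print(detect_speech([200.0 * i for i in range(1, 17)]))  # (16, True)
print(detect_speech([200.0 * i for i in range(1, 16)]))  # (15, False)
```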
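209. The user equipment determines the pitch frequency by calculating the frequency differences between adjacent harmonics.
In this embodiment of the present invention, speech can be divided into unvoiced and voiced sounds according to whether the vocal cords vibrate during articulation. Voiced sounds carry most of the energy of speech and show obvious periodicity in the time domain, whereas unvoiced sounds resemble white noise and have no obvious periodicity. When a voiced sound is produced, airflow through the glottis drives the vocal cords into relaxation-oscillation vibration, producing a quasi-periodic train of excitation pulses. The frequency of this vocal-cord vibration is called the pitch frequency. The pitch frequency generally depends on the length, thickness, toughness, and stiffness of an individual's vocal cords and on pronunciation habits, and to a large extent reflects individual characteristics. It also varies with gender and age: in general, male speakers have lower pitch frequencies, while female speakers and children have relatively higher ones.
In this embodiment of the present invention, the user equipment performs windowing and framing on the acquired voice signal, emphasizes the high-frequency harmonic components in the windowed and framed voice signal so that the energy of the harmonics is uniform, obtains by FFT the spectrum of the emphasized voice signal, calculates the slope of each frequency point in the spectrum, determines the center frequency points of the voice signal according to the slopes, and determines the harmonics according to the center frequency points. It then counts the number of harmonics and, when that number is greater than the preset threshold, determines that speech is present in the voice signal. Finally, it determines the pitch frequency by calculating the frequency differences between adjacent harmonics. Determining the harmonics, the pitch frequency, and the presence of speech in this way is not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing, and allows speech presence detection, harmonic determination, and pitch frequency determination to be performed together.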
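Referring to FIG. 3, a specific embodiment of the voice processing method according to the embodiments of the present invention includes:
301. The user equipment performs windowing and framing on the acquired voice signal.
In this embodiment of the present invention, windowing the voice signal is a necessary step. Because the user equipment can process only signals of finite length, the original signal X(t) must be truncated to a duration T (the sampling time), that is, made finite, becoming X_T(t) before further processing; this process is windowing. A Hamming window may be used to window the voice signal to reduce the influence of the Gibbs effect. A voice signal is non-stationary, so it must be divided into frames for processing: the signal is split into a sequence of successive frames about 20 ms to 30 ms long, and within each frame the voice signal is treated as stationary.
302. The user equipment emphasizes the high-frequency harmonic components in the windowed and framed voice signal by using a low-order high-pass filter.
In this embodiment of the present invention, a high-pass filter is used to suppress low-frequency noise and emphasize the high-frequency harmonic components of the voice signal, that is, the peaks of the high-frequency harmonics are raised to strengthen them, so that the energy of the harmonics is uniform.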
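The patent does not specify the filter, so the sketch below assumes the most common low-order choice, a first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1] with alpha = 0.97. Its gain is 1 - alpha (0.03) at 0 Hz and 1 + alpha (1.97) at the Nyquist frequency, which is what lifts the weak high-frequency harmonics relative to the low ones. The name `pre_emphasis` and the value of alpha are assumptions.

```python
def pre_emphasis(frame, alpha=0.97):
    """First-order high-pass (pre-emphasis) filter: y[n] = x[n] - alpha * x[n-1].

    alpha = 0.97 is a conventional value, assumed here. The sample before the
    start of the frame is taken as 0.
    """
    out = []
    prev = 0.0
    for sample in frame:
        out.append(sample - alpha * prev)
        prev = sample
    return out


# A constant (0 Hz) input is strongly attenuated; an alternating (Nyquist) input is boosted.
print(pre_emphasis([1.0, 1.0, 1.0]))    # approximately [1.0, 0.03, 0.03]
print(pre_emphasis([1.0, -1.0, 1.0]))   # approximately [1.0, -1.97, 1.97]
```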
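303. The user equipment obtains, by FFT, the spectrum of the voice signal with the emphasized high-frequency harmonic components.
In this embodiment of the present invention, the time-domain voice signal is transformed into its spectrum by FFT.
304. The user equipment calculates the log spectrum of the high-energy component in the voice signal.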
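In this embodiment of the present invention, the user equipment calculates the log spectrum X_HE(t,f) of the high-energy component in the voice signal, where the log spectrum of the high-energy component is given by
[Formula: Figure PCTCN2015085209-appb-000003]
where max denotes the maximum operator, X_STFT(t,f) is the spectrum of the voice signal, and S_NN(t,f) is the calculated spectrum of the background noise. Calculating the log spectrum of the high-energy component makes it easier to calculate the slope of each frequency point in the spectrum.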
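305. The user equipment calculates the slope of each frequency point in the spectrum by using a Sobel operator.
In this embodiment of the present invention, the user equipment calculates the slope g of each frequency point in the spectrum by using a Sobel operator, where g = A*B, A is the Sobel operator, and B is the matrix of the spectrum.
It should be noted that A may be
[Matrix: Figure PCTCN2015085209-appb-000004]
which is not specifically limited herein.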
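Because the matrix A appears only as an image, the sketch below assumes the standard 3x3 Sobel kernel oriented to differentiate along the frequency axis, and interprets g = A*B as sliding A over the time-frequency matrix B (rows = frames, columns = frequency bins) with edge replication. It is applied as a correlation, so a positive g means the spectrum rises with frequency. The 1-2-1 weighting across three neighbouring frames is what makes this smoother than a single-line difference. `SOBEL_FREQ` and `sobel_slope` are hypothetical names.

```python
# Assumed Sobel kernel: differences along frequency (columns),
# 1-2-1 smoothing across neighbouring frames (rows).
SOBEL_FREQ = [
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
]


def sobel_slope(B, kernel=SOBEL_FREQ):
    """Slope g at every (frame, bin) cell of spectrogram B via 3x3 correlation.

    B is a list of frames, each a list of per-bin magnitudes (or log magnitudes).
    Cells outside B are replaced by the nearest edge cell.
    """
    n_t, n_f = len(B), len(B[0])

    def cell(t, f):
        t = min(max(t, 0), n_t - 1)
        f = min(max(f, 0), n_f - 1)
        return B[t][f]

    g = [[0.0] * n_f for _ in range(n_t)]
    for t in range(n_t):
        for f in range(n_f):
            acc = 0.0
            for i in range(3):
                for j in range(3):
                    acc += kernel[i][j] * cell(t + i - 1, f + j - 1)
            g[t][f] = acc
    return g


# Three identical frames, each with one peak at bin 2.
B = [[0.0, 1.0, 2.0, 1.0, 0.0]] * 3
print(sobel_slope(B)[1])  # [4.0, 8.0, 0.0, -8.0, -4.0]
```

The sign pattern (positive, zero, negative) matches the rising edge, center, and falling edge used in the next step; only the scale differs from a plain difference.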
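306. The user equipment determines the center frequency points of the voice signal according to the slopes, and determines the harmonics according to the center frequency points.
Optionally, the user equipment obtains the rising edge and the falling edge of a center frequency point according to the slope, and determines the center frequency point of the voice signal by using the rising edge and the falling edge. For example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, it can be determined that 200 Hz is a center frequency point of the voice signal, and one harmonic is determined from each center frequency point.
307. The user equipment counts the number of harmonics.
308. The user equipment determines whether the number of harmonics is greater than a preset threshold; if so, step 309 is performed.
309. When the number of harmonics is greater than the preset threshold, it is determined that speech is present in the voice signal.
In this embodiment of the present invention, the preset threshold may be 15; this is not specifically limited herein.
310. The user equipment calculates the frequency differences between adjacent harmonics, finds the frequency difference that occurs most often, and determines that frequency difference as the pitch frequency.
With reference to the description of step 209, in this embodiment of the present invention, suppose for example that the pitch frequency of a voice is about 200 Hz. If the frequency differences between adjacent harmonics are 180, 190, 200, 200, 210, 190, and 200 Hz, the most frequent value is 200 Hz; counting therefore gives 200 Hz as the most frequent frequency difference, and the pitch frequency of this speech is determined to be 200 Hz.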
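The most-frequent-difference rule can be sketched with `collections.Counter`. Rounding each difference to a 10 Hz grid before counting is an added assumption (measured harmonic spacings rarely repeat exactly); with the example differences above it returns 200 Hz. On ties, `Counter.most_common` returns the value encountered first. `pitch_from_harmonics` and `grid_hz` are hypothetical names.

```python
from collections import Counter


def pitch_from_harmonics(centers_hz, grid_hz=10.0):
    """Estimate the pitch as the most frequent spacing between adjacent harmonics.

    centers_hz: harmonic center frequencies in ascending order.
    grid_hz: spacings are rounded to multiples of this (an assumed tolerance).
    Returns None when fewer than two harmonics are available.
    """
    diffs = [b - a for a, b in zip(centers_hz, centers_hz[1:])]
    if not diffs:
        return None
    rounded = [round(d / grid_hz) * grid_hz for d in diffs]
    value, _count = Counter(rounded).most_common(1)[0]
    return value


# Harmonics whose spacings are 180, 190, 200, 200, 210, 190, 200 Hz (the example above).
centers = [200, 380, 570, 770, 970, 1180, 1370, 1570]
print(pitch_from_harmonics(centers))  # 200.0
```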
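In this embodiment of the present invention, the user equipment performs windowing and framing on the acquired voice signal and then uses a low-order high-pass filter to emphasize the high-frequency harmonic components in the windowed and framed voice signal, strengthening the high-frequency harmonics, and obtains by FFT the spectrum of the emphasized voice signal. Calculating the log spectrum of the high-energy component in the voice signal makes it easier to calculate the slope of each frequency point in the spectrum, and calculating that slope with the Sobel operator is smoother and more accurate than computing it from a single spectral line. The center frequency points of the voice signal are determined according to the slopes and the harmonics according to the center frequency points; the number of harmonics is then counted, and when it is greater than the preset threshold, it is determined that speech is present in the voice signal. The frequency differences between adjacent harmonics are calculated, and the most frequent difference is determined as the pitch frequency. Determining the harmonics, the pitch frequency, and the presence of speech in this way is not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing, and allows speech presence detection, harmonic determination, and pitch frequency determination to be performed together.
To better implement the foregoing methods of the embodiments of the present invention, related apparatuses for cooperating with the methods are further provided below.
Referring to FIG. 4, an embodiment of a voice processing apparatus 400 according to the embodiments of the present invention includes:
a windowing and framing module 401, configured to perform windowing and framing on an acquired voice signal;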
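In this embodiment of the present invention, windowing the voice signal is a necessary step. Because the user equipment can process only signals of finite length, the original signal X(t) must be truncated to a duration T (the sampling time), that is, made finite, becoming X_T(t) before further processing; this process is windowing. A Hamming window may be used to window the voice signal to reduce the influence of the Gibbs effect. A voice signal is non-stationary, so it must be divided into frames for processing: the signal is split into a sequence of successive frames about 20 ms to 30 ms long, and within each frame the voice signal is treated as stationary.
It should be noted that the voice signal acquired by the user equipment may be obtained from a base station or detected by the user equipment itself; this is not specifically limited herein.
an emphasis module 402, configured to: after the windowing and framing module 401 performs windowing and framing on the acquired voice signal, emphasize the high-frequency harmonic components in the windowed and framed voice signal;
In this embodiment of the present invention, because the high-frequency harmonics of a voice signal have relatively weak energy, the high-frequency harmonic components are emphasized, that is, the peaks of the high-frequency harmonics are raised to strengthen them, so that the energy of the harmonics is uniform.
an acquisition module 403, configured to: after the emphasis module 402 emphasizes the high-frequency harmonic components, obtain by FFT the spectrum of the voice signal with the emphasized high-frequency harmonic components;
In this embodiment of the present invention, the time-domain voice signal is transformed into its spectrum by FFT.
a first calculation module 404, configured to: after the acquisition module 403 obtains the spectrum of the voice signal with the emphasized high-frequency harmonic components, calculate the slope of each frequency point in the spectrum;
In this embodiment of the present invention, the slope of each frequency point is obtained by calculating the derivative along the frequency axis of the spectrum.
a first determining module 405, configured to: after the first calculation module 404 calculates the slope of each frequency point in the spectrum, determine the center frequency points of the voice signal according to the slopes, and determine the harmonics according to the center frequency points;
In this embodiment of the present invention, for example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, it can be determined that 200 Hz is a center frequency point of the voice signal, and one harmonic is determined from each center frequency point.
In this embodiment of the present invention, the user equipment performs windowing and framing on the acquired voice signal, emphasizes the high-frequency harmonic components in the windowed and framed voice signal so that the energy of the harmonics is uniform, obtains by FFT the spectrum of the emphasized voice signal, calculates the slope of each frequency point in the spectrum, determines the center frequency points of the voice signal according to the slopes, and determines the harmonics according to the center frequency points. Harmonics determined in this way are not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing.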
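Referring to FIG. 5, another embodiment of a voice processing apparatus 500 according to the embodiments of the present invention includes:
a windowing and framing module 501, configured to perform windowing and framing on an acquired voice signal;
an emphasis module 502, configured to: after the windowing and framing module 501 performs windowing and framing on the acquired voice signal, emphasize the high-frequency harmonic components in the windowed and framed voice signal;
Optionally, the emphasis module is specifically configured to emphasize the high-frequency harmonic components in the windowed and framed voice signal by using a low-order high-pass filter.
In this embodiment of the present invention, a high-pass filter is used to suppress low-frequency noise and emphasize the high-frequency harmonic components of the voice signal, that is, the peaks of the high-frequency harmonics are raised to strengthen them, so that the energy of the harmonics is uniform.
an acquisition module 503, configured to: after the emphasis module 502 emphasizes the high-frequency harmonic components, obtain by FFT the spectrum of the voice signal with the emphasized high-frequency harmonic components;
In this embodiment of the present invention, the time-domain voice signal is transformed into its spectrum by FFT.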
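a third calculation module 504, configured to: after the acquisition module obtains the spectrum of the voice signal with the emphasized high-frequency harmonic components, calculate the log spectrum of the high-energy component in the voice signal, where the log spectrum of the high-energy component is given by
[Formula: Figure PCTCN2015085209-appb-000005]
where max denotes the maximum operator, X_STFT(t,f) is the spectrum of the voice signal, and S_NN(t,f) is the calculated spectrum of the background noise.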
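In this embodiment of the present invention, calculating the log spectrum of the high-energy component in the voice signal makes it easier to calculate the slope of each frequency point in the spectrum.
a first calculation module 505, configured to: after the third calculation module 504 calculates the log spectrum of the high-energy component in the voice signal, calculate the slope of each frequency point in the spectrum;
Optionally, the first calculation module is specifically configured to calculate the slope g of each frequency point in the spectrum by using a Sobel operator, where g = A*B, A is the Sobel operator, and B is the matrix of the spectrum.
It should be noted that A may be
[Matrix: Figure PCTCN2015085209-appb-000006]
which is not specifically limited herein.
Calculating the slope of each frequency point with the Sobel operator is smoother and more accurate than computing it from a single spectral line.
a first determining module 506, configured to: after the first calculation module 505 calculates the slope of each frequency point in the spectrum, determine the center frequency points of the voice signal according to the slopes, and determine the harmonics according to the center frequency points;
Optionally, the first determining module is specifically configured to obtain the rising edge and the falling edge of a center frequency point according to the slope, and determine the center frequency point of the voice signal by using the rising edge and the falling edge.
For example, if the slope at 180 Hz is about +1 and the slope at the next frequency point, 220 Hz, is about -1, it can be determined that 200 Hz is a center frequency point of the voice signal, and one harmonic is determined from each center frequency point.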
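a statistics module 507, configured to count the number of harmonics after the first determining module 506 determines the center frequency points of the voice signal and determines the harmonics according to them;
a judging module 508, configured to determine, after the statistics module 507 counts the number of harmonics, whether the number of harmonics is greater than a preset threshold;
In this embodiment of the present invention, the preset threshold may be 15; this is not specifically limited herein.
a second determining module 509, configured to determine that speech is present in the voice signal when the judging module 508 determines that the number of harmonics is greater than the preset threshold;
a second calculation module 510, configured to calculate the frequency differences between adjacent harmonics after the second determining module 509 determines that speech is present in the voice signal;
a third determining module 511, configured to determine the pitch frequency according to the frequency differences between adjacent harmonics calculated by the second calculation module 510.
Optionally, the third determining module is specifically configured to find, among the frequency differences between adjacent harmonics, the frequency difference that occurs most often, and determine that frequency difference as the pitch frequency.
In this embodiment of the present invention, suppose for example that the pitch frequency of a voice is about 200 Hz. If the frequency differences between adjacent harmonics are 180, 190, 200, 200, 210, 190, and 200 Hz, the most frequent value is 200 Hz; counting therefore gives 200 Hz as the most frequent frequency difference, and the pitch frequency of this speech is determined to be 200 Hz.
In this embodiment of the present invention, the user equipment performs windowing and framing on the acquired voice signal and then uses a low-order high-pass filter to emphasize the high-frequency harmonic components in the windowed and framed voice signal, strengthening the high-frequency harmonics, and obtains by FFT the spectrum of the emphasized voice signal. Calculating the log spectrum of the high-energy component in the voice signal makes it easier to calculate the slope of each frequency point in the spectrum, and calculating that slope with the Sobel operator is smoother and more accurate than computing it from a single spectral line. The center frequency points of the voice signal are determined according to the slopes and the harmonics according to the center frequency points; the number of harmonics is then counted, and when it is greater than the preset threshold, it is determined that speech is present in the voice signal. The frequency differences between adjacent harmonics are calculated, and the most frequent difference is determined as the pitch frequency. This avoids interference from formants, improves the accuracy of speech discrimination and the quality of speech processing, and allows speech presence detection, harmonic determination, and pitch frequency determination to be performed together.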
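The embodiments shown in FIG. 4 and FIG. 5 describe the specific structure of voice processing from the perspective of functional modules. The following describes the specific structure of voice processing from the perspective of hardware with reference to the embodiment of FIG. 6:
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of a voice processing apparatus 600 according to an embodiment of the present invention. The apparatus may include at least one processor 601 (for example, a CPU, Central Processing Unit), at least one network interface or other communication interface, a memory 602, at least one communication bus, at least one input device 603, at least one output device 604, and an uninterruptible power supply (UPS) 605, with the communication bus implementing connection and communication among these components. The processor 601 is configured to execute executable modules, such as computer programs, stored in the memory 602. The memory 602 may include a high-speed random access memory (RAM), and may further include a non-volatile memory, for example, at least one disk memory. The communication connection between the system gateway and at least one other network element is implemented through at least one network interface (wired or wireless), which may use the Internet, a wide area network, a local area network, a metropolitan area network, and the like.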
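As shown in FIG. 6, in some implementations, program instructions are stored in the memory 602 and can be executed by the processor 601, and the processor 601 specifically performs the following steps:
performing windowing and framing on an acquired voice signal;
emphasizing high-frequency harmonic components in the windowed and framed voice signal;
obtaining, by using a fast Fourier transform (FFT), the spectrum of the voice signal with the emphasized high-frequency harmonic components;
calculating the slope of each frequency point in the spectrum; and
determining the center frequency points of the voice signal according to the slopes, and determining the harmonics according to the center frequency points.
In some implementations, the processor 601 may further perform the following steps:
counting the number of harmonics, determining whether the number of harmonics is greater than a preset threshold, and if so, determining that speech is present in the voice signal; and
determining the pitch frequency by calculating the frequency differences between adjacent harmonics.
It should be noted that each of the foregoing embodiments has its own emphasis; for a part not described in detail in one embodiment, refer to the related descriptions in other embodiments. For example, for parts not described in detail in the embodiment of FIG. 6, refer to the related descriptions of the method or apparatus embodiments of FIG. 1 to FIG. 5.
It can be seen that the processor performs windowing and framing on the acquired voice signal, emphasizes the high-frequency harmonic components in the windowed and framed voice signal so that the energy of the harmonics is uniform, obtains by FFT the spectrum of the emphasized voice signal, calculates the slope of each frequency point in the spectrum, determines the center frequency points of the voice signal according to the slopes, and determines the harmonics according to the center frequency points. It then counts the number of harmonics and, when that number is greater than the preset threshold, determines that speech is present in the voice signal. Finally, it determines the pitch frequency by calculating the frequency differences between adjacent harmonics. Determining the harmonics, the pitch frequency, and the presence of speech in this way is not disturbed by formants, which improves the accuracy of speech discrimination and the quality of speech processing, and allows speech presence detection, harmonic determination, and pitch frequency determination to be performed together.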
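In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the described apparatus embodiments are merely illustrative: the division into units is merely a division by logical function, and other divisions may be used in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components displayed as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the present invention essentially, or the part contributing to the prior art, or all or part of the technical solutions, may be implemented in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or some of the steps of the methods described in the embodiments of the present invention. The foregoing storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, the foregoing embodiments are merely intended to describe the technical solutions of the present invention, not to limit them. Although the present invention is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that modifications may still be made to the technical solutions described in the foregoing embodiments, or equivalent replacements may be made to some of their technical features, without such modifications or replacements causing the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.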

Claims (18)

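  1. A voice processing method, comprising:
    performing, by user equipment, windowing and framing on an acquired voice signal;
    emphasizing, by the user equipment, high-frequency harmonic components in the windowed and framed voice signal;
    obtaining, by the user equipment by using a fast Fourier transform (FFT), a spectrum of the voice signal with the emphasized high-frequency harmonic components;
    calculating, by the user equipment, a slope of each frequency point in the spectrum; and
    determining, by the user equipment, a center frequency point of the voice signal according to the slope, and determining a harmonic according to the center frequency point.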
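  2. The method according to claim 1, wherein after the determining a harmonic according to the center frequency point, the method further comprises:
    counting, by the user equipment, the number of harmonics, determining whether the number of harmonics is greater than a preset threshold, and if so, determining that speech is present in the voice signal.
  3. The method according to claim 1, wherein after the determining a harmonic according to the center frequency point, the method further comprises:
    determining, by the user equipment, a pitch frequency by calculating frequency differences between adjacent harmonics.
  4. The method according to claim 1, wherein the calculating, by the user equipment, a slope of each frequency point in the spectrum comprises:
    calculating, by the user equipment by using a Sobel operator, the slope g of each frequency point in the spectrum, wherein g = A*B, A is the Sobel operator, and B is a matrix of the spectrum.
  5. The method according to claim 1, wherein the determining, by the user equipment, a center frequency point of the voice signal according to the slope comprises:
    obtaining, by the user equipment according to the slope, a rising edge and a falling edge of the center frequency point, and determining the center frequency point of the voice signal by using the rising edge and the falling edge.
  6. The method according to claim 1, wherein the determining, by the user equipment, a pitch frequency by calculating frequency differences between adjacent harmonics comprises:
    calculating, by the user equipment, the frequency differences between adjacent harmonics, finding the frequency difference that occurs most often, and determining that frequency difference as the pitch frequency.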
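  7. The method according to claim 1, wherein before the calculating, by the user equipment, a slope of each frequency point in the spectrum, the method further comprises:
    calculating, by the user equipment, a log spectrum X_HE(t,f) of a high-energy component in the voice signal, wherein the log spectrum of the high-energy component is given by
    [Formula: Figure PCTCN2015085209-appb-100001]
    wherein max denotes the maximum operator, X_STFT(t,f) is the spectrum of the voice signal, and S_NN(t,f) is a calculated spectrum of background noise.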
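  8. The method according to claim 1, wherein the emphasizing, by the user equipment, high-frequency harmonic components in the windowed and framed voice signal comprises:
    emphasizing, by the user equipment by using a low-order high-pass filter, the high-frequency harmonic components in the windowed and framed voice signal.
  9. A voice processing apparatus, comprising:
    a windowing and framing module, configured to perform windowing and framing on an acquired voice signal;
    an emphasis module, configured to: after the windowing and framing module performs windowing and framing on the acquired voice signal, emphasize high-frequency harmonic components in the windowed and framed voice signal;
    an acquisition module, configured to: after the emphasis module emphasizes the high-frequency harmonic components, obtain, by using an FFT, a spectrum of the voice signal with the emphasized high-frequency harmonic components;
    a first calculation module, configured to: after the acquisition module obtains the spectrum of the voice signal with the emphasized high-frequency harmonic components, calculate a slope of each frequency point in the spectrum; and
    a first determining module, configured to: after the first calculation module calculates the slope of each frequency point in the spectrum, determine a center frequency point of the voice signal according to the slope, and determine a harmonic according to the center frequency point.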
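  10. The apparatus according to claim 9, wherein the apparatus further comprises:
    a statistics module, configured to count the number of harmonics after the first determining module determines the center frequency point of the voice signal and determines the harmonics according to the center frequency point;
    a judging module, configured to determine, after the statistics module counts the number of harmonics, whether the number of harmonics is greater than a preset threshold; and
    a second determining module, configured to determine that speech is present in the voice signal when the judging module determines that the number of harmonics is greater than the preset threshold.
  11. The apparatus according to claim 9, wherein the apparatus further comprises:
    a second calculation module, configured to calculate frequency differences between adjacent harmonics; and
    a third determining module, configured to determine a pitch frequency according to the frequency differences between adjacent harmonics calculated by the second calculation module.
  12. The apparatus according to claim 9, wherein
    the first calculation module is specifically configured to calculate, by using a Sobel operator, the slope g of each frequency point in the spectrum, wherein g = A*B, A is the Sobel operator, and B is a matrix of the spectrum.
  13. The apparatus according to claim 9, wherein
    the first determining module is specifically configured to obtain, according to the slope, a rising edge and a falling edge of the center frequency point, and determine the center frequency point of the voice signal by using the rising edge and the falling edge.
  14. The apparatus according to claim 9, wherein
    the third determining module is specifically configured to find, among the frequency differences between adjacent harmonics, the frequency difference that occurs most often, and determine that frequency difference as the pitch frequency.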
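  15. The apparatus according to claim 9, wherein the apparatus further comprises:
    a third calculation module, configured to calculate a log spectrum X_HE(t,f) of a high-energy component in the voice signal, wherein the log spectrum of the high-energy component is given by
    [Formula: Figure PCTCN2015085209-appb-100002]
    wherein max denotes the maximum operator, X_STFT(t,f) is the spectrum of the voice signal, and S_NN(t,f) is a calculated spectrum of background noise.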
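  16. The apparatus according to claim 9, wherein
    the emphasis module is specifically configured to emphasize, by using a low-order high-pass filter, the high-frequency harmonic components in the windowed and framed voice signal.
  17. A voice processing apparatus, comprising a processor,
    wherein the processor is configured to perform the following steps:
    performing windowing and framing on an acquired voice signal;
    emphasizing high-frequency harmonic components in the windowed and framed voice signal;
    obtaining, by using an FFT, a spectrum of the voice signal with the emphasized high-frequency harmonic components;
    calculating a slope of each frequency point in the spectrum; and
    determining a center frequency point of the voice signal according to the slope, and determining a harmonic according to the center frequency point.
  18. The apparatus according to claim 17, wherein
    the processor is further configured to perform the following steps:
    counting the number of harmonics, determining whether the number of harmonics is greater than a preset threshold, and if so, determining that speech is present in the voice signal; and
    determining a pitch frequency by calculating frequency differences between adjacent harmonics.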
PCT/CN2015/085209 2014-11-18 2015-07-27 Voice processing method and apparatus WO2016078439A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201410657804.9 2014-11-18
CN201410657804.9A CN105590629B (zh) 2014-11-18 2014-11-18 Voice processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2016078439A1 true WO2016078439A1 (zh) 2016-05-26

Family

ID=55930151

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/085209 WO2016078439A1 (zh) 2014-11-18 2015-07-27 Voice processing method and apparatus

Country Status (2)

Country Link
CN (1) CN105590629B (zh)
WO (1) WO2016078439A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
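CN108281152A (zh) * 2018-01-18 2018-07-13 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and apparatus, and storage medium
CN117116245A (zh) * 2023-10-18 2023-11-24 武汉海微科技有限公司 Harmonic generation method, apparatus, and device for sound signals, and storage medium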

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
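CN105845146B (zh) * 2016-05-23 2019-09-06 珠海市杰理科技股份有限公司 Method and apparatus for voice signal processing
CN107767880B (zh) * 2016-08-16 2021-04-16 杭州萤石网络有限公司 Voice detection method, camera, and smart home monitoring system
CN113077806B (zh) * 2021-03-23 2023-10-13 杭州网易智企科技有限公司 Audio processing method and apparatus, model training method and apparatus, medium, and device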

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1527994A (zh) * 2000-07-14 2004-09-08 国际商业机器公司 Fast frequency-domain pitch estimation
CN1659625A (zh) * 2002-05-31 2005-08-24 沃伊斯亚吉公司 Method and device for efficient frame erasure concealment in linear-prediction-based speech codecs
US6993481B2 (en) * 2000-12-04 2006-01-31 Global Ip Sound Ab Detection of speech activity using feature model adaptation
US20060095254A1 (en) * 2004-10-29 2006-05-04 Walker John Q Ii Methods, systems and computer program products for detecting musical notes in an audio signal
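CN101199002A (zh) * 2005-06-09 2008-06-11 A.G.I.株式会社 Speech analyzer detecting pitch frequency, speech analysis method, and speech analysis program
CN101496095A (zh) * 2006-07-31 2009-07-29 高通股份有限公司 Systems, methods, and apparatus for signal change detection
CN101625860A (zh) * 2008-07-10 2010-01-13 新奥特(北京)视频技术有限公司 Adaptive background-noise adjustment method in voice endpoint detection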

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
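CN108281152A (zh) * 2018-01-18 2018-07-13 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and apparatus, and storage medium
CN117116245A (zh) * 2023-10-18 2023-11-24 武汉海微科技有限公司 Harmonic generation method, apparatus, and device for sound signals, and storage medium
CN117116245B (zh) * 2023-10-18 2024-01-30 武汉海微科技有限公司 Harmonic generation method, apparatus, and device for sound signals, and storage medium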

Also Published As

Publication number Publication date
CN105590629A (zh) 2016-05-18
CN105590629B (zh) 2018-09-21

Similar Documents

Publication Publication Date Title
WO2016078439A1 (zh) Voice processing method and apparatus
US10074384B2 (en) State estimating apparatus, state estimating method, and state estimating computer program
CN111128213B (zh) Noise suppression method and system with frequency-band-wise processing
EP2363852B1 (en) Computer-based method and system of assessing intelligibility of speech represented by a speech signal
EP3493203A1 (en) Method for encoding multi-channel signal and encoder
US20210193149A1 (en) Method, apparatus and device for voiceprint recognition, and medium
CN106486131A (zh) Speech denoising method and apparatus
US20150081287A1 (en) Adaptive noise reduction for high noise environments
EP2927906B1 (en) Method and apparatus for detecting voice signal
US10089999B2 (en) Frequency domain noise detection of audio with tone parameter
Mittal et al. Study of characteristics of aperiodicity in Noh voices
US20170309297A1 (en) Methods and systems for classifying audio segments of an audio signal
CN104269180A (zh) Method for constructing quasi-clean speech for objective speech quality evaluation
US9208794B1 (en) Providing sound models of an input signal using continuous and/or linear fitting
US9058820B1 (en) Identifying speech portions of a sound model using various statistics thereof
US20150325252A1 (en) Method and device for eliminating noise, and mobile terminal
Virebrand Real-time monitoring of voice characteristics using accelerometer and microphone measurements
CN111081249A (zh) Mode selection method and apparatus, and computer-readable storage medium
Arsikere et al. Automatic height estimation using the second subglottal resonance
JP2015082093A (ja) Abnormal conversation detection device, abnormal conversation detection method, and computer program for abnormal conversation detection
Tian et al. Spoofing detection under noisy conditions: a preliminary investigation and an initial database
Valentini-Botinhao et al. Improving intelligibility in noise of HMM-generated speech via noise-dependent and-independent methods
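CN108062959B (zh) Sound noise reduction method and apparatus
CN111477246A (zh) Speech processing method and apparatus, and intelligent terminal
WO2015000401A1 (zh) Audio signal classification processing method, apparatus, and device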

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15861781

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15861781

Country of ref document: EP

Kind code of ref document: A1