WO2010108458A1 - Method and apparatus for classifying audio signals - Google Patents

Method and apparatus for classifying audio signals

Info

Publication number
WO2010108458A1
WO2010108458A1 · PCT/CN2010/071373 · CN2010071373W
Authority
WO
WIPO (PCT)
Prior art keywords
audio signal
classified
subband
sub
pitch
Prior art date
Application number
PCT/CN2010/071373
Other languages
English (en)
French (fr)
Inventor
许丽净 (Xu Lijing)
吴顺妹 (Wu Shunmei)
陈立维 (Chen Liwei)
张清 (Zhang Qing)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority to KR1020117024685A priority Critical patent/KR101327895B1/ko
Priority to AU2010227994A priority patent/AU2010227994B2/en
Priority to BRPI1013585A priority patent/BRPI1013585A2/pt
Priority to JP2012501127A priority patent/JP2012522255A/ja
Priority to SG2011070166A priority patent/SG174597A1/en
Priority to EP10755458.6A priority patent/EP2413313B1/en
Publication of WO2010108458A1 publication Critical patent/WO2010108458A1/zh
Priority to US13/246,485 priority patent/US8682664B2/en

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using spectral analysis, e.g. transform vocoders or subband vocoders
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H1/00Details of electrophonic musical instruments
    • G10H1/0008Associated control or indicating means
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2210/00Aspects or methods of musical processing having intrinsic musical character, i.e. involving musical theory or musical parameters or relying on musical knowledge, as applied in electrophonic musical tools or instruments
    • G10H2210/031Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal
    • G10H2210/046Musical analysis, i.e. isolation, extraction or identification of musical elements or musical parameters from a raw acoustic signal or from an encoded audio signal for differentiation between music and non-music signals, based on the identification of musical parameters, e.g. based on tempo detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10HELECTROPHONIC MUSICAL INSTRUMENTS; INSTRUMENTS IN WHICH THE TONES ARE GENERATED BY ELECTROMECHANICAL MEANS OR ELECTRONIC GENERATORS, OR IN WHICH THE TONES ARE SYNTHESISED FROM A DATA STORE
    • G10H2250/00Aspects of algorithms or signal processing methods without intrinsic musical character, yet specifically adapted for or used in electrophonic musical processing
    • G10H2250/025Envelope processing of music signals in, e.g. time domain, transform domain or cepstrum domain
    • G10H2250/031Spectrum envelope processing
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • G10L2025/783Detection of presence or absence of voice signals based on threshold decision

Definitions

  • The present invention relates to the field of communications technologies, and in particular to a method and an apparatus for classifying audio signals. Background art:
  • The speech encoder is good at encoding speech-type audio signals at low to medium bit rates but is not suitable for music-type audio signals; the audio encoder is suitable for both speech-type and music-type audio signals at high bit rates, but its encoding of speech-type audio signals is not ideal at low to medium bit rates.
  • An encoding process suitable for a speech/audio encoder at medium and low bit rates mainly includes: first, a signal classification module is used to discriminate the type of the audio signal; then a corresponding encoding method is selected according to the discriminated type: a speech encoder is selected for speech-type audio signals, and an audio encoder is selected for music-type audio signals.
  • the method for discriminating the type of the audio signal described above mainly includes:
  • the audio signals are classified into six categories: voice type, music type, noise type, short sequence, pending sequence, and short pending sequence.
  • Embodiments of the present invention provide a method and apparatus for classifying audio signals, which lower the complexity of audio signal classification and reduce the computational load.
  • a method of classifying an audio signal comprising:
  • a device for classifying an audio signal comprising:
  • a tone acquiring module configured to acquire a tone feature parameter of the audio signal to be classified in at least one subband
  • a classification module configured to determine, according to the acquired feature parameters, a type of the audio signal to be classified.
  • Embodiment 1 is a flowchart of a method for classifying an audio signal according to Embodiment 1 of the present invention
  • FIG. 2 is a flowchart of a method for classifying an audio signal according to Embodiment 2 of the present invention
  • 3 is a flowchart of a method for classifying an audio signal according to Embodiment 3 of the present invention
  • FIG. 4 is a block diagram of a device for classifying an audio signal according to Embodiment 4 of the present invention
  • FIG. 5 is a block diagram of a device for classifying an audio signal according to Embodiment 5 of the present invention
  • 6 is a block diagram of a classifying apparatus for an audio signal according to Embodiment 6 of the present invention.
  • An embodiment of the present invention provides a method and an apparatus for classifying an audio signal.
  • The specific implementation of the method includes: acquiring a tonal feature parameter of an audio signal to be classified in at least one subband, and determining the type of the audio signal to be classified according to the acquired feature parameter.
  • the method is implemented by a device comprising the following modules: a tone acquisition module and a classification module.
  • the tone acquiring module is configured to acquire a tone feature parameter of the to-be-classified audio signal in the at least one sub-band; and the classification module is configured to determine, according to the acquired feature parameter, the type of the to-be-classified audio signal.
  • the embodiment of the invention provides a method and a device for classifying audio signals.
  • By acquiring the tonal feature parameters, the type of the audio signal to be classified can be determined; the number of feature parameters to be calculated is small, the classification method is simple, and the computational load of the classification process is low.
  • This embodiment provides a method for classifying an audio signal. As shown in FIG. 1, the method includes the following steps:
  • the sampling frequency is 48 kHz
  • the received current frame audio signal is the k-th frame audio signal.
  • The following is the process of calculating the tonal feature parameters of the current frame of the audio signal.
  • N represents the frame length, and h(l) represents the Hanning window data of the l-th sample point of the k-th frame audio signal.
  • The k'-th power spectral density of the k-th frame audio signal is calculated from the FFT transform coefficients.
  • s(l) represents the original input sample points of the k-th frame audio signal, and X(k') represents the k'-th power spectral density of the k-th frame audio signal.
  • The calculated power spectral density X(k') is corrected such that the maximum value of the power spectral density is the reference sound pressure level (96 dB).
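The windowing, FFT, PSD, and 96 dB correction described above can be sketched as follows. The exact FFT scaling and the guard against the log of zero are our assumptions, not details taken from the patent:

```python
import numpy as np

def power_spectral_density(frame, ref_db=96.0):
    """Hanning window -> FFT -> power spectral density X(k'),
    corrected so its maximum sits at the 96 dB reference level."""
    n = len(frame)                          # frame length N
    h = np.hanning(n)                       # Hanning window data h(l)
    windowed = frame * h                    # s(l) * h(l)
    spectrum = np.fft.rfft(windowed)        # FFT transform coefficients
    psd = np.abs(spectrum[: n // 2]) ** 2   # N/2 PSD coefficients
    psd_db = 10.0 * np.log10(psd + 1e-12)   # small epsilon avoids log(0)
    psd_db += ref_db - psd_db.max()         # max(X) corrected to 96 dB
    return psd_db
```

With a 48 kHz sampling rate and a frame length of N = 1024, the returned array has N/2 = 512 coefficients spanning [0 kHz, 24 kHz).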
  • The frequency region is divided into four frequency subbands, represented by sb0, sb1, sb2 and sb3. If a certain condition is satisfied between a power spectral density X(k') and its adjacent power spectral densities, which in this embodiment may be the condition shown in formula (3), then the subband containing that X(k') contains a tone, and the number of such tones is counted.
  • The number of coefficients (i.e. the length) of the power spectral density is N/2, where N is the frame length; this corresponds to the range of the index above. The four subbands are:
  • sb0 corresponds to 2 ≤ k' < 63; the corresponding power spectral density coefficients are the 0th to the (N/16-1)th, and the corresponding frequency range is [0 kHz, 3 kHz);
  • sb1 corresponds to 63 ≤ k' < 127; the corresponding power spectral density coefficients are the N/16-th to the (N/8-1)th, and the corresponding frequency range is [3 kHz, 6 kHz);
  • sb2 corresponds to 127 ≤ k' < 255; the corresponding power spectral density coefficients are the N/8-th to the (N/4-1)th, and the corresponding frequency range is [6 kHz, 12 kHz);
  • sb3 corresponds to the remaining indices; the corresponding power spectral density coefficients are the N/4-th to the N/2-th, and the corresponding frequency range is [12 kHz, 24 kHz).
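The coefficient-to-subband mapping in the list above can be expressed as a small helper. Treating the boundaries as half-open intervals with N = 1024 is an assumption on our part:

```python
def subband_index(k_prime, n=1024):
    """Map a PSD coefficient index k' to one of the four subbands sb0..sb3."""
    if k_prime < n // 16:    # 0 .. N/16-1   -> [0 kHz, 3 kHz)
        return 0
    if k_prime < n // 8:     # N/16 .. N/8-1 -> [3 kHz, 6 kHz)
        return 1
    if k_prime < n // 4:     # N/8 .. N/4-1  -> [6 kHz, 12 kHz)
        return 2
    return 3                 # N/4 .. N/2    -> [12 kHz, 24 kHz)
```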
  • Let k' take values one by one in the interval greater than or equal to 2 and less than 63. For each value of k', it is judged whether it satisfies the condition of formula (3); after traversing the whole interval, the number of values of k' that satisfy the condition is counted, and this number is the number of subband tones NT_k_0 of the k-th frame audio signal in subband sb0.
  • Let k' take values one by one in the interval greater than or equal to 63 and less than 127. For each value of k', it is judged whether it satisfies the condition of formula (3); after traversing the whole interval, the number of values of k' that satisfy the condition is the number of subband tones NT_k_1 of the k-th frame audio signal in subband sb1.
  • Let k' take values one by one in the interval greater than or equal to 127 and less than 255. For each value of k', it is judged whether it satisfies the condition of formula (3); after traversing the whole interval, the number of values of k' that satisfy the condition is the number of subband tones NT_k_2 of the k-th frame audio signal in subband sb2.
  • The number of subband tones NT_k_3 of the k-th frame audio signal in subband sb3 can be counted in the same way.
  • According to the NT_k_i calculated in 503, the sum of the numbers of subband tones of the k-th frame audio signal in the four subbands sb0, sb1, sb2 and sb3 is calculated:
  • NT_k_sum = Σ_i NT_k_i ( 4 )
  • NT_k_sum represents the total number of tones of the k-th frame audio signal.
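A sketch of the per-subband tone counting and the summation of formula (4). Formula (3) itself is not reproduced in this text, so a simple local-peak test (a coefficient exceeding both neighbours by 7 dB, in the spirit of MPEG-style tonality detection) stands in for it here; the patent's actual condition may differ:

```python
def count_subband_tones(psd_db, n=1024):
    """Count tones per subband (NT_k_i) and the frame total (NT_k_sum).
    The 7 dB local-peak criterion below is an assumed stand-in for
    the patent's formula (3)."""
    counts = [0, 0, 0, 0]
    for k in range(2, len(psd_db) - 1):
        is_tone = (psd_db[k] - psd_db[k - 1] >= 7.0 and
                   psd_db[k] - psd_db[k + 1] >= 7.0)
        if is_tone:
            if k < n // 16:        # sb0
                counts[0] += 1
            elif k < n // 8:       # sb1
                counts[1] += 1
            elif k < n // 4:       # sb2
                counts[2] += 1
            else:                  # sb3
                counts[3] += 1
    return counts, sum(counts)     # NT_k_i for i = 0..3, and NT_k_sum
```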
  • Let the predetermined number of frames be M; the M frames comprise the k-th frame audio signal and the (M-1) frames of audio signal preceding the k-th frame, and the calculation depends on the relationship between the value of M and the value of k.
  • The mean number of subband tones of the k-th frame audio signal in each subband over the M frames of audio signal is calculated.
  • The mean number of subband tones can be calculated by the following formula (5):
  • NT_j_i represents the number of subband tones of the j-th frame audio signal in subband i
  • ave_NT_i represents the mean number of subband tones in subband i.
  • an appropriate formula is selected for calculation based on the relationship between the value of k and the value of M in the calculation.
  • It is not necessary to calculate the subband tone-count mean for every subband; in this embodiment, the subband tone-count mean ave_NT_0 in the low-frequency subband sb0 is calculated,
  • and the subband tone-count mean in the higher-frequency subband sb2 is ave_NT_2.
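The M-frame averaging of formula (5) can be sketched as below; reading the k-versus-M case split as "use however many frames are available when fewer than M exist" is our assumption:

```python
def subband_tone_mean(history, M):
    """Mean subband tone count ave_NT_i over the last M frames.
    `history` holds per-frame tone counts for one subband, oldest first;
    with fewer than M frames (k < M-1), the available frames are used."""
    window = history[-M:]
    return sum(window) / len(window)
```

The same helper serves for the total-tone mean ave_NT_sum of formula (6) by passing per-frame totals instead of per-subband counts.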
  • Let the predetermined number of frames be M; the M frames comprise the k-th frame audio signal and the (M-1) frames of audio signal preceding the k-th frame, and the calculation depends on the relationship between the value of M and the value of k.
  • The mean of the total number of tones contained in each frame of audio signal within the M frames is calculated for the k-th frame audio signal.
  • The mean of the total number of tones can be calculated by the following formula (6):
  • NT_j_sum represents the total number of tones of the j-th frame audio signal
  • ave_NT_sum represents the mean of the total number of tones.
  • an appropriate formula is selected for calculation based on the relationship between the value of k and the value of M in the calculation.
  • ave_NT_i represents the mean number of subband tones in subband i
  • ave_NT_sum represents the mean of the total number of tones
  • ave_NT_ratio_i represents the ratio of the mean number of subband tones of the k-th frame audio signal in subband i to the mean of the total number of tones.
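Formula (7) reduces to a single ratio of the two means just defined; the zero-denominator guard is ours:

```python
def tone_ratio(ave_nt_i, ave_nt_sum):
    """Tonal feature parameter ave_NT_ratio_i = ave_NT_i / ave_NT_sum."""
    return ave_nt_i / ave_nt_sum if ave_nt_sum else 0.0
```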
  • Using the subband tone-count mean ave_NT_0 in the low-frequency subband sb0 and the subband tone-count mean ave_NT_2 in the higher-frequency subband sb2 calculated above, the tonal feature parameter ave_NT_ratio_0 of the k-th frame audio signal in subband sb0 and
  • the tonal feature parameter ave_NT_ratio_2 in subband sb2 can each be calculated by formula (7);
  • ave_NT_ratio_0 and ave_NT_ratio_2 are used as the tonal feature parameters of the k-th frame audio signal.
  • In this embodiment, the tonal feature parameters considered are those in the low-frequency subband and in the higher-frequency subband; however, the present invention is not limited to this, and tonal feature parameters in other subbands may be calculated as the design requires.
  • It is judged whether the tonal feature parameter ave_NT_ratio_0 in subband sb0 calculated in 507 and the tonal feature parameter ave_NT_ratio_2 in subband sb2 satisfy a certain relationship with the first coefficient and the second coefficient;
  • in this embodiment, the certain relationship may be as shown in formula (12): ave_NT_ratio_0 > α and ave_NT_ratio_2 < β, where
  • ave_NT_ratio_0 represents the tonal feature parameter of the k-th frame audio signal in the low-frequency subband, ave_NT_ratio_2 represents the tonal feature parameter of the k-th frame audio signal in the higher-frequency subband, α represents the first coefficient, and β represents the second coefficient.
  • If the relationship is satisfied, the k-th frame audio signal is a speech-type audio signal; otherwise, it is a music-type audio signal.
  • the following is a process of smoothing the current one frame of the audio signal.
  • It is judged whether the type of the k-th frame audio signal is the same as the type of the (k-1)-th frame audio signal; if the result of the judgment is that the type of the k-th frame audio signal differs from the type of the (k-1)-th frame audio signal, 511 is executed, otherwise 512 is executed.
  • The type of the k-th frame audio signal is modified to the type of the (k-1)-th frame audio signal.
  • In this embodiment, the types of the previous frame and the following frame of audio signal are used in the smoothing.
  • However, the method belongs to the general process of consulting information about the frames before and after, and the number of preceding and following frames consulted is not limited by the description in this embodiment.
  • Any scheme that consults at least the type of at least one previous frame or at least one following frame of audio signal in the smoothing process is applicable to the embodiments of the present invention.
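The decision rule and the smoothing step of this embodiment can be sketched together. The comparison directions follow the description of the determining unit later in the text; the neighbour-agreement form of the smoothing is our reading of the previous/following-frame procedure:

```python
def classify_frame(ratio0, ratio2, alpha, beta):
    """Formula (12) decision: speech when the low-band tonal parameter
    exceeds the first coefficient and the high-band parameter is below
    the second coefficient; music otherwise."""
    return "speech" if (ratio0 > alpha and ratio2 < beta) else "music"

def smooth(prev_type, cur_type, next_type):
    """Smoothing: if the neighbouring frames agree with each other but
    differ from the current frame, adopt the neighbours' type."""
    if prev_type == next_type and cur_type != prev_type:
        return prev_type
    return cur_type
```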
  • Example 2 This embodiment discloses a method for classifying an audio signal. As shown in FIG. 2, the method includes:
  • The frequency region is divided into four frequency subbands; in each subband, the current frame audio signal can acquire a corresponding tonal feature parameter.
  • Steps 102 and 103 are not limited to any order of execution and may even be performed simultaneously.
  • The technical solution provided in this embodiment determines the type of the audio signal according to the tonal feature parameter and the spectral tilt feature parameter of the audio signal.
  • It avoids the need in the prior art to calculate five characteristic parameters, such as harmonics, noise and rhythm, when classifying the type of an audio signal, thereby reducing the complexity of the classification method and the computational load of the classification.
  • This embodiment discloses a method for classifying an audio signal. As shown in FIG. 3, the method includes the following steps:
  • The following is the process of calculating the tonal feature parameters of the current frame of the audio signal.
  • Hanning-window processing is performed on the time-domain data of the k-th frame audio signal.
  • N represents the frame length, and h(l) represents the Hanning window data of the l-th sample point of the k-th frame audio signal.
  • s(l) represents the original input sample points of the k-th frame audio signal, and X(k') represents the k'-th power spectral density of the k-th frame audio signal.
  • The calculated power spectral density X(k') is corrected such that the maximum value of the power spectral density is the reference sound pressure level (96 dB).
  • The frequency region is divided into four frequency subbands, represented by sb0, sb1, sb2 and sb3.
  • If a certain condition is satisfied between a power spectral density X(k') and its adjacent power spectral densities, which in this embodiment may be the condition shown in formula (3), then
  • the subband containing that X(k') contains a tone, and the tones are counted to obtain the number of subband tones NT_k_i, which represents the number of tones of the k-th frame audio signal in subband i.
  • The number of coefficients (i.e. the length) of the power spectral density is N/2, where N is the frame length; this corresponds to the range of the index above. The four subbands are:
  • sb0 corresponds to 2 ≤ k' < 63; the corresponding power spectral density coefficients are the 0th to the (N/16-1)th, and the corresponding frequency range is [0 kHz, 3 kHz);
  • sb1 corresponds to 63 ≤ k' < 127; the corresponding power spectral density coefficients are the N/16-th to the (N/8-1)th, and the corresponding frequency range is [3 kHz, 6 kHz);
  • sb2 corresponds to 127 ≤ k' < 255; the corresponding power spectral density coefficients are the N/8-th to the (N/4-1)th, and the corresponding frequency range is [6 kHz, 12 kHz);
  • sb3 corresponds to the remaining indices; the corresponding power spectral density coefficients are the N/4-th to the N/2-th, and the corresponding frequency range is [12 kHz, 24 kHz).
  • Let k' take values one by one in the interval greater than or equal to 2 and less than 63. For each value of k', it is judged whether it satisfies the condition of formula (3); after traversing the whole interval, the number of values of k' that satisfy the condition is counted, and this number is the number of subband tones NT_k_0 of the k-th frame audio signal in subband sb0.
  • Let k' take values one by one in the interval greater than or equal to 63 and less than 127. For each value of k', it is judged whether it satisfies the condition of formula (3); after traversing the whole interval, the number of values of k' that satisfy the condition is the number of subband tones NT_k_1 of the k-th frame audio signal in subband sb1.
  • Let k' take values one by one in the interval greater than or equal to 127 and less than 255. For each value of k', it is judged whether it satisfies the condition of formula (3); after traversing the whole interval, the number of values of k' that satisfy the condition is the number of subband tones NT_k_2 of the k-th frame audio signal in subband sb2.
  • The number of subband tones NT_k_3 of the k-th frame audio signal in subband sb3 can be counted in the same way.
  • According to the NT_k_i calculated in 203, the sum of the numbers of subband tones of the k-th frame audio signal in the four subbands sb0, sb1, sb2 and sb3 is calculated:
  • NT_k_sum = Σ_i NT_k_i ( 4 )
  • NT_k_sum represents the total number of tones of the k-th frame audio signal.
  • Let the predetermined number of frames be M; the M frames comprise the k-th frame audio signal and the (M-1) frames of audio signal preceding the k-th frame, and the calculation depends on the relationship between the value of M and the value of k.
  • The mean number of subband tones of the k-th frame audio signal in each subband over the M frames of audio signal is calculated.
  • The mean number of subband tones can be calculated by the following formula (5):
  • NT_j_i represents the number of subband tones of the j-th frame audio signal in subband i
  • ave_NT_i represents the mean number of subband tones in subband i.
  • an appropriate formula is selected for calculation based on the relationship between the value of k and the value of M in the calculation.
  • It is not necessary to calculate the subband tone-count mean for every subband; in this embodiment, the subband tone-count mean ave_NT_0 in the low-frequency subband sb0 is calculated,
  • and the subband tone-count mean in the higher-frequency subband sb2 is ave_NT_2.
  • Let the predetermined number of frames be M; the M frames comprise the k-th frame audio signal and the (M-1) frames of audio signal preceding the k-th frame, and the calculation depends on the relationship between the value of M and the value of k.
  • The mean of the total number of tones contained in each frame of audio signal within the M frames is calculated for the k-th frame audio signal.
  • NT_j_sum represents the total number of tones of the j-th frame audio signal
  • ave_NT_sum represents the mean of the total number of tones.
  • an appropriate formula is selected for calculation based on the relationship between the value of k and the value of M in the calculation.
  • The tonal feature parameter can be calculated by the following formula (7): ave_NT_ratio_i = ave_NT_i / ave_NT_sum, where
  • ave_NT_i represents the mean number of subband tones in subband i,
  • ave_NT_sum represents the mean of the total number of tones, and
  • ave_NT_ratio_i represents the ratio of the mean number of subband tones of the k-th frame audio signal in subband i to the mean of the total number of tones.
  • Using the subband tone-count mean ave_NT_0 in the low-frequency subband sb0 calculated in 205
  • and the subband tone-count mean ave_NT_2 in the higher-frequency subband sb2, the tonal feature parameter ave_NT_ratio_0 of the k-th frame audio signal in subband sb0 and
  • the tonal feature parameter ave_NT_ratio_2 in subband sb2 can each be calculated by formula (7); ave_NT_ratio_0 and ave_NT_ratio_2 are used as the tonal feature parameters of the k-th frame audio signal.
  • In this embodiment, the tonal feature parameters considered are those in the low-frequency subband and in the higher-frequency subband; however, the present invention is not limited to this, and tonal feature parameters in other subbands may be calculated as the design requires.
  • the following is the process of calculating the spectral tilt characteristic parameter of the current frame of the audio signal.
  • The spectral tilt of the k-th frame audio signal can be calculated by the following formula (8):
  • s(n) represents the n-th time-domain sample point of the k-th frame audio signal
  • r represents an autocorrelation parameter
  • spec_tilt_k represents the spectral tilt of the k-th frame audio signal
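Formula (8) is not reproduced in this text. As a stand-in, the common first-order autocorrelation estimate r(1)/r(0) of spectral tilt is sketched below; treat it as an assumption, not the patent's actual formula:

```python
def spectral_tilt(frame):
    """Spectral tilt of one frame from time-domain autocorrelation:
    the ratio of lag-1 autocorrelation r(1) to energy r(0)."""
    r0 = sum(s * s for s in frame)                              # r(0)
    r1 = sum(frame[n] * frame[n - 1]
             for n in range(1, len(frame)))                     # r(1)
    return r1 / r0 if r0 else 0.0
```

A smooth, low-frequency-dominated frame yields a value near 1, while a noise-like frame yields a value near 0 or negative, which is why this quantity separates tilted (speech-like) spectra from flat ones.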
  • Let the predetermined number of frames be M; the M frames comprise the k-th frame audio signal and the (M-1) frames of audio signal preceding the k-th frame, and the calculation depends on the relationship between the value of M and the value of k.
  • The spectral tilts of the frames of audio signal within the M frames are averaged, i.e., the mean of the spectral tilt within the M frames of audio signal is obtained.
  • The mean of the spectral tilt can be calculated by the following formula (9):
  • k represents the frame number of the current frame of the audio signal
  • M represents the specified number of frames
  • spec_tilt_j represents the spectral tilt of the j-th frame audio signal
  • ave_spec_tilt represents the mean of the spectral tilt.
  • an appropriate formula is selected for calculation based on the relationship between the value of k and the value of M in the calculation.
  • Let the predetermined number of frames be M; the M frames comprise the k-th frame audio signal and the (M-1) frames of audio signal preceding the k-th frame, and the calculation depends on the relationship between the value of M and the value of k.
  • The mean square error of the spectral tilts with respect to their mean is taken as the spectral tilt feature parameter of the current frame audio signal.
  • The spectral tilt feature parameter can be calculated by the following formula (10), which uses one summation range when k ≥ (M-1) and another when k < (M-1):
  • k represents the frame number of the current frame of the audio signal
  • ave_spec_tilt represents the mean of the spectral tilt
  • dif_spec_tilt represents the spectral tilt feature parameter.
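Formulas (9) and (10) together amount to a windowed mean and mean-square deviation over the last M frames. Handling the k < (M-1) case as "use the frames available" is our assumption:

```python
def tilt_variance(tilts, M):
    """Spectral tilt feature parameter dif_spec_tilt: the mean square
    deviation of the last M frames' spectral tilts from their mean
    ave_spec_tilt (formula 9). `tilts` holds per-frame spectral tilts,
    oldest first."""
    window = tilts[-M:]
    ave = sum(window) / len(window)          # ave_spec_tilt, formula (9)
    return sum((t - ave) ** 2 for t in window) / len(window)  # formula (10)
```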
  • an appropriate formula is selected for calculation based on the relationship between the value of k and the value of M in the calculation.
  • The process of calculating the tonal feature parameters (202 to 207) and the process of calculating the spectral tilt feature parameter (208 to 210) described in the above embodiment are not limited to any order of execution and may even be performed simultaneously.
  • It is determined whether the tonal feature parameter ave_NT_ratio_0 in subband sb0 calculated in 207, the tonal feature parameter ave_NT_ratio_2 in subband sb2, and the spectral tilt feature parameter dif_spec_tilt calculated in 210 satisfy a certain relationship with the first coefficient, the second coefficient and the third coefficient; in this embodiment, the relationship may be as shown in formula (11):
  • ave_NT_ratio_0 represents the tonal feature parameter of the k-th frame audio signal in the low-frequency subband, ave_NT_ratio_2 represents the tonal feature parameter of the k-th frame audio signal in the higher-frequency subband, dif_spec_tilt represents the spectral tilt feature parameter of the k-th frame audio signal, α represents the first coefficient, β represents the second coefficient, and γ represents the third coefficient.
  • the k-th frame audio signal is a speech type audio signal, otherwise it is a music type audio signal.
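A sketch of the three-parameter decision of formula (11). The directions of the first two comparisons follow the two-parameter rule described elsewhere in the text; the direction of the dif_spec_tilt comparison with the third coefficient is not stated and is purely an assumption here:

```python
def classify_frame_3(ratio0, ratio2, dif_tilt, alpha, beta, gamma):
    """Three-parameter decision: speech when the low-band tonal parameter
    exceeds alpha, the high-band parameter is below beta, and the spectral
    tilt feature exceeds gamma (assumed direction); music otherwise."""
    return ("speech"
            if ratio0 > alpha and ratio2 < beta and dif_tilt > gamma
            else "music")
```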
  • the following is a process of smoothing the current one frame of the audio signal.
  • Since the smoothing in step 212 uses the type of the (k+1)-th frame when determining the type of the current frame, that is, the k-th frame audio signal, the next step 213 can be performed only after the type of the (k+1)-th frame audio signal has been judged.
  • The types of the three frames before and the three frames after the current audio signal may also be determined, or the types of the five frames before and the five frames after, and so on, to decide whether the current audio signal needs to be smoothed; the number of related frames that need to be consulted is not limited by this embodiment. The more related information about the preceding and following frames is known, the better the smoothing effect may be.
  • In the prior art, the classification of the audio signal must be performed according to five characteristic parameters.
  • With the solution of this embodiment, the classification of the audio signal can be realized according to two characteristic parameters; the classification algorithm is simple, its complexity is low, and the computational load of the classification process is reduced.
  • The scheme of this embodiment also smooths the classified audio signal, which improves the recognition rate of the audio signal type, so that the advantages of the speech encoder and the audio encoder can be fully exploited in the subsequent encoding process.
  • the embodiment specifically provides an audio signal classification device.
  • The device includes: a receiving module 40, a tone acquiring module 41, a classification module 43, a first determining module 44, a second determining module 45, a smoothing module 46, and a first setting module 47.
  • the receiving module 40 is configured to receive an audio signal of a current frame, where the audio signal of the current frame is an audio signal to be classified;
  • the tone acquiring module 41 is configured to acquire a tone feature parameter of the audio signal to be classified in at least one subband;
  • the module 43 is configured to determine the type of the audio signal to be classified according to the pitch feature parameter acquired by the tone acquisition module 41.
  • the first determining module 44 is configured to determine, after the classification module 43 classifies the type of the audio signal to be classified, Whether at least the type of the previous frame audio signal before the audio signal is the same as the type of the at least one subsequent frame audio signal corresponding to the audio signal to be classified; the second determining module 45 is configured to determine when the first determining module 44 Determining whether the type of the audio signal to be classified is different from the type of the audio signal of the at least one previous frame when the type of the audio signal to be classified is the same as that of the at least one subsequent frame of the audio signal; the smoothing module 46 is used to be the first When the two determining module 45 determines that the type of the audio signal is different from the at least one previous frame, the audio signal to be classified is Type smoothing; a first setting module 47 for computing the preset predetermined number of frames.
  • the classification module 43 includes: a determination unit 431, and a classification unit 432.
  • the determining unit 431 is configured to determine whether the to-be-classified audio signal is greater than the first coefficient in the low frequency sub-band, and the tonal feature parameter in the higher frequency sub-band is smaller than the second coefficient; the classification unit 432 is configured to be used The determining unit 431 determines that the tonal feature parameter of the to-be-classified audio signal in the low frequency subband is greater than the first coefficient, and determines that the tonal audio signal is to be classified when the tonal feature parameter in the higher frequency subband is smaller than the second coefficient
  • the type is voice type, otherwise it is a music type.
  • the tone acquiring module 41 calculates the pitch feature parameter according to the number of tones in the at least one subband and the total number of tones of the audio signal to be classified according to the audio signal to be classified.
  • the tone acquisition module 41 includes: a first calculation unit 411, a second calculation unit 412, and a tone feature unit 413.
  • the first calculating unit 411 is configured to calculate a sub-band pitch number average value of the to-be-classified audio signal in the at least one sub-band; the second calculating unit 412 is configured to calculate a total pitch number average value of the to-be-classified audio signal;
  • the tone feature unit 413 is configured to use, as the pitch feature parameter of the to-be-classified audio signal in the corresponding sub-band, a ratio of the sub-band tone number mean value in the at least one sub-band to the total tone number-average value, respectively. .
  • the first calculating unit 411 calculates the sub-band pitch number average value of the to-be-classified audio signal in the at least one sub-band, including: the frame number calculated according to the preset set by the first setting module 47 and the frame number of the audio signal to be classified.
  • the relationship calculates the mean of the number of subband tones in a subband.
  • the second calculating unit 412 calculates the total number of pitch numbers of the audio signals to be classified, including: calculating the total number of pitches according to the relationship between the number of frames calculated by the first setting module and the frame number of the audio signal to be classified.
  • the audio signal classification apparatus obtained in this embodiment obtains the technical effect of determining the type of most audio signals by using the technical means of acquiring the tone characteristic parameters of the audio signal, and P strives to lower the classification process of the audio signal.
  • the difficulty of the classification method also reduces the amount of calculation.
  • This embodiment discloses an apparatus for classifying an audio signal.
  • The apparatus includes: a receiving module 30, a tone obtaining module 31, and a spectral tilt obtaining module 32.
  • A classifying module 33 is also included in the apparatus.
  • The receiving module 30 is configured to receive the audio signal of the current frame; the tone obtaining module 31 is configured to obtain the tonal characteristic parameter of the audio signal to be classified in at least one sub-band; and the spectral tilt obtaining module 32 is configured to obtain the spectral tilt characteristic parameter of the audio signal to be classified.
  • The classifying module 33 is configured to determine the type of the audio signal to be classified according to the tonal characteristic parameter obtained by the tone obtaining module 31 and the spectral tilt characteristic parameter obtained by the spectral tilt obtaining module 32.
  • In the prior art, classifying an audio signal requires reference to characteristic parameters of the audio signal in many respects, so that the classification complexity is high and the amount of computation large; the scheme provided by this embodiment classifies the audio signal from just two characteristic parameters.
  • The type of the audio signal can thus be distinguished, the classification of the audio signal is simplified, and the amount of computation in the classification process is also reduced.
  • Embodiment 6: This embodiment specifically provides an audio signal classification device. As shown in FIG. 6, the device includes: a receiving module 40, a tone obtaining module 41, a spectral tilt obtaining module 42, a classifying module 43, a first judging module 44, a second judging module 45, a smoothing module 46, a first setting module 47, and a second setting module 48.
  • The receiving module 40 is configured to receive the audio signal of the current frame, which is the audio signal to be classified; the tone obtaining module 41 is configured to obtain the tonal characteristic parameter of the audio signal to be classified in at least one sub-band.
  • The tone obtaining module 41 computes the tonal characteristic parameter from the number of tones of the audio signal to be classified in at least one sub-band and the total number of tones of the audio signal to be classified.
  • The classifying module 43 includes: a judging unit 431 and a classifying unit 432.
  • The judging unit 431 is configured to judge, when the tonal characteristic parameter of the audio signal to be classified in the low-frequency sub-band is greater than the first coefficient and the tonal characteristic parameter in the higher-frequency sub-band is less than the second coefficient, whether the spectral tilt characteristic parameter of the audio signal is greater than the third coefficient; the classifying unit 432 is configured to determine, when the judging unit determines that the spectral tilt characteristic parameter of the audio signal to be classified is greater than the third coefficient, that the type of the audio signal to be classified
  • is the speech type, and otherwise the music type.
  • The tone obtaining module 41 includes: a first computing unit 411, a second computing unit 412, and a tonal characteristic unit 413.
  • The first computing unit 411 is configured to compute the average number of sub-band tones of the audio signal to be classified in at least one sub-band; the second computing unit 412 is configured to compute the average of the total number of tones of the audio signal to be classified; the tonal characteristic unit 413 takes the ratios of the average number of sub-band tones in the at least one sub-band to the average of the total number of tones as the tonal characteristic parameters of the audio signal to be classified in the corresponding sub-bands.
  • The first computing unit 411 computes the average number of sub-band tones of the audio signal to be classified in a sub-band according to the relationship between the specified number of frames set by the first setting module 47 and the frame number of the audio signal to be classified.
  • The second computing unit 412 computes the average of the total number of tones according to the relationship between the specified number of frames set by the first setting module 47 and the frame number of the audio signal to be classified.
  • The spectral tilt obtaining module 42 includes: a third computing unit 421 and a spectral tilt characteristic unit 422.
  • The third computing unit 421 is configured to compute the mean spectral tilt of the audio signal to be classified; the spectral tilt characteristic unit 422 is configured to take the mean square error between the spectral tilt of at least one audio signal and the mean spectral tilt as the spectral tilt characteristic parameter of the audio signal to be classified.
  • The third computing unit 421 computes the mean spectral tilt according to the relationship between the specified number of frames set by the second setting module 48 and the frame number of the audio signal to be classified.
  • The spectral tilt characteristic unit 422 computes the mean square error between the spectral tilt of at least one audio signal and the mean spectral tilt, i.e. the spectral tilt characteristic parameter, according to the relationship between the specified number of frames set by the second setting module 48 and the frame number of the audio signal to be classified.
  • The first setting module 47 and the second setting module 48 in this embodiment may be implemented by one program or module, and may even set the same specified number of frames.
  • The solution provided in this embodiment has the following beneficial effects: simple classification, low complexity, and a small amount of computation; no extra delay is introduced for the encoder; and the real-time, low-complexity requirements of a speech/audio encoder at low and medium bit rates during classification can be met.
  • The embodiments of the present invention are mainly applied in the field of communication technologies to classify audio signal types quickly, accurately and in real time. With the development of network technology they may be applied to other scenarios in this field, and may also be transferred to similar or related technical fields.
  • Through the above description, persons skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary universal hardware platform, and certainly also by hardware, but in many cases the former is the better implementation.
  • The essence of the technical solution of the present invention, or the part that contributes to the prior art, may be embodied in the form of a software product stored in a readable storage medium, such as a floppy disk of a computer,
  • a hard disk, an optical disk, or the like, including several instructions for causing an encoder to perform the methods described in the embodiments of the present invention.


Description

Method and Device for Classifying an Audio Signal

This application claims priority to Chinese Patent Application No. 200910129157.3, filed with the Chinese Patent Office on March 27, 2009 and entitled "Method and Device for Classifying an Audio Signal", which is incorporated herein by reference in its entirety.

Technical Field

The present invention relates to the field of communication technologies, and in particular to a method and device for classifying an audio signal.

Background

A speech encoder performs well on speech-type audio signals at low and medium bit rates, but poorly on music-type audio signals. An audio encoder is suitable for encoding both speech-type and music-type audio signals at high bit rates, but its encoding of speech-type signals at low and medium bit rates is unsatisfactory. To obtain a good coding effect for mixed speech/music audio at low and medium bit rates, the coding process of a speech/audio encoder suited to such rates mainly includes: first using a signal classification module to identify the type of the audio signal, then selecting the corresponding coding method according to the identified type — a speech encoder for speech-type signals and an audio encoder for music-type signals.

In the prior art, the method for identifying the type of an audio signal mainly includes:

1. Dividing the input signal into a series of overlapping frames with a window function;

2. Computing the spectral coefficients of each frame with a fast Fourier transform (FFT);

3. Computing, from the spectral coefficients of each frame, characteristic parameters in five respects for each segment: harmonics, noise, tail, delay, and rhythm;

4. Classifying the audio signal into six types based on the values of these parameters: speech type, music type, noise type, short sequence, pending sequence, and short pending sequence.

In implementing this type identification, the inventors found at least the following problem in the prior art: the method must compute characteristic parameters in many respects during classification, and the classification of the audio signal is also rather involved, resulting in high classification complexity.

Summary
Embodiments of the present invention provide a method and device for classifying an audio signal, which lower the complexity of audio signal classification and reduce the amount of computation.

To this end, the embodiments of the present invention adopt the following technical solutions:

A method for classifying an audio signal includes:

obtaining a tonal characteristic parameter of an audio signal to be classified in at least one sub-band; and

determining the type of the audio signal to be classified according to the obtained characteristic parameter.

A device for classifying an audio signal includes:

a tone obtaining module, configured to obtain a tonal characteristic parameter of an audio signal to be classified in at least one sub-band; and

a classifying module, configured to determine the type of the audio signal to be classified according to the obtained characteristic parameter.

By adopting the technical means of classifying the audio signal by its tonal characteristic, the solutions of the embodiments of the present invention overcome the technical problem of complex classification in the prior art, thereby achieving the technical effects of lowering the complexity of audio signal classification and reducing the amount of computation required during classification.

Brief Description of the Drawings

To illustrate the technical solutions of the embodiments of the present invention or of the prior art more clearly, the accompanying drawings needed in their description are briefly introduced below. Evidently, the drawings described below are only some embodiments of the present invention, and persons of ordinary skill in the art may derive other drawings from them without creative effort.

FIG. 1 is a flowchart of an audio signal classification method according to Embodiment 1 of the present invention;

FIG. 2 is a flowchart of an audio signal classification method according to Embodiment 2 of the present invention; FIG. 3 is a flowchart of an audio signal classification method according to Embodiment 3 of the present invention; FIG. 4 is a block diagram of an audio signal classification device according to Embodiment 4 of the present invention; FIG. 5 is a block diagram of an audio signal classification device according to Embodiment 5 of the present invention; FIG. 6 is a block diagram of an audio signal classification device according to Embodiment 6 of the present invention.

Detailed Description

The technical solutions of the embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by persons of ordinary skill in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.

The embodiments of the present invention provide a method and device for classifying an audio signal. The specific execution of the method includes: obtaining a tonal characteristic parameter of an audio signal to be classified in at least one sub-band; and determining the type of the audio signal to be classified according to the obtained characteristic parameter.

The method is implemented by a device that includes the following modules: a tone obtaining module and a classifying module. The tone obtaining module obtains the tonal characteristic parameter of the audio signal to be classified in at least one sub-band; the classifying module determines the type of the audio signal to be classified according to the obtained characteristic parameter.

With the audio signal classification method and device provided by the embodiments of the present invention, the type of the audio signal to be classified can be judged by obtaining the tonal characteristic parameter alone; few respects of characteristic parameters need to be computed, the classification method is simple, and the amount of computation in the classification process is reduced.
Embodiment 1

This embodiment provides a method for classifying an audio signal. As shown in FIG. 1, the method includes the following steps:

501. Receive a current frame of audio signal, which is the audio signal to be classified.

Specifically: suppose the sampling frequency is 48 kHz and the frame length is N = 1024 sample points, and the received current frame is the k-th frame of the audio signal.

The following is the process of computing the tonal characteristic parameter of the current frame.

502. Compute the power spectral density of the current frame.

Specifically: apply Hanning windowing to the time-domain data of the k-th frame. The Hanning window may be computed by the following formula:

    h(l) = 0.5 · [1 − cos(2πl/N)],  0 ≤ l ≤ N − 1    (1)

where N denotes the frame length and h(l) denotes the Hanning window datum for the l-th sample point of the k-th frame.

Perform an FFT of length N on the windowed time-domain data of the k-th frame (because the FFT is symmetric about N/2, in practice an FFT of length N/2 suffices), and use the FFT coefficients to compute the k′-th power spectral density of the k-th frame:

    X(k′) = 10 · log₁₀ | Σ_{l=0}^{N−1} s(l) · h(l) · e^{−j2πlk′/N} |²,  0 ≤ k′ < N/2, 0 ≤ l ≤ N − 1    (2)

where s(l) denotes the original input sample point of the k-th frame and X(k′) denotes the k′-th power spectral density of the k-th frame.

Correct the computed power spectral density X(k′) so that its maximum value equals the reference sound pressure level (96 dB).
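As a concrete illustration, the windowing and power-spectral-density computation of step 502 can be sketched in Python. This is a minimal sketch assuming NumPy; the small epsilon guarding log(0) and the exact FFT normalization are implementation choices of this sketch, not taken from the patent.

```python
import numpy as np

def hann_window(N):
    # Hanning window per formula (1): h(l) = 0.5 * (1 - cos(2*pi*l/N)), 0 <= l < N.
    l = np.arange(N)
    return 0.5 * (1.0 - np.cos(2.0 * np.pi * l / N))

def power_spectral_density(s):
    # Windowed FFT of one frame; only the first N/2 bins are kept, since the
    # spectrum of a real input is symmetric about N/2 (per formula (2)).
    N = len(s)
    windowed = np.asarray(s, dtype=float) * hann_window(N)
    spectrum = np.fft.fft(windowed)[: N // 2]
    psd = 10.0 * np.log10(np.abs(spectrum) ** 2 + 1e-12)  # dB; epsilon avoids log(0)
    # Correct so the maximum equals the 96 dB reference sound pressure level.
    return psd + (96.0 - psd.max())
```

With a 1024-point frame this yields 512 PSD coefficients whose maximum sits at 96 dB, matching the correction described above.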
503. Use the power spectral density to detect whether a tone exists in each sub-band of the frequency region, count the number of tones existing in the corresponding sub-band, and take that count as the number of sub-band tones in that sub-band.

Specifically: divide the frequency region into four frequency sub-bands, denoted sb0, sb1, sb2 and sb3. If the power spectral density X(k′) and several adjacent power spectral densities satisfy a certain condition — in this embodiment, the condition shown in formula (3) below — the sub-band corresponding to this X(k′) is deemed to contain a tone, and the number of such tones is counted to obtain the number of sub-band tones NTk_i in that sub-band, where NTk_i denotes the number of sub-band tones of the k-th frame in sub-band sbi (i denotes the sub-band number, i = 0, 1, 2, 3).

    X(k′ − 1) < X(k′) ≥ X(k′ + 1)  and  X(k′) − X(k′ + j) ≥ 7 dB    (3)

where the values of j are specified as follows:

    j = −2, +2                    for 2 < k′ < 63
    j = −3, −2, +2, +3            for 63 ≤ k′ < 127
    j = −6, ..., −2, +2, ..., +6  for 127 ≤ k′ < 255
    j = −12, ..., −2, +2, ..., +12 for 255 ≤ k′ < 500

In this embodiment, the number of power spectral density coefficients (i.e. the length) is known to be N/2. Corresponding to the above specification of j, the meaning of the value ranges of k′ is further explained as follows:

sb0: corresponds to 2 ≤ k′ < 63, i.e. the 0th to (N/16 − 1)-th power spectral density coefficients, covering the frequency range [0 kHz, 3 kHz);

sb1: corresponds to 63 ≤ k′ < 127, i.e. the N/16-th to (N/8 − 1)-th coefficients, covering the frequency range [3 kHz, 6 kHz);

sb2: corresponds to 127 ≤ k′ < 255, i.e. the N/8-th to (N/4 − 1)-th coefficients, covering the frequency range [6 kHz, 12 kHz);

sb3: corresponds to 255 ≤ k′ < 500, i.e. the N/4-th to N/2-th coefficients, covering the frequency range [12 kHz, 24 kHz).

Here, sb0 and sb1 correspond to the low-frequency sub-band part, sb2 to the higher-frequency sub-band part, and sb3 to the high-frequency sub-band part.

The specific process of counting NTk_i is as follows:

For sub-band sb0, let k′ take values one by one in the interval from 2 (inclusive) to 63 (exclusive), and for each value of k′ judge whether the condition of formula (3) is satisfied. After traversing the whole value interval of k′, count the number of values of k′ satisfying the condition; this count is the number of sub-band tones NTk_0 of the k-th frame in sub-band sb0.

For example: if formula (3) holds for k′ = 3, k′ = 5 and k′ = 10, there are deemed to be 3 sub-band tones in sub-band sb0, i.e. NTk_0 = 3.

Likewise, for sub-band sb1, let k′ take values one by one in the interval from 63 (inclusive) to 127 (exclusive); the number of values of k′ satisfying formula (3) is the number of sub-band tones NTk_1 of the k-th frame in sub-band sb1.

Likewise, for sub-band sb2, let k′ take values one by one in the interval from 127 (inclusive) to 255 (exclusive); the number of values of k′ satisfying formula (3) is the number of sub-band tones NTk_2 of the k-th frame in sub-band sb2.

The number of sub-band tones NTk_3 of the k-th frame in sub-band sb3 may be counted in the same way.
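The tone-detection loop of step 503 can be sketched as follows. The peak condition and the 7 dB threshold follow the reconstruction of formula (3) above, and the band edges and neighbour offsets follow the embodiment's values; treat the exact boundary handling as an assumption of this sketch.

```python
import numpy as np

def tone_offsets(k):
    # Neighbour offsets j for each k' range, per the specification above.
    if 2 < k < 63:
        return (-2, 2)
    if 63 <= k < 127:
        return (-3, -2, 2, 3)
    if 127 <= k < 255:
        return tuple(range(-6, -1)) + tuple(range(2, 7))
    if 255 <= k < 500:
        return tuple(range(-12, -1)) + tuple(range(2, 13))
    return ()

def count_subband_tones(psd, bands=((2, 63), (63, 127), (127, 255), (255, 500))):
    # NT_k_i: number of bins in each sub-band that are local peaks and exceed
    # the neighbours at offsets j by at least 7 dB (formula (3)).
    counts = []
    for lo, hi in bands:
        n = 0
        for k in range(lo, min(hi, len(psd) - 1)):
            if not (psd[k - 1] < psd[k] >= psd[k + 1]):
                continue
            offs = tone_offsets(k)
            if offs and all(0 <= k + j < len(psd) and psd[k] - psd[k + j] >= 7.0
                            for j in offs):
                n += 1
        counts.append(n)
    return counts
```

The function returns the four counts [NTk_0, NTk_1, NTk_2, NTk_3] for one frame's PSD.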
504. Compute the total number of tones of the current frame.

Specifically: from the NTk_i counted in 503, compute the sum of the numbers of sub-band tones of the k-th frame in the four sub-bands sb0, sb1, sb2 and sb3. This sum is the number of tones in the k-th frame, which may be computed by:

    NTk_sum = Σ_{i=0}^{3} NTk_i    (4)

where NTk_sum denotes the total number of tones of the k-th frame.
505. Compute the average number of sub-band tones of the current frame in the corresponding sub-band over a specified number of frames.

Specifically: let the specified number of frames be M, the M frames including the k-th frame and the (M − 1) frames preceding it. Compute, according to the relationship between the values of M and k, the average number of sub-band tones of the k-th frame in each sub-band over these M frames.

The average number of sub-band tones may be computed by formula (5):

    ave_NT_i = (1/(k+1)) · Σ_{j=0}^{k} NT_j_i,        if k < M − 1
    ave_NT_i = (1/M) · Σ_{j=k−M+1}^{k} NT_j_i,        if k ≥ M − 1    (5)

where NT_j_i denotes the number of sub-band tones of the j-th frame in sub-band i, and ave_NT_i denotes the average number of sub-band tones in sub-band i. In particular, as formula (5) shows, the appropriate branch must be selected according to the relationship between the values of k and M.

In particular, in this embodiment it is not necessary, per design needs, to compute the average number of sub-band tones for every sub-band; it suffices to compute the average number of sub-band tones ave_NT_0 in the low-frequency sub-band sb0 and the average number of sub-band tones ave_NT_2 in the higher-frequency sub-band sb2.

506. Compute the average of the total number of tones of the current frame over the specified number of frames.

Specifically: let the specified number of frames be M, the M frames including the k-th frame and the (M − 1) frames preceding it. Compute, according to the relationship between the values of M and k, the average total number of tones contained per frame over these M frames.

The average of the total number of tones may be computed by formula (6):

    ave_NT_sum = (1/(k+1)) · Σ_{j=0}^{k} NT_j_sum,    if k < M − 1
    ave_NT_sum = (1/M) · Σ_{j=k−M+1}^{k} NT_j_sum,    if k ≥ M − 1    (6)

where NT_j_sum denotes the total number of tones of the j-th frame and ave_NT_sum denotes the average of the total number of tones. Again, as formula (6) shows, the appropriate branch must be selected according to the relationship between the values of k and M.
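Formulas (5) and (6) share one conditional-averaging pattern, which can be sketched with a single helper. The helper name is hypothetical; the patent does not name such a function.

```python
def moving_tone_mean(per_frame_counts, k, M):
    # ave_NT per formulas (5)/(6): average over frames 0..k while fewer than
    # M frames exist (k < M-1), otherwise over the most recent M frames.
    if k < M - 1:
        window = per_frame_counts[: k + 1]
    else:
        window = per_frame_counts[k - M + 1 : k + 1]
    return sum(window) / len(window)
```

The same helper serves both the per-sub-band counts NT_j_i and the totals NT_j_sum.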
507. Take the ratios of the computed average numbers of sub-band tones in at least one sub-band to the average of the total number of tones as the tonal characteristic parameters of the current frame in the corresponding sub-bands.

The tonal characteristic parameter may be computed by formula (7):

    ave_NT_ratio_i = ave_NT_i / ave_NT_sum    (7)

where ave_NT_i denotes the average number of sub-band tones in sub-band i, ave_NT_sum denotes the average of the total number of tones, and ave_NT_ratio_i denotes the ratio, for the k-th frame, of the average number of sub-band tones in sub-band i to the average of the total number of tones.

In particular, in this embodiment, using the average number of sub-band tones ave_NT_0 in the low-frequency sub-band sb0 and the average number of sub-band tones ave_NT_2 in the higher-frequency sub-band sb2 computed in 505, formula (7) yields the tonal characteristic parameter ave_NT_ratio0 of the k-th frame in sub-band sb0 and the tonal characteristic parameter ave_NT_ratio2 in sub-band sb2, and these two are taken as the tonal characteristic parameters of the k-th frame.

In this embodiment, the tonal characteristic parameters to be considered are those in the low-frequency sub-band and in the higher-frequency sub-band, but the design of the present invention is not limited to this choice; according to design needs, tonal characteristic parameters in other sub-bands may also be computed.

508. Judge the type of the current frame according to the tonal characteristic parameters computed in the above process.

Specifically: judge whether the tonal characteristic parameter ave_NT_ratio0 in sub-band sb0 and the tonal characteristic parameter ave_NT_ratio2 in sub-band sb2 computed in 507 satisfy a certain relationship with the first coefficient and the second coefficient. In this embodiment this relationship may be relation (12):

    (ave_NT_ratio0 > α) and (ave_NT_ratio2 < β)    (12)

where ave_NT_ratio0 denotes the tonal characteristic parameter of the k-th frame in the low-frequency sub-band, ave_NT_ratio2 denotes the tonal characteristic parameter of the k-th frame in the higher-frequency sub-band, α denotes the first coefficient, and β denotes the second coefficient.

If relation (12) is satisfied, the k-th frame is judged to be a speech-type audio signal; otherwise it is a music-type audio signal.
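Relation (12) amounts to a two-threshold test, sketched below. The default values for α and β are illustrative placeholders only; the patent does not fix the coefficients in this excerpt.

```python
def classify_frame_tonal(ave_nt_ratio0, ave_nt_ratio2, alpha=0.6, beta=0.2):
    # Relation (12): speech when the low-band tonal ratio is high and the
    # higher-band tonal ratio is low. alpha/beta are placeholder coefficients.
    if ave_nt_ratio0 > alpha and ave_nt_ratio2 < beta:
        return "speech"
    return "music"
```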
The following is the process of smoothing the current frame.

509. For the current frame whose type has been judged, further judge whether the type of the frame preceding the current frame is the same as the type of the frame following the current audio signal; if they are the same, perform 510, otherwise perform 512.

Specifically: judge whether the type of the (k − 1)-th frame is the same as the type of the (k + 1)-th frame. If the result is that the type of the (k − 1)-th frame is the same as that of the (k + 1)-th frame, perform 510; otherwise perform 512.

510. Judge whether the type of the current frame is the same as the type of the frame preceding the current frame; if they differ, perform 511, otherwise perform 512.

Specifically: judge whether the type of the k-th frame is the same as the type of the (k − 1)-th frame. If the result is that they differ, perform 511; otherwise perform 512.

511. Modify the type of the current frame to the type of the preceding frame.

Specifically: modify the type of the k-th frame to the type of the (k − 1)-th frame.

In the smoothing of the current frame in this embodiment, when specifically judging whether the type of the current frame needs smoothing, the technical solution of examining the types of the preceding frame and the following frame is adopted. This method, however, is one instance of the process of examining information about surrounding frames; the specific approach of examining several preceding and several following frames is not limited by what is described in this embodiment. Any scheme that, in this process, examines the type of at least one preceding frame or at least one following frame is applicable to the embodiments of the present invention.

512. End the procedure.

In the prior art, five kinds of characteristic parameters must be considered to classify the type of an audio signal. With the method provided by this embodiment, the type of most audio signals can be judged by computing the tonal characteristic parameter of the audio signal alone. Compared with the prior art, the classification method is simple and the amount of computation is low.
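The smoothing of steps 509 to 511 can be sketched offline over a sequence of frame labels. This is a sketch only: the patent applies the rule per frame with one frame of look-ahead, whereas this helper walks a whole recorded sequence; applying the rule to already-smoothed predecessors is an assumption of this sketch.

```python
def smooth_types(types):
    # Steps 509-511: if a frame's neighbours agree with each other but the
    # frame's own type differs from its predecessor, relabel the frame to
    # match the predecessor.
    out = list(types)
    for k in range(1, len(out) - 1):
        if out[k - 1] == out[k + 1] and out[k] != out[k - 1]:
            out[k] = out[k - 1]
    return out
```

A single "music" frame between two "speech" frames is relabeled "speech", while a genuine transition (neighbours disagree) is left untouched.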
Embodiment 2

This embodiment discloses a method for classifying an audio signal. As shown in FIG. 2, the method includes:

101. Receive a current frame of audio signal, which is the audio signal to be classified.

102. Obtain the tonal characteristic parameter of the current frame in at least one sub-band.

The frequency region is generally divided into 4 frequency sub-bands, and in each sub-band the current frame can yield one corresponding tonal characteristic parameter. Of course, according to design needs, one may also choose to obtain the tonal characteristic parameters of the current frame in only one or two of the sub-bands.

103. Obtain the spectral tilt characteristic parameter of the current frame.

In this embodiment, the order of execution of 102 and 103 is not limited; they may even be executed simultaneously.

104. Judge the type of the current frame according to the at least one tonal characteristic parameter obtained in 102 and the spectral tilt characteristic parameter obtained in 103.

By adopting the technical means of judging the type of the audio signal from its tonal characteristic parameter and its spectral tilt characteristic parameter, the technical solution of this embodiment solves the prior-art problem that classifying the type of an audio signal requires five kinds of characteristic parameters — harmonics, noise, rhythm and so on — which makes the classification method complex, and thereby achieves the technical effect of lowering the complexity of the classification method and reducing the amount of computation during classification.
Embodiment 3

This embodiment discloses a method for classifying an audio signal. As shown in FIG. 3, the method includes the following steps:

201. Receive a current frame of audio signal, which is the audio signal to be classified.

Specifically: suppose the sampling frequency is 48 kHz and the frame length is N = 1024 sample points, and the received current frame is the k-th frame of the audio signal.

The following is the process of computing the tonal characteristic parameter of the current frame.

202. Compute the power spectral density of the current frame.

Specifically: apply Hanning windowing to the time-domain data of the k-th frame.

The Hanning window may be computed by the following formula:

    h(l) = 0.5 · [1 − cos(2πl/N)],  0 ≤ l ≤ N − 1    (1)

where N denotes the frame length and h(l) denotes the Hanning window datum for the l-th sample point of the k-th frame.

Perform an FFT of length N on the windowed time-domain data of the k-th frame (because the FFT is symmetric about N/2, in practice an FFT of length N/2 suffices), and use the FFT coefficients to compute the k′-th power spectral density of the k-th frame:

    X(k′) = 10 · log₁₀ | Σ_{l=0}^{N−1} s(l) · h(l) · e^{−j2πlk′/N} |²,  0 ≤ k′ < N/2, 0 ≤ l ≤ N − 1    (2)

where s(l) denotes the original input sample point of the k-th frame and X(k′) denotes the k′-th power spectral density of the k-th frame.

Correct the computed power spectral density X(k′) so that its maximum value equals the reference sound pressure level (96 dB).
203. Use the power spectral density to detect whether a tone exists in each sub-band of the frequency region, count the number of tones existing in the corresponding sub-band, and take that count as the number of sub-band tones in that sub-band.

Specifically: divide the frequency region into four frequency sub-bands, denoted sb0, sb1, sb2 and sb3. If the power spectral density X(k′) and several adjacent power spectral densities satisfy a certain condition — in this embodiment, the condition shown in formula (3) below — the sub-band corresponding to this X(k′) is deemed to contain a tone, and the number of such tones is counted to obtain the number of sub-band tones NTk_i in that sub-band, where NTk_i denotes the number of sub-band tones of the k-th frame in sub-band sbi (i denotes the sub-band number, i = 0, 1, 2, 3).

    X(k′ − 1) < X(k′) ≥ X(k′ + 1)  and  X(k′) − X(k′ + j) ≥ 7 dB    (3)

where the values of j are specified as follows:

    j = −2, +2                    for 2 < k′ < 63
    j = −3, −2, +2, +3            for 63 ≤ k′ < 127
    j = −6, ..., −2, +2, ..., +6  for 127 ≤ k′ < 255
    j = −12, ..., −2, +2, ..., +12 for 255 ≤ k′ < 500

In this embodiment, the number of power spectral density coefficients (i.e. the length) is known to be N/2. Corresponding to the above specification of j, the meaning of the value ranges of k′ is further explained as follows:

sb0: corresponds to 2 ≤ k′ < 63, i.e. the 0th to (N/16 − 1)-th power spectral density coefficients, covering the frequency range [0 kHz, 3 kHz);

sb1: corresponds to 63 ≤ k′ < 127, i.e. the N/16-th to (N/8 − 1)-th coefficients, covering the frequency range [3 kHz, 6 kHz);

sb2: corresponds to 127 ≤ k′ < 255, i.e. the N/8-th to (N/4 − 1)-th coefficients, covering the frequency range [6 kHz, 12 kHz);

sb3: corresponds to 255 ≤ k′ < 500, i.e. the N/4-th to N/2-th coefficients, covering the frequency range [12 kHz, 24 kHz).

Here, sb0 and sb1 correspond to the low-frequency sub-band part, sb2 to the higher-frequency sub-band part, and sb3 to the high-frequency sub-band part.

The specific process of counting NTk_i is as follows:

For sub-band sb0, let k′ take values one by one in the interval from 2 (inclusive) to 63 (exclusive), and for each value of k′ judge whether the condition of formula (3) is satisfied. After traversing the whole value interval of k′, count the number of values of k′ satisfying the condition; this count is the number of sub-band tones NTk_0 of the k-th frame in sub-band sb0.

For example: if formula (3) holds for k′ = 3, k′ = 5 and k′ = 10, there are deemed to be 3 sub-band tones in sub-band sb0, i.e. NTk_0 = 3.

Likewise, for sub-band sb1, let k′ take values one by one in the interval from 63 (inclusive) to 127 (exclusive); the number of values of k′ satisfying formula (3) is the number of sub-band tones NTk_1 of the k-th frame in sub-band sb1.

Likewise, for sub-band sb2, let k′ take values one by one in the interval from 127 (inclusive) to 255 (exclusive); the number of values of k′ satisfying formula (3) is the number of sub-band tones NTk_2 of the k-th frame in sub-band sb2.

The number of sub-band tones NTk_3 of the k-th frame in sub-band sb3 may be counted in the same way.
204. Compute the total number of tones of the current frame.

Specifically: from the NTk_i counted in 203, compute the sum of the numbers of sub-band tones of the k-th frame in the four sub-bands sb0, sb1, sb2 and sb3. This sum is the number of tones in the k-th frame, which may be computed by:

    NTk_sum = Σ_{i=0}^{3} NTk_i    (4)

where NTk_sum denotes the total number of tones of the k-th frame.

205. Compute the average number of sub-band tones of the current frame in the corresponding sub-band over a specified number of frames.

Specifically: let the specified number of frames be M, the M frames including the k-th frame and the (M − 1) frames preceding it. Compute, according to the relationship between the values of M and k, the average number of sub-band tones of the k-th frame in each sub-band over these M frames.

The average number of sub-band tones may be computed by formula (5):

    ave_NT_i = (1/(k+1)) · Σ_{j=0}^{k} NT_j_i,        if k < M − 1
    ave_NT_i = (1/M) · Σ_{j=k−M+1}^{k} NT_j_i,        if k ≥ M − 1    (5)

where NT_j_i denotes the number of sub-band tones of the j-th frame in sub-band i, and ave_NT_i denotes the average number of sub-band tones in sub-band i. In particular, as formula (5) shows, the appropriate branch must be selected according to the relationship between the values of k and M.

In particular, in this embodiment it is not necessary, per design needs, to compute the average number of sub-band tones for every sub-band; it suffices to compute the average number of sub-band tones ave_NT_0 in the low-frequency sub-band sb0 and the average number of sub-band tones ave_NT_2 in the higher-frequency sub-band sb2.

206. Compute the average of the total number of tones of the current frame over the specified number of frames.

Specifically: let the specified number of frames be M, the M frames including the k-th frame and the (M − 1) frames preceding it. Compute, according to the relationship between the values of M and k, the average total number of tones contained per frame over these M frames.

The average of the total number of tones may be computed by formula (6):

    ave_NT_sum = (1/(k+1)) · Σ_{j=0}^{k} NT_j_sum,    if k < M − 1
    ave_NT_sum = (1/M) · Σ_{j=k−M+1}^{k} NT_j_sum,    if k ≥ M − 1    (6)

where NT_j_sum denotes the total number of tones of the j-th frame and ave_NT_sum denotes the average of the total number of tones. Again, as formula (6) shows, the appropriate branch must be selected according to the relationship between the values of k and M.

207. Take the ratios of the computed average numbers of sub-band tones in at least one sub-band to the average of the total number of tones as the tonal characteristic parameters of the current frame in the corresponding sub-bands.

The tonal characteristic parameter may be computed by formula (7):

    ave_NT_ratio_i = ave_NT_i / ave_NT_sum    (7)

where ave_NT_i denotes the average number of sub-band tones in sub-band i, ave_NT_sum denotes the average of the total number of tones, and ave_NT_ratio_i denotes the ratio, for the k-th frame, of the average number of sub-band tones in sub-band i to the average of the total number of tones.

In particular, in this embodiment, using the average number of sub-band tones ave_NT_0 in the low-frequency sub-band sb0 and the average number of sub-band tones ave_NT_2 in the higher-frequency sub-band sb2 computed in 205, formula (7) yields the tonal characteristic parameter ave_NT_ratio0 of the k-th frame in sub-band sb0 and the tonal characteristic parameter ave_NT_ratio2 in sub-band sb2, and these two are taken as the tonal characteristic parameters of the k-th frame.

In this embodiment, the tonal characteristic parameters to be considered are those in the low-frequency sub-band and in the higher-frequency sub-band, but the design of the present invention is not limited to this choice; according to design needs, tonal characteristic parameters in other sub-bands may also be computed.
The following is the process of computing the spectral tilt characteristic parameter of the current frame.

208. Compute the spectral tilt of one frame of audio signal.

Specifically: compute the spectral tilt of the k-th frame.

The spectral tilt of the k-th frame may be computed by formula (8):

    spec_tilt_k = r(1) / r(0) = Σ_{n=1}^{N−1} s(n) · s(n−1) / Σ_{n=0}^{N−1} s(n)²    (8)

where s(n) denotes the n-th time-domain sample point of the k-th frame, r denotes the autocorrelation parameter, and spec_tilt_k denotes the spectral tilt of the k-th frame.

209. From the spectral tilt of one frame computed above, compute the average spectral tilt of the current frame over a specified number of frames.

Specifically: let the specified number of frames be M, the M frames including the k-th frame and the (M − 1) frames preceding it. Compute, according to the relationship between the values of M and k, the average spectral tilt per frame over these M frames, i.e. the mean spectral tilt within these M frames.

The mean spectral tilt may be computed by formula (9):

    ave_spec_tilt = (1/(k+1)) · Σ_{j=0}^{k} spec_tilt_j,    if k < M − 1
    ave_spec_tilt = (1/M) · Σ_{j=k−M+1}^{k} spec_tilt_j,    if k ≥ M − 1    (9)

where k denotes the frame number of the current frame, M denotes the specified number of frames, spec_tilt_j denotes the spectral tilt of the j-th frame, and ave_spec_tilt is the mean spectral tilt. In particular, as formula (9) shows, the appropriate branch must be selected according to the relationship between the values of k and M.

210. Take the mean square error between the spectral tilt of at least one audio signal and the mean spectral tilt computed above as the spectral tilt characteristic parameter of the current frame.

Specifically: let the specified number of frames be M, the M frames including the k-th frame and the (M − 1) frames preceding it. Compute, according to the relationship between the values of M and k, the mean square error between the spectral tilt of at least one frame and the mean spectral tilt. This mean square error is the spectral tilt characteristic parameter of the current frame.

The spectral tilt characteristic parameter may be computed by formula (10):

    dif_spec_tilt = (1/(k+1)) · Σ_{j=0}^{k} (spec_tilt_j − ave_spec_tilt)²,    if k < M − 1
    dif_spec_tilt = (1/M) · Σ_{j=k−M+1}^{k} (spec_tilt_j − ave_spec_tilt)²,    if k ≥ M − 1    (10)

where k denotes the frame number of the current frame, ave_spec_tilt is the mean spectral tilt, and dif_spec_tilt is the spectral tilt characteristic parameter. In particular, as formula (10) shows, the appropriate branch must be selected according to the relationship between the values of k and M.
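Formulas (8) to (10) can be sketched as follows. The normalized first-order autocorrelation form of formula (8) is a reconstruction from the garbled original and should be read as an assumption of this sketch.

```python
import numpy as np

def spectral_tilt(s):
    # spec_tilt per the reconstructed formula (8): ratio of the lag-1
    # autocorrelation r(1) to the energy r(0) of the frame's time samples.
    s = np.asarray(s, dtype=float)
    return float(np.dot(s[1:], s[:-1]) / np.dot(s, s))

def tilt_feature(tilts):
    # dif_spec_tilt per formulas (9)-(10): mean square deviation of the
    # per-frame tilts from their mean over the analysis window.
    tilts = np.asarray(tilts, dtype=float)
    return float(np.mean((tilts - tilts.mean()) ** 2))
```

A steady (low-variation) tilt sequence yields a small dif_spec_tilt, which is the behaviour the classification rule below relies on.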
The process of computing the tonal characteristic parameters (202 to 207) and the process of computing the spectral tilt characteristic parameter (208 to 210) described above are not limited to a particular order of execution; they may even be executed simultaneously.

211. Judge the type of the current frame according to the tonal characteristic parameters and the spectral tilt characteristic parameter computed in the above process.

Specifically: judge whether the tonal characteristic parameter ave_NT_ratio0 in sub-band sb0 and the tonal characteristic parameter ave_NT_ratio2 in sub-band sb2 computed in 207, together with the spectral tilt characteristic parameter dif_spec_tilt computed in 210, satisfy a certain relationship with the first, second and third coefficients. In this embodiment this relationship may be relation (11):

    (ave_NT_ratio0 > α) and (ave_NT_ratio2 < β) and (dif_spec_tilt > γ)    (11)

where ave_NT_ratio0 denotes the tonal characteristic parameter of the k-th frame in the low-frequency sub-band, ave_NT_ratio2 denotes the tonal characteristic parameter of the k-th frame in the higher-frequency sub-band, dif_spec_tilt denotes the spectral tilt characteristic parameter of the k-th frame, α denotes the first coefficient, β the second coefficient, and γ the third coefficient.

If the certain relationship, i.e. relation (11), is satisfied, the k-th frame is judged to be a speech-type audio signal; otherwise it is a music-type audio signal.
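Relation (11) adds the spectral-tilt test to the two tonal tests. A sketch with placeholder coefficients follows; α, β and γ are not specified in this excerpt, so the defaults here are illustrative only.

```python
def classify_frame(ave_nt_ratio0, ave_nt_ratio2, dif_spec_tilt,
                   alpha=0.6, beta=0.2, gamma=0.1):
    # Relation (11): speech when the low-band tonal ratio is high, the
    # higher-band tonal ratio is low, and the tilt feature is large.
    # alpha/beta/gamma are placeholder coefficients, not from the patent.
    if ave_nt_ratio0 > alpha and ave_nt_ratio2 < beta and dif_spec_tilt > gamma:
        return "speech"
    return "music"
```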
The following is the process of smoothing the current frame.

212. For the current frame whose type has been judged, further judge whether the type of the frame preceding the current frame is the same as the type of the frame following the current audio signal; if they are the same, perform 213, otherwise perform 215.

Specifically: judge whether the type of the (k − 1)-th frame is the same as the type of the (k + 1)-th frame. If the result is that the type of the (k − 1)-th frame is the same as that of the (k + 1)-th frame, perform 213; otherwise perform 215.

213. Judge whether the type of the current frame is the same as the type of the frame preceding the current frame; if they differ, perform 214, otherwise perform 215.

Specifically: judge whether the type of the k-th frame is the same as the type of the (k − 1)-th frame. If the result is that they differ, perform 214; otherwise perform 215.

214. Modify the type of the current frame to the type of the preceding frame.

Specifically: modify the type of the k-th frame to the type of the (k − 1)-th frame.

In the smoothing process described in this embodiment, when judging the type of the current frame (the k-th frame) in step 212, it is necessary to wait until the type of the (k + 1)-th frame has been judged before the next step 213 can proceed. This appears to introduce a one-frame delay for waiting to judge the type of the (k + 1)-th frame, but the encoder algorithm itself usually incurs a one-frame delay when encoding each frame anyway. This embodiment exploits exactly this one-frame delay to implement the smoothing process, so that misjudgment of the type of the current frame can be avoided without introducing extra delay, achieving the technical effect of classifying the audio signal in real time.

Where the requirement on delay is not very strict, in the smoothing of the current frame in this embodiment one may also decide whether the current audio signal needs smoothing by judging the types of the three frames before and the three frames after the current audio signal, or the types of the five frames before and the five frames after, and so on; the specific number of surrounding frames to be examined is not limited by what is described in this embodiment. Because more information about the surrounding frames is taken into account, the effect of such smoothing may be better.

215. End the procedure.

Compared with the prior art, which must classify the type of an audio signal from five kinds of characteristic parameters, the classification method provided by this embodiment classifies the type of the audio signal from two kinds of characteristic parameters; the classification algorithm is simple, its complexity is low, and the amount of computation in the classification process is reduced. In addition, the solution of this embodiment adopts the technical means of smoothing the classified audio signal, which improves the recognition rate of the type of the audio signal, so that the speech encoder and the audio encoder can fully play their roles in the subsequent coding process.
Embodiment 4

Corresponding to Embodiment 1 above, this embodiment specifically provides an audio signal classification device. As shown in FIG. 4, the device includes: a receiving module 40, a tone obtaining module 41, a classifying module 43, a first judging module 44, a second judging module 45, a smoothing module 46, and a first setting module 47.

The receiving module 40 is configured to receive a current frame of audio signal, which is the audio signal to be classified. The tone obtaining module 41 is configured to obtain the tonal characteristic parameter of the audio signal to be classified in at least one sub-band. The classifying module 43 is configured to determine the type of the audio signal to be classified according to the tonal characteristic parameter obtained by the tone obtaining module 41. The first judging module 44 is configured to judge, after the classifying module 43 has classified the type of the audio signal to be classified, whether the type of at least one frame preceding the audio signal to be classified is the same as the type of the corresponding at least one frame following it. The second judging module 45 is configured to judge, when the first judging module 44 determines that those types are the same, whether the type of the audio signal to be classified differs from the type of the at least one preceding frame. The smoothing module 46 is configured to smooth the type of the audio signal to be classified when the second judging module 45 determines that it differs from the type of the at least one preceding frame. The first setting module 47 is configured to preset the specified number of frames for computation.

In this embodiment, if the tonal characteristic parameters obtained by the tone obtaining module 41 in at least one sub-band are the tonal characteristic parameter in the low-frequency sub-band and the tonal characteristic parameter in the higher-frequency sub-band, the classifying module 43 includes: a judging unit 431 and a classifying unit 432.

The judging unit 431 is configured to judge whether the tonal characteristic parameter of the audio signal to be classified in the low-frequency sub-band is greater than the first coefficient and the tonal characteristic parameter in the higher-frequency sub-band is less than the second coefficient. The classifying unit 432 is configured to determine, when the judging unit 431 determines that the tonal characteristic parameter of the audio signal to be classified in the low-frequency sub-band is greater than the first coefficient and the tonal characteristic parameter in the higher-frequency sub-band is less than the second coefficient, that the type of the audio signal to be classified is the speech type, and the music type otherwise.

The tone obtaining module 41 computes the tonal characteristic parameter from the number of tones of the audio signal to be classified in at least one sub-band and the total number of tones of the audio signal to be classified.

Further, in this embodiment the tone obtaining module 41 includes: a first computing unit 411, a second computing unit 412, and a tonal characteristic unit 413.

The first computing unit 411 is configured to compute the average number of sub-band tones of the audio signal to be classified in at least one sub-band; the second computing unit 412 is configured to compute the average of the total number of tones of the audio signal to be classified; the tonal characteristic unit 413 is configured to take the ratios of the average number of sub-band tones in the at least one sub-band to the average of the total number of tones as the tonal characteristic parameters of the audio signal to be classified in the corresponding sub-bands.

The first computing unit 411 computes the average number of sub-band tones of the audio signal to be classified in at least one sub-band as follows: computing the average number of sub-band tones in a sub-band according to the relationship between the specified number of frames set by the first setting module 47 and the frame number of the audio signal to be classified.

The second computing unit 412 computes the average of the total number of tones of the audio signal to be classified as follows: computing the average of the total number of tones according to the relationship between the specified number of frames set by the first setting module and the frame number of the audio signal to be classified.

By adopting the technical means of obtaining the tonal characteristic parameters of the audio signal, the audio signal classification device provided by this embodiment achieves the technical effect that the type of most audio signals can be judged, lowers the difficulty of the classification method in the audio signal classification process, and also reduces the amount of computation.
Embodiment 5

Corresponding to the audio signal classification method of Embodiment 2 above, this embodiment discloses an audio signal classification device. As shown in FIG. 5, the device includes: a receiving module 30, a tone obtaining module 31, a spectral tilt obtaining module 32, and a classifying module 33.

The receiving module 30 is configured to receive a current frame of audio signal; the tone obtaining module 31 is configured to obtain the tonal characteristic parameter of the audio signal to be classified in at least one sub-band; the spectral tilt obtaining module 32 is configured to obtain the spectral tilt characteristic parameter of the audio signal to be classified; and the classifying module 33 is configured to determine the type of the audio signal to be classified according to the tonal characteristic parameter obtained by the tone obtaining module 31 and the spectral tilt characteristic parameter obtained by the spectral tilt obtaining module 32.

In the prior art, classifying an audio signal requires reference to characteristic parameters of the audio signal in many respects, so that the classification complexity is high and the amount of computation large. With the solution provided by this embodiment, the type of the audio signal can be distinguished from just two characteristic parameters, its tone and its spectral tilt, which simplifies the classification of the audio signal and also reduces the amount of computation in the classification process.
Embodiment 6

This embodiment specifically provides an audio signal classification device. As shown in FIG. 6, the device includes: a receiving module 40, a tone obtaining module 41, a spectral tilt obtaining module 42, a classifying module 43, a first judging module 44, a second judging module 45, a smoothing module 46, a first setting module 47, and a second setting module 48.

The receiving module 40 is configured to receive a current frame of audio signal, which is the audio signal to be classified. The tone obtaining module 41 is configured to obtain the tonal characteristic parameter of the audio signal to be classified in at least one sub-band. The spectral tilt obtaining module 42 is configured to obtain the spectral tilt characteristic parameter of the audio signal to be classified. The classifying module 43 is configured to judge the type of the audio signal to be classified according to the tonal characteristic parameter obtained by the tone obtaining module 41 and the spectral tilt characteristic parameter obtained by the spectral tilt obtaining module 42. The first judging module 44 is configured to judge, after the classifying module 43 has classified the type of the audio signal to be classified, whether the type of at least one frame preceding the audio signal to be classified is the same as the type of the corresponding at least one frame following it. The second judging module 45 is configured to judge, when the first judging module 44 determines that those types are the same, whether the type of the audio signal to be classified differs from the type of the at least one preceding frame. The smoothing module 46 is configured to smooth the type of the audio signal to be classified when the second judging module 45 determines that it differs from the type of the at least one preceding frame. The first setting module 47 is configured to preset the specified number of frames for computing the tonal characteristic parameter; the second setting module 48 is configured to preset the specified number of frames for computing the spectral tilt characteristic parameter.

The tone obtaining module 41 computes the tonal characteristic parameter from the number of tones of the audio signal to be classified in at least one sub-band and the total number of tones of the audio signal to be classified.

In this embodiment, if the tonal characteristic parameters obtained by the tone obtaining module 41 in at least one sub-band are the tonal characteristic parameter in the low-frequency sub-band and the tonal characteristic parameter in the higher-frequency sub-band, the classifying module 43 includes: a judging unit 431 and a classifying unit 432.

The judging unit 431 is configured to judge, when the tonal characteristic parameter of the audio signal to be classified in the low-frequency sub-band is greater than the first coefficient and the tonal characteristic parameter in the higher-frequency sub-band is less than the second coefficient, whether the spectral tilt characteristic parameter of the audio signal is greater than the third coefficient. The classifying unit 432 is configured to determine, when the judging unit determines that the spectral tilt characteristic parameter of the audio signal to be classified is greater than the third coefficient, that the type of the audio signal to be classified is the speech type, and the music type otherwise.

Further, in this embodiment the tone obtaining module 41 includes: a first computing unit 411, a second computing unit 412, and a tonal characteristic unit 413.

The first computing unit 411 is configured to compute the average number of sub-band tones of the audio signal to be classified in at least one sub-band; the second computing unit 412 is configured to compute the average of the total number of tones of the audio signal to be classified; the tonal characteristic unit 413 is configured to take the ratios of the average number of sub-band tones in the at least one sub-band to the average of the total number of tones as the tonal characteristic parameters of the audio signal to be classified in the corresponding sub-bands.

The first computing unit 411 computes the average number of sub-band tones in a sub-band according to the relationship between the specified number of frames set by the first setting module 47 and the frame number of the audio signal to be classified.

The second computing unit 412 computes the average of the total number of tones of the audio signal to be classified as follows: computing the average of the total number of tones according to the relationship between the specified number of frames set by the first setting module 47 and the frame number of the audio signal to be classified.

Further, in this embodiment the spectral tilt obtaining module 42 includes: a third computing unit 421 and a spectral tilt characteristic unit 422.

The third computing unit 421 is configured to compute the mean spectral tilt of the audio signal to be classified; the spectral tilt characteristic unit 422 is configured to take the mean square error between the spectral tilt of at least one audio signal and the mean spectral tilt as the spectral tilt characteristic parameter of the audio signal to be classified.

The third computing unit 421 computes the mean spectral tilt of the audio signal to be classified as follows: computing the mean spectral tilt according to the relationship between the specified number of frames set by the second setting module 48 and the frame number of the audio signal to be classified.

The spectral tilt characteristic unit 422 computes the mean square error between the spectral tilt of at least one audio signal and the mean spectral tilt as follows: computing the spectral tilt characteristic parameter according to the relationship between the specified number of frames set by the second setting module 48 and the frame number of the audio signal to be classified.

The first setting module 47 and the second setting module 48 in this embodiment may be implemented by one program or module, and may even set the same value for the specified number of frames.

The solution provided by this embodiment has the following beneficial effects: the classification is simple, the complexity low, and the amount of computation small; no extra delay is introduced into the encoder; and the real-time, low-complexity requirements of a speech/audio encoder at low and medium bit rates during the classification process can be met.
The embodiments of the present invention are mainly applied in the field of communication technologies to classify the types of audio signals quickly, accurately and in real time. With the development of network technology, they may be applied to other scenarios in this field, and may also be transferred to similar or related technical fields.

Through the description of the above implementations, persons skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary universal hardware platform, and certainly also by hardware, but in many cases the former is the better implementation. Based on such an understanding, the essence of the technical solutions of the present invention, or the part contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, hard disk or optical disk of a computer, and includes several instructions for enabling an encoder to perform the methods described in the embodiments of the present invention.

The above are merely specific implementations of the present invention, but the protection scope of the present invention is not limited thereto. Any variation or replacement readily conceivable by persons skilled in the art within the technical scope disclosed by the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A method for classifying an audio signal, comprising:

obtaining a tonal characteristic parameter of an audio signal to be classified in at least one sub-band; and

determining the type of the audio signal to be classified according to the obtained tonal characteristic parameter.

2. The method for classifying an audio signal according to claim 1, further comprising:

obtaining a spectral tilt characteristic parameter of the audio signal to be classified; and

confirming the determined type of the audio signal to be classified according to the obtained spectral tilt characteristic parameter.

3. The method for classifying an audio signal according to claim 1, wherein, if the tonal characteristic parameters in at least one sub-band are a tonal characteristic parameter in a low-frequency sub-band and a tonal characteristic parameter in a higher-frequency sub-band, the determining the type of the audio signal to be classified according to the obtained characteristic parameters comprises:

judging whether the tonal characteristic parameter of the audio signal to be classified in the low-frequency sub-band is greater than a first coefficient and the tonal characteristic parameter in the higher-frequency sub-band is less than a second coefficient; and

if the tonal characteristic parameter of the audio signal to be classified in the low-frequency sub-band is greater than the first coefficient and the tonal characteristic parameter in the higher-frequency sub-band is less than the second coefficient, determining that the type of the audio signal to be classified is the speech type, and otherwise the music type.

4. The method for classifying an audio signal according to claim 2, wherein, if the tonal characteristic parameters in at least one sub-band are a tonal characteristic parameter in a low-frequency sub-band and a tonal characteristic parameter in a higher-frequency sub-band, the confirming the determined type of the audio signal to be classified according to the obtained spectral tilt characteristic parameter comprises:

when the tonal characteristic parameter of the audio signal to be classified in the low-frequency sub-band is greater than a first coefficient and the tonal characteristic parameter in the higher-frequency sub-band is less than a second coefficient, judging whether the spectral tilt characteristic parameter of the audio signal to be classified is greater than a third coefficient; and

if the spectral tilt characteristic parameter of the audio signal to be classified is greater than the third coefficient, determining that the type of the audio signal to be classified is the speech type, and otherwise the music type.
5. The method for classifying an audio signal according to claim 1, wherein the obtaining a tonal characteristic parameter of the audio signal to be classified in at least one sub-band is:

computing the tonal characteristic parameter from the number of tones of the audio signal to be classified in at least one sub-band and the total number of tones of the audio signal to be classified.

6. The method for classifying an audio signal according to claim 5, wherein the computing the tonal characteristic parameter from the number of tones of the audio signal to be classified in at least one sub-band and the total number of tones of the audio signal to be classified comprises:

computing an average number of sub-band tones of the audio signal to be classified in at least one sub-band;

computing an average of the total number of tones of the audio signal to be classified; and

taking the ratios of the average number of sub-band tones in the at least one sub-band to the average of the total number of tones as the tonal characteristic parameters of the audio signal to be classified in the corresponding sub-bands.

7. The method for classifying an audio signal according to claim 6, wherein a specified number of frames for computation is preset, and the computing the average number of sub-band tones of the audio signal to be classified in at least one sub-band comprises:

computing the average number of sub-band tones in a sub-band according to the relationship between the specified number of frames and the frame number of the audio signal to be classified.

8. The method for classifying an audio signal according to claim 6, wherein a specified number of frames for computation is preset, and the computing the average of the total number of tones of the audio signal to be classified comprises:

computing the average of the total number of tones according to the relationship between the specified number of frames and the frame number of the audio signal to be classified.

9. The method for classifying an audio signal according to claim 2, wherein the obtaining the spectral tilt characteristic parameter of the audio signal to be classified comprises:

computing a mean spectral tilt of the audio signal to be classified; and

taking the mean square error between the spectral tilt of at least one audio signal and the mean spectral tilt as the spectral tilt characteristic parameter of the audio signal to be classified.

10. The method for classifying an audio signal according to claim 9, wherein a specified number of frames for computation is preset, and the computing the mean spectral tilt of the audio signal to be classified comprises: computing the mean spectral tilt according to the relationship between the specified number of frames and the frame number of the audio signal to be classified.

11. The method for classifying an audio signal according to claim 9, wherein a specified number of frames for computation is preset, and the mean square error between the spectral tilt of the at least one audio signal and the mean spectral tilt comprises: computing the spectral tilt characteristic parameter according to the relationship between the specified number of frames and the frame number of the audio signal to be classified.
12、 一种音频信号的分类装置, 其特征在于, 包括:
音调获取模块,用于获取待分类音频信号在至少一个子带中的音调特征参 数;
分类模块,用于根据获取的所述音调特征参数判定所述待分类音频信号的 类型。
13、 根据权利要求 12所述的音频信号的分类装置, 其特征在于, 该装置 还包括:
频谱倾斜度获取模块, 用于获取待分类音频信号的频谱倾斜度特征参数; 则所述分类模块还用于根据所述频谱倾斜度获取模块获取的频谱倾斜度 特征参数确定所述判定的待分类音频信号的类型。
14、 根据权利要求 12所述的音频信号的分类装置, 其特征在于, 当所述 音调获取模块获取的在至少一个子带中的音调特征参数为:在低频子带中的音 调特征参数和在较高频子带中的音调特征参数时, 所述分类模块包括:
判断单元,用于判断所述待分类音频信号是否在低频子带中的音调特征参 数大于第一系数, 并且在较高频子带中的音调特征参数小于第二系数:
分类单元,用于当判断单元判定所述待分类音频信号在低频子带中的音调 特征参数大于第一系数, 并且在较高频子带中的音调特征参数小于第二系数 时, 判定所述待分类音频信号的类型为语音类型, 否则为音乐类型。
15、 根据权利要求 13所述的音频信号的分类装置, 其特征在于, 当所述 音调获取模块获取的在至少一个子带中的音调特征参数为:在低频子带中的音 调特征参数和在较高频子带中的音调特征参数时, 所述分类模块包括的
判断单元还用于当所述待分类音频信号在低频子带中的音调特征参数大 于第一系数, 并且在高频子带中的音调特征参数小于第二系数时,判断所述音 频信号的频谱倾斜度特征参数是否大于第三系数;
分类单元还用于当判断单元判定所述待分类音频信号的频谱倾斜度特征 参数大于第三系数时,判定所述待分类音频信号的类型为语音类型, 否则为音 乐类型。
16. The audio signal classification apparatus according to claim 12, wherein the tone obtaining module calculates the tonal characteristic parameter according to the number of tones of the audio signal to be classified in the at least one subband and the total number of tones of the audio signal to be classified.
17. The audio signal classification apparatus according to claim 12 or 16, wherein the tone obtaining module comprises:
a first calculating unit, configured to calculate a subband tone number mean of the audio signal to be classified in the at least one subband;
a second calculating unit, configured to calculate a total tone number mean of the audio signal to be classified; and
a tonal characteristic unit, configured to take the ratio of the subband tone number mean in the at least one subband to the total tone number mean as the tonal characteristic parameter of the audio signal to be classified in the corresponding subband.
18. The audio signal classification apparatus according to claim 17, further comprising:
a first setting module, configured to preset a specified number of frames for calculation;
wherein the calculating, by the first calculating unit, of the subband tone number mean of the audio signal to be classified in the at least one subband comprises: calculating the subband tone number mean in one subband according to the relation between the specified number of frames set by the first setting module and the frame number of the audio signal to be classified.
19. The audio signal classification apparatus according to claim 17, further comprising:
a first setting module, configured to preset a specified number of frames for calculation;
wherein the calculating, by the second calculating unit, of the total tone number mean of the audio signal to be classified comprises: calculating the total tone number mean according to the relation between the specified number of frames set by the first setting module and the frame number of the audio signal to be classified.
20. The audio signal classification apparatus according to claim 12, wherein the spectral tilt obtaining module comprises:
a third calculating unit, configured to calculate a spectral tilt mean of the audio signal to be classified; and
a spectral tilt characteristic unit, configured to take the mean-square deviation between the spectral tilt of at least one audio signal and the spectral tilt mean as the spectral tilt characteristic parameter of the audio signal to be classified.
21. The audio signal classification apparatus according to claim 20, further comprising:
a second setting module, configured to preset a specified number of frames for calculation;
wherein the calculating, by the third calculating unit, of the spectral tilt mean of the audio signal to be classified comprises: calculating the spectral tilt mean according to the relation between the specified number of frames set by the second setting module and the frame number of the audio signal to be classified.
22. The audio signal classification apparatus according to claim 20, further comprising:
a second setting module, configured to preset a specified number of frames for calculation;
wherein the calculating, by the spectral tilt characteristic unit, of the mean-square deviation between the spectral tilt of the at least one audio signal and the spectral tilt mean comprises: calculating the spectral tilt characteristic parameter according to the relation between the specified number of frames set by the second setting module and the frame number of the audio signal to be classified.
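Taken together, claims 5–15 describe one decision procedure: form per-subband tonal parameters as the ratio of a subband tone-count mean to the total tone-count mean (claims 5–6), form a spectral-tilt feature as the mean-square deviation of the per-frame spectral tilt from its mean (claim 9), and classify as speech only when the low band is tonal, the higher band is not, and the tilt varies strongly (claims 14–15). The sketch below illustrates that procedure under stated assumptions: the claims give no numeric values for the first, second, and third coefficients, so the thresholds here are invented for illustration, and the per-frame tone counts and spectral tilts are assumed to come from an earlier analysis stage over the preset number of frames.

```python
from statistics import fmean

# Illustrative thresholds: the claims only name a "first", "second" and
# "third coefficient" and give no numeric values, so these are assumptions.
FIRST_COEF = 0.5   # lower bound on the low-band tonal parameter
SECOND_COEF = 0.3  # upper bound on the higher-band tonal parameter
THIRD_COEF = 5.0   # lower bound on the spectral-tilt feature

def tonal_parameters(low_counts, high_counts, total_counts):
    """Claims 5-6: the tonal characteristic parameter of a subband is the
    ratio of the subband tone-count mean to the total tone-count mean,
    both averaged over the preset number of frames (the list length here)."""
    total_mean = fmean(total_counts)
    if total_mean == 0:
        return 0.0, 0.0
    return fmean(low_counts) / total_mean, fmean(high_counts) / total_mean

def spectral_tilt_feature(tilts):
    """Claim 9: mean-square deviation of the per-frame spectral tilt
    from the spectral tilt mean over the preset number of frames."""
    m = fmean(tilts)
    return fmean((t - m) ** 2 for t in tilts)

def classify(low_counts, high_counts, total_counts, tilts):
    """Claims 14-15 decision rule: speech requires a tonal low-frequency
    subband, a non-tonal higher-frequency subband, and a strongly varying
    spectral tilt; everything else is classified as music."""
    low_param, high_param = tonal_parameters(low_counts, high_counts, total_counts)
    if low_param > FIRST_COEF and high_param < SECOND_COEF:
        if spectral_tilt_feature(tilts) > THIRD_COEF:
            return "speech"
    return "music"

# Toy per-frame statistics for a 4-frame analysis window.
print(classify([3, 4, 3, 4], [0, 1, 0, 0], [4, 5, 3, 4],
               [1.0, 8.0, 2.0, 9.0]))   # tones in the low band, tilt varies -> speech
print(classify([2, 2, 2, 2], [3, 3, 3, 3], [6, 6, 6, 6],
               [4.0, 4.1, 3.9, 4.0]))   # tones spread high, tilt flat -> music
```

The running-mean detail of claims 7, 8, 10, and 11 (adapting the averaging to the frame number until the preset window is filled) is deliberately omitted here; the lists stand in for whichever frames the relation selects.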
PCT/CN2010/071373 2009-03-27 2010-03-27 Method and device for audio signal classification WO2010108458A1 (zh)

Priority Applications (7)

Application Number Priority Date Filing Date Title
KR1020117024685A KR101327895B1 (ko) 2009-03-27 2010-03-27 오디오 신호 분류를 위한 방법 및 장치
AU2010227994A AU2010227994B2 (en) 2009-03-27 2010-03-27 Method and device for audio signal classification
BRPI1013585A BRPI1013585A2 (pt) 2009-03-27 2010-03-27 método e dispositivo para classificação de sinal de áudio
JP2012501127A JP2012522255A (ja) 2009-03-27 2010-03-27 オーディオ信号分類の方法および装置
SG2011070166A SG174597A1 (en) 2009-03-27 2010-03-27 Method and device for audio signal classification
EP10755458.6A EP2413313B1 (en) 2009-03-27 2010-03-27 Method and device for audio signal classification
US13/246,485 US8682664B2 (en) 2009-03-27 2011-09-27 Method and device for audio signal classification using tonal characteristic parameters and spectral tilt characteristic parameters

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN200910129157.3 2009-03-27
CN2009101291573A CN101847412B (zh) 2009-03-27 2009-03-27 音频信号的分类方法及装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US13/246,485 Continuation US8682664B2 (en) 2009-03-27 2011-09-27 Method and device for audio signal classification using tonal characteristic parameters and spectral tilt characteristic parameters

Publications (1)

Publication Number Publication Date
WO2010108458A1 true WO2010108458A1 (zh) 2010-09-30

Family

ID=42772007

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2010/071373 WO2010108458A1 (zh) 2009-03-27 2010-03-27 音频信号的分类方法及装置

Country Status (9)

Country Link
US (1) US8682664B2 (zh)
EP (1) EP2413313B1 (zh)
JP (1) JP2012522255A (zh)
KR (1) KR101327895B1 (zh)
CN (1) CN101847412B (zh)
AU (1) AU2010227994B2 (zh)
BR (1) BRPI1013585A2 (zh)
SG (1) SG174597A1 (zh)
WO (1) WO2010108458A1 (zh)


Families Citing this family (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4665836B2 (ja) * 2006-05-31 2011-04-06 日本ビクター株式会社 楽曲分類装置、楽曲分類方法、及び楽曲分類プログラム
CN101847412B (zh) 2009-03-27 2012-02-15 华为技术有限公司 音频信号的分类方法及装置
TWI591620B (zh) 2012-03-21 2017-07-11 三星電子股份有限公司 產生高頻雜訊的方法
RU2656681C1 (ru) * 2012-11-13 2018-06-06 Самсунг Электроникс Ко., Лтд. Способ и устройство для определения режима кодирования, способ и устройство для кодирования аудиосигналов и способ, и устройство для декодирования аудиосигналов
US11222697B2 (en) 2013-02-28 2022-01-11 Samsung Electronics Co., Ltd. Three-dimensional nonvolatile memory and method of performing read operation in the nonvolatile memory
US9665403B2 (en) * 2013-03-15 2017-05-30 Miosoft Corporation Executing algorithms in parallel
CN104282315B (zh) * 2013-07-02 2017-11-24 华为技术有限公司 音频信号分类处理方法、装置及设备
CN106409313B (zh) * 2013-08-06 2021-04-20 华为技术有限公司 一种音频信号分类方法和装置
JP2015037212A (ja) * 2013-08-12 2015-02-23 オリンパスイメージング株式会社 情報処理装置、撮影機器及び情報処理方法
CN105336344B (zh) * 2014-07-10 2019-08-20 华为技术有限公司 杂音检测方法和装置
CN104700833A (zh) * 2014-12-29 2015-06-10 芜湖乐锐思信息咨询有限公司 一种大数据语音分类方法
EP3504708B1 (en) * 2016-09-09 2020-07-15 Huawei Technologies Co., Ltd. A device and method for classifying an acoustic environment
CN107492383B (zh) * 2017-08-07 2022-01-11 上海六界信息技术有限公司 直播内容的筛选方法、装置、设备及存储介质
CN111524536B (zh) * 2019-02-01 2023-09-08 富士通株式会社 信号处理方法和信息处理设备
CN111857639B (zh) * 2020-06-28 2023-01-24 浙江大华技术股份有限公司 音频输入信号的检测系统、方法、计算机设备和存储介质


Family Cites Families (31)

Publication number Priority date Publication date Assignee Title
DE3102385A1 (de) * 1981-01-24 1982-09-02 Blaupunkt-Werke Gmbh, 3200 Hildesheim Schaltungsanordnung zur selbstaetigen aenderung der einstellung von tonwiedergabegeraeten, insbesondere rundfunkempfaengern
DE19505435C1 (de) * 1995-02-17 1995-12-07 Fraunhofer Ges Forschung Verfahren und Vorrichtung zum Bestimmen der Tonalität eines Audiosignals
US5778335A (en) * 1996-02-26 1998-07-07 The Regents Of The University Of California Method and apparatus for efficient multiband celp wideband speech and music coding and decoding
JP3700890B2 (ja) * 1997-07-09 2005-09-28 ソニー株式会社 信号識別装置及び信号識別方法
JPH11202900A (ja) * 1998-01-13 1999-07-30 Nec Corp 音声データ圧縮方法及びそれを適用した音声データ圧縮システム
KR100304092B1 (ko) * 1998-03-11 2001-09-26 마츠시타 덴끼 산교 가부시키가이샤 오디오 신호 부호화 장치, 오디오 신호 복호화 장치 및 오디오 신호 부호화/복호화 장치
JP2000099069A (ja) * 1998-09-24 2000-04-07 Sony Corp 情報信号処理方法及び装置
US6694293B2 (en) 2001-02-13 2004-02-17 Mindspeed Technologies, Inc. Speech coding system with a music classifier
DE10109648C2 (de) * 2001-02-28 2003-01-30 Fraunhofer Ges Forschung Verfahren und Vorrichtung zum Charakterisieren eines Signals und Verfahren und Vorrichtung zum Erzeugen eines indexierten Signals
DE10134471C2 (de) * 2001-02-28 2003-05-22 Fraunhofer Ges Forschung Verfahren und Vorrichtung zum Charakterisieren eines Signals und Verfahren und Vorrichtung zum Erzeugen eines indexierten Signals
JP2002344852A (ja) * 2001-05-14 2002-11-29 Sony Corp 情報信号処理装置および情報信号処理方法
DE10133333C1 (de) * 2001-07-10 2002-12-05 Fraunhofer Ges Forschung Verfahren und Vorrichtung zum Erzeugen eines Fingerabdrucks und Verfahren und Vorrichtung zum Identifizieren eines Audiosignals
KR100880480B1 (ko) * 2002-02-21 2009-01-28 엘지전자 주식회사 디지털 오디오 신호의 실시간 음악/음성 식별 방법 및시스템
US20040024598A1 (en) * 2002-07-03 2004-02-05 Amit Srivastava Thematic segmentation of speech
JP2004240214A (ja) 2003-02-06 2004-08-26 Nippon Telegr & Teleph Corp <Ntt> 音響信号判別方法、音響信号判別装置、音響信号判別プログラム
EP1531458B1 (en) * 2003-11-12 2008-04-16 Sony Deutschland GmbH Apparatus and method for automatic extraction of important events in audio signals
FR2863080B1 (fr) * 2003-11-27 2006-02-24 Advestigo Procede d'indexation et d'identification de documents multimedias
US7026536B2 (en) * 2004-03-25 2006-04-11 Microsoft Corporation Beat analysis of musical signals
DE102004036154B3 (de) * 2004-07-26 2005-12-22 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Vorrichtung und Verfahren zur robusten Klassifizierung von Audiosignalen sowie Verfahren zu Einrichtung und Betrieb einer Audiosignal-Datenbank sowie Computer-Programm
US8126706B2 (en) * 2005-12-09 2012-02-28 Acoustic Technologies, Inc. Music detector for echo cancellation and noise reduction
TWI312982B (en) * 2006-05-22 2009-08-01 Nat Cheng Kung Universit Audio signal segmentation algorithm
US20080034396A1 (en) * 2006-05-30 2008-02-07 Lev Zvi H System and method for video distribution and billing
JP4665836B2 (ja) 2006-05-31 2011-04-06 日本ビクター株式会社 楽曲分類装置、楽曲分類方法、及び楽曲分類プログラム
JP2008015388A (ja) * 2006-07-10 2008-01-24 Dds:Kk 歌唱力評価方法及びカラオケ装置
CA2690433C (en) * 2007-06-22 2016-01-19 Voiceage Corporation Method and device for sound activity detection and sound signal classification
US8600740B2 (en) * 2008-01-28 2013-12-03 Qualcomm Incorporated Systems, methods and apparatus for context descriptor transmission
CN101236742B (zh) * 2008-03-03 2011-08-10 中兴通讯股份有限公司 音乐/非音乐的实时检测方法和装置
WO2009148731A1 (en) * 2008-06-02 2009-12-10 Massachusetts Institute Of Technology Fast pattern classification based on a sparse transform
US8321214B2 (en) * 2008-06-02 2012-11-27 Qualcomm Incorporated Systems, methods, and apparatus for multichannel signal amplitude balancing
PL2301011T3 (pl) * 2008-07-11 2019-03-29 Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. Sposób i dyskryminator do klasyfikacji różnych segmentów sygnału audio zawierającego segmenty mowy i muzyki
CN101847412B (zh) 2009-03-27 2012-02-15 华为技术有限公司 音频信号的分类方法及装置

Patent Citations (2)

Publication number Priority date Publication date Assignee Title
US20060015333A1 (en) * 2004-07-16 2006-01-19 Mindspeed Technologies, Inc. Low-complexity music detection algorithm and system
CN101136199A (zh) * 2006-08-30 2008-03-05 国际商业机器公司 语音数据处理方法和设备

Non-Patent Citations (2)

Title
See also references of EP2413313A4 *
WU, SHUN-MEI ET AL.: "Real-time Speech/Music Classification Arithmetic Based On Tonality", AUDIO ENGINEERING, vol. 34, no. 2, February 2010 (2010-02-01), pages 66 - 68, XP008167623 *

Cited By (2)

Publication number Priority date Publication date Assignee Title
CN111816170A (zh) * 2020-07-29 2020-10-23 网易(杭州)网络有限公司 一种音频分类模型的训练和垃圾音频识别方法和装置
CN111816170B (zh) * 2020-07-29 2024-01-19 杭州网易智企科技有限公司 一种音频分类模型的训练和垃圾音频识别方法和装置

Also Published As

Publication number Publication date
CN101847412B (zh) 2012-02-15
US8682664B2 (en) 2014-03-25
EP2413313A4 (en) 2012-02-29
AU2010227994B2 (en) 2013-11-14
CN101847412A (zh) 2010-09-29
KR101327895B1 (ko) 2013-11-13
EP2413313B1 (en) 2013-05-29
US20120016677A1 (en) 2012-01-19
SG174597A1 (en) 2011-10-28
BRPI1013585A2 (pt) 2016-04-12
JP2012522255A (ja) 2012-09-20
EP2413313A1 (en) 2012-02-01
AU2010227994A1 (en) 2011-11-03
KR20120000090A (ko) 2012-01-03

Similar Documents

Publication Publication Date Title
WO2010108458A1 (zh) 音频信号的分类方法及装置
US8725499B2 (en) Systems, methods, and apparatus for signal change detection
JP3277398B2 (ja) 有声音判別方法
RU2418321C2 (ru) Классификатор на основе нейронных сетей для выделения аудио источников из монофонического аудио сигнала
Deshmukh et al. Use of temporal information: Detection of periodicity, aperiodicity, and pitch in speech
CN109545188A (zh) 一种实时语音端点检测方法及装置
US11677879B2 (en) Howl detection in conference systems
RU2684194C1 (ru) Способ получения кадра модификации речевой активности, устройство и способ обнаружения речевой активности
CN110536215A (zh) 音频信号处理的方法、装置、计算设置及存储介质
US9454976B2 (en) Efficient discrimination of voiced and unvoiced sounds
CN103026407A (zh) 带宽扩展器
WO2014177084A1 (zh) 激活音检测方法和装置
JP2007041593A (ja) 音声信号のハーモニック成分を用いた有声音/無声音分離情報を抽出する方法及び装置
JP4050350B2 (ja) 音声認識をする方法とシステム
WO2013170610A1 (zh) 检测基音周期的正确性的方法和装置
Tan et al. Noise-robust F0 estimation using SNR-weighted summary correlograms from multi-band comb filters
CN110379438B (zh) 一种语音信号基频检测与提取方法及系统
Schroeder Parameter estimation in speech: a lesson in unorthodoxy
Chen et al. Robust voice activity detection algorithm based on the perceptual wavelet packet transform
WO2022068440A1 (zh) 啸叫抑制方法、装置、计算机设备和存储介质
US10762887B1 (en) Smart voice enhancement architecture for tempo tracking among music, speech, and noise
KR0171004B1 (ko) Samdf를 이용한 기본 주파수와 제1포만트의 비율 측정방법
KR20230066056A (ko) 사운드 코덱에 있어서 비상관 스테레오 콘텐츠의 분류, 크로스-토크 검출 및 스테레오 모드 선택을 위한 방법 및 디바이스
CN110827859A (zh) 一种颤音识别的方法与装置
CN117524264A (zh) 语音检测方法、装置、设备及存储介质

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 10755458; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
WWE Wipo information: entry into national phase
    Ref document number: 2012501127; Country of ref document: JP
WWE Wipo information: entry into national phase
    Ref document number: 4052/KOLNP/2011; Country of ref document: IN
WWE Wipo information: entry into national phase
    Ref document number: 2010755458; Country of ref document: EP
ENP Entry into the national phase
    Ref document number: 20117024685; Country of ref document: KR; Kind code of ref document: A
ENP Entry into the national phase
    Ref document number: 2010227994; Country of ref document: AU; Date of ref document: 20100327; Kind code of ref document: A
REG Reference to national code
    Ref country code: BR; Ref legal event code: B01A; Ref document number: PI1013585; Country of ref document: BR
ENP Entry into the national phase
    Ref document number: PI1013585; Country of ref document: BR; Kind code of ref document: A2; Effective date: 20110927