WO2008067735A1 - Method and device for classifying a sound signal (Procédé et dispositif de classement pour un signal sonore) - Google Patents

Method and device for classifying a sound signal (Procédé et dispositif de classement pour un signal sonore)

Info

Publication number
WO2008067735A1
WO2008067735A1 (PCT/CN2007/003798, CN2007003798W)
Authority
WO
WIPO (PCT)
Prior art keywords
parameter
signal
module
type
noise
Prior art date
Application number
PCT/CN2007/003798
Other languages
English (en)
Chinese (zh)
Inventor
Wei Li
Lijing Xu
Qing Zhang
Jianfeng Xu
Shenghu Sang
Zhengzhong Du
Qin Yan
Haojiang Deng
Jun Wang
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. filed Critical Huawei Technologies Co., Ltd.
Priority to EP07855800A priority Critical patent/EP2096629B1/fr
Publication of WO2008067735A1 publication Critical patent/WO2008067735A1/fr


Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78Detection of presence or absence of voice signals
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/04Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
    • G10L19/16Vocoder architecture
    • G10L19/18Vocoders using multiple modes
    • G10L19/20Vocoders using multiple modes using sound class specific coding, hybrid encoders or object based coding

Definitions

  • The present invention relates to the field of speech coding technologies, and in particular, to a sound signal classification method and a sound signal classification device. Background Art
  • Voice activity detection (VAD) is used in speech coding. It is a technology that allows the encoder to encode background noise and active speech at different rates, encoding background noise at a lower rate and active speech at a higher rate, thereby reducing the average code rate, and it has greatly promoted the development of variable-rate speech coding technology.
  • In existing signal detectors (VADs), for example, the AMR-WB+ encoder encodes in different modes depending on whether the input audio signal is detected as speech or music after VAD detection, in order to minimize the bit rate while ensuring encoding quality.
  • The two different coding modes in AMR-WB+ correspond to two core coding algorithms: ACELP (Algebraic Code Excited Linear Prediction) and TCX (Transform Coded Excitation).
  • ACELP is based on a speech production model and makes full use of the characteristics of speech; it has high coding efficiency for speech signals and is a mature technology. Extending its use in a general audio encoder therefore greatly improves the speech coding quality.
  • Conversely, the encoding quality of wideband music is improved by extending the use of TCX coding in a low-bit-rate speech coder.
  • Closed-loop selection, the default option, has high complexity: it performs an exhaustive search based on the perceptually weighted SNR. This selection method is very accurate, but its computational complexity is very high and its code size is also larger.
  • the open loop selection includes the following steps:
  • In step 101, the VAD module determines whether the signal is a non-useful signal or a useful signal based on the tone flag (Tone_flag) and the subband energy parameter (Level[n]).
  • In step 102, preliminary mode selection (EC) is performed.
  • In step 103, refined mode selection (ESC) modifies the mode initially determined in step 102, based on the open-loop pitch parameters and the ISF parameters, to determine the selected coding mode.
  • In step 104, TCXS processing is performed: when the number of consecutive selections of the speech signal encoding mode is less than three, a small-scale closed-loop traversal search is performed to finally determine the encoding mode, where the speech signal encoding mode is ACELP and the music signal encoding mode is TCX.
  • Although the VAD detection algorithms in various current encoders perform well in speech detection and noise immunity, for the tail portions of some special music signals they may mistake the music signal for noise, which truncates the ending of the music and sounds unnatural.
  • In addition, the AMR-WB+ mode selection algorithm does not consider the signal-to-noise-ratio environment in which the signal is located, and its performance in distinguishing speech from music deteriorates further under low SNR conditions. Summary of the Invention
  • the embodiments of the present invention provide a sound signal classification method and a sound signal classification device, which can improve the accuracy of classification and detection of sound signals.
  • A sound signal classification detection method provided by an embodiment of the present invention includes: receiving a sound signal, and determining an update rate of the background noise according to a background noise spectrum distribution parameter and a spectrum distribution parameter of the sound signal; updating the noise parameters according to the update rate, and classifying the sound signal based on the subband energy parameter and the updated noise parameters.
  • a sound signal classification device provided by an embodiment of the present invention includes: a background noise parameter update module and a signal initial classification PSC module;
  • The background noise parameter update module is configured to determine an update rate of the background noise according to the background noise spectrum distribution parameter and the spectrum distribution parameter of the current sound signal, and to send the determined update rate;
  • The PSC module is configured to receive the update rate from the background noise parameter update module, update the noise parameters, classify the current sound signal according to the subband energy parameter and the updated noise parameters, and send the classified sound signal type.
  • In the embodiments of the present invention, the update rate of the background noise is determined, the noise parameters are updated according to that rate, and the signal is then initially classified according to the subband energy parameter and the updated noise parameters to determine the non-useful signals and useful signals in the received sound signal. This reduces the misjudgment of useful signals as noise and improves the accuracy of sound signal classification.
  • FIG. 1 is a schematic diagram of an open loop selection of an AMR-WB+ encoding algorithm in the prior art
  • FIG. 2 is a general flowchart of a sound signal classification detecting method according to an embodiment of the present invention
  • FIG. 3 is a schematic diagram of the composition of a sound signal classification apparatus according to an embodiment of the present invention;
  • FIG. 4 is a schematic diagram of a system composition according to a specific embodiment of the present invention;
  • FIG. 5 is a flowchart of an encoder parameter extraction module calculating various parameters according to an embodiment of the present invention;
  • FIG. 6 is a flow chart of another encoder parameter extraction module for calculating various parameters according to an embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a PSC module according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram of determining a feature parameter by a signal classification decision module according to an embodiment of the present invention.
  • FIG. 9 is a schematic diagram of a voice classification decision module performing voice decision according to an embodiment of the present invention.
  • FIG. 10 is a schematic diagram of a signal classification decision module performing music decision according to an embodiment of the present invention.
  • FIG. 11 is a schematic diagram of a signal classification decision module for correcting an initial decision result according to an embodiment of the present invention.
  • FIG. 12 is a schematic diagram showing a preliminary classification of an uncertain signal by a signal classification decision module according to an embodiment of the present invention.
  • FIG. 13 is a schematic diagram of a final classification and correction of a signal by a signal classification decision module according to an embodiment of the present invention
  • FIG. 14 is a diagram showing the parameter update of the signal classification decision module in an embodiment of the present invention. Detailed Description
  • In the embodiments of the present invention, the update rate of the background noise is determined according to the spectrum distribution parameter of the current sound signal and the background noise spectrum distribution parameter, and the noise parameters are updated according to this update rate. The useful and non-useful signals in the received sound signal are then determined using the updated noise parameters, so that the noise parameters are more accurate when the useful and non-useful signals are distinguished, which improves the accuracy of sound signal classification.
  • Referring to FIG. 2, a sound signal classification detection method is first provided. The method includes:
  • Step 201 Receive a sound signal, and determine an update rate of the background noise according to the background noise spectrum distribution parameter and a spectrum distribution parameter of the sound signal.
  • Step 202 Update a noise parameter according to the update rate, and classify the sound signal according to the subband energy parameter and the updated noise parameter.
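For illustration only, the two steps above can be sketched roughly in Python as follows. The first-order smoothing form of the noise update, the SNR-threshold decision, and all names are assumptions of this sketch, not details fixed by the embodiment; the update rate acc is assumed to come from the background noise parameter update module described later.

```python
import numpy as np

def classify_frame(level, noise_level, acc, snr_threshold_db=3.0):
    """Rough sketch of steps 201-202: update the background-noise estimate at the
    rate chosen from the spectral distribution parameters, then classify the frame."""
    level = np.asarray(level, dtype=float)             # subband energies of this frame
    noise_level = np.asarray(noise_level, dtype=float)

    # Step 202 (first part): update the per-subband noise estimate at rate acc
    # (first-order smoothing is an assumption of this sketch).
    noise_level = (1.0 - acc) * noise_level + acc * level

    # Step 202 (second part): classify by overall SNR against the updated estimate.
    snr_db = 10.0 * np.log10(max(level.sum(), 1e-10) / max(noise_level.sum(), 1e-10))
    vad_flag = snr_db > snr_threshold_db               # useful (True) vs. non-useful (False)
    return vad_flag, noise_level
```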
  • The classification of sound signals mainly divides them into useful signal types and non-useful signal types. Thereafter, the type of the useful signal, which includes the speech signal and the music signal, may be further determined; depending on whether the noise estimate has converged, this determination is based either on the open-loop pitch parameter, the ISF parameter and the subband energy parameter, or on the ISF parameter and the subband energy parameter only.
  • To address the poor sound quality of low-energy tail segments, the determined useful signal type is also used to set the trailing (hangover) length of the signal, and the useful and non-useful signals in the received sound signal are then determined based on this trailing length. The trailing of music signals can be set longer, thereby improving the sound quality of music signals.
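As a minimal sketch of this type-dependent trailing, assuming placeholder frame counts (the patent does not fix these values) and illustrative names:

```python
def hangover_frames(signal_type, music_hangover=40, speech_hangover=10):
    """Music gets a longer tail than speech so that its low-energy endings
    are not cut off as noise; the frame counts here are placeholders."""
    return music_hangover if signal_type == "MUSIC" else speech_hangover

def apply_hangover(vad_flag, hang_counter, signal_type):
    """Keep reporting 'useful signal' for a while after activity stops."""
    if vad_flag:
        hang_counter = hangover_frames(signal_type)
    elif hang_counter > 0:
        hang_counter -= 1
        vad_flag = True            # still protected by the trailing period
    return vad_flag, hang_counter
```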
  • When determining whether the useful signal is a speech signal or a music signal, a signal that cannot be determined reliably can first be set to an uncertain type; the uncertain type is then corrected according to other parameters, and the type of the useful signal is finally determined.
  • Since the encoding mode of non-useful signals may not require the ISF parameter, in order to reduce the amount of calculation in the classification process and improve classification efficiency, if the signal is determined to be a non-useful signal and the corresponding coding mode does not require the ISF parameter, the ISF parameter is not calculated.
  • Correspondingly, an embodiment of the present invention further provides a sound signal classification apparatus, including a background noise parameter update module and a signal initial classification (PSC) module.
  • The background noise parameter update module is configured to determine the update rate of the background noise according to the spectrum distribution parameter of the current sound signal and the background noise spectrum distribution parameter, and to transmit the determined update rate to the PSC module;
  • The PSC module is configured to update the noise parameters according to the update rate from the background noise parameter update module, initially classify the signal according to the subband energy parameter and the updated noise parameters, and determine whether the received sound signal is of a useful signal type or a non-useful signal type.
  • the sound signal classification device may further include: a signal classification decision module;
  • The PSC module also transmits the determined signal type to the signal classification decision module; the signal classification decision module determines the type of the useful signal based on the open-loop pitch parameter, the ISF parameter, and the subband energy parameter, or based on the ISF parameter and the subband energy parameter.
  • the type includes a voice signal and a music signal.
  • the sound signal classification device may further include: a classification parameter extraction module;
  • The PSC module transmits the determined signal type to the signal classification decision module through a classification parameter extraction module. The classification parameter extraction module is further configured to acquire the ISF parameter and the subband energy parameter, or further acquire the open-loop pitch parameter, process the acquired parameters into signal classification feature parameters, and transmit them to the signal classification decision module; it also processes the acquired parameters into the spectrum distribution parameter of the sound signal and the background noise spectrum distribution parameter, which are transmitted to the background noise parameter update module. The signal classification decision module determines the type of the useful signal according to the signal classification feature parameters and the signal type determined by the PSC module, the type including the speech signal and the music signal.
  • The PSC module is further operable to transmit the signal-to-noise ratio of the sound signal, calculated in the process of determining the signal type, to the signal classification decision module; the signal classification decision module further determines whether the useful signal is a speech signal or a music signal according to the signal-to-noise ratio.
  • The sound signal classification device may further include an encoder mode and rate selection module. The signal classification decision module transmits the determined signal type to the encoder mode and rate selection module, and the encoder mode and rate selection module determines the encoding mode and rate of the sound signal according to the received signal type.
  • The sound signal classification device may further include an encoder parameter extraction module, configured to extract the ISF parameter and the subband energy parameter, or further extract the open-loop pitch parameter, transmit the extracted parameters to the classification parameter extraction module, and transmit the extracted subband energy parameters to the PSC module.
  • FIG. 4 is a schematic diagram of a system composition according to a specific embodiment of the present invention.
  • It includes a sound activity detector (SAD), which divides the input audio digital signal into different classes according to the needs of the encoder; the signal can be divided into non-useful signals, speech, and music, providing the encoder with a basis for coding mode selection and rate selection.
  • the SAD module internally includes: a background noise estimation control module, a signal initial classification module, a classification parameter extraction module, and a signal classification decision module.
  • As a signal classifier used inside the encoder, the SAD makes full use of the encoder's own parameters in order to reduce resource consumption and computational complexity. Therefore, the subband energy parameters and other encoder parameters are calculated by the encoder parameter extraction module in the encoder, and the calculated parameters are provided to the SAD module.
  • The final output of the SAD module is the signal decision type, including non-useful signal, speech, and music, which is provided to the encoder mode and rate selection module for selecting the encoder mode and rate.
  • the encoder parameter extraction module in the encoder calculates the subband energy parameters and the encoder parameters, and provides the calculated parameters to the SAD module.
  • The subband energy parameters can be calculated by filter-bank filtering; the specific number of subbands is determined according to the computational complexity and classification accuracy requirements. In this embodiment, the description uses 12 subbands.
  • the process of the encoder parameter extraction module calculating the parameters required by the various SAD modules may be as shown in FIG. 5 or FIG. 6.
  • Step 501 The encoder parameter extraction module first calculates a subband energy parameter.
  • Step 502: The encoder parameter extraction module determines, according to the signal initial judgment result (Vad_flag) from the PSC module, whether an immittance spectral frequency (ISF) operation is required; if so, step 503 is performed; otherwise, step 504 is performed.
  • Determining whether to perform the ISF operation in this step includes the following: if the current frame is a non-useful signal, then, according to the mechanism of the encoder, if the encoder requires the ISF parameter for encoding non-useful signals, the ISF operation is performed; if not, the encoder parameter extraction module ends. If the current frame is a useful signal, the ISF operation is performed. Calculating ISF parameters for useful signals is required by most coding modes and therefore does not introduce redundant complexity into the encoder.
  • The technical solution for the ISF parameter calculation can refer to the documentation of various encoders and is not described here.
  • Step 503 The encoder parameter extraction module calculates an ISF parameter, and then performs step 504.
  • Step 504 The encoder parameter extraction module calculates an open loop pitch parameter.
  • The subband energy parameters are provided to the PSC module and the classification parameter extraction module, and the remaining parameters are provided to the classification parameter extraction module in the SAD.
  • Steps 601 to 603 are substantially the same as steps 501 to 503 in FIG. 5. In step 604, it is determined whether the noise parameters have been initialized, that is, whether the noise estimate has converged; if so, the open-loop pitch parameter is calculated in step 605; otherwise, the open-loop pitch parameter is not calculated.
  • If the open-loop pitch parameter is redundant for the coding algorithm, such as the TCX coding mode, then in order to reduce computational complexity, once the noise estimate has converged and it is essentially determined that the coding mode corresponding to the signal does not require the open-loop pitch parameter, the open-loop pitch parameter is no longer calculated.
  • Before the noise estimate converges, the open-loop pitch parameters still need to be calculated, but this only occurs during the startup phase and its complexity can be ignored.
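A minimal sketch of the FIG. 6 extraction flow follows, assuming the encoder supplies its own routines for the three computations (passed in as callables here); the flag name encoder_needs_isf_for_noise is an illustrative assumption.

```python
def extract_encoder_parameters(frame, vad_flag, noise_converged,
                               compute_subband_energy, compute_isf,
                               compute_open_loop_pitch,
                               encoder_needs_isf_for_noise=False):
    """FIG. 6 sketch: subband energies are always computed, ISF only when some
    coding mode needs it, open-loop pitch only before the noise estimate converges."""
    params = {"level": compute_subband_energy(frame)}            # step 601
    if vad_flag or encoder_needs_isf_for_noise:                  # steps 602-603
        params["isf"] = compute_isf(frame)
    if not noise_converged:                                      # steps 604-605
        params["open_loop_pitch"] = compute_open_loop_pitch(frame)
    return params

def noise_estimate_converged(consecutive_noise_frames, thr1=20):
    """Convergence test described below: more than THR1 consecutive noise frames."""
    return consecutive_noise_frames > thr1
```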
  • the technical solution for calculating the open-loop pitch parameters can refer to the ACELP-based coding, and will not be described here.
  • The basis for determining whether the noise estimate has converged may be that the number of consecutively determined noise frames exceeds a noise convergence threshold (THR1). In one example of this embodiment, the value of THR1 is 20.
  • The extracted subband energy parameters are level[i], where i denotes the member index of the vector over the 12 subbands of this embodiment, corresponding to 0-200 Hz, 200-400 Hz, 400-600 Hz, 600-800 Hz, 800-1200 Hz, 1200-1600 Hz, 1600-2000 Hz, 2000-2400 Hz, 2400-3200 Hz, 3200-4000 Hz, 4000-4800 Hz, and 4800-6400 Hz.
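As an illustration of how these subband energies could be obtained, the sketch below uses an FFT-based split over the band edges listed above instead of the filter bank of the embodiment; the 12.8 kHz sampling rate is an assumption (it matches the 6400 Hz top edge), not a value stated here.

```python
import numpy as np

# Band edges in Hz for the 12 subbands listed above.
SUBBAND_EDGES_HZ = [0, 200, 400, 600, 800, 1200, 1600, 2000,
                    2400, 3200, 4000, 4800, 6400]

def subband_energies(frame, sample_rate=12800):
    """Approximate level[i] by summing the power spectrum within each band."""
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sample_rate)
    level = np.zeros(len(SUBBAND_EDGES_HZ) - 1)
    for i in range(len(level)):
        lo, hi = SUBBAND_EDGES_HZ[i], SUBBAND_EDGES_HZ[i + 1]
        level[i] = spectrum[(freqs >= lo) & (freqs < hi)].sum()
    return level
```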
  • The extracted ISF parameters form a 16-dimensional vector per frame, where n represents the frame index and i (1 ... 16) represents the member index in the vector.
  • The extracted open-loop pitch parameters include, for example, the open-loop pitch delay and the long-term pitch gain.
  • The signal initial classification module can be implemented using various existing VAD algorithm schemes and includes: a background noise estimation sub-module, a signal-to-noise ratio calculation sub-module, a useful signal estimation sub-module, a decision threshold adjustment sub-module, a comparison sub-module, and a trailing protection sub-module for useful signals.
  • The signal-to-noise ratio calculation sub-module calculates the signal-to-noise ratio according to the background noise estimation parameter and the subband energy parameter; the calculated signal-to-noise ratio (snr), in addition to being used internally by the PSC module, is transmitted to the signal classification decision module, so that the signal classification decision module distinguishes speech from music more accurately under low SNR conditions.
  • This embodiment improves the VAD in the following way: first, the calculation of the background noise parameters is controlled by the update rate acc provided by the background noise parameter update module.
  • The background noise estimation sub-module receives the update rate from the background noise parameter update module, updates the noise parameters, and transmits the background noise subband energy estimate calculated with the updated noise parameters to the signal-to-noise ratio calculation sub-module.
  • The update rate can take four levels: acc1, acc2, acc3, acc4.
  • According to the update rate, the update_up and update_down parameters are determined; update_up and update_down correspond to the upward and downward update rates of the background noise, respectively, where n denotes the subband index.
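A minimal sketch of this per-subband update, assuming a first-order smoothing form (the exact update formula is not reproduced in this text):

```python
def update_noise_estimate(noise_level, level, update_up, update_down):
    """For each subband n, apply update_up when the subband energy rises above
    the current noise estimate and update_down when it falls below it."""
    updated = []
    for noise_n, level_n in zip(noise_level, level):
        rate = update_up if level_n > noise_n else update_down
        updated.append((1.0 - rate) * noise_n + rate * level_n)
    return updated
```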
  • The useful signal is generally protected from being misjudged as noise by a trailing (hangover) period, and the trailing length should be a compromise between protecting the signal and improving transmission efficiency.
  • The trailing length can be a constant obtained by training.
  • For multi-rate encoders oriented to audio signals including music, such signals often have long low-energy tails. It is difficult for a conventional VAD to detect this part of the tail, so a long trailing period is required to protect it.
  • The classification parameter extraction module is configured to calculate the parameters required by the signal classification decision module and the background noise parameter update module according to the Vad_flag parameter determined by the signal initial classification module and the subband energy parameter, ISF parameter, and open-loop pitch parameter provided by the encoder parameter extraction module, and to provide the subband energy parameter, the ISF parameter, the open-loop pitch parameter, and the calculated parameters to the signal classification decision module and the background noise parameter update module.
  • the parameters calculated by the classification parameter extraction module include:
  • In the following, an indicator function is used whose value is 1 when the condition A is true and 0 when it is false.
  • Sublevel_high_energy = level[10] + level[11];
  • Sublevel_low_energy = level[0] + level[1] + level[2] + level[3] + level[4] + level[5] + level[6] + level[7] + level[8] + level[9];
  • The short-term average of the ISF distance (isf_meanSD) is the average value of the ISF distance Isf_SD over five adjacent frames.
  • level_meanSD represents the average value of the subband energy standard deviation (level_SD) over two adjacent frames.
  • The calculation method of the level_SD parameter follows the above calculation method of Isf_SD.
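The classification parameters above can be sketched as follows; the exact definition of Isf_SD (and hence level_SD) is not reproduced in this text, so the Euclidean distance between the parameter vectors of adjacent frames is an assumption of this sketch.

```python
import numpy as np

def sublevel_energies(level):
    """Exactly the split given above: high band = level[10] + level[11],
    low band = level[0] + ... + level[9]."""
    return level[10] + level[11], sum(level[:10])

def frame_distance(vec_curr, vec_prev):
    """Assumed form of Isf_SD / level_SD: a distance between the parameter
    vectors of adjacent frames (Euclidean distance assumed here)."""
    return float(np.linalg.norm(np.asarray(vec_curr) - np.asarray(vec_prev)))

def short_term_mean(distance_history, window):
    """isf_meanSD: mean of Isf_SD over 5 adjacent frames (window=5);
    level_meanSD: mean of level_SD over 2 adjacent frames (window=2)."""
    recent = distance_history[-window:]
    return sum(recent) / len(recent) if recent else 0.0
```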
  • The parameters provided to the background noise update module include the spectrum distribution parameters, such as the zero-crossing rate zcr and the spectral flux f_flux.
  • The parameters provided to the signal classification decision module include: pitch, meangain, isf_meanSD, and level_meanSD.
  • The signal classification decision module uses the snr and Vad_flag from the signal initial classification (PSC) module, together with the subband energy parameters, pitch, meangain, isf_meanSD, and level_meanSD from the classification parameter extraction module, to finally classify the signals as non-useful signal (NOISE), speech signal (SPEECH), or music signal (MUSIC).
  • The signal classification decision module can include a parameter update sub-module and a decision sub-module. The parameter update sub-module is configured to update the thresholds used in the signal classification decision process according to the signal-to-noise ratio and to provide the updated thresholds to the decision sub-module. The decision sub-module receives the sound signal type from the PSC module and, for the useful signals, determines their type either based on the open-loop pitch parameter, the ISF parameter, the subband energy parameter, and the updated thresholds, or based on the ISF parameter, the subband energy parameter, and the updated thresholds, and transmits the determined useful signal type to the encoder mode and rate selection module.
  • Determining whether the useful signal is a speech signal or a music signal includes: first setting the value of the speech flag and the value of the music flag to 0; then preliminarily determining the signal as a speech type, a music type, or an uncertain type according to the pitch parameter flag, the long-term signal correlation value, the ISF-distance short-term average parameter, and the subband energy standard-deviation average parameter, and modifying the value of the speech flag or the music flag according to the preliminarily determined speech or music type. Then, according to whether the numbers of consecutive frames for the subband energy, the long-term signal correlation value, the subband energy standard-deviation average parameter, speech_flag, music_flag, and a pitch value of 1 exceed the preset trailing frame-number thresholds, the number of consecutive music frames, the number of consecutive speech frames, and the type of the previous frame, the preliminarily determined speech type, music type, or uncertain type is corrected to determine the type of the useful signal, which includes the speech signal and the music signal.
  • This embodiment provides a flag trailing mechanism for the parameters, including pitch_flag, level_meanSD_high_flag, ISF_meanSD_high_flag, ISF_meanSD_low_flag, level_meanSD_low_flag, and meangain_flag.
  • The length of the trailing period in FIG. 8 is determined according to the trailing parameter flag value.
  • Two kinds of trailing settings, that is, schemes for determining the trailing parameter flag value, are provided:
  • In the first scheme, when the corresponding parameter condition is satisfied, the corresponding parameter trailing counter value is incremented by one; otherwise, the corresponding parameter trailing counter value is set to 0, and different parameter trailing flags are set according to the value of the parameter trailing counter.
  • When setting the parameter trailing flag value according to the parameter counter, the specific value is determined according to the actual situation and is not described here again.
  • In the second scheme, the trailing length is controlled according to the error rate (ER) of each internal node of the decision tree corresponding to the training parameters: a parameter with a small error rate has a short trailing period, and a parameter with a large error rate has a long trailing period.
  • The initial speech decision is shown in FIG. 9. The speech flag is set to 0 in step 901. Then, in step 902, it is determined whether Isf_meanSD is greater than a preset first ISF speech threshold (for example, 1500); if so, the value of the speech flag is set to 1; otherwise,
  • in step 903, it is determined whether the pitch value is 1 and the pitch delay value t_top_mean obtained by the open-loop pitch search is smaller than the pitch speech threshold (for example, 40); if so, the value of the speech flag is set to 1; otherwise,
  • in step 904, it is determined whether the number of consecutive frames whose pitch value is 1 exceeds a preset trailing frame-number threshold (for example, 2 frames); if so, the value of the speech flag is set to 1; otherwise,
  • in step 905, it is determined whether meangain is greater than a preset long-term correlation speech threshold (for example, 8000); if so, the value of the speech flag is set to 1; otherwise, in step 906, it is determined whether one or both of level_meanSD_high_flag and ISF_meanSD_high_flag have a value of 1; if so, the value of the speech flag is set to 1; otherwise, the value of the speech flag is not changed.
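For illustration, steps 901 to 906 can be condensed into the following sketch using the example thresholds quoted above; the argument names are illustrative, not the patent's identifiers.

```python
def preliminary_speech_flag(isf_meanSD, pitch, t_top_mean, pitch_run_length,
                            meangain, level_meanSD_high_flag, isf_meanSD_high_flag):
    """Sketch of the FIG. 9 speech decision; returns the speech flag (0 or 1)."""
    if isf_meanSD > 1500:                                          # step 902
        return 1
    if pitch == 1 and t_top_mean < 40:                             # step 903
        return 1
    if pitch_run_length > 2:                                       # step 904
        return 1
    if meangain > 8000:                                            # step 905
        return 1
    if level_meanSD_high_flag == 1 or isf_meanSD_high_flag == 1:   # step 906
        return 1
    return 0                                                       # step 901 default
```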
  • In step 1101, it is determined whether the instantaneous subband energy is less than the subband energy threshold (for example, 5000); if so, step 1102 is performed; otherwise, the signal is determined to be of the uncertain class (UNCERTAIN);
  • In step 1103, it is determined whether the value of ISF_meanSD is greater than a preset second ISF speech threshold (for example, 2000); if so, the signal is determined to be a speech signal. Otherwise, in step 1104, it is determined whether level_energy is less than 10000 and the number of preceding frames judged as noise exceeds five; if so, the current signal class is set to the uncertain class, which reduces the misjudgment of classifying noise as music; otherwise,
  • in step 1105, it is determined whether the values of the music flag and the speech flag are both 1; if so, the current signal class is determined to be the uncertain class; otherwise,
  • in step 1106, it is determined whether the values of the music flag and the speech flag are both 0; if so, the current signal class is determined to be the uncertain class; otherwise,
  • in step 1107, it is determined whether the music flag is 0 and the speech flag is 1; if so, the current signal type is determined to be the speech class; otherwise,
  • in step 1108, since the music flag is 1 and the speech flag is 0, the current signal type is determined to be the music class.
  • Then step 1110 is performed: it is determined whether the number of consecutive music frames is greater than 3 and ISF_meanSD is smaller than the ISF music threshold; if so, the signal is determined to be a music signal; otherwise, the signal is determined to be a speech signal.
  • In step 1201, it is determined whether level_energy is smaller than the subband energy uncertainty class threshold (for example, 5000).
  • In step 1202, it is determined whether the number of consecutive music frames is greater than 1 and ISF_meanSD is smaller than the ISF music threshold; if so, the signal is determined to be music; otherwise:
  • In steps 1211 to 1216 in FIG. 12: if the speech trailing flag is 1 and the music trailing flag is 0, the current signal class is set to the speech class; if the music trailing flag is 1 and the speech trailing flag is 0, the current signal class is set to the music class; if the music trailing flag and the speech trailing flag are both 1 or both 0, the signal class is set to the uncertain class, and then, if the continuity of the preceding music exceeds 20 frames, the signal is determined to be of the music class, and if the continuity of the preceding speech exceeds 20 frames, the signal is determined to be of the speech class.
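A minimal sketch of this resolution of the uncertain class (steps 1211 to 1216), with illustrative names:

```python
def resolve_uncertain_class(speech_tail_flag, music_tail_flag,
                            prev_music_run, prev_speech_run):
    """Resolve an UNCERTAIN frame by the trailing flags, then by which class
    persisted for more than 20 frames before the current frame."""
    if speech_tail_flag == 1 and music_tail_flag == 0:
        return "SPEECH"
    if music_tail_flag == 1 and speech_tail_flag == 0:
        return "MUSIC"
    # Both flags equal (both 1 or both 0): fall back to the previous context.
    if prev_music_run > 20:
        return "MUSIC"
    if prev_speech_run > 20:
        return "SPEECH"
    return "UNCERTAIN"
```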
  • The final correction of the useful signal type is performed in FIG. 13, where the class is further modified according to the current context.
  • If the current context is music and its persistence is strong, the music signal type can be determined by forced correction according to the value of ISF_meanSD.
  • In step 1302, if the current context is speech and its persistence is strong, lasting more than 3 seconds, that is, the current number of consecutive speech frames exceeds 150, then forced correction may be performed according to the value of ISF_meanSD to determine the speech signal type. Thereafter, if the signal class is still the uncertain class, then in step 1303 the signal class is modified according to the previous context, that is, the currently uncertain signal class is merged into the previous signal class.
  • In FIG. 14, the threshold values are updated according to the signal-to-noise ratio output by the signal initial classification module.
  • The threshold examples listed in this embodiment are values trained under a 20 dB signal-to-noise-ratio condition.
  • The background noise parameter update module uses some of the spectrum distribution parameters calculated by the classification parameter extraction module in the SAD to control the update rate of the background noise. When the energy level of the background noise increases suddenly in a practical application environment, the background noise estimate may fail to be updated because the signal is continuously judged to be a useful signal; the background noise parameter update module is designed to solve this problem.
  • The background noise parameter update module calculates the relevant spectrum distribution parameter vector according to the parameters from the classification parameter extraction module, including the following member, the short-term average of the zero-crossing rate zcr:
  • zcr_mean_m = ALPHA * zcr_mean_(m-1) + (1 - ALPHA) * zcr_m
  • where m represents the frame index.
  • This embodiment utilizes the characteristic that the spectral characteristics of background noise are relatively stable; the members of the spectrum distribution parameter vector are not limited to the four listed above.
  • The update rate of the current background noise is controlled by the difference between the current spectrum distribution parameters and the background noise spectrum distribution parameter estimate. This difference can be computed by algorithms such as the Euclidean distance or the Manhattan distance.
  • An example of this invention uses the Manhattan distance (a distance measure similar to the Euclidean distance), namely the sum of the absolute differences between the members of the two vectors, where one vector is the spectrum distribution parameter vector of the current signal and the other is the background noise spectrum distribution parameter vector estimate.
  • When the distance is smaller than TH1, the module outputs the update rate acc1, which represents the fastest update rate; otherwise, when it is smaller than TH2, the update rate acc2 is output; otherwise, when it is smaller than TH3, the update rate acc3 is output; otherwise, the update rate acc4 is output.
  • TH1, TH2, TH3 and TH4 are update thresholds, which are determined according to the actual environmental conditions.
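The behaviour of the background noise parameter update module described above can be sketched as follows; the ALPHA value and the packaging of the thresholds and rates into tuples are assumptions of this sketch, and TH1 to TH3 and acc1 to acc4 remain tuning values as stated above.

```python
def smooth_spectral_params(current, previous_mean, alpha=0.9):
    """Short-term average of each spectrum distribution parameter in the form
    given above: mean_m = ALPHA * mean_(m-1) + (1 - ALPHA) * x_m (ALPHA assumed)."""
    return [alpha * p + (1.0 - alpha) * c for c, p in zip(current, previous_mean)]

def select_update_rate(signal_vector, noise_vector, thresholds, rates):
    """Manhattan distance between the current spectrum distribution vector and
    the background-noise estimate, mapped to one of the rates acc1..acc4;
    thresholds = (TH1, TH2, TH3) and rates = (acc1, acc2, acc3, acc4)."""
    distance = sum(abs(s - n) for s, n in zip(signal_vector, noise_vector))
    for th, acc in zip(thresholds, rates[:-1]):
        if distance < th:
            return acc
    return rates[-1]
```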
  • In summary, in the embodiments of the present invention, the update rate of the background noise is determined, the noise parameters are updated according to that rate, and the signal is then initially classified according to the subband energy parameter and the updated noise parameters to determine the non-useful signals and useful signals in the received sound signal, which reduces the misjudgment of useful signals as noise and improves the accuracy of sound signal classification.
  • Through the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, although in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of the present invention, or the part of it that contributes to the prior art, may be embodied in the form of a software product stored in a storage medium and including a number of instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform the methods described in the various embodiments of the present invention.

Abstract

The present invention provides a classification method for a sound signal, comprising: receiving the sound signal; determining an update rate of the background noise according to a spectrum distribution parameter of the background noise and a spectrum distribution parameter of the sound signal; updating the noise parameter according to the update rate; and classifying the sound signal according to a subband energy parameter and the updated noise parameter. The sound signal classification device applies the above method.
PCT/CN2007/003798 2006-12-05 2007-12-26 Procédé et dispositif de classement pour un signal sonore WO2008067735A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
EP07855800A EP2096629B1 (fr) 2006-12-05 2007-12-26 Procédé et appareil pour le classement de signaux sonores

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN 200610164456 CN100483509C (zh) 2006-12-05 2006-12-05 声音信号分类方法和装置
CN200610164456.7 2006-12-05

Publications (1)

Publication Number Publication Date
WO2008067735A1 true WO2008067735A1 (fr) 2008-06-12

Family

ID=39491665

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2007/003798 WO2008067735A1 (fr) 2006-12-05 2007-12-26 Procédé et dispositif de classement pour un signal sonore

Country Status (3)

Country Link
EP (1) EP2096629B1 (fr)
CN (1) CN100483509C (fr)
WO (1) WO2008067735A1 (fr)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2684194C1 (ru) * 2015-06-26 2019-04-04 ЗетТиИ Корпорейшн Способ получения кадра модификации речевой активности, устройство и способ обнаружения речевой активности

Families Citing this family (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP5168162B2 (ja) * 2009-01-16 2013-03-21 沖電気工業株式会社 音信号調整装置、プログラム及び方法、並びに、電話装置
CN102714034B (zh) * 2009-10-15 2014-06-04 华为技术有限公司 信号处理的方法、装置和系统
CN102299693B (zh) * 2010-06-28 2017-05-03 瀚宇彩晶股份有限公司 音讯调整系统及方法
SG10201604880YA (en) 2010-07-02 2016-08-30 Dolby Int Ab Selective bass post filter
CN102446506B (zh) * 2010-10-11 2013-06-05 华为技术有限公司 音频信号的分类识别方法及装置
EP2702585B1 (fr) 2011-04-28 2014-12-31 Telefonaktiebolaget LM Ericsson (PUBL) Classification de signal audio s'appuyant sur les trames
US8990074B2 (en) * 2011-05-24 2015-03-24 Qualcomm Incorporated Noise-robust speech coding mode classification
US9099098B2 (en) * 2012-01-20 2015-08-04 Qualcomm Incorporated Voice activity detection in presence of background noise
US9173025B2 (en) 2012-02-08 2015-10-27 Dolby Laboratories Licensing Corporation Combined suppression of noise, echo, and out-of-location signals
US8712076B2 (en) 2012-02-08 2014-04-29 Dolby Laboratories Licensing Corporation Post-processing including median filtering of noise suppression gains
CN107195313B (zh) * 2012-08-31 2021-02-09 瑞典爱立信有限公司 用于语音活动性检测的方法和设备
CN102928713B (zh) * 2012-11-02 2017-09-19 北京美尔斯通科技发展股份有限公司 一种磁场天线的本底噪声测量方法
TWI648730B (zh) * 2012-11-13 2019-01-21 南韓商三星電子股份有限公司 決定編碼模式的裝置以及音訊編碼裝置
CN104347067B (zh) 2013-08-06 2017-04-12 华为技术有限公司 一种音频信号分类方法和装置
CN106328152B (zh) * 2015-06-30 2020-01-31 芋头科技(杭州)有限公司 一种室内噪声污染自动识别监测系统
CN105654944B (zh) * 2015-12-30 2019-11-01 中国科学院自动化研究所 一种融合了短时与长时特征建模的环境声识别方法及装置
CN107123419A (zh) * 2017-05-18 2017-09-01 北京大生在线科技有限公司 Sphinx语速识别中背景降噪的优化方法
CN108257617B (zh) * 2018-01-11 2021-01-19 会听声学科技(北京)有限公司 一种噪声场景识别系统及方法
CN110992989B (zh) * 2019-12-06 2022-05-27 广州国音智能科技有限公司 语音采集方法、装置及计算机可读存储介质
CN113257276B (zh) * 2021-05-07 2024-03-29 普联国际有限公司 一种音频场景检测方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1296258A (zh) * 1999-11-10 2001-05-23 三菱电机株式会社 噪声抑制装置
CN1331825A (zh) * 1998-12-21 2002-01-16 高通股份有限公司 周期性语音编码法
CN1354455A (zh) * 2000-11-18 2002-06-19 深圳市中兴通讯股份有限公司 一种从噪声环境中识别出语音和音乐的声音活动检测方法
CN1430778A (zh) * 2001-03-28 2003-07-16 三菱电机株式会社 噪声抑制装置
CN1624766A (zh) * 2000-08-21 2005-06-08 康奈克森特系统公司 语音编码中噪音鲁棒分类方法

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5742734A (en) * 1994-08-10 1998-04-21 Qualcomm Incorporated Encoding rate selection in a variable rate vocoder
US6691084B2 (en) * 1998-12-21 2004-02-10 Qualcomm Incorporated Multiple mode variable rate speech coding
US6694293B2 (en) * 2001-02-13 2004-02-17 Mindspeed Technologies, Inc. Speech coding system with a music classifier

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1331825A (zh) * 1998-12-21 2002-01-16 高通股份有限公司 周期性语音编码法
CN1296258A (zh) * 1999-11-10 2001-05-23 三菱电机株式会社 噪声抑制装置
CN1624766A (zh) * 2000-08-21 2005-06-08 康奈克森特系统公司 语音编码中噪音鲁棒分类方法
CN1354455A (zh) * 2000-11-18 2002-06-19 深圳市中兴通讯股份有限公司 一种从噪声环境中识别出语音和音乐的声音活动检测方法
CN1430778A (zh) * 2001-03-28 2003-07-16 三菱电机株式会社 噪声抑制装置

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
BAI L. ET AL.: "Feature Analysis and Extraction for Audio Automatic Classification", MINI-MICRO SYSTEMS, vol. 26, no. 11, November 2005 (2005-11-01), pages 2029 - 2034, XP008109115 *
QI F. AND BAO C.: "A new method to voiced/unvoiced/silence of speech classification using Support Vector Machine", ACTA ELECTRONICA SINICA, vol. 34, no. 4, April 2006 (2006-04-01), pages 605 - 611, XP008109092 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
RU2684194C1 (ru) * 2015-06-26 2019-04-04 ЗетТиИ Корпорейшн Способ получения кадра модификации речевой активности, устройство и способ обнаружения речевой активности

Also Published As

Publication number Publication date
EP2096629B1 (fr) 2012-10-24
CN101197135A (zh) 2008-06-11
EP2096629A4 (fr) 2011-01-26
CN100483509C (zh) 2009-04-29
EP2096629A1 (fr) 2009-09-02

Similar Documents

Publication Publication Date Title
WO2008067735A1 (fr) Procédé et dispositif de classement pour un signal sonore
JP3197155B2 (ja) ディジタル音声コーダにおける音声信号ピッチ周期の推定および分類のための方法および装置
KR100883656B1 (ko) 오디오 신호의 분류 방법 및 장치와 이를 이용한 오디오신호의 부호화/복호화 방법 및 장치
KR100964402B1 (ko) 오디오 신호의 부호화 모드 결정 방법 및 장치와 이를 이용한 오디오 신호의 부호화/복호화 방법 및 장치
CN107004409B (zh) 利用运行范围归一化的神经网络语音活动检测
WO2008067719A1 (fr) Procédé de détection d'activité sonore et dispositif de détection d'activité sonore
CN106409313B (zh) 一种音频信号分类方法和装置
US6993481B2 (en) Detection of speech activity using feature model adaptation
RU2417456C2 (ru) Системы, способы и устройства для обнаружения изменения сигналов
WO2010072115A1 (fr) Procédé de traitement de classification de signaux, dispositif de traitement de classification et système d'encodage
KR101116363B1 (ko) 음성신호 분류방법 및 장치, 및 이를 이용한 음성신호부호화방법 및 장치
CN101399039B (zh) 一种确定非噪声音频信号类别的方法及装置
WO2008148321A1 (fr) Appareil de codage et de décodage et procédé de traitement du bruit de fond et dispositif de communication utilisant cet appareil
US8380494B2 (en) Speech detection using order statistics
US20060015333A1 (en) Low-complexity music detection algorithm and system
CN105103229A (zh) 用于产生频率增强音频信号的译码器、译码方法、用于产生编码信号的编码器以及使用紧密选择边信息的编码方法
US20120197642A1 (en) Signal processing method, device, and system
CN101149921A (zh) 一种静音检测方法和装置
CN100541609C (zh) 一种实现开环基音搜索的方法和装置
JP3331297B2 (ja) 背景音/音声分類方法及び装置並びに音声符号化方法及び装置
Górriz et al. An effective cluster-based model for robust speech detection and speech recognition in noisy environments
CN106256001B (zh) 信号分类方法和装置以及使用其的音频编码方法和装置
CN102903364B (zh) 一种进行语音自适应非连续传输的方法及装置
CN106683681B (zh) 处理丢失帧的方法和装置
CN101393744B (zh) 调整声音激活检测门限值的方法及装置

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 07855800

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

WWE Wipo information: entry into national phase

Ref document number: 2007855800

Country of ref document: EP