CN108538310B - Voice endpoint detection method based on long-time signal power spectrum change - Google Patents
Classifications
- G10L25/21 — Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information
- G10L25/84 — Detection of presence or absence of voice signals for discriminating voice from noise
- G10L25/87 — Detection of discrete points within a voice signal
Abstract
A voice endpoint detection method based on long-time signal power spectrum change comprises the following steps: performing framing and windowing on an input signal; calculating the power spectrum of the framed and windowed signal; calculating the long-time signal power spectrum change value; performing threshold decision using the long-time signal power spectrum change value; updating the threshold, namely adaptively updating the threshold using the threshold decision results of the signals of the past 80 frames; and voting decision: with the current target frame being the m-th frame, the long-time signal power spectrum change value L_x(m) is determined by the current frame and all R-1 frames before it, so the current target frame participates in R threshold decisions, whose results are denoted D_m, D_m+1, …, D_m+R-1; if more than 80% of the R threshold decisions conclude that a voice frame is present, the current target frame is judged to be a voice frame, otherwise it is judged to be a noise frame. The above process is repeated until the input signal ends. The invention significantly improves detection accuracy in babble and machine gun noise environments.
Description
Technical Field
The invention relates to a voice endpoint detection method, in particular to a voice endpoint detection method based on long-time signal power spectrum change.
Background
Voice endpoint detection refers to distinguishing voice segments from non-voice segments in a noisy environment, and is a key technology in voice signal processing fields such as voice coding, voice enhancement and voice recognition.
Current voice endpoint detection methods fall mainly into two categories: feature-based methods [1] and methods based on machine learning and pattern recognition. Feature-based methods are widely studied and applied because they are simple and fast.
1. Voice endpoint detection based on voice short-time characteristics
Early features for voice endpoint detection were mainly short-time energy, average zero-crossing rate, spectral entropy, cepstral distance, and the like. These give ideal detection results at high signal-to-noise ratios, but detection performance drops sharply at low signal-to-noise ratios. To improve noise resistance and robustness, a series of new methods have been proposed, such as voice endpoint detection based on noise suppression, and voice endpoint detection combining Fisher linear discrimination with Mel-frequency cepstral coefficients.
2. Voice endpoint detection based on voice long-term characteristics
The above methods are mostly based on short-time characteristics of speech and do not fully exploit its long-term variation. To make better use of long-term characteristics, Ghosh et al. proposed a detection method based on Long-Term Signal Variability (LTSV), which adapts well to noise and can still distinguish voice segments from non-voice segments at extremely low signal-to-noise ratios (-10 dB). Ma et al. proposed voice endpoint detection based on the Long-term Spectral Flatness Measure (LSFM), which distinguishes voice from noise by measuring the spectral flatness of long-term speech in different frequency bands, improving accuracy under non-stationary noise such as babble and machine gun noise, and robustness across noise environments. Although both methods are robust under different noises, their detection performance at low signal-to-noise ratio still leaves room for improvement, especially for non-stationary noises such as babble and machine gun noise, where detection performance is comparatively poor.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a voice endpoint detection method based on long-time signal power spectrum change that improves the robustness of long-term-feature-based voice endpoint detection algorithms in different noise environments, and in particular the detection performance under noise such as babble and machine gun noise.
The technical scheme adopted by the invention is as follows: a voice endpoint detection method based on long-time signal power spectrum change comprises the following steps:
1) performing frame windowing on an input signal;
2) calculating a power spectrum of the signal subjected to framing and windowing;
3) calculating a power spectrum change value of the long-time signal;
4) carrying out threshold judgment by using the power spectrum change value of the long-time signal;
5) updating the threshold value, namely performing self-adaptive updating on the threshold value by using the threshold value judgment result of the signals of the past 80 frames;
6) voting decision: with the current target frame being the m-th frame, the long-time signal power spectrum change value L_x(m) is determined by the current frame and all R-1 frames before it, so the current target frame participates in R threshold decisions, whose results are denoted D_m, D_m+1, …, D_m+R-1; if more than 80% of the R threshold decisions conclude that a voice frame is present, the current target frame is judged to be a voice frame, otherwise it is judged to be a noise frame;
7) and repeating the steps 1) to 6) until the input signal is finished.
In step 2), a classical periodogram method is used: for each frame of the input signal x(n), the short-time discrete Fourier transform is computed to obtain the power spectrum at each frequency w_k. The power spectrum of the i-th frame signal at frequency w_k takes the form

S_x(i,w_k) = (1/N_W)·|Σ_{l=0}^{N_W-1} h(l)·x(i·N_SH+l)·e^{-j·w_k·l}|²

wherein N_W denotes the data length of each frame, N_SH denotes the frame shift, and h(l) denotes a window function of length N_W.
The specific calculation process of step 3) is as follows:

L_x(m) = (1/(N_FFT/2+1))·Σ_{k=0}^{N_FFT/2} V_x(m,w_k)

wherein L_x(m) denotes the long-time signal power spectrum change value of the m-th frame signal and N_FFT denotes the number of Fourier transform points; V_x(m,w_k), the power spectrum variation degree of the past R frames at the k-th frequency point, is obtained by averaging the power spectrum variation at the k-th frequency point between every pair of the past R frames:

V_x(m,w_k) = (2/(R·(R-1)))·Σ_{i=m-R+1}^{m-1} Σ_{j=i+1}^{m} |S_x(j,w_k) - S_x(i,w_k)|

wherein S_x(j,w_k) and S_x(i,w_k) respectively denote the power spectra of the j-th and i-th frame signals at the k-th frequency point.
In step 4), the long-time signal power spectrum change value L_x(m) is used to decide whether the current R frames contain a voice frame: if L_x(m) is greater than the set threshold, a voice frame is present and the flag D_m is set to 1; otherwise no voice frame is present and D_m is set to 0.
In step 5), two buffers B_N(m) and B_S+N(m) are designed to store the long-time signal power spectrum change values of the past 80 frames judged as noise frames and voice frames respectively. The threshold adaptive update formula is:
T(m) = α·min(B_S+N(m)) + (1-α)·max(B_N(m))
wherein α is a weight parameter.
Taking the first 50 frames as initial background noise, the threshold is initialized from the initial background noise:
T_init = μ_N + p·σ_N
wherein μ_N and σ_N respectively denote the mean and standard deviation of the signal's power spectrum change value over the 50 background-noise frames, and p is a weighting coefficient.
The voice endpoint detection method based on long-time signal power spectrum change significantly improves detection accuracy in babble and machine gun noise environments. By adaptively updating the threshold, it overcomes the poor environmental adaptability of a traditional fixed threshold. Testing shows that its detection accuracy is better overall than that of the LTSV and LSFM voice endpoint detection methods; under machine gun noise it is markedly better, with average detection accuracy improved by more than 10 percent.
Drawings
FIG. 1 is a flow chart of a method for detecting a voice endpoint based on long-term signal power spectrum changes according to the present invention;
FIG. 2 is a schematic diagram of voting decisions in the present invention;
FIG. 3 shows the VAD results in different noise environments.
Detailed Description
The following describes a speech endpoint detection method based on long-term signal power spectrum changes in detail with reference to embodiments and drawings.
The invention discloses a voice endpoint detection method based on long-time signal power spectrum change, which comprises the following steps:
1) the input signal is framed and windowed: although the voice signal is a typical non-stationary signal, the motion of the articulatory organs is very slow compared with the vibration of the sound wave, so the voice signal is generally considered stationary over a period of 10 ms to 30 ms; the signal to be detected is therefore framed and truncated accordingly;
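The framing and windowing step can be sketched as follows. `frame_and_window` is a hypothetical helper name; the 512-sample frame length, 256-sample shift and Hamming window follow the example given later in this document.

```python
import numpy as np

def frame_and_window(x, frame_len=512, frame_shift=256):
    """Split signal x into overlapping frames and apply a Hamming window.

    Frame length and shift follow the example in this document
    (512-sample frames, 256-sample frame shift)."""
    n_frames = 1 + (len(x) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    frames = np.stack([
        x[i * frame_shift: i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])
    return frames  # shape: (n_frames, frame_len)
```

With a 2048-sample input and these defaults, the helper yields 7 overlapping frames.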
2) calculating the power spectrum of the framed and windowed signal. Specifically, the classical periodogram method is used: the short-time discrete Fourier transform of each frame of the input signal x(n) is computed to obtain its power spectrum at each frequency w_k. The power spectrum of the i-th frame signal at frequency w_k takes the form

S_x(i,w_k) = (1/N_W)·|Σ_{l=0}^{N_W-1} h(l)·x(i·N_SH+l)·e^{-j·w_k·l}|²

wherein N_W denotes the data length of each frame, N_SH denotes the frame shift, and h(l) denotes a window function of length N_W.
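A minimal sketch of the periodogram computation described above; the function name and the 1/N_W normalisation convention are assumptions (the patent's formula image is not reproduced in this text).

```python
import numpy as np

def power_spectrum(frames, n_fft=512):
    """Periodogram power spectrum of each (already windowed) frame.

    Returns S[i, k]: the power of frame i at frequency bin k, keeping
    the first n_fft//2 + 1 bins since the input is real. Dividing by
    the frame length N_W is one common periodogram convention."""
    spec = np.fft.rfft(frames, n=n_fft, axis=1)
    return (np.abs(spec) ** 2) / frames.shape[1]
```

For a constant unit-amplitude frame of 512 samples, all the energy lands in the DC bin (value 512 under this normalisation).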
3) Calculating the long-time signal power spectrum change value. This parameter is determined by the power spectra of the current frame of the input signal x(n) and all R-1 frames before it, and reflects the non-stationarity of the signal power spectrum over the past R frames. It is computed as

L_x(m) = (1/(N_FFT/2+1))·Σ_{k=0}^{N_FFT/2} V_x(m,w_k)

wherein L_x(m) denotes the long-time signal power spectrum change value of the m-th frame signal and N_FFT denotes the number of Fourier transform points; V_x(m,w_k), the power spectrum variation degree of the past R frames at the k-th frequency point, is obtained by averaging the power spectrum variation at the k-th frequency point between every pair of the past R frames:

V_x(m,w_k) = (2/(R·(R-1)))·Σ_{i=m-R+1}^{m-1} Σ_{j=i+1}^{m} |S_x(j,w_k) - S_x(i,w_k)|

wherein S_x(j,w_k) and S_x(i,w_k) respectively denote the power spectra of the j-th and i-th frame signals at the k-th frequency point.
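The long-time change statistic can be sketched as below. Since the patent's formula images are not reproduced in this text, the use of absolute (rather than squared) pairwise differences is an assumption, and `lspsc` is a hypothetical helper name.

```python
import numpy as np
from itertools import combinations

def lspsc(S, m, R):
    """Long-time signal power spectrum change value L_x(m).

    Averages, over all frequency bins, the mean absolute power-spectrum
    difference between every pair of the past R frames (m-R+1 .. m).
    S is the per-frame power spectrum array of shape (n_frames, n_bins)."""
    block = S[m - R + 1: m + 1]                       # past R frames
    pairs = combinations(range(R), 2)                 # every frame pair
    var_k = np.mean([np.abs(block[j] - block[i]) for i, j in pairs],
                    axis=0)                           # variation per bin
    return float(np.mean(var_k))                      # average over bins
```

Identical frames give L_x(m) = 0; frames whose spectra step by one unit per frame give the mean pairwise gap (2.0 for R = 5).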
4) Threshold decision using the long-time signal power spectrum change value: L_x(m) is used to decide whether the current R frames contain a voice frame; if L_x(m) is greater than the set threshold, a voice frame is present and the flag D_m is set to 1, otherwise no voice frame is present and D_m is set to 0.
5) Updating the threshold: the threshold is adaptively updated using the threshold decision results of the signals of the past 80 frames. Specifically, two buffers B_N(m) and B_S+N(m) store the long-time signal power spectrum change values of the past 80 frames judged as noise frames and voice frames respectively. The threshold adaptive update formula is:
T(m) = α·min(B_S+N(m)) + (1-α)·max(B_N(m))
wherein α is a weight parameter; simulation experiments give the best results with α = 0.3.
Taking the first 50 frames as initial background noise, the threshold is initialized from the initial background noise:
T_init = μ_N + p·σ_N
wherein μ_N and σ_N respectively denote the mean and standard deviation of the signal's power spectrum change value over the 50 background-noise frames, and p is a weighting coefficient; simulation experiments give the best results with p = 3.
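The adaptive update with the two 80-frame buffers can be sketched as follows. `make_threshold_updater` is a hypothetical helper, and falling back to the initial threshold until both buffers are populated is an assumption not stated in the patent.

```python
from collections import deque

def make_threshold_updater(alpha=0.3, history=80, t_init=0.0):
    """T(m) = alpha*min(B_S+N) + (1-alpha)*max(B_N).

    Two bounded buffers hold the L_x values of the most recent
    `history` frames judged as speech (B_S+N) and noise (B_N);
    alpha = 0.3 follows the simulation result quoted above."""
    b_noise = deque(maxlen=history)
    b_speech = deque(maxlen=history)

    def update(lx, is_speech):
        (b_speech if is_speech else b_noise).append(lx)
        if b_speech and b_noise:
            return alpha * min(b_speech) + (1 - alpha) * max(b_noise)
        return t_init  # fall back until both buffers hold data
    return update
```

After one noise frame (L_x = 0.2) and one speech frame (L_x = 0.8), the updated threshold is 0.3·0.8 + 0.7·0.2 = 0.38.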
6) Voting decision: because the statistic is a long-term characteristic of the signal, information from preceding and following frames must be considered in the endpoint detection decision. Fig. 2 shows a diagram of the voting decision. With the current target frame being the m-th frame, the long-time signal power spectrum change value L_x(m) is determined by the current frame and all R-1 frames before it, so the current target frame participates in R threshold decisions, whose results are denoted D_m, D_m+1, …, D_m+R-1. If more than 80% of the R threshold decisions conclude that a voice frame is present, the current target frame is judged to be a voice frame; otherwise it is judged to be a noise frame;
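The voting rule can be sketched as below; `vote` is a hypothetical helper operating on the sequence of per-window threshold decisions D_t.

```python
def vote(decisions, m, R, ratio=0.8):
    """Voting decision for target frame m.

    decisions[t] is the threshold result D_t (1 = speech present) of
    the analysis window ending at frame t. Frame m contributes to the
    R windows ending at frames m .. m+R-1; it is declared speech when
    strictly more than `ratio` of those R decisions flagged speech."""
    votes = decisions[m: m + R]
    return sum(votes) > ratio * R
```

With R = 5, a target frame needs at least 5 of its 5 decisions (strictly more than 4) to be flagged as speech under the 80% rule.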
7) and repeating the steps 1) to 6) until the input signal is finished.
Specific examples are given below:
According to the flowchart shown in Fig. 1, an example analysis of the voice endpoint detection method based on long-time signal power spectrum change is performed. Voice signals are taken from 20 speakers (10 male, 10 female) in the TIMIT speech corpus, 10 sentences per speaker, with endpoints labelled manually for each sentence (0 denotes a noise segment, 1 a voice segment). Since TIMIT sentences are short (about 3.5 seconds) and mostly voice, a 1-second silence segment is prepended to each sentence in the experiment so that the noise characteristic parameters can be estimated and the decision threshold initialized. Noise is taken from the NOISEX-92 noise library; four types are used: white, pink, babble and machine gun. Algorithm performance is tested in noise environments of -5, 0, 5 and 10 dB, with detection accuracy as the performance index, defined as the ratio of correctly judged frames to the total number of frames.
The error frame count comprises voice frames misjudged as noise frames and noise frames misjudged as voice frames.
Examples are specifically as follows:
1. Read the voice signal and perform framing and windowing: each frame has 512 sampling points, a 512-point Hamming window is applied, and the frame shift is 256 sampling points.
2. Apply a 512-point Fourier transform to each frame of windowed data and calculate the power spectrum parameter S_x(i,w_k) of each frame.
3. From the signal power spectrum S_x(i,w_k), compute the long-time signal power spectrum change value L_x(m) of each frame, and initialize the threshold T_init using the background noise information of the starting stage.
4. Perform threshold decision with L_x(m) to judge whether the current R frames contain a voice frame: if L_x(m) is greater than the set threshold, a voice frame is present and D_m is set to 1, otherwise D_m is set to 0.
5. Adaptively update the decision threshold using the threshold decision results of the past 80 frames.
6. Use the D_m parameters to perform the voting decision for the current target frame. As shown in Fig. 2, among the R threshold decisions containing the target frame's information, if more than 80% of the results are voice frames, the target frame is judged to be a voice frame; otherwise it is judged to be a noise frame.
Two segments of speech were randomly picked from the TIMIT speech corpus; the VAD results in a 0 dB noise environment are shown in Fig. 3, where a1, b1, c1 and d1 respectively show the voice waveforms with 0 dB white, pink, babble and machine gun noise added, and a2, b2, c2 and d2 show the corresponding VAD results.
Voice endpoint detection accuracy based on LTSV, LSFM and the long-time signal power spectrum change value was computed in noise environments with different signal-to-noise ratios, as shown in Table 1. The table shows that under white, pink and babble noise the detection performance of the three methods is fairly close, with the accuracy of voice endpoint detection based on the long-time signal power spectrum change value slightly better than that of the other two. Under machine gun noise, however, the voice endpoint detection accuracy of the proposed method is markedly better than that of the other two methods.
TABLE 1 statistical table of results
Claims (4)
1. A voice endpoint detection method based on long-time signal power spectrum change is characterized by comprising the following steps:
1) performing frame windowing on an input signal;
2) calculating a power spectrum of the signal subjected to framing and windowing;
3) calculating the long-time signal power spectrum change value, specifically:

L_x(m) = (1/(N_FFT/2+1))·Σ_{k=0}^{N_FFT/2} V_x(m,w_k)

wherein L_x(m) denotes the long-time signal power spectrum change value of the m-th frame signal and N_FFT denotes the number of Fourier transform points; V_x(m,w_k), the power spectrum variation degree of the past R frames at the k-th frequency point, is obtained by averaging the power spectrum variation at the k-th frequency point between every pair of the past R frames:

V_x(m,w_k) = (2/(R·(R-1)))·Σ_{i=m-R+1}^{m-1} Σ_{j=i+1}^{m} |S_x(j,w_k) - S_x(i,w_k)|

wherein S_x(j,w_k) and S_x(i,w_k) respectively denote the power spectra of the j-th and i-th frame signals at the k-th frequency point, and w_k denotes the frequency of the k-th frequency point;
4) carrying out threshold judgment by using the power spectrum change value of the long-time signal;
5) updating the threshold value, namely performing self-adaptive updating on the threshold value by using the threshold value judgment result of the signals of the past 80 frames;
6) voting decision: with the current target frame being the m-th frame, the long-time signal power spectrum change value L_x(m) is determined by the current frame and all R-1 frames before it, so the current target frame participates in R threshold decisions, whose results are denoted D_m, D_m+1, …, D_m+R-1; if more than 80% of the R threshold decisions conclude that a voice frame is present, the current target frame is judged to be a voice frame, otherwise it is judged to be a noise frame;
7) and repeating the steps 1) to 6) until the input signal is finished.
2. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, wherein in step 2) a classical periodogram method is used: the short-time discrete Fourier transform of each frame of the input signal x(n) is computed to obtain its power spectrum at each frequency w_k, the power spectrum of the i-th frame signal at frequency w_k taking the form

S_x(i,w_k) = (1/N_W)·|Σ_{l=0}^{N_W-1} h(l)·x(i·N_SH+l)·e^{-j·w_k·l}|²

wherein N_W denotes the data length of each frame, N_SH denotes the frame shift, and h(l) denotes a window function of length N_W.
3. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, wherein in step 4) the long-time signal power spectrum change value L_x(m) is used to judge whether the current R frames contain a voice frame: if L_x(m) is greater than the set threshold, a voice frame is present and the flag D_m is set to 1; otherwise no voice frame is present and D_m is set to 0.
4. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, wherein in step 5) two buffers B_N(m) and B_S+N(m) are designed to store the long-time signal power spectrum change values of the past 80 frames judged as noise frames and voice frames respectively, the threshold adaptive update formula being:
T(m) = α·min(B_S+N(m)) + (1-α)·max(B_N(m))
wherein α is a weight parameter;
taking the first 50 frames as initial background noise, the threshold is initialized from the initial background noise:
T_init = μ_N + p·σ_N
wherein μ_N and σ_N respectively denote the mean and standard deviation of the signal's power spectrum change value over the 50 background-noise frames, and p is a weighting coefficient.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810266002.3A CN108538310B (en) | 2018-03-28 | 2018-03-28 | Voice endpoint detection method based on long-time signal power spectrum change |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108538310A CN108538310A (en) | 2018-09-14 |
CN108538310B true CN108538310B (en) | 2021-06-25 |
Family
ID=63481488
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810266002.3A Active CN108538310B (en) | 2018-03-28 | 2018-03-28 | Voice endpoint detection method based on long-time signal power spectrum change |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108538310B (en) |
Families Citing this family (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109545188B (en) * | 2018-12-07 | 2021-07-09 | 深圳市友杰智新科技有限公司 | Real-time voice endpoint detection method and device |
CN109346062B (en) * | 2018-12-25 | 2021-05-28 | 思必驰科技股份有限公司 | Voice endpoint detection method and device |
CN110047470A (en) * | 2019-04-11 | 2019-07-23 | 深圳市壹鸽科技有限公司 | A kind of sound end detecting method |
CN110085264B (en) * | 2019-04-30 | 2021-10-15 | 北京如布科技有限公司 | Voice signal detection method, device, equipment and storage medium |
CN111179966A (en) * | 2019-11-25 | 2020-05-19 | 泰康保险集团股份有限公司 | Voice analysis method and device, electronic equipment and storage medium |
CN110827858B (en) * | 2019-11-26 | 2022-06-10 | 思必驰科技股份有限公司 | Voice endpoint detection method and system |
CN110890104B (en) * | 2019-11-26 | 2022-05-03 | 思必驰科技股份有限公司 | Voice endpoint detection method and system |
CN111613250B (en) * | 2020-07-06 | 2023-07-18 | 泰康保险集团股份有限公司 | Long voice endpoint detection method and device, storage medium and electronic equipment |
CN112735482B (en) * | 2020-12-04 | 2024-02-13 | 珠海亿智电子科技有限公司 | Endpoint detection method and system based on joint deep neural network |
CN112967738A (en) * | 2021-02-01 | 2021-06-15 | 腾讯音乐娱乐科技(深圳)有限公司 | Human voice detection method and device, electronic equipment and computer readable storage medium |
CN113205823A (en) * | 2021-04-12 | 2021-08-03 | 广东技术师范大学 | Lung sound signal endpoint detection method, system and storage medium |
CN115376530A (en) * | 2021-05-17 | 2022-11-22 | 华为技术有限公司 | Three-dimensional audio signal coding method, device and coder |
CN113345423B (en) * | 2021-06-24 | 2024-02-13 | 中国科学技术大学 | Voice endpoint detection method, device, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090222258A1 (en) * | 2008-02-29 | 2009-09-03 | Takashi Fukuda | Voice activity detection system, method, and program product |
CN101814290A (en) * | 2009-02-25 | 2010-08-25 | 三星电子株式会社 | Method for enhancing robustness of voice recognition system |
CN102982811A (en) * | 2012-11-24 | 2013-03-20 | 安徽科大讯飞信息科技股份有限公司 | Voice endpoint detection method based on real-time decoding |
CN103236260A (en) * | 2013-03-29 | 2013-08-07 | 京东方科技集团股份有限公司 | Voice recognition system |
CN106558316A (en) * | 2016-11-09 | 2017-04-05 | 天津大学 | It is a kind of based on it is long when signal special frequency band rate of change detection method of uttering long and high-pitched sounds |
CN107371116A (en) * | 2017-07-21 | 2017-11-21 | 天津大学 | A kind of detection method of uttering long and high-pitched sounds based on interframe spectrum flatness deviation |
CN107393555A (en) * | 2017-07-14 | 2017-11-24 | 西安交通大学 | A kind of detecting system and detection method of low signal-to-noise ratio abnormal sound signal |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100538713C (en) * | 2003-12-23 | 2009-09-09 | 广州可夫医疗科技有限公司 | A kind of brain electricity fluctuation signal analysis equipment |
Non-Patent Citations (3)
Title |
---|
Ghosh, Prasanta Kumar et al., "Robust voice activity detection using long-term signal variability", IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 3, March 2011, pp. 600-613 *
Feng Lu, "Research on voice endpoint detection methods based on long-term features", Wanfang dissertations, 2014, pp. 7-33 *
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108538310B (en) | Voice endpoint detection method based on long-time signal power spectrum change | |
CN105513605B (en) | The speech-enhancement system and sound enhancement method of mobile microphone | |
US5611019A (en) | Method and an apparatus for speech detection for determining whether an input signal is speech or nonspeech | |
Moattar et al. | A simple but efficient real-time voice activity detection algorithm | |
CN109034046B (en) | Method for automatically identifying foreign matters in electric energy meter based on acoustic detection | |
EP2083417B1 (en) | Sound processing device and program | |
CN108305639B (en) | Speech emotion recognition method, computer-readable storage medium and terminal | |
US20080208578A1 (en) | Robust Speaker-Dependent Speech Recognition System | |
WO2009026561A1 (en) | System and method for noise activity detection | |
JP4682154B2 (en) | Automatic speech recognition channel normalization | |
CN108682432B (en) | Speech emotion recognition device | |
Moattar et al. | A new approach for robust realtime voice activity detection using spectral pattern | |
Özaydın | Examination of energy based voice activity detection algorithms for noisy speech signals | |
Bharath et al. | Multitaper based MFCC feature extraction for robust speaker recognition system | |
Chen et al. | InQSS: a speech intelligibility assessment model using a multi-task learning network | |
CN111091816B (en) | Data processing system and method based on voice evaluation | |
CN114530161A (en) | Voice detection method based on spectral subtraction and self-adaptive subband logarithmic energy entropy product | |
Heese et al. | Speech-codebook based soft voice activity detection | |
CN112489692A (en) | Voice endpoint detection method and device | |
CN110610724A (en) | Voice endpoint detection method and device based on non-uniform sub-band separation variance | |
Pham et al. | Performance analysis of wavelet subband based voice activity detection in cocktail party environment | |
Stadtschnitzer et al. | Reliable voice activity detection algorithms under adverse environments | |
Graf et al. | Improved performance measures for voice activity detection | |
CN117711419B (en) | Intelligent data cleaning method for data center | |
TWI756817B (en) | Voice activity detection device and method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||