CN108538310B - Voice endpoint detection method based on long-time signal power spectrum change - Google Patents

Voice endpoint detection method based on long-time signal power spectrum change

Info

Publication number
CN108538310B
Authority
CN
China
Prior art keywords
frame
power spectrum
signal
long
voice
Prior art date
Legal status
Active
Application number
CN201810266002.3A
Other languages
Chinese (zh)
Other versions
CN108538310A (en)
Inventor
张涛
刘阳
任相赢
Current Assignee
Tianjin University
Original Assignee
Tianjin University
Priority date
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201810266002.3A priority Critical patent/CN108538310B/en
Publication of CN108538310A publication Critical patent/CN108538310A/en
Application granted granted Critical
Publication of CN108538310B publication Critical patent/CN108538310B/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 - Speech or voice analysis techniques not restricted to a single one of groups G10L 15/00 - G10L 21/00
    • G10L 25/03 - Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L 25/21 - Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being power information
    • G10L 25/78 - Detection of presence or absence of voice signals
    • G10L 25/84 - Detection of presence or absence of voice signals for discriminating voice from noise
    • G10L 25/87 - Detection of discrete points within a voice signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

A voice endpoint detection method based on long-time signal power spectrum change comprises the following steps: performing framing and windowing on an input signal; calculating the power spectrum of the framed and windowed signal; calculating the long-time signal power spectrum change value; carrying out threshold judgment using the long-time signal power spectrum change value; updating the threshold, namely adaptively updating the threshold using the threshold judgment results of the past 80 frames of the signal; voting judgment, wherein the current target frame is the m-th frame, the long-time signal power spectrum change value L_x(m) at this moment is determined by the current frame and all R-1 frames before it, so the current target frame participates in R threshold judgments, whose results are recorded as D_m, D_{m+1}, ..., D_{m+R-1}; if more than 80% of these R threshold judgments find that a voice frame is present, the current target frame is judged to be a voice frame, otherwise it is judged to be a noise frame; the above process is repeated until the input signal ends. The invention can significantly improve detection accuracy in babble and machine gun noise environments.

Description

Voice endpoint detection method based on long-time signal power spectrum change
Technical Field
The invention relates to voice endpoint detection methods, and in particular to a voice endpoint detection method based on long-time signal power spectrum change.
Background
Voice endpoint detection refers to distinguishing voice segments from non-voice segments in a noisy environment, and is a key technology in voice signal processing fields such as voice coding, voice enhancement and voice recognition.
Currently, voice endpoint detection methods fall mainly into two categories: feature-based methods [1] and methods based on machine learning and pattern recognition. Among them, feature-based methods are widely studied and applied because they are simple and fast.
1. Voice endpoint detection based on voice short-time characteristics
The features used in early voice endpoint detection were mainly short-time energy and average zero-crossing rate, spectral entropy, cepstrum distance, and the like. These methods detect well at high signal-to-noise ratios, but their performance drops sharply when the signal-to-noise ratio is low. To improve noise resistance and robustness, a series of new methods have been proposed, such as voice endpoint detection based on noise suppression, and voice endpoint detection combining Fisher linear discrimination with Mel-frequency cepstral coefficients.
2. Voice endpoint detection based on voice long-term characteristics
The above methods are mostly based on short-time characteristics of speech and do not fully exploit long-term variation information. To make better use of long-term characteristics, Ghosh et al. proposed a detection method based on the Long-Term Signal Variability (LTSV) feature, which adapts well to noise and can still effectively distinguish voice segments from non-voice segments at an extremely low signal-to-noise ratio (-10 dB). Ma et al. proposed voice endpoint detection based on the Long-term Spectral Flatness Measure (LSFM) feature, which distinguishes voice from noise by measuring the spectral flatness of long-term speech in different frequency bands, improving accuracy under non-stationary noise such as babble and machine gun noise and robustness across different noise environments. Although both methods are fairly robust under different noises, their detection performance at low signal-to-noise ratio still has room for improvement, especially under non-stationary noises such as babble and machine gun, where their performance is somewhat weaker.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a voice endpoint detection method based on long-time signal power spectrum change that improves the robustness of long-term-feature-based voice endpoint detection algorithms in different noise environments and improves detection performance in noise environments such as babble and machine gun.
The technical scheme adopted by the invention is as follows: a voice endpoint detection method based on long-time signal power spectrum change comprises the following steps:
1) performing framing and windowing on an input signal;
2) calculating a power spectrum of the signal subjected to framing and windowing;
3) calculating a power spectrum change value of the long-time signal;
4) carrying out threshold judgment by using the power spectrum change value of the long-time signal;
5) updating the threshold value, namely performing self-adaptive updating on the threshold value by using the threshold value judgment result of the signals of the past 80 frames;
6) voting judgment, wherein the current target frame is the m-th frame, the long-time signal power spectrum change value L_x(m) at this moment is determined by the current frame and all R-1 frames before it, so the current target frame participates in R threshold judgments, whose results are recorded as D_m, D_{m+1}, ..., D_{m+R-1}; if more than 80% of these R threshold judgments find that a voice frame is present, the current target frame is judged to be a voice frame, otherwise it is judged to be a noise frame;
7) repeating steps 1) to 6) until the input signal ends.
In step 2), the classical periodogram method is adopted: for each frame of the input signal x(n), the short-time discrete Fourier transform is computed to obtain the power spectrum of the frame at each frequency w_k. The power spectrum of the i-th frame signal at frequency w_k is expressed as follows:

[equation image not reproduced]

where N_W denotes the data length of each frame, N_SH denotes the frame shift length, and h(l) denotes a window function of length N_W.
The specific calculation process of step 3) is as follows:

[equation image not reproduced]

wherein L_x(m) denotes the long-time signal power spectrum change value of the m-th frame signal and N_FFT denotes the number of Fourier transform points;

[equation image not reproduced]

denotes the power spectrum variation degree of all the past R frames at the k-th frequency point; it is obtained by averaging the power spectrum variation at the k-th frequency point between any two of the past R frames, and the corresponding calculation formula is:

[equation image not reproduced]

wherein S_x(j, w_k) and S_x(i, w_k) denote the power spectra of the j-th frame and i-th frame signals at the k-th frequency point, respectively.
Step 4) uses the long-time signal power spectrum change value L_x(m) to judge whether the current R frames contain a voice frame: if L_x(m) is greater than a predetermined threshold, a voice frame is present and the flag D_m is recorded as 1; otherwise no voice frame is present and D_m is recorded as 0.
Step 5) specifically designs two buffers, B_N(m) and B_{S+N}(m), which respectively store the long-time signal power spectrum change values of the frames judged to be noise frames and voice frames within the past 80 frames. The threshold adaptive update formula is as follows:

T(m) = α·min(B_{S+N}(m)) + (1-α)·max(B_N(m))

where α is a weight parameter.
Taking the first 50 frames as initial background noise, the threshold is initialized from this background noise:

T_init = μ_N + p·σ_N

where μ_N and σ_N denote the mean and standard deviation, respectively, of the signal power spectrum change value over the 50 background-noise frames, and p is a weighting coefficient.
The voice endpoint detection method based on long-time signal power spectrum change can significantly improve detection accuracy in babble and machine gun noise environments. By adaptively updating the threshold, it overcomes the poor environmental adaptability of a conventional fixed threshold. Tests show that the detection accuracy of the method is better overall than that of the LTSV and LSFM voice endpoint detection methods. Under machine gun noise, the detection accuracy of the method is markedly better than that of LTSV and LSFM, with the average detection accuracy improved by more than 10 percent.
Drawings
FIG. 1 is a flow chart of a method for detecting a voice endpoint based on long-term signal power spectrum changes according to the present invention;
FIG. 2 is a schematic diagram of voting decisions in the present invention;
fig. 3 shows VAD results in different noise environments.
Detailed Description
The following describes a speech endpoint detection method based on long-term signal power spectrum changes in detail with reference to embodiments and drawings.
The invention discloses a voice endpoint detection method based on long-time signal power spectrum change, which comprises the following steps:
1) The input signal is divided into frames and windowed. A voice signal is a typical non-stationary signal, but the movement of the articulators is very slow compared with the vibration of sound waves, so a voice signal is generally considered stationary over a period of 10 ms to 30 ms; the signal to be detected is therefore divided into frames and truncated by a window;
2) Calculate the power spectrum of the framed and windowed signal. Specifically, the classical periodogram method is adopted: the short-time discrete Fourier transform of each frame of the input signal x(n) is computed to obtain the power spectrum of the frame at each frequency w_k. The power spectrum of the i-th frame signal at frequency w_k is expressed as follows:

[equation image not reproduced]

where N_W denotes the data length of each frame, N_SH denotes the frame shift length, and h(l) denotes a window function of length N_W.
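For illustration only, a minimal Python sketch of steps 1) and 2) is given below, assuming the periodogram is computed as |DFT of the windowed frame|² / N_W with a Hamming window; the function name, the NumPy implementation and the 1/N_W normalisation are assumptions made for the sketch, while the 512-point frame, 256-point shift and Hamming window follow the example given later in this description.

```python
import numpy as np

def frame_power_spectrum(x, n_win=512, n_shift=256, n_fft=512):
    """Per-frame periodogram power spectrum S_x(i, w_k).

    Frames the signal x with a length-n_win Hamming window h(l) and hop
    n_shift (N_W = 512, N_SH = 256 in the example), takes an n_fft-point
    short-time DFT and returns |X|^2 / N_W for the positive frequencies.
    Output shape: (num_frames, n_fft // 2 + 1).
    """
    h = np.hamming(n_win)                          # window function h(l)
    num_frames = 1 + (len(x) - n_win) // n_shift
    spectra = np.empty((num_frames, n_fft // 2 + 1))
    for i in range(num_frames):
        frame = x[i * n_shift: i * n_shift + n_win] * h
        X = np.fft.rfft(frame, n_fft)              # short-time DFT of frame i
        spectra[i] = np.abs(X) ** 2 / n_win        # periodogram estimate
    return spectra
```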
3) Calculate the long-time signal power spectrum change value. This long-time power spectrum change parameter is determined by the power spectra of the current frame of the input signal x(n) and of all R-1 frames before it, and reflects the non-stationarity of the signal power spectrum over the past R frames. The specific calculation process of the long-time signal power spectrum change value is as follows:
[equation image not reproduced]

wherein L_x(m) denotes the long-time signal power spectrum change value of the m-th frame signal and N_FFT denotes the number of Fourier transform points;

[equation image not reproduced]

denotes the power spectrum variation degree of all the past R frames at the k-th frequency point; it is obtained by averaging the power spectrum variation at the k-th frequency point between any two of the past R frames, and the corresponding calculation formula is:

[equation image not reproduced]

wherein S_x(j, w_k) and S_x(i, w_k) denote the power spectra of the j-th frame and i-th frame signals at the k-th frequency point, respectively.
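Since the patent's equations are provided as images, the following sketch only illustrates one plausible reading of the description above: the per-bin variation degree is taken as the mean absolute power spectrum difference over all pairs of frames in the R-frame window, and L_x(m) as its average over the frequency bins; this pairwise-absolute-difference form, the normalisation and the value R = 30 are assumptions, not the patent's exact formula.

```python
import numpy as np
from itertools import combinations

def long_time_power_spectrum_change(spectra, m, R=30):
    """L_x(m): long-time power spectrum change value for the window ending at frame m.

    spectra : array (num_frames, num_bins) of per-frame power spectra S_x(i, w_k)
    m       : index of the current frame; the window covers frames m-R+1 .. m
    R       : number of frames in the long-time window (value assumed here)
    """
    window = spectra[m - R + 1: m + 1]                    # the past R frames
    # Per-bin variation degree: mean |S_x(j, w_k) - S_x(i, w_k)| over all frame pairs.
    pair_diffs = [np.abs(window[j] - window[i])
                  for i, j in combinations(range(R), 2)]
    variation_per_bin = np.mean(pair_diffs, axis=0)
    # L_x(m): average of the per-bin variation degree over the frequency bins.
    return float(np.mean(variation_per_bin))
```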
4) Carry out threshold judgment using the long-time signal power spectrum change value: L_x(m) is used to judge whether the current R frames contain a voice frame; if L_x(m) is greater than the set threshold, a voice frame is present and the flag D_m is recorded as 1, otherwise no voice frame is present and D_m is recorded as 0.
5) Update the threshold, i.e. adaptively update the threshold using the threshold judgment results of the past 80 frames. Specifically, two buffers B_N(m) and B_{S+N}(m) are designed, which respectively store the long-time signal power spectrum change values of the frames judged to be noise frames and voice frames within the past 80 frames. The threshold adaptive update formula is as follows:

T(m) = α·min(B_{S+N}(m)) + (1-α)·max(B_N(m))

where α is a weight parameter; in the simulation experiments, α = 0.3 gave the best results.

Taking the first 50 frames as initial background noise, the threshold is initialized from this background noise:

T_init = μ_N + p·σ_N

where μ_N and σ_N denote the mean and standard deviation, respectively, of the signal power spectrum change value over the 50 background-noise frames, and p is a weighting coefficient; in the simulation experiments, p = 3 gave the best results.
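A sketch of steps 4) and 5) follows; the fixed-length FIFO buffers and the behaviour while the voice buffer is still empty are assumptions made for illustration, while α = 0.3, p = 3, the 80-frame history and the 50-frame initialisation follow the description above.

```python
import numpy as np
from collections import deque

class AdaptiveThreshold:
    """Threshold decision D_m and adaptive threshold T(m) of steps 4) and 5)."""

    def __init__(self, init_noise_values, alpha=0.3, p=3, history=80):
        # Initial threshold T_init = mu_N + p * sigma_N from the leading noise frames.
        self.threshold = float(np.mean(init_noise_values) + p * np.std(init_noise_values))
        self.alpha = alpha
        self.noise_buf = deque(init_noise_values, maxlen=history)   # B_N(m)
        self.speech_buf = deque(maxlen=history)                     # B_{S+N}(m)

    def decide(self, lx):
        """Return D = 1 if L_x(m) exceeds the current threshold, else 0, then adapt T(m)."""
        d = 1 if lx > self.threshold else 0
        (self.speech_buf if d else self.noise_buf).append(lx)
        if self.speech_buf and self.noise_buf:
            # T(m) = alpha * min(B_{S+N}(m)) + (1 - alpha) * max(B_N(m))
            self.threshold = (self.alpha * min(self.speech_buf)
                              + (1 - self.alpha) * max(self.noise_buf))
        return d
```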
6) Voting judgment. Because the feature is a long-time statistic of the signal, information from preceding and following frames must be considered when making the endpoint detection decision. Fig. 2 shows the voting decision diagram. Let the current target frame be the m-th frame; the long-time signal power spectrum change value L_x(m) is determined by the current frame and all R-1 frames before it, so the current target frame participates in R threshold judgments, whose results are recorded as D_m, D_{m+1}, ..., D_{m+R-1}. If more than 80% of these R threshold judgments find that a voice frame is present, the current target frame is judged to be a voice frame; otherwise it is judged to be a noise frame (a sketch of this voting rule is given after the step list below);
7) Repeat steps 1) to 6) until the input signal ends.
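The voting rule of step 6) can be sketched as follows; indexing each window decision D_j by the window's last frame j and the handling of frames near the signal boundaries are assumptions made for illustration.

```python
import numpy as np

def vote(decisions, R, ratio=0.8):
    """Voting judgment of step 6).

    decisions[j] holds the threshold decision D_j for the R-frame window ending
    at frame j (entries for j < R-1, where no full window exists, are np.nan).
    Frame m is labelled voice (1) if more than `ratio` of the available
    decisions D_m ... D_{m+R-1} indicate a voice frame, otherwise noise (0).
    """
    decisions = np.asarray(decisions, dtype=float)
    labels = np.zeros(len(decisions), dtype=int)
    for m in range(len(decisions)):
        votes = decisions[m: m + R]                 # D_m ... D_{m+R-1}
        votes = votes[~np.isnan(votes)]             # drop windows that do not exist
        if votes.size and votes.mean() > ratio:
            labels[m] = 1
    return labels
```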
Specific examples are given below:
Following the flowchart shown in fig. 1, an example analysis of the voice endpoint detection method based on long-time signal power spectrum change is carried out. The voice signals are taken from 20 speakers (10 male, 10 female) in the TIMIT voice library, each speaker contributing 10 sentences, and the endpoints of each sentence are labeled manually (0 denotes a noise segment and 1 denotes a voice segment). Since the TIMIT sentences are short (about 3.5 seconds) and consist mostly of voice, a 1-second silence segment is added before each sentence in the experiment so that the characteristic parameters of the noise can be estimated and the decision threshold initialized. The noise is taken from the NOISEX-92 noise library; four noise types are used: white, pink, babble and machine gun. The performance of the algorithm is tested in noise environments of -5, 0, 5 and 10 dB, with the detection accuracy taken as the performance index, defined as:
accuracy = (total number of frames - number of error frames) / total number of frames × 100%

where the number of error frames comprises voice frames misjudged as noise frames and noise frames misjudged as voice frames.
Examples are specifically as follows:
1. Read the voice signal and perform framing and windowing: each frame is 512 sampling points, a 512-point Hamming window is applied, and the frame shift is 256 sampling points.
2. Carry out a 512-point Fourier transform on each frame of windowed data and calculate the power spectrum parameter S_x(i, w_k) of each frame.
3. From the signal power spectrum S_x(i, w_k), compute the long-time signal power spectrum change value L_x(m) of each frame, and initialize the threshold T_init using the background noise information of the starting stage.
4. Use L_x(m) for threshold judgment to decide whether the current R frames contain a voice frame: if L_x(m) is greater than the set threshold, a voice frame is present and D_m is recorded as 1; otherwise there is no voice frame and D_m is recorded as 0.
5. The decision threshold is adaptively updated using the threshold decision results of the past 80 frames of the signal.
6. Use the D_m parameters to carry out voting judgment for the current target frame. As shown in fig. 2, among the R threshold judgments that contain the target frame's information, if more than 80% of the results indicate a voice frame, the target frame is judged to be a voice frame; otherwise it is judged to be a noise frame.
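Tying the earlier sketches together, the following driver (assuming frame_power_spectrum, long_time_power_spectrum_change, AdaptiveThreshold and vote are defined as above, and again assuming R = 30) outlines the whole example pipeline:

```python
import numpy as np

def detect_endpoints(x, R=30, n_init_frames=50):
    """End-to-end sketch of example steps 1-6; returns per-frame voice/noise labels."""
    spectra = frame_power_spectrum(x)                               # steps 1-2
    num_frames = len(spectra)
    # D_j for each R-frame window ending at frame j; earlier frames have no full window.
    decisions = np.full(num_frames, np.nan)
    lx = [long_time_power_spectrum_change(spectra, m, R)            # step 3
          for m in range(R - 1, num_frames)]
    thr = AdaptiveThreshold(lx[:n_init_frames])                     # T_init from leading noise
    for idx, value in enumerate(lx):
        decisions[idx + R - 1] = thr.decide(value)                  # steps 4-5
    return vote(decisions, R)                                       # step 6
```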
Two segments of speech were randomly picked from the TIMIT speech library, and the VAD results in a 0 dB noise environment are shown in fig. 3, where a1, b1, c1 and d1 respectively show the voice waveform after 0 dB white, pink, babble and machine gun noise is added, and a2, b2, c2 and d2 show the corresponding VAD results.
The voice endpoint detection accuracies of the LTSV-based, LSFM-based and long-time signal power spectrum change value-based methods were counted separately under noise environments with different signal-to-noise ratios, as shown in Table 1. The table shows that under white, pink and babble noise the detection performance of the three methods is relatively close, with the accuracy of the voice endpoint detection based on the long-time signal power spectrum change value slightly better than that of the other two methods. Under machine gun noise, however, the detection accuracy based on the long-time signal power spectrum change value is markedly better than that of the other two methods.
TABLE 1 Statistical table of results

[table image not reproduced]

Claims (4)

1. A voice endpoint detection method based on long-time signal power spectrum change is characterized by comprising the following steps:
1) performing framing and windowing on an input signal;
2) calculating a power spectrum of the signal subjected to framing and windowing;
3) calculating a long-time signal power spectrum change value; the specific calculation process is as follows:

[equation image not reproduced]

wherein L_x(m) denotes the long-time signal power spectrum change value of the m-th frame signal and N_FFT denotes the number of Fourier transform points;

[equation image not reproduced]

denotes the power spectrum variation degree of all the past R frames at the k-th frequency point; it is obtained by averaging the power spectrum variation at the k-th frequency point between any two of the past R frames, and the corresponding calculation formula is:

[equation image not reproduced]

wherein S_x(j, w_k) and S_x(i, w_k) respectively denote the power spectra of the j-th frame and i-th frame signals at the k-th frequency point, and w_k denotes the frequency of the k-th frequency point;
4) carrying out threshold judgment by using the power spectrum change value of the long-time signal;
5) updating the threshold value, namely performing self-adaptive updating on the threshold value by using the threshold value judgment result of the signals of the past 80 frames;
6) voting judgment, wherein the current target frame is the m-th frame, the long-time signal power spectrum change value L_x(m) at this moment is determined by the current frame and all R-1 frames before it, so the current target frame participates in R threshold judgments, whose results are recorded as D_m, D_{m+1}, ..., D_{m+R-1}; if more than 80% of these R threshold judgments find that a voice frame is present, the current target frame is judged to be a voice frame, otherwise it is judged to be a noise frame;
7) repeating steps 1) to 6) until the input signal ends.
2. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, wherein in step 2) the classical periodogram method is adopted: for each frame of the input signal x(n), the short-time discrete Fourier transform is computed to obtain the power spectrum of the frame at each frequency w_k, and the power spectrum of the i-th frame signal at frequency w_k is expressed as follows:

[equation image not reproduced]

wherein N_W denotes the data length of each frame, N_SH denotes the frame shift length, and h(l) denotes a window function of length N_W.
3. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, wherein step 4) uses the long-time signal power spectrum change value L_x(m) to judge whether the current R frames contain a voice frame: if L_x(m) is greater than a predetermined threshold, a voice frame is present and the flag D_m is recorded as 1; otherwise no voice frame is present and D_m is recorded as 0.
4. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, wherein step 5) designs two buffers B_N(m) and B_{S+N}(m), which respectively store the long-time signal power spectrum change values of the frames judged to be noise frames and voice frames within the past 80 frames, and the threshold adaptive update formula is:

T(m) = α·min(B_{S+N}(m)) + (1-α)·max(B_N(m))

wherein α is a weight parameter;

with the first 50 frames taken as initial background noise, the threshold is initialized from this background noise:

T_init = μ_N + p·σ_N

wherein μ_N and σ_N respectively denote the mean and standard deviation of the signal power spectrum change value over the 50 background-noise frames, and p is a weighting coefficient.
CN201810266002.3A 2018-03-28 2018-03-28 Voice endpoint detection method based on long-time signal power spectrum change Active CN108538310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810266002.3A CN108538310B (en) 2018-03-28 2018-03-28 Voice endpoint detection method based on long-time signal power spectrum change

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810266002.3A CN108538310B (en) 2018-03-28 2018-03-28 Voice endpoint detection method based on long-time signal power spectrum change

Publications (2)

Publication Number Publication Date
CN108538310A CN108538310A (en) 2018-09-14
CN108538310B true CN108538310B (en) 2021-06-25

Family

ID=63481488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810266002.3A Active CN108538310B (en) 2018-03-28 2018-03-28 Voice endpoint detection method based on long-time signal power spectrum change

Country Status (1)

Country Link
CN (1) CN108538310B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109545188B (en) * 2018-12-07 2021-07-09 深圳市友杰智新科技有限公司 Real-time voice endpoint detection method and device
CN109346062B (en) * 2018-12-25 2021-05-28 思必驰科技股份有限公司 Voice endpoint detection method and device
CN110047470A (en) * 2019-04-11 2019-07-23 深圳市壹鸽科技有限公司 A kind of sound end detecting method
CN110085264B (en) * 2019-04-30 2021-10-15 北京如布科技有限公司 Voice signal detection method, device, equipment and storage medium
CN111179966A (en) * 2019-11-25 2020-05-19 泰康保险集团股份有限公司 Voice analysis method and device, electronic equipment and storage medium
CN110827858B (en) * 2019-11-26 2022-06-10 思必驰科技股份有限公司 Voice endpoint detection method and system
CN110890104B (en) * 2019-11-26 2022-05-03 思必驰科技股份有限公司 Voice endpoint detection method and system
CN111613250B (en) * 2020-07-06 2023-07-18 泰康保险集团股份有限公司 Long voice endpoint detection method and device, storage medium and electronic equipment
CN112735482B (en) * 2020-12-04 2024-02-13 珠海亿智电子科技有限公司 Endpoint detection method and system based on joint deep neural network
CN112967738A (en) * 2021-02-01 2021-06-15 腾讯音乐娱乐科技(深圳)有限公司 Human voice detection method and device, electronic equipment and computer readable storage medium
CN113205823A (en) * 2021-04-12 2021-08-03 广东技术师范大学 Lung sound signal endpoint detection method, system and storage medium
CN115376530A (en) * 2021-05-17 2022-11-22 华为技术有限公司 Three-dimensional audio signal coding method, device and coder
CN113345423B (en) * 2021-06-24 2024-02-13 中国科学技术大学 Voice endpoint detection method, device, electronic equipment and storage medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100538713C (en) * 2003-12-23 2009-09-09 广州可夫医疗科技有限公司 A kind of brain electricity fluctuation signal analysis equipment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20090222258A1 (en) * 2008-02-29 2009-09-03 Takashi Fukuda Voice activity detection system, method, and program product
CN101814290A (en) * 2009-02-25 2010-08-25 三星电子株式会社 Method for enhancing robustness of voice recognition system
CN102982811A (en) * 2012-11-24 2013-03-20 安徽科大讯飞信息科技股份有限公司 Voice endpoint detection method based on real-time decoding
CN103236260A (en) * 2013-03-29 2013-08-07 京东方科技集团股份有限公司 Voice recognition system
CN106558316A (en) * 2016-11-09 2017-04-05 天津大学 It is a kind of based on it is long when signal special frequency band rate of change detection method of uttering long and high-pitched sounds
CN107393555A (en) * 2017-07-14 2017-11-24 西安交通大学 A kind of detecting system and detection method of low signal-to-noise ratio abnormal sound signal
CN107371116A (en) * 2017-07-21 2017-11-21 天津大学 A kind of detection method of uttering long and high-pitched sounds based on interframe spectrum flatness deviation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Ghosh, Prasanta Kumar et al. Robust voice activity detection using long-term signal variability. IEEE Transactions on Audio, Speech, and Language Processing, vol. 19, no. 3, 31 March 2011, pp. 600-613 *
冯璐 (Feng Lu). Research on voice endpoint detection methods based on long-term features. Wanfang dissertation database, 2014, pp. 7-33 *
冯璐 (Feng Lu). Research on voice endpoint detection methods based on long-term features. Wanfang dissertation database, 6 November 2014, pp. 7-33 *

Also Published As

Publication number Publication date
CN108538310A (en) 2018-09-14

Similar Documents

Publication Publication Date Title
CN108538310B (en) Voice endpoint detection method based on long-time signal power spectrum change
CN105513605B (en) The speech-enhancement system and sound enhancement method of mobile microphone
US5611019A (en) Method and an apparatus for speech detection for determining whether an input signal is speech or nonspeech
Moattar et al. A simple but efficient real-time voice activity detection algorithm
CN109034046B (en) Method for automatically identifying foreign matters in electric energy meter based on acoustic detection
EP2083417B1 (en) Sound processing device and program
CN108305639B (en) Speech emotion recognition method, computer-readable storage medium and terminal
US20080208578A1 (en) Robust Speaker-Dependent Speech Recognition System
WO2009026561A1 (en) System and method for noise activity detection
JP4682154B2 (en) Automatic speech recognition channel normalization
CN108682432B (en) Speech emotion recognition device
Moattar et al. A new approach for robust realtime voice activity detection using spectral pattern
Özaydın Examination of energy based voice activity detection algorithms for noisy speech signals
Bharath et al. Multitaper based MFCC feature extraction for robust speaker recognition system
Chen et al. InQSS: a speech intelligibility assessment model using a multi-task learning network
CN111091816B (en) Data processing system and method based on voice evaluation
CN114530161A (en) Voice detection method based on spectral subtraction and self-adaptive subband logarithmic energy entropy product
Heese et al. Speech-codebook based soft voice activity detection
CN112489692A (en) Voice endpoint detection method and device
CN110610724A (en) Voice endpoint detection method and device based on non-uniform sub-band separation variance
Pham et al. Performance analysis of wavelet subband based voice activity detection in cocktail party environment
Stadtschnitzer et al. Reliable voice activity detection algorithms under adverse environments
Graf et al. Improved performance measures for voice activity detection
CN117711419B (en) Intelligent data cleaning method for data center
TWI756817B (en) Voice activity detection device and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant