CN108538310A - Voice endpoint detection method based on long-time signal power spectrum change - Google Patents

Voice endpoint detection method based on long-time signal power spectrum change

Info

Publication number
CN108538310A
Authority
CN
China
Prior art keywords
frame
power spectrum
long
spectrum signal
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810266002.3A
Other languages
Chinese (zh)
Other versions
CN108538310B (en)
Inventor
张涛
刘阳
任相赢
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN201810266002.3A priority Critical patent/CN108538310B/en
Publication of CN108538310A publication Critical patent/CN108538310A/en
Application granted granted Critical
Publication of CN108538310B publication Critical patent/CN108538310B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/87 Detection of discrete points within a voice signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Measurement Of Mechanical Vibrations Or Ultrasonic Waves (AREA)

Abstract

A voice endpoint detection method based on long-time signal power spectrum change: frame and window the input signal; compute the power spectrum of each framed and windowed signal; compute the long-time signal power spectrum change value; make a threshold decision using the long-time signal power spectrum change value; update the threshold adaptively using the threshold decision results of the past 80 frames; make a voting decision, in which the current target frame is the m-th frame and the long-time signal power spectrum change value $L_x(m)$ is determined jointly by the current frame and the R-1 frames before it, so that the target frame takes part in R threshold decisions in total, whose results are labelled $D_m, D_{m+1}, \ldots, D_{m+R-1}$; if more than 80% of these R threshold decisions indicate that speech frames are present, the current target frame is judged to be a speech frame, otherwise a noise frame; repeat the above process until the input signal ends. The invention can significantly improve detection accuracy under babble and machine gun noise.

Description

Voice endpoint detection method based on long-time signal power spectrum change
Technical field
The present invention relates to a voice endpoint detection method, and in particular to a voice endpoint detection method based on long-time signal power spectrum change.
Background
Voice endpoint detection distinguishes speech segments from non-speech segments in a noisy environment. It is a key technology in speech signal processing fields such as speech coding, speech enhancement, and speech recognition.
Current voice endpoint detection methods fall into two main classes: feature-based methods [1] and methods based on machine learning and pattern recognition. Feature-based methods have been widely studied and applied because they are simple and fast.
1. Voice endpoint detection based on short-time speech features
The features first used for voice endpoint detection were mainly short-time energy and average zero-crossing rate, spectral entropy, and cepstral distance. These methods work well at high signal-to-noise ratios (SNR), but their detection performance drops sharply when the SNR is low. To improve noise immunity and robustness, researchers have proposed a series of newer methods, such as voice endpoint detection based on noise suppression and voice endpoint detection that combines Fisher linear discriminant analysis with Mel-frequency cepstral coefficients.
2. Voice endpoint detection based on long-time speech features
The methods above mostly rely on short-time features of speech and do not fully consider its long-time variation. To make better use of the long-time characteristics of speech, Ghosh et al. proposed a detection method based on the long-term signal variability (LTSV) feature. It adapts well to noise and can still separate speech segments from non-speech segments at an extremely low SNR (-10 dB). Ma et al. proposed voice endpoint detection based on the long-term spectral flatness measure (LSFM) feature, which distinguishes speech from noise by estimating the spectral flatness of long-duration speech in different frequency bands. This improves accuracy under non-stationary noise such as babble and machine gun noise, as well as robustness across noise environments. Although both methods are fairly robust to different noise types, their detection performance at low SNR can still be improved, particularly for babble and machine gun noise, the two non-stationary noise types for which their detection performance is somewhat worse.
Summary of the invention
The technical problem addressed by the invention is to provide a voice endpoint detection method based on long-time signal power spectrum change that makes endpoint detection based on long-time speech features more robust across noise environments and improves detection performance under noise such as babble and machine gun noise.
The technical solution adopted by the invention is a voice endpoint detection method based on long-time signal power spectrum change, comprising the following steps:
1) Frame and window the input signal;
2) Compute the power spectrum of each framed and windowed signal;
3) Compute the long-time signal power spectrum change value;
4) Make a threshold decision using the long-time signal power spectrum change value;
5) Update the threshold: the threshold is updated adaptively using the threshold decision results of the past 80 frames;
6) Voting decision: the current target frame is the m-th frame, and the long-time signal power spectrum change value $L_x(m)$ is determined jointly by the current frame and the R-1 frames before it, so the target frame takes part in R threshold decisions in total, whose results are labelled $D_m, D_{m+1}, \ldots, D_{m+R-1}$. If more than 80% of these R threshold decisions indicate that speech frames are present, the current target frame is judged to be a speech frame; otherwise it is a noise frame;
7) Repeat steps 1) to 6) until the input signal ends.
In step 2), the classical periodogram method is applied to each frame of the input signal x(n): the short-time discrete Fourier transform of the frame is computed to obtain its power spectrum at frequency $\omega_k$. The power spectrum of the i-th frame at frequency $\omega_k$ is expressed as follows:
where $N_W$ is the frame length, $N_{SH}$ is the frame shift, and $h(l)$ is a window function of length $N_W$.
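As a rough illustration of step 2), the sketch below frames a signal, applies a window, and computes one periodogram per frame with NumPy. The function name `frame_power_spectrum`, the Hamming window, the 512/256 defaults, and the 1/N_W normalisation are assumptions chosen for this example, not the patent's exact expression.

```python
import numpy as np

def frame_power_spectrum(x, n_w=512, n_sh=256, n_fft=512):
    """Periodogram of each frame; returns shape (num_frames, n_fft // 2 + 1).

    n_w  : frame length N_W (samples)
    n_sh : frame shift N_SH (samples)
    The Hamming window and the 1/n_w scaling are assumptions of this sketch.
    """
    x = np.asarray(x, dtype=float)
    num_bins = n_fft // 2 + 1
    if len(x) < n_w:
        return np.empty((0, num_bins))
    h = np.hamming(n_w)                           # window function h(l)
    num_frames = 1 + (len(x) - n_w) // n_sh
    spectra = np.empty((num_frames, num_bins))
    for i in range(num_frames):
        frame = x[i * n_sh : i * n_sh + n_w] * h  # i-th windowed frame
        spectra[i] = np.abs(np.fft.rfft(frame, n=n_fft)) ** 2 / n_w
    return spectra
```

Only the non-negative frequency bins are kept, because the input is real and the remaining bins mirror them.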
Step 3) is computed as follows:
where $L_x(m)$ is the long-time signal power spectrum change value of the m-th frame and $N_{FFT}$ is the number of Fourier transform points. The per-bin term is the degree of power spectrum change of the past R frames at the k-th frequency bin, obtained by averaging the power spectrum change between every pair of frames among the past R frames at the k-th frequency bin. The corresponding formula is as follows:
where $S_x(j, \omega_k)$ and $S_x(i, \omega_k)$ are the power spectra of the j-th frame and the i-th frame at the k-th frequency bin.
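The averaging described for step 3) can be sketched as follows. This is only an illustration: it assumes that the pairwise "change" between two frames at a frequency bin is the absolute difference of their power spectra, and that the per-bin values are then averaged over all bins. Both choices, and the name `long_term_variation`, are assumptions and may differ from the patent's formula.

```python
import numpy as np

def long_term_variation(spectra, m, r):
    """Illustrative long-time power spectrum change value for frame m.

    spectra : array of shape (num_frames, num_bins), one power spectrum per frame
    m       : index of the current frame (0-based)
    r       : number of frames R in the long-time window (frames m-r+1 .. m)
    Assumption: pairwise change = |S(j, k) - S(i, k)|.
    """
    if r < 2 or m - r + 1 < 0:
        raise ValueError("need r >= 2 and at least r frames up to frame m")
    window = np.asarray(spectra, dtype=float)[m - r + 1 : m + 1]  # (r, bins)
    # Absolute difference between every ordered pair of frames, per bin.
    diffs = np.abs(window[:, None, :] - window[None, :, :])       # (r, r, bins)
    i_idx, j_idx = np.triu_indices(r, k=1)                        # pairs with i < j
    per_bin = diffs[i_idx, j_idx].mean(axis=0)                    # mean over pairs
    return float(per_bin.mean())                                  # mean over bins
```

A stationary segment, whose frames have nearly identical spectra, gives a value near zero, while a segment in which speech starts or stops gives a larger value.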
In step 4), the long-time signal power spectrum change value $L_x(m)$ is used to decide whether the current R frames contain a speech frame. If $L_x(m)$ exceeds the set threshold, speech frames are present and the flag $D_m$ is set to 1; otherwise no speech frame is present and $D_m$ is set to 0.
In step 5), two buffers $B_N(m)$ and $B_{S+N}(m)$ are used. They store the long-time signal power spectrum change values of the frames judged to be noise and of the frames judged to be speech, respectively, within the past 80 frames. The threshold is updated adaptively as follows:
$$T(m) = \alpha \min(B_{S+N}(m)) + (1-\alpha)\max(B_N(m))$$
where $\alpha$ is a weight parameter.
The first 50 frames are taken as the initial background noise, from which the initial threshold is computed:
$$T_{init} = \mu_N + p\sigma_N$$
where $\mu_N$ and $\sigma_N$ are the mean and standard deviation of the long-time signal power spectrum change values of the 50 background noise frames, and p is a weighting coefficient.
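A minimal sketch of the threshold initialisation and adaptive update is given below, using only the Python standard library. The names `initial_threshold` and `AdaptiveThreshold` are hypothetical. Using the population standard deviation, keeping the previous threshold while either buffer is empty, and updating the threshold only after the current decision are assumptions of this sketch.

```python
from collections import deque
from statistics import mean, pstdev

def initial_threshold(noise_values, p=3.0):
    """T_init = mu_N + p * sigma_N over the initial background-noise frames."""
    return mean(noise_values) + p * pstdev(noise_values)

class AdaptiveThreshold:
    def __init__(self, t_init, alpha=0.3, history=80):
        self.threshold = t_init
        self.alpha = alpha
        # Last `history` (value, is_speech) pairs; filtering by the flag
        # gives the contents of the two buffers B_{S+N} and B_N.
        self.recent = deque(maxlen=history)

    def update(self, value):
        """Return D_m for `value` (1 = speech present), then refresh the threshold."""
        is_speech = value > self.threshold
        self.recent.append((value, is_speech))
        speech = [v for v, s in self.recent if s]      # B_{S+N}
        noise = [v for v, s in self.recent if not s]   # B_N
        if speech and noise:  # assumption: otherwise keep the old threshold
            self.threshold = (self.alpha * min(speech)
                              + (1 - self.alpha) * max(noise))
        return int(is_speech)
```

With `alpha` below 0.5 the threshold sits closer to the largest recent noise value than to the smallest recent speech value.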
The voice endpoint detection method based on long-time signal power spectrum change significantly improves detection accuracy under babble and machine gun noise. Adaptive threshold updating overcomes the poor environmental adaptability of a traditional fixed threshold. In experimental tests, the overall accuracy of the invention is better than that of the LTSV and LSFM voice endpoint detection methods. Under machine gun noise, the voice endpoint detection accuracy of the invention is clearly better than that of the LTSV and LSFM methods, with average detection accuracy improved by more than 10%.
Description of the drawings
Fig. 1 is a flow chart of the voice endpoint detection method based on long-time signal power spectrum change according to the invention;
Fig. 2 is a schematic diagram of the voting decision in the invention;
Fig. 3 shows the VAD results under different noise environments.
Detailed description
The voice endpoint detection method based on long-time signal power spectrum change is described in detail below with reference to an embodiment and the drawings.
The voice endpoint detection method based on long-time signal power spectrum change comprises the following steps:
1) Frame and window the input signal. A speech signal is a typical non-stationary signal, but the vocal organs move very slowly compared with the speed of sound-wave vibration, so the speech signal is generally regarded as stationary within a period of 10 ms to 30 ms. The signal under test is therefore truncated into frames;
2) Compute the power spectrum of each framed and windowed signal. Specifically, the classical periodogram method is applied to each frame of the input signal x(n): the short-time discrete Fourier transform of the frame is computed to obtain its power spectrum at frequency $\omega_k$. The power spectrum of the i-th frame at frequency $\omega_k$ is expressed as follows:
where $N_W$ is the frame length, $N_{SH}$ is the frame shift, and $h(l)$ is a window function of length $N_W$.
3) Compute the long-time signal power spectrum change value. This parameter is determined jointly by the power spectra of the current frame of the input signal x(n) and the R-1 frames before it, and reflects how non-stationary the power spectrum of the signal has been over the past R frames. It is computed as follows:
where $L_x(m)$ is the long-time signal power spectrum change value of the m-th frame and $N_{FFT}$ is the number of Fourier transform points. The per-bin term is the degree of power spectrum change of the past R frames at the k-th frequency bin, obtained by averaging the power spectrum change between every pair of frames among the past R frames at the k-th frequency bin. The corresponding formula is as follows:
where $S_x(j, \omega_k)$ and $S_x(i, \omega_k)$ are the power spectra of the j-th frame and the i-th frame at the k-th frequency bin.
4) Make a threshold decision using the long-time signal power spectrum change value. The value $L_x(m)$ is used to decide whether the current R frames contain a speech frame. If $L_x(m)$ exceeds the set threshold, speech frames are present and the flag $D_m$ is set to 1; otherwise no speech frame is present and $D_m$ is set to 0.
5) Update the threshold: the threshold is updated adaptively using the threshold decision results of the past 80 frames. Specifically, two buffers $B_N(m)$ and $B_{S+N}(m)$ store the long-time signal power spectrum change values of the frames judged to be noise and of the frames judged to be speech, respectively, within the past 80 frames. The threshold is updated adaptively as follows:
$$T(m) = \alpha \min(B_{S+N}(m)) + (1-\alpha)\max(B_N(m))$$
where $\alpha$ is a weight parameter; $\alpha = 0.3$ gave the best results in simulation experiments.
The first 50 frames are taken as the initial background noise, from which the initial threshold is computed:
$$T_{init} = \mu_N + p\sigma_N$$
where $\mu_N$ and $\sigma_N$ are the mean and standard deviation of the long-time signal power spectrum change values of the 50 background noise frames, and p is a weighting coefficient; $p = 3$ gave the best results in simulation experiments.
6) Voting decision. Because a long-time feature of the signal is computed, the endpoint decision needs to take the information of neighbouring frames into account. As shown in the voting schematic in Fig. 2, the current target frame is the m-th frame, and the long-time signal power spectrum change value $L_x(m)$ is determined jointly by the current frame and the R-1 frames before it, so the target frame takes part in R threshold decisions in total, whose results are labelled $D_m, D_{m+1}, \ldots, D_{m+R-1}$. If more than 80% of these R threshold decisions indicate that speech frames are present, the current target frame is judged to be a speech frame; otherwise it is a noise frame;
7) Repeat steps 1) to 6) until the input signal ends.
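The voting rule of step 6) can be sketched in a few lines. The function name `vote` and the list-based interface are hypothetical. The sketch assumes the per-frame flags D_n are already available for frames m to m+R-1, which means the decision for frame m is only final once R-1 later frames have been processed.

```python
def vote(decisions, m, r, percent=80):
    """Voting decision for target frame m.

    decisions : sequence of threshold flags, decisions[n] = D_n (0 or 1)
    r         : number of frames R in the long-time window
    Returns 1 (speech) if more than `percent` % of D_m .. D_{m+R-1} are 1,
    otherwise 0 (noise).
    """
    votes = decisions[m : m + r]
    if len(votes) < r:
        raise ValueError("need the flags D_m .. D_{m+R-1}")
    # Integer arithmetic keeps the strict "more than 80%" comparison exact.
    return int(100 * sum(votes) > percent * r)
```

For example, with R = 5 a frame is labelled speech only when all five flags are 1, because four out of five is exactly 80% and the rule requires more than 80%.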
A specific example is given below:
Following the flow chart in Fig. 1, the voice endpoint detection method based on long-time signal power spectrum change is analysed on an example. The speech signals come from 20 speakers in the TIMIT speech corpus, 10 male and 10 female, with 10 sentences per speaker, and the endpoints of each sentence are labelled manually (0 for noise segments, 1 for speech segments). Because TIMIT sentences are short (about 3.5 seconds) and consist mostly of speech, a 1-second silent segment is added before each sentence in the tests so that the noise feature parameters can be estimated and the decision threshold initialised. The noise is taken from the NOISEX-92 noise database; four types are used: white, pink, babble, and machine gun. Algorithm performance is tested at SNRs of -5, 0, 5, and 10 dB, with detection accuracy as the performance metric. Detection accuracy is defined as:
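$$\text{detection accuracy} = \frac{\text{total number of frames} - \text{number of error frames}}{\text{total number of frames}} \times 100\%$$
where the number of error frames counts both speech frames misjudged as noise frames and noise frames misjudged as speech frames.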
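As an illustration of the metric, the sketch below computes frame-level detection accuracy from manual labels and detector output. It assumes accuracy is the percentage of frames whose decision matches the manual label, which is consistent with the description of error frames above. The function name `detection_accuracy` is hypothetical.

```python
def detection_accuracy(reference, predicted):
    """Percentage of frames whose VAD decision matches the manual label.

    reference : per-frame manual labels (0 = noise, 1 = speech)
    predicted : per-frame detector decisions (0 = noise, 1 = speech)
    """
    if len(reference) != len(predicted) or len(reference) == 0:
        raise ValueError("label sequences must be non-empty and equal length")
    # An error frame is speech judged as noise or noise judged as speech.
    errors = sum(1 for ref, est in zip(reference, predicted) if ref != est)
    return 100.0 * (len(reference) - errors) / len(reference)
```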
The example proceeds as follows:
1. Read the speech signal and frame and window it, with 512 samples per frame, a 512-point Hamming window, and a frame shift of 256 samples.
2. Apply a 512-point Fourier transform to each windowed frame and compute the power spectrum $S_x(i, \omega_k)$ of each frame.
3. From the power spectrum $S_x(i, \omega_k)$, compute the long-time signal power spectrum change value $L_x(m)$ of each frame, and use the background noise at the start of the signal to compute the initial threshold $T_{init}$.
4. Make the threshold decision using $L_x(m)$: decide whether the current R frames contain a speech frame. If $L_x(m)$ exceeds the set threshold, speech frames are present and $D_m$ is set to 1; otherwise no speech frame is present and $D_m$ is set to 0.
5. Update the decision threshold adaptively using the threshold decision results of the past 80 frames.
6. Use the $D_m$ values to make the voting decision for the current target frame. As shown in Fig. 2, among the R threshold decisions that include the target frame, if more than 80% indicate that speech frames are present, the target frame is judged to be a speech frame; otherwise it is a noise frame.
Two utterances were selected at random from the TIMIT speech corpus; the VAD results under 0 dB noise are shown in Fig. 3, where a1, b1, c1, and d1 are the speech waveforms after adding 0 dB white, pink, babble, and machine gun noise respectively, and a2, b2, c2, and d2 are the corresponding VAD results.
The voice endpoint detection accuracy of the LTSV-based method, the LSFM-based method, and the method based on the long-time signal power spectrum change value was measured at different SNRs, as shown in Table 1. The table shows that under white, pink, and babble noise the three methods perform similarly, with the method based on the long-time signal power spectrum change value slightly more accurate than the other two. Under machine gun noise, however, the voice endpoint detection accuracy of the LSVM-based method is clearly better than that of the other two methods.
1 result statistical form of table

Claims (5)

1. A voice endpoint detection method based on long-time signal power spectrum change, characterised in that it comprises the following steps:
1) Frame and window the input signal;
2) Compute the power spectrum of each framed and windowed signal;
3) Compute the long-time signal power spectrum change value;
4) Make a threshold decision using the long-time signal power spectrum change value;
5) Update the threshold: the threshold is updated adaptively using the threshold decision results of the past 80 frames;
6) Voting decision: the current target frame is the m-th frame, and the long-time signal power spectrum change value $L_x(m)$ is determined jointly by the current frame and the R-1 frames before it, so the target frame takes part in R threshold decisions in total, whose results are labelled $D_m, D_{m+1}, \ldots, D_{m+R-1}$; if more than 80% of these R threshold decisions indicate that speech frames are present, the current target frame is judged to be a speech frame, otherwise a noise frame;
7) Repeat steps 1) to 6) until the input signal ends.
2. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, characterised in that in step 2) the classical periodogram method is applied to each frame of the input signal x(n): the short-time discrete Fourier transform of the frame is computed to obtain its power spectrum at frequency $\omega_k$, the power spectrum of the i-th frame at frequency $\omega_k$ being expressed as follows:
where $N_W$ is the frame length, $N_{SH}$ is the frame shift, and $h(l)$ is a window function of length $N_W$.
3. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, characterised in that step 3) is computed as follows:
where $L_x(m)$ is the long-time signal power spectrum change value of the m-th frame and $N_{FFT}$ is the number of Fourier transform points; the per-bin term is the degree of power spectrum change of the past R frames at the k-th frequency bin, obtained by averaging the power spectrum change between every pair of frames among the past R frames at the k-th frequency bin, the corresponding formula being as follows:
where $S_x(j, \omega_k)$ and $S_x(i, \omega_k)$ are the power spectra of the j-th frame and the i-th frame at the k-th frequency bin.
4. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, characterised in that in step 4) the long-time signal power spectrum change value $L_x(m)$ is used to decide whether the current R frames contain a speech frame; if $L_x(m)$ exceeds the set threshold, speech frames are present and the flag $D_m$ is set to 1, otherwise no speech frame is present and $D_m$ is set to 0.
5. The voice endpoint detection method based on long-time signal power spectrum change according to claim 1, characterised in that in step 5) two buffers $B_N(m)$ and $B_{S+N}(m)$ store the long-time signal power spectrum change values of the frames judged to be noise and of the frames judged to be speech, respectively, within the past 80 frames, and the threshold is updated adaptively as follows:
$$T(m) = \alpha \min(B_{S+N}(m)) + (1-\alpha)\max(B_N(m))$$
where $\alpha$ is a weight parameter.
The first 50 frames are taken as the initial background noise, from which the initial threshold is computed:
$$T_{init} = \mu_N + p\sigma_N$$
where $\mu_N$ and $\sigma_N$ are the mean and standard deviation of the long-time signal power spectrum change values of the 50 background noise frames, and p is a weighting coefficient.
CN201810266002.3A 2018-03-28 2018-03-28 Voice endpoint detection method based on long-time signal power spectrum change Active CN108538310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810266002.3A CN108538310B (en) 2018-03-28 2018-03-28 Voice endpoint detection method based on long-time signal power spectrum change

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810266002.3A CN108538310B (en) 2018-03-28 2018-03-28 Voice endpoint detection method based on long-time signal power spectrum change

Publications (2)

Publication Number Publication Date
CN108538310A true CN108538310A (en) 2018-09-14
CN108538310B CN108538310B (en) 2021-06-25

Family

ID=63481488

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810266002.3A Active CN108538310B (en) 2018-03-28 2018-03-28 Voice endpoint detection method based on long-time signal power spectrum change

Country Status (1)

Country Link
CN (1) CN108538310B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1632816A (en) * 2003-12-23 2005-06-29 广州可夫医疗科技有限公司 Method for analyzing EEG fluctuation signal and equipment thereof
US20090222258A1 (en) * 2008-02-29 2009-09-03 Takashi Fukuda Voice activity detection system, method, and program product
CN101814290A (en) * 2009-02-25 2010-08-25 三星电子株式会社 Method for enhancing robustness of voice recognition system
CN102982811A (en) * 2012-11-24 2013-03-20 安徽科大讯飞信息科技股份有限公司 Voice endpoint detection method based on real-time decoding
CN103236260A (en) * 2013-03-29 2013-08-07 京东方科技集团股份有限公司 Voice recognition system
CN106558316A (en) * 2016-11-09 2017-04-05 天津大学 Howling detection method based on the change rate of a specific frequency band of the long-time signal
CN107393555A (en) * 2017-07-14 2017-11-24 西安交通大学 Detection system and detection method for low signal-to-noise ratio abnormal sound signals
CN107371116A (en) * 2017-07-21 2017-11-21 天津大学 Howling detection method based on inter-frame spectral flatness deviation

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
GHOSH, PRASANTA KUMAR ET AL.: "Robust voice activity detection using long-term signal variability", 《IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING》 *
冯璐: "基于长时特征的语音端点检测方法研究", 《万方学位论文》 *
张君昌 等: "融合Burg谱估计与信号变化率测度的语音端点检测", 《西安电子科技大学学报(自然科学版)》 *

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109545188A (en) * 2018-12-07 2019-03-29 深圳市友杰智新科技有限公司 A kind of real-time voice end-point detecting method and device
CN109545188B (en) * 2018-12-07 2021-07-09 深圳市友杰智新科技有限公司 Real-time voice endpoint detection method and device
CN109346062A (en) * 2018-12-25 2019-02-15 苏州思必驰信息科技有限公司 Sound end detecting method and device
CN109346062B (en) * 2018-12-25 2021-05-28 思必驰科技股份有限公司 Voice endpoint detection method and device
CN110047470A (en) * 2019-04-11 2019-07-23 深圳市壹鸽科技有限公司 A kind of sound end detecting method
CN110085264B (en) * 2019-04-30 2021-10-15 北京如布科技有限公司 Voice signal detection method, device, equipment and storage medium
CN110085264A (en) * 2019-04-30 2019-08-02 北京儒博科技有限公司 Voice signal detection method, device, equipment and storage medium
CN111179966A (en) * 2019-11-25 2020-05-19 泰康保险集团股份有限公司 Voice analysis method and device, electronic equipment and storage medium
CN110827858A (en) * 2019-11-26 2020-02-21 苏州思必驰信息科技有限公司 Voice endpoint detection method and system
CN110890104A (en) * 2019-11-26 2020-03-17 苏州思必驰信息科技有限公司 Voice endpoint detection method and system
CN110890104B (en) * 2019-11-26 2022-05-03 思必驰科技股份有限公司 Voice endpoint detection method and system
CN111613250A (en) * 2020-07-06 2020-09-01 泰康保险集团股份有限公司 Long voice endpoint detection method and device, storage medium and electronic equipment
CN112735482A (en) * 2020-12-04 2021-04-30 珠海亿智电子科技有限公司 Endpoint detection method and system based on combined deep neural network
CN112735482B (en) * 2020-12-04 2024-02-13 珠海亿智电子科技有限公司 Endpoint detection method and system based on joint deep neural network
CN112967738A (en) * 2021-02-01 2021-06-15 腾讯音乐娱乐科技(深圳)有限公司 Human voice detection method and device, electronic equipment and computer readable storage medium
CN113205823A (en) * 2021-04-12 2021-08-03 广东技术师范大学 Lung sound signal endpoint detection method, system and storage medium
WO2022242479A1 (en) * 2021-05-17 2022-11-24 华为技术有限公司 Three-dimensional audio signal encoding method and apparatus, and encoder
CN113345423A (en) * 2021-06-24 2021-09-03 科大讯飞股份有限公司 Voice endpoint detection method and device, electronic equipment and storage medium
CN113345423B (en) * 2021-06-24 2024-02-13 中国科学技术大学 Voice endpoint detection method, device, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN108538310B (en) 2021-06-25

Similar Documents

Publication Publication Date Title
CN108538310A (en) Voice endpoint detection method based on long-time signal power spectrum change
CN108922541B (en) Multi-dimensional characteristic parameter voiceprint recognition method based on DTW and GMM models
EP2083417B1 (en) Sound processing device and program
CN108305639B (en) Speech emotion recognition method, computer-readable storage medium and terminal
CN105825852A (en) Oral English reading test scoring method
CN104078039A (en) Voice recognition system of domestic service robot on basis of hidden Markov model
CN108682432B (en) Speech emotion recognition device
CN103366759A (en) Speech data evaluation method and speech data evaluation device
Archana et al. Gender identification and performance analysis of speech signals
Jaafar et al. Automatic syllables segmentation for frog identification system
CN103366735A (en) A voice data mapping method and apparatus
Eringis et al. Improving speech recognition rate through analysis parameters
Yutai et al. Speaker recognition based on dynamic MFCC parameters
Zhao et al. Speech recognition system based on integrating feature and HMM
Pohjalainen et al. Automatic detection of anger in telephone speech with robust autoregressive modulation filtering
CN202758611U (en) Speech data evaluation device
Varela et al. Combining pulse-based features for rejecting far-field speech in a HMM-based voice activity detector
Yavuz et al. A Phoneme-Based Approach for Eliminating Out-of-vocabulary Problem Turkish Speech Recognition Using Hidden Markov Model.
Slaney et al. Pitch-gesture modeling using subband autocorrelation change detection.
CN111091816B (en) Data processing system and method based on voice evaluation
Heese et al. Speech-codebook based soft voice activity detection
Singh et al. A comparative study of recognition of speech using improved MFCC algorithms and Rasta filters
Jijomon et al. An offline signal processing technique for accurate localisation of stop release bursts in vowel-consonant-vowel utterances
RU2174714C2 (en) Method for separating the basic tone
Joseph et al. Indian accent detection using dynamic time warping

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant