CN102097095A - Speech endpoint detecting method and device - Google Patents

Info

Publication number
CN102097095A
CN102097095A CN2010106095030A CN201010609503A CN102097095A CN 102097095 A CN102097095 A CN 102097095A CN 2010106095030 A CN2010106095030 A CN 2010106095030A CN 201010609503 A CN201010609503 A CN 201010609503A CN 102097095 A CN102097095 A CN 102097095A
Authority
CN
China
Prior art keywords
frame
voice signal
signal
speech
entropy
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010106095030A
Other languages
Chinese (zh)
Inventor
苏伟博
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin Yaan Technology Electronic Co Ltd
Original Assignee
Tianjin Yaan Technology Electronic Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin Yaan Technology Electronic Co Ltd
Priority to CN2010106095030A
Publication of CN102097095A
Legal status: Pending

Landscapes

  • Compression, Expansion, Code Conversion, And Decoders (AREA)

Abstract

The invention belongs to the field of video monitoring and provides a speech endpoint detection method and device. The method comprises the following steps: sampling an input speech signal and preprocessing the sampled signal; applying a Hamming window to the preprocessed speech signal to divide it into frames, denoted R_n (0 < n ≤ N), where N is the total number of frames; calculating the spectral information entropy of the n-th frame; and determining the frame to be a speech frame if its spectral information entropy is greater than a set threshold, and a non-speech frame otherwise. The method uses spectral entropy as the feature for distinguishing speech frames from non-speech frames, distinguishes them effectively, and performs well in low signal-to-noise-ratio (SNR) environments. It thereby overcomes the drawbacks of traditional spectral-entropy-based algorithms, which consider only the spectral information of the current frame, so that the noise spectral entropy fluctuates greatly in non-stationary noise and threshold selection becomes difficult.

Description

Speech endpoint detection method and device
Technical field
The invention belongs to the field of video monitoring, and in particular relates to a speech endpoint detection method and device.
Background
At present, in real-time video monitoring, a microphone can be used to pick up abnormal sounds in the monitored scene, and the optical axis of the camera can then be steered toward the location of the abnormal sound, so that abnormal events are monitored in real time. Because an omnidirectional sound pickup captures sound from all directions, this effectively overcomes a drawback of traditional video monitoring: an abnormal event that occurs in a blind area outside the surveillance camera's field of view cannot be captured promptly. When a microphone is used to pick up abnormal sounds in the monitored scene, the most critical first step is speech endpoint detection.
Traditional endpoint detection methods, such as short-time energy and zero-crossing rate, and improved algorithms based on entropy, on the zero-crossing-rate and energy product (the "zero-energy product"), or on the combination of entropy and energy, perform well under stationary noise or high SNR. Under low SNR or non-stationary conditions, however, the short-time energy of speech is easily confused with that of noise. The zero-crossing rate readily distinguishes unvoiced sounds from noise but has difficulty distinguishing voiced sounds from noise. The short-time zero-energy product improves the robustness of endpoint detection to some extent, but as a feature it is less robust to noise than information entropy. Spectral entropy is robust to noise to a certain degree: as the SNR falls, the shape of the spectral entropy is preserved, but its value decreases. Moreover, traditional spectral-entropy-based methods consider only the spectral information of the current frame, and in non-stationary noise the noise spectral entropy fluctuates over a wide range, which makes threshold selection difficult.
Summary of the invention
The object of the present invention is to provide a speech endpoint detection method that effectively distinguishes speech frames from non-speech frames and also performs well in low-SNR environments.
An embodiment of the invention is implemented as a speech endpoint detection method comprising:
sampling an input speech signal, and preprocessing the sampled speech signal;
applying a Hamming window to the preprocessed speech signal to divide it into frames, denoted R_n (0 < n ≤ N), where N is the total number of frames;
calculating the spectral information entropy of the n-th frame of the speech signal;
if the spectral information entropy of the n-th frame is greater than a set threshold, determining the frame to be a speech frame; otherwise determining it to be a non-speech frame.
A further object of the invention is to provide a speech endpoint detection device, characterized in that the device comprises:
a speech signal sampling and processing unit, configured to sample an input speech signal and to preprocess the sampled speech signal;
a speech signal framing unit, configured to apply a Hamming window to the preprocessed speech signal to divide it into frames, denoted R_n (0 < n ≤ N), where N is the total number of frames;
a spectral information entropy calculation unit, configured to calculate the spectral information entropy of the n-th frame of the speech signal;
a speech frame determination unit, configured to determine the n-th frame to be a speech frame if its spectral information entropy is greater than a set threshold, and a non-speech frame otherwise.
The advantages and positive effects of the invention are:
The invention uses spectral entropy as the feature for distinguishing speech from non-speech. It effectively distinguishes speech frames from non-speech frames and also performs well in low-SNR environments. It overcomes the problem that traditional spectral-entropy-based algorithms consider only the spectral information of the current frame, so that the noise spectral entropy fluctuates greatly in non-stationary noise and threshold selection becomes more difficult.
Brief description of the drawings
Fig. 1 is a flowchart of the speech endpoint detection method provided by an embodiment of the invention;
Fig. 2 is a flowchart of the first embodiment of the invention;
Fig. 3 is a structural block diagram of the speech endpoint detection device provided by an embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only illustrate the invention and do not limit it.
The embodiment of the invention proposes a speech endpoint detection method for low-SNR conditions in the monitoring field. The method uses the subband spectral entropy as the feature distinguishing speech frames from non-speech frames. Each frame of the speech signal is first decomposed by a wavelet transform into subband signals in different frequency bands. An FFT is then applied to each subband signal, and the spectral entropy of each subband is calculated. The subband spectral entropies of a number of frames before and after the current frame are smoothed by a set of order-statistics filters, and the spectral entropy of each frame is calculated. Speech and non-speech frames are decided by comparing this value with a set threshold, and the threshold is adaptively updated to improve the accuracy of the algorithm.
Fig. 1 shows the flowchart of the speech endpoint detection method provided by the embodiment of the invention. The method comprises:
In step S101, the input speech signal is sampled, and the sampled speech signal is preprocessed;
In step S102, a Hamming window is applied to the preprocessed speech signal to divide it into frames, denoted R_n (0 < n ≤ N), where N is the total number of frames;
In step S103, the spectral information entropy of the n-th frame of the speech signal is calculated;
In step S104, if the spectral information entropy of the n-th frame is greater than a set threshold, the frame is determined to be a speech frame; otherwise it is determined to be a non-speech frame.
In step S105, if n > N the algorithm ends; otherwise the method returns to the second step.
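The decision in steps S104 and S105 can be sketched as follows. This is a minimal illustration, assuming the per-frame spectral entropies have already been computed. The function names `classify_frames` and `find_segments` are hypothetical, and turning frame labels into (start, end) endpoint pairs is an illustrative addition rather than something the text prescribes.

```python
def classify_frames(entropies, threshold):
    """Label a frame True (speech) when its spectral entropy exceeds the threshold."""
    return [h > threshold for h in entropies]


def find_segments(labels):
    """Return (start, end) frame-index pairs, end inclusive, for each run of speech frames."""
    segments = []
    start = None
    for i, is_speech in enumerate(labels):
        if is_speech and start is None:
            start = i                      # a speech run begins here
        elif not is_speech and start is not None:
            segments.append((start, i - 1))  # the run ended on the previous frame
            start = None
    if start is not None:                  # the signal ended while still in speech
        segments.append((start, len(labels) - 1))
    return segments
```

For example, `find_segments(classify_frames(entropies, threshold))` yields one index pair per detected speech segment.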
As the first embodiment of the invention, shown in Fig. 2, a speech endpoint detection method comprises the following steps:
In step S201, the input speech signal is sampled. Because the speech signal is mainly concentrated below 8 kHz, a sampling frequency of 11.025 kHz is used for the speech signal in this embodiment.
In step S202, the sampled speech signal undergoes several preprocessing operations. Pre-emphasis boosts the high-frequency part and flattens the signal spectrum, which facilitates spectral analysis. The influence of the low level is reduced because the speech signal collected by the sound pickup is negative-valued: the median is subtracted from the signal so that the central axis of the speech lies near zero. Finally, the time-domain amplitude of the speech is normalized.
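A minimal sketch of the three preprocessing operations is given below. The text does not specify the pre-emphasis filter, so the first-order form y[i] = x[i] − α·x[i−1] and the coefficient α = 0.97 are assumptions, as is normalizing by the peak absolute value. The function name `preprocess` is hypothetical.

```python
import statistics


def preprocess(samples, alpha=0.97):
    """Pre-emphasize, remove the median, and peak-normalize a list of samples.

    alpha is an assumed pre-emphasis coefficient; the source gives no value.
    """
    if not samples:
        return []
    # Pre-emphasis: boost high frequencies with a first-order difference filter.
    emphasized = [samples[0]] + [
        samples[i] - alpha * samples[i - 1] for i in range(1, len(samples))
    ]
    # Subtract the median so the waveform is centred near zero.
    median = statistics.median(emphasized)
    centred = [v - median for v in emphasized]
    # Normalize the time-domain amplitude to a peak of 1 (assumed convention).
    peak = max(abs(v) for v in centred)
    if peak == 0:
        return centred
    return [v / peak for v in centred]
```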
In step S203, a Hamming window is applied to the preprocessed speech signal to divide it into frames. The frame length is typically 20 to 30 ms and the frame shift is typically 10 to 20 ms. The frames are denoted R_n (0 < n ≤ N), where N is the total number of frames. The Hamming window expression (in which n is the sample index within the window and N is the window length) is:
$$W(n) = \begin{cases} 0.54 - 0.46\cos\left[\dfrac{2\pi n}{N-1}\right], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
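The windowing and framing step can be sketched as below. The window follows the Hamming expression above. The 25 ms frame length and 10 ms frame shift are picked from the stated ranges for illustration only, and the names `hamming`, `split_frames`, `FRAME_LEN` and `HOP` are hypothetical. Trailing samples that do not fill a whole frame are dropped, which is an assumption.

```python
import math

SAMPLE_RATE = 11025                   # Hz, as used in this embodiment
FRAME_LEN = int(0.025 * SAMPLE_RATE)  # 25 ms frame (assumed value within 20-30 ms)
HOP = int(0.010 * SAMPLE_RATE)        # 10 ms shift (assumed value within 10-20 ms)


def hamming(length):
    """Hamming window: 0.54 - 0.46*cos(2*pi*n/(N-1)) for n = 0..N-1."""
    if length == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
        for n in range(length)
    ]


def split_frames(signal, frame_len=FRAME_LEN, hop=HOP):
    """Cut the signal into overlapping frames and apply the window to each."""
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frames.append([signal[start + i] * window[i] for i in range(frame_len)])
    return frames
```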
In step S204, the n-th frame R_n of the speech signal is decomposed to five levels using a db3 wavelet basis function, giving six subband signals in different frequency bands, indexed by subband m (0 < m ≤ 6) and sample k (0 < k ≤ q(m)), where q(m) is the length of the m-th subband signal. The subband m = 6 is the low-frequency subband after wavelet decomposition. Among the high-frequency subbands, the frequency decreases as m increases.
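To make the subband ordering concrete, the sketch below lists the nominal frequency range of each subband for a five-level dyadic decomposition at the 11.025 kHz sampling rate. It assumes ideal half-band splitting at every level, which real db3 filters only approximate, and the function name `nominal_subbands` is hypothetical.

```python
def nominal_subbands(sample_rate, levels=5):
    """Return (m, low_hz, high_hz) for each subband of a dyadic decomposition.

    m = 1..levels are the high-frequency (detail) subbands, highest band first;
    m = levels + 1 is the remaining low-frequency subband.
    """
    bands = []
    upper = sample_rate / 2.0        # Nyquist frequency
    for m in range(1, levels + 1):
        lower = upper / 2.0          # each level halves the remaining band
        bands.append((m, lower, upper))
        upper = lower
    bands.append((levels + 1, 0.0, upper))
    return bands


for m, low, high in nominal_subbands(11025):
    print(f"subband {m}: {low:.2f} Hz to {high:.2f} Hz")
```

Under these assumptions subband 1 covers roughly 2756 to 5512 Hz and subband 6 covers 0 to about 172 Hz.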
In step S205, an FFT is applied to each subband signal to obtain the corresponding power spectrum. The number of FFT points differs between subbands according to the length of the subband signal: 512, 256, 128 and 64 points for the first to fourth subband signals respectively, and 32 points for the fifth and sixth subband signals.
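As a rough illustration of this step, the sketch below computes a power spectrum with a fixed number of transform points. It uses a naive discrete Fourier transform from the Python standard library rather than an optimized FFT (the two give the same values). It also assumes that a subband shorter than the transform size is zero-padded and a longer one is truncated, which the text does not state. The names `power_spectrum` and `FFT_POINTS` are hypothetical.

```python
import cmath

# Transform sizes per subband m, taken from the embodiment.
FFT_POINTS = {1: 512, 2: 256, 3: 128, 4: 64, 5: 32, 6: 32}


def power_spectrum(subband, n_points):
    """Squared magnitude of the n_points-point DFT of a subband signal."""
    # Truncate or zero-pad to exactly n_points samples (assumed handling).
    padded = list(subband[:n_points])
    padded += [0.0] * (n_points - len(padded))
    spectrum = []
    for k in range(n_points):
        acc = sum(
            padded[t] * cmath.exp(-2j * cmath.pi * k * t / n_points)
            for t in range(n_points)
        )
        spectrum.append(abs(acc) ** 2)
    return spectrum
```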
In step S206, the energy of each subband signal is calculated first:
$$E_m^n = \sum_{k=1}^{q(m)} \left[X_m^n(k)\right]^2, \quad k = 1, 2, \ldots, q(m)$$
where q(m) is the length of the m-th subband signal, $X_m^n(k)$ is the k-th point of the m-th subband of the n-th frame, and $E_m^n$ is the energy of the m-th subband of the n-th frame.
Next, the probability of each point of each subband signal is calculated:
$$p_k^m = \frac{X_m^n(k) + Q}{E_m^n + Q}, \quad k = 1, 2, \ldots, q(m)$$
where $p_k^m$ is the probability of the k-th point of the m-th subband signal and Q is a relatively large positive number.
Finally, the spectral entropy of each subband signal is calculated:
$$Es_m^n = \sum_{k=1}^{q(m)} p_k^m \log_2\left(p_k^m\right), \quad k = 1, 2, \ldots, q(m)$$
where $Es_m^n$ is the subband spectral entropy of the m-th subband signal of the n-th frame.
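The three formulas of step S206 translate directly into code. The sketch below follows them term by term, including the absence of a leading minus sign in the subband entropy. The value of Q is not given in the text, so it is left as a required argument, and the function names are hypothetical. The input is assumed to be a non-negative power spectrum, so every probability is positive and the logarithm is defined.

```python
import math


def subband_energy(spectrum):
    """E = sum of X(k)^2 over all points of the subband."""
    return sum(x * x for x in spectrum)


def point_probabilities(spectrum, q_const):
    """p_k = (X(k) + Q) / (E + Q) for every point k; Q must be positive."""
    energy = subband_energy(spectrum)
    return [(x + q_const) / (energy + q_const) for x in spectrum]


def subband_entropy(spectrum, q_const):
    """Es = sum of p_k * log2(p_k), as written in the text (no minus sign)."""
    return sum(p * math.log2(p) for p in point_probabilities(spectrum, q_const))
```

With a spectrum of `[1.0, 1.0]` and Q = 2, the energy is 2, each probability is 3/4, and the subband entropy is 1.5·log2(0.75).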
In step S207, the subband spectral entropies of the L frames before and the L frames after the n-th frame are taken. For each subband, the spectral entropies of these (2L+1) frames (frame offsets −L ≤ l ≤ L) are sorted in ascending order, giving the k-th ranked value $Es_{km}^n$ of the m-th subband spectral entropy among the (2L+1) frames. The smoothed spectral entropy of the m-th subband of the n-th frame is then obtained from:
$$Eh_m^n = (1 - \lambda)\,Es_{km}^n + \lambda \cdot Es_{(k+1)m}^n, \quad 0 < k \le 2L+1, \quad 0 < m \le 6$$
where k is defined by a formula image (BDA0000040999170000066) that is not reproduced here, and 0 < λ < 1. λ is called the sampling quantile of the order-statistics filter, and λ follows a Gaussian distribution. In this embodiment λ is 0.3 and L is 8.
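A minimal sketch of the order-statistics smoothing for one subband follows. Because the expression that defines the rank k is not available in the text, k is passed in explicitly. Clamping the window at the first and last frames is also an assumption, and the names `neighbourhood` and `osf_smooth` are hypothetical.

```python
def neighbourhood(series, n, half_width=8):
    """Collect the 2L+1 values around index n, clamping at the ends (assumed)."""
    last = len(series) - 1
    return [
        series[min(max(n + l, 0), last)]
        for l in range(-half_width, half_width + 1)
    ]


def osf_smooth(values, k, lam=0.3):
    """Interpolate between the k-th and (k+1)-th ascending order statistics.

    k is 1-based; it must leave room for the (k+1)-th value.
    """
    ordered = sorted(values)
    if not 1 <= k < len(ordered):
        raise ValueError("k must satisfy 1 <= k < number of values")
    return (1 - lam) * ordered[k - 1] + lam * ordered[k]
```

With L = 8 the window holds 17 values, so `osf_smooth(neighbourhood(series, n), k)` smooths frame n of one subband once a rank k has been chosen.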
In step S208, the spectral entropy $H_n$ of the n-th frame is calculated:
$$H_n = -\frac{1}{M} \sum_{m=1}^{M} Eh_m^n$$
where M is 6.
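As a small worked sketch of this formula, the function below averages the smoothed subband entropies of one frame and negates the result. The name `frame_entropy` is hypothetical, and M is taken from the length of the input list.

```python
def frame_entropy(smoothed_subband_entropies):
    """H_n = -(1/M) * sum of the M smoothed subband entropies of frame n."""
    m_count = len(smoothed_subband_entropies)
    if m_count == 0:
        raise ValueError("at least one subband entropy is required")
    return -sum(smoothed_subband_entropies) / m_count
```

Because each subband entropy is a sum of terms p·log2(p) with p < 1, the inputs are negative and the negation gives a positive frame entropy.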
In step S209, the initial threshold T is the mean spectral entropy of the first 10 frames multiplied by a correction factor a. When the spectral entropy of the n-th frame of the speech signal is greater than T, the current frame is judged to be a speech frame; otherwise it is judged to be a non-speech frame. In this embodiment a is 1.30.
In step S210, when the speech signal is detected to pass from a speech frame to a non-speech frame, the mean spectral entropy of 5 frames of the speech signal is taken again and multiplied by a correction factor b to give the new threshold, which realizes the adaptive updating of the threshold. In this embodiment b is 1.06.
In step S211, if n > N the algorithm ends; otherwise the method returns to step S204.
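The thresholding of steps S209 and S210 can be sketched as below. The text does not say which 5 frames are averaged when the threshold is updated. This sketch assumes the 5 frames starting at the first non-speech frame after a speech run, and a different choice may have been intended. The function name and parameter names are hypothetical.

```python
def detect_with_adaptive_threshold(entropies, a=1.30, b=1.06,
                                   init_frames=10, adapt_frames=5):
    """Label frames as speech (True) or non-speech (False).

    Returns the labels and the final threshold value.
    """
    if len(entropies) < init_frames:
        raise ValueError("need at least init_frames entropy values")
    # Initial threshold: mean entropy of the first frames times factor a.
    threshold = a * sum(entropies[:init_frames]) / init_frames
    labels = []
    previous_is_speech = False
    for n, h in enumerate(entropies):
        is_speech = h > threshold
        if previous_is_speech and not is_speech:
            # Speech-to-non-speech transition: re-estimate the threshold from
            # the frames starting here (assumed choice of the 5 frames).
            window = entropies[n:n + adapt_frames]
            threshold = b * sum(window) / len(window)
        labels.append(is_speech)
        previous_is_speech = is_speech
    return labels, threshold
```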
Fig. 3 shows the structure of the speech endpoint detection device provided by the embodiment of the invention. For ease of explanation, only the parts relevant to the invention are shown.
The detection device comprises:
a speech signal sampling and processing unit 31, configured to sample an input speech signal and to preprocess the sampled speech signal;
a speech signal framing unit 32, configured to apply a Hamming window to the preprocessed speech signal to divide it into frames, denoted R_n (0 < n ≤ N), where N is the total number of frames;
a spectral information entropy calculation unit 33, configured to calculate the spectral information entropy of the n-th frame of the speech signal;
a speech frame determination unit 34, configured to determine the n-th frame to be a speech frame if its spectral information entropy is greater than a set threshold, and a non-speech frame otherwise.
As a preferred scheme of the embodiment of the invention, the speech signal sampling and processing unit 31 comprises:
a speech signal pre-emphasis module 311, configured to apply pre-emphasis to the sampled speech signal, boosting the high-frequency part and flattening the signal spectrum to facilitate spectral analysis;
a low-level influence reduction module 312, configured to reduce the influence of the low level on the sampled speech signal by subtracting the median from the signal, so that the central axis of the speech lies near zero;
a time-domain amplitude normalization module 313, configured to normalize the time-domain amplitude of the sampled speech signal.
As a preferred scheme of the embodiment of the invention, the Hamming window expression is:
$$W(n) = \begin{cases} 0.54 - 0.46\cos\left[\dfrac{2\pi n}{N-1}\right], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
As a preferred scheme of the embodiment of the invention, the spectral information entropy calculation unit 33 comprises:
a speech signal decomposition module 331, configured to decompose the n-th frame R_n of the speech signal to five levels using a db3 wavelet basis function, obtaining subband signals in different frequency bands;
an FFT module 332, configured to apply an FFT to each subband signal to obtain the corresponding power spectrum;
a subband signal calculation module 333, configured to calculate the energy of each subband signal, the probability of each point of each subband signal, and the spectral entropy of each subband signal;
a spectral entropy smoothing module 334, configured to smooth the subband spectral entropies of a number of frames before and after the current frame with a set of order-statistics filters;
a spectral entropy calculation module 335, configured to calculate the spectral entropy of each frame.
As a preferred scheme of the embodiment of the invention, the detection device further comprises:
a threshold setting unit 35, configured to set the threshold and to update it adaptively. The initial threshold is obtained by multiplying the mean spectral entropy of the first 10 frames of subband signals by a correction factor. On a transition from a speech frame to a non-speech frame, the threshold is adaptively updated by taking the mean spectral entropy of several frames of the speech signal again and multiplying it by a coefficient.
The beneficial effects of the invention are:
1. The wavelet transform is perhaps the most convenient subband filtering method for obtaining subband signals in different frequency bands, and the choice of wavelet basis function is very flexible.
2. Existing literature shows that, in low-SNR environments, algorithms based on speech spectral entropy outperform energy-based methods. Traditional spectral-entropy-based algorithms, however, consider only the spectral information of the current frame, so the noise spectral entropy fluctuates greatly in non-stationary noise and threshold selection becomes more difficult. The invention smooths the subband spectral entropies of several preceding and following frames with a set of order-statistics filters, which overcomes this shortcoming of traditional spectral-entropy-based algorithms.
3. The invention adaptively updates the chosen threshold, which increases the accuracy of endpoint detection.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (9)

1. A speech endpoint detection method, in which an input speech signal is first sampled and the sampled speech signal is preprocessed, and a Hamming window is then applied to the preprocessed speech signal to divide it into frames, denoted R_n (0 < n ≤ N), where N is the total number of frames, characterized in that the detection method further comprises:
calculating the spectral information entropy of the n-th frame of the speech signal;
if the spectral information entropy of the n-th frame is greater than a set threshold, determining the frame to be a speech frame; otherwise determining it to be a non-speech frame.
2. The detection method of claim 1, characterized in that preprocessing the sampled speech signal comprises:
applying pre-emphasis to the sampled speech signal, boosting the high-frequency part and flattening the signal spectrum to facilitate spectral analysis;
reducing the influence of the low level on the sampled speech signal by subtracting the median from the signal, so that the central axis of the speech lies near zero;
normalizing the time-domain amplitude of the sampled speech signal.
3. The detection method of claim 1, characterized in that calculating the spectral information entropy of the n-th frame of the speech signal comprises the following steps:
decomposing the n-th frame R_n of the speech signal to five levels using a wavelet basis function, obtaining subband signals in different frequency bands;
applying an FFT to each subband signal to obtain the corresponding power spectrum;
calculating the energy of each subband signal, the probability of each point of each subband signal, and the spectral entropy of each subband signal;
smoothing the subband spectral entropies of a number of frames before and after the current frame with a set of order-statistics filters;
calculating the spectral entropy of each frame.
4. The detection method of claim 1, characterized in that the initial value of the threshold is obtained by multiplying the mean spectral entropy of the first 10 frames of subband signals by a correction factor.
5. The detection method of claim 1, characterized in that, on a transition from a speech frame to a non-speech frame, the threshold is adaptively updated by taking the mean spectral entropy of several frames of the speech signal again and multiplying it by a coefficient.
6. A speech endpoint detection device, characterized in that the detection device comprises:
a speech signal sampling and processing unit, configured to sample an input speech signal and to preprocess the sampled speech signal;
a speech signal framing unit, configured to apply a Hamming window to the preprocessed speech signal to divide it into frames, denoted R_n (0 < n ≤ N), where N is the total number of frames;
a spectral information entropy calculation unit, configured to calculate the spectral information entropy of the n-th frame of the speech signal;
a speech frame determination unit, configured to determine the n-th frame to be a speech frame if its spectral information entropy is greater than a set threshold, and a non-speech frame otherwise.
7. The detection device of claim 6, characterized in that the speech signal sampling and processing unit comprises:
a speech signal pre-emphasis module, configured to apply pre-emphasis to the sampled speech signal, boosting the high-frequency part and flattening the signal spectrum to facilitate spectral analysis;
a low-level influence reduction module, configured to reduce the influence of the low level on the sampled speech signal by subtracting the median from the signal, so that the central axis of the speech lies near zero;
a time-domain amplitude normalization module, configured to normalize the time-domain amplitude of the sampled speech signal.
8. The detection device of claim 6, characterized in that the spectral information entropy calculation unit comprises:
a speech signal decomposition module, configured to decompose the n-th frame R_n of the speech signal to five levels using a db3 wavelet basis function, obtaining subband signals in different frequency bands;
an FFT module, configured to apply an FFT to each subband signal to obtain the corresponding power spectrum;
a subband signal calculation module, configured to calculate the energy of each subband signal, the probability of each point of each subband signal, and the spectral entropy of each subband signal;
a spectral entropy smoothing module, configured to smooth the subband spectral entropies of a number of frames before and after the current frame with a set of order-statistics filters;
a spectral entropy calculation module, configured to calculate the spectral entropy of each frame.
9. The detection device of claim 6, characterized in that the detection device further comprises:
a threshold setting unit, configured so that, on a transition from a speech frame to a non-speech frame, the threshold is adaptively updated by taking the mean spectral entropy of several frames of the speech signal again and multiplying it by a coefficient.
CN2010106095030A 2010-12-28 2010-12-28 Speech endpoint detecting method and device Pending CN102097095A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010106095030A CN102097095A (en) 2010-12-28 2010-12-28 Speech endpoint detecting method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010106095030A CN102097095A (en) 2010-12-28 2010-12-28 Speech endpoint detecting method and device

Publications (1)

Publication Number Publication Date
CN102097095A true CN102097095A (en) 2011-06-15

Family

ID=44130160

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010106095030A Pending CN102097095A (en) 2010-12-28 2010-12-28 Speech endpoint detecting method and device

Country Status (1)

Country Link
CN (1) CN102097095A (en)

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103117066A (en) * 2013-01-17 2013-05-22 杭州电子科技大学 Low signal to noise ratio voice endpoint detection method based on time-frequency instaneous energy spectrum
CN103187069A (en) * 2011-12-29 2013-07-03 福建联拓科技有限公司 Method and device for subaudio frequency last syllable detection
CN103366758A (en) * 2012-03-31 2013-10-23 多玩娱乐信息技术(北京)有限公司 Method and device for reducing noises of voice of mobile communication equipment
CN103824563A (en) * 2014-02-21 2014-05-28 深圳市微纳集成电路与系统应用研究院 Hearing aid denoising device and method based on module multiplexing
CN104575498A (en) * 2015-01-30 2015-04-29 深圳市云之讯网络技术有限公司 Recognition method and system of effective speeches
CN104980211A (en) * 2015-06-29 2015-10-14 北京航天易联科技发展有限公司 Signal processing method and device
CN104992537A (en) * 2015-06-26 2015-10-21 北京航天易联科技发展有限公司 Signal processing method and device
CN106558316A (en) * 2016-11-09 2017-04-05 天津大学 It is a kind of based on it is long when signal special frequency band rate of change detection method of uttering long and high-pitched sounds
CN106816157A (en) * 2015-11-30 2017-06-09 展讯通信(上海)有限公司 Audio recognition method and device
CN107578770A (en) * 2017-08-31 2018-01-12 百度在线网络技术(北京)有限公司 Networking telephone audio recognition method, device, computer equipment and storage medium
CN107665711A (en) * 2016-07-28 2018-02-06 展讯通信(上海)有限公司 Voice activity detection method and device
CN108122552A (en) * 2017-12-15 2018-06-05 上海智臻智能网络科技股份有限公司 Voice mood recognition methods and device
CN108429999A (en) * 2018-04-06 2018-08-21 东莞市华睿电子科技有限公司 The standby controlling method of intelligent sound box
CN109102823A (en) * 2018-09-05 2018-12-28 河海大学 A kind of sound enhancement method based on subband spectrum entropy
CN109119096A (en) * 2012-12-25 2019-01-01 中兴通讯股份有限公司 The currently active sound keeps the modification method and device of frame number in a kind of VAD judgement
CN110415729A (en) * 2019-07-30 2019-11-05 安谋科技(中国)有限公司 Voice activity detection method, device, medium and system
CN110473547A (en) * 2019-07-12 2019-11-19 云知声智能科技股份有限公司 A kind of audio recognition method
WO2019232848A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Voice distinguishing method and device, computer device and storage medium
CN110969805A (en) * 2018-09-30 2020-04-07 杭州海康威视数字技术股份有限公司 Safety detection method, device and system
CN111739542A (en) * 2020-05-13 2020-10-02 深圳市微纳感知计算技术有限公司 Method, device and equipment for detecting characteristic sound
CN113593599A (en) * 2021-09-02 2021-11-02 北京云蝶智学科技有限公司 Method for removing noise signal in voice signal
CN110600018B (en) * 2019-09-05 2022-04-26 腾讯科技(深圳)有限公司 Voice recognition method and device and neural network training method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2008058842A1 (en) * 2006-11-16 2008-05-22 International Business Machines Corporation Voice activity detection system and method
CN101599269A (en) * 2009-07-02 2009-12-09 中国农业大学 Sound end detecting method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
郭丽惠 等: "基于顺序统计滤波的实时语音端点检测算法", 《自动化学报》 *

Cited By (34)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103187069A (en) * 2011-12-29 2013-07-03 福建联拓科技有限公司 Method and device for subaudio frequency last syllable detection
CN103187069B (en) * 2011-12-29 2015-03-18 福建联拓科技有限公司 Method and device for subaudio frequency last syllable detection
CN103366758A (en) * 2012-03-31 2013-10-23 多玩娱乐信息技术(北京)有限公司 Method and device for reducing noises of voice of mobile communication equipment
CN103366758B (en) * 2012-03-31 2016-06-08 欢聚时代科技(北京)有限公司 The voice de-noising method of a kind of mobile communication equipment and device
CN109119096A (en) * 2012-12-25 2019-01-01 中兴通讯股份有限公司 The currently active sound keeps the modification method and device of frame number in a kind of VAD judgement
CN109119096B (en) * 2012-12-25 2021-01-22 中兴通讯股份有限公司 Method and device for correcting current active tone hold frame number in VAD (voice over VAD) judgment
CN103117066B (en) * 2013-01-17 2015-04-15 杭州电子科技大学 Low signal to noise ratio voice endpoint detection method based on time-frequency instaneous energy spectrum
CN103117066A (en) * 2013-01-17 2013-05-22 杭州电子科技大学 Low signal to noise ratio voice endpoint detection method based on time-frequency instaneous energy spectrum
CN103824563A (en) * 2014-02-21 2014-05-28 深圳市微纳集成电路与系统应用研究院 Hearing aid denoising device and method based on module multiplexing
CN104575498A (en) * 2015-01-30 2015-04-29 深圳市云之讯网络技术有限公司 Recognition method and system of effective speeches
CN104575498B (en) * 2015-01-30 2018-08-17 深圳市云之讯网络技术有限公司 Efficient voice recognition methods and system
CN104992537B (en) * 2015-06-26 2018-02-09 北京航天易联科技发展有限公司 A kind of signal processing method and device
CN104992537A (en) * 2015-06-26 2015-10-21 北京航天易联科技发展有限公司 Signal processing method and device
CN104980211B (en) * 2015-06-29 2017-12-12 北京航天易联科技发展有限公司 A kind of signal processing method and device
CN104980211A (en) * 2015-06-29 2015-10-14 北京航天易联科技发展有限公司 Signal processing method and device
CN106816157A (en) * 2015-11-30 2017-06-09 展讯通信(上海)有限公司 Audio recognition method and device
CN107665711A (en) * 2016-07-28 2018-02-06 展讯通信(上海)有限公司 Voice activity detection method and device
CN106558316A (en) * 2016-11-09 2017-04-05 天津大学 Howling detection method based on the rate of change of a specific frequency band of a long-term signal
CN107578770A (en) * 2017-08-31 2018-01-12 百度在线网络技术(北京)有限公司 Voice recognition method and device for network telephone, computer equipment and storage medium
CN107578770B (en) * 2017-08-31 2020-11-10 百度在线网络技术(北京)有限公司 Voice recognition method and device for network telephone, computer equipment and storage medium
CN108122552A (en) * 2017-12-15 2018-06-05 上海智臻智能网络科技股份有限公司 Speech emotion recognition method and device
CN108429999A (en) * 2018-04-06 2018-08-21 东莞市华睿电子科技有限公司 Standby control method for smart speaker
WO2019232848A1 (en) * 2018-06-04 2019-12-12 平安科技(深圳)有限公司 Voice distinguishing method and device, computer device and storage medium
CN109102823A (en) * 2018-09-05 2018-12-28 河海大学 Speech enhancement method based on subband spectral entropy
CN109102823B (en) * 2018-09-05 2022-12-06 河海大学 Speech enhancement method based on subband spectral entropy
CN110969805A (en) * 2018-09-30 2020-04-07 杭州海康威视数字技术股份有限公司 Safety detection method, device and system
CN110473547B (en) * 2019-07-12 2021-07-30 云知声智能科技股份有限公司 Speech recognition method
CN110473547A (en) * 2019-07-12 2019-11-19 云知声智能科技股份有限公司 Speech recognition method
CN110415729A (en) * 2019-07-30 2019-11-05 安谋科技(中国)有限公司 Voice activity detection method, device, medium and system
CN110415729B (en) * 2019-07-30 2022-05-06 安谋科技(中国)有限公司 Voice activity detection method, device, medium and system
CN110600018B (en) * 2019-09-05 2022-04-26 腾讯科技(深圳)有限公司 Voice recognition method and device and neural network training method and device
CN111739542A (en) * 2020-05-13 2020-10-02 深圳市微纳感知计算技术有限公司 Method, device and equipment for detecting characteristic sound
CN111739542B (en) * 2020-05-13 2023-05-09 深圳市微纳感知计算技术有限公司 Method, device and equipment for detecting characteristic sound
CN113593599A (en) * 2021-09-02 2021-11-02 北京云蝶智学科技有限公司 Method for removing noise signal in voice signal

Similar Documents

Publication Publication Date Title
CN102097095A (en) Speech endpoint detecting method and device
CN103646649B (en) Efficient speech detection method
US7499686B2 (en) Method and apparatus for multi-sensory speech enhancement on a mobile device
Skowronski et al. Acoustic detection and classification of microchiroptera using machine learning: lessons learned from automatic speech recognition
CN103310789B (en) Sound event recognition method based on improved parallel model combination
CN103117067B (en) Voice endpoint detection method under low signal-to-noise ratio
CN103117066B (en) Low signal to noise ratio voice endpoint detection method based on time-frequency instantaneous energy spectrum
US11295761B2 (en) Method for constructing voice detection model and voice endpoint detection system
CN105513605A (en) Voice enhancement system and method for cellphone microphone
CN102044246B (en) Method and device for detecting audio signal
CN1530929A (en) System for suppressing wind noise
CN1679083A (en) Multichannel voice detection in adverse environments
CN106024010A (en) Speech signal dynamic characteristic extraction method based on formant curves
CN102074245A (en) Dual-microphone-based speech enhancement device and speech enhancement method
CN104021789A (en) Self-adaption endpoint detection method using short-time time-frequency value
Zaw et al. The combination of spectral entropy, zero crossing rate, short time energy and linear prediction error for voice activity detection
CN103594094A (en) Self-adaptive spectral subtraction real-time speech enhancement
CN104409078A (en) Abnormal noise detection and recognition system
CN103021405A (en) Voice signal dynamic feature extraction method based on MUSIC and modulation spectrum filter
CN105825857A (en) Voiceprint-recognition-based method for assisting deaf patient in determining sound type
CN105575405A (en) Dual-microphone voice activity detection method and voice acquisition device
CN111540368B (en) Stable bird sound extraction method and device and computer readable storage medium
Couvreur et al. Automatic noise recognition in urban environments based on artificial neural networks and hidden Markov models
CN114234061A (en) Neural network-based intelligent judgment method for water leakage sound of pressurized operation water supply pipeline
Jin et al. An improved speech endpoint detection based on spectral subtraction and adaptive sub-band spectral entropy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C53 Correction of patent of invention or patent application
CB02 Change of applicant information

Address after: 300384 Tianjin City Huayuan Industrial Zone Ziyuan Road No. 8

Applicant after: Tianjin Yaan Technology Co., Ltd.

Address before: 300384 No. 8 Ziyuan Road, Huayuan Industrial Park, Nankai District, Tianjin

Applicant before: Yaan Science & Technology Electronic Co., Ltd., Tianjin

COR Change of bibliographic data

Free format text: CORRECT: APPLICANT; FROM: YAAN SCIENCE & TECHNOLOGY ELECTRONIC CO., LTD., TIANJIN TO: TIANJIN YAAN TECHNOLOGY CO., LTD.

C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20110615