CN106847270A - Double-threshold place name voice endpoint detection method - Google Patents

Double-threshold place name voice endpoint detection method

Info

Publication number
CN106847270A
Authority
CN
China
Prior art keywords
voice
length
variable
threshold value
signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201611135819.4A
Other languages
Chinese (zh)
Other versions
CN106847270B (en)
Inventor
谢巍
董万里
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN201611135819.4A priority Critical patent/CN106847270B/en
Publication of CN106847270A publication Critical patent/CN106847270A/en
Application granted granted Critical
Publication of CN106847270B publication Critical patent/CN106847270B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/04 - Segmentation; Word boundary detection
    • G10L15/05 - Word boundary detection

Abstract

The invention discloses a double-threshold place name voice endpoint detection method. Starting from the first frame, the energy of each frame of the voice signal is compared with a minimum energy threshold and a maximum energy threshold, and the zero-crossing rate is compared with a zero-crossing-rate threshold, thereby determining how the next frame should be detected. When a frame may belong to a speech segment, an added counter variable preserves the lightly pronounced, short-duration speech that occurs before the speech segment. By combining the characteristics of isolated-word place-name speech, the invention improves the traditional double-threshold method so that weak syllables and very short leading portions of the voice signal are not misjudged as noise. This avoids losing speech, improves the accuracy of endpoint detection and its adaptability to real application environments, and reduces the requirements placed on the environment.

Description

Double-threshold place name voice endpoint detection method
Technical field
The invention belongs to the field of voice endpoint detection, and in particular relates to a double-threshold place name voice endpoint detection method.
Background technology
With rapid economic development and the increasingly evident trend of globalization, the modern logistics industry has developed at an unprecedented pace in developed countries and has generated enormous economic and social benefits. Logistics resources include transport, storage, sorting, packaging, distribution and so on; these resources are spread across multiple sectors, including manufacturing, agriculture and circulation.
In the sorting link, sorting is at present performed largely by hand. Because workers stay for long periods in a noisy workshop environment, they inevitably become mentally and physically fatigued, and the monotony and repetitiveness of the task also make their working state overly relaxed. This necessarily reduces sorting accuracy and leads to more irreparable sorting errors, so manual inspection of products on the production line can no longer meet the demands of modern industry.
As an important interface for human-computer interaction, speech recognition has by now changed our lives in many respects, from voice control systems for smart homes to in-vehicle speech recognition systems. The integration of speech recognition technology into the logistics sorting link is therefore an inevitable requirement for the development of logistics lines.
Within speech recognition technology, endpoint detection is a particularly important step, and its performance directly affects the final recognition result. The traditional endpoint detection method based on short-time energy and zero-crossing rate is applicable only in ideal environments, and for isolated-word place-name speech its detection accuracy is relatively low.
Summary of the invention
The object of the invention is to overcome the shortcomings and deficiencies of the prior art and to provide a double-threshold place name voice endpoint detection method that improves the accuracy of endpoint detection.
A double-threshold place name voice endpoint detection method comprises the following steps: starting from the first frame, compare the energy of each frame of the voice signal with a minimum energy threshold and a maximum energy threshold, and compare the zero-crossing rate with a zero-crossing-rate threshold, thereby determining the appropriate way to detect the next frame; and, when the signal may be entering a speech state, preserve, by means of an added variable, the lightly pronounced speech that occurs before the speech segment.
The specific steps are as follows:
1st, receive the preprocessed place-name voice signal, compare the energy of each frame with the minimum energy threshold and the maximum energy threshold, and compare the zero-crossing rate with the zero-crossing-rate threshold;
2nd, when the energy of the i-th frame is less than the minimum energy threshold, set the state variable to 0 and the voice-length counter to 0, indicating that the signal is still in the silent segment, and return to step 1 to detect the next frame;
When the energy of the i-th frame is greater than the minimum energy threshold but less than the maximum energy threshold, and the zero-crossing rate is greater than the zero-crossing-rate threshold, set the state variable to 1, indicating that the frame may belong to a speech segment; add 1 to the voice-length counter and to the variable counting the length of the possible speech segment, and return to step 1 to detect the next frame;
3rd, if the state variable is already 1, screen the frames that may belong to a speech segment according to a given criterion, so as to further distinguish noise segments from speech segments;
4th, when the energy of the i-th frame is greater than the maximum energy threshold, set the state variable to 2, indicating that a speech segment has been entered; add 1 to the voice-length counter and detect the next frame according to step 5;
5th, judge whether the energy of the current frame is greater than the minimum energy threshold, or the zero-crossing rate of the current frame is greater than the zero-crossing-rate threshold;
If so, the signal is still in the speech segment and not silent; keep the state variable at 2, add 1 to the voice-length counter, and continue detecting the next frame according to step 5;
If not, the signal has turned from speech to silence; add 1 to the silence length and judge it further, until all valid speech has been found; then set the state parameter to 3 and end the process.
Preferably, if the state variable is already 1 and the energy of the voice signal is less than the minimum energy threshold, judge whether the variable counting the length of the possible speech segment is greater than a given threshold. If so, the current frames are a noise segment; discard the preceding speech, set the state variable, the voice-length counter and the possible-speech-length variable to 0, and return to step 1 to continue detecting the next frame. If not, the signal may still be in a speech segment; keep the state variable equal to 1, add 1 to the voice-length counter and to the possible-speech-length variable, and return to step 1 to detect the next frame.
Further, the above-mentioned given threshold is equal to 6.
Preferably, the step of further judging the silence length is: judge whether the silence length is less than the maximum silence length;
If so, keep the state variable at 2, add 1 to the voice-length counter, and detect the next frame according to step 5;
If not, judge whether the voice-length counter is less than the minimum speech-signal length. If the voice-length counter is less than the minimum speech-signal length, what has been detected so far is all noise: set the state variable to 0, the silence length to 0 and the voice-length counter to 0, and continue detection. If the voice-length counter is not less than the minimum speech-signal length, a speech segment has been found and is considered a valid voice signal: set the state parameter to 3 and end the process.
Preferably, in the initial state, the state variable is set to 0, the voice-length counter is set to 0, the variable used to count the length of a possible speech segment before a speech segment is confirmed is set to 0, and the silence length is set to 0.
Preferably, the minimum energy threshold is 0.01, the maximum energy threshold is 0.1, and the zero-crossing-rate threshold is 100.
Preferably, the maximum silence length is equal to 10 and the minimum speech-signal length is equal to 5.
Preferably, the preprocessing includes pre-emphasis and framing.
Specifically, pre-emphasis is realized with a digital filter that boosts high frequencies at 6 dB per octave; the high-pass filter satisfies H(z) = 1 - μz^(-1), μ = 0.97. The voice signal is framed with a frame length of 256 and a frame shift of 128.
Compared with the prior art, the present invention has the following advantages and beneficial effects:
By combining the characteristics of isolated-word place-name speech, the invention improves the traditional double-threshold method. The variable slience1, which counts the length of a possible speech segment before a speech segment is confirmed, is added, and the various endpoint detection parameters are optimized. This ensures that weak syllables and very short leading portions of interrupted place-name speech are not misjudged as noise, which avoids losing speech, improves the accuracy of endpoint detection and its adaptability to real application environments, and reduces the requirements that endpoint detection places on the environment.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the method of the embodiment.
Specific embodiment
The present invention is described in further detail below with reference to the embodiment and the accompanying drawing, but the implementation of the present invention is not limited thereto.
During endpoint detection of a place-name voice signal, if a piece of place-name speech first contains a speech segment, then a silent segment, and only then enters the normal speech segment, a traditional endpoint detection method will regard the portion before the normal speech segment as a noise segment and cut the voice signal there, which causes a loss of speech. For example, in the pronunciation of "Shijiazhuang", the first syllable "Shi" is very light and very short and is not easy to recognize.
The double-threshold place name voice endpoint detection method given in this embodiment is based on improved short-time average energy and zero-crossing rate. By adding the variable slience1, which counts the length of a possible speech segment before a speech segment is confirmed, the speech preceding the normal speech segment can be preserved as a valid fragment even when the above situation occurs, thereby improving the validity of endpoint detection.
Before endpoint detection, the place-name voice signal is preprocessed, including pre-emphasis and framing.
Because the average power of a voice signal is affected by glottal excitation and lip-nose radiation, the spectrum falls off at about 6 dB per octave above roughly 80 Hz. When the spectrum of the speech signal is computed, the higher the frequency, the smaller the corresponding component, and the high-frequency part of the spectrum is harder to obtain than the low-frequency part; pre-emphasis is therefore applied to the voice signal. The central idea of pre-emphasis is to use the difference between signal characteristics and noise characteristics to process the signal effectively. The aim is to boost the high-frequency part and flatten the spectrum of the signal, keeping it at the same order of magnitude over the whole band from low to high frequency, so that the spectrum can be computed with the same signal-to-noise ratio for spectral analysis or vocal-tract parameter analysis. Pre-emphasis is realized with a digital filter that boosts high frequencies at 6 dB per octave; in this embodiment a high-pass filter satisfying H(z) = 1 - μz^(-1), μ = 0.97, is used.
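For illustration, the first-order filter H(z) = 1 - μz^(-1) corresponds to the difference equation y[n] = x[n] - μ·x[n-1]. The following minimal sketch (Python/NumPy, not part of the patent; the function name is our own) applies it with μ = 0.97:

```python
import numpy as np

def pre_emphasis(x, mu=0.97):
    """Apply y[n] = x[n] - mu * x[n-1]; the first sample is passed through unchanged."""
    x = np.asarray(x, dtype=float)
    y = np.empty_like(x)
    y[0] = x[0]
    y[1:] = x[1:] - mu * x[:-1]
    return y
```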
In addition, although a voice signal as a whole is non-stationary, with its characteristics and the parameters describing them changing over time, it has short-time stationarity and can be regarded as an approximately stationary process over a short interval (generally 10 ms to 30 ms).
Most current speech processing techniques are therefore based on this short-time assumption: the voice signal is divided into frames and characteristic parameters are extracted from each frame. To obtain a smooth transition between frames and keep continuity, overlapping framing is generally used, so that adjacent frames share a common part called the frame shift. The frame length and frame shift must be chosen when framing: with a larger frame length there are fewer frames, the amount of computation is small and the system is fast, but the endpoint detection error easily increases; with a smaller frame length there are more frames, the computation grows and the system is slower. The number of frames per second is generally about 33 to 100, and the frame shift is usually 1/3 to 2/3 of the frame length. In this embodiment the signal is framed with a frame length of 256 and a frame shift of 128, where 256 and 128 are numbers of sampling points.
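A minimal sketch of this overlapping framing, assuming a 1-D NumPy array as input; the function name enframe and the choice to drop the incomplete tail of the signal rather than zero-pad it are our own:

```python
import numpy as np

def enframe(x, frame_len=256, frame_shift=128):
    """Split a 1-D signal into overlapping frames of frame_len samples with a hop of frame_shift."""
    x = np.asarray(x, dtype=float)
    n_frames = max(0, (len(x) - frame_len) // frame_shift + 1)
    frames = np.zeros((n_frames, frame_len))
    for i in range(n_frames):
        start = i * frame_shift
        frames[i] = x[start:start + frame_len]
    return frames
```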
After the place-name voice signal has been preprocessed, endpoint detection can be carried out. As shown in Fig. 1, the specific steps are as follows:
In the initial state, the state variable status = 0, the voice-length counter count = 0, the variable slience1 = 0 (used to count the length of a possible speech segment before a speech segment is confirmed), and the silence length slience = 0.
S1. Receive the preprocessed place-name voice signal, compare the energy amp[i] of each frame with the minimum energy threshold amp2 and the maximum energy threshold amp1, and compare the zero-crossing rate zcr[i] with the zero-crossing-rate threshold zcr, where the minimum energy threshold amp2 is 0.01, the maximum energy threshold amp1 is 0.1, and the zero-crossing-rate threshold zcr is 100.
These thresholds are set for the voice signal after normalization. Assuming the voice signal is x = [x1, x2, ..., xn], it is normalized so that after this processing all values in x lie within [-1, 1]. The thresholds are set on this basis; the values given below are the thresholds after normalization.
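The patent does not give explicit formulas for the per-frame energy amp[i] or the zero-crossing rate zcr[i]. The sketch below uses common choices (sum of squared samples for the energy, number of sign changes for the zero-crossing rate) together with a max-absolute-value normalization into [-1, 1]; all three definitions are assumptions rather than the patent's exact definitions, and with different definitions the example thresholds (0.01, 0.1, 100) would need to be retuned.

```python
import numpy as np

def normalize(x):
    """Scale the signal into [-1, 1]; dividing by the maximum absolute amplitude
    is an assumption -- the patent only states the resulting range."""
    x = np.asarray(x, dtype=float)
    return x / np.max(np.abs(x))

def frame_features(frames):
    """Per-frame energy amp[i] and zero-crossing rate zcr[i] (assumed definitions)."""
    amp = np.sum(frames ** 2, axis=1)                     # sum of squared samples
    signs = np.sign(frames)
    zcr = np.sum(np.abs(np.diff(signs, axis=1)) > 0, axis=1)  # count of sign changes
    return amp, zcr
```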
In this process each frame of the voice signal is detected in turn. According to the judgment made on each frame, the value of the state variable status is set, which determines how the next frame of the voice signal should be judged.
S2. When the energy amp[i] of the i-th frame is less than the minimum energy threshold amp2, set the state variable status to 0 and the voice-length counter count to 0, indicating that the signal is still in the silent segment, and return to step S1 to detect the next frame;
S3. When the energy amp[i] of the i-th frame is greater than the minimum energy threshold amp2 but less than the maximum energy threshold amp1, and the zero-crossing rate zcr[i] is greater than the zero-crossing-rate threshold zcr, set the state variable status to 1, indicating that the frame may belong to a speech segment; add 1 to the voice-length counter count and to the variable slience1 that counts the length of the possible speech segment, and return to step S1 to detect the next frame.
S4. If the state status = 1 has already been reached and the energy of the next frame is less than the minimum energy threshold amp2, judge whether slience1 > 6 holds. If so, the current frames are a noise segment; discard the preceding speech, set the state variable status = 0, the voice-length counter count = 0 and the possible-speech-length variable slience1 = 0, and return to step S1 to continue detecting the next frame. If not, the signal may still be in a speech segment; keep the state variable status = 1, add 1 to the voice-length counter count and to the possible-speech-length variable slience1, and return to step S1 to detect the next frame.
S5. When the energy amp[i] of the i-th frame is greater than the maximum energy threshold amp1, set the state variable status to 2, indicating that a speech segment has been entered; add 1 to the voice-length counter count and detect the next frame according to step S6.
S6. Judge whether the energy amp[i] of the current frame is greater than the minimum energy threshold amp2, or the zero-crossing rate zcr[i] of the current frame is greater than the zero-crossing-rate threshold zcr.
If so, the signal is still in the speech segment and not silent; keep the state variable status at 2, add 1 to the voice-length counter count, and continue detecting the next frame according to step S6.
If not, the signal has turned from speech to silence; add 1 to the silence length slience. The slience variable is then used to judge whether the voice signal has ended, and step S9 is executed.
S9. Judge whether the silence length slience is less than the maximum silence length maxslience, where the maximum silence length maxslience = 10;
If so, the signal may still be in the speech segment: since speech has already occurred and the length of the current silent section has not reached the maximum silence length, the voice signal may not have ended yet and more signal may follow. Keep the state variable status at 2, add 1 to the voice-length counter count, and detect the next frame according to step S6.
If not, judge whether the voice-length counter count is less than the minimum speech-signal length minlen, where minlen = 5. If count < minlen holds, what has been detected so far is all noise, because a normal voice signal should be longer than the minimum speech-signal length minlen; anything shorter is judged to be noise. Set the state variable status to 0, the silence length slience to 0 and the voice-length counter count to 0, and continue detection. If count < minlen does not hold, a speech segment has been found and is considered a valid voice signal, so the whole process can end; set the state parameter status to 3 and end the process.
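To make the frame-by-frame flow concrete, the following sketch restates steps S1 to S9 as a small state machine in Python. It is an illustration rather than the patent's reference implementation: the function name, the (start, end) return convention, and the decision to treat frames that fall between the two energy thresholds but have a low zero-crossing rate in the same way as low-energy frames are our own assumptions; the default parameters are the values given in this embodiment (amp1 = 0.1, amp2 = 0.01, zero-crossing-rate threshold 100, maximum silence length 10, minimum speech length 5, and 6 for the slience1 limit, here called maybe_limit).

```python
def detect_endpoints(amp, zcr, amp1=0.1, amp2=0.01, zcr_thr=100,
                     maxsilence=10, minlen=5, maybe_limit=6):
    """Double-threshold state machine following steps S1-S9.

    status: 0 = silence, 1 = possibly speech, 2 = speech, 3 = finished.
    Returns (start_frame, end_frame) of the detected speech, or None.
    """
    status, count, slience1, silence = 0, 0, 0, 0
    start = 0

    for i in range(len(amp)):
        if status in (0, 1):
            if amp[i] > amp1:                          # S5: clearly speech
                start = i - count                      # keep the light prefix already counted
                status, count, silence = 2, count + 1, 0
            elif amp[i] > amp2 and zcr[i] > zcr_thr:   # S3: possibly speech
                status = 1
                count += 1
                slience1 += 1
            elif status == 1 and slience1 <= maybe_limit:
                count += 1                             # S4: prefix still short enough, keep waiting
                slience1 += 1
            else:                                      # S2 / S4: treat collected frames as noise
                status, count, slience1 = 0, 0, 0
        elif status == 2:
            if amp[i] > amp2 or zcr[i] > zcr_thr:      # S6: still speech
                count += 1
                silence = 0
            else:
                silence += 1                           # speech has turned to silence
                if silence < maxsilence:               # S9: gap short enough, keep going
                    count += 1
                elif count < minlen:                   # segment too short: it was noise
                    status, count, silence = 0, 0, 0
                else:                                  # valid speech segment found
                    status = 3
                    break

    if status >= 2 and count >= minlen:
        return start, start + count
    return None
```

A call such as detect_endpoints(*frame_features(enframe(pre_emphasis(normalize(signal))))) would then return the frame range judged to contain speech, including the lightly pronounced prefix retained through slience1; the order of normalization and pre-emphasis in that chain is likewise an illustrative choice, not something the patent fixes.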
In this embodiment all range judgments are expressed with "greater than" or "less than", without mentioning "equal to"; the case of equality may be assigned to the "greater than" class.
The above embodiment is a preferred implementation of the present invention, but the implementation of the present invention is not limited by the above embodiment. Any other change, modification, substitution, combination or simplification made without departing from the spirit and principle of the present invention shall be regarded as an equivalent replacement and is included within the protection scope of the present invention.

Claims (10)

1. A double-threshold place name voice endpoint detection method, characterized by comprising the following steps: starting from the first frame, comparing the energy of each frame of the voice signal with a minimum energy threshold and a maximum energy threshold, and comparing the zero-crossing rate with a zero-crossing-rate threshold, thereby determining the appropriate way to detect the next frame; and, when the signal may be entering a speech state, preserving, by means of an added variable, the lightly pronounced speech occurring before the speech segment.
2. The double-threshold place name voice endpoint detection method according to claim 1, characterized in that the specific steps are as follows:
(1) receiving the preprocessed place-name voice signal, comparing the energy of each frame with the minimum energy threshold and the maximum energy threshold, and comparing the zero-crossing rate with the zero-crossing-rate threshold;
(2) when the energy of the i-th frame is less than the minimum energy threshold, setting the state variable to 0 and the voice-length counter to 0, indicating that the signal is still in the silent segment, and returning to step (1) to detect the next frame;
when the energy of the i-th frame is greater than the minimum energy threshold but less than the maximum energy threshold, and the zero-crossing rate is greater than the zero-crossing-rate threshold, setting the state variable to 1, indicating that the frame may belong to a speech segment, adding 1 to the voice-length counter and to the variable counting the length of the possible speech segment, and returning to step (1) to detect the next frame;
(3) if the state variable is already 1, screening the frames that may belong to a speech segment according to a given criterion, so as to further distinguish noise segments from speech segments;
(4) when the energy of the i-th frame is greater than the maximum energy threshold, setting the state variable to 2, indicating that a speech segment has been entered, adding 1 to the voice-length counter, and detecting the next frame according to step (5);
(5) judging whether the energy of the current frame is greater than the minimum energy threshold, or the zero-crossing rate of the current frame is greater than the zero-crossing-rate threshold;
if so, the signal is still in the speech segment and not silent; keeping the state variable at 2, adding 1 to the voice-length counter, and continuing to detect the next frame according to step (5);
if not, the signal has turned from speech to silence; adding 1 to the silence length and judging it further, until all valid speech has been found; then setting the state parameter to 3 and ending the process.
3. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that, in step (3), if the state variable is already 1 and the energy of the voice signal is less than the minimum energy threshold, judging whether the variable counting the length of the possible speech segment is greater than a given threshold; if so, the current frames are a noise segment: discarding the preceding speech, setting the state variable, the voice-length counter and the possible-speech-length variable to 0, and returning to step (1) to continue detecting the next frame; if not, the signal may still be in a speech segment: keeping the state variable equal to 1, adding 1 to the voice-length counter and to the possible-speech-length variable, and returning to step (1) to detect the next frame.
4. The double-threshold place name voice endpoint detection method according to claim 3, characterized in that said given threshold is equal to 6.
5. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that, in step (5), the step of further judging the silence length is: judging whether the silence length is less than the maximum silence length;
if so, keeping the state variable at 2, adding 1 to the voice-length counter, and detecting the next frame according to step (5);
if not, judging whether the voice-length counter is less than the minimum speech-signal length; if the voice-length counter is less than the minimum speech-signal length, what has been detected so far is all noise: setting the state variable to 0, the silence length to 0 and the voice-length counter to 0, and continuing detection; if the voice-length counter is not less than the minimum speech-signal length, a speech segment has been found and is considered a valid voice signal: setting the state parameter to 3 and ending the process.
6. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that, in the initial state, the state variable is set to 0, the voice-length counter is set to 0, the variable used to count the length of a possible speech segment before a speech segment is confirmed is set to 0, and the silence length is set to 0.
7. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that the minimum energy threshold is 0.01, the maximum energy threshold is 0.1, and the zero-crossing-rate threshold is 100.
8. The double-threshold place name voice endpoint detection method according to claim 5, characterized in that the maximum silence length is equal to 10 and the minimum speech-signal length is equal to 5.
9. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that the preprocessing includes pre-emphasis and framing.
10. The double-threshold place name voice endpoint detection method according to claim 9, characterized in that the pre-emphasis is realized with a digital filter that boosts high frequencies at 6 dB per octave, the high-pass filter satisfying H(z) = 1 - μz^(-1), μ = 0.97; and the voice signal is framed with a frame length of 256 and a frame shift of 128.
CN201611135819.4A 2016-12-09 2016-12-09 Double-threshold place name voice endpoint detection method Active CN106847270B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611135819.4A CN106847270B (en) 2016-12-09 2016-12-09 Double-threshold place name voice endpoint detection method

Publications (2)

Publication Number Publication Date
CN106847270A true CN106847270A (en) 2017-06-13
CN106847270B CN106847270B (en) 2020-08-18

Family

ID=59139133

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611135819.4A Active CN106847270B (en) 2016-12-09 2016-12-09 Double-threshold place name voice endpoint detection method

Country Status (1)

Country Link
CN (1) CN106847270B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1426048A (en) * 2001-12-13 2003-06-25 中国科学院自动化研究所 End detection method based on entropy
US20070198251A1 (en) * 2006-02-07 2007-08-23 Jaber Associates, L.L.C. Voice activity detection method and apparatus for voiced/unvoiced decision and pitch estimation in a noisy speech feature extraction
TW200811833A (en) * 2006-08-24 2008-03-01 Inventec Besta Co Ltd Detection method for voice activity endpoint
CN102254558A (en) * 2011-07-01 2011-11-23 重庆邮电大学 Control method of intelligent wheel chair voice recognition based on end point detection
CN103366739A (en) * 2012-03-28 2013-10-23 郑州市科学技术情报研究所 Self-adaptive endpoint detection method and self-adaptive endpoint detection system for isolate word speech recognition
CN104157290A (en) * 2014-08-19 2014-11-19 大连理工大学 Speaker recognition method based on depth learning

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107452399A (en) * 2017-09-18 2017-12-08 腾讯音乐娱乐科技(深圳)有限公司 Audio feature extraction methods and device
CN108332843A (en) * 2018-01-29 2018-07-27 国家电网公司 A kind of noise diagnostics method of electrical equipment malfunction electric arc
CN108847218B (en) * 2018-06-27 2020-07-21 苏州浪潮智能科技有限公司 Self-adaptive threshold setting voice endpoint detection method, equipment and readable storage medium
CN108847218A (en) * 2018-06-27 2018-11-20 郑州云海信息技术有限公司 A kind of adaptive threshold adjusting sound end detecting method, equipment and readable storage medium storing program for executing
CN109346095A (en) * 2018-10-10 2019-02-15 广州市讯飞樽鸿信息技术有限公司 A kind of heart sound end-point detecting method
CN109346095B (en) * 2018-10-10 2023-07-07 广州九路科技有限公司 Heart sound endpoint detection method
CN109949806A (en) * 2019-03-12 2019-06-28 百度国际科技(深圳)有限公司 Information interacting method and device
CN110634473A (en) * 2019-09-20 2019-12-31 广州大学 Voice digital recognition method based on MFCC
CN112511698A (en) * 2020-12-03 2021-03-16 普强时代(珠海横琴)信息技术有限公司 Real-time call analysis method based on universal boundary detection
WO2023092399A1 (en) * 2021-11-25 2023-06-01 华为技术有限公司 Speech recognition method, speech recognition apparatus, and system
CN114283840A (en) * 2021-12-22 2022-04-05 天翼爱音乐文化科技有限公司 Instruction audio generation method, system, device and storage medium
CN116386676A (en) * 2023-06-02 2023-07-04 北京探境科技有限公司 Voice awakening method, voice awakening device and storage medium
CN116386676B (en) * 2023-06-02 2023-08-29 北京探境科技有限公司 Voice awakening method, voice awakening device and storage medium

Also Published As

Publication number Publication date
CN106847270B (en) 2020-08-18

Similar Documents

Publication Publication Date Title
CN106847270A (en) A kind of double threshold place name sound end detecting method
WO2021128576A1 (en) Tool condition monitoring dataset enhancement method based on generative adversarial network
CN105611477B (en) The voice enhancement algorithm that depth and range neutral net are combined in digital deaf-aid
EP2096629B1 (en) Method and apparatus for classifying sound signals
CN101197130B (en) Sound activity detecting method and detector thereof
CN107835496B (en) Spam short message identification method and device and server
CN106503805A (en) A kind of bimodal based on machine learning everybody talk with sentiment analysis system and method
CN103366739B (en) Towards self-adaptation end-point detecting method and the system thereof of alone word voice identification
CN107403198A (en) A kind of official website recognition methods based on cascade classifier
CN107492382A (en) Voiceprint extracting method and device based on neutral net
CN111161207B (en) Integrated convolutional neural network fabric defect classification method
CN106601230B (en) Logistics sorting place name voice recognition method and system based on continuous Gaussian mixture HMM model and logistics sorting system
CN106611604A (en) An automatic voice summation tone detection method based on a deep neural network
CN101149921A (en) Mute test method and device
CN110232415B (en) Train bogie fault identification method based on biological information characteristics
CN112992191B (en) Voice endpoint detection method and device, electronic equipment and readable storage medium
CN111564163A (en) RNN-based voice detection method for various counterfeit operations
CN107145778A (en) A kind of intrusion detection method and device
CN109920445A (en) A kind of sound mixing method, device and equipment
CN111833310B (en) Surface defect classification method based on neural network architecture search
CN111931601A (en) System and method for correcting error class label of gear box
KR101140896B1 (en) Method and apparatus for speech segmentation
CN101256772A (en) Method and device for determining attribution class of non-noise audio signal
CN106531195A (en) Dialogue conflict detection method and device
WO2019015226A1 (en) Method for rapidly identifying wind speed distribution pattern

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant