CN106847270A - Double-threshold place name voice endpoint detection method
- Publication number
- CN106847270A
- Authority
- CN
- China
- Prior art keywords
- voice
- length
- variable
- threshold value
- signal
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G10L15/05—Word boundary detection
Abstract
The invention discloses a double-threshold place name voice endpoint detection method. Starting from the first frame, the energy of each frame of the voice signal is compared with a minimum energy threshold and a maximum energy threshold, and its zero-crossing rate is compared with a zero-crossing rate threshold, to determine how the next frame is to be examined. When the signal may have entered the speech state, an additional variable is used to retain the lightly pronounced portion that occurs just before the speech segment. By taking the characteristics of isolated-word place name voice signals into account, the invention improves the traditional double-threshold method: lightly pronounced sounds and very short leading portions of the voice signal are not judged to be noise, so no speech is lost. This improves the accuracy of endpoint detection and its adaptability to on-site application environments, and reduces the requirements placed on the environment.
Description
Technical field
The invention belongs to the field of voice endpoint detection, and in particular relates to a double-threshold place name voice endpoint detection method.
Background
With the rapid development of the economy and the increasingly prominent trend of globalization, the modern logistics industry has developed at an unprecedented pace in developed countries and has generated huge economic and social benefits. Logistics resources include transport, storage, sorting, packaging, and distribution; these resources are spread across many sectors, including manufacturing, agriculture, and the circulation industry.
At present, sorting is done largely by hand. Because workers spend long periods in a noisy working environment, they inevitably become mentally and physically fatigued, and the monotony and repetitiveness of the task also make them less attentive. This necessarily lowers sorting accuracy and leads to more irreparable sorting errors. Manual inspection of products on an industrial production line therefore cannot meet the demands of modern industry.
Speech recognition, as an important interface for human-computer interaction, has already changed many aspects of our lives, from voice control systems for smart homes to in-vehicle speech recognition systems. Integrating speech recognition technology into the sorting stage of logistics is therefore an inevitable requirement for the development of logistics lines.
Endpoint detection is a particularly important stage of speech recognition, and its quality directly affects the final recognition result. Traditional endpoint detection methods based on short-time energy and zero-crossing rate work only in relatively ideal environments, and for isolated-word place name voice signals their endpoint detection accuracy is low.
Summary of the invention
The object of the invention is to overcome the shortcomings and deficiencies of the prior art by providing a double-threshold place name voice endpoint detection method that improves the accuracy of endpoint detection.
A double-threshold place name voice endpoint detection method comprises the following steps: starting from the first frame, the energy of each frame of the voice signal is compared with a minimum energy threshold and a maximum energy threshold, and its zero-crossing rate is compared with a zero-crossing rate threshold, to determine the appropriate way to examine the next frame; and, when the signal may have entered the speech state, an additional variable is used to retain the lightly pronounced portion that occurs just before the speech segment.
The specific steps are as follows:
1. Receive the preprocessed place name voice signal; for each frame, compare the energy with the minimum energy threshold and the maximum energy threshold, and compare the zero-crossing rate with the zero-crossing rate threshold.
2. When the energy of the i-th frame < the minimum energy threshold, set the state variable to 0 and the voice-length counter to 0, indicating that the signal is still in a silent segment, and return to step 1 to examine the next frame.
When the maximum energy threshold > the energy of the i-th frame > the minimum energy threshold, and the zero-crossing rate > the zero-crossing rate threshold, set the state variable to 1, indicating that the signal may be in a speech segment; add 1 to the voice-length counter and add 1 to the possible-speech length variable, then return to step 1 to examine the next frame.
3. If the state variable is already 1, screen the frames that may belong to a speech segment against a set criterion to further distinguish noise segments from speech segments.
4. When the energy of the i-th frame > the maximum energy threshold, set the state variable to 2, indicating that a speech segment has been entered; add 1 to the voice-length counter and examine the next frame according to step 5.
5. Determine whether the energy of the current frame > the minimum energy threshold or the zero-crossing rate of the current frame > the zero-crossing rate threshold.
If so, the signal is still in a speech segment rather than silence: keep the state variable at 2, add 1 to the voice-length counter, and examine the next frame according to step 5.
If not, the signal is moving from the speech segment into a silent segment: add 1 to the silence length and evaluate the silence length further. Once a complete valid voice signal has been found, set the state variable to 3 and end the process.
Preferably, if the state variable is already 1 and the energy of the voice signal is less than the minimum energy threshold, determine whether the possible-speech length variable is greater than a set threshold. If so, the current frames are a noise segment: discard the preceding portion, set the state variable, the voice-length counter, and the possible-speech length variable to 0, and return to step 1 to examine the next frame. If not, the signal may still be in a speech segment: keep the state variable at 1, add 1 to the voice-length counter and to the possible-speech length variable, and return to step 1 to examine the next frame.
Further, the set threshold is equal to 6.
Preferably, the silence length is evaluated further as follows: determine whether the silence length < the maximum silence length.
If so, keep the state variable at 2, add 1 to the voice-length counter, and examine the next frame according to step 5.
If not, determine whether the voice-length counter < the minimum voice-signal length. If it is, everything detected so far is noise: set the state variable to 0, the silence length to 0, and the voice-length counter to 0, and continue detection. If it is not, the speech segment has been found and is regarded as a valid voice signal: set the state variable to 3 and end the process.
Preferably, in the initial state, the state variable is 0, the voice-length counter is 0, the variable that records the length of possible speech while entry into a speech segment is not yet confirmed is 0, and the silence length is 0.
Preferably, the minimum energy threshold is 0.01, the maximum energy threshold is 0.1, and the zero-crossing rate threshold is 100.
Preferably, the maximum silence length is 10 and the minimum voice-signal length is 5.
Preferably, the preprocessing includes pre-emphasis and framing.
Specifically, pre-emphasis is implemented by a digital filter that boosts high frequencies at 6 dB/octave; this high-pass filter satisfies $H(z) = 1 - \mu z^{-1}$ with $\mu = 0.97$. The voice signal is divided into frames with a frame length of 256 and a frame shift of 128.
Compared with the prior art, the invention has the following advantages and beneficial effects:
By taking the characteristics of isolated-word place name voice signals into account, the invention improves the traditional double-threshold method. It adds the variable slience1, which records the length of possible speech while entry into a speech segment is not yet confirmed, and optimizes the various endpoint detection parameters. This ensures that lightly pronounced sounds and very short leading portions of an interrupted place name voice signal are not judged to be noise, so no speech is lost. The accuracy of endpoint detection and its adaptability to on-site application environments are improved, and the requirements that endpoint detection places on the environment are reduced.
Brief description of the drawings
Fig. 1 is a flow diagram of the method of the embodiment.
Detailed description of the embodiments
The invention is described in further detail below with reference to an embodiment and the accompanying drawing, but embodiments of the invention are not limited thereto.
During endpoint detection of a place name voice signal, if an utterance first contains a stretch of speech, then a stretch of silence, and only then the normal speech segment, a traditional endpoint detection method treats the stretch before the normal speech segment as noise and cuts it off, so part of the voice signal is lost. For example, in the pronunciation of "Shijiazhuang", the first syllable "Shi" is very light and very short, and is hard to detect.
The double-threshold place name voice endpoint detection method of this embodiment is based on an improved short-time average energy and zero-crossing rate. By adding the variable slience1, which records the length of possible speech while entry into a speech segment is not yet confirmed, the method can preserve the speech that precedes the normal speech segment as a valid fragment even in the situation described above, which improves the effectiveness of endpoint detection.
Before endpoint detection, the place name voice signal is preprocessed; this includes pre-emphasis and framing.
The average power spectrum of a voice signal is shaped by glottal excitation and by radiation from the mouth and nose, and its high-frequency end falls off at about 6 dB/octave above approximately 80 Hz. As a result, when the spectrum of the voice signal is computed, the higher the frequency, the smaller the corresponding component, and the spectrum of the high-frequency part is harder to obtain than that of the low-frequency part. The voice signal is therefore pre-emphasized. The central idea of pre-emphasis is to exploit the difference between signal characteristics and noise characteristics to process the signal effectively. Its purpose is to boost the high-frequency part and flatten the spectrum of the signal, so that the spectrum can be computed with the same signal-to-noise ratio over the whole band from low to high frequencies, which facilitates spectral analysis or vocal-tract parameter analysis. Pre-emphasis is implemented by a digital filter that boosts high frequencies at 6 dB/octave. This embodiment uses a high-pass filter that satisfies $H(z) = 1 - \mu z^{-1}$ with $\mu = 0.97$.
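As a minimal sketch of this filter, $H(z) = 1 - \mu z^{-1}$ corresponds to the difference equation y[n] = x[n] - μ·x[n-1]. The Python function below applies it to a list of samples. The function name `pre_emphasis` and the choice to pass the first sample through unchanged are assumptions made for illustration, not details given in the text.

```python
def pre_emphasis(samples, mu=0.97):
    """Apply y[n] = x[n] - mu * x[n-1] to a sequence of samples.

    The first sample has no predecessor, so it is copied unchanged
    (an assumption; the text does not specify the boundary handling).
    """
    samples = [float(s) for s in samples]
    if not samples:
        return []
    out = [samples[0]]
    # Pair each sample with the one before it and subtract the scaled predecessor.
    for prev, cur in zip(samples, samples[1:]):
        out.append(cur - mu * prev)
    return out
```

For a constant input, every output after the first sample shrinks to (1 - μ) times the input, which illustrates how the filter suppresses slowly varying (low-frequency) content relative to fast changes.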
In addition, taken as a whole, the characteristics of a voice signal and the parameters that describe its essential features change over time, but the signal is also short-term stationary: over a short interval (generally 10 ms to 30 ms) it can be regarded as an approximately unchanging stationary process.
Most current speech processing techniques therefore divide the voice signal into short frames and then extract feature parameters from each frame separately. To make the transition between frames smooth and preserve continuity, overlapping framing is generally used, so that each frame shares samples with the next; the offset between the start of one frame and the start of the next is called the frame shift. The frame length and the frame shift must be chosen when framing. With a larger frame length there are fewer frames, less computation, and faster processing, but the endpoint detection error tends to increase. With a smaller frame length there are more frames, more computation, and slower processing. The number of frames per second is generally about 33 to 100, and the frame shift is generally 1/3 to 2/3 of the frame length. In this embodiment the voice signal is framed with a frame length of 256 and a frame shift of 128, both measured in sampling points.
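The framing step can be sketched as follows, with the frame length of 256 and the frame shift of 128 taken from this embodiment. The helper name `split_frames` is hypothetical, and dropping a trailing partial frame is an assumption; the text does not say how the end of the signal is handled.

```python
def split_frames(samples, frame_len=256, frame_shift=128):
    """Cut a sample sequence into overlapping frames.

    Consecutive frames start frame_shift samples apart, so with
    256/128 each frame shares half of its samples with the next one.
    A final chunk shorter than frame_len is discarded (assumption).
    """
    samples = list(samples)
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += frame_shift
    return frames
```

With these defaults, a 512-sample signal yields three frames starting at samples 0, 128, and 256.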
After the place name voice signal has been preprocessed, endpoint detection can be carried out, as shown in Fig. 1. The specific steps are as follows:
In the initial state, the state variable status = 0, the voice-length counter count = 0, the variable slience1 that records the length of possible speech while entry into a speech segment is not yet confirmed is 0, and the silence length slience = 0.
S1. Receive the preprocessed place name voice signal; for each frame, compare the energy amp[i] with the minimum energy threshold amp2 and the maximum energy threshold amp1, and compare the zero-crossing rate zcr[i] with the zero-crossing rate threshold zcr. The minimum energy threshold amp2 is 0.01, the maximum energy threshold amp1 is 0.1, and the zero-crossing rate threshold zcr is 100.
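The text does not spell out how amp[i] and zcr[i] are computed. One plausible reading, shown here purely as an assumption, takes amp[i] as the mean squared sample value of frame i and zcr[i] as the number of sign changes within the frame. The function names `frame_energy` and `zero_crossings` are hypothetical.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame (assumed definition of amp[i])."""
    if not frame:
        return 0.0
    return sum(s * s for s in frame) / len(frame)


def zero_crossings(frame):
    """Count sign changes between adjacent samples (assumed definition of zcr[i]).

    A sample equal to zero is treated as non-negative.
    """
    frame = list(frame)
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
```

Under this reading a 256-sample frame can have at most 255 sign changes, so a threshold of 100 is interpretable as a per-frame count.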
These thresholds are set for the voice signal after normalization. Assuming the voice signal is $x = [x_1, x_2, \ldots, x_n]$, normalization rescales it so that all values in x lie in [-1, 1]. The threshold values given here are set on that basis.
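The normalization formula itself is not given here. One common choice that yields values in [-1, 1], shown only as an assumption, is to divide every sample by the largest absolute sample value. The function name `normalize_peak` is hypothetical.

```python
def normalize_peak(samples):
    """Scale samples so that the largest absolute value becomes 1.

    This is an assumed normalization; an all-zero signal is returned
    unchanged to avoid dividing by zero.
    """
    samples = [float(s) for s in samples]
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return samples
    return [s / peak for s in samples]
```

Fixed thresholds such as amp2 = 0.01 and amp1 = 0.1 only make sense relative to a known amplitude range, which is why some normalization of this kind has to come before the comparison steps.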
The procedure examines each frame of the voice signal in turn and sets the value of the state variable status according to the result for that frame, which determines how the next frame is to be judged.
S2. When the energy of the i-th frame amp[i] < the minimum energy threshold amp2, set the state variable status to 0 and the voice-length counter count to 0, indicating that the signal is still in a silent segment, and return to step S1 to examine the next frame.
S3. When the maximum energy threshold amp1 > the energy of the i-th frame amp[i] > the minimum energy threshold amp2, and the zero-crossing rate zcr[i] > the zero-crossing rate threshold zcr, set the state variable status to 1, indicating that the signal may be in a speech segment; add 1 to the voice-length counter count and add 1 to the possible-speech length variable slience1, then return to step S1 to examine the next frame.
S4. If the state status = 1 has already been entered and the energy of the next frame is less than the minimum energy threshold amp2, determine whether slience1 > 6. If so, the current frames are a noise segment: discard the preceding portion, set status = 0, count = 0, and slience1 = 0, and return to step S1 to examine the next frame. If not, the signal may still be in a speech segment: keep status = 1, add 1 to count and to slience1, and return to step S1 to examine the next frame.
S5. When the energy of the i-th frame amp[i] > the maximum energy threshold amp1, set the state variable status to 2, indicating that a speech segment has been entered; add 1 to the voice-length counter count and examine the next frame according to step S6.
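Steps S2 to S5 can be read as one update applied to each frame while status is 0 or 1. The sketch below is one such reading, not a definitive implementation: the names `EndpointState` and `update_before_speech` are hypothetical, comparisons are strict as written in the steps, and leaving the state unchanged for combinations the steps do not cover (for example, energy between the two thresholds with a low zero-crossing rate) is an assumption.

```python
from dataclasses import dataclass

AMP1 = 0.1        # maximum energy threshold (amp1)
AMP2 = 0.01       # minimum energy threshold (amp2)
ZCR = 100         # zero-crossing rate threshold (zcr)
MAX_SLIENCE1 = 6  # limit on slience1 used in step S4


@dataclass
class EndpointState:
    status: int = 0    # 0 = silence, 1 = possible speech, 2 = speech
    count: int = 0     # voice-length counter
    slience1: int = 0  # length of possible speech (spelling follows the text)


def update_before_speech(state, amp, zcr):
    """Apply steps S2-S5 to one frame while status is 0 or 1."""
    if amp > AMP1:                                # S5: speech confirmed
        state.status = 2
        state.count += 1
    elif AMP2 < amp < AMP1 and zcr > ZCR:         # S3: possible speech
        state.status = 1
        state.count += 1
        state.slience1 += 1
    elif amp < AMP2 and state.status == 1:        # S4: weak frame while pending
        if state.slience1 > MAX_SLIENCE1:         # pending too long: treat as noise
            state.status = 0
            state.count = 0
            state.slience1 = 0
        else:                                     # keep the soft onset
            state.count += 1
            state.slience1 += 1
    elif amp < AMP2:                              # S2: still silence
        state.status = 0
        state.count = 0
    # Any other combination is not covered by the steps; state is left as is.
    return state
```

In this reading, a light first syllable followed by a brief dip below amp2 keeps status at 1 and keeps its frames in count, so those frames are still part of the segment once a later frame exceeds amp1.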
S6. Determine whether the energy of the current frame amp[i] > the minimum energy threshold amp2 or the zero-crossing rate of the current frame zcr[i] > the zero-crossing rate threshold zcr.
If so, the signal is still in a speech segment rather than silence: keep status at 2, add 1 to count, and examine the next frame according to step S6.
If not, the signal is moving from the speech segment into a silent segment: add 1 to the silence length slience, which is used afterwards to judge whether the voice signal has ended, and perform step S9.
S9. Determine whether the silence length slience < the maximum silence length maxslience, where maxslience = 10.
If so, the signal may still be in a speech segment: speech has already occurred and the current silent segment has not yet reached the maximum silence length, so the voice signal may not have ended and more signal may follow. Keep status at 2, add 1 to count, and examine the next frame according to step S6.
If not, determine whether the voice-length counter count < the minimum voice-signal length minlen, where minlen = 5. If count < minlen, everything detected so far is noise, because a normal voice signal should be longer than minlen; set status to 0, slience to 0, and count to 0, and continue detection. If count < minlen does not hold, the speech segment has been found and is regarded as a valid voice signal, so set the state variable status to 3 and end the process.
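Steps S6 and S9 can be sketched as a per-frame update applied while status is 2. The names `SpeechState` and `update_in_speech` are hypothetical, and the threshold values are those given in this embodiment. The steps do not say that slience is reset when speech resumes, so the sketch does not reset it; that is a literal reading rather than a confirmed detail.

```python
from dataclasses import dataclass

AMP2 = 0.01      # minimum energy threshold (amp2)
ZCR = 100        # zero-crossing rate threshold (zcr)
MAXSLIENCE = 10  # maximum silence length (maxslience)
MINLEN = 5       # minimum voice-signal length (minlen)


@dataclass
class SpeechState:
    status: int = 2   # 2 = in speech, 0 = back to silence, 3 = finished
    count: int = 0    # voice-length counter
    slience: int = 0  # silence length (spelling follows the text)


def update_in_speech(state, amp, zcr):
    """Apply steps S6 and S9 to one frame while status is 2."""
    if amp > AMP2 or zcr > ZCR:        # S6: still speech
        state.count += 1
        return state
    state.slience += 1                 # S6: frame looks silent
    if state.slience < MAXSLIENCE:     # S9: pause is short, stay in speech
        state.count += 1
    elif state.count < MINLEN:         # S9: segment too short, treat as noise
        state.status = 0
        state.slience = 0
        state.count = 0
    else:                              # S9: valid speech segment found
        state.status = 3
    return state
```

Because count keeps increasing during short pauses, the detected segment includes up to maxslience - 1 trailing silent frames in this reading.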
In this embodiment all range comparisons are expressed as greater than or less than; the equal case is not mentioned and can be grouped with the greater-than case.
The embodiment above is a preferred implementation of the invention, but implementations of the invention are not limited by it. Any other change, modification, substitution, combination, or simplification that does not depart from the spirit and principle of the invention is an equivalent replacement and is included within the scope of protection of the invention.
Claims (10)
1. A double-threshold place name voice endpoint detection method, characterized by comprising the following steps: starting from the first frame, the energy of each frame of the voice signal is compared with a minimum energy threshold and a maximum energy threshold, and its zero-crossing rate is compared with a zero-crossing rate threshold, to determine the appropriate way to examine the next frame; and, when the signal may have entered the speech state, an additional variable is used to retain the lightly pronounced portion that occurs just before the speech segment.
2. The double-threshold place name voice endpoint detection method according to claim 1, characterized in that the specific steps are as follows:
(1) Receive the preprocessed place name voice signal; for each frame, compare the energy with the minimum energy threshold and the maximum energy threshold, and compare the zero-crossing rate with the zero-crossing rate threshold;
(2) When the energy of the i-th frame < the minimum energy threshold, set the state variable to 0 and the voice-length counter to 0, indicating that the signal is still in a silent segment, and return to step (1) to examine the next frame;
When the maximum energy threshold > the energy of the i-th frame > the minimum energy threshold, and the zero-crossing rate > the zero-crossing rate threshold, set the state variable to 1, indicating that the signal may be in a speech segment; add 1 to the voice-length counter and add 1 to the possible-speech length variable, then return to step (1) to examine the next frame;
(3) If the state variable is already 1, screen the frames that may belong to a speech segment against a set criterion to further distinguish noise segments from speech segments;
(4) When the energy of the i-th frame > the maximum energy threshold, set the state variable to 2, indicating that a speech segment has been entered; add 1 to the voice-length counter and examine the next frame according to step (5);
(5) Determine whether the energy of the current frame > the minimum energy threshold or the zero-crossing rate of the current frame > the zero-crossing rate threshold;
If so, the signal is still in a speech segment rather than silence: keep the state variable at 2, add 1 to the voice-length counter, and examine the next frame according to step (5);
If not, the signal is moving from the speech segment into a silent segment: add 1 to the silence length and evaluate the silence length further; once a complete valid voice signal has been found, set the state variable to 3 and end the process.
3. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that, in step (3), if the state variable is already 1 and the energy of the voice signal is less than the minimum energy threshold, it is determined whether the possible-speech length variable is greater than a set threshold; if so, the current frames are a noise segment, the preceding portion is discarded, the state variable, the voice-length counter, and the possible-speech length variable are set to 0, and the method returns to step (1) to examine the next frame; if not, the signal may still be in a speech segment, the state variable is kept at 1, 1 is added to the voice-length counter and to the possible-speech length variable, and the method returns to step (1) to examine the next frame.
4. The double-threshold place name voice endpoint detection method according to claim 3, characterized in that the set threshold is equal to 6.
5. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that the further evaluation of the silence length in step (5) is: determine whether the silence length < the maximum silence length;
If so, keep the state variable at 2, add 1 to the voice-length counter, and examine the next frame according to step (5);
If not, determine whether the voice-length counter < the minimum voice-signal length; if the voice-length counter < the minimum voice-signal length, everything detected so far is noise, and the state variable is set to 0, the silence length is set to 0, the voice-length counter is set to 0, and detection continues; if the voice-length counter < the minimum voice-signal length does not hold, the speech segment has been found and is regarded as a valid voice signal, the state variable is set to 3, and the process ends.
6. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that, in the initial state, the state variable is 0, the voice-length counter is 0, the variable that records the length of possible speech while entry into a speech segment is not yet confirmed is 0, and the silence length is 0.
7. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that the minimum energy threshold is 0.01, the maximum energy threshold is 0.1, and the zero-crossing rate threshold is 100.
8. The double-threshold place name voice endpoint detection method according to claim 5, characterized in that the maximum silence length is equal to 10 and the minimum voice-signal length is equal to 5.
9. The double-threshold place name voice endpoint detection method according to claim 2, characterized in that the preprocessing includes pre-emphasis and framing.
10. The double-threshold place name voice endpoint detection method according to claim 9, characterized in that pre-emphasis is implemented by a digital filter that boosts high frequencies at 6 dB/octave, the high-pass filter satisfying $H(z) = 1 - \mu z^{-1}$ with $\mu = 0.97$; and the voice signal is divided into frames with a frame length of 256 and a frame shift of 128.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611135819.4A CN106847270B (en) | 2016-12-09 | 2016-12-09 | Double-threshold place name voice endpoint detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106847270A | 2017-06-13 |
CN106847270B | 2020-08-18 |
Family
ID=59139133
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611135819.4A (Active, CN106847270B) | 2016-12-09 | 2016-12-09 | Double-threshold place name voice endpoint detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106847270B (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107452399A (en) * | 2017-09-18 | 2017-12-08 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio feature extraction methods and device |
CN108332843A (en) * | 2018-01-29 | 2018-07-27 | 国家电网公司 | A kind of noise diagnostics method of electrical equipment malfunction electric arc |
CN108847218A (en) * | 2018-06-27 | 2018-11-20 | 郑州云海信息技术有限公司 | A kind of adaptive threshold adjusting sound end detecting method, equipment and readable storage medium storing program for executing |
CN109346095A (en) * | 2018-10-10 | 2019-02-15 | 广州市讯飞樽鸿信息技术有限公司 | A kind of heart sound end-point detecting method |
CN109949806A (en) * | 2019-03-12 | 2019-06-28 | 百度国际科技(深圳)有限公司 | Information interacting method and device |
CN110634473A (en) * | 2019-09-20 | 2019-12-31 | 广州大学 | Voice digital recognition method based on MFCC |
CN112511698A (en) * | 2020-12-03 | 2021-03-16 | 普强时代(珠海横琴)信息技术有限公司 | Real-time call analysis method based on universal boundary detection |
CN114283840A (en) * | 2021-12-22 | 2022-04-05 | 天翼爱音乐文化科技有限公司 | Instruction audio generation method, system, device and storage medium |
WO2023092399A1 (en) * | 2021-11-25 | 2023-06-01 | 华为技术有限公司 | Speech recognition method, speech recognition apparatus, and system |
CN116386676A (en) * | 2023-06-02 | 2023-07-04 | 北京探境科技有限公司 | Voice awakening method, voice awakening device and storage medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1426048A (en) * | 2001-12-13 | 2003-06-25 | 中国科学院自动化研究所 | End detection method based on entropy |
US20070198251A1 (en) * | 2006-02-07 | 2007-08-23 | Jaber Associates, L.L.C. | Voice activity detection method and apparatus for voiced/unvoiced decision and pitch estimation in a noisy speech feature extraction |
TW200811833A (en) * | 2006-08-24 | 2008-03-01 | Inventec Besta Co Ltd | Detection method for voice activity endpoint |
CN102254558A (en) * | 2011-07-01 | 2011-11-23 | 重庆邮电大学 | Control method of intelligent wheel chair voice recognition based on end point detection |
CN103366739A (en) * | 2012-03-28 | 2013-10-23 | 郑州市科学技术情报研究所 | Self-adaptive endpoint detection method and self-adaptive endpoint detection system for isolate word speech recognition |
CN104157290A (en) * | 2014-08-19 | 2014-11-19 | 大连理工大学 | Speaker recognition method based on depth learning |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107452399A (en) * | 2017-09-18 | 2017-12-08 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio feature extraction method and device |
CN108332843A (en) * | 2018-01-29 | 2018-07-27 | 国家电网公司 | Noise diagnosis method for electrical equipment fault arcs |
CN108847218B (en) * | 2018-06-27 | 2020-07-21 | 苏州浪潮智能科技有限公司 | Self-adaptive threshold setting voice endpoint detection method, equipment and readable storage medium |
CN108847218A (en) * | 2018-06-27 | 2018-11-20 | 郑州云海信息技术有限公司 | Adaptive threshold adjusting voice endpoint detection method, equipment and readable storage medium |
CN109346095A (en) * | 2018-10-10 | 2019-02-15 | 广州市讯飞樽鸿信息技术有限公司 | Heart sound endpoint detection method |
CN109346095B (en) * | 2018-10-10 | 2023-07-07 | 广州九路科技有限公司 | Heart sound endpoint detection method |
CN109949806A (en) * | 2019-03-12 | 2019-06-28 | 百度国际科技(深圳)有限公司 | Information interaction method and device |
CN110634473A (en) * | 2019-09-20 | 2019-12-31 | 广州大学 | Voice digital recognition method based on MFCC |
CN112511698A (en) * | 2020-12-03 | 2021-03-16 | 普强时代(珠海横琴)信息技术有限公司 | Real-time call analysis method based on universal boundary detection |
WO2023092399A1 (en) * | 2021-11-25 | 2023-06-01 | 华为技术有限公司 | Speech recognition method, speech recognition apparatus, and system |
CN114283840A (en) * | 2021-12-22 | 2022-04-05 | 天翼爱音乐文化科技有限公司 | Instruction audio generation method, system, device and storage medium |
CN116386676A (en) * | 2023-06-02 | 2023-07-04 | 北京探境科技有限公司 | Voice awakening method, voice awakening device and storage medium |
CN116386676B (en) * | 2023-06-02 | 2023-08-29 | 北京探境科技有限公司 | Voice awakening method, voice awakening device and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN106847270B (en) | 2020-08-18 |
Similar Documents
Publication | Title |
---|---|
CN106847270A (en) | A kind of double threshold place name sound end detecting method |
WO2021128576A1 (en) | Tool condition monitoring dataset enhancement method based on generative adversarial network |
CN105611477B (en) | Voice enhancement algorithm combining depth and breadth neural networks in digital hearing aid |
EP2096629B1 (en) | Method and apparatus for classifying sound signals |
CN101197130B (en) | Sound activity detecting method and detector thereof |
CN107835496B (en) | Spam short message identification method and device and server |
CN106503805A (en) | Machine learning based bimodal human-human dialogue sentiment analysis system and method |
CN103366739B (en) | Self-adaptive endpoint detection method and system for isolated-word speech recognition |
CN107403198A (en) | Official website recognition method based on cascade classifier |
CN107492382A (en) | Voiceprint extraction method and device based on neural network |
CN111161207B (en) | Integrated convolutional neural network fabric defect classification method |
CN106601230B (en) | Logistics sorting place name voice recognition method and system based on continuous Gaussian mixture HMM model and logistics sorting system |
CN106611604A (en) | An automatic voice summation tone detection method based on a deep neural network |
CN101149921A (en) | Mute test method and device |
CN110232415B (en) | Train bogie fault identification method based on biological information characteristics |
CN112992191B (en) | Voice endpoint detection method and device, electronic equipment and readable storage medium |
CN111564163A (en) | RNN-based voice detection method for various counterfeit operations |
CN107145778A (en) | Intrusion detection method and device |
CN109920445A (en) | Sound mixing method, device and equipment |
CN111833310B (en) | Surface defect classification method based on neural network architecture search |
CN111931601A (en) | System and method for correcting error class label of gear box |
KR101140896B1 (en) | Method and apparatus for speech segmentation |
CN101256772A (en) | Method and device for determining attribution class of non-noise audio signal |
CN106531195A (en) | Dialogue conflict detection method and device |
WO2019015226A1 (en) | Method for rapidly identifying wind speed distribution pattern |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |