CN106340310A - Speech detection method and device - Google Patents
- Publication number
- CN106340310A (application CN201510401974.5A)
- Authority
- CN
- China
- Prior art keywords
- acoustic image
- speech detection
- image rule
- present frame
- characteristic vector
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Image Analysis (AREA)
Abstract
The invention provides a speech detection method and device. The speech detection method comprises: framing the sound data corresponding to an input sound signal to obtain a number of sound frames; calculating the feature vector of the current frame, wherein the feature vector comprises a wide-window energy difference, a narrow-window energy difference, and a zero-crossing difference; matching the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, wherein the fuzzy acoustic-image rules are obtained by training on sound training samples; and, if the calculated speech detection score is greater than a score threshold, performing detection on the sound data corresponding to the current frame. The scheme improves the speed of speech detection and reduces its cost.
Description
Technical field
The present invention relates to the technical field of speech detection, and more particularly to a speech detection method and device.
Background technology
A mobile terminal is a computer device used while mobile. With the rapid development of integrated-circuit technology, mobile terminals have acquired powerful processing capability and have evolved from simple communication tools into integrated information-processing platforms, which opens broader development space for them.

Traditional mobile terminals usually require manual operation by the user, so the user must devote a certain amount of attention to them. By using a speech detection method together with an always-listening system, a mobile terminal can be activated and operated hands-free. When the always-listening system detects a sound signal, the speech detection system is activated and performs detection on the detected sound signal. The mobile terminal then executes the operation corresponding to the detected sound signal. For example, when the user speaks "dial the mobile phone of xx", the mobile terminal detects the voice information "dial the mobile phone of xx" and, after correct detection, obtains xx's phone number from the mobile terminal and dials it.

However, speech detection methods in the prior art usually use complex mathematical models to detect the input sound signal, and therefore suffer from slow detection speed and high cost.
Content of the invention
The problem solved by the embodiments of the present invention is how to improve the speed of speech detection and reduce its cost.
To solve the above problem, an embodiment of the present invention provides a speech detection method, comprising:

framing the sound data corresponding to an input sound signal to obtain a plurality of sound frames;

calculating the feature vector of the current frame, the feature vector comprising a wide-window energy difference, a narrow-window energy difference, and a zero-crossing difference;

matching the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; and

when the calculated speech detection score is greater than a score threshold, performing detection on the sound data corresponding to the current frame.

Optionally, the fuzzy acoustic-image rules are obtained by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.

Optionally, the fuzzy acoustic-image rules include first-class fuzzy acoustic-image rules and second-class fuzzy acoustic-image rules, and matching the feature vector of the current frame against the preset fuzzy acoustic-image rules to obtain the corresponding speech detection score comprises:

when the feature vector of the current frame matches a first-class fuzzy acoustic-image rule among the preset fuzzy acoustic-image rules, obtaining a speech detection score of 0 for the current frame;

when the feature vector of the current frame matches a second-class fuzzy acoustic-image rule among the preset fuzzy acoustic-image rules, obtaining a speech detection score of 1 for the current frame.

Optionally, the score threshold is the signal-to-noise ratio of the sound data of the current frame.
An embodiment of the present invention further provides a speech detection device, comprising:

a framing unit, adapted to frame the sound data corresponding to an input sound signal to obtain a plurality of sound frames;

a computing unit, adapted to calculate the feature vector of the current frame, the feature vector comprising a wide-window energy difference, a narrow-window energy difference, and a zero-crossing difference;

a matching unit, adapted to match the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; and

a detection unit, adapted to perform detection on the sound data corresponding to the current frame when the calculated speech detection score is greater than a score threshold.

Optionally, the fuzzy acoustic-image rules are obtained by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.

Optionally, the fuzzy acoustic-image rules include first-class fuzzy acoustic-image rules and second-class fuzzy acoustic-image rules; the matching unit obtains a speech detection score of 0 for the current frame when determining that the feature vector of the current frame matches a first-class fuzzy acoustic-image rule, and obtains a speech detection score of 1 for the current frame when determining that the feature vector matches a second-class fuzzy acoustic-image rule.

Optionally, the score threshold is the signal-to-noise ratio of the sound data of the current frame.
Compared with the prior art, the technical scheme of the present invention has the following advantages:

In the above scheme, the speech detection score of the feature vector of each sound frame is computed from the preset fuzzy acoustic-image rules in order to decide whether to perform detection on the input sound signal. Because the fuzzy acoustic-image rules are used only to detect whether the current frame contains voice information, without regard to the specific content of any speech data it contains, the speed of speech detection is improved and its cost is reduced.

Further, the computed speech detection score of the current frame is compared with the signal-to-noise ratio of the current frame, and the current frame is determined to contain speech data when its speech detection score is greater than its signal-to-noise ratio. Because the signal-to-noise ratio of the current frame accurately reflects the background noise contained in the frame, the accuracy of speech detection is improved and the user experience is enhanced.
Brief description
Fig. 1 is a flow chart of a speech detection method in an embodiment of the present invention;

Fig. 2 is a flow chart of another speech detection method in an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a speech detection device in an embodiment of the present invention.
Specific embodiment
Always-listening systems in the prior art use voice activity detection (VAD) technology to detect sound. However, existing voice activity detection methods usually need to train a mathematical model for speech detection and use it to detect the input sound data. Because such mathematical models are relatively complex, the speech detection process is complicated, so detection is slow and costly.

To solve these problems in the prior art, the technical scheme adopted by the embodiments of the present invention computes, from preset fuzzy acoustic-image rules, the speech detection score of the feature vector of each sound frame in order to decide whether to perform detection on the input sound signal, which improves the speed of speech detection and reduces its cost.

To make the above objects, features, and advantages of the present invention clearer and easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of a speech detection method in an embodiment of the present invention. The speech detection method shown in Fig. 1 may include:

Step S101: framing the sound data corresponding to the input sound signal to obtain a plurality of sound frames.

In a specific implementation, a microphone (mic) may be used to collect an external sound signal. When a sound signal is collected, it is processed to obtain the corresponding sound data, and the sound data is divided into two or more frames.
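For illustration only, this framing step can be sketched in Python as follows; the frame length and hop size are assumed values, since the patent does not fix them:

```python
import numpy as np

def split_frames(samples: np.ndarray, frame_len: int = 256, hop: int = 128):
    """Split a 1-D array of sound samples into overlapping frames.

    frame_len and hop are illustrative defaults, not values from the patent.
    """
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]
```

For 16 kHz input, a 256-sample frame corresponds to 16 ms of audio, a common choice in frame-based speech processing.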
Step S102: calculating the feature vector of the current frame, the feature vector including the wide-window energy difference, the narrow-window energy difference, and the zero-crossing difference.

In a specific implementation, once the sound data has been framed into two or more frames, the feature vector of each frame is calculated frame by frame in time order, and whether to perform speech detection on a frame is decided according to its feature vector. For ease of description, the frame whose feature vector is currently being calculated is referred to as the current frame.

Step S103: matching the feature vector of the current frame against the preset fuzzy acoustic-image rules to obtain the corresponding speech detection score.

In a specific implementation, the fuzzy acoustic-image rules are obtained by training on sound training samples and comprise a plurality of acoustic-image rules, that is, a set of acoustic-image rules. Each acoustic-image rule carries a corresponding decision score. When the feature vector of the current frame matches any acoustic-image rule in the set, the decision score of the matching rule is taken as the speech detection score of the current frame.

Step S104: when the calculated speech detection score is greater than a score threshold, performing detection on the sound data corresponding to the current frame.

In a specific implementation, the score threshold may be fixed, or may vary from frame to frame; those skilled in the art can set it according to actual needs.
The speech detection method in the embodiment of the present invention is described in further detail below with reference to Fig. 2.

Step S201: framing the sound data corresponding to the input sound signal to obtain a plurality of sound frames.

In a specific implementation, a microphone (mic) may be used to collect an external sound signal. When a sound signal is collected, it is processed to obtain the corresponding sound data, and the sound data is divided into two or more frames.
Step S202: calculating the feature vector of the current frame.

In a specific implementation, the feature vector of the current frame includes the wide-window energy difference, the narrow-window energy difference, and the zero-crossing difference, which can be calculated respectively by the following equations:

δEw = Ew − Ewa (1)

δEn = En − Ena (2)

δZ = Z − Za (3)

where δEw denotes the wide-window energy difference, Ew the wide-window energy of the current frame, and Ewa the long-term mean of the wide-window energy; δEn denotes the narrow-window energy difference, En the narrow-window energy of the current frame, and Ena the long-term mean of the narrow-window energy; and δZ denotes the zero-crossing difference, Z the zero-crossing count of the current frame, and Za the long-term mean of the zero-crossing count.

Because the wide-window energy difference δEw, the narrow-window energy difference δEn, and the zero-crossing difference δZ are all computed in the time domain rather than the frequency domain, computing resources are saved and the speed of speech detection is improved.

It should be pointed out that, in order to reflect changes in the background noise, the long-term means Ewa, Ena, and Za are updated only when noise is present.
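As a concrete sketch (not part of the patent), the feature vector of equations (1) to (3) can be computed as below; the window lengths and the exponential smoothing factor used for the long-term means are assumptions made for illustration:

```python
import numpy as np

WIDE, NARROW = 2048, 256   # assumed window lengths in samples
ALPHA = 0.99               # assumed smoothing factor for the long-term means

class FeatureExtractor:
    """Sketch of the per-frame feature vector (dEw, dEn, dZ) of Eqs. (1)-(3)."""

    def __init__(self):
        # Long-term means Ewa, Ena, Za, updated only for noise frames.
        self.ewa = self.ena = self.za = 0.0

    def features(self, history: np.ndarray, frame: np.ndarray):
        """Return (dEw, dEn, dZ) for the current frame, given past samples."""
        buf = np.concatenate([history, frame])
        ew = float(np.sum(buf[-WIDE:] ** 2))    # wide-window energy Ew
        en = float(np.sum(buf[-NARROW:] ** 2))  # narrow-window energy En
        # Zero-crossing count Z of the current frame.
        z = int(np.sum(np.abs(np.diff(np.signbit(frame).astype(int)))))
        return ew - self.ewa, en - self.ena, z - self.za

    def update_means(self, ew: float, en: float, z: float):
        """Update Ewa, Ena, Za; call only for frames judged to be noise."""
        self.ewa = ALPHA * self.ewa + (1 - ALPHA) * ew
        self.ena = ALPHA * self.ena + (1 - ALPHA) * en
        self.za = ALPHA * self.za + (1 - ALPHA) * z
```

All three features stay in the time domain, consistent with the note above; no FFT is required.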
Step S203: matching the feature vector of the current frame against the preset fuzzy acoustic-image rules, and judging whether it matches a first-class fuzzy acoustic-image rule or a second-class fuzzy acoustic-image rule. When the feature vector of the current frame matches a first-class fuzzy rule, Step S204 is executed; when it matches a second-class fuzzy rule, Step S205 is executed.

In a specific implementation, grammatical expressions in human language define empirical rules, through which people can describe problems using their own experience. The fuzzy acoustic-image rules preset in the embodiment of the present invention are exactly a set of acoustic-image rules obtained manually according to heuristic criteria derived from knowledge of the problem.
In a specific implementation, the fuzzy acoustic-image rules are obtained by training on the sound training samples using a combination of a genetic algorithm and a neural algorithm. The genetic algorithm can find the global optimum of a fitness function of several variables; it is flexible, insensitive to local-optimum problems, and therefore robust. The neural algorithm reduces the execution time and the error rate of the genetic algorithm. Moreover, training on the sound samples with the genetic and neural algorithms requires relatively little computation, which saves computing resources.

Training on the sound samples with the genetic algorithm and the neural algorithm yields a series of acoustic-image sequences associated with the variables of the feature vector (the wide-window energy difference δEw, the narrow-window energy difference δEn, and the zero-crossing difference δZ). A decision score is then added manually to each output sequence to obtain the final fuzzy acoustic-image rules: when the sound training sample corresponding to an acoustic-image sequence contains speech, the decision score added to the sequence is 1, yielding a second-class fuzzy acoustic-image rule; otherwise, the decision score added is 0, yielding a first-class fuzzy acoustic-image rule. Table 1 shows an example of the fuzzy acoustic-image rules in an embodiment of the present invention:
Table 1
In Table 1, each component of the feature vector is denoted by a symbol: one symbol indicates that the value of the component is relatively large, a second that it is medium, and △ that it is relatively small.

In a specific implementation, when a component of the feature vector computed for the current frame falls within one of these intervals, it is labeled with the corresponding symbol, and the symbols together form the acoustic-image sequence of the frame. The acoustic-image sequence of the current frame is then compared against the preset fuzzy acoustic-image rules.
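A minimal sketch of this matching step, in which the symbols, the quantization thresholds, and the rule table are hypothetical stand-ins for the patent's Table 1:

```python
def quantize(value: float, low: float, high: float) -> str:
    """Map a feature-vector component to a symbol: 'L' large, 'M' medium,
    'S' small. The thresholds low/high are assumptions for illustration."""
    if value >= high:
        return "L"
    if value >= low:
        return "M"
    return "S"

# Hypothetical rule table: acoustic-image sequence -> decision score.
# Score 1 marks second-class (speech) rules, score 0 first-class rules.
FUZZY_RULES = {
    ("L", "L", "M"): 1,
    ("L", "M", "L"): 1,
    ("S", "S", "S"): 0,
    ("M", "S", "S"): 0,
}

def speech_score(d_ew: float, d_en: float, d_z: float,
                 low: float = 10.0, high: float = 100.0):
    """Quantize (dEw, dEn, dZ) into an acoustic-image sequence and look it
    up in the rule table; returns None when no rule matches."""
    seq = (quantize(d_ew, low, high),
           quantize(d_en, low, high),
           quantize(d_z, low, high))
    return FUZZY_RULES.get(seq)
```

The real rule table would be produced by the genetic/neural training described above; the dictionary here merely shows the lookup mechanics.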
Step S204: when the feature vector of the current frame matches a first-class fuzzy acoustic-image rule among the preset fuzzy acoustic-image rules, outputting a speech detection score of 0 for the current frame.

In a specific implementation, as shown in Table 1, the decision score in a first-class fuzzy acoustic-image rule is 0; therefore, when the feature vector of the current frame is determined to match a first-class fuzzy acoustic-image rule, the speech detection score output for the current frame is 0.

Step S205: when the feature vector of the current frame matches a second-class fuzzy acoustic-image rule among the preset fuzzy acoustic-image rules, outputting a speech detection score of 1 for the current frame.

In a specific implementation, as shown in Table 1, the decision score in a second-class fuzzy acoustic-image rule is 1; therefore, when the feature vector of the current frame is determined to match a second-class fuzzy acoustic-image rule, the speech detection score output for the current frame is 1.
Step S206: comparing the speech detection score of the current frame with the signal-to-noise ratio of the current frame, and judging whether the speech detection score of the current frame is greater than the signal-to-noise ratio of the current frame. If so, Step S207 may be executed; otherwise, no operation is performed.

In a specific implementation, the signal-to-noise ratio (SNR) of each frame reflects the ratio of the speech signal to the noise, and satisfies the following conditions:

(1) it is evenly distributed over all possible numerical intervals;

(2) it varies slowly between adjacent frames;

(3) different optimal thresholds take different values of it;

(4) its magnitude can be minimized according to the signal-to-noise ratio.

Therefore, given that the signal-to-noise ratio satisfies the above conditions, using the signal-to-noise ratio of the current frame as the score threshold and comparing it against the speech detection score of the current frame to decide whether to perform speech detection on the frame improves the accuracy of speech detection.
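The comparison of Step S206 reduces to a single predicate; here it is sketched under the assumption that the frame's signal-to-noise ratio has been normalized into the same 0 to 1 range as the detection score, which the patent does not specify:

```python
def should_detect(speech_score: float, frame_snr: float) -> bool:
    """Run speech detection on the frame only when its detection score
    exceeds its signal-to-noise ratio, used here as the score threshold."""
    return speech_score > frame_snr
```

With the binary scores of Table 1, a frame matching a second-class (speech) rule passes the test whenever its normalized SNR is below 1, while a frame scored 0 never does.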
Step S207: performing speech detection on the current frame.
The speech detection device corresponding to the speech detection method in the embodiment of the present invention is described in further detail below with reference to Fig. 3.

Fig. 3 shows a schematic structural diagram of a speech detection device in an embodiment of the present invention. The speech detection device 300 shown in Fig. 3 may include a framing unit 301, a computing unit 302, a matching unit 303, and a detection unit 304, wherein:

the framing unit 301 is adapted to frame the sound data corresponding to the input sound signal to obtain a plurality of sound frames;

the computing unit 302 is adapted to calculate the feature vector of the current frame, the feature vector including the wide-window energy difference, the narrow-window energy difference, and the zero-crossing difference;

the matching unit 303 is adapted to match the feature vector of the current frame against the preset fuzzy acoustic-image rules to obtain the corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; specifically, the rules are obtained by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.

In a specific implementation, the fuzzy acoustic-image rules include first-class fuzzy acoustic-image rules and second-class fuzzy acoustic-image rules. When the matching unit 303 determines that the feature vector of the current frame matches a first-class fuzzy acoustic-image rule among the preset fuzzy acoustic-image rules, the speech detection score obtained for the current frame is 0; when it determines that the feature vector matches a second-class fuzzy acoustic-image rule, the score obtained is 1.

The detection unit 304 is adapted to perform detection on the sound data corresponding to the current frame when the calculated speech detection score is greater than the score threshold. To improve the accuracy of speech detection, the score threshold is the signal-to-noise ratio of the sound data of the current frame.
Those of ordinary skill in the art will appreciate that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium, which may include a ROM, a RAM, a magnetic disk, an optical disc, or the like.

The method and system of the embodiments of the present invention have been described in detail above, but the present invention is not limited thereto. Any person skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention; the protection scope of the present invention should therefore be defined by the claims.
Claims (8)
1. A speech detection method, characterized by comprising:
framing the sound data corresponding to an input sound signal to obtain a plurality of sound frames;
calculating the feature vector of the current frame, the feature vector comprising a wide-window energy difference, a narrow-window energy difference, and a zero-crossing difference;
matching the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; and
performing detection on the sound data corresponding to the current frame when the calculated speech detection score is greater than a score threshold.
2. The speech detection method according to claim 1, characterized in that the fuzzy acoustic-image rules are obtained by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.
3. The speech detection method according to claim 1, characterized in that the fuzzy acoustic-image rules include first-class fuzzy acoustic-image rules and second-class fuzzy acoustic-image rules, and matching the feature vector of the current frame against the preset fuzzy acoustic-image rules to obtain the corresponding speech detection score comprises:
when the feature vector of the current frame matches a first-class fuzzy acoustic-image rule among the preset fuzzy acoustic-image rules, obtaining a speech detection score of 0 for the current frame; and
when the feature vector of the current frame matches a second-class fuzzy acoustic-image rule among the preset fuzzy acoustic-image rules, obtaining a speech detection score of 1 for the current frame.
4. The speech detection method according to claim 1, characterized in that the score threshold is the signal-to-noise ratio of the sound data of the current frame.
5. A speech detection device, characterized by comprising:
a framing unit, adapted to frame the sound data corresponding to an input sound signal to obtain a plurality of sound frames;
a computing unit, adapted to calculate the feature vector of the current frame, the feature vector comprising a wide-window energy difference, a narrow-window energy difference, and a zero-crossing difference;
a matching unit, adapted to match the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; and
a detection unit, adapted to perform detection on the sound data corresponding to the current frame when the calculated speech detection score is greater than a score threshold.
6. The speech detection device according to claim 5, characterized in that the fuzzy acoustic-image rules are obtained by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.
7. The speech detection device according to claim 5, characterized in that the fuzzy acoustic-image rules include first-class fuzzy acoustic-image rules and second-class fuzzy acoustic-image rules, and the matching unit obtains a speech detection score of 0 for the current frame when determining that the feature vector of the current frame matches a first-class fuzzy acoustic-image rule among the preset fuzzy acoustic-image rules, and obtains a speech detection score of 1 for the current frame when determining that the feature vector matches a second-class fuzzy acoustic-image rule among the preset fuzzy acoustic-image rules.
8. The speech detection device according to claim 5, characterized in that the score threshold is the signal-to-noise ratio of the sound data of the current frame.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510401974.5A CN106340310B (en) | 2015-07-09 | 2015-07-09 | Speech detection method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106340310A true CN106340310A (en) | 2017-01-18 |
CN106340310B CN106340310B (en) | 2019-06-07 |
Family
ID=57827293
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510401974.5A Active CN106340310B (en) | 2015-07-09 | 2015-07-09 | Speech detection method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106340310B (en) |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108447501A (en) * | 2018-03-27 | 2018-08-24 | 中南大学 | Pirate video detection method and system based on audio word under a kind of cloud storage environment |
CN108648769A (en) * | 2018-04-20 | 2018-10-12 | 百度在线网络技术(北京)有限公司 | Voice activity detection method, apparatus and equipment |
CN111862985A (en) * | 2019-05-17 | 2020-10-30 | 北京嘀嘀无限科技发展有限公司 | Voice recognition device, method, electronic equipment and storage medium |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040181403A1 (en) * | 2003-03-14 | 2004-09-16 | Chien-Hua Hsu | Coding apparatus and method thereof for detecting audio signal transient |
CN1763844A (en) * | 2004-10-18 | 2006-04-26 | 中国科学院声学研究所 | End-point detecting method, device and speech recognition system based on moving window |
WO2006114101A1 (en) * | 2005-04-26 | 2006-11-02 | Aalborg Universitet | Detection of speech present in a noisy signal and speech enhancement making use thereof |
US20070055504A1 (en) * | 2002-10-29 | 2007-03-08 | Chu Wai C | Optimized windows and interpolation factors, and methods for optimizing windows, interpolation factors and linear prediction analysis in the ITU-T G.729 speech coding standard |
US20100268533A1 (en) * | 2009-04-17 | 2010-10-21 | Samsung Electronics Co., Ltd. | Apparatus and method for detecting speech |
CN101937675A (en) * | 2009-06-29 | 2011-01-05 | 展讯通信(上海)有限公司 | Voice detection method and equipment thereof |
CN102231277A (en) * | 2011-06-29 | 2011-11-02 | 电子科技大学 | Method for protecting mobile terminal privacy based on voiceprint recognition |
CN102405495A (en) * | 2009-03-11 | 2012-04-04 | 谷歌公司 | Audio classification for information retrieval using sparse features |
Also Published As
Publication number | Publication date |
---|---|
CN106340310B (en) | 2019-06-07 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |