CN106340310A - Speech detection method and device - Google Patents

Speech detection method and device

Info

Publication number
CN106340310A
CN106340310A
Authority
CN
China
Prior art keywords
acoustic image
speech detection
image rule
present frame
characteristic vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510401974.5A
Other languages
Chinese (zh)
Other versions
CN106340310B (en)
Inventor
孙廷玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Spreadtrum Communications Shanghai Co Ltd filed Critical Spreadtrum Communications Shanghai Co Ltd
Priority to CN201510401974.5A priority Critical patent/CN106340310B/en
Publication of CN106340310A publication Critical patent/CN106340310A/en
Application granted granted Critical
Publication of CN106340310B publication Critical patent/CN106340310B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Landscapes

  • Image Analysis (AREA)

Abstract

The invention provides a speech detection method and device. The speech detection method includes: framing the sound data corresponding to an input sound signal to obtain a plurality of sound frames; calculating a feature vector of the current frame, the feature vector including a wide-window energy difference, a narrow-window energy difference and a zero-crossing difference; matching the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; and, if the calculated speech detection score is greater than a score threshold, detecting the sound data corresponding to the current frame. The scheme improves the speed of speech detection and reduces the cost of speech detection.

Description

Speech detection method and device
Technical field
The present invention relates to the technical field of speech detection, and in particular to a speech detection method and device.
Background
A mobile terminal is a computer device used while on the move. With the rapid development of integrated circuit technology, mobile terminals have acquired powerful processing capability and have evolved from simple calling tools into integrated information processing platforms, which opens up broader development space for them.
A traditional mobile terminal usually requires manual operation by the user, so the user has to devote a certain amount of attention to it. Speech detection methods and always-listening systems make it possible to activate and operate a mobile terminal hands-free. When the always-listening system detects a sound signal, the speech detection system is activated and the detected sound signal is analysed; the mobile terminal then performs the operation corresponding to the detected sound signal. For example, when the user speaks "dial xx's mobile phone", the mobile terminal detects the voice information "dial xx's mobile phone" and, after correct detection, obtains xx's phone number from the mobile terminal and dials it.
However, speech detection methods in the prior art generally use complex mathematical models to detect the input sound signal, and therefore suffer from slow detection speed and high cost.
Summary of the invention
The problem solved by the embodiments of the present invention is how to improve the speed of speech detection and reduce the cost of speech detection.
To solve the above problem, an embodiment of the present invention provides a speech detection method, the speech detection method comprising:
framing the sound data corresponding to an input sound signal to obtain a plurality of sound frames;
calculating a feature vector of the current frame, the feature vector including a wide-window energy difference, a narrow-window energy difference and a zero-crossing difference;
matching the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; and
when the calculated speech detection score is greater than a score threshold, detecting the sound data corresponding to the current frame.
Optionally, the fuzzy acoustic-image rules are obtained by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.
Optionally, the fuzzy acoustic-image rules include first-type fuzzy acoustic-image rules and second-type fuzzy acoustic-image rules, and matching the feature vector of the current frame against the preset fuzzy acoustic-image rules to obtain the corresponding speech detection score comprises:
when it is determined that the feature vector of the current frame matches a first-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, the speech detection score obtained for the current frame is 0;
when it is determined that the feature vector of the current frame matches a second-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, the speech detection score obtained for the current frame is 1.
Optionally, the score threshold is the signal-to-noise ratio of the sound data of the current frame.
An embodiment of the present invention further provides a speech detection device, the device comprising:
a framing unit, adapted to frame the sound data corresponding to an input sound signal to obtain a plurality of sound frames;
a calculation unit, adapted to calculate a feature vector of the current frame, the feature vector including a wide-window energy difference, a narrow-window energy difference and a zero-crossing difference;
a matching unit, adapted to match the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; and
a detection unit, adapted to detect the sound data corresponding to the current frame when the calculated speech detection score is greater than a score threshold.
Optionally, the fuzzy acoustic-image rules are obtained by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.
Optionally, the fuzzy acoustic-image rules include first-type fuzzy acoustic-image rules and second-type fuzzy acoustic-image rules; the matching unit obtains a speech detection score of 0 for the current frame when it determines that the feature vector of the current frame matches a first-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, and obtains a speech detection score of 1 for the current frame when it determines that the feature vector of the current frame matches a second-type fuzzy acoustic-image rule.
Optionally, the score threshold is the signal-to-noise ratio of the sound data of the current frame.
Compared with the prior art, the technical solution of the present invention has the following advantages:
In the above scheme, the speech detection score of the feature vector of each sound frame is calculated through the preset fuzzy acoustic-image rules to determine whether to detect the input sound signal. Because the fuzzy acoustic-image rules are only used to detect whether the current frame contains speech information, without caring about the specific content of the speech data in the current frame, the speed of speech detection can be improved and the cost of speech detection reduced.
Further, the calculated speech detection score of the current frame is compared with the signal-to-noise ratio of the current frame, and when the speech detection score of the current frame is greater than the signal-to-noise ratio of the current frame, it is determined that the current frame contains speech data. Because the signal-to-noise ratio of the current frame accurately reflects the background noise contained in the current frame, the accuracy of speech detection can be improved and the user experience enhanced.
Brief description of the drawings
Fig. 1 is a flow chart of a speech detection method in an embodiment of the present invention;
Fig. 2 is a flow chart of another speech detection method in an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a speech detection device in an embodiment of the present invention.
Detailed description
Always-listening systems in the prior art use voice activity detection (VAD) technology to detect sound. Existing voice activity detection methods, however, usually need to train a mathematical model for speech detection and use it to detect the input sound data. Because such mathematical models are relatively complex, the speech detection process is complicated, and detection is therefore slow and costly.
To solve the above problems in the prior art, the technical solution adopted in the embodiments of the present invention calculates the speech detection score of the feature vector of each sound frame through preset fuzzy acoustic-image rules to determine whether to detect the input sound signal, which can improve the speed of speech detection and reduce the cost of speech detection.
To make the above objects, features and advantages of the present invention easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of a speech detection method in an embodiment of the present invention. The speech detection method shown in Fig. 1 may include:
Step S101: framing the sound data corresponding to the input sound signal to obtain a plurality of sound frames.
In a specific implementation, a microphone (mic) may be used to collect an external sound signal. When a sound signal is collected, it is processed to obtain the corresponding sound data, and the obtained sound data is divided into two or more frames.
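A minimal sketch of this framing step is given below, assuming mono PCM input. The frame length (20 ms) and the choice of non-overlapping frames are illustrative assumptions, since the patent only requires that the sound data be divided into two or more frames.

import numpy as np

def frame_signal(samples, sample_rate=16000, frame_ms=20):
    """Split a 1-D array of PCM samples into consecutive, non-overlapping frames.

    The frame length (20 ms) and the absence of overlap are illustrative
    assumptions; the patent only requires dividing the sound data into
    two or more frames.
    """
    frame_len = int(sample_rate * frame_ms / 1000)
    n_frames = len(samples) // frame_len
    # Drop the trailing samples that do not fill a complete frame.
    return np.reshape(samples[:n_frames * frame_len], (n_frames, frame_len))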
Step S102: calculating the feature vector of the current frame, the feature vector including the wide-window energy difference, the narrow-window energy difference and the zero-crossing difference.
In a specific implementation, after the sound data has been framed into two or more frames, the feature vector of each frame is calculated frame by frame in time order, and whether speech detection is performed on a frame is determined according to its feature vector. For ease of description, the frame whose feature vector is currently being calculated is referred to as the current frame.
Step S103: matching the feature vector of the current frame against the preset fuzzy acoustic-image rules to obtain the corresponding speech detection score.
In a specific implementation, the fuzzy acoustic-image rules are obtained by training on sound training samples and comprise a plurality of acoustic-image rules, i.e. a set of acoustic-image rules. Each acoustic-image rule carries a corresponding decision score. When the feature vector of the current frame matches any one of the fuzzy acoustic-image rules, the decision score of the matching rule is taken as the speech detection score of the current frame.
Step S104: when the calculated speech detection score is greater than the score threshold, detecting the sound data corresponding to the current frame.
In a specific implementation, the score threshold may be fixed or may vary from frame to frame; those skilled in the art can set it according to actual needs.
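The per-frame flow of steps S101 to S104 can be summarised by the following sketch. The helper callables compute_feature_vector, match_fuzzy_rules and score_threshold are placeholders introduced here for illustration; they stand for step S102, step S103 and the threshold of step S104 respectively.

def detect_speech(frames, compute_feature_vector, match_fuzzy_rules, score_threshold):
    """Return the indices of frames whose speech detection score exceeds the threshold.

    compute_feature_vector(frame) -> (d_ew, d_en, d_z)
    match_fuzzy_rules(features)   -> 0 or 1 (decision score of the matching rule)
    score_threshold(frame)        -> per-frame threshold, e.g. the frame SNR
    """
    detected = []
    for i, frame in enumerate(frames):
        features = compute_feature_vector(frame)   # step S102
        score = match_fuzzy_rules(features)        # step S103
        if score > score_threshold(frame):         # step S104
            detected.append(i)                     # frame contains speech data
    return detected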
The speech detection method in the embodiment of the present invention is described in more detail below with reference to Fig. 2.
Step S201: framing the sound data corresponding to the input sound signal to obtain a plurality of sound frames.
In a specific implementation, a microphone (mic) may be used to collect an external sound signal. When a sound signal is collected, it is processed to obtain the corresponding sound data, and the obtained sound data is divided into two or more frames.
Step S202: calculating the feature vector of the current frame.
In a specific implementation, the feature vector of the current frame includes the wide-window energy difference, the narrow-window energy difference and the zero-crossing difference, which can be calculated by the following formulas respectively:
δew = ew - ewa (1)
δen = en - ena (2)
δz = z - za (3)
where δew denotes the wide-window energy difference, ew denotes the wide-window energy of the current frame and ewa denotes the long-term average of the wide-window energy; δen denotes the narrow-window energy difference, en denotes the narrow-window energy of the current frame and ena denotes the long-term average of the narrow-window energy; δz denotes the zero-crossing difference, z denotes the zero-crossing energy of the current frame and za denotes the long-term average of the zero-crossing energy.
Because the wide-window energy difference δew, the narrow-window energy difference δen and the zero-crossing difference δz are all calculated in the time domain rather than the frequency domain, computing resources can be saved and the speed of speech detection improved.
It should be pointed out that, in order to reflect changes in the background noise, the long-term average ewa of the wide-window energy, the long-term average ena of the narrow-window energy and the long-term average za of the zero-crossing energy are updated only when noise is present.
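The three features of formulas (1) to (3) can be computed entirely in the time domain, as the following sketch illustrates. The window sizes, the exponential update of the long-term averages and the external is_noise flag are assumptions introduced here, since the patent does not specify them numerically.

import numpy as np

class FeatureExtractor:
    """Time-domain features of formulas (1) to (3): d_ew, d_en, d_z.

    Window sizes and the smoothing factor are illustrative assumptions.
    """

    def __init__(self, wide_frames=20, narrow_frames=4, alpha=0.05):
        self.wide = []                         # recent frame energies, wide window
        self.narrow = []                       # recent frame energies, narrow window
        self.wide_frames = wide_frames
        self.narrow_frames = narrow_frames
        self.alpha = alpha
        self.ewa = self.ena = self.za = 0.0    # long-term averages

    def process(self, frame, is_noise):
        energy = float(np.mean(frame.astype(np.float64) ** 2))
        zero_crossings = float(np.sum(np.abs(np.diff(np.sign(frame)))) / 2)

        self.wide = (self.wide + [energy])[-self.wide_frames:]
        self.narrow = (self.narrow + [energy])[-self.narrow_frames:]
        ew = float(np.mean(self.wide))         # wide-window energy
        en = float(np.mean(self.narrow))       # narrow-window energy

        d_ew = ew - self.ewa                   # formula (1)
        d_en = en - self.ena                   # formula (2)
        d_z = zero_crossings - self.za         # formula (3)

        # The long-term averages are updated only when the frame is noise.
        if is_noise:
            self.ewa += self.alpha * (ew - self.ewa)
            self.ena += self.alpha * (en - self.ena)
            self.za += self.alpha * (zero_crossings - self.za)

        return d_ew, d_en, d_z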
Step S203: matching the feature vector of the current frame against the preset fuzzy acoustic-image rules, and judging whether the feature vector of the current frame matches a first-type fuzzy acoustic-image rule or a second-type fuzzy acoustic-image rule of the fuzzy acoustic-image rules; when it is determined that the feature vector of the current frame matches a first-type fuzzy acoustic-image rule, step S204 is executed; when it is determined that the feature vector of the current frame matches a second-type fuzzy acoustic-image rule, step S205 is executed.
In a specific implementation, grammatical expressions in human language define empirical rules, through which people can express a problem in language using their own experience. The preset fuzzy acoustic-image rules in the embodiment of the present invention are precisely a set of acoustic-image rules obtained manually according to heuristic criteria derived from knowledge of the problem.
In a specific implementation, a genetic algorithm and a neural network algorithm are combined to train on the sound training samples to obtain the fuzzy acoustic-image rules. A genetic algorithm can search for the global optimum of a function of several variables, is quite flexible and is insensitive to local-optimum problems, and therefore has good robustness. A neural network algorithm can reduce the execution time and the error rate of the genetic algorithm. Moreover, the genetic algorithm and the neural network algorithm require relatively little computation when training on the sound samples, which saves computing resources.
Training on the sound samples with the genetic algorithm and the neural network algorithm yields a series of acoustic-image sequences, each associated with the variables of the feature vector (the wide-window energy difference δew, the narrow-window energy difference δen and the zero-crossing difference δz). A decision score is then added manually to each output sequence to obtain the final fuzzy acoustic-image rules: when the sound training sample corresponding to an acoustic-image sequence contains speech, the decision score added to the sequence is 1, giving a second-type fuzzy acoustic-image rule; otherwise, the decision score added is 0, giving a first-type fuzzy acoustic-image rule. Table 1 shows an example of the fuzzy acoustic-image rules in the embodiment of the present invention:
Table 1
In Table 1, each feature of the feature vector is mapped to one of three acoustic-image symbols according to its value: the first symbol indicates that the value of the corresponding feature is large, the second symbol indicates that the value is medium, and the symbol △ indicates that the value is small.
In a specific implementation, when a calculated feature of the current frame falls into the corresponding interval, the corresponding acoustic-image symbol is assigned, and the symbols together form the acoustic-image sequence of the current frame. The acoustic-image sequence of the current frame is then compared against the preset fuzzy acoustic-image rules.
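A minimal sketch of this matching step follows. The symbols 'S', 'M' and 'L' stand in for the three acoustic-image symbols of Table 1, and the quantisation thresholds and example rule entries are assumptions for illustration only; in the patent, the rules come from training on sound samples.

def quantize(value, low, high):
    """Map a feature value to a symbol: 'S' (small), 'M' (medium) or 'L' (large).

    The thresholds low/high are illustrative; the patent does not specify them.
    """
    if value < low:
        return 'S'
    if value > high:
        return 'L'
    return 'M'

# Example rule set: acoustic-image sequence (d_ew, d_en, d_z) -> decision score.
# A score of 0 corresponds to a first-type rule (no speech), a score of 1 to a
# second-type rule (speech). These entries are made up for illustration only.
FUZZY_RULES = {
    ('S', 'S', 'S'): 0,
    ('L', 'L', 'M'): 1,
    ('L', 'M', 'L'): 1,
}

def match_rules(d_ew, d_en, d_z, low=0.0, high=1.0):
    """Form the acoustic-image sequence of the current frame and look it up."""
    sequence = (quantize(d_ew, low, high),
                quantize(d_en, low, high),
                quantize(d_z, low, high))
    # Unmatched sequences default to 0 (treated as non-speech) in this sketch.
    return FUZZY_RULES.get(sequence, 0)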
Step S204: when it is determined that the feature vector of the current frame matches a first-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, outputting a speech detection score of 0 for the current frame.
In a specific implementation, as shown in Table 1, the decision score in a first-type fuzzy acoustic-image rule is 0; therefore, when it is determined that the feature vector of the current frame matches a first-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, the speech detection score output for the current frame is 0.
Step S205: when it is determined that the feature vector of the current frame matches a second-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, outputting a speech detection score of 1 for the current frame.
In a specific implementation, as shown in Table 1, the decision score in a second-type fuzzy acoustic-image rule is 1; therefore, when it is determined that the feature vector of the current frame matches a second-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, the speech detection score output for the current frame is 1.
Step S206: comparing the speech detection score of the current frame with the signal-to-noise ratio of the current frame and judging whether the speech detection score of the current frame is greater than the signal-to-noise ratio of the current frame; if so, step S207 may be executed; otherwise, no operation is performed.
In a specific implementation, the signal-to-noise ratio (SNR) of each frame reflects the ratio of the speech signal to the noise, and it satisfies the following conditions:
(1) it is evenly distributed over all possible numerical intervals;
(2) it varies slowly between adjacent frames;
(3) different optimal thresholds take different values;
(4) its magnitude can be minimized according to the signal-to-noise ratio.
Therefore, given that the signal-to-noise ratio satisfies the above conditions, using the signal-to-noise ratio of the current frame as the score threshold and comparing it with the speech detection score of the current frame to determine whether to perform speech detection on the current frame can improve the accuracy of speech detection.
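A sketch of step S206 under stated assumptions is given below. The patent does not say how the frame SNR is estimated or scaled, so the noise-floor-based estimate and the normalisation of the SNR into [0, 1] (so that it is comparable with the 0/1 speech detection score) are assumptions introduced here.

import numpy as np

def normalized_snr(frame, noise_floor, eps=1e-12):
    """A rough per-frame SNR estimate mapped into [0, 1].

    noise_floor is an externally tracked estimate of the noise energy.
    Both the estimator and the normalisation are illustrative assumptions;
    the patent only states that the frame SNR is used as the score threshold.
    """
    energy = float(np.mean(frame.astype(np.float64) ** 2))
    snr_db = 10.0 * np.log10((energy + eps) / (noise_floor + eps))
    # Map roughly 0..30 dB into 0..1 so the threshold is comparable
    # with the 0/1 speech detection score.
    return float(np.clip(snr_db / 30.0, 0.0, 1.0))

def should_detect(score, frame, noise_floor):
    """Step S206: run speech detection on the frame only if score > frame SNR."""
    return score > normalized_snr(frame, noise_floor)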
Step S207: performing speech detection on the current frame.
The speech detection device corresponding to the speech detection method in the embodiment of the present invention is described in further detail below with reference to Fig. 3.
Fig. 3 shows a schematic structural diagram of a speech detection device in an embodiment of the present invention. The speech detection device 300 shown in Fig. 3 may include a framing unit 301, a calculation unit 302, a matching unit 303 and a detection unit 304, wherein:
the framing unit 301 is adapted to frame the sound data corresponding to the input sound signal to obtain a plurality of sound frames;
the calculation unit 302 is adapted to calculate the feature vector of the current frame, the feature vector including the wide-window energy difference, the narrow-window energy difference and the zero-crossing difference;
the matching unit 303 is adapted to match the feature vector of the current frame against the preset fuzzy acoustic-image rules to obtain the corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples, specifically by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.
In a specific implementation, the fuzzy acoustic-image rules include first-type fuzzy acoustic-image rules and second-type fuzzy acoustic-image rules; the matching unit 303 obtains a speech detection score of 0 for the current frame when it determines that the feature vector of the current frame matches a first-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, and obtains a speech detection score of 1 for the current frame when it determines that the feature vector of the current frame matches a second-type fuzzy acoustic-image rule.
The detection unit 304 is adapted to detect the sound data corresponding to the current frame when the calculated speech detection score is greater than the score threshold. To improve the accuracy of speech detection, the score threshold is the signal-to-noise ratio of the sound data of the current frame.
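As an illustration of how the units of device 300 fit together, a minimal object-oriented sketch follows. The unit interfaces (plain callables) are assumptions introduced here, mirroring the earlier sketches, and do not reflect any particular implementation of the patent.

class SpeechDetectionDevice:
    """Sketch of device 300: framing unit 301, calculation unit 302,
    matching unit 303 and detection unit 304, wired together.

    The callables passed in stand for the four units; their exact
    interfaces are assumptions for illustration.
    """

    def __init__(self, framing_unit, calculation_unit, matching_unit, detection_unit):
        self.framing_unit = framing_unit          # sound data -> frames
        self.calculation_unit = calculation_unit  # frame -> (d_ew, d_en, d_z)
        self.matching_unit = matching_unit        # features -> 0 or 1
        self.detection_unit = detection_unit      # (frame, score) -> result or None

    def run(self, sound_data):
        results = []
        for frame in self.framing_unit(sound_data):
            features = self.calculation_unit(frame)
            score = self.matching_unit(*features)
            result = self.detection_unit(frame, score)
            if result is not None:
                results.append(result)
        return results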
Those of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments may be completed by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium; the storage medium may include a ROM, a RAM, a magnetic disk, an optical disc, or the like.
The method and system of the embodiments of the present invention have been described in detail above, but the present invention is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, and the protection scope of the present invention shall therefore be defined by the claims.

Claims (8)

1. A speech detection method, characterised by comprising:
framing the sound data corresponding to an input sound signal to obtain a plurality of sound frames;
calculating a feature vector of the current frame, the feature vector including a wide-window energy difference, a narrow-window energy difference and a zero-crossing difference;
matching the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; and
when the calculated speech detection score is greater than a score threshold, detecting the sound data corresponding to the current frame.
2. The speech detection method according to claim 1, characterised in that the fuzzy acoustic-image rules are obtained by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.
3. The speech detection method according to claim 1, characterised in that the fuzzy acoustic-image rules include first-type fuzzy acoustic-image rules and second-type fuzzy acoustic-image rules, and matching the feature vector of the current frame against the preset fuzzy acoustic-image rules to obtain the corresponding speech detection score comprises:
when it is determined that the feature vector of the current frame matches a first-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, the speech detection score obtained for the current frame is 0;
when it is determined that the feature vector of the current frame matches a second-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, the speech detection score obtained for the current frame is 1.
4. The speech detection method according to claim 1, characterised in that the score threshold is the signal-to-noise ratio of the sound data of the current frame.
5. A speech detection device, characterised by comprising:
a framing unit, adapted to frame the sound data corresponding to an input sound signal to obtain a plurality of sound frames;
a calculation unit, adapted to calculate a feature vector of the current frame, the feature vector including a wide-window energy difference, a narrow-window energy difference and a zero-crossing difference;
a matching unit, adapted to match the feature vector of the current frame against preset fuzzy acoustic-image rules to obtain a corresponding speech detection score, the fuzzy acoustic-image rules being obtained by training on sound training samples; and
a detection unit, adapted to detect the sound data corresponding to the current frame when the calculated speech detection score is greater than a score threshold.
6. The speech detection device according to claim 5, characterised in that the fuzzy acoustic-image rules are obtained by training on the sound training samples using a combination of a neural network algorithm and a genetic algorithm.
7. The speech detection device according to claim 5, characterised in that the fuzzy acoustic-image rules include first-type fuzzy acoustic-image rules and second-type fuzzy acoustic-image rules, and the matching unit obtains a speech detection score of 0 for the current frame when it determines that the feature vector of the current frame matches a first-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules, and obtains a speech detection score of 1 for the current frame when it determines that the feature vector of the current frame matches a second-type fuzzy acoustic-image rule of the preset fuzzy acoustic-image rules.
8. The speech detection device according to claim 5, characterised in that the score threshold is the signal-to-noise ratio of the sound data of the current frame.
CN201510401974.5A 2015-07-09 2015-07-09 Speech detection method and device Active CN106340310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510401974.5A CN106340310B (en) 2015-07-09 2015-07-09 Speech detection method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510401974.5A CN106340310B (en) 2015-07-09 2015-07-09 Speech detection method and device

Publications (2)

Publication Number Publication Date
CN106340310A true CN106340310A (en) 2017-01-18
CN106340310B CN106340310B (en) 2019-06-07

Family

ID=57827293

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510401974.5A Active CN106340310B (en) 2015-07-09 2015-07-09 Speech detection method and device

Country Status (1)

Country Link
CN (1) CN106340310B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070055504A1 (en) * 2002-10-29 2007-03-08 Chu Wai C Optimized windows and interpolation factors, and methods for optimizing windows, interpolation factors and linear prediction analysis in the ITU-T G.729 speech coding standard
US20040181403A1 (en) * 2003-03-14 2004-09-16 Chien-Hua Hsu Coding apparatus and method thereof for detecting audio signal transient
CN1763844A (en) * 2004-10-18 2006-04-26 中国科学院声学研究所 End-point detecting method, device and speech recognition system based on moving window
WO2006114101A1 (en) * 2005-04-26 2006-11-02 Aalborg Universitet Detection of speech present in a noisy signal and speech enhancement making use thereof
CN102405495A (en) * 2009-03-11 2012-04-04 谷歌公司 Audio classification for information retrieval using sparse features
US20100268533A1 (en) * 2009-04-17 2010-10-21 Samsung Electronics Co., Ltd. Apparatus and method for detecting speech
CN101937675A (en) * 2009-06-29 2011-01-05 展讯通信(上海)有限公司 Voice detection method and equipment thereof
CN102231277A (en) * 2011-06-29 2011-11-02 电子科技大学 Method for protecting mobile terminal privacy based on voiceprint recognition

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108447501A (en) * 2018-03-27 2018-08-24 中南大学 Pirate video detection method and system based on audio word under a kind of cloud storage environment
CN108447501B (en) * 2018-03-27 2020-08-18 中南大学 Pirated video detection method and system based on audio words in cloud storage environment
CN108648769A (en) * 2018-04-20 2018-10-12 百度在线网络技术(北京)有限公司 Voice activity detection method, apparatus and equipment
CN111862985A (en) * 2019-05-17 2020-10-30 北京嘀嘀无限科技发展有限公司 Voice recognition device, method, electronic equipment and storage medium
CN111862985B (en) * 2019-05-17 2024-05-31 北京嘀嘀无限科技发展有限公司 Speech recognition device, method, electronic equipment and storage medium

Also Published As

Publication number Publication date
CN106340310B (en) 2019-06-07

Similar Documents

Publication Publication Date Title
CN108899044B (en) Voice signal processing method and device
CN110288978B (en) Speech recognition model training method and device
CN113113039B (en) Noise suppression method and device and mobile terminal
CN110310623B (en) Sample generation method, model training method, device, medium, and electronic apparatus
CN109087669B (en) Audio similarity detection method and device, storage medium and computer equipment
CN106486131B (en) A kind of method and device of speech de-noising
WO2020181824A1 (en) Voiceprint recognition method, apparatus and device, and computer-readable storage medium
CN103065631B (en) A kind of method of speech recognition, device
CN103971680B (en) A kind of method, apparatus of speech recognition
CN111210021A (en) Audio signal processing method, model training method and related device
CN105976812A (en) Voice identification method and equipment thereof
CN107610706A (en) The processing method and processing unit of phonetic search result
CN110335593A (en) Sound end detecting method, device, equipment and storage medium
CN110600008A (en) Voice wake-up optimization method and system
CN107274892A (en) Method for distinguishing speek person and device
CN109688271A (en) The method, apparatus and terminal device of contact information input
CN106024017A (en) Voice detection method and device
CN110931028A (en) Voice processing method and device and electronic equipment
CN106340310A (en) Speech detection method and device
CN110895930B (en) Voice recognition method and device
WO2024041512A1 (en) Audio noise reduction method and apparatus, and electronic device and readable storage medium
CN113064118A (en) Sound source positioning method and device
CN116364107A (en) Voice signal detection method, device, equipment and storage medium
CN105788590A (en) Speech recognition method, device, mobile terminal
CN110537223B (en) Voice detection method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant