CN106297795A - Speech recognition method and device

Publication number
CN106297795A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510271782.7A
Other languages
Chinese (zh)
Other versions
CN106297795B (en)
Inventor
孙廷玮
林福辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Spreadtrum Communications Shanghai Co Ltd
Spreadtrum Communications Inc
Original Assignee
Spreadtrum Communications Shanghai Co Ltd
Application filed by Spreadtrum Communications Shanghai Co Ltd
Priority to CN201510271782.7A (patent CN106297795B)
Priority to CN201910945249.2A (patent CN110895930B)
Publication of CN106297795A
Application granted
Publication of CN106297795B
Legal status: Active

Abstract

A speech recognition method and device. The speech recognition method includes: framing the acquired sound data to obtain at least two sound frames; selecting, from the at least two sound frames, the sound frames that satisfy a selection condition; calculating a speech recognition score for the sound frames that satisfy the selection condition; and, when the calculated speech recognition score is greater than a preset score threshold, performing speech recognition on the acquired sound data. This scheme can save computing resources and increase the speed of speech recognition.

Description

Speech recognition method and device
Technical field
The invention belongs to the technical field of speech recognition, and in particular relates to a speech recognition method and device.
Background
A mobile terminal is a computing device that can be used while on the move. Broadly, the term covers mobile phones, notebook computers, tablet computers, POS terminals, in-vehicle computers, and the like. With the rapid development of integrated circuit technology, mobile terminals have acquired powerful processing capabilities and have evolved from simple calling tools into integrated information processing platforms, which opens up broader room for their development.
Using a mobile terminal usually requires a certain amount of the user's attention. Today's mobile terminals are equipped with touch screens, and the user must touch the screen to perform the corresponding operations. When the user cannot touch the device, however, operating the mobile terminal becomes very inconvenient, for example while the user is driving a vehicle or is carrying items in their hands.
Speech recognition methods and always-listening systems (Always Listening System) make it possible to activate and operate a mobile terminal hands-free. When the always-listening system detects a sound signal, the speech recognition system is activated and recognizes the detected sound signal, after which the mobile terminal performs the corresponding operation according to the recognized signal. For example, when the user says "call XX's mobile phone", the mobile terminal recognizes that utterance and, once it has been correctly recognized, retrieves XX's phone number from the mobile terminal and dials it.
However, speech recognition methods in the prior art suffer from a large amount of computation and slow recognition speed.
Summary of the invention
The problem solved by the embodiments of the present invention is how to save the computing resources used for speech recognition and increase the speed of speech recognition.
To solve the above problem, an embodiment of the present invention provides a speech recognition method, the speech recognition method including:
framing the acquired sound data to obtain at least two sound frames;
selecting, from the at least two sound frames, the sound frames that satisfy a selection condition;
calculating a speech recognition score for the sound frames that satisfy the selection condition;
when the calculated speech recognition score is greater than a preset score threshold, performing speech recognition on the acquired sound data.
Optionally, selecting the sound frames that satisfy the selection condition from the at least two sound frames includes:
calculating the posterior signal-to-noise ratio (posterior SNR) of the current sound frame;
calculating, according to the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame;
calculating the first selection threshold of the current sound frame;
when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame, selecting the current sound frame.
Optionally, the posterior SNR of the current sound frame is calculated using the following formula:
$$\mathrm{SNR}_{post}(t) = \log\frac{E(t)}{E_{noise}(t)}$$
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
Optionally, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is calculated using the following formula:
$$D(t) = \left|\log E(t) - \log E(t-1)\right| \times \mathrm{SNR}_{post}(t)$$
where $D(t)$ denotes the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame, $\log E(t)$ denotes the logarithmic energy of the current sound frame, and $\log E(t-1)$ denotes the logarithmic energy of the previous sound frame.
Optionally, the first selection threshold of the current sound frame is calculated using the following formula:
$$T(t) = D_a(t) \times f(\log E_{noise}(t))$$
where $T(t)$ denotes the first selection threshold of the current sound frame, $D_a(t)$ denotes the average posterior-SNR-weighted energy distance of the consecutive sound frames preceding the current sound frame, and $f(\log E_{noise}(t))$ is a sigmoid function.
Optionally, selecting, from the plurality of acquired sound frames, the sound frames that satisfy a preset selection condition includes:
calculating the posterior SNR of the current sound frame;
when the calculated posterior SNR is determined to be greater than a preset second selection threshold, selecting the current sound frame.
Optionally, the posterior SNR of the current sound frame is calculated using the following formula:
$$\mathrm{SNR}_{post}(t) = \log\frac{E(t)}{E_{noise}(t)}$$
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
Optionally, the speech recognition score of the sound frames that satisfy the selection condition is calculated using the following formula:
$$M_n = \frac{1}{n_- + n_+} \sum_{m=n_-}^{n_+} f(\alpha \times (n+m))$$
where $M_n$ denotes the calculated speech recognition score, $n$ denotes the index of the current sound frame, $n_-$ denotes the index of the starting sound frame among the selected sound frames, $n_+$ denotes the index of the ending sound frame among the selected sound frames, $\alpha$ denotes a preset adjustment parameter, $m$ denotes a positive integer that varies with the index of the selected sound frames, and $f(\alpha \times (n+m))$ denotes the moving-average prediction model.
An embodiment of the present invention further provides a speech recognition device, the speech recognition device including:
a framing unit, adapted to frame the acquired sound data to obtain at least two sound frames;
a selection unit, adapted to select, from the at least two sound frames, the sound frames that satisfy a selection condition;
a calculation unit, adapted to calculate a speech recognition score for the sound frames that satisfy the selection condition;
a recognition unit, adapted to perform speech recognition on the acquired sound data when the calculated speech recognition score is greater than a preset score threshold.
Optionally, the selection unit is adapted to: calculate the posterior SNR of the current sound frame; calculate, according to the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame; calculate the first selection threshold of the current sound frame; and select the current sound frame when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame.
Optionally, the selection unit is adapted to: calculate the posterior SNR of the current sound frame; and select the current sound frame when the calculated posterior SNR is determined to be greater than a preset second selection threshold.
Compared with the prior art, the technical solution of the present invention has the following advantages:
By selecting, from the sound data to be recognized, the sound frames that satisfy a preset condition and performing speech recognition on them, the non-speech frames that contain no speech information can be excluded, and speech recognition processing is performed only on the selected sound frames. This saves computing resources, increases the speed of speech recognition, and improves the user experience.
Further, the posterior-SNR-weighted energy distance between the current sound frame and the previous sound frame is calculated from the calculated posterior SNR of the current sound frame, and this distance is compared with the calculated first selection threshold of the current sound frame. Compared with a comparison based only on the posterior SNR of the current sound frame, this excludes more of the non-speech sound frames that contain no speech information, and can therefore further save computing resources and increase the speed of speech recognition.
Further, by comparing only the calculated posterior SNR of the current sound frame with the preset second selection threshold, more of the sound frames that contain no speech information can be excluded and computing resources can be saved, so the speed of speech recognition can be further increased.
Brief description of the drawings
Fig. 1 is a flow chart of a speech recognition method in an embodiment of the present invention;
Fig. 2 is a flow chart of another speech recognition method in an embodiment of the present invention;
Fig. 3 is a flow chart of yet another speech recognition method in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a speech recognition device in an embodiment of the present invention.
Detailed description
When performing speech recognition, prior-art methods usually apply speech recognition processing to all of the sound frames obtained by dividing the sound data to be recognized at a fixed frame rate (Fixed Frame Rate, FFR). Because some of the resulting sound frames contain no speech information, performing speech recognition processing on these non-speech frames is not only meaningless for recognition but also wastes computing resources and reduces recognition speed.
To solve the above problem in the prior art, the technical solution adopted by the embodiments of the present invention selects, from the sound data to be recognized, the sound frames that satisfy a preset condition and performs speech recognition on them, which can save computing resources, increase the speed of speech recognition, and improve the user experience.
To make the above objects, features, and advantages of the present invention easier to understand, specific embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 shows a flow chart of a speech recognition method in an embodiment of the present invention. The speech recognition method shown in Fig. 1 may include:
Step S101: framing the acquired sound data to obtain at least two sound frames.
In a specific implementation, a microphone may be used to collect the input sound signal in real time. When a sound signal is collected, it is converted into the corresponding sound data through the corresponding processing. The converted sound data can then be framed to obtain at least two sound frames.
Step S102: selecting, from the at least two sound frames, the sound frames that satisfy a selection condition.
When performing speech recognition, existing methods usually need to apply the corresponding speech recognition processing to all of the at least two sound frames obtained by dividing the sound data. However, not every sound frame contains speech information, and performing speech recognition processing on sound frames that contain no speech information wastes resources and reduces the speed of speech recognition. Therefore, in the embodiments of the present invention, a subset of the sound frames is first selected from the at least two sound frames obtained by the division, so that some of the sound frames that contain no speech data are excluded. This saves resources and can increase the speed of speech recognition.
Step S103: calculating a speech recognition score for the sound frames that satisfy the selection condition.
In a specific implementation, the selection condition can be configured according to actual needs.
Step S104: when the calculated speech recognition score is greater than a preset score threshold, performing speech recognition on the acquired sound data.
In a specific implementation, when the speech recognition score calculated from the selected sound frames is greater than the preset score threshold, it can be determined that the acquired sound data contains the user's speech information, and speech recognition can then be performed on the acquired sound data. Otherwise, speech recognition need not be performed on it. The speech recognition score can be configured according to actual needs.
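The control flow of steps S102 to S104 can be sketched as a small gating function. This is only an illustrative sketch, not the patented implementation: the function name `gated_recognition` and the callables `keep_frame`, `score_frames`, and `recognize` are hypothetical placeholders for the selection condition, the score calculation, and the recognizer, none of which are fixed by the text at this point. Framing (step S101) is assumed to have been done already.

```python
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]


def gated_recognition(
    frames: List[Frame],
    keep_frame: Callable[[int, List[Frame]], bool],
    score_frames: Callable[[List[Frame]], float],
    score_threshold: float,
    recognize: Callable[[List[Frame]], str],
) -> Optional[str]:
    """Run the recognizer only if the selected frames score above the threshold."""
    # Step S102: keep only the frames that satisfy the selection condition.
    selected = [frames[i] for i in range(len(frames)) if keep_frame(i, frames)]
    if not selected:
        return None
    # Step S103: compute a score from the selected frames.
    score = score_frames(selected)
    # Step S104: recognize the acquired data only when the score is high enough.
    if score > score_threshold:
        return recognize(frames)
    return None
```

With this shape, the expensive `recognize` call is skipped entirely whenever no frame is selected or the score stays at or below the threshold, which is where the claimed saving in computing resources would come from.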
Fig. 2 shows a flow chart of another speech recognition method in an embodiment of the present invention. The speech recognition method shown in Fig. 2 may include:
Step S201: framing the acquired sound data to obtain at least two sound frames.
Step S202: traversing the at least two sound frames.
Step S203: calculating the posterior signal-to-noise ratio (posterior SNR) of the current sound frame.
In a specific implementation, to determine which sound frames to select, the at least two sound frames may be traversed, and the corresponding posterior SNR (post SNR) is calculated for each sound frame using the following formula:
$$\mathrm{SNR}_{post}(t) = \log\frac{E(t)}{E_{noise}(t)} \qquad (1)$$
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
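Formula (1) can be illustrated with a few lines of Python. The sketch below rests on assumptions the text does not state: the frame energy is taken as the sum of squared samples, the logarithm is the natural logarithm, and a small `eps` is added to avoid dividing by zero or taking the log of zero. How the noise energy is estimated is also not specified, so it is simply passed in.

```python
import math
from typing import Sequence


def frame_energy(samples: Sequence[float]) -> float:
    """Energy of one frame, assumed here to be the sum of squared samples."""
    return sum(x * x for x in samples)


def posterior_snr(noisy_energy: float, noise_energy: float, eps: float = 1e-12) -> float:
    """Formula (1): log of the noisy-speech energy over the noise energy."""
    # eps is an added safeguard (an assumption), not part of formula (1).
    return math.log((noisy_energy + eps) / (noise_energy + eps))
```

A frame whose energy equals the noise estimate yields a posterior SNR of about zero, and frames that are louder than the noise estimate yield positive values.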
Step S204: calculating, according to the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame.
In an embodiment of the present invention, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is calculated using the following formula:
$$D(t) = \left|\log E(t) - \log E(t-1)\right| \times \mathrm{SNR}_{post}(t) \qquad (2)$$
where $D(t)$ denotes the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame, $\log E(t)$ denotes the logarithmic energy of the current sound frame, and $\log E(t-1)$ denotes the logarithmic energy of the previous sound frame.
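A direct transcription of formula (2) might look as follows. The natural logarithm and the `eps` guard are assumptions made for this sketch, and the function name is hypothetical.

```python
import math


def weighted_energy_distance(
    e_curr: float, e_prev: float, e_noise: float, eps: float = 1e-12
) -> float:
    """Formula (2): |log E(t) - log E(t-1)| weighted by the posterior SNR of frame t."""
    # Posterior SNR of the current frame, as in formula (1).
    snr_post = math.log((e_curr + eps) / (e_noise + eps))
    # Absolute change in log energy between the previous and the current frame.
    log_energy_change = abs(math.log(e_curr + eps) - math.log(e_prev + eps))
    return log_energy_change * snr_post
```

Two consecutive frames with the same energy give a distance of zero regardless of how loud they are, so the measure responds to changes in energy rather than to level alone. When the frame energy falls below the noise estimate, the posterior SNR is negative and so is the distance as written.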
Step S205: calculating the first selection threshold of the current sound frame.
In an embodiment of the present invention, a corresponding first selection threshold needs to be calculated for each sound frame obtained by dividing the acquired sound data. Specifically, the first selection threshold of each sound frame can be calculated using the following formula:
$$T(t) = D_a(t) \times f(\log E_{noise}(t)) \qquad (3)$$
where $T(t)$ denotes the first selection threshold of the current sound frame, $D_a(t)$ denotes the average posterior-SNR-weighted energy distance of two consecutive sound frames including the current sound frame, and $f(\log E_{noise}(t))$ is an S-shaped function (sigmoid function).
It should be noted that $D_a(t)$ is not a constant; it changes as the sound frames change. Take as an example acquired sound data that is divided into three sound frames: a first, a second, and a third sound frame. $D(1)$ denotes the posterior-SNR-weighted energy distance between the first sound frame and its previous sound frame (that is, the product of the logarithmic energy of the first sound frame and the posterior SNR of the first sound frame), $D(2)$ denotes the posterior-SNR-weighted energy distance between the second sound frame and the first sound frame, and $D(3)$ denotes the posterior-SNR-weighted energy distance between the third sound frame and the second sound frame. When formula (3) is used to calculate the first selection threshold of the first sound frame, $D_a(1)$ equals $D(1)$; when calculating the first selection threshold of the second sound frame, $D_a(2)$ is the average of $D(1)$ and $D(2)$; when calculating the first selection threshold of the third sound frame, $D_a(3)$ is the average of $D(1)$, $D(2)$, and $D(3)$. It can thus be seen that $D_a(t)$ is updated as the sound frames advance.
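The three-frame example can be reproduced with a short sketch of formula (3). Two assumptions are made here and should be treated as such: $D_a(t)$ is taken as the running mean of $D(1)$ through $D(t)$, following the worked example, and $f$ is taken to be the standard logistic function, since the text only says that it is a sigmoid. The function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float) -> float:
    """Standard logistic function, written to avoid overflow for large |x|."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def first_selection_thresholds(
    distances: Sequence[float], log_noise_energies: Sequence[float]
) -> List[float]:
    """Formula (3): T(t) = D_a(t) * f(log E_noise(t)) for every frame t."""
    thresholds = []
    running_sum = 0.0
    for count, (d, log_noise) in enumerate(zip(distances, log_noise_energies), start=1):
        running_sum += d
        d_avg = running_sum / count  # D_a(t): mean of D(1)..D(t), per the example
        thresholds.append(d_avg * sigmoid(log_noise))
    return thresholds
```

For distances 2, 4, 6 and a log noise energy of 0 in every frame, the running averages are 2, 3, 4 and the sigmoid factor is 0.5, so the thresholds come out as 1.0, 1.5, 2.0.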
Step S206: comparing the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame with the first selection threshold of the current sound frame.
Step S207: when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is determined to be greater than the first selection threshold of the current sound frame, selecting the current sound frame.
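Steps S203 to S207 can be combined into a single pass over per-frame energies. The sketch below is one possible reading under stated assumptions: natural logarithms, a logistic sigmoid, a running mean for $D_a(t)$, and a log energy of 0 for the non-existent frame before the first one (which makes $D(1)$ the product of the first frame's log energy and posterior SNR, as in the description of formula (3)). The per-frame noise energies are taken as given because the text does not say how they are estimated.

```python
import math
from typing import List, Sequence


def select_frames(
    energies: Sequence[float], noise_energies: Sequence[float], eps: float = 1e-12
) -> List[int]:
    """Return indices of frames whose weighted energy distance exceeds T(t)."""
    selected = []
    distance_sum = 0.0
    prev_log_e = 0.0  # assumption: log energy of the missing "frame 0" is 0
    for t, (e, e_noise) in enumerate(zip(energies, noise_energies)):
        log_e = math.log(e + eps)
        log_noise = math.log(e_noise + eps)
        snr_post = log_e - log_noise                      # formula (1)
        d = abs(log_e - prev_log_e) * snr_post            # formula (2)
        distance_sum += d
        d_avg = distance_sum / (t + 1)                    # running mean D_a(t)
        threshold = d_avg / (1.0 + math.exp(-log_noise))  # formula (3), logistic f
        if d > threshold:                                 # steps S206 and S207
            selected.append(t)
        prev_log_e = log_e
    return selected
```

For energies 1, 1, 100, 100 with a constant noise energy of 1, only the frame at index 2 (0-based) is selected: it is the only frame where the energy changes while the posterior SNR is positive.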
Step S208: calculating a speech recognition score for the sound frames that satisfy the selection condition.
In an embodiment of the present invention, the moving average method may be used to calculate the speech recognition score of the sound frames that satisfy the selection condition. Specifically, the score is calculated using the following formula:
$$M_n = \frac{1}{n_- + n_+} \sum_{m=n_-}^{n_+} f(\alpha \times (n+m)) \qquad (4)$$
where $M_n$ denotes the calculated speech recognition score, $n$ denotes the index of the sound frame located in the middle of the selected sound frames, $n_-$ denotes the index of the starting sound frame among the selected sound frames, $n_+$ denotes the index of the ending sound frame among the selected sound frames, $\alpha$ denotes a preset adjustment parameter, $m$ denotes a positive integer that varies with the index of the selected sound frames, and $f(\alpha \times (n+m))$ denotes the moving-average prediction model.
When formula (4) is used to calculate the speech recognition score of the sound frames that satisfy the selection condition, the calculated $M_n$ is computed with a frame shift of 10 ms and serves as a measure of the average number of sound frames within the moving-average window.
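Formula (4) can be evaluated literally once a concrete $f$ is chosen. The text names $f$ a moving-average prediction model but does not give its form, so the sketch below takes it as a caller-supplied function. The function name and the guard against a zero normalizer are additions made for illustration.

```python
from typing import Callable


def recognition_score(
    n: int, n_minus: int, n_plus: int, alpha: float, f: Callable[[float], float]
) -> float:
    """Formula (4): sum f(alpha * (n + m)) for m = n_minus..n_plus, over (n_minus + n_plus)."""
    denom = n_minus + n_plus
    if denom == 0:
        raise ValueError("n_minus + n_plus must be non-zero")
    total = sum(f(alpha * (n + m)) for m in range(n_minus, n_plus + 1))
    return total / denom
```

As the formula is written, the normalizer $n_- + n_+$ generally differs from the number of summed terms, $n_+ - n_- + 1$, so the result is not a plain arithmetic mean of the $f$ values unless the indices happen to make the two coincide.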
Step S209: when the calculated speech recognition score is greater than a preset score threshold, performing speech recognition on the acquired sound data.
In a specific implementation, when the calculated speech recognition score is greater than the preset score threshold, it is determined that the acquired sound data contains speech information, and speech recognition can then be performed on the acquired sound data.
In a specific implementation, once the speech information in the acquired sound data has been recognized, the mobile terminal can perform the corresponding operation. For example, when the speech information recognized by the mobile terminal is "open FACEBOOK", the mobile terminal opens FACEBOOK for the user.
In a specific implementation, to further exclude sound frames that contain no speech data, the decision may be made solely by comparing the posterior SNR of each sound frame with a preset second selection threshold. This not only saves computing resources but also further increases the speed of speech recognition; see Fig. 3 for details.
Fig. 3 shows a flow chart of yet another speech recognition method in an embodiment of the present invention. The speech recognition method shown in Fig. 3 may include:
Step S301: framing the acquired sound data to obtain at least two sound frames.
In an embodiment of the present invention, to facilitate analysis and processing of the sound frames, each of the at least two sound frames obtained by dividing the acquired sound data has a length of 25 ms, and the frame shift between two adjacent sound frames is 1 ms.
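Framing with a 25 ms frame length and a 1 ms frame shift can be sketched as a sliding window over the sample sequence. The sample rate is not given in the text; the usage below assumes 16 kHz purely for illustration, and the choice to drop a trailing partial frame is likewise an assumption.

```python
from typing import List, Sequence


def split_into_frames(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: float = 25.0,
    shift_ms: float = 1.0,
) -> List[List[float]]:
    """Split samples into overlapping frames of frame_ms, advancing by shift_ms."""
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame length and shift must be at least one sample")
    frames = []
    start = 0
    # Keep only complete frames; a trailing partial frame is dropped.
    while start + frame_len <= len(samples):
        frames.append(list(samples[start:start + frame_len]))
        start += shift
    return frames
```

At 16 kHz, a frame is 400 samples and the shift is 16 samples, so 30 ms of audio (480 samples) yields 6 frames. With a shift this small, consecutive frames overlap almost entirely, which is why discarding uninformative frames before recognition can matter.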
Step S302: traversing the at least two sound frames obtained, and calculating the posterior SNR of the current sound frame.
In the embodiments of the present invention, the posterior SNR calculated with formula (1) above can be used directly in the subsequent steps to decide whether to select the current sound frame.
It should be noted that, compared with calculating the a priori SNR (priori SNR), using the posterior SNR of a sound frame to decide whether to select the frame is more direct and unambiguous. Calculating the a priori SNR of each sound frame requires estimating the energy of the clean speech in the current sound frame, and estimating the clean speech energy in a sound frame is quite difficult.
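The difference between the two ratios can be made explicit by writing them side by side. The symbol $E_{clean}(t)$ for the clean speech energy and the name $\mathrm{SNR}_{prior}$ are introduced here for illustration only. The text defines just the posterior SNR, in formula (1), and the logarithmic form of the a priori ratio is assumed so that the two expressions match.

```latex
\mathrm{SNR}_{prior}(t) = \log\frac{E_{clean}(t)}{E_{noise}(t)},
\qquad
\mathrm{SNR}_{post}(t) = \log\frac{E(t)}{E_{noise}(t)}
```

The numerator of the posterior SNR, $E(t)$, is the energy of the observed noisy frame and can be measured directly, whereas $E_{clean}(t)$ is not observable and has to be estimated.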
Step S303: comparing the posterior SNR of the current sound frame with a preset second selection threshold.
In a specific implementation, the second selection threshold can be set according to actual needs.
Step S304: when the posterior SNR of the current frame is determined to be greater than the preset second selection threshold, selecting the current sound frame.
In a specific implementation, when the posterior SNR of the current frame is determined to be greater than the second selection threshold, the current frame may contain speech information, and the current frame is selected. Otherwise, the current frame is discarded and the judgment continues with the next sound frame.
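Steps S302 to S304 reduce to a single comparison per frame, which can be sketched as follows. The natural logarithm, the `eps` guard, and the function name are assumptions, and the threshold value in the usage note is arbitrary.

```python
import math
from typing import List, Sequence


def select_by_posterior_snr(
    energies: Sequence[float],
    noise_energies: Sequence[float],
    second_threshold: float,
    eps: float = 1e-12,
) -> List[int]:
    """Return indices of frames whose posterior SNR exceeds the preset threshold."""
    selected = []
    for t, (e, e_noise) in enumerate(zip(energies, noise_energies)):
        snr_post = math.log((e + eps) / (e_noise + eps))  # formula (1)
        if snr_post > second_threshold:                   # step S304
            selected.append(t)
        # Otherwise the frame is discarded and the loop moves to the next frame.
    return selected
```

With energies 1, 10, 100, 2, a constant noise energy of 1, and a threshold of 1.0, the frames at indices 1 and 2 are kept, since only their posterior SNRs (about 2.3 and 4.6) exceed the threshold.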
Step S305: calculating a speech recognition score for the sound frames that satisfy the selection condition.
Step S306: when the calculated speech recognition score is greater than a preset score threshold, performing speech recognition on the acquired sound data.
Fig. 4 shows a speech recognition device further provided by an embodiment of the present invention. The speech recognition device shown in Fig. 4 may include a framing unit 401, a selection unit 402, a calculation unit 403, and a recognition unit 404, wherein:
The framing unit 401 is adapted to frame the acquired sound data to obtain at least two sound frames.
The selection unit 402 is adapted to select, from the at least two sound frames, the sound frames that satisfy a selection condition. In an embodiment of the present invention, the selection unit 402 is adapted to calculate the posterior SNR of the current sound frame and to select the current sound frame when the calculated posterior SNR is determined to be greater than a preset second selection threshold. In another embodiment of the present invention, the selection unit 402 is adapted to: calculate the posterior SNR of the current sound frame; calculate, according to the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame; calculate the first selection threshold of the current sound frame; and select the current sound frame when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame.
The calculation unit 403 is adapted to calculate a speech recognition score for the sound frames that satisfy the selection condition.
The recognition unit 404 is adapted to perform speech recognition on the acquired sound data when the calculated speech recognition score is greater than a preset score threshold.
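The four-unit structure can be mirrored in software by composing four interchangeable callables. This is only a sketch of how the units might cooperate: the class name, the field names, and the `process` method are hypothetical, and the numbers 401 to 404 in the comments refer to the units named above.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Frame = List[float]


@dataclass
class SpeechRecognitionDevice:
    framing_unit: Callable[[Sequence[float]], List[Frame]]  # unit 401
    selection_unit: Callable[[List[Frame]], List[Frame]]    # unit 402
    calculation_unit: Callable[[List[Frame]], float]        # unit 403
    recognition_unit: Callable[[Sequence[float]], str]      # unit 404
    score_threshold: float

    def process(self, sound_data: Sequence[float]) -> Optional[str]:
        """Frame, select, score, and recognize only above the score threshold."""
        frames = self.framing_unit(sound_data)
        selected = self.selection_unit(frames)
        if not selected:
            return None
        score = self.calculation_unit(selected)
        if score > self.score_threshold:
            return self.recognition_unit(sound_data)
        return None
```

Either selection strategy (the per-frame first selection threshold or the preset second selection threshold) can be plugged in as `selection_unit` without changing the rest of the device.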
A person of ordinary skill in the art will appreciate that all or part of the steps in the methods of the above embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, and the storage medium may include ROM, RAM, a magnetic disk, an optical disc, or the like.
The method and system of the embodiments of the present invention have been described in detail above, but the present invention is not limited thereto. Any person skilled in the art can make various changes and modifications without departing from the spirit and scope of the present invention, and the scope of protection of the present invention shall therefore be as defined by the claims.

Claims (11)

1. A speech recognition method, characterized by including:
framing the acquired sound data to obtain at least two sound frames;
selecting, from the at least two sound frames, the sound frames that satisfy a selection condition;
calculating a speech recognition score for the sound frames that satisfy the selection condition;
when the calculated speech recognition score is greater than a preset score threshold, performing speech recognition on the acquired sound data.
2. The speech recognition method according to claim 1, characterized in that selecting the sound frames that satisfy the selection condition from the at least two sound frames includes:
calculating the posterior SNR of the current sound frame;
calculating, according to the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame;
calculating the first selection threshold of the current sound frame;
when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame, selecting the current sound frame.
3. The speech recognition method according to claim 2, characterized in that the posterior SNR of the current sound frame is calculated using the following formula:
$$\mathrm{SNR}_{post}(t) = \log\frac{E(t)}{E_{noise}(t)}$$
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
4. The speech recognition method according to claim 3, characterized in that the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is calculated using the following formula:
$$D(t) = \left|\log E(t) - \log E(t-1)\right| \times \mathrm{SNR}_{post}(t)$$
where $D(t)$ denotes the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame, $\log E(t)$ denotes the logarithmic energy of the current sound frame, and $\log E(t-1)$ denotes the logarithmic energy of the previous sound frame.
5. The speech recognition method according to claim 4, characterized in that the first selection threshold of the current sound frame is calculated using the following formula:
$$T(t) = D_a(t) \times f(\log E_{noise}(t))$$
where $T(t)$ denotes the first selection threshold of the current sound frame, $D_a(t)$ denotes the average posterior-SNR-weighted energy distance of the consecutive sound frames preceding the current sound frame, and $f(\log E_{noise}(t))$ is a sigmoid function.
6. The speech recognition method according to claim 1, characterized in that selecting, from the plurality of acquired sound frames, the sound frames that satisfy a preset selection condition includes:
calculating the posterior SNR of the current sound frame;
when the calculated posterior SNR is determined to be greater than a preset second selection threshold, selecting the current sound frame.
7. The speech recognition method according to claim 6, characterized in that the posterior SNR of the current sound frame is calculated using the following formula:
$$\mathrm{SNR}_{post}(t) = \log\frac{E(t)}{E_{noise}(t)}$$
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
8. The speech recognition method according to claim 2 or 7, characterized in that the speech recognition score of the sound frames that satisfy the selection condition is calculated using the following formula:
$$M_n = \frac{1}{n_- + n_+} \sum_{m=n_-}^{n_+} f(\alpha \times (n+m))$$
where $M_n$ denotes the calculated speech recognition score, $n$ denotes the index of the current sound frame, $n_-$ denotes the index of the starting sound frame among the selected sound frames, $n_+$ denotes the index of the ending sound frame among the selected sound frames, $\alpha$ denotes a preset adjustment parameter, $m$ denotes a positive integer that varies with the index of the selected sound frames, and $f(\alpha \times (n+m))$ denotes the moving-average prediction model.
9. A speech recognition device, characterized by comprising:
a framing unit, adapted to divide the acquired audio data into frames to obtain at least two sound frames;
a selection unit, adapted to select, from the at least two sound frames, the sound frames that meet the selection condition;
a calculation unit, adapted to calculate the speech recognition score of the sound frames that meet the selection condition;
a recognition unit, adapted to perform speech recognition on the acquired audio data when the calculated speech recognition score is greater than a preset score threshold.
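The four units of claim 9 can be read as a gated pipeline, sketched below. This is a minimal illustration under assumptions: non-overlapping frames are used because the claim does not specify the framing scheme, and the callables `is_selected`, `score` and `recognize` are hypothetical placeholders for the selection condition, the score calculation and the recognizer.

```python
from typing import Callable, List, Optional, Sequence

Frame = List[float]


def split_into_frames(samples: Sequence[float], frame_len: int) -> List[Frame]:
    """Framing unit: cut samples into non-overlapping frames, dropping a short tail."""
    if frame_len <= 0:
        raise ValueError("frame_len must be positive")
    return [
        list(samples[i:i + frame_len])
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]


def gated_recognition(samples: Sequence[float],
                      frame_len: int,
                      is_selected: Callable[[Frame], bool],
                      score: Callable[[List[Frame]], float],
                      score_threshold: float,
                      recognize: Callable[[Sequence[float]], str]) -> Optional[str]:
    """Run the recognizer only when the score of the selected frames passes the threshold."""
    frames = split_into_frames(samples, frame_len)
    if len(frames) < 2:  # the claim requires at least two sound frames
        return None
    selected = [f for f in frames if is_selected(f)]  # selection unit
    if not selected:
        return None
    if score(selected) > score_threshold:  # calculation unit and score gate
        return recognize(samples)  # recognition unit works on the acquired audio data
    return None
```

The gate is the point of the design: the comparatively expensive recognizer is skipped whenever the cheap score of the selected frames stays at or below the threshold.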
10. The speech recognition device according to claim 9, characterized in that the selection unit is adapted to: calculate the posterior signal-to-noise ratio of the current sound frame; calculate, from the posterior signal-to-noise ratio of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame; calculate the first selection threshold of the current sound frame; and select the current sound frame when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame.
11. The speech recognition device according to claim 9, characterized in that the selection unit is adapted to: calculate the posterior signal-to-noise ratio of the current sound frame; and select the current sound frame when the calculated posterior signal-to-noise ratio is determined to be greater than a preset second selection threshold.
CN201510271782.7A 2015-05-25 2015-05-25 Audio recognition method and device Active CN106297795B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201510271782.7A CN106297795B (en) 2015-05-25 2015-05-25 Audio recognition method and device
CN201910945249.2A CN110895930B (en) 2015-05-25 2015-05-25 Voice recognition method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510271782.7A CN106297795B (en) 2015-05-25 2015-05-25 Audio recognition method and device

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN201910945249.2A Division CN110895930B (en) 2015-05-25 2015-05-25 Voice recognition method and device

Publications (2)

Publication Number Publication Date
CN106297795A (en) 2017-01-04
CN106297795B (en) 2019-09-27

Family

ID=57634654

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910945249.2A Active CN110895930B (en) 2015-05-25 2015-05-25 Voice recognition method and device
CN201510271782.7A Active CN106297795B (en) 2015-05-25 2015-05-25 Audio recognition method and device

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910945249.2A Active CN110895930B (en) 2015-05-25 2015-05-25 Voice recognition method and device

Country Status (1)

Country Link
CN (2) CN110895930B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107702706A (en) * 2017-09-20 2018-02-16 广东欧珀移动通信有限公司 Determining method of path, device, storage medium and mobile terminal
CN107738622A (en) * 2017-08-29 2018-02-27 科大讯飞股份有限公司 Vehicular intelligent response method and device, storage medium, electronic equipment
CN112420079A (en) * 2020-11-18 2021-02-26 青岛海尔科技有限公司 Voice endpoint detection method and device, storage medium and electronic equipment
WO2023050301A1 (en) * 2021-09-30 2023-04-06 华为技术有限公司 Speech quality assessment method and apparatus, speech recognition quality prediction method and apparatus, and speech recognition quality improvement method and apparatus

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101122636A (en) * 2006-08-09 2008-02-13 富士通株式会社 Method of estimating sound arrival direction and apparatus of estimating sound arrival direction
US20080109219A1 (en) * 2003-10-16 2008-05-08 Yen-Shih Lin ADPCM encoding and decoding method and system with improved step size adaptation thereof
CN102270450A (en) * 2010-06-07 2011-12-07 株式会社曙飞电子 System and method of multi model adaptation and voice recognition
CN103730110A (en) * 2012-10-10 2014-04-16 北京百度网讯科技有限公司 Method and device for detecting voice endpoint

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6324509B1 (en) * 1999-02-08 2001-11-27 Qualcomm Incorporated Method and apparatus for accurate endpointing of speech in the presence of noise
CN100456356C (en) * 2004-11-12 2009-01-28 中国科学院声学研究所 Sound end detecting method for sound identifying system
CN101320559B (en) * 2007-06-07 2011-05-18 华为技术有限公司 Sound activation detection apparatus and method
JP2013508773A (en) * 2009-10-19 2013-03-07 テレフオンアクチーボラゲット エル エム エリクソン(パブル) Speech encoder method and voice activity detector

Also Published As

Publication number Publication date
CN110895930A (en) 2020-03-20
CN106297795B (en) 2019-09-27
CN110895930B (en) 2022-01-28

Similar Documents

Publication Publication Date Title
US8239194B1 (en) System and method for multi-channel multi-feature speech/noise classification for noise suppression
US20200066296A1 (en) Speech Enhancement And Noise Suppression Systems And Methods
CN103325386B (en) The method and system controlled for signal transmission
CN101010722B (en) Device and method of detection of voice activity in an audio signal
CN102750956B (en) Method and device for removing reverberation of single channel voice
CN103109320B (en) Noise suppression device
CN103440872A (en) Transient state noise removing method
CN110047470A (en) A kind of sound end detecting method
CN106157967A (en) Impulse noise mitigation
KR102012325B1 (en) Estimation of background noise in audio signals
CN106297795A (en) Audio recognition method and device
CN113766073A (en) Howling detection in a conferencing system
EP3118852B1 (en) Method and device for detecting audio signal
CN106024017A (en) Voice detection method and device
CN111223492A (en) Echo path delay estimation method and device
CN103295582A (en) Noise suppression method and system
US20240046947A1 (en) Speech signal enhancement method and apparatus, and electronic device
CN103871416B (en) Speech processing device and method of speech processing
CN103903629A (en) Noise estimation method and device based on hidden Markov model
CN106033669A (en) Voice identification method and apparatus thereof
CN112750461B (en) Voice communication optimization method and device, electronic equipment and readable storage medium
CN106920543B (en) Audio recognition method and device
JP4551817B2 (en) Noise level estimation method and apparatus
CN113160846B (en) Noise suppression method and electronic equipment
CN106816157A (en) Audio recognition method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant