CN106297795A - Audio recognition method and device - Google Patents
- Publication number
- CN106297795A (application number CN201510271782.7A)
- Authority
- CN
- China
- Prior art keywords
- frame
- current sound
- voiced
- sound frame
- calculate
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Landscapes
- Telephone Function (AREA)
- Telephonic Communication Services (AREA)
Abstract
A speech recognition method and device. The speech recognition method includes: dividing the acquired sound data into frames to obtain at least two sound frames; selecting, from the at least two sound frames, the sound frames that satisfy a selection condition; calculating a speech recognition score for the sound frames that satisfy the selection condition; and, when the calculated speech recognition score is greater than a preset score threshold, performing speech recognition on the acquired sound data. This scheme can save computing resources and increase the speed of speech recognition.
Description
Technical field
The invention belongs to the technical field of speech recognition, and in particular relates to a speech recognition method and device.
Background
A mobile terminal is a computing device that can be used while on the move. Broadly, the term covers mobile phones, notebooks, tablet computers, POS machines, in-vehicle computers and the like. With the rapid development of integrated circuit technology, mobile terminals have acquired powerful processing capabilities and have evolved from simple calling tools into integrated information processing platforms, which gives them broader room for development.
Using a mobile terminal usually requires a certain amount of the user's attention. Today's mobile terminals are equipped with touch screens, which the user must touch to perform the corresponding operations. However, when the user cannot touch the device, operating the mobile terminal becomes very inconvenient, for example while driving a vehicle or while carrying items in the hands.
Speech recognition methods and the Always Listening System make it possible to activate and operate a mobile terminal hands-free. When the Always Listening System detects a sound signal, the speech recognition system is activated and recognizes the detected sound signal; the mobile terminal then performs the corresponding operation according to the recognized signal. For example, when the user speaks "call XX's mobile phone", the mobile terminal recognizes the spoken "call XX's mobile phone" and, once it has been recognized correctly, retrieves XX's mobile phone number from the mobile terminal and dials it.
However, speech recognition methods in the prior art require a large amount of computation and recognize speech slowly.
Summary of the invention
The problem solved by the embodiments of the present invention is how to save the computing resources used by speech recognition and increase the speed of speech recognition.
To solve the above problem, an embodiment of the present invention provides a speech recognition method, which includes:
dividing the acquired sound data into frames to obtain at least two sound frames;
selecting, from the at least two sound frames, the sound frames that satisfy a selection condition;
calculating a speech recognition score for the sound frames that satisfy the selection condition;
when the calculated speech recognition score is greater than a preset score threshold, performing speech recognition on the acquired sound data.
Optionally, selecting the sound frames that satisfy the selection condition from the at least two sound frames includes:
calculating the posterior signal-to-noise ratio (posterior SNR) of the current sound frame;
calculating, from the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame;
calculating a first selection threshold for the current sound frame;
when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame, selecting the current sound frame.
Optionally, the posterior SNR of the current sound frame is calculated using the following formula:
[formula not reproduced in the source text]
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy-speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
Optionally, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is calculated using the following formula:
$D(t) = |\log E(t) - \log E(t-1)| \times \mathrm{SNR}_{post}(t)$
where $D(t)$ denotes the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame, $\log E(t)$ denotes the logarithmic energy of the current sound frame, and $\log E(t-1)$ denotes the logarithmic energy of the previous sound frame.
Optionally, the first selection threshold of the current sound frame is calculated using the following formula:
$T(t) = D_a(t) \times f(\log E_{noise}(t))$
where $T(t)$ denotes the first selection threshold of the current sound frame, $D_a(t)$ denotes the average posterior-SNR-weighted energy distance of the consecutive sound frames before the current sound frame, and $f(\log E_{noise}(t))$ is a sigmoid function.
Optionally, selecting the sound frames that satisfy the preset selection condition from the obtained sound frames includes:
calculating the posterior SNR of the current sound frame;
when the calculated posterior SNR is determined to be greater than a preset second selection threshold, selecting the current sound frame.
Optionally, the posterior SNR of the current sound frame is calculated using the following formula:
[formula not reproduced in the source text]
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy-speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
Optionally, the speech recognition score of the sound frames that satisfy the selection condition is calculated using the following formula:
[formula not reproduced in the source text]
An embodiment of the present invention further provides a speech recognition device, which includes:
a framing unit, adapted to divide the acquired sound data into frames to obtain at least two sound frames;
a selection unit, adapted to select, from the at least two sound frames, the sound frames that satisfy a selection condition;
a computing unit, adapted to calculate a speech recognition score for the sound frames that satisfy the selection condition;
a recognition unit, adapted to perform speech recognition on the acquired sound data when the calculated speech recognition score is greater than a preset score threshold.
Optionally, the selection unit is adapted to: calculate the posterior SNR of the current sound frame; calculate, from the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame; calculate the first selection threshold of the current sound frame; and select the current sound frame when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame.
Optionally, the selection unit is adapted to: calculate the posterior SNR of the current sound frame; and select the current sound frame when the calculated posterior SNR is determined to be greater than the preset second selection threshold.
Compared with the prior art, the technical scheme of the present invention has the following advantages:
By selecting from the sound data to be recognized only the sound frames that satisfy a preset condition and performing speech recognition on them, non-speech frames that contain no speech information can be excluded, and speech recognition processing is applied only to the selected sound frames. Computing resources are therefore saved, the speed of speech recognition increases, and the user experience improves.
Further, the posterior-SNR-weighted energy distance between the current sound frame and the previous sound frame is calculated from the posterior SNR of the current sound frame, and this distance is compared with the calculated first selection threshold of the current sound frame. Compared with using only the posterior SNR of the current sound frame, this excludes more of the non-speech sound frames that contain no speech information, which further saves computing resources and increases the speed of speech recognition.
Further, comparing only the calculated posterior SNR of the current sound frame with the preset second selection threshold also excludes more of the sound frames that contain no speech information and saves computing resources, and can therefore further increase the speed of speech recognition.
Brief description of the drawings
Fig. 1 is a flowchart of a speech recognition method in an embodiment of the present invention;
Fig. 2 is a flowchart of another speech recognition method in an embodiment of the present invention;
Fig. 3 is a flowchart of yet another speech recognition method in an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a speech recognition device in an embodiment of the present invention.
Detailed description
Speech recognition methods in the prior art usually divide the sound data to be recognized at a fixed frame rate (Fixed Frame Rate, FFR) and perform speech recognition processing on all of the resulting sound frames. Some of these sound frames contain no speech information. Performing speech recognition processing on such non-speech frames is not only meaningless for recognition, it also wastes computing resources and slows recognition down.
To solve the above problems in the prior art, the technical scheme of the embodiments of the present invention selects from the sound data to be recognized only the sound frames that satisfy a preset condition and performs speech recognition on them, which saves computing resources, increases the speed of speech recognition, and improves the user experience.
To make the above objects, features and advantages of the present invention more apparent, specific embodiments of the present invention are described in detail below with reference to the drawings.
Fig. 1 shows a flowchart of a speech recognition method in an embodiment of the present invention. The speech recognition method shown in Fig. 1 may include:
Step S101: divide the acquired sound data into frames to obtain at least two sound frames.
In a specific implementation, a microphone can be used to capture the input sound signal in real time. Once sound has been captured, the input sound signal is converted into the corresponding sound data by appropriate processing. The converted sound data can then be divided into frames to obtain at least two sound frames.
Step S102: select, from the at least two sound frames, the sound frames that satisfy a selection condition.
Existing speech recognition methods usually perform speech recognition processing on every one of the sound frames obtained by dividing the sound data. However, not every sound frame contains speech information, and processing frames that contain none wastes resources and slows recognition down. In the embodiments of the present invention, therefore, a subset of the sound frames is first selected from the at least two sound frames obtained by the division, and the frames that contain no speech data are excluded. This saves resources and increases the speed of speech recognition.
Step S103: calculate a speech recognition score for the sound frames that satisfy the selection condition.
In a specific implementation, the selection condition can be set according to actual needs.
Step S104: when the calculated speech recognition score is greater than a preset score threshold, perform speech recognition on the acquired sound data.
In a specific implementation, when the speech recognition score calculated from the selected sound frames is greater than the preset score threshold, it can be determined that the acquired sound data contains the user's speech, and speech recognition can be performed on the acquired sound data. Otherwise, no speech recognition is needed. The speech recognition score can be configured according to actual needs.
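The four steps of Fig. 1 amount to a gate placed in front of the recognizer. The sketch below illustrates only that control flow. The names `should_recognize`, `select` and `score` are hypothetical, and the per-frame predicate and scoring function are placeholders supplied by the caller, not the patent's own selection condition or score formula.

```python
from typing import Callable, List, Sequence


def should_recognize(
    frames: Sequence[float],
    select: Callable[[float], bool],
    score: Callable[[List[float]], float],
    score_threshold: float,
) -> bool:
    """Return True when full speech recognition should be run.

    `frames` stands in for per-frame features (here: one number per frame).
    `select` and `score` are caller-supplied placeholders (assumptions).
    """
    # Step S102: keep only the frames that satisfy the selection condition.
    selected = [f for f in frames if select(f)]
    if not selected:
        # Nothing was selected, so there is nothing to score.
        return False
    # Steps S103 and S104: score the selected frames and compare
    # the score with the preset score threshold.
    return score(selected) > score_threshold


def mean(values: List[float]) -> float:
    return sum(values) / len(values)


if __name__ == "__main__":
    energies = [0.1, 2.0, 3.0]
    print(should_recognize(energies, lambda e: e > 1.0, mean, 2.0))
```

Keeping the selection and scoring functions as parameters lets the same gate be reused with either of the selection conditions described for Fig. 2 and Fig. 3.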
Fig. 2 shows a flowchart of another speech recognition method in an embodiment of the present invention. The speech recognition method shown in Fig. 2 may include:
Step S201: divide the acquired sound data into frames to obtain at least two sound frames.
Step S202: traverse the at least two sound frames.
Step S203: calculate the posterior SNR of the current sound frame.
In a specific implementation, to decide which sound frames to select, the at least two sound frames can be traversed and the corresponding posterior signal-to-noise ratio (post SNR) calculated for each sound frame using the following formula:
[formula (1) is not reproduced in the source text]
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
Step S204: calculate, from the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame.
In an embodiment of the present invention, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is calculated using the following formula:
$D(t) = |\log E(t) - \log E(t-1)| \times \mathrm{SNR}_{post}(t)$ (2)
where $D(t)$ denotes the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame, $\log E(t)$ denotes the logarithmic energy of the current sound frame, and $\log E(t-1)$ denotes the logarithmic energy of the previous sound frame.
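Formula (2) translates directly into code. The sketch below uses a hypothetical function name and takes the posterior SNR as an input, so it does not depend on how formula (1) is defined; the check for positive energies is an added safeguard, not part of the source.

```python
import math


def weighted_energy_distance(e_curr: float, e_prev: float, snr_post: float) -> float:
    """D(t) = |log E(t) - log E(t-1)| * SNR_post(t), as in formula (2)."""
    if e_curr <= 0.0 or e_prev <= 0.0:
        # The logarithm is undefined for non-positive energies.
        raise ValueError("frame energies must be positive")
    return abs(math.log(e_curr) - math.log(e_prev)) * snr_post


if __name__ == "__main__":
    # Energy rises by a factor of e between frames, with SNR_post(t) = 2.
    print(weighted_energy_distance(math.e ** 2, math.e, 2.0))
```

The distance is large only when the energy changes sharply between adjacent frames and the current frame also stands out from the noise, which is what makes it usable as a selection criterion.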
Step S205: calculate the first selection threshold of the current sound frame.
In an embodiment of the present invention, a corresponding first selection threshold is calculated for every sound frame obtained by dividing the acquired sound data. Specifically, the first selection threshold of each sound frame can be calculated using the following formula:
$T(t) = D_a(t) \times f(\log E_{noise}(t))$ (3)
where $T(t)$ denotes the first selection threshold of the current sound frame, $D_a(t)$ denotes the average of the posterior-SNR-weighted energy distances between consecutive sound frames, up to and including the current sound frame, and $f(\log E_{noise}(t))$ is a sigmoid function.
It should be noted that $D_a(t)$ is not a constant; it changes from one sound frame to the next.
Take as an example acquired sound data that is divided into three sound frames: a first, a second and a third sound frame. $D(1)$ denotes the posterior-SNR-weighted energy distance between the first sound frame and its previous sound frame (that is, the product of the logarithmic energy of the first sound frame and the posterior SNR of the first sound frame), $D(2)$ denotes the posterior-SNR-weighted energy distance between the second and first sound frames, and $D(3)$ denotes the posterior-SNR-weighted energy distance between the third and second sound frames. When formula (3) is used to calculate the first selection threshold of the first sound frame, $D_a(1)$ equals $D(1)$; for the second sound frame, $D_a(2)$ is the mean of $D(1)$ and $D(2)$; for the third sound frame, $D_a(3)$ is the mean of $D(1)$, $D(2)$ and $D(3)$. $D_a(t)$ is thus updated with every sound frame.
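The running average in the three-frame example, combined with formula (3), can be sketched as follows. The source only says that f is a sigmoid function, so the plain logistic function with the `slope` and `midpoint` parameters used here is an assumption, and the function names are hypothetical.

```python
import math
from typing import List, Sequence


def sigmoid(x: float, slope: float = 1.0, midpoint: float = 0.0) -> float:
    """Logistic function. The slope and midpoint are assumed parameters."""
    return 1.0 / (1.0 + math.exp(-slope * (x - midpoint)))


def first_selection_thresholds(
    distances: Sequence[float], log_noise_energies: Sequence[float]
) -> List[float]:
    """T(t) = D_a(t) * f(log E_noise(t)) for every frame, as in formula (3).

    D_a(t) is the mean of D(1)..D(t), matching the three-frame example.
    """
    thresholds = []
    running_sum = 0.0
    pairs = zip(distances, log_noise_energies)
    for t, (d, log_en) in enumerate(pairs, start=1):
        running_sum += d
        d_avg = running_sum / t  # D_a(t), updated with every frame
        thresholds.append(d_avg * sigmoid(log_en))
    return thresholds


if __name__ == "__main__":
    # D = 1, 2, 3 gives D_a = 1, 1.5, 2; log E_noise = 0 gives f = 0.5.
    print(first_selection_thresholds([1.0, 2.0, 3.0], [0.0, 0.0, 0.0]))
```

Keeping a running sum avoids recomputing the mean from scratch for each frame, so the threshold costs constant work per frame.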
Step S206: compare the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame with the first selection threshold of the current sound frame.
Step S207: when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is determined to be greater than the first selection threshold of the current sound frame, select the current sound frame.
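Steps S206 and S207 reduce to a per-frame comparison. A minimal sketch, assuming the distances D(t) and thresholds T(t) have already been computed and using a hypothetical function name:

```python
from typing import List, Sequence


def select_frames(distances: Sequence[float], thresholds: Sequence[float]) -> List[int]:
    """Return the (0-based) indices of frames whose D(t) exceeds T(t)."""
    if len(distances) != len(thresholds):
        raise ValueError("need one threshold per frame")
    return [t for t, (d, thr) in enumerate(zip(distances, thresholds)) if d > thr]


if __name__ == "__main__":
    # Only the second frame has a distance above its own threshold.
    print(select_frames([0.2, 1.5, 0.9], [0.5, 0.75, 1.0]))
```

Because the threshold is computed per frame, the comparison adapts as the average distance and the noise level change over the recording.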
Step S208: calculate a speech recognition score for the sound frames that satisfy the selection condition.
In an embodiment of the present invention, the moving average method can be used to calculate the speech recognition score of the sound frames that satisfy the selection condition, specifically using the following formula:
[formula (4) is not reproduced in the source text]
where $M_n$ denotes the calculated speech recognition score, $n$ denotes the index of the middle sound frame among the selected sound frames, $n^-$ denotes the index of the first sound frame among the selected sound frames, $n^+$ denotes the index of the last sound frame among the selected sound frames, $\alpha$ denotes a preset adjustment parameter, $m$ denotes a positive integer that varies with the index of the selected sound frame, and $f(\alpha \times (n+m))$ denotes the moving average prediction model.
When formula (4) above is used to calculate the speech recognition score of the sound frames that satisfy the selection condition, $M_n$ is computed with a frame shift of 10 ms and serves as a measure of the average number of sound frames in the moving-average window.
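Formula (4) is not available in this text, so the sketch below does not reproduce it. It only illustrates the general idea of a moving average used as a score: a plain centered moving average over per-frame selection flags (1 for a selected frame, 0 otherwise). The window shape, the flag representation and the function name are all assumptions.

```python
from typing import Sequence


def moving_average_score(
    selected_flags: Sequence[int], center: int, half_window: int
) -> float:
    """Fraction of selected frames in a window centered on frame `center`.

    ASSUMPTION: a simple unweighted centered window, clipped at the edges.
    This is not the patent's formula (4), which is missing from the source.
    """
    if not 0 <= center < len(selected_flags):
        raise IndexError("center must be a valid frame index")
    lo = max(0, center - half_window)
    hi = min(len(selected_flags), center + half_window + 1)
    window = selected_flags[lo:hi]
    return sum(window) / len(window)


if __name__ == "__main__":
    flags = [0, 1, 1, 1, 0]
    print(moving_average_score(flags, center=2, half_window=1))
```

A score of this kind rises when selected frames cluster together, which is the behaviour a score threshold can then test for.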
Step S209: when the calculated speech recognition score is greater than a preset score threshold, perform speech recognition on the acquired sound data.
In a specific implementation, when the calculated speech recognition score is greater than the preset score threshold, it is determined that the acquired sound data contains speech information, and speech recognition can be performed on the acquired sound data.
In a specific implementation, once the speech information in the acquired sound data has been recognized, the mobile terminal can perform the corresponding operation. For example, when the speech information recognized by the mobile terminal is "open FACEBOOK", the mobile terminal opens FACEBOOK for the user.
In a specific implementation, to further exclude the sound frames that contain no speech data, the selection can instead be decided solely by comparing the posterior SNR of each sound frame with a preset second selection threshold. This not only saves computing resources but also further increases the speed of speech recognition. See Fig. 3 for details.
Fig. 3 shows a flowchart of yet another speech recognition method in an embodiment of the present invention. The speech recognition method shown in Fig. 3 may include:
Step S301: divide the acquired sound data into frames to obtain at least two sound frames.
In an embodiment of the present invention, to make the sound frames easier to analyze, each of the at least two sound frames obtained by dividing the acquired sound data is 25 ms long, and the frame shift between two adjacent sound frames is 1 ms.
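The framing in step S301 can be sketched as a sliding window over the samples. The 25 ms length and 1 ms shift come from the text; the 16 kHz sample rate in the usage example, the function name, and the choice to drop a trailing partial frame are assumptions.

```python
from typing import List, Sequence


def split_into_frames(
    samples: Sequence[float],
    sample_rate: int,
    frame_ms: float = 25.0,
    shift_ms: float = 1.0,
) -> List[Sequence[float]]:
    """Split samples into overlapping frames of frame_ms, advancing by shift_ms.

    A trailing partial frame is dropped (an assumption; the source is silent).
    """
    frame_len = int(round(sample_rate * frame_ms / 1000.0))
    shift = int(round(sample_rate * shift_ms / 1000.0))
    if frame_len <= 0 or shift <= 0:
        raise ValueError("frame length and shift must be at least one sample")
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, shift)]


if __name__ == "__main__":
    # At an assumed 16 kHz, 25 ms is 400 samples and 1 ms is 16 samples.
    frames = split_into_frames(list(range(1000)), sample_rate=16000)
    print(len(frames), len(frames[0]))
```

With a shift this small, adjacent frames overlap almost entirely, so the number of frames is close to the number of milliseconds of audio, which is part of why discarding uninformative frames before recognition saves work.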
Step S302: traverse the at least two sound frames obtained, and calculate the posterior SNR of the current sound frame.
In the embodiments of the present invention, the posterior SNR calculated with formula (1) above can be used directly in the subsequent steps to decide whether to select the current sound frame.
It should be noted that, compared with calculating the prior SNR (priori SNR), using the posterior SNR of a sound frame to decide whether to select it is more direct and clear-cut: calculating the prior SNR of each sound frame requires an estimate of the clean-speech energy in the current sound frame, and estimating the clean-speech energy in a sound frame is quite difficult.
Step S303: compare the posterior SNR of the current sound frame with a preset second selection threshold.
In a specific implementation, the second selection threshold can be set according to actual needs.
Step S304: when the posterior SNR of the current frame is determined to be greater than the preset second selection threshold, select the current sound frame.
In a specific implementation, a posterior SNR greater than the second selection threshold indicates that the current frame may contain speech information, so the current frame is selected. Otherwise, the current frame is discarded and the check moves on to the next sound frame.
Step S305: calculate a speech recognition score for the sound frames that satisfy the selection condition.
Step S306: when the calculated speech recognition score is greater than a preset score threshold, perform speech recognition on the acquired sound data.
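Steps S303 and S304 compare one number per frame against a fixed threshold. A minimal sketch, assuming the posterior SNR of every frame has already been computed; the function name and the example threshold value are hypothetical.

```python
from typing import List, Sequence


def select_by_posterior_snr(
    snr_values: Sequence[float], second_threshold: float
) -> List[int]:
    """Return the (0-based) indices of frames whose posterior SNR exceeds
    the preset second selection threshold; all other frames are discarded."""
    return [t for t, snr in enumerate(snr_values) if snr > second_threshold]


if __name__ == "__main__":
    # With a threshold of 1.0, the second and fourth frames are kept.
    print(select_by_posterior_snr([0.1, 2.5, 0.9, 3.0], 1.0))
```

Unlike the Fig. 2 variant, this needs no per-frame distance or adaptive threshold, so the selection costs a single comparison per frame.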
Fig. 4 shows a speech recognition device provided by an embodiment of the present invention. The speech recognition device shown in Fig. 4 may include a framing unit 401, a selection unit 402, a computing unit 403 and a recognition unit 404, where:
The framing unit 401 is adapted to divide the acquired sound data into frames to obtain at least two sound frames.
The selection unit 402 is adapted to select, from the at least two sound frames, the sound frames that satisfy a selection condition. In an embodiment of the present invention, the selection unit 402 is adapted to calculate the posterior SNR of the current sound frame and to select the current sound frame when the calculated posterior SNR is determined to be greater than the preset second selection threshold. In another embodiment of the present invention, the selection unit 402 is adapted to: calculate the posterior SNR of the current sound frame; calculate, from the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame; calculate the first selection threshold of the current sound frame; and select the current sound frame when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame.
The computing unit 403 is adapted to calculate a speech recognition score for the sound frames that satisfy the selection condition.
The recognition unit 404 is adapted to perform speech recognition on the acquired sound data when the calculated speech recognition score is greater than a preset score threshold.
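The four units of Fig. 4 can be pictured as pluggable components wired together in sequence. The sketch below is an illustration only: the class name, field names and the `process` method are hypothetical, and each unit is represented by a caller-supplied function rather than by the patent's own algorithms.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence

Frame = Sequence[float]


@dataclass
class SpeechRecognitionDevice:
    """Hypothetical wiring of units 401 to 404; every unit is a placeholder."""

    framing_unit: Callable[[Sequence[float]], List[Frame]]   # unit 401
    selection_unit: Callable[[List[Frame]], List[Frame]]     # unit 402
    computing_unit: Callable[[List[Frame]], float]           # unit 403
    recognition_unit: Callable[[Sequence[float]], str]       # unit 404
    score_threshold: float

    def process(self, sound_data: Sequence[float]) -> Optional[str]:
        frames = self.framing_unit(sound_data)
        selected = self.selection_unit(frames)
        if not selected:
            return None  # no frame satisfied the selection condition
        if self.computing_unit(selected) > self.score_threshold:
            # Recognition runs on the acquired sound data as a whole.
            return self.recognition_unit(sound_data)
        return None


def make_demo_device(score_threshold: float) -> SpeechRecognitionDevice:
    """Build a toy device with trivial stand-in units for demonstration."""
    return SpeechRecognitionDevice(
        framing_unit=lambda d: [d[i:i + 2] for i in range(0, len(d), 2)],
        selection_unit=lambda fs: [f for f in fs if sum(f) > 1],
        computing_unit=lambda fs: float(len(fs)),
        recognition_unit=lambda d: "recognized",
        score_threshold=score_threshold,
    )


if __name__ == "__main__":
    print(make_demo_device(1.0).process([0, 0, 1, 1, 1, 1]))
```

Swapping the function passed as `selection_unit` is enough to switch between the two selection embodiments described for unit 402.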
A person of ordinary skill in the art will understand that all or some of the steps of the methods in the above embodiments can be carried out by a program instructing the relevant hardware. The program can be stored in a computer-readable storage medium, which may include ROM, RAM, a magnetic disk, an optical disc and the like.
The method and system of the embodiments of the present invention have been described in detail above, but the present invention is not limited thereto. Any person skilled in the art may make various changes and modifications without departing from the spirit and scope of the present invention, so the scope of protection of the present invention shall be as defined by the claims.
Claims (11)
1. A speech recognition method, characterized by including:
dividing the acquired sound data into frames to obtain at least two sound frames;
selecting, from the at least two sound frames, the sound frames that satisfy a selection condition;
calculating a speech recognition score for the sound frames that satisfy the selection condition;
when the calculated speech recognition score is greater than a preset score threshold, performing speech recognition on the acquired sound data.
2. The speech recognition method according to claim 1, characterized in that selecting the sound frames that satisfy the selection condition from the at least two sound frames includes:
calculating the posterior SNR of the current sound frame;
calculating, from the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame;
calculating a first selection threshold for the current sound frame;
when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame, selecting the current sound frame.
3. The speech recognition method according to claim 2, characterized in that the posterior SNR of the current sound frame is calculated using the following formula:
[formula not reproduced in the source text]
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy-speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
4. The speech recognition method according to claim 3, characterized in that the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is calculated using the following formula:
$D(t) = |\log E(t) - \log E(t-1)| \times \mathrm{SNR}_{post}(t)$
where $D(t)$ denotes the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame, $\log E(t)$ denotes the logarithmic energy of the current sound frame, and $\log E(t-1)$ denotes the logarithmic energy of the previous sound frame.
5. The speech recognition method according to claim 4, characterized in that the first selection threshold of the current sound frame is calculated using the following formula:
$T(t) = D_a(t) \times f(\log E_{noise}(t))$
where $T(t)$ denotes the first selection threshold of the current sound frame, $D_a(t)$ denotes the average posterior-SNR-weighted energy distance of the consecutive sound frames before the current sound frame, and $f(\log E_{noise}(t))$ is a sigmoid function.
6. The speech recognition method according to claim 1, characterized in that selecting the sound frames that satisfy the preset selection condition from the obtained sound frames includes:
calculating the posterior SNR of the current sound frame;
when the calculated posterior SNR is determined to be greater than a preset second selection threshold, selecting the current sound frame.
7. The speech recognition method according to claim 6, characterized in that the posterior SNR of the current sound frame is calculated using the following formula:
[formula not reproduced in the source text]
where $\mathrm{SNR}_{post}(t)$ denotes the posterior SNR of the current sound frame, $t$ denotes the index of the current sound frame, $E(t)$ denotes the noisy-speech energy of the current sound frame, and $E_{noise}(t)$ denotes the noise energy of the current sound frame.
8. The speech recognition method according to claim 2 or 7, characterized in that the speech recognition score of the sound frames that satisfy the selection condition is calculated using the following formula:
[formula not reproduced in the source text]
where $M_n$ denotes the calculated speech recognition score, $n$ denotes the index of the current sound frame, $n^-$ denotes the index of the first sound frame among the selected sound frames, $n^+$ denotes the index of the last sound frame among the selected sound frames, $\alpha$ denotes a preset adjustment parameter, $m$ denotes a positive integer that varies with the index of the selected sound frame, and $f(\alpha \times (n+m))$ denotes the moving average prediction model.
9. A speech recognition device, characterized by including:
a framing unit, adapted to divide the acquired sound data into frames to obtain at least two sound frames;
a selection unit, adapted to select, from the at least two sound frames, the sound frames that satisfy a selection condition;
a computing unit, adapted to calculate a speech recognition score for the sound frames that satisfy the selection condition;
a recognition unit, adapted to perform speech recognition on the acquired sound data when the calculated speech recognition score is greater than a preset score threshold.
10. The speech recognition device according to claim 9, characterized in that the selection unit is adapted to: calculate the posterior SNR of the current sound frame; calculate, from the posterior SNR of the current sound frame, the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame; calculate the first selection threshold of the current sound frame; and select the current sound frame when the posterior-SNR-weighted energy distance between the previous sound frame and the current sound frame is greater than the first selection threshold of the current sound frame.
11. The speech recognition device according to claim 9, characterized in that the selection unit is adapted to: calculate the posterior SNR of the current sound frame; and select the current sound frame when the calculated posterior SNR is determined to be greater than a preset second selection threshold.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510271782.7A CN106297795B (en) | 2015-05-25 | 2015-05-25 | Audio recognition method and device |
CN201910945249.2A CN110895930B (en) | 2015-05-25 | 2015-05-25 | Voice recognition method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510271782.7A CN106297795B (en) | 2015-05-25 | 2015-05-25 | Audio recognition method and device |
Related Child Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910945249.2A Division CN110895930B (en) | 2015-05-25 | 2015-05-25 | Voice recognition method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106297795A (en) | 2017-01-04 |
CN106297795B (en) | 2019-09-27 |
Family
ID=57634654
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910945249.2A Active CN110895930B (en) | 2015-05-25 | 2015-05-25 | Voice recognition method and device |
CN201510271782.7A Active CN106297795B (en) | 2015-05-25 | 2015-05-25 | Audio recognition method and device |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910945249.2A Active CN110895930B (en) | 2015-05-25 | 2015-05-25 | Voice recognition method and device |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN110895930B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107702706A (en) * | 2017-09-20 | 2018-02-16 | 广东欧珀移动通信有限公司 | Determining method of path, device, storage medium and mobile terminal |
CN107738622A (en) * | 2017-08-29 | 2018-02-27 | 科大讯飞股份有限公司 | Vehicular intelligent response method and device, storage medium, electronic equipment |
CN112420079A (en) * | 2020-11-18 | 2021-02-26 | 青岛海尔科技有限公司 | Voice endpoint detection method and device, storage medium and electronic equipment |
WO2023050301A1 (en) * | 2021-09-30 | 2023-04-06 | 华为技术有限公司 | Speech quality assessment method and apparatus, speech recognition quality prediction method and apparatus, and speech recognition quality improvement method and apparatus |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101122636A (en) * | 2006-08-09 | 2008-02-13 | 富士通株式会社 | Method of estimating sound arrival direction and apparatus of estimating sound arrival direction |
US20080109219A1 (en) * | 2003-10-16 | 2008-05-08 | Yen-Shih Lin | ADPCM encoding and decoding method and system with improved step size adaptation thereof |
CN102270450A (en) * | 2010-06-07 | 2011-12-07 | 株式会社曙飞电子 | System and method of multi model adaptation and voice recognition |
CN103730110A (en) * | 2012-10-10 | 2014-04-16 | 北京百度网讯科技有限公司 | Method and device for detecting voice endpoint |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6324509B1 (en) * | 1999-02-08 | 2001-11-27 | Qualcomm Incorporated | Method and apparatus for accurate endpointing of speech in the presence of noise |
CN100456356C (en) * | 2004-11-12 | 2009-01-28 | 中国科学院声学研究所 | Sound end detecting method for sound identifying system |
CN101320559B (en) * | 2007-06-07 | 2011-05-18 | 华为技术有限公司 | Sound activation detection apparatus and method |
JP2013508773A (en) * | 2009-10-19 | 2013-03-07 | テレフオンアクチーボラゲット エル エム エリクソン(パブル) | Speech encoder method and voice activity detector |
Also Published As
Publication number | Publication date |
---|---|
CN110895930A (en) | 2020-03-20 |
CN106297795B (en) | 2019-09-27 |
CN110895930B (en) | 2022-01-28 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US8239194B1 (en) | System and method for multi-channel multi-feature speech/noise classification for noise suppression | |
US20200066296A1 (en) | Speech Enhancement And Noise Suppression Systems And Methods | |
CN103325386B (en) | The method and system controlled for signal transmission | |
CN101010722B (en) | Device and method of detection of voice activity in an audio signal | |
CN102750956B (en) | Method and device for removing reverberation of single channel voice | |
CN103109320B (en) | Noise suppression device | |
CN103440872A (en) | Transient state noise removing method | |
CN110047470A (en) | Voice endpoint detection method | |
CN106157967A (en) | Impulse noise mitigation | |
KR102012325B1 (en) | Estimation of background noise in audio signals | |
CN106297795A (en) | Audio recognition method and device | |
CN113766073A (en) | Howling detection in a conferencing system | |
EP3118852B1 (en) | Method and device for detecting audio signal | |
CN106024017A (en) | Voice detection method and device | |
CN111223492A (en) | Echo path delay estimation method and device | |
CN103295582A (en) | Noise suppression method and system | |
US20240046947A1 (en) | Speech signal enhancement method and apparatus, and electronic device | |
CN103871416B (en) | Speech processing device and method of speech processing | |
CN103903629A (en) | Noise estimation method and device based on hidden Markov model | |
CN106033669A (en) | Voice identification method and apparatus thereof | |
CN112750461B (en) | Voice communication optimization method and device, electronic equipment and readable storage medium | |
CN106920543B (en) | Audio recognition method and device | |
JP4551817B2 (en) | Noise level estimation method and apparatus | |
CN113160846B (en) | Noise suppression method and electronic equipment | |
CN106816157A (en) | Audio recognition method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||