CN107564527A - The method for recognizing Chinese-English bilingual voice of embedded system
- Publication number: CN107564527A
- Application number: CN201710793500.9A
- Authority: CN (China)
- Prior art keywords: model, Chinese, English, voice, speech recognition
- Legal status: Withdrawn (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Abstract
The invention belongs to the technical field of speech recognition and relates in particular to a Chinese-English bilingual speech recognition method for embedded systems. The method comprises A/D sampling, pre-emphasis of the sampled speech to boost the energy of the high-frequency components, windowing and framing, extraction of speech feature parameters, and matching of voice commands against a pre-established acoustic model. The acoustic model is built by establishing an initial Chinese-English bilingual speech recognition model and then adjusting that initial model by non-native model fusion. The voice-command matching is specifically the recognition of Chinese-English bilingual voice commands. The invention overcomes the limitation of existing speech recognition systems, which can recognize only a single language.
Description
Technical field
The invention belongs to the technical field of speech recognition and relates in particular to a Chinese-English bilingual speech recognition method for embedded systems.
Background art
In recent years, dedicated speech recognition chips (voice ASICs) have developed rapidly abroad. A number of foreign speech technology and semiconductor companies have invested substantial manpower and resources in developing such chips and have patented speech recognition algorithms for their own national languages. The recognition performance of these dedicated (system-on-chip) devices varies. The usual speech recognition process is shown in Fig. 1: the input speech signal is first A/D sampled, then spectrally shaped by pre-emphasis to boost its high-frequency components and windowed; feature parameters, Mel-frequency cepstral coefficients (MFCC), are extracted in real time; and speech recognition templates are trained and matched. To make the chip more robust in noisy environments, speech enhancement can also be applied. A dedicated chip generally comprises an 8-bit or 16-bit MCU or a 16-bit DSP, together with the peripherals coupled to it: automatic gain control (AGC), an audio preamplifier, a low-pass filter, an analog-to-digital (A/D) converter, a digital-to-analog (D/A) converter, an audio power amplifier, and read-only memory (ROM). These dedicated speech recognition chips are already used in intelligent voice toys and mobile communication terminals.
However, existing high-performance, medium-vocabulary speech recognition chips can recognize only one language: the recognition task must consist of spoken commands in a single language such as Chinese, English, or Japanese, and bilingual commands (for example, mixed Chinese and English) are not supported.
As internationalization deepens in the economy, politics, culture, and science, bilingual usage is increasingly common in everyday life, for example in mixed Chinese-English names. Speech recognition systems built for a single language such as Chinese or English therefore increasingly fail to meet current needs. Chinese and English in particular are the most widely spoken and most widely used languages in the world, so it is very important to build a system that can recognize mixed Chinese and English and to implement it on portable devices such as dedicated chip systems.
Summary of the invention
The object of the present invention is to overcome the inability of existing chip systems to recognize more than one language by proposing a Chinese-English bilingual speech recognition method for embedded systems. The method combines embedded Chinese-English bilingual speech recognition based on unified phoneme modeling with an embedded speech enhancement method.
The technical solution is a Chinese-English bilingual speech recognition method for an embedded system, comprising A/D sampling, pre-emphasis of the sampled speech to boost the energy of the high-frequency components, windowing and framing, extraction of speech feature parameters, and matching of voice commands against a pre-established acoustic model. The method is characterized in that the acoustic model is built by establishing an initial Chinese-English bilingual speech recognition model and then adjusting that initial model by non-native model fusion, and in that the voice-command matching is specifically the recognition of Chinese-English bilingual voice commands.
Establishing the initial Chinese-English bilingual speech recognition model comprises revising the Chinese speech recognition model, revising the English speech recognition model, merging the revised Chinese and English speech recognition models, and training the merged Chinese and English speech recognition model.
The non-native model fusion adjustment of the initial model fuses the native model and the non-native model by a selective model merging method, and then applies minimum phone error (MPE) discriminative training to the fused initial model to obtain the Chinese-English bilingual speech recognition model.
Chinese-English bilingual voice commands are recognized by extracting recognition features from the input speech signal, computing the Gaussian scores of the Chinese-English bilingual speech recognition model, performing template matching against the Chinese-English bilingual entries, and taking the entry with the highest matching score as the recognition result.
The method further comprises a speech enhancement step.
Merging the revised Chinese and English speech recognition models specifically comprises computing the pairwise distances between Chinese and English phonemes with a model distance measure based on state-time alignment, and then merging the pair of phonemes with the smallest distance.
The merged Chinese and English speech recognition model is trained with the maximum likelihood estimation criterion and the expectation-maximization iterative estimation algorithm, which yields the initial Chinese-English bilingual speech recognition model.
Training of the merged Chinese and English speech recognition model is carried out on a PC.
Fusing the native model and the non-native model by the selective model merging method comprises the following steps:
(11) train a native model M1 on a database of purely native speech;
(12) adapt model M1 by maximum likelihood linear regression (MLLR) using a small non-native database, obtaining model M2;
(13) following the selective model merging strategy, fuse four models by linear interpolation: the model Sb of a native phoneme λi in the initial Chinese-English bilingual speech recognition model; the native model Sne of phoneme λi in model M1; the adapted model Sa of λi in model M2; and the adapted model γm of the confusable phoneme γj that corresponds to λi in the pronunciation dictionary obtained by the non-native confusable-phoneme variation method. The result is the adjusted model Sf of phoneme λi. The interpolation formula is:
$$p(S_f) = \lambda_1\, p(S_b) + \lambda_2\, p(S_{ne}) + \lambda_3\, p(S_a) + \lambda_4\, p(\gamma_m)$$
where λ1, λ2, λ3 and λ4 are the interpolation factors of the corresponding models.
The minimum phone error discriminative training of the fused initial Chinese-English bilingual speech recognition model comprises: using a speech recognizer to obtain word lattices for the training speech; training Chinese and English language models from the original word-level transcriptions in the speech training corpus; and running the forward-backward algorithm over the resulting word lattices to update the model parameters.
The speech enhancement step uses an improved Wiener filtering algorithm and comprises the following steps:
(21) use a segment of typical ambient noise as the initial noise estimate;
(22) perform robust noise detection with a sliding filter and a three-state state machine: for noisy speech signals at different input signal-to-noise ratios, compare the filter output with preset thresholds and decide, according to the decision conditions, whether the current frame is ambient noise; if so, perform step (23);
(23) estimate the a priori signal-to-noise ratio (SNR) of the current frame with the decision-directed algorithm, and update the noise estimate using information from past frames;
(24) apply two-stage inter-frame smoothing to improve the continuity of the enhanced speech spectrum and reduce distortion of the speech signal.
The estimate of the current frame's a priori SNR is obtained as a weighted combination of the previous frame's a priori SNR estimate and the current frame's a posteriori SNR γk(n). The calculation formula is:
[formula not reproduced in the source text]
where the left-hand side is the estimate of the current frame's a priori SNR; p is the feedback factor, which controls the contributions of the previous frame and the current frame to that estimate; and a is the convergence control factor.
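Because the patent's own formula is not available in this text, the following is only a minimal sketch of the widely used textbook form of decision-directed a priori SNR estimation combined with a Wiener gain. It is not the patent's improved algorithm: the function name `wiener_decision_directed`, the smoothing weight `alpha`, the floor `xi_min`, and the fixed noise estimate are all assumptions made for illustration, and the noise detection of step (22) and the smoothing of step (24) are omitted.

```python
import numpy as np

def wiener_decision_directed(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Per-frame Wiener gains from a decision-directed a priori SNR estimate.

    noisy_power: array (frames, bins) holding |Y_k(n)|^2
    noise_power: array (bins,) holding the noise power estimate (kept fixed here)
    alpha, xi_min: hypothetical defaults, not values taken from the patent
    """
    noisy_power = np.asarray(noisy_power, dtype=float)
    noise_power = np.asarray(noise_power, dtype=float)
    n_frames, n_bins = noisy_power.shape
    gains = np.empty((n_frames, n_bins))
    prev_clean_power = np.zeros(n_bins)  # enhanced power of the previous frame
    for n in range(n_frames):
        gamma = noisy_power[n] / noise_power  # a posteriori SNR of the current frame
        # Weighted mix of the previous-frame estimate and the current-frame evidence.
        xi = (alpha * prev_clean_power / noise_power
              + (1.0 - alpha) * np.maximum(gamma - 1.0, 0.0))
        xi = np.maximum(xi, xi_min)           # floor to limit musical noise
        gain = xi / (1.0 + xi)                # Wiener gain
        gains[n] = gain
        prev_clean_power = (gain ** 2) * noisy_power[n]
    return gains
```

With noise-only frames the gain stays near the floor, and it rises toward 1 within a few frames once speech-level energy appears, which is the smoothing behaviour the weighting is meant to provide.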
The method provided by the invention overcomes the inability of existing chip systems to recognize more than one language, and offers low algorithmic complexity, high recognition accuracy, and robust performance in noisy environments.
Brief description of the drawings
Fig. 1 is a schematic diagram of the speech recognition process in common use;
Fig. 2 is the table of confusable phoneme changes when Chinese speakers speak English;
Fig. 3 is a schematic diagram of the time segment information obtained by the phoneme merging method based on state-time alignment.
Detailed description
Preferred embodiments are described in detail below with reference to the accompanying drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope of the invention or its applications.
Fig. 2 is a schematic diagram of the Chinese-English bilingual speech recognition process provided by the invention. As shown in Fig. 2, the method comprises the following steps: A/D sampling; pre-emphasis of the sampled speech to boost the energy of the high-frequency components; windowing and framing; extraction of speech feature parameters; establishing the initial Chinese-English bilingual speech recognition model; non-native model fusion adjustment of that initial model; and recognition of Chinese-English bilingual voice commands. Of these, A/D sampling, pre-emphasis, windowing and framing, and feature extraction are existing techniques, whereas establishing the initial model, the non-native model fusion adjustment, and the recognition of bilingual voice commands are the new techniques proposed by the invention.
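The conventional front-end steps named above (pre-emphasis, then framing and windowing) can be sketched as follows. This is an illustrative sketch only: the pre-emphasis coefficient 0.97, the 400-sample frame, the 160-sample hop, and the Hamming window are common defaults chosen here as assumptions, since the patent does not state its own values.

```python
import numpy as np

def preemphasize(signal, coeff=0.97):
    """First-order high-pass filter y[n] = x[n] - coeff * x[n-1] (coeff is an assumed value)."""
    signal = np.asarray(signal, dtype=float)
    return np.append(signal[0], signal[1:] - coeff * signal[:-1])

def frame_and_window(signal, frame_len=400, hop=160):
    """Split the signal into overlapping frames and apply a Hamming window to each frame."""
    signal = np.asarray(signal, dtype=float)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    n_frames = 1 + (len(signal) - frame_len) // hop
    # Row i holds the sample indices of frame i.
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx] * np.hamming(frame_len)
```

At a 16 kHz sampling rate these assumed sizes correspond to 25 ms frames with a 10 ms shift; feature extraction such as MFCC would then operate on each row of the returned array.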
Establishing the initial Chinese-English bilingual speech recognition model comprises revising the Chinese speech recognition model, revising the English speech recognition model, merging the revised Chinese and English speech recognition models, and training the merged Chinese and English speech recognition model.
To revise the Chinese and English speech recognition models, the pronunciation dictionaries (that is, the Chinese and English speech recognition models) are first revised according to the pronunciation differences that arise when Chinese speakers speak English or foreigners speak Chinese. There are two main approaches, one based on expert knowledge and one data-driven. The invention combines the two strategies: under the guidance of expert knowledge it obtains pronunciation variation rules that are general and depend little on the amount of non-native pronunciation data, while retaining the advantages of the data-driven approach, namely a good match to real data, little manual intervention, and ease of extension. In the data-driven part, the reference phoneme labels of the training data are compared with the labels produced by the recognizer to obtain a phoneme confusion matrix, and the final pronunciation variation rules are then determined with the guidance of expert knowledge. Taking Chinese speakers speaking English as an example, Fig. 2 is the table of confusable phoneme changes; the English pronunciation dictionary is revised according to the phoneme variation rules finally determined in this way.
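The data-driven part of this procedure, building a confusion matrix from reference labels and recognizer labels and proposing candidate rules from it, might look like the sketch below. The function names, the `min_rate` threshold, and the assumption that the two label sequences have already been aligned into (reference, recognized) pairs are all illustrative choices, not details given in the patent; the expert review of the candidates is not modeled.

```python
from collections import Counter, defaultdict

def confusion_counts(aligned_pairs):
    """Count how often each reference phoneme is recognized as each phoneme.

    aligned_pairs: iterable of (reference_phoneme, recognized_phoneme) tuples,
    assumed to come from an alignment step that is not shown here.
    """
    counts = defaultdict(Counter)
    for ref, hyp in aligned_pairs:
        counts[ref][hyp] += 1
    return counts

def candidate_rules(counts, min_rate=0.2):
    """List (reference, recognized, rate) substitutions whose rate reaches min_rate."""
    rules = []
    for ref, row in counts.items():
        total = sum(row.values())
        for hyp, c in row.items():
            if hyp != ref and c / total >= min_rate:
                rules.append((ref, hyp, c / total))
    # Most frequent confusions first, so a reviewer sees the strongest candidates first.
    return sorted(rules, key=lambda r: -r[2])
```

The returned candidates would then be accepted or rejected by an expert before the pronunciation dictionary is changed, in line with the combined strategy described above.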
After the Chinese and English speech recognition models have been revised, the two revised models are merged into a unified, smaller model set. Obtaining a small recognition model requires merging the Chinese and English recognition models; to preserve a high recognition rate, only models that are sufficiently close to each other in the acoustic model space are merged. The invention measures the distance between two models with a model distance measure based on state-time alignment. Take a Chinese phoneme model λi and an English phoneme model γj as an example. Several speech segments are first prepared for each of the two phonemes from manually labeled speech. Each segment of λi is then Viterbi-aligned in state time against both its own phoneme model λi and the other phoneme model γj, which yields the segment information shown in Fig. 3, where λi and γj denote the two models before merging. As can be seen, five time segments are obtained. For each segment, the Bhattacharyya distance between the two models on that segment is computed and denoted Dmn. Finally, the segment durations are used as weights to obtain the weighted distance:
$$D(\lambda_i, \gamma_j) = \sum_{q=1}^{5} \Delta t_q\, D_{mn}$$
Conversely, each speech segment of γj is Viterbi-aligned in state time against its own phoneme model γj and the other phoneme model λi, and D(γj, λi) is obtained in the same way. The final distance between models λi and γj is
$$D = \frac{1}{2}\left(D(\lambda_i, \gamma_j) + D(\gamma_j, \lambda_i)\right)$$
With this calculation, the pairwise distances between Chinese and English phonemes are obtained, and the pair with the smallest distance is merged. Merging is repeated in this way until the number of phonemes has fallen to the required size. Using the distance measure based on state-time alignment described above, 15 pairs of Chinese and English phonemes were merged in total, which significantly reduces the size of the phoneme set and suits the resource constraints of embedded systems.
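The distance computation described above can be sketched under simplifying assumptions: each aligned HMM state is represented by a single diagonal-covariance Gaussian given as a (mean, variance) pair, and the Viterbi alignment that produces the time segments is taken as already done. The function names and the segment representation are hypothetical.

```python
import numpy as np

def bhattacharyya_diag(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    mean1, var1, mean2, var2 = (np.asarray(a, dtype=float)
                                for a in (mean1, var1, mean2, var2))
    var_avg = 0.5 * (var1 + var2)
    term_mean = 0.125 * np.sum((mean1 - mean2) ** 2 / var_avg)
    term_cov = 0.5 * np.sum(np.log(var_avg / np.sqrt(var1 * var2)))
    return float(term_mean + term_cov)

def directed_distance(segments):
    """Duration-weighted sum of per-segment distances, as in D(lambda_i, gamma_j).

    segments: list of (duration, state_m, state_n), where each state is a
    (mean, variance) pair for the state active in that time segment.
    """
    return sum(dt * bhattacharyya_diag(*m, *n) for dt, m, n in segments)

def symmetric_distance(segments_ij, segments_ji):
    """Average of the two directed distances, as in the final formula for D."""
    return 0.5 * (directed_distance(segments_ij) + directed_distance(segments_ji))
```

A merging loop would evaluate `symmetric_distance` for every Chinese-English phoneme pair, merge the closest pair, and repeat until the phoneme set reaches the target size.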
The merged Chinese and English speech recognition model is then trained on large Chinese and English speech databases, using the MLE (maximum likelihood estimation) criterion and the EM (expectation-maximization) iterative estimation algorithm, to obtain the initial Chinese-English bilingual speech recognition model. The entire training process is carried out on a PC.
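To illustrate maximum likelihood training by EM in the simplest possible setting, the sketch below fits a one-dimensional Gaussian mixture. It is a toy stand-in, not the patent's acoustic model training, which would run EM over HMM states on speech features; the function name, the quantile-based initialization, and the small variance floor are assumptions.

```python
import numpy as np

def em_gmm_1d(x, n_components=2, n_iter=50):
    """Fit a 1-D Gaussian mixture by EM and return the log-likelihood per iteration."""
    x = np.asarray(x, dtype=float)
    # Assumed initialization: spread the means over quantiles of the data.
    means = np.quantile(x, (np.arange(n_components) + 0.5) / n_components)
    variances = np.full(n_components, x.var() + 1e-6)
    weights = np.full(n_components, 1.0 / n_components)
    log_likelihoods = []
    for _ in range(n_iter):
        # E-step: responsibility of each component for each sample.
        dens = (weights * np.exp(-0.5 * (x[:, None] - means) ** 2 / variances)
                / np.sqrt(2.0 * np.pi * variances))
        total = dens.sum(axis=1, keepdims=True)
        log_likelihoods.append(float(np.log(total).sum()))
        resp = dens / total
        # M-step: maximum likelihood re-estimates given the responsibilities.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk + 1e-6
    return weights, means, variances, log_likelihoods
```

The recorded log-likelihood should not decrease from one iteration to the next (up to the tiny effect of the variance floor), which is the property that makes EM a practical way to apply the maximum likelihood criterion.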
The non-native model fusion adjustment of the initial Chinese-English bilingual speech recognition model fuses the native model and the non-native model by the selective model merging method, and then applies minimum phone error discriminative training to the fused initial model to obtain the Chinese-English bilingual speech recognition model.
Non-native speakers often have a native-language accent or nonstandard pronunciation, which can cause the recognition system to make errors, so the initial recognition model must be adjusted by model fusion. The invention fuses the native model and the non-native model by the selective model merging method and corrects the parameters of the recognition templates, as follows:
(11) train a native model M1 on a database of purely native speech;
(12) adapt model M1 by maximum likelihood linear regression (MLLR) using a small non-native database, obtaining model M2;
(13) following the selective model merging strategy, fuse four models by linear interpolation: the model Sb of a native phoneme λi in the initial Chinese-English bilingual speech recognition model; the native model Sne of phoneme λi in model M1; the adapted model Sa of λi in model M2; and the adapted model γm of the confusable phoneme γj that corresponds to λi in the pronunciation dictionary obtained by the non-native confusable-phoneme variation method. The result is the adjusted model Sf of phoneme λi. The interpolation formula is:
$$p(S_f) = \lambda_1\, p(S_b) + \lambda_2\, p(S_{ne}) + \lambda_3\, p(S_a) + \lambda_4\, p(\gamma_m)$$
where λ1, λ2, λ3 and λ4 are the interpolation factors of the corresponding models.
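The interpolation in step (13) can be illustrated with a minimal sketch in which each of the four models is reduced to a single one-dimensional Gaussian. That reduction, the function names, and the requirement that the interpolation factors sum to 1 are assumptions made here for illustration; the patent does not state how the factors are chosen or constrained.

```python
import numpy as np

def gaussian_pdf(x, mean, var):
    """Density of a 1-D Gaussian with the given mean and variance."""
    return np.exp(-0.5 * (x - mean) ** 2 / var) / np.sqrt(2.0 * np.pi * var)

def fused_density(x, models, factors):
    """Linear interpolation p(Sf) = sum_k factor_k * p(model_k) at the point x.

    models: four (mean, variance) pairs standing in for Sb, Sne, Sa and gamma_m
    factors: interpolation factors; assumed here to sum to 1 so the result
             remains a valid probability density
    """
    factors = np.asarray(factors, dtype=float)
    if len(models) != len(factors):
        raise ValueError("one interpolation factor is needed per model")
    if not np.isclose(factors.sum(), 1.0):
        raise ValueError("interpolation factors are assumed to sum to 1")
    return sum(w * gaussian_pdf(x, mean, var)
               for w, (mean, var) in zip(factors, models))
```

Giving more weight to the adapted models shifts the fused density toward non-native pronunciations, while weight on Sb and Sne keeps it close to the native models.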
To obtain a more refined model, and in particular to further improve the recognition rate for non-native Chinese-English bilingual speech, the invention applies discriminative training in a bilingual setting for the first time. MPE discriminative training is applied to the Chinese-English bilingual recognition model according to the MPE (minimum phone error) criterion: a speech recognizer is first used to obtain word lattices for the training speech; Chinese and English language models are trained from the original word-level transcriptions in the speech training corpus; and finally the forward-backward algorithm is run over the resulting word lattices to update the model parameters. After several iterations of parameter estimation, the model parameters are further adjusted and the models become more distinct and more discriminative. The bilingual recognition model adjusted for non-native speech maintains the bilingual recognition rate for native speech while substantially improving the bilingual recognition rate for non-native speech. In the end, the recognition rate for both native and non-native Chinese and English reached more than 98%.
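The forward-backward pass over a word lattice can be sketched as below. This covers only the computation of arc posteriors on a small lattice; the phone-accuracy weighting and the parameter update that make up full MPE training are not shown. The lattice representation (integer node ids in topological order, with node 0 as the start and the last node as the end) and the function names are assumptions.

```python
import math
from collections import defaultdict

def logsumexp(values):
    """Numerically stable log of a sum of exponentials."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))

def arc_posteriors(n_nodes, arcs):
    """Posterior probability of each arc, given (src, dst, label, log_score) arcs with src < dst."""
    incoming, outgoing = defaultdict(list), defaultdict(list)
    for src, dst, _, score in arcs:
        incoming[dst].append((src, score))
        outgoing[src].append((dst, score))
    neg_inf = float("-inf")
    # Forward pass: log of the total score of all paths from the start to each node.
    alpha = [neg_inf] * n_nodes
    alpha[0] = 0.0
    for node in range(1, n_nodes):
        terms = [alpha[s] + sc for s, sc in incoming[node] if alpha[s] > neg_inf]
        if terms:
            alpha[node] = logsumexp(terms)
    # Backward pass: log of the total score of all paths from each node to the end.
    beta = [neg_inf] * n_nodes
    beta[-1] = 0.0
    for node in range(n_nodes - 2, -1, -1):
        terms = [beta[d] + sc for d, sc in outgoing[node] if beta[d] > neg_inf]
        if terms:
            beta[node] = logsumexp(terms)
    total = alpha[-1]  # assumes at least one complete path exists
    return [math.exp(alpha[s] + sc + beta[d] - total) for s, d, _, sc in arcs]
```

In discriminative training these posteriors serve as the occupancy statistics from which the model parameters are re-estimated.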
Chinese-English bilingual voice commands are recognized by extracting recognition features from the input speech signal, computing the Gaussian scores of the Chinese-English bilingual speech recognition model, performing template matching against the Chinese-English bilingual entries, and taking the entry with the highest matching score as the recognition result. Any conventional method of extracting speech feature parameters can be used for the recognition features. To improve both speed and accuracy, the recognition decision is split into two stages, coarse recognition and fine recognition. The coarse recognition model has few parameters, fewer than 200, so coarse recognition is fast. Speech that is pronounced nonstandardly or is easily confused is then recognized again by the fine model, which has more parameters, roughly 1000. Because only a few candidates remain after coarse recognition, fine recognition is also fast even though the fine model is larger. Two-stage recognition thus improves both the average recognition speed and the recognition accuracy.
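The two-stage decision can be illustrated with a heavily simplified sketch in which every vocabulary entry is scored by a single diagonal Gaussian rather than by a sequence of phoneme models. The function names, the dictionary layout, and the shortlist size `n_candidates` are hypothetical; the sketch only shows how a cheap coarse pass narrows the candidates before a more detailed pass picks the highest score.

```python
import numpy as np

def log_gauss_score(features, mean, var):
    """Sum over frames of the diagonal-Gaussian log-likelihood of the features."""
    features = np.asarray(features, dtype=float)
    return float(np.sum(-0.5 * (np.log(2.0 * np.pi * var)
                                + (features - mean) ** 2 / var)))

def two_stage_recognize(features, coarse_models, fine_models, n_candidates=3):
    """Shortlist entries with the coarse models, then rescore only those with the fine models.

    coarse_models, fine_models: dicts mapping entry name -> (mean, variance)
    """
    coarse = {entry: log_gauss_score(features, *coarse_models[entry])
              for entry in coarse_models}
    shortlist = sorted(coarse, key=coarse.get, reverse=True)[:n_candidates]
    fine = {entry: log_gauss_score(features, *fine_models[entry])
            for entry in shortlist}
    return max(fine, key=fine.get)  # entry with the highest fine score
```

Because the fine models are evaluated only on the shortlist, their extra cost is paid for a handful of entries rather than for the whole vocabulary.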
The Chinese-English bilingual speech recognition method provided by the invention recognizes both Chinese and English. The model size of the system is no larger than that of a single-language recognition system, so it occupies little storage. While accommodating non-native speech, it maintains a high recognition rate for native speech and achieves high performance on non-native speech. In addition, speech enhancement improves recognition accuracy in noisy environments, which makes the method suitable for embedded implementations of Chinese-English bilingual recognition.
The invention was tested on a real Chinese-English bilingual name-dialing system for a portable mobile phone. The recognition task comprised 500 English names and 500 Chinese names. The experiments show that the storage required by the bilingual recognition method of the invention is close to that of a single-language recognition system. The system recognizes Chinese and English names at the same time and, while accommodating non-native speech, maintains a high recognition rate for native speech and achieves high performance on non-native speech: the final Chinese-English bilingual recognition rate exceeds 98% for both native and non-native speakers. In addition, speech enhancement improves recognition accuracy in noisy environments, which makes the method suitable for embedded implementations of Chinese-English bilingual recognition.
The foregoing is only a preferred embodiment of the invention, and the scope of protection of the invention is not limited to it. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention is therefore defined by the claims.
Claims (6)
1. A Chinese-English bilingual speech recognition method for an embedded system, comprising A/D sampling, pre-emphasis of the sampled speech to boost the energy of the high-frequency components, windowing and framing, extraction of speech feature parameters, and matching of voice commands against a pre-established acoustic model, characterized in that the acoustic model is built by establishing an initial Chinese-English bilingual speech recognition model and adjusting that initial model by non-native model fusion, and the matching of voice commands is specifically the recognition of Chinese-English bilingual voice commands;
wherein establishing the initial Chinese-English bilingual speech recognition model comprises revising the Chinese speech recognition model, revising the English speech recognition model, merging the revised Chinese and English speech recognition models, and training the merged Chinese and English speech recognition model;
the non-native model fusion adjustment of the initial Chinese-English bilingual speech recognition model fuses the native model and the non-native model by a selective model merging method, and applies minimum phone error discriminative training to the fused initial model to obtain the Chinese-English bilingual speech recognition model;
wherein fusing the native model and the non-native model by the selective model merging method comprises the following steps:
(11) training a native model M1 on a database of purely native speech;
(12) adapting model M1 by maximum likelihood linear regression using a small non-native database, obtaining model M2;
(13) following the selective model merging strategy, fusing four models by linear interpolation: the model Sb of a native phoneme λi in the initial Chinese-English bilingual speech recognition model; the native model Sne of phoneme λi in model M1; the adapted model Sa of λi in model M2; and the adapted model γm of the confusable phoneme γj that corresponds to λi in the pronunciation dictionary obtained by the non-native confusable-phoneme variation method, thereby obtaining the adjusted model Sf of phoneme λi; the interpolation formula being:
$$p(S_f) = \lambda_1\, p(S_b) + \lambda_2\, p(S_{ne}) + \lambda_3\, p(S_a) + \lambda_4\, p(\gamma_m)$$
where λ1, λ2, λ3 and λ4 are the interpolation factors of the corresponding models;
and Chinese-English bilingual voice commands are recognized by extracting recognition features from the input speech signal, computing the Gaussian scores of the Chinese-English bilingual speech recognition model, performing template matching against the Chinese-English bilingual entries, and taking the entry with the highest matching score as the recognition result.
2. The Chinese-English bilingual speech recognition method for an embedded system according to claim 1, characterized in that the method further comprises a speech enhancement step.
3. The Chinese-English bilingual speech recognition method for an embedded system according to claim 1 or 2, characterized in that merging the revised Chinese and English speech recognition models specifically comprises computing the pairwise distances between Chinese and English phonemes with a model distance measure based on state-time alignment, and then merging the pair of phonemes with the smallest distance.
4. The Chinese-English bilingual speech recognition method for an embedded system according to claim 1 or 2, characterized in that the merged Chinese and English speech recognition model is trained with the maximum likelihood estimation criterion and the expectation-maximization iterative estimation algorithm to obtain the initial Chinese-English bilingual speech recognition model.
5. The Chinese-English bilingual speech recognition method for an embedded system according to claim 1 or 2, characterized in that training of the merged Chinese and English speech recognition model is carried out on a PC.
6. The Chinese-English bilingual speech recognition method for an embedded system according to claim 1 or 2, characterized in that the minimum phone error discriminative training of the fused initial Chinese-English bilingual speech recognition model comprises: using a speech recognizer to obtain word lattices for the training speech; training Chinese and English language models from the original word-level transcriptions in the speech training corpus; and running the forward-backward algorithm over the resulting word lattices to update the model parameters.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710793500.9A CN107564527A (en) | 2017-09-01 | 2017-09-01 | The method for recognizing Chinese-English bilingual voice of embedded system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107564527A true CN107564527A (en) | 2018-01-09 |
Family ID: 60979474
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710793500.9A Withdrawn CN107564527A (en) | 2017-09-01 | 2017-09-01 | The method for recognizing Chinese-English bilingual voice of embedded system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107564527A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108877782A (en) * | 2018-07-04 | 2018-11-23 | 百度在线网络技术(北京)有限公司 | Audio recognition method and device |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2004347761A (en) * | 2003-05-21 | 2004-12-09 | Internatl Business Mach Corp <Ibm> | Voice recognition device, voice recognition method, computer executable program and storage medium for performing the voice recognition method to computer |
CN101727901A (en) * | 2009-12-10 | 2010-06-09 | 清华大学 | Method for recognizing Chinese-English bilingual voice of embedded system |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WW01 | Invention patent application withdrawn after publication | Application publication date: 20180109 |