CN101727901A - Method for recognizing Chinese-English bilingual voice of embedded system - Google Patents
Method for recognizing Chinese-English bilingual voice of embedded system
- Publication number
- CN101727901A CN101727901A CN200910242406A CN200910242406A CN101727901A CN 101727901 A CN101727901 A CN 101727901A CN 200910242406 A CN200910242406 A CN 200910242406A CN 200910242406 A CN200910242406 A CN 200910242406A CN 101727901 A CN101727901 A CN 101727901A
- Authority
- CN
- China
- Prior art keywords
- model
- chinese
- english
- voice
- english bilingual
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention belongs to the technical field of voice recognition, and in particular relates to a method for recognizing Chinese-English bilingual voice in an embedded system. The method comprises the following steps: A/D sampling; pre-emphasis of the sampled voice to boost the energy of high-frequency signals; windowing and framing; extraction of voice characteristic parameters; and matching recognition of voice commands according to a pre-established acoustic model. The acoustic model is established by determining a Chinese-English bilingual voice recognition initial model and then fusing and adjusting the non-native models of that initial model; the matching recognition of voice commands is specifically recognition of Chinese-English bilingual voice commands. The method overcomes the defect that conventional voice recognition systems can recognize only a single language.
Description
Technical field
The invention belongs to the technical field of speech recognition, and in particular relates to a method for recognizing Chinese-English bilingual voice in an embedded system.
Background art
In recent years, dedicated speech recognition chips have developed rapidly abroad. Many foreign voice-technology and semiconductor companies have invested substantial manpower and resources in developing such chips, and have patented the recognition algorithms for their own national languages. The recognition performance of these special-purpose (system) chips varies. A common speech recognition process is shown in Fig. 1: the input voice signal is first A/D sampled; spectrum-shaping, windowing, and pre-emphasis processing boost the high-frequency components; feature parameters, namely Mel-frequency cepstral coefficients (MFCC), are then extracted in real time; and recognition template training and template matching are carried out. To improve the chip's robustness in noisy environments, speech enhancement may also be applied. A dedicated chip generally comprises an 8-bit or 16-bit MCU controller or a 16-bit DSP microprocessor, together with automatic gain control (AGC), an audio preamplifier, a low-pass filter, an analog-to-digital (A/D) converter, a digital-to-analog (D/A) converter, an audio power amplifier, and read-only memory (ROM). Such dedicated (system) chips have begun to appear in intelligent voice toys and mobile communication terminals.
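The front-end chain described above (pre-emphasis to boost the high-frequency components, then windowing and framing before feature extraction) can be sketched as follows. This is a minimal illustration, not the patent's implementation; the 16 kHz rate, 25 ms/10 ms framing, and pre-emphasis coefficient 0.97 are conventional assumed values.

```python
import numpy as np

def preemphasize(signal, alpha=0.97):
    """Boost high-frequency energy: y[n] = x[n] - alpha * x[n-1]."""
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_and_window(signal, frame_len=400, frame_shift=160):
    """Split the signal into overlapping frames and apply a Hamming window."""
    n_frames = 1 + max(0, (len(signal) - frame_len) // frame_shift)
    window = np.hamming(frame_len)
    return np.stack([
        signal[i * frame_shift : i * frame_shift + frame_len] * window
        for i in range(n_frames)
    ])

# 1 second of 16 kHz audio -> pre-emphasis, then 25 ms frames every 10 ms
x = np.random.default_rng(0).standard_normal(16000)
frames = frame_and_window(preemphasize(x))
print(frames.shape)  # (98, 400)
```

Each row of `frames` would then feed the MFCC computation (filter bank plus cepstral transform) that produces the feature parameters.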
However, existing high-performance medium-vocabulary recognition chips can recognize only a single language: the recognition task must consist of voice commands in one language, such as Chinese, English, or Japanese, and mixed bilingual commands such as Chinese-English are not supported.
Yet as internationalization deepens in economics, politics, culture, and academia, bilingual phenomena are increasingly common in people's daily lives, such as given names that mix Chinese and English. Recognition systems built for a single language, whether Chinese or English, therefore fall ever further short of the demands of the times. In particular, since Chinese and English have the largest numbers of users and the widest use in the world, building a system that can recognize mixed Chinese and English speech, and realizing it on portable devices such as dedicated chip systems, is extremely important.
Summary of the invention
The object of the invention is to overcome the limitation that existing chip systems can recognize only a single language by proposing a Chinese-English bilingual speech recognition method for embedded systems. The method is based on embedded Chinese-English bilingual speech recognition with phoneme-fusion modeling and an embedded speech enhancement method.
The technical scheme is a Chinese-English bilingual speech recognition method for an embedded system, comprising: A/D sampling; pre-emphasis of the sampled speech to boost the energy of high-frequency signals; windowing and framing; extraction of speech feature parameters; and matching recognition of voice commands according to a pre-established acoustic model. The method is characterized in that the acoustic model is established by building a Chinese-English bilingual speech recognition initial model and by fusion adjustment of the non-native models of that initial model, and the matching recognition of voice commands is specifically recognition of Chinese-English bilingual voice commands.
Building the Chinese-English bilingual speech recognition initial model comprises revising a Chinese speech recognition model, revising an English speech recognition model, merging the revised Chinese and English recognition models, and training the merged Chinese and English recognition models.
The fusion adjustment of the non-native models adopts a selectable model merging method to fuse the native and non-native models, and then applies minimum phone error discriminative training to the merged initial model to obtain the Chinese-English bilingual speech recognition model.
Recognition of a Chinese-English bilingual voice command extracts recognition features from the input speech signal, computes Gaussian scores of the bilingual recognition model, performs template matching against the bilingual vocabulary entries, and takes the entry with the highest matching score as the recognition result.
The method may further comprise a speech enhancement step.
Merging the revised Chinese and English recognition models specifically adopts a model-distance computation based on state time alignment: the pairwise distances between Chinese and English phonemes are computed, and the pair of phonemes with the minimum distance is merged.
Training the merged Chinese and English recognition models adopts the maximum likelihood estimation criterion with the expectation-maximization iterative estimation algorithm to obtain the Chinese-English bilingual speech recognition initial model.
Training of the merged Chinese and English recognition models is carried out on a PC.
Fusing the native and non-native models with the selectable model merging method comprises the following steps:
(11) Train a native model M1 on a purely native database.
(12) Adapt M1 with a small amount of non-native data using maximum likelihood linear regression to obtain model M2.
(13) Following the selectable merging strategy, linearly interpolate, for a native phoneme λ_i, the model S_b corresponding to λ_i in the Chinese-English bilingual speech recognition initial model, the native model S_ne corresponding to λ_i in model M1, the adapted model S_a corresponding to λ_i in model M2, and the adapted model γ_m of the confusable phoneme γ_j that corresponds to λ_i in the pronunciation dictionary obtained by the non-native confusable-phoneme variation method, to obtain the adjusted model S_f of the merged phoneme λ_i. The model interpolation formula is:
p(S_f) = λ_1 p(S_b) + λ_2 p(S_ne) + λ_3 p(S_a) + λ_4 p(γ_m)
where λ_1, λ_2, λ_3, and λ_4 are the interpolation factors of the corresponding models.
Minimum phone error discriminative training of the merged Chinese-English bilingual speech recognition initial model comprises: using a speech recognizer to obtain word-lattice information for the training utterances; training Chinese and English language models from the word-level annotations of the speech training corpus; and updating the model parameters on the obtained lattices with the Forward-Backward algorithm.
The speech enhancement step adopts an improved Wiener filtering algorithm comprising the following steps:
(21) using a segment of typical background noise as the initial noise estimate;
(22) using a sliding filter and a tri-state machine to perform robust noise detection on noisy speech at different input signal-to-noise ratios, comparing the filter output against a preset threshold, and deciding according to the decision condition whether the current frame is background noise; if so, executing step (23);
(23) estimating the a priori signal-to-noise ratio of the current frame with the decision-directed algorithm, and updating the noise estimate using history-frame information;
(24) applying two-stage inter-frame smoothing to improve the spectral continuity of the enhanced speech signal and reduce its distortion.
The a priori SNR estimate ξ_k(n) of the current frame is obtained by weighting the previous frame's a priori SNR estimate ξ_k(n − 1) with the estimate γ_k(n) of the current frame's a posteriori SNR:
ξ_k(n) = p · ξ_k(n − 1) + (1 − p) · max(γ_k(n) − a, 0)
where p is a feedback factor that controls the contributions of the previous and current frames to the a priori SNR estimate, and a is the convergence control factor.
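The decision-directed estimation described here can be sketched as a simple recursion over per-frame a posteriori SNR values. This is an illustrative sketch, not the patent's code; the symbol names and the values p = 0.98 and a = 1.0 are assumptions.

```python
import numpy as np

def decision_directed_snr(gamma, p=0.98, a=1.0, xi_init=1.0):
    """Recursively estimate the a priori SNR xi[n] from the a posteriori
    SNR gamma[n]: the previous a priori estimate is weighted against the
    floored instantaneous estimate max(gamma[n] - a, 0)."""
    xi = np.empty_like(gamma, dtype=float)
    prev = xi_init
    for n, g in enumerate(gamma):
        prev = p * prev + (1.0 - p) * max(g - a, 0.0)
        xi[n] = prev
    return xi

gammas = np.array([5.0, 4.0, 0.5, 0.2, 3.0])  # per-frame a posteriori SNRs
xi = decision_directed_snr(gammas)
print(xi.round(3))
```

With p close to 1 the estimate changes slowly, which is what suppresses musical-noise artifacts in the enhanced signal.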
The method provided by the invention overcomes the limitation that existing chip systems can recognize only a single language; it features low algorithmic complexity, high recognition accuracy, and robust recognition performance in noisy environments.
Description of drawings
Fig. 1 is a schematic diagram of a common speech recognition process;
Fig. 2 is a schematic flow diagram of the Chinese-English bilingual speech recognition method provided by the invention;
Fig. 3 is a table of confusable phoneme changes when Chinese speakers say English;
Fig. 4 is a schematic diagram of the time-segmentation information obtained by the phoneme merging method based on state time alignment.
Embodiment
Preferred embodiments are described in detail below with reference to the drawings. It should be emphasized that the following description is exemplary only, and is not intended to limit the scope or application of the invention.
Fig. 2 is a schematic flow diagram of the method. As shown in Fig. 2, the method comprises the following steps: A/D sampling and pre-emphasis of the sampled speech to boost the energy of high-frequency signals; windowing and framing; extraction of speech feature parameters; building the Chinese-English bilingual speech recognition initial model; fusion adjustment of the non-native models of the initial model; and recognition of Chinese-English bilingual voice commands. The sampling, pre-emphasis, windowing, framing, and feature extraction are existing techniques; building the bilingual initial model, the non-native model fusion adjustment, and the bilingual command recognition are new techniques proposed by the invention.
Building the Chinese-English bilingual speech recognition initial model comprises revising the Chinese speech recognition model, revising the English speech recognition model, merging the revised models, and training the merged Chinese and English recognition models.
The Chinese and English recognition models are first revised by trimming the pronunciation dictionary (i.e., the Chinese and English recognition models) according to the pronunciation differences that arise when Chinese speakers say English or foreigners speak Chinese. Two approaches exist: one based on expert knowledge and one based on data-driven analysis. The invention combines both strategies: under expert guidance it captures generally applicable pronunciation-variation rules with little dependence on the volume of non-native data, while the data-driven component ensures a good match with real data; the result is good matching with real data, little manual intervention, and easy generalization. In the data-driven method, the reference phoneme annotations of the training data are compared with the recognizer's output to obtain a confusion matrix of easily confused phonemes, and the final pronunciation-variation rules are then determined under expert guidance. Taking Chinese speakers saying English as an example, Fig. 3 lists the confusable phoneme changes; according to the variation rules finally determined, the English pronunciation dictionary is then revised.
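The data-driven part of this dictionary revision can be sketched as follows: count phoneme confusions between reference annotations and recognizer output, keep the frequent rules that expert review approves, and add variant pronunciations to the lexicon. The phoneme names, lexicon, threshold, and the `th` to `s` rule below are illustrative assumptions, not the patent's actual data.

```python
from collections import Counter

def confusion_pairs(ref_seqs, hyp_seqs):
    """Count (reference, recognized) phoneme confusions over aligned pairs."""
    counts = Counter()
    for ref, hyp in zip(ref_seqs, hyp_seqs):
        for r, h in zip(ref, hyp):
            if r != h:
                counts[(r, h)] += 1
    return counts

def revise_lexicon(lexicon, confusions, approved, min_count=2):
    """Add a variant pronunciation for each frequent, expert-approved rule."""
    revised = {w: list(prons) for w, prons in lexicon.items()}
    rules = {p for p, c in confusions.items()
             if c >= min_count and p in approved}
    for word, prons in lexicon.items():
        for pron in prons:
            for src, dst in rules:
                if src in pron:
                    variant = [dst if ph == src else ph for ph in pron]
                    if variant not in revised[word]:
                        revised[word].append(variant)
    return revised

refs = [["th", "ih", "s"], ["th", "ae", "ng", "k"]]
hyps = [["s", "ih", "s"], ["s", "ae", "ng", "k"]]  # a common L1-Chinese change
lex = {"this": [["th", "ih", "s"]]}
rev = revise_lexicon(lex, confusion_pairs(refs, hyps), {("th", "s")})
print(rev)
```

The expert-knowledge step corresponds to the `approved` set: only confusions confirmed by a human reviewer become dictionary rules.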
After the Chinese and English recognition models are revised, the two revised models are merged to obtain a unified model set of smaller scale. Obtaining a smaller recognition model requires merging the Chinese and English models, and to maintain a high recognition rate, only models that are sufficiently close in the acoustic model space are merged. The invention measures the distance between two models with a model-distance computation based on state time alignment. Taking a Chinese phoneme λ_i and an English phoneme γ_j as an example, the distance between the two models is computed as follows. Several speech segments of each phoneme are first prepared from manually annotated speech. Each speech segment of λ_i is then aligned by Viterbi state time alignment against both its own phoneme λ_i and the other phoneme γ_j, yielding the segmentation information shown in Fig. 4, where λ_i and γ_j denote the two models before merging. As the figure shows, five segmentation intervals are obtained; for each corresponding time segment m, the Bhattacharyya distance between the two models is computed and denoted D_m, and the segment distances are then weighted by segment length to give the directed distance:
D(λ_i, γ_j) = Σ_m (l_m / Σ_n l_n) · D_m
Conversely, each speech segment of γ_j is aligned against its own phoneme γ_j and against λ_i, and the same procedure gives D(γ_j, λ_i). The final distance between models λ_i and γ_j is:
D = (D(λ_i, γ_j) + D(γ_j, λ_i)) / 2
With these pairwise Chinese-English phoneme distances computed, the pair with the minimum distance is merged, and the merging loop repeats until the phoneme count drops to the required number. Using this state-time-alignment distance computation, 15 pairs of Chinese and English phonemes were merged in total, significantly reducing the scale of the phone set to fit the resource constraints of an embedded system.
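The greedy pairwise merging described above can be sketched as follows, assuming each phoneme model is summarized as a single diagonal Gaussian (the patent's models are HMMs whose segment distances are accumulated over aligned states); all model values here are synthetic.

```python
import numpy as np
from itertools import product

def bhattacharyya(mu1, var1, mu2, var2):
    """Bhattacharyya distance between two diagonal Gaussians."""
    v = (var1 + var2) / 2.0
    term1 = 0.125 * np.sum((mu1 - mu2) ** 2 / v)
    term2 = 0.5 * np.sum(np.log(v / np.sqrt(var1 * var2)))
    return term1 + term2

def merge_closest(cn_models, en_models, n_merges):
    """Greedily merge the n_merges closest Chinese/English phoneme pairs."""
    merged = []
    cn, en = dict(cn_models), dict(en_models)
    for _ in range(n_merges):
        pair = min(product(cn, en),
                   key=lambda p: bhattacharyya(*cn[p[0]], *en[p[1]]))
        merged.append(pair)
        del cn[pair[0]], en[pair[1]]
    return merged

rng = np.random.default_rng(1)
cn = {f"cn{i}": (rng.normal(i, 1, 4), np.ones(4)) for i in range(3)}
en = {f"en{j}": (rng.normal(j + 0.1, 1, 4), np.ones(4)) for j in range(3)}
pairs = merge_closest(cn, en, 2)
print(pairs)
```

In the patent's procedure the loop would stop once the phone set reaches the target size (15 merges in the reported configuration).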
Next, the merged Chinese and English recognition models are trained on large Chinese and English speech databases. Training uses the MLE (maximum likelihood estimation) criterion with the EM (expectation-maximization) iterative estimation algorithm, yielding the Chinese-English bilingual speech recognition initial model. The whole training process is carried out on a PC.
Fusion adjustment of the non-native models of the initial model adopts a selectable model merging method to fuse the native and non-native models, and then applies minimum phone error discriminative training to the merged initial model to obtain the Chinese-English bilingual speech recognition model.
Non-native speakers often have a native-language accent or non-standard pronunciation, which causes the recognition system to make errors; model fusion must therefore be used to adjust the recognition initial model. The invention fuses the native and non-native models with the selectable model merging method, revising the parameters of the recognition templates, as follows:
(11) Train a native model M1 on a purely native database.
(12) Adapt M1 with a small amount of non-native data using maximum likelihood linear regression to obtain model M2.
(13) Following the selectable merging strategy, linearly interpolate, for a native phoneme λ_i, the model S_b corresponding to λ_i in the Chinese-English bilingual speech recognition initial model, the native model S_ne corresponding to λ_i in model M1, the adapted model S_a corresponding to λ_i in model M2, and the adapted model γ_m of the confusable phoneme γ_j that corresponds to λ_i in the pronunciation dictionary obtained by the non-native confusable-phoneme variation method, to obtain the adjusted model S_f of the merged phoneme λ_i. The model interpolation formula is:
p(S_f) = λ_1 p(S_b) + λ_2 p(S_ne) + λ_3 p(S_a) + λ_4 p(γ_m)
where λ_1, λ_2, λ_3, and λ_4 are the interpolation factors of the corresponding models.
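Under the interpolation formula above, the adjusted model's output probability is a convex combination of the four component models' probabilities. A minimal sketch, assuming univariate Gaussian component models and illustrative interpolation factors (the patent does not fix their values):

```python
import numpy as np

def gauss_pdf(x, mu, var):
    """Density of a univariate Gaussian N(mu, var) at x."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def fused_prob(x, models, factors):
    """p(S_f) = sum_k lambda_k * p(S_k), with the factors summing to 1."""
    assert abs(sum(factors) - 1.0) < 1e-9
    return sum(lam * gauss_pdf(x, mu, var)
               for lam, (mu, var) in zip(factors, models))

# (mu, var) stand-ins for S_b (initial bilingual), S_ne (native),
# S_a (adapted), and gamma_m (confusable phoneme's adapted model)
models = [(0.0, 1.0), (0.2, 1.0), (0.1, 1.2), (0.5, 2.0)]
factors = [0.4, 0.3, 0.2, 0.1]  # lambda_1 .. lambda_4, assumed values
p = fused_prob(0.0, models, factors)
print(round(p, 4))
```

Because the factors sum to one, the fused density stays a proper mixture; tuning them trades off the native model against the accent-adapted models.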
To obtain a finer model, and in particular to further improve the recognition rate for non-native bilingual speech, the invention applies discriminative training in a bilingual setting for the first time. Following the MPE (minimum phone error) criterion, the bilingual recognition model obtained above is discriminatively trained: a speech recognizer first produces word-lattice information for the training utterances, and Chinese and English language models are trained from the word-level annotations of the training corpus; the model parameters are then updated on the obtained lattices with the Forward-Backward algorithm. After several iterations of parameter estimation, the model parameters are further adjusted and greater distinctiveness is maintained between the models. With the bilingual recognition model adjusted for non-native speech, the bilingual recognition rate for native speech does not decrease, while the recognition rate for non-native bilingual speech improves significantly; the final recognition rates for both native and non-native Chinese and English all exceed 98%.
Recognition of a Chinese-English bilingual voice command extracts recognition features from the input speech signal, computes Gaussian scores of the bilingual recognition model, performs template matching against the bilingual vocabulary entries, and takes the entry with the highest matching score as the recognition result. Feature extraction can use common speech feature parameter extraction methods. To improve both recognition speed and accuracy, the decision process is divided into a coarse recognition stage and a fine recognition stage. The coarse models have fewer than 200 parameters, so coarse recognition is fast; speech with non-standard or easily confused pronunciation is then recognized again with fine models of roughly 1000 parameters. Because very few candidates survive coarse recognition, fine recognition remains fast despite the larger models. Two-stage recognition thus improves both the average recognition speed and the recognition accuracy.
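The coarse-then-fine decision described above can be sketched as a two-pass scorer. The scoring functions, vocabulary entries, and candidate count below are toy assumptions; in the actual method the scores would be Gaussian acoustic scores from the small and large model sets.

```python
def two_stage_recognize(features, entries, coarse_score, fine_score, top_n=5):
    """Rank all entries with a cheap coarse score, then rescore only the
    best top_n candidates with the expensive fine score."""
    candidates = sorted(entries, key=lambda e: coarse_score(features, e),
                        reverse=True)[:top_n]
    return max(candidates, key=lambda e: fine_score(features, e))

# Toy scores: coarse = matching-character count, fine adds a length penalty
entries = ["zhang wei", "zhang wei phone", "john", "zhang"]
coarse = lambda f, e: sum(a == b for a, b in zip(f, e))
fine = lambda f, e: coarse(f, e) - abs(len(f) - len(e))
result = two_stage_recognize("zhang wei", entries, coarse, fine, top_n=3)
print(result)  # zhang wei
```

The design point is that the expensive scorer only ever sees `top_n` candidates, so its larger parameter count barely affects the average recognition time.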
To improve speech recognition performance in noisy environments, the invention may further include a speech enhancement step, which proceeds as follows:
(21) Use a segment of typical background noise as the initial noise estimate.
(22) Use a sliding filter and a tri-state machine to perform robust noise detection on noisy speech at different input signal-to-noise ratios: compare the filter output against a preset threshold and decide, according to the decision condition, whether the current frame is background noise; if so, execute step (23); otherwise, finish.
(23) Estimate the a priori signal-to-noise ratio of the current frame with the decision-directed algorithm, and update the noise estimate using history-frame information. The a priori SNR estimate ξ_k(n) of the current frame is obtained by weighting the previous frame's a priori SNR estimate ξ_k(n − 1) with the estimate γ_k(n) of the current frame's a posteriori SNR:
ξ_k(n) = p · ξ_k(n − 1) + (1 − p) · max(γ_k(n) − a, 0)
(24) Apply two-stage inter-frame smoothing to improve the spectral continuity of the enhanced speech signal and reduce its distortion.
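Step (24)'s two-stage inter-frame smoothing can be sketched as two successive first-order recursive averages over the per-frame spectral gains; the smoothing constants below are illustrative assumptions, not values given in the patent.

```python
import numpy as np

def smooth(frames_gain, beta):
    """First-order recursive average along the frame axis."""
    out = np.empty_like(frames_gain)
    out[0] = frames_gain[0]
    for n in range(1, len(frames_gain)):
        out[n] = beta * out[n - 1] + (1.0 - beta) * frames_gain[n]
    return out

def two_stage_smooth(gains, beta1=0.6, beta2=0.3):
    """Apply two smoothing passes to reduce frame-to-frame gain jumps."""
    return smooth(smooth(gains, beta1), beta2)

gains = np.array([[1.0], [0.0], [1.0], [0.0]])  # alternating gain: worst case
smoothed = two_stage_smooth(gains)
# The maximum frame-to-frame variation shrinks after smoothing
print(np.abs(np.diff(smoothed, axis=0)).max())
```

Reducing the frame-to-frame jumps in the gain trajectory is what improves spectral continuity and suppresses audible distortion in the enhanced speech.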
The Chinese-English bilingual speech recognition method provided by the invention realizes bilingual recognition while keeping the model scale no larger than that of a single-language recognition system and occupying fewer storage resources. While accommodating non-native speech, it maintains a high recognition rate for native speech and achieves high-performance non-native recognition; in addition, speech enhancement improves recognition accuracy in noisy environments. The method is well suited to embedded implementations of Chinese-English bilingual recognition.
The invention was tested on a real portable mobile phone platform running a bilingual Chinese and English name-dialing system, with a recognition task comprising 500 English names and 500 Chinese names. Experiments show that, in terms of memory, the bilingual recognition method of the invention requires resources close to those of a single-language recognition system. The system handles Chinese and English name recognition simultaneously; while accommodating non-native speech, it maintains a high native recognition rate and achieves high non-native performance, with final native and non-native bilingual recognition rates all above 98%. Speech enhancement further improves accuracy in noisy environments, making the method suitable for embedded implementation of Chinese-English bilingual recognition.
The above is only a preferred embodiment of the invention, and the protection scope of the invention is not limited thereto. Any variation or replacement readily conceivable by those skilled in the art within the technical scope disclosed by the invention shall be encompassed within the protection scope of the invention. The protection scope of the invention shall therefore be determined by the protection scope of the claims.
Claims (9)
1. A Chinese-English bilingual speech recognition method for an embedded system, comprising: A/D sampling; pre-emphasis of the sampled speech to boost the energy of high-frequency signals; windowing and framing; extraction of speech feature parameters; and matching recognition of voice commands according to a pre-established acoustic model; characterized in that the acoustic model is established by building a Chinese-English bilingual speech recognition initial model and by fusion adjustment of the non-native models of the initial model, and the matching recognition of voice commands is specifically recognition of Chinese-English bilingual voice commands;
wherein building the Chinese-English bilingual speech recognition initial model comprises revising a Chinese speech recognition model, revising an English speech recognition model, merging the revised Chinese and English recognition models, and training the merged Chinese and English recognition models;
the fusion adjustment of the non-native models adopts a selectable model merging method to fuse the native and non-native models, and applies minimum phone error discriminative training to the merged initial model to obtain the Chinese-English bilingual speech recognition model;
and the recognition of Chinese-English bilingual voice commands extracts recognition features from the input speech signal, computes Gaussian scores of the Chinese-English bilingual speech recognition model, performs template matching against the bilingual vocabulary entries, and takes the entry with the highest matching score as the recognition result.
2. The method according to claim 1, characterized in that the method further comprises a speech enhancement step.
3. The method according to claim 1 or 2, characterized in that merging the revised Chinese and English speech recognition models specifically adopts a model-distance computation based on state time alignment, computes the pairwise distances between Chinese and English phonemes, and merges the pair of phonemes with the minimum distance.
4. The method according to claim 1 or 2, characterized in that training the merged Chinese and English speech recognition models adopts the maximum likelihood estimation criterion and the expectation-maximization iterative estimation algorithm to obtain the Chinese-English bilingual speech recognition initial model.
5. The method according to claim 1 or 2, characterized in that training of the merged Chinese and English speech recognition models is carried out on a PC.
6. the method for recognizing Chinese-English bilingual voice of a kind of embedded system according to claim 1 and 2 is characterized in that the selectable model merging method of described employing merges mother tongue model and non-mother tongue model, comprises the following steps:
(11) the database training by pure mother tongue obtains a mother tongue model M 1;
(12) use the linear homing method of maximum likelihood to carry out self-adaptation with a spot of non-mother tongue database to model M 1, obtain model M 2;
(13) according to the selectable model merging strategy, linearly interpolating the model S_b corresponding to a given native-language phoneme λ_i in the initial Chinese-English bilingual speech recognition model, the native-language model S_ne corresponding to phoneme λ_i in model M1, the adaptive model S_a corresponding to λ_i in model M2, and the adaptive model γ_m of the confusable phoneme γ_j that corresponds to λ_i in the pronunciation dictionary obtained by the non-native confusable-phoneme method, to obtain the adjusted model S_f of the merged phoneme λ_i. The interpolation formula is:

p(S_f) = λ_1·p(S_b) + λ_2·p(S_ne) + λ_3·p(S_a) + λ_4·p(γ_m)

where λ_1, λ_2, λ_3 and λ_4 denote the interpolation factors of the corresponding models.
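The interpolation in step (13) can be sketched as a simple weighted sum. The component likelihoods and factor values below are invented for illustration; the factors are assumed to sum to one so the result remains a probability, which the claim does not state explicitly.

```python
def interpolate(p_b, p_ne, p_a, p_gamma, weights):
    """p(S_f) = λ1·p(S_b) + λ2·p(S_ne) + λ3·p(S_a) + λ4·p(γ_m)."""
    l1, l2, l3, l4 = weights
    # assumption: interpolation factors form a convex combination
    assert abs(l1 + l2 + l3 + l4 - 1.0) < 1e-9
    return l1 * p_b + l2 * p_ne + l3 * p_a + l4 * p_gamma

# hypothetical component likelihoods and interpolation factors
p_sf = interpolate(0.4, 0.3, 0.2, 0.1, weights=(0.5, 0.2, 0.2, 0.1))
```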
7. The method for recognizing Chinese-English bilingual voice of an embedded system according to claim 1 or 2, characterized in that the minimum phone error discriminative training of the merged initial Chinese-English bilingual speech recognition model comprises: using a speech recognizer to obtain word-lattice information for the training utterances; training Chinese and English language models from the word-level annotations of the speech training corpus; and running the forward-backward algorithm over the obtained word lattices to update the model parameters.
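The forward-backward pass over a word lattice can be sketched on a toy example. The lattice topology, labels, and arc weights below are invented for illustration, and real minimum-phone-error training would weight these arc posteriors by phone accuracy before updating model parameters.

```python
from collections import defaultdict

def arc_posteriors(arcs, start, end):
    """Compute the posterior probability of each lattice arc.

    `arcs` is a list of (src, dst, label, weight) with positive weights;
    posterior(arc) = forward(src) * weight * backward(dst) / total.
    Node ids are assumed to be in topological order.
    """
    fwd = defaultdict(float); fwd[start] = 1.0
    bwd = defaultdict(float); bwd[end] = 1.0
    for src, dst, label, w in sorted(arcs, key=lambda a: a[0]):   # forward pass
        fwd[dst] += fwd[src] * w
    for src, dst, label, w in sorted(arcs, key=lambda a: -a[1]):  # backward pass
        bwd[src] += w * bwd[dst]
    total = fwd[end]
    return {label: fwd[src] * w * bwd[dst] / total for src, dst, label, w in arcs}

# toy bilingual lattice: node 0 -> 1 via "open" (0.6) or "打开" (0.4), 1 -> 2 via "door"
arcs = [(0, 1, "open", 0.6), (0, 1, "打开", 0.4), (1, 2, "door", 1.0)]
post = arc_posteriors(arcs, 0, 2)
```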
8. The method for recognizing Chinese-English bilingual voice of an embedded system according to claim 1 or 2, characterized in that the speech enhancement step uses an improved Wiener filtering algorithm, comprising the following steps:
(21) using a segment of typical background noise as the initial noise estimate;
(22) performing robust endpoint detection with a sliding filter and a three-state state machine: for noisy speech signals at different input signal-to-noise ratios, comparing the filter output with a preset threshold and deciding, according to the decision condition, whether the current frame is background noise; if so, executing step (23); otherwise, ending;
(23) estimating the a priori signal-to-noise ratio of the current frame with the decision-directed algorithm, and updating the noise estimate using information from historical frames;
(24) applying two-stage inter-frame smoothing to improve the spectral continuity of the enhanced speech signal and reduce speech distortion.
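Per frame, the core of steps (21)–(23) can be sketched as a decision-directed a priori SNR estimate driving a Wiener gain. The smoothing factor `alpha`, the single-bin toy spectra, and the fixed noise floor are assumptions; the patent's three-state noise detector and two-stage smoothing are omitted from this sketch.

```python
def wiener_enhance(noisy_power, noise_power, alpha=0.98):
    """Enhance a sequence of per-frame spectral powers with a Wiener gain.

    a posteriori SNR:  gamma = noisy / noise
    a priori SNR (decision-directed):
        xi = alpha * prev_gain**2 * prev_gamma + (1 - alpha) * max(gamma - 1, 0)
    Wiener gain:       G = xi / (1 + xi)
    """
    enhanced, prev_gain, prev_gamma = [], 1.0, 1.0
    for p in noisy_power:
        gamma = p / noise_power
        xi = alpha * prev_gain ** 2 * prev_gamma + (1 - alpha) * max(gamma - 1.0, 0.0)
        gain = xi / (1.0 + xi)
        enhanced.append(gain * p)
        prev_gain, prev_gamma = gain, gamma
    return enhanced

# toy single-bin power track: noise floor 1.0, speech burst in the middle
out = wiener_enhance([1.0, 1.0, 9.0, 9.0, 1.0], noise_power=1.0)
```

Because the gain is always in (0, 1), each output frame is an attenuated copy of its input, with noise-only frames suppressed more than speech frames.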
9. The method for recognizing Chinese-English bilingual voice of an embedded system according to claim 2, characterized in that the estimate of the a priori signal-to-noise ratio of the current frame is obtained by weighting the a priori signal-to-noise ratio of the previous frame with the a posteriori signal-to-noise ratio estimate γ_k(n) of the current frame; the computation formula is:
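The claim's own formula is not reproduced in this extract. As a hedged reference, the standard decision-directed estimator, which weighting schemes of this kind typically follow, reads (the smoothing factor $\alpha$ and the previous-frame gain $G_k(n-1)$ are assumptions here, not taken from the patent):

$$\hat{\xi}_k(n) = \alpha\, G_k^2(n-1)\,\gamma_k(n-1) + (1-\alpha)\,\max\{\gamma_k(n) - 1,\ 0\}$$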
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN200910242406XA CN101727901B (en) | 2009-12-10 | 2009-12-10 | Method for recognizing Chinese-English bilingual voice of embedded system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN101727901A true CN101727901A (en) | 2010-06-09 |
CN101727901B CN101727901B (en) | 2011-11-09 |
Family
ID=42448692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN200910242406XA Expired - Fee Related CN101727901B (en) | 2009-12-10 | 2009-12-10 | Method for recognizing Chinese-English bilingual voice of embedded system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN101727901B (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103412856A (en) * | 2013-01-14 | 2013-11-27 | 刘恒 | Portable China and foreign language translation machine |
CN104167206A (en) * | 2013-05-17 | 2014-11-26 | 佳能株式会社 | Acoustic model combination method and device, and voice identification method and system |
CN104167206B (en) * | 2013-05-17 | 2017-05-31 | 佳能株式会社 | Acoustic model merging method and equipment and audio recognition method and system |
WO2016110068A1 (en) * | 2015-01-07 | 2016-07-14 | 中兴通讯股份有限公司 | Voice switching method and apparatus for voice recognition device |
CN105161092A (en) * | 2015-09-17 | 2015-12-16 | 百度在线网络技术(北京)有限公司 | Speech recognition method and device |
CN105161092B (en) * | 2015-09-17 | 2017-03-01 | 百度在线网络技术(北京)有限公司 | A kind of audio recognition method and device |
CN106448655A (en) * | 2016-10-18 | 2017-02-22 | 江西博瑞彤芸科技有限公司 | Speech identification method |
CN106878805A (en) * | 2017-02-06 | 2017-06-20 | 广东小天才科技有限公司 | Mixed language subtitle file generation method and device |
CN108630192A (en) * | 2017-03-16 | 2018-10-09 | 清华大学 | A kind of non-methods for mandarin speech recognition, system and its building method |
CN108630192B (en) * | 2017-03-16 | 2020-06-26 | 清华大学 | non-Chinese speech recognition method, system and construction method thereof |
CN107564527A (en) * | 2017-09-01 | 2018-01-09 | 平顶山学院 | The method for recognizing Chinese-English bilingual voice of embedded system |
CN108510978A (en) * | 2018-04-18 | 2018-09-07 | 中国人民解放军62315部队 | The modeling method and system of a kind of English acoustic model applied to languages identification |
CN108510978B (en) * | 2018-04-18 | 2020-08-21 | 中国人民解放军62315部队 | English acoustic model modeling method and system applied to language identification |
CN113692616A (en) * | 2019-05-03 | 2021-11-23 | 谷歌有限责任公司 | Phoneme-based contextualization for cross-language speech recognition in an end-to-end model |
CN113692616B (en) * | 2019-05-03 | 2024-01-05 | 谷歌有限责任公司 | Phoneme-based contextualization for cross-language speech recognition in an end-to-end model |
CN110634487A (en) * | 2019-10-24 | 2019-12-31 | 科大讯飞股份有限公司 | Bilingual mixed speech recognition method, device, equipment and storage medium |
CN111816169A (en) * | 2020-07-23 | 2020-10-23 | 苏州思必驰信息科技有限公司 | Method and device for training Chinese and English hybrid speech recognition model |
CN112071307A (en) * | 2020-09-15 | 2020-12-11 | 江苏慧明智能科技有限公司 | Intelligent incomplete voice recognition method for elderly people |
WO2022105235A1 (en) * | 2020-11-18 | 2022-05-27 | 华为技术有限公司 | Information recognition method and apparatus, and storage medium |
CN112652311A (en) * | 2020-12-01 | 2021-04-13 | 北京百度网讯科技有限公司 | Chinese and English mixed speech recognition method and device, electronic equipment and storage medium |
US11893977B2 (en) | 2020-12-01 | 2024-02-06 | Beijing Baidu Netcom Science Technology Co., Ltd. | Method for recognizing Chinese-English mixed speech, electronic device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN101727901B (en) | 2011-11-09 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN101727901A (en) | Method for recognizing Chinese-English bilingual voice of embedded system | |
US11062699B2 (en) | Speech recognition with trained GMM-HMM and LSTM models | |
CN103971685B (en) | Method and system for recognizing voice commands | |
US8930196B2 (en) | System for detecting speech interval and recognizing continuous speech in a noisy environment through real-time recognition of call commands | |
EP1557822B1 (en) | Automatic speech recognition adaptation using user corrections | |
CN101118745B (en) | Confidence degree quick acquiring method in speech identification system | |
CN101246685B (en) | Pronunciation quality evaluation method of computer auxiliary language learning system | |
CN101645271B (en) | Rapid confidence-calculation method in pronunciation quality evaluation system | |
CN103077708B (en) | Method for improving rejection capability of speech recognition system | |
WO2008024148A1 (en) | Incrementally regulated discriminative margins in mce training for speech recognition | |
CN107093422B (en) | Voice recognition method and voice recognition system | |
CN104036774A (en) | Method and system for recognizing Tibetan dialects | |
CN102122506A (en) | Method for recognizing voice | |
CN112349289B (en) | Voice recognition method, device, equipment and storage medium | |
CN112233651B (en) | Dialect type determining method, device, equipment and storage medium | |
US11705116B2 (en) | Language and grammar model adaptation using model weight data | |
CN102982799A (en) | Speech recognition optimization decoding method integrating guide probability | |
CN102693723A (en) | Method and device for recognizing speaker-independent isolated word based on subspace | |
Adell et al. | Comparative study of automatic phone segmentation methods for TTS | |
CN103474062A (en) | Voice identification method | |
CN106887226A (en) | Speech recognition algorithm based on artificial intelligence recognition | |
CN111933121B (en) | Acoustic model training method and device | |
CN112863486B (en) | Voice-based spoken language evaluation method and device and electronic equipment | |
CN107564527A (en) | The method for recognizing Chinese-English bilingual voice of embedded system | |
Li et al. | English sentence recognition based on hmm and clustering |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C14 | Grant of patent or utility model | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
TR01 | Transfer of patent right |
Effective date of registration: 20181121
Address after: 100085 Beijing Haidian District Shangdi Information Industry Base Pioneer Road 1 B Block 2 Floor 2030
Patentee after: Beijing Huacong Zhijia Technology Co., Ltd.
Address before: 100084 mailbox 100084-82, Beijing City
Patentee before: Tsinghua University
CF01 | Termination of patent right due to non-payment of annual fee | ||
CF01 | Termination of patent right due to non-payment of annual fee |
Granted publication date: 20111109 Termination date: 20201210 |