CN101727901A - Method for recognizing Chinese-English bilingual voice of embedded system - Google Patents


Info

Publication number
CN101727901A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN200910242406A
Other languages
Chinese (zh)
Other versions
CN101727901B (en
Inventor
刘加
钱彦旻
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huacong Zhijia Technology Co Ltd
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University filed Critical Tsinghua University
Priority to CN200910242406XA priority Critical patent/CN101727901B/en
Publication of CN101727901A publication Critical patent/CN101727901A/en
Application granted granted Critical
Publication of CN101727901B publication Critical patent/CN101727901B/en
Expired - Fee Related legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention belongs to the technical field of speech recognition and relates in particular to a Chinese-English bilingual speech recognition method for embedded systems. The method comprises A/D sampling, pre-emphasis of the sampled speech to boost the energy of the high-frequency components, windowing and framing, extraction of speech feature parameters, and matching recognition of voice commands against a pre-established acoustic model. The acoustic model is built by establishing an initial Chinese-English bilingual speech recognition model and then applying non-native model fusion adjustment to it; the matching recognition is specifically the recognition of Chinese-English bilingual voice commands. The method overcomes the limitation that conventional speech recognition systems can recognize only a single language.

Description

Chinese-English bilingual speech recognition method for embedded systems
Technical field
The invention belongs to the field of speech recognition technology and relates in particular to a Chinese-English bilingual speech recognition method for embedded systems.
Background art
In recent years, dedicated speech recognition chips have developed rapidly abroad. Many foreign speech technology and semiconductor companies have invested substantial manpower and resources in developing such chips and have patented speech recognition algorithms for their own national languages. The recognition performance of these dedicated (system) chips varies. The usual speech recognition process is shown in Fig. 1: the input speech signal is first A/D sampled, then spectrally shaped by pre-emphasis to boost the high-frequency components and windowed; feature parameters are extracted in real time, the extracted parameters being Mel-frequency cepstral coefficients (MFCC); recognition template training and recognition template matching are then carried out. To make on-chip recognition more robust in noisy environments, speech enhancement may also be applied. A dedicated chip generally comprises an 8-bit or 16-bit MCU or a 16-bit DSP microprocessor together with the attached automatic gain control (AGC), audio preamplifier, low-pass filter, digital-to-analog (D/A) converter, analog-to-digital (A/D) converter, audio power amplifier, and read-only memory (ROM). These dedicated speech recognition (system) chips are already used in intelligent voice toys and mobile communication terminals.
However, existing high-performance, medium-vocabulary dedicated speech recognition chips can recognize only a single language: the recognition task can consist only of spoken commands in one language, such as Chinese, English, or Japanese. They do not support recognition of bilingual (mixed) commands such as Chinese-English.
With internationalization deepening in the economy, politics, culture, and academia, bilingual usage in everyday life is increasingly common, for example mixed Chinese and English names. Speech recognition systems built on a single language such as Chinese or English therefore increasingly fail to meet current needs. Since Chinese and English are the languages with the most speakers and the widest use in the world, it is particularly important to build a system that can recognize mixed Chinese and English and to implement it on portable devices such as dedicated chip systems.
Summary of the invention
The object of the invention is to overcome the limitation that existing chip systems can recognize only a single language by proposing a Chinese-English bilingual speech recognition method for embedded systems. The method combines Chinese-English bilingual embedded speech recognition based on phoneme-merging modeling with an embedded speech enhancement method.
The technical solution is a Chinese-English bilingual speech recognition method for embedded systems, comprising A/D sampling, pre-emphasis of the sampled speech to boost the energy of the high-frequency signal, windowing and framing, extraction of speech feature parameters, and matching recognition of voice commands against a pre-established acoustic model, characterized in that the acoustic model is built by establishing an initial Chinese-English bilingual speech recognition model and applying non-native model fusion adjustment to that initial model, and in that the matching recognition of voice commands is specifically the recognition of Chinese-English bilingual voice commands;
wherein establishing the initial Chinese-English bilingual speech recognition model comprises revising the Chinese speech recognition model, revising the English speech recognition model, merging the revised Chinese and English speech recognition models, and training the merged Chinese and English speech recognition model;
the non-native model fusion adjustment of the initial Chinese-English bilingual speech recognition model uses a selective model fusion method to fuse the native model with the non-native model, and then applies minimum phone error discriminative training to the fused initial model to obtain the Chinese-English bilingual speech recognition model;
the recognition of Chinese-English bilingual voice commands extracts recognition features from the input speech signal, computes the Gaussian scores of the Chinese-English bilingual speech recognition model, performs template matching against the Chinese-English bilingual vocabulary entries, and takes the entry with the highest matching score as the recognition result.
The method further comprises a speech enhancement step.
Merging the revised Chinese and English speech recognition models specifically comprises using a model distance calculation method based on state time alignment to compute the distance between every pair of Chinese and English phonemes, and then merging the pair of phonemes with the smallest distance.
Training the merged Chinese and English speech recognition model uses the maximum likelihood estimation criterion and the expectation-maximization iterative estimation algorithm to obtain the initial Chinese-English bilingual speech recognition model.
Training of the merged Chinese and English speech recognition model is performed on a PC.
Fusing the native model with the non-native model by the selective model fusion method comprises the following steps:
(11) train a native model M1 on a purely native speech database;
(12) adapt model M1 with a small non-native speech database using maximum likelihood linear regression, obtaining model M2;
(13) following the selective model fusion strategy, for a native phoneme $\lambda_i$, linearly interpolate the model $S_b$ of $\lambda_i$ in the initial Chinese-English bilingual speech recognition model, the native model $S_{ne}$ of $\lambda_i$ in model M1, the adapted model $S_a$ of $\lambda_i$ in model M2, and the adapted model $\gamma_m$ of the confusable phoneme $\gamma_j$ that corresponds to $\lambda_i$ in the pronunciation dictionary obtained by the non-native confusable-phoneme substitution method; this yields the fused, adjusted model $S_f$ of phoneme $\lambda_i$. The model interpolation formula is:
$$p(S_f) = \lambda_1\,p(S_b) + \lambda_2\,p(S_{ne}) + \lambda_3\,p(S_a) + \lambda_4\,p(\gamma_m)$$
where $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are the interpolation factors of the corresponding models.
Applying minimum phone error discriminative training to the fused initial Chinese-English bilingual speech recognition model comprises: using a speech recognizer to obtain the word lattices of the training utterances; training Chinese and English language models from the original word-level transcriptions of the speech training corpus; and updating the model parameters by running the forward-backward algorithm on the resulting word lattices.
The speech enhancement step uses an improved Wiener filtering algorithm and comprises the following steps:
(21) use a segment of typical background noise as the initial value of the noise estimate;
(22) perform robust noise detection with a sliding filter and a three-state state machine: for noisy speech signals at different input signal-to-noise ratios, compare the filter output with a preset threshold and decide from the decision condition whether the current frame is background noise; if so, execute step (23);
(23) estimate the a priori signal-to-noise ratio (SNR) of the current frame with the decision-directed algorithm, and update the noise estimate using information from past frames;
(24) apply two-stage inter-frame smoothing to improve the continuity of the enhanced speech spectrum and reduce speech distortion.
The estimate of the current frame's a priori SNR is obtained by weighting the previous frame's a priori SNR
[Figure G200910242406XD00041]
and the estimate $\gamma_k(n)$ of the current frame's a posteriori SNR. The formula is:
[Figure G200910242406XD00042]
where
[Figure G200910242406XD00043]
is the estimate of the current frame's a priori SNR; p is a feedback factor that controls how much the previous frame and the current frame contribute to the current frame's a priori SNR estimate; and a is the factor that controls convergence.
The method provided by the invention overcomes the limitation that existing chip systems can recognize only a single language, and features low algorithmic complexity, high recognition accuracy, and robust recognition performance in noisy environments.
Brief description of the drawings
Fig. 1 is a schematic diagram of the speech recognition process in common use today;
Fig. 2 is a schematic diagram of the Chinese-English bilingual speech recognition process provided by the invention;
Fig. 3 is the table of confusable phoneme substitutions made by Chinese speakers when speaking English;
Fig. 4 is a schematic diagram of the time segment information obtained by the phoneme merging method based on state time alignment.
Detailed description of the embodiments
The preferred embodiment is described in detail below with reference to the drawings. It should be emphasized that the following description is merely exemplary and is not intended to limit the scope or application of the invention.
Fig. 2 is a schematic diagram of the Chinese-English bilingual speech recognition process provided by the invention. As shown in Fig. 2, the method comprises the following steps: A/D sampling; pre-emphasis of the sampled speech to boost the energy of the high-frequency signal; windowing and framing; extraction of speech feature parameters; establishing the initial Chinese-English bilingual speech recognition model; non-native model fusion adjustment of the initial model; and recognition of Chinese-English bilingual voice commands. Of these, A/D sampling, pre-emphasis, windowing and framing, and feature extraction are existing techniques, whereas establishing the initial model, the non-native model fusion adjustment, and the recognition of Chinese-English bilingual voice commands are the new techniques proposed by the invention.
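The existing front-end steps named above (pre-emphasis, framing, windowing) can be illustrated with a minimal Python sketch. The pre-emphasis coefficient 0.97, the 400-sample frame, and the 160-sample hop (25 ms and 10 ms at 16 kHz) are common textbook values assumed here, not values given in the patent, and the function names are hypothetical.

```python
import math

def pre_emphasize(samples, coeff=0.97):
    """Boost high frequencies: y[n] = x[n] - coeff * x[n-1] (coeff is assumed)."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - coeff * samples[n - 1])
    return out

def frame_and_window(samples, frame_len=400, hop=160):
    """Split the signal into overlapping frames and apply a Hamming window."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
              for i in range(frame_len)]
    frames = []
    # Only full frames are kept; a trailing partial frame is dropped.
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(frame, window)])
    return frames
```

Each windowed frame would then be passed to the feature extractor (MFCC in the background section); that stage is omitted here.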
Establishing the initial Chinese-English bilingual speech recognition model comprises revising the Chinese speech recognition model, revising the English speech recognition model, merging the revised Chinese and English speech recognition models, and training the merged Chinese and English speech recognition model.
Revising the Chinese and English speech recognition models begins by revising the pronunciation dictionaries (that is, the Chinese and English speech recognition models) according to the pronunciation differences that arise when Chinese speakers speak English or foreigners speak Chinese. There are two main approaches, one based on expert knowledge and one data-driven. The invention combines the two strategies: under the guidance of expert knowledge it obtains pronunciation variation rules that generalize well and depend little on the amount of non-native pronunciation data, while retaining the benefits of the data-driven approach. The result matches real data well, needs little manual intervention, and is easy to extend. In the data-driven part, the reference phoneme labels of the training data are combined with the labels output by the recognizer to obtain a matrix of easily confused phonemes, and the final pronunciation variation rules are then determined under the guidance of expert knowledge. Taking Chinese speakers speaking English as an example, Fig. 3 is the table of confusable phoneme substitutions made by Chinese speakers when speaking English; the English pronunciation dictionary is then revised according to the phoneme variation rules finally determined in Fig. 3.
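The data-driven half of this procedure, building a matrix of easily confused phonemes from reference labels and recognizer output, might look like the following sketch. It assumes the two label sequences have already been aligned one-to-one (the alignment step is not shown), and the function name, the 0.2 threshold, and the example phonemes are hypothetical.

```python
from collections import Counter

def confusable_pairs(reference, recognized, min_rate=0.2):
    """Return (ref_phone, hyp_phone, rate) for frequent substitutions.

    `reference` and `recognized` are equal-length, already-aligned phoneme
    label sequences; `min_rate` is an assumed cut-off, not a patent value.
    """
    if len(reference) != len(recognized):
        raise ValueError("sequences must be aligned to the same length")
    totals = Counter(reference)
    confusions = Counter((r, h) for r, h in zip(reference, recognized) if r != h)
    pairs = []
    for (r, h), count in confusions.items():
        rate = count / totals[r]  # fraction of r tokens recognized as h
        if rate >= min_rate:
            pairs.append((r, h, rate))
    # Most frequent confusions first, for review against expert knowledge.
    return sorted(pairs, key=lambda p: -p[2])
```

In the method described above, a list like this would only be a starting point: the final substitution rules are fixed with expert guidance before the dictionary is revised.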
After the Chinese and English speech recognition models have been revised, the two revised models are merged to obtain a unified, smaller model set. Obtaining a smaller recognition model requires merging the Chinese and English recognition models; at the same time, to preserve a high recognition rate, only models that are sufficiently close in acoustic model space are merged. The invention measures the distance between two models with a model distance calculation method based on state time alignment. Take the Chinese phoneme $\lambda_i$ and the English phoneme $\gamma_j$ as an example. First, several speech segments are prepared for each of the two phonemes from manually labeled speech. Then each speech segment of $\lambda_i$ is Viterbi state-time-aligned both with its own phoneme model $\lambda_i$ and with the other phoneme model $\gamma_j$, giving the segment information shown in Fig. 4, where $\lambda_i$ and $\gamma_j$ denote the two models before merging. As the figure shows, five time segments are obtained. For each time segment, the Bhattacharyya distance between the two models over that segment is computed and denoted $D_{mn}$. Finally, these distances are weighted by the duration of each segment to give the distance:
$$D(\lambda_i, \gamma_j) = \sum_{q=1}^{5} \Delta t_q \, D_{mn}.$$
Conversely, each speech segment of $\gamma_j$ is Viterbi state-time-aligned both with its own phoneme model $\gamma_j$ and with the other phoneme model $\lambda_i$, and $D(\gamma_j, \lambda_i)$ is obtained in the same way. The final distance between models $\lambda_i$ and $\gamma_j$ is
$$D = \frac{1}{2}\bigl(D(\lambda_i, \gamma_j) + D(\gamma_j, \lambda_i)\bigr).$$
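The two formulas above can be sketched in Python as follows. The sketch assumes each aligned state is a single diagonal-covariance Gaussian given as a (mean, variance) pair and that the Viterbi alignment has already produced the segments; the patent does not state the form of the state densities, so this representation and the function names are assumptions.

```python
import math

def bhattacharyya(mean1, var1, mean2, var2):
    """Bhattacharyya distance between two diagonal-covariance Gaussians."""
    d = 0.0
    for m1, v1, m2, v2 in zip(mean1, var1, mean2, var2):
        v = 0.5 * (v1 + v2)                          # averaged variance
        d += 0.125 * (m1 - m2) ** 2 / v              # mean-separation term
        d += 0.5 * math.log(v / math.sqrt(v1 * v2))  # variance-mismatch term
    return d

def directed_distance(segments):
    """Duration-weighted sum over aligned segments.

    `segments` is a list of (duration, state_a, state_b), where each state
    is a (mean, variance) pair of equal-length lists.
    """
    return sum(dur * bhattacharyya(*a, *b) for dur, a, b in segments)

def phoneme_distance(segments_ab, segments_ba):
    """Symmetric distance: average of the two directed distances."""
    return 0.5 * (directed_distance(segments_ab) + directed_distance(segments_ba))
```

As in the source formula, the durations are used as raw weights; whether they are normalized to sum to one is not stated, so no normalization is applied here.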
Using this calculation, the distance between every pair of Chinese and English phonemes is obtained, and the pair with the smallest distance is merged. Phoneme merging is repeated in this way until the number of phonemes falls to the required count. With the state-time-alignment distance described above, 15 pairs of Chinese and English phonemes were merged in total, which considerably reduces the size of the phoneme set and suits the resource constraints of embedded systems.
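The merging loop can be sketched as a greedy procedure. This is an illustration under two assumptions that the text does not settle: a merged unit is not considered for further merging, and distances between the remaining phonemes are not recomputed after a merge. The function and parameter names are hypothetical.

```python
def merge_phonemes(chinese, english, distance, target_size):
    """Greedily merge the closest Chinese/English phoneme pair until the
    combined inventory holds at most `target_size` units.

    `distance(zh, en)` is any callable returning the model distance.
    Returns (unmerged Chinese, unmerged English, merged pairs).
    """
    chinese, english = list(chinese), list(english)
    merged = []
    while len(chinese) + len(english) + len(merged) > target_size:
        if not chinese or not english:
            break  # nothing left to pair up
        zh, en = min(((c, e) for c in chinese for e in english),
                     key=lambda pair: distance(pair[0], pair[1]))
        chinese.remove(zh)
        english.remove(en)
        merged.append((zh, en))
    return chinese, english, merged
```

Each merge reduces the inventory by one unit, so reaching the 15 merged pairs reported above would correspond to a target 15 units below the combined size of the two original phoneme sets.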
Next, the merged Chinese and English speech recognition model is trained on large Chinese and English speech databases, using the MLE (maximum likelihood estimation) criterion and the EM (expectation-maximization) iterative estimation algorithm, to obtain the initial Chinese-English bilingual speech recognition model. The whole training process is performed on a PC.
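To make the MLE/EM idea concrete, here is a toy sketch of one EM iteration for a one-dimensional Gaussian mixture. It is not the training procedure of the patent, which would operate on full acoustic models over speech features; it only illustrates how each iteration re-estimates parameters so that the likelihood of the data does not decrease. The variance floor and all names are assumptions.

```python
import math

def gaussian(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2 * math.pi * var)

def log_likelihood(data, weights, means, variances):
    return sum(math.log(sum(w * gaussian(x, m, v)
                            for w, m, v in zip(weights, means, variances)))
               for x in data)

def em_step(data, weights, means, variances, var_floor=1e-3):
    """One EM iteration for a 1-D Gaussian mixture (toy illustration)."""
    # E-step: posterior probability of each component for each sample.
    resp = []
    for x in data:
        joint = [w * gaussian(x, m, v) for w, m, v in zip(weights, means, variances)]
        total = sum(joint)
        resp.append([j / total for j in joint])
    # M-step: maximum likelihood re-estimates from the soft counts.
    new_w, new_m, new_v = [], [], []
    for c in range(len(weights)):
        occ = sum(r[c] for r in resp)
        mean = sum(r[c] * x for r, x in zip(resp, data)) / occ
        var = sum(r[c] * (x - mean) ** 2 for r, x in zip(resp, data)) / occ
        new_w.append(occ / len(data))
        new_m.append(mean)
        new_v.append(max(var, var_floor))  # assumed floor to avoid collapse
    return new_w, new_m, new_v
```

Calling `em_step` repeatedly and monitoring `log_likelihood` gives the usual stopping rule: iterate until the improvement becomes negligible.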
The non-native model fusion adjustment of the initial Chinese-English bilingual speech recognition model uses a selective model fusion method to fuse the native model with the non-native model, and then applies minimum phone error discriminative training to the fused initial model to obtain the Chinese-English bilingual speech recognition model.
Non-native speakers often carry a native-language accent or pronounce words non-standardly, which causes the recognition system to make errors, so a model fusion technique must be used to adjust the initial recognition model. The invention uses a selective model fusion method to fuse the native and non-native models and revise the parameters of the recognition templates, as follows:
(11) train a native model M1 on a purely native speech database;
(12) adapt model M1 with a small non-native speech database using maximum likelihood linear regression, obtaining model M2;
(13) following the selective model fusion strategy, for a native phoneme $\lambda_i$, linearly interpolate the model $S_b$ of $\lambda_i$ in the initial Chinese-English bilingual speech recognition model, the native model $S_{ne}$ of $\lambda_i$ in model M1, the adapted model $S_a$ of $\lambda_i$ in model M2, and the adapted model $\gamma_m$ of the confusable phoneme $\gamma_j$ that corresponds to $\lambda_i$ in the pronunciation dictionary obtained by the non-native confusable-phoneme substitution method; this yields the fused, adjusted model $S_f$ of phoneme $\lambda_i$. The model interpolation formula is:
$$p(S_f) = \lambda_1\,p(S_b) + \lambda_2\,p(S_{ne}) + \lambda_3\,p(S_a) + \lambda_4\,p(\gamma_m)$$
where $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are the interpolation factors of the corresponding models.
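The interpolation in step (13) can be sketched as a weighted sum of four density functions. The requirement that the interpolation factors be non-negative and sum to one is an assumption made here so that the result remains a valid density; the patent does not state it. The function name is hypothetical.

```python
def interpolate_models(factors, densities):
    """Return the fused density p(S_f) = sum_k factors[k] * densities[k](x).

    `densities` are callables for p(S_b), p(S_ne), p(S_a), p(gamma_m), in
    that order; `factors` are the matching interpolation factors.
    """
    if len(factors) != len(densities):
        raise ValueError("need one interpolation factor per model")
    # Assumed constraint (not stated in the source): a convex combination.
    if any(f < 0 for f in factors) or abs(sum(factors) - 1.0) > 1e-9:
        raise ValueError("factors must be non-negative and sum to 1")

    def fused(x):
        return sum(f * p(x) for f, p in zip(factors, densities))

    return fused
```

Setting a factor to zero drops the corresponding model from the fusion for that phoneme, which is one plausible reading of the "selective" strategy.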
To obtain a more refined model, and in particular to further improve the recognition rate for non-native Chinese-English bilingual speech, the invention applies discriminative training to the bilingual setting for the first time. The Chinese-English bilingual recognition model obtained so far undergoes discriminative training under the MPE (minimum phone error) criterion: first, a speech recognizer is used to obtain the word lattices of the training utterances, while Chinese and English language models are trained from the original word-level transcriptions of the speech training corpus; finally, the model parameters are updated by running the forward-backward algorithm on the resulting word lattices. After several iterations of parameter estimation, the model parameters are further adjusted so that the models remain well separated and discriminable. The bilingual recognition model adjusted for non-native speech keeps the bilingual recognition rate for native speech from falling while significantly improving the bilingual recognition rate for non-native speech. In the end, the recognition rate for both native and non-native Chinese and English speech exceeded 98%.
Recognition of Chinese-English bilingual voice commands extracts recognition features from the input speech signal, computes the Gaussian scores of the Chinese-English bilingual speech recognition model, performs template matching against the Chinese-English bilingual vocabulary entries, and takes the entry with the highest matching score as the recognition result. Any commonly used method of extracting speech feature parameters may be used to extract the recognition features. To improve both recognition speed and recognition accuracy, the recognition decision is divided into two passes, coarse recognition and fine recognition. The coarse recognition model is small, with fewer than 200 model parameters, so coarse recognition is fast. Utterances that are pronounced non-standardly or are easily confused then undergo fine recognition, whose model has more parameters, roughly 1000. Because few candidates remain after coarse recognition, fine recognition is also fast even though its model is larger. Two-stage recognition thus improves both the average recognition speed and the recognition accuracy.
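The coarse-then-fine decision can be sketched as an N-best rescoring loop. The scoring callables, the candidate count `n_best`, and the choice to rescore every utterance are assumptions for illustration; the text only says that fine recognition is applied to non-standard or easily confused utterances and does not give the selection rule.

```python
def two_stage_recognize(features, entries, coarse_score, fine_score, n_best=5):
    """Pick the best vocabulary entry with a cheap pass and a detailed pass.

    `coarse_score(features, entry)` and `fine_score(features, entry)` are
    hypothetical callables returning a matching score (higher is better).
    """
    # Pass 1: score every entry with the small, fast model.
    ranked = sorted(entries, key=lambda e: coarse_score(features, e), reverse=True)
    candidates = ranked[:n_best]
    # Pass 2: rescore only the surviving candidates with the larger model.
    return max(candidates, key=lambda e: fine_score(features, e))
```

Because the expensive model is evaluated on at most `n_best` entries, the cost of the second pass stays small even for a vocabulary of around a thousand names.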
To improve speech recognition performance in noisy environments, the invention may also comprise a speech enhancement step, specifically:
(21) use a segment of typical background noise as the initial value of the noise estimate.
(22) perform robust noise detection with a sliding filter and a three-state state machine: for noisy speech signals at different input signal-to-noise ratios, compare the filter output with a preset threshold and decide from the decision condition whether the current frame is background noise; if so, execute step (23); otherwise, finish.
(23) estimate the a priori signal-to-noise ratio (SNR) of the current frame with the decision-directed algorithm, and update the noise estimate using information from past frames. The estimate of the current frame's a priori SNR is obtained by weighting the previous frame's a priori SNR
[Figure G200910242406XD00091]
and the estimate $\gamma_k(n)$ of the current frame's a posteriori SNR. The formula is:
[Figure G200910242406XD00092]
where
[Figure G200910242406XD00093]
is the estimate of the current frame's a priori SNR; p is a feedback factor that controls how much the previous frame and the current frame contribute to the current frame's a priori SNR estimate; and a is the factor that controls convergence.
(24) apply two-stage inter-frame smoothing to improve the continuity of the enhanced speech spectrum and reduce speech distortion.
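Because the patent's own a priori SNR formula survives only as an image reference, the sketch below uses the widely known textbook decision-directed estimator with a Wiener gain instead; it is not the improved algorithm claimed here. The smoothing constants `alpha` and `beta`, the gain floor, the per-frame noise flags (standing in for the noise detector of step (22)), and all names are assumptions, and the two-stage inter-frame smoothing of step (24) is omitted.

```python
def wiener_gains(power_frames, is_noise, noise_init,
                 alpha=0.98, beta=0.9, floor=1e-3):
    """Per-bin Wiener gains with a textbook decision-directed SNR estimate.

    power_frames: list of frames, each a list of noisy power values |Y_k(n)|^2.
    is_noise:     one bool per frame from an external noise detector.
    noise_init:   initial noise power per bin (step (21)); must be positive.
    """
    noise = list(noise_init)
    prev_clean = [0.0] * len(noise)  # previous frame's clean-power estimate
    gains = []
    for frame, noise_flag in zip(power_frames, is_noise):
        if noise_flag:
            # Recursive noise update on frames judged to be background noise.
            noise = [beta * n + (1 - beta) * y for n, y in zip(noise, frame)]
        frame_gain = []
        for k, y in enumerate(frame):
            posterior = y / noise[k]
            # Decision-directed: blend the previous frame's estimate with
            # the current maximum likelihood estimate (posterior - 1).
            prior = (alpha * prev_clean[k] / noise[k]
                     + (1 - alpha) * max(posterior - 1.0, 0.0))
            g = max(prior / (1.0 + prior), floor)
            frame_gain.append(g)
            prev_clean[k] = g * g * y
        gains.append(frame_gain)
    return gains
```

Multiplying each noisy spectral magnitude by the corresponding gain gives the enhanced spectrum; the gain floor limits the musical-noise artifacts that hard suppression would cause.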
The Chinese-English bilingual speech recognition method provided by the invention achieves bilingual recognition without enlarging the model relative to a single-language recognition system, so it occupies little storage. It also accommodates non-native speech: while preserving a high recognition rate for native speech, it achieves high performance on non-native speech. In addition, speech enhancement improves recognition accuracy in noisy environments, making the method suitable for embedded implementations of Chinese-English bilingual recognition.
The invention was tested using a real Chinese-English bilingual name-dialing system for portable mobile phones as the example platform. The recognition task comprised 500 English names and 500 Chinese names. The experiments show that the storage required by the bilingual recognition method of the invention is close to that of a single-language recognition system. The system handles Chinese and English names simultaneously and accommodates non-native speech: while preserving a high recognition rate for native speech it achieves high performance on non-native speech, and the final system's recognition rates for both native and non-native Chinese-English bilingual speech exceeded 98%. Speech enhancement further improves recognition accuracy in noisy environments, making the method suitable for embedded implementations of Chinese-English bilingual recognition.
The above is only a preferred embodiment of the invention, but the scope of protection of the invention is not limited to it. Any variation or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. The scope of protection of the invention shall therefore be determined by the scope of protection of the claims.

Claims (9)

1. A Chinese-English bilingual speech recognition method for embedded systems, comprising A/D sampling, pre-emphasis of the sampled speech to boost the energy of the high-frequency signal, windowing and framing, extraction of speech feature parameters, and matching recognition of voice commands against a pre-established acoustic model, characterized in that the acoustic model is built by establishing an initial Chinese-English bilingual speech recognition model and applying non-native model fusion adjustment to that initial model, and in that the matching recognition of voice commands is specifically the recognition of Chinese-English bilingual voice commands;
wherein establishing the initial Chinese-English bilingual speech recognition model comprises revising the Chinese speech recognition model, revising the English speech recognition model, merging the revised Chinese and English speech recognition models, and training the merged Chinese and English speech recognition model;
the non-native model fusion adjustment of the initial Chinese-English bilingual speech recognition model uses a selective model fusion method to fuse the native model with the non-native model, and then applies minimum phone error discriminative training to the fused initial model to obtain the Chinese-English bilingual speech recognition model;
the recognition of Chinese-English bilingual voice commands extracts recognition features from the input speech signal, computes the Gaussian scores of the Chinese-English bilingual speech recognition model, performs template matching against the Chinese-English bilingual vocabulary entries, and takes the entry with the highest matching score as the recognition result.
2. The Chinese-English bilingual speech recognition method for embedded systems according to claim 1, characterized in that the method further comprises a speech enhancement step.
3. The Chinese-English bilingual speech recognition method for embedded systems according to claim 1 or 2, characterized in that merging the revised Chinese and English speech recognition models specifically comprises using a model distance calculation method based on state time alignment to compute the distance between every pair of Chinese and English phonemes, and then merging the pair of phonemes with the smallest distance.
4. The Chinese-English bilingual speech recognition method for embedded systems according to claim 1 or 2, characterized in that training the merged Chinese and English speech recognition model uses the maximum likelihood estimation criterion and the expectation-maximization iterative estimation algorithm to obtain the initial Chinese-English bilingual speech recognition model.
5. The Chinese-English bilingual speech recognition method for embedded systems according to claim 1 or 2, characterized in that training of the merged Chinese and English speech recognition model is performed on a PC.
6. The Chinese-English bilingual speech recognition method for embedded systems according to claim 1 or 2, characterized in that fusing the native model with the non-native model by the selective model fusion method comprises the following steps:
(11) train a native model M1 on a purely native speech database;
(12) adapt model M1 with a small non-native speech database using maximum likelihood linear regression, obtaining model M2;
(13) following the selective model fusion strategy, for a native phoneme $\lambda_i$, linearly interpolate the model $S_b$ of $\lambda_i$ in the initial Chinese-English bilingual speech recognition model, the native model $S_{ne}$ of $\lambda_i$ in model M1, the adapted model $S_a$ of $\lambda_i$ in model M2, and the adapted model $\gamma_m$ of the confusable phoneme $\gamma_j$ that corresponds to $\lambda_i$ in the pronunciation dictionary obtained by the non-native confusable-phoneme substitution method; this yields the fused, adjusted model $S_f$ of phoneme $\lambda_i$. The interpolation formula is:
$$p(S_f) = \lambda_1\,p(S_b) + \lambda_2\,p(S_{ne}) + \lambda_3\,p(S_a) + \lambda_4\,p(\gamma_m)$$
where $\lambda_1$, $\lambda_2$, $\lambda_3$, and $\lambda_4$ are the interpolation factors of the corresponding models.
7. The method for recognizing Chinese-English bilingual speech on an embedded system according to claim 1 or 2, characterized in that the minimum phone error discriminative training of the merged initial Chinese-English bilingual speech recognition model comprises: using a speech recognizer to obtain word lattice information for the training speech; training the Chinese and English language models from the original word-level transcriptions of the speech training corpus; and running the forward-backward algorithm on the resulting word lattices to update the model parameters.
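The forward-backward pass over a lattice mentioned in claim 7 can be sketched as follows. This is only the posterior computation on a toy lattice, not the full minimum phone error update (which would additionally weight arcs by phone accuracy). The lattice layout, the node numbering (topologically ordered, node 0 as start, last node as end) and the names `log_add` and `arc_posteriors` are assumptions for illustration.

```python
import math


def log_add(a: float, b: float) -> float:
    """Return log(exp(a) + exp(b)) without overflow."""
    if a == -math.inf:
        return b
    if b == -math.inf:
        return a
    hi, lo = (a, b) if a > b else (b, a)
    return hi + math.log1p(math.exp(lo - hi))


def arc_posteriors(arcs, n_nodes):
    """Posterior probability of each arc in a lattice.

    arcs: list of (src, dst, log_score) with src < dst, so node ids are
    already in topological order; node 0 is the start, n_nodes - 1 the end.
    """
    alpha = [-math.inf] * n_nodes  # forward log scores
    beta = [-math.inf] * n_nodes   # backward log scores
    alpha[0] = 0.0
    beta[n_nodes - 1] = 0.0
    # Forward pass: visit arcs by increasing source node.
    for src, dst, score in sorted(arcs, key=lambda arc: arc[0]):
        alpha[dst] = log_add(alpha[dst], alpha[src] + score)
    # Backward pass: visit arcs by decreasing destination node.
    for src, dst, score in sorted(arcs, key=lambda arc: arc[1], reverse=True):
        beta[src] = log_add(beta[src], score + beta[dst])
    total = alpha[n_nodes - 1]  # log score summed over all complete paths
    return [math.exp(alpha[s] + w + beta[d] - total) for s, d, w in arcs]


# Toy lattice with two competing paths: 0 -> 1 -> 3 and 0 -> 2 -> 3.
toy_arcs = [
    (0, 1, math.log(0.6)),
    (0, 2, math.log(0.4)),
    (1, 3, math.log(0.5)),
    (2, 3, math.log(1.0)),
]
posteriors = arc_posteriors(toy_arcs, n_nodes=4)
print([round(p, 4) for p in posteriors])
```

The arcs leaving the start node split the total probability mass between the two paths, so their posteriors sum to 1.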
8. The method for recognizing Chinese-English bilingual speech on an embedded system according to claim 1 or 2, characterized in that the speech enhancement step uses an improved Wiener filtering algorithm comprising the following steps:
(21) use a segment of typical background noise as the initial value of the noise estimate;
(22) use a sliding filter and a three-state state machine for robust noise detection: for noisy speech signals with different input signal-to-noise ratios, compare the filter output with a preset threshold and decide, according to the decision condition, whether the current frame is background noise; if so, perform step (23); otherwise, end;
(23) use the decision-directed algorithm to estimate the a priori signal-to-noise ratio (SNR) of the current frame, and use information from past frames to update the noise signal;
(24) apply two-stage inter-frame smoothing to improve the continuity of the enhanced speech spectrum and reduce distortion of the speech signal.
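Steps (21) and (22) can be pictured with the following Python sketch. The claim does not define the three states or the decision condition, so everything here is a hypothetical stand-in: the states `NOISE`, `PENDING` and `SPEECH`, the first-order recursive filter used as the sliding filter, the fixed ratio threshold, and all parameter values are assumptions for illustration only.

```python
from enum import Enum


class FrameState(Enum):
    NOISE = "noise"
    PENDING = "pending"   # hypothetical intermediate state that adds hysteresis
    SPEECH = "speech"


def track_noise(energies, init_frames=3, threshold=2.0, smooth=0.5, update=0.95):
    """Gate the noise-estimate update with a three-state detector.

    energies: per-frame signal energies; the first init_frames frames are
    assumed to be background noise and seed the estimate (step 21).
    """
    noise = sum(energies[:init_frames]) / init_frames
    smoothed = noise
    state = FrameState.NOISE
    states = []
    for energy in energies[init_frames:]:
        # Recursive smoothing of the frame energy (stand-in for the sliding filter).
        smoothed = smooth * smoothed + (1.0 - smooth) * energy
        above = smoothed > threshold * noise  # compare filter output with threshold
        if state is FrameState.NOISE:
            state = FrameState.PENDING if above else FrameState.NOISE
        elif state is FrameState.PENDING:
            state = FrameState.SPEECH if above else FrameState.NOISE
        else:
            state = FrameState.SPEECH if above else FrameState.PENDING
        # Only frames judged to be background noise update the noise estimate.
        if state is FrameState.NOISE:
            noise = update * noise + (1.0 - update) * energy
        states.append(state)
    return noise, states


# Hypothetical energies: quiet lead-in, a loud burst, then quiet again.
frame_energies = [1.0, 1.0, 1.0, 1.0, 1.0, 10.0, 10.0, 10.0, 10.0,
                  1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
noise_estimate, frame_states = track_noise(frame_energies)
print(noise_estimate, [s.value for s in frame_states])
```

The intermediate state means a single loud frame does not immediately flip the detector, and the noise estimate is frozen while speech is present.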
9. The method for recognizing Chinese-English bilingual speech on an embedded system according to claim 2, characterized in that the a priori signal-to-noise ratio (SNR) of the current frame is estimated as a weighted combination of the a priori SNR of the previous frame
[Formula image F200910242406XC00031]
and the estimate $\gamma_k(n)$ of the a posteriori SNR of the current frame, computed as:
[Formula image F200910242406XC00032]
where
[Formula image F200910242406XC00033]
is the estimate of the a priori SNR of the current frame; p is a feedback factor that controls how much the previous frame and the current frame contribute to the current-frame a priori SNR estimate; and a is the factor that controls convergence.
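Because the formula images of claim 9 are not available in this text, the sketch below uses the widely known textbook form of the decision-directed estimator rather than the patent's own expression. The single smoothing constant `alpha`, the floor `xi_min`, and the function name are assumptions; the patent's factors p and a are not modeled. For one frequency bin, the a priori SNR is a weighted sum of the previous frame's enhanced power (relative to the noise power) and the current a posteriori SNR minus one, and the Wiener gain follows as xi / (1 + xi).

```python
def decision_directed_gains(noisy_power, noise_power, alpha=0.98, xi_min=1e-3):
    """Wiener gains for one frequency bin across frames (textbook form).

    noisy_power: sequence of |Y(n)|^2 values, one per frame.
    noise_power: noise power estimate for this bin, assumed constant here.
    """
    gains = []
    prev_clean_power = 0.0  # |G(n-1) * Y(n-1)|^2 from the previous frame
    for y2 in noisy_power:
        gamma = y2 / noise_power  # a posteriori SNR of the current frame
        # Weighted mix of the previous-frame estimate and the current observation.
        xi = alpha * prev_clean_power / noise_power \
            + (1.0 - alpha) * max(gamma - 1.0, 0.0)
        xi = max(xi, xi_min)      # floor keeps the gain from collapsing to zero
        gain = xi / (1.0 + xi)    # Wiener gain
        gains.append(gain)
        prev_clean_power = (gain ** 2) * y2
    return gains


# Hypothetical bin powers: three noise-only frames, then three strong speech frames.
bin_gains = decision_directed_gains([1.0, 1.0, 1.0, 100.0, 100.0, 100.0],
                                    noise_power=1.0)
print([round(g, 3) for g in bin_gains])
```

The gain stays near zero on noise-only frames and rises over a few frames once speech appears, which is the smoothing behaviour the weighting is meant to provide.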
CN200910242406XA 2009-12-10 2009-12-10 Method for recognizing Chinese-English bilingual voice of embedded system Expired - Fee Related CN101727901B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200910242406XA CN101727901B (en) 2009-12-10 2009-12-10 Method for recognizing Chinese-English bilingual voice of embedded system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200910242406XA CN101727901B (en) 2009-12-10 2009-12-10 Method for recognizing Chinese-English bilingual voice of embedded system

Publications (2)

Publication Number Publication Date
CN101727901A true CN101727901A (en) 2010-06-09
CN101727901B CN101727901B (en) 2011-11-09

Family

ID=42448692

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200910242406XA Expired - Fee Related CN101727901B (en) 2009-12-10 2009-12-10 Method for recognizing Chinese-English bilingual voice of embedded system

Country Status (1)

Country Link
CN (1) CN101727901B (en)

Cited By (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103412856A (en) * 2013-01-14 2013-11-27 刘恒 Portable China and foreign language translation machine
CN104167206A (en) * 2013-05-17 2014-11-26 佳能株式会社 Acoustic model combination method and device, and voice identification method and system
CN104167206B (en) * 2013-05-17 2017-05-31 佳能株式会社 Acoustic model merging method and equipment and audio recognition method and system
WO2016110068A1 (en) * 2015-01-07 2016-07-14 中兴通讯股份有限公司 Voice switching method and apparatus for voice recognition device
CN105161092A (en) * 2015-09-17 2015-12-16 百度在线网络技术(北京)有限公司 Speech recognition method and device
CN105161092B (en) * 2015-09-17 2017-03-01 百度在线网络技术(北京)有限公司 A kind of audio recognition method and device
CN106448655A (en) * 2016-10-18 2017-02-22 江西博瑞彤芸科技有限公司 Speech identification method
CN106878805A (en) * 2017-02-06 2017-06-20 广东小天才科技有限公司 Mixed language subtitle file generation method and device
CN108630192A (en) * 2017-03-16 2018-10-09 清华大学 A kind of non-methods for mandarin speech recognition, system and its building method
CN108630192B (en) * 2017-03-16 2020-06-26 清华大学 non-Chinese speech recognition method, system and construction method thereof
CN107564527A (en) * 2017-09-01 2018-01-09 平顶山学院 The method for recognizing Chinese-English bilingual voice of embedded system
CN108510978A (en) * 2018-04-18 2018-09-07 中国人民解放军62315部队 The modeling method and system of a kind of English acoustic model applied to languages identification
CN108510978B (en) * 2018-04-18 2020-08-21 中国人民解放军62315部队 English acoustic model modeling method and system applied to language identification
CN113692616A (en) * 2019-05-03 2021-11-23 谷歌有限责任公司 Phoneme-based contextualization for cross-language speech recognition in an end-to-end model
CN113692616B (en) * 2019-05-03 2024-01-05 谷歌有限责任公司 Phoneme-based contextualization for cross-language speech recognition in an end-to-end model
CN110634487A (en) * 2019-10-24 2019-12-31 科大讯飞股份有限公司 Bilingual mixed speech recognition method, device, equipment and storage medium
CN111816169A (en) * 2020-07-23 2020-10-23 苏州思必驰信息科技有限公司 Method and device for training Chinese and English hybrid speech recognition model
CN112071307A (en) * 2020-09-15 2020-12-11 江苏慧明智能科技有限公司 Intelligent incomplete voice recognition method for elderly people
WO2022105235A1 (en) * 2020-11-18 2022-05-27 华为技术有限公司 Information recognition method and apparatus, and storage medium
CN112652311A (en) * 2020-12-01 2021-04-13 北京百度网讯科技有限公司 Chinese and English mixed speech recognition method and device, electronic equipment and storage medium
US11893977B2 (en) 2020-12-01 2024-02-06 Beijing Baidu Netcom Science Technology Co., Ltd. Method for recognizing Chinese-English mixed speech, electronic device, and storage medium

Also Published As

Publication number Publication date
CN101727901B (en) 2011-11-09

Similar Documents

Publication Publication Date Title
CN101727901A (en) Method for recognizing Chinese-English bilingual voice of embedded system
US11062699B2 (en) Speech recognition with trained GMM-HMM and LSTM models
CN103971685B (en) Method and system for recognizing voice commands
US8930196B2 (en) System for detecting speech interval and recognizing continuous speech in a noisy environment through real-time recognition of call commands
EP1557822B1 (en) Automatic speech recognition adaptation using user corrections
CN101118745B (en) Confidence degree quick acquiring method in speech identification system
CN101246685B (en) Pronunciation quality evaluation method of computer auxiliary language learning system
CN101645271B (en) Rapid confidence-calculation method in pronunciation quality evaluation system
CN103077708B (en) Method for improving rejection capability of speech recognition system
WO2008024148A1 (en) Incrementally regulated discriminative margins in mce training for speech recognition
CN107093422B (en) Voice recognition method and voice recognition system
CN104036774A (en) Method and system for recognizing Tibetan dialects
CN102122506A (en) Method for recognizing voice
CN112349289B (en) Voice recognition method, device, equipment and storage medium
CN112233651B (en) Dialect type determining method, device, equipment and storage medium
US11705116B2 (en) Language and grammar model adaptation using model weight data
CN102982799A (en) Speech recognition optimization decoding method integrating guide probability
CN102693723A (en) Method and device for recognizing speaker-independent isolated word based on subspace
Adell et al. Comparative study of automatic phone segmentation methods for TTS
CN103474062A (en) Voice identification method
CN106887226A (en) Speech recognition algorithm based on artificial intelligence recognition
CN111933121B (en) Acoustic model training method and device
CN112863486B (en) Voice-based spoken language evaluation method and device and electronic equipment
CN107564527A (en) The method for recognizing Chinese-English bilingual voice of embedded system
Li et al. English sentence recognition based on hmm and clustering

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20181121

Address after: 100085 Beijing Haidian District Shangdi Information Industry Base Pioneer Road 1 B Block 2 Floor 2030

Patentee after: Beijing Huacong Zhijia Technology Co., Ltd.

Address before: 100084 mailbox 100084-82, Beijing City

Patentee before: Tsinghua University

CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20111109

Termination date: 20201210