CN1801324A - Acoustic model construction method - Google Patents

Publication number
CN1801324A
Authority
CN
China
Prior art keywords
language material
root
phoneme
acoustic model
time
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CNA2005100042414A
Other languages
Chinese (zh)
Inventor
黄昭世
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Acer Inc
Original Assignee
Acer Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Acer Inc
Priority to CNA2005100042414A
Publication of CN1801324A
Legal status: Pending

Landscapes

  • Machine Translation (AREA)

Abstract

Disclosed is a method for building an acoustic model, which uses a plurality of corpus items to build an acoustic model serving as a reference model for comparison with a speech signal, wherein each corpus item has at least one phoneme. The method comprises the steps of: A) building a root corpus set having a plurality of root corpus items that share the same root phoneme; B) building a sub corpus set related to the root phoneme, the sub corpus set having at least one sub corpus item that has the root phoneme and a sub phoneme adjacent to the root phoneme; and C) building the acoustic model of the sub corpus set using each root corpus item and the sub corpus item.

Description

Method for building an acoustic model
Technical field
The present invention relates to a method for building an acoustic model, and in particular to a method that adaptively trains the acoustic model of a root corpus set with a sub corpus set in order to obtain the acoustic model of that sub corpus set.
Background art
Mainstream speech recognition technology is currently based on the theory of statistical pattern recognition. A complete speech recognition system can be roughly divided into three levels:
(1) Speech signal processing: extracts a time-varying sequence of speech feature vectors from the input speech signal.
(2) Acoustic decoding: because the input speech signal consists of a series of syllables, the signal as a whole is not suitable as the basic unit of recognition, so at the acoustic level the single syllable (mono syllable) is generally used as the recognition unit. An algorithm produces the corresponding acoustic model from the speech feature sequence obtained in speech signal processing; during recognition, the input speech signal is compared against the established acoustic models to obtain the best recognition result.
(3) Linguistic decoding: the problem of concatenating single syllables into words or sentences. The speech signal is analyzed grammatically and semantically using a grammar network or a language model built by statistical methods.
In phonetics, natural speech is continuous and the boundary between one syllable and the next is not distinct; this is the so-called coarticulation phenomenon. At present, the complications that coarticulation causes between syllables are mostly handled with context-dependent models.
In general, each single syllable has at least one phoneme (phone). Phonemes can be divided into initials and finals, that is, consonants and vowels. Because of coarticulation, the same phoneme has different acoustic models in different utterances. The number of distinct phonemes also differs between languages: English has 40-50, while Chinese has 37. If context-dependent models are built according to the preceding and following context, the required number of acoustic models becomes very large: Chinese needs roughly 60,000 or more and English roughly 125,000 or more. Building each acoustic model in turn requires enough corpus for the model to be reliable. To reduce the corpus required, the current approach is to use a decision tree and train the acoustic models so that related corpus items share parameters.
A decision tree is a method that integrates phonetic and acoustic knowledge. Working from the top down, all corpus items that belong to one phoneme are placed at the top level; this layer is called the root node. Various phonetic questions are then used to test all the corpus items at this node: items that satisfy a question are placed in one class and items that do not in another, which splits the corpus into two child nodes. The same operation is applied to each child node in turn until convergence, finally yielding a tree structure. Each path through the tree ends at a leaf node, which represents a cluster of corpus items with similar acoustic characteristics. If the number of items in a resulting cluster is below a threshold, however, the cluster cannot supply sufficient statistics and the trained model is inaccurate. The current remedy is to back off to all the corpus items of the node above and use them as the reference corpus when the model is built. The threshold on the number of items in a cluster is therefore hard to choose, and the approach does little to improve model resolution.
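The back-off rule described above can be sketched as follows. This is only an illustration of the prior-art behavior: the function name, the data layout, and the threshold values are assumptions made for the example and do not come from the patent.

```python
def select_training_items(leaf_items, parent_items, threshold):
    """Return the items used to train a leaf model under a back-off rule.

    If the leaf cluster holds fewer items than `threshold`, fall back to
    every item of the node above it; otherwise train on the leaf alone.
    """
    if len(leaf_items) < threshold:
        return list(parent_items)
    return list(leaf_items)


# Hypothetical data: 10 items for one phoneme, 2 of them in one leaf cluster.
parent = [f"utt{i}" for i in range(10)]
leaf = parent[:2]

print(len(select_training_items(leaf, parent, threshold=5)))  # 10 (backs off)
print(len(select_training_items(leaf, parent, threshold=2)))  # 2 (keeps the leaf)
```

The hard switch at `threshold` is what makes the value difficult to choose: a cluster just below it is trained exactly like the node above it, so the context-specific detail it carries is lost.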
Summary of the invention
Therefore, the object of the present invention is to provide a method for building an acoustic model that makes effective use of the existing corpus and builds a more accurate acoustic model.
Accordingly, the present invention discloses a method for building an acoustic model that uses a plurality of corpus items to build an acoustic model serving as a reference model for comparison with a speech signal, wherein each corpus item has at least one phoneme. The method comprises the following steps: A) building a root corpus set, wherein the root corpus set has a plurality of root corpus items and each root corpus item has a root phoneme; B) building a sub corpus set related to the root phoneme, wherein the sub corpus set has at least one sub corpus item and the sub corpus item has the root phoneme and a sub phoneme adjacent to the root phoneme; and C) building the acoustic model of the sub corpus set using each root corpus item and the sub corpus item.
The present invention also discloses a computer-readable recording medium that can be installed in a computer to act as a speech model building device and that works with a speech input device to build a corresponding speech model for input corpus. The speech input device receives a corpus item having at least one phoneme and sends it to a storage unit of the computer. The recording medium stores program code that drives the speech model building device, and the program code performs the following steps in the device: A) building a root corpus set, wherein the root corpus set has a plurality of root corpus items and each root corpus item has a root phoneme; B) building a sub corpus set related to the root phoneme, wherein the sub corpus set has at least one sub corpus item and the sub corpus item has the root phoneme and a sub phoneme adjacent to the root phoneme; and C) building the acoustic model of the sub corpus set using each root corpus item and the sub corpus item.
The present invention also discloses a method for building a hidden Markov acoustic model, comprising the following steps: A) building a root corpus set comprising a plurality of root corpus items, each root corpus item having an identical root phoneme; B) building a sub corpus set comprising at least one sub corpus item, the sub corpus item having the root phoneme and a sub phoneme adjacent to the root phoneme; C) calculating the value of $\bar{\mu}$ in the following formula:
$$\bar{\mu} = \frac{n_d}{k n_i + n_d}\,\bar{\mu}_d + \frac{k n_i}{k n_i + n_d}\,\bar{\mu}_i$$
where $\bar{\mu}_i$ and $\bar{\mu}_d$ are respectively the means of the hidden Markov model parameters of the root corpus set and the sub corpus set, $n_i$ and $n_d$ are respectively the numbers of corpus samples in the root corpus set and the sub corpus set, and $k$ is a preset weight; D) updating the hidden Markov model parameter mean of the sub corpus set with the value of $\bar{\mu}$; and E) building the acoustic model of the sub corpus set from the updated hidden Markov model parameter mean of the sub corpus set.
The present invention further discloses a method for building a hidden Markov acoustic model, used to build the acoustic model of a sub corpus set from a root corpus set and a related sub corpus set. The root corpus set comprises a plurality of root corpus items, each having an identical root phoneme, and the sub corpus set comprises at least one sub corpus item having the root phoneme and a sub phoneme adjacent to the root phoneme. The method comprises the following steps:
A) calculating the value of $\bar{\mu}$ in the following formula:
$$\bar{\mu} = \frac{n_d}{k n_i + n_d}\,\bar{\mu}_d + \frac{k n_i}{k n_i + n_d}\,\bar{\mu}_i$$
where $\bar{\mu}_i$ and $\bar{\mu}_d$ are respectively the means of the hidden Markov model parameters of the root corpus set and the sub corpus set, $n_i$ and $n_d$ are respectively the numbers of corpus samples in the root corpus set and the sub corpus set, and $k$ is a preset weight; B) updating the hidden Markov model parameter mean of the sub corpus set with the value of $\bar{\mu}$; and C) building the acoustic model of the sub corpus set from the updated hidden Markov model parameter mean of the sub corpus set.
Description of the drawings
Fig. 1 is a schematic diagram illustrating an implementation of the method for building an acoustic model of the present invention;
Fig. 2 is a block diagram illustrating an application program;
Fig. 3 is a block diagram illustrating an acoustic model building module;
Fig. 4 is a flowchart illustrating how a hidden Markov model is built;
Fig. 5 is a schematic diagram illustrating the decision steps of a decision tree; and
Fig. 6 is a flowchart illustrating the method for building an acoustic model of the present invention.
Description of reference numerals
1 computer
11 display
12 keyboard
13 storage unit
2 voice input unit
31 user interface unit
32 audio processing unit
4 acoustic model building module
41 root phoneme set unit
42 root phoneme model building unit
43 sub phoneme set unit
44 sub phoneme model building unit
61-65 steps
Embodiment
The foregoing and other technical content, features, and advantages of the present invention will become clear from the following detailed description of a preferred embodiment with reference to the accompanying drawings.
Before the detailed description, it should be noted that the method of the present invention is applicable to the language of any country or ethnic group. English is used for illustration in this embodiment, but the invention is not limited to it.
Referring first to Fig. 1, the method for building an acoustic model of the present invention can be implemented as program code stored on a recording medium readable by a computer 1, such as an optical disc, a floppy disk, or a hard disk. Once loaded and executed by the computer 1, the code produces an acoustic model building module 4 (see Fig. 3). The computer 1 comprises a central processing unit (not shown), a storage unit 13, a display 11, and a keyboard 12. Because the computer 1 is known art and not a feature of the invention, it is not described further here. In addition, because the computer 1 is a device that processes electrical signals, a voice input unit 2, such as a microphone (MIC) electrically connected to the computer 1, is needed so that the computer can receive and process sounds made by people. The voice input unit 2 converts a sound into a speech signal that the computer 1 can receive and process, and sends the signal to the computer 1.
Referring to Figs. 1 and 2, the storage unit 13 of the computer 1 holds an application program that, when executed, produces a user interface unit 31 and an audio processing unit 32. The user interface unit 31 produces a user interface on the display 11 so that the user can utter the corresponding sound as the interface directs. For example, if the interface shows "please read out video", the user reads the word "video" into the voice input unit 2. After the voice input unit 2 receives it, the input speech signal is sent to the audio processing unit 32, which applies a predetermined preprocessing step, and the result serves as the speech data needed to build an acoustic model. Hereinafter, the speech data needed to build acoustic models are collectively called the corpus. Because every speaker's intonation differs slightly, each word needs a considerable amount of corpus from different speakers as reference when its acoustic model is built, so that the resulting model is more suitable.
After reading a corpus item, the audio processing unit 32 quantizes it to obtain characteristic parameters that represent the speech, builds a feature file for the corpus item, and stores the file in the storage unit 13.
Referring to Figs. 1 and 3, after the corpus feature files have all been stored in the storage unit 13, the acoustic model building module 4 uses the stored feature files to build the relevant models. Because the input speech signal consists of a series of syllables, the signal as a whole is not suitable as the basic unit of recognition, so at the acoustic level the single syllable is used as the recognition unit. In addition, speech is continuous and the boundary between syllables is not distinct, which is the so-called coarticulation phenomenon. In general, each single syllable has at least one phoneme, and phonemes fall into two broad classes, consonants and vowels. Because of coarticulation, the same phoneme has different acoustic models in different utterances. Therefore, the acoustic model building module 4 first sets a predetermined phoneme as the root phoneme, then builds a root corpus set from the corpus items related to this root phoneme, and builds the corresponding acoustic model for the root corpus set.
Afterwards, a plurality of sub corpus sets are separated out of the root corpus set according to the phonemes adjacent to the root phoneme on its left and right. The acoustic model of each sub corpus set takes the acoustic model of the root corpus set as its parent: each corpus item in the sub corpus set is used, one by one, to revise the acoustic model of the root corpus set, which produces the acoustic model of that sub corpus set. The steps are detailed below.
Referring to Fig. 3, the acoustic model building module 4 has a root phoneme set unit 41, a root phoneme model building unit 42, a sub phoneme set unit 43, and a sub phoneme model building unit 44.
Referring to Figs. 1 and 3, the root phoneme set unit 41 first sets a phoneme as the root phoneme and selects, from the storage unit 13 of the computer 1, the feature files of the corpus items that have this root phoneme. If the root phoneme is /v/, for example, any corpus item whose first phoneme is /v/ belongs to the set, such as v /vi/, vacate /ve'ket/, vagi /'vegi/, and so on. This principle can produce a very large root corpus set, which can also be called a context-independent phone set.
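The selection performed by the root phoneme set unit 41 can be sketched as a simple filter. This is a minimal illustration, not the patent's implementation: the `CorpusItem` record, the feature file paths, and the phoneme transcriptions are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class CorpusItem:
    word: str
    phonemes: List[str]   # phoneme sequence of the utterance (hypothetical transcription)
    feature_file: str     # hypothetical path to the stored feature file


def build_root_set(items, root_phoneme):
    """Select every item whose first phoneme equals the root phoneme."""
    return [it for it in items if it.phonemes and it.phonemes[0] == root_phoneme]


corpus = [
    CorpusItem("v", ["v", "i"], "feat/v_001.dat"),
    CorpusItem("vacate", ["v", "e", "k", "e", "t"], "feat/vacate_001.dat"),
    CorpusItem("video", ["v", "i", "d", "i", "o"], "feat/video_001.dat"),
    CorpusItem("tea", ["t", "i"], "feat/tea_001.dat"),
]

root_set = build_root_set(corpus, "v")
print([it.word for it in root_set])  # ['v', 'vacate', 'video']
```

Because the filter ignores everything except the root phoneme itself, the resulting set is independent of context, which is why it grows large quickly.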
After the root phoneme set unit 41 has built the root corpus set, the root phoneme model building unit 42 builds the acoustic model for the root corpus set. This embodiment uses a hidden Markov model (HMM) to simulate how the speech changes in the vocal tract, in order to build the acoustic model of the root corpus set. The hidden Markov model is built in the following steps:
Referring to Fig. 4, first, in steps 50 and 51, the corpus is defined from the feature files stored in the storage unit 13 and an initial model is built. Then, in steps 52 and 53, the likelihood of this model is calculated. This embodiment uses the Baum-Welch re-estimation algorithm to obtain maximum likelihood estimates of the parameters, and the re-estimated parameters are used to update the parameters of the hidden Markov model. In step 54, steps 52 and 53 are repeated until the model converges. Finally, in step 55, once the model has converged, training ends and the acoustic model is output.
Through the above steps, the acoustic model of the root corpus set is obtained. Because the hidden Markov model is known art, it is not described further here.
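The loop of Fig. 4 can be illustrated with a small Baum-Welch implementation. The sketch below is a simplification made for this example: it uses a discrete-output HMM and a single observation sequence, whereas the patent works from feature files and does not specify the emission model. The observation sequence, the initial parameters, and the convergence tolerance are all made up.

```python
import math


def forward_backward(obs, pi, A, B):
    """Scaled forward-backward pass; returns alpha, beta, scale, log-likelihood."""
    n, T = len(pi), len(obs)
    alpha, scale = [], []
    for t in range(T):
        if t == 0:
            row = [pi[i] * B[i][obs[0]] for i in range(n)]
        else:
            row = [sum(alpha[t - 1][i] * A[i][j] for i in range(n)) * B[j][obs[t]]
                   for j in range(n)]
        c = sum(row)
        scale.append(c)
        alpha.append([x / c for x in row])
    beta = [[1.0] * n for _ in range(T)]
    for t in range(T - 2, -1, -1):
        beta[t] = [sum(A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j] for j in range(n))
                   / scale[t + 1] for i in range(n)]
    return alpha, beta, scale, sum(math.log(c) for c in scale)


def reestimate(obs, A, B, alpha, beta, scale):
    """One Baum-Welch update of (pi, A, B) from the state posteriors."""
    n, T, m = len(A), len(obs), len(B[0])
    gamma = [[alpha[t][i] * beta[t][i] for i in range(n)] for t in range(T)]
    new_pi = gamma[0][:]
    new_A = [[0.0] * n for _ in range(n)]
    new_B = [[0.0] * m for _ in range(n)]
    for i in range(n):
        denom_a = sum(gamma[t][i] for t in range(T - 1))
        for j in range(n):
            xi_sum = sum(alpha[t][i] * A[i][j] * B[j][obs[t + 1]] * beta[t + 1][j]
                         / scale[t + 1] for t in range(T - 1))
            new_A[i][j] = xi_sum / denom_a
        denom_b = sum(gamma[t][i] for t in range(T))
        for k in range(m):
            new_B[i][k] = sum(gamma[t][i] for t in range(T) if obs[t] == k) / denom_b
    return new_pi, new_A, new_B


def train(obs, pi, A, B, tol=1e-6, max_iter=100):
    """Start from an initial model and re-estimate until the likelihood settles."""
    history = []
    for _ in range(max_iter):
        alpha, beta, scale, log_lik = forward_backward(obs, pi, A, B)  # likelihood
        history.append(log_lik)
        if len(history) > 1 and history[-1] - history[-2] < tol:       # converged?
            break
        pi, A, B = reestimate(obs, A, B, alpha, beta, scale)           # re-estimate
    return pi, A, B, history


# Made-up data: two hidden states, two output symbols.
obs = [0, 0, 1, 0, 1, 1, 1, 0, 0, 1, 1, 1]
pi0 = [0.6, 0.4]
A0 = [[0.7, 0.3], [0.4, 0.6]]
B0 = [[0.8, 0.2], [0.3, 0.7]]
pi, A, B, history = train(obs, pi0, A0, B0)
print(len(history), history[0], history[-1])
```

The log-likelihood recorded in `history` should not decrease from one iteration to the next, which is the property that makes "repeat until convergence" a usable stopping rule.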
Referring to Fig. 5, after the root phoneme model building unit 42 has built the acoustic model of the root corpus set, the sub phoneme set unit 43 sorts out of the root corpus set the sub corpus items related to the root phoneme and builds them into a sub corpus set. In this embodiment the classification uses a decision tree: all corpus items belonging to one root phoneme are placed at the top level, and various phonetic questions are used to test all the items at that level, for example whether the phoneme to the right of the root phoneme is acute or continuant. The questions can vary with the screening logic of the decision tree and are not limited to those disclosed in this embodiment.
According to the result for each question, items that satisfy it are placed in one class and items that do not in another, which splits the corpus into two child nodes. The same operation is applied to each child node in turn until convergence, finally yielding sub corpus sets, each of which represents items with similar acoustic characteristics. For example, if the root phoneme is /v/, screening its root corpus set yields the corpus items whose phoneme sequence is /vi/; how many such items there are depends on how much data was supplied when the set was originally built. Such a sub corpus set can also be called a context-dependent phone set.
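The yes/no splitting described above can be sketched as a short recursion. The sketch makes several assumptions that the text does not fix: items are reduced to hypothetical (left, root, right) phoneme triples, the two questions are invented for the example, and a node simply takes the first question that separates its items (practical systems usually pick the question by a likelihood criterion). Splitting stops when no remaining question separates the items.

```python
VOWELS = {"a", "e", "i", "o", "u"}


def split_by_questions(items, questions):
    """Recursively split (left, root, right) triples with yes/no questions.

    Each question is a (name, predicate) pair. A node becomes a leaf when no
    remaining question divides its items into two non-empty groups.
    """
    for idx, (_name, ask) in enumerate(questions):
        yes = [it for it in items if ask(it)]
        no = [it for it in items if not ask(it)]
        if yes and no:
            rest = questions[:idx] + questions[idx + 1:]
            return split_by_questions(yes, rest) + split_by_questions(no, rest)
    return [items]  # leaf: one sub corpus set


# Hypothetical contexts of the root phoneme /v/; "#" marks a word boundary.
items = [
    ("#", "v", "i"),
    ("#", "v", "i"),
    ("#", "v", "e"),
    ("a", "v", "o"),
    ("#", "v", "r"),
]
questions = [
    ("right neighbor is a vowel", lambda it: it[2] in VOWELS),
    ("right neighbor is /i/", lambda it: it[2] == "i"),
]

leaves = split_by_questions(items, questions)
print([len(leaf) for leaf in leaves])  # [2, 2, 1]
```

Here the first leaf holds the two /vi/ items, matching the example in the text where screening the /v/ root corpus set yields the items whose phoneme sequence is /vi/.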
After the sub corpus set has been built, the sub phoneme model building unit 44 takes each corpus item in the sub corpus set in turn and, item by item in a predetermined manner, applies automatic adaptive training to the acoustic model of the root corpus set to obtain the acoustic model of the sub corpus set. The acoustic model of the sub corpus set is built in much the same way as that of the root corpus set. Note, however, that when the sub corpus acoustic model is built with a hidden Markov model, the mean of the model parameters is updated according to the following formula:
$$\bar{\mu} = \frac{n_d}{k n_i + n_d}\,\bar{\mu}_d + \frac{k n_i}{k n_i + n_d}\,\bar{\mu}_i$$
where $\bar{\mu}_i$ and $\bar{\mu}_d$ are respectively the means of the hidden Markov model parameters of the root corpus set and the sub corpus set, $n_i$ and $n_d$ are respectively the numbers of corpus samples in the root corpus set and the sub corpus set, $k$ is a weight, and $\bar{\mu}$ is the mean of the hidden Markov model parameters of the sub corpus set after the update.
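The mean update can be written as a few lines of code. This is a minimal sketch under stated assumptions: the mean is treated as a plain list of floats, and the function name, argument names, and numeric values are invented for the example.

```python
def adapt_mean(mean_root, mean_sub, n_root, n_sub, k):
    """Blend the sub-set mean with the root-set mean.

    Implements mu = n_d / (k*n_i + n_d) * mu_d + k*n_i / (k*n_i + n_d) * mu_i,
    where n_i = n_root, n_d = n_sub, mu_i = mean_root, mu_d = mean_sub.
    """
    denom = k * n_root + n_sub
    w_sub = n_sub / denom
    w_root = k * n_root / denom
    return [w_sub * d + w_root * i for d, i in zip(mean_sub, mean_root)]


# Made-up values: 100 root samples weighted by k = 0.5 count like 50 samples,
# the same as the 50 sub-set samples, so both means get weight 0.5.
updated = adapt_mean(mean_root=[0.0, 2.0], mean_sub=[1.0, 4.0],
                     n_root=100, n_sub=50, k=0.5)
print(updated)  # [0.5, 3.0]
```

The two weights always sum to 1, so the updated mean lies between the root mean and the sub-set mean, moving toward the sub-set mean as the sub corpus set grows.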
Thus, when the acoustic model of a sub corpus set is built, every related corpus item can be fully used and a more accurate model is obtained from a limited corpus, without the problem of a threshold that is hard to choose. For the same amount of corpus, the resolution of the model is effectively improved.
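Why no threshold is needed can be read off the update formula by collecting its two coefficients into a single interpolation weight. The short derivation below uses only the symbols already defined for the formula; the symbol $\lambda$ is introduced here for convenience and is not part of the patent.

```latex
\lambda = \frac{n_d}{k n_i + n_d}, \qquad
1 - \lambda = \frac{k n_i}{k n_i + n_d}, \qquad
\bar{\mu} = \lambda\,\bar{\mu}_d + (1 - \lambda)\,\bar{\mu}_i

% Limiting cases, for k n_i > 0:
n_d \to 0 \;\Rightarrow\; \lambda \to 0,\ \bar{\mu} \to \bar{\mu}_i
\qquad
n_d = k n_i \;\Rightarrow\; \lambda = \tfrac{1}{2}
\qquad
n_d \to \infty \;\Rightarrow\; \lambda \to 1,\ \bar{\mu} \to \bar{\mu}_d
```

Read this way, the root corpus set acts like $k n_i$ extra samples located at $\bar{\mu}_i$. A sparse sub corpus set stays close to the root model, and a well-populated one is dominated by its own data, with a smooth transition in between rather than a hard cut-off.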
Referring to Figs. 1 and 6, the operation of the acoustic model building module 4 is summarized as follows. First, in step 61, the root phoneme set unit 41 of the acoustic model building module 4 sets a phoneme as the root phoneme, selects from the storage unit 13 of the computer 1 the feature files of the corpus items that have this root phoneme, and builds a root corpus set. In step 62, the root phoneme model building unit 42 uses a hidden Markov model to build the acoustic model of the root corpus set. In step 63, after the root phoneme model building unit 42 has built the acoustic model of the root corpus set, the sub phoneme set unit 43 sorts out of the root corpus set the sub corpus items related to the root phoneme and builds them into a sub corpus set. In step 64, after the sub corpus set has been built, the sub phoneme model building unit 44 takes each corpus item in the sub corpus set in turn and, item by item in a predetermined manner, adaptively trains the acoustic model of the root corpus set. Finally, in step 65, the acoustic model of the sub corpus set is built and output.
In summary, the method for building an acoustic model of the present invention does not adopt the commonly used back-off rule in the decision rules of the decision tree. When the acoustic model of a sub corpus set is built, the method adaptively trains the acoustic model of the root corpus set and, with a way of computing the mean of the hidden Markov model parameters that differs from the existing one, makes effective use of all the corpus items in every sub corpus set to build the acoustic model of that set. The invention therefore offers both convenience and robustness and does achieve its object.
The above is only a preferred embodiment of the present invention and should not limit the scope of its implementation; simple equivalent changes and modifications made according to the claims and the description of the invention all remain within the scope covered by this patent.

Claims (12)

1. A method for building an acoustic model, which uses a plurality of corpus items to build an acoustic model serving as a reference model for comparison with a speech signal, wherein each corpus item has at least one phoneme, the method comprising the following steps:
A) building a root corpus set, wherein the root corpus set has a plurality of root corpus items and each root corpus item has an identical root phoneme;
B) building a sub corpus set related to the root phoneme, wherein the sub corpus set has at least one sub corpus item and the sub corpus item has the root phoneme and a sub phoneme adjacent to the root phoneme; and
C) building the acoustic model of the sub corpus set using each root corpus item and the sub corpus item.
2. The method for building an acoustic model according to claim 1, wherein the acoustic model of the sub corpus set uses a hidden Markov model.
3. The method for building an acoustic model according to claim 2, wherein the mean of the hidden Markov model parameters is updated according to the following formula:
$$\bar{\mu} = \frac{n_d}{k n_i + n_d}\,\bar{\mu}_d + \frac{k n_i}{k n_i + n_d}\,\bar{\mu}_i$$
where $\bar{\mu}_i$ and $\bar{\mu}_d$ are respectively the means of the hidden Markov model parameters of the root corpus set and the sub corpus set, $n_i$ and $n_d$ are respectively the numbers of corpus samples in the root corpus set and the sub corpus set, $k$ is a weight, and $\bar{\mu}$ is the mean of the hidden Markov model parameters of the sub corpus set after the update.
4. The method for building an acoustic model according to claim 1, further comprising, between step A and step B, a step D of building the acoustic model of the root corpus set.
5. The method for building an acoustic model according to claim 4, wherein the acoustic model of the sub corpus set in step C is obtained by automatic adaptive training of the acoustic model of the root corpus set with the sub corpus item.
6. A computer-readable recording medium that can be installed in a computer and works with a speech input device to build a corresponding speech model for input corpus, wherein the speech input device receives a corpus item having at least one phoneme and sends the corpus item to a storage unit of the computer, and the recording medium stores program code that the computer can read to perform the following steps:
A) building a root corpus set, wherein the root corpus set has a plurality of root corpus items and each root corpus item has an identical root phoneme;
B) building a sub corpus set related to the root phoneme, wherein the sub corpus set has at least one sub corpus item and the sub corpus item has the root phoneme and a sub phoneme adjacent to the root phoneme; and
C) building the acoustic model of the sub corpus set using each root corpus item and the sub corpus item.
7. The recording medium according to claim 6, wherein the acoustic model of the sub corpus set uses a hidden Markov model.
8. The recording medium according to claim 7, wherein the mean of the hidden Markov model parameters is updated according to the following formula:
$$\bar{\mu} = \frac{n_d}{k n_i + n_d}\,\bar{\mu}_d + \frac{k n_i}{k n_i + n_d}\,\bar{\mu}_i$$
where $\bar{\mu}_i$ and $\bar{\mu}_d$ are respectively the means of the hidden Markov model parameters of the root corpus set and the sub corpus set, $n_i$ and $n_d$ are respectively the numbers of corpus samples in the root corpus set and the sub corpus set, $k$ is a weight, and $\bar{\mu}$ is the mean of the hidden Markov model parameters of the sub corpus set after the update.
9. The recording medium according to claim 6, further comprising, between step A and step B, a step D of building the acoustic model of the root corpus set.
10. The method for building an acoustic model according to claim 9, wherein the acoustic model of the sub corpus set in step C is obtained by automatic adaptive training of the acoustic model of the root corpus set with the sub corpus item.
11. A method for building a hidden Markov acoustic model, comprising the following steps:
A) building a root corpus set comprising a plurality of root corpus items, each root corpus item having an identical root phoneme;
B) building a sub corpus set comprising at least one sub corpus item, the sub corpus item having the root phoneme and a sub phoneme adjacent to the root phoneme;
C) calculating the value of $\bar{\mu}$ in the following formula:
$$\bar{\mu} = \frac{n_d}{k n_i + n_d}\,\bar{\mu}_d + \frac{k n_i}{k n_i + n_d}\,\bar{\mu}_i$$
where $\bar{\mu}_i$ and $\bar{\mu}_d$ are respectively the means of the hidden Markov model parameters of the root corpus set and the sub corpus set, $n_i$ and $n_d$ are respectively the numbers of corpus samples in the root corpus set and the sub corpus set, and $k$ is a preset weight;
D) updating the hidden Markov model parameter mean of the sub corpus set with the value of $\bar{\mu}$; and
E) building the acoustic model of the sub corpus set from the updated hidden Markov model parameter mean of the sub corpus set.
12. A method for establishing a hidden Markov acoustic model, for jointly establishing the acoustic model of a sub-corpus set from a root corpus set and a related sub-corpus set, wherein the root corpus set comprises a plurality of root corpus samples each having the same root phoneme, and the sub-corpus set comprises at least one sub-corpus sample having the root phoneme and a sub-phoneme adjacent to the root phoneme, the method comprising the following steps:
A) calculating the value $\bar{\mu}$ from the following formula:
$$\bar{\mu} = \frac{n_d}{k n_i + n_d}\,\bar{\mu}_d + \frac{k n_i}{k n_i + n_d}\,\bar{\mu}_i$$
wherein $\bar{\mu}_i$ and $\bar{\mu}_d$ are respectively the mean values of the hidden Markov model parameters of the root corpus set and of the sub-corpus set, $n_i$ and $n_d$ are respectively the numbers of corpus samples in the root corpus set and in the sub-corpus set, and $k$ is a preset weight value;
B) updating the mean of the hidden Markov model parameters of the sub-corpus set with the value $\bar{\mu}$; and
C) establishing the acoustic model of the sub-corpus set according to the updated mean of the hidden Markov model parameters of the sub-corpus set.
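As a reading aid, the behaviour of the formula in step A can be checked on a few special cases. The numeric values below are illustrative assumptions, not figures from the application; the first case additionally assumes $k n_i > 0$.

```latex
\begin{align*}
n_d = 0 &: \quad \bar{\mu} = \bar{\mu}_i \\
k = 0,\ n_d > 0 &: \quad \bar{\mu} = \bar{\mu}_d \\
n_i = 300,\ n_d = 100,\ k = 1 &: \quad
  \bar{\mu} = \tfrac{100}{400}\,\bar{\mu}_d + \tfrac{300}{400}\,\bar{\mu}_i
            = 0.25\,\bar{\mu}_d + 0.75\,\bar{\mu}_i
\end{align*}
```

The two weights always sum to 1, so the updated mean lies between the root mean and the sub-corpus mean whenever $k \ge 0$.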
CNA2005100042414A 2005-01-04 2005-01-04 Acoustic model construction method Pending CN1801324A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CNA2005100042414A CN1801324A (en) 2005-01-04 2005-01-04 Acoustic model construction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CNA2005100042414A CN1801324A (en) 2005-01-04 2005-01-04 Acoustic model construction method

Publications (1)

Publication Number Publication Date
CN1801324A true CN1801324A (en) 2006-07-12

Family

ID=36811271

Family Applications (1)

Application Number Title Priority Date Filing Date
CNA2005100042414A Pending CN1801324A (en) 2005-01-04 2005-01-04 Acoustic model construction method

Country Status (1)

Country Link
CN (1) CN1801324A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103578467A (en) * 2013-10-18 2014-02-12 威盛电子股份有限公司 Acoustic model building method, voice recognition method and electronic device

Similar Documents

Publication Publication Date Title
US10854193B2 (en) Methods, devices and computer-readable storage media for real-time speech recognition
CN110534095B (en) Speech recognition method, apparatus, device and computer readable storage medium
CN103700370B (en) A kind of radio and television speech recognition system method and system
US9202464B1 (en) Curriculum learning for speech recognition
CN1256714C (en) Hierarchichal language models
CN102227767B (en) System and method for automatic speach to text conversion
CN1655235A (en) Automatic identification of telephone callers based on voice characteristics
CN109065032A (en) A kind of external corpus audio recognition method based on depth convolutional neural networks
CN1187693C (en) Method, apparatus, and system for bottom-up tone integration to Chinese continuous speech recognition system
CN1571013A (en) Method and device for predicting word error rate from text
CN1540625A (en) Front end architecture for multi-lingual text-to-speech system
CN101076851A (en) Spoken language identification system and method for training and operating the said system
CN109979257B (en) Method for performing accurate splitting operation correction based on English reading automatic scoring
CN112580335B (en) Method and device for disambiguating polyphone
CN1819017A (en) Method for extracting feature vectors for speech recognition
CN106295717A (en) A kind of western musical instrument sorting technique based on rarefaction representation and machine learning
WO2014183411A1 (en) Method, apparatus and speech synthesis system for classifying unvoiced and voiced sound
CN1924994A (en) Embedded language synthetic method and system
Le et al. G2G: TTS-driven pronunciation learning for graphemic hybrid ASR
CN115394287A (en) Mixed language voice recognition method, device, system and storage medium
CN1224954C (en) Speech recognition device comprising language model having unchangeable and changeable syntactic block
CN115116428A (en) Prosodic boundary labeling method, apparatus, device, medium, and program product
CN112035700B (en) Voice deep hash learning method and system based on CNN
CN1153127C (en) Intelligent common spoken Chinese phonetic input method and dictation machine
CN104199811A (en) Short sentence analytic model establishing method and system

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20060712