CN105957518B - A kind of method of Mongol large vocabulary continuous speech recognition - Google Patents
- Publication number
- CN105957518B (application number CN201610440618.9A)
- Authority
- CN
- China
- Prior art keywords
- suffix
- verb
- word
- stage
- lattice
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
- G10L15/18—Speech classification or search using natural language modelling
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a method for Mongolian large-vocabulary continuous speech recognition, consisting of a preprocessing stage, a preparation stage, a training stage, a decoding stage, and a synthesis-conversion stage. The preprocessing stage segments the text training corpus and builds the pronunciation dictionary; the preparation stage extracts acoustic features from the input speech signal; the training stage trains the acoustic model with a whole-word pronunciation dictionary and trains the language model on the segmented training text; the decoding stage uses the acoustic model, the language model, and the pronunciation dictionary to recognize the input acoustic features as text; the synthesis-conversion stage uses rules to correct case-suffix errors made during decoding, merges stems with case suffixes, and outputs sentences composed of Mongolian words. The method addresses three problems of prior-art speech recognition systems: they cannot cover the large Mongolian vocabulary, recognition takes too long when the vocabulary is too large, and the language model suffers from data sparsity.
Description
Technical field
The invention belongs to the technical field of speech recognition and relates to a method for Mongolian large-vocabulary continuous speech recognition.
Background art
Speech recognition is a key technology for human-machine spoken communication. It draws on acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in information processing; the main problem it solves is how to convert received acoustic information into text. Depending on the task, speech recognition can be divided into several types, such as speaker recognition, keyword spotting, and continuous speech recognition. It has been applied successfully in industry, household appliances, communications, automotive electronics, medical care, home services, consumer electronics, and other fields, with very good results.
In practical research and applications, the languages recognized are still mainly the most widely used ones, such as English and Chinese; for languages with a smaller range of use or fewer speakers, speech recognition research is still at an early stage. Mongolian is such a language. Research on Mongolian speech recognition is not only of great significance for education, transportation, communications, and office automation in China's ethnic minority regions, but also offers new ideas and methods for speech recognition research on other agglutinative languages.
According to "Research on Mongolian Speech Keyword Detection Technology" (Feilong, China Doctoral Dissertations Full-text Database, Information Technology volume, November 2013), building a speech recognition system is divided into three stages, as shown in Fig. 1. The first stage is the preparation stage (or front-end processing stage), whose main role is to extract acoustic features from the input speech signal. The second stage is the training stage, whose main role is to train the acoustic model and language model used for decoding. The third stage is the decoding stage, which uses the acoustic model and language model obtained in the second stage to recognize the input acoustic features as text.
Acoustic feature extraction processes and compresses the information in the speech signal: the signal is analyzed, the information relevant to speech recognition is retained, and irrelevant redundant information is removed. Commonly used acoustic features include linear prediction cepstral coefficients (LPCC), mel-frequency cepstral coefficients (MFCC), and filter-bank (Fbank) features. Because the discriminability and adaptability of these features often fall short of expectations, methods such as linear discriminant analysis (LDA) and feature-space maximum likelihood linear regression (fMLLR) are often used during training to enhance them.
During training, a GMM-HMM (Gaussian mixture model, hidden Markov model) is usually trained first; a DNN (deep neural network) is then trained to replace the GMM, yielding a DNN-HMM (deep neural network, hidden Markov model). For the language model, an N-gram language model or an RNN-based language model is generally trained.
To decode the acoustic features, a recognition network is built from the acoustic model, the language model, and the pronunciation dictionary. The network is a directed acyclic graph; the Viterbi algorithm finds its best path (the path with the highest probability), and this path is the best text that the recognition system assigns to the speech signal. In use, the language model is usually given a weight and a long-word penalty score is set, in order to find the best balance between the language model and the acoustic model.
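The search described above can be sketched in a few lines of Python. This is a minimal illustration, not the decoder of the patent: the toy graph, the scores, the parameter names `lm_weight` and `word_penalty`, and the reading of the penalty as a fixed cost per emitted unit are all assumptions made for the example.

```python
from collections import defaultdict

def best_path(edges, start, end, lm_weight=1.0, word_penalty=0.0):
    """Viterbi-style best path through a DAG of recognition hypotheses.

    edges: (src, dst, unit, acoustic_logp, lm_logp) tuples; nodes are
    integers numbered so that src < dst (a topological order).
    Returns (score, units) for the highest-scoring path from start to end.
    """
    outgoing = defaultdict(list)
    for src, dst, unit, ac, lm in edges:
        # Combined edge score: acoustic + weighted LM - per-unit penalty.
        outgoing[src].append((dst, unit, ac + lm_weight * lm - word_penalty))

    best = {start: (0.0, [])}  # node -> (best score, units on that path)
    for node in sorted({e[0] for e in edges} | {start}):
        if node not in best:
            continue  # node not reachable from start
        score, units = best[node]
        for dst, unit, edge_score in outgoing[node]:
            cand = score + edge_score
            if dst not in best or cand > best[dst][0]:
                best[dst] = (cand, units + [unit])
    return best[end]

# Toy graph with made-up log-probabilities (assumed values).
edges = [
    (0, 1, "tere", -2.0, -1.0),
    (0, 1, "tarihi", -1.8, -3.0),
    (1, 2, "-yi", -1.0, -0.5),
    (1, 2, "-ban", -1.5, -2.0),
]
with_lm = best_path(edges, 0, 2, lm_weight=1.0, word_penalty=0.5)[1]
acoustic_only = best_path(edges, 0, 2, lm_weight=0.0)[1]
```

With these toy numbers, the acoustic scores alone prefer "tarihi", while adding the language model score switches the first unit to "tere"; tuning `lm_weight` and the penalty shifts that balance.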
Mongolian has more than a million words, and new vocabulary is continually introduced. In practice, we cannot include all Mongolian words in the pronunciation dictionary, and the collected text corpus cannot cover all Mongolian words either; many words are missing or rare, which causes data sparsity when the language model is trained. Moreover, as the number of words in the pronunciation dictionary grows, the computation required during recognition increases and recognition time stretches to a degree users cannot tolerate.
Summary of the invention
To address the above problems, the invention provides a method for Mongolian large-vocabulary continuous speech recognition. It solves the prior-art problems that a speech recognition system cannot cover the large Mongolian vocabulary, that an excessive vocabulary makes recognition take too long, and that the language model data are sparse.
The technical solution adopted by the invention is a method for Mongolian large-vocabulary continuous speech recognition consisting of a preprocessing stage, a preparation stage, a training stage, a decoding stage, and a synthesis-conversion stage;
The preprocessing stage segments the words in the language model training text into non-verb stems, case suffixes, and verbs, and builds a pronunciation dictionary based on non-verb stems, case suffixes, and verbs;
The preparation stage extracts acoustic features from the input speech signal;
The training stage builds the acoustic model using a pronunciation dictionary based on whole Mongolian words, and builds the language model using the pronunciation dictionary based on non-verb stems, case suffixes, and verbs;
The decoding stage builds a recognition network from the acoustic model, the language model, and the pronunciation dictionary based on non-verb stems, case suffixes, and verbs, and recognizes the input acoustic features as text;
The synthesis-conversion stage uses rules to correct case-suffix errors made during decoding, merges stems with case suffixes, and outputs sentences composed of Mongolian words.
The invention is further characterized as follows. The preprocessing stage specifically proceeds as follows: before model training, the Mongolian words in the language model's training text are converted into the corresponding Latin form; the converted words are then segmented into the corresponding non-verb stems, case suffixes, and verbs, and these are stored in the pronunciation dictionary based on non-verb stems, case suffixes, and verbs.
Further, the pronunciation dictionaries are used as follows: two pronunciation dictionaries are built. One stores whole Mongolian words and their pronunciations and is used to train the acoustic model. The other stores non-verb stems, case suffixes, and verbs together with their pronunciations, with all possible pronunciations of each case suffix added when the dictionary is built, and is used for decoding with the acoustic model.
Further, the synthesis-conversion stage specifically proceeds as follows:
Step 1: use rules to correct the case-suffix errors in the decoded text;
Step 2: merge stems and case suffixes into words in the corresponding Latin form, and use a conditional random field model to predict punctuation for the recognized sentence, adding the predicted punctuation to the sentence;
Step 3: using the correspondence between Latin words and Mongolian words, convert the merged Latin words into actual Mongolian words; the sentence composed of Mongolian words is the actual output.
The invention has the following advantages:
(1) The Mongolian speech recognition system based on non-verb stems, case suffixes, and verbs can recognize most Mongolian words by recognizing stems, case suffixes, and verbs.
(2) The Mongolian speech recognition system based on non-verb stems, case suffixes, and verbs reduces the number of entries in the pronunciation dictionary, which greatly reduces the computation needed for recognition and keeps recognition time within an acceptable range.
(3) The Mongolian speech recognition system based on non-verb stems, case suffixes, and verbs solves the data sparsity problem of the language model, so system performance improves greatly.
Brief description of the drawings
Fig. 1 is a framework diagram of a prior-art speech recognition system.
Fig. 2 is a schematic diagram of Mongolian word formation by concatenation.
Fig. 3 is a framework diagram of the speech recognition system of the invention.
Fig. 4 is an example of segmenting a Mongolian sentence in the preprocessing stage of the invention.
Fig. 5 is a comparison table of partial contents of the two pronunciation dictionaries of the invention.
Fig. 6 is a diagram of the selection rules used for rule-based correction of ending suffixes in the invention.
Fig. 7 is an example of the synthesis-conversion stage of the invention.
Detailed description of the embodiments
Principle of segmentation-based Mongolian recognition:
Mongolian is a typical agglutinative language; Mongolian words are formed mainly by concatenating roots and affixes, as shown in Fig. 2. From the way roots and affixes combine, it can be seen that attaching word-building (derivational) and form-building suffixes to a root modifies the actual meaning, whereas the ending suffix attached afterwards carries only grammatical meaning and always sits at the end of the word. Ending suffixes do not belong to the stem; they include nominal case suffixes, possessive suffixes, finite-verb (tense, person) suffixes, and converb suffixes. A participle suffix can be regarded as an ending suffix when the participle serves as the predicate of a main clause, but as a stem suffix when the participle is used as a nominal (especially when a case suffix follows it). In general, word-building suffixes come first, form-building suffixes next, and ending suffixes last. A word may contain more than one word-building or form-building suffix, but generally only one ending suffix (there can be two when a reflexive-possessive suffix is attached). The root, word-building suffixes, and form-building suffixes together form the stem, so the stem and the ending suffix serve as the basis of Mongolian word formation: different stems and different ending suffixes combine into most Mongolian words. In a speech recognition system, training and recognizing words can therefore be converted into training and recognizing stems and ending suffixes. A simple approach based on stems and ending suffixes, however, has the following problems. First, when a Mongolian verb is segmented into stem and ending suffix, vowels may be dropped or inserted, so segmentation accuracy is hard to guarantee. Second, when different ending suffixes are attached to a verb stem, both the stem and the ending suffix may undergo changes, insertions, and deletions of vowel and consonant phonemes, so adding the pronunciations of all verb stems and verb ending suffixes to the pronunciation dictionary is an impossible task; this poses a great challenge for building the dictionary. For stems other than verbs, however, the ending suffixes attached are case suffixes, whose pronunciation is relatively independent of the stem: attaching different case suffixes does not affect the pronunciation of the stem, so the pronunciation of non-verb stems is relatively stable, and we only need to add the different pronunciations of the case suffixes to the pronunciation dictionary.
Therefore, we treat verbs separately and use non-verb stems, verbs, and case suffixes together as the recognition units; the recognition system is accordingly referred to herein as the speech recognition system based on non-verb stems, case suffixes, and verbs.
Building the Mongolian speech recognition system based on stems and ending suffixes:
The Mongolian speech recognition system based on non-verb stems, case suffixes, and verbs consists of a preprocessing stage, a preparation stage, a training stage, a decoding stage, and a synthesis-conversion stage. The preprocessing stage converts the speech transcripts and the language model training text to Latin form, segments the Mongolian words in the converted language model training text, and builds the pronunciation dictionary based on non-verb stems, case suffixes, and verbs; the preparation stage extracts acoustic features from the input speech signal; the training stage trains the acoustic model with the whole-word pronunciation dictionary and trains the language model on the segmented training text; the decoding stage uses the acoustic model, the language model, and the pronunciation dictionary based on non-verb stems, case suffixes, and verbs to recognize the input acoustic features as text. The preparation, training, and decoding stages are language-independent; the invention mainly adapts the pronunciation dictionary and adds the preprocessing and synthesis-conversion stages. Mongolian letters take different shapes at different positions in a word, and some letters share a shape but differ in pronunciation, which hinders the study of recognition performance when building a Mongolian speech recognition system. In the preprocessing stage, the application therefore transcribes into Latin form the Mongolian words in the pronunciation dictionary, in the transcripts of the speech corpus, and in the language model's training text, and the added synthesis-conversion stage produces the actual Mongolian sentence. The framework is shown in Fig. 3.
Preprocessing for language model training:
For the language model's training set, the words must be segmented into the corresponding non-verb stems, case suffixes, and verbs. In written Mongolian, a case suffix is set off from the stem by the Mongolian narrow no-break space. The narrow no-break space is one third the width of a double-byte character, slightly narrower than an ordinary space, and is represented by "-" in Latin form. As shown in Fig. 4, in the language model training corpus converted to Latin form, non-verb stems and case suffixes can be split conveniently at the "-" character; the segmented training text is then used to train the language model.
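A minimal Python sketch of this segmentation step follows. It assumes the text is already in Latin form with "-" marking the boundary before each case suffix; the function names and the sample sentence (assembled from words that appear in Fig. 5, without regard to meaning) are illustrative only.

```python
def segment_word(word):
    """Split a Latin-form word at "-" into a stem and case-suffix tokens.

    Each suffix keeps its leading "-" so it can be re-attached later;
    a word without "-" (a verb or a bare stem) is returned unchanged.
    """
    parts = word.split("-")
    tokens = [parts[0]] if parts[0] else []
    tokens += ["-" + suffix for suffix in parts[1:] if suffix]
    return tokens

def segment_sentence(sentence):
    """Segment every whitespace-separated word of a sentence."""
    tokens = []
    for word in sentence.split():
        tokens.extend(segment_word(word))
    return tokens

# Sample input built from Fig. 5 vocabulary (not a meaningful sentence).
units = segment_sentence("tere-yi tarihi-ban sagvjv")
```

Here `units` holds the stem and suffix tokens in order, which is the form in which the language model training text would be written out.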
Training the language model on the segmented text lets it match well with the pronunciation dictionary of non-verb stems, case suffixes, and verbs during decoding, so the decoding result is expressed as non-verb stems, case suffixes, and verbs. Non-verb stems and case suffixes can combine into a very large number of Mongolian words, while the total number of non-verb stems, case suffixes, and common verbs stays within a few tens of thousands. This addresses both the data sparsity problem in language model training and the problem of recognizing the large Mongolian vocabulary.
Changes to and use of the pronunciation dictionary:
Unlike earlier Mongolian speech recognition systems, the invention uses two pronunciation dictionaries. One is the traditional dictionary that stores whole Mongolian words and their pronunciations; the other stores non-verb stems, case suffixes, and verbs together with their pronunciations, and where a case suffix has several pronunciations, each must be listed in the dictionary. Fig. 5 compares partial contents of the two dictionaries. A word stored in the whole-word dictionary takes one of two forms in the dictionary based on non-verb stems, case suffixes, and verbs. In the first, the entry is unchanged: verbs, and words of other parts of speech that consist of a stem only, are represented identically in both dictionaries. In Fig. 5, "sagvjv" and "qasidahv" are verbs and "elqin" is a bare stem, so they are stored in the same form in both dictionaries. In the second, a non-verb word composed of a stem and a case suffix is split, and the stem and the case suffix are stored separately. In Fig. 5, "tarihi-ban" and "tere-yi" are each composed of a stem and a case suffix, so the dictionary based on non-verb stems, case suffixes, and verbs stores the stems "tarihi" and "tere" and the case suffixes "-ban" and "-yi" separately. We use the whole-word dictionary when training the acoustic model, so that training can represent the phonemes of the training sentences more accurately. Otherwise, because a case suffix has multiple pronunciations and the first one is chosen by default for a training sentence, many training sentences would be converted to the wrong phonemes. The dictionary based on non-verb stems, case suffixes, and verbs is used in decoding.
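The relation between the two dictionaries can be sketched as follows. The entries reuse the words named in the text, but every phoneme string is a hypothetical placeholder (the contents of Fig. 5 are not reproduced here), and `build_unit_lexicon` is an illustrative helper that assumes at most one case suffix per word.

```python
# Whole-word dictionary: used for acoustic model training.
# All pronunciations below are placeholders, not the patent's data.
whole_word_lexicon = {
    "sagvjv": ["s a g v j v"],           # verb: kept whole
    "elqin": ["e l q i n"],              # bare stem: kept whole
    "tere-yi": ["t e r e g i"],          # stem + case suffix
    "tarihi-ban": ["t a r i h i b a n"],
}
stem_prons = {"tere": ["t e r e"], "tarihi": ["t a r i h i"]}
# Every pronunciation variant of a case suffix is listed.
suffix_prons = {"-yi": ["g i", "i"], "-ban": ["b a n"]}

def build_unit_lexicon(whole_words, stem_prons, suffix_prons):
    """Derive the decoding dictionary of non-verb stems, suffixes, verbs."""
    unit_lexicon = {}
    for word, prons in whole_words.items():
        if "-" not in word:
            unit_lexicon[word] = list(prons)  # unchanged entry
            continue
        stem, _, suffix = word.partition("-")
        unit_lexicon[stem] = list(stem_prons[stem])
        unit_lexicon["-" + suffix] = list(suffix_prons["-" + suffix])
    return unit_lexicon

unit_lexicon = build_unit_lexicon(whole_word_lexicon, stem_prons, suffix_prons)
```

The whole-word entry for "tere-yi" carries a single pronunciation in context, whereas the unit entry for "-yi" carries all of its variants, which is why the text reserves the whole-word dictionary for acoustic training.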
Decoding with the dictionary based on non-verb stems, case suffixes, and verbs not only performs the same as the whole-word dictionary on in-vocabulary words, but also pairs better with the language model trained on segmented text. Using non-verb stems, case suffixes, and verbs as units makes it possible to recognize the large Mongolian vocabulary, and it reduces the number of entries in the pronunciation dictionary, which shortens recognition time and solves the problem that existing Mongolian speech recognition takes too long.
Synthesis-conversion stage:
During experiments we found that some of the errors in the decoding output follow general patterns, concentrated mainly in the decoding of Mongolian case suffixes. These errors can therefore be corrected with Mongolian grammatical rules. Fig. 6 shows how to choose among the case suffixes "-dv", "-du", "-tv", and "-tu". When the stem is a masculine word: if the stem does not end with a vowel or "n", "N", "l", "m", the case suffix "-tv" is chosen; if the stem ends with a vowel or "n", the case suffix "-dv" is chosen. Conversely, when the stem is not a masculine word: if the stem does not end with a vowel or "n", "N", "l", "m", the case suffix "-tu" is chosen; if the stem ends with a vowel or "n", "N", "l", "m", the case suffix "-du" is chosen.
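The selection rule for this suffix group can be written as a small Python function. Several points are assumptions made for the sketch: the vowel letters of the Latin transliteration are taken to be "aeiouv"; masculine stems ending in "N", "l", or "m" are treated like those ending in "n", by analogy with the non-masculine branch; whether a stem is masculine is supplied by the caller rather than computed; and the stems "gar" and "ger" and their gender flags are example inputs, not data from the patent.

```python
VOWELS = set("aeiouv")                 # assumed transliteration vowels
SONORANT_ENDINGS = {"n", "N", "l", "m"}
SUFFIX_GROUP = {"-dv", "-du", "-tv", "-tu"}

def choose_case_suffix(stem, is_masculine, vowels=VOWELS):
    """Pick among -dv/-du/-tv/-tu from the stem's ending and gender."""
    last = stem[-1]
    d_form = last in vowels or last in SONORANT_ENDINGS
    if is_masculine:
        return "-dv" if d_form else "-tv"
    return "-du" if d_form else "-tu"

def correct_suffixes(tokens, is_masculine):
    """Replace each suffix of the group by the one the rule selects.

    is_masculine: callable mapping a stem to True/False (caller-supplied).
    """
    corrected = []
    for token in tokens:
        follows_stem = corrected and not corrected[-1].startswith("-")
        if token in SUFFIX_GROUP and follows_stem:
            stem = corrected[-1]
            token = choose_case_suffix(stem, is_masculine(stem))
        corrected.append(token)
    return corrected

# Example with assumed gender flags: only "gar" is marked masculine.
decoded = ["gar", "-du", "ger", "-dv", "tere", "-tu"]
fixed = correct_suffixes(decoded, lambda stem: stem == "gar")
```

Passing the gender test in as a callable keeps the rule itself separate from however masculine and non-masculine stems are identified in a real system.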
In the synthesis-conversion stage, therefore, the case-suffix errors from decoding are first corrected by rule; stems and case suffixes are then merged into the corresponding Latin words, while a conditional random field is used to split the recognized Mongolian sentence into clauses and add punctuation. Finally, using the correspondence between Latin words and Mongolian words, the text is converted into actual Mongolian words; the sentence composed of Mongolian words is the actual output.
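The merging step can be sketched in Python as below. The sketch covers only the re-attachment of case suffixes to the preceding stem; punctuation prediction with a conditional random field and the conversion from Latin form to Mongolian script are not shown, and the function name is illustrative.

```python
def merge_units(tokens):
    """Re-attach each case suffix (a token starting with "-") to the
    token before it, producing Latin-form words."""
    words = []
    for token in tokens:
        if token.startswith("-") and words:
            words[-1] += token      # glue suffix onto the preceding stem
        else:
            words.append(token)     # stem, verb, or stray leading suffix
    return words

merged = merge_units(["tere", "-yi", "tarihi", "-ban", "sagvjv"])
```

Because suffix tokens keep their leading "-", merging is the exact inverse of splitting a Latin-form word at "-".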
By correcting misrecognized case suffixes with Mongolian rules, the synthesis-conversion stage can further improve recognition accuracy, and it allows the recognition result to be displayed in Mongolian script. This resolves part of the problem that the acoustic model and language model cannot fully distinguish similar case suffixes, and it solves the problem of displaying Mongolian. Fig. 7 gives a complete example of the synthesis-conversion stage: the first sentence is the raw recognition result; the second is the result after rule-based correction, with the case suffixes in bold being the correct ones obtained by rule; the third is the result after merging and punctuation prediction; the fourth is the result after conversion to Mongolian script.
Claims (4)
1. A method for Mongolian large-vocabulary continuous speech recognition, characterized in that it consists of a preprocessing stage, a preparation stage, a training stage, a decoding stage, and a synthesis-conversion stage;
The preprocessing stage segments the words in the language model training text into non-verb stems, case suffixes, and verbs, and builds a pronunciation dictionary based on non-verb stems, case suffixes, and verbs;
The preparation stage extracts acoustic features from the input speech signal;
The training stage builds the acoustic model using a pronunciation dictionary based on whole Mongolian words, and builds the language model using the pronunciation dictionary based on non-verb stems, case suffixes, and verbs;
The decoding stage builds a recognition network from the acoustic model, the language model, and the pronunciation dictionary based on non-verb stems, case suffixes, and verbs, and recognizes the input acoustic features as text;
The synthesis-conversion stage uses rules to correct case-suffix errors made during decoding, merges stems with case suffixes, and outputs sentences composed of Mongolian words.
2. The method for Mongolian large-vocabulary continuous speech recognition according to claim 1, characterized in that the preprocessing stage specifically proceeds as follows: before model training, the Mongolian words in the language model's training text are converted into the corresponding Latin form; the converted words are then segmented into the corresponding non-verb stems, case suffixes, and verbs, and these are stored in the pronunciation dictionary based on non-verb stems, case suffixes, and verbs.
3. The method for Mongolian large-vocabulary continuous speech recognition according to claim 1, characterized in that the pronunciation dictionaries are used as follows: two pronunciation dictionaries are built; one stores whole Mongolian words and their pronunciations and is used to train the acoustic model; the other stores non-verb stems, case suffixes, and verbs together with their pronunciations, with all possible pronunciations of each case suffix added when the dictionary is built, and is used for decoding with the acoustic model.
4. The method for Mongolian large-vocabulary continuous speech recognition according to claim 1, characterized in that the synthesis-conversion stage specifically proceeds as follows:
Step 1: use rules to correct the case-suffix errors in the decoded text;
Step 2: merge stems and case suffixes into words in the corresponding Latin form, and use a conditional random field model to predict punctuation for the recognized sentence, adding the predicted punctuation to the sentence;
Step 3: using the correspondence between Latin words and Mongolian words, convert the merged Latin words into actual Mongolian words; the sentence composed of Mongolian words is the actual output.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610440618.9A CN105957518B (en) | 2016-06-16 | 2016-06-16 | A kind of method of Mongol large vocabulary continuous speech recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610440618.9A CN105957518B (en) | 2016-06-16 | 2016-06-16 | A kind of method of Mongol large vocabulary continuous speech recognition |
Publications (2)
Publication Number | Publication Date |
---|---|
CN105957518A CN105957518A (en) | 2016-09-21 |
CN105957518B true CN105957518B (en) | 2019-05-31 |
Family
ID=56905926
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610440618.9A Active CN105957518B (en) | 2016-06-16 | 2016-06-16 | A kind of method of Mongol large vocabulary continuous speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105957518B (en) |
Families Citing this family (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108428448A (en) * | 2017-02-13 | 2018-08-21 | 芋头科技(杭州)有限公司 | A kind of sound end detecting method and audio recognition method |
CN108573696B (en) * | 2017-03-10 | 2021-03-30 | 北京搜狗科技发展有限公司 | Voice recognition method, device and equipment |
JP6585112B2 (en) * | 2017-03-17 | 2019-10-02 | 株式会社東芝 | Voice keyword detection apparatus and voice keyword detection method |
CN107247706B (en) * | 2017-06-16 | 2021-06-25 | 中国电子技术标准化研究院 | Text sentence-breaking model establishing method, sentence-breaking method, device and computer equipment |
CN107680582B (en) * | 2017-07-28 | 2021-03-26 | 平安科技(深圳)有限公司 | Acoustic model training method, voice recognition method, device, equipment and medium |
CN107767858B (en) * | 2017-09-08 | 2021-05-04 | 科大讯飞股份有限公司 | Pronunciation dictionary generating method and device, storage medium and electronic equipment |
CN107978315B (en) * | 2017-11-20 | 2021-08-10 | 徐榭 | Dialogue type radiotherapy planning system based on voice recognition and making method |
CN108182938B (en) * | 2017-12-21 | 2019-03-19 | 内蒙古工业大学 | A kind of training method of the Mongol acoustic model based on DNN |
CN108563639B (en) * | 2018-04-17 | 2021-09-17 | 内蒙古工业大学 | Mongolian language model based on recurrent neural network |
CN108549703B (en) * | 2018-04-17 | 2022-03-25 | 内蒙古工业大学 | Mongolian language model training method based on recurrent neural network |
CN108831458A (en) * | 2018-05-29 | 2018-11-16 | 广东声将军科技有限公司 | A kind of offline voice is to order transform method and system |
CN109410914B (en) * | 2018-08-28 | 2022-02-22 | 江西师范大学 | Method for identifying Jiangxi dialect speech and dialect point |
CN109492232A (en) * | 2018-10-22 | 2019-03-19 | 内蒙古工业大学 | A Mongolian-Chinese machine translation method with enhanced semantic feature information based on Transformer |
CN109360554A (en) * | 2018-12-10 | 2019-02-19 | 广东潮庭集团有限公司 | A kind of language identification method based on language deep neural network |
CN110223675B (en) * | 2019-06-13 | 2022-04-19 | 思必驰科技股份有限公司 | Method and system for screening training text data for voice recognition |
CN111540343B (en) * | 2020-03-17 | 2021-02-05 | 北京捷通华声科技股份有限公司 | Corpus identification method and apparatus |
CN113205792A (en) * | 2021-04-08 | 2021-08-03 | 内蒙古工业大学 | Mongolian speech synthesis method based on Transformer and WaveNet |
CN113377901B (en) * | 2021-05-17 | 2022-08-19 | 内蒙古工业大学 | Mongolian text emotion analysis method based on multi-size CNN and LSTM models |
CN113571045B (en) * | 2021-06-02 | 2024-03-12 | 北京它思智能科技有限公司 | Method, system, equipment and medium for identifying Minnan language voice |
CN113515952B (en) * | 2021-08-18 | 2023-09-12 | 内蒙古工业大学 | Combined modeling method, system and equipment for Mongolian dialogue model |
CN114936555B (en) * | 2022-05-24 | 2023-06-06 | 内蒙古自治区公安厅 | Method and system for AI intelligent labeling of Mongolian |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101576909A (en) * | 2009-05-11 | 2009-11-11 | 内蒙古蒙科立软件有限责任公司 | Mongolian digital knowledge base system construction method |
CN101576924A (en) * | 2009-06-25 | 2009-11-11 | 内蒙古大学 | Mongolian retrieval method |
CN102063900A (en) * | 2010-11-26 | 2011-05-18 | 北京交通大学 | Speech recognition method and system for overcoming confusing pronunciation |
CN103021407A (en) * | 2012-12-18 | 2013-04-03 | 中国科学院声学研究所 | Method and system for recognizing speech of agglutinative language |
CN103065632A (en) * | 2012-12-21 | 2013-04-24 | 中国科学院声学研究所 | Selection method and system of recognition unit for Uygur language voice recognition |
CN103632663A (en) * | 2013-11-25 | 2014-03-12 | 飞龙 | HMM-based method of Mongolian speech synthesis and front-end processing |
CN104575497A (en) * | 2013-10-28 | 2015-04-29 | 中国科学院声学研究所 | Method for building acoustic model and speech decoding method based on acoustic model |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH01260494A (en) * | 1988-04-12 | 1989-10-17 | Matsushita Electric Ind Co Ltd | Voice recognizing method |
JPH0446397A (en) * | 1990-06-14 | 1992-02-17 | Nec Corp | Continuous voice recognition system |
Non-Patent Citations (1)
Title |
---|
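Research on a Mongolian spoken keyword detection method based on segmentation and recognition (基于分割识别的蒙古语语音关键词检测方法的研究); Feilong et al.; 《计算机科学》 (Computer Science); 2013-09-30; Vol. 40, No. 9; pp. 208-211 |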
Also Published As
Publication number | Publication date |
---|---|
CN105957518A (en) | 2016-09-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN105957518B (en) | A kind of method of Mongol large vocabulary continuous speech recognition | |
US9711139B2 (en) | Method for building language model, speech recognition method and electronic apparatus | |
Waibel et al. | Multilinguality in speech and spoken language systems | |
US9613621B2 (en) | Speech recognition method and electronic apparatus | |
Karpov et al. | Large vocabulary Russian speech recognition using syntactico-statistical language modeling | |
CN109637537B (en) | Method for automatically acquiring annotated data to optimize user-defined awakening model | |
Abushariah et al. | Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and balanced speech corpus. | |
CN108986797B (en) | Voice theme recognition method and system | |
CN102063900A (en) | Speech recognition method and system for overcoming confusing pronunciation | |
Chen et al. | Lightly supervised and data-driven approaches to mandarin broadcast news transcription | |
Carvalho et al. | A critical survey on the use of fuzzy sets in speech and natural language processing | |
US9959270B2 (en) | Method and apparatus to model and transfer the prosody of tags across languages | |
JP5073024B2 (en) | Spoken dialogue device | |
Al-Anzi et al. | The impact of phonological rules on Arabic speech recognition | |
CN102970618A (en) | Video on demand method based on syllable identification | |
Lin et al. | Hierarchical prosody modeling for Mandarin spontaneous speech | |
CN112489634A (en) | Language acoustic model training method and device, electronic equipment and computer medium | |
Vazhenina et al. | State-of-the-art speech recognition technologies for Russian language | |
KR20150128656A (en) | Name transliteration method based on classification of name origins | |
CN109523992A (en) | Tibetan dialect speech processing system | |
CN114863914A (en) | Deep learning method for constructing end-to-end speech evaluation model | |
Sung et al. | Deploying google search by voice in cantonese | |
Schlippe et al. | Rapid bootstrapping of a ukrainian large vocabulary continuous speech recognition system | |
Chen et al. | Using Taigi dramas with Mandarin Chinese subtitles to improve Taigi speech recognition | |
Arısoy | Turkish dictation system for radiology and broadcast news applications |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||