CN105957518B - A method for Mongolian large-vocabulary continuous speech recognition - Google Patents

A method for Mongolian large-vocabulary continuous speech recognition

Info

Publication number
CN105957518B
Authority
CN
China
Prior art keywords
suffix
verb
word
stage
lattice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610440618.9A
Other languages
Chinese (zh)
Other versions
CN105957518A (en)
Inventor
飞龙
高光来
张红伟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inner Mongolia University
Original Assignee
Inner Mongolia University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inner Mongolia University
Priority to CN201610440618.9A
Publication of CN105957518A
Application granted
Publication of CN105957518B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a method for Mongolian large-vocabulary continuous speech recognition, consisting of a preprocessing stage, a preparation stage, a training stage, a decoding stage, and a synthesis-conversion stage. The preprocessing stage segments the text training corpus and builds the pronunciation dictionary. The preparation stage extracts acoustic features from the input speech signal. The training stage trains the acoustic model with a whole-word pronunciation dictionary and trains the language model on the segmented training text. The decoding stage uses the acoustic model, the language model, and the pronunciation dictionary to recognize the input acoustic features as text. The synthesis-conversion stage uses rules to correct case-suffix errors made during decoding, merges stems with case suffixes, and finally outputs sentences composed of Mongolian words. The method addresses three problems of prior-art speech recognition systems: they cannot cover the large Mongolian vocabulary, recognition takes too long because the vocabulary is too large, and the language model suffers from data sparsity.

Description

A method for Mongolian large-vocabulary continuous speech recognition
Technical field
The invention belongs to the technical field of speech recognition and relates to a method for Mongolian large-vocabulary continuous speech recognition.
Background art
Speech recognition is a key technology for spoken human-machine communication. It draws on acoustics, linguistics, digital signal processing, computer science, and other disciplines, and is a frontier technology in information processing; the main problem it solves is how to convert received acoustic information into text. Depending on the task, speech recognition can be divided into several types, such as speaker recognition, keyword spotting, and continuous speech recognition. It has been applied successfully in industry, household appliances, communications, automotive electronics, medical care, home services, and consumer electronics, with very good results.
In research and practical applications, the languages recognized are still mainly the most widely used ones, such as English and Chinese; for languages with a smaller range of use or fewer speakers, speech recognition research is still at an early stage. Mongolian is one such language. Research on Mongolian speech recognition is important for education, transportation, communications, and office automation in China's ethnic-minority regions, and it also offers new ideas and methods for speech recognition research on other agglutinative languages.
According to the scheme described in "Research on Mongolian Speech Keyword Spotting Technology" (Feilong, China Doctoral Dissertations Full-text Database, Information Technology volume, November 2013), building a speech recognition system is divided into three stages. As shown in Fig. 1, the first stage is the preparation stage (or front-end processing stage), whose main role is to extract acoustic features from the input speech signal. The second stage is the training stage, whose main role is to train the acoustic model and the language model used for decoding. The third stage is the decoding stage, in which the acoustic model and language model trained in the second stage are used to recognize the input acoustic features as text.
Acoustic feature extraction processes and compresses the information in the speech signal: the signal is analyzed, the information relevant to speech recognition is retained, and the irrelevant redundancy is removed. Common acoustic features include linear prediction cepstral coefficients (LPCC), mel-frequency cepstral coefficients (MFCC), and filter-bank (Fbank) features. Because the discriminability and adaptability of these features fall short of expectations, methods such as linear discriminant analysis (LDA) and feature-space maximum likelihood linear regression (fMLLR) are often applied during training to enhance them.
During training, a GMM-HMM (Gaussian mixture model, hidden Markov model) is usually trained first; a DNN (deep neural network) is then trained to replace the GMM, forming a DNN-HMM model. For the language model, an N-gram language model or an RNN-based language model is generally trained.
To decode the acoustic features, the acoustic model, the language model, and the pronunciation dictionary are composed into a recognition network. This network is a directed acyclic graph; the Viterbi algorithm finds the optimal (maximum-probability) path through it, and that path is the best text the system recognizes for the speech signal. In practice, the language model is usually given a weight and a long-word penalty score is set, in order to find the best balance between the language model and the acoustic model.
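The best-path search described above can be sketched as a small dynamic program over a directed acyclic graph. This is only an illustration under stated assumptions: the arc format, the function name `best_path`, and the way `lm_weight` and `word_penalty` enter the score (language-model log-probability scaled by a weight, a fixed penalty subtracted per emitted unit) are hypothetical choices, not details given in the patent.

```python
from collections import defaultdict

def best_path(arcs, start, end, lm_weight=1.0, word_penalty=0.0):
    """Return (score, units) for the highest-scoring path from start to end.

    arcs: list of (src, dst, unit, acoustic_logp, lm_logp) tuples.
    Assumption: node ids are integers in topological order (src < dst).
    """
    outgoing = defaultdict(list)
    for src, dst, unit, acoustic, lm in arcs:
        # Combined arc score: acoustic score plus weighted LM score,
        # minus a per-unit penalty.
        outgoing[src].append((dst, unit, acoustic + lm_weight * lm - word_penalty))

    best = {start: (0.0, [])}
    for node in sorted(outgoing):      # ascending ids = topological order
        if node not in best:           # node not reachable from start
            continue
        score, units = best[node]
        for dst, unit, arc_score in outgoing[node]:
            candidate = (score + arc_score, units + [unit])
            if dst not in best or candidate[0] > best[dst][0]:
                best[dst] = candidate
    return best.get(end)               # None if end is unreachable

# Toy network with made-up scores: two competing stems, two competing suffixes.
toy_arcs = [
    (0, 1, "tere", -1.0, -0.5),
    (0, 1, "tarihi", -1.2, -0.1),
    (1, 2, "-yi", -0.7, -0.2),
    (1, 2, "-ban", -0.6, -0.9),
]
weighted = best_path(toy_arcs, 0, 2, lm_weight=1.0)
acoustic_only = best_path(toy_arcs, 0, 2, lm_weight=0.0)
```

With the toy scores, changing `lm_weight` changes which path wins, which is the balance between language model and acoustic model that the weight is meant to tune.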
Mongolian has more than a million words, and new vocabulary is continually introduced. In practice it is impossible to include every Mongolian word in the pronunciation dictionary, and the collected text corpus cannot cover every word either: many words are missing or rare, which causes data sparsity when training the language model. Moreover, as the number of words in the pronunciation dictionary grows, the computation required during recognition grows too, and recognition time stretches to a level users cannot tolerate.
Summary of the invention
To this end, the present invention provides a method for Mongolian large-vocabulary continuous speech recognition. It addresses three problems of prior-art speech recognition systems: they cannot cover the large Mongolian vocabulary, recognition takes too long because the vocabulary is too large, and the language model suffers from data sparsity.
The technical solution adopted by the invention is a method for Mongolian large-vocabulary continuous speech recognition consisting of a preprocessing stage, a preparation stage, a training stage, a decoding stage, and a synthesis-conversion stage;
The preprocessing stage segments the words in the language-model training text into non-verb stems, case suffixes, and verbs, and builds a pronunciation dictionary based on non-verb stems, case suffixes, and verbs;
The preparation stage extracts acoustic features from the input speech signal;
The training stage builds the acoustic model using the pronunciation dictionary based on whole Mongolian words, and builds the language model using the pronunciation dictionary based on non-verb stems, case suffixes, and verbs;
The decoding stage generates a recognition network from the acoustic model, the language model, and the pronunciation dictionary based on non-verb stems, case suffixes, and verbs, and recognizes the input acoustic features as text;
The synthesis-conversion stage uses rules to correct case-suffix errors made during decoding, merges stems with case suffixes, and finally outputs sentences composed of Mongolian words.
A further feature of the invention is that the preprocessing stage proceeds as follows: before model training, the Mongolian words in the language-model training text are converted into their corresponding Latin form; the converted words are then segmented into the corresponding non-verb stems, case suffixes, and verbs, and these units are stored in the pronunciation dictionary based on non-verb stems, case suffixes, and verbs.
Further, the pronunciation dictionaries are used as follows: two pronunciation dictionaries are built. One stores whole Mongolian words and their pronunciations and is used for training the acoustic model. The other stores non-verb stems, case suffixes, and verbs together with their pronunciations; when it is built, all possible pronunciations of each case suffix are added to it, and it is used for decoding with the acoustic model.
Further, the synthesis-conversion stage proceeds as follows:
Step 1: use rules to correct the case-suffix errors in the decoded text;
Step 2: merge stems and case suffixes into the corresponding Latin-form words, while using a conditional random field model to predict punctuation for the recognized sentence and adding the predicted punctuation to it;
Step 3: using the correspondence between Latin words and Mongolian words, convert the merged Latin words into actual Mongolian words; the sentence composed of Mongolian words is the final output.
The invention has the following advantages:
(1) A Mongolian speech recognition system based on non-verb stems, case suffixes, and verbs can recognize most Mongolian words by recognizing stems, case suffixes, and verbs.
(2) A Mongolian speech recognition system based on non-verb stems, case suffixes, and verbs reduces the number of entries in the pronunciation dictionary, which greatly reduces the computation needed for recognition and keeps recognition time within an acceptable range.
(3) A Mongolian speech recognition system based on non-verb stems, case suffixes, and verbs alleviates the data sparsity of the language model, so that system performance improves substantially.
Brief description of the drawings
Fig. 1 is a block diagram of a prior-art speech recognition system.
Fig. 2 is a schematic diagram of Mongolian word formation by concatenation.
Fig. 3 is a block diagram of the speech recognition system of the invention.
Fig. 4 is an example of segmenting a Mongolian sentence in the preprocessing stage of the invention.
Fig. 5 is a comparison table of partial contents of the two pronunciation dictionaries of the invention.
Fig. 6 is a diagram of the selection rules for ending suffixes used in rule-based correction.
Fig. 7 is an example of the synthesis-conversion stage of the invention.
Detailed description of embodiments
Principle of segmentation-based Mongolian recognition:
Mongolian is a typical agglutinative language: words are formed mainly by concatenating roots and affixes, as shown in Fig. 2. From these combinations it can be seen that attaching word-building and form-building suffixes to a root changes the actual meaning, whereas the subsequent attachment of an ending suffix carries only grammatical meaning, and the ending suffix always sits at the end of the word. Ending suffixes are not stem suffixes; they include the case suffixes of nominals, possessive suffixes, finite-verb (tense, person) suffixes, and converb suffixes. A participle suffix can be regarded as an ending suffix when the participle serves as the predicate of the main clause, but as a stem suffix when the participle is used as a nominal (especially when a case suffix follows it). In general, word-building suffixes come first, form-building suffixes next, and the ending suffix last. A word may contain more than one word-building or form-building suffix, but generally only one ending suffix (two when a reflexive-possessive suffix is attached). The root, word-building suffixes, and form-building suffixes together form the stem. Taking stems and ending suffixes as the basis of Mongolian word formation, different stems and different ending suffixes can be combined into most Mongolian words, so training and recognizing words in a speech recognition system can be converted into training and recognizing stems and ending suffixes.
However, a simple scheme based on stems and ending suffixes has the following problems. First, when a Mongolian verb is split into stem and ending suffix, vowels may be dropped or inserted, so segmentation accuracy is hard to guarantee. Second, when a verb stem takes different verbal ending suffixes, the pronunciation of both the stem and the suffix undergoes changes, insertions, and deletions of vowel and consonant phonemes. Adding the pronunciations of all verb stems and verbal ending suffixes to the pronunciation dictionary is therefore an impossible task, which poses a great challenge for building the dictionary. By contrast, the ending suffixes attached to stems other than verbs are case suffixes, whose pronunciation is relatively independent of the stem: attaching different case suffixes does not affect the pronunciation of the stem. The pronunciation of non-verb stems is therefore fairly stable, and only the different pronunciations of the case suffixes need to be added to the pronunciation dictionary.
We therefore treat verbs separately and use non-verb stems, verbs, and case suffixes together as the recognition units; in this document the system is called the speech recognition system based on non-verb stems, case suffixes, and verbs.
Building the Mongolian speech recognition system based on stems and ending suffixes:
The Mongolian speech recognition system based on non-verb stems, case suffixes, and verbs consists of a preprocessing stage, a preparation stage, a training stage, a decoding stage, and a synthesis-conversion stage. The preprocessing stage converts the speech transcription text and the language-model training text into Latin form and segments the Mongolian words in the converted language-model training text, and also builds the pronunciation dictionary based on non-verb stems, case suffixes, and verbs. The preparation stage extracts acoustic features from the input speech signal. The training stage trains the acoustic model with the whole-word pronunciation dictionary and trains the language model on the segmented training text. The decoding stage uses the acoustic model, the language model, and the pronunciation dictionary based on non-verb stems, case suffixes, and verbs to recognize the input acoustic features as text. The preparation, training, and decoding stages are language-independent; the invention mainly adjusts the pronunciation dictionary and the newly added preprocessing and synthesis-conversion stages. Because Mongolian letters take different forms at different positions in a word, and some letters share the same shape but differ in pronunciation, working directly on Mongolian script makes it harder to study the recognition performance of the system. In the preprocessing stage, this application therefore transcribes the Mongolian words in the pronunciation dictionary, in the transcriptions of the speech corpus, and in the language-model training text into Latin form, and the added synthesis-conversion step produces the actual Mongolian sentences. The framework is shown in Fig. 3.
Preprocessing for language model training:
For the language-model training set, the words must be segmented into the corresponding non-verb stems, case suffixes, and verbs. In written Mongolian, a case suffix is separated from its stem by the Mongolian narrow no-break space. This space is one third the width of a full-width character, slightly narrower than an ordinary space, and is represented by "-" in the Latin form. As shown in Fig. 4, in the language-model training corpus converted to Latin form, the "-" character makes it easy to split non-verb stems from case suffixes; the segmented training text is then used to train the language model.
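The "-"-based split described above can be sketched in a few lines. This is a minimal illustration, assuming whitespace-separated Latin-form words in which "-" marks only the boundary before a case suffix; the function names `segment_word` and `segment_sentence` are hypothetical. The example words are the ones quoted from Fig. 5 elsewhere in this description.

```python
def segment_word(word):
    """Split one Latin-form word into its stem and "-"-prefixed case suffixes."""
    stem, *suffixes = word.split("-")
    units = [stem] if stem else []
    # Keep the "-" marker on each suffix so suffix units stay distinguishable
    # from stems and can be re-attached later.
    units.extend("-" + suffix for suffix in suffixes if suffix)
    return units

def segment_sentence(sentence):
    """Segment a whitespace-separated Latin-form sentence into recognition units."""
    units = []
    for word in sentence.split():
        units.extend(segment_word(word))
    return units

units = segment_sentence("tarihi-ban tere-yi sagvjv")
```

Words without a "-" (verbs and bare stems) pass through unchanged, which matches the decision to keep verbs as whole units.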
Training the language model on the segmented text lets it match the pronunciation dictionary of non-verb stems, case suffixes, and verbs well during decoding, so the decoded result also takes the form of non-verb stems, case suffixes, and verbs. Non-verb stems and case suffixes can be combined into a very large number of Mongolian words, while the total number of non-verb stems, case suffixes, and common verbs is within tens of thousands. This addresses both the data sparsity of the language model during training and the recognition of a large Mongolian vocabulary.
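To make the sparsity argument concrete, the following sketch estimates an unsmoothed bigram model over segmented units. It is an assumption-laden toy: the patent only says an N-gram or RNN language model is trained, so the bigram order, the lack of smoothing, the `<s>`/`</s>` markers, and the name `train_bigram` are illustrative choices, and the two training sentences are made up from units quoted in this description.

```python
from collections import Counter

def train_bigram(sentences):
    """Return prob(prev, unit): maximum-likelihood bigram estimate, no smoothing."""
    history_counts, bigram_counts = Counter(), Counter()
    for units in sentences:
        padded = ["<s>"] + units + ["</s>"]
        history_counts.update(padded[:-1])            # every unit that has a successor
        bigram_counts.update(zip(padded, padded[1:]))

    def prob(prev, unit):
        if history_counts[prev] == 0:
            return 0.0
        return bigram_counts[(prev, unit)] / history_counts[prev]

    return prob

# Two toy segmented sentences sharing the case suffix "-yi".
prob = train_bigram([
    ["tere", "-yi", "sagvjv"],
    ["tarihi", "-yi", "sagvjv"],
])
p_shared = prob("-yi", "sagvjv")
p_start = prob("<s>", "tere")
```

Because the suffix "-yi" is a unit of its own, its statistics are pooled across both stems; with whole-word units, "tere-yi" and "tarihi-yi" would each be seen only once.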
Changes to and use of the pronunciation dictionaries:
Unlike earlier Mongolian speech recognition systems, the invention uses two pronunciation dictionaries. One is the traditional dictionary that stores whole Mongolian words and their pronunciations. The other stores non-verb stems, case suffixes, and verbs together with their pronunciations; where one case suffix has several pronunciations, each of them must be listed in the dictionary. Fig. 5 is a comparison table of partial contents of the two dictionaries. It shows that a word stored in the whole-word dictionary appears in the dictionary based on non-verb stems, case suffixes, and verbs in one of two forms. The first is the unchanged form: verbs, and words of other parts of speech that consist of a stem only, are represented identically in both dictionaries. In Fig. 5, "sagvjv" and "qasidahv" are verbs and "elqin" is a bare stem, so their entries are the same in both dictionaries. The second form applies to non-verb words composed of a stem and a case suffix: in the dictionary based on non-verb stems, case suffixes, and verbs, such a word is split and its stem and case suffix are stored separately. In Fig. 5, "tarihi-ban" and "tere-yi" are composed of a stem and a case suffix, so they are stored as the stems "tarihi" and "tere" and the case suffixes "-ban" and "-yi". The whole-word dictionary is used when training the acoustic model, so that the training sentences are mapped to their pronunciation phonemes more accurately. Otherwise, because a case suffix has several pronunciations and the first one would be chosen by default, many training sentences would be converted to the wrong phonemes. The dictionary based on non-verb stems, case suffixes, and verbs is used during decoding.
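The two dictionaries can be pictured as two mappings from a unit to a list of pronunciation variants. The sketch below is only a data-structure illustration: the phoneme symbols `P1` to `P6` are placeholders, not real Mongolian pronunciations, the second variant of "-yi" is invented purely to show a multi-pronunciation entry, and the one-line-per-variant text layout is an assumed format rather than one specified by the patent.

```python
# Whole-word dictionary (used for acoustic-model training): one entry per word.
# All phoneme symbols are placeholders, NOT real pronunciations.
whole_word_lexicon = {
    "tere-yi": [["P1", "P2", "P3"]],
    "sagvjv": [["P4", "P5"]],
}

# Sub-word dictionary (used for decoding): stems, case suffixes, and verbs.
subword_lexicon = {
    "tere": [["P1", "P2"]],
    "-yi": [["P3"], ["P6"]],      # every variant of a case suffix is listed
    "sagvjv": [["P4", "P5"]],     # verbs are stored whole, unchanged
}

def lexicon_lines(lexicon):
    """Yield one "unit phone phone ..." line per pronunciation variant."""
    for unit in sorted(lexicon):
        for phones in lexicon[unit]:
            yield unit + " " + " ".join(phones)

subword_lines = list(lexicon_lines(subword_lexicon))
```

Listing each variant on its own line keeps the multi-pronunciation suffix explicit for the decoder, while the whole-word dictionary fixes a single pronunciation per training word.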
For in-vocabulary words, decoding with the pronunciation dictionary based on non-verb stems, case suffixes, and verbs gives the same results as the whole-word dictionary, and this dictionary also pairs better with the language model trained on segmented text. Using non-verb stems, case suffixes, and verbs as units makes it possible to recognize a large Mongolian vocabulary. It also reduces the number of entries in the pronunciation dictionary and hence the time needed for recognition, addressing the excessive recognition time of existing Mongolian speech recognition.
Synthesis-conversion stage:
During experiments, we found that some of the decoding errors follow general patterns, concentrated mainly in the decoding of Mongolian case suffixes. These errors can therefore be corrected with Mongolian grammatical rules. Fig. 6 shows how to choose among the case suffixes "-dv", "-du", "-tv", and "-tu". When the stem is a masculine word: if the stem does not end in a vowel or in "n", "N", "l", or "m", the case suffix "-tv" is chosen; if the stem ends in a vowel or "n", the case suffix "-dv" is chosen. Conversely, when the stem is not a masculine word: if the stem does not end in a vowel or in "n", "N", "l", or "m", the case suffix "-tu" is chosen; if the stem ends in a vowel or in "n", "N", "l", or "m", the case suffix "-du" is chosen.
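The selection rule of Fig. 6 can be written as a small function. The sketch rests on several assumptions that are not in the patent: the set of Latin letters treated as vowels (`VOWELS`) is a guess about the transliteration scheme; the masculine-word branch is taken to use the same ending set ("n", "N", "l", "m") as the other branch, so that the two conditions are complementary; and whether a stem is a masculine word is supplied by the caller instead of being computed. The stem `stem_r` in the usage lines is a placeholder, not a real word.

```python
VOWELS = set("aeiouv")           # assumption about the Latin transliteration
LISTED_CONSONANTS = set("nNlm")  # consonant endings named in the rule
RULE_SUFFIXES = {"-dv", "-du", "-tv", "-tu"}

def choose_suffix(stem, is_masculine):
    """Pick among -dv/-du/-tv/-tu from the stem ending and the stem's gender."""
    listed_ending = stem[-1] in VOWELS or stem[-1] in LISTED_CONSONANTS
    if is_masculine:
        return "-dv" if listed_ending else "-tv"
    return "-du" if listed_ending else "-tu"

def correct_suffixes(units, is_masculine_stem):
    """Rewrite decoded -dv/-du/-tv/-tu units according to the rule.

    is_masculine_stem: caller-supplied function, stem -> bool (hypothetical).
    """
    fixed = []
    for unit in units:
        follows_stem = bool(fixed) and not fixed[-1].startswith("-")
        if unit in RULE_SUFFIXES and follows_stem:
            unit = choose_suffix(fixed[-1], is_masculine_stem(fixed[-1]))
        fixed.append(unit)
    return fixed

# The decoder output "-tv" after "tere" is rewritten by the rule.
corrected = correct_suffixes(["tere", "-tv", "sagvjv"], lambda stem: False)
placeholder_choice = choose_suffix("stem_r", True)
```

Only the four suffixes covered by the rule are touched; every other decoded unit is passed through unchanged.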
In the synthesis-conversion stage, therefore, the case-suffix errors from decoding are first corrected by rule. Stems and case suffixes are then merged into the corresponding Latin words, while a conditional random field is used to split the recognized Mongolian text into sentences and add punctuation. Finally, using the correspondence between Latin words and Mongolian words, the text is converted into actual Mongolian words, and the sentence composed of Mongolian words is the final output.
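The merge and script-conversion steps can be sketched as follows. This illustration leaves out the conditional-random-field punctuation step entirely, and it assumes the Latin-to-Mongolian correspondence is available as a lookup table passed in by the caller; the names `merge_units` and `to_mongolian` and the placeholder table entries are hypothetical.

```python
def merge_units(units):
    """Re-attach "-"-prefixed case suffixes to the preceding stem."""
    words = []
    for unit in units:
        if unit.startswith("-") and words:
            words[-1] += unit          # "tere" + "-yi" -> "tere-yi"
        else:
            words.append(unit)         # stems and verbs start a new word
    return words

def to_mongolian(words, latin_to_mongolian):
    """Map merged Latin words to Mongolian script via a caller-supplied table.

    Words missing from the table are left in Latin form.
    """
    return [latin_to_mongolian.get(word, word) for word in words]

merged = merge_units(["tarihi", "-ban", "tere", "-yi", "sagvjv"])
# Placeholder table: "<M1>" stands in for a Mongolian-script word.
converted = to_mongolian(merged, {"tere-yi": "<M1>"})
```

Merging is the inverse of the "-"-based segmentation used in preprocessing, so a sentence that is segmented and then merged returns to its original Latin words.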
By correcting misrecognized case suffixes with Mongolian rules, the synthesis-conversion stage can further improve recognition accuracy, and it displays the recognition result in Mongolian script. This partly addresses the inability of the acoustic model and the language model to fully distinguish similar case suffixes, and it solves the display problem for Mongolian. Fig. 7 gives a complete example of the synthesis-conversion stage: the first sentence is the raw recognition result; the second is the result after rule-based correction, with the corrected case suffixes shown in bold; the third is the result after merging and punctuation prediction; the fourth is the result converted to Mongolian script.

Claims (4)

1. A method for Mongolian large-vocabulary continuous speech recognition, characterized in that it consists of a preprocessing stage, a preparation stage, a training stage, a decoding stage, and a synthesis-conversion stage;
The preprocessing stage segments the words in the language-model training text into non-verb stems, case suffixes, and verbs, and builds a pronunciation dictionary based on non-verb stems, case suffixes, and verbs;
The preparation stage extracts acoustic features from the input speech signal;
The training stage builds the acoustic model using the pronunciation dictionary based on whole Mongolian words, and builds the language model using the pronunciation dictionary based on non-verb stems, case suffixes, and verbs;
The decoding stage generates a recognition network from the acoustic model, the language model, and the pronunciation dictionary based on non-verb stems, case suffixes, and verbs, and recognizes the input acoustic features as text;
The synthesis-conversion stage uses rules to correct case-suffix errors made during decoding, merges stems with case suffixes, and finally outputs sentences composed of Mongolian words.
2. The method for Mongolian large-vocabulary continuous speech recognition according to claim 1, characterized in that the preprocessing stage proceeds as follows: before model training, the Mongolian words in the language-model training text are converted into their corresponding Latin form; the converted words are then segmented into the corresponding non-verb stems, case suffixes, and verbs, and these units are stored in the pronunciation dictionary based on non-verb stems, case suffixes, and verbs.
3. The method for Mongolian large-vocabulary continuous speech recognition according to claim 1, characterized in that the pronunciation dictionaries are used as follows: two pronunciation dictionaries are built; one stores whole Mongolian words and their pronunciations and is used for training the acoustic model; the other stores non-verb stems, case suffixes, and verbs together with their pronunciations, all possible pronunciations of each case suffix being added when this dictionary is built, and is used for decoding with the acoustic model.
4. The method for Mongolian large-vocabulary continuous speech recognition according to claim 1, characterized in that the synthesis-conversion stage proceeds as follows:
Step 1: use rules to correct the case-suffix errors in the decoded text;
Step 2: merge stems and case suffixes into the corresponding Latin-form words, while using a conditional random field model to predict punctuation for the recognized sentence and adding the predicted punctuation to it;
Step 3: using the correspondence between Latin words and Mongolian words, convert the merged Latin words into actual Mongolian words; the sentence composed of Mongolian words is the final output.
CN201610440618.9A 2016-06-16 2016-06-16 A method for Mongolian large-vocabulary continuous speech recognition Active CN105957518B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610440618.9A CN105957518B (en) 2016-06-16 2016-06-16 A method for Mongolian large-vocabulary continuous speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610440618.9A CN105957518B (en) 2016-06-16 2016-06-16 A method for Mongolian large-vocabulary continuous speech recognition

Publications (2)

Publication Number Publication Date
CN105957518A CN105957518A (en) 2016-09-21
CN105957518B true CN105957518B (en) 2019-05-31

Family

ID=56905926

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610440618.9A Active CN105957518B (en) 2016-06-16 2016-06-16 A method for Mongolian large-vocabulary continuous speech recognition

Country Status (1)

Country Link
CN (1) CN105957518B (en)

Families Citing this family (21)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108428448A (en) * 2017-02-13 2018-08-21 芋头科技(杭州)有限公司 A kind of sound end detecting method and audio recognition method
CN108573696B (en) * 2017-03-10 2021-03-30 北京搜狗科技发展有限公司 Voice recognition method, device and equipment
JP6585112B2 (en) * 2017-03-17 2019-10-02 株式会社東芝 Voice keyword detection apparatus and voice keyword detection method
CN107247706B (en) * 2017-06-16 2021-06-25 中国电子技术标准化研究院 Text sentence-breaking model establishing method, sentence-breaking method, device and computer equipment
CN107680582B (en) * 2017-07-28 2021-03-26 平安科技(深圳)有限公司 Acoustic model training method, voice recognition method, device, equipment and medium
CN107767858B (en) * 2017-09-08 2021-05-04 科大讯飞股份有限公司 Pronunciation dictionary generating method and device, storage medium and electronic equipment
CN107978315B (en) * 2017-11-20 2021-08-10 徐榭 Dialogue type radiotherapy planning system based on voice recognition and making method
CN108182938B (en) * 2017-12-21 2019-03-19 内蒙古工业大学 A kind of training method of the Mongol acoustic model based on DNN
CN108563639B (en) * 2018-04-17 2021-09-17 内蒙古工业大学 Mongolian language model based on recurrent neural network
CN108549703B (en) * 2018-04-17 2022-03-25 内蒙古工业大学 Mongolian language model training method based on recurrent neural network
CN108831458A (en) * 2018-05-29 2018-11-16 广东声将军科技有限公司 Offline voice-to-command conversion method and system
CN109410914B (en) * 2018-08-28 2022-02-22 江西师范大学 Method for identifying Jiangxi dialect speech and dialect point
CN109492232A (en) * 2018-10-22 2019-03-19 内蒙古工业大学 Transformer-based Mongolian-Chinese machine translation method with enhanced semantic feature information
CN109360554A (en) * 2018-12-10 2019-02-19 广东潮庭集团有限公司 A kind of language identification method based on language deep neural network
CN110223675B (en) * 2019-06-13 2022-04-19 思必驰科技股份有限公司 Method and system for screening training text data for voice recognition
CN111540343B (en) * 2020-03-17 2021-02-05 北京捷通华声科技股份有限公司 Corpus identification method and apparatus
CN113205792A (en) * 2021-04-08 2021-08-03 内蒙古工业大学 Mongolian speech synthesis method based on Transformer and WaveNet
CN113377901B (en) * 2021-05-17 2022-08-19 内蒙古工业大学 Mongolian text emotion analysis method based on multi-size CNN and LSTM models
CN113571045B (en) * 2021-06-02 2024-03-12 北京它思智能科技有限公司 Method, system, equipment and medium for identifying Minnan language voice
CN113515952B (en) * 2021-08-18 2023-09-12 内蒙古工业大学 Combined modeling method, system and equipment for Mongolian dialogue model
CN114936555B (en) * 2022-05-24 2023-06-06 内蒙古自治区公安厅 Method and system for AI intelligent labeling of Mongolian

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101576909A (en) * 2009-05-11 2009-11-11 内蒙古蒙科立软件有限责任公司 Mongolian digital knowledge base system construction method
CN101576924A (en) * 2009-06-25 2009-11-11 内蒙古大学 Mongolian retrieval method
CN102063900A (en) * 2010-11-26 2011-05-18 北京交通大学 Speech recognition method and system for overcoming confusing pronunciation
CN103021407A (en) * 2012-12-18 2013-04-03 中国科学院声学研究所 Method and system for recognizing speech of agglutinative language
CN103065632A (en) * 2012-12-21 2013-04-24 中国科学院声学研究所 Selection method and system of recognition unit for Uygur language voice recognition
CN103632663A (en) * 2013-11-25 2014-03-12 飞龙 HMM-based method of Mongolian speech synthesis and front-end processing
CN104575497A (en) * 2013-10-28 2015-04-29 中国科学院声学研究所 Method for building acoustic model and speech decoding method based on acoustic model

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH01260494A (en) * 1988-04-12 1989-10-17 Matsushita Electric Ind Co Ltd Voice recognizing method
JPH0446397A (en) * 1990-06-14 1992-02-17 Nec Corp Continuous voice recognition system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
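Research on a Mongolian spoken keyword detection method based on segmentation and recognition; 飞龙 (Feilong) et al.; 《计算机科学》 (Computer Science); 2013-09-30; Vol. 40, No. 9; pp. 208-211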

Also Published As

Publication number Publication date
CN105957518A (en) 2016-09-21

Similar Documents

Publication Publication Date Title
CN105957518B (en) A kind of method of Mongol large vocabulary continuous speech recognition
US9711139B2 (en) Method for building language model, speech recognition method and electronic apparatus
Waibel et al. Multilinguality in speech and spoken language systems
US9613621B2 (en) Speech recognition method and electronic apparatus
Karpov et al. Large vocabulary Russian speech recognition using syntactico-statistical language modeling
CN109637537B (en) Method for automatically acquiring annotated data to optimize user-defined awakening model
Abushariah et al. Arabic speaker-independent continuous automatic speech recognition based on a phonetically rich and balanced speech corpus.
CN108986797B (en) Voice theme recognition method and system
CN102063900A (en) Speech recognition method and system for overcoming confusing pronunciation
Chen et al. Lightly supervised and data-driven approaches to mandarin broadcast news transcription
Carvalho et al. A critical survey on the use of fuzzy sets in speech and natural language processing
US9959270B2 (en) Method and apparatus to model and transfer the prosody of tags across languages
JP5073024B2 (en) Spoken dialogue device
Al-Anzi et al. The impact of phonological rules on Arabic speech recognition
CN102970618A (en) Video on demand method based on syllable identification
Lin et al. Hierarchical prosody modeling for Mandarin spontaneous speech
CN112489634A (en) Language acoustic model training method and device, electronic equipment and computer medium
Vazhenina et al. State-of-the-art speech recognition technologies for Russian language
KR20150128656A (en) Name transliteration method based on classification of name origins
CN109523992A (en) Tibetan dialect speech processing system
CN114863914A (en) Deep learning method for constructing end-to-end speech evaluation model
Sung et al. Deploying google search by voice in cantonese
Schlippe et al. Rapid bootstrapping of a ukrainian large vocabulary continuous speech recognition system
Chen et al. Using Taigi dramas with Mandarin Chinese subtitles to improve Taigi speech recognition
Arısoy Turkish dictation system for radiology and broadcast news applications

Legal Events

Code Title
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant