CN106971721A

CN106971721A - An accent speech recognition system based on an embedded mobile device

Info

Publication number: CN106971721A
Application number: CN201710198053.2A
Authority: CN
Inventors: 龚鸣敏; 马作伟; 金弘林; 李强
Original assignee: Wo Hang (wuhan) Technology Co Ltd
Current assignee: Wo Hang (wuhan) Technology Co Ltd
Priority date: 2017-03-29
Filing date: 2017-03-29
Publication date: 2017-07-21

Abstract

The present invention relates to an accent speech recognition system based on an embedded mobile device, comprising a model training module, a feature extraction module and a pattern matching module integrated on the embedded mobile device. The model training module collects and trains on accent speech to obtain an entry model of the accent; the feature extraction module extracts the speech features from the input accent; and the pattern matching module performs a speech matching calculation on the speech features according to the entry model to obtain a speech recognition result. The system improves the recognition rate for speech with a dialect accent and reduces both the difficulty and the amount of learning and training required when dialect speakers use speech recognition. The system also brings speech recognition to various embedded mobile devices, enabling intelligent interaction.

Description

An accent speech recognition system based on an embedded mobile device

Technical field

The present invention relates to the field of speech recognition, and in particular to an accent speech recognition system based on an embedded mobile device.

Background

Speech recognition research in China began in 1958, when the Institute of Acoustics of the Chinese Academy of Sciences used vacuum-tube circuits to recognize 10 vowels. It was not until 1973 that the Institute of Acoustics began work on computer speech recognition. Limited by the conditions of the time, speech recognition research in China long remained at a stage of slow development. The language models currently used in China are probabilistic models. For a computer to truly understand human language and express it intuitively, progress must first be made in recognition, which is a very arduous task. In addition, as hardware resources continue to develop, core algorithms such as feature extraction, search and adaptation may be improved further.

Foreign products such as IBM ViaVoice and Asiaworks SPK require the user to complete a training session of several hundred words before use, so that the computer can adapt to the user's voice characteristics. This inevitably limits the wider application of speech recognition technology: extensive training not only wearies the user but also increases the burden on the system. Moreover, future consumer electronics products cannot be expected to be trained for each individual consumer. Adaptation must therefore be improved further, so that recognition is not affected by the particular speaker, accent or dialect, which in practice also means further improving the language model. Real-world users are diverse. In terms of voice characteristics there are male, female and children's voices, and the pronunciation of many people is far from the standard pronunciation, which requires handling of accents or dialects. If speech recognition could adapt automatically to the voice characteristics of most people, that might matter more than raising the recognition rate by one or two percentage points. In fact, the application prospects of speech recognition are diminished by this very point: only users who speak Mandarin very well obtain relatively satisfactory results in text-oriented continuous speech recognition.

Automatic speech recognition technology has made great progress. To ensure that statistical model matching is effective, a large amount of data must be collected to cover all the acoustic variations that appear in speech recognition applications, such as changes of speaker, environmental noise, and the differing effects of microphones and communication channels. Differences among recognition tasks seriously constrain the development of this type of technology. In practical applications, different languages also affect speech recognition results, Chinese in particular. Chinese speech recognition is an extremely complex task. Apart from the complexity of speech recognition technology itself, the complexity of Chinese dialects also makes the popularization and application of speech recognition very difficult. China has hundreds or even thousands of dialects. To date, research and development in Chinese speech recognition has basically considered only Mandarin, and very little work has addressed the recognition of accents.

Summary of the invention

The technical problem to be solved by the present invention is to provide an accent speech recognition system based on an embedded mobile device that can accurately recognize accent speech on the embedded mobile device.

The technical solution by which the present invention solves the above technical problem is as follows: an accent speech recognition system based on an embedded mobile device, comprising a model training module, a feature extraction module and a pattern matching module integrated on the embedded mobile device.

The model training module is used to collect and train on accent speech to obtain an entry model of the accent.

The feature extraction module is used to extract the speech features from the input accent.

The pattern matching module is used to perform a speech matching calculation on the speech features according to the entry model to obtain a speech recognition result.

The beneficial effects of the invention are as follows. The accent speech recognition system based on an embedded mobile device builds a Chinese speech database of local dialect accents and, on that basis, carries out research on pronunciation variation patterns, speaker adaptation and non-native speaker accent identification. It explores solutions to the user speech recognition problems posed by mixed languages, differing application environments, and different dialects and mother tongues. It improves the recognition rate for speech with a dialect accent and reduces both the difficulty and the amount of learning and training required when dialect speakers use speech recognition. The system also brings speech recognition to various embedded mobile devices, enabling intelligent interaction.

On the basis of the above technical solution, the present invention can also be improved as follows.

Further, the entry model includes an acoustic model and a linguistic model.

Further, in the linguistic model, complete pronunciation variations are modeled using a multi-pronunciation dictionary.

Further, in the acoustic model, partial pronunciation variations are modeled using a context-independent partial change phone model.

Further, in the acoustic model, syllables are used as the model primitives for digits; for control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the model primitives.

The beneficial effect of the above further scheme is as follows: the acoustic modeling method takes coarticulation within the syllable into account while reducing the number of primitives to be trained, striking a balance among acoustic model size, computation speed and recognition rate, so that the system can be integrated on an embedded mobile device.

Further, in the pattern matching module, the speech matching calculation on the speech features is performed using a neural network structure and the cloud.

The beneficial effect of the above further scheme is as follows: using a neural network structure and the cloud can increase the accuracy of local accent speech recognition.

Brief description of the drawings

Fig. 1 is a structural block diagram of the accent speech recognition system based on an embedded mobile device of the present invention;

Fig. 2 shows the recognition framework of the accent speech recognition system based on an embedded mobile device of the present invention.

Detailed description of the embodiments

The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the present invention and are not intended to limit its scope.

As shown in Fig. 1, an accent speech recognition system based on an embedded mobile device comprises a model training module, a feature extraction module and a pattern matching module integrated on the embedded mobile device. The model training module is used to collect and train on accent speech to obtain an entry model of the accent; the feature extraction module is used to extract the speech features from the input accent; and the pattern matching module is used to perform a speech matching calculation on the speech features according to the entry model to obtain a speech recognition result.
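The division of labor among the three modules can be pictured with a minimal sketch. Everything in it is an illustrative assumption rather than the patented implementation: the class name `AccentRecognizer`, the toy feature function `mean_and_energy`, the averaging used for "training", and the nearest-template matching all stand in for components the specification does not detail.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Sequence

Feature = List[float]


@dataclass
class AccentRecognizer:
    """Toy wiring of the three modules (all names are hypothetical)."""
    extract: Callable[[Sequence[float]], Feature]
    entry_models: Dict[str, Feature] = field(default_factory=dict)

    def train(self, samples: Dict[str, List[Sequence[float]]]) -> None:
        # Model training module: average the features of each entry's recordings.
        for entry, recordings in samples.items():
            feats = [self.extract(r) for r in recordings]
            dim = len(feats[0])
            self.entry_models[entry] = [
                sum(f[i] for f in feats) / len(feats) for i in range(dim)
            ]

    def recognize(self, signal: Sequence[float]) -> str:
        # Feature extraction module, followed by the pattern matching module.
        feat = self.extract(signal)

        def distance(entry: str) -> float:
            model = self.entry_models[entry]
            return sum((a - b) ** 2 for a, b in zip(feat, model))

        # Return the entry whose stored template is closest to the input.
        return min(self.entry_models, key=distance)


def mean_and_energy(signal: Sequence[float]) -> Feature:
    """Assumed toy feature: mean value and mean energy of the signal."""
    n = len(signal)
    return [sum(signal) / n, sum(x * x for x in signal) / n]


rec = AccentRecognizer(extract=mean_and_energy)
rec.train({"yes": [[0.9, 1.1, 1.0]], "no": [[-1.0, -0.9, -1.1]]})
print(rec.recognize([1.0, 1.0, 0.8]))  # prints "yes"
```

The squared-distance matcher is only a placeholder; the specification states that a neural network structure and the cloud are used for the matching calculation.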

Specifically, the entry model includes an acoustic model and a linguistic model. In the linguistic model, complete pronunciation variations are modeled using a multi-pronunciation dictionary. In the acoustic model, partial pronunciation variations are modeled using a context-independent partial change phone model. In the acoustic model, syllables are used as the model primitives for digits; for control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the model primitives. In the pattern matching module, the speech matching calculation on the speech features is performed using a neural network structure and the cloud.

In this specific embodiment, speech recognition is described mainly with the Wuhan accent as an example.

Fig. 2 shows the recognition framework of the accent speech recognition system based on an embedded mobile device of the present invention. Local accent speech enters the speech library through voice input; speech matching and pattern matching are performed in the model library; the matched speech is selected and synthesized, and a voiceprint password is added. The speech recognition system includes speech-to-semantics, text-to-semantics, speech evaluation and face recognition technologies. Combining speech recognition with other natural language processing technologies, such as machine translation and speech synthesis, makes it possible to build more complex applications.

In the system of the present invention, depending on the required speaking style, speech recognition can be performed on isolated words and on connected words; depending on the degree of dependence on the speaker, speech recognition can be speaker-dependent or speaker-independent; and depending on vocabulary size, speech recognition can be performed on small, medium, large and unlimited vocabularies.

In the system of the present invention, speech recognition technology mainly covers three aspects: feature extraction, pattern matching criteria, and model training. It also involves related problems such as the choice of speech recognition units.
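As a rough picture of the feature extraction aspect, the sketch below splits a signal into overlapping frames and computes two simple per-frame features. The specification does not say which features the system uses, so the choice of log energy and zero-crossing rate, the function name `frame_features`, and the frame and hop sizes are all assumptions made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def frame_features(signal: Sequence[float],
                   frame_len: int = 4,
                   hop: int = 2) -> List[Tuple[float, float]]:
    """Return (log energy, zero-crossing rate) for each overlapping frame."""
    feats = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = signal[start:start + frame_len]
        # Short-time energy; the small constant avoids log(0) on silence.
        energy = sum(x * x for x in frame)
        log_energy = math.log(energy + 1e-10)
        # Fraction of adjacent sample pairs whose signs differ.
        crossings = sum(
            1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0)
        )
        feats.append((log_energy, crossings / (frame_len - 1)))
    return feats


alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
for log_e, zcr in frame_features(alternating):
    print(round(log_e, 3), zcr)
```

Real recognizers typically use much longer frames and richer spectral features; the tiny frame length here only keeps the example readable.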

The system of the present invention mainly studies the application of speech recognition technology to recognizing expression and interaction in the Wuhan dialect, so as to achieve intelligent search with a good user experience. The main problems the system addresses are the establishment of the models in an embedded Wuhan-dialect speech recognition system and bidirectional recognition between Wuhan-dialect speech and text. A neural network structure and the cloud are used to increase recognition accuracy. With respect to tone, the linguistic model draws on a Mandarin model and a northern-dialect model.

The system studies and builds the acoustic model and the linguistic model of an embedded Wuhan-dialect speech recognition system. At present, Wuhan-dialect speech recognition engines integrated on embedded mobile devices such as mobile phones are still few in number. The difficulty is that embedded mobile devices have little memory and low computing power, so a Wuhan-dialect speech recognition system on an embedded mobile device requires special acoustic modeling. The system models the relevant database using different primitives:

1. For digits, syllables are used as the model primitives;

2. For control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the model primitives.

This modeling method takes coarticulation within the syllable into account while reducing the number of primitives to be trained, striking a balance among acoustic model size, computation speed and recognition rate.
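The two choices of primitive can be illustrated with a short sketch. It makes several assumptions not stated in the specification: syllables are written in toneless Mandarin-style pinyin as a stand-in for the real transcription, `y` and `w` are treated as initials, the right context of an initial is taken to be the final that follows it, and the `initial+final` label format is a notation chosen here. The function names are hypothetical.

```python
from typing import List, Tuple

# Two-letter initials come first so that "zh" is matched before "z".
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a pinyin syllable into (initial, final); initial may be empty."""
    for initial in INITIALS:
        if syllable.startswith(initial):
            return initial, syllable[len(initial):]
    return "", syllable


def modeling_units(syllables: List[str], task: str) -> List[str]:
    """Map syllables to model primitives for the given task."""
    if task == "digits":
        # Digits: one whole-syllable primitive per syllable.
        return list(syllables)
    units = []
    for syllable in syllables:
        initial, final = split_syllable(syllable)
        if initial:
            # Initial conditioned on its right context (the following final).
            units.append(f"{initial}+{final}")
        # Final modeled without any context.
        units.append(final)
    return units


print(modeling_units(["yi", "er", "san"], "digits"))
print(modeling_units(["da", "kai"], "commands"))
```

Because every final is shared across contexts and only initials are context-dependent, the inventory of primitives stays far smaller than a full set of context-dependent units, which is consistent with the memory and computing constraints described above.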

In the system of the present invention, given that most people in Hubei speak Mandarin with a noticeable accent, a Mandarin speech database covering different accents is established for the purpose of studying continuous Mandarin speech recognition for non-native speakers, and research on pronunciation variation patterns, speaker adaptation and non-native speaker accent identification is carried out on that basis.

With the system of the present invention, mobile phone platforms are not affected by accent and can correctly recognize the accent of each locality. The speech recognition engine can be integrated into embedded mobile devices. The research yields a practical speaker accent adaptation scheme and lays a foundation for further research and development in this area.

Complete pronunciation variations are modeled at the phonetic layer using a multi-pronunciation dictionary, and partial pronunciation variations are modeled at the acoustic layer using a context-independent partial change phone model (PCPM). This makes it possible to examine the characteristics, differences and connections of the pronunciation variation models at the phonetic layer and the acoustic layer, and to integrate them into different parts of the speech recognition system, so that pronunciation variation is handled in layers. Using the layered pronunciation variation model, experiments were run on read Mandarin speech with dialect accents, and the recognition results improved. Experiments were also run separately on Mandarin spoken with the accents of several areas of Hubei, and the differences and associations in pronunciation variation among different Chinese dialect accents were analyzed from the experimental results.
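A multi-pronunciation dictionary can be thought of as a lexicon in which one word lists several acceptable pronunciations. The sketch below is a minimal illustration under stated assumptions: the entries, the accented variants, the pinyin-style transcription and the helper names `build_index` and `lookup` are all hypothetical and are not taken from the patent's database.

```python
from collections import defaultdict
from typing import Dict, List, Set

# Each word lists a standard pronunciation followed by assumed accented
# variants; the variants are invented purely for illustration.
LEXICON: Dict[str, List[str]] = {
    "牛奶": ["niu nai", "liu lai"],   # "milk"
    "知道": ["zhi dao", "zi dao"],    # "to know"
}


def build_index(lexicon: Dict[str, List[str]]) -> Dict[str, Set[str]]:
    """Invert the lexicon so each pronunciation maps to its words."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for word, pronunciations in lexicon.items():
        for pronunciation in pronunciations:
            index[pronunciation].add(word)
    return index


def lookup(index: Dict[str, Set[str]], pronunciation: str) -> List[str]:
    """Return the words that accept this pronunciation, sorted."""
    return sorted(index.get(pronunciation, ()))


index = build_index(LEXICON)
print(lookup(index, "liu lai"))  # the accented form resolves to the same word
```

Listing a variant as a whole alternative pronunciation suits complete variations, where one sound is fully replaced by another; partial variations, where a sound only shifts toward another, are left to the acoustic layer as the paragraph above describes.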

The problem that speech recognition technology (Automatic Speech Recognition, abbreviated ASR) sets out to solve is to let a machine "understand" human speech and "extract" the text information contained in it. This is equivalent to fitting the machine with "ears" so that it has the ability to "listen".

The invention provides more accurate intelligent speech recognition technology with a high recognition accuracy, a high recognition speed, customizable domain models and support for multiple processing modes. It also offers advantages such as simple SDK-based development and a small toolkit resource footprint. Input speech can be recognized and transcribed into text accurately and in real time. The model is continually optimized by training on newly collected corpus data, steadily improving its coverage and recognition accuracy.

Its application value is as follows:

1) Accurate recognition: the recognition engine corrects itself using semantic context.

2) Continuous human-machine interaction: continuous recording, continuous recognition, and filtering of invalid speech.

3) Interruptible at any time: intelligent sentence segmentation based on semantics, with support for active interaction.

4) Contextual dialogue: context understanding, content-based questioning, multi-dialogue scene management, information sharing across scenes, and long-term memory.

5) Personalized and extensible: product features can be customized per user, and the interaction modes are extensible.

The accent speech recognition system based on an embedded mobile device of the present invention builds a Chinese speech database of local dialect accents and, on that basis, carries out research on pronunciation variation patterns, speaker adaptation and non-native speaker accent identification. It explores solutions to the user speech recognition problems posed by mixed languages, differing application environments, and different dialects and mother tongues. It improves the recognition rate for speech with a dialect accent and reduces both the difficulty and the amount of learning and training required when dialect speakers use speech recognition. The system also brings speech recognition to various embedded mobile devices, enabling intelligent interaction.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the invention. Any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.

Claims

1. An accent speech recognition system based on an embedded mobile device, characterized in that it comprises a model training module, a feature extraction module and a pattern matching module integrated on the embedded mobile device;

the model training module is used to collect and train on accent speech to obtain an entry model of the accent;

the pattern matching module is used to perform a speech matching calculation on the speech features according to the entry model to obtain a speech recognition result.

2. The accent speech recognition system based on an embedded mobile device according to claim 1, characterized in that the entry model includes an acoustic model and a linguistic model.

3. The accent speech recognition system based on an embedded mobile device according to claim 2, characterized in that, in the linguistic model, complete pronunciation variations are modeled using a multi-pronunciation dictionary.

4. The accent speech recognition system based on an embedded mobile device according to claim 2, characterized in that, in the acoustic model, partial pronunciation variations are modeled using a context-independent partial change phone model.

5. The accent speech recognition system based on an embedded mobile device according to claim 4, characterized in that, in the acoustic model, syllables are used as the model primitives for digits; for control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the model primitives.

6. The accent speech recognition system based on an embedded mobile device according to any one of claims 1 to 5, characterized in that, in the pattern matching module, the speech matching calculation on the speech features is performed using a neural network structure and the cloud.