CN106971721A - An accent speech recognition system based on an embedded mobile device - Google Patents
An accent speech recognition system based on an embedded mobile device
- Publication number
- CN106971721A (application number CN201710198053.2A)
- Authority
- CN
- China
- Prior art keywords
- accent
- model
- speech recognition
- mobile device
- embedded mobile
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/02—Feature extraction for speech recognition; Selection of recognition unit
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
Abstract
The present invention relates to an accent speech recognition system based on an embedded mobile device, comprising a model training module, a feature extraction module, and a pattern matching module integrated in the embedded mobile device. The model training module collects and trains on local-accent speech to obtain an entry model of the local accent; the feature extraction module extracts speech features from the input local-accent speech; the pattern matching module performs speech matching on the speech features according to the entry model to obtain a speech recognition result. The system improves the recognition rate for dialect-accented speech and reduces the difficulty and amount of learning and training required when dialect speakers use speech recognition. It also integrates speech recognition into various embedded mobile devices, enabling intelligent interaction.
Description
Technical field
The present invention relates to the field of speech recognition, and in particular to an accent speech recognition system based on an embedded mobile device.
Background art
Speech recognition research in China began in 1958, when the Institute of Acoustics of the Chinese Academy of Sciences used vacuum-tube circuits to recognize 10 vowels. Computer speech recognition was not taken up by the Institute of Acoustics until 1973. Owing to the limits of the conditions at the time, speech recognition research in China remained in a stage of slow development. The language models currently used in China are probabilistic models; for a computer to genuinely understand human language and express it in a meaningful form, recognition must advance beyond this point, which is a very demanding task. In addition, as hardware resources continue to develop, some core algorithms, such as feature extraction, search, or adaptation algorithms, may be further improved.
Abroad, IBM ViaVoice and Asiaworks SPK both require the user to complete a training session of several hundred words before use so that the computer can adapt to the user's voice characteristics. This inevitably limits the wider application of speech recognition technology: extensive training not only wearies the user but also increases the burden on the system. Moreover, future consumer electronics products cannot be expected to be trained for each individual consumer. Adaptation therefore needs to improve further, so that recognition is unaffected by the particular speaker, accent, or dialect, which in practice also means further improving the language model. Users in the real world are diverse: in terms of voice characteristics there are male, female, and children's voices, and many people's pronunciation is far from the standard pronunciation, which calls for handling of accents and dialects. If speech recognition could automatically adapt to the voice characteristics of most people, that might matter more than raising the recognition rate by one or two percentage points. In fact, the application prospects of speech recognition are diminished by exactly this: only users who speak Mandarin very well obtain relatively satisfactory results in Chinese continuous speech recognition.
Computer automatic speech recognition has made great progress. To keep statistical model matching effective, large amounts of data must be collected to cover all the acoustic variation that appears in speech recognition applications, such as variation between speakers, ambient noise, and the differing effects of microphones and communication channels. Differences between recognition tasks seriously constrain the development of this type of technology. Different languages also affect recognition results in practice, Chinese in particular. Chinese speech recognition is an extremely complex task: apart from the complexity of speech recognition technology itself, the complexity of Chinese dialects makes it very difficult to deploy speech recognition widely. China has hundreds, even thousands, of dialects. To date, research and development in Chinese speech recognition has essentially considered only Mandarin, and very little work addresses the recognition of local accents.
Summary of the invention
The technical problem to be solved by the present invention is to provide an accent speech recognition system based on an embedded mobile device that can accurately recognize local-accent speech on the embedded mobile device.
The technical solution by which the present invention solves the above technical problem is as follows: an accent speech recognition system based on an embedded mobile device, comprising a model training module, a feature extraction module, and a pattern matching module integrated in the embedded mobile device.
The model training module is used to collect and train on local-accent speech to obtain an entry model of the local accent;
the feature extraction module is used to extract speech features from the input local-accent speech;
the pattern matching module is used to perform speech matching on the speech features according to the entry model to obtain a speech recognition result.
The beneficial effects of the invention are as follows. The accent speech recognition system based on an embedded mobile device builds a Mandarin speech database of local dialect accents and, on that basis, carries out research on pronunciation variation rules, speaker adaptation, and non-native speaker accent recognition. It explores solutions to the speech recognition problems of users in mixed-language and varied application environments and of users with different dialects and mother tongues. It improves the recognition rate for dialect-accented speech and reduces the difficulty and amount of learning and training required when dialect speakers use speech recognition. The system also integrates speech recognition into various embedded mobile devices, enabling intelligent interaction.
On the basis of the above technical solution, the present invention can also be improved as follows.
Further, the entry model includes an acoustic model and a linguistic model.
Further, in the linguistic model, complete pronunciation variation is modeled using a multi-pronunciation dictionary.
Further, in the acoustic model, partial pronunciation variation is modeled using a context-independent partial change phone model.
Further, in the acoustic model, syllables are used as the modeling units for digits; for control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the modeling units.
The beneficial effect of this further scheme is that the acoustic modeling method both accounts for coarticulation within syllables and reduces the number of units to be trained, striking a balance among acoustic model size, computation speed, and recognition rate, so that the system can be integrated into an embedded mobile device.
Further, the pattern matching module performs speech matching on the speech features using a neural network structure and the cloud.
The beneficial effect of this further scheme is that using a neural network structure and the cloud can increase the accuracy of local-accent speech recognition.
Brief description of the drawings
Fig. 1 is a structural block diagram of the accent speech recognition system based on an embedded mobile device of the present invention;
Fig. 2 shows the recognition framework of the accent speech recognition system based on an embedded mobile device of the present invention.
Detailed description of the embodiments
The principles and features of the present invention are described below with reference to the accompanying drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, an accent speech recognition system based on an embedded mobile device comprises a model training module, a feature extraction module, and a pattern matching module integrated in the embedded mobile device. The model training module is used to collect and train on local-accent speech to obtain an entry model of the local accent; the feature extraction module is used to extract speech features from the input local-accent speech; the pattern matching module is used to perform speech matching on the speech features according to the entry model to obtain a speech recognition result.
Specifically, the entry model includes an acoustic model and a linguistic model. In the linguistic model, complete pronunciation variation is modeled using a multi-pronunciation dictionary. In the acoustic model, partial pronunciation variation is modeled using a context-independent partial change phone model. In the acoustic model, syllables are used as the modeling units for digits; for control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the modeling units. The pattern matching module performs speech matching on the speech features using a neural network structure and the cloud.
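The three modules described above can be pictured as a small pipeline. The following Python sketch is only an illustration under loud assumptions: the mean-amplitude feature, the averaged per-entry template, and the squared-distance match are stand-ins chosen for brevity, and the names `EntryModel`, `extract_features`, `train`, and `match` are hypothetical. The patent itself describes acoustic and linguistic models matched with a neural network and the cloud, none of which is reproduced here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence

Feature = List[float]


def extract_features(samples: Sequence[float], frame_len: int = 4) -> Feature:
    """Feature extraction module (toy): mean absolute amplitude per frame."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    return [sum(abs(x) for x in f) / len(f) for f in frames if f]


@dataclass
class EntryModel:
    """Toy entry model (assumption): one averaged feature template per entry."""
    templates: Dict[str, Feature] = field(default_factory=dict)


def train(corpus: Dict[str, List[Sequence[float]]]) -> EntryModel:
    """Model training module (toy): average the features of each entry's recordings."""
    model = EntryModel()
    for entry, recordings in corpus.items():
        feats = [extract_features(r) for r in recordings]
        n = min(len(f) for f in feats)  # truncate to the shortest recording
        model.templates[entry] = [
            sum(f[i] for f in feats) / len(feats) for i in range(n)
        ]
    return model


def match(model: EntryModel, feature: Feature) -> str:
    """Pattern matching module (toy): return the entry with the closest template."""
    def dist(a: Feature, b: Feature) -> float:
        n = min(len(a), len(b))
        # squared distance on the overlap plus a penalty for length mismatch
        return sum((a[i] - b[i]) ** 2 for i in range(n)) + abs(len(a) - len(b))

    return min(model.templates, key=lambda e: dist(model.templates[e], feature))
```

With a corpus such as `{"yes": [[0.9] * 8, [1.0] * 8], "no": [[0.1] * 8, [0.2] * 8]}`, calling `match(train(corpus), extract_features([0.8] * 8))` selects the entry whose averaged template lies nearest the input features.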
In this specific embodiment, speech recognition is described mainly using the Wuhan accent as an example.
Fig. 2 shows the recognition framework of the accent speech recognition system based on an embedded mobile device of the present invention. Local-accent speech enters the speech library through voice input; speech matching and pattern matching are carried out in the model library; the matched speech is selected for synthesis, and a voiceprint password is added. The speech recognition system includes speech-to-semantics, text-to-semantics, speech evaluation, and face recognition technology. Combining speech recognition with other natural language processing technologies, such as machine translation and speech synthesis, makes it possible to build more complex applications.
In the system of the present invention, depending on the requirements placed on the speaker's manner of speaking, speech recognition can be performed on isolated words and on connected words; depending on the degree of dependence on the speaker, it can be speaker-dependent or speaker-independent; depending on vocabulary size, it can handle small, medium, large, and unlimited vocabularies.
In the system of the present invention, speech recognition technology mainly covers three aspects: feature extraction, the pattern matching criterion, and model training. It also involves issues such as the choice of speech recognition units.
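The patent does not name a specific pattern matching criterion. As one classic possibility, shown purely as an assumption and not as the method of this system, dynamic time warping (DTW) scores how well two feature sequences of different lengths align. The function name `dtw_distance` and the one-dimensional features are hypothetical simplifications.

```python
from typing import Sequence


def dtw_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Dynamic time warping distance between two 1-D feature sequences."""
    inf = float("inf")
    # cost[i][j] holds the best cumulative cost of aligning a[:i] with b[:j]
    cost = [[inf] * (len(b) + 1) for _ in range(len(a) + 1)]
    cost[0][0] = 0.0
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: stretch b, stretch a, or advance both
            cost[i][j] = local + min(
                cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1]
            )
    return cost[len(a)][len(b)]
```

Because the alignment may repeat frames, an utterance spoken more slowly than its template can still match with zero cost, which is why this kind of criterion suits isolated-word matching on small devices.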
The system of the present invention mainly studies the application of speech recognition technology to recognizing expressions in, and interaction with, the Wuhan dialect, in order to achieve intelligent search with a high-quality user experience. The main problems the system addresses are building the models of an embedded Wuhan-dialect speech recognition system and bidirectional recognition between Wuhan-dialect speech and text. A neural network structure and the cloud are used to increase recognition accuracy. For tone, the linguistic model draws on a Mandarin model and a northern-dialect model.
The system studies and builds the acoustic model and linguistic model of an embedded Wuhan-dialect speech recognition system. At present, few Wuhan-dialect speech recognition engines have been integrated into embedded mobile devices such as mobile phones. The difficulty is that embedded mobile devices have little memory and low computing power, so a Wuhan-dialect speech recognition system on such a device needs purpose-built acoustic modeling. The system models the relevant database using different units:
1. for digits, syllables are used as the modeling units;
2. for control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the modeling units.
This modeling method both accounts for coarticulation within syllables and reduces the number of units to be trained, striking a balance among acoustic model size, computation speed, and recognition rate.
In the system of the present invention, in view of the fact that people in Hubei speak Mandarin with a noticeable accent, a Mandarin speech database covering different accents is built for the purpose of studying continuous speech recognition of Mandarin spoken by non-native speakers. On this basis, research is carried out on pronunciation variation rules, speaker adaptation, and non-native speaker accent recognition.
With the system of the present invention, a mobile phone platform is unaffected by accent and can correctly recognize the accent of each locality. The speech recognition engine can be integrated into an embedded mobile device. The research yields a practical speaker accent adaptation scheme, laying a foundation for further research and development in this area.
Complete pronunciation variation is modeled at the phonetic layer with a multi-pronunciation dictionary, and partial pronunciation variation is modeled at the acoustic layer with a context-independent partial change phone model (PCPM). This makes it possible to examine the characteristics of, differences between, and connections between the pronunciation variation models at the two layers, and to integrate them into different parts of the speech recognition system, giving layered handling of pronunciation variation. The layered pronunciation variation model is tested on read Mandarin speech with a dialect accent and improves the recognition result. Tests are run separately on Mandarin spoken with the accents of several areas of Hubei, and the experimental results are used to analyze the differences and associations in pronunciation variation among different Chinese dialect accents.
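The layered handling described above can be sketched in a few lines. In this illustration, which rests on stated assumptions, a dictionary with several pronunciations per word covers complete changes, and a table of extra partial-change units covers partial changes at the acoustic layer. The word `niunai`, its variant pronunciation, the `n~l` unit label, and the names `LEXICON`, `PCPM_UNITS`, and `acoustic_paths` are all hypothetical examples rather than data from the patent.

```python
from itertools import product
from typing import Dict, List, Tuple

# Phonetic layer: a dictionary listing more than one pronunciation per word.
# The second pronunciation stands for a complete change (hypothetical example).
LEXICON: Dict[str, List[Tuple[str, ...]]] = {
    "niunai": [("n", "iu", "n", "ai"), ("l", "iu", "l", "ai")],
}

# Acoustic layer: units that also have a partial-change model.
# "n~l" denotes a sound lying between the two canonical units (hypothetical).
PCPM_UNITS: Dict[str, Tuple[str, ...]] = {
    "n": ("n", "n~l"),
}


def acoustic_paths(word: str) -> List[Tuple[str, ...]]:
    """Expand a word into every unit sequence the recognizer may score."""
    paths: List[Tuple[str, ...]] = []
    for pronunciation in LEXICON[word]:
        # each unit is either itself or one of its partial-change variants
        options = [PCPM_UNITS.get(unit, (unit,)) for unit in pronunciation]
        paths.extend(product(*options))
    return paths
```

Keeping the two layers separate means the dictionary only grows with complete changes, while partial changes add a handful of acoustic units instead of multiplying dictionary entries.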
The problem that speech recognition technology (automatic speech recognition, ASR) sets out to solve is to let a machine "understand" human speech and "extract" the textual information it contains. This amounts to fitting the machine with "ears" so that it gains the ability to "listen".
The invention provides more accurate intelligent speech recognition, with high recognition accuracy, high recognition speed, customizable domain models, and support for multiple processing modes. In the future it is also expected to offer advantages such as simple SDK-based development and a small toolkit resource footprint. Input speech can be recognized and transcribed to text accurately and in real time. Using the corpus that is continuously collected, the model is retrained and optimized, steadily improving its coverage and recognition accuracy.
Its application value is as follows:
1) Accurate recognition: the recognition engine self-corrects using semantic context.
2) Continuous human-machine interaction: continuous recording and continuous recognition, with invalid speech filtered out.
3) Intelligent sentence segmentation based on semantics: the user can interrupt at any time, and proactive interaction is supported.
4) Contextual dialogue: context understanding, content-based questioning, multi-scene dialogue management, and long-term memory of information shared across scenes.
5) Extensible personalization: product features can be customized per user, and the interaction modes are extensible.
The accent speech recognition system based on an embedded mobile device of the present invention builds a Mandarin speech database of local dialect accents and, on that basis, carries out research on pronunciation variation rules, speaker adaptation, and non-native speaker accent recognition. It explores solutions to the speech recognition problems of users in mixed-language and varied application environments and of users with different dialects and mother tongues. It improves the recognition rate for dialect-accented speech and reduces the difficulty and amount of learning and training required when dialect speakers use speech recognition. The system also integrates speech recognition into various embedded mobile devices, enabling intelligent interaction.
The foregoing describes only preferred embodiments of the present invention and is not intended to limit it. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.
Claims (6)
1. An accent speech recognition system based on an embedded mobile device, characterized by comprising a model training module, a feature extraction module, and a pattern matching module integrated in the embedded mobile device, wherein:
the model training module is used to collect and train on local-accent speech to obtain an entry model of the local accent;
the feature extraction module is used to extract speech features from the input local-accent speech;
the pattern matching module is used to perform speech matching on the speech features according to the entry model to obtain a speech recognition result.
2. The accent speech recognition system based on an embedded mobile device according to claim 1, characterized in that the entry model includes an acoustic model and a linguistic model.
3. The accent speech recognition system based on an embedded mobile device according to claim 2, characterized in that, in the linguistic model, complete pronunciation variation is modeled using a multi-pronunciation dictionary.
4. The accent speech recognition system based on an embedded mobile device according to claim 2, characterized in that, in the acoustic model, partial pronunciation variation is modeled using a context-independent partial change phone model.
5. The accent speech recognition system based on an embedded mobile device according to claim 4, characterized in that, in the acoustic model, syllables are used as the modeling units for digits, and right-context-dependent initials and context-independent finals are used as the modeling units for control command sets or continuous speech recognition.
6. The accent speech recognition system based on an embedded mobile device according to any one of claims 1 to 5, characterized in that the pattern matching module performs speech matching on the speech features using a neural network structure and the cloud.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710198053.2A CN106971721A (en) | 2017-03-29 | 2017-03-29 | An accent speech recognition system based on an embedded mobile device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710198053.2A CN106971721A (en) | 2017-03-29 | 2017-03-29 | An accent speech recognition system based on an embedded mobile device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106971721A (en) | 2017-07-21 |
Family
ID=59336068
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710198053.2A Pending CN106971721A (en) | 2017-03-29 | 2017-03-29 | An accent speech recognition system based on an embedded mobile device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106971721A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109192194A (en) * | 2018-08-22 | 2019-01-11 | 北京百度网讯科技有限公司 | Voice data mask method, device, computer equipment and storage medium |
CN110019683A (en) * | 2017-12-29 | 2019-07-16 | 同方威视技术股份有限公司 | Intelligent sound interaction robot and its voice interactive method |
CN112259102A (en) * | 2020-10-29 | 2021-01-22 | 适享智能科技(苏州)有限公司 | Retail scene voice interaction optimization method based on knowledge graph |
CN112349294A (en) * | 2020-10-22 | 2021-02-09 | 腾讯科技(深圳)有限公司 | Voice processing method and device, computer readable medium and electronic equipment |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1412741A (en) * | 2002-12-13 | 2003-04-23 | 郑方 | Chinese speech identification method with dialect background |
CN1741131A (en) * | 2004-08-27 | 2006-03-01 | 中国科学院自动化研究所 | A kind of unspecified person alone word audio recognition method and device |
CN101281745A (en) * | 2008-05-23 | 2008-10-08 | 深圳市北科瑞声科技有限公司 | Interactive system for vehicle-mounted voice |
CN103700370A (en) * | 2013-12-04 | 2014-04-02 | 北京中科模识科技有限公司 | Broadcast television voice recognition method and system |
CN106057196A (en) * | 2016-07-08 | 2016-10-26 | 成都之达科技有限公司 | Vehicular voice data analysis identification method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110491382B (en) | Speech recognition method and device based on artificial intelligence and speech interaction equipment | |
CN107195296B (en) | Voice recognition method, device, terminal and system | |
WO2022057712A1 (en) | Electronic device and semantic parsing method therefor, medium, and human-machine dialog system | |
CN111833845B (en) | Multilingual speech recognition model training method, device, equipment and storage medium | |
CN109119072A (en) | Civil aviaton's land sky call acoustic model construction method based on DNN-HMM | |
CN110473523A (en) | A kind of audio recognition method, device, storage medium and terminal | |
Singh et al. | ASRoIL: a comprehensive survey for automatic speech recognition of Indian languages | |
CN109523989A (en) | Phoneme synthesizing method, speech synthetic device, storage medium and electronic equipment | |
CN109410914A (en) | A kind of Jiangxi dialect phonetic and dialect point recognition methods | |
CN113205817B (en) | Speech semantic recognition method, system, device and medium | |
CN109036391A (en) | Audio recognition method, apparatus and system | |
CN110517664A (en) | Multi-party speech recognition methods, device, equipment and readable storage medium storing program for executing | |
CN107972028A (en) | Man-machine interaction method, device and electronic equipment | |
CN106971721A (en) | An accent speech recognition system based on an embedded mobile device | |
CN109508402A (en) | Violation term detection method and device | |
CN101515456A (en) | Speech recognition interface unit and speed recognition method thereof | |
Zhao et al. | End-to-end-based Tibetan multitask speech recognition | |
Vyas et al. | An automatic emotion recognizer using MFCCs and Hidden Markov Models | |
Zeng | Implementation of Embedded Technology-Based English Speech Identification and Translation System. | |
Shivakumar et al. | A study on impact of language model in improving the accuracy of speech to text conversion system | |
CN112489634A (en) | Language acoustic model training method and device, electronic equipment and computer medium | |
Rasipuram et al. | Grapheme and multilingual posterior features for under-resourced speech recognition: a study on scottish gaelic | |
Sharma et al. | Soft-Computational Techniques and Spectro-Temporal Features for Telephonic Speech Recognition: an overview and review of current state of the art | |
Daouad et al. | An automatic speech recognition system for isolated Amazigh word using 1D & 2D CNN-LSTM architecture | |
Mon et al. | Improving Myanmar automatic speech recognition with optimization of convolutional neural network parameters |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20170721 |