CN106971721A - An accent speech recognition system based on an embedded mobile device - Google Patents

An accent speech recognition system based on an embedded mobile device

Info

Publication number
CN106971721A
Authority
CN
China
Prior art keywords
accent
model
speech recognition
mobile device
embedded mobile
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710198053.2A
Other languages
Chinese (zh)
Inventor
龚鸣敏
马作伟
金弘林
李强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wo Hang (wuhan) Technology Co Ltd
Original Assignee
Wo Hang (wuhan) Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wo Hang (wuhan) Technology Co Ltd
Priority to CN201710198053.2A
Publication of CN106971721A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/02: Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06: Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice

Abstract

The invention relates to an accent speech recognition system based on an embedded mobile device, comprising a model training module, a feature extraction module and a pattern matching module integrated in the embedded mobile device. The model training module collects and trains on accented speech to obtain entry models for the accent; the feature extraction module extracts speech features from the input accented speech; and the pattern matching module performs speech matching on the speech features according to the entry models to obtain the speech recognition result. The system improves the recognition rate for dialect-accented speech and reduces the difficulty and amount of learning and training required when dialect speakers use speech recognition. The system also integrates speech recognition into a variety of embedded mobile devices, enabling intelligent interaction.

Description

An accent speech recognition system based on an embedded mobile device
Technical field
The invention relates to the field of speech recognition, and in particular to an accent speech recognition system based on an embedded mobile device.
Background
Speech recognition research in China began in 1958, when the Institute of Acoustics of the Chinese Academy of Sciences used vacuum-tube circuits to recognize 10 vowels. It was not until 1973 that the Institute began computer-based speech recognition. Owing to the constraints of the time, speech recognition research in China long remained at a stage of slow development. The language models currently used in China are probabilistic models. For a computer to truly understand human language and express it visually, progress must be made in recognition, which is a very arduous task. In addition, as hardware resources continue to develop, core algorithms such as feature extraction, search and adaptation may be further improved.
Foreign products such as IBM ViaVoice and Asiaworks SPK require the user to train the system on several hundred words before use, so that the computer can adapt to the user's voice characteristics. This inevitably limits the wider application of speech recognition technology: extensive training not only wearies the user but also increases the burden on the system. Moreover, future consumer electronics products cannot be expected to be trained for each individual consumer. Adaptation must therefore be improved further, so that recognition is not affected by the particular speaker, accent or dialect, which in practice also means further improving the language model. Real-world users are diverse. Voices differ among men, women and children, and many people's pronunciation is far from the standard pronunciation, which calls for handling of accents and dialects. If speech recognition could automatically adapt to the voice characteristics of most people, that might matter more than raising the recognition rate by one or two percentage points. In fact, the application prospects of speech recognition are diminished for exactly this reason: only users who speak Mandarin well obtain relatively satisfactory results in continuous speech recognition.
Automatic speech recognition by computer has made great progress. To keep statistical model matching effective, large amounts of data must be collected to cover all the acoustic variation that appears in speech recognition applications, such as changes of speaker, background noise, and the differing effects of microphones and communication channels. Differences between recognition tasks seriously constrain the development of this type of technology. Different languages also affect speech recognition results in practice, and this is especially true of Chinese. Chinese speech recognition is an extremely complex task. Apart from the complexity of speech recognition technology itself, the complexity of Chinese dialects makes it very difficult to popularize speech recognition. China has hundreds, even thousands, of dialects. Research and development in Chinese speech recognition has so far considered essentially only Mandarin, and work on recognizing accents remains scarce.
Summary of the invention
The technical problem to be solved by the invention is to provide an accent speech recognition system based on an embedded mobile device that can accurately recognize accented speech on the embedded mobile device.
The technical solution by which the invention solves the above technical problem is as follows: an accent speech recognition system based on an embedded mobile device, comprising a model training module, a feature extraction module and a pattern matching module integrated in the embedded mobile device.
The model training module collects and trains on accented speech to obtain entry models for the accent;
The feature extraction module extracts speech features from the input accented speech;
The pattern matching module performs speech matching on the speech features according to the entry models to obtain the speech recognition result.
The beneficial effects of the invention are as follows. By building a Chinese speech database of local dialect accents and, on that basis, studying pronunciation variation patterns, speaker adaptation and non-native-speaker accent recognition, the system explores solutions to speech recognition for mixed-language use, differing application environments, and users with different dialects and mother tongues. It improves the recognition rate for dialect-accented speech and reduces the difficulty and amount of learning and training required when dialect speakers use speech recognition. The system also integrates speech recognition into a variety of embedded mobile devices, enabling intelligent interaction.
On the basis of the above technical solution, the invention can also be improved as follows.
Further, the entry model comprises an acoustic model and a linguistic model.
Further, in the linguistic model, a multi-pronunciation dictionary is used to model complete pronunciation variations.
Further, in the acoustic model, a context-independent partial change phone model is used to model partial pronunciation variations.
Further, in the acoustic model, syllables are used as the modeling units for digits; for control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the modeling units.
The beneficial effect of this further scheme is that the acoustic modeling method accounts for coarticulation within the syllable while reducing the number of units to be trained. It strikes a balance among acoustic model size, computation speed and recognition rate, so that the system can be integrated into an embedded mobile device.
Further, the pattern matching module uses a neural network structure and the cloud to perform speech matching on the speech features.
The beneficial effect of this further scheme is that using a neural network structure and the cloud can increase the accuracy of regional-accent speech recognition.
Brief description of the drawings
Fig. 1 is a structural block diagram of the accent speech recognition system based on an embedded mobile device according to the invention;
Fig. 2 is a recognition system diagram of the accent speech recognition system based on an embedded mobile device according to the invention.
Embodiment
The principles and features of the invention are described below with reference to the drawings. The examples given serve only to explain the invention and are not intended to limit its scope.
As shown in Fig. 1, an accent speech recognition system based on an embedded mobile device comprises a model training module, a feature extraction module and a pattern matching module integrated in the embedded mobile device. The model training module collects and trains on accented speech to obtain entry models for the accent; the feature extraction module extracts speech features from the input accented speech; and the pattern matching module performs speech matching on the speech features according to the entry models to obtain the speech recognition result.
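The three-module structure can be illustrated with a minimal Python sketch. This is not the patented implementation: the patent does not specify the features, the model type, or any API, so the amplitude-envelope features, the averaged templates and the nearest-template matching below are stand-in assumptions chosen only to show how the modules hand data to one another. All names (`extract_features`, `EntryModels`, `match`) are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Sequence

Feature = List[float]


def extract_features(samples: Sequence[float], n_bands: int = 4) -> Feature:
    """Feature extraction module (toy stand-in): mean absolute amplitude
    of n_bands equal-length segments of the signal."""
    seg = max(1, len(samples) // n_bands)
    feats = []
    for i in range(n_bands):
        chunk = samples[i * seg:(i + 1) * seg]
        feats.append(sum(abs(x) for x in chunk) / len(chunk) if chunk else 0.0)
    return feats


@dataclass
class EntryModels:
    """Model training module (toy stand-in): one averaged feature
    template per vocabulary entry."""
    templates: Dict[str, Feature] = field(default_factory=dict)

    def train(self, word: str, recordings: List[Sequence[float]]) -> None:
        feats = [extract_features(r) for r in recordings]
        # Average each feature dimension over all recordings of the entry.
        self.templates[word] = [sum(col) / len(col) for col in zip(*feats)]


def match(models: EntryModels, samples: Sequence[float]) -> str:
    """Pattern matching module (toy stand-in): return the entry whose
    template is nearest in squared Euclidean distance."""
    feats = extract_features(samples)

    def dist(word: str) -> float:
        return sum((a - b) ** 2 for a, b in zip(feats, models.templates[word]))

    return min(models.templates, key=dist)
```

In use, `EntryModels.train` would be called once per vocabulary entry with collected accented recordings, and `match` would then be called on each new utterance. A real system would replace each stand-in with the acoustic and linguistic models described in the following paragraphs.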
Specifically: the entry model comprises an acoustic model and a linguistic model. In the linguistic model, a multi-pronunciation dictionary is used to model complete pronunciation variations. In the acoustic model, a context-independent partial change phone model is used to model partial pronunciation variations. In the acoustic model, syllables are used as the modeling units for digits; for control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the modeling units. The pattern matching module uses a neural network structure and the cloud to perform speech matching on the speech features.
In this specific embodiment, speech recognition is described mainly using the Wuhan accent as an example.
Fig. 2 is a recognition system diagram of the accent speech recognition system based on an embedded mobile device according to the invention. Regional-accent speech enters the speech library through speech input; speech matching and pattern matching are performed against the model library; the matching speech is selected for synthesis, and a voiceprint password is added. The speech recognition system includes speech-to-semantics, text-to-semantics, speech evaluation and face recognition technologies. Combining speech recognition with other natural language processing technologies, such as machine translation and speech synthesis, makes it possible to build more complex applications.
In the system of the invention: depending on the requirements placed on the speaker's manner of speaking, speech recognition can be performed on isolated words and connected words; depending on the degree of dependence on the speaker, speech recognition can be performed for specific speakers and for non-specific speakers; depending on vocabulary size, speech recognition can be performed on small, medium, large and unlimited vocabularies.
In the system of the invention, speech recognition technology mainly comprises three aspects: feature extraction, pattern matching criteria and model training. It also involves issues such as the selection of recognition units.
The system of the invention mainly studies the use of speech recognition technology to recognize expression and interaction in the Wuhan dialect, in order to achieve intelligent search with a high-quality user experience. The main problems the system sets out to solve are the establishment of models in an embedded Wuhan-dialect speech recognition system and two-way recognition between Wuhan-dialect speech and text. A neural network structure and the cloud are used to increase recognition accuracy. With regard to tone, the linguistic model draws on the Mandarin model and the northern dialect model.
The system studies and builds the acoustic model and linguistic model of an embedded Wuhan-dialect speech recognition system. At present, few Wuhan-dialect speech recognition engines have been integrated into embedded mobile devices such as mobile phones. The difficulty is that embedded mobile devices have little memory and low computing power, so a Wuhan-dialect speech recognition system on an embedded mobile device needs special acoustic modeling. The system models the relevant databases using different units:
1. For digits, syllables are used as the modeling units;
2. For control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the modeling units.
This modeling method accounts for coarticulation within the syllable while reducing the number of units to be trained, striking a balance among acoustic model size, computation speed and recognition rate.
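The unit selection described above can be sketched in Python as follows. This is an illustrative assumption rather than the patent's implementation: the pinyin initial inventory, the treatment of `y` and `w` as initials, the `initial+final` naming of right-context-dependent units, and the function names are all hypothetical choices made for the example.

```python
from typing import List, Tuple

# Simplified pinyin initial inventory. Two-letter initials come first so that
# "zh" is matched before "z". Treating "y" and "w" as initials is a
# simplification made for this sketch.
INITIALS = ("zh", "ch", "sh", "b", "p", "m", "f", "d", "t", "n", "l",
            "g", "k", "h", "j", "q", "x", "r", "z", "c", "s", "y", "w")


def split_syllable(syllable: str) -> Tuple[str, str]:
    """Split a toneless pinyin syllable into (initial, final)."""
    for ini in INITIALS:
        if syllable.startswith(ini):
            return ini, syllable[len(ini):]
    return "", syllable  # zero-initial syllable such as "an" or "er"


def units_for(syllables: List[str], task: str) -> List[str]:
    """Return the modeling units for a syllable sequence.

    task == "digits": one unit per whole syllable.
    otherwise: a right-context-dependent initial (written "initial+final",
    i.e. the initial conditioned on the final that follows it) plus a
    context-independent final.
    """
    if task == "digits":
        return list(syllables)
    units = []
    for syl in syllables:
        ini, fin = split_syllable(syl)
        if ini:
            units.append(f"{ini}+{fin}")
        units.append(fin)
    return units
```

For example, `units_for(["da", "kai"], "command")` yields the four units `d+a`, `a`, `k+ai`, `ai`, while a digit string such as `["yi", "er"]` keeps one unit per syllable. Finals are shared across all syllables, which keeps the unit inventory small, and the initials still carry information about the vowel that follows them.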
In the system of the invention, in view of the fact that people in Hubei speak Mandarin with a noticeable accent, a Mandarin speech database covering different accents is built for the purpose of studying continuous Mandarin speech recognition for non-native speakers. On that basis, pronunciation variation patterns, speaker adaptation and non-native-speaker accent recognition are studied.
Based on the system of the invention, the mobile phone platform is unaffected by accent and can correctly recognize regional accents. The speech recognition engine can be integrated into embedded mobile devices. The research yields a practical speaker accent adaptation scheme and lays the foundation for further research and development in this area.
Complete pronunciation variations are modeled at the phonetic layer with a multi-pronunciation dictionary, and partial pronunciation variations are modeled at the acoustic layer with a context-independent partial change phone model (PCPM). This makes it possible to examine the characteristics, differences and connections of the pronunciation variation models at the phonetic and acoustic layers and to integrate them into different parts of the speech recognition system, achieving layered handling of pronunciation variation. Using the layered pronunciation variation model, read speech in dialect-accented Mandarin was tested, and the recognition results improved. Tests were run separately on Mandarin with accents from several areas of Hubei, and the differences and connections in pronunciation variation among different Chinese dialect accents were analyzed from the experimental results.
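The layered treatment of pronunciation variation can be illustrated with a small Python sketch. The patent gives no data format or variation rules, so everything concrete here is an assumption: the example words, the substitution rules (`n` to `l`, `sh` to `s`), the `zh~z` label for a partial change phone, and the function names are hypothetical and serve only to show the two layers side by side.

```python
from typing import Dict, List

Pron = List[str]

# Canonical lexicon: each word maps to a list of pronunciations.
LEXICON: Dict[str, List[Pron]] = {
    "niunai": [["n", "iu", "n", "ai"]],
    "shi": [["sh", "i"]],
}

# Hypothetical complete variations: one phone is fully replaced by another.
COMPLETE_VARIANTS: Dict[str, str] = {"n": "l", "sh": "s"}

# Hypothetical partial variations: the phone is realised somewhere between
# two canonical phones, so it gets its own acoustic unit.
PARTIAL_CHANGE: Dict[str, str] = {"zh": "zh~z"}


def expand_lexicon(lexicon: Dict[str, List[Pron]],
                   variants: Dict[str, str]) -> Dict[str, List[Pron]]:
    """Phonetic layer: add an accented pronunciation next to each canonical
    one. For simplicity, every affected phone in a pronunciation is
    substituted at once."""
    expanded = {}
    for word, prons in lexicon.items():
        out = [list(p) for p in prons]
        for p in prons:
            alt = [variants.get(ph, ph) for ph in p]
            if alt not in out:
                out.append(alt)
        expanded[word] = out
    return expanded


def acoustic_units(pron: Pron, partial: Dict[str, str] = PARTIAL_CHANGE) -> Pron:
    """Acoustic layer: map phones with partial variation to their
    context-independent partial change phone model."""
    return [partial.get(ph, ph) for ph in pron]
```

With these assumed rules, `expand_lexicon(LEXICON, COMPLETE_VARIANTS)` gives `niunai` two dictionary entries, and `acoustic_units(["zh", "ong"])` selects the `zh~z` model for the initial. The point of the split is that a full substitution can be handled in the dictionary without retraining, whereas an in-between realisation needs its own acoustic model.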
The problem that speech recognition technology (Automatic Speech Recognition, ASR) sets out to solve is to let a machine "understand" human speech and "extract" the text it contains. This is equivalent to fitting the machine with "ears" so that it is able to listen.
The invention provides more accurate intelligent speech recognition technology, with high recognition accuracy, high recognition speed, customizable domain models and support for multiple processing modes. In the future it will also offer advantages such as simple SDK-based development and a small toolkit resource footprint. Input speech can be recognized and transcribed to text accurately and in real time. The model is retrained and optimized on the corpus that is continually collected, steadily improving model coverage and recognition accuracy.
Its application value is:
1) Accurate recognition: the recognition engine self-corrects using semantic context.
2) Continuous human-machine interaction: continuous recording and continuous recognition, with invalid speech filtered out.
3) Intelligent sentence segmentation based on semantics: the system can be interrupted at any time and supports proactive interaction.
4) Contextual dialogue: context understanding, content-based questions, multi-scenario dialogue management, information sharing across scenarios and long-term memory.
5) Extensible personalization: product features can be customized per user, and interaction modes are extensible.
By building a Chinese speech database of local dialect accents and, on that basis, studying pronunciation variation patterns, speaker adaptation and non-native-speaker accent recognition, the accent speech recognition system based on an embedded mobile device explores solutions to speech recognition for mixed-language use, differing application environments, and users with different dialects and mother tongues. It improves the recognition rate for dialect-accented speech and reduces the difficulty and amount of learning and training required when dialect speakers use speech recognition. The system also integrates speech recognition into a variety of embedded mobile devices, enabling intelligent interaction.
The foregoing describes only preferred embodiments of the invention and is not intended to limit it. Any modification, equivalent substitution or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (6)

1. An accent speech recognition system based on an embedded mobile device, characterized in that it comprises a model training module, a feature extraction module and a pattern matching module integrated in the embedded mobile device,
the model training module being used to collect and train on accented speech to obtain entry models for the accent;
the feature extraction module being used to extract speech features from the input accented speech;
the pattern matching module being used to perform speech matching on the speech features according to the entry models to obtain a speech recognition result.
2. The accent speech recognition system based on an embedded mobile device according to claim 1, characterized in that the entry model comprises an acoustic model and a linguistic model.
3. The accent speech recognition system based on an embedded mobile device according to claim 2, characterized in that, in the linguistic model, a multi-pronunciation dictionary is used to model complete pronunciation variations.
4. The accent speech recognition system based on an embedded mobile device according to claim 2, characterized in that, in the acoustic model, a context-independent partial change phone model is used to model partial pronunciation variations.
5. The accent speech recognition system based on an embedded mobile device according to claim 4, characterized in that, in the acoustic model, syllables are used as the modeling units for digits; and for control command sets or continuous speech recognition, right-context-dependent initials and context-independent finals are used as the modeling units.
6. The accent speech recognition system based on an embedded mobile device according to any one of claims 1 to 5, characterized in that the pattern matching module uses a neural network structure and the cloud to perform speech matching on the speech features.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710198053.2A CN106971721A (en) 2017-03-29 2017-03-29 A kind of accent speech recognition system based on embedded mobile device

Publications (1)

Publication Number Publication Date
CN106971721A true CN106971721A (en) 2017-07-21

Family

ID=59336068

Country Status (1)

Country Link
CN (1) CN106971721A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109192194A (en) * 2018-08-22 2019-01-11 北京百度网讯科技有限公司 Voice data mask method, device, computer equipment and storage medium
CN110019683A (en) * 2017-12-29 2019-07-16 同方威视技术股份有限公司 Intelligent sound interaction robot and its voice interactive method
CN112259102A (en) * 2020-10-29 2021-01-22 适享智能科技(苏州)有限公司 Retail scene voice interaction optimization method based on knowledge graph
CN112349294A (en) * 2020-10-22 2021-02-09 腾讯科技(深圳)有限公司 Voice processing method and device, computer readable medium and electronic equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1412741A (en) * 2002-12-13 2003-04-23 郑方 Chinese speech identification method with dialect background
CN1741131A (en) * 2004-08-27 2006-03-01 中国科学院自动化研究所 A kind of unspecified person alone word audio recognition method and device
CN101281745A (en) * 2008-05-23 2008-10-08 深圳市北科瑞声科技有限公司 Interactive system for vehicle-mounted voice
CN103700370A (en) * 2013-12-04 2014-04-02 北京中科模识科技有限公司 Broadcast television voice recognition method and system
CN106057196A (en) * 2016-07-08 2016-10-26 成都之达科技有限公司 Vehicular voice data analysis identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20170721