CN106653007B - A speech recognition system - Google Patents
- Publication number: CN106653007B
- Application number: CN201611101551.2A
- Authority: CN (China)
- Prior art keywords: network, pinyin, word, specific recognition, recognizer
- Legal status: Active (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/28—Constructional details of speech recognition systems
- G10L15/08—Speech classification or search
- G10L15/16—Speech classification or search using artificial neural networks
- G10L15/18—Speech classification or search using natural language modelling
- G10L15/26—Speech to text systems
Abstract
The present invention relates to a speech recognition system composed of a base recognizer built on an acoustic-model-to-pinyin mapping network, any number of specific recognizers built on pinyin-to-word mapping networks for different application domains, and an integrated decision unit. Speech is first mapped by the base recognizer into a network organized from multiple candidate pinyin sequences. This pinyin network is then composed with the specific recognizer for a particular application, and a best-path search over the composed network yields the final recognition result. Under this framework, the pinyin network can be composed with the separate pinyin-to-word specific recognizers of multiple application domains, and the best recognition result is finally selected according to the acoustic and language model scores and other application-related super-rules.
Description
Technical field
The present invention relates to the technical field of speech recognition, and in particular to a speech recognition system that supports online domain extension.
Background art
Chinese is not an alphabetic language: without contextual information, it is difficult to infer the corresponding Chinese characters directly from sound. Conventional speech recognition decodes with a pre-generated static decoding network, which usually maps phonemes directly to words. This decoding network incorporates the probability distribution of the words in the audio content to be recognized. As a result, when the recognizer is switched from one domain to another, its performance drops sharply, and specialized terms and new words may never be recognized correctly. To support multiple domains, a single model is usually used to model the word probability distributions of all domains at once. This flattens the model's probability distribution (which means recognition performance is generally mediocre as well) and makes the model very large. To support new words or terms, the model must be retrained and the recognizer rebuilt, which costs a great deal of time and resources.
In view of these shortcomings, the designers have actively pursued research and innovation in order to create a speech recognition system that supports online domain extension and has greater industrial utility.
Summary of the invention
To solve the above technical problems, the object of the present invention is to provide a speech recognition system that supports online domain extension, so that recognition performance in a specific domain can be improved quickly.
The speech recognition system of the invention comprises:
A base recognizer built on an acoustic-model-to-pinyin mapping network, which maps speech into a network organized from multiple candidate pinyin sequences;
Multiple parallel specific recognizers, each built on a pinyin-to-word mapping network for a different application domain, each of which is composed with the network of candidate pinyin sequences, yielding multiple best word sequences and their confidences;
An integrated decision unit, which receives the multiple best word sequences and confidences and then, using the confidences together with previously given prior knowledge, rules and additional knowledge, makes a decision and selects the optimal word sequence for output.
Further, by adjusting the pinyin-to-word mapping network, new recognition content is added to the pinyin-to-word specific recognizer of an existing domain, which updates the recognition content of that domain; and by building a corresponding pinyin-to-word specific recognizer offline and then adding this extension online alongside the existing pinyin-to-word specific recognizers, recognition content for a new application domain is created.
Further, the base recognizer built on the acoustic-model-to-pinyin mapping network computes acoustic scores dynamically from the input audio features, stores the language model scores of pinyin sequences on its network, combines the acoustic scores and language model scores with a dynamic programming algorithm, and searches for and outputs the several highest-scoring pinyin sequences.
Further, the language model of the pinyin sequences is modeled with a recurrent neural network based on long short-term memory units.
Further, the integrated decision unit selects the best candidate word sequence by fusing the recognition confidence, the prior knowledge, the preset rules and the additional information.
Further, the prior knowledge includes at least domain identification information supplied from outside the speech recognition system, or domain identification information derived from the history of recognition results.
Further, the domain identification information is either a discrete 0/1 setting or a continuous probability value.
Further, the preset rules include at least a word-count range estimated from the audio length.
Further, the additional information includes a measure, obtained from a super language model, of how well the recognized word string conforms to grammatical norms.
Further, the integrated decision unit applies the additional information and the preset rules through hierarchical computation and, together with the confidence scores, uses them as the decision criterion for selecting the candidate word sequence that is output as the final recognition result.
With the above scheme, specific recognizers built on pinyin-to-word mapping networks for different domains can be added to the recognition system dynamically and online, which quickly improves recognition performance in a specific domain. Domains can be customized and extended quickly, hot words and new words can be added, and domain recognition content can be tailored. Multiple domains are supported at the same time without degrading recognition performance.
The above is only an overview of the technical solution of the invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, preferred embodiments are described in detail below with reference to the accompanying drawing.
Brief description of the drawings
Fig. 1 is a block diagram of the speech recognition system of the invention.
Detailed description of the embodiments
Specific embodiments of the invention are described in further detail below with reference to the drawing and examples. The following examples illustrate the invention and are not intended to limit its scope.
Referring to Fig. 1, a speech recognition system according to a preferred embodiment of the invention is composed of a base recognizer built on an acoustic-model-to-pinyin mapping network, any number of specific recognizers built on pinyin-to-word mapping networks for different application domains, and an integrated decision unit. The base recognizer maps speech into a network organized from multiple candidate pinyin sequences. Each specific recognizer is composed with this network of candidate pinyin sequences, yielding multiple best word sequences and their confidences. The integrated decision unit receives the multiple best word sequences and confidences and then, using the confidences together with previously given prior knowledge, rules and additional knowledge, makes a decision and selects the optimal word sequence for output.
The specific recognizers of the invention, built on pinyin-to-word mapping networks for different domains, can be added to the recognition system dynamically and online, so recognition performance in a specific domain can be improved quickly. In the invention, the specific recognizers operate in parallel and can be extended quickly. Specifically, by adjusting the pinyin-to-word mapping network, new recognition content is added to the pinyin-to-word specific recognizer of an existing domain, which updates the recognition content of that domain; and by building a corresponding pinyin-to-word specific recognizer offline and then adding this extension online alongside the existing pinyin-to-word specific recognizers, recognition content for a new application domain is created.
In practice, updating the recognition content of an existing domain, for example adding new words or hot words, only requires adjusting the pinyin-to-word mapping network; neither the acoustic model nor the base recognizer needs to be adjusted. Adding recognition content for a new application domain, such as home control or in-car navigation, only requires building the corresponding pinyin-to-word mapping network offline; it can then be added to the recognition system online without affecting the recognition process of existing domains.
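As a rough illustration of this kind of online extension, the following sketch keeps one pinyin-to-word table per domain in a registry. The class name `RecognizerRegistry`, its methods and the example entries are hypothetical and do not come from the patent; a plain dictionary stands in for a real pinyin-to-word mapping network.

```python
from typing import Dict, List, Tuple

# Hypothetical stand-in for a pinyin-to-word mapping network:
# a tuple of pinyin syllables maps to candidate (word, cost) pairs.
Lexicon = Dict[Tuple[str, ...], List[Tuple[str, float]]]


class RecognizerRegistry:
    """Holds one pinyin-to-word table per application domain."""

    def __init__(self) -> None:
        self._domains: Dict[str, Lexicon] = {}

    def add_domain(self, name: str, lexicon: Lexicon) -> None:
        # A table built offline is attached at run time; the tables of
        # other domains are not touched.
        if name in self._domains:
            raise ValueError(f"domain {name!r} already registered")
        self._domains[name] = lexicon

    def add_word(self, domain: str, pinyin: Tuple[str, ...],
                 word: str, cost: float) -> None:
        # Adding a new word or hot word edits only one domain's table.
        self._domains[domain].setdefault(pinyin, []).append((word, cost))

    def domains(self) -> List[str]:
        return sorted(self._domains)

    def lookup(self, domain: str,
               pinyin: Tuple[str, ...]) -> List[Tuple[str, float]]:
        return list(self._domains[domain].get(pinyin, []))


registry = RecognizerRegistry()
registry.add_domain("navigation", {("dao", "hang"): [("导航", 1.0)]})
registry.add_domain("home_control", {("kai", "deng"): [("开灯", 1.0)]})
registry.add_word("navigation", ("lu", "kuang"), "路况", 0.5)
```

In this sketch neither operation touches anything that would correspond to the acoustic model or the base recognizer, which mirrors the separation described above.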
The base recognizer of the invention, built on the acoustic-model-to-pinyin mapping network, computes acoustic scores dynamically from the input audio features and stores the language model scores of pinyin sequences on its network. It combines the acoustic scores and language model scores with a dynamic programming algorithm and searches for and outputs the several highest-scoring pinyin sequences. The language model of the pinyin sequences is modeled with a recurrent neural network based on long short-term memory units.
Each of the above networks is implemented in the system as a weighted finite-state transducer (WFST), an automaton that maps an input sequence to another sequence. In the base recognizer built on the acoustic-model-to-pinyin mapping network, the language model scores of pinyin sequences are stored on the network. During decoding, acoustic scores are computed dynamically from the input audio features, a dynamic programming algorithm combines the acoustic scores and language model scores within this WFST network, and the several highest-scoring pinyin sequences are output as the multi-candidate result.
In a specific implementation, the pinyin language model can be modeled with a recurrent neural network (RNN) based on long short-term memory (LSTM) units. This strengthens the modeling of pinyin context and improves the accuracy of the multi-candidate pinyin recognition result.
In the present invention, the inputs to a specific recognizer built on a pinyin-to-word mapping network are the network representing the multiple candidate pinyin sequences and the pinyin-to-word mapping network; its output is the best word sequence and its confidence measure. The network of candidate pinyin sequences can be represented as a WFST that maps pinyin to pinyin, and the pinyin-to-word mapping network is likewise represented as a WFST whose path weights are the cost of mapping a pinyin sequence to a word sequence. Recognition first composes the two WFSTs into a new WFST, then searches that WFST for the highest-scoring sequence and outputs its word sequence and score.
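The compose-then-search step can be approximated without a WFST library. The sketch below assumes the candidate pinyin network is given as a short list of (cost, pinyin sequence) pairs and the pinyin-to-word mapping as a dictionary with costs, where a cost is a negative log score, so the lowest cost corresponds to the highest score. The function names and the example lexicon are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Lexicon = Dict[Tuple[str, ...], List[Tuple[str, float]]]
Result = Tuple[float, List[str]]


def best_words(pinyin: Tuple[str, ...], lexicon: Lexicon,
               max_len: int = 4) -> Optional[Result]:
    """Lowest-cost segmentation of one pinyin sequence into lexicon words."""
    # best[i] is the cheapest (cost, words) covering pinyin[:i], or None.
    best: List[Optional[Result]] = [None] * (len(pinyin) + 1)
    best[0] = (0.0, [])
    for end in range(1, len(pinyin) + 1):
        for start in range(max(0, end - max_len), end):
            prefix = best[start]
            if prefix is None:
                continue
            for word, cost in lexicon.get(pinyin[start:end], []):
                total = prefix[0] + cost
                current = best[end]
                if current is None or total < current[0]:
                    best[end] = (total, prefix[1] + [word])
    return best[-1]


def recognize(candidates: List[Tuple[float, Tuple[str, ...]]],
              lexicon: Lexicon) -> Optional[Result]:
    """Add each candidate's pinyin cost to its mapping cost; keep the best."""
    results: List[Result] = []
    for pinyin_cost, pinyin in candidates:
        mapped = best_words(pinyin, lexicon)
        if mapped is not None:
            results.append((pinyin_cost + mapped[0], mapped[1]))
    return min(results, key=lambda r: r[0]) if results else None


lexicon: Lexicon = {
    ("shi", "bie"): [("识别", 0.5)],
    ("shi",): [("是", 1.0), ("十", 1.5)],
    ("bie",): [("别", 1.0)],
    ("bei",): [("被", 1.0)],
}
candidates = [(1.4, ("shi", "bie")), (3.4, ("shi", "bei"))]
best = recognize(candidates, lexicon)
```

In this example the two-syllable word 识别 wins because its mapping cost plus the cost of its pinyin candidate is lower than that of any other segmentation. A real system would run this search on the composed WFST instead of looping over an explicit candidate list.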
In the present invention, the integrated decision unit receives the outputs of the multiple specific recognizers built on pinyin-to-word mapping networks, namely word sequences and their confidences, and then makes a decision based on those confidences together with previously given prior knowledge, rules and additional knowledge, selecting the optimal word sequence for output. Specifically, the prior knowledge includes at least domain identification information supplied from outside the recognition system, or domain identification information derived from the history of recognition results. The domain identification information can be a discrete 0/1 setting or a continuous probability value. The rules include at least a word-count range estimated from the audio length; with this range, recognition results that are far too long or far too short can be excluded. The additional information may include a measure, obtained from a super language model, of how well the recognized word string conforms to grammatical norms. The above information and rules are applied through hierarchical computation and, together with the confidence scores, serve as the decision criterion for selecting the candidate word string that is output as the final recognition result.
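A hierarchical decision of this kind might look like the sketch below: a rule layer first discards results whose length is implausible for the audio duration, and a fusion layer then ranks the remainder. The patent does not give a fusion formula, so the weighting used here, the words-per-second bounds and all names (`Candidate`, `decide`, `word_count_range`) are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Candidate:
    domain: str
    words: List[str]
    confidence: float     # recognizer confidence, assumed to lie in [0, 1]
    grammar_score: float  # grammaticality measure, assumed to lie in [0, 1]


def word_count_range(audio_seconds: float,
                     min_rate: float = 1.0,
                     max_rate: float = 6.0) -> Tuple[int, int]:
    """Rough word-count bounds; the per-second rates are illustrative guesses."""
    return int(audio_seconds * min_rate), int(audio_seconds * max_rate) + 1


def decide(candidates: List[Candidate],
           audio_seconds: float,
           domain_prior: Dict[str, float],
           grammar_weight: float = 0.5) -> Optional[Candidate]:
    low, high = word_count_range(audio_seconds)
    # Layer 1: rule-based filter that drops overlong or overshort results.
    kept = [c for c in candidates if low <= len(c.words) <= high]
    if not kept:
        return None

    # Layer 2: fuse confidence, domain prior and grammar score.
    # The prior may be a 0/1 flag or a probability; unknown domains get 0.
    def fused(c: Candidate) -> float:
        prior = domain_prior.get(c.domain, 0.0)
        return c.confidence * prior + grammar_weight * c.grammar_score

    return max(kept, key=fused)


candidates = [
    Candidate("navigation", ["导航", "到", "机场"], 0.80, 0.9),
    Candidate("home_control", ["到", "行"], 0.85, 0.3),
    Candidate("chat", ["导"] * 20, 0.95, 0.1),
]
prior = {"navigation": 0.7, "home_control": 0.3, "chat": 0.5}
winner = decide(candidates, audio_seconds=2.0, domain_prior=prior)
```

In this example the 20-word result is removed by the length rule despite its high confidence, and the navigation result is selected because its prior and grammar score outweigh the slightly higher confidence of the home-control result.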
The above is only a preferred embodiment of the invention and is not intended to limit it. It should be noted that those of ordinary skill in the art can make further improvements and modifications without departing from the technical principles of the invention, and such improvements and modifications shall also be regarded as falling within the scope of protection of the invention.
Claims (1)
1. A speech recognition system, characterized in that it comprises:
a base recognizer built on an acoustic-model-to-pinyin mapping network, for mapping speech into a network organized from multiple candidate pinyin sequences;
multiple parallel specific recognizers, each built on a pinyin-to-word mapping network for a different application domain, each for being composed with the network organized from the multiple candidate pinyin sequences, so as to obtain multiple best word sequences and confidences;
an integrated decision unit, for receiving the multiple best word sequences and confidences and then making a decision based on the confidences together with previously given prior knowledge, preset rules and additional information, and selecting the optimal word sequence for output;
by adjusting the pinyin-to-word mapping network, new recognition content is added to the pinyin-to-word specific recognizer of an existing domain, updating the recognition content of that domain; by building a corresponding pinyin-to-word specific recognizer offline and then adding this extension online alongside the existing pinyin-to-word specific recognizers, recognition content for a new application domain is created;
the base recognizer built on the acoustic-model-to-pinyin mapping network computes acoustic scores dynamically from the input audio features, stores the language model scores of pinyin sequences on its network, combines the acoustic scores and language model scores with a dynamic programming algorithm, and searches for and outputs the several highest-scoring pinyin sequences;
the language model of the pinyin sequences is modeled with a recurrent neural network based on long short-term memory units;
the integrated decision unit selects the best candidate word sequence by fusing the recognition confidence, the prior knowledge, the preset rules and the additional information;
the prior knowledge includes at least domain identification information supplied from outside the speech recognition system, or domain identification information derived from the history of recognition results;
the domain identification information is either a discrete 0/1 setting or a continuous probability value;
the preset rules include at least a word-count range estimated from the audio length;
the additional information includes a measure, obtained from a super language model, of how well the recognized word string conforms to grammatical norms;
the integrated decision unit applies the additional information and the preset rules through hierarchical computation and, together with the confidence scores, uses them as the decision criterion for selecting the candidate word sequence that is output as the final recognition result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611101551.2A CN106653007B (en) | 2016-12-05 | 2016-12-05 | A speech recognition system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106653007A CN106653007A (en) | 2017-05-10 |
CN106653007B (en) | 2019-07-16 |
Family
ID=58818327
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611101551.2A Active CN106653007B (en) | A speech recognition system | 2016-12-05 | 2016-12-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106653007B (en) |
Families Citing this family (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108959238B (en) * | 2017-05-24 | 2021-12-31 | 艺龙网信息技术(北京)有限公司 | Input stream identification method, device and computer readable storage medium |
CN107507621B (en) * | 2017-07-28 | 2021-06-22 | 维沃移动通信有限公司 | Noise suppression method and mobile terminal |
CN107767858B (en) * | 2017-09-08 | 2021-05-04 | 科大讯飞股份有限公司 | Pronunciation dictionary generating method and device, storage medium and electronic equipment |
CN110689881B (en) * | 2018-06-20 | 2022-07-12 | 深圳市北科瑞声科技股份有限公司 | Speech recognition method, speech recognition device, computer equipment and storage medium |
CN108899013B (en) * | 2018-06-27 | 2023-04-18 | 广州视源电子科技股份有限公司 | Voice search method and device and voice recognition system |
CN111354347B (en) * | 2018-12-21 | 2023-08-15 | 中国科学院声学研究所 | Speech recognition method and system based on self-adaptive hotword weight |
CN110148416B (en) * | 2019-04-23 | 2024-03-15 | 腾讯科技(深圳)有限公司 | Speech recognition method, device, equipment and storage medium |
CN110111775B (en) * | 2019-05-17 | 2021-06-22 | 腾讯科技(深圳)有限公司 | Streaming voice recognition method, device, equipment and storage medium |
CN110322884B (en) * | 2019-07-09 | 2021-12-07 | 科大讯飞股份有限公司 | Word insertion method, device, equipment and storage medium of decoding network |
CN112242142B (en) * | 2019-07-17 | 2024-01-30 | 北京搜狗科技发展有限公司 | Voice recognition input method and related device |
CN110992959A (en) * | 2019-12-06 | 2020-04-10 | 北京市科学技术情报研究所 | Voice recognition method and system |
CN113299283B (en) * | 2021-04-28 | 2023-03-10 | 上海淇玥信息技术有限公司 | Speech recognition method, system, apparatus and medium |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1674091A (en) * | 2005-04-18 | 2005-09-28 | 南京师范大学 | Sound identifying method for geographic information and its application in navigation system |
CN101067780A (en) * | 2007-06-21 | 2007-11-07 | 腾讯科技(深圳)有限公司 | Character inputting system and method for intelligent equipment |
US7783484B2 (en) * | 2003-04-04 | 2010-08-24 | Nuance Communications, Inc. | Apparatus for reducing spurious insertions in speech recognition |
CN101901599A (en) * | 2009-05-19 | 2010-12-01 | 塔塔咨询服务有限公司 | System and method for rapid prototyping of existing speech recognition solutions in different languages |
CN103578464A (en) * | 2013-10-18 | 2014-02-12 | 威盛电子股份有限公司 | Language model establishing method, speech recognition method and electronic device |
CN104575497A (en) * | 2013-10-28 | 2015-04-29 | 中国科学院声学研究所 | Method for building acoustic model and speech decoding method based on acoustic model |
Non-Patent Citations (1)
Title |
---|
A TTRNN-based full-syllable recognition method for Chinese pinyin; Zhao Yibao et al.; Journal of Harbin Institute of Technology; 2001-04-30; pp. 213-216 |
Also Published As
Publication number | Publication date |
---|---|
CN106653007A (en) | 2017-05-10 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||