CN106653007B - A speech recognition system - Google Patents

A speech recognition system

Info

Publication number
CN106653007B
Authority
CN
China
Prior art keywords
network
phonetic
word
specific identification
identification device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201611101551.2A
Other languages
Chinese (zh)
Other versions
CN106653007A (en)
Inventor
沈小正
张光宇
朱孟旭
代大明
肖佳林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Qdreamer Network Science And Technology Co Ltd
Original Assignee
Suzhou Qdreamer Network Science And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Qdreamer Network Science And Technology Co Ltd
Priority to CN201611101551.2A
Publication of CN106653007A
Application granted
Publication of CN106653007B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/28 Constructional details of speech recognition systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Abstract

The present invention relates to a speech recognition system composed of a base recognizer built on an acoustic-model-to-pinyin mapping network, any number of domain-specific recognizers built on pinyin-to-word mapping networks for different application domains, and an integrated decision unit. Speech is first mapped by the base recognizer into a network organized from multiple candidate pinyin sequences. This pinyin network is then combined with the domain-specific recognizer of a particular application target, and a search for the optimal path over the combined network yields the final recognition result. Under this framework, the pinyin network can be combined with the independent pinyin-to-word domain-specific recognizers of multiple application domains, and the best recognition result is finally selected according to the acoustic and language model scores and other application-related super rules.

Description

A speech recognition system
Technical field
The present invention relates to the technical field of speech recognition, and more particularly to a speech recognition system that supports online domain extension.
Background art
Chinese is not an alphabetic language: without contextual information, it is difficult to determine the corresponding Chinese characters directly from the sound. Conventional speech recognition decodes with a pre-generated static decoding network, which usually maps directly from phonemes to words. The decoding network incorporates the probability distribution of the words in the audio content to be recognized. As a result, when the recognizer is switched from one domain to another, its performance drops sharply, and specialized terms and new words may never be recognized correctly. To support recognition in multiple domains, the word probability distributions of those domains are usually modeled together in a single model. This makes the model's probability distribution relatively flat (which means recognition performance is generally only average as well) and makes the model large. To support the recognition of new words or terms, the model must be retrained and the recognizer rebuilt, which costs a great deal of time and resources.
In view of the above shortcomings, the designers have actively pursued research and innovation to create a speech recognition system that supports online domain extension and therefore has greater industrial utility.
Summary of the invention
To solve the above technical problems, the object of the present invention is to provide a speech recognition system that supports online domain extension, so that recognition performance in a specific domain can be improved quickly.
The speech recognition system of the present invention comprises:
a base recognizer built on an acoustic-model-to-pinyin mapping network, for mapping speech into a network organized from multiple candidate pinyin sequences;
multiple parallel domain-specific recognizers, each built on a pinyin-to-word mapping network for a different application domain, each of which is combined with the network organized from multiple candidate pinyin sequences to obtain multiple best word sequences and their confidence levels;
an integrated decision unit, for receiving the multiple best word sequences and confidence levels, making a decision according to the confidence levels together with previously given prior knowledge, rules, and additional knowledge, and selecting the optimal word sequence for output.
Further, the recognition content of an existing domain is updated by adjusting the pinyin-to-word mapping network, that is, by adding the new recognition content to the existing domain's domain-specific recognizer based on a pinyin-to-word mapping network; the recognition content of a new application domain is created by building the corresponding domain-specific recognizer based on a pinyin-to-word mapping network offline and then adding the extension content online to the domain-specific recognizers based on pinyin-to-word mapping networks.
Further, the base recognizer built on the acoustic-model-to-pinyin mapping network dynamically computes acoustic scores from the input audio features, stores the language model scores of pinyin sequences on its network, combines the acoustic scores and language model scores with a dynamic programming algorithm, and searches for and outputs the several highest-scoring pinyin sequences.
Further, the language model of the pinyin sequences is modeled with a recurrent neural network based on long short-term memory units.
Further, the integrated decision unit selects the best candidate word sequence by fusing the recognition confidence, the prior knowledge, the preset rules, and the additional information.
Further, the prior knowledge includes at least domain identification information supplied from outside the speech recognition system, or domain indication information derived from the history of recognition results.
Further, the domain indication information is either a discrete 0/1 setting or a continuous probability value.
Further, the preset rules include at least a word-count range estimated from the audio length.
Further, the additional information includes a measure, obtained from a super language model, of how well the recognized word string conforms to grammatical norms.
Further, the integrated decision unit applies the additional information and the preset rules through hierarchical computation and, together with the confidence score, uses them as the decision criterion to select a candidate word sequence for output as the final recognition result.
With the above scheme, the present invention can dynamically add domain-specific recognizers based on pinyin-to-word mapping networks for different domains to the recognition system online, which quickly improves recognition performance in a specific domain; it allows domains to be customized and extended quickly, hot words and new words to be added, and domain recognition content to be customized; and it supports recognition in multiple domains simultaneously without degrading recognition performance.
The above is only an overview of the technical solution of the present invention. So that the technical means of the invention can be understood more clearly and implemented according to the specification, preferred embodiments of the invention are described in detail below with reference to the accompanying drawing.
Brief description of the drawings
Fig. 1 is a block diagram of the speech recognition system of the present invention.
Detailed description of the embodiments
Specific embodiments of the present invention are described in further detail below with reference to the drawing and examples. The following embodiments illustrate the present invention and are not intended to limit its scope.
Referring to Fig. 1, a speech recognition system according to a preferred embodiment of the present invention is composed of a base recognizer built on an acoustic-model-to-pinyin mapping network, any number of domain-specific recognizers built on pinyin-to-word mapping networks for different application domains, and an integrated decision unit. The base recognizer maps speech into a network organized from multiple candidate pinyin sequences. Each domain-specific recognizer is combined with that network to obtain a best word sequence and its confidence level, so that multiple best word sequences and confidence levels are produced in total. The integrated decision unit receives the multiple best word sequences and confidence levels, makes a decision according to the confidence levels together with previously given prior knowledge, rules, and additional knowledge, and selects the optimal word sequence for output.
The domain-specific recognizers of the present invention, built on pinyin-to-word mapping networks for different domains, can be added to the recognition system dynamically and online, so that recognition performance in a specific domain can be improved quickly. In the present invention, the domain-specific recognizers are arranged in parallel and can be extended quickly. Specifically, the recognition content of an existing domain is updated by adjusting the pinyin-to-word mapping network, that is, by adding the new recognition content to that domain's domain-specific recognizer; the recognition content of a new application domain is created by building the corresponding domain-specific recognizer offline and then adding the extension content to the domain-specific recognizers online. In practice, updating the recognition content of an existing domain, for example adding new words or hot words, only requires adjusting the pinyin-to-word mapping network and involves no change to the acoustic model or the base recognizer. Adding recognition content for a new application domain, such as home control or in-vehicle navigation, only requires building the corresponding pinyin-to-word mapping network offline; it can then be added to the recognition system online without affecting the recognition process of existing domains.
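The extension mechanism described above can be illustrated with a small sketch. It is not the patent's implementation: the class name `RecognitionSystem`, its methods, and the use of a plain dictionary in place of a pinyin-to-word mapping network are all assumptions made for illustration. The point it shows is that adding a domain or a hot word touches only that domain's mapping and leaves every other domain unchanged.

```python
class RecognitionSystem:
    """Toy registry of parallel domain recognizers (all names are hypothetical)."""

    def __init__(self):
        # domain name -> {tuple of pinyin syllables: word}
        # A dict stands in for a real pinyin-to-word mapping network.
        self.domains = {}

    def add_domain(self, name, pinyin_to_word):
        """Attach a pinyin-to-word mapping that was built offline."""
        self.domains[name] = dict(pinyin_to_word)

    def add_words(self, name, new_entries):
        """Hot-word update: only one domain's mapping changes."""
        self.domains[name].update(new_entries)

    def recognize(self, pinyin):
        """Return {domain: word} for each domain that maps this pinyin sequence."""
        key = tuple(pinyin)
        return {name: net[key] for name, net in self.domains.items() if key in net}


system = RecognitionSystem()
system.add_domain("navigation", {("su", "zhou"): "苏州"})
before = system.recognize(["kai", "deng"])   # no registered domain knows this yet
system.add_domain("home_control", {("kai", "deng"): "开灯"})
after = system.recognize(["kai", "deng"])    # the new domain answers immediately
```

In this sketch the acoustic side is absent on purpose: the pinyin input plays the role of the base recognizer's output, which is why neither `add_domain` nor `add_words` needs to touch it.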
The base recognizer of the present invention, built on the acoustic-model-to-pinyin mapping network, dynamically computes acoustic scores from the input audio features and stores the language model scores of pinyin sequences on its network. It combines the acoustic scores and language model scores with a dynamic programming algorithm and searches for and outputs the several highest-scoring pinyin sequences. The language model of the pinyin sequences is modeled with a recurrent neural network based on long short-term memory units.
Each of the above networks is implemented in the system as a weighted finite-state transducer (WFST). A transducer maps an input sequence to another sequence. In the base recognizer built on the acoustic-model-to-pinyin mapping network, the language model scores of pinyin sequences are stored on the network. During decoding, acoustic scores are computed dynamically from the input audio features, a dynamic programming algorithm combines the acoustic scores and language model scores within this WFST network, and the several highest-scoring pinyin sequences are output as the multi-candidate result.
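A minimal sketch of this score combination follows, under simplifying assumptions that are not from the patent: each "frame" is already one syllable slot with a log-probability per candidate syllable, the language model is a bigram supplied as a function, and the names `nbest_pinyin`, `lm_logprob`, and `lm_weight` are hypothetical. The dynamic programming table keeps the `n` best hypotheses per last syllable, which is enough to recover the `n` best sequences overall for a bigram model.

```python
import heapq
import math


def nbest_pinyin(frames, lm_logprob, n=2, lm_weight=1.0):
    """Return the n best (score, syllable tuple) pairs, best first.

    frames: list of {syllable: acoustic log-probability}, one dict per slot.
    lm_logprob(prev, cur): bigram language model log-probability.
    """
    # table[last_syllable] = up to n best (score, sequence) hypotheses ending there
    table = {"<s>": [(0.0, ())]}
    for frame in frames:
        new_table = {}
        for syl, acoustic in frame.items():
            candidates = [
                (score + acoustic + lm_weight * lm_logprob(prev, syl), seq + (syl,))
                for prev, hyps in table.items()
                for score, seq in hyps
            ]
            new_table[syl] = heapq.nlargest(n, candidates)
        table = new_table
    finals = [hyp for hyps in table.values() for hyp in hyps]
    return heapq.nlargest(n, finals)


# Illustrative numbers only; they are not taken from the patent.
BIGRAM = {
    ("<s>", "ni"): 0.6, ("<s>", "li"): 0.4,
    ("ni", "hao"): 0.9, ("ni", "gao"): 0.1,
    ("li", "hao"): 0.5, ("li", "gao"): 0.5,
}
frames = [
    {"ni": math.log(0.9), "li": math.log(0.1)},
    {"hao": math.log(0.8), "gao": math.log(0.2)},
]
candidates = nbest_pinyin(frames, lambda prev, cur: math.log(BIGRAM[(prev, cur)]))
```

A production decoder would work on frame-level acoustic states inside a WFST and prune with a beam; the sketch keeps only the part the text describes, namely adding acoustic and language model scores along each path and retaining several top-scoring pinyin sequences instead of a single one.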
In a specific implementation, the pinyin language model can be modeled with a recurrent neural network (RNN) based on long short-term memory (LSTM) units. This strengthens the association between a pinyin syllable and its context and improves the accuracy of the multi-candidate pinyin recognition results.
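For readers unfamiliar with LSTM units, the sketch below shows one step of a standard LSTM cell in plain Python. The patent does not specify the network's dimensions, parameters, or training procedure, so the function name `lstm_step` and the layout of `weights` and `biases` are assumptions made for illustration. The cell state `c` is what lets the model carry pinyin context across many syllables.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def lstm_step(x, h, c, weights, biases):
    """One step of a standard LSTM cell.

    x: input vector, h: previous hidden state, c: previous cell state.
    weights[g]: one row per hidden unit, each row of length len(x) + len(h),
    for each gate g in "i", "f", "o", "g". biases[g]: one value per hidden unit.
    """
    z = list(x) + list(h)  # concatenated input and previous hidden state

    def affine(gate):
        return [sum(w * v for w, v in zip(row, z)) + b
                for row, b in zip(weights[gate], biases[gate])]

    i = [sigmoid(v) for v in affine("i")]     # input gate
    f = [sigmoid(v) for v in affine("f")]     # forget gate
    o = [sigmoid(v) for v in affine("o")]     # output gate
    g = [math.tanh(v) for v in affine("g")]   # candidate cell update
    c_new = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
    h_new = [ov * math.tanh(cv) for ov, cv in zip(o, c_new)]
    return h_new, c_new
```

In a pinyin language model, `x` would typically be an embedding of the current syllable and `h_new` would feed a softmax over the next syllable; those layers are omitted here.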
In the present invention, the inputs of a domain-specific recognizer based on a pinyin-to-word mapping network are the network representing the multiple candidate pinyin sequences and the pinyin-to-word mapping network; its outputs are the best word sequence and its confidence measure. The network of multiple candidate pinyin sequences can be represented as a WFST that maps pinyin to pinyin, and the pinyin-to-word mapping network is likewise represented as a WFST whose path weights are the costs of mapping a pinyin sequence to a word sequence. Recognition first composes the two WFSTs into a new WFST, then searches that WFST for the highest-scoring sequence and outputs its word sequence and score.
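The compose-then-search step can be sketched as follows. This is a deliberately simplified stand-in and not general WFST composition: the pinyin lattice is a small acyclic graph of weighted arcs, the pinyin-to-word network is reduced to a lexicon dictionary with a cost per entry, and matching lexicon entries against lattice paths plays the role of composition. The function name `best_word_sequence` and the data layout are hypothetical, and costs are assumed to be non-negative so that a shortest-path search applies.

```python
import heapq


def best_word_sequence(lattice, start, final, lexicon):
    """Find the lowest-cost word sequence covering a pinyin lattice.

    lattice: {state: [(pinyin, cost, next_state), ...]}
    lexicon: {(pinyin, ...): [(word, cost), ...]}
    Returns (list of words, total cost), or (None, inf) if no path exists.
    """
    max_len = max(len(key) for key in lexicon)

    def word_arcs(state):
        # Walk lattice paths from `state`, emitting a word arc for each match.
        stack = [(state, (), 0.0)]
        while stack:
            cur, syllables, cost = stack.pop()
            for word, word_cost in lexicon.get(syllables, []):
                yield word, cost + word_cost, cur
            if len(syllables) < max_len:
                for syl, arc_cost, nxt in lattice.get(cur, []):
                    stack.append((nxt, syllables + (syl,), cost + arc_cost))

    # Shortest-path search over the implicitly combined network.
    heap = [(0.0, start, ())]
    visited = set()
    while heap:
        cost, state, words = heapq.heappop(heap)
        if state == final:
            return list(words), cost
        if state in visited:
            continue
        visited.add(state)
        for word, arc_cost, nxt in word_arcs(state):
            heapq.heappush(heap, (cost + arc_cost, nxt, words + (word,)))
    return None, float("inf")


# Illustrative lattice and lexicon; the costs are made up.
lattice = {0: [("ni", 0.1, 1), ("li", 0.9, 1)], 1: [("hao", 0.2, 2)]}
lexicon = {
    ("ni", "hao"): [("你好", 0.1)],
    ("ni",): [("你", 0.5)],
    ("hao",): [("好", 0.5)],
    ("li",): [("里", 0.5)],
}
words, cost = best_word_sequence(lattice, 0, 2, lexicon)
```

A real system would more likely use a WFST toolkit for composition and shortest-path search; the returned cost here corresponds to the score from which a confidence measure could be derived.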
In the present invention, the integrated decision unit receives the outputs of the multiple domain-specific recognizers based on pinyin-to-word mapping networks, that is, their word sequences and confidence levels. It then makes a decision according to the confidence levels together with previously given prior knowledge, rules, and additional knowledge, and selects the optimal word sequence for output. Specifically, the prior knowledge includes at least domain identification information supplied from outside the recognition system, or domain indication information derived from the history of recognition results. The domain indication information can be a discrete 0/1 setting or a continuous probability value. The rules include at least a word-count range estimated from the audio length; using this range, recognition results that are far too long or far too short can be excluded. The additional information may include a measure, obtained from a super language model, of how well the recognized word string conforms to grammatical norms. The above information and rules are applied through hierarchical computation and, together with the confidence score, serve as the decision criterion for selecting a candidate word string to output as the final recognition result.
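One way to read this layered decision is sketched below. The patent does not give a fusion formula, so everything numeric here is an assumption: the words-per-second bounds, the multiplication of confidence by the domain prior, the additive grammar term, and the names `Hypothesis`, `decide`, and `grammar_score` are all hypothetical. The first layer applies the word-count rule as a hard filter; the second layer fuses confidence, prior, and grammar score.

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    domain: str
    words: list
    confidence: float  # recognizer confidence, assumed to lie in [0, 1]


def decide(hyps, audio_seconds, domain_prior, grammar_score,
           words_per_second=(1.0, 8.0), grammar_weight=0.3):
    """Pick one hypothesis from a non-empty list (assumed fusion rule)."""
    low = words_per_second[0] * audio_seconds
    high = words_per_second[1] * audio_seconds
    # Layer 1: the preset rule removes results that are far too long or short.
    survivors = [h for h in hyps if low <= len(h.words) <= high]
    if not survivors:
        survivors = list(hyps)  # fall back rather than return nothing

    # Layer 2: fuse confidence, domain prior (0/1 or probability), grammar score.
    def fused(h):
        prior = domain_prior.get(h.domain, 1.0)
        return h.confidence * prior + grammar_weight * grammar_score(h.words)

    return max(survivors, key=fused)


hyps = [
    Hypothesis("navigation", ["导航", "到", "苏州"], 0.80),
    Hypothesis("home_control", ["打开", "灯"], 0.85),
    Hypothesis("dictation", ["啊"] * 20, 0.99),  # implausibly long for 2 s of audio
]
best = decide(
    hyps,
    audio_seconds=2.0,
    domain_prior={"navigation": 0.9, "home_control": 0.2},
    grammar_score=lambda words: 0.5,  # stand-in for a grammar-conformance model
)
```

With a discrete 0/1 prior, the same function effectively reduces a domain with prior 0 to its grammar term alone, which matches the idea that an externally supplied domain flag can nearly switch a recognizer off.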
The above is only a preferred embodiment of the present invention and is not intended to limit the invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the technical principles of the present invention, and such improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (1)

1. A speech recognition system, characterized in that it comprises:
a base recognizer built on an acoustic-model-to-pinyin mapping network, for mapping speech into a network organized from multiple candidate pinyin sequences;
multiple parallel domain-specific recognizers, each built on a pinyin-to-word mapping network for a different application domain, each of which is combined with the network organized from multiple candidate pinyin sequences to obtain multiple best word sequences and their confidence levels;
an integrated decision unit, for receiving the multiple best word sequences and confidence levels, making a decision according to the confidence levels together with previously given prior knowledge, preset rules, and additional information, and selecting the optimal word sequence for output;
wherein the recognition content of an existing domain is updated by adjusting the pinyin-to-word mapping network, that is, by adding the new recognition content to the existing domain's domain-specific recognizer based on a pinyin-to-word mapping network; and the recognition content of a new application domain is created by building the corresponding domain-specific recognizer based on a pinyin-to-word mapping network offline and then adding the extension content online to the domain-specific recognizers based on pinyin-to-word mapping networks;
the base recognizer built on the acoustic-model-to-pinyin mapping network dynamically computes acoustic scores from the input audio features, stores the language model scores of pinyin sequences on its network, combines the acoustic scores and language model scores with a dynamic programming algorithm, and searches for and outputs the several highest-scoring pinyin sequences;
the language model of the pinyin sequences is modeled with a recurrent neural network based on long short-term memory units;
the integrated decision unit selects the best candidate word sequence by fusing the recognition confidence, the prior knowledge, the preset rules, and the additional information;
the prior knowledge includes at least domain identification information supplied from outside the speech recognition system, or domain indication information derived from the history of recognition results;
the domain indication information is either a discrete 0/1 setting or a continuous probability value;
the preset rules include at least a word-count range estimated from the audio length;
the additional information includes a measure, obtained from a super language model, of how well the recognized word string conforms to grammatical norms;
the integrated decision unit applies the additional information and the preset rules through hierarchical computation and, together with the confidence score, uses them as the decision criterion to select a candidate word sequence for output as the final recognition result.
CN201611101551.2A 2016-12-05 2016-12-05 A kind of speech recognition system Active CN106653007B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611101551.2A CN106653007B (en) 2016-12-05 2016-12-05 A kind of speech recognition system

Publications (2)

Publication Number Publication Date
CN106653007A CN106653007A (en) 2017-05-10
CN106653007B CN106653007B (en) 2019-07-16

Family

ID=58818327

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201611101551.2A Active CN106653007B (en) 2016-12-05 2016-12-05 A kind of speech recognition system

Country Status (1)

Country Link
CN (1) CN106653007B (en)

Families Citing this family (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108959238B (en) * 2017-05-24 2021-12-31 艺龙网信息技术(北京)有限公司 Input stream identification method, device and computer readable storage medium
CN107507621B (en) * 2017-07-28 2021-06-22 维沃移动通信有限公司 Noise suppression method and mobile terminal
CN107767858B (en) * 2017-09-08 2021-05-04 科大讯飞股份有限公司 Pronunciation dictionary generating method and device, storage medium and electronic equipment
CN110689881B (en) * 2018-06-20 2022-07-12 深圳市北科瑞声科技股份有限公司 Speech recognition method, speech recognition device, computer equipment and storage medium
CN108899013B (en) * 2018-06-27 2023-04-18 广州视源电子科技股份有限公司 Voice search method and device and voice recognition system
CN111354347B (en) * 2018-12-21 2023-08-15 中国科学院声学研究所 Speech recognition method and system based on self-adaptive hotword weight
CN110148416B (en) * 2019-04-23 2024-03-15 腾讯科技(深圳)有限公司 Speech recognition method, device, equipment and storage medium
CN110111775B (en) * 2019-05-17 2021-06-22 腾讯科技(深圳)有限公司 Streaming voice recognition method, device, equipment and storage medium
CN110322884B (en) * 2019-07-09 2021-12-07 科大讯飞股份有限公司 Word insertion method, device, equipment and storage medium of decoding network
CN112242142B (en) * 2019-07-17 2024-01-30 北京搜狗科技发展有限公司 Voice recognition input method and related device
CN110992959A (en) * 2019-12-06 2020-04-10 北京市科学技术情报研究所 Voice recognition method and system
CN113299283B (en) * 2021-04-28 2023-03-10 上海淇玥信息技术有限公司 Speech recognition method, system, apparatus and medium

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7783484B2 (en) * 2003-04-04 2010-08-24 Nuance Communications, Inc. Apparatus for reducing spurious insertions in speech recognition
CN1674091A (en) * 2005-04-18 2005-09-28 南京师范大学 Sound identifying method for geographic information and its application in navigation system
CN101067780A (en) * 2007-06-21 2007-11-07 腾讯科技(深圳)有限公司 Character inputting system and method for intelligent equipment
CN101901599A (en) * 2009-05-19 2010-12-01 塔塔咨询服务有限公司 System and method for rapid prototyping of existing speech recognition solutions in different languages
CN103578464A (en) * 2013-10-18 2014-02-12 威盛电子股份有限公司 Language model establishing method, speech recognition method and electronic device
CN104575497A (en) * 2013-10-28 2015-04-29 中国科学院声学研究所 Method for building acoustic model and speech decoding method based on acoustic model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A TTRNN-based method for recognizing all Chinese pinyin syllables; 赵以宝 et al.; Journal of Harbin Institute of Technology; 2001-04-30; pp. 213-216

Also Published As

Publication number Publication date
CN106653007A (en) 2017-05-10

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant