CN106098059A - customizable voice awakening method and system - Google Patents

Customizable voice awakening method and system

Info

Publication number
CN106098059A
Authority
CN
China
Prior art keywords
model
phoneme sequence
phoneme
sequence
module
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610462976.XA
Other languages
Chinese (zh)
Other versions
CN106098059B (en)
Inventor
俞凯
钱彦旻
庄毅萌
陈哲怀
常烜恺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2016-06-23
Filing date
2016-06-23
Publication date
2016-11-09
Application filed by Shanghai Jiaotong University
Priority to CN201610462976.XA
Publication of CN106098059A
Application granted
Publication of CN106098059B
Legal status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G10L2015/025 Phonemes, fenemes or fenones being the recognition units
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0631 Creating reference templates; Clustering
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Abstract

A customizable voice wake-up method and system. The phoneme information of speech is modeled with a long short-term memory (LSTM) network and a connectionist temporal classification (CTC) model, and the model is trained. The trained model is then applied to the test speech, and the generated lattice network structure is searched for the possible phoneme sequence most similar to the customized wake-up word, which serves as the basis for the decision. The invention exploits the sparsity of the posterior probabilities output by the CTC model to search efficiently and thereby compute the wake-up word confidence. On the one hand, the invention achieves better wake-up performance, i.e. high accuracy and few false wake-ups; on the other hand, it consumes relatively few computing resources of the application system.

Description

Customizable voice awakening method and system
Technical field
The present invention relates to a technology in the field of computer-aided control, specifically a customizable voice wake-up method and system based on a long short-term memory (LSTM) network and a connectionist temporal classification (CTC) model.
Background
In recent years, with the development of information technology, multimedia information technology, including speech, has increasingly become a focus of research. Voice wake-up is an important area of speech recognition and is widely used in voice command control systems. The task of a customizable voice wake-up system is to automatically detect and locate, in a stretch of continuous speech, command words (wake-up words) specified in advance. "Customizable" means that the wake-up word detection model does not depend on the wake-up word the user specifies, so the user's wake-up word can be changed conveniently without modifying the model. By contrast, non-customizable wake-up technology is tied to a specified wake-up word: the word is fixed and cannot easily be changed.
Voice wake-up is closely related to continuous speech recognition, but it does not need to recognize the complete utterance; it is concerned only with the key information the user has specified. Wake-up technology therefore places lower demands on the recognition system. Compared with conventional text documents, speech data is an encoding of sound, which makes it harder for a computer to search directly and to extract useful information from. In addition, various confounding factors (such as background noise and speaker accent) make developing an effective voice wake-up system more complex and difficult. The main voice wake-up techniques include early dynamic time warping, today's methods based on hidden Markov models (HMM), and methods based on deep learning.
Summary of the invention
To address the shortcomings of the prior art, in which the wake-up word cannot be customized and the system depends on a preset language model, the present invention proposes a customizable voice wake-up method and system. It exploits the sparsity of the posterior probabilities output by the CTC model to search efficiently and thereby compute the wake-up word confidence. The invention achieves better wake-up performance (accuracy and recall), that is, high accuracy and few false wake-ups, while consuming relatively few computing resources of the application system.
The present invention is achieved through the following technical solution.
The present invention relates to a customizable voice wake-up method comprising the following steps:
Step 1) Model the phoneme information of speech with a long short-term memory network and a connectionist temporal classification model.
Step 2) Train the model: the speech data collected and labeled in advance is first preprocessed with conventional signal processing methods to extract acoustic features suitable for model training. The model takes the feature data as input and the labeled phoneme information as output, and its parameters are trained by deep learning on a large amount of data, yielding a usable long short-term memory network and connectionist temporal classification model.
Step 3) Test with the trained model: the test speech undergoes the same preprocessing and feature extraction and is fed to the model, which outputs, for each frame, the posterior probability of every modeling unit, i.e. of every phoneme.
Step 4) Wake-up word search: search the generated lattice network structure for the possible phoneme sequence most similar to the customized wake-up word, and use it as the basis for the decision.
The most similar possible phoneme sequence $H_{\max}$ is obtained by evaluating how probable each phoneme sequence is in the speech: the higher a sequence's probability of occurrence and the more similar it is to the phoneme sequence of the wake-up word, the more likely it is to be the wake-up word. Specifically:

$$H_{\max} = \arg\max_{H \in L_H} P(H \mid T) = \arg\max_{H \in L_H} \frac{P(T \mid H)\, P(H)}{P(T)}$$

where $P(T)$ is the observation probability of the target phoneme sequence $T = \{t_1, t_2, \ldots, t_n\}$, and $t_i$ is the $i$-th phoneme of $T$, $i = 1, \ldots, n$.

The probability of the target phoneme sequence $T$ given all phoneme sequences in the CTC phoneme lattice structure is

$$P(T \mid L_H) \propto P(L_H \mid T)\, P(T) \approx P(H_{\max} \mid T)\, P(T)$$

where $L_H$ denotes all phoneme sequences in the CTC phoneme lattice structure, and the possible phoneme sequence $H_{\max}$ is the sequence in $L_H$ with the highest probability when $T$ is known.

$P(H)$ is the observation probability of the phoneme sequence $H = \{n_{i j_1}, n_{(i+1) j_2}, \ldots, n_{(i+m-1) j_m}\}$, where $n_{ij}$ is the phoneme in column $i$, row $j$ of the lattice network. $H$ can also be written $H = \{h_1, h_2, \ldots, h_m\}$ with $h_k = n_{(i+k-1) j_k}$.

Under the unigram assumption, $P(H)$ is the product of the posterior probabilities of the phonemes in the sequence:

$$P(H) = \prod_{k=1}^{m} P(h_k)$$

$P(T \mid H)$ is the similarity between the target phoneme sequence $T$ and the phoneme sequence $H$:

$$P(T \mid H) = \prod_{i=1}^{\mathrm{MED}(T,H)} P(op_i \mid R = T, E = H)$$

That is, the product of the probabilities of the individual edit operations between the target phoneme sequence and the hypothesis sequence is used as the measure of phoneme sequence similarity. $\mathrm{MED}(T, H)$ is the minimum number of edit operations between $T$ and $H$, and $P(op_i \mid R = T, E = H)$ is the probability of the $i$-th edit operation $op_i$ between the sequences $E$ and $R$ when the reference phoneme sequence $R$ is $T$ and the guessed phoneme sequence $E$ is $H$.

The edit operations are insertion, deletion, and substitution. Their probabilities, $P(\mathrm{insert}(e_i))$, $P(\mathrm{delete}(r_i))$, and $P(r_i \mid e_i)$, are taken directly from prior knowledge; $r_i$ and $e_i$ are taken from the reference phoneme sequence $R$ and the guessed phoneme sequence $E$, respectively.
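The confidence of a single hypothesis can be sketched in a few lines. This is a minimal illustration, not the patented implementation: it computes $\log P(H)$ as a sum of log posteriors and $\log P(T \mid H)$ with a dynamic program over edit operations. Several things here are assumptions: an exact match is treated as costing nothing, the most probable alignment stands in for the minimum-edit alignment, insertion and deletion probabilities are constants rather than phoneme-dependent priors, and the phoneme labels and probability values are hypothetical.

```python
import math

def observation_log_prob(posteriors):
    """log P(H) under the unigram assumption: sum of log phoneme posteriors."""
    return sum(math.log(p) for p in posteriors)

def similarity_log_prob(target, hypothesis, sub_prob, ins_prob, del_prob):
    """log P(T|H) as the log-probability of the most probable edit alignment.

    Assumption: a matching phoneme pair is not an edit operation (probability 1).
    """
    n, m = len(target), len(hypothesis)
    # best[i][j]: best log-probability aligning target[:i] with hypothesis[:j]
    best = [[float("-inf")] * (m + 1) for _ in range(n + 1)]
    best[0][0] = 0.0
    for i in range(n + 1):
        for j in range(m + 1):
            if i > 0:  # deletion: a reference phoneme is missing from the hypothesis
                best[i][j] = max(best[i][j], best[i - 1][j] + math.log(del_prob))
            if j > 0:  # insertion: the hypothesis contains an extra phoneme
                best[i][j] = max(best[i][j], best[i][j - 1] + math.log(ins_prob))
            if i > 0 and j > 0:  # match or substitution
                r, e = target[i - 1], hypothesis[j - 1]
                step = 0.0 if r == e else math.log(sub_prob(r, e))
                best[i][j] = max(best[i][j], best[i - 1][j - 1] + step)
    return best[n][m]

def confidence(target, hypothesis, posteriors, sub_prob, ins_prob=0.01, del_prob=0.01):
    """P(H) * P(T|H) for one hypothesis and its per-phoneme posteriors."""
    return math.exp(observation_log_prob(posteriors)
                    + similarity_log_prob(target, hypothesis, sub_prob, ins_prob, del_prob))

# Hypothetical example: one substitution in the last phoneme.
target = ["n", "i", "h", "ao"]
hypothesis = ["n", "i", "h", "au"]
posteriors = [0.9, 0.8, 0.9, 0.5]
score = confidence(target, hypothesis, posteriors, sub_prob=lambda r, e: 0.1)
print(score)
```

Working in the log domain avoids underflow when many small posteriors are multiplied, which is why the products in the formulas appear as sums in the code.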
The present invention also relates to a customizable voice wake-up system implementing the above method, comprising an acoustic feature extraction module, a memory network module, a classification model module, a wake-up word search module, a decision module, and a threshold estimation module. The acoustic feature extraction module is connected to the memory network module and outputs the acoustic feature information of the speech under test. The memory network module is connected to the classification model module and passes it the phoneme posterior information. The classification model module is connected to the wake-up word search module and passes it the sequence of maximum phoneme similarity. The wake-up word search module is connected to the decision module and, according to the wake-up word it receives, outputs the possible solution for the speech under test. The decision module produces the decision from the decision threshold supplied by the threshold estimation module and the possible solution for the speech under test.
The wake-up word search module uses a lattice network structure based on the connectionist temporal classification (CTC) model, which contains all possible speech recognition results and their probabilities.
Technical effect
Compared with traditional methods, the present invention differs mainly in the following respects:
Brief description of the drawings
Fig. 1 is a schematic diagram of the system structure of the present invention;
Fig. 2 is a schematic diagram of the neural network structure;
The figure shows the posterior probabilities of the different phones. In the lattice in the lower half, solid lines denote potential paths and dotted lines denote all valid connections.
Detailed description
The system of this embodiment comprises an acoustic feature extraction module, a memory network module, a classification model module, a wake-up word search module, a decision module, and a threshold estimation module. The acoustic feature extraction module is connected to the memory network module and outputs the acoustic feature information of the speech under test. The memory network module is connected to the classification model module and passes it the phoneme posterior information. The classification model module is connected to the wake-up word search module and passes it the sequence of maximum phoneme similarity. The wake-up word search module is connected to the decision module and, according to the wake-up word it receives, outputs the possible solution for the speech under test. The decision module produces the decision from the decision threshold supplied by the threshold estimation module and the possible solution for the speech under test.
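The module wiring described above can be sketched as a small container of callables. This is only one plausible reading of the data flow: the class name, field names, and type signatures are hypothetical, each module is reduced to a plain function, and the classification model is assumed to hand per-frame phoneme posteriors to the search module.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

Frame = List[float]

@dataclass
class WakeUpSystem:
    """Connects the six modules in the order the text describes."""
    extract_features: Callable[[Sequence[float]], List[Frame]]   # acoustic feature extraction module
    memory_network: Callable[[List[Frame]], List[Frame]]         # memory network (LSTM) module
    classification_model: Callable[[List[Frame]], List[Frame]]   # classification model (CTC) module
    search_wake_word: Callable[[List[Frame], List[str]], float]  # wake-up word search module
    estimate_threshold: Callable[[List[str]], float]             # threshold estimation module

    def detect(self, samples: Sequence[float], wake_word: List[str]) -> bool:
        """Decision module: compare the search confidence with the estimated threshold."""
        features = self.extract_features(samples)
        posteriors = self.classification_model(self.memory_network(features))
        confidence = self.search_wake_word(posteriors, wake_word)
        return confidence > self.estimate_threshold(wake_word)
```

Keeping each module behind a callable means the wake-up word only enters at the search and threshold stages, which mirrors the claim that the acoustic model itself does not depend on the chosen word.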
This embodiment also relates to the voice wake-up method of the above system, which comprises the following steps:
Step 1) Model the phoneme information of speech with a long short-term memory network and a connectionist temporal classification model. The specific steps are:
1.1) Determine the model structure: estimate the affordable model complexity from the computing capability of the target device. Because computational complexity is positively correlated with the number of model parameters, first set an upper limit on the parameter count, for example less than 5.5M. Then choose a network structure within that limit, for example 3 hidden layers of 256 nodes each, with a projection to 96 nodes.
1.2) Initialize the long short-term memory network: the parameters may be initialized randomly or by model transfer. Because random initialization makes the subsequent CTC training difficult, it is recommended here to pre-train a standard acoustic long short-term memory network with the cross-entropy criterion and then initialize the model by copying its parameters.
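The parameter budget in step 1.1 can be checked with simple arithmetic. The sketch below counts the weights of a stack of LSTM layers with a recurrent projection. It rests on assumptions the text does not state: no peephole connections, no bias on the projection, a hypothetical 40-dimensional feature input, and a hypothetical output layer of 40 phonemes plus the CTC blank.

```python
def lstmp_layer_params(input_dim, cell_dim, proj_dim):
    """Weights and biases of one LSTM layer with a recurrent projection.

    Assumption: no peephole connections and no bias on the projection.
    """
    # Four gate blocks (input, forget, cell, output), each reading the layer
    # input, the projected recurrent state, and a bias.
    gates = 4 * cell_dim * (input_dim + proj_dim + 1)
    projection = proj_dim * cell_dim
    return gates + projection

def model_params(feature_dim, num_outputs, layers=3, cell_dim=256, proj_dim=96):
    """Total parameters of the stacked network plus a biased output layer."""
    total, input_dim = 0, feature_dim
    for _ in range(layers):
        total += lstmp_layer_params(input_dim, cell_dim, proj_dim)
        input_dim = proj_dim  # the next layer reads the projected output
    total += (proj_dim + 1) * num_outputs
    return total

# Hypothetical sizes: 40-dimensional fbank input, 40 phonemes plus blank.
total = model_params(feature_dim=40, num_outputs=41)
assert total < 5_500_000, "exceeds the parameter budget given in the text"
print(total)
```

The projection is what keeps the count low: the recurrent and inter-layer weights scale with the 96-node projection rather than with the 256 cells.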
Step 2) Train the model: the speech data collected and labeled in advance is first preprocessed with conventional signal processing methods to extract acoustic features suitable for model training. The model takes the feature data as input and the labeled phoneme information as output, and its parameters are trained by deep learning on a large amount of data, yielding a usable long short-term memory network and connectionist temporal classification model. The specific steps are:
2.1) Extract fbank (filter bank) acoustic features from the training data.
2.2) Train the model by stochastic gradient descent. The training hyperparameters must be set according to the model structure and the amount of training data; for the example model above, a learning rate of 0.00001, a momentum of 0.9, and a batch size of 256 can be used.
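To make the role of the learning rate and momentum concrete, here is a minimal sketch of one parameter update. It assumes the classical (heavy-ball) form of momentum, which the text does not specify, and it treats the gradient as already averaged over a mini-batch of 256 utterances. The function name and the toy values are hypothetical.

```python
def sgd_momentum_step(params, grads, velocity, lr=0.00001, momentum=0.9):
    """Apply one classical-momentum SGD update in place.

    `grads` is assumed to be the gradient averaged over one mini-batch.
    """
    for i, g in enumerate(grads):
        velocity[i] = momentum * velocity[i] - lr * g  # decay old direction, add new step
        params[i] += velocity[i]

# Toy example: a single parameter with a constant gradient, two updates.
params, velocity = [1.0], [0.0]
for _ in range(2):
    sgd_momentum_step(params, [2.0], velocity)
print(params, velocity)
```

With a constant gradient the second step is larger than the first, since the velocity carries 0.9 of the previous step forward.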
Step 3) Test with the trained model: the test speech undergoes the same preprocessing and feature extraction and is fed to the model, which outputs, for each frame, the posterior probability of every modeling unit. The specific steps are:
3.1) Extract fbank acoustic features from the test data; the feature extraction procedure must be identical to the one used for the training data.
3.2) Feed the extracted features frame by frame into the trained model and compute the posterior probabilities of each frame directly.
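Step 3.2 yields one probability distribution per frame. A minimal sketch of that last stage follows, assuming the network ends in a softmax over the phonemes plus the CTC blank, which is the usual arrangement for CTC models but is not spelled out in the text. The function names are hypothetical.

```python
import math

def frame_posteriors(logits):
    """Numerically stable softmax over one frame's output activations."""
    peak = max(logits)  # subtracting the maximum avoids overflow in exp
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def utterance_posteriors(frames_logits):
    """Posterior distribution for every frame of an utterance."""
    return [frame_posteriors(logits) for logits in frames_logits]
```

Each returned row sums to 1, so the blank posterior and the phoneme posteriors of a frame can be compared directly against the preset values used in the lattice construction.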
Step 4) Wake-up word search: search the generated lattice network structure for the possible phoneme sequence most similar to the customized wake-up word, and use it as the basis for the decision. The specific steps are:
4.1) Generate a lattice network structure for each test utterance. Scan the posterior probability of "blank" in each frame; when it falls below a preset value, for example 0.8, the frame is considered a spike. After all spikes in the utterance have been found, merge temporally consecutive spikes into a single spike. For each spike, select the phonemes with relatively large posterior probability on that frame, for example those with posterior probability greater than 0.005, to form one column of the lattice network structure. Once all columns have been built, connect every pair of phoneme nodes in adjacent columns to obtain the required lattice network structure.
4.2) Run the search algorithm on the generated lattice network structure according to the search formulas above, and find the phoneme sequence most similar to the target phoneme sequence.
4.3) Compute the product of the observation probability and the similarity of the phoneme sequence found, and compare it with the preset threshold. If the product exceeds the threshold, the test speech is judged to contain the wake-up word; otherwise the system does not wake up.
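Steps 4.1 to 4.3 can be sketched end to end as follows. This is an illustrative reading under stated assumptions, not the patented algorithm: phonemes are integer indices with 0 as blank, a merged spike is represented by its frame with the lowest blank posterior, matches cost nothing, insertion and deletion probabilities are constants, and the search is a dynamic program that maximizes $\log P(H) + \log P(T \mid H)$ over hypotheses spanning consecutive columns. All function names and numeric values are hypothetical.

```python
import math

def build_lattice(posteriors, blank=0, blank_max=0.8, phone_min=0.005):
    """Turn per-frame posteriors into lattice columns of (phoneme, posterior)."""
    columns, in_spike, spike_frame = [], False, None
    for frame in list(posteriors) + [None]:  # the trailing None flushes the last spike
        is_spike = frame is not None and frame[blank] < blank_max
        if is_spike:
            # Consecutive spike frames merge into one spike. Assumption: the
            # frame with the lowest blank posterior represents the merged spike.
            if not in_spike or frame[blank] < spike_frame[blank]:
                spike_frame = frame
        elif in_spike:
            column = [(k, p) for k, p in enumerate(spike_frame)
                      if k != blank and p > phone_min]
            if column:
                columns.append(column)
        in_spike = is_spike
    return columns

def search_lattice(columns, target, sub_prob, ins_prob=0.01, del_prob=0.01):
    """Best log(P(H) * P(T|H)) over hypotheses H spanning consecutive columns."""
    n = len(target)
    log_ins, log_del = math.log(ins_prob), math.log(del_prob)
    empty = [t * log_del for t in range(n + 1)]  # no column consumed yet
    prev, best = empty, float("-inf")
    for column in columns:
        # A hypothesis either continues from the previous column or starts here.
        base = [max(p, e) for p, e in zip(prev, empty)]
        cur = [float("-inf")] * (n + 1)
        for t in range(n + 1):
            for phone, post in column:
                log_post = math.log(post)
                cur[t] = max(cur[t], base[t] + log_post + log_ins)  # extra phoneme in H
                if t > 0:  # match or substitution against target[t - 1]
                    r = target[t - 1]
                    step = 0.0 if phone == r else math.log(sub_prob(r, phone))
                    cur[t] = max(cur[t], base[t - 1] + log_post + step)
            if t > 0:  # target phoneme missing from H
                cur[t] = max(cur[t], cur[t - 1] + log_del)
        prev, best = cur, max(best, cur[n])
    return best

def contains_wake_word(posteriors, target, sub_prob, threshold):
    """Step 4.3: compare the best confidence with the threshold."""
    columns = build_lattice(posteriors)
    return math.exp(search_lattice(columns, target, sub_prob)) > threshold

# Hypothetical utterance: index 0 is blank, the wake-up word is phonemes 1 then 2.
frames = [
    [0.99, 0.005, 0.003, 0.002],
    [0.10, 0.85, 0.04, 0.01],
    [0.95, 0.03, 0.01, 0.01],
    [0.20, 0.02, 0.75, 0.03],
    [0.30, 0.01, 0.66, 0.03],
    [0.97, 0.01, 0.01, 0.01],
]
lattice = build_lattice(frames)
best_log = search_lattice(lattice, [1, 2], sub_prob=lambda r, e: 0.05)
print(len(lattice), math.exp(best_log))
```

Because most frames are dominated by blank, the lattice has far fewer columns than the utterance has frames, so the search cost grows with the number of spikes rather than with the audio length.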
The comparison between this embodiment and the prior art is shown in the following table:
Here LSTM-CTC KWS is the method proposed by the present invention. The table compares this method with the HMM-based methods that are currently the mainstream approach to customizable wake-up words. The performance metric EER is the average equal error rate, and FOM is the average wake-up rate over the range of 0 to 10 false wake-ups; a lower EER is better and a higher FOM is better. The table also lists the number of parameters of each model. The experiments use the standard WSJ0 data set and test 50 wake-up words. The method proposed by the present invention is clearly better than the traditional GMM-HMM and DNN-HMM methods, and its parameter count is smaller.
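For readers unfamiliar with the EER metric, the following sketch shows one common way to estimate it for a single wake-up word from confidence scores. It is an illustration only: the function name is hypothetical, the threshold sweep over observed scores is one of several estimation conventions, and averaging over wake-up words (as the "average" EER above implies) would be a separate step.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Error rate at the threshold where false rejects and false accepts are closest.

    target_scores: confidences for utterances that contain the wake-up word.
    nontarget_scores: confidences for utterances that do not.
    """
    best_gap, eer = float("inf"), None
    for threshold in sorted(set(target_scores) | set(nontarget_scores)):
        # A score at or above the threshold triggers a wake-up.
        frr = sum(s < threshold for s in target_scores) / len(target_scores)
        far = sum(s >= threshold for s in nontarget_scores) / len(nontarget_scores)
        if abs(frr - far) < best_gap:
            best_gap, eer = abs(frr - far), (frr + far) / 2
    return eer
```

A perfectly separating detector reaches an EER of 0, and a detector whose scores carry no information sits near 0.5.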
In summary, the present invention decides whether to wake up by comparing the wake-up word confidence with a threshold. Because a customizable wake-up system places no restriction on the wake-up word, it is hard to use a single threshold for different wake-up words. The invention therefore proposes a method that estimates the threshold automatically for each wake-up word, which to some extent solves the problem that the threshold is hard to unify and thus improves the accuracy of system wake-up. In addition, the CTC-based lattice of the present invention is small yet retains as much useful information as possible. The search algorithm built on it uses two kinds of information, the observation probability of a phoneme sequence and its similarity to the target sequence, and searches for the wake-up word efficiently by dynamic programming; the time and space complexity of the algorithm are low, yet its accuracy is high.
Those skilled in the art may make local adjustments to the above embodiments in various ways without departing from the principle and purpose of the invention. The scope of protection of the invention is defined by the claims and is not limited by the above embodiments; every implementation within that scope is bound by the invention.

Claims (9)

1. A customizable voice awakening method, characterized in that it comprises the following steps:
Step 1) Model the phoneme information of speech with a long short-term memory network and a connectionist temporal classification model;
Step 2) Train the model: the speech data collected and labeled in advance is first preprocessed with conventional signal processing methods to extract acoustic features suitable for model training; the model takes the feature data as input and the labeled phoneme information as output, and its parameters are trained by deep learning on a large amount of data, yielding a usable long short-term memory network and connectionist temporal classification model;
Step 3) Test with the trained model: the test speech undergoes the same preprocessing and feature extraction and is fed to the model, which outputs, for each frame, the posterior probability of every modeling unit, i.e. of every phoneme;
Step 4) Wake-up word search: search the generated lattice network structure for the possible phoneme sequence most similar to the customized wake-up word, and use it as the basis for the decision.
2. The customizable voice awakening method according to claim 1, characterized in that the most similar possible phoneme sequence $H_{\max}$ is obtained by evaluating how probable each phoneme sequence is in the speech, i.e. the higher a sequence's probability of occurrence and the more similar it is to the phoneme sequence of the wake-up word, the more likely it is to be the wake-up word, specifically:

$$H_{\max} = \arg\max_{H \in L_H} P(H \mid T) = \arg\max_{H \in L_H} \frac{P(T \mid H)\, P(H)}{P(T)}$$

where $P(T)$ is the observation probability of the target phoneme sequence $T = \{t_1, t_2, \ldots, t_n\}$, and $t_i$ is the $i$-th phoneme of $T$, $i = 1, \ldots, n$;
the probability of the target phoneme sequence $T$ given all phoneme sequences in the CTC phoneme lattice structure is

$$P(T \mid L_H) \propto P(L_H \mid T)\, P(T) \approx P(H_{\max} \mid T)\, P(T)$$

where $L_H$ denotes all phoneme sequences in the CTC phoneme lattice structure, and the possible phoneme sequence $H_{\max}$ is the sequence in $L_H$ with the highest probability when $T$ is known;
$P(H)$ is the observation probability of the phoneme sequence $H = \{n_{i j_1}, n_{(i+1) j_2}, \ldots, n_{(i+m-1) j_m}\}$, where $n_{ij}$ is the phoneme in column $i$, row $j$ of the lattice network structure; $H$ can also be written $H = \{h_1, h_2, \ldots, h_m\}$ with $h_k = n_{(i+k-1) j_k}$;
under the unigram assumption, $P(H)$ is the product of the posterior probabilities of the phonemes in the sequence:

$$P(H) = \prod_{k=1}^{m} P(h_k)$$

$P(T \mid H)$ is the similarity between the target phoneme sequence $T$ and the phoneme sequence $H$:

$$P(T \mid H) = \prod_{i=1}^{\mathrm{MED}(T,H)} P(op_i \mid R = T, E = H)$$

i.e. the product of the probabilities of the individual edit operations between the target phoneme sequence and the hypothesis sequence is used as the measure of phoneme sequence similarity; $\mathrm{MED}(T, H)$ is the minimum number of edit operations between $T$ and $H$, and $P(op_i \mid R = T, E = H)$ is the probability of the $i$-th edit operation $op_i$ between the sequences $E$ and $R$ when the reference phoneme sequence $R$ is $T$ and the guessed phoneme sequence $E$ is $H$.
3. The customizable voice awakening method according to claim 2, characterized in that the edit operations are insertion, deletion, and substitution, whose probabilities $P(\mathrm{insert}(e_i))$, $P(\mathrm{delete}(r_i))$, and $P(r_i \mid e_i)$ are taken directly from prior knowledge, $r_i$ and $e_i$ being taken from the reference phoneme sequence $R$ and the guessed phoneme sequence $E$, respectively.
4. The customizable voice awakening method according to claim 2, characterized in that step 1 comprises:
1.1) determining the model structure: estimate the affordable model complexity from the computing capability of the target device, first set an upper limit on the parameter count, then choose a network structure within that limit;
1.2) initializing the long short-term memory network: initialize the parameters randomly or by model transfer.
5. The customizable voice awakening method according to claim 4, characterized in that initializing the long short-term memory network consists of pre-training a standard acoustic long short-term memory network with the cross-entropy criterion and then initializing the model by copying its parameters.
6. The customizable voice awakening method according to claim 2, characterized in that step 2 comprises:
2.1) extracting fbank (filter bank) acoustic features from the training data;
2.2) training the model by stochastic gradient descent, with the training hyperparameters set according to the model structure and the amount of training data.
7. The customizable voice awakening method according to claim 2, characterized in that step 3 comprises:
3.1) extracting fbank acoustic features from the test data, the feature extraction procedure being identical to the one used for the training data;
3.2) feeding the extracted features frame by frame into the trained model and computing the posterior probabilities of each frame directly.
8. The customizable voice awakening method according to claim 2, characterized in that step 4 comprises:
4.1) generating a lattice network structure for each test utterance: scan the blank posterior probability of each frame and treat the frame as a spike when that posterior is below a preset value; after all spikes in the utterance have been found, merge temporally consecutive spikes into a single spike; for each spike, select the phonemes whose posterior probability on that frame exceeds a preset value to form one column of the lattice network structure; once all columns have been built, connect every pair of phoneme nodes in adjacent columns to obtain the required lattice network structure;
4.2) performing the search on the generated lattice network structure to find the phoneme sequence most similar to the target phoneme sequence;
4.3) computing the product of the observation probability and the similarity of the phoneme sequence found and comparing it with the preset threshold; if the product exceeds the threshold, the test speech is judged to contain the wake-up word; otherwise the system does not wake up.
9. A customizable voice wake-up system implementing the method of any one of claims 1 to 8, characterized in that it comprises an acoustic feature extraction module, a memory network module, a classification model module, a wake-up word search module, a decision module, and a threshold estimation module, wherein: the acoustic feature extraction module is connected to the memory network module and outputs the acoustic feature information of the speech under test; the memory network module is connected to the classification model module and passes it the phoneme posterior information; the classification model module is connected to the wake-up word search module and passes it the sequence of maximum phoneme similarity; the wake-up word search module is connected to the decision module and, according to the wake-up word it receives, outputs the possible solution for the speech under test; and the decision module produces the decision from the decision threshold supplied by the threshold estimation module and the possible solution for the speech under test;
the wake-up word search module uses a lattice network structure based on the connectionist temporal classification model, which contains all possible speech recognition results and their probabilities.
CN201610462976.XA 2016-06-23 2016-06-23 Customizable voice awakening method and system Active CN106098059B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610462976.XA CN106098059B (en) 2016-06-23 2016-06-23 Customizable voice awakening method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610462976.XA CN106098059B (en) 2016-06-23 2016-06-23 Customizable voice awakening method and system

Publications (2)

Publication Number Publication Date
CN106098059A (en) 2016-11-09
CN106098059B (en) 2019-06-18

Family

ID=57253493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610462976.XA Active CN106098059B (en) 2016-06-23 2016-06-23 Customizable voice awakening method and system

Country Status (1)

Country Link
CN (1) CN106098059B (en)


Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103021409A (en) * 2012-11-13 2013-04-03 安徽科大讯飞信息科技股份有限公司 Voice activating photographing system
CN103095911A (en) * 2012-12-18 2013-05-08 苏州思必驰信息科技有限公司 Method and system for finding mobile phone through voice awakening
CN103956164A (en) * 2014-05-20 2014-07-30 苏州思必驰信息科技有限公司 Voice awakening method and system
CN104538031A (en) * 2014-12-15 2015-04-22 北京云知声信息技术有限公司 Intelligent voice service development cloud platform and method
CN104620314A (en) * 2012-04-26 2015-05-13 纽昂斯通讯公司 Embedded system for construction of small footprint speech recognition with user-definable constraints
CN104732978A (en) * 2015-03-12 2015-06-24 上海交通大学 Text-dependent speaker recognition method based on joint deep learning
US20150340034A1 (en) * 2014-05-22 2015-11-26 Google Inc. Recognizing speech using neural networks
US9263036B1 (en) * 2012-11-29 2016-02-16 Google Inc. System and method for speech recognition using deep recurrent neural networks
CN105551483A (en) * 2015-12-11 2016-05-04 百度在线网络技术(北京)有限公司 Speech recognition modeling method and speech recognition modeling device

Cited By (60)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11676600B2 (en) 2013-03-12 2023-06-13 Cerence Operating Company Methods and apparatus for detecting a voice command
US11087750B2 (en) 2013-03-12 2021-08-10 Cerence Operating Company Methods and apparatus for detecting a voice command
US11393461B2 (en) 2013-03-12 2022-07-19 Cerence Operating Company Methods and apparatus for detecting a voice command
US11437020B2 (en) 2016-02-10 2022-09-06 Cerence Operating Company Techniques for spatially selective wake-up word recognition and related systems and methods
US11600269B2 (en) 2016-06-15 2023-03-07 Cerence Operating Company Techniques for wake-up word recognition and related systems and methods
WO2018086033A1 (en) * 2016-11-10 2018-05-17 Nuance Communications, Inc. Techniques for language independent wake-up word detection
US11545146B2 (en) 2016-11-10 2023-01-03 Cerence Operating Company Techniques for language independent wake-up word detection
WO2018153200A1 (en) * 2017-02-21 2018-08-30 中兴通讯股份有限公司 Hlstm model-based acoustic modeling method and device, and storage medium
CN108305619B (en) * 2017-03-10 2020-08-04 腾讯科技(深圳)有限公司 Voice data set training method and device
CN108305619A (en) * 2017-03-10 2018-07-20 腾讯科技(深圳)有限公司 Voice data set training method and device
WO2018161763A1 (en) * 2017-03-10 2018-09-13 腾讯科技(深圳)有限公司 Training method for voice data set, computer device and computer readable storage medium
CN108735202A (en) * 2017-03-13 2018-11-02 百度(美国)有限责任公司 Convolutional recurrent neural network for small-occupied resource keyword retrieval
CN108735202B (en) * 2017-03-13 2023-04-07 百度(美国)有限责任公司 Convolutional recurrent neural network for small-occupied resource keyword retrieval
CN107123417B (en) * 2017-05-16 2020-06-09 上海交通大学 Customized voice awakening optimization method and system based on discriminant training
US10388276B2 (en) * 2017-05-16 2019-08-20 Baidu Online Network Technology (Beijing) Co., Ltd. Method and device for waking up via speech based on artificial intelligence and computer device
CN107123417A (en) * 2017-05-16 2017-09-01 上海交通大学 Customized voice awakening optimization method and system based on discriminant training
WO2019001428A1 (en) * 2017-06-29 2019-01-03 阿里巴巴集团控股有限公司 Voice wake-up method and device and electronic device
US10748524B2 (en) 2017-06-29 2020-08-18 Alibaba Group Holding Limited Speech wakeup method, apparatus, and electronic device
TWI692751B (en) * 2017-06-29 2020-05-01 香港商阿里巴巴集團服務有限公司 Voice wake-up method, device and electronic equipment
CN107358951A (en) * 2017-06-29 2017-11-17 阿里巴巴集团控股有限公司 Voice wake-up method, device and electronic equipment
CN107369439A (en) * 2017-07-31 2017-11-21 北京捷通华声科技股份有限公司 Voice awakening method and device
CN108122556A (en) * 2017-08-08 2018-06-05 问众智能信息科技(北京)有限公司 Method and device for reducing false triggering of driver voice wake-up instruction words
CN107704275A (en) * 2017-09-04 2018-02-16 百度在线网络技术(北京)有限公司 Smart device awakening method, device, server and smart device
US11530930B2 (en) 2017-09-19 2022-12-20 Volkswagen Aktiengesellschaft Transportation vehicle control with phoneme generation
CN109741735A (en) * 2017-10-30 2019-05-10 阿里巴巴集团控股有限公司 Modeling method, acoustic model acquisition method and acoustic model acquisition device
CN109741735B (en) * 2017-10-30 2023-09-01 阿里巴巴集团控股有限公司 Modeling method, acoustic model acquisition method and acoustic model acquisition device
CN109754789A (en) * 2017-11-07 2019-05-14 北京国双科技有限公司 Method and device for recognizing voice phonemes
CN109754789B (en) * 2017-11-07 2021-06-08 北京国双科技有限公司 Method and device for recognizing voice phonemes
CN107945796A (en) * 2017-11-13 2018-04-20 百度在线网络技术(北京)有限公司 Audio recognition method, device, equipment and computer-readable medium
CN108417202A (en) * 2018-01-19 2018-08-17 苏州思必驰信息科技有限公司 Audio recognition method and system
CN110782898B (en) * 2018-07-12 2024-01-09 北京搜狗科技发展有限公司 End-to-end voice awakening method and device and computer equipment
CN110782898A (en) * 2018-07-12 2020-02-11 北京搜狗科技发展有限公司 End-to-end voice awakening method and device and computer equipment
CN111128134A (en) * 2018-10-11 2020-05-08 阿里巴巴集团控股有限公司 Acoustic model training method, voice awakening method, device and electronic equipment
CN109767763A (en) * 2018-12-25 2019-05-17 苏州思必驰信息科技有限公司 Method and device for determining user-defined awakening words
CN109767763B (en) * 2018-12-25 2021-01-26 苏州思必驰信息科技有限公司 Method and device for determining user-defined awakening words
CN109545194A (en) * 2018-12-26 2019-03-29 出门问问信息科技有限公司 Wake up word pre-training method, apparatus, equipment and storage medium
CN111862963A (en) * 2019-04-12 2020-10-30 阿里巴巴集团控股有限公司 Voice wake-up method, device and equipment
CN111916068A (en) * 2019-05-07 2020-11-10 北京地平线机器人技术研发有限公司 Audio detection method and device
CN110189748B (en) * 2019-05-31 2021-06-11 百度在线网络技术(北京)有限公司 Model construction method and device
CN110189748A (en) * 2019-05-31 2019-08-30 百度在线网络技术(北京)有限公司 Model building method and device
CN112447171A (en) * 2019-08-15 2021-03-05 马思明 System and method for providing customized wake phrase training
WO2021093449A1 (en) * 2019-11-14 2021-05-20 腾讯科技(深圳)有限公司 Wakeup word detection method and apparatus employing artificial intelligence, device, and medium
US11848008B2 (en) 2019-11-14 2023-12-19 Tencent Technology (Shenzhen) Company Limited Artificial intelligence-based wakeup word detection method and apparatus, device, and medium
US11948571B2 (en) 2019-12-05 2024-04-02 Soundhound Ai Ip, Llc Wakeword selection
US11295741B2 (en) 2019-12-05 2022-04-05 Soundhound, Inc. Dynamic wakewords for speech-enabled devices
CN111128172A (en) * 2019-12-31 2020-05-08 达闼科技成都有限公司 Voice recognition method, electronic equipment and storage medium
CN111276127A (en) * 2020-03-31 2020-06-12 北京字节跳动网络技术有限公司 Voice awakening method and device, storage medium and electronic equipment
CN111599350B (en) * 2020-04-07 2023-02-28 云知声智能科技股份有限公司 Command word customization identification method and system
CN111599350A (en) * 2020-04-07 2020-08-28 云知声智能科技股份有限公司 Command word customization identification method and system
CN111554288A (en) * 2020-04-27 2020-08-18 北京猎户星空科技有限公司 Awakening method and device of intelligent device, electronic device and medium
CN111883121A (en) * 2020-07-20 2020-11-03 北京声智科技有限公司 Awakening method and device and electronic equipment
CN112837694A (en) * 2021-01-29 2021-05-25 青岛海尔科技有限公司 Equipment awakening method and device, storage medium and electronic device
CN113314104A (en) * 2021-05-31 2021-08-27 北京市商汤科技开发有限公司 Interactive object driving and phoneme processing method, device, equipment and storage medium
CN113314104B (en) * 2021-05-31 2023-06-20 北京市商汤科技开发有限公司 Interactive object driving and phoneme processing method, device, equipment and storage medium
WO2023029615A1 (en) * 2021-08-30 2023-03-09 华为技术有限公司 Wake-on-voice method and apparatus, device, storage medium, and program product
CN114038457A (en) * 2021-11-04 2022-02-11 北京房江湖科技有限公司 Method, electronic device, storage medium, and program for voice wakeup
CN115223574B (en) * 2022-07-15 2023-11-24 北京百度网讯科技有限公司 Voice information processing method, model training method, awakening method and device
CN115223574A (en) * 2022-07-15 2022-10-21 北京百度网讯科技有限公司 Voice information processing method, model training method, awakening method and device
CN115862604A (en) * 2022-11-24 2023-03-28 镁佳(北京)科技有限公司 Voice wakeup model training and voice wakeup method, device and computer equipment
CN115862604B (en) * 2022-11-24 2024-02-20 镁佳(北京)科技有限公司 Voice awakening model training and voice awakening method and device and computer equipment

Also Published As

Publication number Publication date
CN106098059B (en) 2019-06-18

Similar Documents

Publication Publication Date Title
CN106098059B (en) Customizable voice awakening method and system
CN110491416B (en) Telephone voice emotion analysis and identification method based on LSTM and SAE
US11335347B2 (en) Multiple classifications of audio data
CN108346436B (en) Voice emotion detection method and device, computer equipment and storage medium
CN104903954B (en) Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
CN106683661B (en) Role separation method and device based on voice
Carlin et al. Rapid evaluation of speech representations for spoken term discovery
Agarwalla et al. Machine learning based sample extraction for automatic speech recognition using dialectal Assamese speech
CN105261357B (en) Voice endpoint detection method and device based on statistical model
CN109065032B (en) External corpus speech recognition method based on deep convolutional neural network
CN108831445A (en) Sichuan dialect recognition method, acoustic model training method, device and equipment
CN108281137A (en) Universal voice wake-up recognition method and system under an all-phoneme framework
CN108711421A (en) Speech recognition acoustic model building method, device and electronic equipment
CN110211594B (en) Speaker identification method based on twin network model and KNN algorithm
CN110349597B (en) Voice detection method and device
CN103400577A (en) Acoustic model building method and device for multi-language voice identification
CN105096941A (en) Voice recognition method and device
CN111653275B (en) Method and device for constructing voice recognition model based on LSTM-CTC tail convolution and voice recognition method
JP2019159654A (en) Time-series information learning system, method, and neural network model
CN102810311B (en) Speaker estimation method and speaker estimation equipment
CN110517664A (en) Multi-party speech recognition method, device, equipment and readable storage medium
CN110992959A (en) Voice recognition method and system
CN111477220A (en) Neural network speech recognition method and system for household spoken language environment
CN113178193A (en) Chinese self-defined awakening and Internet of things interaction method based on intelligent voice chip
CN112037772B (en) Response obligation detection method, system and device based on multiple modes

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200617

Address after: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120

Patentee after: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

Address before: No. 800 Dongchuan Road, Shanghai, 200240

Patentee before: SHANGHAI JIAO TONG University

TR01 Transfer of patent right

Effective date of registration: 20201026

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: AI SPEECH Ltd.

Address before: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120

Patentee before: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

CP01 Change in the name or title of a patent holder

Address after: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215123 14 Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou, Jiangsu.

Patentee before: AI SPEECH Ltd.