CN109036412A - Voice wake-up method and system - Google Patents

Voice wake-up method and system

Info

Publication number
CN109036412A
CN109036412A
Authority
CN
China
Prior art keywords
data
voice
acoustic feature
feature information
audio data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811081600.XA
Other languages
Chinese (zh)
Inventor
王欢良
鄢楷强
张宏阳
沈旭晖
马殿昌
李显光
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Qdreamer Network Science And Technology Co Ltd
Original Assignee
Suzhou Qdreamer Network Science And Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Qdreamer Network Science And Technology Co Ltd
Priority to CN201811081600.XA
Publication of CN109036412A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The present invention relates to a voice wake-up method and system. The method comprises the following steps: performing framing and windowing on original voice data to obtain the speech frames corresponding to the original voice, and extracting acoustic feature information of the speech frames; performing calculation on the acoustic feature information to obtain a deep neural network classification model; recording live voice data, extracting the acoustic feature information corresponding to the live voice data, and inputting the acoustic feature information corresponding to the live voice data into the deep neural network classification model to obtain posterior probability information; and comparing the confidence derived from the posterior probability information with a preset threshold, and waking up the voice recording device when the confidence is greater than the preset threshold. The above method effectively improves wake-up performance in noisy scenes; simulating speaking rate, pitch, volume and the like on the original data effectively improves the adaptability of the wake-up system to different speakers.

Description

Voice wake-up method and system
Technical field
The present invention relates to the field of speech recognition, and more particularly to a voice wake-up method and system.
Background art
Voice wake-up technology is an important branch of the speech recognition field. It is widely used in voice interaction systems such as mobile phones, smart home appliances and in-vehicle navigation, allowing users to wake a device conveniently with a spoken command. More specifically, the task of a voice wake-up system is to automatically detect one or more predefined wake-up words in the speech it continuously receives in the background; this task is generally also called keyword spotting (Keyword Spotting, KWS). When the system detects the corresponding keyword, the device is woken up and enters a specific working state.
Currently, the performance of a voice wake-up system is mainly evaluated with two metrics. One is the false reject rate (False Reject Rate, FRR), the probability that the system misses a wake-up word. The other is the false alarm rate (False Alarm Rate, FAR), the probability that the system misrecognizes a non-wake-up word as a wake-up word, also called the false wake-up rate. The false wake-up rate can also be measured with a separate index, namely the number of false wake-ups occurring within a period of time, for example once every 12 hours. In theory, the false reject rate and the false alarm rate are a pair of conflicting metrics: reducing the false reject rate is likely to raise the false alarm rate, and conversely, reducing the false alarm rate is likely to raise the false reject rate.
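As a simple illustration (not part of the patent text; the function and variable names are hypothetical), the two metrics can be computed from raw counts as follows:

```python
def false_reject_rate(missed_wakeups: int, wakeup_utterances: int) -> float:
    # FRR: fraction of genuine wake-up utterances the system failed to detect
    return missed_wakeups / wakeup_utterances

def false_alarm_rate(false_wakeups: int, non_wakeup_utterances: int) -> float:
    # FAR: fraction of non-wake-up utterances wrongly accepted as wake-ups
    return false_wakeups / non_wakeup_utterances
```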
A voice wake-up system with good performance should have both a low false reject rate and a low false alarm rate. In fields such as the smart home in particular, an excessively high false alarm rate will to some extent disturb the user's normal conversation, rest or entertainment and provoke the user's dislike; on the other hand, in common complex scenarios such as far-field and noisy environments, an excessively high false reject rate will greatly degrade the actual user experience of intelligent voice devices. How to reduce the false reject rate in various complex scenarios as much as possible while keeping the false alarm rate low, and how to improve the robustness of the wake-up system to changes in the speaker's speaking rate and accent, are problems that urgently need to be solved.
Summary of the invention
In view of this, it is necessary to provide a voice wake-up method and system that address the above problems: reducing the false reject rate in various complex scenarios as much as possible while keeping the false alarm rate low, and improving the robustness of the wake-up system to changes in the speaker's speaking rate and accent.
A voice wake-up method, comprising the following steps:
recording original audio data and acquiring environmental audio data corresponding to the scene in which a voice recording device is used, and converting the original audio data into environmental speech simulation data according to the environmental audio data;
performing framing and windowing on the original voice data and/or the simulated voice data to obtain the speech frames corresponding to the original voice and/or the simulated voice, and extracting acoustic feature information of the speech frames;
performing calculation on the acoustic feature information to obtain a deep neural network classification model that at least contains the wake-up word class and the non-wake-up word class included in the speech frames;
recording live voice data, extracting the acoustic feature information corresponding to the live voice data, and inputting the acoustic feature information corresponding to the live voice data into the deep neural network classification model to obtain posterior probability information of the live voice data;
calculating the confidence of the recorded live voice data according to the posterior probability information and comparing the confidence with a preset threshold; when the confidence is greater than the preset threshold, waking up the voice recording device; when the confidence is less than the preset threshold, not waking up the voice recording device and continuing to acquire user instructions.
In a preferred embodiment, in the step of recording original audio data and acquiring environmental audio data corresponding to the scene in which the voice recording device is used and converting the original audio data into environmental speech simulation data according to the environmental audio data, the environmental speech simulation data include one or more of noise simulation, speaking-rate simulation, reverberation simulation, and pitch and loudness simulation applied to the original audio data.
In a preferred embodiment, after the step of performing framing and windowing on the original voice data and/or the simulated voice data to obtain the speech frames corresponding to the original voice and/or the simulated voice and extracting acoustic feature information of the speech frames, the method further includes:
performing denoising on the acoustic feature information of the speech frames.
In a preferred embodiment, the step of recording live voice data, extracting the acoustic feature information corresponding to the live voice data, and inputting the acoustic feature information corresponding to the live voice data into the deep neural network classification model to obtain posterior probability information of the live voice data further includes:
performing denoising on the acoustic feature information corresponding to the live data.
The above voice wake-up method of this embodiment can effectively improve wake-up performance in noisy scenes, solve the robustness problem of the wake-up system with respect to changes in the speaker's speaking rate and accent, and substantially improve the actual user experience of intelligent voice devices. Simulating speaking rate, pitch, volume and the like on the original data effectively improves the adaptability of the wake-up system to different speakers.
A voice wake-up system, comprising:
a voice data simulation module, configured to record original audio data, acquire environmental audio data corresponding to the scene in which a voice recording device is used, and convert the original audio data into environmental speech simulation data according to the environmental audio data;
a feature extraction module, configured to perform framing and windowing on the original voice data and/or the simulated voice data to obtain the speech frames corresponding to the original voice and/or the simulated voice, and to extract acoustic feature information of the speech frames;
a deep neural network module, configured to perform calculation on the acoustic feature information to obtain a deep neural network classification model that at least contains the wake-up word class and the non-wake-up word class included in the speech frames;
a wake-up decision module, configured to record live voice data, extract the acoustic feature information corresponding to the live voice data, input the acoustic feature information corresponding to the live voice data into the deep neural network classification model to obtain posterior probability information of the live voice data, calculate the confidence of the recorded live voice data according to the posterior probability information, and compare the confidence with a preset threshold; when the confidence is greater than the preset threshold, the voice recording device is woken up.
In a preferred embodiment, in the step of recording original audio data and acquiring environmental audio data corresponding to the scene in which the voice recording device is used and converting the original audio data into environmental speech simulation data according to the environmental audio data, the environmental speech simulation data include one or more of noise simulation, speaking-rate simulation, reverberation simulation, and pitch and loudness simulation applied to the original audio data.
In a preferred embodiment, the system further includes:
a denoising autoencoder module, configured to perform denoising on the acoustic feature information of the speech frames.
In a preferred embodiment, the wake-up decision module further includes:
a denoising unit, configured to perform denoising on the acoustic feature information corresponding to the live data.
The above voice wake-up system of this embodiment can effectively improve wake-up performance in noisy scenes, solve the robustness problem of the wake-up system with respect to changes in the speaker's speaking rate and accent, and substantially improve the actual user experience of intelligent voice devices. Simulating speaking rate, pitch, volume and the like on the original data effectively improves the adaptability of the wake-up system to different speakers.
Brief description of the drawings
Fig. 1 is a flowchart of a voice wake-up method according to a preferred embodiment of the present invention;
Fig. 2 is a block diagram of a voice wake-up system according to a preferred embodiment of the present invention.
Detailed description of the embodiments
In order to make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described in detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present invention and are not intended to limit it.
It should be noted that when an element is referred to as being "disposed on" another element, it may be directly on the other element or intervening elements may also be present. When an element is considered to be "connected to" another element, it may be directly connected to the other element or intervening elements may be present at the same time. The terms "vertical", "horizontal", "left", "right" and similar expressions used herein are for illustrative purposes only and are not meant to indicate the only embodiments.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by those skilled in the technical field to which the present invention belongs. The terms used in the specification of the present invention are only for the purpose of describing specific embodiments and are not intended to limit the present invention. The term "and/or" used herein includes any and all combinations of one or more of the associated listed items.
As shown in Fig. 1, a voice wake-up method according to a preferred embodiment of the present invention includes the following steps.
S10: record original audio data and acquire environmental audio data corresponding to the scene in which the voice recording device is used, and convert the original audio data into environmental speech simulation data according to the environmental audio data.
In this step, the operator can record original, clean audio data through the voice recording device, simulate environmental factors such as the noise, speaking rate, reverberation, pitch and loudness of the scene in which the voice recording device is located, and thereby convert the original audio data into environmental speech simulation data.
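A minimal sketch of such environment simulation is shown below. It is not taken from the patent: the mixing scheme, parameter values and helper name are assumptions, and only noise, speaking-rate and loudness simulation are covered (reverberation and pitch simulation are omitted).

```python
import numpy as np

def simulate_environment(clean: np.ndarray, noise: np.ndarray,
                         snr_db: float = 10.0, speed: float = 1.0,
                         gain_db: float = 0.0) -> np.ndarray:
    """Convert one clean utterance into environment speech simulation data
    by adding scene noise at a target SNR, resampling to change speaking
    rate, and scaling loudness."""
    # crude speed change by linear resampling (assumption, not the patent's method)
    idx = np.arange(0, len(clean), speed)
    sped = np.interp(idx, np.arange(len(clean)), clean)
    # tile or trim the noise recording to match the utterance length
    noise = np.resize(noise, sped.shape)
    # scale noise to the requested signal-to-noise ratio
    sig_pow = np.mean(sped ** 2) + 1e-12
    noise_pow = np.mean(noise ** 2) + 1e-12
    noise = noise * np.sqrt(sig_pow / (noise_pow * 10 ** (snr_db / 10)))
    # apply loudness gain and return the simulated waveform
    return (sped + noise) * 10 ** (gain_db / 20)
```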
S20: perform framing and windowing on the original voice data and/or the simulated voice data to obtain the speech frames corresponding to the original voice and/or the simulated voice, and extract acoustic feature information of the speech frames.
The original voice data and/or the simulated voice data are split into the corresponding speech frames by framing and windowing, and the acoustic feature information of the speech frames is extracted.
Feature extraction is then performed on each of the above speech frames. In this embodiment, the speech feature may be filter bank (fbank) features, or may be other speech features; the present invention does not limit this.
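The framing and windowing of step S20 can be sketched with plain numpy as follows; the frame length, frame shift and window type are assumed values, and applying a mel filter bank to the log power spectrum would give the fbank features mentioned above:

```python
import numpy as np

def frame_and_window(signal: np.ndarray, sample_rate: int = 16000,
                     frame_ms: int = 25, shift_ms: int = 10) -> np.ndarray:
    # Split the waveform into overlapping frames and apply a Hamming window.
    frame_len = int(sample_rate * frame_ms / 1000)
    frame_shift = int(sample_rate * shift_ms / 1000)
    if len(signal) < frame_len:                      # pad very short inputs
        signal = np.pad(signal, (0, frame_len - len(signal)))
    num_frames = 1 + (len(signal) - frame_len) // frame_shift
    window = np.hamming(frame_len)
    return np.stack([signal[i * frame_shift:i * frame_shift + frame_len] * window
                     for i in range(num_frames)])

def log_power_spectrum(frames: np.ndarray, n_fft: int = 512) -> np.ndarray:
    # Per-frame log power spectrum; multiplying by a mel filter bank matrix
    # would yield the filter-bank (fbank) acoustic features.
    return np.log(np.abs(np.fft.rfft(frames, n=n_fft)) ** 2 + 1e-10)
```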
S30: perform denoising on the acoustic feature information of the speech frames.
In this step, denoising is performed on the acoustic feature information of the above speech frames. Specifically, pairs of noise-simulated speech features and the corresponding original speech features can be used to train a denoising autoencoder. This embodiment uses a fully connected neural network to build the denoising autoencoder; depending on the computational capability of the system, a network with 2-3 hidden layers is usually used, with each layer containing 256 or 512 nodes, and the denoising autoencoder is trained by stochastic gradient descent according to the mean squared error (Mean-Square Error, MSE) minimization criterion.
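The configuration described above (a fully connected network with 2-3 hidden layers of 256 or 512 nodes, trained with SGD under the MSE criterion) could look roughly like the following PyTorch sketch; the feature dimension, layer count and learning rate chosen here are assumptions:

```python
import torch
import torch.nn as nn

class DenoisingAutoencoder(nn.Module):
    """Fully connected denoising autoencoder mapping noisy fbank features
    to clean fbank features (feat_dim and hidden size are assumed values)."""
    def __init__(self, feat_dim: int = 40, hidden: int = 256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(feat_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),   # 2-3 hidden layers per the text
            nn.Linear(hidden, feat_dim),
        )

    def forward(self, noisy):
        return self.net(noisy)

def train_step(model, noisy_batch, clean_batch, optimizer, criterion=nn.MSELoss()):
    # One SGD step minimizing mean squared error between denoised and clean features.
    optimizer.zero_grad()
    loss = criterion(model(noisy_batch), clean_batch)
    loss.backward()
    optimizer.step()
    return loss.item()

# usage sketch:
# model = DenoisingAutoencoder()
# optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
# loss = train_step(model, noisy_feats, clean_feats, optimizer)
```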
S40: perform calculation on the acoustic feature information to obtain a deep neural network classification model that at least contains the wake-up word class and the non-wake-up word class included in the speech frames.
First, for the acoustic feature information of the above speech frames, a large-vocabulary continuous speech recognition system is used to generate forced alignment information (at the phoneme level or syllable level) for the original audio data and the corresponding environmental speech simulation data, and the phonemes or syllables that are not related to the wake-up word are uniformly labeled as filler. In this embodiment, the acoustic features of the above speech frames are input into a convolutional neural network voice wake-up model, which is trained on a large amount of data by stochastic gradient descent based on the cross-entropy criterion; the final optimization yields the deep neural network classification model corresponding to the acoustic feature information of the above speech frames.
In addition to the above convolutional neural network, the above deep network classification model may also be a fully connected neural network, a time-delay neural network, or the like.
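A compact sketch of such a convolutional wake-up classifier, trained with the cross-entropy criterion and stochastic gradient descent, is given below. The patent does not specify the architecture, so the kernel sizes, channel counts and input dimensions here are assumptions:

```python
import torch
import torch.nn as nn

class WakeWordCNN(nn.Module):
    """CNN over a window of fbank frames, classifying each window into
    wake-up word units plus a filler (non-wake-up) class."""
    def __init__(self, n_frames: int = 40, feat_dim: int = 40, n_classes: int = 3):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
            nn.Conv2d(32, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.fc = nn.Linear(32 * (n_frames // 4) * (feat_dim // 4), n_classes)

    def forward(self, x):               # x: (batch, 1, n_frames, feat_dim)
        h = self.conv(x).flatten(1)
        return self.fc(h)               # logits; softmax gives class posteriors

# training sketch with cross-entropy and stochastic gradient descent:
# model = WakeWordCNN()
# optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
# loss = nn.CrossEntropyLoss()(model(features), forced_alignment_labels)
# loss.backward(); optimizer.step()
```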
S50: record live voice data, extract the acoustic feature information corresponding to the live voice data, and input the acoustic feature information corresponding to the live voice data into the deep neural network classification model to obtain posterior probability information of the live voice data.
In this step, live voice data is recorded. The live voice data may be test speech or real speech data; the present invention does not limit this. Acoustic feature information is extracted from the recorded live voice data, and the acoustic feature information corresponding to the live voice data is input into the deep neural network classification model of the above steps, yielding the posterior probability information of the non-wake-up word classes and the wake-up word classes contained in the acoustic feature information of the live voice data.
This step may also include denoising the recorded live voice data; the specific processing is the same as the denoising of the acoustic feature information in step S30, and is not repeated here.
S60: calculate the confidence of the recorded live voice data in the deep network classification model according to the posterior probability information, compare the confidence with a preset threshold, and wake up the voice recording device when the confidence is greater than the preset threshold.
In this step, the confidence of the recorded live voice data is derived from the distribution of the posterior probability information obtained in the above steps, and the confidence is compared with the preset threshold to obtain a decision result. When the wake-up word confidence is greater than the preset threshold, the voice device is woken up; otherwise, when the confidence is less than the preset threshold, the voice recording device is not woken up and user instructions continue to be acquired.
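The patent does not state how the confidence is derived from the posterior distribution; one common choice, shown here purely as an assumed illustration, is to smooth the frame-level wake-up word posteriors over a sliding window and take the maximum smoothed value as the confidence, then compare it with the preset threshold:

```python
import numpy as np

def wakeup_confidence(posteriors: np.ndarray, wake_class: int,
                      smooth: int = 30) -> float:
    """posteriors: (n_frames, n_classes) softmax outputs of the classifier.
    Returns the maximum moving-average posterior of the wake-up word class."""
    p = posteriors[:, wake_class]
    win = max(1, min(smooth, len(p)))
    kernel = np.ones(win) / win
    return float(np.max(np.convolve(p, kernel, mode="valid")))

def decide(posteriors: np.ndarray, wake_class: int, threshold: float = 0.8) -> bool:
    # Wake the device only when the confidence exceeds the preset threshold.
    return wakeup_confidence(posteriors, wake_class) > threshold
```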
The above voice wake-up method of this embodiment can effectively improve wake-up performance in noisy scenes, solve the robustness problem of the wake-up system with respect to changes in the speaker's speaking rate and accent, and substantially improve the actual user experience of intelligent voice devices. Simulating speaking rate, pitch, volume and the like on the original data effectively improves the adaptability of the wake-up system to different speakers.
As shown in Fig. 2, another preferred embodiment of the present invention discloses a voice wake-up system 100. The system 100 includes a voice data simulation module 110, a feature extraction module 120, a deep neural network module 130 and a wake-up decision module 140.
The voice data simulation module 110 is configured to record original audio data, acquire environmental audio data corresponding to the scene in which the voice recording device is used, and convert the original audio data into environmental speech simulation data according to the environmental audio data.
The operator can record original, clean audio data through the voice data simulation module 110, simulate environmental factors such as the noise, speaking rate, reverberation, pitch and loudness of the scene in which the voice recording device is located, and thereby convert the original audio data into environmental speech simulation data.
The feature extraction module 120 is configured to perform framing and windowing on the original voice data and/or the simulated voice data to obtain the speech frames corresponding to the original voice and/or the simulated voice, and to extract acoustic feature information of the speech frames.
The feature extraction module 120 splits the original voice data and/or the simulated voice data into the corresponding speech frames by framing and windowing, and extracts the acoustic feature information of the speech frames. Feature extraction is then performed on each of the above speech frames. In this embodiment, the speech feature may be filter bank (fbank) features, or may be other speech features; the present invention does not limit this.
The system may further include a denoising autoencoder module 150, which is configured to perform denoising on the acoustic feature information of the speech frames.
The denoising autoencoder module performs denoising on the acoustic feature information of the above speech frames. Specifically, pairs of noise-simulated speech features and the corresponding original speech features can be used to train a denoising autoencoder. This embodiment uses a fully connected neural network to build the denoising autoencoder; depending on the computational capability of the system, a network with 2-3 hidden layers is usually used, with each layer containing 256 or 512 nodes, and the denoising autoencoder is trained by stochastic gradient descent according to the mean squared error (Mean-Square Error, MSE) minimization criterion.
The deep neural network module 130 is configured to perform calculation on the acoustic feature information to obtain a deep neural network classification model that at least contains the wake-up word class and the non-wake-up word class included in the speech frames.
First, for the acoustic feature information of the above speech frames, the deep neural network module 130 uses a large-vocabulary continuous speech recognition system to generate forced alignment information (at the phoneme level or syllable level) for the original audio data and the corresponding environmental speech simulation data, and uniformly labels the phonemes or syllables that are not related to the wake-up word as filler. In this embodiment, the acoustic features of the above speech frames are input into a convolutional neural network voice wake-up model, which is trained on a large amount of data by stochastic gradient descent based on the cross-entropy criterion; the final optimization yields the deep neural network classification model corresponding to the acoustic feature information of the above speech frames.
The wake-up decision module 140 is configured to record live voice data, extract the acoustic feature information corresponding to the live voice data, input the acoustic feature information corresponding to the live voice data into the deep neural network classification model to obtain posterior probability information of the live voice data, calculate the confidence of the recorded live voice data according to the posterior probability information, and compare the confidence with a preset threshold. When the confidence is greater than the preset threshold, the voice recording device is woken up; when the confidence is less than the preset threshold, the voice recording device is not woken up and user instructions continue to be acquired.
The wake-up decision module 140 records live voice data. The live voice data may be test speech or real speech data; the present invention does not limit this. Acoustic feature information is extracted from the recorded live voice data, and the acoustic feature information corresponding to the live voice data is input into the deep neural network classification model of the above steps, yielding the posterior probability information of the non-wake-up word classes and the wake-up word classes contained in the acoustic feature information of the live voice data.
The wake-up decision module may further include a denoising unit configured to perform denoising on the acoustic feature information corresponding to the live data. The denoising applied to the recorded live voice data is the same as that of the denoising autoencoder module 150 and is not repeated here.
According to the distribution of the posterior probability information of the live voice data obtained in the above steps, the confidence of the recorded live voice data is then derived and compared with the preset threshold to obtain a decision result. When the wake-up word confidence is greater than the preset threshold, the voice device is woken up; otherwise, the voice device makes no response.
The above voice wake-up system of this embodiment can effectively improve wake-up performance in noisy scenes, solve the robustness problem of the wake-up system with respect to changes in the speaker's speaking rate and accent, and substantially improve the actual user experience of intelligent voice devices. Simulating speaking rate, pitch, volume and the like on the original data effectively improves the adaptability of the wake-up system to different speakers.
The technical features of the above embodiments can be combined arbitrarily. For brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as such combinations of technical features are not contradictory, they should all be considered within the scope of this specification.
The above embodiments express only several implementations of the present invention, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be pointed out that those of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present invention, and these all fall within the protection scope of the present invention. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (8)

1. A voice wake-up method, characterized by comprising the following steps:
recording original audio data and acquiring environmental audio data corresponding to the scene in which a voice recording device is used, and converting the original audio data into environmental speech simulation data according to the environmental audio data;
performing framing and windowing on the original voice data and/or the simulated voice data to obtain the speech frames corresponding to the original voice and/or the simulated voice, and extracting acoustic feature information of the speech frames;
performing calculation on the acoustic feature information to obtain a deep neural network classification model that at least contains the wake-up word class and the non-wake-up word class included in the speech frames;
recording live voice data, extracting the acoustic feature information corresponding to the live voice data, and inputting the acoustic feature information corresponding to the live voice data into the deep neural network classification model to obtain posterior probability information of the live voice data;
calculating the confidence of the recorded live voice data according to the posterior probability information and comparing the confidence with a preset threshold; when the confidence is greater than the preset threshold, waking up the voice recording device; when the confidence is less than the preset threshold, not waking up the voice recording device and continuing to acquire user instructions.
2. The voice wake-up method according to claim 1, characterized in that, in the step of recording original audio data and acquiring environmental audio data corresponding to the scene in which the voice recording device is used and converting the original audio data into environmental speech simulation data according to the environmental audio data, the environmental speech simulation data include one or more of noise simulation, speaking-rate simulation, reverberation simulation, and pitch and loudness simulation applied to the original audio data.
3. The voice wake-up method according to claim 1, characterized in that, after the step of performing framing and windowing on the original voice data and/or the simulated voice data to obtain the speech frames corresponding to the original voice and/or the simulated voice and extracting acoustic feature information of the speech frames, the method further comprises:
performing denoising on the acoustic feature information of the speech frames.
4. The voice wake-up method according to claim 1, characterized in that the step of recording live voice data, extracting the acoustic feature information corresponding to the live voice data, and inputting the acoustic feature information corresponding to the live voice data into the deep neural network classification model to obtain posterior probability information of the live voice data further comprises:
performing denoising on the acoustic feature information corresponding to the live data.
5. A voice wake-up system, characterized by comprising:
a voice data simulation module, configured to record original audio data, acquire environmental audio data corresponding to the scene in which a voice recording device is used, and convert the original audio data into environmental speech simulation data according to the environmental audio data;
a feature extraction module, configured to perform framing and windowing on the original voice data and/or the simulated voice data to obtain the speech frames corresponding to the original voice and/or the simulated voice, and to extract acoustic feature information of the speech frames;
a deep neural network module, configured to perform calculation on the acoustic feature information to obtain a deep neural network classification model that at least contains the wake-up word class and the non-wake-up word class included in the speech frames;
a wake-up decision module, configured to record live voice data, extract the acoustic feature information corresponding to the live voice data, input the acoustic feature information corresponding to the live voice data into the deep neural network classification model to obtain posterior probability information of the live voice data, calculate the confidence of the recorded live voice data according to the posterior probability information, and compare the confidence with a preset threshold; when the confidence is greater than the preset threshold, the voice recording device is woken up, and when the confidence is less than the preset threshold, the voice recording device is not woken up and user instructions continue to be acquired.
6. The voice wake-up system according to claim 5, characterized in that, in the step of recording original audio data and acquiring environmental audio data corresponding to the scene in which the voice recording device is used and converting the original audio data into environmental speech simulation data according to the environmental audio data, the environmental speech simulation data include one or more of noise simulation, speaking-rate simulation, reverberation simulation, and pitch and loudness simulation applied to the original audio data.
7. The voice wake-up system according to claim 5, characterized in that the system further comprises:
a denoising autoencoder module, configured to perform denoising on the acoustic feature information of the speech frames.
8. The voice wake-up system according to claim 5, characterized in that the wake-up decision module further comprises:
a denoising unit, configured to perform denoising on the acoustic feature information corresponding to the live data.
CN201811081600.XA (priority date 2018-09-17, filing date 2018-09-17) Voice wake-up method and system, Pending, CN109036412A (en)

Priority Applications (1)

Application Number: CN201811081600.XA (publication CN109036412A, en); Priority Date: 2018-09-17; Filing Date: 2018-09-17; Title: Voice wake-up method and system


Publications (1)

Publication Number: CN109036412A; Publication Date: 2018-12-18

Family

ID=64622013

Family Applications (1)

Application Number: CN201811081600.XA (Pending, publication CN109036412A, en); Priority Date: 2018-09-17; Filing Date: 2018-09-17; Title: Voice wake-up method and system

Country Status (1)

Country Link
CN (1) CN109036412A (en)

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514879A (en) * 2013-09-18 2014-01-15 广东欧珀移动通信有限公司 Local voice recognition method based on BP neural network
CN106127217A (en) * 2015-05-07 2016-11-16 西门子保健有限责任公司 The method and system that neutral net detects is goed deep into for anatomical object for approximation
CN105096939A (en) * 2015-07-08 2015-11-25 百度在线网络技术(北京)有限公司 Voice wake-up method and device
CN106611599A (en) * 2015-10-21 2017-05-03 展讯通信(上海)有限公司 Voice recognition method and device based on artificial neural network and electronic equipment
CN106683663A (en) * 2015-11-06 2017-05-17 三星电子株式会社 Neural network training apparatus and method, and speech recognition apparatus and method
CN105448303A (en) * 2015-11-27 2016-03-30 百度在线网络技术(北京)有限公司 Voice signal processing method and apparatus
CN105632486A (en) * 2015-12-23 2016-06-01 北京奇虎科技有限公司 Voice wake-up method and device of intelligent hardware
CN106940998A (en) * 2015-12-31 2017-07-11 阿里巴巴集团控股有限公司 A kind of execution method and device of setting operation
CN106057192A (en) * 2016-07-07 2016-10-26 Tcl集团股份有限公司 Real-time voice conversion method and apparatus
CN106297779A (en) * 2016-07-28 2017-01-04 块互动(北京)科技有限公司 A kind of background noise removing method based on positional information and device
CN106328126A (en) * 2016-10-20 2017-01-11 北京云知声信息技术有限公司 Far-field speech recognition processing method and device
CN106782536A (en) * 2016-12-26 2017-05-31 北京云知声信息技术有限公司 A kind of voice awakening method and device
CN106611598A (en) * 2016-12-28 2017-05-03 上海智臻智能网络科技股份有限公司 VAD dynamic parameter adjusting method and device
CN107123417A (en) * 2017-05-16 2017-09-01 上海交通大学 Optimization method and system are waken up based on the customized voice that distinctive is trained
CN107134279A (en) * 2017-06-30 2017-09-05 百度在线网络技术(北京)有限公司 A kind of voice awakening method, device, terminal and storage medium
CN107945788A (en) * 2017-11-27 2018-04-20 桂林电子科技大学 A kind of relevant Oral English Practice pronunciation error detection of text and quality score method
CN108320733A (en) * 2017-12-18 2018-07-24 上海科大讯飞信息科技有限公司 Voice data processing method and device, storage medium, electronic equipment
CN108242234A (en) * 2018-01-10 2018-07-03 腾讯科技(深圳)有限公司 Speech recognition modeling generation method and its equipment, storage medium, electronic equipment
CN108335702A (en) * 2018-02-01 2018-07-27 福州大学 A kind of audio defeat method based on deep neural network
CN108494710A (en) * 2018-03-30 2018-09-04 中南民族大学 Visible light communication MIMO anti-interference noise-reduction methods based on BP neural network

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109886386A (en) * 2019-01-30 2019-06-14 北京声智科技有限公司 Wake up the determination method and device of model
CN109801629A (en) * 2019-03-01 2019-05-24 珠海格力电器股份有限公司 A kind of sound control method, device, storage medium and air-conditioning
CN110223708A (en) * 2019-05-07 2019-09-10 平安科技(深圳)有限公司 Sound enhancement method and relevant device based on speech processes
CN110223708B (en) * 2019-05-07 2023-05-30 平安科技(深圳)有限公司 Speech enhancement method based on speech processing and related equipment
WO2020228815A1 (en) * 2019-05-16 2020-11-19 华为技术有限公司 Voice-based wakeup method and device
CN110534102A (en) * 2019-09-19 2019-12-03 北京声智科技有限公司 A kind of voice awakening method, device, equipment and medium
CN110534102B (en) * 2019-09-19 2020-10-30 北京声智科技有限公司 Voice wake-up method, device, equipment and medium
CN110767231A (en) * 2019-09-19 2020-02-07 平安科技(深圳)有限公司 Voice control equipment awakening word identification method and device based on time delay neural network
US11848008B2 (en) 2019-11-14 2023-12-19 Tencent Technology (Shenzhen) Company Limited Artificial intelligence-based wakeup word detection method and apparatus, device, and medium
CN110838289B (en) * 2019-11-14 2023-08-11 腾讯科技(深圳)有限公司 Wake-up word detection method, device, equipment and medium based on artificial intelligence
CN110838289A (en) * 2019-11-14 2020-02-25 腾讯科技(深圳)有限公司 Awakening word detection method, device, equipment and medium based on artificial intelligence
WO2021093449A1 (en) * 2019-11-14 2021-05-20 腾讯科技(深圳)有限公司 Wakeup word detection method and apparatus employing artificial intelligence, device, and medium
CN112825250A (en) * 2019-11-20 2021-05-21 芋头科技(杭州)有限公司 Voice wake-up method, apparatus, storage medium and program product
CN111081217B (en) * 2019-12-03 2021-06-04 珠海格力电器股份有限公司 Voice wake-up method and device, electronic equipment and storage medium
CN111081217A (en) * 2019-12-03 2020-04-28 珠海格力电器股份有限公司 Voice wake-up method and device, electronic equipment and storage medium
CN111833869B (en) * 2020-07-01 2022-02-11 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN111833869A (en) * 2020-07-01 2020-10-27 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN112992189B (en) * 2021-01-29 2022-05-03 青岛海尔科技有限公司 Voice audio detection method and device, storage medium and electronic device
CN112992189A (en) * 2021-01-29 2021-06-18 青岛海尔科技有限公司 Voice audio detection method and device, storage medium and electronic device
CN113593560A (en) * 2021-07-29 2021-11-02 普强时代(珠海横琴)信息技术有限公司 Customizable low-delay command word recognition method and device
CN113593560B (en) * 2021-07-29 2024-04-16 普强时代(珠海横琴)信息技术有限公司 Customizable low-delay command word recognition method and device
CN113782016A (en) * 2021-08-06 2021-12-10 佛山市顺德区美的电子科技有限公司 Wake-up processing method, device, equipment and computer storage medium
CN113782016B (en) * 2021-08-06 2023-05-05 佛山市顺德区美的电子科技有限公司 Wakeup processing method, wakeup processing device, equipment and computer storage medium
WO2023029615A1 (en) * 2021-08-30 2023-03-09 华为技术有限公司 Wake-on-voice method and apparatus, device, storage medium, and program product

Similar Documents

Publication Publication Date Title
CN109036412A (en) 2018-12-18 Voice wake-up method and system
CN106098059B (en) Customizable voice awakening method and system
CN109326299B (en) Speech enhancement method, device and storage medium based on full convolution neural network
CN106504768B (en) Phone testing audio frequency classification method and device based on artificial intelligence
CN110970018B (en) Speech recognition method and device
KR20060022156A (en) Distributed speech recognition system and method
CN103377651B (en) The automatic synthesizer of voice and method
CN103456305A (en) Terminal and speech processing method based on multiple sound collecting units
CN104538043A (en) Real-time emotion reminder for call
CN110930976A (en) Voice generation method and device
CN110600008A (en) Voice wake-up optimization method and system
CN105895082A (en) Acoustic model training method and device as well as speech recognition method and device
CN112581938B (en) Speech breakpoint detection method, device and equipment based on artificial intelligence
CN109410956A (en) A kind of object identifying method of audio data, device, equipment and storage medium
CN112328994A (en) Voiceprint data processing method and device, electronic equipment and storage medium
CN103811000A (en) Voice recognition system and voice recognition method
CN113763966B (en) End-to-end text irrelevant voiceprint recognition method and system
CN105845131A (en) Far-talking voice recognition method and device
CN116705071A (en) Playback voice detection method based on data enhancement and pre-training model feature extraction
CN113099043A (en) Customer service control method, apparatus and computer-readable storage medium
CN115762500A (en) Voice processing method, device, equipment and storage medium
CN115472174A (en) Sound noise reduction method and device, electronic equipment and storage medium
CN114333912A (en) Voice activation detection method and device, electronic equipment and storage medium
CN103533193B (en) Residual echo elimination method and device
CN117636909B (en) Data processing method, device, equipment and computer readable storage medium

Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
RJ01: Rejection of invention patent application after publication (application publication date: 2018-12-18)