CN110517670A - Method and apparatus for improving wake-up performance - Google Patents

Method and apparatus for improving wake-up performance

Info

Publication number
CN110517670A
Authority
CN
China
Prior art keywords
wake
speech frame
input signal
word
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910801354.9A
Other languages
Chinese (zh)
Inventor
焦蓓
周强
徐俊峰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
AI Speech Ltd
Original Assignee
AI Speech Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by AI Speech Ltd
Priority to CN201910801354.9A
Publication of CN110517670A
Legal status: Pending

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/16 Speech classification or search using artificial neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Telephone Function (AREA)

Abstract

The present invention discloses a method and apparatus for improving wake-up performance. The method comprises: detecting an input signal in the form of speech frames and judging whether each speech frame is a suspected speech frame or a non-speech frame; determining a valid speech segment based on the judged suspected speech frames and non-speech frames; inputting the valid speech segment into an adaptive wake-up model, wherein the adaptive wake-up model is a binary classification model obtained by supervised deep neural network learning on collected wake-up word positive samples and non-wake-up word negative samples; and obtaining the output of the adaptive wake-up model and determining, based on the output, whether to wake up the device. By adding the adaptive wake-up model to perform adaptive wake-up on valid speech segments, the scheme provided by the method and apparatus of the present application can effectively raise the wake-up rate, with a particularly noticeable effect in noisy environments.

Description

Method and apparatus for improving wake-up performance
Technical field
The invention belongs to the technical field of voice wake-up, and in particular relates to a method and apparatus for improving wake-up performance.
Background art
In the related art, there is currently no solution on the market that both raises the wake-up rate and reduces false wake-ups. Wake-up and false wake-up trade off against each other, so current products optimize one of them while keeping the other from degrading substantially. For reducing false wake-ups, the representative approach is to add secondary verification; for raising the wake-up rate, the main approach is to lower the wake-up threshold.
In the course of implementing the present application, the inventors found that the prior art mainly includes the following schemes:
1. Adding a secondary verification module with dual thresholds. The wake-up engine makes a preliminary judgment on the received speech signal and enables secondary verification when a preset condition is met; the result of the secondary verification decides whether the device performs the wake-up operation;
2. Lowering the wake-up threshold. The simplest and quickest approach is to lower the wake-up threshold so that the device wakes up more easily.
On the one hand, secondary verification increases the device's power consumption and latency, which affects the user's interactive experience in practice. In addition, on devices with limited computing resources there is a risk of crashing, so the approach cannot be extended to devices with weaker computing capability, and while it rapidly reduces false wake-ups it also lowers the wake-up rate. On the other hand, lowering the wake-up threshold effectively raises the wake-up rate but increases false wake-ups at the same time.
Summary of the invention
Embodiments of the present invention provide a method and apparatus for improving wake-up performance, to solve at least one of the above technical problems.
In a first aspect, an embodiment of the present invention provides a method for improving wake-up performance, comprising: detecting an input signal in the form of speech frames and judging whether each speech frame is a suspected speech frame or a non-speech frame; determining a valid speech segment based on the judged suspected speech frames and non-speech frames; inputting the valid speech segment into an adaptive wake-up model, wherein the adaptive wake-up model is a binary classification model obtained by supervised deep neural network learning on collected wake-up word positive samples and non-wake-up word negative samples; and obtaining the output of the adaptive wake-up model and determining, based on the output, whether to wake up the device.
In a second aspect, an embodiment of the present invention provides an apparatus for improving wake-up performance, comprising: a judgment module, configured to detect an input signal in the form of speech frames and judge whether each speech frame is a suspected speech frame or a non-speech frame; a valid speech segment determining module, configured to determine a valid speech segment based on the judged suspected speech frames and non-speech frames; an adaptive wake-up module, configured to input the valid speech segment into an adaptive wake-up model, wherein the adaptive wake-up model is a binary classification model obtained by supervised deep neural network learning on collected wake-up word positive samples and non-wake-up word negative samples; and a result output module, configured to obtain the output of the adaptive wake-up model and determine, based on the output, whether to wake up the device.
In a third aspect, an electronic device is provided, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can perform the steps of the method for improving wake-up performance of any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer program product. The computer program product comprises a computer program stored on a non-volatile computer-readable storage medium, and the computer program comprises program instructions which, when executed by a computer, cause the computer to perform the steps of the method for improving wake-up performance of any embodiment of the present invention.
The scheme provided by the method and apparatus of the present application first detects the input speech to determine a valid speech segment and then inputs the valid speech segment into a pre-trained adaptive wake-up model for adaptive wake-up, which can effectively improve wake-up accuracy. Because the adaptive wake-up model can be trained under supervision on continuously collected user input speech, it can be effectively optimized for the user's usage scenario, and wake-up accuracy is especially high in environments with more complex usage scenarios.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for improving wake-up performance provided by an embodiment of the present invention;
Fig. 2 is a flow chart of another method for improving wake-up performance provided by an embodiment of the present invention;
Fig. 3 is a flow chart of yet another method for improving wake-up performance provided by an embodiment of the present invention;
Fig. 4 is a flow chart of still another method for improving wake-up performance provided by an embodiment of the present invention;
Fig. 5 is a flow chart of a specific example of a method for improving wake-up performance provided by an embodiment of the present invention;
Fig. 6 is a block diagram of an apparatus for improving wake-up performance provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention.
Detailed description
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are some, rather than all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Referring to Fig. 1, a flow chart of one embodiment of the method for improving wake-up performance of the present application is shown. The method of this embodiment is applicable to terminals with an intelligent voice dialog wake-up function, such as smart voice TVs, smart speakers, smart dialog toys, and other existing smart terminals that support voice wake-up.
As shown in Fig. 1, in step 101, the input signal is detected in the form of speech frames, and each speech frame is judged to be a suspected speech frame or a non-speech frame;
Then, in step 102, a valid speech segment is determined based on the judged suspected speech frames and non-speech frames;
Next, in step 103, the valid speech segment is input into the adaptive wake-up model;
Finally, in step 104, the output of the adaptive wake-up model is obtained; the device is woken up if the output is the wake-up word, and is not woken up if the output is a non-wake-up word.
In this embodiment, for step 101, the apparatus for improving wake-up performance first detects and judges each frame of the input signal in the form of speech frames, determining whether the frame is a suspected speech frame or a non-speech frame. That is, every frame is judged; some frames are judged to be suspected speech frames and others non-speech frames. Then, for step 102, the apparatus determines the valid speech segment based on the judged suspected speech frames and non-speech frames.
Next, for step 103, the apparatus inputs the valid speech segment into the adaptive wake-up model, wherein the adaptive wake-up model is a binary classification model obtained by supervised deep neural network learning on wake-up word positive samples and non-wake-up word negative samples collected under various scenarios. Training the adaptive wake-up model under supervision on positive and negative samples collected from many scenarios lets the model adaptively judge whether an input speech signal contains the wake-up word under those scenarios, and the supervised guidance continuously improves the model's wake-up accuracy across scenarios. Finally, for step 104, the apparatus obtains the output of the adaptive wake-up model; it sends a wake-up instruction to the device if the output is the wake-up word, and sends none if the output is a non-wake-up word.
The method of this embodiment first detects the input speech to determine a valid speech segment and then inputs the valid speech segment into a pre-trained adaptive wake-up model for adaptive wake-up, which can effectively improve wake-up accuracy. Because the adaptive wake-up model can be trained under supervision on continuously collected user input speech, it can be effectively optimized for the user's usage scenario, and wake-up accuracy is especially high in environments with more complex usage scenarios.
With further reference to Fig. 2, a flow chart of another method for improving wake-up performance provided by an embodiment of the present application is shown. The flow chart mainly concerns steps that further define step 102 in Fig. 1.
As shown in Fig. 2, in step 201, if it is judged that a first preset number of consecutive frames of the input signal are all suspected speech frames, the first speech frame among those frames is determined to be the start point of the valid speech segment;
In step 202, after the start point of the valid speech segment is determined, if it is judged that a second preset number of consecutive frames of the input signal are all non-speech frames, or the last speech frame of the input signal is detected, the first speech frame among those second preset number of consecutive frames (or, in the latter case, the last speech frame) is determined to be the end point of the valid speech segment;
In step 203, the valid speech segment is extracted from the input signal based on the determined start point and end point.
In this embodiment, for step 201, if the apparatus for improving wake-up performance judges that a first preset number of consecutive frames of the input signal are all suspected speech frames, it determines that the first frame among them is the start point of the valid speech segment. Then, for step 202, after the start point has been determined, if the apparatus judges that a second preset number of consecutive frames of the input signal are all non-speech frames, or the last speech frame of the input signal has been detected, it determines that the first speech frame among those second preset number of consecutive frames, or the last frame, is the end point of the valid speech segment. Finally, for step 203, the valid speech segment can be extracted from the input signal based on the determined start point and end point. For example, the first preset number of consecutive frames may be 10 and the second may also be 10, or the first may be 20 and the second 15; the present application places no limitation here. In addition, the input signal may contain multiple valid speech segments, which can be detected one by one in the manner described above; the present application places no limitation here either.
By judging whether a preset number of consecutive frames of the input signal are suspected speech frames or non-speech frames, the method of this embodiment can determine the start and end points of one or more valid speech segments, so that the valid speech segments can be extracted from the input signal. This makes it easier to judge later whether the preset wake-up word is present in a valid speech segment and greatly reduces the workload of that later judgment.
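The start-point and end-point rules of steps 201 to 203 can be sketched as a small state machine over per-frame labels. This is a minimal illustration, not the patented implementation: the function name `find_valid_segments`, the parameter names `start_run` and `end_run` (standing in for the first and second preset numbers of consecutive frames), and the convention that the end index is exclusive are all assumptions.

```python
from typing import List, Tuple


def find_valid_segments(
    is_speech: List[bool], start_run: int = 10, end_run: int = 10
) -> List[Tuple[int, int]]:
    """Return (start, end) frame indices of valid speech segments, end exclusive.

    is_speech[i] is True for a suspected speech frame, False for a non-speech frame.
    start_run / end_run are hypothetical names for the two preset frame counts.
    """
    segments: List[Tuple[int, int]] = []
    start = None      # start index of the currently open segment, if any
    speech_run = 0    # length of the current run of suspected speech frames
    silence_run = 0   # length of the current run of non-speech frames

    for i, speech in enumerate(is_speech):
        if start is None:
            # Looking for a start point: start_run consecutive speech frames.
            speech_run = speech_run + 1 if speech else 0
            if speech_run == start_run:
                start = i - start_run + 1  # first frame of the run
                silence_run = 0
        else:
            # Looking for an end point: end_run consecutive non-speech frames.
            silence_run = 0 if speech else silence_run + 1
            if silence_run == end_run:
                end = i - end_run + 1      # first frame of the non-speech run
                segments.append((start, end))
                start = None
                speech_run = 0

    # The input ended while a segment was still open: close it at the last frame.
    if start is not None:
        segments.append((start, len(is_speech)))
    return segments
```

Because the loop resets its state after closing a segment, the same pass picks up several valid speech segments in one input signal, matching the remark that segments can be detected one by one.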
With further reference to Fig. 3, a flow chart of yet another method for improving wake-up performance provided by an embodiment of the present application is shown. The flow chart mainly concerns steps that further define the steps after step 101 in Fig. 1.
As shown in Fig. 3, in step 301, if it is judged that no first preset number of consecutive frames of the input signal are all suspected speech frames, detection and judgment of the input signal continue until a first preset number of consecutive frames are all suspected speech frames;
In step 302, after the start point of the valid speech segment is determined, if it is judged that no second preset number of consecutive frames of the input signal are all non-speech frames, detection and judgment of the input signal continue until a second preset number of consecutive frames are all non-speech frames or the last speech frame of the input signal is reached.
In this embodiment, for step 301, when no first preset number of consecutive frames of the input signal are all suspected speech frames, the input signal must continue to be detected and judged until a first preset number of consecutive frames are all suspected speech frames or the last speech frame of the input signal is detected. For step 302, after the start point is determined, if it is judged that no second preset number of consecutive frames of the input signal are all non-speech frames, detection and judgment of the input signal continue until a second preset number of consecutive frames are all non-speech frames or the last speech frame of the input signal is reached. This embodiment can thus effectively handle the various situations that may arise.
In some optional embodiments, before the input signal is detected in the form of speech frames, the method further comprises: preprocessing the input signal to improve the signal-to-noise ratio. Further optionally, preprocessing the input signal to improve the signal-to-noise ratio comprises: performing analog-to-digital conversion on the acquired analog signal, with a configured sample rate and quantization parameters, to obtain a digital signal; and enhancing the speech from the target direction in the digital signal, based on the time-domain information, frequency-domain information, delay, and/or energy loss of the digital signal, to improve the signal-to-noise ratio.
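As a rough illustration of the two preprocessing stages, the sketch below quantizes samples to signed integer codes and then applies delay-and-sum beamforming, one well-known way to use inter-channel delay to enhance speech from a target direction. The patent does not name a specific enhancement algorithm, so delay-and-sum is a swapped-in example; the function names, the 16-bit default, and the use of integer sample delays are all assumptions.

```python
import math
from typing import List, Sequence


def quantize(sample: float, bits: int = 16) -> int:
    """Map an analog sample in [-1.0, 1.0] to a signed integer code (assumed 16-bit)."""
    max_code = 2 ** (bits - 1) - 1
    clipped = max(-1.0, min(1.0, sample))
    return round(clipped * max_code)


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Align each microphone channel by its integer sample delay, then average.

    delays[k] is the (assumed known) number of samples by which the target
    speech arrives later on channel k.
    """
    length = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(ch[n + d] for ch, d in zip(channels, delays)) / len(channels)
        for n in range(length)
    ]


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels."""
    return 10.0 * math.log10(signal_power / noise_power)
```

Averaging the aligned channels keeps the target speech coherent while uncorrelated noise partially cancels, which is why this kind of alignment raises the signal-to-noise ratio.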
Referring to Fig. 4, a flow chart of a method for improving wake-up performance provided by another embodiment of the present application is shown. The flow chart mainly concerns steps that further define step 104 in Fig. 1.
As shown in Fig. 4, in step 401, the confidence of the valid speech segment for the preset wake-up word is calculated based on the adaptive wake-up model, and it is judged whether the confidence is greater than the preset wake-up word threshold;
In step 402, if the confidence is greater than or equal to the preset wake-up word threshold, the device is woken up;
In step 403, if the confidence is less than the preset wake-up word threshold, the device is not woken up.
In this embodiment, for step 401, the apparatus for improving wake-up performance calculates, based on the adaptive wake-up model, the confidence of the valid speech segment for the preset wake-up word, and then judges whether the confidence is greater than the preset wake-up word threshold. For step 402, if the confidence is greater than or equal to the preset wake-up word threshold, the device can be woken up. For step 403, if the confidence is less than the preset wake-up word threshold, the device cannot be woken up. Whether to wake up the device can thus be decided from the confidence calculated by the adaptive wake-up model.
In some optional embodiments, calculating the confidence of the valid speech segment for the preset wake-up word based on the adaptive wake-up model comprises: calculating the posterior probability of each speech frame in the valid speech segment to obtain the confidence of each speech frame for the preset wake-up word; and calculating the confidence of the valid speech segment for the preset wake-up word based on the confidence of each speech frame.
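The text above does not say how per-frame confidences are combined into a segment-level confidence. The sketch below assumes one common choice in keyword spotting: smooth the per-frame posteriors with a short moving average and take the peak. The function names, the `window` parameter, and the default threshold are hypothetical; the `>=` comparison follows steps 402 and 403.

```python
from typing import List


def segment_confidence(frame_posteriors: List[float], window: int = 3) -> float:
    """Combine per-frame wake-up word posteriors into one segment confidence.

    Assumed aggregation: trailing moving average over `window` frames, then max.
    """
    if not frame_posteriors:
        return 0.0
    smoothed = []
    for i in range(len(frame_posteriors)):
        lo = max(0, i - window + 1)
        chunk = frame_posteriors[lo:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return max(smoothed)


def should_wake(frame_posteriors: List[float], threshold: float = 0.5, window: int = 3) -> bool:
    """Wake the device when the segment confidence reaches the preset threshold."""
    return segment_confidence(frame_posteriors, window) >= threshold
```

With smoothing, a single high-posterior frame surrounded by low ones does not reach the threshold, whereas a sustained run of high posteriors does; other aggregations (plain mean, product over word positions) would fit the same interface.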
Below, some problems the inventors encountered in implementing the present invention and a specific embodiment of the finally determined scheme are described, so that those skilled in the art can better understand the scheme of the present application.
After carefully studying the prior art, the inventors found that its defects are mainly due to the following causes:
1. In secondary verification, the signal first passes through the wake-up module and then through the recognition module. The basic idea is that the speech signal obtained by the front end passes through a first-level wake-up module; when the confidence of the wake-up word for the current frame is greater than or equal to the preset wake-up word threshold, the decoding result of the current recognition module is checked. If the decoded sentence contains the wake-up word, the wake-up operation is performed; otherwise no feedback is given. The secondary recognition module therefore plays the decisive role.
2. A recognition model with a large number of parameters has high recognition accuracy and can effectively tell whether an utterance is a real wake-up word, which reduces false wake-ups. However, a large parameter count inevitably means a large amount of computation, so CPU and memory usage rise, and latency and power consumption follow. Subjectively, the interaction feels sluggish, the response is slow, and the device heats up easily. A model with few parameters needs little computation but recognizes inaccurately; misrecognition of the wake-up word lowers the wake-up rate, and audio that should have woken the device is rejected because it is not recognized. This is most likely to happen with inaccurate pronunciation or in noisy environments.
3. The wake-up threshold is lowered to raise the wake-up rate. The wake-up threshold is obtained by weighting the posteriors of the characters of the wake-up word according to the order in which they occur; lowering the threshold relaxes the requirement on some of the characters in the wake-up word, so deletions and substitutions cause more false wake-ups.
Those skilled in the art might adopt the following schemes to address the above defects:
Typically, one would think of appropriately relaxing the conditions of secondary verification, reducing the size of the secondary verification model to cut part of the computation, or lowering the wake-up threshold by a smaller proportion.
In the course of implementing the present application, the inventors found that these methods can quickly optimize the above defects to some degree, but none of them solves the problem at its root. In a project-driven industry, the main consideration is getting products to market quickly, so there is not enough time and energy to consider other, more in-depth approaches.
The scheme of the present application proposes an apparatus for improving wake-up performance:
Through big data analysis, the product's usage scenarios, usage frequency, and the distribution of user usage states are obtained; deep learning is used to mine user intent, and current and historical acoustic environment information is monitored in real time. The originally fixed module is converted into an adaptive learning module to adapt to the needs of different scenarios. Without increasing the amount of computation, false wake-ups are reduced by 60% (based on 600 hours of household scenario testing). Meanwhile, by learning from the collected statistics, the adapted module can quickly and effectively detect and obtain valid suspected wake-up speech segments, thereby effectively raising the wake-up rate, with a particularly noticeable effect in noisy environments.
Referring to Fig. 5, a flow chart of a specific embodiment of the scheme of the present application is shown. It should be noted that although the following embodiment mentions some specific examples, they are not intended to limit the scheme of the present application.
As shown in Fig. 5, the detailed steps are as follows:
1. The signal acquired at the device end is preprocessed: analog-to-digital conversion is performed with a suitable sample rate and quantization parameters, converting it into a digital signal that can be processed;
2. The digital signal acquired in step 1 is preprocessed, mainly to suppress noise and enhance speech. Based on the time-domain and frequency-domain information of the signal, combined with the delay and energy loss during signal propagation, the speech from the target direction is enhanced to improve the signal-to-noise ratio;
3. The signal enhanced in step 2 is detected. Based on big data and deep learning, a binary classification deep neural network model with few layers and few nodes is obtained, and speech is distinguished from non-speech by calculating the posterior probability. When the posterior probability is greater than a set threshold, the frame is judged to be a suspected speech frame; otherwise it is judged to be non-speech. When several accumulated frames are all suspected speech frames, speech is judged to have started. Once speech has started, the process enters step 4; otherwise this step continues;
4. Based on the detection information obtained in step 3, the valid speech segment is sent to the wake-up model for calculation. The wake-up model in the wake-up module is a binary classification model obtained by supervised deep neural network learning on collected sample information, including wake-up word positive samples and non-wake-up word negative samples. Combined with the current acoustic environment and usage scenario, the user's intent is analyzed, and the wake-up module learns autonomously and adjusts itself adaptively. The valid speech detected in step 3 is input into the adjusted wake-up module, the posterior of each frame of data is calculated, and the confidence of the current frame for the specified wake-up word is obtained. If the confidence is greater than or equal to the preset threshold of the specified wake-up word, the wake-up mechanism is started; otherwise the process returns to step 3.
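The frame-level decision in step 3 can be illustrated with a tiny two-class feed-forward network whose softmax output is read as the speech posterior. This is only a sketch under stated assumptions: the layer layout, the ReLU activation, the convention that output index 1 means "speech", and the hand-set weights in the usage note are hypothetical; a real model would be trained on labeled data and fed acoustic features rather than a single number.

```python
import math
from typing import List, Sequence, Tuple

# A layer is (weights, bias); weights holds one row per output node.
Layer = Tuple[List[List[float]], List[float]]


def dense(x: Sequence[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer: one dot product per output node plus bias."""
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def relu(x: Sequence[float]) -> List[float]:
    return [max(0.0, v) for v in x]


def softmax(x: Sequence[float]) -> List[float]:
    m = max(x)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in x]
    total = sum(exps)
    return [e / total for e in exps]


def speech_posterior(features: Sequence[float], layers: List[Layer]) -> float:
    """Forward pass; returns the posterior of the 'speech' class (assumed index 1)."""
    x = list(features)
    for weights, bias in layers[:-1]:
        x = relu(dense(x, weights, bias))
    weights, bias = layers[-1]
    return softmax(dense(x, weights, bias))[1]


def is_suspected_speech(features: Sequence[float], layers: List[Layer], threshold: float = 0.5) -> bool:
    """A frame is a suspected speech frame when its posterior exceeds the threshold."""
    return speech_posterior(features, layers) > threshold
```

For instance, with the hand-set toy layers `[([[1.0]], [0.0]), ([[-1.0], [1.0]], [0.0, 0.0])]` and a single energy-like feature, a frame with feature 2.0 is classified as suspected speech while a frame with feature 0.0 sits exactly at 0.5 and is not.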
The adapted wake-up module distinguishes noisy scenarios well and can therefore effectively suppress false wake-ups in noisy scenes; without a substantial increase in computation, false wake-ups are reduced by 60%. Meanwhile, suppose the current speech signal fails, for whatever reason (for example, the environment is too noisy or the speaker's pronunciation is inaccurate), to reach the preset wake-up threshold the first time the person issues the wake-up command. Then, after adaptive learning, when the wake-up command is issued again in the same environment with the same pronunciation, or even in a worse environment, the command is very likely to be captured and the wake-up state started correctly. This raises the wake-up rate to a great extent, and the improvement brought by the adaptive mechanism is especially pronounced for wake-up commands with a low signal-to-noise ratio or an accent.
In the course of implementing the present application, the inventors also tried the following alternatives and summarized their advantages and disadvantages.
Although the current scheme improves wake-up performance to some extent without substantially increasing computation under the same conditions, it can still be somewhat strained on devices with especially scarce computing resources.
The inventors also tried the following schemes in the course of implementing the present application:
One alternative is to shrink the model structure, but a smaller model structure lowers wake-up performance. The mainstream practice at present is therefore still to convert the wake-up model to fixed point to reduce computation, which allows it to be deployed on devices whose computing resources were already tight.
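As an illustration of what converting a model to fixed point can look like, the sketch below applies symmetric linear quantization to a list of weights, storing each as a small signed integer plus one shared scale. The patent does not describe its fixed-point procedure, so the scheme, the 8-bit default, and the function names are assumptions.

```python
from typing import List, Tuple


def quantize_weights(weights: List[float], bits: int = 8) -> Tuple[List[int], float]:
    """Symmetric fixed-point quantization: weight ~= code * scale."""
    max_code = 2 ** (bits - 1) - 1
    peak = max((abs(w) for w in weights), default=0.0)
    scale = peak / max_code if peak > 0 else 1.0
    codes = [round(w / scale) for w in weights]
    return codes, scale


def dequantize_weights(codes: List[int], scale: float) -> List[float]:
    """Recover approximate floating-point weights from integer codes."""
    return [c * scale for c in codes]
```

Each weight is then off by at most about half a quantization step (`scale / 2`), while storage and arithmetic move from floating point to small integers, which is the source of the computation savings on constrained devices.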
Another alternative, aimed at reducing false wake-ups, is to place a requirement on every character of the wake-up word: wake-up is triggered only if each character in the wake-up word reaches the required confidence, that is, its confidence is greater than or equal to the preset threshold; otherwise the utterance is rejected. The advantage of this approach is that it is very effective at reducing false wake-ups, but it is also rather strict on the speaker. If a character in the wake-up word is mispronounced, pronounced with a deviation, or slurred, or if fast speech swallows one of the characters, the wake-up rate will be low. The approach is therefore not suitable for general use and can only be applied to certain specific needs.
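The per-character gating described above can be sketched in a few lines, alongside an averaged score for contrast. The function names and the idea of passing one confidence and one threshold per character are assumptions made for illustration.

```python
from typing import Sequence


def wake_if_all_characters_pass(
    confidences: Sequence[float], thresholds: Sequence[float]
) -> bool:
    """Wake only if every character of the wake-up word meets its own threshold."""
    if len(confidences) != len(thresholds):
        raise ValueError("one threshold is required per character")
    return all(c >= t for c, t in zip(confidences, thresholds))


def wake_if_average_passes(confidences: Sequence[float], threshold: float) -> bool:
    """Contrast case: a single threshold on the mean confidence."""
    return sum(confidences) / len(confidences) >= threshold
```

For a four-character wake-up word with confidences `[0.9, 0.9, 0.3, 0.9]` and a threshold of 0.6 on each character, the strict gate rejects the utterance because of the one slurred character, while the averaged score (0.75) would accept it. This is the trade-off the paragraph describes: fewer false wake-ups at the cost of a lower wake-up rate.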
The above embodiments can achieve at least the following technical effects:
The adapted wake-up module distinguishes noisy scenarios well and can therefore effectively suppress false wake-ups in noisy scenes; without a substantial increase in computation, false wake-ups are reduced by 60%. Meanwhile, suppose the current speech signal fails, for whatever reason (for example, the environment is too noisy or the speaker's pronunciation is inaccurate), to reach the preset wake-up threshold the first time the person issues the wake-up command. Then, after adaptive learning, when the wake-up command is issued again in the same environment with the same pronunciation, or even in a worse environment, the command is very likely to be captured and the wake-up state started correctly. This raises the wake-up rate to a great extent, and the improvement brought by the adaptive mechanism is especially pronounced for wake-up commands with a low signal-to-noise ratio or an accent.
Referring to FIG. 6, a block diagram of an apparatus for improving wake-up performance provided by an embodiment of the present invention is shown.
As shown in FIG. 6, the apparatus 600 for improving wake-up performance includes a determination module 610, a valid speech segment determining module 620, an adaptive wake-up module 630, and a result output module 640.
The determination module 610 is configured to detect an input signal in the form of speech frames and determine whether each speech frame is a suspected speech frame or a non-speech frame; the valid speech segment determining module 620 is configured to determine a valid speech segment based on the determined suspected speech frames and non-speech frames; the adaptive wake-up module 630 is configured to input the valid speech segment into an adaptive wake-up model, where the adaptive wake-up model is a binary classification model obtained by supervised deep neural network learning on collected wake-up-word positive samples and non-wake-up-word negative samples; and the result output module 640 is configured to obtain the output of the adaptive wake-up model and determine, based on the output, whether to wake up the device.
It should be understood that the modules shown in FIG. 6 correspond to the steps of the methods described with reference to FIG. 1, FIG. 2, FIG. 3 and FIG. 4. The operations, features and corresponding technical effects described above for the methods therefore apply equally to the modules in FIG. 6 and are not repeated here.
It is worth noting that the module names in the embodiments of the present application do not limit the scheme of the present application. For example, the determination module may be described as a module that detects an input signal in the form of speech frames and determines whether each speech frame is a suspected speech frame or a non-speech frame. In addition, the related functional modules may be implemented by a hardware processor; for example, the determination module may also be implemented by a processor, which is not described further here.
In other embodiments, an embodiment of the present invention further provides a non-volatile computer storage medium storing computer-executable instructions that can perform the method for improving wake-up performance in any of the above method embodiments.
In one implementation, the non-volatile computer storage medium of the present invention stores computer-executable instructions configured to:
detect an input signal in the form of speech frames, and determine whether each speech frame is a suspected speech frame or a non-speech frame;
determine a valid speech segment based on the determined suspected speech frames and non-speech frames;
input the valid speech segment into an adaptive wake-up model, where the adaptive wake-up model is a multi-class classification model obtained by supervised deep neural network learning on collected wake-up-word positive samples and non-wake-up-word negative samples;
obtain the output of the adaptive wake-up model, and determine, based on the output, whether to wake up the device.
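The four instructions above form a simple pipeline, which can be sketched in Python as follows. This is only an illustrative outline under stated assumptions: the frame detector, the segment finder and the wake-up model are passed in as callables, and the toy stand-ins `energy_detector`, `first_to_last` and `toy_model` are hypothetical placeholders, not the detector, endpointing rule or trained network of the application.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Frame = Sequence[float]


def should_wake(
    frames: List[Frame],
    is_suspected_speech: Callable[[Frame], bool],
    find_segment: Callable[[List[bool]], Optional[Tuple[int, int]]],
    wake_model: Callable[[List[Frame]], float],
    threshold: float,
) -> bool:
    """Run the four steps in order and return the wake-up decision."""
    # Step 1: label every frame as suspected speech (True) or non-speech (False).
    flags = [is_suspected_speech(frame) for frame in frames]
    # Step 2: locate the valid speech segment; None means no segment was found.
    bounds = find_segment(flags)
    if bounds is None:
        return False
    start, stop = bounds
    # Step 3: score the segment with the (already trained) wake-up model.
    confidence = wake_model(frames[start:stop])
    # Step 4: wake the device only if the score reaches the threshold.
    return confidence >= threshold


# Toy stand-ins, for illustration only (all hypothetical).
def energy_detector(frame: Frame) -> bool:
    return sum(x * x for x in frame) / len(frame) > 0.01


def first_to_last(flags: List[bool]) -> Optional[Tuple[int, int]]:
    hits = [i for i, flag in enumerate(flags) if flag]
    return (hits[0], hits[-1] + 1) if hits else None


def toy_model(segment: List[Frame]) -> float:
    return 0.9 if len(segment) >= 3 else 0.1


silence, loud = [0.0] * 4, [0.5] * 4
woke = should_wake([silence, loud, loud, loud, silence],
                   energy_detector, first_to_last, toy_model, threshold=0.5)
```

Passing the stages as callables keeps the sketch independent of any particular detector or network, so each stage can be replaced without touching the control flow.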
The non-volatile computer-readable storage medium may include a program storage area and a data storage area, where the program storage area can store an operating system and an application program required by at least one function, and the data storage area can store data created through use of the apparatus for improving wake-up performance. In addition, the non-volatile computer-readable storage medium may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some embodiments, the non-volatile computer-readable storage medium optionally includes memory located remotely from the processor, and such remote memory can be connected over a network to the apparatus for improving wake-up performance. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
An embodiment of the present invention further provides a computer program product. The computer program product includes a computer program stored on a non-volatile computer-readable storage medium, and the computer program includes program instructions that, when executed by a computer, cause the computer to perform any of the above methods for improving wake-up performance.
FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. As shown in FIG. 7, the device includes one or more processors 710 and a memory 720; one processor 710 is taken as an example in FIG. 7. The device for the method for improving wake-up performance may further include an input device 730 and an output device 740. The processor 710, the memory 720, the input device 730 and the output device 740 may be connected by a bus or in other ways; connection by a bus is taken as an example in FIG. 7. The memory 720 is the non-volatile computer-readable storage medium described above. By running the non-volatile software programs, instructions and modules stored in the memory 720, the processor 710 executes the various functional applications and data processing of the server, that is, implements the method for improving wake-up performance of the above method embodiments. The input device 730 can receive input numeric or character information and generate key signal inputs related to user settings and function control of the apparatus for improving wake-up performance. The output device 740 may include a display device such as a display screen.
The above product can execute the method provided by the embodiments of the present invention and has the functional modules and beneficial effects corresponding to the method. For technical details not described in detail in this embodiment, refer to the method provided by the embodiments of the present invention.
In one implementation, the above electronic device is applied in an apparatus for improving wake-up performance and includes: at least one processor; and a memory communicatively connected to the at least one processor; where the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to:
detect an input signal in the form of speech frames, and determine whether each speech frame is a suspected speech frame or a non-speech frame;
determine a valid speech segment based on the determined suspected speech frames and non-speech frames;
input the valid speech segment into an adaptive wake-up model, where the adaptive wake-up model is a multi-class classification model obtained by supervised deep neural network learning on collected wake-up-word positive samples and non-wake-up-word negative samples;
obtain the output of the adaptive wake-up model, and determine, based on the output, whether to wake up the device.
The electronic device of the embodiments of the present application exists in a variety of forms, including but not limited to:
(1) Mobile communication devices: these devices are characterized by mobile communication functions, with voice and data communication as their main purpose. Such terminals include smart phones (such as the iPhone), multimedia phones, feature phones, and low-end phones.
(2) Ultra-mobile personal computer devices: these devices belong to the category of personal computers, have computing and processing functions, and generally also support mobile Internet access. Such terminals include PDA, MID and UMPC devices, for example the iPad.
(3) Portable entertainment devices: these devices can display and play multimedia content. They include audio and video players (such as the iPod), handheld devices, e-book readers, smart toys, and portable in-vehicle navigation devices.
(4) Servers: devices that provide computing services. A server consists of a processor, hard disk, memory, system bus and so on. Its architecture is similar to that of a general-purpose computer, but because it must provide highly reliable services, the requirements on processing capacity, stability, reliability, security, scalability and manageability are higher.
(5) Other electronic devices with data interaction functions.
The apparatus embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the embodiment. Those of ordinary skill in the art can understand and implement the embodiments without creative effort.
Finally, it should be noted that the above embodiments only illustrate the technical solutions of the present invention and do not limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features equivalently replaced, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (10)

1. A method for improving wake-up performance, comprising:
detecting an input signal in the form of speech frames, and determining whether each speech frame is a suspected speech frame or a non-speech frame;
determining a valid speech segment based on the determined suspected speech frames and non-speech frames;
inputting the valid speech segment into an adaptive wake-up model, wherein the adaptive wake-up model is a multi-class classification model obtained by supervised deep neural network learning on collected wake-up-word positive samples and non-wake-up-word negative samples;
obtaining the output of the adaptive wake-up model, and determining, based on the output, whether to wake up a device.
2. The method according to claim 1, wherein determining the valid speech segment based on the determined suspected speech frames and non-speech frames comprises:
if it is determined that a first preset number of consecutive frames of the input signal are suspected speech frames, determining the first speech frame among the first preset number of consecutive frames as the start point of the valid speech segment;
after the start point of the valid speech segment is determined, if it is determined that a second preset number of consecutive frames of the input signal are non-speech frames, or the last speech frame of the input signal is detected, determining the first speech frame among the second preset number of consecutive frames as the end point of the valid speech segment;
extracting the valid speech segment from the input signal based on the determined start point and end point of the valid speech segment.
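The start-point and end-point rules of claim 2 can be sketched in Python as follows. The sketch assumes that the per-frame decisions are already available as a list of booleans (`True` for a suspected speech frame), and that when the input ends before the non-speech run is complete, the last frame of the input is used as the end point; that second choice is an interpretation, and the names `find_valid_segment`, `n_start` and `n_end` are hypothetical.

```python
from typing import List, Optional, Tuple


def find_valid_segment(
    is_speech: List[bool], n_start: int, n_end: int
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the valid speech segment, or None.

    n_start: first preset number of consecutive suspected speech frames.
    n_end:   second preset number of consecutive non-speech frames.
    """
    if n_start < 1 or n_end < 1:
        raise ValueError("preset frame counts must be at least 1")

    # Start point: first frame of the first run of n_start speech frames.
    start = None
    run = 0
    for i, flag in enumerate(is_speech):
        run = run + 1 if flag else 0
        if run == n_start:
            start = i - n_start + 1
            break
    if start is None:
        return None  # keep listening: no qualifying speech run yet

    # End point: first frame of the first run of n_end non-speech frames
    # found after the start point has been confirmed.
    run = 0
    for i in range(start + n_start, len(is_speech)):
        run = run + 1 if not is_speech[i] else 0
        if run == n_end:
            return start, i - n_end + 1
    # Assumption: the input ended first, so its last frame closes the segment.
    return start, len(is_speech) - 1


T, F = True, False
bounds = find_valid_segment([F, F, T, T, T, T, F, T, F, F, F, T], n_start=3, n_end=3)
```

In the example, the isolated non-speech frame at index 6 does not close the segment, because it is not followed by enough consecutive non-speech frames; requiring a run of frames on both sides makes the boundaries robust to single-frame misclassifications.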
3. The method according to claim 2, wherein, after determining whether the speech frame is a suspected speech frame or a non-speech frame, the method further comprises:
if it is determined that a first preset number of consecutive frames of the input signal that are suspected speech frames have not appeared, continuing to detect and judge the input signal until a first preset number of consecutive frames of the input signal are suspected speech frames;
after the start point of the valid speech segment is determined, if it is determined that a second preset number of consecutive frames of the input signal that are non-speech frames have not appeared, continuing to detect and judge the input signal until a second preset number of consecutive frames of the input signal are non-speech frames or until the last speech frame of the input signal is reached.
4. The method according to claim 1, wherein, before detecting the input signal in the form of speech frames, the method further comprises:
preprocessing the input signal to improve the signal-to-noise ratio.
5. The method according to claim 4, wherein preprocessing the input signal to improve the signal-to-noise ratio comprises:
performing analog-to-digital conversion on the acquired analog signal, and configuring the sampling rate and quantization index, to convert it into a digital signal;
enhancing the speech from the target direction of the digital signal, based on the time-domain information, frequency-domain information, delay and/or energy loss of the digital signal, to improve the signal-to-noise ratio.
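As one possible illustration of delay-based enhancement of speech from a target direction, the following Python sketch applies delay-and-sum across microphone channels. Delay-and-sum is chosen here only as a familiar example; the claim does not name a specific algorithm. The sketch assumes already digitized channels of samples and known, non-negative integer sample delays for the target direction; the function name `delay_and_sum` is hypothetical.

```python
from typing import List, Sequence


def delay_and_sum(
    channels: Sequence[Sequence[float]], delays: Sequence[int]
) -> List[float]:
    """Align each channel by its integer sample delay and average them.

    delays[k] is the number of samples by which the target signal arrives
    later at microphone k (assumed known and non-negative).
    """
    if len(channels) != len(delays):
        raise ValueError("one delay per channel is required")
    if not channels:
        return []
    if any(d < 0 for d in delays):
        raise ValueError("delays must be non-negative")
    # Only the overlap that is available on every aligned channel is kept.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    if usable <= 0:
        return []
    return [
        sum(ch[d + t] for ch, d in zip(channels, delays)) / len(channels)
        for t in range(usable)
    ]


source = [1.0, 2.0, 3.0, 4.0]
mic0 = source + [0.0]   # target arrives with no delay
mic1 = [0.0] + source   # target arrives one sample later
enhanced = delay_and_sum([mic0, mic1], delays=[0, 1])
```

After alignment the target signal adds coherently across channels, while sound arriving from other directions, with different delays, does not; that is the intuition behind using delay information to raise the signal-to-noise ratio.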
6. The method according to any one of claims 1-5, wherein obtaining the output of the adaptive wake-up model and determining, based on the output, whether to wake up the device, the device being woken up if the output is the wake-up word, comprises:
obtaining the confidence, output by the adaptive wake-up model, of the valid speech segment for a preset wake-up word, and judging whether the confidence is greater than a preset wake-up word threshold;
if the confidence is greater than or equal to the preset wake-up word threshold, waking up the device;
if the confidence is less than the preset wake-up word threshold, not waking up the device.
7. The method according to claim 6, wherein calculating, based on the adaptive wake-up model, the confidence of the valid speech segment for the preset wake-up word comprises:
calculating the posterior probability of each speech frame in the valid speech segment to obtain the confidence of each speech frame for the preset wake-up word;
calculating, based on the confidence of each speech frame, the confidence of the valid speech segment for the preset wake-up word.
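The confidence computation of claim 7 and the threshold decision of claim 6 can be sketched together in Python. The claims do not state how the per-frame confidences are combined into a segment-level confidence; the arithmetic mean used below is an assumption made only for illustration, and the names `segment_confidence` and `decide_wake` are hypothetical. The per-frame posteriors are assumed to come from the wake-up model.

```python
from typing import Sequence


def segment_confidence(frame_posteriors: Sequence[float]) -> float:
    """Combine per-frame wake-word posteriors into one segment-level score.

    Assumption: arithmetic mean; the claim does not fix the aggregation rule.
    """
    if not frame_posteriors:
        return 0.0
    return sum(frame_posteriors) / len(frame_posteriors)


def decide_wake(frame_posteriors: Sequence[float], threshold: float) -> bool:
    """Wake the device when the segment confidence reaches the threshold."""
    return segment_confidence(frame_posteriors) >= threshold


wake = decide_wake([0.9, 0.8, 0.95], threshold=0.8)
stay_asleep = decide_wake([0.2, 0.3], threshold=0.8)
```

The comparison uses "greater than or equal to", matching the wake-up condition stated in claim 6.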
8. An apparatus for improving wake-up performance, comprising:
a determination module, configured to detect an input signal in the form of speech frames and determine whether each speech frame is a suspected speech frame or a non-speech frame;
a valid speech segment determining module, configured to determine a valid speech segment based on the determined suspected speech frames and non-speech frames;
an adaptive wake-up module, configured to input the valid speech segment into an adaptive wake-up model, wherein the adaptive wake-up model is a binary classification model obtained by supervised deep neural network learning on collected wake-up-word positive samples and non-wake-up-word negative samples;
a result output module, configured to obtain the output of the adaptive wake-up model and determine, based on the output, whether to wake up a device.
9. An electronic device, comprising: at least one processor, and a memory communicatively connected to the at least one processor, wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor to enable the at least one processor to perform the steps of the method according to any one of claims 1 to 7.
10. A storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201910801354.9A 2019-08-28 2019-08-28 Promote the method and apparatus for waking up performance Pending CN110517670A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910801354.9A CN110517670A (en) 2019-08-28 2019-08-28 Promote the method and apparatus for waking up performance

Publications (1)

Publication Number Publication Date
CN110517670A true CN110517670A (en) 2019-11-29

Family

ID=68628694

Country Status (1)

Country Link
CN (1) CN110517670A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105632486A (en) * 2015-12-23 2016-06-01 北京奇虎科技有限公司 Voice wake-up method and device of intelligent hardware
US20170323202A1 (en) * 2016-05-06 2017-11-09 Fujitsu Limited Recognition apparatus based on deep neural network, training apparatus and methods thereof
CN107871506A (en) * 2017-11-15 2018-04-03 北京云知声信息技术有限公司 The awakening method and device of speech identifying function
CN108010515A (en) * 2017-11-21 2018-05-08 清华大学 A kind of speech terminals detection and awakening method and device
CN108877778A (en) * 2018-06-13 2018-11-23 百度在线网络技术(北京)有限公司 Sound end detecting method and equipment

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110807909A (en) * 2019-12-09 2020-02-18 深圳云端生活科技有限公司 Radar and voice processing combined control method
CN111223497A (en) * 2020-01-06 2020-06-02 苏州思必驰信息科技有限公司 Nearby wake-up method and device for terminal, computing equipment and storage medium
CN111223497B (en) * 2020-01-06 2022-04-19 思必驰科技股份有限公司 Nearby wake-up method and device for terminal, computing equipment and storage medium
CN111596882A (en) * 2020-04-02 2020-08-28 云知声智能科技股份有限公司 Distributed array alignment method
CN111596882B (en) * 2020-04-02 2023-05-26 云知声智能科技股份有限公司 Distributed array alignment method
CN111653274A (en) * 2020-04-17 2020-09-11 北京声智科技有限公司 Method, device and storage medium for awakening word recognition
CN111653274B (en) * 2020-04-17 2023-08-04 北京声智科技有限公司 Wake-up word recognition method, device and storage medium
CN111599371A (en) * 2020-05-19 2020-08-28 苏州奇梦者网络科技有限公司 Voice adding method, system, device and storage medium
CN111599371B (en) * 2020-05-19 2023-10-20 苏州奇梦者网络科技有限公司 Voice adding method, system, device and storage medium
EP3923272A1 (en) * 2020-06-10 2021-12-15 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method and apparatus for adapting a wake-up model
US11587550B2 (en) 2020-06-10 2023-02-21 Apollo Intelligent Connectivity (Beijing) Technology Co., Ltd. Method and apparatus for outputting information
CN111653276A (en) * 2020-06-22 2020-09-11 四川长虹电器股份有限公司 Voice awakening system and method
CN111833869B (en) * 2020-07-01 2022-02-11 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN111833869A (en) * 2020-07-01 2020-10-27 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN111899722A (en) * 2020-08-11 2020-11-06 Oppo广东移动通信有限公司 Voice processing method and device and storage medium
CN111899722B (en) * 2020-08-11 2024-02-06 Oppo广东移动通信有限公司 Voice processing method and device and storage medium
CN112151015A (en) * 2020-09-03 2020-12-29 腾讯科技(深圳)有限公司 Keyword detection method and device, electronic equipment and storage medium
CN112151015B (en) * 2020-09-03 2024-03-12 腾讯科技(深圳)有限公司 Keyword detection method, keyword detection device, electronic equipment and storage medium
CN112114886A (en) * 2020-09-17 2020-12-22 北京百度网讯科技有限公司 Method and device for acquiring false wake-up audio
CN112114886B (en) * 2020-09-17 2024-03-29 北京百度网讯科技有限公司 Acquisition method and device for false wake-up audio
CN112669822A (en) * 2020-12-16 2021-04-16 爱驰汽车有限公司 Audio processing method and device, electronic equipment and storage medium
CN117012206A (en) * 2023-10-07 2023-11-07 山东省智能机器人应用技术研究院 Man-machine voice interaction system
CN117012206B (en) * 2023-10-07 2024-01-16 山东省智能机器人应用技术研究院 Man-machine voice interaction system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant after: Sipic Technology Co.,Ltd.

Address before: 215123 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Applicant before: AI SPEECH Co.,Ltd.

RJ01 Rejection of invention patent application after publication

Application publication date: 20191129