CN110970016A - Awakening model generation method, intelligent terminal awakening method and device - Google Patents

Awakening model generation method, intelligent terminal awakening method and device

Info

Publication number
CN110970016A
Authority
CN
China
Prior art keywords
audio
awakening
wake
word
frame
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911028892.5A
Other languages
Chinese (zh)
Other versions
CN110970016B (en)
Inventor
白二伟
倪合强
宋志�
姚寿柏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Jiangsu Biying Technology Co ltd
Jiangsu Suning Cloud Computing Co ltd
Original Assignee
Suning Cloud Computing Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suning Cloud Computing Co Ltd filed Critical Suning Cloud Computing Co Ltd
Priority to CN201911028892.5A priority Critical patent/CN110970016B/en
Publication of CN110970016A publication Critical patent/CN110970016A/en
Priority to CA3158930A priority patent/CA3158930A1/en
Priority to PCT/CN2020/105998 priority patent/WO2021082572A1/en
Application granted granted Critical
Publication of CN110970016B publication Critical patent/CN110970016B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/30 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)
  • Telephone Function (AREA)

Abstract

The invention discloses a wake-up model generation method, an intelligent terminal wake-up method and corresponding devices, belonging to the technical field of voice wake-up. The wake-up model generation method comprises the following steps: labeling the start-stop time of each wake-up word contained in the wake-up word audio of a sample audio set to obtain labeled wake-up word audio, wherein the duration of the wake-up word audio is not fixed; adding noise to the labeled wake-up word audio using negative sample audio containing background noise to obtain positive sample audio; extracting a plurality of audio frame features from the positive sample audio and the negative sample audio respectively, and labeling frame labels for the positive sample audio and the negative sample audio to obtain a plurality of audio training samples; and training a recurrent neural network with the plurality of audio training samples to generate the wake-up model. Because the embodiments of the invention train a recurrent neural network on variable-length input, samples need not be manually clipped, and the wake-up performance of the intelligent terminal is improved.

Description

Awakening model generation method, intelligent terminal awakening method and device
Technical Field
The invention relates to the technical field of voice wake-up, and in particular to a wake-up model generation method, an intelligent terminal wake-up method and an intelligent terminal wake-up device.
Background
At present, voice wake-up is applied in a wide range of fields, such as robots, mobile phones, wearable devices, smart homes and vehicles. Different intelligent terminals can have different wake-up words; when the user says the specific wake-up word, the intelligent terminal switches from the standby state to the working state. Only when this switch is completed quickly and accurately can the user go on to use the terminal's other functions almost without perceiving a delay, so improving the wake-up performance is crucial.
In the prior art, waking up an intelligent terminal mainly relies on neural-network-based wake-up technology. In the data preparation stage, positive sample data must be manually clipped to a fixed duration t, and a recorded wake-up word cannot exceed this duration, which greatly increases labor cost and makes slowly spoken wake-up speech unrecognizable. Moreover, a wake-up word may be short, so the neural network is trained insufficiently, which ultimately degrades the terminal's wake-up performance. In addition, in the terminal wake-up stage, the neural network must process an audio span of duration t in the terminal's memory every time, so a large amount of data repeated between two adjacent spans of duration t must be reprocessed, increasing the terminal's computation time and power consumption.
Disclosure of Invention
The invention aims to solve at least one of the technical problems in the prior art or the related art, and provides a wake-up model generation method, an intelligent terminal wake-up method and an intelligent terminal wake-up device.
The embodiment of the invention provides the following specific technical scheme:
in a first aspect, a method for generating a wake-up model is provided, where the method includes:
marking the start-stop time of each awakening word contained in the awakening word audio in the sample audio set to obtain the marked awakening word audio, wherein the time length of the awakening word audio is not fixed;
adding noise to the labeled wake-up word audio by using negative sample audio containing background noise to obtain positive sample audio;
respectively extracting a plurality of audio frame characteristics from the positive sample audio and the negative sample audio, and labeling frame labels of the positive sample audio and the negative sample audio to obtain a plurality of audio training samples;
and training a recurrent neural network by using the plurality of audio training samples to generate a wake-up model.
Further, labeling the start-stop time of each wake-up word included in the wake-up word audio in the sample audio set to obtain the labeled wake-up word audio includes:
identifying at least one key audio segment in the wake word audio that contains only the wake word;
and respectively labeling the start-stop time of each awakening word according to the respective start-stop time of each key audio segment to obtain the labeled audio.
Further, adding noise to the labeled wake-up word audio by using a negative sample audio containing background noise to obtain a positive sample audio includes:
intercepting, from the negative sample audio, a negative sample audio segment with the same duration as the labeled wake-up word audio;
and adjusting the mean amplitude of the negative sample audio segment, and mixing the adjusted negative sample audio segment into the labeled audio to add noise, obtaining the positive sample audio.
Further, the frame label includes a positive label, a negative label, and a middle label, and the labeling of the frame label is performed on the positive sample audio and the negative sample audio to obtain a plurality of audio training samples, including:
for each audio frame of the positive sample audio, judging whether the audio frame falls partly or wholly within the start-stop period of any wake-up word, and if so, labeling the audio frame with the middle label;
if not, judging whether the previous audio frame fell within the start-stop period of any wake-up word and the current audio frame is the first one not to contain that wake-up word's end time; if so, labeling the audio frame with the positive label, otherwise labeling it with the negative label;
and for each audio frame of the negative sample audio, labeling the audio frame with the negative label.
In a second aspect, a method for waking up an intelligent terminal is provided, where the method includes:
the intelligent terminal acquires real-time audio at the current moment;
extracting a plurality of audio frame features from the real-time audio;
sequentially inputting the extracted audio frame features into a pre-deployed wake-up model, and computing in combination with the state saved by the wake-up model at the previous moment to obtain a wake-up result indicating whether the real-time audio contains the wake-up word;
wherein the wake-up model is generated by using the wake-up model generation method of the first aspect.
In a third aspect, an apparatus for generating a wake model is provided, the apparatus comprising:
the first labeling module is used for labeling the start-stop time of each awakening word contained in the awakening word audio in the sample audio set to obtain the labeled awakening word audio, wherein the time length of the awakening word audio is not fixed;
the noise adding processing module is used for adding noise to the marked awakening word audio by using negative sample audio containing background noise to obtain positive sample audio;
the characteristic extraction module is used for respectively extracting a plurality of audio frame characteristics from the positive sample audio and the negative sample audio;
the second labeling module is used for labeling the frame labels of the positive sample audio and the negative sample audio to obtain a plurality of audio training samples;
and the model generation module is used for training the recurrent neural network by using the plurality of audio training samples to generate a wake-up model.
Further, the first labeling module is specifically configured to:
identifying at least one key audio segment in the wake word audio that contains only the wake word;
and respectively labeling the start-stop time of each awakening word according to the respective start-stop time of each key audio segment to obtain the labeled audio.
Further, the noise adding processing module is specifically configured to:
intercept, from the negative sample audio, a negative sample audio segment with the same duration as the labeled wake-up word audio;
and adjust the mean amplitude of the negative sample audio segment, and mix the adjusted negative sample audio segment into the labeled audio to add noise, obtaining the positive sample audio.
Further, the frame tag includes a positive tag, a negative tag, and a middle tag, and the second labeling module is specifically configured to:
for each audio frame of the positive sample audio, judge whether the audio frame falls partly or wholly within the start-stop period of any wake-up word, and if so, label the audio frame with the middle label;
if not, judge whether the previous audio frame fell within the start-stop period of any wake-up word and the current audio frame is the first one not to contain that wake-up word's end time; if so, label the audio frame with the positive label, otherwise label it with the negative label;
and for each audio frame of the negative sample audio, label the audio frame with the negative label.
In a fourth aspect, an intelligent terminal wake-up device is provided, the device including:
the audio acquisition module is used for acquiring real-time audio at the current moment by the intelligent terminal;
the feature extraction module is used for extracting a plurality of audio frame features from the real-time audio;
the model identification module is used for sequentially inputting the extracted audio frame characteristics into a pre-deployed awakening model and calculating by combining the state stored at the previous moment of the awakening model so as to obtain an awakening result of whether the real-time audio contains awakening words or not;
wherein the wake-up model is generated by using the wake-up model generation method of the first aspect.
The technical scheme provided by the embodiment of the invention has the following beneficial effects:
1. because the duration of the wake-up word audio is not fixed, the wake-up word audio serves as variable-length input data for training the recurrent neural network (RNN), which avoids manual clipping of data, reduces manual data processing, saves labor cost, and enables recognition of wake-up speech spoken at a slower pace;
2. the sample audio set can contain long audio, so that the RNN can be trained uninterruptedly, the recognition precision of the awakening words is improved, and the awakening effect of the intelligent terminal is improved;
3. in the terminal awakening process, for each frame of audio newly added into the terminal memory, the old data does not need to be repeatedly calculated, and the calculation time and the power consumption of the terminal are reduced.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic flowchart illustrating a method for generating a wake-up model according to an embodiment of the present invention;
fig. 2 is a schematic diagram illustrating a start-stop time labeling of a wakeup word according to an embodiment of the present invention;
FIG. 3 is a schematic diagram illustrating MFCC feature vector acquisition provided by an embodiment of the present invention;
FIG. 4 is a labeling diagram of a frame tag according to an embodiment of the present invention;
fig. 5 is a flowchart illustrating an intelligent terminal wake-up method according to an embodiment of the present invention;
fig. 6a illustrates a schematic diagram of the wake-up process in the terminal memory when t = 1 according to an embodiment of the present invention;
fig. 6b illustrates a schematic diagram of the wake-up process in the terminal memory when t = M according to an embodiment of the present invention;
fig. 7 is a schematic structural diagram of a wake model generation apparatus according to an embodiment of the present invention;
fig. 8 shows a schematic structural diagram of an intelligent terminal wake-up apparatus according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
In the description of the present application, it is to be understood that the terms "first," "second," and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. In addition, in the description of the present application, "a plurality" means two or more unless otherwise specified.
Example one
An embodiment of the present invention provides a method for generating a wake-up model, where the method may be applied to a server, and as shown in fig. 1, the method may include the steps of:
and 101, marking the start-stop time of each awakening word contained in the awakening word audio in the sample audio set to obtain the marked awakening word audio, wherein the time length of the awakening word audio is not fixed.
The sample audio set comprises a plurality of wake-up word audios, and each wake-up word audio contains at least one wake-up word. In a specific implementation, a plurality of wake-up word audios may be recorded in a quiet environment; when recording, a certain time interval is kept between adjacent wake-up words, and every wake-up word has the same content, for example "biu min biu". In this embodiment, each wake-up word audio lasts roughly several seconds to several minutes, and each wake-up word lasts roughly 1 second.
Specifically, at least one key audio segment only containing the awakening word in the awakening word audio is identified, and the start-stop time of each awakening word is respectively labeled according to the respective start-stop time of each key audio segment to obtain a labeled audio. In specific implementation, the start-stop time of each awakening word in the awakening word audio can be labeled on the server in a manual mode, so that the labeled awakening word audio is obtained.
The start-stop time comprises a start time and an end time, and each wake-up word is labeled with both; for the N-th wake-up word they may be denoted start_N and end_N. Fig. 2 shows a schematic diagram of the start-stop time labeling of wake-up words provided by an embodiment of the present invention, where the black segments represent the wake-up words.
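The patent leaves the segmentation technique open (as noted above, labeling may even be done manually on the server). As one hedged illustration, assuming quiet recordings with silence between wake words, short-time energy thresholding can propose candidate key segments; the function name, frame sizes and threshold below are hypothetical choices, not part of the patent:

```python
import numpy as np


def find_wake_word_segments(audio, sr, frame_ms=25, hop_ms=10, threshold_ratio=0.1):
    """Return (start_sec, end_sec) spans whose short-time energy exceeds a threshold.

    A minimal sketch: in a quiet recording, high-energy regions are assumed to
    be the wake words; threshold_ratio is a hypothetical tuning parameter.
    """
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    energies = np.array([
        np.mean(audio[i:i + frame] ** 2)
        for i in range(0, len(audio) - frame, hop)
    ])
    active = energies > threshold_ratio * energies.max()

    segments, start = [], None
    for idx, on in enumerate(active):
        if on and start is None:
            start = idx                                   # segment opens
        elif not on and start is not None:
            segments.append((start * hop / sr, (idx * hop + frame) / sr))
            start = None                                  # segment closes
    if start is not None:                                 # audio ends mid-segment
        segments.append((start * hop / sr, len(audio) / sr))
    return segments
```

Each returned span would then supply the start_N/end_N labels of one wake-up word, subject to manual review.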
And 102, denoising the marked awakening word audio by using the negative sample audio containing the background noise to obtain a positive sample audio.
Background noise in different scenes can be prerecorded to obtain the negative sample audio; various scenes can be used, such as a scene with a television playing, a cooking scene, and other scenes.
Specifically, a negative sample audio segment with the same duration as the labeled wake-up word audio is intercepted from the negative sample audio, the amplitude average value of the negative sample audio segment is adjusted, and the adjusted negative sample audio segment is used for mixing and adding noise to the labeled audio to obtain a positive sample audio.
In a specific implementation, the mean amplitude of the negative sample audio segment may first be adjusted to equal the mean amplitude of the labeled audio and then reduced to a preset percentage of it, where the preset percentage may be between 5% and 10%.
In this embodiment, to augment the positive sample set, N negative sample audios may be used to add noise to each of the M wake-up word audios, obtaining N × M positive sample audios.
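As a sketch of this mixing step in Python: only the equal-duration cut and the 5%-10% amplitude ratio come from the description; the function name, the 8% default, and the assumption that the noise recording is at least as long as the speech are added here for illustration.

```python
import numpy as np


def mix_noise(wake_audio, noise_audio, noise_percent=0.08, rng=None):
    """Cut a noise segment as long as the labeled wake-word audio, scale its
    mean amplitude to a preset percentage (5%-10% in the description) of the
    speech's mean amplitude, and mix it into the speech. The start/end labels
    of the wake words are unchanged by the mixing."""
    rng = rng or np.random.default_rng()
    offset = rng.integers(0, len(noise_audio) - len(wake_audio) + 1)
    segment = noise_audio[offset:offset + len(wake_audio)].astype(float)

    speech_level = np.mean(np.abs(wake_audio))
    noise_level = np.mean(np.abs(segment)) + 1e-12   # avoid divide-by-zero
    segment *= (speech_level / noise_level) * noise_percent

    return wake_audio + segment


# Data set augmentation: N noise recordings x M wake-word recordings -> N*M positives
# positives = [mix_noise(w, n) for w in wake_audios for n in noise_audios]
```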
103, extracting a plurality of audio frame features from the positive sample audio and the negative sample audio respectively, and labeling the frame labels of the positive sample audio and the negative sample audio to obtain a plurality of audio training samples.
Specifically, the process of extracting a plurality of audio frame features from the positive sample audio and the negative sample audio respectively may include:
the method includes extracting a plurality of audio frame features from each audio frame of a positive sample audio and each audio frame of a negative sample audio, and generating a feature spectrogram of the positive sample audio and a feature spectrogram of the negative sample audio, where the audio frame features may specifically be Mel-Frequency Cepstrum Coefficient features, the feature spectrogram is a Mel-Frequency Cepstrum Coefficient (MFCC) spectrogram, and each feature vector in the Mel-Frequency Cepstrum Coefficient represents an MFCC feature vector of each audio frame.
Fig. 3 shows a schematic diagram of obtaining an MFCC feature vector according to an embodiment of the present invention. As shown in fig. 3, for each positive sample audio and each negative sample audio, the window width W, the moving step S and the mel-frequency cepstrum coefficient C can be preset respectivelyMelAnd calculating the characteristics of the mel frequency cepstrum coefficients to generate a mel cepstrum map.
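For instance, using librosa (an assumption; the patent names no toolkit), the MFCC spectrogram described above could be computed as follows, with hypothetical values for W, S and C_Mel:

```python
import librosa

# Hypothetical parameter choices for the window width W, moving step S and
# coefficient count C_Mel from the description.
y, sr = librosa.load("positive_sample.wav", sr=16000)
mfcc = librosa.feature.mfcc(
    y=y, sr=sr,
    n_mfcc=13,                   # C_Mel: number of cepstral coefficients
    n_fft=int(0.025 * sr),       # W: 25 ms analysis window
    hop_length=int(0.010 * sr),  # S: 10 ms moving step
)
# mfcc has shape (C_Mel, num_frames); column t is the feature vector of frame t
```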
Specifically, the frame labels are labeled for the positive sample audio and the negative sample audio, where the frame labels include a positive label, a negative label, and a middle label, and the process may include:
for each audio frame of the positive sample audio, judging whether the audio frame falls partly or wholly within the start-stop period of any wake-up word, and if so, labeling the frame with the middle label; if not, judging whether the previous audio frame fell within the start-stop period of any wake-up word and the current frame is the first one not to contain that wake-up word's end time; if so, labeling the frame with the positive label, otherwise with the negative label. For each audio frame of the negative sample audio, the frame is labeled with the negative label.
In this embodiment, the Positive label, the Negative label, and the Middle label may be respectively denoted as "Positive", "Negative", "Middle", or "1", "-1", and "0".
Fig. 4 is a schematic diagram illustrating the labeling of frame tags according to an embodiment of the present invention. As shown in fig. 4, let t denote the start time of the analysis window and w its width. For each audio frame of the positive sample audio: if the frame falls entirely outside the start-stop period of every wake-up word, that is, (end_(N-1) < t) && (t + w < start_N), the frame is labeled "Negative"; if the frame falls partly or wholly within the start-stop period of some wake-up word, that is, (start_N < t + w) && (t < end_N), the frame is labeled "Middle"; and if the previous frame fell within the start-stop period of a wake-up word while the current frame is the first one not to contain that word's end time, that is, (end_N ≤ t) && (t - 1 < end_N), the frame is labeled "Positive".
It will be appreciated that each audio frame of Negative sample audio is labeled "Negative".
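Putting the three rules together, a per-frame labeling sketch might read as below; the hop of 1 between frames is taken from the t - 1 in the condition above, and the class-index encoding is an assumption introduced here:

```python
# Class indices standing in for the "Negative"/"Middle"/"Positive" tags.
NEGATIVE, MIDDLE, POSITIVE = 0, 1, 2


def label_frame(t, w, words):
    """Label the analysis window [t, t + w) against the labeled wake-word
    spans, mirroring the three rules above (hop between frames assumed 1).
    words is a list of (start_N, end_N) pairs."""
    for start_n, end_n in words:
        if start_n < t + w and t < end_n:   # overlaps a wake word -> Middle
            return MIDDLE
        if end_n <= t and t - 1 < end_n:    # first window past end_N -> Positive
            return POSITIVE
    return NEGATIVE


def label_positive_sample(num_frames, w, words):
    return [label_frame(t, w, words) for t in range(num_frames)]

# Every frame of a negative sample is simply labeled NEGATIVE.
```

Applied to the MFCC spectrogram above, num_frames is the number of feature columns and w the window width, expressed in the same time units as the labeled start/end times.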
And 104, training the recurrent neural network by using a plurality of audio training samples to generate a wake-up model.
Specifically, for each audio training sample, the frame feature of its t-th audio frame is used as the input to the input layer of the recurrent neural network at time t, and the frame label of that audio frame is used as the target output of the output layer at time t. The state value S_t of the hidden layer at time t is computed in combination with the hidden layer's previous state value S_(t-1), and the state values of the hidden layer at all times are computed in sequence, generating the wake-up model.
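A minimal PyTorch training sketch of this step follows. The patent specifies only a recurrent neural network trained on variable-length per-frame features and labels, so the GRU cell, layer sizes, optimizer, and the `training_samples` placeholder (an iterable of (features, labels) pairs built from the steps above) are all assumptions:

```python
import torch
import torch.nn as nn


class WakeModel(nn.Module):
    """Per-frame classifier: MFCC frame -> {Negative, Middle, Positive}."""

    def __init__(self, n_mfcc=13, hidden=64, n_labels=3):
        super().__init__()
        self.rnn = nn.GRU(n_mfcc, hidden, batch_first=True)
        self.out = nn.Linear(hidden, n_labels)

    def forward(self, x, state=None):
        # x: (batch, num_frames, n_mfcc); num_frames may differ per sample,
        # so sequences of any length can be fed in (batch of 1 here).
        h, state = self.rnn(x, state)   # S_t computed from S_(t-1)
        return self.out(h), state


model = WakeModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for features, labels in training_samples:        # one variable-length sample at a time
    logits, _ = model(features.unsqueeze(0))     # features: (T, n_mfcc) -> (1, T, 3)
    loss = loss_fn(logits.squeeze(0), labels)    # labels: (T,) in {0, 1, 2}
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```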
It should be noted that, after the wake-up model is generated, the embodiment of the present invention may deploy the wake-up model to the intelligent terminal, so as to perform wake-up processing on the intelligent terminal by using the wake-up model.
The embodiment of the invention provides a wake-up model generation method in which the duration of the wake-up word audio is not fixed and the wake-up word audio is used as variable-length input data to train a recurrent neural network (RNN). This avoids manually clipping the data, saves labor cost, and enables recognition of speech spoken at a slower pace. Meanwhile, the sample audio set can contain long audios, so the RNN can be trained without interruption, improving the recognition accuracy of the wake-up words and the wake-up performance of the intelligent terminal.
Example two
An embodiment of the present invention provides an intelligent terminal wake-up method, which may be applied to an intelligent terminal, where the intelligent terminal is pre-deployed with a wake-up model generated based on the wake-up model generation method in the first embodiment, and as shown in fig. 5, the method may include the steps of:
501, the intelligent terminal obtains the real-time audio at the current moment.
Specifically, the intelligent terminal may utilize a microphone to capture real-time audio at the current time in the scene. The intelligent terminal includes but is not limited to a robot, a smart phone, a wearable device, a smart home, a vehicle-mounted terminal, and the like.
A plurality of audio frame features are extracted from real-time audio 502.
Specifically, with a preset window width W, moving step S and number of Mel-frequency cepstrum coefficients C_Mel, Mel-frequency cepstrum coefficient features are extracted from each audio frame of the real-time audio to obtain a plurality of audio frame features.
Further, to improve the recognition accuracy of the wake-up word and the wake-up effect, before executing step 502, the method provided in the embodiment of the present invention may further include:
the real-time audio at the current time is preprocessed, wherein the preprocessing includes but is not limited to echo cancellation and noise reduction processing.
503, sequentially inputting the extracted audio frame features into a pre-deployed wake-up model, and calculating by combining the state saved at the previous moment of the wake-up model to obtain a wake-up result of whether the real-time audio contains a wake-up word.
Specifically, the audio frame features are input into the wake-up model sequentially, in the time order of the plurality of audio frame features extracted from the real-time audio, and the computation combines the state saved by the wake-up model at the previous moment. From the output of the wake-up model, the frame labels corresponding to the plurality of audio frames of the current real-time audio and the state of the wake-up model at the current moment are obtained; the current state is saved, and the wake-up result of whether the real-time audio contains the wake-up word is determined from the frame labels of the plurality of audio frames: when the frame labels contain a positive label, the real-time audio is determined to contain the wake-up word.
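Reusing the hypothetical WakeModel sketched earlier, the stateful computation this paragraph describes might look as follows: only the newly arrived frames are fed in, and the saved hidden state stands in for all earlier audio.

```python
import torch

state = None  # S_0 = 0: PyTorch initializes a None GRU state to zeros


def process_new_frames(model, new_frames, state):
    """Feed only the frames newly added to memory; the saved hidden state
    summarizes all earlier audio, so old data is never recomputed."""
    with torch.no_grad():
        logits, state = model(new_frames.unsqueeze(0), state)
        labels = logits.squeeze(0).argmax(dim=-1)
    woke = bool((labels == 2).any())   # 2 = POSITIVE under the mapping above
    return woke, state
```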
The following describes the method for waking up an intelligent terminal according to an embodiment of the present invention with reference to fig. 6a to 6 b.
Assume the memory of the intelligent terminal can store only N frames of data at a time. As shown in fig. 6a, when the intelligent terminal is powered on for the first time, the real-time audio at time t = 1 is loaded into memory; the previous state S_0 of the RNN in the wake-up model is 0, so the real-time audio features at time t = 1 are input into the RNN of the wake-up model to obtain the RNN state S_1 at time t = 1 and output the recognition result. As shown in fig. 6b, at any later time t = M, where M is greater than 1, only the real-time audio frame features newly added to memory at time t = M need to be input into the RNN of the wake-up model, and the computation combines the state S_(M-1) saved by the RNN at the previous moment, without repeatedly computing all the data in memory.
In this embodiment, because current intelligent terminals mostly use low-end chips, the terminal's memory capacity is limited. In the prior art, in the terminal wake-up stage, the neural network must process an audio span of duration t in the terminal's memory every time, so a large amount of data repeated between two adjacent spans must be processed, increasing the terminal's computation time and power consumption. The invention judges whether the real-time audio contains the wake-up word by using an RNN wake-up model with variable-length input, without repeatedly computing old data, thereby reducing the amount of computation, speeding up processing and lowering power consumption.
EXAMPLE III
As an implementation of the wake-up model generation method provided in the first embodiment, an embodiment of the present invention provides a wake-up model generation apparatus, as shown in fig. 7, the apparatus includes:
a first labeling module 71, configured to label start and end times of each wakeup word included in a wakeup word audio in the sample audio set to obtain a labeled wakeup word audio, where a time length of the wakeup word audio is not fixed;
a noise adding processing module 72, configured to add noise to the labeled wake-up word audio by using a negative sample audio containing background noise to obtain a positive sample audio;
a feature extraction module 73, configured to extract a plurality of audio frame features from the positive sample audio and the negative sample audio, respectively;
a second labeling module 74, configured to label frame labels for the positive sample audio and the negative sample audio to obtain multiple audio training samples;
and a model generating module 75, configured to train the recurrent neural network using a plurality of audio training samples, and generate a wake-up model.
Further, the first labeling module 71 is specifically configured to:
identifying at least one key audio segment in the wake word audio that only contains the wake word;
and respectively labeling the start-stop time of each awakening word according to the respective start-stop time of each key audio segment to obtain labeled audio.
Further, the noise processing module 72 is specifically configured to:
intercepting a negative sample audio frequency segment with the same time length as the marked awakening word audio frequency from the negative sample audio frequency;
and adjusting the amplitude mean value of the negative sample audio frequency segment, and mixing and adding noise to the marked audio frequency by using the adjusted negative sample audio frequency segment to obtain a positive sample audio frequency.
Further, the frame tags include a positive tag, a negative tag, and a middle tag, and the second labeling module 74 is specifically configured to:
judging whether part or all of the audio frames fall into the start-stop time period of any awakening word or not aiming at each audio frame of the positive sample audio, and if so, marking the audio frames as middle labels;
if not, judging whether the previous audio frame of the audio frame falls into the start-stop time period of any awakening word or not, if so, marking the audio frame as a positive label, otherwise, marking the audio frame as a negative label, and if not, judging whether the previous audio frame of the audio frame falls into the start-stop time period of any awakening word or not, and if not, marking the audio frame as a negative label;
for each audio frame of negative sample audio, the audio frame is marked as a negative label.
The wake-up model generation device provided by the embodiment of the invention belongs to the same inventive concept as the wake-up model generation method provided by the first embodiment of the invention, can execute the wake-up model generation method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the wake-up model generation method. For details of the technology that are not described in detail in the embodiments of the present invention, reference may be made to the method for generating the wake-up model provided in the embodiments of the present invention, which is not described herein again.
Example four
As an implementation of the method for waking up an intelligent terminal provided in the second embodiment, an embodiment of the present invention provides an apparatus for waking up an intelligent terminal, where as shown in fig. 8, the apparatus includes:
the audio acquisition module 81 is used for the intelligent terminal to acquire real-time audio at the current moment;
a feature extraction module 82, configured to extract a plurality of audio frame features from the real-time audio;
the model identification module 83 is configured to sequentially input the extracted multiple audio frame features into a pre-deployed wake-up model, and perform calculation by combining a state stored at a previous moment of the wake-up model to obtain a wake-up result indicating whether a real-time audio contains a wake-up word;
the wake-up model is generated by using the wake-up model generation method in the first embodiment.
Further, in order to improve the recognition accuracy of the wake-up word and improve the wake-up effect, the apparatus may further include:
and the preprocessing module is used for preprocessing the real-time audio at the current moment, wherein the preprocessing includes but is not limited to echo cancellation and noise reduction processing.
The feature extraction module 82 is further configured to extract a plurality of audio frame features from the pre-processed real-time audio.
The intelligent terminal wake-up device provided by the embodiment of the invention belongs to the same inventive concept as the intelligent terminal wake-up method provided by the second embodiment of the invention, can execute the intelligent terminal wake-up method provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of executing the intelligent terminal wake-up method. For technical details that are not described in detail in the embodiments of the present invention, reference may be made to the method for waking up an intelligent terminal provided in the embodiments of the present invention, and details are not described herein again.
In addition, another embodiment of the present invention further provides a computer device, including:
one or more processors;
a memory;
a program stored in the memory, which when executed by the one or more processors, causes the processors to perform the steps of the wake model generation method as described in the above embodiments.
In addition, another embodiment of the present invention further provides a computer device, including:
one or more processors;
a memory;
a program stored in the memory, which when executed by the one or more processors, causes the processors to perform the steps of the intelligent terminal wake-up method as described in the above embodiments.
Furthermore, another embodiment of the present invention further provides a computer-readable storage medium, which stores a program, and when the program is executed by a processor, the program causes the processor to execute the steps of the wake model generation method according to the above embodiment.
In addition, another embodiment of the present invention further provides a computer-readable storage medium, which stores a program, and when the program is executed by a processor, the program causes the processor to perform the steps of the intelligent terminal wake-up method according to the above embodiment.
As will be appreciated by one of skill in the art, embodiments of the present invention may be provided as a method, apparatus, or computer program product. Accordingly, embodiments of the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, embodiments of the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
Embodiments of the present invention are described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the embodiments of the invention.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (10)

1. A wake model generation method, the method comprising:
marking the start-stop time of each awakening word contained in the awakening word audio in the sample audio set to obtain the marked awakening word audio, wherein the time length of the awakening word audio is not fixed;
adding noise to the labeled wake-up word audio by using negative sample audio containing background noise to obtain positive sample audio;
respectively extracting a plurality of audio frame characteristics from the positive sample audio and the negative sample audio, and labeling frame labels of the positive sample audio and the negative sample audio to obtain a plurality of audio training samples;
and training a recurrent neural network by using the plurality of audio training samples to generate a wake-up model.
2. The method according to claim 1, wherein labeling the start-stop time of each wake-up word included in the wake-up word audio in the sample audio set to obtain the labeled wake-up word audio comprises:
identifying at least one key audio segment in the wake word audio that contains only the wake word;
and respectively labeling the start-stop time of each awakening word according to the respective start-stop time of each key audio segment to obtain the labeled audio.
3. The method of claim 1, wherein adding noise to the labeled wake-up word audio by using a negative sample audio containing background noise to obtain a positive sample audio comprises:
intercepting, from the negative sample audio, a negative sample audio segment with the same duration as the labeled wake-up word audio;
and adjusting the mean amplitude of the negative sample audio segment, and mixing the adjusted negative sample audio segment into the labeled audio to add noise, obtaining the positive sample audio.
4. The method according to any one of claims 1 to 3, wherein the frame labels comprise a positive label, a negative label and a middle label, and the labeling of the frame labels for the positive sample audio and the negative sample audio to obtain a plurality of audio training samples comprises:
for each audio frame of the positive sample audio, judging whether the audio frame falls partly or wholly within the start-stop period of any wake-up word, and if so, labeling the audio frame with the middle label;
if not, judging whether the previous audio frame fell within the start-stop period of any wake-up word and the current audio frame is the first one not to contain that wake-up word's end time; if so, labeling the audio frame with the positive label, otherwise labeling it with the negative label;
and for each audio frame of the negative sample audio, labeling the audio frame with the negative label.
5. An intelligent terminal awakening method is characterized by comprising the following steps:
the intelligent terminal acquires real-time audio at the current moment;
extracting a plurality of audio frame features from the real-time audio;
sequentially inputting the extracted audio frame features into a pre-deployed wake-up model, and computing in combination with the state saved by the wake-up model at the previous moment to obtain a wake-up result indicating whether the real-time audio contains the wake-up word;
wherein the wake-up model is generated using the wake-up model generation method of any one of claims 1 to 4.
6. An apparatus for wake-up model generation, the apparatus comprising:
the first labeling module is used for labeling the start-stop time of each awakening word contained in the awakening word audio in the sample audio set to obtain the labeled awakening word audio, wherein the time length of the awakening word audio is not fixed;
the noise adding processing module is used for adding noise to the marked awakening word audio by using negative sample audio containing background noise to obtain positive sample audio;
the characteristic extraction module is used for respectively extracting a plurality of audio frame characteristics from the positive sample audio and the negative sample audio;
the second labeling module is used for labeling the frame labels of the positive sample audio and the negative sample audio to obtain a plurality of audio training samples;
and the model generation module is used for training the recurrent neural network by using the plurality of audio training samples to generate a wake-up model.
7. The apparatus according to claim 6, wherein the first labeling module is specifically configured to:
identifying at least one key audio segment in the wake word audio that contains only the wake word;
and respectively labeling the start-stop time of each awakening word according to the respective start-stop time of each key audio segment to obtain the labeled audio.
8. The apparatus of claim 6, wherein the noise adding processing module is specifically configured to:
intercept, from the negative sample audio, a negative sample audio segment with the same duration as the labeled wake-up word audio;
and adjust the mean amplitude of the negative sample audio segment, and mix the adjusted negative sample audio segment into the labeled audio to add noise, obtaining the positive sample audio.
9. The apparatus according to any one of claims 6 to 8, wherein the frame tag comprises a positive tag, a negative tag, and a middle tag, and the second labeling module is specifically configured to:
for each audio frame of the positive sample audio, judge whether the audio frame falls partly or wholly within the start-stop period of any wake-up word, and if so, label the audio frame with the middle label;
if not, judge whether the previous audio frame fell within the start-stop period of any wake-up word and the current audio frame is the first one not to contain that wake-up word's end time; if so, label the audio frame with the positive label, otherwise label it with the negative label;
and for each audio frame of the negative sample audio, label the audio frame with the negative label.
10. An intelligent terminal awakening device, characterized in that the device includes:
the audio acquisition module is used for acquiring real-time audio at the current moment by the intelligent terminal;
the feature extraction module is used for extracting a plurality of audio frame features from the real-time audio;
the model identification module is used for sequentially inputting the extracted audio frame characteristics into a pre-deployed awakening model and calculating by combining the state stored at the previous moment of the awakening model so as to obtain an awakening result of whether the real-time audio contains awakening words or not;
wherein the wake-up model is generated using the wake-up model generation method of any one of claims 1 to 4.
CN201911028892.5A 2019-10-28 2019-10-28 Awakening model generation method, intelligent terminal awakening method and device Active CN110970016B (en)

Priority Applications (3)

Application Number Priority Date Filing Date Title
CN201911028892.5A CN110970016B (en) 2019-10-28 2019-10-28 Awakening model generation method, intelligent terminal awakening method and device
CA3158930A CA3158930A1 (en) 2019-10-28 2020-07-30 Arousal model generating method, intelligent terminal arousing method, and corresponding devices
PCT/CN2020/105998 WO2021082572A1 (en) 2019-10-28 2020-07-30 Wake-up model generation method, smart terminal wake-up method, and devices

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911028892.5A CN110970016B (en) 2019-10-28 2019-10-28 Awakening model generation method, intelligent terminal awakening method and device

Publications (2)

Publication Number Publication Date
CN110970016A true CN110970016A (en) 2020-04-07
CN110970016B CN110970016B (en) 2022-08-19

Family

ID=70029890

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911028892.5A Active CN110970016B (en) 2019-10-28 2019-10-28 Awakening model generation method, intelligent terminal awakening method and device

Country Status (3)

Country Link
CN (1) CN110970016B (en)
CA (1) CA3158930A1 (en)
WO (1) WO2021082572A1 (en)

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111653274A (en) * 2020-04-17 2020-09-11 北京声智科技有限公司 Method, device and storage medium for awakening word recognition
CN111833902A (en) * 2020-07-07 2020-10-27 Oppo广东移动通信有限公司 Awakening model training method, awakening word recognition device and electronic equipment
CN112201239A (en) * 2020-09-25 2021-01-08 海尔优家智能科技(北京)有限公司 Target device determination method and apparatus, storage medium, and electronic apparatus
CN112259085A (en) * 2020-09-28 2021-01-22 上海声瀚信息科技有限公司 Two-stage voice awakening algorithm based on model fusion framework
WO2021082572A1 (en) * 2019-10-28 2021-05-06 苏宁云计算有限公司 Wake-up model generation method, smart terminal wake-up method, and devices
CN113223499A (en) * 2021-04-12 2021-08-06 青岛信芯微电子科技股份有限公司 Audio negative sample generation method and device
CN116110112A (en) * 2023-04-12 2023-05-12 广东浩博特科技股份有限公司 Self-adaptive adjustment method and device of intelligent switch based on face recognition

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113903334B (en) * 2021-09-13 2022-09-23 北京百度网讯科技有限公司 Method and device for training sound source positioning model and sound source positioning

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160189706A1 (en) * 2014-12-30 2016-06-30 Broadcom Corporation Isolated word training and detection
CN107358951A (en) * 2017-06-29 2017-11-17 阿里巴巴集团控股有限公司 A kind of voice awakening method, device and electronic equipment
CN108694940A (en) * 2017-04-10 2018-10-23 北京猎户星空科技有限公司 A kind of audio recognition method, device and electronic equipment
CN109215647A (en) * 2018-08-30 2019-01-15 出门问问信息科技有限公司 Voice awakening method, electronic equipment and non-transient computer readable storage medium
CN109448725A (en) * 2019-01-11 2019-03-08 百度在线网络技术(北京)有限公司 A kind of interactive voice equipment awakening method, device, equipment and storage medium
CN109785850A (en) * 2019-01-18 2019-05-21 腾讯音乐娱乐科技(深圳)有限公司 A kind of noise detecting method, device and storage medium
CN110097876A (en) * 2018-01-30 2019-08-06 阿里巴巴集团控股有限公司 Voice wakes up processing method and is waken up equipment
US20190311715A1 (en) * 2016-06-15 2019-10-10 Nuance Communications, Inc. Techniques for wake-up word recognition and related systems and methods
CN110364144A (en) * 2018-10-25 2019-10-22 腾讯科技(深圳)有限公司 A kind of speech recognition modeling training method and device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108281137A (en) * 2017-01-03 2018-07-13 中国科学院声学研究所 A kind of universal phonetic under whole tone element frame wakes up recognition methods and system
CN109036393A (en) * 2018-06-19 2018-12-18 广东美的厨房电器制造有限公司 Wake-up word training method, device and the household appliance of household appliance
CN110364147B (en) * 2019-08-29 2021-08-20 厦门市思芯微科技有限公司 Awakening training word acquisition system and method
CN110970016B (en) * 2019-10-28 2022-08-19 苏宁云计算有限公司 Awakening model generation method, intelligent terminal awakening method and device

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20160189706A1 (en) * 2014-12-30 2016-06-30 Broadcom Corporation Isolated word training and detection
US20190311715A1 (en) * 2016-06-15 2019-10-10 Nuance Communications, Inc. Techniques for wake-up word recognition and related systems and methods
CN108694940A (en) * 2017-04-10 2018-10-23 北京猎户星空科技有限公司 A kind of audio recognition method, device and electronic equipment
CN107358951A (en) * 2017-06-29 2017-11-17 阿里巴巴集团控股有限公司 A kind of voice awakening method, device and electronic equipment
CN110097876A (en) * 2018-01-30 2019-08-06 阿里巴巴集团控股有限公司 Voice wakes up processing method and is waken up equipment
CN109215647A (en) * 2018-08-30 2019-01-15 出门问问信息科技有限公司 Voice awakening method, electronic equipment and non-transient computer readable storage medium
CN110364144A (en) * 2018-10-25 2019-10-22 腾讯科技(深圳)有限公司 A kind of speech recognition modeling training method and device
CN109448725A (en) * 2019-01-11 2019-03-08 百度在线网络技术(北京)有限公司 A kind of interactive voice equipment awakening method, device, equipment and storage medium
CN109785850A (en) * 2019-01-18 2019-05-21 腾讯音乐娱乐科技(深圳)有限公司 A kind of noise detecting method, device and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhang et al.: "Design of a Smart Home Gateway System Based on Speech Recognition Technology" (基于语音识别技术的智能家居网关系统设计), 《软件导刊》 (Software Guide) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021082572A1 (en) * 2019-10-28 2021-05-06 苏宁云计算有限公司 Wake-up model generation method, smart terminal wake-up method, and devices
CN111653274A (en) * 2020-04-17 2020-09-11 北京声智科技有限公司 Method, device and storage medium for awakening word recognition
CN111653274B (en) * 2020-04-17 2023-08-04 北京声智科技有限公司 Wake-up word recognition method, device and storage medium
CN111833902A (en) * 2020-07-07 2020-10-27 Oppo广东移动通信有限公司 Awakening model training method, awakening word recognition device and electronic equipment
CN112201239A (en) * 2020-09-25 2021-01-08 海尔优家智能科技(北京)有限公司 Target device determination method and apparatus, storage medium, and electronic apparatus
CN112201239B (en) * 2020-09-25 2024-05-24 海尔优家智能科技(北京)有限公司 Determination method and device of target equipment, storage medium and electronic device
CN112259085A (en) * 2020-09-28 2021-01-22 上海声瀚信息科技有限公司 Two-stage voice awakening algorithm based on model fusion framework
CN113223499A (en) * 2021-04-12 2021-08-06 青岛信芯微电子科技股份有限公司 Audio negative sample generation method and device
CN116110112A (en) * 2023-04-12 2023-05-12 广东浩博特科技股份有限公司 Self-adaptive adjustment method and device of intelligent switch based on face recognition

Also Published As

Publication number Publication date
CA3158930A1 (en) 2021-05-06
CN110970016B (en) 2022-08-19
WO2021082572A1 (en) 2021-05-06

Similar Documents

Publication Publication Date Title
CN110970016B (en) Awakening model generation method, intelligent terminal awakening method and device
CN105632486B (en) Voice awakening method and device of intelligent hardware
CN107103903B (en) Acoustic model training method and device based on artificial intelligence and storage medium
DE102018010463B3 (en) Portable device, computer-readable storage medium, method and device for energy-efficient and low-power distributed automatic speech recognition
CN109473123B (en) Voice activity detection method and device
CN102568478B (en) Video play control method and system based on voice recognition
CN111161714B (en) Voice information processing method, electronic equipment and storage medium
CN110570873B (en) Voiceprint wake-up method and device, computer equipment and storage medium
CN110265040A (en) Training method, device, storage medium and the electronic equipment of sound-groove model
KR20160007527A (en) Method and apparatus for detecting a target keyword
CN110290280B (en) Terminal state identification method and device and storage medium
CN109741753A (en) A kind of voice interactive method, device, terminal and server
CN109872713A (en) A kind of voice awakening method and device
CN112562742B (en) Voice processing method and device
CN111722696B (en) Voice data processing method and device for low-power-consumption equipment
CN113838462B (en) Voice wakeup method, voice wakeup device, electronic equipment and computer readable storage medium
CN108322770A (en) Video frequency program recognition methods, relevant apparatus, equipment and system
CN111128150A (en) Method and device for awakening intelligent voice equipment
CN111128174A (en) Voice information processing method, device, equipment and medium
CN111179913B (en) Voice processing method and device
CN116631380B (en) Method and device for waking up audio and video multi-mode keywords
CN110610697B (en) Voice recognition method and device
CN110070891B (en) Song identification method and device and storage medium
CN111161745A (en) Awakening method, device, equipment and medium for intelligent equipment
CN108010518B (en) Voice acquisition method, system and storage medium of voice interaction equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
CP03 Change of name, title or address

Address after: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee after: Jiangsu Suning cloud computing Co.,Ltd.

Country or region after: China

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee before: Suning Cloud Computing Co.,Ltd.

Country or region before: China

CP03 Change of name, title or address
TR01 Transfer of patent right

Effective date of registration: 20240131

Address after: Room 3104, Building A5, No. 3 Gutan Avenue, Economic Development Zone, Gaochun District, Nanjing City, Jiangsu Province, 210000

Patentee after: Jiangsu Biying Technology Co.,Ltd.

Country or region after: China

Address before: No.1-1 Suning Avenue, Xuzhuang Software Park, Xuanwu District, Nanjing, Jiangsu Province, 210000

Patentee before: Jiangsu Suning cloud computing Co.,Ltd.

Country or region before: China

TR01 Transfer of patent right