CN112885341A - Voice wake-up method and device, electronic equipment and storage medium

Info

Publication number: CN112885341A
Application number: CN201911199202.2A
Authority: CN
Inventors: 杜国威
Original assignee: Beijing Anyun Century Technology Co Ltd
Current assignee: Beijing Anyun Century Technology Co Ltd
Priority date: 2019-11-29
Filing date: 2019-11-29
Publication date: 2021-06-01

Abstract

The invention relates to the technical field of voice wake-up of electronic equipment, and in particular to a voice wake-up method and device, electronic equipment and a storage medium. The method comprises the following steps: performing pre-wake-up word detection on received first audio data; if the first audio data contains any pre-wake-up word in a pre-wake-up word set, receiving second audio data within a first set time period after the first audio data is received; performing wake-up word detection on the received second audio data; and if the second audio data contains the wake-up word, waking up the electronic equipment; wherein the pre-wake-up word set does not contain the wake-up word. When a pre-wake-up word is detected, collection and reception of the second audio data continue throughout the first set time period, and if the second audio data contains the wake-up word, the electronic equipment is woken up directly. The embodiment of the invention establishes a link between the pre-wake-up word and the wake-up process, and improves the voice wake-up speed of the electronic equipment by making use of the user's slip of the tongue.

Description

Voice wake-up method and device, electronic equipment and storage medium

Technical Field

The invention relates to the technical field of voice awakening of electronic equipment, in particular to a voice awakening method and device, electronic equipment and a storage medium.

Background

The voice interaction uses sound waves to transmit information, is the most original and most convenient information transmission mode for human beings, has the advantage of non-contact, is convenient to collect and low in cost, can also be remotely operated, and is a more natural interaction mode for users.

In recent years, with the development of automatic speech recognition technology, it has become possible to convert input speech into corresponding instructions or text, so that electronic devices such as smart speakers and smart televisions can "understand" human language and execute the instructions people intend according to the recognized content, thereby realizing voice interaction between people and electronic devices.

When the user is not performing voice interaction, the electronic device is usually in a standby state, waiting for the user to issue a wake-up command. Currently, it is common practice in the industry to wake up a smart speaker by means of a preset wake-up word, for example "Tianmao Jingling", "Xiao Ai Tongxue" and "Ding Dang". Waking up the electronic device without delay gives the user of the electronic device a good experience.

Therefore, how to increase the voice wake-up speed of the electronic device is a technical problem to be solved urgently.

Disclosure of Invention

The invention aims to provide a voice awakening method, a voice awakening device, electronic equipment and a storage medium, so as to improve the voice awakening speed of the electronic equipment.

The embodiment of the invention provides the following scheme:

in a first aspect, an embodiment of the present invention provides a voice wake-up method, applied to an electronic device, including:

carrying out pre-awakening word detection on the received first audio data;

if the first audio data contains any pre-awakening word in a pre-awakening word set, receiving second audio data within a first set time period after the first audio data is received;

performing awakening word detection on the received second audio data;

if the second audio data contains a wake-up word, waking up the electronic equipment;

wherein the pre-wake-up word set does not include the wake-up word.

In a possible embodiment, before the performing wake word detection on the received second audio data, the method further comprises:

reducing the set number threshold of the awakening word syllables;

when the number of wake-up-word-similar syllables contained in the second audio data reaches the set number threshold, the second audio data is deemed to contain the wake-up word; wherein a syllable whose syllable feature has a similarity to the syllable feature of the corresponding wake-up word syllable greater than or equal to a similarity matching threshold is a wake-up-word-similar syllable.

In a possible embodiment, before performing the wake word detection on the received second audio data, the method further comprises: reducing the size of the similarity matching threshold.

In a possible embodiment, after the pre-wake word detection on the received first audio data, the method further includes:

if the first audio data contains any pre-awakening word in the pre-awakening word set, converting the electronic equipment from a standby state to a working state, but forbidding giving the electronic equipment working authority;

and if the second audio data contains the awakening words, giving work permission to the electronic equipment.

In a possible embodiment, before the pre-wake word detection on the received first audio data, the method further includes:

communicating with electronic equipment of other manufacturers in a set area to obtain awakening words corresponding to the electronic equipment of the other manufacturers;

and adding the awakening words corresponding to the electronic equipment of the other manufacturers into the pre-awakening word set as pre-awakening words.

In a possible embodiment, after the detecting the wake word for the received second audio data, the method further includes:

and if the second audio data does not contain the awakening word, not awakening the electronic equipment.

In a possible embodiment, before performing pre-wakeup word detection on the received first audio data, the method further includes:

performing awakening word detection on the received first audio data;

and if the first audio data contains the awakening word, awakening the electronic equipment.

In a second aspect, an embodiment of the present invention provides a voice wake-up apparatus, which is applied to an electronic device, and includes:

the audio receiving module is used for receiving audio data;

the pre-awakening word detection module is used for carrying out pre-awakening word detection on the received first audio data;

the awakening word detection module is used for carrying out awakening word detection on the received second audio data;

the control module is used for controlling the audio receiving module to receive the second audio data within a first set time period after the first audio data is received when the first audio data contains any pre-awakening word in a pre-awakening word set, and awakening the electronic equipment when the second audio data contains the awakening word;

the storage module is used for storing the pre-awakening word set and the awakening words;

wherein the set of pre-wake words does not include the wake word.

In a third aspect, an embodiment of the present invention provides an electronic device, including:

a memory for storing a computer program;

a processor for executing the computer program for carrying out the steps of the voice wake-up method as defined in any one of the above first aspects.

In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps of the voice wake-up method according to any one of the above first aspects.

Compared with the prior art, the invention has the following advantages and beneficial effects:

Because many electronic devices exist in daily life, electronic devices of different manufacturers each have a unique wake-up word to ensure that the user wakes the intended device. When a user calls out the wake-up word of electronic device B in front of electronic device A, device A treats that wake-up word as a meaningless utterance and is not woken by it. The invention gives this meaningless utterance a new meaning: the user has mistakenly called out a wrong wake-up word to the current electronic device, but really wants to wake the current device. Starting from this application scenario, when a preset pre-wake-up word is detected, collection and reception of the second audio data continue throughout the first set time period, and if the second audio data contains the preset wake-up word, the electronic device is woken up directly. The invention establishes a link between the pre-wake-up word and the wake-up process, thereby improving the voice wake-up speed of the electronic device in the scenario where the user first calls out a wrong wake-up word by mistake and then calls out the wake-up word of the current electronic device.

Furthermore, when wake-up word detection is performed on the received second audio data, the set number threshold of wake-up word syllables can be reduced. Compared with normal wake-up word detection, the second audio data can then be determined to contain the wake-up word even when it contains fewer syllables similar to those of the wake-up word, so that wake-up word detection on the second audio data completes more easily and more quickly.

Furthermore, when wake-up word detection is performed on the received second audio data, the similarity matching threshold can be reduced. Compared with normal wake-up word detection, wake-up word detection on the second audio data then completes more easily, which increases its detection speed.

Furthermore, when the pre-awakening words are detected to be contained in the first audio data, the electronic equipment is directly switched from the standby state to the working state, but the working authority of the electronic equipment is not given, and then once the awakening words are detected to be contained in the second audio data, the working authority of the electronic equipment is directly given, so that the electronic equipment is immediately put into operation, and the awakening speed of the electronic equipment is improved.

Furthermore, the invention also provides a method for updating the pre-wake-up word set. Considering that a user who comes into contact with various electronic devices in daily work and life is more likely to call out a wrong wake-up word, the wake-up words corresponding to electronic devices of other manufacturers in a set area are added to the pre-wake-up word set as pre-wake-up words. In this way the wake-up words the user frequently calls out in daily work and life are all added to the pre-wake-up word set, which keeps the set effectively up to date.

Furthermore, before pre-wake-up word detection is performed on the received first audio data, wake-up word detection is performed on it, and if the first audio data contains the wake-up word, the electronic device is woken up directly. This preserves the normal voice wake-up speed and prevents the search for pre-wake-up words from interfering with normal wake-up of the electronic device.

Drawings

In order to more clearly illustrate the embodiments of the present specification or the technical solutions in the prior art, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present specification, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without creative efforts.

Fig. 1 is a flowchart of a possible voice wake-up method according to an embodiment of the present invention;

Fig. 2 is a flowchart of a method for performing wake-up word detection on the received second audio data according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a possible voice wake-up apparatus according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the accompanying drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, rather than all embodiments, and all other embodiments obtained by those skilled in the art based on the embodiments of the present invention belong to the scope of protection of the embodiments of the present invention.

Referring to fig. 1, fig. 1 is a flowchart of a voice wake-up method according to an embodiment of the present invention, including the following steps:

Step 11, performing pre-wake-up word detection on the received first audio data.

Wherein the pre-wake-up word set does not include the wake-up word.

Specifically, because many electronic devices exist in daily life, electronic devices of different manufacturers each have a unique wake-up word to ensure that the user wakes the intended device. When a user calls out the wake-up word of electronic device B in front of electronic device A, device A treats that wake-up word as a meaningless utterance and is not woken by it. However, the inventor believes that, because there are now many electronic devices and many kinds of wake-up words, a user who mistakenly calls out a wrong wake-up word to the current electronic device often does want to wake the current device, in which case the wrong wake-up word is not a "meaningless utterance". In the embodiment of the present invention, the wake-up words corresponding to electronic devices of other manufacturers are therefore used as pre-wake-up words, so as to make full use of the user's wrong wake-up word to improve the user's voice wake-up experience.

Specifically, the electronic device may be a smart television, a smart refrigerator, a smart speaker, and the like. Taking a smart speaker as an example, if the wake-up word of the Xiaoming speaker in the embodiment of the present invention is "Xiaoming Xiaoming", then "Xiaodu Xiaodu" corresponding to the Xiaodu speaker, "Dingdang Dingdang" corresponding to the Tencent speaker, and "Xiao Ai Tongxue" corresponding to the Xiaomi speaker may be used as pre-wake-up words in the embodiment of the present invention. Table 1 below is an example of a pre-wake-up word set.

TABLE 1

Serial number	Pre-wake-up word
1	Xiaodu Xiaodu
2	Dingdang Dingdang
3	Xiao Ai Tongxue

Specifically, the pre-wake-up word detection algorithm can be implemented with a Gaussian mixture model-hidden Markov model (GMM-HMM). This probabilistic model is very widely used in speech recognition and is often used to represent the distribution of Fourier-spectrum speech features. Based on Bayesian decision theory, it treats the classification of phoneme states in Chinese speech recognition as a problem of estimating data distributions, and breaks the speech training and matching problem down into smaller problems such as model selection, parameter training and probability calculation. The embodiment of the invention takes the pre-wake-up words as a training set, trains a GMM-HMM on the phoneme features of the pre-wake-up words, then uses the trained model to match the input audio data and pick out the audio data containing a pre-wake-up word.
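As an illustration of the matching half of this approach, the following minimal sketch scores how well a sequence of feature values fits a small GMM-HMM using the forward algorithm. It assumes one-dimensional features and hand-picked toy parameters; training (for example with Baum-Welch) is omitted, and the names `log_gmm`, `forward_log_likelihood` and `KEYWORD_MODEL` are hypothetical and do not come from the patent.

```python
import math

NEG_INF = float("-inf")

def logsumexp(values):
    """Numerically stable log(sum(exp(v))) that tolerates -inf entries."""
    top = max(values)
    if top == NEG_INF:
        return NEG_INF
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_gauss(x, mean, var):
    """Log density of a one-dimensional Gaussian."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)

def log_gmm(x, components):
    """Log density of a mixture given as (weight, mean, variance) triples."""
    return logsumexp([math.log(w) + log_gauss(x, m, v) for w, m, v in components])

def forward_log_likelihood(obs, model):
    """Forward algorithm: log P(obs | model), summed over all state paths."""
    log_start, log_trans, emissions = model
    n = len(log_start)
    alpha = [log_start[s] + log_gmm(obs[0], emissions[s]) for s in range(n)]
    for x in obs[1:]:
        alpha = [
            logsumexp([alpha[p] + log_trans[p][s] for p in range(n)])
            + log_gmm(x, emissions[s])
            for s in range(n)
        ]
    return logsumexp(alpha)

# Hypothetical two-state left-to-right model for one keyword (toy parameters).
HALF = math.log(0.5)
KEYWORD_MODEL = (
    [0.0, NEG_INF],                  # always start in state 0
    [[HALF, HALF], [NEG_INF, 0.0]],  # state 0 -> {0, 1}; state 1 -> 1
    [
        [(1.0, 0.0, 1.0)],                   # state 0: one Gaussian near 0
        [(0.5, 5.0, 1.0), (0.5, 6.0, 1.0)],  # state 1: two Gaussians near 5 and 6
    ],
)

matching = [0.1, -0.2, 5.1, 5.9]    # follows the low-then-high pattern
mismatching = [5.0, 5.5, 0.0, 0.1]  # reversed pattern
print(forward_log_likelihood(matching, KEYWORD_MODEL))
print(forward_log_likelihood(mismatching, KEYWORD_MODEL))
```

In a detector, a score of this kind would be compared against a threshold or against the score of a background model; the source does not say which decision rule is used.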

Step 12, if the first audio data contains any pre-wake-up word in the pre-wake-up word set, receiving second audio data within a first set time period after the first audio data is received.

Specifically, taking the Xiaoming speaker as an example, if the Xiaoming speaker detects a pre-wake-up word in the first audio data it receives, it continues to acquire second audio data for the next 10 seconds (this duration is configurable). The second audio data received within those 10 seconds may be meaningless background noise, may be speech that contains no wake-up instruction, or may be "Xiaoming Xiaoming", called out after the user suddenly realizes that the wrong name was used.

Step 13, performing wake-up word detection on the received second audio data.

Specifically, wake-up word detection may also be implemented with the Gaussian mixture model-hidden Markov model described above; the implementation is similar and is not repeated here.

Step 14, if the second audio data contains the wake-up word, waking up the electronic equipment.

Specifically, taking the Xiaoming speaker as an example, if the second audio data received by the Xiaoming speaker in step 13 contains the wake-up word "Xiaoming Xiaoming", the Xiaoming speaker recognizes the situation as "the user called out the wrong wake-up word through a slip of the tongue", and directly wakes up each of its functional modules that is in the standby state so as to receive, recognize and execute the user's voice instructions.
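Steps 11 to 14 can be sketched as follows. This is a minimal illustration, not the patented implementation: audio is represented by text transcripts, the two detectors are replaced by substring checks, and the function name `voice_wake`, its parameters and the example wake-up words are all assumptions made for the example.

```python
def voice_wake(first_audio, later_audio, pre_wake_words, wake_word,
               first_received_at, window_seconds=10.0):
    """Return True if the device should wake according to steps 11 to 14.

    first_audio: transcript standing in for the first audio data.
    later_audio: list of (timestamp_seconds, transcript) received afterwards.
    """
    if wake_word in pre_wake_words:
        raise ValueError("the pre-wake-up word set must not contain the wake-up word")

    # Step 11: pre-wake-up word detection on the first audio data.
    if not any(word in first_audio for word in pre_wake_words):
        return False

    # Step 12: keep only audio received within the first set time period.
    deadline = first_received_at + window_seconds
    second_audio = [text for ts, text in later_audio
                    if first_received_at < ts <= deadline]

    # Steps 13 and 14: wake-up word detection, then wake on a hit.
    return any(wake_word in text for text in second_audio)


pre_wake = {"xiaodu xiaodu", "dingdang dingdang"}
print(voice_wake("xiaodu xiaodu", [(3.0, "xiaoming xiaoming")],
                 pre_wake, "xiaoming xiaoming", first_received_at=0.0))
```

The default of 10 seconds mirrors the example value in the description; a wake-up word that arrives after the window closes is ignored by this function.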

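In a possible embodiment, before performing wake-up word detection on the received second audio data, the method further comprises: reducing the set number threshold of wake-up word syllables.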

Specifically, referring to fig. 2, fig. 2 is a flowchart illustrating a method for detecting a wake-up word of the received second audio data according to an embodiment of the present invention, which includes the following steps:

Step 131, extracting syllable features of the second audio data from the second audio data by taking syllables as units.

The syllable feature extraction in the embodiment of the invention first samples the audio signal, then separates the different frequency components of the speech signal by means of the discrete Fourier transform, and then derives features that represent human auditory perception according to the human vocal tract model and auditory mechanism.

Step 132, comparing and analyzing the syllable features of the second audio data and the syllable features of the awakening words.

Step 133, if the number of syllables in the second audio data whose syllable features have a similarity to the corresponding syllable features of the wake-up word greater than or equal to a similarity matching threshold reaches the set number threshold, determining that the second audio data contains the wake-up word.
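The feature extraction of step 131 (sampling, discrete Fourier transform, auditory-scale features) can be illustrated with the sketch below. The source does not name a specific feature, so log energies in mel-spaced bands are used here as an assumed stand-in; the rectangular bands, the band count and the function names `dft_power`, `hz_to_mel` and `mel_band_energies` are choices made for this example only.

```python
import cmath
import math

def dft_power(frame):
    """Power spectrum of one frame via a naive discrete Fourier transform."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n)
                for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]

def hz_to_mel(hz):
    """Common mel-scale formula approximating perceived pitch."""
    return 2595.0 * math.log10(1.0 + hz / 700.0)

def mel_band_energies(frame, sample_rate, n_bands=8):
    """Log energy in n_bands rectangular bands spaced evenly on the mel scale."""
    n = len(frame)
    top_mel = hz_to_mel(sample_rate / 2)
    energies = [0.0] * n_bands
    for k, power in enumerate(dft_power(frame)):
        hz = k * sample_rate / n
        band = min(int(hz_to_mel(hz) / top_mel * n_bands), n_bands - 1)
        energies[band] += power
    # A small floor avoids log(0) for empty bands.
    return [math.log(e + 1e-10) for e in energies]

# One 64-sample frame of a 1000 Hz tone sampled at 8000 Hz.
frame = [math.sin(2 * math.pi * 1000 * t / 8000) for t in range(64)]
features = mel_band_energies(frame, sample_rate=8000)
print(features)
```

A production system would more likely use a fast Fourier transform and overlapping triangular filters; the naive transform is kept here so that each stage stays visible.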

The wake-up word detection method used in the embodiment of the invention is a syllable feature similarity matching method. The syllable is the speech unit most easily distinguished by ear and the most natural speech unit; in Chinese, the pronunciation of one character is one syllable, and each basic syllable consists of three parts: tone, initial and final. In the embodiment of the invention, syllable features similar to the syllable features of the wake-up word are screened out of the second audio data by means of the similarity matching threshold, and if the number of such syllable features reaches the set number threshold, the wake-up word is determined to be present in the second audio data.

In the embodiment of the invention, the Xiaoming speaker extracts all syllable features from the second audio data and compares them one by one with the syllable features of the wake-up word "Xiaoming Xiaoming". When the similarity between a syllable feature and one of the syllable features of the wake-up word exceeds 90% (this value is configurable), the corresponding syllable is determined to be a syllable of the wake-up word. Since the wake-up word "Xiaoming Xiaoming" has 4 syllables, the set number threshold is normally 4: if 4 consecutive matching syllables are detected in the second audio data by similarity comparison, the second audio data is determined to contain the wake-up word. However, because a pre-wake-up word has already been found in the preceding first audio data, the probability that the user made a slip of the tongue is very high, so the set number threshold is reduced, for example to 3. The second audio data then only needs to contain 3 syllables of the wake-up word, with the remaining position allowed to be "null" (a syllable judged dissimilar to the corresponding wake-up word syllable), to be considered to contain the wake-up word "Xiaoming Xiaoming".
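The effect of lowering the set number threshold can be sketched as below. The example assumes that each syllable is already represented by a feature vector, uses cosine similarity as the similarity measure (the source does not name one), and slides a window of the wake-up word's length over the heard syllables; the two-dimensional toy vectors and all function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def best_match_count(heard, wake_word, similarity_threshold):
    """Largest number of position-aligned similar syllables in any window."""
    width = len(wake_word)
    best = 0
    for start in range(max(1, len(heard) - width + 1)):
        window = heard[start:start + width]
        hits = sum(1 for h, w in zip(window, wake_word)
                   if cosine_similarity(h, w) >= similarity_threshold)
        best = max(best, hits)
    return best

def contains_wake_word(heard, wake_word, similarity_threshold=0.9,
                       count_threshold=None):
    """Apply the set number threshold; it defaults to every syllable."""
    if count_threshold is None:
        count_threshold = len(wake_word)
    return best_match_count(heard, wake_word, similarity_threshold) >= count_threshold

# Toy syllable features: the wake-up word has four syllables.
XIAO, MING, NOISE = (1.0, 0.0), (0.0, 1.0), (1.0, 1.0)
WAKE = [XIAO, MING, XIAO, MING]
heard = [NOISE, XIAO, NOISE, XIAO, MING]  # the second syllable is a "null"

print(contains_wake_word(heard, WAKE))                     # normal threshold of 4
print(contains_wake_word(heard, WAKE, count_threshold=3))  # reduced threshold of 3
```

With the normal threshold the utterance with one dissimilar syllable is rejected, while the reduced threshold accepts it.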

In a possible embodiment, before performing the wake word detection on the received second audio data, the method further comprises:

reducing the size of the similarity matching threshold.

Specifically, taking the Xiaoming speaker as an example, when the Xiaoming speaker compares the syllable features in the second audio data with the syllable features of the wake-up word "Xiaoming Xiaoming", it would normally identify a syllable as a syllable of the wake-up word only when the similarity between its syllable feature and a syllable feature of the wake-up word exceeds 90% (this value is configurable). However, because a pre-wake-up word has already been found in the preceding first audio data, the probability that the user made a slip of the tongue is very high, so the embodiment of the present invention reduces the similarity matching threshold: as soon as the similarity between the syllable feature of a syllable in the second audio data and a syllable feature of the wake-up word "Xiaoming Xiaoming" reaches 80%, the syllable is identified as a syllable of the wake-up word, and the subsequent syllable feature similarity comparison is no longer performed, which improves detection efficiency.
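The early stop described here can be illustrated with a small sketch that counts how many comparisons are made before a syllable is accepted. The scalar features and the `toy_similarity` measure are hypothetical placeholders chosen so the effect is easy to follow; they are not the patent's feature representation.

```python
def first_matching_syllable(heard_feature, wake_features, threshold, similarity):
    """Return (index of the first wake-up syllable that matches, comparisons made).

    Comparison stops as soon as one similarity reaches the threshold.
    """
    comparisons = 0
    for index, wake_feature in enumerate(wake_features):
        comparisons += 1
        if similarity(heard_feature, wake_feature) >= threshold:
            return index, comparisons
    return None, comparisons

def toy_similarity(a, b):
    """Hypothetical similarity for scalar features in the range [0, 1]."""
    return 1.0 - abs(a - b)

wake_features = [0.5, 0.3, 0.9]
strict = first_matching_syllable(0.35, wake_features, 0.9, toy_similarity)
relaxed = first_matching_syllable(0.35, wake_features, 0.8, toy_similarity)
print(strict, relaxed)
```

In this toy run the relaxed threshold stops after a single comparison, but it accepts a weaker match than the strict threshold does, which shows the trade between speed and precision that comes with lowering the threshold.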

Specifically, the work permission includes permission to recognize the voice instruction the user issues next and to execute the recognized voice instruction. It should be noted that this voice instruction does not include the wake-up instruction.

Specifically, taking the Xiaoming speaker as an example, when the Xiaoming speaker detects a pre-wake-up word in the first audio data, it switches its internal standby functional modules to a working state, for example by raising the processor from standby power to normal operating power or by connecting to a cloud server in preparation for semantic analysis of long text. At this point, however, the Xiaoming speaker does not give the processor work permission; that is, it is in a pre-wake-up state from which it can begin normal operation at once. As soon as the Xiaoming speaker detects the wake-up word "Xiaoming Xiaoming" within the next 10 seconds, it can start working normally immediately, which increases the wake-up speed of the speaker. According to a large number of experimental tests, the embodiment of the invention improves the wake-up time by at least about 0.1 to 0.5 seconds.
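The standby, pre-wake-up and awake states described above can be modelled as a small state machine. This is a sketch under stated assumptions: the class and method names are hypothetical, and falling back to standby when the window expires is an assumption, since the source only says that the device is not woken in that case.

```python
from enum import Enum, auto

class State(Enum):
    STANDBY = auto()
    PRE_AWAKE = auto()  # working state, but no work permission yet
    AWAKE = auto()

class WakeController:
    def __init__(self, window_seconds=10.0):
        self.window_seconds = window_seconds
        self.state = State.STANDBY
        self.work_permission = False
        self._deadline = None

    def tick(self, now):
        """Fall back to standby once the window has passed (an assumption)."""
        if self.state is State.PRE_AWAKE and now > self._deadline:
            self.state = State.STANDBY
            self._deadline = None

    def on_pre_wake_word(self, now):
        """Warm up: leave standby without granting work permission."""
        self.tick(now)
        if self.state is State.STANDBY:
            self.state = State.PRE_AWAKE  # e.g. raise power, connect to the cloud
            self._deadline = now + self.window_seconds

    def on_wake_word(self, now):
        """Grant work permission; from PRE_AWAKE no warm-up remains to be done."""
        self.tick(now)
        self.state = State.AWAKE
        self.work_permission = True
        self._deadline = None

controller = WakeController()
controller.on_pre_wake_word(now=0.0)
controller.on_wake_word(now=4.0)
print(controller.state, controller.work_permission)
```

Keeping the permission flag separate from the state makes the intermediate condition explicit: the hardware is ready, but user instructions are not yet recognized or executed.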

Specifically, the set area may be a network area under the same network node, for example a network area under the same router or a network area within a fixed IP segment, or may be a physical area within a certain distance range, for example within a house.

Specifically, the embodiment of the invention considers that a user who comes into contact with various electronic devices in daily work and life is more likely to call out a wrong wake-up word, and therefore provides a method for updating the pre-wake-up word set. Alternatively, based on market research, the wake-up words of electronic devices with a high market share may be added directly to the pre-wake-up word set as pre-wake-up words.

Specifically, taking the Xiaoming speaker as an example, the Xiaoming speaker may establish a BLE mesh (Bluetooth Low Energy mesh) network with all the electronic devices in a house and obtain the wake-up words of the other electronic devices through Bluetooth communication with them.
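Once the wake-up words of nearby devices have been obtained, updating the pre-wake-up word set is a simple merge. The sketch below leaves the discovery transport (BLE mesh or otherwise) out entirely and takes its result as a plain mapping; the function name, the device identifiers and the normalization by lower-casing are assumptions for illustration.

```python
def update_pre_wake_words(pre_wake_words, discovered, own_wake_word):
    """Return a new set with discovered wake-up words added.

    discovered: mapping of device identifier -> wake-up word reported by it.
    The device's own wake-up word is never added, so the pre-wake-up word
    set stays disjoint from the wake-up word.
    """
    own = own_wake_word.strip().lower()
    updated = set(pre_wake_words)
    for _device_id, word in discovered.items():
        normalized = word.strip().lower()
        if normalized and normalized != own:
            updated.add(normalized)
    return updated

discovered = {
    "speaker-livingroom": "Xiaodu Xiaodu",
    "tv-bedroom": "Xiao Ai Tongxue",
    "speaker-kitchen": "Xiaoming Xiaoming",  # same product line as this device
}
print(sorted(update_pre_wake_words({"dingdang dingdang"}, discovered,
                                   "Xiaoming Xiaoming")))
```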

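In a possible embodiment, before performing pre-wake-up word detection on the received first audio data, the method further comprises: performing wake-up word detection on the received first audio data; and if the first audio data contains the wake-up word, waking up the electronic device.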
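The order of the two checks on the first audio data, wake-up word first and pre-wake-up word second, can be sketched as a single classification function. As in any such illustration, text transcripts stand in for audio and substring checks stand in for the detectors; the enum and function names are hypothetical.

```python
from enum import Enum

class FirstAudioResult(Enum):
    WAKE = "wake"                # wake-up word present: wake directly
    OPEN_WINDOW = "open_window"  # pre-wake-up word present: listen for second audio
    IGNORE = "ignore"            # neither present: stay in standby

def classify_first_audio(transcript, wake_word, pre_wake_words):
    """Check the wake-up word before any pre-wake-up word."""
    if wake_word in transcript:
        return FirstAudioResult.WAKE
    if any(word in transcript for word in pre_wake_words):
        return FirstAudioResult.OPEN_WINDOW
    return FirstAudioResult.IGNORE

pre_wake = {"xiaodu xiaodu"}
print(classify_first_audio("xiaodu xiaodu xiaoming xiaoming",
                           "xiaoming xiaoming", pre_wake))
```

Because the wake-up word is tested first, an utterance that contains both words wakes the device immediately instead of opening a listening window.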

Based on the same inventive concept as the method, the embodiment of the invention also provides a voice awakening device.

Referring to fig. 3, fig. 3 shows a voice wake-up apparatus according to an embodiment of the present invention, which is applied to an electronic device, and includes:

an audio receiving module 21, configured to receive audio data;

a pre-awakening word detection module 22, configured to perform pre-awakening word detection on the received first audio data;

a wake-up word detection module 23, configured to perform wake-up word detection on the received second audio data;

a control module 24, configured to control the audio receiving module 21 to receive the second audio data within a first set time period after receiving the first audio data when the first audio data includes any pre-wakeup word in a pre-wakeup word set, and to wake up the electronic device when the second audio data includes a wakeup word;

a storage module 25, configured to store a pre-wakeup word set and a wakeup word;

wherein the set of pre-wake words does not include the wake word.

In a possible embodiment, the control module 24 is further configured to reduce the set number threshold of wake-up word syllables used in wake-up word detection on the second audio data.

In a possible embodiment, the control module 24 is further configured to reduce the similarity matching threshold used in wake-up word detection on the second audio data.

In a possible embodiment, the control module 24 is further configured to, when the first audio data includes any one of a set of pre-wakeup words, convert the electronic device from a standby state to an operating state, but prohibit giving the electronic device operating permission, and, when the second audio data includes a wakeup word, give the electronic device operating permission.

In a possible embodiment, the apparatus further comprises a communication module, configured to communicate with electronic devices of other manufacturers in a set area and obtain the wake-up words corresponding to the electronic devices of the other manufacturers;

the control module 24 is further configured to add a wakeup word corresponding to the electronic device of the other manufacturer as a pre-wakeup word into the pre-wakeup word set.

In a possible embodiment, the control module 24 is further configured to not wake up the electronic device when any wake-up word in the set of wake-up words is not included in the second audio data.

In a possible embodiment, the wakeup word detection module 23 is further configured to perform wakeup word detection on the received first audio data;

the control module 24 is further configured to wake up the electronic device directly when the first audio data includes the wake-up word.

Based on the same inventive concept as in the previous embodiments, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and the processor implements the steps of any one of the methods when executing the program.

Based on the same inventive concept as in the previous embodiments, embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, implements the steps of any of the methods described above.

The technical scheme provided in the embodiment of the application at least has the following technical effects or advantages:

Because many electronic devices exist in daily life, electronic devices of different manufacturers each have a unique wake-up word to ensure that the user wakes the intended device. When a user calls out the wake-up word of electronic device B in front of electronic device A, device A treats that wake-up word as a meaningless utterance and is not woken by it. The embodiment of the invention gives this meaningless utterance a new meaning: the user has mistakenly called out a wrong wake-up word to the current electronic device, but really wants to wake the current device. Starting from this application scenario, when a preset pre-wake-up word is detected, collection and reception of the second audio data continue throughout the first set time period, and if the second audio data contains the preset wake-up word, the electronic device is woken up directly. The embodiment of the invention establishes a link between the pre-wake-up word and the wake-up process, thereby improving the voice wake-up speed of the electronic device in the scenario where the user first calls out a wrong wake-up word by mistake and then calls out the wake-up word of the current electronic device.

Furthermore, in the embodiment of the present invention, when wake-up word detection is performed on the received second audio data, the set number threshold of wake-up word syllables can be reduced. Compared with normal wake-up word detection, the second audio data can then be determined to contain the wake-up word even when it contains fewer syllables similar to those of the wake-up word, so that wake-up word detection on the second audio data completes more easily and more quickly.

Furthermore, in the embodiment of the present invention, when wake-up word detection is performed on the received second audio data, the similarity matching threshold can be reduced. Compared with normal wake-up word detection, wake-up word detection on the second audio data then completes more easily, which increases its detection speed.

Further, in the embodiment of the present invention, when it is detected that the first audio data includes a pre-wake-up word, the electronic device is switched directly from the standby state to the working state but is not given work permission; then, as soon as it is detected that the second audio data includes the wake-up word, the electronic device is given work permission directly, so that it begins operating immediately and the wake-up speed of the electronic device is increased.

Furthermore, considering that a user who comes into contact with various electronic devices in daily work and life is likely to speak a wrong wake-up word, the embodiment of the invention also provides a method for updating the pre-wake-up word set: the wake-up words corresponding to electronic devices of other manufacturers in a set area are added to the pre-wake-up word set as pre-wake-up words. In this way, the wake-up words that the user frequently speaks in daily work and life are all added to the pre-wake-up word set, and the pre-wake-up word set is updated effectively.
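A minimal sketch of this update step, assuming the wake-up words of nearby devices have already been obtained as strings (how they are obtained is outside this sketch). The function name `update_pre_wake_words` and the example phrases are hypothetical. The filter reflects the requirement stated elsewhere in this document that the pre-wake-up word set must not contain the device's own wake-up word.

```python
from typing import Iterable, Set


def update_pre_wake_words(current: Iterable[str],
                          nearby_wake_words: Iterable[str],
                          own_wake_words: Iterable[str]) -> Set[str]:
    """Return a new pre-wake-up word set extended with other devices' wake-up words."""
    own = set(own_wake_words)
    # Never let the device's own wake-up word enter the pre-wake-up word set.
    additions = {word for word in nearby_wake_words if word not in own}
    return set(current) | additions


updated = update_pre_wake_words(
    current={"hello beta"},
    nearby_wake_words=["hello gamma", "hello alpha", "hello beta"],
    own_wake_words=["hello alpha"],
)
```

The function returns a new set rather than mutating its input, so a caller can compare the old and new sets before committing the update.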

Furthermore, in the embodiment of the present invention, before pre-wake-up word detection is performed on the received first audio data, wake-up word detection is performed on the received first audio data, and if the first audio data includes any wake-up word in the wake-up word set, the electronic device is woken up directly. This preserves the voice wake-up speed of the embodiment and prevents pre-wake-up word detection from interfering with normal wake-up of the electronic device.
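The ordering described here, wake-up word detection first and pre-wake-up word detection second, can be sketched as a single dispatch function. Text transcripts again stand in for audio data, and the function name `classify_first_audio` and its return labels are hypothetical.

```python
from typing import Iterable


def classify_first_audio(transcript: str,
                         wake_words: Iterable[str],
                         pre_wake_words: Iterable[str]) -> str:
    """Decide what to do with the first audio data.

    Returns "wake" if a wake-up word is present (checked first),
    "listen" if only a pre-wake-up word is present, otherwise "ignore".
    """
    if any(word in transcript for word in wake_words):
        return "wake"    # normal wake-up path is never delayed
    if any(word in transcript for word in pre_wake_words):
        return "listen"  # keep receiving second audio data for the set time period
    return "ignore"


action = classify_first_audio("hello beta", ["hello alpha"], ["hello beta"])
```

Checking the wake-up word first means an utterance that contains both kinds of word still wakes the device immediately.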

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (modules, systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

The invention discloses A1, a voice wake-up method, applied to an electronic device, comprising the following steps:

performing pre-wake-up word detection on the received first audio data;

performing wake-up word detection on the received second audio data;

wherein the pre-wake-up word set does not include the wake-up word.

A2, the voice wake-up method according to A1, wherein before the wake-up word detection of the received second audio data, the method further comprises:

reducing the set-number threshold of wake-up word syllables;

A3, the voice wake-up method according to A2, wherein before performing wake-up word detection on the received second audio data, the method further comprises:

reducing the size of the similarity matching threshold.

A4, the voice wake-up method according to A1, wherein after the pre-wake-up word detection is performed on the received first audio data, the method further comprises:

A5, the voice wake-up method according to A1, wherein before the pre-wake-up word detection is performed on the received first audio data, the method further comprises:

A6, the voice wake-up method according to A1, wherein after the wake-up word detection is performed on the received second audio data, the method further comprises:

A7, the voice wake-up method according to any one of A1 to A6, wherein before the pre-wake-up word detection of the received first audio data, the method further comprises:

performing wake-up word detection on the received first audio data;

B1, a voice wake-up device, applied to an electronic device, comprising:

an audio receiving module, configured to receive audio data;

wherein the set of pre-wake words does not include the wake word.

B2, the voice wake-up apparatus according to B1, wherein the control module is further configured to reduce the set-number threshold of wake-up word syllables used for wake-up word detection on the second audio data;

B3, the voice wake-up device according to B2, wherein the control module is further configured to reduce a similarity matching threshold for the detection of the wake-up word of the second audio data.

B4, the voice wake-up apparatus according to B1, wherein the control module is further configured to, when any pre-wake-up word in the pre-wake-up word set is included in the first audio data, convert the electronic device from a standby state to an operating state, but prohibit giving the electronic device an operating right, and, when a wake-up word is included in the second audio data, give the electronic device an operating right.

B5, the voice wake-up device according to B1, further comprising a communication module for communicating with electronic devices of other manufacturers in a set area to obtain wake-up words corresponding to the electronic devices of the other manufacturers;

and the control module is also used for adding the awakening words corresponding to the electronic equipment of other manufacturers into the pre-awakening word set as pre-awakening words.

B6, the voice wake-up apparatus according to B1, wherein the control module is further configured to not wake up the electronic device when the second audio data does not include any wake-up word in the set of wake-up words.

B7, the voice wake-up apparatus according to any one of B1 to B6, wherein the wake-up word detection module is further configured to perform wake-up word detection on the received first audio data;

the control module is further configured to directly wake up the electronic device when the first audio data includes the wake-up word.

C1, an electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to carry out the steps of the method of any one of A1 to A7.

D1, a computer-readable storage medium, having a computer program stored thereon, wherein the computer program, when executed by a processor, carries out the steps of the method of any one of A1 to A7.

Claims

1. A voice wake-up method, applied to an electronic device, comprising the following steps:

performing pre-wake-up word detection on the received first audio data;

performing wake-up word detection on the received second audio data;

wherein the pre-wake-up word set does not include the wake-up word.

2. The voice wake-up method according to claim 1, wherein prior to the wake-up word detection of the received second audio data, the method further comprises:

reducing the set-number threshold of wake-up word syllables;

3. The voice wake-up method according to claim 2, wherein prior to wake-up word detection of the received second audio data, the method further comprises:

reducing the size of the similarity matching threshold.

4. The voice wake-up method according to claim 1, wherein after the pre-wake-up word detection of the received first audio data, the method further comprises:

5. The voice wake-up method according to claim 1, wherein before the pre-wake-up word detection of the received first audio data, the method further comprises:

6. The voice wake-up method according to claim 1, wherein after the wake-up word detection of the received second audio data, the method further comprises:

7. The voice wake-up method according to any one of claims 1 to 6, wherein before the pre-wake-up word detection of the received first audio data, the method further comprises:

performing wake-up word detection on the received first audio data;

8. A voice wake-up device, applied to an electronic device, comprising:

an audio receiving module, configured to receive audio data;

wherein the set of pre-wake words does not include the wake word.

9. An electronic device, comprising:

a memory for storing a computer program;

a processor for executing the computer program to carry out the steps of the method of any one of claims 1 to 7.

10. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed by a processor, carries out the steps of the method of any one of claims 1 to 7.