CN113488050A - Voice awakening method and device, storage medium and electronic equipment - Google Patents

Voice awakening method and device, storage medium and electronic equipment

Info

Publication number
CN113488050A
Authority
CN
China
Prior art keywords
awakening
word
voice
threshold
sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110778555.9A
Other languages
Chinese (zh)
Other versions
CN113488050B (en)
Inventor
李亚伟
姚海涛
田垚
蔡猛
马泽君
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Youzhuju Network Technology Co Ltd
Original Assignee
Beijing Youzhuju Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Youzhuju Network Technology Co Ltd filed Critical Beijing Youzhuju Network Technology Co Ltd
Priority to CN202110778555.9A priority Critical patent/CN113488050B/en
Publication of CN113488050A publication Critical patent/CN113488050A/en
Application granted granted Critical
Publication of CN113488050B publication Critical patent/CN113488050B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 - Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)

Abstract

The disclosure relates to a voice wake-up method and apparatus, a storage medium, and an electronic device, which set a higher threshold for common wake-up words to reduce false wake-ups and a lower threshold for rare wake-up words to improve the wake-up rate. The method includes: acquiring a voice wake-up word for which a threshold is to be set; determining a word score of the voice wake-up word through a language model, where the word score characterizes the occurrence probability of the voice wake-up word; and determining a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, where the wake-up threshold in the preset correspondence is positively correlated with the word score, and the target wake-up threshold is used, during voice wake-up, for comparison with the collected target voice wake-up word to determine a voice wake-up result corresponding to the target voice wake-up word.

Description

Voice awakening method and device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of voice technologies, and in particular, to a voice wake-up method, apparatus, storage medium, and electronic device.
Background
Voice wake-up technology presets a wake-up word in an electronic device or in software, so that when a user utters a voice command corresponding to the wake-up word, the electronic device can be woken from a sleep state and make a specified response. Specifically, each preset wake-up word has a corresponding wake-up threshold. After a user utters a voice command, a word score corresponding to the voice command is determined; if the score is greater than or equal to the wake-up threshold, the electronic device is woken from the sleep state and makes the specified response, and if the score is less than the wake-up threshold, the electronic device is not woken.
In the related art, a uniform wake-up threshold is usually set for all wake-up words, which may result in a low wake-up rate for some wake-up words or frequent false wake-ups for others, affecting the accuracy of voice wake-up.
Disclosure of Invention
This summary is provided to introduce a selection of concepts in a simplified form that are further described below in the detailed description. This summary is not intended to identify key features or essential features of the claimed subject matter, nor is it intended to be used to limit the scope of the claimed subject matter.
In a first aspect, the present disclosure provides a voice wake-up method, including:
acquiring a voice wake-up word for which a threshold is to be set;
determining a word score of the voice wake-up word through a language model, wherein the word score characterizes the occurrence probability of the voice wake-up word;
determining a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, wherein the wake-up threshold in the preset correspondence is positively correlated with the word score, and the target wake-up threshold is used, during voice wake-up, for comparison with the collected target voice wake-up word to determine a voice wake-up result corresponding to the target voice wake-up word.
In a second aspect, the present disclosure provides a voice wake-up apparatus, the apparatus comprising:
an acquisition module, configured to acquire a voice wake-up word for which a threshold is to be set;
a first determining module, configured to determine a word score of the voice wake-up word through a language model, wherein the word score characterizes the occurrence probability of the voice wake-up word;
a second determining module, configured to determine a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, wherein the wake-up threshold in the preset correspondence is positively correlated with the word score, and the target wake-up threshold is used, during voice wake-up, for comparison with the collected target voice wake-up word to determine a voice wake-up result corresponding to the target voice wake-up word.
In a third aspect, the present disclosure provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processing apparatus, implements the steps of the method described in the first aspect.
In a fourth aspect, the present disclosure provides an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method of the first aspect.
With the above technical solution, the target wake-up threshold corresponding to a voice wake-up word can be determined according to the word score of the word and the preset correspondence between wake-up thresholds and word scores. Because the wake-up threshold in the preset correspondence is positively correlated with the word score, a higher word score, that is, a more common wake-up word, yields a higher wake-up threshold, which reduces false wake-ups for that word. Conversely, a lower word score, that is, a rarer wake-up word, yields a lower wake-up threshold, which improves the wake-up rate for that word. This addresses the low wake-up rate and frequent false wake-ups of the related art and improves the accuracy of voice wake-up.
Additional features and advantages of the disclosure will be set forth in the detailed description which follows.
Drawings
The above and other features, advantages and aspects of various embodiments of the present disclosure will become more apparent by referring to the following detailed description when taken in conjunction with the accompanying drawings. Throughout the drawings, the same or similar reference numbers refer to the same or similar elements. It should be understood that the drawings are schematic and that elements and features are not necessarily drawn to scale. In the drawings:
FIG. 1 is a flow chart illustrating a voice wake-up method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a block diagram illustrating a voice wake-up apparatus according to an exemplary embodiment of the present disclosure;
fig. 3 is a block diagram illustrating an electronic device according to an exemplary embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While certain embodiments of the present disclosure are shown in the drawings, it is to be understood that the present disclosure may be embodied in various forms and should not be construed as limited to the embodiments set forth herein, but rather are provided for a more thorough and complete understanding of the present disclosure. It should be understood that the drawings and embodiments of the disclosure are for illustration purposes only and are not intended to limit the scope of the disclosure.
It should be understood that the various steps recited in the method embodiments of the present disclosure may be performed in a different order, and/or performed in parallel. Moreover, method embodiments may include additional steps and/or omit performing the illustrated steps. The scope of the present disclosure is not limited in this respect.
The term "include" and variations thereof as used herein are open-ended, i.e., "including but not limited to". The term "based on" is "based, at least in part, on". The term "one embodiment" means "at least one embodiment"; the term "another embodiment" means "at least one additional embodiment"; the term "some embodiments" means "at least some embodiments". Relevant definitions for other terms will be given in the following description.
It should be noted that the terms "first", "second", and the like in the present disclosure are only used to distinguish different devices, modules, or units, and are not intended to limit the order of, or the interdependence between, the functions they perform. It should also be noted that the modifiers "a", "an", and "the" in the present disclosure are illustrative rather than restrictive, and those skilled in the art will understand that they mean "one or more" unless the context clearly indicates otherwise.
The names of messages or information exchanged between devices in the embodiments of the present disclosure are for illustrative purposes only, and are not intended to limit the scope of the messages or information.
As mentioned in the background, the related art generally sets a uniform wake-up threshold for all wake-up words. However, the inventors found that some wake-up words are common and prone to false wake-ups, so they require a high threshold, while other wake-up words are rare and difficult to trigger, so they require a low threshold. Setting a uniform threshold for different wake-up words, as in the related art, therefore leads to a low wake-up rate for some wake-up words or frequent false wake-ups for others, affecting the accuracy of voice wake-up.
Through extensive data analysis, the inventors further found a strong correlation between the appropriate threshold of a wake-up word and the word score obtained after inputting the wake-up word into a language model. Therefore, the present disclosure provides a new threshold-setting approach: a higher threshold is designed for wake-up words that are easy to trigger, to reduce false wake-ups, and a lower threshold is set for wake-up words that are difficult to trigger, to improve the wake-up rate.
Fig. 1 is a flowchart illustrating a voice wake-up method according to an exemplary embodiment of the present disclosure. Referring to fig. 1, the voice wake-up method includes:
Step 101, acquiring a voice wake-up word for which a threshold is to be set;
Step 102, determining a word score of the voice wake-up word through a language model, where the word score characterizes the occurrence probability of the voice wake-up word;
Step 103, determining a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, where the wake-up threshold in the preset correspondence is positively correlated with the word score. The target wake-up threshold is compared, during voice wake-up, with the collected target voice wake-up word to determine the voice wake-up result corresponding to the target voice wake-up word.
For example, the voice wake-up word for which a threshold is to be set may be a word or phrase customized by the user and used for waking the electronic device to perform an operation, such as "navigate" or "open navigation"; the embodiments of the present disclosure do not limit this. The word score characterizes the occurrence probability of the voice wake-up word, that is, how common it is. The word score is positively correlated with the occurrence probability: the higher the word score, the more common the wake-up word, and the lower the word score, the rarer the wake-up word. In practical applications, the word score may be determined by an N-gram language model (N being a positive integer), which the embodiments of the present disclosure likewise do not limit.
After the word score of the voice wake-up word is determined, the target wake-up threshold corresponding to the word can be determined according to the word score and the preset correspondence between wake-up thresholds and word scores. Because the wake-up threshold in the preset correspondence is positively correlated with the word score, a higher word score, that is, a more common wake-up word, yields a higher wake-up threshold, which reduces false wake-ups for that word. Conversely, a lower word score, that is, a rarer wake-up word, yields a lower wake-up threshold, which improves the wake-up rate for that word. This addresses the low wake-up rate and frequent false wake-ups of the related art and improves the accuracy of voice wake-up. A minimal sketch of this flow is given below.
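The following Python sketch illustrates the flow just described, under the assumption that the preset correspondence is a fitted linear relation threshold = a * word_score + b (as in the fitting embodiment described later). The coefficient values, word scores, and function names are illustrative placeholders, not values or interfaces disclosed in the patent.

    # Minimal sketch, assuming the preset correspondence is a fitted linear relation
    # threshold = a * word_score + b. All numbers are illustrative placeholders.

    def target_wake_threshold(word_score: float, a: float = 0.05, b: float = 0.85) -> float:
        """Map a language-model word score to a word-specific wake-up threshold."""
        return a * word_score + b

    def should_wake(detection_score: float, threshold: float) -> bool:
        """At run time, compare the wake-up model's score for the captured
        utterance against the word-specific threshold."""
        return detection_score >= threshold

    # A common wake-up word (higher word score) receives a stricter threshold
    # than a rare one, so false wake-ups drop while rare words stay easy to trigger.
    threshold_common = target_wake_threshold(word_score=-1.0)   # e.g. a frequent phrase
    threshold_rare = target_wake_threshold(word_score=-5.0)     # e.g. an unusual phrase
    assert threshold_common > threshold_rare

In this sketch the positive correlation between word score and threshold is carried entirely by the positive slope a, which is what the fitted correspondence described below provides.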
In order to make those skilled in the art understand the voice wake-up method provided by the present disclosure, the following detailed description is provided for each step.
For example, the language model may be an N-gram language model, such as a 1-gram (unigram) or 2-gram (bigram) language model, which the embodiments of the present disclosure do not limit. A 1-gram language model splits the input text or word into several single tokens, and the score of each token is independent of the others. A 2-gram language model also splits the input text or word into single tokens, but the score of each token depends on the token immediately before it.
In a possible implementation, if the voice wake-up model is an end-to-end speech recognition RNN-T model, the language model may be a 2-gram language model, which determines the score of each segmented word according to the single token preceding it. It should be understood that the RNN-T model captures the dependency between adjacent inputs during processing, so a 2-gram language model is adopted accordingly. In this way, the language model matches the voice wake-up model, which improves the accuracy of the resulting wake-up threshold and, in turn, the accuracy of voice wake-up.
For example, the language model may be trained on sample text, in which case the word score of a voice wake-up word characterizes the probability of the word occurring in that sample text. In a possible implementation, a first sample text used for training the voice wake-up model may be obtained, and the language model is trained from this first sample text. That is, the language model can be trained on the same text data used to train the voice wake-up model, which improves the match between the two models and, in turn, the accuracy of voice wake-up.
It should be understood that if a voice wake-up word appears frequently in the first sample text used to train the voice wake-up model, then after the language model is trained on that text it will assign the word a high score, indicating that the word is common, so a high wake-up threshold can subsequently be determined for it and the corresponding false wake-ups reduced. Conversely, if the word appears rarely in the first sample text, the trained language model will assign it a low score, indicating that the word is rare, so a low wake-up threshold can subsequently be determined for it and the corresponding wake-up rate improved. A toy bigram model along these lines is sketched below.
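The following is a minimal, hypothetical sketch of a 2-gram language model of the kind described above. The patent does not specify the scoring formula, so the add-one smoothing, the averaged log-probability score, and the tiny sample corpus are assumptions made purely for illustration.

    # Illustrative sketch only: smoothing, scoring formula, and corpus are assumptions.
    import math
    from collections import Counter

    class BigramLM:
        """A toy 2-gram language model trained on a (hypothetical) first sample text."""

        def __init__(self, corpus_lines):
            self.unigrams = Counter()
            self.bigrams = Counter()
            for line in corpus_lines:
                tokens = ["<s>"] + line.split()
                self.unigrams.update(tokens)
                self.bigrams.update(zip(tokens, tokens[1:]))

        def word_score(self, wake_word):
            # Average log-probability of each token given its predecessor, with
            # add-one smoothing; a higher (less negative) score means a more
            # common wake-up word in the training text.
            tokens = ["<s>"] + wake_word.split()
            vocab = max(len(self.unigrams), 1)
            log_p = 0.0
            for prev, cur in zip(tokens, tokens[1:]):
                p = (self.bigrams[(prev, cur)] + 1) / (self.unigrams[prev] + vocab)
                log_p += math.log(p)
            return log_p / max(len(tokens) - 1, 1)

    # Usage: train on the same text used to train the wake-up model, then score words.
    lm = BigramLM(["open navigation please", "navigate home now", "play some music"])
    print(lm.word_score("navigate"), lm.word_score("teleport"))

A word that occurs in the training lines ("navigate") receives a higher score than one that never occurs ("teleport"), mirroring the common-versus-rare distinction drawn above.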
In a possible implementation, the preset correspondence between wake-up thresholds and word scores may be obtained as follows: determine, through the language model, a sample word score for each segmented word in a second sample text; determine a sample wake-up threshold for each segmented word in the second sample text; and perform data fitting on the sample word scores and sample wake-up thresholds to obtain the preset correspondence between wake-up thresholds and word scores.
For example, the second sample text may be the same as or different from the first sample text; the embodiments of the present disclosure do not limit this. A plurality of segmented words may be selected from the second sample text, and their sample word scores determined by the language model. A sample wake-up threshold corresponding to each of these segmented words may also be determined, for example by analyzing a large amount of data.
In a possible implementation, the sample wake-up threshold for each segmented word in the second sample text may be determined as follows: for a target segmented word in the second sample text, where the target segmented word is any segmented word in the sample text, input test corpora that do not contain the target segmented word to the voice wake-up model for a preset duration, determine the false wake-up rate corresponding to each of a plurality of candidate wake-up thresholds of the voice wake-up model over that duration, and take the candidate wake-up threshold at which the false wake-up rate reaches a preset false wake-up rate as the sample wake-up threshold corresponding to the target segmented word.
For example, the preset duration may be set according to the actual situation, for example to 100 hours or 200 hours, which the embodiments of the present disclosure do not limit. The candidate wake-up thresholds may be a plurality of values within a preset threshold range; for example, with a threshold range of 0 to 1 and a step of 0.1, the candidate wake-up thresholds take the values 0, 0.1, 0.2, ..., 1 in turn. The preset threshold range and step may likewise be set according to the actual situation. In addition, the false wake-up rate is the number of false wake-ups triggered by the test corpora per unit time, and the preset false wake-up rate is the false wake-up rate expected in practical applications, which may also be set according to the actual situation.
In the embodiment of the disclosure, test corpora that do not contain the target segmented word are input to the voice wake-up model for the preset duration, the false wake-up rate corresponding to each candidate wake-up threshold over that duration is determined, and if the false wake-up rate corresponding to a candidate wake-up threshold reaches the preset false wake-up rate, that candidate threshold is used as the sample wake-up threshold. In this way, the sample wake-up threshold for each segmented word in the second sample text is obtained from the false wake-up rate and conforms to the expected false wake-up rate, so fitting the correspondence between wake-up thresholds and word scores from these sample thresholds reduces false wake-ups and improves the accuracy of threshold setting. One possible implementation of this sweep is sketched below.
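The following Python sketch shows one way such a sweep could be implemented. The voice wake-up model is represented by a stand-in scoring function, and the duration, step, and target false wake-up rate are illustrative assumptions rather than values prescribed by the patent.

    # Hypothetical sketch: `score_fn(utterance, word)` stands in for the trained
    # voice wake-up model, and `negative_corpus` holds test utterances that do
    # not contain the target segmented word. Duration, step, and target rate
    # are illustrative placeholders.

    def sample_wake_threshold(negative_corpus, target_word, score_fn,
                              hours=100.0, preset_false_rate=0.5, step=0.1):
        """Sweep candidate thresholds from 0 to 1 and return the first one whose
        false wake-up rate (false wake-ups per hour on the negative corpus)
        reaches the preset false wake-up rate."""
        scores = [score_fn(utterance, target_word) for utterance in negative_corpus]
        num_steps = int(round(1.0 / step))
        for i in range(num_steps + 1):
            threshold = round(i * step, 6)                      # 0.0, 0.1, ..., 1.0
            false_wake_rate = sum(s >= threshold for s in scores) / hours
            if false_wake_rate <= preset_false_rate:
                return threshold
        return 1.0  # fall back to the strictest candidate threshold

Because the negative corpus never contains the target word, every score at or above a candidate threshold counts as a false wake-up, so the sweep naturally returns a higher sample threshold for words the wake-up model triggers on too easily.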
After the sample word score and the sample wake-up threshold for each segmented word in the second sample text are determined, data fitting can be performed on them to obtain the preset correspondence between wake-up thresholds and word scores. In a possible implementation, linear fitting is performed with the sample word scores as the independent variable and the sample wake-up thresholds as the dependent variable, yielding a functional relation that represents the correspondence between wake-up thresholds and word scores.
For example, with the sample word score of each segmented word as the independent variable x and the corresponding sample wake-up threshold as the dependent variable y, a linear fit using the straight-line equation y = ax + b can be performed, where a and b are the coefficients to be fitted. This yields a functional relation representing the correspondence between wake-up thresholds and word scores, and the target wake-up threshold corresponding to a voice wake-up word can then be determined from this relation and the word score of the word. Because the wake-up threshold is positively correlated with the word score in this relation, a higher word score, that is, a more common wake-up word, yields a higher wake-up threshold, which reduces false wake-ups for that word; conversely, a lower word score, that is, a rarer wake-up word, yields a lower wake-up threshold, which improves the wake-up rate for that word. This addresses the low wake-up rate and frequent false wake-ups of the related art and improves the accuracy of voice wake-up. A sketch of this fitting step follows.
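A sketch of the fitting step, using NumPy's polyfit for the straight-line regression y = ax + b, is given below. The sample scores and thresholds are illustrative placeholders chosen only to show the positive correlation; they are not data from the patent.

    # Sketch of the data-fitting step: fit y = a*x + b with sample word scores as x
    # and sample wake-up thresholds as y. Data points are illustrative placeholders.
    import numpy as np

    sample_word_scores = np.array([-4.2, -3.1, -2.5, -1.4, -0.8])   # x, from the language model
    sample_thresholds = np.array([0.62, 0.68, 0.71, 0.78, 0.83])    # y, from the false wake-up sweep

    a, b = np.polyfit(sample_word_scores, sample_thresholds, deg=1)

    def wake_threshold_for(word_score: float) -> float:
        """Apply the fitted correspondence to the score of a new wake-up word."""
        return float(a * word_score + b)

    print(wake_threshold_for(-2.0))  # a rarer word gets a lower threshold than a common one

Once a and b are fitted, setting the threshold for a newly customized wake-up word reduces to a single evaluation of the fitted line on that word's language-model score.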
Based on the same inventive concept, an embodiment of the present disclosure further provides a voice wake-up apparatus, which may be part or all of an electronic device implemented in software, hardware, or a combination of the two. Referring to fig. 2, the voice wake-up apparatus 200 includes:
an obtaining module 201, configured to acquire a voice wake-up word for which a threshold is to be set;
a first determining module 202, configured to determine a word score of the voice wake-up word through a language model, where the word score characterizes the occurrence probability of the voice wake-up word;
a second determining module 203, configured to determine a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, where the wake-up threshold in the preset correspondence is positively correlated with the word score, and the target wake-up threshold is used, during voice wake-up, for comparison with the collected target voice wake-up word to determine a voice wake-up result corresponding to the target voice wake-up word.
Optionally, the apparatus 200 further includes a training module, configured to obtain a first sample text for training the voice wake-up model and to train the language model according to the first sample text.
Optionally, the apparatus 200 further comprises a data fitting module for:
determining, through the language model, a sample word score for each segmented word in a second sample text, and determining a sample wake-up threshold for each segmented word in the second sample text;
performing data fitting according to the sample word score and the sample wake-up threshold for each segmented word in the second sample text to obtain the preset correspondence between wake-up thresholds and word scores.
Optionally, the data fitting module is configured to:
performing linear fitting with the sample word score of each segmented word in the second sample text as the independent variable and the corresponding sample wake-up threshold as the dependent variable, so as to obtain a functional relation representing the correspondence between wake-up thresholds and word scores.
Optionally, the data fitting module is configured to:
for a target segmented word in the second sample text, inputting test corpora that do not contain the target segmented word to a voice wake-up model for a preset duration, and determining the false wake-up rate corresponding to each of a plurality of candidate wake-up thresholds of the voice wake-up model over the preset duration, wherein the target segmented word is any segmented word in the sample text;
determining the candidate wake-up threshold at which the false wake-up rate reaches a preset false wake-up rate as the sample wake-up threshold corresponding to the target segmented word.
Optionally, the voice wake-up model includes an end-to-end speech recognition RNN-T model, and the language model is a 2-gram language model configured to determine the score of a segmented word according to the single token preceding it.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
Based on the same inventive concept, the disclosed embodiments further provide a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processing device, implements the steps of any of the above voice wake-up methods.
Based on the same inventive concept, an embodiment of the present disclosure further provides an electronic device, including:
a storage device having a computer program stored thereon;
and the processing device is used for executing the computer program in the storage device so as to realize the steps of any voice wake-up method.
Referring now to FIG. 3, a block diagram of an electronic device 300 suitable for use in implementing embodiments of the present disclosure is shown. The terminal device in the embodiments of the present disclosure may include, but is not limited to, a mobile terminal such as a mobile phone, a notebook computer, a digital broadcast receiver, a PDA (personal digital assistant), a PAD (tablet computer), a PMP (portable multimedia player), a vehicle terminal (e.g., a car navigation terminal), and the like, and a stationary terminal such as a digital TV, a desktop computer, and the like. The electronic device shown in fig. 3 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 3, the electronic device 300 may include a processing means (e.g., a central processing unit, a graphics processor, etc.) 301 that may perform various appropriate actions and processes in accordance with a program stored in a read-only memory (ROM) 302 or a program loaded from a storage means 308 into a random access memory (RAM) 303. In the RAM 303, various programs and data necessary for the operation of the electronic device 300 are also stored. The processing means 301, the ROM 302, and the RAM 303 are connected to each other via a bus 304. An input/output (I/O) interface 305 is also connected to the bus 304.
Generally, the following devices may be connected to the I/O interface 305: input devices 306 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; an output device 307 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage devices 308 including, for example, magnetic tape, hard disk, etc.; and a communication device 309. The communication means 309 may allow the electronic device 300 to communicate wirelessly or by wire with other devices to exchange data. While fig. 3 illustrates an electronic device 300 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a non-transitory computer readable medium, the computer program containing program code for performing the method illustrated by the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network through the communication means 309, or installed from the storage means 308, or installed from the ROM 302. The computer program, when executed by the processing device 301, performs the above-described functions defined in the methods of the embodiments of the present disclosure.
It should be noted that the computer readable medium in the present disclosure can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In contrast, in the present disclosure, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
In some embodiments, communication may be performed using any currently known or future developed network protocol, such as HTTP (HyperText Transfer Protocol), and may be interconnected with digital data communication in any form or medium (e.g., a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), an internetwork (e.g., the Internet), and peer-to-peer networks (e.g., ad hoc peer-to-peer networks), as well as any currently known or future developed network.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device.
The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquire a voice wake-up word for which a threshold is to be set; determine a word score of the voice wake-up word through a language model, where the word score characterizes the occurrence probability of the voice wake-up word; and determine a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, where the wake-up threshold in the preset correspondence is positively correlated with the word score, and the target wake-up threshold is used, during voice wake-up, for comparison with the collected target voice wake-up word to determine a voice wake-up result corresponding to the target voice wake-up word.
Computer program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including but not limited to object-oriented programming languages such as Java, Smalltalk, and C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present disclosure may be implemented by software or hardware. Wherein the name of a module in some cases does not constitute a limitation on the module itself.
The functions described herein above may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: field Programmable Gate Arrays (FPGAs), Application Specific Integrated Circuits (ASICs), Application Specific Standard Products (ASSPs), systems on a chip (SOCs), Complex Programmable Logic Devices (CPLDs), and the like.
In the context of this disclosure, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. A machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
According to one or more embodiments of the present disclosure, Example 1 provides a voice wake-up method, including:
acquiring a voice wake-up word for which a threshold is to be set;
determining a word score of the voice wake-up word through a language model, where the word score characterizes the occurrence probability of the voice wake-up word;
determining a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, where the wake-up threshold in the preset correspondence is positively correlated with the word score, and the target wake-up threshold is used, during voice wake-up, for comparison with the collected target voice wake-up word to determine a voice wake-up result corresponding to the target voice wake-up word.
According to one or more embodiments of the present disclosure, Example 2 provides the method of Example 1, the language model being trained by:
obtaining a first sample text used for training a voice wake-up model, and training the language model according to the first sample text.
According to one or more embodiments of the present disclosure, Example 3 provides the method of Example 1 or 2, where the preset correspondence between wake-up thresholds and word scores is obtained by:
determining, through the language model, a sample word score for each segmented word in a second sample text, and determining a sample wake-up threshold for each segmented word in the second sample text;
performing data fitting according to the sample word score and the sample wake-up threshold for each segmented word in the second sample text to obtain the preset correspondence between wake-up thresholds and word scores.
According to one or more embodiments of the present disclosure, Example 4 provides the method of Example 3, where performing data fitting according to the sample word score and the sample wake-up threshold for each segmented word in the second sample text to obtain the preset correspondence between wake-up thresholds and word scores includes:
performing linear fitting with the sample word score of each segmented word in the second sample text as the independent variable and the corresponding sample wake-up threshold as the dependent variable, so as to obtain a functional relation representing the correspondence between wake-up thresholds and word scores.
According to one or more embodiments of the present disclosure, Example 5 provides the method of Example 3, where determining the sample wake-up threshold for each segmented word in the second sample text includes:
for a target segmented word in the second sample text, inputting test corpora that do not contain the target segmented word to a voice wake-up model for a preset duration, and determining the false wake-up rate corresponding to each of a plurality of candidate wake-up thresholds of the voice wake-up model over the preset duration, where the target segmented word is any segmented word in the sample text;
determining the candidate wake-up threshold at which the false wake-up rate reaches a preset false wake-up rate as the sample wake-up threshold corresponding to the target segmented word.
According to one or more embodiments of the present disclosure, Example 6 provides the method of Example 1 or 2, where the voice wake-up model includes an end-to-end speech recognition RNN-T model, and the language model is a 2-gram language model configured to determine the score of a segmented word according to the single token preceding it.
According to one or more embodiments of the present disclosure, Example 7 provides a voice wake-up apparatus, the apparatus including:
an acquisition module, configured to acquire a voice wake-up word for which a threshold is to be set;
a first determining module, configured to determine a word score of the voice wake-up word through a language model, where the word score characterizes the occurrence probability of the voice wake-up word;
a second determining module, configured to determine a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, where the wake-up threshold in the preset correspondence is positively correlated with the word score, and the target wake-up threshold is used, during voice wake-up, for comparison with the collected target voice wake-up word to determine a voice wake-up result corresponding to the target voice wake-up word.
According to one or more embodiments of the present disclosure, Example 8 provides the apparatus of Example 7, the apparatus further including:
a training module, configured to obtain a target sample text used for training a voice wake-up model and to use the target sample text as the first sample text for training the language model.
According to one or more embodiments of the present disclosure, Example 9 provides a non-transitory computer-readable storage medium having stored thereon a computer program that, when executed by a processing device, implements the steps of the method of any one of Examples 1 to 6.
Example 10 provides, in accordance with one or more embodiments of the present disclosure, an electronic device comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method of any of examples 1-6.
The foregoing description is merely an illustration of the preferred embodiments of the present disclosure and of the technical principles employed. It will be appreciated by those skilled in the art that the scope of the disclosure is not limited to technical solutions formed by the particular combination of features described above, but also covers other technical solutions formed by any combination of the above features or their equivalents without departing from the concept of the disclosure, for example, technical solutions formed by replacing the above features with features having similar functions disclosed (but not limited to those disclosed) in the present disclosure.
Further, while operations are depicted in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order. Under certain circumstances, multitasking and parallel processing may be advantageous. Likewise, while several specific implementation details are included in the above discussion, these should not be construed as limitations on the scope of the disclosure. Certain features that are described in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination.
Although the subject matter has been described in language specific to structural features and/or methodological acts, it is to be understood that the subject matter defined in the appended claims is not necessarily limited to the specific features or acts described above. Rather, the specific features and acts described above are disclosed as example forms of implementing the claims.

Claims (10)

1. A voice wake-up method, the method comprising:
acquiring a voice wake-up word for which a threshold is to be set;
determining a word score of the voice wake-up word through a language model, wherein the word score characterizes the occurrence probability of the voice wake-up word;
determining a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, wherein the wake-up threshold in the preset correspondence is positively correlated with the word score, and the target wake-up threshold is used, during voice wake-up, for comparison with the collected target voice wake-up word to determine a voice wake-up result corresponding to the target voice wake-up word.
2. The method of claim 1, wherein the language model is trained by:
obtaining a first sample text used for training a voice wake-up model, and training the language model according to the first sample text.
3. The method according to claim 1 or 2, wherein the preset correspondence between wake-up thresholds and word scores is obtained by:
determining, through the language model, a sample word score for each segmented word in a second sample text, and determining a sample wake-up threshold for each segmented word in the second sample text;
performing data fitting according to the sample word score and the sample wake-up threshold for each segmented word in the second sample text to obtain the preset correspondence between wake-up thresholds and word scores.
4. The method according to claim 3, wherein performing data fitting according to the sample word score and the sample wake-up threshold for each segmented word in the second sample text to obtain the preset correspondence between wake-up thresholds and word scores comprises:
performing linear fitting with the sample word score of each segmented word in the second sample text as the independent variable and the corresponding sample wake-up threshold as the dependent variable, so as to obtain a functional relation representing the correspondence between wake-up thresholds and word scores.
5. The method of claim 3, wherein determining the sample wake-up threshold for each segmented word in the second sample text comprises:
for a target segmented word in the second sample text, inputting test corpora that do not contain the target segmented word to a voice wake-up model for a preset duration, and determining the false wake-up rate corresponding to each of a plurality of candidate wake-up thresholds of the voice wake-up model over the preset duration, wherein the target segmented word is any segmented word in the sample text;
determining the candidate wake-up threshold at which the false wake-up rate reaches a preset false wake-up rate as the sample wake-up threshold corresponding to the target segmented word.
6. The method according to claim 1 or 2, wherein the voice wake-up model comprises an end-to-end speech recognition (RNN-T) model, and the language model is a 2-gram language model configured to determine the score of a segmented word according to the single token preceding it.
7. A voice wake-up apparatus, the apparatus comprising:
an acquisition module, configured to acquire a voice wake-up word for which a threshold is to be set;
a first determining module, configured to determine a word score of the voice wake-up word through a language model, wherein the word score characterizes the occurrence probability of the voice wake-up word;
a second determining module, configured to determine a target wake-up threshold corresponding to the voice wake-up word according to the word score of the voice wake-up word and a preset correspondence between wake-up thresholds and word scores, wherein the wake-up threshold in the preset correspondence is positively correlated with the word score, and the target wake-up threshold is used, during voice wake-up, for comparison with the collected target voice wake-up word to determine a voice wake-up result corresponding to the target voice wake-up word.
8. The apparatus of claim 7, further comprising:
a training module, configured to obtain a target sample text used for training a voice wake-up model and to use the target sample text as the first sample text for training the language model.
9. A non-transitory computer readable storage medium having stored thereon a computer program, wherein the program, when executed by a processing device, implements the steps of the method of any one of claims 1 to 6.
10. An electronic device, comprising:
a storage device having a computer program stored thereon;
processing means for executing the computer program in the storage means to carry out the steps of the method according to any one of claims 1 to 6.
CN202110778555.9A 2021-07-09 2021-07-09 Voice wakeup method and device, storage medium and electronic equipment Active CN113488050B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110778555.9A CN113488050B (en) 2021-07-09 2021-07-09 Voice wakeup method and device, storage medium and electronic equipment


Publications (2)

Publication Number Publication Date
CN113488050A true CN113488050A (en) 2021-10-08
CN113488050B CN113488050B (en) 2024-03-26

Family

ID=77938552

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110778555.9A Active CN113488050B (en) 2021-07-09 2021-07-09 Voice wakeup method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN113488050B (en)

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170084278A1 (en) * 2015-09-23 2017-03-23 Samsung Electronics Co., Ltd. Voice recognition apparatus, voice recognition method of user device, and non-transitory computer readable recording medium
CN107767861A (en) * 2016-08-22 2018-03-06 科大讯飞股份有限公司 voice awakening method, system and intelligent terminal
CN107123417A (en) * 2017-05-16 2017-09-01 上海交通大学 Optimization method and system are waken up based on the customized voice that distinctive is trained
CN107610695A (en) * 2017-08-08 2018-01-19 问众智能信息科技(北京)有限公司 Driver's voice wakes up the dynamic adjusting method of instruction word weight
US10510340B1 (en) * 2017-12-05 2019-12-17 Amazon Technologies, Inc. Dynamic wakeword detection
CN108564951A (en) * 2018-03-02 2018-09-21 北京云知声信息技术有限公司 The method that intelligence reduces voice control device false wake-up probability
CN108847219A (en) * 2018-05-25 2018-11-20 四川斐讯全智信息技术有限公司 A kind of wake-up word presets confidence threshold value adjusting method and system
US20200090646A1 (en) * 2018-09-14 2020-03-19 Sonos, Inc. Networked devices, systems, & methods for intelligently deactivating wake-word engines
CN109741755A (en) * 2018-12-25 2019-05-10 苏州思必驰信息科技有限公司 Voice wakes up word threshold management device and manages the method that voice wakes up word threshold value
CN109753665A (en) * 2019-01-30 2019-05-14 北京声智科技有限公司 Wake up the update method and device of model
WO2020203275A1 (en) * 2019-03-29 2020-10-08 株式会社東芝 Threshold value adjustment device, threshold value adjustment method, and recording medium

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116030817A (en) * 2022-07-18 2023-04-28 荣耀终端有限公司 Voice wakeup method, equipment and storage medium
CN116030817B (en) * 2022-07-18 2023-09-19 荣耀终端有限公司 Voice wakeup method, equipment and storage medium

Also Published As

Publication number Publication date
CN113488050B (en) 2024-03-26


Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant