CN111742361B - Method for updating wake-up voice of voice assistant by terminal and terminal - Google Patents


Publication number
CN111742361B
Authority
CN
China
Prior art keywords
voice data
terminal
voice
wake
voiceprint
Prior art date
Legal status
Active
Application number
CN201880089912.7A
Other languages
Chinese (zh)
Other versions
CN111742361A (en)
Inventor
许军 (Xu Jun)
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN111742361A publication Critical patent/CN111742361A/en
Application granted granted Critical
Publication of CN111742361B publication Critical patent/CN111742361B/en


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/02: Feature extraction for speech recognition; Selection of recognition unit
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00: Reducing energy consumption in communication networks
    • Y02D 30/70: Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

An embodiment of this application discloses a method for a terminal to update the wake-up voice of a voice assistant, and a corresponding terminal. The method relates to the field of voice control and enables the terminal to update its wake-up word in real time, which improves the terminal's voice wake-up rate and reduces its false wake-up rate. The scheme is as follows: the terminal receives first voice data input by a user; the terminal determines whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal; if the two texts match, the terminal performs identity authentication on the user; and if the identity authentication passes, the terminal updates a first voiceprint model in the terminal with the first voice data. The first voiceprint model is used for voiceprint verification when the voice assistant is woken up, and it represents the voiceprint features of the preset wake-up word.

Description

Method for updating wake-up voice of voice assistant by terminal and terminal
Technical Field
The embodiment of the application relates to the technical field of voice control, in particular to a method for updating wake-up voice of a voice assistant by a terminal and the terminal.
Background
Voice assistants are an important application on cell phones. A voice assistant can hold intelligent conversations and interactions with the user, answering questions as they are asked. It can also recognize the user's voice command and make the phone execute the event corresponding to that command. For example, if the voice assistant receives and recognizes the user-entered voice command "call Bob", the phone can automatically place a call to the contact Bob.
Generally, the voice assistant sits in a dormant state, and the user wakes it by voice before use. Before voice wake-up can work, the user needs to register a wake-up word (i.e., a wake-up voice) in the handset. From the wake-up word the user records, the phone generates a voiceprint model that represents the voiceprint features of that wake-up word. The voice wake-up process may then run as follows: the phone monitors voice data through a low-power digital signal processor (DSP). When the DSP detects voice data whose similarity to the wake-up word meets a certain condition, it hands the monitored voice data to the application processor (AP). The AP performs text verification and voiceprint verification on the voice data to determine whether it matches the generated voiceprint model. When the voice data matches the voiceprint model, the phone starts the voice assistant.
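The two-stage pipeline above can be sketched in code. This is a minimal illustrative sketch, not the patent's implementation: `dsp_screen`, `ap_verify`, `try_wake`, and the 0.6 similarity threshold are all invented stand-ins for the DSP's cheap similarity screen and the AP's full text-plus-voiceprint verification.

```python
# Hypothetical sketch of the two-stage wake-up pipeline: a low-power DSP
# stage runs a cheap similarity screen, and only when it fires does the AP
# stage run full text and voiceprint verification.

DSP_SIMILARITY_THRESHOLD = 0.6  # assumed value for the DSP screen

def dsp_screen(voice_data: str, wake_word: str) -> bool:
    """Cheap first-stage check: crude character-overlap 'similarity'."""
    overlap = len(set(voice_data) & set(wake_word))
    similarity = overlap / max(len(set(wake_word)), 1)
    return similarity >= DSP_SIMILARITY_THRESHOLD

def ap_verify(voice_data: str, wake_word: str, voiceprint_ok: bool) -> bool:
    """Second-stage AP check: exact text match plus the voiceprint result."""
    return voice_data == wake_word and voiceprint_ok

def try_wake(voice_data: str, wake_word: str, voiceprint_ok: bool) -> bool:
    if not dsp_screen(voice_data, wake_word):
        return False  # the DSP never hands the data to the AP
    return ap_verify(voice_data, wake_word, voiceprint_ok)
```

The point of the split is energy: the AP stays asleep unless the inexpensive DSP stage has already found a plausible wake-up candidate.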
After a user registers a wake-up word in the phone, the wake-up word is rarely re-registered (i.e., updated). However, the registered wake-up word is only voice data recorded in one noise scene while the user was in one particular physical state. Changes in the user's physical state and changes in the surrounding noise scene both affect the voice data the user produces. Therefore, when the user's physical state and/or noise scene has changed, still using the initially registered wake-up word for voice wake-up lowers the phone's voice wake-up rate and raises its false wake-up rate.
Disclosure of Invention
The embodiment of the application provides a method for updating the wake-up voice of a voice assistant by a terminal and the terminal, which can update the wake-up voice of the terminal in real time, thereby improving the voice wake-up rate of the terminal for executing voice wake-up and reducing the false wake-up rate.
In a first aspect, an embodiment of the present application provides a method for a terminal to update the wake-up voice of a voice assistant. The method may include: the terminal receives first voice data input by a user; the terminal determines whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal; if the two texts match, the terminal performs identity authentication on the user; and if the identity authentication passes, the terminal updates a first voiceprint model in the terminal with the first voice data. The first voiceprint model is used for voiceprint verification when the voice assistant is woken up, and it represents the voiceprint features of the preset wake-up word.
In this embodiment, if the text corresponding to the first voice data matches the text of the preset wake-up word and the user passes identity authentication, the first voice data is a wake-up voice, uttered by the user, that can wake the voice assistant and that has been authenticated. Moreover, because the first voice data is the user's voice data acquired by the terminal in real time, it reflects the user's current physical state and/or the real-time condition of the noise scene the user is in. In summary, updating the terminal's voiceprint model with the first voice data improves the terminal's voice wake-up rate and reduces its false wake-up rate.
Further, the first voice data is acquired automatically by the terminal during its voice wake-up process, rather than received after prompting the user to manually re-register the wake-up word. Updating the voiceprint model with the first voice data therefore also simplifies the wake-up word update flow.
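The first-aspect flow (text match, then identity authentication, then model update) can be sketched as follows. This is a toy representation in which the registered wake-up word is kept as a list of text samples; `maybe_update_wake_voice` and its arguments are hypothetical names, not taken from the patent.

```python
# Illustrative sketch of the first-aspect flow: the fresh wake-up sample is
# folded into the model's data only if the text matches the registered
# wake-up word AND identity authentication has passed.

def maybe_update_wake_voice(first_voice_text: str,
                            registered_wake_text: str,
                            identity_ok: bool,
                            registered_samples: list) -> list:
    """Return the (possibly updated) list of registered wake-up samples."""
    if first_voice_text != registered_wake_text:
        return registered_samples  # text check failed: no update
    if not identity_ok:
        return registered_samples  # identity authentication failed
    # Both checks passed: keep the fresh sample for the model update.
    return registered_samples + [first_voice_text]
```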
With reference to the first aspect, in one possible design, the terminal performs identity authentication on the user as follows: the terminal performs voiceprint verification on the first voice data using the first voiceprint model; if the first voice data passes the voiceprint verification, the identity authentication passes.
In this embodiment, while performing voice wake-up the terminal can obtain first voice data that passes both text verification and voiceprint verification, and then update the first voiceprint model with that data. Because the first voice data is the user's voice data acquired in real time, it reflects the user's physical state and/or the real-time noise scene; and because it has passed both text verification and voiceprint verification, updating the terminal's voiceprint model with it improves the voice wake-up rate and reduces the false wake-up rate.
In combination with the first aspect, in another possible design, the terminal may start the voice assistant if the first voice data passes the voiceprint verification. After the voice assistant is started, the terminal may or may not receive valid voice commands through the voice assistant. The terminal may determine whether to update the first voiceprint model with the first voice data by determining whether the terminal receives a valid voice command. Specifically, the method of the embodiment of the application further comprises the following steps: when the identity authentication passes, the terminal starts a voice assistant; the terminal receives second voice data through the voice assistant; the terminal determines the second voice data as a valid voice command. In this way, after the authentication is passed, if the terminal determines that the second voice data is a valid voice command, the terminal may update the first voiceprint model in the terminal with the first voice data.
The terminal updates the first voiceprint model with the first voice data only when, after the voice assistant starts, it receives a valid voice command that triggers the terminal to execute a corresponding function. A valid voice command after start-up indicates that the wake-up was a valid one that matched the user's intent. Updating the terminal's voiceprint model with voice data that both reflects the user's real intent and successfully woke the terminal further improves the voice wake-up rate and reduces the false wake-up rate.
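The valid-command gate just described could be sketched as below; the command set and every function name here are illustrative assumptions, standing in for whatever command recognition the terminal actually performs.

```python
# Sketch: defer the voiceprint-model update until a valid voice command
# arrives after wake-up, so only wake-ups that matched the user's intent
# feed the model.

VALID_COMMANDS = {"call bob", "play music", "set alarm"}  # assumed set

def is_valid_command(second_voice_text: str) -> bool:
    return second_voice_text.lower() in VALID_COMMANDS

def update_after_wake(first_voice: str, second_voice_text: str,
                      pending_updates: list) -> list:
    """Queue first_voice for a model update only if the follow-up command
    proves the wake-up was intentional."""
    if is_valid_command(second_voice_text):
        return pending_updates + [first_voice]
    return pending_updates
```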
With reference to the first aspect, in another possible design, the terminal includes a coprocessor and a main processor. The terminal monitors voice data with the coprocessor; when the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the terminal's preset wake-up word, and when the texts are determined to match, the main processor performs voiceprint verification on the first voice data using the first voiceprint model. For example, the coprocessor is a DSP and the main processor is an AP.
With reference to the first aspect, in another possible design, before the terminal performs identity authentication on the user, the terminal may perform voiceprint verification on the first voice data using the first voiceprint model. If the first voice data fails the voiceprint verification, the terminal performs text verification on the voice data received within a first preset time; and if, within that first preset time, the terminal receives second voice data as well as at least one piece of voice data whose text matches the preset wake-up word, the terminal performs identity authentication on the user. The text corresponding to the second voice data contains a preset keyword. For example, the second voice data may be voice data in which the user complains about a failed voice wake-up, such as "how come it won't wake up", "why isn't it working", "no response", "can't wake it up", or "voice wake-up failed".
Consider the case where the terminal receives the first voice data and finds that its voiceprint verification fails. If the terminal then receives, within the first preset time, at least one piece of voice data that passes text verification, this indicates that the user tried to wake the terminal's voice assistant by voice multiple times but the wake-ups failed. In that situation, if the terminal also receives the second voice data within the first preset time, the user is evidently dissatisfied with the failed wake-ups. Receiving the second voice data plus at least one text-verified utterance within the first preset time signals the user's strong intent to wake the voice assistant by voice; the repeated failures are likely because the user's current physical state differs greatly from the physical state at the time the wake-up word was registered. The first voice data was therefore issued by a user with a strong intent to wake the terminal's voice assistant. Updating the terminal's voiceprint model with voice data that reflects the user's real intent in this way further improves the voice wake-up rate and reduces the false wake-up rate.
Moreover, because the first voice data is the user's voice data acquired by the terminal in real time, it reflects the user's physical state and/or the real-time condition of the noise scene the user is in; updating the terminal's voiceprint model with it therefore improves the voice wake-up rate and reduces the false wake-up rate. Further, the first voice data is acquired automatically by the terminal during voice wake-up, rather than received after prompting the user to manually re-register the wake-up word, so updating the voiceprint model with it also simplifies the wake-up word update flow.
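The fallback trigger above (voiceprint failure, then wake-word attempts plus a complaint within the first preset time) can be sketched as a predicate over the utterances seen in that window. The keyword list and the function name are illustrative assumptions, not the patent's actual keyword set.

```python
# Sketch of the fallback path: identity authentication (e.g. a verification
# UI) is prompted only when, within the preset window, the terminal sees
# (a) at least one utterance whose text matches the wake word and
# (b) a complaint utterance containing a preset keyword.

COMPLAINT_KEYWORDS = {"no response", "unable to wake up",
                      "voice wake-up failed"}  # illustrative translations

def should_prompt_identity_check(window_utterances: list,
                                 wake_word_text: str) -> bool:
    has_wake_attempt = any(u == wake_word_text for u in window_utterances)
    has_complaint = any(any(k in u for k in COMPLAINT_KEYWORDS)
                        for u in window_utterances)
    return has_wake_attempt and has_complaint
```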
With reference to the first aspect, in another possible design, the terminal performs identity authentication on the user as follows: the terminal displays an identity verification interface; the terminal receives the identity verification information the user enters on that interface; and the terminal verifies the user's identity according to that information.
With reference to the first aspect, in another possible design, the terminal includes a coprocessor and a main processor. The terminal monitors voice data with the coprocessor; when the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the terminal's preset wake-up word, and when the texts are determined to match, the main processor performs voiceprint verification on the first voice data using the first voiceprint model. The terminal also uses the coprocessor to monitor voice data within the first preset time, and notifies the main processor to determine whether the voice data received within that time includes the second voice data and at least one piece of voice data whose text matches the preset wake-up word; the text corresponding to the second voice data contains a preset keyword. For example, the coprocessor is a DSP and the main processor is an AP.
With reference to the first aspect, in another possible design, the preset wake-up word comprises at least two pieces of registered voice data, recorded when the terminal registered the preset wake-up word, and the first voiceprint model is generated from those registered voice data. After the terminal generates a new voiceprint model from the first voice data, directly replacing the first voiceprint model with the new model would sharply raise the terminal's voice wake-up rate; but such a sharp rise in the wake-up rate correspondingly raises the terminal's false wake-up rate as well.
To raise the terminal's voice wake-up rate steadily while also reducing its false wake-up rate, the terminal may update the first voiceprint model with the first voice data as follows: the terminal replaces third voice data among the at least two pieces of registered voice data with the first voice data to obtain at least two pieces of updated registered voice data, where the signal quality parameter of the third voice data is lower than that of the other registered voice data; the terminal generates a second voiceprint model from the updated registered voice data; and the terminal replaces the first voiceprint model with the second voiceprint model. The second voiceprint model represents the voiceprint features of the updated registered voice data.
In this embodiment, the terminal uses the first voice data to replace only part of the registered voice data (the third voice data), rather than generating the second voiceprint model entirely from the first voice data. This raises the terminal's voice wake-up rate steadily and, at the same time, reduces its false wake-up rate.
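The partial-replacement update can be sketched as follows, using signal-to-noise ratio as the signal quality parameter (one example the text gives). Representing samples as `(audio_id, snr)` pairs and averaging SNR as a stand-in for voiceprint training are both simplifications for illustration only.

```python
# Sketch of the partial-replacement update: instead of rebuilding the
# voiceprint model from the new audio alone, replace only the registered
# sample with the worst signal quality (here, the lowest SNR), then
# regenerate the model from the refreshed set.

def replace_worst_sample(registered, new_sample):
    """registered: list of (audio_id, snr); new_sample: (audio_id, snr)."""
    worst = min(registered, key=lambda s: s[1])  # the "third voice data"
    return [new_sample if s == worst else s for s in registered]

def build_model(samples):
    """Stand-in for voiceprint training: average SNR as a toy 'model'."""
    return sum(snr for _, snr in samples) / len(samples)
```

Keeping the majority of the original samples is what makes the wake-up rate change gradual rather than abrupt.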
With reference to the first aspect, in another possible design, if the second voiceprint threshold that the terminal generates from the second voiceprint model differs greatly from the first voiceprint threshold, the terminal's voice wake-up rate would fluctuate sharply, hurting the user experience. The terminal can therefore generate a second voiceprint threshold from the second voiceprint model and the updated registered voice data; if the difference between the second voiceprint threshold and the first voiceprint threshold is smaller than a first preset threshold, the terminal replaces the first voiceprint model with the second voiceprint model.
When the second voiceprint threshold differs greatly from the first voiceprint threshold, the terminal can instead delete the second voiceprint model and the first voice data, i.e., not replace the first voiceprint model with the second voiceprint model. This avoids the sharp fluctuations in the terminal's wake-up rate, and the resulting harm to the user experience, that a large difference between the two thresholds would cause.
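The acceptance test above reduces to a single comparison; the drift limit of 0.05 below is an assumed illustrative value for the first preset threshold, not a value from the patent.

```python
# Sketch of the acceptance test: the new voiceprint model is adopted only
# if the threshold it produces stays close to the old one; otherwise both
# the new model and the captured audio are discarded.

FIRST_PRESET_THRESHOLD = 0.05  # maximum allowed threshold drift (assumed)

def accept_new_model(old_threshold: float, new_threshold: float) -> bool:
    return abs(new_threshold - old_threshold) < FIRST_PRESET_THRESHOLD
```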
With reference to the first aspect, in another possible design, to avoid updating the first voiceprint model with voice data of poor signal quality, the terminal may first determine whether the signal quality parameter of the first voice data is higher than a second preset threshold before performing the update. The signal quality parameter of voice data characterizes its signal quality; for example, it may be the signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, the signal quality of the first voice data is high, and the terminal may update the first voiceprint model with it. If the signal quality parameter is lower than or equal to the second preset threshold, the terminal may delete the first voice data.
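The signal-quality gate can be sketched in a few lines; the 10 dB SNR floor is an assumed illustrative value for the second preset threshold.

```python
# Sketch of the signal-quality gate: the fresh wake-up audio is used for a
# model update only if its SNR clears a preset floor; otherwise the
# terminal drops it.

SECOND_PRESET_THRESHOLD_DB = 10.0  # assumed SNR floor

def quality_gate(snr_db: float) -> str:
    """Return what the terminal does with the first voice data."""
    return "update" if snr_db > SECOND_PRESET_THRESHOLD_DB else "delete"
```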
In a second aspect, an embodiment of the present application provides a terminal comprising a storage unit, an input unit, a text verification unit, an identity authentication unit, and an updating unit. The storage unit stores the preset wake-up word registered in the terminal and the first voiceprint model; the first voiceprint model is used for voiceprint verification when the voice assistant is woken up and represents the voiceprint features of the preset wake-up word. The input unit is configured to receive first voice data input by a user. The text verification unit is configured to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal. The identity authentication unit is configured to authenticate the user's identity if the text verification unit determines that the two texts match. The updating unit is configured to update the first voiceprint model in the terminal with the first voice data if the identity authentication unit determines that the identity authentication passes.
With reference to the second aspect, in one possible design, the identity authentication unit is specifically configured to perform voiceprint verification on the first voice data using the first voiceprint model; if the voiceprint verification passes, the identity authentication passes.
With reference to the second aspect, in another possible design, the terminal further includes a starting unit and a determining unit. The starting unit is configured to start the voice assistant when the identity authentication unit determines that the identity authentication passes. The input unit is further configured to receive second voice data through the voice assistant. The determining unit is configured to determine, after the identity authentication passes, whether the second voice data received by the input unit is a valid voice command. The updating unit is configured to update the first voiceprint model with the first voice data after the determining unit determines that the second voice data is a valid voice command.
With reference to the second aspect, in another possible design manner, the terminal further includes: and a voiceprint verification unit. And the voiceprint verification unit is used for carrying out voiceprint verification on the first voice data by using the first voiceprint model before the identity authentication unit carries out identity authentication on the user. The text verification unit is further used for performing text verification on the voice data received by the input unit in the first preset time if the voiceprint verification unit determines that the first voice data does not pass the voiceprint verification. The identity authentication unit is specifically used for: and if the text verification unit determines that the input unit receives the second voice data and at least one voice data matched with the text of the preset wake-up word in the first preset time, authenticating the identity of the user. The text corresponding to the second voice data comprises preset keywords.
With reference to the second aspect, in another possible design manner, the terminal further includes: and a display unit. And the display unit is used for displaying the identity verification interface if the text verification unit determines that the input unit receives the second voice data and at least one voice data matched with the text of the preset wake-up word within the first preset time. The input unit is also used for receiving the identity verification information input by the user on the identity verification interface displayed by the display unit. The identity authentication unit is specifically configured to perform user identity authentication on the user according to the identity authentication information received by the input unit.
With reference to the second aspect, in another possible design, the preset wake-up word comprises at least two pieces of registered voice data, recorded when the terminal registered the preset wake-up word, and the first voiceprint model is generated from those registered voice data. The terminal further comprises a replacing unit and a generating unit. The replacing unit is configured to replace third voice data among the at least two pieces of registered voice data with the first voice data to obtain at least two pieces of updated registered voice data, where the signal quality parameter of the third voice data is lower than that of the other registered voice data. The generating unit is configured to generate a second voiceprint model from the updated registered voice data obtained by the replacing unit. The updating unit is configured to replace the first voiceprint model with the second voiceprint model generated by the generating unit; the second voiceprint model represents the voiceprint features of the updated registered voice data.
With reference to the second aspect, in another possible design, the storage unit is configured to store a first voiceprint threshold, generated by the generating unit from the first voiceprint model and the at least two pieces of registered voice data. The generating unit is further configured to generate a second voiceprint threshold from the second voiceprint model and the updated registered voice data, after generating the second voiceprint model and before the updating unit replaces the first voiceprint model with it. The updating unit is specifically configured to replace the first voiceprint model with the second voiceprint model if the difference between the second voiceprint threshold and the first voiceprint threshold generated by the generating unit is smaller than a first preset threshold.
With reference to the second aspect, in another possible design manner, the terminal further includes: and deleting the unit. And the deleting unit is used for deleting the second voice pattern and the first voice data if the difference value between the second voice pattern threshold and the first voice pattern threshold generated by the generating unit is larger than or equal to a first preset threshold.
With reference to the second aspect, in another possible design manner, the updating unit is specifically configured to update the first voiceprint model with the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold. Wherein the signal quality parameter of the first voice data comprises a signal to noise ratio of the first voice data.
In a third aspect, an embodiment of the present application provides a terminal that may include a processor, a memory, and a display, with the memory and display coupled to the processor. The display is used to display images generated by the processor. The memory is used to store computer program code, information related to the voice assistant, the preset wake-up word registered in the terminal, and the first voiceprint model. The computer program code comprises computer instructions that, when executed by the processor, cause the terminal to: receive first voice data input by a user; determine whether the text corresponding to the first voice data matches the text of the preset wake-up word; if the two texts match, perform identity authentication on the user; and if the identity authentication passes, update the first voiceprint model with the first voice data. The first voiceprint model is used for voiceprint verification when the voice assistant is woken up, and it represents the voiceprint features of the preset wake-up word.
With reference to the third aspect, in one possible design manner, the processor may be further configured to perform voiceprint verification on the first voice data using the first voiceprint model. If the voiceprint verification is passed, the identity authentication is passed.
With reference to the third aspect, in another possible design, the processor may be further configured to start the voice assistant when the identity authentication passes; receive second voice data through the voice assistant; and, after the identity authentication has passed, determine whether the second voice data is a valid voice command. After the second voice data is determined to be a valid voice command, the processor updates the first voiceprint model in the terminal with the first voice data.
With reference to the third aspect, in another possible design, the processor includes a coprocessor and a main processor. The coprocessor monitors voice data; when it detects first voice data whose similarity to the preset wake-up word meets a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the terminal's preset wake-up word, and when the texts are determined to match, the main processor performs voiceprint verification on the first voice data using the first voiceprint model.
With reference to the third aspect, in another possible design manner, the processor is further configured to perform voiceprint verification on the first voice data using the first voiceprint model before authenticating the identity of the user; if the first voice data fails the voiceprint verification, perform text verification on the voice data received within a first preset time; and if second voice data and at least one piece of voice data matching the text of the preset wake-up word are received within the first preset time, authenticate the identity of the user. The text corresponding to the second voice data includes a preset keyword.
With reference to the third aspect, in another possible design manner, the processor is further configured to control the display to display an identity verification interface if the second voice data and the at least one piece of voice data matching the text of the preset wake-up word are received within the first preset time. The processor is further configured to receive identity verification information input by the user on the identity verification interface displayed on the display, and verify the user's identity according to the identity verification information.
With reference to the third aspect, in another possible design manner, the processor includes a coprocessor and a main processor. The coprocessor monitors voice data; when the coprocessor monitors first voice data whose similarity to the preset wake-up word meets a preset condition, it notifies the main processor to judge whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal; and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor performs voiceprint verification on the first voice data using the first voiceprint model. The coprocessor further monitors voice data within the first preset time, and notifies the main processor to judge whether the voice data received within the first preset time includes second voice data and at least one piece of voice data matching the text of the preset wake-up word, where the text corresponding to the second voice data includes a preset keyword.
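The fallback condition in the last two designs — a time window must contain both an utterance matching the wake-up word's text and a second utterance containing a preset keyword — can be expressed as a small predicate. This is a sketch under assumed names; the patent does not specify the keyword list or the matching details.

```python
def fallback_identity_check(window_texts, wake_text, keywords):
    """Return True when the utterances received within the first preset
    time justify triggering identity authentication: at least one text
    matches the preset wake-up word, and at least one utterance (the
    'second voice data') contains a preset keyword."""
    has_wake_match = any(text == wake_text for text in window_texts)
    has_keyword = any(any(k in text for k in keywords)
                      for text in window_texts)
    return has_wake_match and has_keyword
```

Only when this predicate holds would the terminal display the identity verification interface described above.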
With reference to the third aspect, in another possible design manner, the preset wake-up word stored in the memory includes at least two pieces of registered voice data, where the at least two pieces of registered voice data are recorded when the processor registers the preset wake-up word, and the first voiceprint model is generated by the processor according to the at least two pieces of registered voice data. The processor is further configured to replace third voice data among the at least two pieces of registered voice data with the first voice data to obtain updated at least two pieces of registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other registered voice data; generate a second voiceprint model according to the updated at least two pieces of registered voice data; and replace the first voiceprint model with the second voiceprint model, where the second voiceprint model represents voiceprint features of the updated at least two pieces of registered voice data.
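The replacement rule here — swap out the registered sample with the lowest signal quality for the newly accepted utterance — might look like this sketch (the `quality` callable, e.g. one returning SNR, is an assumption):

```python
def update_registered_voices(registered, new_voice, quality):
    """Replace the 'third voice data' (the registered sample whose
    signal quality parameter is lowest) with the new wake-up utterance.
    A second voiceprint model would then be regenerated from the
    returned list."""
    worst = min(registered, key=quality)
    return [new_voice if sample is worst else sample
            for sample in registered]
```

The design choice is that the enrollment set keeps a fixed size: each accepted wake-up utterance can only displace the weakest existing sample, so the set's overall quality never decreases.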
With reference to the third aspect, in another possible design manner, the memory further stores a first voiceprint threshold, where the first voiceprint threshold is generated by the processor according to the first voiceprint model and the at least two pieces of registered voice data. The processor is further configured to, after generating the second voiceprint model and before replacing the first voiceprint model with it, generate a second voiceprint threshold according to the second voiceprint model and the updated at least two pieces of registered voice data; and if the difference between the second voiceprint threshold and the first voiceprint threshold is smaller than a first preset threshold, replace the first voiceprint model with the second voiceprint model.
With reference to the third aspect, in another possible design manner, the processor is further configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold.
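Together, the two designs above form a commit-or-rollback guard on the candidate model. A sketch, assuming the "difference" is taken as an absolute difference (the patent text leaves the sign convention open):

```python
def commit_or_rollback(first_model, second_model,
                       first_threshold, second_threshold, preset_delta):
    """Adopt the second voiceprint model only when the second voiceprint
    threshold is close enough to the first; otherwise keep the first
    model (the caller would then also delete the second model and the
    first voice data)."""
    if abs(second_threshold - first_threshold) < preset_delta:
        return second_model   # replace the first voiceprint model
    return first_model        # roll back the update
```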
With reference to the third aspect, in another possible design manner, the processor is further configured to update the first voiceprint model with the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of the first voice data includes the signal-to-noise ratio of the first voice data.
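The SNR gate in this design is straightforward. A sketch with a conventional decibel SNR; the actual signal quality parameter definition and threshold value are not specified in the text and are assumptions here:

```python
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels (assumed definition)."""
    return 10.0 * math.log10(signal_power / noise_power)

def should_update_model(signal_power, noise_power, snr_threshold_db):
    """Use the first voice data to update the first voiceprint model
    only if its SNR exceeds the second preset threshold."""
    return snr_db(signal_power, noise_power) > snr_threshold_db
```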
In a fourth aspect, embodiments of the present application provide a computer storage medium comprising computer instructions which, when run on a terminal, cause the terminal to perform a method according to the first aspect and any one of its possible designs.
In a fifth aspect, embodiments of the present application provide a computer program product which, when run on a computer, causes the computer to perform the method according to the first aspect and any one of its possible designs.
In addition, for the technical effects of the terminal according to the second aspect and the third aspect and any of their design manners, the computer storage medium according to the fourth aspect, and the computer program product according to the fifth aspect, refer to the technical effects of the first aspect and its different design manners; details are not described herein again.
Drawings
Fig. 1 is a first schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
Fig. 2 is a second schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a hardware structure of a terminal according to an embodiment of the present application;
Fig. 4A is a first flowchart of a method for updating wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
Fig. 4B is a third schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
Fig. 5A is a second flowchart of a method for updating wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
Fig. 5B is a fourth schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
Fig. 6 is a third flowchart of a method for updating wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
Fig. 7 is a fourth flowchart of a method for updating wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
Fig. 8 is a fifth flowchart of a method for updating wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
Fig. 9 is a sixth flowchart of a method for updating wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
Fig. 10 is a fifth schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
Fig. 11 is a seventh flowchart of a method for updating wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
Fig. 12 is an eighth flowchart of a method for updating wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
Fig. 13 is a first schematic diagram of a structural composition of a terminal according to an embodiment of the present application;
Fig. 14 is a second schematic diagram of a structural composition of a terminal according to an embodiment of the present application;
Fig. 15 is a third schematic diagram of a structural composition of a terminal according to an embodiment of the present application.
Detailed Description
The embodiment of the application provides a method for updating wake-up voice of a voice assistant by a terminal and the terminal, which can be applied to a process of executing voice wake-up by the terminal in response to voice data input by a user.
The terminal may receive a preset wake-up word registered by the user before performing voice wake-up. The preset wake-up word is used for waking up the voice assistant in the terminal so that the terminal can provide a voice control service for the user through the voice assistant. Waking up the voice assistant in the embodiments of the application means that the terminal starts the voice assistant in response to voice data sent by the user. The voice control service refers to the following: after the voice assistant of the terminal is started, the user can trigger the terminal to execute a corresponding event by sending a voice command (i.e., voice data) to the voice assistant. The preset wake-up word in the embodiments of the application is a piece of voice data. This voice data is the wake-up voice used to wake up the voice assistant.
The voice assistant may be an application (APP) installed in the terminal. The voice assistant may be an embedded application in the terminal (i.e., a system application of the terminal) or a downloadable application. An embedded application is an application provided as part of the terminal (e.g., mobile phone) implementation. For example, the embedded application may be a "settings" application, a "short message" application, a "camera" application, and the like. A downloadable application is an application that can provide its own internet protocol multimedia subsystem (Internet Protocol Multimedia Subsystem, IMS) connection; it may be an application pre-installed in the terminal or a third-party application that the user can download and install in the terminal. For example, the downloadable application may be a "WeChat" application, an "Alipay" application, a "mail" application, and the like.
In the embodiment of the present application, taking the mobile phone 100 shown in fig. 1 as an example, a process of registering a preset wake-up word in a terminal is described as follows:
the handset 100 may receive a user click (e.g., a single click) on the "set" application icon. In response to a click operation of the "set" application icon by the user, the mobile phone 100 may display the setting interface 101 shown in fig. 1 (a). The settings interface 101 may include a "flight mode" option, a "WLAN" option, a "bluetooth" option, a "mobile network" option, and a "smart assist" option 102, among others. The specific functions of the "flight mode" option, the "WLAN" option, the "bluetooth" option and the "mobile network" option may refer to the specific descriptions in the conventional technology, and the embodiments of the present application are not repeated here.
The handset 100 may receive a user click (e.g., a single click) on the "smart assist" option 102. In response to the user clicking the "smart assist" option 102, the handset 100 may display the smart assist interface 103 shown in FIG. 1 (b). The smart assist interface 103 includes a "gesture control" option 104, a "voice control" option 105, and the like. The "gesture control" option 104 is used to manage the user gestures that trigger the mobile phone 100 to execute corresponding events. The "voice control" option 105 is used to manage the voice wake-up function of the handset 100. Specifically, the mobile phone 100 may receive a click operation of the "voice control" option 105 by the user, and the mobile phone 100 may display the voice control interface 106 shown in FIG. 1 (c). The voice control interface 106 includes a "voice wake" option 107 and an "incoming voice control" option 108. The "voice wake" option 107 is used to turn the voice wake-up function of the handset 100 on or off. The voice wake-up function of a terminal (e.g., the handset 100) is described later in the embodiments of the present application and is not detailed here. The "incoming voice control" option 108 is used to trigger the handset 100 to turn the voice wake-up function on or off when the handset 100 receives an incoming call. For example, assume that the "incoming call voice control" option 108 of the handset 100 is in the on state. When the mobile phone 100 receives an incoming call from another terminal and issues an incoming-call reminder, if the mobile phone 100 recognizes "answer the call" from voice data recorded by the owner, the mobile phone 100 can automatically answer the incoming call; if the mobile phone 100 recognizes "hang up" from voice data recorded by the owner, the mobile phone 100 can automatically reject the incoming call.
The handset 100 may receive a click (e.g., a single click) of the "voice wakeup" option 107 from the user. In response to the user clicking the "voice wakeup" option 107, the handset 100 may display the voice wake interface 109 shown in FIG. 1 (d). The voice wake interface 109 includes a "voice wake" switch 110, a "find phone" option 111, a "how to make phone call" option 112, and a "wake word" option 113. The "voice wake" switch 110 is used to trigger the mobile phone 100 to turn the voice wake-up function on or off. The "find phone" option 111 and the "how to make phone call" option 112 are used to illustrate the voice control functions available after the voice assistant of the mobile phone 100 is activated. For example, the "find phone" option 111 indicates that, after the voice assistant of the handset 100 is activated, the voice assistant can respond to the user's voice data "where?" so as to help the user find the cell phone 100. The "how to make phone call" option 112 indicates that, after the voice assistant of the mobile phone 100 is started, the voice assistant can automatically place a call to the contact Bob in response to the user's voice data "call Bob".
The "wake word" option 113 is used to register wake words with the handset 100 for waking up the handset 100 (e.g., a voice assistant of the handset 100). Before the user has not registered the custom wake word in the handset 100, the handset 100 may indicate a default wake word to the user, e.g., assuming the default wake word of the handset 100 is "my small k".
Assume that the "voice wake" switch 110 is in the on state and that no user-defined wake-up word has been registered in the handset 100. The handset 100 may receive a click operation (e.g., a single click operation) of the "wake word" option 113 shown in FIG. 1 (d) by the user. In response to the user clicking the "wake word" option 113, the mobile phone 100 may display the default wake word registration interface 201 shown in FIG. 2 (a). The default wake word registration interface 201 may include: a recording progress bar 202, a "custom wake word" option 203, a "microphone" option 204, and a recording prompt 205. The "microphone" option 204 is used to trigger the mobile phone 100 to start recording voice data as a wake-up word. The recording progress bar 202 is used to display the progress of the mobile phone 100 recording the wake-up word. The recording prompt 205 is used to indicate the default wake-up word of the handset 100. For example, the recording prompt 205 may be "please help the phone learn the wake word (my small k); tap and say 'my small k'". Optionally, the default wake-up word registration interface 201 may also include the recording prompt "please record about 30 cm away from the phone in a quiet environment!". The default wake word registration interface 201 also includes a "cancel" button 206 and an "ok" button 207. The "ok" button 207 is used to trigger the handset 100 to save the recorded wake-up word. The "cancel" button 206 is used to trigger the mobile phone to cancel registration of the wake-up word and display the voice wake interface 109 shown in FIG. 1 (d).
The handset 100 may begin recording voice data entered by the user in response to a user clicking on the "microphone" option 204. After the mobile phone 100 receives the voice data (denoted as voice data 1) input by the user, it can determine whether the voice data 1 meets the preset condition. If the voice data 1 does not satisfy the preset condition, the mobile phone 100 may delete the voice data 1 and redisplay the default wake-up word registration interface 201 shown in fig. 2 (a). If the voice data 1 satisfies the preset condition, the mobile phone 100 may save the voice data 1.
In the embodiment of the present application, voice data 1 satisfying the preset condition may specifically mean: the text corresponding to voice data 1 matches the text "my small k" of the default wake-up word, and the signal-to-noise ratio of voice data 1 is higher than a preset threshold.
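That acceptance test — text equal to the default wake-up word's text and SNR above a preset threshold — can be written as one predicate. A sketch only; the threshold value and the recognizer producing the text are assumptions:

```python
def registration_accepts(recognized_text, wake_text, snr_db, snr_threshold_db):
    """Decide whether a recorded utterance (voice data 1) satisfies the
    preset condition for registration: its recognized text must equal
    the wake-up word's text, and its signal-to-noise ratio must exceed
    the preset threshold."""
    return recognized_text == wake_text and snr_db > snr_threshold_db
```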
After receiving the voice data 1 meeting the preset conditions input by the user, the mobile phone 100 may generate a voiceprint model for performing voiceprint verification when waking up the voice assistant according to the voice data 1 meeting the preset conditions, and generate a voiceprint threshold according to the voice data 1 and the voiceprint model. The voiceprint model can characterize voiceprint features of wake words registered by the user.
It will be appreciated that the voiceprint model corresponds to a function. Different voiceprint models can be generated from different voice data. That is, the mobile phone 100 may generate different voiceprint models according to different wake-up words registered by the same user, and different users registering the same wake-up word with the handset 100 can also produce different voiceprint models. The mobile phone 100 may take voice data 1 (i.e., the voice data that is input by the user when registering the wake-up word and satisfies the preset condition) as an input value and substitute it into the voiceprint model to obtain a voiceprint value (e.g., voiceprint value a).
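The point that the voiceprint model "corresponds to a function" can be made concrete with a toy scorer: feed an utterance's features through the model to obtain a scalar voiceprint value. A linear score stands in here for a real speaker model, purely as an assumption for illustration:

```python
def voiceprint_value(model_weights, features):
    """Toy stand-in for the voiceprint model as a function: map an
    utterance's acoustic features to a scalar voiceprint value."""
    return sum(w * f for w, f in zip(model_weights, features))
```

Under this sketch, enrolling voice data 1 would compute `voiceprint_value(model, features_of_voice_data_1)` and store the result as voiceprint value a.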
Optionally, to improve the accuracy of voice wake-up, the terminal may record multiple pieces of voice data satisfying the preset condition. The terminal can then generate the voiceprint model used for voiceprint verification when waking the voice assistant from these multiple pieces of voice data. For example, after determining that voice data 1 satisfies the preset condition and storing it, the mobile phone 100 may prompt the user to record voice data again.
The "custom wake-up word" option 203 is used to trigger the mobile phone 100 to display a wake-up word input interface. For example, the mobile phone 100 may display the wake word input interface 208 shown in fig. 2 (b) in response to a user clicking (e.g., clicking) on the "custom wake word" option 203 shown in fig. 2 (a). The wake word input interface 208 may include a "cancel" button 209, a "ok" button 210, a "wake word input box" 211, and a wake word suggestion 212. The cancel button 209 is used to trigger the mobile phone to cancel the custom wake-up word, and display the default wake-up word registration interface 201 shown in fig. 2 (a). The "wake word input box" 211 is used for receiving a custom wake word input by a user. The "ok" button 210 is used to save the custom wake word entered by the user at the "wake word entry box" 211. The wake word suggestion 212 is used to prompt the user for a custom wake word requirement by the phone.
Assume that the user inputs the custom wake-up word "my superphone" in a wake-up word input box 211 shown in fig. 2 (c). The mobile phone 100 may display the custom wake word registration interface 213 shown in (d) of fig. 2 in response to a click operation (e.g., a single click operation) of the "ok" button 210 shown in (c) of fig. 2 by a user, so that the user may register the custom wake word in the custom wake word registration interface 213. The method for the user to register the custom wake word in the custom wake word registration interface 213 is the same as the method for registering the default wake word in the default wake word registration interface 201, and the embodiments of the present application are not described here again.
It will be appreciated that if a user-defined wake-up word has been registered with the handset 100, for example the user-defined wake-up word "my superphone", then the handset 100 may display the custom wake word registration interface 213 shown in FIG. 2 (d) in response to the user clicking the "wake word" option 113 shown in FIG. 1 (d).
It should be noted that different terminals have different designs. For example, in some terminals, the intelligent assistance may be referred to as an assistance function, the voice control may be referred to as a voice assistant, and the voice wake-up may be referred to as a wake-up function. And, the manner in which the user triggers the terminal to display the wake-up word registration interface (e.g., default wake-up word registration interface or custom wake-up word registration interface) includes, but is not limited to, the user's "set-intelligent auxiliary-voice control-voice wake-up word" operation in the terminal. For example, in some terminals, the manner in which the user triggers the terminal to display the wake-up word registration interface may be "setup-voice assistant-voice wake-up word".
In the embodiment of the present application, the wake-up word of the mobile phone 100 is taken as a default wake-up word "my small k" as an example, and the voice wake-up process of the mobile phone 100 is described as follows:
when the DSP of the mobile phone 100 monitors that the similarity between the voice data (such as voice data 2) and the default wake-up word "my small k" meets a certain condition, the monitored voice data 2 may be handed to the AP. The voice data 2 is text checked by the AP. When recognizing that the text corresponding to the voice data 2 is "my small k", the AP may substitute the voice data 2 as an input value into the voiceprint model of the mobile phone 100 to obtain a voiceprint value (voiceprint value b). If the difference between the voiceprint value b and the voiceprint threshold (i.e., voiceprint value a) is less than the predetermined threshold, the AP may determine that the voice data 2 matches the wake-up word registered by the user.
In order to adapt to changes in the user's physical state and/or the noise scenario in which the user is located, some mobile phones periodically remind the user to re-register the wake-up word. However, the process of manually registering a wake-up word is cumbersome, and registering it manually multiple times wastes the user's time and degrades the user experience.
In the embodiment of the application, the terminal may acquire a valid wake-up word in the process of performing voice wake-up, and update the wake-up word registered by the user with that valid wake-up word. A valid wake-up word in the embodiment of the present application may include voice data that successfully wakes up the terminal. Because the terminal automatically acquires valid wake-up words while performing voice wake-up and uses them to update the registered wake-up word, the cumbersome operations of manually re-registering the wake-up word can be avoided.

The principle of the method for updating the wake-up voice of the voice assistant by the terminal provided by the application is as follows. A valid wake-up word is voice data acquired by the terminal while performing voice wake-up; it is therefore related to the user's current physical state and the noise scenario the user is currently in. Moreover, since the valid wake-up word successfully woke the terminal, its degree of match with the wake-up word registered by the user satisfies the voice wake-up condition. In summary, if the terminal updates the registered wake-up word with the valid wake-up word and then performs voice wake-up with the updated wake-up word, it can adapt to changes in the user's physical state and/or the noise scenario in which the user is located, thereby improving the voice wake-up rate of the mobile phone and reducing the false wake-up rate of the terminal when performing voice wake-up.
The terminal in the embodiment of the present application may be a portable computer (such as a mobile phone), a notebook computer, a personal computer (Personal Computer, PC), a wearable electronic device (such as a smart watch), a tablet computer, an augmented reality (augmented reality, AR)/virtual reality (virtual reality, VR) device, a vehicle-mounted computer, or the like; the following embodiments do not particularly limit the specific form of the terminal.
Referring to fig. 3, a block diagram of a terminal 300 according to an embodiment of the present application is shown. The terminal 300 may include a processor 310, an external memory interface 320, an internal memory 321, a USB interface 330, a charge management module 340, a power management module 341, a battery 342, an antenna 1, an antenna 2, a radio frequency module 350, a communication module 360, an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, an earphone interface 370D, a sensor module 380, keys 390, a motor 391, an indicator 392, a camera 393, a display 394, and a SIM card interface 395. The sensor module 380 may include a pressure sensor 380A, a gyroscope sensor 380B, a barometric pressure sensor 380C, a magnetic sensor 380D, an acceleration sensor 380E, a distance sensor 380F, a proximity sensor 380G, a fingerprint sensor 380H, a temperature sensor 380J, a touch sensor 380K, an ambient light sensor 380L, a bone conduction sensor, etc.
Among them, the terminal 300 shown in fig. 3 is only one example of a terminal. The structure illustrated in fig. 3 does not constitute a limitation of the terminal 300. More or fewer components than shown may be included, or certain components may be combined, or certain components may be split, or different arrangements of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 310 may include one or more processing units, such as: the processor 310 may include an application processor (Application Processor, AP), a modem processor, a graphics processor (Graphics Processing Unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a memory, a video codec, a DSP, a baseband processor, and/or a Neural network processor (Neural-network Processing Unit, NPU), etc. Wherein, the different processing units can be independent devices or integrated in the same processor.
In the embodiment of the application, the DSP can monitor voice data in real time. When the similarity between the voice data monitored by the DSP and the wake-up words registered in the terminal meets the preset condition, the voice data can be handed to the AP. And carrying out text verification and voiceprint verification on the voice data by the AP. When the AP determines that the voice data matches the wake-up word registered by the user, the terminal may turn on the voice assistant.
The controller may be a decision maker that directs the various components of the terminal 300 to work in concert according to instructions; it is the nerve center and command center of the terminal 300. The controller generates an operation control signal according to the instruction operation code and a timing signal, completing the control of instruction fetching and instruction execution.
A memory may also be provided in the processor 310 for storing instructions and data. In some embodiments, the memory in the processor is a cache, which may hold instructions or data that the processor has just used or uses cyclically. If the processor needs to reuse the instruction or data, it can be called directly from this memory, avoiding repeated accesses and reducing the processor's waiting time, thereby improving the efficiency of the system.
In some embodiments, the processor 310 may include an interface. The interfaces may include an integrated circuit (Inter-Integrated Circuit, I2C) interface, an integrated circuit built-in audio (Inter-Integrated Circuit Sound, I2S) interface, a pulse code modulation (Pulse Code Modulation, PCM) interface, a universal asynchronous receiver Transmitter (Universal Asynchronous Receiver/Transmitter, UART) interface, a mobile industry processor interface (Mobile Industry Processor Interface, MIPI), a General-Purpose Input/output (GPIO) interface, a subscriber identity module (Subscriber Identity Module, SIM) interface, and/or a universal serial bus (Universal Serial Bus, USB) interface, among others.
The I2C interface is a bi-directional synchronous serial bus comprising a serial data line (Serial Data Line, SDA) and a serial clock line (Serial Clock Line, SCL). In some embodiments, the processor may contain multiple sets of I2C buses. The processor may be coupled to the touch sensor, charger, flash, camera, etc. via different I2C bus interfaces, respectively. For example, the processor may be coupled to the touch sensor through an I2C interface, so that the processor and the touch sensor communicate through the I2C bus interface to implement the touch function of the terminal 300.
The I2S interface may be used for audio communication. In some embodiments, the processor may contain multiple sets of I2S buses. The processor may be coupled to the audio module via an I2S bus to enable communication between the processor and the audio module. In some embodiments, the audio module may transmit an audio signal to the communication module through the I2S interface, so as to implement a function of answering a call through the bluetooth headset.
PCM interfaces may also be used for audio communication to sample, quantize and encode analog signals. In some embodiments, the audio module and the communication module may be coupled through a PCM bus interface. In some embodiments, the audio module may also transmit an audio signal to the communication module through the PCM interface, so as to implement a function of answering a call through the bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication, the sampling rates of the two interfaces being different.
The UART interface is a universal serial data bus for asynchronous communications. The bus is a bi-directional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is typically used to connect the processor with the communication module 360. For example: the processor communicates with the Bluetooth module through a UART interface to realize the Bluetooth function. In some embodiments, the audio module may transmit an audio signal to the communication module through the UART interface, so as to implement a function of playing music through the bluetooth headset.
The MIPI interface may be used to connect a processor to a display, camera, or other peripheral device. The MIPI interfaces include camera serial interfaces (Camera Serial Interface, CSI), display serial interfaces (Display Serial Interface, DSI), and the like. In some embodiments, the processor and the camera communicate through the CSI interface to implement the photographing function of the terminal 300. The processor and the display screen communicate through the DSI interface to realize the display function of the terminal 300.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal or as a data signal. In some embodiments, a GPIO interface may be used to connect a processor with a camera, display screen, communication module, audio module, sensor, etc. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, an MIPI interface, etc.
The USB interface 330 may be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface may be used to connect a charger to charge the terminal 300, to transfer data between the terminal 300 and a peripheral device, to connect a headset and play audio through the headset, or to connect other electronic devices such as an AR device.
The interface connection relationship between the modules illustrated in the embodiment of the present application is only schematically illustrated, and does not limit the structure of the terminal 300. The terminal 300 may use different interfacing means, or a combination of interfacing means in the embodiment of the present application.
The charge management module 340 is configured to receive a charge input from a charger. The charger can be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module may receive a charging input of the wired charger through the USB interface. In some wireless charging embodiments, the charging management module may receive wireless charging input through a wireless charging coil of the terminal 300. The charging management module may also supply power to the terminal device through the power management module 341 while charging the battery.
The power management module 341 is configured to connect the battery 342, the charge management module 340 and the processor 310. The power management module receives the input of the battery and/or the charging management module and supplies power for the processor, the internal memory, the external memory, the display screen, the camera, the communication module and the like. The power management module can also be used for monitoring parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance) and the like. In some embodiments, the power management module 341 may also be disposed in the processor 310. In some embodiments, the power management module 341 and the charge management module may also be provided in the same device.
The wireless communication function of the terminal 300 may be implemented by the antenna module 1, the antenna module 2, the radio frequency module 350, the communication module 360, a modem, a baseband processor, and the like. The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the terminal 300 may be configured to cover a single communication band or multiple communication bands. Different antennas may also be multiplexed to improve antenna utilization. For example, a cellular network antenna may be multiplexed as a wireless local area network diversity antenna. In some embodiments, an antenna may be used in conjunction with a tuning switch.
The radio frequency module 350 may provide a communication processing module applied to the terminal 300 and including a solution for wireless communication such as 2G/3G/4G/5G. The radio frequency module may include at least one filter, a switch, a power amplifier, a low noise amplifier (Low Noise Amplifier, LNA), and the like. The radio frequency module receives an electromagnetic wave from the antenna 1, performs processing such as filtering and amplification on the received electromagnetic wave, and transmits the result to the modem for demodulation. The radio frequency module may also amplify a signal modulated by the modem and convert it into an electromagnetic wave for radiation through the antenna 1. In some embodiments, at least some of the functional modules of the radio frequency module 350 may be disposed in the processor 310. In some embodiments, at least some of the functional modules of the radio frequency module 350 may be disposed in the same device as at least some of the modules of the processor 310.
The modem may include a modulator and a demodulator. The modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low frequency baseband signal to the baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs sound signals through an audio device (not limited to speakers, receivers, etc.), or displays images or video through a display screen. In some embodiments, the modem may be a stand-alone device. In some embodiments, the modem may be provided in the same device as the radio frequency module or other functional module, independent of the processor.
The communication module 360 may provide a communication processing module applied on the terminal 300 and including solutions for wireless communication such as wireless local area network (Wireless Local Area Networks, WLAN) (e.g., a wireless fidelity (Wireless Fidelity, Wi-Fi) network), Bluetooth (BT), global navigation satellite system (Global Navigation Satellite System, GNSS), frequency modulation (Frequency Modulation, FM), near field communication (Near Field Communication, NFC), and infrared (IR) technology. The communication module 360 may be one or more devices integrating at least one communication processing module. The communication module receives an electromagnetic wave via the antenna 2, performs frequency demodulation and filtering on the electromagnetic wave signal, and transmits the processed signal to the processor. The communication module 360 may also receive a to-be-transmitted signal from the processor, perform frequency modulation on it, amplify it, and convert it into an electromagnetic wave for radiation via the antenna 2.
In some embodiments, antenna 1 and the radio frequency module of terminal 300 are coupled, and antenna 2 and communication module 360 are coupled. Such that the terminal 300 may communicate with the network and other devices through wireless communication techniques. The wireless communication techniques may include the Global System for Mobile communications (Global System For Mobile Communications, GSM), general packet radio service (General Packet Radio Service, GPRS), code division multiple access (Code Division Multiple Access, CDMA), wideband code division multiple access (Wideband Code Division Multiple Access, WCDMA), time division code division multiple access (Time-Division Code Division Multiple Access, TD-SCDMA), long term evolution (Long Term Evolution, LTE), BT, GNSS, WLAN, NFC, FM, and/or IR techniques, among others. The GNSS may include a global satellite positioning system (Global Positioning System, GPS), a global navigation satellite system (global navigation satellite system, GLONASS), a Beidou satellite navigation system (BeiDou Navigation Satellite System, BDS), a Quasi zenith satellite system (Quasi-Zenith Satellite System, QZSS)) and/or a satellite based augmentation system (Satellite Based Augmentation Systems, SBAS).
Terminal 300 implements display functions via a GPU, display screen 394, and an application processor, etc. The GPU is a microprocessor for image processing and is connected with the display screen and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 310 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 394 is used to display images, videos, and the like. The display screen includes a display panel. The display panel may use a liquid crystal display (Liquid Crystal Display, LCD), an organic light-emitting diode (Organic Light-Emitting Diode, OLED), an active-matrix organic light-emitting diode (Active-Matrix Organic Light Emitting Diode, AMOLED), a flexible light-emitting diode (Flex Light-Emitting Diode), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (Quantum Dot Light Emitting Diodes, QLED), or the like. In some embodiments, the terminal 300 may include 1 or N display screens, where N is a positive integer greater than 1.
The terminal 300 may implement a photographing function through an ISP, a camera 393, a video codec, a GPU, a display screen, an application processor, and the like.
The ISP is used to process data fed back by the camera. For example, when a photo is taken, the shutter opens, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing, to be converted into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin color of the image, and can further optimize parameters such as exposure and color temperature of a shooting scene. In some embodiments, the ISP may be provided in the camera 393.
Camera 393 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (charge coupled device, CCD) or a Complementary Metal Oxide Semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, which is then transferred to the ISP to be converted into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard RGB, YUV, or the like format. In some embodiments, the terminal 300 may include 1 or N cameras, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals; in addition to digital image signals, it can process other digital signals. For example, when the terminal 300 selects a frequency bin, the digital signal processor is used to perform a Fourier transform on the frequency bin energy.
Video codecs are used to compress or decompress digital video. The terminal 300 may support one or more codecs. Thus, the terminal 300 may play or record video in a variety of encoding formats, such as: MPEG1, MPEG2, MPEG3, MPEG4, etc.
The NPU is a neural-network (Neural-Network, NN) computing processor. By drawing on the structure of biological neural networks, for example the transfer mode between neurons in the human brain, it can rapidly process input information and can continuously perform self-learning. Applications such as intelligent cognition of the terminal 300, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
The external memory interface 320 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the terminal 300. The external memory card communicates with the processor through the external memory interface to implement a data storage function, for example storing files such as music and videos in the external memory card.
The internal memory 321 may be used to store computer-executable program code, where the code includes instructions. The processor 310 performs various functional applications and data processing of the terminal 300 by running the instructions stored in the internal memory 321. The memory 321 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function (such as a sound playing function or an image playing function). The data storage area may store data created during use of the terminal 300 (such as audio data and a phonebook); such data may be referred to as user data. In addition, the internal memory 321 may include a high-speed random access memory (Random Access Memory, RAM) and a read-only memory (Read Only Memory, ROM), and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, another nonvolatile solid-state storage device, or a universal flash storage (Universal Flash Storage, UFS).
In an embodiment of the present application, the internal memory 321 includes a data partition. The data partition stores files or data that need to be read and written when the operating system starts, as well as user data created during use of the terminal. The data partition may be a predetermined storage area in the internal memory 321; for example, the data partition may be contained in the RAM in the internal memory 321.
The virtual data partition in the embodiment of the present application may be a storage area of RAM in the internal memory 321. Alternatively, the virtual data partition may be a storage area of the ROM in the internal memory 321. Alternatively, the virtual data partition may be an external memory card, such as a Micro SD card, to which the external memory interface 320 is connected.
The terminal 300 may implement audio functions through an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, an earphone interface 370D, an application processor, and the like. Such as music playing, recording, etc.
The audio module is used for converting digital audio information into analog audio signals for output and also used for converting analog audio input into digital audio signals. The audio module may also be used to encode and decode audio signals. In some embodiments, the audio module may be disposed in the processor 310, or a portion of the functional modules of the audio module may be disposed in the processor 310.
Speaker 370A, also known as a "horn," is used to convert audio electrical signals into sound signals. The terminal 300 can listen to music through a speaker or listen to hands-free calls.
The receiver 370B, also referred to as an "earpiece", is used to convert an audio electrical signal into a sound signal. When the terminal 300 receives a call or a voice message, the voice can be heard by bringing the receiver close to the ear.
The microphone 370C, also referred to as a "mic", is used to convert a sound signal into an electrical signal. When making a call or sending voice information, the user may speak with the mouth close to the microphone to input a sound signal into the microphone. The terminal 300 may be provided with at least one microphone. In some embodiments, the terminal 300 may be provided with two microphones, which, in addition to collecting sound signals, may implement a noise reduction function. In some embodiments, the terminal 300 may also be provided with three, four, or more microphones to implement sound signal collection, noise reduction, sound source identification, directional recording, and the like.
The earphone interface 370D is used to connect a wired earphone. The earphone interface may be a USB interface, or may be a 3.5 mm open mobile terminal platform (Open Mobile Terminal Platform, OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 380A is configured to sense a pressure signal and convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor may be provided on the display screen. There are many kinds of pressure sensors, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may comprise at least two parallel plates of conductive material. When a force is applied to the pressure sensor, the capacitance between the electrodes changes, and the terminal 300 determines the pressure intensity from the change in capacitance. When a touch operation acts on the display screen, the terminal 300 detects the intensity of the touch operation using the pressure sensor, and may also calculate the touch position from the detection signal of the pressure sensor. In some embodiments, touch operations that act on the same touch position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
The gyro sensor 380B may be used to determine a motion posture of the terminal 300. In some embodiments, the angular velocities of the terminal 300 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor. The gyro sensor may be used for image stabilization during photographing. For example, when the shutter is pressed, the gyro sensor detects the shake angle of the terminal 300, calculates the distance the lens module needs to compensate according to the angle, and allows the lens to counteract the shake of the terminal 300 through reverse motion, thereby implementing image stabilization. The gyro sensor may also be used for navigation and motion-sensing game scenarios.
The air pressure sensor 380C is used to measure air pressure. In some embodiments, the terminal 300 calculates altitude from barometric pressure values measured by barometric pressure sensors, aiding in positioning and navigation.
The magnetic sensor 380D includes a Hall sensor. The terminal 300 may detect the opening and closing of a flip cover using the magnetic sensor. In some embodiments, when the terminal 300 is a flip device, the terminal 300 may detect the opening and closing of the flip according to the magnetic sensor, and then set features such as automatic unlocking upon flip opening according to the detected open or closed state of a case or of the flip cover.
The acceleration sensor 380E may detect the magnitude of acceleration of the terminal 300 in various directions (typically along three axes). The magnitude and direction of gravity may be detected when the terminal 300 is stationary. The acceleration sensor may also be used to identify the posture of the terminal, and is applied in applications such as landscape/portrait switching and pedometers.
A distance sensor 380F for measuring distance. The terminal 300 may measure the distance by infrared or laser. In some embodiments, the terminal 300 may range using a distance sensor to achieve quick focusing when shooting a scene.
The proximity light sensor 380G may include, for example, a light-emitting diode (LED) and a light detector, such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. Infrared light is emitted outward through the light-emitting diode, and the photodiode is used to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, it may be determined that there is an object near the terminal 300; when insufficient reflected light is detected, it may be determined that there is no object near the terminal 300. Using the proximity light sensor, the terminal 300 can detect that the user is holding the terminal 300 close to the ear during a call, so as to automatically turn off the screen to save power. The proximity light sensor may also be used in a holster mode and a pocket mode to automatically unlock and lock the screen.
The ambient light sensor 380L is used to sense ambient light level. The terminal 300 may adaptively adjust the display brightness according to the perceived ambient light level. The ambient light sensor may also be used to automatically adjust white balance when taking a photograph. The ambient light sensor may also cooperate with the proximity light sensor to detect whether the terminal 300 is in a pocket to prevent false touches.
The fingerprint sensor 380H is used to collect a fingerprint. The terminal 300 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The temperature sensor 380J is used to detect temperature. In some embodiments, the terminal 300 executes a temperature processing policy using the temperature detected by the temperature sensor. For example, when the temperature reported by the temperature sensor exceeds a threshold, the terminal 300 reduces the performance of a processor located near the temperature sensor, in order to reduce power consumption and implement thermal protection.
The touch sensor 380K is also referred to as a "touch panel". It may be arranged on the display screen and is used to detect a touch operation acting on or near it. The detected touch operation may be communicated to the application processor to determine the touch event type, and a corresponding visual output may be provided through the display screen.
The bone conduction sensor 380M may acquire a vibration signal. In some embodiments, the bone conduction sensor may acquire a vibration signal of a bone vibrated by the human vocal part. The bone conduction sensor may also contact the human pulse to receive a blood pressure beating signal. In some embodiments, the bone conduction sensor may also be provided in a headset. The audio module 370 may parse out a voice signal based on the vibration signal of the vocal-part-vibrated bone obtained by the bone conduction sensor, so as to implement a voice function. The application processor may parse out heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor, so as to implement a heart rate detection function.
The keys 390 include a power on key, a volume key, etc. The keys may be mechanical keys. Or may be a touch key. The terminal 300 receives key inputs, and generates key signal inputs related to user settings and function controls of the terminal 300.
The motor 391 may generate a vibration alert. The motor can be used for incoming call vibration prompting and also can be used for touch vibration feedback. For example, touch operations acting on different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The touch operation on different areas of the display screen can also correspond to different vibration feedback effects. Different application scenarios (such as time reminding, receiving information, alarm clock, game, etc.) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
The indicator 392 may be an indicator light, which may be used to indicate a state of charge, a change in charge, a message indicating a missed call, a notification, etc.
The SIM card interface 395 is used to connect a subscriber identity module (Subscriber Identity Module, SIM) card. A SIM card may be inserted into or withdrawn from the SIM card interface to achieve contact with or separation from the terminal 300. The terminal 300 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface may support Nano SIM cards, Micro SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface simultaneously, and the types of the cards may be the same or different. The SIM card interface may also be compatible with different types of SIM cards and with external memory cards. The terminal 300 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the terminal 300 uses an eSIM, i.e., an embedded SIM card. The eSIM card may be embedded in the terminal 300 and cannot be separated from the terminal 300.
The method for updating the wake-up voice of the voice assistant by the terminal provided by the embodiment of the application can be implemented in the terminal 300.
The embodiment of the present application provides a method for a terminal to update the wake-up voice of a voice assistant. The terminal 300 may receive first voice data input by a user and determine whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal 300. If the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal 300 performs identity authentication on the user; if the identity authentication passes, the terminal 300 updates the first voiceprint model in the terminal using the first voice data.
The first voiceprint model is used for voiceprint verification when the voice assistant is woken up, and the first voiceprint model characterizes the voiceprint features of the preset wake-up word registered in the terminal.
In an embodiment of the present application, the terminal performs identity authentication on the user specifically as follows: the terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data passes the voiceprint verification, the identity authentication passes.
In the embodiment of the present application, if the text corresponding to the first voice data matches the text of the preset wake-up word and the user identity authentication passes, the first voice data is wake-up voice that was uttered by the user, can wake up the voice assistant, and passes identity authentication. Moreover, since the first voice data is voice data of the user acquired by the terminal 300 in real time, the first voice data may reflect the physical state of the user and/or the real-time condition of the noise scene in which the user is located. In summary, updating the voiceprint model of the terminal 300 with the first voice data can improve the voice wake-up rate of the terminal and reduce the false wake-up rate.
Further, the first voice data is acquired automatically by the terminal 300 during the voice wake-up process performed by the terminal 300, rather than being received after the user is prompted to manually re-register the wake-up word. Therefore, updating the voiceprint model with the first voice data can simplify the wake-up-word update procedure.
The embodiment of the application provides a method for updating wake-up voice of a voice assistant by a terminal. As shown in fig. 4A, the method for updating wake-up voice of the voice assistant by the terminal may include S401-S405:
S401, the terminal 300 receives the first voice data.
S402, the terminal 300 judges whether the text corresponding to the first voice data is matched with the text of the preset wake-up word registered in the terminal.
After the DSP of the terminal 300 detects the first voice data, it may notify the AP of the terminal 300 to perform text verification and voiceprint verification on the first voice data. The AP may perform the text verification on the first voice data by determining whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal. If the text corresponding to the first voice data matches (e.g., is the same as) the text of the preset wake-up word registered in the terminal, the AP may continue to perform voiceprint verification on the first voice data, i.e., the terminal 300 continues to execute S403. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data, i.e., the terminal 300 may continue to perform S405.
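The text-verification step of S402 can be sketched as follows. This is a minimal illustration, assuming the AP has already obtained a transcript of the first voice data through speech recognition; the function name and the whitespace/case normalization are hypothetical details, not taken from the embodiment.

```python
def text_matches_wake_word(recognized_text: str, registered_wake_word: str) -> bool:
    """S402: check whether the text corresponding to the first voice data
    matches the text of the preset wake-up word registered in the terminal."""
    def normalize(s: str) -> str:
        # Ignore case and whitespace differences when comparing (an assumption;
        # the embodiment only requires that the texts match, e.g. are the same).
        return "".join(s.split()).casefold()
    return normalize(recognized_text) == normalize(registered_wake_word)
```

If this check passes, the terminal proceeds to voiceprint verification (S403); otherwise the first voice data is deleted (S405).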
S403, the terminal 300 performs voiceprint verification on the first voice data by using the first voiceprint model.
The first voiceprint model is used for carrying out voiceprint verification when a voice assistant is awakened. The first voiceprint model is used to characterize voiceprint features of wake words registered in the terminal 300.
The process in which the terminal registers the wake-up word in the embodiment of the present application may be as follows: the terminal 300 records voice data (referred to as registered voice data) when registering the preset wake-up word. The preset wake-up word registered in the terminal 300 may include the registered voice data. The first voiceprint model is generated from the registered voice data. With reference to the foregoing description, after the terminal 300 generates the first voiceprint model, the registered voice data may be substituted into the first voiceprint model as an input value to obtain the first voiceprint threshold.
The method by which the terminal 300 performs voiceprint verification on the first voice data using the first voiceprint model may include the following. After determining that the first voice data passes the text verification, the terminal 300 may substitute the first voice data into the first voiceprint model as an input value to obtain a voiceprint value. The terminal 300 then determines whether the difference between the voiceprint value and the first voiceprint threshold is less than a preset threshold. If the difference is less than the preset threshold, the voiceprint verification passes; if the difference is greater than or equal to the preset threshold, the voiceprint verification fails.
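The threshold comparison described above can be sketched as follows. The embodiment does not specify the internal form of the first voiceprint model, so it is stood in for here by an opaque scoring callable; all names are assumptions made for illustration.

```python
def passes_voiceprint_verification(voiceprint_model, first_voice_data,
                                   first_voiceprint_threshold: float,
                                   preset_threshold: float) -> bool:
    """S403: substitute the first voice data into the first voiceprint model
    as an input value, and compare the resulting voiceprint value with the
    first voiceprint threshold obtained from the registered voice data."""
    voiceprint_value = voiceprint_model(first_voice_data)
    # Verification passes only when the difference is below the preset threshold.
    return abs(voiceprint_value - first_voiceprint_threshold) < preset_threshold
```

With a model that scores the first voice data close to the registration-time threshold, the check passes; a larger gap fails it.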
If the first voice data passes the voiceprint verification, the terminal 300 can update the first voiceprint model in the terminal 300 with the first voice data, i.e., the terminal 300 can continue to perform S404. If the first voice data does not pass the voiceprint verification, the terminal 300 may delete the first voice data, i.e., the terminal 300 may continue to perform S405.
S404, the terminal 300 updates the first voiceprint model in the terminal 300 by using the first voice data.
The method (i.e., S404) by which the terminal 300 updates the first voiceprint model using the first voice data may include: the terminal 300 generates a second voiceprint model from the first voice data, and replaces the first voiceprint model with the second voiceprint model. For the method by which the terminal 300 generates the second voiceprint model from the first voice data, reference may be made to a method of generating a voiceprint model in the conventional technology; details are not described here in the embodiments of the present application.
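S404 can be sketched as a train-and-replace operation. The `train_voiceprint_model` callable below is a placeholder for whatever conventional voiceprint-modeling technique the terminal uses, and storing the model together with its threshold is an assumed representation, not something the embodiment specifies.

```python
class VoiceprintStore:
    """Holds the terminal's current voiceprint model and its threshold
    (a hypothetical representation for illustration only)."""

    def __init__(self, model, threshold: float):
        self.model = model
        self.threshold = threshold

    def update(self, first_voice_data, train_voiceprint_model):
        # S404: generate a second voiceprint model from the first voice data ...
        second_model = train_voiceprint_model(first_voice_data)
        # ... replace the first voiceprint model with the second one ...
        self.model = second_model
        # ... and re-derive the threshold by substituting the same data into
        # the new model, mirroring how the first threshold was obtained.
        self.threshold = second_model(first_voice_data)
```

After `update`, subsequent wake-up attempts are verified against the second voiceprint model, which was trained on the user's most recent voice data.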
S405, the terminal 300 deletes the first voice data.
The embodiment of the present application provides a method for a terminal to update the wake-up voice of a voice assistant. When the terminal 300 performs voice wake-up, it may acquire first voice data that passes the text verification and the voiceprint verification, and then update the first voiceprint model in the terminal 300 with the first voice data. Since the first voice data is voice data of the user acquired by the terminal 300 in real time, the first voice data may reflect the physical state of the user and/or the real-time condition of the noise scene in which the user is located. Moreover, because the first voice data passes both the text verification and the voiceprint verification, updating the voiceprint model of the terminal 300 with the first voice data can improve the voice wake-up rate of the terminal and reduce the false wake-up rate.
Further, the first voice data is automatically acquired by the terminal 300 during the voice wake-up process, instead of being received as user input after prompting the user to manually re-register the wake-up word. Therefore, updating the voiceprint model with the first voice data can simplify the wake-up word update procedure.
Wherein, if the first voice data passes the voiceprint verification, the terminal 300 can start the voice assistant. In some cases, the user may speak the preset wake-up word of the terminal 300 (i.e., voice data) during a conversation with another person. In this case, the real purpose of the user in speaking the preset wake-up word is not to start the voice assistant. After the voice assistant of the terminal 300 is started, the user does not trigger the terminal 300 by voice to perform any function. In the embodiment of the application, such a voice wake-up is called an invalid voice wake-up. That is, after the voice assistant is started, the terminal 300 does not receive a valid voice command through the voice assistant. Based on this, the terminal 300 may determine whether to update the first voiceprint model with the first voice data by determining whether the voice assistant has received a valid voice command after being started. Specifically, the embodiment of the application provides a method for a terminal to update the wake-up voice of a voice assistant. As shown in fig. 5A, the method may include S401-S403, S501-S503, S404 and S405:
Wherein, after S403, if the first voice data passes the voiceprint verification, the terminal 300 may continue to perform S501-S503. If the first voice data does not pass the voiceprint verification, the terminal 300 can continue to perform S405.
S501, the terminal 300 starts a voice assistant.
S502, the terminal 300 receives the second voice data through the voice assistant.
After the voice assistant is started, the voice assistant may receive the second voice data input by the user, and trigger the terminal 300 to execute the function corresponding to the second voice data.
Take the mobile phone 400 shown in fig. 4B as an example of the terminal. After the mobile phone 400 starts the voice assistant, the mobile phone 400 may display the "voice assistant" interface 401 shown in fig. 4B. The "voice assistant" interface 401 includes a "record" button 403 and a "set" option 404. In response to an operation (such as a long-press operation) by the user on the "record" button 403, the mobile phone 400 may receive a voice command issued by the user, and be triggered to execute the event corresponding to the voice command. The "set" option 404 is used to set the various functions and parameters of the "voice assistant" application. The mobile phone 400 may receive a click operation from the user on the "set" option 404 in the "voice assistant" interface 401. In response to the user clicking the "set" option 404, the mobile phone 400 may display the voice control interface 106 shown in fig. 1 (c). Optionally, a prompt 402 may also be included in the "voice assistant" interface 401. The prompt 402 is used to indicate to the user the common functions of the "voice assistant" application.
It should be noted that the "voice assistant" interface 401 may not include the "record" button 403. That is, when the mobile phone 400 displays the "voice assistant" interface, the user does not need to click any button (such as the "record" button 403) in the "voice assistant" interface, and the mobile phone 400 can record the voice command issued by the user. The "voice assistant" interface of terminal 300 includes, but is not limited to, "voice assistant" interface 401 shown in fig. 4B.
S503, the terminal 300 determines whether the second voice data is a valid voice command.
The effective voice command in the embodiment of the application refers to: instructions capable of triggering the terminal 300 to perform the corresponding functions.
It will be appreciated that if the user intentionally speaks the preset wake-up word of the terminal 300 (i.e., the user's real purpose in speaking it is to wake up the voice assistant of the terminal 300), the user will typically trigger the terminal 300 by voice to perform a corresponding function after the voice assistant is started. In other words, if, after the voice assistant is started, the terminal 300 receives through the voice assistant an instruction for triggering the terminal 300 to perform a corresponding function (i.e., a valid voice command), the terminal will perform that function in response to the valid voice command, and it may be determined that this voice wake-up is consistent with the user's intention. In the embodiment of the application, such a voice wake-up is called a valid voice wake-up.
To ensure the voice wake-up rate of the terminal 300 after the first voiceprint model is updated with the first voice data, in the embodiment of the present application the terminal updates the wake-up word of the terminal 300 only with voice data corresponding to a valid voice wake-up. Specifically, if the voice assistant of the terminal 300 receives a valid voice command after being started, it indicates that waking up the voice assistant with the first voice data was a valid voice wake-up, i.e., the second voice data is a valid voice command, and the terminal 300 may execute S404. If the terminal 300 does not receive a valid voice command after the voice assistant is started, it indicates that waking up the voice assistant with the first voice data was an invalid voice wake-up, i.e., the second voice data is not a valid voice command, and the terminal 300 may delete the first voice data, i.e., perform S405.
In the embodiment of the present application, after the voice assistant of the terminal 300 is started, the terminal 300 updates the first voiceprint model with the first voice data only when a valid voice command for triggering the terminal 300 to execute a corresponding function is received. If the voice assistant receives a valid voice command after starting, it indicates that this voice wake-up is a valid voice wake-up consistent with the user's intention. Updating the voiceprint model of the terminal 300 with voice data that reflects the real intention of the user and successfully wakes up the terminal 300 can further improve the voice wake-up rate and reduce the false wake-up rate.
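The decision described above (S501-S503 feeding into S404 or S405) can be sketched as follows. This is a minimal illustrative sketch, not the patent's implementation; the function and parameter names are hypothetical.

```python
# Hypothetical sketch of the S501-S503 decision: after the voice assistant
# starts, the first voiceprint model is updated only if a valid voice
# command (one that triggers a terminal function) is received.

def handle_wakeup(second_voice_data, triggers_function):
    """Return the action taken after a voiceprint-verified wake-up.

    second_voice_data: the utterance received through the assistant, or None.
    triggers_function: whether that utterance maps to an executable function,
    i.e., whether it is a "valid voice command" (S503).
    """
    if second_voice_data is not None and triggers_function:
        return "update_model"   # S404: valid voice wake-up
    return "delete_voice_data"  # S405: invalid voice wake-up


assert handle_wakeup(b"open camera", True) == "update_model"
assert handle_wakeup(None, False) == "delete_voice_data"
```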
It can be appreciated that if the signal quality of the first voice data is poor, after the terminal 300 updates the first voiceprint model with the first voice data, the terminal 300 performs voice wake-up by using the updated voiceprint model, which affects the success rate of voice wake-up.
In order to prevent the terminal 300 from updating the first voiceprint model with voice data of poor signal quality, the terminal 300 may determine whether the signal quality parameter of the first voice data is higher than a second preset threshold before updating the first voiceprint model with the first voice data. The signal quality parameter of voice data is used to represent the signal quality of the voice data. For example, the signal quality parameter may be the signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, the signal quality of the first voice data is good. In this case, the terminal 300 may update the first voiceprint model with the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.
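The signal-quality gate can be sketched with signal-to-noise ratio as the quality parameter, as the text suggests. The 10 dB threshold and the power inputs below are illustrative assumptions; the patent does not specify the value of the second preset threshold.

```python
# Minimal sketch of the signal-quality gate, using SNR in decibels as the
# signal quality parameter. The threshold value is an assumed example.
import math

def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels."""
    return 10 * math.log10(signal_power / noise_power)

def should_update(signal_power, noise_power, second_preset_threshold_db=10.0):
    # Update the voiceprint model only when SNR exceeds the preset
    # threshold; otherwise the first voice data is deleted.
    return snr_db(signal_power, noise_power) > second_preset_threshold_db

assert should_update(1000.0, 10.0)    # 20 dB: good quality, update
assert not should_update(20.0, 10.0)  # ~3 dB: poor quality, delete
```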
Optionally, in the embodiment of the present application, the user may also decide whether to update the first voiceprint model in the terminal 300 with the first voice data. Specifically, before updating the first voiceprint model with the first voice data, the terminal 300 may display a first interface for prompting the user whether to update the voiceprint model. The terminal 300 then determines whether to update the voiceprint model based on the user's selection in the first interface. For example, the terminal 300 is the mobile phone 500 shown in fig. 5B. The mobile phone 500 may display the first interface 501 shown in fig. 5B before updating the first voiceprint model with the first voice data. The first interface 501 is used to prompt the user whether to update the voiceprint model (i.e., the wake-up word). For example, the first interface 501 includes first prompt information, such as "Voice data that can update the wake-up word was obtained during voice wake-up" and "Do you want to update the wake-up word?". The first interface 501 further includes an "update" option for triggering the mobile phone 500 to update the voiceprint model and a "cancel" option for triggering the mobile phone 500 not to update the voiceprint model.
In the embodiment of the present application, before updating the voiceprint model, the terminal 300 displays a first interface for prompting the user whether to update the voiceprint model. In this way, the user can decide whether to update the voiceprint model. That is, the terminal 300 determines whether to update the voiceprint model according to the user's requirement, which improves the interaction between the terminal 300 and the user and improves the user experience.
The process by which the terminal registers the wake-up word introduced in the embodiment of the application can be as follows: the terminal 300 records one or more pieces of voice data (referred to as registered voice data) when registering the preset wake-up word, and generates the first voiceprint model from the one or more pieces of registered voice data. Assume that the first voiceprint model is generated from at least two pieces of registered voice data. If, after generating a new voiceprint model from the first voice data, the terminal 300 directly replaces the first voiceprint model with it, the voice wake-up rate of the terminal 300 may rise sharply; however, a sharply increased voice wake-up rate may be accompanied by a correspondingly increased false wake-up rate. To stably increase the voice wake-up rate of the terminal 300 while reducing its false wake-up rate, the method by which the terminal 300 updates the first voiceprint model with the first voice data (i.e., S404) can include S601-S603. For example, as shown in fig. 6, S404 shown in fig. 5A may include S601-S603:
S601, the terminal 300 replaces third voice data among the at least two pieces of registered voice data with the first voice data to obtain at least two pieces of updated registered voice data.
S602, the terminal 300 generates a second voiceprint model according to the updated at least two pieces of registered voice data.
S603, the terminal 300 replaces the first voiceprint model with the second voiceprint model.
If the voice assistant of the terminal 300 receives a valid voice command after being started, the terminal 300 may determine the third voice data from at least two registered voice data stored in the terminal 300.
In some embodiments, the third voice data is the voice data whose signal quality parameter is lower than that of the other voice data among the at least two pieces of registered voice data. The terminal 300 replaces this lowest-quality third voice data with the first voice data, and then generates the second voiceprint model from the updated at least two pieces of registered voice data. Because the replaced voice data had a lower signal quality parameter than the others, the signal quality parameters of the remaining voice data (i.e., the updated at least two pieces of registered voice data) are high. The second voiceprint model generated by the terminal 300 from voice data with higher signal quality parameters can represent the voiceprint characteristics of the user more accurately and clearly. Performing voice wake-up with the second voiceprint model can therefore improve the voice wake-up rate and reduce the false wake-up rate of the terminal.
In other embodiments, the third voice data may be the earliest-stored voice data among the at least two pieces of registered voice data. Compared with the other registered voice data, the earliest-stored voice data (i.e., the third voice data) conforms less closely to the current physical state of the user and the real-time condition of the noise scene in which the user is currently located. Therefore, after the third voice data is replaced with the first voice data, the conformity between the retained voice data (i.e., the updated at least two pieces of registered voice data) and the user's current physical state and current noise scene can be improved. The second voiceprint model generated by the terminal 300 from voice data with higher conformity can more accurately and clearly represent the voiceprint characteristics of the user in the current physical state and the current noise scene. Performing voice wake-up with the second voiceprint model can therefore improve the voice wake-up rate and reduce the false wake-up rate of the terminal.
In the embodiment of the present application, the terminal 300 replaces part of the at least two pieces of registered voice data (the third voice data) with the first voice data, rather than generating the second voiceprint model entirely from the first voice data. In this way, the voice wake-up rate of the terminal 300 can be stably improved, and at the same time the false wake-up rate of the terminal 300 can be reduced.
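The two embodiments of selecting the third voice data (lowest signal quality, or earliest stored) can be sketched as below. This is a hedged sketch under assumed data structures; `RegisteredSample` and the model-regeneration step are illustrative names, not from the patent.

```python
# Sketch of S601: swap one registered sample for the new first voice data.
# The "lowest_snr" and "oldest" strategies mirror the two embodiments above.
from dataclasses import dataclass

@dataclass
class RegisteredSample:
    audio: bytes
    snr_db: float      # signal quality parameter (assumed to be SNR)
    stored_at: float   # storage timestamp

def replace_third_sample(samples, new_sample, strategy="lowest_snr"):
    """Return a new registration set with one sample swapped out (S601)."""
    if strategy == "lowest_snr":
        victim = min(samples, key=lambda s: s.snr_db)
    else:  # "oldest": the alternative embodiment
        victim = min(samples, key=lambda s: s.stored_at)
    return [new_sample if s is victim else s for s in samples]

regs = [RegisteredSample(b"a", 18.0, 100.0), RegisteredSample(b"b", 9.0, 200.0)]
new = RegisteredSample(b"c", 21.0, 300.0)
assert replace_third_sample(regs, new) == [regs[0], new]       # b had lowest SNR
assert replace_third_sample(regs, new, "oldest") == [new, regs[1]]  # a was oldest
```

The second voiceprint model (S602) would then be retrained from the returned set, and only afterwards substituted for the first model (S603).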
It can be appreciated that if the second voiceprint threshold generated by the terminal according to the second voiceprint model differs greatly from the first voiceprint threshold, the wake-up rate of the terminal 300 will fluctuate greatly, which affects the user experience. Based on this, as shown in fig. 7, after S602 and before S603 shown in fig. 6, the method of the embodiment of the present application may further include S701-S702:
S701, the terminal 300 generates a second voiceprint threshold according to the second voiceprint model and the updated at least two pieces of registered voice data.
Wherein, the second voiceprint model corresponds to a function. For example, the terminal 300 may take each of the updated at least two pieces of registered voice data as an input value and substitute it into the second voiceprint model to obtain at least two voiceprint thresholds. The terminal 300 may then calculate the average of the at least two voiceprint thresholds to obtain the second voiceprint threshold. For example, assume that the updated at least two pieces of registered voice data include registered voice data a and registered voice data b. The terminal 300 may substitute the registered voice data a into the second voiceprint model to obtain a voiceprint threshold A; substitute the registered voice data b into the second voiceprint model to obtain a voiceprint threshold B; and calculate the average of the voiceprint threshold A and the voiceprint threshold B to obtain the second voiceprint threshold.
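The averaging step in S701 is simple enough to state directly. In this illustrative sketch the per-sample scores stand in for the outputs of the (unspecified) second voiceprint model; the numeric values are assumptions.

```python
# Sketch of S701: the second voiceprint threshold is the mean of the scores
# the second voiceprint model assigns to each updated registered sample.

def second_voiceprint_threshold(model_scores):
    """Average the per-sample voiceprint scores to get the new threshold."""
    return sum(model_scores) / len(model_scores)

# e.g. registered voice data a scores 0.82 and b scores 0.78 under the model
assert abs(second_voiceprint_threshold([0.82, 0.78]) - 0.80) < 1e-9
```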
S702, the terminal 300 judges whether the difference value between the second voiceprint threshold and the first voiceprint threshold is smaller than a first preset threshold.
Specifically, if the difference between the second voiceprint threshold and the first voiceprint threshold is smaller than the first preset threshold, the change between the two thresholds is small. In this case, performing voiceprint verification with the second voiceprint model will not greatly affect the wake-up rate of the terminal 300. At this time, the terminal 300 may perform S603.
If the difference between the second voiceprint threshold and the first voiceprint threshold is larger than or equal to the first preset threshold, the change between the two thresholds is large. In this case, performing voiceprint verification with the second voiceprint model would greatly affect the wake-up rate of the terminal 300. At this time, as shown in fig. 7, the terminal 300 may perform S703:
S703, the terminal 300 deletes the second voiceprint model and the first voice data.
It can be appreciated that when the second voiceprint threshold differs greatly from the first voiceprint threshold, the terminal 300 deletes the second voiceprint model and the first voice data, i.e., it does not replace the first voiceprint model with the second voiceprint model. In this way, large fluctuations in the wake-up rate of the terminal 300 caused by the large difference between the two thresholds are avoided, so the user experience is not affected.
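The S702/S703 guard amounts to a single comparison. A minimal sketch, assuming an illustrative value for the first preset threshold (the patent does not specify one):

```python
# Sketch of the S702/S703 guard: adopt the second voiceprint model only if
# its threshold stays close to the first one; otherwise discard it to avoid
# wake-up-rate fluctuation. The 0.05 preset threshold is an assumption.

def adopt_second_model(first_threshold, second_threshold,
                       first_preset_threshold=0.05):
    """Return True to replace the model (S603), False to delete it (S703)."""
    return abs(second_threshold - first_threshold) < first_preset_threshold

assert adopt_second_model(0.80, 0.82)      # small change: replace model
assert not adopt_second_model(0.80, 0.90)  # large change: delete model
```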
The embodiment of the application provides a method for updating wake-up voice of a voice assistant by a terminal. As shown in fig. 8, the method for updating wake-up voice of the voice assistant by the terminal may include S801-S808:
S801, the terminal 300 receives first voice data.
S802, the terminal 300 judges whether the text corresponding to the first voice data is matched with the text of the preset wake-up word registered in the terminal.
If the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal, the AP may continue to perform voiceprint verification on the first voice data, i.e., the terminal 300 continues to execute S803. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data, i.e., the terminal 300 may continue to perform S808.
S803, the terminal 300 performs voiceprint verification on the first voice data by using the first voiceprint model.
If the first voice data passes the voiceprint verification, the terminal 300 can continue to perform S804. If the first voice data does not pass the voiceprint verification, the terminal 300 can continue to perform S808.
The detailed descriptions of S801 to S803 may refer to the descriptions of S401 to S403 in the embodiments of the present application, and the embodiments of the present application are not described herein.
S804, the terminal 300 starts a voice assistant.
S805, the terminal 300 performs text verification on the voice data received in the first preset time.
S806, the terminal 300 judges whether the terminal 300 receives the second voice data and at least one voice data matched with the text of the preset wake-up word in the first preset time.
Wherein, the first preset time is a preset time period starting from when the terminal 300 determines that the first voice data matches the text information of the wake-up word registered in the terminal 300 (i.e., the first voice data passes the text verification) but does not pass the voiceprint verification.
Generally, the AP (application processor) of the terminal 300 is in a sleep state. The DSP of the terminal 300 monitors for the first voice data. When it detects voice data whose similarity to the wake-up word registered in the terminal 300 satisfies a certain condition, the DSP delivers the monitored voice data to the AP, and the AP is woken up. The AP performs text verification and voiceprint verification on the voice data to judge whether the voice data matches the generated voiceprint model. After the AP obtains the verification result, it enters the sleep state again until it receives voice data sent by the DSP. That is, the DSP transmits to the AP only voice data whose similarity to the wake-up word registered in the terminal 300 satisfies the condition, and the AP performs text verification and voiceprint verification only on voice data delivered by the DSP.
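The division of labor described here can be sketched as a two-stage pipeline. This is a hedged sketch under assumed scoring and threshold values; `dsp_prescreen` stands in for the DSP's (unspecified) similarity condition.

```python
# Sketch of the DSP/AP pipeline: the always-on DSP pre-screens audio
# against the wake-up word; only audio that clears the similarity condition
# wakes the AP for full text and voiceprint verification. The 0.5 cutoff
# is an illustrative assumption.

def dsp_prescreen(similarity_score, condition_threshold=0.5):
    # DSP forwards the audio and wakes the AP only above the threshold.
    return similarity_score >= condition_threshold

def ap_verify(text_ok, voiceprint_ok):
    # AP runs both checks, then returns to sleep with the result.
    return text_ok and voiceprint_ok

assert not dsp_prescreen(0.2)                        # AP stays asleep
assert dsp_prescreen(0.9) and ap_verify(True, True)  # wake-up succeeds
```

The power benefit of this design is that the AP, the expensive processor, only runs the heavyweight checks on audio the lightweight DSP has already filtered.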
It will be appreciated that if the first voice data is identical to the text information of the wake-up word registered in the terminal 300 (i.e., the first voice data can pass the text verification), the DSP will find that the similarity between the first voice data and the registered wake-up word satisfies the condition. The DSP may therefore transmit the first voice data to the AP and wake up the AP, and the AP performs text verification and voiceprint verification on the first voice data.
By contrast, in the embodiment of the present application, if the AP determines that the first voice data matches the text information of the wake-up word registered in the terminal 300 (i.e., passes the text verification) but does not pass the voiceprint verification, the AP does not enter the sleep state immediately after obtaining the verification result. Instead, the DSP delivers all the voice data monitored within the first preset time to the AP, and the AP may perform text verification on all of that voice data.
The first voiceprint model is used for carrying out voiceprint verification when a voice assistant is awakened, and can represent voiceprint characteristics of awakening words registered in the terminal. The text corresponding to the second voice data contains preset keywords. For example, the second voice data may be voice data in which the user complains of a voice wake failure, such as voice data of "how to wake up", "how to not go", "not respond", "unable to wake up", and "voice wake up failure".
The AP performs text verification on all voice data monitored by the DSP within the first preset time. If, within the first preset time, the AP recognizes second voice data whose text contains a preset keyword, such as "how to wake up", "no response", or "voice wake up failure", together with at least one piece of voice data whose text information is the same as that of the wake-up word registered in the terminal 300, the terminal 300 may update the first voiceprint model with the received first voice data.
It can be understood that if, after receiving the first voice data in S801, the terminal 300 detects that the first voice data does not pass the voiceprint verification, and the terminal 300 then receives at least one piece of text-verified voice data within the first preset time, this indicates that the user tried multiple times to wake up the voice assistant by voice but failed. In this case, if the terminal 300 also receives the second voice data within the first preset time, this indicates that the user is dissatisfied with the voice wake-up failures.
That the terminal 300 receives the second voice data and the at least one piece of text-verified voice data within the first preset time indicates that the user has a strong intention to wake up the voice assistant by voice; however, the multiple wake-up failures may be caused by a large difference between the user's current physical state and the physical state when the wake-up word was registered. Of course, the failures may also be caused by a large difference between the real-time condition of the noise scene in which the user is currently located and that at registration time. In this case, even though the first voice data does not pass the voiceprint verification, the terminal 300 may update the first voiceprint model with the received first voice data. That is, if the terminal 300 receives, within the first preset time, the second voice data and at least one piece of voice data matching the text of the preset wake-up word, it updates the first voiceprint model with the first voice data, i.e., performs S807.
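The S806 condition combines two signals from the window of monitored utterances. The sketch below is illustrative: the keyword list mirrors the examples in the text, and the function names are hypothetical.

```python
# Sketch of S806: within the first preset time, the AP looks for both
# (a) at least one utterance whose text matches the registered wake-up word
# and (b) a "complaint" utterance containing a preset keyword.

COMPLAINT_KEYWORDS = ("how to wake up", "no response", "unable to wake up",
                      "voice wake up failure")

def should_force_update(texts_in_window, wake_word):
    """Both a repeated wake-word attempt and a complaint must appear."""
    has_wake_attempt = any(t == wake_word for t in texts_in_window)
    has_complaint = any(k in t for t in texts_in_window
                        for k in COMPLAINT_KEYWORDS)
    return has_wake_attempt and has_complaint

window = ["hello assistant", "hello assistant", "why is there no response"]
assert should_force_update(window, "hello assistant")                   # -> S807
assert not should_force_update(["hello assistant"], "hello assistant")  # -> S808
```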
S807, the terminal 300 updates the first voiceprint model of the terminal 300 using the first voice data.
Wherein, if the terminal 300 does not receive the second voice data and at least one voice data matching with the text of the preset wake-up word within the first preset time, the terminal 300 may delete the first voice data.
S808, the terminal 300 deletes the first voice data.
The method by which the terminal 300 updates the first voiceprint model with the first voice data may include: the terminal 300 generates a second voiceprint model from the first voice data, and replaces the first voiceprint model with the second voiceprint model. For the method by which the terminal 300 generates the second voiceprint model from the first voice data, refer to the method by which a terminal generates a voiceprint model in the conventional technology. Details are not repeated here in the embodiments of the present application.
Since the first voice data is voice data of the user acquired in real time by the terminal 300, the first voice data can reflect the physical state of the user and/or the real-time condition of the noise scene in which the user is located. Therefore, updating the voiceprint model of the terminal 300 with the first voice data can improve the voice wake-up rate of the terminal and reduce the false wake-up rate.
Moreover, the received first voice data is voice data issued by a user with a strong intention of waking up the voice assistant of the terminal 300 in order to start the voice assistant. Therefore, updating the voiceprint model of the terminal 300 with voice data that reflects the real intention of the user can further improve the voice wake-up rate and reduce the false wake-up rate.
Further, the received first voice data is automatically acquired by the terminal 300 during the voice wake-up process, instead of being received as user input after prompting the user to manually re-register the wake-up word. Therefore, updating the voiceprint model with the received first voice data can simplify the wake-up word update procedure.
It can be appreciated that if the signal quality of the first voice data is poor, after the terminal 300 updates the first voiceprint model with the first voice data, the terminal 300 performs voice wake-up by using the updated voiceprint model, which affects the success rate of voice wake-up.
In order to prevent the terminal 300 from updating the first voiceprint model with voice data of poor signal quality, the terminal 300 may determine whether the signal quality parameter of the first voice data is higher than the second preset threshold before updating the first voiceprint model with the first voice data. The signal quality parameter of voice data is used to represent the signal quality of the voice data. For example, the signal quality parameter may be the signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, the signal quality of the first voice data is good. In this case, the terminal 300 may update the first voiceprint model with the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.
Optionally, in addition to the first voice data, the terminal 300 may update the first voiceprint model with the at least one piece of voice data matching the text of the preset wake-up word. Specifically, the terminal may select, from the first voice data and the at least one piece of voice data matching the text of the preset wake-up word, the voice data whose signal quality parameter is higher than the second preset threshold, and then update the first voiceprint model with that voice data.
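The optional candidate selection can be sketched as filtering and ranking by signal quality. The 10 dB threshold below is an illustrative assumption; the patent does not give a value.

```python
# Sketch of the optional selection: among the first voice data and the
# text-matched utterances from the window, keep only those whose quality
# exceeds the second preset threshold, best first.

def pick_update_candidates(candidates_snr_db, second_preset_threshold_db=10.0):
    """Return qualifying SNRs, best first; an empty list means none qualify."""
    good = [s for s in candidates_snr_db if s > second_preset_threshold_db]
    return sorted(good, reverse=True)

assert pick_update_candidates([8.0, 14.0, 12.0]) == [14.0, 12.0]
assert pick_update_candidates([5.0, 7.0]) == []
```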
In order to prevent a malicious user from triggering the terminal 300 to execute S801-S808 so as to update the first voiceprint model in the terminal 300 and thereby wake up the terminal 300 by voice, the terminal 300 may perform user identity authentication before performing S807, and execute S807 only after the authentication passes. Specifically, after S806 and before S807, the terminal 300 may perform identity authentication for the user; if the identity authentication passes, the terminal 300 performs S807; if the identity authentication does not pass, the terminal 300 performs S808. The method for authenticating the identity of the user by the terminal may include S901-S903. As shown in fig. 9, after S806 and before S807 shown in fig. 8, the method according to the embodiment of the present application may further include S901-S903:
S901, the terminal 300 displays an identity verification interface.
Wherein, the authentication interface is used for receiving authentication information input by the user.
S902, the terminal 300 receives authentication information input by a user on an authentication interface.
S903, the terminal 300 performs user authentication according to the authentication information.
If the identity authentication is passed, the terminal 300 updates the first voiceprint model by using the first voice data, that is, the terminal 300 executes S807. If the identity authentication is not passed, the terminal 300 deletes the first voice data, i.e., the terminal 300 performs S808.
For example, the authentication information may be any one of a digital password, a pattern password, fingerprint information, iris information, and facial feature information. Correspondingly, the identity authentication interface can be any one of an interface for inputting a digital password or a pattern password, an interface for inputting fingerprint information, an interface for inputting iris information, an interface for inputting facial feature information, and the like.
Illustratively, the terminal 300 is the mobile phone 1000 shown in fig. 10, the authentication information is a digital password, and the authentication interface is an interface for inputting the digital password. The mobile phone 1000 may display the authentication interface 1001 shown in fig. 10. The authentication interface 1001 includes a password input box 1002 and a first prompt message 1003: "After the user authentication is passed, the mobile phone will automatically update the wake-up word".
The terminal 300 performs user identity authentication, and only after the authentication passes does it update the first voiceprint model in the terminal 300. In this way, a malicious user can be prevented from using his or her own voice to trigger the terminal 300 to update the first voiceprint model, and thus from waking up the terminal 300 by voice. This scheme prevents the voiceprint model in the terminal 300 from being maliciously updated and improves the security of the terminal 300.
It can be appreciated that directly generating a new voiceprint model from the first voice data and replacing the first voiceprint model with it can indeed improve the voice wake-up rate of the terminal 300. However, such an abrupt change may correspondingly increase the false wake-up rate of the terminal 300. In order to steadily improve the voice wake-up rate of the terminal 300 while reducing its false wake-up rate, as shown in fig. 11, S807 may include S601-S603 described above.
In the embodiment of the application, the terminal uses the first voice data to replace part of the voice data in the at least two registered voice data, rather than generating the second voiceprint model entirely from the first voice data. In this way, the voice wake-up rate of the terminal 300 can be steadily improved, and at the same time the false wake-up rate of the terminal 300 can be reduced.
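The partial-replacement step (S601) followed by model regeneration (S602) can be sketched as below. The `quality` scoring function (e.g. an SNR estimate) and the `rebuild_voiceprint_model` placeholder are illustrative assumptions; the patent does not prescribe a particular training routine.

```python
def update_registered_samples(registered, first_voice_data, quality):
    """S601: replace the registered sample with the lowest signal quality.

    `quality` is a hypothetical scoring function (e.g. an SNR estimate);
    only the worst-scoring sample is swapped out for the new voice data.
    """
    worst = min(range(len(registered)), key=lambda i: quality(registered[i]))
    updated = list(registered)
    updated[worst] = first_voice_data
    return updated

def rebuild_voiceprint_model(samples):
    """S602: stand-in for retraining the voiceprint model on the updated set."""
    return f"model({','.join(samples)})"
```

Because most of the registration set is retained, the regenerated model drifts only gradually, which is why this scheme improves the wake-up rate without the instability of a full replacement.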
It can be appreciated that if the second voiceprint threshold generated by the terminal according to the second voiceprint model differs greatly from the first voiceprint threshold, the wake-up rate of the terminal 300 will fluctuate greatly, which affects the user experience. Based on this, as shown in fig. 12, after S602 and before S603, the method of the embodiment of the present application may further include S701-S702:
After S702, if the difference between the second voiceprint threshold and the first voiceprint threshold is smaller than the first preset threshold, the change between the two thresholds is small. In this case, performing voiceprint verification with the second voiceprint model does not greatly affect the wake-up rate of the terminal 300, and the terminal 300 may perform S603. If, after S702, the difference is greater than or equal to the first preset threshold, the change between the two thresholds is large; performing voiceprint verification with the second voiceprint model would then greatly affect the wake-up rate of the terminal 300, so the terminal 300 may perform S703.
It can be appreciated that when the second voiceprint threshold differs greatly from the first voiceprint threshold, the terminal 300 deletes the second voiceprint model and the third voice data, that is, it does not replace the first voiceprint model with the second voiceprint model. In this way, large fluctuations in the wake-up rate of the terminal 300 caused by the large difference between the two thresholds are avoided, so the user experience is not affected.
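The threshold check (S702) and the resulting branch into S603 or S703 can be sketched as follows. The dict-based terminal state and the concrete threshold values are illustrative assumptions; the patent only requires comparing the threshold difference against a first preset threshold.

```python
def maybe_replace_model(state, second_model, second_threshold,
                        first_preset_threshold=0.1):
    """S702: adopt the second voiceprint model only if its threshold is close
    to the first one. `state` is a hypothetical stand-in for terminal storage.
    """
    if abs(second_threshold - state["threshold"]) < first_preset_threshold:
        # S603: the change is small -> replace the first model and threshold.
        state["model"] = second_model
        state["threshold"] = second_threshold
        return True
    # S703: the change is large -> discard the second model, keep the first.
    return False
```

A usage example: with a first threshold of 0.50, a second threshold of 0.55 is adopted (difference 0.05 < 0.1), while 0.80 is rejected and the first model is retained.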
It will be appreciated that the above-described terminal, etc. may comprise hardware structures and/or software modules that perform the respective functions in order to achieve the above-described functions. Those of skill in the art will readily appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as hardware or combinations of hardware and computer software. Whether a function is implemented as hardware or computer software driven hardware depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the embodiments of the present application.
The embodiment of the application can divide the functional modules of the terminal and the like according to the method example, for example, each functional module can be divided corresponding to each function, and two or more functions can be integrated in one processing module. The integrated modules may be implemented in hardware or in software functional modules. It should be noted that, in the embodiment of the present application, the division of the modules is schematic, which is merely a logic function division, and other division manners may be implemented in actual implementation.
Fig. 13 shows a possible schematic structural diagram of the terminal involved in the above embodiments, in the case where each functional module is divided according to its corresponding function. The terminal 1300 includes: a storage unit 1301, an input unit 1302, a text verification unit 1303, a voiceprint verification unit 1304, and an update unit 1305. The storage unit 1301 stores a preset wake-up word registered in the terminal 1300 and a first voiceprint model; the first voiceprint model is used for voiceprint verification when waking up the voice assistant, and characterizes voiceprint features of the preset wake-up word.
Wherein the input unit 1302 is for supporting the terminal 1300 to perform S401, S502, S801, S902 in the above-described method embodiments, and/or other processes for the techniques described herein. The text verification unit 1303 is configured to support the terminal 1300 to perform S402, S802, S805 in the above-described method embodiments, and/or other processes for the techniques described herein. The voiceprint verification unit 1304 is configured to support the terminal 1300 to perform S403, S803 in the method embodiments described above, and/or other processes for the techniques described herein. The update unit 1305 is configured to support the terminal 1300 to perform S404, S603, S807 in the above-described method embodiments, and/or other processes for the techniques described herein.
Further, the terminal 1300 may further include: a starting unit and a determining unit. The enabling unit is configured to enable the terminal 1300 to perform S501, S804 in the method embodiments described above, and/or other processes for the techniques described herein. The determining unit is for supporting the terminal 1300 to perform S503 in the above-described method embodiments, and/or other processes for the techniques described herein.
Further, as shown in fig. 14, the terminal 1300 may further include: an identity authentication unit 1306. The authentication unit 1306 is configured to support user authentication by the terminal 1300. For example, the identity authentication unit 1306 is used to support the terminal 1300 to perform S903 in the method embodiments described above, and/or other processes for the techniques described herein.
Further, the terminal 1300 may further include a display unit. The display unit is used to support the terminal 1300 in performing S901 in the above-described method embodiments, and/or other processes for the techniques described herein.
Further, the terminal 1300 may further include: a replacement unit and a generation unit. The replacement unit is used to support the terminal 1300 to perform S601 in the method embodiments described above, and/or other processes for the techniques described herein. The generating unit is configured to support the terminal 1300 to perform S602, S701 in the method embodiments described above, and/or other processes for the techniques described herein.
Further, the terminal 1300 may further include a deletion unit. The deletion unit is used to support the terminal 1300 in performing S405, S703, and S808 in the above-described method embodiments, and/or other processes for the techniques described herein.
Further, the terminal 1300 may further include a judging unit. The judging unit is used to support the terminal 1300 in performing S702 and S806 in the above-described method embodiments, and/or other processes for the techniques described herein.
For all relevant contents of each step in the above method embodiments, reference may be made to the functional description of the corresponding functional module; details are not repeated here.
Of course, the terminal 1300 includes, but is not limited to, the unit modules listed above. For example, the terminal 1300 may further include a receiving unit and a sending unit. The receiving unit is used for receiving data or instructions sent by other terminals; the sending unit is used for sending data or instructions to other terminals. In addition, the functions that can be implemented by the above functional units include, but are not limited to, the functions corresponding to the method steps described above; for a detailed description of the other units of the terminal 1300, refer to the detailed description of the corresponding method steps, which is not repeated here in the embodiments of the present application.
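The unit-to-step mapping of figs. 13 and 14 can be summarized programmatically. This is a hypothetical sketch for illustration only: the unit names and the lookup helper are assumptions, while the step numbers come from the descriptions of the units above.

```python
# Mapping of each functional unit of terminal 1300 to the method steps it
# supports, as listed in the description of figs. 13 and 14.
UNIT_STEPS = {
    "input": ["S401", "S502", "S801", "S902"],
    "text_verification": ["S402", "S802", "S805"],
    "voiceprint_verification": ["S403", "S803"],
    "update": ["S404", "S603", "S807"],
    "starting": ["S501", "S804"],
    "determining": ["S503"],
    "identity_authentication": ["S903"],
    "display": ["S901"],
    "replacement": ["S601"],
    "generation": ["S602", "S701"],
    "deletion": ["S405", "S703", "S808"],
    "judgment": ["S702", "S806"],
}

def unit_for_step(step):
    """Return the functional unit that carries out a given method step."""
    for unit, steps in UNIT_STEPS.items():
        if step in steps:
            return unit
    return None
```

The table makes the logical nature of the division visible: each step is owned by exactly one unit, which is why the embodiment notes that other divisions are equally possible in actual implementation.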
In case of using an integrated unit, fig. 15 shows a possible structural schematic diagram of the terminal involved in the above-described embodiment. The terminal 1500 includes: a processing module 1501, a storage module 1502 and a display module 1503. The processing module 1501 is configured to control and manage the operation of the terminal 1500. The display module 1503 is used for displaying the image generated by the processing module 1501. A storage module 1502 is configured to store program codes and data of the terminal. For example, the storage module 1502 stores a preset wake word registered in the terminal, and a first voiceprint model, where the first voiceprint model is used for performing voiceprint verification when waking up the voice assistant, and the first voiceprint model characterizes voiceprint features of the preset wake word. Optionally, the terminal 1500 may further comprise a communication module for supporting communication of the terminal with other network entities. The detailed description of each unit included in the terminal 1500 may refer to the description in the above method embodiments, and will not be repeated here.
The processing module 1501 may be a processor or controller, such as a central processing unit (Central Processing Unit, CPU), a general purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (Field Programmable Gate Array, FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logic blocks, modules, and circuits described in connection with this disclosure. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module may be a transceiver, a transceiver circuit, a communication interface, or the like. The storage module 1502 may be a memory.
When the processing module 1501 is a processor (e.g., the processor 310 shown in fig. 3), the communication module includes a Wi-Fi module and a bluetooth module (e.g., the communication module 360 shown in fig. 3). The Wi-Fi module and the bluetooth module may be collectively referred to as a communication interface. The memory module 1502 is a memory (such as the internal memory 321 shown in fig. 3 and an external SD card connected to the terminal 1500 through the external memory interface 320). When the display module 1503 is a touch screen (including the display screen 394 shown in fig. 3), the terminal provided by the embodiment of the present application may be the terminal 300 shown in fig. 3. Wherein the processor, the communication interface, the touch screen and the memory may be coupled together by a bus.
The embodiment of the present application further provides a computer storage medium having a computer program code stored therein, which when executed by the processor, causes the terminal to perform the relevant method steps of any of fig. 4A, 5A, 6, 7, 8, 9, 11 and 12 to implement the method of the above embodiment.
Embodiments of the present application also provide a computer program product which, when run on a computer, causes the computer to perform the relevant method steps in any of fig. 4A, 5A, 6, 7, 8, 9, 11 and 12 to implement the method in the above embodiments.
The terminal 1300, the terminal 1500, the computer storage medium, and the computer program product provided in the embodiments of the present application are all used to execute the corresponding methods provided above; therefore, for the beneficial effects they can achieve, refer to the beneficial effects of the corresponding methods provided above, which are not repeated here.
From the foregoing description of the embodiments, it will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of functional modules is illustrated, and in practical application, the above-described functional allocation may be implemented by different functional modules according to needs, i.e. the internal structure of the apparatus is divided into different functional modules to implement all or part of the functions described above.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus and method may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions when actually implemented, e.g., multiple units or components may be combined or integrated into another apparatus, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed with each other may be an indirect coupling or communication connection via some interfaces, devices or units, which may be in electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may be one physical unit or a plurality of physical units, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units described above may be implemented either in hardware or in software functional units.
The integrated units, if implemented in the form of software functional units and sold or used as stand-alone products, may be stored in a readable storage medium. Based on such understanding, the technical solution of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a storage medium, including several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or part of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes: a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, an optical disc, or the like.
The foregoing is merely illustrative of specific embodiments of the present application, and the scope of the present application is not limited thereto, but any changes or substitutions within the technical scope of the present application should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

1. A method for updating wake-up voice of a voice assistant by a terminal, comprising:
the terminal receives first voice data input by a user;
the terminal judges whether a text corresponding to the first voice data is matched with a text of a preset wake-up word registered in the terminal;
if the text corresponding to the first voice data is matched with the text of the preset wake-up word, the terminal uses a first voiceprint model to carry out voiceprint verification on the first voice data;
if the first voice data does not pass the voiceprint verification, the terminal performs text verification on voice data received within a first preset time;
if, within the first preset time, the terminal receives second voice data and at least one voice data matching the text of the preset wake-up word, the terminal performs identity authentication on the user; wherein the text corresponding to the second voice data comprises a preset keyword;
If the identity authentication is passed, the terminal updates a first voiceprint model in the terminal by adopting the first voice data;
wherein the first voiceprint model is used for performing voiceprint verification when the voice assistant is woken up, and the first voiceprint model characterizes voiceprint features of the preset wake-up word.
2. The method for updating wake-up voice of a voice assistant of claim 1, wherein the terminal authenticating the user comprises:
and the terminal uses the first voiceprint model to carry out voiceprint verification on the first voice data, and if the voiceprint verification is passed, the identity authentication is passed.
3. The method for updating wake-up voice of a voice assistant according to claim 1 or 2, further comprising:
when the identity authentication passes, the terminal starts the voice assistant;
the terminal receives third voice data through the voice assistant;
after the identity authentication is passed, before the terminal updates the first voiceprint model in the terminal with the first voice data, the method further comprises:
the terminal determines that the third voice data is a valid voice command.
4. The method for updating wake-up voice of a voice assistant according to claim 1 or 2, wherein the terminal comprises a coprocessor and a main processor; the terminal monitors voice data by using the coprocessor; when the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, the coprocessor notifies the main processor to judge whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal, and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, voiceprint verification is performed on the first voice data by using the first voiceprint model.
5. The method for updating wake-up voice of a voice assistant of claim 1, wherein the terminal performs identity authentication on the user, comprising:
the terminal displays an identity verification interface;
the terminal receives authentication information input by a user at the authentication interface;
and the terminal performs user authentication on the user according to the authentication information.
6. The method for updating wake-up voice of a voice assistant according to claim 1 or 5, wherein the terminal comprises a coprocessor and a main processor; the terminal monitors voice data by using the coprocessor; when the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, the coprocessor notifies the main processor to judge whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal, and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, voiceprint verification is performed on the first voice data by using the first voiceprint model;
the terminal uses the coprocessor to monitor voice data within the first preset time, and notifies the main processor to judge whether the voice data received within the first preset time includes second voice data and at least one voice data matching the text of the preset wake-up word, wherein the text corresponding to the second voice data comprises a preset keyword.
7. The method for updating wake-up voice of a voice assistant according to any one of claims 1, 2 or 5, wherein the preset wake-up word includes at least two registered voice data recorded when the terminal registered the preset wake-up word, and the first voiceprint model is generated according to the at least two registered voice data;
the terminal updates a first voiceprint model in the terminal by adopting the first voice data, and the method comprises the following steps:
the terminal adopts the first voice data to replace third voice data in the at least two pieces of registered voice data to obtain at least two pieces of updated registered voice data, wherein the signal quality parameters of the third voice data are lower than those of other voice data in the at least two pieces of registered voice data;
The terminal generates a second voiceprint model according to the updated at least two registered voice data;
the terminal replaces the first voiceprint model with the second voiceprint model, and the second voiceprint model is used for representing voiceprint characteristics of the at least two updated registered voice data.
8. The method for updating wake-up speech of a voice assistant of claim 7, wherein a first voiceprint threshold is also stored in the terminal, the first voiceprint threshold being generated from the first voiceprint model and the at least two registered voice data;
after the terminal generates a second voiceprint model according to the updated at least two registered voice data, before the terminal replaces the first voiceprint model with the second voiceprint model, the method further includes:
the terminal generates a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data;
the terminal replaces the first voiceprint model with the second voiceprint model, comprising:
and if the difference value between the second voiceprint threshold and the first voiceprint threshold is smaller than a first preset threshold, the terminal replaces the first voiceprint model by the second voiceprint model.
9. The method for updating wake-up voice of a voice assistant of claim 8, wherein the method further comprises:
and if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, the terminal deletes the second voiceprint model and the first voice data.
10. The method for updating wake-up voice of a voice assistant by a terminal according to any of claims 1, 2, 5, 8 or 9, wherein the terminal updates a first voiceprint model in the terminal with the first voice data, comprising:
if the signal quality parameter of the first voice data is higher than a second preset threshold, the terminal updates the first voiceprint model by adopting the first voice data;
wherein the signal quality parameter of the first voice data comprises a signal to noise ratio of the first voice data.
11. A terminal, the terminal comprising: a processor, a memory, and a display; the memory, the display, and the processor are coupled; the display is used for displaying the image generated by the processor; the memory is used for storing computer program codes, related information of the voice assistant, preset wake-up words registered in the terminal and a first voiceprint model; the computer program code comprising computer instructions which, when executed by the processor,
The processor is used for receiving first voice data input by a user; judging whether the text corresponding to the first voice data is matched with the text of the preset wake-up word or not; if the text corresponding to the first voice data is matched with the text of the preset wake-up word, voiceprint verification is carried out on the first voice data by using the first voiceprint model; if the first voice data does not pass the voiceprint verification, performing text verification on the voice data received in the first preset time; if the processor receives second voice data and at least one voice data matched with the text of the preset wake-up word in the first preset time, carrying out identity authentication on the user; if the identity authentication is passed, updating the first voiceprint model stored in the memory by adopting the first voice data; the text corresponding to the second voice data comprises preset keywords;
the first voiceprint model is used for carrying out voiceprint verification when a voice assistant is awakened, and characterizes voiceprint features of the preset awakening words.
12. The terminal of claim 11, wherein the processor is configured to authenticate the user, and comprises:
And the processor is used for carrying out voiceprint verification on the first voice data by using the first voiceprint model, and if the voiceprint verification is passed, the identity authentication is passed.
13. The terminal according to claim 11 or 12, wherein the processor is further configured to activate the voice assistant when the identity authentication is passed; receiving, by the voice assistant, third voice data;
the processor is further configured to determine, after the identity authentication passes, that the third voice data is a valid voice command before updating the first voiceprint model with the first voice data.
14. The terminal according to claim 11 or 12, wherein the processor comprises a coprocessor and a main processor; the coprocessor is configured to monitor voice data; when the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, the coprocessor notifies the main processor to judge whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal, and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, voiceprint verification is performed on the first voice data by using the first voiceprint model.
15. The terminal of claim 11, wherein the processor configured to authenticate the user comprises:
the processor is used for controlling the display to display an identity verification interface; receiving authentication information input by a user on the authentication interface displayed on the display; and carrying out user authentication on the user according to the authentication information.
16. The terminal according to claim 11 or 15, wherein the processor comprises a coprocessor and a main processor; the coprocessor is configured to monitor voice data; when the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, the coprocessor notifies the main processor to judge whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal, and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, voiceprint verification is performed on the first voice data by using the first voiceprint model;
the coprocessor is further configured to monitor voice data within the first preset time, and to notify the main processor to judge whether the voice data received within the first preset time includes second voice data and at least one voice data matching the text of the preset wake-up word, wherein the text corresponding to the second voice data comprises a preset keyword.
17. The terminal of any of claims 11, 12 or 15, wherein the preset wake word stored in the memory includes at least two registered voice data recorded when the processor registered the preset wake word, the first voiceprint model being generated by the processor from the at least two registered voice data;
wherein the processor updates the first voiceprint model with the first voice data, comprising:
the processor is configured to replace third voice data in the at least two registered voice data with the first voice data to obtain updated at least two registered voice data, wherein a signal quality parameter of the third voice data is lower than signal quality parameters of the other voice data in the at least two registered voice data; generate a second voiceprint model according to the updated at least two registered voice data; and replace the first voiceprint model with the second voiceprint model, wherein the second voiceprint model characterizes voiceprint features of the updated at least two registered voice data.
18. The terminal of claim 17, wherein the memory further stores a first voiceprint threshold, the first voiceprint threshold being generated by the processor from the first voiceprint model and the at least two registered voice data;
The processor is further configured to, after generating the second voiceprint model according to the updated at least two registered voice data and before replacing the first voiceprint model with the second voiceprint model, generate a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data;
the processor for replacing the first voiceprint model with the second voiceprint model comprises:
the processor is configured to replace the first voiceprint model with the second voiceprint model if a difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold.
19. The terminal of claim 18, wherein the processor is further configured to delete the second voiceprint model and the first voice data if a difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold.
20. The terminal of any of claims 11, 12, 15, 18 or 19, wherein the processor is configured to update a first voiceprint model in the terminal with the first voice data, comprising:
The processor is configured to update the first voiceprint model with the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold;
wherein the signal quality parameter of the first voice data comprises a signal to noise ratio of the first voice data.
21. A computer storage medium comprising computer instructions which, when run on a terminal, cause the terminal to perform the method for updating wake-up voice of a voice assistant by a terminal according to any of claims 1-10.
CN201880089912.7A 2018-07-24 2018-07-24 Method for updating wake-up voice of voice assistant by terminal and terminal Active CN111742361B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/096917 WO2020019176A1 (en) 2018-07-24 2018-07-24 Method for updating wake-up voice of voice assistant by terminal, and terminal

Publications (2)

Publication Number Publication Date
CN111742361A CN111742361A (en) 2020-10-02
CN111742361B true CN111742361B (en) 2023-08-22

Family

ID=69181102

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201880089912.7A Active CN111742361B (en) 2018-07-24 2018-07-24 Method for updating wake-up voice of voice assistant by terminal and terminal

Country Status (2)

Country Link
CN (1) CN111742361B (en)
WO (1) WO2020019176A1 (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627449B (en) * 2020-05-20 2023-02-28 Oppo广东移动通信有限公司 Screen voiceprint unlocking method and device
CN111833869B (en) * 2020-07-01 2022-02-11 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN112417451B (en) * 2020-11-20 2022-04-12 复旦大学 Malicious software detection method adaptive to intelligent chip hierarchical architecture and based on deep learning
CN112489650A (en) * 2020-11-26 2021-03-12 北京小米松果电子有限公司 Wake-up control method and device, storage medium and terminal
CN113593549A (en) * 2021-06-29 2021-11-02 青岛海尔科技有限公司 Method and device for determining wake-up rate of voice device
CN116959438A (en) * 2022-04-18 2023-10-27 华为技术有限公司 Method for waking up device, electronic device and storage medium
CN117012205A (en) * 2022-04-29 2023-11-07 荣耀终端有限公司 Voiceprint recognition method, graphical interface and electronic equipment
CN115312068B (en) * 2022-07-14 2023-05-09 荣耀终端有限公司 Voice control method, equipment and storage medium
CN115376524B (en) * 2022-07-15 2023-08-04 荣耀终端有限公司 Voice awakening method, electronic equipment and chip system
CN116030817B (en) * 2022-07-18 2023-09-19 荣耀终端有限公司 Voice wakeup method, equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106653031A (en) * 2016-10-17 2017-05-10 海信集团有限公司 Voice wake-up method and voice interaction device
CN106961418A (en) * 2017-02-08 2017-07-18 北京捷通华声科技股份有限公司 Identity authentication method and identity authentication system
CN107046517A (en) * 2016-02-05 2017-08-15 阿里巴巴集团控股有限公司 Speech processing method and device, and intelligent terminal
CN107331400A (en) * 2017-08-25 2017-11-07 百度在线网络技术(北京)有限公司 Voiceprint recognition performance improvement method, device, terminal and storage medium
CN107799120A (en) * 2017-11-10 2018-03-13 北京康力优蓝机器人科技有限公司 Service robot recognition and wake-up method and device
CN107919961A (en) * 2017-12-07 2018-04-17 广州势必可赢网络科技有限公司 Identity authentication protocol and server based on dynamic password and dynamic voiceprint updating

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8417525B2 (en) * 2010-02-09 2013-04-09 International Business Machines Corporation Adaptive voice print for conversational biometric engine
US9940934B2 (en) * 2015-11-18 2018-04-10 Uniphore Software Systems Adaptive voice authentication system and method
CN106156583A (en) * 2016-06-03 2016-11-23 深圳市金立通信设备有限公司 Voice unlocking method and terminal
CN108231082A (en) * 2017-12-29 2018-06-29 广州势必可赢网络科技有限公司 Update method and device for self-learning voiceprint recognition


Also Published As

Publication number Publication date
WO2020019176A1 (en) 2020-01-30
CN111742361A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
CN111742361B (en) Method for updating wake-up voice of voice assistant by terminal and terminal
WO2021000876A1 (en) Voice control method, electronic equipment and system
CN110347269B (en) Empty mouse mode realization method and related equipment
CN115620457A (en) Method for controlling express cabinet based on express message and electronic equipment
CN111369988A (en) Voice awakening method and electronic equipment
CN115442783A (en) Bluetooth connection method, system and electronic equipment
CN111742539B (en) Voice control command generation method and terminal
CN111819533A (en) Method for triggering electronic equipment to execute function and electronic equipment
CN112312366A (en) Method, electronic equipment and system for realizing functions through NFC (near field communication) tag
CN111865646A (en) Terminal upgrading method and related device
CN113343193A (en) Identity verification method and device and electronic equipment
CN115589051B (en) Charging method and terminal equipment
CN113892920A (en) Wearable device wearing detection method and device and electronic device
CN114422340A (en) Log reporting method, electronic device and storage medium
WO2022022319A1 (en) Image processing method, electronic device, image processing system and chip system
CN109285563B (en) Voice data processing method and device in online translation process
CN113467904B (en) Method and device for determining collaboration mode, electronic equipment and readable storage medium
CN117093068A (en) Vibration feedback method and system based on wearable device, wearable device and electronic device
CN115119336A (en) Earphone connection system, earphone connection method, earphone, electronic device and readable storage medium
CN113838478B (en) Abnormal event detection method and device and electronic equipment
CN114120987B (en) Voice wake-up method, electronic equipment and chip system
WO2020034104A1 (en) Voice recognition method, wearable device, and system
CN115393676A (en) Gesture control optimization method and device, terminal and storage medium
CN114116610A (en) Method, device, electronic equipment and medium for acquiring storage information
CN112334977B (en) Voice recognition method, wearable device and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant