WO2020019176A1

WO2020019176A1 - Method for updating wake-up voice of voice assistant by terminal, and terminal

Info

Publication number: WO2020019176A1
Application number: PCT/CN2018/096917
Authority: WO
Inventors: 许军
Original assignee: 华为技术有限公司
Priority date: 2018-07-24
Filing date: 2018-07-24
Publication date: 2020-01-30
Also published as: CN111742361B; CN111742361A

Abstract

Disclosed are a method for updating a wake-up voice of a voice assistant by a terminal, and a terminal. They relate to the technical field of voice control and can update a wake-up word of a terminal in real time, thereby improving the voice wake-up rate of the terminal and reducing the false wake-up rate. The specific solution is: a terminal receives first voice data input by a user; the terminal determines whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal; if the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal performs identity authentication on the user; and if the identity authentication is passed, the terminal uses the first voice data to update a first voiceprint model in the terminal, wherein the first voiceprint model is used for performing voiceprint verification when the voice assistant is woken up, and the first voiceprint model represents the voiceprint feature of the preset wake-up word.

Description

Method and terminal for updating wake-up voice of voice assistant by terminal

Technical field

The embodiments of the present application relate to the technical field of voice control, and in particular, to a method and a terminal for updating a wake-up voice of a voice assistant by a terminal.

Background technique

A voice assistant is an important application for mobile phones. It can interact intelligently with users through dialogue and instant Q&A. In addition, the voice assistant can recognize the user's voice command and cause the mobile phone to execute the event corresponding to the voice command. For example, if the voice assistant receives and recognizes the voice command "make a call to Bob" input by the user, the mobile phone can automatically make a call to the contact Bob.

Generally, the voice assistant is dormant, and a user must wake it up before using it. Before performing voice wake-up, the user needs to register a wake-up word (i.e., a wake-up voice) in the mobile phone. The mobile phone can generate, from the wake-up word input by the user, a voiceprint model that characterizes the voiceprint of the wake-up word. The voice wake-up process may include the following: the mobile phone monitors voice data through a low-power digital signal processor (DSP). When the DSP detects that the similarity between the voice data and the wake-up word satisfies a certain condition, the DSP delivers the monitored voice data to an application processor (AP). The AP performs text verification and voiceprint verification on the voice data to determine whether the voice data matches the generated voiceprint model. When the voice data matches the voiceprint model, the mobile phone can start the voice assistant.
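The two-stage wake-up process described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the class name `WakePipeline`, the three callables, and both threshold values are hypothetical placeholders and do not come from the application, which does not specify how the DSP or AP compute their scores.

```python
from dataclasses import dataclass
from typing import Callable, Sequence

Audio = Sequence[float]  # hypothetical stand-in for a buffer of audio samples


@dataclass
class WakePipeline:
    # All three callables are assumed placeholders for real components.
    coarse_similarity: Callable[[Audio], float]  # DSP-side similarity score
    transcribe: Callable[[Audio], str]           # AP-side speech-to-text
    voiceprint_score: Callable[[Audio], float]   # AP-side score against the model
    wake_word_text: str
    coarse_threshold: float = 0.5                # assumed value
    voiceprint_threshold: float = 0.8            # assumed value

    def should_wake(self, audio: Audio) -> bool:
        # Stage 1 (DSP): cheap always-on screening; the AP is not involved
        # unless the similarity condition is met.
        if self.coarse_similarity(audio) < self.coarse_threshold:
            return False
        # Stage 2a (AP): text verification against the registered wake-up word.
        heard = self.transcribe(audio).strip().lower()
        if heard != self.wake_word_text.strip().lower():
            return False
        # Stage 2b (AP): voiceprint verification against the voiceprint model.
        return self.voiceprint_score(audio) >= self.voiceprint_threshold
```

The ordering mirrors the text: the low-power check runs first so that the more expensive text and voiceprint checks only run on candidate audio.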

After a user registers a wake-up word in a mobile phone, the wake-up word is rarely re-registered (i.e., updated). However, the wake-up word registered in the mobile phone is only the voice data recorded by the user in a particular noise scene and in the user's physical state at the time of registration. Changes in the user's physical state and in the noise scene in which the user is located affect the voice data that the user produces. Therefore, when the user's physical state and/or the noise scene changes, continuing to use the originally registered wake-up word for voice wake-up reduces the voice wake-up rate of the mobile phone and increases its false wake-up rate.

Summary of the Invention

Embodiments of the present application provide a method and a terminal for updating a wake-up voice of a voice assistant by a terminal, which can update a wake-up voice of the terminal in real time, thereby improving a voice wake-up rate of the terminal performing a voice wake-up and reducing a false wake-up rate.

In a first aspect, an embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant. The method may include: the terminal receives first voice data input by a user; the terminal determines whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal; if the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal authenticates the user; and if the identity authentication is passed, the terminal uses the first voice data to update a first voiceprint model in the terminal. The first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up, and the first voiceprint model represents the voiceprint characteristics of the preset wake-up word.

In the embodiment of the present application, if the text corresponding to the first voice data matches the text of the preset wake-up word and the user identity authentication is passed, the first voice data is a wake-up voice for the voice assistant that was sent by a user who passed the identity authentication. In addition, because the first voice data is user voice data acquired by the terminal in real time, the first voice data may reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located. In summary, using the first voice data to update the voiceprint model of the terminal can increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.

Further, the first voice data is automatically acquired by the terminal during the voice wake-up process performed by the terminal, instead of prompting the user to manually re-register the wake-up word and receiving user input. In this way, using the first voice data to update the voiceprint model can also simplify the process of updating the wake word.
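The control flow of the first aspect (text match, then identity authentication, then model update) can be sketched as below. This is a minimal sketch, not the claimed implementation: the names `Outcome` and `handle_first_voice_data`, the case-insensitive text comparison, and the callback-based authentication and update steps are all assumptions made for illustration.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    TEXT_MISMATCH = "text_mismatch"
    AUTH_FAILED = "auth_failed"
    MODEL_UPDATED = "model_updated"


def handle_first_voice_data(
    recognized_text: str,
    wake_word_text: str,
    authenticate: Callable[[], bool],   # hypothetical: voiceprint check or other authentication
    update_model: Callable[[], None],   # hypothetical: updates the first voiceprint model
) -> Outcome:
    # Step 1: text verification. Exact matching after normalization is an
    # assumption; the application does not say how texts are compared.
    if recognized_text.strip().lower() != wake_word_text.strip().lower():
        return Outcome.TEXT_MISMATCH
    # Step 2: identity authentication runs only after the text matches.
    if not authenticate():
        return Outcome.AUTH_FAILED
    # Step 3: update the model only for an authenticated user.
    update_model()
    return Outcome.MODEL_UPDATED
```

Because the update step is reached only through both checks, voice data from other speakers or with other text never reaches the model.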

With reference to the first aspect, in a possible design manner, the terminal performs identity authentication on the user. Specifically, the terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data passes the voiceprint verification, it means that the identity authentication is passed.

In the embodiment of the present application, the terminal may obtain first voice data that passes text verification and voiceprint verification when the terminal performs voice wake-up. The terminal then updates the first voiceprint model in the terminal by using the first voice data. Because the first voice data is user voice data acquired by the terminal in real time, the first voice data may reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located. In addition, because the first voice data passes both the text verification and the voiceprint verification, updating the voiceprint model of the terminal by using the first voice data can improve the voice wake-up rate and reduce the false wake-up rate of the terminal.

With reference to the first aspect, in another possible design manner, if the first voice data passes the voiceprint verification, the terminal may start a voice assistant. After the voice assistant is started, the terminal may receive valid voice commands or may not receive valid voice commands through the voice assistant. The terminal may determine whether to use the first voice data to update the first voiceprint model by determining whether the terminal has received a valid voice command. Specifically, the method in the embodiment of the present application further includes: when identity authentication is passed, the terminal starts a voice assistant; the terminal receives the second voice data through the voice assistant; and the terminal determines that the second voice data is a valid voice command. In this way, after the identity authentication is passed, if the terminal determines that the second voice data is a valid voice command, the terminal may use the first voice data to update the first voiceprint model in the terminal.

In this design, the terminal uses the first voice data to update the first voiceprint model in the terminal only after the voice assistant is started and receives a valid voice command for triggering the terminal to perform a corresponding function. If the voice assistant of the terminal starts and receives a valid voice command, the voice wake-up is a valid voice wake-up that matches the user's intention. Updating the voiceprint model of the terminal with voice data that reflects the user's true intention and successfully wakes up the terminal can further increase the voice wake-up rate of the terminal and reduce the false wake-up rate.
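One way to picture this gating is to hold the first voice data as a pending candidate after wake-up and commit it only when the following utterance is a valid command. The sketch below assumes a hypothetical `GatedUpdater` class and a caller-supplied `is_valid_command` predicate; the application does not define how command validity is decided.

```python
from typing import Callable, List, Optional


class GatedUpdater:
    """Holds wake-up audio until a valid voice command confirms user intent."""

    def __init__(self, is_valid_command: Callable[[str], bool]) -> None:
        self._is_valid_command = is_valid_command  # hypothetical predicate
        self._pending: Optional[bytes] = None
        self.applied: List[bytes] = []  # stands in for actual model updates

    def on_wake(self, first_voice_data: bytes) -> None:
        # Called after identity authentication passes and the assistant starts.
        self._pending = first_voice_data

    def on_second_voice_data(self, text: str) -> bool:
        # The pending candidate is consumed either way, so an invalid
        # command discards the wake-up audio instead of keeping it around.
        candidate, self._pending = self._pending, None
        if candidate is None or not self._is_valid_command(text):
            return False
        self.applied.append(candidate)
        return True
```

Discarding the candidate on an invalid command is a design choice of this sketch; it keeps accidental wake-ups from influencing the model later.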

With reference to the first aspect, in another possible design manner, the terminal includes a coprocessor and a main processor. The terminal uses the coprocessor to monitor voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When the main processor determines that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data. For example, the coprocessor is a DSP and the main processor is an AP.

With reference to the first aspect, in another possible design manner, before the terminal authenticates the user, the terminal may use the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data fails the voiceprint verification, the terminal performs text verification on the voice data received within a first preset time. If, within the first preset time, the terminal receives second voice data and at least one voice data that matches the text of the preset wake-up word, the terminal authenticates the user. The text corresponding to the second voice data includes a preset keyword. For example, the second voice data may be voice data in which the user complains that the voice wake-up failed, such as "how to wake up", "how not", "not responding", "unable to wake up", and "voice wake up failed".

In this design, after receiving the first voice data, the terminal finds that the first voice data fails the voiceprint verification. If the terminal then receives at least one voice data that passes the text verification within the first preset time, the user has repeatedly tried to wake up the voice assistant of the terminal by voice, but the voice wake-up has failed. If the terminal also receives the second voice data within the first preset time, the user is dissatisfied with the failed voice wake-up. Receiving both the second voice data and at least one voice data that passes the text verification within the first preset time therefore indicates that the user has a strong intention to wake up the voice assistant by voice; the repeated failures may be because the user's current physical state differs greatly from the user's physical state when the wake-up word was registered. The received first voice data is thus voice data that the user sent, with a strong intention, to wake up the voice assistant of the terminal. Therefore, updating the voiceprint model of the terminal with voice data that reflects the user's true intention can further increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.

In addition, because the first voice data is voice data of the user acquired by the terminal in real time, the first voice data may reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located. Therefore, using the first voice data to update the voiceprint model of the terminal can improve the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up. Further, the received first voice data is obtained automatically by the terminal during the voice wake-up process, instead of by prompting the user to manually re-register the wake-up word and receiving the user's input. In this way, updating the voiceprint model by using the received first voice data can also simplify the process of updating the wake-up word.
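The fallback path (voiceprint verification fails, then the terminal watches a time window for a repeated wake-up word plus a complaint) can be sketched as a single predicate. The function name, the `(timestamp, text)` event format, and the substring-based keyword match are assumptions for illustration only; the application specifies only that both kinds of voice data must arrive within the first preset time.

```python
from typing import Iterable, Sequence, Tuple


def should_authenticate_after_failure(
    events: Iterable[Tuple[float, str]],  # hypothetical (timestamp_s, recognized_text) pairs
    failure_time: float,                  # when voiceprint verification failed
    window_s: float,                      # the first preset time, in seconds
    wake_word_text: str,
    complaint_keywords: Sequence[str],    # preset keywords
) -> bool:
    wake = wake_word_text.strip().lower()
    keywords = [k.lower() for k in complaint_keywords]
    # Keep only utterances received inside the preset window after the failure.
    in_window = [
        text.strip().lower()
        for t, text in events
        if failure_time < t <= failure_time + window_s
    ]
    # At least one utterance whose text matches the preset wake-up word.
    has_retry = any(text == wake for text in in_window)
    # Second voice data: an utterance containing a preset keyword.
    has_complaint = any(k in text for text in in_window for k in keywords)
    return has_retry and has_complaint
```

Only when this predicate holds would the terminal go on to authenticate the user, for example through an authentication interface.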

With reference to the first aspect, in another possible design manner, the terminal authenticating the user includes: the terminal displays an authentication interface; the terminal receives the authentication information entered by the user on the authentication interface; and the terminal performs user identity verification on the user based on the authentication information.

With reference to the first aspect, in another possible design manner, the terminal includes a coprocessor and a main processor. The terminal uses the coprocessor to monitor voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When the main processor determines that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data. The terminal also uses the coprocessor to monitor voice data within the first preset time, and the coprocessor notifies the main processor to determine whether the voice data received within the first preset time includes the second voice data and at least one voice data that matches the text of the preset wake-up word, where the text corresponding to the second voice data contains a preset keyword. For example, the coprocessor is a DSP and the main processor is an AP.

With reference to the first aspect, in another possible design manner, the preset wake-up word includes at least two registered voice data, the at least two registered voice data are recorded when the terminal registers the preset wake-up word, and the first voiceprint model is generated based on the at least two registered voice data. After the terminal generates a new voiceprint model according to the first voice data, directly replacing the first voiceprint model with the new voiceprint model can improve the voice wake-up rate of the terminal. However, directly replacing the first voiceprint model with a voiceprint model generated based on the new voice data (i.e., the first voice data) will greatly increase the voice wake-up rate, and such a sharp increase may correspondingly increase the false wake-up rate of the terminal.

To stably increase the voice wake-up rate of the terminal and reduce the false wake-up rate of the terminal, the method for the terminal to update the first voiceprint model in the terminal by using the first voice data may include: the terminal uses the first voice data to replace third voice data in the at least two registered voice data to obtain updated at least two registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data in the at least two registered voice data; the terminal generates a second voiceprint model according to the updated at least two registered voice data; and the terminal replaces the first voiceprint model with the second voiceprint model. The second voiceprint model is used to characterize the voiceprint features of the updated at least two registered voice data.

In the embodiment of the present application, the terminal uses the first voice data to replace part of the voice data in the at least two registered voice data, such as the third voice data; instead of generating the second voiceprint model completely based on the first voice data. In this way, the voice wake-up rate of the terminal performing voice wake-up can be relatively stably improved. In addition, while stably increasing the voice wake-up rate of the terminal, the false wake-up rate of the terminal performing voice wake-up can be reduced.
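The partial replacement described above can be sketched as follows. The `Sample` type, the use of a signal-to-noise ratio as the signal quality parameter, and especially the model itself (a per-dimension mean of feature vectors) are hypothetical stand-ins; the application does not specify how a voiceprint model is computed from registered voice data.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Sample:
    features: List[float]  # hypothetical voiceprint feature vector
    snr_db: float          # signal quality parameter (assumed to be SNR)


def replace_weakest(enrolled: List[Sample], first_voice: Sample) -> List[Sample]:
    # The "third voice data" is the registered sample with the lowest
    # signal quality; only that one sample is swapped out.
    weakest = min(range(len(enrolled)), key=lambda i: enrolled[i].snr_db)
    updated = list(enrolled)
    updated[weakest] = first_voice
    return updated


def build_model(samples: List[Sample]) -> List[float]:
    # Stand-in model: the mean feature vector across all samples.
    n = len(samples)
    return [sum(column) / n for column in zip(*(s.features for s in samples))]
```

Because the remaining registered samples still contribute to the second model, a single new utterance shifts the model gradually rather than replacing it outright.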

With reference to the first aspect, in another possible design manner, if the second voiceprint threshold generated by the terminal according to the second voiceprint model differs significantly from the first voiceprint threshold, the wake-up rate of the terminal will fluctuate greatly, which affects the user experience. Based on this, the terminal may generate a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data; if the difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold, the terminal replaces the first voiceprint model with the second voiceprint model.

When the difference between the second voiceprint threshold and the first voiceprint threshold is large, the terminal may delete the second voiceprint model and the first voice data; that is, the second voiceprint model is not used to replace the first voiceprint model. This prevents a large difference between the second voiceprint threshold and the first voiceprint threshold from causing the wake-up rate of the terminal to fluctuate greatly and affecting the user experience.
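The threshold comparison can be sketched as a small selection function. The function name and the use of an absolute difference are assumptions of this sketch; the application speaks only of "the difference" between the two voiceprint thresholds.

```python
from typing import List, Tuple

Model = List[float]  # hypothetical representation of a voiceprint model


def select_model(
    first_model: Model,
    first_threshold: float,
    second_model: Model,
    second_threshold: float,
    max_delta: float,  # the first preset threshold
) -> Tuple[Model, float]:
    # Accept the new model only if its threshold stays close to the old one.
    if abs(second_threshold - first_threshold) < max_delta:
        return second_model, second_threshold
    # Otherwise keep the first model; the caller would delete the second
    # voiceprint model and the first voice data.
    return first_model, first_threshold
```

A difference exactly equal to the preset threshold is treated as too large, matching the "greater than or equal to" condition used for deletion in the terminal aspects.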

With reference to the first aspect, in another possible design manner, to prevent the terminal from updating the first voiceprint model with voice data of poor signal quality, the terminal may, before updating the first voiceprint model with the first voice data, first determine whether a signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of voice data is used to characterize the signal quality of the voice data. For example, the signal quality parameter of the voice data may be a signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, the signal quality of the first voice data is relatively high. In this case, the terminal may update the first voiceprint model by using the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal may delete the first voice data.
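As an illustration of the quality gate, the sketch below computes a signal-to-noise ratio in decibels from separate signal and noise sample buffers and compares it with a threshold. How the terminal actually separates speech from noise is not stated in the application, so the two-buffer interface and both function names are assumptions.

```python
import math
from typing import Sequence


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    # Mean power of each buffer, then the standard 10 * log10(Ps / Pn).
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_signal <= 0.0 or p_noise <= 0.0:
        raise ValueError("signal and noise power must both be positive")
    return 10.0 * math.log10(p_signal / p_noise)


def passes_quality_gate(snr: float, min_snr_db: float) -> bool:
    # Strictly higher than the second preset threshold; a value equal to
    # the threshold is rejected, as in the text.
    return snr > min_snr_db
```

Voice data that fails the gate would simply be discarded rather than used to update the first voiceprint model.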

In a second aspect, an embodiment of the present application provides a terminal. The terminal includes a storage unit, an input unit, a text verification unit, an identity authentication unit, and an update unit. The storage unit stores a preset wake-up word registered in the terminal and a first voiceprint model. The first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of a preset wakeup word. The input unit is configured to receive first voice data input by a user. The text verification unit is configured to determine whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal. An identity authentication unit is configured to authenticate the user if the text verification unit determines that the text corresponding to the first voice data matches the text of the preset wake-up word. The updating unit is configured to: if the identity authentication unit determines that the identity authentication is passed, the terminal uses the first voice data to update the first voiceprint model in the terminal.

With reference to the second aspect, in a possible design manner, the identity authentication unit is specifically configured to: use the first voiceprint model to perform voiceprint verification on the first voice data; if the voiceprint verification is passed, the identity authentication is passed.

With reference to the second aspect, in another possible design manner, the terminal further includes: a starting unit and a determining unit. The starting unit is configured to start the voice assistant when the identity authentication unit determines that the identity authentication is passed. The input unit is further configured to receive the second voice data through a voice assistant. The determining unit is configured to determine that the second voice data received by the input unit is a valid voice command after the identity authentication unit passes the identity authentication. An updating unit is configured to update the first voiceprint model with the first voice data after the determining unit determines that the second voice data is a valid voice command.

With reference to the second aspect, in another possible design manner, the terminal further includes: a voiceprint verification unit. The voiceprint verification unit is configured to perform voiceprint verification on the first voice data using the first voiceprint model before the identity authentication unit authenticates the user. The text verification unit is further configured to perform text verification on the voice data received by the input unit within a first preset time if the voiceprint verification unit determines that the first voice data fails the voiceprint verification. The identity authentication unit is specifically configured to: if the text verification unit determines that the input unit receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time, authenticate the user. The text corresponding to the second voice data includes a preset keyword.

With reference to the second aspect, in another possible design manner, the foregoing terminal further includes: a display unit. The display unit is configured to display an authentication interface if the text verification unit determines that the input unit receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time. The input unit is further configured to receive authentication information input by a user on an authentication interface displayed on the display unit. The identity authentication unit is specifically configured to perform user identity verification on the user according to the identity verification information received by the input unit.

With reference to the second aspect, in another possible design manner, the preset wake-up word includes at least two registered voice data, the at least two registered voice data are recorded when the terminal registers the preset wake-up word, and the first voiceprint model is generated based on the at least two registered voice data. The terminal also includes a replacement unit and a generating unit. The replacement unit is configured to replace third voice data in the at least two registered voice data with the first voice data to obtain updated at least two registered voice data. The signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data in the at least two registered voice data. The generating unit is configured to generate a second voiceprint model according to the updated at least two registered voice data obtained by the replacement unit. The updating unit is configured to replace the first voiceprint model with the second voiceprint model generated by the generating unit, and the second voiceprint model is used to characterize the voiceprint features of the updated at least two registered voice data.

With reference to the second aspect, in another possible design manner, the storage unit is configured to save a first voiceprint threshold, and the first voiceprint threshold is generated by the generating unit according to the first voiceprint model and the at least two registered voice data. The generating unit is further configured to generate the second voiceprint model and, before the updating unit replaces the first voiceprint model with the second voiceprint model, generate a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data.

The updating unit is specifically configured to use a second voiceprint model to replace the first voiceprint model if the difference between the second voiceprint threshold and the first voiceprint threshold generated by the generating unit is less than the first preset threshold.

With reference to the second aspect, in another possible design manner, the foregoing terminal further includes: a deleting unit. The deleting unit is configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold and the first voiceprint threshold generated by the generating unit is greater than or equal to the first preset threshold.

With reference to the second aspect, in another possible design manner, the updating unit is specifically configured to update the first voiceprint model by using the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of the first voice data includes a signal-to-noise ratio of the first voice data.

In a third aspect, an embodiment of the present application provides a terminal. The terminal may include a processor, a memory, and a display. The memory and the display are coupled to the processor. The display is used to display images generated by the processor. The memory is used to store computer program code, related information of the voice assistant, the preset wake-up word registered in the terminal, and the first voiceprint model. The computer program code includes computer instructions. When the processor executes the computer instructions, the processor is configured to: receive the first voice data input by the user; determine whether the text corresponding to the first voice data matches the text of the preset wake-up word; if the text corresponding to the first voice data matches the text of the preset wake-up word, authenticate the user; and if the identity authentication is passed, update the first voiceprint model by using the first voice data. The first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up, and the first voiceprint model represents the voiceprint characteristics of the preset wake-up word.

With reference to the third aspect, in a possible design manner, the foregoing processor may be further configured to perform voiceprint verification on the first voice data using the first voiceprint model. If the voiceprint verification is passed, the identity authentication is passed.

With reference to the third aspect, in another possible design manner, the foregoing processor may be further configured to: start the voice assistant when the identity authentication is passed; receive second voice data through the voice assistant; after the identity authentication is passed, determine that the second voice data is a valid voice command; and after determining that the second voice data is a valid voice command, update the first voiceprint model in the terminal with the first voice data.

With reference to the third aspect, in another possible design manner, the processor includes a coprocessor and a main processor. The coprocessor monitors voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When the main processor determines that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data.

With reference to the third aspect, in another possible design manner, the processor is further configured to: before authenticating the user, perform voiceprint verification on the first voice data using the first voiceprint model; if the first voice data fails the voiceprint verification, perform text verification on the voice data received within a first preset time; and if second voice data and at least one voice data that matches the text of the preset wake-up word are received within the first preset time, authenticate the user. The text corresponding to the second voice data includes a preset keyword.

With reference to the third aspect, in another possible design manner, the processor is further configured to: if the second voice data and at least one voice data that matches the text of the preset wake-up word are received within the first preset time, control the display to display the authentication interface. The processor is further configured to receive the authentication information input by the user on the authentication interface displayed on the display, and perform user identity verification on the user according to the authentication information.

With reference to the third aspect, in another possible design manner, the foregoing processor includes a coprocessor and a main processor. The coprocessor monitors voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. The main processor uses the first voiceprint model to perform voiceprint verification on the first voice data. The coprocessor also monitors voice data within the first preset time and notifies the main processor to determine whether the voice data received within the first preset time includes the second voice data and at least one voice data that matches the text of the preset wake-up word. The text corresponding to the second voice data contains a preset keyword.

With reference to the third aspect, in another possible design manner, the preset wake-up word stored in the memory includes at least two registered voice data, the at least two registered voice data are recorded when the processor registers the preset wake-up word, and the first voiceprint model is generated by the processor based on the at least two registered voice data. The processor is further configured to: use the first voice data to replace third voice data in the at least two registered voice data to obtain updated at least two registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data in the at least two registered voice data; generate a second voiceprint model based on the updated at least two registered voice data; and replace the first voiceprint model with the second voiceprint model. The second voiceprint model is used to characterize the voiceprint features of the updated at least two registered voice data.

With reference to the third aspect, in another possible design manner, a first voiceprint threshold is also stored in the memory, and the first voiceprint threshold is generated by the processor according to the first voiceprint model and the at least two pieces of registered voice data. The processor is further configured to: after generating the second voiceprint model and before replacing the first voiceprint model with the second voiceprint model, generate a second voiceprint threshold according to the second voiceprint model and the updated at least two pieces of registered voice data; and, if the difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold, replace the first voiceprint model with the second voiceprint model.

With reference to the third aspect, in another possible design manner, the processor is further configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold.
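The update procedure of the preceding design manners (replace the lowest-quality registered voice data, regenerate the model and threshold, then accept or discard) can be sketched as follows. This is only an illustration under stated assumptions: the application does not specify how a voiceprint model or threshold is computed, so the mean-vector "model", the distance-based "threshold", the use of the absolute difference, and all names below are hypothetical stand-ins.

```python
from dataclasses import dataclass
from statistics import fmean


@dataclass
class Sample:
    features: list  # hypothetical fixed-length voiceprint feature vector
    snr_db: float   # signal quality parameter (signal-to-noise ratio)


def build_model(samples):
    # Toy stand-in for a voiceprint model: per-dimension mean of the features.
    return [fmean(col) for col in zip(*(s.features for s in samples))]


def distance(model, features):
    return sum((m - f) ** 2 for m, f in zip(model, features)) ** 0.5


def build_threshold(model, samples):
    # Toy stand-in for a voiceprint threshold: largest distance from the model
    # to any registered sample.
    return max(distance(model, s.features) for s in samples)


def try_update(samples, model, threshold, new_sample, first_preset_threshold):
    """Replace the lowest-quality registered sample with new_sample and keep
    the result only if the threshold changes by less than the preset limit."""
    worst = min(range(len(samples)), key=lambda i: samples[i].snr_db)
    candidate = samples[:worst] + [new_sample] + samples[worst + 1:]
    new_model = build_model(candidate)
    new_threshold = build_threshold(new_model, candidate)
    if abs(new_threshold - threshold) < first_preset_threshold:
        return candidate, new_model, new_threshold  # accept the second model
    return samples, model, threshold  # discard the second model and new sample
```

Comparing thresholds before committing is a guard against a single atypical recording shifting the model too far from what was registered.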

With reference to the third aspect, in another possible design manner, the processor is further configured to update the first voiceprint model by using the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of the first voice data includes a signal-to-noise ratio of the first voice data.
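As an illustration of the signal quality check, the sketch below computes a signal-to-noise ratio in decibels as 10·log10(P_speech / P_noise) and compares it with the second preset threshold. How the speech and noise portions are separated is not described in the application; here they are simply assumed to be available as two lists of samples, and the function names are hypothetical.

```python
import math


def mean_power(samples):
    # Average power of a sequence of amplitude samples.
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech_samples, noise_samples):
    # Signal-to-noise ratio in dB; assumes the noise segment is not silent.
    return 10.0 * math.log10(mean_power(speech_samples) / mean_power(noise_samples))


def should_update_voiceprint(snr, second_preset_threshold_db):
    # Only voice data of sufficient quality is used to update the model.
    return snr > second_preset_threshold_db
```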

In a fourth aspect, an embodiment of the present application provides a computer storage medium, where the computer storage medium includes computer instructions, and when the computer instructions are run on a terminal, the terminal is caused to execute the method according to the first aspect and any one of its possible design manners.

In a fifth aspect, an embodiment of the present application provides a computer program product, and when the computer program product runs on a computer, the computer is caused to execute the method according to the first aspect and any one of possible design manners.

In addition, for the technical effects brought by the terminals described in the second and third aspects and any one of their design manners, the computer storage medium described in the fourth aspect, and the computer program product described in the fifth aspect, refer to the technical effects brought by the first aspect and its different design manners, which are not repeated here.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a first schematic diagram of a display interface example of a terminal according to an embodiment of the present application;

FIG. 2 is a second schematic diagram of a display interface example of a terminal according to an embodiment of the present application;

FIG. 3 is a schematic diagram of a hardware structure of a terminal according to an embodiment of the present application;

FIG. 4A is a first flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;

FIG. 4B is a third schematic diagram of a display interface example of a terminal according to an embodiment of the present application;

FIG. 5A is a second flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;

FIG. 5B is a fourth schematic diagram of a display interface example of a terminal according to an embodiment of the present application;

FIG. 6 is a third flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;

FIG. 7 is a fourth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;

FIG. 8 is a fifth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;

FIG. 9 is a sixth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;

FIG. 10 is a fifth schematic diagram of a display interface example of a terminal according to an embodiment of the present application;

FIG. 11 is a seventh flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;

FIG. 12 is an eighth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;

FIG. 13 is a first schematic structural composition diagram of a terminal according to an embodiment of the present application;

FIG. 14 is a second schematic structural composition diagram of a terminal according to an embodiment of the present application;

FIG. 15 is a third schematic structural composition diagram of a terminal according to an embodiment of the present application.

DETAILED DESCRIPTION

Embodiments of the present application provide a method and a terminal for updating a wake-up voice of a voice assistant by a terminal, which can be applied to a process in which the terminal performs a voice wake-up in response to voice data input by a user.

Before performing the voice wake-up, the terminal may receive a preset wake-up word registered by the user. The preset wake-up word is used to wake up the voice assistant in the terminal, so that the terminal can provide the user with voice control services through the voice assistant. Waking up the voice assistant, as described in the embodiments of the present application, means that the terminal starts the voice assistant in response to voice data uttered by the user. The voice control service means that after the terminal's voice assistant is started, the user can trigger the terminal to execute a corresponding event by issuing a voice command (i.e., voice data) to the voice assistant. The preset wake-up word in the embodiments of the present application is a piece of voice data. This voice data is the wake-up voice used to wake up the voice assistant.

The voice assistant may be an application (APP) installed in the terminal. The voice assistant may be an embedded application in the terminal (that is, a system application of the terminal) or a downloadable application. An embedded application is an application provided as part of the implementation of a terminal (such as a mobile phone). For example, the embedded application can be a "Settings" application, a "Short Message" application, a "Camera" application, and so on. A downloadable application is an application that can provide its own Internet Protocol Multimedia Subsystem (IMS) connection. The downloadable application can be an application installed in the terminal in advance, or a third-party application that is downloaded and installed in the terminal by the user. For example, the downloadable application may be a "WeChat" application, an "Alipay" application, a "Mail" application, and the like.

In the embodiment of the present application, the mobile phone 100 shown in FIG. 1 is used as an example to describe the process of registering a preset wake-up word by a terminal:

The mobile phone 100 can receive a user's click operation (such as a single-click operation) on the "Settings" application icon. In response to the user's click operation on the "Settings" application icon, the mobile phone 100 may display the setting interface 101 shown in (a) of FIG. 1. The setting interface 101 may include an "airplane mode" option, a "WLAN" option, a "Bluetooth" option, a "mobile network" option, a "smart assistance" option 102, and the like. For specific functions of the "airplane mode" option, the "WLAN" option, the "Bluetooth" option, and the "mobile network" option, reference may be made to descriptions in conventional technologies, which are not repeated in the embodiments of the present application.

The mobile phone 100 may receive a user's click operation (such as a single-click operation) on the "smart assistance" option 102. In response to the user's click operation on the "smart assistance" option 102, the mobile phone 100 may display the smart assistance interface 103 shown in (b) of FIG. 1. The smart assistance interface 103 includes a "gesture control" option 104 and a "voice control" option 105. The "gesture control" option 104 is used to manage the user gestures that trigger the mobile phone 100 to execute corresponding events. The "voice control" option 105 is used to manage the voice wake-up function of the mobile phone 100. Specifically, the mobile phone 100 may receive a user's click operation on the "voice control" option 105 and, in response, display the voice control interface 106 shown in (c) of FIG. 1. The voice control interface 106 includes a "voice wakeup" option 107 and an "incoming call voice control" option 108. The "voice wakeup" option 107 is used to enable or disable the voice wake-up function of the mobile phone 100. For the voice wake-up function of a terminal (such as the mobile phone 100), refer to subsequent descriptions of the embodiments of the present application, which are not described here. The "incoming call voice control" option 108 is used to trigger the mobile phone 100 to enable or disable the voice wake-up function when the mobile phone 100 receives an incoming call. For example, assume that the "incoming call voice control" option 108 of the mobile phone 100 is turned on. When the mobile phone 100 receives an incoming call from another terminal and presents an incoming call reminder, if the mobile phone 100 recognizes the voice data "answer the call" input by the owner, the mobile phone 100 can automatically answer the call; if the mobile phone 100 recognizes the voice data "hang up the phone" input by the owner, the mobile phone 100 can automatically reject the call.

The mobile phone 100 may receive a user's click operation (such as a single-click operation) on the "voice wakeup" option 107. In response to the user's click operation on the "voice wakeup" option 107, the mobile phone 100 may display the voice wakeup interface 109 shown in (d) of FIG. 1. The voice wakeup interface 109 includes a "voice wakeup" switch 110, a "find a phone" option 111, a "how to make a call" option 112, a "wake word" option 113, and the like. The "voice wakeup" switch 110 is used to trigger the mobile phone 100 to enable or disable the voice wake-up function. The "find a phone" option 111 and the "how to make a call" option 112 are used to indicate the voice control functions of the mobile phone 100 that are available after the voice assistant of the mobile phone 100 is activated. For example, the "find a phone" option 111 indicates that, after the voice assistant of the mobile phone 100 is activated, the voice assistant can respond to the user's voice data "Where are you?" so that the user can locate the mobile phone 100. The "how to make a call" option 112 indicates that, after the voice assistant of the mobile phone 100 is activated, the voice assistant can automatically make a call to the contact Bob in response to the user's voice data "call Bob".

The “wake word” option 113 is used to register a wake up word for the mobile phone 100 to wake up the mobile phone 100 (such as the voice assistant of the mobile phone 100). Before the user has registered a custom wake-up word in the mobile phone 100, the mobile phone 100 may indicate a default wake-up word to the user. For example, it is assumed that the default wake-up word of the mobile phone 100 is "my little k".

It is assumed that the "voice wakeup" switch 110 is on and that no user-defined wake-up word has been registered in the mobile phone 100. The mobile phone 100 may receive a user's click operation (such as a single-click operation) on the "wake word" option 113 shown in (d) of FIG. 1. In response to the user's click operation on the "wake word" option 113, the mobile phone 100 may display the default wake-up word registration interface 201 shown in (a) of FIG. 2. The default wake-up word registration interface 201 may include a recording progress bar 202, a "custom wake-up word" option 203, a "microphone" option 204, and recording prompt information 205. The "microphone" option 204 is used to trigger the mobile phone 100 to start recording voice data as the wake-up word. The recording progress bar 202 is used to display the progress of the mobile phone 100 in recording the wake-up word. The recording prompt information 205 is used to indicate the default wake-up word of the mobile phone 100. For example, the recording prompt information 205 may be "Please help the mobile phone learn the wake word (my little k): click and say 'my little k'". Optionally, the default wake-up word registration interface 201 may further include the recording prompt "Please record in a quiet environment, about 30 cm away from the mobile phone!". The default wake-up word registration interface 201 further includes a "Cancel" button 206 and an "OK" button 207. The "OK" button 207 is used to trigger the mobile phone 100 to save the recorded wake-up word. The "Cancel" button 206 is used to trigger the mobile phone 100 to cancel the registration of the wake-up word and display the voice wakeup interface 109 shown in (d) of FIG. 1.

In response to the user's click operation on the "microphone" option 204, the mobile phone 100 can start recording voice data input by the user. After receiving the voice data (recorded as voice data 1) input by the user, the mobile phone 100 can determine whether the voice data 1 meets a preset condition. If the voice data 1 does not satisfy the preset condition, the mobile phone 100 may delete the voice data 1 and re-display the default wake-up word registration interface 201 shown in (a) of FIG. 2. If the voice data 1 meets a preset condition, the mobile phone 100 can save the voice data 1.

In the embodiment of the present application, the voice data 1 meeting the preset condition may specifically mean that the text information corresponding to the voice data 1 is the text information "my little k" of the default wake-up word, and that the signal-to-noise ratio of the voice data 1 is higher than a preset threshold.
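The registration check can be sketched as below. The sketch takes the recognized text and signal-to-noise ratio of each recording as given inputs; the function names and the whitespace/case normalization of the text are assumptions for illustration, not details taken from the application.

```python
def meets_preset_condition(recognized_text, snr_db, wake_word_text, min_snr_db):
    """True if the recording's text is the wake-up word text and its
    signal-to-noise ratio is above the preset threshold."""
    # Assumption: comparison ignores surrounding whitespace and letter case.
    same_text = recognized_text.strip().lower() == wake_word_text.strip().lower()
    return same_text and snr_db > min_snr_db


def collect_valid_recordings(attempts, wake_word_text, min_snr_db):
    """Save recordings that meet the condition; the others are discarded
    (the phone would re-display the registration interface for those)."""
    return [
        (text, snr) for text, snr in attempts
        if meets_preset_condition(text, snr, wake_word_text, min_snr_db)
    ]
```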

After the mobile phone 100 receives the voice data 1 that is input by the user and meets the preset condition, it can generate, based on the voice data 1, a voiceprint model used for voiceprint verification when the voice assistant is woken up, and generate a voiceprint threshold according to the voice data 1 and the voiceprint model. The voiceprint model can characterize the voiceprint features of the wake-up word registered by the user.

Understandably, the voiceprint model is equivalent to a function. Different voiceprint models can be generated based on different voice data. That is, the mobile phone 100 can generate different voiceprint models according to different wake-up words registered by the same user. Different users registering the same wake-up word with the mobile phone 100 also result in different voiceprint models. The mobile phone 100 may use the voice data 1 (that is, the voice data that is input by the user when registering the wake-up word and meets the preset condition) as an input value and substitute it into the voiceprint model to obtain a voiceprint value (such as the voiceprint value a).
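The idea that a voiceprint model behaves like a function built from the registration data can be illustrated with a toy sketch. The feature vectors and the cosine-similarity scoring below are assumptions chosen only to make the example concrete; the application does not specify the form of the model.

```python
import math


def make_voiceprint_model(registered_features):
    """Build a toy voiceprint model from hypothetical registration features.
    The returned function maps a feature vector to a voiceprint value."""
    norm = math.sqrt(sum(x * x for x in registered_features))
    reference = [x / norm for x in registered_features]

    def model(features):
        n = math.sqrt(sum(x * x for x in features))
        # Cosine similarity between the input and the registered reference.
        return sum(r * f for r, f in zip(reference, features)) / n

    return model


# Different registration data yields different models (functions).
model_user1 = make_voiceprint_model([3.0, 4.0])
model_user2 = make_voiceprint_model([4.0, 3.0])

# Substituting the registration data into its own model gives value "a".
voiceprint_value_a = model_user1([3.0, 4.0])
```

The same input produces different voiceprint values under the two models, which is what allows the value to discriminate between speakers.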

Optionally, in order to improve the accuracy of voice wake-up, the terminal can record multiple pieces of voice data that meet the preset condition. The terminal may then generate the voiceprint model used for voiceprint verification when the voice assistant is woken up based on the multiple pieces of voice data that meet the preset condition. For example, after the voice data 1 meets the preset condition and is saved, the mobile phone 100 may prompt the user to record voice data again.

The "custom wake-up word" option 203 is used to trigger the mobile phone 100 to display a wake-up word input interface. For example, in response to a user's click operation (such as a single-click operation) on the "custom wake-up word" option 203 shown in (a) of FIG. 2, the mobile phone 100 may display the wake-up word input interface 208 shown in (b) of FIG. 2. The wake-up word input interface 208 may include a "Cancel" button 209, an "OK" button 210, a "wake-up word input box" 211, and a wake-up word suggestion 212. The "Cancel" button 209 is used to trigger the mobile phone 100 to cancel the custom wake-up word and display the default wake-up word registration interface 201 shown in (a) of FIG. 2. The "wake-up word input box" 211 is used to receive a custom wake-up word input by the user. The "OK" button 210 is used to save the custom wake-up word entered by the user in the "wake-up word input box" 211. The wake-up word suggestion 212 is used to inform the user of the mobile phone's requirements for a custom wake-up word.

Assume that the user inputs a custom wake-up word "my super phone" in the "wake-up word input box" 211 shown in (c) of FIG. 2. In response to a user's click operation (such as a single-click operation) on the "OK" button 210 shown in (c) of FIG. 2, the mobile phone 100 may display the custom wake-up word registration interface 213 shown in (d) of FIG. 2, so that the user can register the custom wake-up word on the custom wake-up word registration interface 213. The method for a user to register a custom wake-up word on the custom wake-up word registration interface 213 is the same as the method for registering the default wake-up word on the default wake-up word registration interface 201, and is not described again in the embodiment of the present application.

It can be understood that if a user-defined wake-up word has already been registered in the mobile phone 100, for example, the custom wake-up word "my super phone", then in response to the user's click operation on the "wake word" option 113 shown in (d) of FIG. 1, the mobile phone 100 may display the custom wake-up word registration interface 216 shown in (d) of FIG. 2.

It should be noted that different terminals have different designs. For example, in some terminals, the above-mentioned smart assistance may be referred to as an auxiliary function, the above-mentioned voice control may be referred to as a voice assistant, and the above-mentioned voice wakeup may be referred to as a wake-up function. In addition, the manner in which the user triggers the terminal to display the wake-up word registration interface (such as a default wake-up word registration interface or a custom wake-up word registration interface) includes, but is not limited to, the user's "settings - smart assistance - voice control - voice wakeup - wake word" operation. For example, in some terminals, the manner in which the user triggers the terminal to display the wake-up word registration interface may be "settings - voice assistant - voice wakeup - wake word".

In the embodiment of the present application, the voice wake-up process of the mobile phone 100 is described by taking the case where the wake-up word of the mobile phone 100 is the default wake-up word "my little k" as an example:

When the digital signal processor (DSP) of the mobile phone 100 detects that the similarity between monitored voice data (such as the voice data 2) and the default wake-up word "my little k" satisfies a certain condition, the DSP may deliver the monitored voice data 2 to the application processor (AP). The AP performs text verification on the voice data 2. When the AP recognizes that the text corresponding to the voice data 2 is "my little k", the AP may use the voice data 2 as an input value and substitute it into the voiceprint model of the mobile phone 100 to obtain a voiceprint value (the voiceprint value b). If the difference between the voiceprint value b and the voiceprint threshold (i.e., the voiceprint value a) is less than a preset threshold, the AP may determine that the voice data 2 matches the wake-up word registered by the user.
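The two-stage check described above can be sketched as a single decision function. The function and parameter names, the string result codes, the text normalization, and the use of the absolute difference between the voiceprint values are all assumptions for illustration; the voiceprint model is passed in as any callable that maps features to a value.

```python
def handle_voice_data(coarse_similarity, recognized_text, features, *,
                      wake_word_text, model, voiceprint_value_a,
                      min_similarity, preset_threshold):
    """Sketch of the wake-up decision: a coarse DSP check, then text
    verification and voiceprint verification on the AP."""
    # Stage 1 (DSP): forward only voice data that resembles the wake-up word.
    if coarse_similarity < min_similarity:
        return "ignored_by_dsp"

    # Stage 2a (AP): text verification.
    if recognized_text.strip().lower() != wake_word_text.strip().lower():
        return "text_mismatch"

    # Stage 2b (AP): voiceprint verification. Value b is compared with the
    # value a obtained at registration time.
    voiceprint_value_b = model(features)
    if abs(voiceprint_value_b - voiceprint_value_a) < preset_threshold:
        return "wake_up"
    return "voiceprint_mismatch"
```

Running the inexpensive similarity check first means the later, more costly text and voiceprint checks are only performed on likely wake-up attempts.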

In order to adapt to changes in the user's physical state and / or the noise scene in which the user is located, some mobile phones can periodically remind the user to re-register the wake-up word. However, the process of manually registering the wake-up word is cumbersome, and the manual registration of the wake-up word multiple times will waste the user's time and affect the user experience.

In the embodiment of the present application, the terminal may obtain an effective wake-up word in the process of performing a voice wake-up, and use the effective wake-up word to update the wake-up word registered by the user. The effective wake-up word in the embodiment of the present application may include voice data that successfully wakes up the terminal. Because the terminal automatically obtains the effective wake-up word during voice wake-up and uses it to update the wake-up word registered by the user, the tedious operation of manually re-registering the wake-up word can be omitted.

The principle of the method for a terminal to update the wake-up voice of a voice assistant according to the embodiment of the present application is as follows. The effective wake-up word is voice data obtained by the terminal in the process of performing voice wake-up; therefore, it is voice data related to the current physical state of the user and to the noise scene that the user is currently in. Moreover, the effective wake-up word can successfully wake up the terminal; therefore, the degree of matching between the effective wake-up word and the wake-up word registered by the user satisfies the condition for voice wake-up. In summary, if the terminal uses the effective wake-up word to update the wake-up word registered by the user, and then uses the updated wake-up word to perform voice wake-up, it can adapt to changes in the user's physical state and/or the noise scene in which the user is located, which can increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.

The terminal in the embodiment of the present application may be a portable computer (such as a mobile phone), a notebook computer, a personal computer (PC), a wearable electronic device (such as a smart watch), a tablet computer, an augmented reality (AR)/virtual reality (VR) device, an on-board computer, or the like. The following embodiments do not specifically limit the form of the terminal.

Please refer to FIG. 3, which shows a structural block diagram of a terminal 300 provided by an embodiment of the present application. The terminal 300 may include a processor 310, an external memory interface 320, an internal memory 321, a USB interface 330, a charge management module 340, a power management module 341, a battery 342, an antenna 1, an antenna 2, a radio frequency module 350, a communication module 360, an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, a headphone interface 370D, a sensor module 380, a button 390, a motor 391, an indicator 392, a camera 393, a display 394, and a SIM card interface 395. The sensor module 380 may include a pressure sensor 380A, a gyroscope sensor 380B, a barometric pressure sensor 380C, a magnetic sensor 380D, an acceleration sensor 380E, a distance sensor 380F, a proximity light sensor 380G, a fingerprint sensor 380H, a temperature sensor 380J, a touch sensor 380K, an ambient light sensor 380L, a bone conduction sensor, and the like.

The terminal 300 shown in FIG. 3 is only an example of the terminal. The structure shown in FIG. 3 does not limit the terminal 300. It may include more or fewer parts than shown, or some parts may be combined, or some parts may be split, or different parts may be arranged. The illustrated components can be implemented in hardware, software, or a combination of software and hardware.

The processor 310 may include one or more processing units. For example, the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in the same processor.

In the embodiment of the present application, the DSP can monitor the voice data in real time. When the similarity between the voice data monitored by the DSP and the wake-up word registered in the terminal meets a preset condition, the voice data can be handed over to the AP. The AP performs text verification and voiceprint verification on the voice data. When the AP determines that the voice data matches the wake-up word registered by the user, the terminal can start the voice assistant.

The controller may be a decision maker that directs the various components of the terminal 300 to coordinate work according to the instructions. It is the nerve center and command center of the terminal 300. The controller generates operation control signals according to the instruction operation code and timing signals, and completes the control of fetching and executing the instructions.

The processor 310 may further include a memory for storing instructions and data. In some embodiments, the memory in the processor is a cache memory. It can hold instructions or data that the processor has just used or uses repeatedly. If the processor needs to use the instructions or data again, they can be fetched directly from this memory. This avoids repeated accesses, reduces the processor's waiting time, and improves the efficiency of the system.

In some embodiments, the processor 310 may include an interface. The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.

The I2C interface is a two-way synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor may include multiple sets of I2C buses. The processor can be coupled to touch sensors, chargers, flashes, cameras, and the like through different I2C bus interfaces. For example, the processor may be coupled to the touch sensor through the I2C interface, so that the processor and the touch sensor communicate through the I2C bus interface to implement the touch function of the terminal 300.

The I2S interface can be used for audio communication. In some embodiments, the processor may include multiple sets of I2S buses. The processor may be coupled to the audio module through an I2S bus to implement communication between the processor and the audio module. In some embodiments, the audio module can transmit audio signals to the communication module through the I2S interface, so as to implement the function of receiving calls through a Bluetooth headset.

The PCM interface can also be used for audio communications, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module and the communication module may be coupled through a PCM bus interface. In some embodiments, the audio module can also transmit audio signals to the communication module through the PCM interface, so as to implement the function of receiving calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication, and the sampling rates of the two interfaces are different.

The UART interface is a universal serial data bus for asynchronous communication. This bus is a two-way communication bus. It converts the data to be transferred between serial and parallel communications. In some embodiments, a UART interface is typically used to connect the processor and the communication module 360. For example, the processor communicates with the Bluetooth module through a UART interface to implement the Bluetooth function. In some embodiments, the audio module can transmit audio signals to the communication module through the UART interface, so as to implement the function of playing music through a Bluetooth headset.

The MIPI interface can be used to connect processors with peripheral devices such as displays, cameras, etc. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor and the camera communicate through a CSI interface to implement a shooting function of the terminal 300. The processor and the display screen communicate through a DSI interface to implement a display function of the terminal 300.

The GPIO interface can be configured by software. The GPIO interface can be configured as a control signal or as a data signal. In some embodiments, the GPIO interface may be used to connect the processor with a camera, a display screen, a communication module, an audio module, a sensor, and the like. GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.

The USB interface 330 may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like. The USB interface can be used to connect a charger to charge the terminal 300, and can also be used to transfer data between the terminal 300 and a peripheral device. It can also be used to connect headphones and play audio through headphones. It can also be used to connect other electronic devices, such as AR devices.

The interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic description, and does not constitute a limitation on the structure of the terminal 300. The terminal 300 may use different interface connection modes or a combination of multiple interface connection modes in the embodiments of the present application.

The charging management module 340 is configured to receive a charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module may receive a charging input of a wired charger through a USB interface. In some embodiments of wireless charging, the charging management module may receive a wireless charging input through a wireless charging coil of the terminal 300. While the charging management module is charging the battery, it can also supply power to the terminal device through the power management module 341.

The power management module 341 is used to connect the battery 342, the charge management module 340, and the processor 310. The power management module receives inputs from the battery and / or charge management module, and supplies power to a processor, an internal memory, an external memory, a display screen, a camera, and a communication module. The power management module can also be used to monitor battery capacity, battery cycle times, battery health (leakage, impedance) and other parameters. In some embodiments, the power management module 341 may also be disposed in the processor 310. In some embodiments, the power management module 341 and the charge management module may also be provided in the same device.

The wireless communication function of the terminal 300 may be implemented by the antenna 1, the antenna 2, the radio frequency module 350, the communication module 360, a modem, and a baseband processor. The antenna 1 and the antenna 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in the terminal 300 may be used to cover a single communication frequency band or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, a cellular network antenna can be multiplexed as a wireless LAN diversity antenna. In some embodiments, the antenna may be used in conjunction with a tuning switch.

The radio frequency module 350 may provide a communication processing module for wireless communication solutions, including 2G/3G/4G/5G, applied on the terminal 300. It may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. The radio frequency module receives electromagnetic waves through the antenna 1, filters and amplifies the received electromagnetic waves, and transmits them to the modem for demodulation. The radio frequency module can also amplify the signal modulated by the modem and radiate it as electromagnetic waves through the antenna 1. In some embodiments, at least part of the functional modules of the radio frequency module 350 may be disposed in the processor 310. In some embodiments, at least part of the functional modules of the radio frequency module 350 may be provided in the same device as at least part of the modules of the processor 310.

The modem may include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be transmitted into a medium- or high-frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs sound signals through audio equipment (including but not limited to speakers and receivers), or displays images or videos through a display screen. In some embodiments, the modem may be a separate device. In some embodiments, the modem may be independent of the processor and disposed in the same device as the radio frequency module or other functional modules.

The communication module 360 can provide wireless communication solutions applied on the terminal 300, including wireless local area network (WLAN) (such as a Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR). The communication module 360 may be one or more devices that integrate at least one communication processing module. The communication module receives electromagnetic waves through the antenna 2, frequency-modulates and filters the electromagnetic wave signal, and sends the processed signal to the processor. The communication module 360 may also receive a signal to be transmitted from the processor, frequency-modulate and amplify it, and radiate it as electromagnetic waves through the antenna 2.

In some embodiments, the antenna 1 of the terminal 300 is coupled to the radio frequency module 350, and the antenna 2 is coupled to the communication module 360, so that the terminal 300 can communicate with the network and other devices through wireless communication technology. The wireless communication technology may include Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time-Division Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS may include the Global Positioning System (GPS), the Global Navigation Satellite System (GLONASS), the BeiDou Navigation Satellite System (BDS), the Quasi-Zenith Satellite System (QZSS), and/or Satellite Based Augmentation Systems (SBAS).

The terminal 300 implements a display function through a GPU, a display screen 394, and an application processor. The GPU is a microprocessor for image processing, which connects the display screen and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 310 may include one or more GPUs that execute program instructions to generate or change display information.

The display screen 394 is used to display images, videos, and the like. The display screen includes a display panel. The display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diodes (QLED), etc. In some embodiments, the terminal 300 may include one or N display screens, where N is a positive integer greater than 1.

The terminal 300 can implement a shooting function through an ISP, a camera 393, a video codec, a GPU, a display screen, and an application processor.

The ISP is used to process data fed back by the camera. For example, when a picture is taken, the shutter is opened and light is transmitted through the lens to the light receiving element of the camera. The light signal is converted into an electrical signal, and the light receiving element passes the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin tone of the image, as well as parameters of the shooting scene such as exposure and color temperature. In some embodiments, the ISP may be provided in the camera 393.

The camera 393 is used to capture still images or videos. An object generates an optical image through a lens and projects it onto a photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal. The ISP outputs digital image signals to the DSP for processing. DSP converts digital image signals into image signals in standard RGB, YUV and other formats. In some embodiments, the terminal 300 may include one or N cameras, where N is a positive integer greater than 1.

A digital signal processor is used to process digital signals. In addition to digital image signals, it can also process other digital signals. For example, when the terminal 300 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy.

Video codecs are used to compress or decompress digital video. The terminal 300 may support one or more codecs. In this way, the terminal 300 can play or record videos in multiple encoding formats, such as: MPEG1, MPEG2, MPEG3, MPEG4, and so on.

NPU is a neural network (Neural-Network, NN) computing processor. By drawing on the structure of biological neural networks, such as the transfer mode between neurons in the human brain, the NPU can quickly process input information and continuously learn. Through the NPU, applications such as intelligent recognition of the terminal 300 can be implemented, such as: image recognition, face recognition, speech recognition, text understanding, and the like.

The external memory interface 320 may be used to connect an external memory card, such as a Micro SD card, to realize the expansion of the storage capacity of the terminal 300. The external memory card communicates with the processor through an external memory interface to implement a data storage function. For example, save music, videos and other files on an external memory card.

The internal memory 321 may be used to store computer executable program code, where the executable program code includes instructions. The processor 310 executes various functional applications and data processing of the terminal 300 by running instructions stored in the internal memory 321. The internal memory 321 may include a program storage area and a data storage area. The program storage area may store an operating system, at least one application required by a function (such as a sound playback function or an image playback function), and the like. The data storage area can store data (such as audio data and the phone book) created during the use of the terminal 300. Such data may be referred to as user data. In addition, the internal memory 321 may include high-speed random access memory (RAM) and read-only memory (ROM), and may also include non-volatile memory, such as at least one disk storage device, a flash memory device, other volatile solid-state storage devices, Universal Flash Storage (UFS), etc.

The internal memory 321 includes the data partition (such as the "data" partition) described in the embodiments of the present application. The data partition stores files or data that need to be read and written when the operating system starts, and user data created during use of the terminal. The data partition may be a storage area set in advance in the internal memory 321. For example, the data partition may be contained in a RAM in the internal memory 321.

The virtual data partition in the embodiment of the present application may be a storage area of the RAM in the internal memory 321. Alternatively, the virtual data partition may be a storage area of a ROM in the internal memory 321. Alternatively, the virtual data partition may be an external memory card connected to the external memory interface 320, such as a Micro SD card.

The terminal 300 can implement audio functions, such as music playback and recording, through the audio module 370, the speaker 370A, the receiver 370B, the microphone 370C, the headphone interface 370D, and the application processor.

The audio module is used to convert digital audio information into an analog audio signal output, and is also used to convert an analog audio input into a digital audio signal. The audio module can also be used to encode and decode audio signals. In some embodiments, the audio module may be disposed in the processor 310, or some functional modules of the audio module may be disposed in the processor 310.

The speaker 370A, also called a "horn", is used to convert audio electrical signals into sound signals. The user can listen to music or to a hands-free call through the speaker of the terminal 300.

The receiver 370B, also known as the "earpiece", is used to convert audio electrical signals into sound signals. When the terminal 300 answers a call or plays a voice message, the user can listen by holding the receiver close to the ear.

The microphone 370C, also called a "mic", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can speak with the mouth close to the microphone to input the sound signal into the microphone. The terminal 300 may be provided with at least one microphone. In some embodiments, the terminal 300 may be provided with two microphones, which, in addition to collecting sound signals, can also implement a noise reduction function. In some embodiments, the terminal 300 may further be provided with three, four, or more microphones to collect sound signals, reduce noise, identify sound sources, and implement a directional recording function.

The headphone interface 370D is used to connect a wired headset. The headphone interface can be a USB interface, a 3.5 mm Open Mobile Terminal Platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.

The pressure sensor 380A is used to sense the pressure signal, and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor may be disposed on the display screen. There are many types of pressure sensors, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. The capacitive pressure sensor may be at least two parallel plates having a conductive material. When a force is applied to the pressure sensor, the capacitance between the electrodes changes. The terminal 300 determines the intensity of the pressure according to the change in capacitance. When a touch operation acts on the display screen, the terminal 300 detects the intensity of the touch operation according to a pressure sensor. The terminal 300 may also calculate the touched position based on the detection signal of the pressure sensor. In some embodiments, touch operations acting on the same touch position but different touch operation intensities may correspond to different operation instructions. For example, when a touch operation with a touch operation intensity lower than the first pressure threshold is applied to the short message application icon, an instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold is applied to the short message application icon, an instruction for creating a short message is executed.
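
The mapping from touch intensity to operation instruction in the short message example can be sketched as follows. This is only an illustrative sketch: the function name, the instruction strings, and the numeric threshold are hypothetical, since the embodiment only refers to a "first pressure threshold" without giving a value.

```python
# Hypothetical value; the source only names a "first pressure threshold".
FIRST_PRESSURE_THRESHOLD = 0.5


def short_message_instruction(intensity: float,
                              threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Pick the instruction for a touch on the short message application icon."""
    if intensity < threshold:
        # Intensity lower than the first pressure threshold: view the message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new message.
    return "create_short_message"
```

A touch whose intensity equals the threshold falls into the "create" branch, matching the "greater than or equal to" wording of the example.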

The gyro sensor 380B may be used to determine the motion posture of the terminal 300. In some embodiments, the angular velocity of the terminal 300 around three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor. The gyro sensor can be used for image stabilization. For example, when the shutter is pressed, the gyro sensor detects the angle by which the terminal 300 shakes and calculates, according to the angle, the distance that the lens module needs to compensate, so that the lens can offset the shake of the terminal 300 through reverse movement to achieve anti-shake. The gyro sensor can also be used for navigation and somatosensory game scenarios.

The barometric pressure sensor 380C is used to measure air pressure. In some embodiments, the terminal 300 calculates the altitude through the air pressure value measured by the air pressure sensor to assist in positioning and navigation.
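
The embodiment does not state how altitude is calculated from the measured air pressure. As one possible illustration, the sketch below uses the widely used international barometric formula for the standard atmosphere; the function name and the default sea-level reference pressure are assumptions, not part of the source.

```python
# Assumed reference: standard sea-level pressure in pascals.
SEA_LEVEL_PRESSURE_PA = 101325.0


def altitude_from_pressure(pressure_pa: float,
                           sea_level_pa: float = SEA_LEVEL_PRESSURE_PA) -> float:
    """Approximate altitude in metres from air pressure.

    Uses the international barometric formula, which is an approximation
    for the standard atmosphere and is not specified by the source.
    """
    return 44330.0 * (1.0 - (pressure_pa / sea_level_pa) ** (1.0 / 5.255))
```

In practice the reference pressure varies with weather, so a terminal would typically calibrate it rather than rely on the fixed default used here.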

The magnetic sensor 380D includes a Hall sensor. The terminal 300 can detect the opening and closing of a flip holster by using the magnetic sensor. In some embodiments, when the terminal 300 is a flip phone, the terminal 300 may detect the opening and closing of the flip cover according to the magnetic sensor. Further, features such as automatic unlocking upon flip-open are set according to the detected open or closed state of the holster or of the flip cover.

The acceleration sensor 380E can detect the magnitude of the acceleration of the terminal 300 in various directions (generally three axes). The magnitude and direction of gravity can be detected when the terminal 300 is stationary. It can also be used to identify the posture of the terminal, and is used in applications such as switching between horizontal and vertical screens, and pedometers.

The distance sensor 380F is used to measure distance. The terminal 300 can measure distance by infrared or laser. In some embodiments, when shooting a scene, the terminal 300 may use the distance sensor to measure distance to achieve fast focusing.

The proximity light sensor 380G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. Infrared light is emitted outward through the light emitting diode, and the photodiode detects infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 300. When insufficient reflected light is detected, it can be determined that there is no object near the terminal 300. The terminal 300 may use the proximity light sensor to detect that the user is holding the terminal 300 close to the ear to talk, so as to automatically turn off the screen to save power. The proximity light sensor can also be used in holster mode and pocket mode to automatically unlock and lock the screen.

Ambient light sensor 380L is used to sense ambient light brightness. The terminal 300 can adaptively adjust the brightness of the display screen according to the perceived ambient light brightness. The ambient light sensor can also be used to automatically adjust the white balance when taking pictures. The ambient light sensor can also cooperate with the proximity light sensor to detect whether the terminal 300 is in a pocket to prevent accidental touch.

The fingerprint sensor 380H is used to collect fingerprints. The terminal 300 may use the collected fingerprint characteristics to realize fingerprint unlocking, access application lock, fingerprint photographing, fingerprint answering an incoming call, and the like.

The temperature sensor 380J is used to detect temperature. In some embodiments, the terminal 300 executes a temperature processing strategy using the temperature detected by the temperature sensor. For example, when the temperature reported by the temperature sensor exceeds a threshold, the terminal 300 reduces the performance of a processor located near the temperature sensor in order to reduce power consumption and implement thermal protection.
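
The temperature processing strategy in the example above amounts to a threshold comparison. A minimal sketch follows; the function name, the threshold value, and the two performance levels are hypothetical, because the source does not specify them.

```python
def select_performance_level(temperature_c: float,
                             threshold_c: float = 45.0,
                             normal_level: float = 1.0,
                             reduced_level: float = 0.5) -> float:
    """Return the performance level for a processor near the temperature sensor.

    All numeric defaults are illustrative assumptions.
    """
    if temperature_c > threshold_c:
        # Reported temperature exceeds the threshold: reduce performance
        # to lower power consumption and implement thermal protection.
        return reduced_level
    return normal_level
```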

The touch sensor 380K, also called a "touch panel", can be disposed on the display screen and is used to detect touch operations on or near it. A detected touch operation can be passed to the application processor to determine the type of touch event, and a corresponding visual output can be provided through the display screen.

The bone conduction sensor 380M can acquire vibration signals. In some embodiments, the bone conduction sensor may acquire the vibration signal of the bone that vibrates when a person speaks. The bone conduction sensor can also contact the human pulse and receive a blood pressure pulsation signal. In some embodiments, the bone conduction sensor may also be provided in a headset. The audio module 370 may parse out a voice signal based on the vibration signal of the vibrating bone acquired by the bone conduction sensor, to implement a voice function. The application processor may parse heart rate information based on the blood pressure pulsation signal acquired by the bone conduction sensor, to implement a heart rate detection function.

The keys 390 include a power key, a volume key, and the like. The keys can be mechanical keys or touch keys. The terminal 300 receives key input, and generates key signal inputs related to user settings and function control of the terminal 300.

The motor 391 may generate a vibration alert. The motor can be used for incoming call vibration alerts and for touch vibration feedback. For example, touch operations applied to different applications (such as taking pictures or playing audio) can correspond to different vibration feedback effects. Touch operations on different areas of the display screen can also correspond to different vibration feedback effects. Different application scenarios (such as time reminders, receiving information, alarm clocks, and games) can also correspond to different vibration feedback effects. The touch vibration feedback effect can also be customized.

The indicator 392 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, etc.

The SIM card interface 395 is used to connect a Subscriber Identity Module (SIM) card. A SIM card can be brought into contact with or separated from the terminal 300 by inserting it into or removing it from the SIM card interface. The terminal 300 may support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple SIM cards can be inserted into the same SIM card interface at the same time. The types of the multiple cards may be the same or different. The SIM card interface is also compatible with different types of SIM cards and with external memory cards. The terminal 300 interacts with the network through the SIM card to implement functions such as calling and data communication. In some embodiments, the terminal 300 uses an eSIM, that is, an embedded SIM card. The eSIM card can be embedded in the terminal 300 and cannot be separated from the terminal 300.

The method for a terminal to update the wake-up voice of a voice assistant provided in the embodiments of the present application may be implemented in the terminal 300 described above.

An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant. The terminal 300 may receive first voice data input by the user and determine whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal 300. If the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal 300 performs identity authentication on the user. If the identity authentication is passed, the terminal 300 uses the first voice data to update the first voiceprint model in the terminal.

The first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of the preset wakeup word in the terminal.

In an embodiment of the present application, the terminal may perform identity authentication on the user as follows: the terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data passes the voiceprint verification, the identity authentication is passed.

In the embodiment of the present application, if the text corresponding to the first voice data matches the text of the preset wake-up word and the user identity authentication is passed, the first voice data is a wake-up voice for the voice assistant issued by a user who has passed identity authentication. In addition, because the first voice data is voice data of the user acquired by the terminal 300 in real time, the first voice data can reflect the current physical state of the user and/or the current noise conditions of the scene in which the user is located. In summary, using the first voice data to update the voiceprint model of the terminal 300 can improve the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.

Further, the first voice data is acquired automatically by the terminal 300 during the voice wake-up process, rather than by prompting the user to manually re-register the wake-up word. In this way, using the first voice data to update the voiceprint model can also simplify the process of updating the wake-up word.

An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant. As shown in FIG. 4A, the method for updating the wake-up voice of the voice assistant by the terminal may include S401-S405:

S401. The terminal 300 receives first voice data.

S402. The terminal 300 determines whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal.

After the DSP of the terminal 300 detects the first voice data, the DSP may notify the AP of the terminal 300 to perform text verification and voiceprint verification on the first voice data. The AP may perform text verification on the first voice data by determining whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal. If the text corresponding to the first voice data matches (for example, is the same as) the text of the preset wake-up word registered in the terminal, the AP may continue to perform voiceprint verification on the first voice data, that is, the terminal 300 continues to execute S403. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data, that is, the terminal 300 may continue to execute S405.

S403. The terminal 300 performs voiceprint verification on the first voice data using the first voiceprint model.

The first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up. The first voiceprint model is used to characterize the voiceprint features of the wake-up words registered in the terminal 300.

As described in the "terminal registering wake-up words" process in the embodiments of the present application, when the terminal 300 registers a preset wake-up word, voice data (referred to as registered voice data) is recorded. The preset wake-up word registered in the terminal 300 may include the registered voice data. The first voiceprint model is generated based on the registered voice data. As described above, after the terminal 300 generates the first voiceprint model, the registered voice data can be substituted into the first voiceprint model as an input value to obtain the first voiceprint threshold.

The method for the terminal 300 to perform voiceprint verification on the first voice data using the first voiceprint model may include the following. After the terminal 300 determines that the first voice data passes the text verification, the terminal 300 may substitute the first voice data into the first voiceprint model as an input value to obtain a voiceprint value. The terminal 300 then determines whether the difference between the voiceprint value and the first voiceprint threshold is less than a preset threshold. If the difference is less than the preset threshold, the voiceprint verification passes. If the difference is greater than or equal to the preset threshold, the voiceprint verification fails.
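
The threshold comparison described above can be sketched as follows. This is an illustrative sketch only: the voiceprint model is represented as an arbitrary callable that maps voice data to a number, `toy_model` is a hypothetical stand-in and not a real voiceprint model, and taking the absolute value of the difference is an assumption, since the source only says "difference".

```python
from typing import Callable, Sequence

# Hypothetical type: a voiceprint model maps voice data to a voiceprint value.
VoiceprintModel = Callable[[Sequence[float]], float]


def voiceprint_check(model: VoiceprintModel,
                     first_voice_data: Sequence[float],
                     first_voiceprint_threshold: float,
                     preset_threshold: float) -> bool:
    """Return True if the first voice data passes voiceprint verification."""
    # Substitute the first voice data into the model to get a voiceprint value.
    voiceprint_value = model(first_voice_data)
    # Pass only if the difference is strictly less than the preset threshold
    # (absolute difference is an assumption of this sketch).
    return abs(voiceprint_value - first_voiceprint_threshold) < preset_threshold


def toy_model(samples: Sequence[float]) -> float:
    """Stand-in model for illustration: the mean of the samples."""
    return sum(samples) / len(samples)
```

With this sketch, the first voiceprint threshold would be obtained at registration time as `toy_model(registered_voice_data)`, mirroring how the registered voice data is substituted into the first voiceprint model.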

If the first voice data passes the voiceprint verification, the terminal 300 may use the first voice data to update the first voiceprint model in the terminal 300, that is, the terminal 300 may continue to execute S404. If the first voice data fails the voiceprint verification, the terminal 300 may delete the first voice data, that is, the terminal 300 may continue to execute S405.

S404. The terminal 300 updates the first voiceprint model in the terminal 300 with the first voice data.

The method in which the terminal 300 uses the first voice data to update the first voiceprint model (i.e., S404) may include: the terminal 300 generates a second voiceprint model according to the first voice data, and replaces the first voiceprint model with the second voiceprint model. For the method by which the terminal 300 generates the second voiceprint model according to the first voice data, reference may be made to the method for generating a voiceprint model by a terminal in the conventional technology; details are not repeated here.

S405. The terminal 300 deletes the first voice data.
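
The control flow of S401-S405 can be sketched as follows. The sketch is an assumption-laden illustration rather than the implementation of the embodiment: speech recognition, voiceprint verification, and model updating are passed in as hypothetical callables, and "matches" is treated as string equality.

```python
from enum import Enum
from typing import Callable


class Outcome(Enum):
    MODEL_UPDATED = "S404"
    VOICE_DATA_DELETED = "S405"


def process_first_voice_data(
    voice_data: bytes,
    recognize_text: Callable[[bytes], str],
    wake_word_text: str,
    verify_voiceprint: Callable[[bytes], bool],
    update_model: Callable[[bytes], None],
) -> Outcome:
    """Run S402-S405 on the first voice data received in S401."""
    # S402: text verification against the registered preset wake-up word.
    if recognize_text(voice_data) != wake_word_text:
        return Outcome.VOICE_DATA_DELETED  # S405
    # S403: voiceprint verification using the first voiceprint model.
    if not verify_voiceprint(voice_data):
        return Outcome.VOICE_DATA_DELETED  # S405
    # S404: update the first voiceprint model with the verified voice data.
    update_model(voice_data)
    return Outcome.MODEL_UPDATED
```

Because voiceprint verification runs only after text verification passes, voice data that does not contain the wake-up word is discarded without touching the voiceprint model.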

An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant. The terminal 300 may obtain first voice data that passes text verification and voiceprint verification when the terminal 300 performs voice wake-up. Then, the first voiceprint model in the terminal 300 is updated using the first voice data. Because the first voice data is voice data of the user obtained by the terminal 300 in real time, the first voice data can reflect the current physical state of the user and/or the current noise conditions of the scene in which the user is located. In addition, because the first voice data passes the text verification and the voiceprint verification, using the first voice data to update the voiceprint model of the terminal 300 can improve the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.

If the first voice data passes the voiceprint verification, the terminal 300 may start the voice assistant. In some cases, the user may speak the preset wake-up word of the terminal 300 during a conversation with others. In this case, the user's real purpose in speaking the preset wake-up word is not to start the voice assistant, and after the voice assistant of the terminal 300 is started, the user will not trigger the terminal 300 to perform any function through voice. In the embodiments of the present application, this type of voice wake-up is referred to as an invalid voice wake-up. That is, after the voice assistant is started, the terminal 300 does not receive a valid voice command through the voice assistant. Based on this, the terminal 300 can decide whether to use the first voice data to update the first voiceprint model in the terminal 300 by determining whether the voice assistant receives a valid voice command after it is started. Specifically, an embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant. As shown in FIG. 5A, the method for updating the wake-up voice of the voice assistant by the terminal may include S401-S403, S501-S503, S404, and S405:

After S403, if the first voice data passes the voiceprint verification, the terminal 300 may continue to execute S501-S503. If the first voice data fails the voiceprint verification, the terminal 300 may continue to execute S405.

S501. The terminal 300 starts a voice assistant.

S502. The terminal 300 receives the second voice data through a voice assistant.

After the voice assistant is started, it can receive the second voice data input by the user, and trigger the terminal 300 to execute the function corresponding to the second voice data.

For example, the terminal is the mobile phone 400 shown in FIG. 4B. After the mobile phone 400 starts the voice assistant, the mobile phone 400 may display the "Voice Assistant" interface 401 shown in FIG. 4B. The "Voice Assistant" interface 401 includes a "Record" button 403 and a "Setting" option 404. In response to a user's click operation (such as a long-press operation) on the "Record" button 403, the mobile phone 400 may receive a voice command issued by the user and execute the event corresponding to the voice command. The "Setting" option 404 is used to set various functions and parameters of the "Voice Assistant" application. The mobile phone 400 may receive a user's click operation on the "Setting" option 404 in the "Voice Assistant" interface 401. In response to the user's click operation on the "Setting" option 404, the mobile phone 400 may display the voice control interface 106 shown in (c) in FIG. 1. Optionally, the "Voice Assistant" interface 401 may further include prompt information 402. The prompt information 402 is used to indicate common functions of the "Voice Assistant" application to the user.

It should be noted that the "voice assistant" interface 401 may not include a "record" button 403. In other words, when the mobile phone 400 displays the "Voice Assistant" interface, the user does not need to click any button (such as the "Record" button 403) in the "Voice Assistant" interface, and the mobile phone 400 can also record voice commands issued by the user. The "Voice Assistant" interface of the terminal 300 includes, but is not limited to, the "Voice Assistant" interface 401 shown in FIG. 4B.

S503. The terminal 300 determines whether the second voice data is a valid voice command.

The valid voice command described in the embodiments of the present application refers to an instruction capable of triggering the terminal 300 to perform a corresponding function.

It can be understood that if the user speaks the preset wake-up word of the terminal 300 intentionally, that is, the user's real purpose in speaking the preset wake-up word is to wake up the voice assistant of the terminal 300, then after the voice assistant of the terminal 300 is activated, the user generally triggers the terminal 300 to perform a corresponding function by voice. In other words, if, after the voice assistant is activated, the terminal 300 receives through the voice assistant an instruction for triggering the terminal 300 to perform a corresponding function (that is, a valid voice command), the terminal will execute the corresponding function in response to the valid voice command, and it can be determined that this voice wake-up matches the user's intention. In the embodiments of the present application, such a voice wake-up is referred to as a valid voice wake-up.

In order to ensure the voice wake-up rate of the terminal 300 after the first voiceprint model of the terminal 300 is updated using the first voice data, in the embodiments of the present application the terminal updates the wake-up word of the terminal 300 only with the voice data corresponding to a valid voice wake-up. Specifically, if the voice assistant of the terminal 300 receives a valid voice command after being started, that is, the second voice data is a valid voice command, it means that the user's waking up the voice assistant of the terminal 300 with the first voice data is a valid voice wake-up, and the terminal 300 can execute S404. If no valid voice command is received after the voice assistant of the terminal 300 is started, that is, the second voice data is not a valid voice command, it means that the user's waking up the voice assistant of the terminal 300 with the first voice data is an invalid voice wake-up. The terminal 300 may delete the first voice data, that is, execute S405.

In the embodiments of the present application, after the voice assistant of the terminal 300 is activated, the terminal 300 uses the first voice data to update the first voiceprint model in the terminal 300 only after receiving a valid voice command for triggering the terminal 300 to perform a corresponding function. If a valid voice command is received after the voice assistant of the terminal 300 is started, it means that the voice wake-up is a valid voice wake-up that matches the user's intention. Updating the voiceprint model of the terminal 300 with voice data that reflects the user's true intention and that successfully woke up the terminal 300 can further improve the voice wake-up rate of the terminal and reduce the false wake-up rate.
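The branch taken after S503 can be illustrated with a minimal sketch. It is not the implementation of the embodiment: the function names, the set of recognised commands, and the returned strings are hypothetical, and only the branch structure (S404 on a valid voice command, S405 otherwise) comes from the description above.

```python
from typing import Optional

# Hypothetical commands that the assistant can map to a terminal function.
KNOWN_COMMANDS = {"make a call to bob", "play music", "set an alarm"}


def is_valid_voice_command(text: Optional[str]) -> bool:
    """A command is valid if it can trigger a corresponding terminal function."""
    return text is not None and text.strip().lower() in KNOWN_COMMANDS


def step_after_s503(second_voice_text: Optional[str]) -> str:
    """Pick the next step; None means no second voice data was received."""
    if is_valid_voice_command(second_voice_text):
        return "S404"  # update the first voiceprint model with the first voice data
    return "S405"      # delete the first voice data
```

For instance, `step_after_s503("Make a call to Bob")` selects S404, while silence or an unrecognised utterance selects S405.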

It can be understood that if the signal quality of the first voice data is poor, then after the terminal 300 updates the first voiceprint model with the first voice data, performing voice wake-up with the updated voiceprint model will affect the success rate of voice wake-up.

In order to prevent the terminal 300 from updating the first voiceprint model with voice data of poor signal quality, before using the first voice data to update the first voiceprint model, the terminal 300 may determine whether the signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of voice data is used to characterize the signal quality of the voice data. For example, the signal quality parameter of the voice data may be the signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, it means that the signal quality of the first voice data is relatively high. In this case, the terminal 300 may update the first voiceprint model by using the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.
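As an illustration of this quality check, the following sketch computes a signal-to-noise ratio in decibels and compares it with the second preset threshold. The helper names are hypothetical, and the sketch assumes that a noise-only segment is available for estimating noise power, which the description does not specify.

```python
import math
from typing import Sequence


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_speech / P_noise)."""
    p_speech = sum(x * x for x in speech) / len(speech)
    p_noise = sum(x * x for x in noise) / len(noise)
    if p_speech == 0 or p_noise == 0:
        raise ValueError("speech and noise power must both be non-zero")
    return 10.0 * math.log10(p_speech / p_noise)


def passes_quality_check(quality: float, second_preset_threshold: float) -> bool:
    """Strictly higher than the threshold: update. Lower or equal: delete."""
    return quality > second_preset_threshold
```

With speech samples of amplitude 1.0 and noise samples of amplitude 0.1, the power ratio is 100, which corresponds to about 20 dB.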

Optionally, in the embodiments of the present application, the user may also decide whether to use the first voice data to update the first voiceprint model in the terminal 300. Specifically, before updating the first voiceprint model in the terminal 300 with the first voice data, the terminal 300 may further display a first interface for prompting the user whether to update the voiceprint model. Then, the terminal 300 determines whether to update the voiceprint model according to the user's selection in the first interface. Take the case in which the terminal 300 is the mobile phone 500 shown in FIG. 5B as an example. The mobile phone 500 may display the first interface 501 shown in FIG. 5B before the first voiceprint model in the mobile phone 500 is updated with the first voice data. The first interface 501 is used to prompt the user whether to update the voiceprint model (that is, the wake-up word). For example, the first interface 501 includes first prompt information, such as "the mobile phone obtains voice data that can update the wake-up word during the voice wake-up process" and "is the wake-up word updated?" The first interface 501 further includes an "update" option for triggering the mobile phone 500 to update the voiceprint model and a "cancel" option for triggering the mobile phone 500 not to update the voiceprint model.

In the embodiments of the present application, before updating the voiceprint model, the terminal 300 displays a first interface for prompting the user whether to update the voiceprint model. In this way, the user can decide whether to update the voiceprint model. That is, the terminal 300 can determine whether to update the voiceprint model according to user requirements, which can improve the interaction between the terminal 300 and the user and improve the user experience.

According to the process of "terminal registering wake-up words" introduced in the embodiments of the present application, when the terminal 300 registers a preset wake-up word, one or more pieces of voice data (referred to as registered voice data) are recorded. The first voiceprint model is generated based on the one or more registered voice data. It is assumed that the first voiceprint model is generated based on at least two registered voice data. After the terminal 300 generates a new voiceprint model according to the first voice data, directly replacing the first voiceprint model with the new voiceprint model can improve the voice wake-up rate of the terminal 300. However, directly replacing the first voiceprint model with a voiceprint model generated based on the new voice data (i.e., the first voice data) will greatly increase the voice wake-up rate, and a substantial increase in the voice wake-up rate may correspondingly increase the false wake-up rate of the terminal 300. In order to increase the voice wake-up rate of the terminal 300 stably while reducing the false wake-up rate, the method in which the terminal 300 uses the first voice data to update the first voiceprint model (i.e., S404) may include S601-S603. For example, as shown in FIG. 6, S404 shown in FIG. 5A may include S601-S603:

S601. The terminal 300 uses the first voice data to replace the third voice data in the at least two registered voice data, and obtains at least two updated registered voice data.

S602. The terminal 300 generates a second voiceprint model according to the updated at least two registered voice data.

S603. The terminal 300 replaces the first voiceprint model with the second voiceprint model.

Wherein, after the voice assistant of the terminal 300 is activated and receives a valid voice command, the terminal 300 may determine the third voice data from the at least two registered voice data saved by the terminal 300.

In some embodiments, the third voice data is the voice data whose signal quality parameter is lower than the signal quality parameters of the other voice data in the at least two registered voice data. The terminal 300 uses the first voice data to replace this third voice data, and then generates a second voiceprint model according to the updated at least two registered voice data. Because the voice data replaced by the first voice data has a lower signal quality parameter than the other registered voice data, the signal quality parameters of the retained voice data (that is, the at least two updated registered voice data) are higher. The second voiceprint model generated by the terminal 300 based on voice data with higher signal quality parameters can more accurately and clearly characterize the voiceprint characteristics of the user. The terminal 300 uses the second voiceprint model to perform voice wake-up, which can increase the voice wake-up rate and reduce the false wake-up rate of the terminal.

In other embodiments, the third voice data may be the voice data stored earliest by the terminal among the at least two registered voice data. Compared with the other voice data in the at least two registered voice data, the earliest stored voice data (that is, the third voice data) conforms less closely to the user's current physical state and to the real-time conditions of the noise scene in which the user is currently located. Therefore, replacing the third voice data with the first voice data improves how closely the retained voice data (that is, the at least two updated registered voice data) conforms to the user's current physical state and to the real-time conditions of the noise scene in which the user is currently located. The second voiceprint model generated by the terminal 300 from voice data with a higher degree of conformity can more accurately and clearly characterize the voiceprint characteristics of the user in the user's current physical state and current noise scene. The terminal 300 uses the second voiceprint model to perform voice wake-up, which can increase the voice wake-up rate and reduce the false wake-up rate of the terminal.

In the embodiment of the present application, the terminal 300 uses the first voice data to replace part of the voice data in the at least two registered voice data, such as the third voice data; instead of generating the second voiceprint model completely based on the first voice data. In this way, the voice wake-up rate of the voice wake-up performed by the terminal 300 can be relatively stabilized. In addition, while the voice wake-up rate of the terminal 300 can be steadily increased, the false wake-up rate of the voice wake-up performed by the terminal 300 can be reduced.
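The replacement in S601 and the two ways of choosing the third voice data can be sketched as follows. The data class, field names, and strategy labels are hypothetical and serve only to illustrate the selection rule. Generating the second voiceprint model from the updated list (S602) is outside the scope of this sketch.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass(frozen=True)
class RegisteredVoice:
    name: str
    quality: float  # signal quality parameter, e.g. signal-to-noise ratio
    stored_at: int  # storage order; a smaller value means stored earlier


def pick_third_voice_data(registered: Sequence[RegisteredVoice],
                          strategy: str) -> RegisteredVoice:
    """Choose the registered voice data that will be replaced."""
    if strategy == "lowest_quality":
        return min(registered, key=lambda v: v.quality)
    if strategy == "earliest":
        return min(registered, key=lambda v: v.stored_at)
    raise ValueError(f"unknown strategy: {strategy}")


def replace_third_voice_data(registered: Sequence[RegisteredVoice],
                             first: RegisteredVoice,
                             strategy: str) -> List[RegisteredVoice]:
    """S601: swap the third voice data for the first voice data."""
    third = pick_third_voice_data(registered, strategy)
    return [first if v is third else v for v in registered]
```

Because only one registered voice data is swapped out, the remaining registered voice data still contributes to the second voiceprint model, which is the reason given above for the more stable wake-up rate.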

It can be understood that if the second voiceprint threshold generated by the terminal according to the second voiceprint model is significantly different from the first voiceprint threshold, it will cause the wake-up rate of the terminal 300 to perform voice wakeup to fluctuate greatly, affecting the user experience. Based on this, as shown in FIG. 7, after S602 and before S603 shown in FIG. 6, the method in the embodiment of the present application may further include S701-S702:

S701. The terminal 300 generates a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data.

The second voiceprint model is equivalent to a function. For example, the terminal 300 may use each of the updated at least two registered voice data as an input value and substitute it into the second voiceprint model to obtain at least two voiceprint thresholds. The terminal 300 may calculate the average value of the at least two voiceprint thresholds to obtain the second voiceprint threshold. For example, it is assumed that the at least two updated registered voice data include the registered voice data a and the registered voice data b. The terminal 300 may substitute the registered voice data a into the second voiceprint model to obtain the voiceprint threshold A, substitute the registered voice data b into the second voiceprint model to obtain the voiceprint threshold B, and calculate the average of the voiceprint threshold A and the voiceprint threshold B to obtain the second voiceprint threshold.
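The averaging in S701 can be sketched as follows, treating the second voiceprint model as a plain function, as the paragraph above suggests. The toy model in the usage note is hypothetical and stands in for whatever scoring function a real voiceprint model provides.

```python
from statistics import mean
from typing import Callable, Sequence

VoiceData = Sequence[float]


def second_voiceprint_threshold(model: Callable[[VoiceData], float],
                                updated_registered: Sequence[VoiceData]) -> float:
    """S701: score each updated registered voice data and average the scores."""
    per_sample_thresholds = [model(sample) for sample in updated_registered]
    return mean(per_sample_thresholds)
```

For instance, with a toy model that returns the mean of its input, registered voice data a = [1.0, 3.0] yields threshold A = 2.0, registered voice data b = [5.0, 7.0] yields threshold B = 6.0, and the second voiceprint threshold is 4.0.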

S702. The terminal 300 determines whether a difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold.

Specifically, if the difference between the second voiceprint threshold and the first voiceprint threshold is smaller than the first preset threshold, it means that the change between the second voiceprint threshold and the first voiceprint threshold is small. In this case, using the second voiceprint model for voiceprint verification will not have a significant impact on the wake-up rate of voice wake-up of the terminal 300. At this time, the terminal 300 may execute S603.

If the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, it means that the change between the second voiceprint threshold and the first voiceprint threshold is large. In this case, performing the voiceprint verification by using the second voiceprint model will greatly affect the wake-up rate of the voice wake-up of the terminal 300. At this time, as shown in FIG. 7, the terminal 300 may execute S703:

S703. The terminal 300 deletes the second voiceprint model and the first voice data.

It can be understood that when the change between the second voiceprint threshold and the first voiceprint threshold is large, the terminal 300 deletes the second voiceprint model and the first voice data, that is, the second voiceprint model is not used to replace the first voiceprint model. In this way, a large difference between the second voiceprint threshold and the first voiceprint threshold is prevented from causing the wake-up rate of the terminal 300 to fluctuate greatly and affecting the user experience.
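The decision made in S702 can be sketched as follows. The function name is hypothetical, and taking the absolute value of the difference is an assumption, since the description only speaks of "a difference" between the two thresholds.

```python
def step_after_s702(first_threshold: float,
                    second_threshold: float,
                    first_preset_threshold: float) -> str:
    """Return the next step once the second voiceprint threshold is known."""
    difference = abs(second_threshold - first_threshold)  # assumed absolute difference
    if difference < first_preset_threshold:
        return "S603"  # replace the first voiceprint model with the second one
    return "S703"      # delete the second voiceprint model and the first voice data
```

A difference exactly equal to the first preset threshold leads to S703, matching the "greater than or equal to" wording above.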

An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant. As shown in FIG. 8, the method for updating the wake-up voice of the voice assistant by the terminal may include S801-S808:

S801. The terminal 300 receives first voice data.

S802. The terminal 300 determines whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal.

Wherein, if the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal, the AP may continue to perform voiceprint verification on the first voice data, that is, the terminal 300 continues to perform S803. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data, that is, the terminal 300 may continue to execute S808.

S803. The terminal 300 performs voiceprint verification on the first voice data by using the first voiceprint model.

If the first voice data passes the voiceprint verification, the terminal 300 may continue to execute S804. If the first voice data fails the voiceprint verification, the terminal 300 may continue to execute S805.

For detailed descriptions of S801-S803, reference may be made to the introduction of S401-S403 in the embodiments of the present application, which will not be repeated here in the embodiments of the present application.

S804. The terminal 300 starts a voice assistant.

S805. The terminal 300 performs text verification on the voice data received within the first preset time.

S806. The terminal 300 determines whether the terminal 300 receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time.

The first preset time is a preset period of time that starts when the terminal 300 determines that the first voice data has the same text information as the wake-up word registered in the terminal 300 (that is, the first voice data passes the text verification) but fails the voiceprint verification.

Generally speaking, the AP of the terminal 300 is in a sleep state, and the DSP of the terminal 300 monitors voice data. When the similarity between the monitored voice data and the wake-up word registered in the terminal 300 satisfies a certain condition, the DSP hands the monitored voice data to the AP, and the AP is woken up. The AP performs text verification and voiceprint verification on the voice data to determine whether the voice data matches the generated voiceprint model. After the AP performs text verification and voiceprint verification on the voice data and obtains the verification result, the AP enters the sleep state until it receives voice data sent by the DSP again. That is to say, the DSP sends to the AP only voice data that has a certain degree of similarity with the wake-up word registered in the terminal 300. The AP performs text verification and voiceprint verification only on the voice data sent by the DSP (that is, the voice data whose similarity with the wake-up word registered in the terminal 300 satisfies the condition).

It can be understood that if the first voice data has the same text information as the wake-up word registered in the terminal 300 (that is, the first voice data can pass the text verification), the DSP can recognize that the similarity between the first voice data and the wake-up word registered in the terminal 300 satisfies the condition. The DSP may transmit the first voice data to the AP to wake up the AP. The AP performs text verification and voiceprint verification on the first voice data.

The difference is that, in the embodiments of the present application, if the AP determines that the first voice data has the same text information as the wake-up word registered in the terminal 300 (that is, the first voice data can pass the text verification) but the first voice data fails the voiceprint verification, the AP does not enter the sleep state immediately after obtaining the verification result. Instead, the DSP delivers all voice data monitored within the first preset time to the AP, and the AP can perform text verification on all voice data monitored by the DSP within the first preset time.
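The timing behaviour described above can be sketched as a small state holder. The class and method names are hypothetical, timestamps are plain seconds supplied by the caller, and treating the end of the window as inclusive is an assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ListeningWindow:
    first_preset_time: float           # window length in seconds (hypothetical value)
    opened_at: Optional[float] = None  # None means the AP may return to sleep

    def on_verification_result(self, now: float, text_ok: bool,
                               voiceprint_ok: bool) -> None:
        """Open the window only if text verification passes and voiceprint verification fails."""
        if text_ok and not voiceprint_ok:
            self.opened_at = now
        else:
            self.opened_at = None

    def ap_stays_awake(self, now: float) -> bool:
        """True while the DSP should keep handing monitored voice data to the AP."""
        if self.opened_at is None:
            return False
        return now - self.opened_at <= self.first_preset_time
```

With a 10-second window opened at t = 100, the AP keeps verifying text at t = 105 and is free to sleep again at t = 111.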

The first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened. The first voiceprint model can represent the voiceprint features of the wake-up words registered in the terminal. The text corresponding to the second voice data contains a preset keyword. For example, the second voice data may be voice data in which the user complains that the voice wake-up fails, such as "how to wake up", "how not", "not responding", "unable to wake up", and "voice wake up failed".

The AP performs text verification on all voice data monitored by the DSP within the first preset time. If, within the first preset time, the AP recognizes second voice data such as "how to wake up", "how not to wake up", "not responding", "unable to wake up", or "voice wake up failure", and also recognizes at least one voice data whose text information is the same as the text information of the wake-up word registered in the terminal 300, then the terminal 300 may use the first voice data received by the terminal 300 to update the first voiceprint model of the terminal 300.

It can be understood that if the terminal 300 receives the first voice data in S801, finds that the voiceprint verification of the first voice data fails, and subsequently receives at least one voice data that passes the text verification within the first preset time, it indicates that the user has repeatedly tried to wake up the voice assistant of the terminal 300 by voice but the voice wake-up has failed. In this case, if the terminal 300 also receives the second voice data within the first preset time, it indicates that the user is dissatisfied with the voice wake-up failure.

If the terminal 300 receives the second voice data and at least one voice data that passes the text verification within the first preset time, it indicates that the user strongly intends to wake up the voice assistant by voice. However, the voice wake-up may have failed repeatedly because the user's current physical state differs greatly from the user's physical state when the wake-up word was registered. It may also be because the real-time conditions of the noise scene in which the user is currently located differ from those of the noise scene in which the user registered the wake-up word. In this case, even if the first voice data fails the voiceprint verification, the terminal 300 may use the received first voice data to update the first voiceprint model in the terminal 300. That is, if the terminal 300 receives the second voice data and at least one voice data matching the text of the preset wake-up word within the first preset time, the terminal 300 updates the first voiceprint model in the terminal with the first voice data, that is, executes S807.

S807. The terminal 300 updates the first voiceprint model of the terminal 300 with the first voice data.

Wherein, if the terminal 300 does not receive the second voice data and at least one voice data matching the text of the preset wake-up word within the first preset time, the terminal 300 may delete the first voice data.

S808. The terminal 300 deletes the first voice data.
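The decision in S806 can be sketched as follows. The wake-up word, the keyword list (a subset of the examples given above), and the substring matching rule are illustrative assumptions. The sketch also assumes that the voice data received within the first preset time has already been converted to text.

```python
from typing import Iterable

WAKE_WORD = "hello assistant"  # hypothetical preset wake-up word
COMPLAINT_KEYWORDS = ("how to wake up", "not responding",
                      "unable to wake up", "voice wake up failed")


def normalize(text: str) -> str:
    """Lower-case the text and collapse whitespace before matching."""
    return " ".join(text.lower().split())


def is_second_voice_data(text: str) -> bool:
    """Second voice data contains a preset keyword (substring match assumed)."""
    return any(keyword in normalize(text) for keyword in COMPLAINT_KEYWORDS)


def step_after_s806(texts_in_window: Iterable[str]) -> str:
    """S807 only if both a complaint and a repeated wake-up word were heard."""
    texts = [normalize(t) for t in texts_in_window]
    heard_complaint = any(is_second_voice_data(t) for t in texts)
    heard_wake_word = any(t == WAKE_WORD for t in texts)
    return "S807" if heard_complaint and heard_wake_word else "S808"
```

Both conditions are required: a repeated wake-up word without a complaint, or a complaint without a repeated wake-up word, leads to S808.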

The method in which the terminal 300 uses the first voice data to update the first voiceprint model in the terminal 300 may include: the terminal 300 generates a second voiceprint model according to the first voice data, and uses the second voiceprint model to replace the first voiceprint model. For the method by which the terminal 300 generates the second voiceprint model according to the first voice data, reference may be made to the method for generating a voiceprint model by a terminal in the conventional technology, which is not repeated here.

Since the first voice data is the voice data of the user acquired by the terminal 300 in real time; therefore, the first voice data may reflect the physical state of the user and / or the real-time condition of the noise scene in which the user is located. Therefore, by using the first voice data to update the voiceprint model of the terminal 300, the voice wake-up rate of the voice wake-up performed by the terminal can be improved, and the false wake-up rate can be reduced.

In addition, the received first voice data is voice data uttered by the user with a strong intention to wake up the voice assistant of the terminal 300 by voice. Therefore, updating the voiceprint model of the terminal 300 with voice data that reflects the user's true intention can further increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.

Further, the received first voice data is automatically acquired by the terminal 300 during the voice wake-up process performed by the terminal 300, instead of prompting the user to manually re-register the wake-up word and receiving user input. In this way, updating the voiceprint model by using the received first voice data can also simplify the process of updating the wake word.

In order to prevent the terminal 300 from updating the first voiceprint model with voice data of poor signal quality, before updating the first voiceprint model with the first voice data, the terminal 300 may determine whether the signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of voice data is used to characterize the signal quality of the voice data. For example, the signal quality parameter of the voice data may be the signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, it means that the signal quality of the first voice data is relatively high. In this case, the terminal 300 may update the first voiceprint model by using the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.

Optionally, in addition to the above-mentioned first voice data, the terminal 300 may also use the at least one voice data that matches the text of the preset wake-up word to update the first voiceprint model. Specifically, the terminal may select, from the first voice data and the at least one voice data that matches the text of the preset wake-up word, the voice data whose signal quality parameter is higher than the second preset threshold, and then use the selected voice data to update the first voiceprint model.
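The selection described above can be sketched as a simple filter. The function name, the candidate labels, and the numeric values in the usage note are hypothetical.

```python
from typing import Dict, List


def select_for_update(candidates: Dict[str, float],
                      second_preset_threshold: float) -> List[str]:
    """Keep the candidates whose signal quality parameter exceeds the threshold.

    candidates maps a label for each voice data (the first voice data and the
    later wake-up attempts) to its signal quality parameter.
    """
    return [label for label, quality in candidates.items()
            if quality > second_preset_threshold]
```

For instance, with candidates `{"first": 18.0, "retry_1": 9.5, "retry_2": 21.0}` and a threshold of 15.0, only `first` and `retry_2` would be used to update the first voiceprint model.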

In order to prevent a malicious user from triggering the terminal 300 to execute S801-S808 and update the first voiceprint model in the terminal 300 so as to wake up the terminal 300 by voice, the terminal 300 may perform user identity verification before executing S807, and execute S807 only after the identity verification is passed. Specifically, after S806 and before S807, the terminal 300 may perform identity authentication on the user; if the identity authentication passes, the terminal 300 performs S807; if the identity authentication fails, the terminal 300 performs S808. The method for the terminal to authenticate the user may include S901-S903. As shown in FIG. 9, after S806 shown in FIG. 8 and before S807, the method in this embodiment of the present application may further include S901-S903:

S901. The terminal 300 displays an identity verification interface.

The identity verification interface is used to receive identity verification information input by the user.

S902. The terminal 300 receives the authentication information input by the user on the authentication interface.

S903. The terminal 300 performs user identity verification according to the identity verification information.

Wherein, if the identity authentication is passed, the terminal 300 updates the first voiceprint model with the first voice data, that is, the terminal 300 executes S807. If the identity authentication fails, the terminal 300 deletes the first voice data, that is, the terminal 300 executes S808.

For example, the identity verification information may be any one of a digital password, a pattern password, fingerprint information, iris information, and facial feature information. Correspondingly, the aforementioned identity verification interface may be any one of an interface for inputting a digital password or a pattern password, an interface for entering fingerprint information, an interface for entering iris information, and an interface for entering facial feature information.

Exemplarily, the terminal 300 is the mobile phone 1000 shown in FIG. 10, the above-mentioned identity verification information is a digital password, and the above-mentioned identity verification interface is an interface for entering a digital password as an example. The mobile phone 1000 can display the authentication interface 1001 shown in FIG. 10. The authentication interface 1001 includes a password input box 1002 and a first prompt message "After the user authentication is passed, the mobile phone will automatically update the wake-up word" 1003.

The terminal 300 performs user authentication. After the user authentication is passed, the terminal updates the first voiceprint model in the terminal 300. In this way, it is possible to prevent a malicious user from using the voice of the malicious user to trigger the terminal 300 to update the first voiceprint model in the terminal 300, so as to achieve the purpose of waking the terminal 300 by malicious voice. With this solution, the voiceprint model in the terminal 300 can be prevented from being maliciously updated, and the security of the terminal 300 can be improved.
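For the digital password case, the gate formed by S901-S903 can be sketched as follows. Storing a salted PBKDF2 digest and comparing it in constant time is an assumption of this sketch rather than something the embodiment specifies, and the function names, salt, and iteration count are hypothetical.

```python
import hashlib
import hmac


def derive_digest(password: str, salt: bytes) -> bytes:
    """Derive a digest from the entered digital password (PBKDF2-HMAC-SHA256)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def step_after_s903(entered_password: str, salt: bytes, stored_digest: bytes) -> str:
    """S807 if the identity verification passes, otherwise S808."""
    passed = hmac.compare_digest(derive_digest(entered_password, salt), stored_digest)
    return "S807" if passed else "S808"
```

The same gate applies to the other kinds of identity verification information listed above; only the comparison step would differ.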

It can be understood that directly generating a new voiceprint model from the first voice data and replacing the first voiceprint model with the new voiceprint model can improve the voice wake-up rate of the terminal 300. However, directly replacing the first voiceprint model with the voiceprint model generated based on the first voice data will greatly increase the voice wake-up rate, and a substantial increase in the voice wake-up rate may correspondingly increase the false wake-up rate of the terminal 300. In order to increase the voice wake-up rate of the terminal 300 stably while reducing the false wake-up rate, as shown in FIG. 11, the above S807 may include the above S601-S603.

In the embodiment of the present application, the terminal uses the first voice data to replace part of the voice data in the at least two registered voice data; instead of generating the second voiceprint model completely based on the first voice data. In this way, the voice wake-up rate of the voice wake-up performed by the terminal 300 can be relatively stabilized. In addition, while the voice wake-up rate of the terminal 300 can be steadily increased, the false wake-up rate of the voice wake-up performed by the terminal 300 can be reduced.

It can be understood that if the second voiceprint threshold generated by the terminal according to the second voiceprint model is significantly different from the first voiceprint threshold, the wake-up rate of the terminal 300 will fluctuate greatly, affecting the user experience. Based on this, as shown in FIG. 12, after S602 and before S603 shown in FIG. 11, the method in this embodiment of the present application may further include S701-S702.

After S702, if the difference between the second voiceprint threshold and the first voiceprint threshold is less than the first preset threshold, it means that the change between the second voiceprint threshold and the first voiceprint threshold is small. In this case, using the second voiceprint model for voiceprint verification will not have a significant impact on the wake-up rate of voice wake-up of the terminal 300. At this time, the terminal 300 may execute S603. After S702, if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, it means that the second voiceprint threshold and the first voiceprint threshold have a larger change. In this case, performing the voiceprint verification by using the second voiceprint model will greatly affect the wake-up rate of the voice wake-up of the terminal 300. At this time, the terminal 300 may execute S703.

It can be understood that when the change between the second voiceprint threshold and the first voiceprint threshold is large, the terminal 300 deletes the second voiceprint model and the first voice data, that is, the second voiceprint model is not used to replace the first voiceprint model. In this way, a large difference between the second voiceprint threshold and the first voiceprint threshold is prevented from causing the wake-up rate of the terminal 300 to fluctuate greatly and affecting the user experience.

It can be understood that, in order to implement the foregoing functions, the foregoing terminal and the like include a hardware structure and/or a software module corresponding to each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of each example described in the embodiments disclosed herein, the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is performed by hardware or by computer software driving hardware depends on the specific application and design constraints of the technical solution. Those skilled in the art can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the embodiments of the present application.

In the embodiment of the present application, the foregoing terminal and the like may be divided into functional modules according to the foregoing method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The above integrated modules may be implemented in the form of hardware or in the form of software functional modules. It should be noted that the division of the modules in the embodiments of the present application is schematic and is only a logical function division. In actual implementation, there may be another division manner.

In a case where each functional module is divided according to each function, FIG. 13 shows a possible structural diagram of a terminal involved in the foregoing embodiment. The terminal 1300 includes: a storage unit 1301, an input unit 1302, a text verification unit 1303, a voiceprint verification unit 1304, and an update unit 1305. The storage unit 1301 stores a preset wake-up word registered in the terminal 1300 and a first voiceprint model. The first voiceprint model is used for voiceprint verification when the voice assistant is woken up. The first voiceprint model represents the voiceprint characteristics of the preset wake-up word.

The input unit 1302 is used to support the terminal 1300 to perform S401, S502, S801, and S902 in the foregoing method embodiments, and / or other processes used in the technology described herein. The text verification unit 1303 is configured to support the terminal 1300 to perform S402, S802, and S805 in the foregoing method embodiments, and / or other processes used in the technology described herein. The voiceprint verification unit 1304 is configured to support the terminal 1300 to perform S403, S803 in the foregoing method embodiments, and / or other processes used in the technology described herein. The update unit 1305 is configured to support the terminal 1300 to perform S404, S603, and S807 in the foregoing method embodiments, and / or other processes used in the technology described herein.

Further, the terminal 1300 may further include: a starting unit and a determining unit. The starting unit is configured to support the terminal 1300 to perform S501, S804 in the foregoing method embodiments, and / or other processes used in the technology described herein. The determining unit is configured to support the terminal 1300 to perform S503 in the foregoing method embodiment, and / or other processes used in the technology described herein.

Further, as shown in FIG. 14, the terminal 1300 may further include: an identity authentication unit 1306. The identity authentication unit 1306 is configured to support the terminal 1300 to perform user identity verification on the user. For example, the identity authentication unit 1306 is configured to support the terminal 1300 to perform S903 in the foregoing method embodiment, and / or other processes used in the technology described herein.

Further, the terminal 1300 may further include a display unit. The display unit is configured to support the terminal 1300 to execute S901 in the foregoing method embodiment, and / or other processes used in the technology described herein.

Further, the terminal 1300 may further include a replacement unit and a generating unit. The replacement unit is configured to support the terminal 1300 to perform S601 in the foregoing method embodiment, and / or other processes used in the technology described herein. The generating unit is configured to support the terminal 1300 to perform S602, S701 in the foregoing method embodiment, and / or other processes used in the technology described herein.

Further, the terminal 1300 may further include: a deleting unit. The deleting unit is configured to support the terminal 1300 to perform S405, S703, and S808 in the foregoing method embodiments, and / or other processes used in the technology described herein.

Further, the terminal 1300 may further include a judging unit. The judging unit is configured to support the terminal 1300 to execute S702 and S806 in the foregoing method embodiments, and / or other processes used in the technology described herein.

All relevant content of each step involved in the above method embodiments may be referenced in the functional description of the corresponding functional module, and is not repeated here.

Of course, the terminal 1300 includes, but is not limited to, the unit modules listed above. For example, the terminal 1300 may further include a receiving unit and a sending unit. The receiving unit is used to receive data or instructions sent by other terminals. The sending unit is used to send data or instructions to other terminals. In addition, the functions that can be implemented by the above functional units include, but are not limited to, the functions corresponding to the method steps described in the above examples. For detailed descriptions of other units of the terminal 1300, refer to the detailed description of the corresponding method steps; details are not repeated here.

In the case of using an integrated unit, FIG. 15 shows a possible structural diagram of a terminal involved in the foregoing embodiment. The terminal 1500 includes a processing module 1501, a storage module 1502, and a display module 1503. The processing module 1501 is configured to control and manage the actions of the terminal 1500. The display module 1503 is configured to display an image generated by the processing module 1501. The storage module 1502 is configured to store program codes and data of the terminal. For example, the storage module 1502 stores a preset wake-up word registered in the terminal and a first voiceprint model, where the first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up, and the first voiceprint model characterizes the voiceprint characteristics of the preset wake-up word. Optionally, the terminal 1500 may further include a communication module for supporting communication between the terminal and other network entities. For a detailed description of each unit included in the terminal 1500, reference may be made to the description in the foregoing method embodiments, and details are not described herein again.

The processing module 1501 may be a processor or a controller. For example, the processing module 1501 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the present disclosure. The processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module may be a transceiver, a transceiver circuit, or a communication interface. The storage module 1502 may be a memory.

When the processing module 1501 is a processor (such as the processor 310 shown in FIG. 3), the communication module includes a Wi-Fi module and a Bluetooth module (such as the communication module 360 shown in FIG. 3; communication modules such as the Wi-Fi module and the Bluetooth module may be collectively referred to as a communication interface), the storage module 1502 is a memory (such as the internal memory 321 shown in FIG. 3 and an external SD card connected to the terminal 1500 through the external memory interface 320), and the display module 1503 is a touch screen (including the display screen 394 shown in FIG. 3), the terminal provided in this embodiment of the present application may be the terminal 300 shown in FIG. 3. The processor, the communication interface, the touch screen, and the memory may be coupled together through a bus.

An embodiment of the present application further provides a computer storage medium. The computer storage medium stores computer program code. When the processor executes the computer program code, the terminal executes the relevant method steps in any of FIG. 4A, FIG. 5A, FIG. 6, FIG. 7, FIG. 8, FIG. 9, FIG. 11, and FIG. 12 to implement the method in the above embodiments.

The embodiment of the present application also provides a computer program product. When the computer program product runs on a computer, the computer is caused to execute the relevant method steps in any of FIG. 4A, FIG. 5A, FIG. 6, FIG. 7, FIG. 9, FIG. 11, and FIG. 12 to implement the method in the above embodiments.

The terminal 1300, the terminal 1500, the computer storage medium, and the computer program product provided in the embodiments of the present application are all used to execute the corresponding methods provided above. Therefore, for the beneficial effects that they can achieve, refer to the beneficial effects of the corresponding methods provided above; details are not repeated here.

Through the description of the above embodiments, those skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the above functional modules is used as an example. In practical applications, the above functions can be allocated to different functional modules as required; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above.

In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the device embodiments described above are only schematic. For example, the division of the modules or units is only a logical function division, and in actual implementation there may be another division manner. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented. In addition, the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.

The units described as separate components may or may not be physically separated, and the components displayed as units may be one physical unit or multiple physical units; that is, they may be located in one place or distributed across multiple different places. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.

In addition, each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of software functional unit.

When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solution of the embodiments of the present application in essence, or the part that contributes to the existing technology, or all or part of the technical solution, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

The above description is only a specific implementation of this application, but the scope of protection of this application is not limited to this. Any changes or replacements within the technical scope disclosed in this application shall be covered by the scope of protection of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

A method for a terminal to update a wake-up voice of a voice assistant, which includes:

Receiving, by the terminal, first voice data input by a user;

Determining, by the terminal, whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal;

If the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal authenticates the user;

If the identity authentication is passed, the terminal updates the first voiceprint model in the terminal by using the first voice data;

The first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of the preset wake-up word.
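For illustration only, the sequence recited above (text match, then identity authentication, then model update) can be sketched as follows. The helper names `normalize` and `maybe_update`, the case- and whitespace-insensitive text comparison, and the callables standing in for authentication and model updating are assumptions made for the example and are not defined by this application.

```python
from typing import Callable


def normalize(text: str) -> str:
    """Assumed text normalization: lower-case and collapse whitespace."""
    return " ".join(text.lower().split())


def maybe_update(recognized_text: str,
                 wake_word_text: str,
                 authenticate: Callable[[], bool],
                 update_model: Callable[[], None]) -> str:
    """Update the voiceprint model only if the text matches and authentication passes."""
    if normalize(recognized_text) != normalize(wake_word_text):
        return "text_mismatch"
    if not authenticate():
        return "auth_failed"
    update_model()
    return "updated"


calls = []
result = maybe_update("  Hello   Assistant ", "hello assistant",
                      authenticate=lambda: True,
                      update_model=lambda: calls.append("updated"))
```

The update callable runs only on the final branch, so a text mismatch or a failed authentication leaves the first voiceprint model untouched.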
The method for updating a wake-up voice of a voice assistant according to claim 1, wherein the terminal performing identity authentication on the user comprises:

The terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the voiceprint verification is passed, the identity authentication passes.
The method according to claim 1 or 2, wherein the method further comprises:

When the identity authentication is passed, the terminal starts the voice assistant;

Receiving, by the terminal, second voice data through the voice assistant;

After the identity authentication is passed, before the terminal uses the first voice data to update the first voiceprint model in the terminal, the method further includes:

The terminal determines that the second voice data is a valid voice command.
The method for updating a wake-up voice of a voice assistant according to any one of claims 1-3, wherein the terminal includes a coprocessor and a main processor; the terminal uses the coprocessor to monitor voice data; when the coprocessor detects the first voice data whose similarity to the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal; and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data.
The method for updating a wake-up voice of a voice assistant according to claim 1, wherein before the terminal authenticates the user, the method comprises:

Performing, by the terminal, voiceprint verification on the first voice data using the first voiceprint model;

If the first voice data fails the voiceprint check, the terminal performs a text check on the voice data received within the first preset time;

If the terminal receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time, the terminal authenticates the user;

The text corresponding to the second voice data includes a preset keyword.
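For illustration only, the fallback condition recited above can be sketched as a check over the utterances heard within the first preset time after a failed voiceprint verification. The function name `should_authenticate`, the `(timestamp, text)` representation, the substring keyword test, and the example keyword "no response" are assumptions made for the example.

```python
from typing import Iterable, List, Tuple


def should_authenticate(
    utterances: Iterable[Tuple[float, str]],
    failed_at: float,
    window_s: float,
    wake_word: str,
    keywords: Iterable[str],
) -> bool:
    """True if the window holds a wake-word utterance and a separate keyword utterance."""
    wake = wake_word.strip().lower()
    keys = [k.lower() for k in keywords]
    # Keep only utterances heard inside the preset time window.
    in_window: List[str] = [
        text.strip().lower()
        for t, text in utterances
        if failed_at <= t <= failed_at + window_s
    ]
    has_wake_word = any(text == wake for text in in_window)
    has_keyword = any(
        text != wake and any(k in text for k in keys) for text in in_window
    )
    return has_wake_word and has_keyword


heard = [(1.0, "Hello Assistant"), (3.5, "why is there no response")]
decision = should_authenticate(heard, failed_at=0.0, window_s=10.0,
                               wake_word="hello assistant",
                               keywords=["no response"])
```

Both conditions must hold inside the same window; a repeated wake-up word alone, or a keyword utterance alone, does not trigger identity authentication in this sketch.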
The method for updating a wake-up voice of a voice assistant according to claim 5, wherein the terminal performing identity authentication on the user comprises:

The terminal displays an authentication interface;

Receiving, by the terminal, the authentication information input by the user on the authentication interface;

The terminal performs user identity verification on the user according to the identity verification information.
The method for updating a wake-up voice of a voice assistant according to claim 5 or 6, wherein the terminal comprises a coprocessor and a main processor; the terminal uses the coprocessor to monitor voice data; when the coprocessor detects the first voice data whose similarity with the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal; and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data;

The terminal uses the coprocessor to monitor voice data within the first preset time, and notifies the main processor to determine whether the voice data received within the first preset time includes second voice data and at least one piece of voice data matching the text of the preset wake-up word, wherein the text corresponding to the second voice data contains a preset keyword.
The method for updating a wake-up voice of a voice assistant according to any one of claims 1-7, wherein the preset wake-up word includes at least two registered voice data, the at least two registered voice data are recorded by the terminal when registering the preset wake-up word, and the first voiceprint model is generated based on the at least two registered voice data;

The updating the first voiceprint model in the terminal by using the first voice data includes:

The terminal uses the first voice data to replace third voice data in the at least two registered voice data to obtain updated at least two registered voice data, wherein the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data in the at least two registered voice data;

Generating, by the terminal, a second voiceprint model according to the updated at least two registered voice data;

The terminal replaces the first voiceprint model with the second voiceprint model, and the second voiceprint model is used to characterize voiceprint features of the updated at least two registered voice data.
The method for updating a wake-up voice of a voice assistant according to claim 8, wherein the terminal further stores a first voiceprint threshold, and the first voiceprint threshold is generated based on the first voiceprint model and the at least two registered voice data;

After the terminal generates the second voiceprint model based on the updated at least two registered voice data, and before the terminal replaces the first voiceprint model with the second voiceprint model, the method further includes:

Generating, by the terminal, a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data;

The terminal replacing the first voiceprint model with the second voiceprint model includes:

If the difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold, the terminal replaces the first voiceprint model with the second voiceprint model.
The method for updating a wake-up voice of a voice assistant according to claim 9, wherein the method further comprises:

If the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, the terminal deletes the second voiceprint model and the first voice data.
The method for updating a wake-up voice of a voice assistant according to any one of claims 1-10, wherein the terminal updating the first voiceprint model in the terminal by using the first voice data comprises:

If the signal quality parameter of the first voice data is higher than a second preset threshold, the terminal uses the first voice data to update the first voiceprint model;

The signal quality parameter of the first voice data includes a signal-to-noise ratio of the first voice data.
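For illustration only, a signal-to-noise ratio gate of the kind recited above might look like the following sketch. It assumes that a noise-only segment is available alongside the speech segment and computes the ratio of mean powers in decibels; the function names and the parameter `min_snr_db` (standing in for the second preset threshold) are hypothetical.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average of the squared sample values."""
    return sum(v * v for v in samples) / len(samples)


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    p_signal = mean_power(signal)
    p_noise = mean_power(noise)
    if p_signal <= 0.0 or p_noise <= 0.0:
        raise ValueError("signal and noise power must both be positive")
    return 10.0 * math.log10(p_signal / p_noise)


def passes_quality_gate(signal: Sequence[float],
                        noise: Sequence[float],
                        min_snr_db: float) -> bool:
    """True only if the SNR is strictly higher than the preset threshold."""
    return snr_db(signal, noise) > min_snr_db


speech = [1.0, -1.0, 1.0, -1.0]
background = [0.1, -0.1, 0.1, -0.1]
usable = passes_quality_gate(speech, background, min_snr_db=15.0)
```

In this example the speech power is about 100 times the noise power, which is roughly 20 dB, so the recording passes a 15 dB gate but would fail a 25 dB gate.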
A terminal, wherein the terminal includes: a processor, a memory, and a display; the memory, the display, and the processor are coupled; the display is configured to display an image generated by the processor; and the memory is configured to store computer program code, related information of a voice assistant, a preset wake-up word registered in the terminal, and a first voiceprint model; the computer program code includes computer instructions, and when the processor executes the computer instructions,

The processor is configured to receive first voice data input by a user; determine whether the text corresponding to the first voice data matches the text of the preset wake-up word; if the text corresponding to the first voice data matches the text of the preset wake-up word, perform identity authentication on the user; and if the identity authentication is passed, update the first voiceprint model stored in the memory by using the first voice data;

Wherein, the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model characterizes the voiceprint characteristics of the preset wakeup word.
The terminal according to claim 12, wherein the processor, configured to perform identity authentication on a user, comprises:

The processor is configured to perform voiceprint verification on the first voice data using the first voiceprint model, and if the voiceprint verification is passed, the identity authentication is passed.
The terminal according to claim 12 or 13, wherein the processor is further configured to start the voice assistant when the identity authentication is passed; and receive second voice data through the voice assistant;

The processor is further configured to determine that the second voice data is a valid voice command after the identity authentication is passed and before the first voiceprint model is updated with the first voice data.
The terminal according to any one of claims 12 to 14, wherein the processor includes a coprocessor and a main processor; the coprocessor is used to monitor voice data; when the coprocessor detects the first voice data whose similarity with the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal; and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data.
The terminal according to claim 12, wherein the processor is further configured to: before performing identity authentication on the user, perform voiceprint verification on the first voice data using the first voiceprint model; if the first voice data does not pass the voiceprint verification, perform text verification on the voice data received within a first preset time; and if the processor receives, within the first preset time, second voice data and at least one voice data that matches the text of the preset wake-up word, perform identity authentication on the user;

The text corresponding to the second voice data includes a preset keyword.
The terminal according to claim 16, wherein the processor, configured to perform identity authentication on the user, comprises:

The processor is configured to control the display to display an authentication interface; receive authentication information input by a user on the authentication interface displayed on the display; and perform user authentication on the user according to the authentication information.
The terminal according to claim 16 or 17, wherein the processor comprises a coprocessor and a main processor; the coprocessor is used to monitor voice data; when the coprocessor detects the first voice data whose similarity with the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal; and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data;

The coprocessor is further configured to monitor the voice data in the first preset time, and notify the main processor to determine whether the voice data received in the first preset time includes second voice data and at least one voice data matching the text of the preset wake-up word, wherein the text corresponding to the second voice data contains a preset keyword.
The terminal according to any one of claims 12 to 18, wherein the preset wake-up word stored in the memory includes at least two registered voice data, the at least two registered voice data are recorded by the processor when registering the preset wake-up word, and the first voiceprint model is generated by the processor according to the at least two registered voice data;

The updating the first voiceprint model by using the first voice data includes:

The processor is configured to use the first voice data to replace third voice data in the at least two registered voice data to obtain updated at least two registered voice data, wherein a signal quality parameter of the third voice data is lower than signal quality parameters of the other voice data in the at least two registered voice data; generate a second voiceprint model according to the updated at least two registered voice data; and use the second voiceprint model to replace the first voiceprint model, wherein the second voiceprint model is used to characterize voiceprint features of the updated at least two registered voice data.
The terminal according to claim 19, wherein a first voiceprint threshold is further stored in the memory, and the first voiceprint threshold is generated based on the first voiceprint model and the at least two registered voice data;

The processor is further configured to, after generating the second voiceprint model according to the updated at least two registered voice data and before using the second voiceprint model to replace the first voiceprint model, generate a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data;

The processor configured to replace the first voiceprint model with the second voiceprint model includes:

The processor is configured to use the second voiceprint model to replace the first voiceprint model if a difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold.
The terminal according to claim 20, wherein the processor is further configured to: if a difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, delete the second voiceprint model and the first voice data.
The terminal according to any one of claims 12-21, wherein the processor is configured to update the first voiceprint model in the terminal by using the first voice data, comprising:

The processor is configured to update the first voiceprint model by using the first voice data if a signal quality parameter of the first voice data is higher than a second preset threshold;

The signal quality parameter of the first voice data includes a signal-to-noise ratio of the first voice data.
A computer storage medium, characterized in that the computer storage medium includes computer instructions, and when the computer instructions are run on a terminal, the terminal is caused to execute the method for updating a wake-up voice of a voice assistant by a terminal according to any one of claims 1-11.
A computer program product, characterized in that when the computer program product is run on a computer, the computer is caused to execute the method for updating a wake-up voice of a voice assistant by a terminal according to any one of claims 1-11.