WO2020019176A1 - Method for updating wake-up voice of voice assistant by terminal, and terminal - Google Patents

Method for updating wake-up voice of voice assistant by terminal, and terminal

Info

Publication number
WO2020019176A1
WO2020019176A1 PCT/CN2018/096917 CN2018096917W WO2020019176A1 WO 2020019176 A1 WO2020019176 A1 WO 2020019176A1 CN 2018096917 W CN2018096917 W CN 2018096917W WO 2020019176 A1 WO2020019176 A1 WO 2020019176A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice data
terminal
voice
voiceprint
wake
Prior art date
Application number
PCT/CN2018/096917
Other languages
French (fr)
Chinese (zh)
Inventor
许军
Original Assignee
华为技术有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 华为技术有限公司 filed Critical 华为技术有限公司
Priority to PCT/CN2018/096917 priority Critical patent/WO2020019176A1/en
Priority to CN201880089912.7A priority patent/CN111742361B/en
Publication of WO2020019176A1 publication Critical patent/WO2020019176A1/en

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • the embodiments of the present application relate to the technical field of voice control, and in particular, to a method and a terminal for updating a wake-up voice of a voice assistant by a terminal.
  • Voice assistant is an important application for mobile phones. Voice assistants can intelligently interact with users for intelligent dialogue and instant Q & A. In addition, the voice assistant can also recognize the user's voice command and cause the mobile phone to execute the event corresponding to the voice command. For example, if the voice assistant receives and recognizes the voice command "make a call to Bob" input by the user, the mobile phone can automatically make a call to the contact Bob.
  • Usually, the voice assistant is dormant. A user who wants to use the voice assistant must first wake it up. Before voice wake-up can be used, the user needs to register a wake-up word (i.e., a wake-up voice) in the mobile phone to wake up the voice assistant.
  • the mobile phone can generate a voiceprint model that can characterize the voiceprint of the wakeword according to the wakeword input by the user.
  • The voice wake-up process may include the following: the mobile phone monitors voice data through a low-power digital signal processor (DSP). When the DSP detects that the similarity between the voice data and the wake-up word satisfies a certain condition, the DSP delivers the monitored voice data to an application processor (AP). The AP performs text verification and voiceprint verification on the voice data to determine whether the voice data matches the generated voiceprint model. When the voice data matches the voiceprint model, the phone can start the voice assistant.
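The two-stage wake-up process described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function `try_wake`, the class `WakeResult`, the callables passed in, and both numeric thresholds are hypothetical names and values chosen for the example. The text only says that the similarity must satisfy "a certain condition"; a simple threshold comparison is assumed here.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class WakeResult:
    forwarded_to_ap: bool  # the low-power stage passed the audio on
    text_ok: bool          # the transcript equals the wake-up word
    voiceprint_ok: bool    # the speaker matches the registered voiceprint model

    @property
    def assistant_started(self) -> bool:
        return self.forwarded_to_ap and self.text_ok and self.voiceprint_ok


def try_wake(
    audio: bytes,
    dsp_similarity: Callable[[bytes], float],
    transcribe: Callable[[bytes], str],
    voiceprint_score: Callable[[bytes], float],
    wake_text: str,
    dsp_threshold: float = 0.5,         # assumed value, not from the source
    voiceprint_threshold: float = 0.7,  # assumed value, not from the source
) -> WakeResult:
    # Stage 1 (DSP role): cheap, always-on similarity check.
    if dsp_similarity(audio) < dsp_threshold:
        return WakeResult(False, False, False)
    # Stage 2 (AP role): text verification, then voiceprint verification.
    text_ok = transcribe(audio).strip().lower() == wake_text.strip().lower()
    voiceprint_ok = text_ok and voiceprint_score(audio) >= voiceprint_threshold
    return WakeResult(True, text_ok, voiceprint_ok)
```

Splitting the work this way keeps the expensive checks off the always-on path: the second stage runs only for audio that already resembles the wake-up word.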
  • The wake-up word is rarely re-registered (i.e., updated).
  • The wake-up word registered in the mobile phone is only voice data recorded by the user in a particular noise scene and in the user's physical state at that time. Changes in the user's physical state and in the surrounding noise scene affect the voice data the user produces. Therefore, when the user's physical state and/or noise scene changes, continuing to use the originally registered wake-up word for voice wake-up reduces the voice wake-up rate of the mobile phone and increases its false wake-up rate.
  • Embodiments of the present application provide a method and a terminal for updating a wake-up voice of a voice assistant by a terminal, which can update a wake-up voice of the terminal in real time, thereby improving a voice wake-up rate of the terminal performing a voice wake-up and reducing a false wake-up rate.
  • an embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • The method may include: the terminal receives the first voice data input by the user; the terminal determines whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal; if the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal authenticates the user; and if the identity authentication is passed, the terminal uses the first voice data to update the first voiceprint model in the terminal.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of the preset wakeup word.
  • The first voice data is a wake-up voice for the voice assistant sent by a user who has passed the identity authentication.
  • Because the first voice data is user voice data acquired by the terminal in real time, the first voice data may reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located.
  • using the first voice data to update the voiceprint model of the terminal can increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • the first voice data is automatically acquired by the terminal during the voice wake-up process performed by the terminal, instead of prompting the user to manually re-register the wake-up word and receiving user input.
  • using the first voice data to update the voiceprint model can also simplify the process of updating the wake word.
  • the terminal performs identity authentication on the user. Specifically, the terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data passes the voiceprint verification, it means that the identity authentication is passed.
  • the terminal may obtain first voice data that passes text verification and voice print verification when the terminal performs voice wake-up. Then, the first voiceprint model in the terminal is updated by using the first voice data.
  • the first voice data is user voice data acquired by the terminal in real time; therefore, the first voice data may reflect a user's physical state and / or a real-time condition of a noise scene in which the user is located.
  • the first voice data passes the text check and voiceprint check; therefore, updating the voiceprint model of the terminal by using the first voice data can improve the voice wake-up rate and reduce the false wake-up rate of the terminal performing voice wake-up.
  • the terminal may start a voice assistant. After the voice assistant is started, the terminal may receive valid voice commands or may not receive valid voice commands through the voice assistant. The terminal may determine whether to use the first voice data to update the first voiceprint model by determining whether the terminal has received a valid voice command.
  • the method in the embodiment of the present application further includes: when identity authentication is passed, the terminal starts a voice assistant; the terminal receives the second voice data through the voice assistant; and the terminal determines that the second voice data is a valid voice command. In this way, after the identity authentication is passed, if the terminal determines that the second voice data is a valid voice command, the terminal may use the first voice data to update the first voiceprint model in the terminal.
  • the terminal uses the first voice data to update the first voiceprint model in the terminal only after the voice assistant is activated and receives a valid voice command for triggering the terminal to perform a corresponding function. If the terminal's voice assistant starts and receives a valid voice command, it means that the voice wake-up is a valid voice wake-up that matches the user's intention.
  • the voiceprint model of the terminal is updated by using the voice data that can reflect the user's true intentions and can successfully wake up the terminal, which can further increase the voice wake-up rate of the terminal to perform voice wake-up and reduce the false wake-up rate.
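The gating described above (update the voiceprint model only when the wake-up is followed by a valid voice command) can be sketched as follows. All names here (`handle_wake_attempt`, the callables, and the returned status strings) are hypothetical and introduced for illustration only.

```python
from typing import Callable, Optional


def handle_wake_attempt(
    first_voice: str,
    text_matches: Callable[[str], bool],
    voiceprint_passes: Callable[[str], bool],
    next_command: Callable[[], Optional[str]],
    is_valid_command: Callable[[str], bool],
    update_model: Callable[[str], None],
) -> str:
    """Return a status string describing what happened to this wake attempt."""
    if not text_matches(first_voice):
        return "ignored"
    if not voiceprint_passes(first_voice):
        return "auth_failed"
    # Identity authentication passed: the voice assistant starts here and
    # waits for the second voice data (the command that follows the wake-up).
    command = next_command()
    if command is None or not is_valid_command(command):
        # The wake-up did not lead to a valid command, so it is not used
        # as evidence for updating the voiceprint model.
        return "woke_without_update"
    update_model(first_voice)
    return "woke_and_updated"
```

Requiring a valid follow-up command filters out wake-ups that passed verification but did not match the user's intention, so only intentional wake-ups feed back into the model.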
  • The terminal includes a coprocessor and a main processor. The terminal uses the coprocessor to monitor voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When the main processor determines that the text corresponding to the first voice data matches the text of the preset wake-up word, it uses the first voiceprint model to perform voiceprint verification on the first voice data.
  • the coprocessor is a DSP and the main processor is an AP.
  • Before authenticating the user, the terminal may use the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data fails the voiceprint verification, the terminal performs text verification on the voice data received within a first preset time. If, within the first preset time, the terminal receives the second voice data and at least one voice data whose text matches the preset wake-up word, the terminal authenticates the user.
  • the text corresponding to the second voice data includes a preset keyword.
  • The second voice data may be voice data in which the user complains that the voice wake-up fails, such as "how to wake up", "how not", "not responding", "unable to wake up", and "voice wake up failed".
  • the terminal finds that the voiceprint verification of the first voice data fails. Subsequently, the terminal can receive at least one voice data that passes the text verification within the first preset time, which means that the user repeatedly wants to voice wake up the voice assistant of the terminal, but the voice wake up fails. In this case, if the terminal also receives the second voice data within the first preset time, it indicates that the user is dissatisfied with the result of the voice wake-up failure.
  • If the terminal receives the second voice data and at least one voice data that passes the text verification within the first preset time, the user evidently has a strong intention to wake up the voice assistant by voice; the repeated voice wake-up failures may be due to a large difference between the user's current physical state and the user's physical state when the wake-up word was registered. Because the received first voice data was sent by the user with a strong intention to wake up the voice assistant of the terminal, updating the voiceprint model of the terminal with this voice data, which reflects the user's true intention, can further increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • the first voice data is voice data of the user acquired by the terminal in real time; therefore, the first voice data may reflect a user's physical state and / or a real-time condition of a noise scene in which the user is located. Therefore, using the first voice data to update the voiceprint model of the terminal can improve the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up. Further, the received first voice data is obtained automatically by the terminal during the voice wake-up process performed by the terminal, instead of prompting the user to manually re-register the wake-up word and receiving user input. In this way, updating the voiceprint model by using the received first voice data can also simplify the process of updating the wake word.
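The fallback path described above (voiceprint verification fails, then the user retries and complains within the first preset time) can be sketched as a simple window check. This is an illustrative sketch under stated assumptions: the `Utterance` type, the function name, and the 10-second default window are hypothetical; the keyword list is taken from the example phrases given in the text.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence

# Example complaint phrases quoted in the description.
COMPLAINT_KEYWORDS = (
    "how to wake up",
    "how not",
    "not responding",
    "unable to wake up",
    "voice wake up failed",
)


@dataclass(frozen=True)
class Utterance:
    timestamp: float  # seconds, on the same clock as failed_at
    text: str         # transcript of the voice data


def should_request_authentication(
    failed_at: float,
    later: Iterable[Utterance],
    wake_text: str,
    window_s: float = 10.0,  # assumed length of the first preset time
    keywords: Sequence[str] = COMPLAINT_KEYWORDS,
) -> bool:
    """True if, within the window after a failed voiceprint check, the user
    both repeated the wake-up word and uttered a preset complaint keyword."""
    in_window = [u for u in later if 0.0 <= u.timestamp - failed_at <= window_s]
    wake = wake_text.strip().lower()
    retried = any(u.text.strip().lower() == wake for u in in_window)
    complained = any(k in u.text.lower() for u in in_window for k in keywords)
    return retried and complained
```

When this check returns true, the terminal would fall back to an explicit authentication step before using the first voice data to update the voiceprint model.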
  • The terminal authenticates the user as follows: the terminal displays an authentication interface; the terminal receives the authentication information entered by the user on the authentication interface; and the terminal authenticates the user based on the authentication information.
  • The terminal includes a coprocessor and a main processor. The terminal uses the coprocessor to monitor voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When the main processor determines that the text corresponding to the first voice data matches the text of the preset wake-up word, it uses the first voiceprint model to perform voiceprint verification on the first voice data.
  • The terminal uses the coprocessor to monitor the voice data in the first preset time, and notifies the main processor to determine whether the voice data received in the first preset time includes the second voice data and at least one voice data that matches the text of the preset wake-up word, where the text corresponding to the second voice data contains a preset keyword.
  • the coprocessor is a DSP and the main processor is an AP.
  • The preset wake-up word includes at least two registered voice data, which are recorded when the terminal registers the preset wake-up word, and the first voiceprint model is generated based on the at least two registered voice data.
  • After the terminal generates a new voiceprint model according to the first voice data, directly replacing the first voiceprint model with the new voiceprint model can improve the voice wake-up rate of the terminal performing voice wake-up.
  • However, directly replacing the first voiceprint model with a voiceprint model generated from the new voice data (i.e., the first voice data) may greatly increase the voice wake-up rate, which may correspondingly increase the false wake-up rate of the terminal performing voice wake-up.
  • The method for the terminal to update the first voiceprint model in the terminal by using the first voice data may include: the terminal uses the first voice data to replace the third voice data in the at least two registered voice data to obtain updated at least two registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data in the at least two registered voice data; the terminal generates a second voiceprint model according to the updated at least two registered voice data; and the terminal uses the second voiceprint model to replace the first voiceprint model.
  • the second voiceprint model is used to characterize the voiceprint features of the at least two registered voice data after the update.
  • the terminal uses the first voice data to replace part of the voice data in the at least two registered voice data, such as the third voice data; instead of generating the second voiceprint model completely based on the first voice data.
  • the voice wake-up rate of the terminal performing voice wake-up can be relatively stably improved.
  • the false wake-up rate of the terminal performing voice wake-up can be reduced.
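The partial replacement described above can be sketched as follows. This is a toy illustration only: the `Sample` type and both functions are hypothetical, the signal-to-noise ratio is assumed as the signal quality parameter, and averaging feature vectors is a stand-in for whatever voiceprint modelling the terminal actually uses, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Sample:
    features: Tuple[float, ...]  # toy voiceprint feature vector
    snr_db: float                # signal quality parameter (assumed: SNR in dB)


def replace_weakest(registered: Sequence[Sample], new_sample: Sample) -> List[Sample]:
    """Swap the lowest-quality registered sample (the 'third voice data')
    for the newly captured one; all other registered samples are kept."""
    if len(registered) < 2:
        raise ValueError("at least two registered samples are expected")
    weakest = min(range(len(registered)), key=lambda i: registered[i].snr_db)
    updated = list(registered)
    updated[weakest] = new_sample
    return updated


def build_model(samples: Sequence[Sample]) -> Tuple[float, ...]:
    """Stand-in model: the element-wise mean of the samples' feature vectors."""
    n = len(samples)
    return tuple(sum(column) / n for column in zip(*(s.features for s in samples)))
```

Because only one registered sample changes per update, the regenerated model stays anchored to the remaining registration data, which is the stability argument made in the text.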
  • The terminal may generate a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data; if the difference between the second voiceprint threshold and the first voiceprint threshold is less than the first preset threshold, the terminal replaces the first voiceprint model with the second voiceprint model.
  • If the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, the terminal may delete the second voiceprint model and the first voice data; that is, the second voiceprint model is not used to replace the first voiceprint model.
  • Rejecting the update when the second voiceprint threshold differs greatly from the first voiceprint threshold prevents the wake-up rate of the terminal performing voice wake-up from fluctuating greatly and affecting the user experience.
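The threshold comparison above can be sketched as follows. The text does not say how a voiceprint threshold is derived from a model and its registered voice data, so the derivation in `voiceprint_threshold` (lowest cosine similarity of any registered sample to the model, minus a margin) is purely an assumption made for illustration, as are all names and the use of an absolute difference.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def voiceprint_threshold(
    model: Sequence[float],
    samples: Sequence[Sequence[float]],
    margin: float = 0.05,  # assumed safety margin
) -> float:
    """Hypothetical derivation: the weakest registered sample's score
    against the model, lowered by a margin."""
    return min(cosine(model, s) for s in samples) - margin


def accept_new_model(
    first_threshold: float,
    second_threshold: float,
    max_difference: float,  # the 'first preset threshold' in the text
) -> bool:
    """Replace the model only if the thresholds differ by less than the limit;
    otherwise the new model and the new voice data are discarded."""
    return abs(second_threshold - first_threshold) < max_difference
```

A difference exactly equal to the limit is rejected, matching the "greater than or equal to" condition under which the second voiceprint model is deleted.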
  • Before updating the first voiceprint model with the first voice data, the terminal may first determine whether a signal quality parameter of the first voice data is higher than a second preset threshold.
  • the signal quality parameters of the voice data are used to characterize the signal quality of the voice data.
  • the signal quality parameter of the voice data may be a signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, it means that the signal quality of the first voice data is relatively high.
  • the terminal may update the first voiceprint model by using the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal may delete the first voice data.
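The signal quality gate above can be sketched with the signal-to-noise ratio as the quality parameter. The function names are hypothetical, and the decibel formula is the standard power-ratio definition rather than something stated in the text.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels: 10 * log10(P_signal / P_noise)."""
    if signal_power <= 0.0 or noise_power <= 0.0:
        raise ValueError("powers must be positive")
    return 10.0 * math.log10(signal_power / noise_power)


def should_use_for_update(sample_snr_db: float, min_snr_db: float) -> bool:
    """Keep the first voice data only if its quality is strictly higher than
    the second preset threshold; otherwise it is deleted."""
    return sample_snr_db > min_snr_db
```

For example, a recording whose signal power is 100 times its noise power has an SNR of about 20 dB and would pass a 15 dB gate, while a recording at exactly 15 dB would be discarded.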
  • an embodiment of the present application provides a terminal.
  • the terminal includes a storage unit, an input unit, a text verification unit, an identity authentication unit, and an update unit.
  • the storage unit stores a preset wake-up word registered in the terminal and a first voiceprint model.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of a preset wakeup word.
  • the input unit is configured to receive first voice data input by a user.
  • the text verification unit is configured to determine whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal.
  • An identity authentication unit is configured to authenticate the user if the text verification unit determines that the text corresponding to the first voice data matches the text of the preset wake-up word.
  • The updating unit is configured to use the first voice data to update the first voiceprint model in the terminal if the identity authentication unit determines that the identity authentication is passed.
  • The identity authentication unit is specifically configured to: use the first voiceprint model to perform voiceprint verification on the first voice data; if the voiceprint verification is passed, the identity authentication is passed.
  • the terminal further includes: a starting unit and a determining unit.
  • the starting unit is configured to start the voice assistant when the identity authentication unit determines that the identity authentication is passed.
  • the input unit is further configured to receive the second voice data through a voice assistant.
  • the determining unit is configured to determine that the second voice data received by the input unit is a valid voice command after the identity authentication unit passes the identity authentication.
  • An updating unit is configured to update the first voiceprint model with the first voice data after the determining unit determines that the second voice data is a valid voice command.
  • the terminal further includes: a voiceprint verification unit.
  • the voiceprint verification unit is configured to perform voiceprint verification on the first voice data using the first voiceprint model before the identity authentication unit authenticates the user.
  • the text verification unit is further configured to perform text verification on the voice data received by the input unit within a first preset time if the voiceprint verification unit determines that the first voice data fails the voiceprint verification.
  • the identity authentication unit is specifically configured to: if the text verification unit determines that the input unit receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time, authenticate the user.
  • the text corresponding to the second voice data includes a preset keyword.
  • the foregoing terminal further includes: a display unit.
  • the display unit is configured to display an authentication interface if the text verification unit determines that the input unit receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time.
  • the input unit is further configured to receive authentication information input by a user on an authentication interface displayed on the display unit.
  • the identity authentication unit is specifically configured to perform user identity verification on the user according to the identity verification information received by the input unit.
  • The preset wake-up word includes at least two registered voice data, which are recorded when the terminal registers the preset wake-up word, and the first voiceprint model is generated based on the at least two registered voice data.
  • the terminal also includes: a replacement unit and a generation unit.
  • the replacement unit is configured to replace the third voice data of the at least two registered voice data with the first voice data to obtain the updated at least two registered voice data.
  • The signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data in the at least two registered voice data.
  • a generating unit is configured to generate a second voiceprint model according to the updated at least two registered voice data obtained by the replacement unit.
  • the updating unit is configured to replace the first voiceprint model with the second voiceprint model generated by the generating unit, and the second voiceprint model is used to characterize the voiceprint features of the updated at least two registered voice data.
  • the storage unit is configured to save a first voiceprint threshold, and the first voiceprint threshold is generated by the generating unit according to the first voiceprint model and at least two registered voice data.
  • The generating unit is further configured to: after generating the second voiceprint model and before the updating unit replaces the first voiceprint model with the second voiceprint model, generate a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data.
  • the updating unit is specifically configured to use a second voiceprint model to replace the first voiceprint model if the difference between the second voiceprint threshold and the first voiceprint threshold generated by the generating unit is less than the first preset threshold.
  • the foregoing terminal further includes: a deleting unit.
  • the deleting unit is configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold and the first voiceprint threshold generated by the generating unit is greater than or equal to the first preset threshold.
  • the update unit is specifically configured to update the first voiceprint model by using the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold.
  • the signal quality parameter of the first voice data includes a signal-to-noise ratio of the first voice data.
  • an embodiment of the present application provides a terminal.
  • the terminal may include a processor, a memory, and a display.
  • the memory, display and processor are coupled.
  • the display is used to display images generated by the processor.
  • the memory is used to store computer program code, related information of the voice assistant, preset wake-up words registered in the terminal, and the first voiceprint model.
  • the computer program code includes computer instructions.
  • When the processor executes the computer instructions, the processor is configured to: receive the first voice data input by the user; determine whether the text corresponding to the first voice data matches the text of the preset wake-up word; if the text corresponding to the first voice data matches the text of the preset wake-up word, authenticate the user; and if the identity authentication is passed, update the first voiceprint model by using the first voice data.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of the preset wakeup word.
  • The foregoing processor may be further configured to perform voiceprint verification on the first voice data using the first voiceprint model. If the voiceprint verification is passed, the identity authentication is passed.
  • The foregoing processor may also be configured to: start the voice assistant when the identity authentication is passed; receive the second voice data through the voice assistant; after the identity authentication is passed, determine that the second voice data is a valid voice command; and after determining that the second voice data is a valid voice command, update the first voiceprint model in the terminal with the first voice data.
  • The processor includes a coprocessor and a main processor. The coprocessor monitors voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When the main processor determines that the text corresponding to the first voice data matches the text of the preset wake-up word, it uses the first voiceprint model to perform voiceprint verification on the first voice data.
  • The processor is further configured to: before authenticating the user, perform voiceprint verification on the first voice data using the first voiceprint model; if the first voice data does not pass the voiceprint verification, perform text verification on the voice data received within the first preset time; and if the second voice data and at least one voice data that matches the text of the preset wake-up word are received within the first preset time, authenticate the user.
  • the text corresponding to the second voice data includes a preset keyword.
  • The processor is further configured to control the display to display the authentication interface if the second voice data and at least one voice data that matches the text of the preset wake-up word are received within the first preset time.
  • the processor is further configured to receive the authentication information input by the user on the authentication interface displayed on the display; and perform user authentication on the user according to the authentication information.
  • The foregoing processor includes a coprocessor and a main processor. The coprocessor monitors voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal.
  • When the main processor determines that the text corresponding to the first voice data matches the text of the preset wake-up word, it uses the first voiceprint model to perform voiceprint verification on the first voice data.
  • The coprocessor monitors the voice data in the first preset time, and notifies the main processor to determine whether the voice data received in the first preset time includes the second voice data and at least one voice data that matches the text of the preset wake-up word.
  • the text corresponding to the second voice data contains a preset keyword.
  • The preset wake-up word stored in the memory includes at least two registered voice data, which are recorded when the processor registers the preset wake-up word.
  • The first voiceprint model is generated by the processor based on the at least two registered voice data.
  • The processor is further configured to: use the first voice data to replace the third voice data in the at least two registered voice data to obtain updated at least two registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data in the at least two registered voice data; generate a second voiceprint model based on the updated at least two registered voice data; and replace the first voiceprint model with the second voiceprint model, where the second voiceprint model is used to characterize the voiceprint features of the updated at least two registered voice data.
  • a first voiceprint threshold is also stored in the memory, and the first voiceprint threshold is generated by the processor according to the first voiceprint model and at least two registered voice data.
  • The processor is further configured to: after generating the second voiceprint model and before replacing the first voiceprint model with the second voiceprint model, generate a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data; and if the difference between the second voiceprint threshold and the first voiceprint threshold is less than the first preset threshold, replace the first voiceprint model with the second voiceprint model.
  • The processor is further configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold.
  • the processor is further configured to update the first voiceprint model by using the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold.
  • the signal quality parameter of the first voice data includes a signal-to-noise ratio of the first voice data.
  • An embodiment of the present application provides a computer storage medium, where the computer storage medium includes computer instructions, and when the computer instructions are run on a terminal, the terminal is caused to execute the method according to the first aspect and any one of its possible design manners.
  • an embodiment of the present application provides a computer program product, and when the computer program product runs on a computer, the computer is caused to execute the method according to the first aspect and any one of possible design manners.
  • FIG. 1 is a first schematic diagram of a display interface example of a terminal according to an embodiment of the present application
  • FIG. 2 is a second schematic diagram of a display interface example of a terminal according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a hardware structure of a terminal according to an embodiment of the present application.
  • FIG. 4A is a first flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application.
  • FIG. 4B is a third schematic diagram of a display interface example of a terminal according to an embodiment of the present application.
  • FIG. 5A is a second flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application.
  • FIG. 5B is a fourth schematic view of an example of a display interface of a terminal according to an embodiment of the present application.
  • FIG. 6 is a third flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application
  • FIG. 7 is a fourth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application
  • FIG. 8 is a fifth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application
  • FIG. 9 is a sixth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application.
  • FIG. 10 is a fifth schematic diagram of a display interface example of a terminal according to an embodiment of the present application.
  • FIG. 11 is a seventh flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application.
  • FIG. 12 is an eighth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application.
  • FIG. 13 is a first schematic structural composition diagram of a terminal according to an embodiment of the present application.
  • FIG. 14 is a second schematic diagram of the structure and composition of a terminal according to an embodiment of the present application.
  • FIG. 15 is a third schematic structural diagram of a terminal according to an embodiment of the present application.
  • Embodiments of the present application provide a method and a terminal for updating a wake-up voice of a voice assistant by a terminal, which can be applied to a process in which the terminal performs a voice wake-up in response to voice data input by a user.
  • Before performing the voice wake-up, the terminal may receive a preset wake-up word registered by the user.
  • the preset wake-up word is used to wake up the voice assistant in the terminal, so that the terminal can provide the user with voice control services through the voice assistant.
  • Waking up the voice assistant, as described in the embodiments of the present application, means that the terminal starts the voice assistant in response to voice data uttered by the user.
  • The voice control service means that, after the terminal's voice assistant is started, the user can trigger the terminal to execute a corresponding event by sending a voice command (i.e., voice data) to the voice assistant.
  • the preset wake-up word in the embodiment of the present application is a piece of voice data.
  • the voice data is a wake-up voice used to wake up the voice assistant.
  • the voice assistant may be an application (Application, APP) installed in the terminal.
  • the voice assistant may be an embedded application in the terminal (that is, a system application of the terminal) or a downloadable application.
  • the embedded application program is an application program provided as part of a terminal (such as a mobile phone) implementation.
  • the embedded application can be a Settings application, a Short Message application, a Camera application, and so on.
  • a downloadable application is an application that can provide its own Internet Protocol Multimedia Subsystem (IMS) connection.
  • The downloadable application may be an application installed in the terminal in advance, or a third-party application downloaded and installed in the terminal by the user.
  • the downloadable application may be a "WeChat” application, an "Alipay” application, a "Mail” application, and the like.
  • the mobile phone 100 shown in FIG. 1 is used as an example to describe the process of registering a preset wake-up word by a terminal:
  • The mobile phone 100 can receive a user's click operation (such as a single-click operation) on the "Settings" application icon.
  • the mobile phone 100 may display the setting interface 101 shown in (a) in FIG. 1.
  • The setting interface 101 may include an "airplane mode" option, a "WLAN" option, a "Bluetooth" option, a "mobile network" option, a "smart assistance" option 102, and the like.
  • For the "airplane mode" option, the "WLAN" option, the "Bluetooth" option, and the "mobile network" option, reference may be made to specific descriptions in conventional technologies, which are not described herein in the embodiments of the present application.
  • The mobile phone 100 may receive a user's click operation (such as a single-click operation) on the "smart assistance" option 102.
  • the mobile phone 100 may display the smart assistance interface 103 shown in (b) of FIG. 1.
  • The smart assistance interface 103 includes a "gesture control" option 104 and a "voice control" option 105.
  • the “gesture control” option 104 is used to manage a user gesture that triggers the mobile phone 100 to execute a corresponding event.
  • the “voice control” option 105 is used to manage a voice wake-up function of the mobile phone 100.
  • the mobile phone 100 may receive a user's click operation on the “voice control” option 105, and the mobile phone 100 may display the voice control interface 106 shown in (c) of FIG. 1.
  • The voice control interface 106 includes a "voice wakeup" option 107 and an "incoming call voice control" option 108.
  • The "voice wakeup" option 107 is used to enable or disable the voice wakeup function of the mobile phone 100.
  • The "incoming call voice control" option 108 is used to trigger the mobile phone 100 to enable or disable the voice wake-up function when the mobile phone 100 receives an incoming call.
  • For example, assume that the "incoming call voice control" option 108 of the mobile phone 100 is turned on. When the mobile phone 100 receives an incoming call from another terminal and issues an incoming call reminder, if the mobile phone 100 recognizes the voice data "answer the call" entered by the owner, the mobile phone 100 can automatically answer the call; if the mobile phone 100 recognizes the voice data "hang up the phone" entered by the owner, the mobile phone 100 can automatically reject the call.
  • The mobile phone 100 may receive a user's click operation (such as a single-click operation) on the "voice wakeup" option 107.
  • The mobile phone 100 may display the voice wakeup interface 109 shown in (d) of FIG. 1.
  • the voice wakeup interface 109 includes a "voice wakeup” switch 110, a "find a phone” option 111, a "how to make a call” option 112, a “wake word” option 113, and the like.
  • the “voice wakeup” switch 110 is used to trigger the mobile phone 100 to enable or disable the voice wakeup function.
  • The "find a phone" option 111 and the "how to make a call" option 112 are used to indicate the voice control functions of the mobile phone 100 that are available after the voice assistant of the mobile phone 100 is activated.
  • The "find a phone" option 111 is used to indicate that, after the voice assistant of the mobile phone 100 is activated, the voice assistant can respond to the user's voice data "Where are you?" so that the user can find the mobile phone 100.
  • the "how to make a call” option 112 is used to indicate that the voice assistant of the mobile phone 100 can automatically make a call to the contact Bob in response to the user's voice data "call Bob" after the voice assistant of the mobile phone 100 is activated.
  • the “wake word” option 113 is used to register a wake up word for the mobile phone 100 to wake up the mobile phone 100 (such as the voice assistant of the mobile phone 100). Before the user has registered a custom wake-up word in the mobile phone 100, the mobile phone 100 may indicate a default wake-up word to the user. For example, it is assumed that the default wake-up word of the mobile phone 100 is "my little k".
  • The mobile phone 100 may receive a user's click operation (such as a single-click operation) on the "wake word" option 113 shown in (d) in FIG. 1.
  • the mobile phone 100 may display the default wake word registration interface 201 shown in (a) of FIG. 2.
  • the default wakeup word registration interface 201 may include a recording progress bar 202, a "custom wakeup word” option 203, a "microphone” option 204, and a recording prompt message 205.
  • the “microphone” option 204 is used to trigger the mobile phone 100 to start recording voice data as the wake-up word.
  • the recording progress bar 202 is used to display the progress of the mobile phone 100 recording the wake-up word.
  • the recording prompt information 205 is used to indicate a default wake-up word of the mobile phone 100.
  • The recording prompt information 205 may be "Please help the mobile phone to learn the wake word (my little k), click and say 'my little k'".
  • The default wake-up word registration interface 201 may further include a recording prompt message "Please record in a quiet environment, about 30 cm away from the mobile phone!".
  • the default wake-up word registration interface 201 further includes a "Cancel” button 206 and an "OK” button 207.
  • the “OK” button 207 is used to trigger the mobile phone 100 to save the recorded wake-up word.
  • the “Cancel” button 206 is used to trigger the mobile phone to cancel the registration of the wake-up word, and display the voice wake-up interface 109 shown in (d) of FIG. 1.
  • the mobile phone 100 can start recording voice data input by the user. After receiving the voice data (recorded as voice data 1) input by the user, the mobile phone 100 can determine whether the voice data 1 meets a preset condition. If the voice data 1 does not satisfy the preset condition, the mobile phone 100 may delete the voice data 1 and re-display the default wake-up word registration interface 201 shown in (a) of FIG. 2. If the voice data 1 meets a preset condition, the mobile phone 100 can save the voice data 1.
  • The voice data 1 meeting the preset condition may specifically be: the text information corresponding to the voice data 1 is the text information "my little k" of the default wake-up word, and the signal-to-noise ratio of the voice data 1 is higher than a preset threshold.
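  • The preset condition can be sketched as a simple two-part check. This is an illustrative sketch only: the function name, the case-insensitive text comparison, and the default threshold of 15 dB are assumptions that do not come from the source, and the recognized text and signal-to-noise ratio are assumed to have been computed elsewhere.

```python
def meets_preset_condition(recognized_text: str,
                           snr_db: float,
                           wake_word_text: str = "my little k",
                           snr_threshold_db: float = 15.0) -> bool:
    """Check a recording against the two-part preset condition.

    Part 1: the recognized text equals the wake-up word text
            (compared case-insensitively here, which is an assumption).
    Part 2: the signal-to-noise ratio is higher than the preset threshold
            (15 dB is a hypothetical default).
    """
    same_text = recognized_text.strip().lower() == wake_word_text.strip().lower()
    return same_text and snr_db > snr_threshold_db
```

  • A recording that fails either part would be deleted and the registration interface shown again, as described above.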
  • After the mobile phone 100 receives the voice data 1 that is input by the user and meets the preset condition, it can generate, based on the voice data 1, a voiceprint model used for voiceprint verification when the voice assistant is woken up, and generate a voiceprint threshold according to the voice data 1 and the voiceprint model.
  • the voiceprint model can characterize the voiceprint characteristics of wake words registered by the user.
  • the voiceprint model is equivalent to a function. Different voiceprint models can be generated based on different speech data. That is, the mobile phone 100 can generate different voiceprint models according to different wake-up words registered by the same user. Different users registering the same wake-up word with the mobile phone 100 can also generate different voiceprint models.
  • The mobile phone 100 may use the voice data 1 (that is, the voice data that is input by the user when registering the wake-up word and meets the preset conditions) as an input value, and substitute it into the voiceprint model to obtain a voiceprint value (such as the voiceprint value a).
  • the terminal can record multiple voice data that meet preset conditions.
  • the terminal may generate a voiceprint model for performing voiceprint verification when the voice assistant is awakened based on a plurality of voice data satisfying preset conditions.
  • the mobile phone 100 may prompt the user to record the voice data again after the voice data 1 meets a preset condition, and the voice data 1 is saved.
  • the “custom wake word” option 203 is used to trigger the mobile phone 100 to display a wake word input interface.
  • The mobile phone 100 may display the wake-up word input interface 208 shown in (b) of FIG. 2 in response to a user's click operation (such as a single-click operation) on the "custom wake-up word" option 203 shown in (a) of FIG. 2.
  • the wake-up word input interface 208 may include a “cancel” button 209, an “OK” button 210, a “wake-up word input box” 211, and a wake-up word suggestion 212.
  • The "Cancel" button 209 is used to trigger the mobile phone to cancel the customized wake-up word and display the default wake-up word registration interface 201 shown in (a) of FIG. 2.
  • the “wake word input box” 211 is used to receive a custom wake word input by a user.
  • the "OK” button 210 is used to save a custom wake-up word entered by the user in the "wake-up word input box” 211.
  • The wake-up word suggestion 212 is used to prompt the user with the mobile phone's requirements for a custom wake-up word.
  • The mobile phone 100 may display a custom wakeup word registration interface 213 shown in (d) of FIG. 2 in response to a user's click operation (such as a single-click operation) on the "OK" button 210 shown in (c) of FIG. 2, so that the user can register a custom wake-up word on the custom wake-up word registration interface 213.
  • the method for a user to register a custom wake-up word on the custom wake-up word registration interface 213 is the same as the method for registering a default wake-up word on the default wake-up word registration interface 201, which is not described in the embodiment of the present application.
  • The mobile phone 100 may display the custom wake-up word registration interface 213 shown in (d) of FIG. 2.
  • The above-mentioned smart assistance may also be referred to as an auxiliary function, the above-mentioned voice control may also be referred to as a voice assistant, and the above-mentioned voice wakeup may also be referred to as a wake-up function.
  • The manner in which the user triggers the terminal to display the wake-up word registration interface includes, but is not limited to, the user's "Settings - smart assistance - voice control - voice wakeup - wake word" operation.
  • For example, the manner in which the user triggers the terminal to display the wake-up word registration interface may instead be "Settings - voice assistant - voice wakeup - wake word".
  • The following uses an example in which the wake-up word of the mobile phone 100 is the default wake-up word "my little k" to describe the voice wake-up process of the mobile phone 100:
  • The DSP of the mobile phone 100 may deliver the monitored voice data 2 to the AP.
  • the AP performs text verification on the voice data 2.
  • the AP may use the voice data 2 as an input value and substitute it into the voiceprint model of the mobile phone 100 to obtain a voiceprint value (voiceprint value b). If the difference between the voiceprint value b and the voiceprint threshold (ie, the voiceprint value a) is less than a preset threshold, the AP may determine that the voice data 2 matches the wake-up word registered by the user.
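  • The two verification steps can be sketched as follows. This is a toy sketch under stated assumptions: `toy_voiceprint_model` is a hypothetical stand-in that maps voice samples to one number (a real voiceprint model is far more involved), the text of the voice data is assumed to be recognized elsewhere, and running text verification before voiceprint verification is an assumed ordering.

```python
from typing import Callable, Sequence

# A voiceprint model is treated as a function: voice data in, voiceprint value out.
VoiceprintModel = Callable[[Sequence[float]], float]


def toy_voiceprint_model(voice_data: Sequence[float]) -> float:
    """Hypothetical stand-in model: mean absolute amplitude of the samples."""
    return sum(abs(sample) for sample in voice_data) / len(voice_data)


def should_wake(recognized_text: str,
                voice_data: Sequence[float],
                wake_word_text: str,
                model: VoiceprintModel,
                voiceprint_value_a: float,
                preset_threshold: float) -> bool:
    """Text verification followed by voiceprint verification."""
    # Text verification: the utterance must be the registered wake-up word.
    if recognized_text.strip().lower() != wake_word_text.strip().lower():
        return False
    # Voiceprint verification: value b must be close to the stored value a.
    voiceprint_value_b = model(voice_data)
    return abs(voiceprint_value_b - voiceprint_value_a) < preset_threshold
```

  • Here `voiceprint_value_a` plays the role of the voiceprint threshold obtained at registration, and `preset_threshold` bounds how far the voiceprint value b may deviate from it.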
  • some mobile phones can periodically remind the user to re-register the wake-up word.
  • the process of manually registering the wake-up word is cumbersome, and the manual registration of the wake-up word multiple times will waste the user's time and affect the user experience.
  • the terminal may obtain a valid wake-up word in the process of performing a voice wake-up, and the terminal uses the valid wake-up word to update the registered wake-up word of the user.
  • The effective wake-up word in the embodiments of the present application may include voice data that successfully wakes up the terminal.
  • the terminal automatically obtains a valid wake-up word to update the registered wake-up word of the user, which can omit the tedious operation of the user when manually re-registering the wake-up word.
  • The principle of the method for updating the wake-up voice of the voice assistant provided in the embodiments of the present application is as follows. Because the effective wake-up word is voice data obtained by the terminal while performing voice wake-up, the effective wake-up word is voice data related to the user's current physical state and to the noise scene that the user is currently in. In addition, because the effective wake-up word can successfully wake up the terminal, the degree of matching between the effective wake-up word and the wake-up word registered by the user satisfies the condition for voice wake-up.
  • If the terminal uses the effective wake-up word to update the wake-up word registered by the user and then performs voice wake-up with the updated wake-up word, it can adapt to the user's physical state and/or the noise scene in which the user is located. This can increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • The terminal in the embodiments of the present application may be a portable computer (such as a mobile phone), a notebook computer, a personal computer (PC), a wearable electronic device (such as a smart watch), a tablet computer, an augmented reality (AR)/virtual reality (VR) device, an on-board computer, or the like. The following embodiments do not specifically limit the form of the terminal.
  • FIG. 3 shows a structural block diagram of a terminal 300 provided by an embodiment of the present application.
  • The terminal 300 may include a processor 310, an external memory interface 320, an internal memory 321, a USB interface 330, a charge management module 340, a power management module 341, a battery 342, an antenna 1, an antenna 2, a radio frequency module 350, a communication module 360, an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, a headphone interface 370D, a sensor module 380, a button 390, a motor 391, an indicator 392, a camera 393, a display 394, and a SIM card interface 395.
  • the sensor module can include pressure sensor 380A, gyroscope sensor 380B, barometric pressure sensor 380C, magnetic sensor 380D, acceleration sensor 380E, distance sensor 380F, proximity light sensor 380G, fingerprint sensor 380H, temperature sensor 380J, touch sensor 380K, ambient light sensor 380L, bone conduction sensor, etc.
  • the terminal 300 shown in FIG. 3 is only an example of the terminal.
  • the structure shown in FIG. 3 does not limit the terminal 300. It may include more or fewer parts than shown, or some parts may be combined, or some parts may be split, or different parts may be arranged.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 310 may include one or more processing units.
  • For example, the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural network processing unit (NPU).
  • different processing units can be independent devices or integrated in the same processor.
  • The DSP can monitor voice data in real time.
  • The monitored voice data can be handed over to the AP.
  • The AP performs text verification and voiceprint verification on the voice data.
  • If the voice data passes the verification, the terminal can start the voice assistant.
  • the controller may be a decision maker that directs the various components of the terminal 300 to coordinate work according to the instructions. It is the nerve center and command center of the terminal 300.
  • the controller generates operation control signals according to the instruction operation code and timing signals, and completes the control of fetching and executing the instructions.
  • the processor 310 may further include a memory for storing instructions and data.
  • The memory in the processor is a cache memory. It can hold instructions or data that the processor has just used or uses cyclically. If the processor needs to use the instructions or data again, they can be fetched directly from this memory. Repeated accesses are avoided, the waiting time of the processor is reduced, and the efficiency of the system is improved.
  • the processor 310 may include an interface.
  • The interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
  • The I2C interface is a two-way synchronous serial bus, including a serial data line (Serial Data Line, SDA) and a serial clock line (Serial Clock Line, SCL).
  • the processor may include multiple sets of I2C buses.
  • the processor can be coupled to touch sensors, chargers, flashes, cameras, etc. through different I2C bus interfaces.
  • the processor may couple the touch sensor through the I2C interface, so that the processor and the touch sensor communicate through the I2C bus interface to implement the touch function of the terminal 300.
  • the I2S interface can be used for audio communication.
  • the processor may include multiple sets of I2S buses.
  • the processor may be coupled to the audio module through an I2S bus to implement communication between the processor and the audio module.
  • the audio module can transmit audio signals to the communication module through the I2S interface, so as to implement the function of receiving calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communications, sampling, quantizing, and encoding analog signals.
  • the audio module and the communication module may be coupled through a PCM bus interface.
  • the audio module can also transmit audio signals to the communication module through the PCM interface, so as to implement the function of receiving calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication, and the sampling rates of the two interfaces are different.
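  • The three PCM steps mentioned above (sampling, quantizing, encoding) can be sketched as follows. This is an illustrative sketch, not the behavior of the PCM interface of the terminal 300: the 8000 Hz sample rate, the 16-bit little-endian sample format, the test tone, and the function name are all assumptions.

```python
import math


def pcm_encode_sine(frequency_hz: float = 440.0,
                    sample_rate_hz: int = 8000,
                    num_samples: int = 80,
                    amplitude: float = 0.5) -> bytes:
    """Sample, quantize, and encode a sine tone as 16-bit linear PCM."""
    encoded = bytearray()
    for n in range(num_samples):
        # Sampling: evaluate the "analog" waveform at the instant n / sample_rate.
        value = amplitude * math.sin(2.0 * math.pi * frequency_hz * n / sample_rate_hz)
        # Quantizing: map the range [-1.0, 1.0] onto signed 16-bit integer levels.
        level = max(-32768, min(32767, round(value * 32767)))
        # Encoding: store each level as two bytes, little-endian, signed.
        encoded += level.to_bytes(2, "little", signed=True)
    return bytes(encoded)
```

  • With the default arguments, the result holds 80 samples of 2 bytes each, that is, 10 ms of audio at the assumed sample rate.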
  • the UART interface is a universal serial data bus for asynchronous communication. This bus is a two-way communication bus. It converts the data to be transferred between serial and parallel communications.
  • a UART interface is typically used to connect the processor and the communication module 360.
  • the processor communicates with the Bluetooth module through a UART interface to implement the Bluetooth function.
  • the audio module can transmit audio signals to the communication module through the UART interface, so as to implement the function of playing music through a Bluetooth headset.
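  • The conversion between parallel and serial data performed by a UART can be sketched as follows. The sketch assumes the common "8N1" frame format (one start bit, eight data bits sent least significant bit first, no parity, one stop bit); the source does not state which frame format the terminal 300 uses, and the function names are hypothetical.

```python
from typing import List


def uart_frame(byte: int) -> List[int]:
    """Parallel to serial: turn one byte into a 10-bit 8N1 frame."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    data_bits = [(byte >> i) & 1 for i in range(8)]  # least significant bit first
    return [0] + data_bits + [1]  # start bit (0), data bits, stop bit (1)


def uart_unframe(bits: List[int]) -> int:
    """Serial to parallel: recover the byte from a 10-bit 8N1 frame."""
    if len(bits) != 10 or bits[0] != 0 or bits[9] != 1:
        raise ValueError("framing error")
    return sum(bit << i for i, bit in enumerate(bits[1:9]))
```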
  • the MIPI interface can be used to connect processors with peripheral devices such as displays, cameras, etc.
  • the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like.
  • the processor and the camera communicate through a CSI interface to implement a shooting function of the terminal 300.
  • the processor and the display screen communicate through a DSI interface to implement a display function of the terminal 300.
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface may be used to connect the processor with a camera, a display screen, a communication module, an audio module, a sensor, and the like.
  • GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 330 may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface can be used to connect a charger to charge the terminal 300, and can also be used to transfer data between the terminal 300 and a peripheral device. It can also be used to connect headphones and play audio through headphones. It can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic description, and does not constitute a limitation on the structure of the terminal 300.
  • the terminal 300 may use different interface connection modes or a combination of multiple interface connection modes in the embodiments of the present application.
  • the charging management module 340 is configured to receive a charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module may receive a charging input of a wired charger through a USB interface.
  • the charging management module may receive a wireless charging input through a wireless charging coil of the terminal 300. While the charging management module is charging the battery, it can also supply power to the terminal device through the power management module 341.
  • the power management module 341 is used to connect the battery 342, the charge management module 340, and the processor 310.
  • the power management module receives inputs from the battery and / or charge management module, and supplies power to a processor, an internal memory, an external memory, a display screen, a camera, and a communication module.
  • the power management module can also be used to monitor battery capacity, battery cycle times, battery health (leakage, impedance) and other parameters.
  • the power management module 341 may also be disposed in the processor 310.
  • the power management module 341 and the charge management module may also be provided in the same device.
  • The wireless communication function of the terminal 300 may be implemented by the antenna 1, the antenna 2, the radio frequency module 350, the communication module 360, a modem, and a baseband processor.
  • the antenna 1 and the antenna 2 are used for transmitting and receiving electromagnetic wave signals.
  • Each antenna in the terminal 300 may be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, a cellular network antenna can be multiplexed into a wireless LAN diversity antenna. In some embodiments, the antenna may be used in conjunction with a tuning switch.
  • the radio frequency module 350 may provide a communication processing module for a wireless communication solution including 2G / 3G / 4G / 5G and the like applied on the terminal 300. It may include at least one filter, switch, power amplifier, Low Noise Amplifier (LNA), and the like.
  • The radio frequency module receives electromagnetic waves from the antenna 1, processes the received electromagnetic waves by filtering, amplifying, and the like, and transmits them to the modem for demodulation.
  • The radio frequency module can also amplify the signal modulated by the modem and convert it into electromagnetic waves that are radiated through the antenna 1.
  • at least part of the functional modules of the radio frequency module 350 may be disposed in the processor 310. In some embodiments, at least part of the functional modules of the radio frequency module 350 may be provided in the same device as at least part of the modules of the processor 310.
  • the modem may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • The application processor outputs sound signals through audio equipment (including but not limited to speakers, receivers, etc.), or displays images or videos through a display screen.
  • the modem may be a separate device.
  • the modem may be independent of the processor and disposed in the same device as the radio frequency module or other functional modules.
  • The communication module 360 can provide wireless communication solutions applied to the terminal 300, including wireless local area network (WLAN) (such as a Wireless Fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR).
  • the communication module 360 may be one or more devices that integrate at least one communication processing module.
  • the communication module receives the electromagnetic wave through the antenna 2, frequency-modulates and filters the electromagnetic wave signal, and sends the processed signal to the processor.
  • The communication module 360 may also receive a signal to be transmitted from the processor, frequency-modulate it, amplify it, and convert it into electromagnetic waves that are radiated through the antenna 2.
  • the antenna 1 of the terminal 300 is coupled to a radio frequency module, and the antenna 2 is coupled to a communication module 360.
  • the wireless communication technology may include a Global System for Mobile Communications (GSM), a General Packet Radio Service (GPRS), a Code Division Multiple Access (CDMA), and a broadband Code Division Multiple Access (WCDMA), Time-Division Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC , FM, and / or IR technology.
  • The GNSS may include a Global Positioning System (GPS), a Global Navigation Satellite System (GLONASS), a BeiDou Navigation Satellite System (BDS), a Quasi-Zenith Satellite System (QZSS), and/or Satellite Based Augmentation Systems (SBAS).
  • the terminal 300 implements a display function through a GPU, a display screen 394, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display screen and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 310 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display 394 is used to display images, videos, and the like.
  • the display includes a display panel.
  • The display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini LED, Micro LED, Micro OLED, quantum dot light-emitting diodes (QLED), etc.
  • the terminal 300 may include one or N display screens, where N is a positive integer greater than 1.
  • the terminal 300 can implement a shooting function through an ISP, a camera 393, a video codec, a GPU, a display screen, and an application processor.
  • The ISP is used to process data fed back by the camera. For example, when a picture is taken, the shutter opens and light is transmitted through the lens to the light receiving element of the camera. The light signal is converted into an electrical signal, and the light receiving element passes the electrical signal to the ISP, which processes it and converts it into an image visible to the naked eye. The ISP can also optimize the image's noise, brightness, and skin tone, as well as the exposure, color temperature, and other parameters of the shooting scene. In some embodiments, an ISP may be provided in the camera 393.
  • the camera 393 is used to capture still images or videos.
  • An object generates an optical image through a lens and projects it onto a photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs digital image signals to the DSP for processing.
  • DSP converts digital image signals into image signals in standard RGB, YUV and other formats.
  • the terminal 300 may include one or N cameras, where N is a positive integer greater than 1.
  • The digital signal processor is used to process digital signals. In addition to digital image signals, it can also process other digital signals. For example, when the terminal 300 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency point energy.
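  • As a rough illustration of the Fourier transform mentioned above, the following sketch computes the energy of a single frequency bin with a direct discrete Fourier transform. The function name `bin_energy` and the test signal are assumptions made for illustration; the source does not describe how the DSP implements this step.

```python
import cmath
import math


def bin_energy(samples, k):
    """Return |X[k]|^2, the energy of DFT bin k for a real or complex sample list.

    Direct O(N) evaluation of one DFT bin; a real DSP would more likely use an
    FFT or a dedicated hardware block (assumption, not stated in the source).
    """
    n = len(samples)
    x_k = sum(s * cmath.exp(-2j * math.pi * k * i / n) for i, s in enumerate(samples))
    return abs(x_k) ** 2


# Hypothetical test signal: one full cosine cycle over 8 samples,
# so its energy is concentrated in bins 1 and 7.
tone = [math.cos(2 * math.pi * i / 8) for i in range(8)]
energy_at_tone = bin_energy(tone, 1)
energy_elsewhere = bin_energy(tone, 2)
```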
  • Video codecs are used to compress or decompress digital video.
  • the terminal 300 may support one or more codecs. In this way, the terminal 300 can play or record videos in multiple encoding formats, such as: MPEG1, MPEG2, MPEG3, MPEG4, and so on.
  • The NPU is a neural network (NN) computing processor.
  • the NPU can quickly process input information and continuously learn.
  • Through the NPU, intelligent recognition applications of the terminal 300 can be implemented, such as image recognition, face recognition, speech recognition, and text understanding.
  • the external memory interface 320 may be used to connect an external memory card, such as a Micro SD card, to realize the expansion of the storage capacity of the terminal 300.
  • the external memory card communicates with the processor through an external memory interface to implement a data storage function. For example, save music, videos and other files on an external memory card.
  • the internal memory 321 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 310 executes various functional applications and data processing of the terminal 300 by running instructions stored in the internal memory 321.
  • the memory 321 may include a storage program area and a storage data area.
  • the storage program area may store an operating system, at least one application required by a function (such as a sound playback function, an image playback function, etc.) and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the terminal 300.
  • the data (such as audio data, phone book, etc.) created during the use of the terminal 300 may be referred to as user data.
  • The internal memory 321 may include high-speed random access memory (RAM) and read-only memory (ROM), and may also include non-volatile memory, such as at least one disk storage device, a flash memory device, other volatile solid-state storage devices, Universal Flash Storage (UFS), etc.
  • the internal memory 321 includes a data partition (such as a data partition) described in the embodiment of the present application.
  • the data partition stores files or data that need to be read and written when the operating system starts, and user data created during terminal use.
  • the data partition may be a storage area set in advance in the internal memory 321.
  • the data partition may be contained in a RAM in the internal memory 321.
  • the virtual data partition in the embodiment of the present application may be a storage area of the RAM in the internal memory 321.
  • the virtual data partition may be a storage area of a ROM in the internal memory 321.
  • the virtual data partition may be an external memory card connected to the external memory interface 320, such as a Micro SD card.
  • The terminal 300 can implement audio functions, such as music playback and recording, through an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, a headphone interface 370D, and an application processor.
  • the audio module is used to convert digital audio information into an analog audio signal output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module can also be used to encode and decode audio signals.
  • the audio module may be disposed in the processor 310, or some functional modules of the audio module may be disposed in the processor 310.
  • The speaker 370A, also called a "horn", is used to convert audio electrical signals into sound signals.
  • The terminal 300 can play music or a hands-free call through the speaker.
  • The receiver 370B, also known as the "earpiece", is used to convert audio electrical signals into sound signals.
  • When the terminal 300 answers a call or plays a voice message, the user can listen by holding the receiver close to the ear.
  • The microphone 370C is used to convert sound signals into electrical signals.
  • The user can speak with the mouth close to the microphone to input the sound signal into the microphone.
  • the terminal 300 may be provided with at least one microphone.
  • the terminal 300 may be provided with two microphones, and in addition to collecting sound signals, it may also implement a noise reduction function.
  • the terminal 300 may further be provided with three, four, or more microphones to collect sound signals, reduce noise, and also identify sound sources, and implement a directional recording function.
  • The headphone interface 370D is used to connect a wired headset.
  • The headphone interface can be a USB interface, a 3.5 mm Open Mobile Terminal Platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • the pressure sensor 380A is used to sense the pressure signal, and can convert the pressure signal into an electrical signal.
  • the pressure sensor may be disposed on the display screen.
  • the capacitive pressure sensor may be at least two parallel plates having a conductive material. When a force is applied to the pressure sensor, the capacitance between the electrodes changes.
  • the terminal 300 determines the intensity of the pressure according to the change in capacitance.
  • the terminal 300 detects the intensity of the touch operation according to a pressure sensor.
  • the terminal 300 may also calculate the touched position based on the detection signal of the pressure sensor.
  • touch operations acting on the same touch position but different touch operation intensities may correspond to different operation instructions. For example, when a touch operation with a touch operation intensity lower than the first pressure threshold is applied to the short message application icon, an instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold is applied to the short message application icon, an instruction for creating a short message is executed.
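  • The short message example above can be sketched as a simple threshold comparison. The threshold value, its unit, and the instruction names below are hypothetical; the source only states that intensities below the first pressure threshold trigger viewing and intensities at or above it trigger creating a short message.

```python
# Hypothetical threshold in arbitrary normalized units; the source gives no value.
FIRST_PRESSURE_THRESHOLD = 0.5


def sms_icon_instruction(touch_intensity, threshold=FIRST_PRESSURE_THRESHOLD):
    """Map the intensity of a touch on the short message icon to an instruction."""
    if touch_intensity < threshold:
        # Lower than the first pressure threshold: view the short message.
        return "view_short_message"
    # Greater than or equal to the first pressure threshold: create a short message.
    return "create_short_message"
```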
  • the gyro sensor 380B may be used to determine a motion posture of the terminal 300.
  • the angular velocity of the terminal 300 around three axes may be determined by a gyro sensor.
  • The gyro sensor can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor detects the shake angle of the terminal 300 and calculates, according to the angle, the distance that the lens module needs to compensate, so that the lens can offset the shake of the terminal 300 through reverse movement to achieve anti-shake.
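  • One possible way to turn a shake angle into a compensation distance is sketched below. It assumes a simplified model in which a rotation by angle θ shifts the image of a distant object by roughly f·tan(θ), where f is the focal length; this model, the function name, and the sample values are assumptions and are not taken from the source.

```python
import math


def lens_compensation_mm(shake_angle_deg, focal_length_mm):
    """Return the lens shift (mm) that offsets a shake of the given angle.

    Simplified model (assumption): image shift = f * tan(angle). The lens is
    moved in the opposite direction, hence the negative sign.
    """
    image_shift = focal_length_mm * math.tan(math.radians(shake_angle_deg))
    return -image_shift


# Hypothetical values: a 0.5 degree shake with a 4 mm focal length.
shift = lens_compensation_mm(0.5, 4.0)
```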
  • the gyroscope sensor can also be used for navigation and somatosensory game scenes.
  • the barometric pressure sensor 380C is used to measure air pressure.
  • the terminal 300 calculates the altitude through the air pressure value measured by the air pressure sensor to assist in positioning and navigation.
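  • The source does not say how the altitude is derived from the air pressure value. A common approximation is the international barometric formula for a standard atmosphere, sketched below; the use of this formula and the default sea-level pressure of 1013.25 hPa are assumptions.

```python
def altitude_from_pressure(pressure_hpa, sea_level_hpa=1013.25):
    """Approximate altitude in meters from air pressure in hPa.

    Uses the international barometric formula for a standard atmosphere
    (assumption; the source does not name a formula).
    """
    return 44330.0 * (1.0 - (pressure_hpa / sea_level_hpa) ** (1.0 / 5.255))
```

  • Under this approximation, a reading equal to the sea-level reference yields an altitude of 0 m, and lower pressure readings yield higher altitudes.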
  • the magnetic sensor 380D includes a Hall sensor.
  • The terminal 300 can detect the opening and closing of a flip holster by using the magnetic sensor.
  • The terminal 300 may also detect the opening and closing of a flip cover by using the magnetic sensor. Further, features such as automatic unlocking when the cover is flipped open are set according to the detected open or closed state of the holster or of the flip cover.
  • the acceleration sensor 380E can detect the magnitude of the acceleration of the terminal 300 in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the terminal 300 is stationary. It can also be used to identify the posture of the terminal, and is used in applications such as switching between horizontal and vertical screens, and pedometers.
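  • A minimal sketch of switching between horizontal and vertical screens from the gravity direction follows. It assumes that the x axis runs along the short edge of the terminal and the y axis along the long edge, and that the terminal is roughly stationary; the axis convention and function name are hypothetical.

```python
def screen_orientation(accel_x, accel_y):
    """Choose a screen orientation from the gravity components on two axes.

    Assumption: x is the short edge and y is the long edge of the terminal.
    When gravity acts mostly along the long edge, the terminal is held upright.
    """
    if abs(accel_y) >= abs(accel_x):
        return "portrait"
    return "landscape"
```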
  • the terminal 300 can measure the distance by infrared or laser. In some embodiments, when shooting a scene, the terminal 300 may use a distance sensor to measure distances to achieve fast focusing.
  • the proximity light sensor 380G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode. Infrared light is emitted outward through a light emitting diode.
  • the terminal 300 may use a proximity light sensor to detect that the user is holding the terminal 300 close to the ear to talk, so as to automatically turn off the screen to save power.
  • The proximity light sensor can also be used in holster mode and pocket mode to automatically unlock and lock the screen.
  • Ambient light sensor 380L is used to sense ambient light brightness.
  • the terminal 300 can adaptively adjust the brightness of the display screen according to the perceived ambient light brightness.
  • the ambient light sensor can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor can also cooperate with the proximity light sensor to detect whether the terminal 300 is in a pocket to prevent accidental touch.
  • the fingerprint sensor 380H is used to collect fingerprints.
  • the terminal 300 may use the collected fingerprint characteristics to realize fingerprint unlocking, access application lock, fingerprint photographing, fingerprint answering an incoming call, and the like.
  • the temperature sensor 380J is used to detect the temperature.
  • The terminal 300 executes a temperature processing strategy using the temperature detected by the temperature sensor. For example, when the temperature reported by the temperature sensor exceeds a threshold, the terminal 300 reduces the performance of a processor located near the temperature sensor in order to reduce power consumption and implement thermal protection.
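  • The temperature processing strategy in the example above amounts to a threshold check, sketched below. The threshold of 45 degrees Celsius and the action names are hypothetical; the source does not give concrete values.

```python
def thermal_action(reported_temp_c, threshold_c=45.0):
    """Pick an action for the processor located near the temperature sensor.

    threshold_c is a hypothetical value; the source only says "a threshold".
    """
    if reported_temp_c > threshold_c:
        # Temperature exceeds the threshold: throttle for thermal protection.
        return "reduce_processor_performance"
    return "no_action"
```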
  • The touch sensor 380K, also called a "touch panel", can be disposed on the display screen and is used to detect touch operations on or near it. The detected touch operation can be passed to the application processor to determine the type of touch event, and corresponding visual output can be provided through the display screen.
  • the bone conduction sensor 380M can acquire vibration signals.
  • The bone conduction sensor may obtain a vibration signal from the bone mass vibrated by the human vocal part.
  • The bone conduction sensor can also contact the human pulse and receive blood pressure beat signals.
  • A bone conduction sensor may also be provided in the headset.
  • The audio module 370 may parse a voice signal from the vibration signal of the vocal-part bone mass obtained by the bone conduction sensor to implement a voice function.
  • The application processor may parse heart rate information from the blood pressure beat signal obtained by the bone conduction sensor to implement a heart rate detection function.
  • the keys 390 include a start key, a volume key, and the like.
  • The keys can be mechanical keys or touch keys.
  • the terminal 300 receives key input, and generates key signal inputs related to user settings and function control of the terminal 300.
  • the motor 391 may generate a vibration alert.
  • The motor can be used for incoming-call vibration alerts and for touch vibration feedback.
  • the touch operation applied to different applications can correspond to different vibration feedback effects.
  • Touch operations on different areas of the display can also correspond to different vibration feedback effects.
  • Different application scenarios (such as time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
  • Touch vibration feedback effect can also support customization.
  • the indicator 392 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, etc.
  • The SIM card interface 395 is used to connect to a Subscriber Identity Module (SIM) card.
  • The SIM card can be brought into contact with or separated from the terminal 300 by being inserted into or removed from the SIM card interface.
  • the terminal 300 may support one or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple SIM cards can be inserted into the same SIM card interface at the same time. The types of the multiple cards may be the same or different.
  • the SIM card interface is also compatible with different types of SIM cards.
  • the SIM card interface is also compatible with external memory cards.
  • the terminal 300 interacts with the network through the SIM card, and realizes functions such as calling and data communication.
  • the terminal 300 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the terminal 300 and cannot be separated from the terminal 300.
  • The method for a terminal to update the wake-up voice of a voice assistant provided in the embodiment of the present application may be implemented in the terminal 300 described above.
  • An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • The terminal 300 may receive first voice data input by the user and determine whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal 300. If they match, the terminal 300 authenticates the user. If the authentication succeeds, the terminal 300 uses the first voice data to update the first voiceprint model in the terminal.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of the preset wakeup word in the terminal.
  • The terminal performs identity authentication on the user. Specifically, the terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data passes the voiceprint verification, the identity authentication is passed.
  • The first voice data is then a wake-up voice for the voice assistant uttered by a user who has passed the identity authentication.
  • Because the first voice data is the voice data of the user acquired by the terminal 300 in real time, it may reflect the physical state of the user and/or the real-time condition of the noise scene in which the user is located.
  • using the first voice data to update the voiceprint model of the terminal 300 can improve the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • the first voice data is automatically acquired by the terminal 300 during the voice wake-up process performed by the terminal 300, instead of prompting the user to manually re-register the wake-up word to receive user input.
  • using the first voice data to update the voiceprint model can also simplify the process of updating the wake word.
  • An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • the method for updating the wake-up voice of the voice assistant by the terminal may include S401-S405:
  • S401: The terminal 300 receives first voice data.
  • S402: The terminal 300 determines whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal.
  • The DSP of the terminal 300 may notify the AP of the terminal 300 to perform text verification and voiceprint verification on the first voice data.
  • The AP may perform text verification on the first voice data by determining whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal. If the text corresponding to the first voice data matches (for example, is the same as) the text of the preset wake-up word registered in the terminal, the AP may continue to perform voiceprint verification on the first voice data, that is, the terminal 300 continues to execute S403. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data, that is, the terminal 300 may continue to execute S405.
  • S403: The terminal 300 performs voiceprint verification on the first voice data using the first voiceprint model.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up.
  • the first voiceprint model is used to characterize the voiceprint features of the wake-up words registered in the terminal 300.
  • As described for wake-up word registration, when the terminal 300 registers a preset wake-up word, voice data (referred to as registered voice data) is recorded.
  • the preset wake-up word registered in the terminal 300 may include the registered voice data.
  • the first voiceprint model is generated based on the registered voice data.
  • The registered voice data can be used as an input value and substituted into the first voiceprint model to obtain the first voiceprint threshold.
  • The method for the terminal 300 to perform voiceprint verification on the first voice data using the first voiceprint model may include the following. After the terminal 300 determines that the first voice data passes the text verification, the terminal 300 may use the first voice data as an input value and substitute it into the first voiceprint model to obtain a voiceprint value. The terminal 300 then determines whether the difference between the voiceprint value and the first voiceprint threshold is less than a preset threshold. If the difference is less than the preset threshold, the voiceprint verification passes. If the difference is greater than or equal to the preset threshold, the voiceprint verification fails.
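  • The comparison described above can be sketched as follows. The model is represented as a plain callable that maps voice data to a number, and `toy_model` (a simple mean of samples) is a hypothetical stand-in used only to make the sketch runnable; the source does not specify the model itself. Taking the absolute value of the difference is also an assumption.

```python
def voiceprint_check(model, first_voice_data, first_voiceprint_threshold, preset_threshold):
    """Return True if the first voice data passes voiceprint verification.

    model: callable mapping voice data to a voiceprint value (assumed interface).
    The check passes when the difference between the voiceprint value and the
    first voiceprint threshold is less than the preset threshold.
    """
    voiceprint_value = model(first_voice_data)
    difference = abs(voiceprint_value - first_voiceprint_threshold)  # assumption: absolute difference
    return difference < preset_threshold


def toy_model(samples):
    """Hypothetical stand-in for a voiceprint model: the mean of the samples."""
    return sum(samples) / len(samples)


# The first voiceprint threshold is obtained by feeding registered voice data to the model.
registered_voice_data = [0.2, 0.4, 0.6]
first_threshold = toy_model(registered_voice_data)
```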
  • If the first voice data passes the voiceprint verification, the terminal 300 may use the first voice data to update the first voiceprint model in the terminal 300, that is, the terminal 300 may continue to execute S404. If the first voice data fails the voiceprint verification, the terminal 300 may delete the first voice data, that is, the terminal 300 may continue to execute S405.
  • S404: The terminal 300 updates the first voiceprint model in the terminal 300 with the first voice data.
  • The method in which the terminal 300 uses the first voice data to update the first voiceprint model may include: the terminal 300 generates a second voiceprint model according to the first voice data, and uses the second voiceprint model to replace the first voiceprint model.
  • For the method by which the terminal 300 generates the second voiceprint model according to the first voice data, refer to the method for generating a voiceprint model by a terminal in the conventional technology. Details are not repeated in this embodiment of the present application.
  • S405: The terminal 300 deletes the first voice data.
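  • The overall flow of receiving voice data, text verification, voiceprint verification, and then either updating the model or deleting the data can be sketched as below. The `Terminal` class and its callable fields (`transcribe`, `verify_voiceprint`, `train_model`) are hypothetical placeholders for components the source does not specify.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class Terminal:
    wake_word_text: str                              # text of the registered preset wake-up word
    voiceprint_model: Any                            # the first voiceprint model
    transcribe: Callable[[bytes], str]               # hypothetical speech-to-text step
    verify_voiceprint: Callable[[Any, bytes], bool]  # hypothetical voiceprint check
    train_model: Callable[[bytes], Any]              # hypothetical model generation


def handle_first_voice_data(terminal: Terminal, first_voice_data: bytes) -> str:
    """Run the text check, the voiceprint check, then update or delete."""
    # Text verification: does the recognized text match the wake-up word?
    if terminal.transcribe(first_voice_data) != terminal.wake_word_text:
        return "deleted"  # the first voice data is discarded
    # Voiceprint verification with the first voiceprint model.
    if not terminal.verify_voiceprint(terminal.voiceprint_model, first_voice_data):
        return "deleted"  # the first voice data is discarded
    # Update: generate a second model and replace the first one with it.
    terminal.voiceprint_model = terminal.train_model(first_voice_data)
    return "updated"
```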
  • An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • The terminal 300 may obtain first voice data that passes text verification and voiceprint verification when the terminal 300 performs voice wake-up. Then, the first voiceprint model in the terminal 300 is updated using the first voice data.
  • the first voice data is the voice data of the user obtained by the terminal 300 in real time; therefore, the first voice data may reflect the physical state of the user and / or the real-time condition of the noise scene in which the user is located.
  • the first voice data passes the text check and voiceprint check; therefore, using the first voice data to update the voiceprint model of the terminal 300 can improve the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • the first voice data is automatically acquired by the terminal 300 during the voice wake-up process performed by the terminal 300, instead of prompting the user to manually re-register the wake-up word to receive user input.
  • using the first voice data to update the voiceprint model can also simplify the process of updating the wake word.
  • the terminal 300 may start a voice assistant.
  • The user may speak the preset wake-up word of the terminal 300 (i.e., voice data) during a conversation with others.
  • In this case, the real purpose of the user speaking the preset wake-up word of the terminal 300 is not to start the voice assistant.
  • Even if the voice assistant of the terminal 300 is started, the user will not trigger the terminal 300 to perform any function through voice.
  • In the embodiment of the present application, this type of voice wake-up is referred to as invalid voice wake-up. That is, after the voice assistant is started, the terminal 300 does not receive a valid voice command through the voice assistant.
  • the terminal 300 can determine whether to use the first voice data to update the first voiceprint model in the terminal 300 by determining whether the voice assistant has received a valid voice command after the voice assistant is started.
  • an embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • the method for updating the wake-up voice of the voice assistant by the terminal may include S401-S403, S501-S503, S404, and S405:
  • If the first voice data passes the voiceprint verification, the terminal 300 may continue to execute S501-S503. If the first voice data fails the voiceprint verification, the terminal 300 may continue to execute S405.
  • S501: The terminal 300 starts a voice assistant.
  • S502: The terminal 300 receives the second voice data through the voice assistant.
  • After the voice assistant is started, it can receive the second voice data input by the user and trigger the terminal 300 to execute the function corresponding to the second voice data.
  • Take the case where the terminal is the mobile phone 400 shown in FIG. 4B as an example.
  • the mobile phone 400 may display the "voice assistant" interface 401 shown in FIG. 4B.
  • The "Voice Assistant" interface 401 includes a "Record" button 403 and a "Setting" option 404.
  • The mobile phone 400 may receive a voice command issued by the user in response to a user's click operation (such as a long-press operation) on the "Record" button 403, and trigger the mobile phone 400 to execute an event corresponding to the voice command.
  • The "Setting" option 404 is used to set various functions and parameters of the "Voice Assistant" application.
  • The mobile phone 400 may receive a user's click operation on the "Setting" option 404 in the "Voice Assistant" interface 401. In response to the user's click operation on the "Setting" option 404, the mobile phone 400 may display the voice control interface 106 shown in (c) in FIG. 1.
  • The "Voice Assistant" interface 401 may further include prompt information 402. The prompt information 402 is used to indicate a common function of the "Voice Assistant" application to the user.
  • The "Voice Assistant" interface 401 may not include a "Record" button 403.
  • In that case, the user does not need to click any button (such as the "Record" button 403) in the "Voice Assistant" interface, and the mobile phone 400 can still record voice commands issued by the user.
  • The "Voice Assistant" interface of the terminal 300 includes, but is not limited to, the "Voice Assistant" interface 401 shown in FIG. 4B.
  • S503: The terminal 300 determines whether the second voice data is a valid voice command.
  • The valid voice command described in the embodiment of the present application refers to an instruction capable of triggering the terminal 300 to perform a corresponding function.
  • If the terminal 300 receives, through the voice assistant, an instruction for triggering the terminal 300 to perform a corresponding function (that is, a valid voice command), the terminal will execute the corresponding function in response to the valid voice command. It can then be determined that this voice wake-up matches the user's intention. In the embodiment of the present application, such a voice wake-up is referred to as a valid voice wake-up.
  • To improve the voice wake-up rate when the terminal 300 performs voice wake-up, the terminal updates the wake-up word of the terminal 300 only with the voice data corresponding to a valid voice wake-up.
  • If the voice assistant of the terminal 300 receives a valid voice command after being started (that is, the second voice data is a valid voice command), the wake-up of the voice assistant by the user's first voice data is a valid voice wake-up, and the terminal 300 can execute S404.
  • If no valid voice command is received after the voice assistant of the terminal 300 is started (that is, the second voice data is not a valid voice command), the wake-up of the voice assistant by the user's first voice data is an invalid voice wake-up.
  • In this case, the terminal 300 may delete the first voice data, that is, execute S405.
  • The terminal 300 uses the first voice data to update the first voiceprint model in the terminal 300 only after receiving a valid voice command for triggering the terminal 300 to perform a corresponding function.
  • If a valid voice command is received after the voice assistant of the terminal 300 is started, it means that the voice wake-up is a valid voice wake-up in accordance with the user's intention.
  • the voiceprint model of the terminal 300 is updated by using the voice data that can reflect the user's true intention and can successfully wake up the terminal 300, which can further improve the voice wake-up rate of the terminal to perform voice wake-up and reduce the false wake-up rate.
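  • The variant in which the model is updated only after a valid voice wake-up can be sketched as below. All callables passed in (`passes_checks`, `receive_second_voice_data`, `is_valid_command`, `update_model`, `delete_voice_data`) are hypothetical hooks; the source does not define how a valid voice command is recognized.

```python
def process_wake_up(first_voice_data, passes_checks, receive_second_voice_data,
                    is_valid_command, update_model, delete_voice_data):
    """Update the voiceprint model only when the wake-up turns out to be valid.

    Returns True if the model was updated, False if the voice data was deleted.
    """
    # Text verification and voiceprint verification of the first voice data.
    if not passes_checks(first_voice_data):
        delete_voice_data(first_voice_data)
        return False
    # The voice assistant is started here (not modeled), then it listens.
    second_voice_data = receive_second_voice_data()  # may be None if nothing is said
    # Valid wake-up: a command that triggers a function was received.
    if second_voice_data is not None and is_valid_command(second_voice_data):
        update_model(first_voice_data)
        return True
    # Invalid wake-up: discard the first voice data.
    delete_voice_data(first_voice_data)
    return False
```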
  • If the terminal 300 uses first voice data of poor signal quality to update the first voiceprint model and then uses the updated voiceprint model to perform voice wake-up, the success rate of voice wake-up will be affected.
  • The terminal 300 may determine whether the signal quality parameter of the first voice data is higher than a second preset threshold before using the first voice data to update the first voiceprint model.
  • The signal quality parameter of voice data is used to characterize the signal quality of the voice data.
  • the signal quality parameter of the voice data may be a signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, it means that the signal quality of the first voice data is relatively high. In this case, the terminal 300 may update the first voiceprint model by using the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.
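  • A minimal sketch of this quality gate follows, taking the signal-to-noise ratio in decibels as the signal quality parameter. How the signal and noise powers are estimated, and the threshold of 15 dB used in the example, are assumptions; the source gives no concrete values.

```python
import math


def snr_db(signal_power, noise_power):
    """Signal-to-noise ratio in decibels from signal and noise power estimates."""
    return 10.0 * math.log10(signal_power / noise_power)


def passes_quality_gate(signal_quality, second_preset_threshold):
    """True only if the quality parameter is strictly higher than the threshold.

    A value lower than or equal to the threshold means the first voice data
    should be deleted rather than used to update the voiceprint model.
    """
    return signal_quality > second_preset_threshold


# Hypothetical example: power ratio of 100 (20 dB) against a 15 dB threshold.
use_for_update = passes_quality_gate(snr_db(100.0, 1.0), 15.0)
```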
  • the user may also decide whether to use the first voice data to update the first voiceprint model in the terminal 300.
  • the terminal 300 may further display a first interface for prompting the user whether to update the voiceprint model.
  • the terminal 300 determines whether to update the voiceprint model according to the user's selection in the first interface.
  • Take the case where the terminal 300 is the mobile phone 500 shown in FIG. 5B as an example.
  • the mobile phone 500 may display the first interface 501 shown in FIG. 5B before the first voiceprint model in the mobile phone 500 is updated with the first voice data.
  • the first interface 501 is used to prompt the user whether to update the voiceprint model (that is, the wake word).
  • the first interface 501 includes first prompt information, such as "the mobile phone obtains voice data that can update the wake-up word during the voice wake-up process" and "is the wake-up word updated?"
  • the first interface 501 further includes: an "update" option for triggering the mobile phone 500 to update the voiceprint model and a "cancel" option for triggering the mobile phone 500 not to update the voiceprint model.
  • the terminal 300, before updating the voiceprint model, displays a first interface for prompting the user whether to update the voiceprint model. In this way, the user can decide in the first interface whether to update the voiceprint model. That is, the terminal 300 can determine whether to update the voiceprint model according to user requirements, which can improve the interaction performance between the terminal 300 and the user, and improve the user experience.
  • From the description of the terminal registering wake-up words, it is known that when the terminal 300 registers a preset wake-up word, one or more voice data (referred to as registered voice data) are recorded.
  • the first voiceprint model is generated based on the one or more registered voice data. It is assumed that the first voiceprint model is generated based on at least two registered voice data. After the terminal 300 generates a new voiceprint model according to the first voice data, directly replacing the first voiceprint model with the new voiceprint model can improve the voice wake-up rate of the terminal 300, but a substantial increase in the voice wake-up rate may also increase the false wake-up rate.
  • the method in which the terminal 300 uses the first voice data to update the first voiceprint model may include S601-S603.
  • S404 shown in FIG. 5A may include S601-S603:
  • S601: the terminal 300 uses the first voice data to replace the third voice data in the at least two registered voice data, and obtains at least two updated registered voice data.
  • S602: the terminal 300 generates a second voiceprint model according to the updated at least two registered voice data.
  • S603: the terminal 300 replaces the first voiceprint model with the second voiceprint model.
  • the terminal 300 may determine the third voice data from the at least two registered voice data saved by the terminal 300.
  • the third voice data is the voice data, among the at least two registered voice data, whose signal quality parameter is lower than the signal quality parameters of the other voice data.
  • the terminal 300 uses the first voice data to replace the third voice data whose signal quality parameter is lower than the signal quality parameters of other voice data; and then generates a second voiceprint model according to the updated at least two registered voice data.
  • the voice data replaced by the first voice data in the at least two registered voice data has lower signal quality parameters. That is, the signal quality parameters of the retained voice data (that is, at least two updated registered voice data) are higher.
  • the second voiceprint model generated by the terminal 300 based on the voice data with higher signal quality parameters can more accurately and clearly characterize the voiceprint characteristics of the user.
  • the terminal 300 uses the second voiceprint model to perform voice wake-up, which can increase the voice wake-up rate and reduce the false wake-up rate of the terminal performing voice wake-up.
  • the third voice data may be the earliest voice data stored by the terminal among the at least two registered voice data.
  • Compared with the other registered voice data, the earliest voice data stored by the terminal (that is, the third voice data) is the least consistent with the user's current physical state and with the real-time conditions of the noise scene in which the user is currently located. Therefore, after the first voice data is used to replace the third voice data, the retained voice data (that is, the at least two registered voice data after the update) conforms more closely to the current physical state of the user and to the noise scene in which the user is currently located.
  • the second voiceprint model generated by the terminal 300 according to the voice data with a higher degree of conformity can more accurately and clearly characterize the voiceprint characteristics of the user under the user's current body state and the current noise scene.
  • the terminal 300 uses the second voiceprint model to perform voice wake-up, which can increase the voice wake-up rate and reduce the false wake-up rate of the terminal performing voice wake-up.
  • the terminal 300 uses the first voice data to replace part of the voice data in the at least two registered voice data, such as the third voice data; instead of generating the second voiceprint model completely based on the first voice data.
  • the voice wake-up rate of the voice wake-up performed by the terminal 300 can be relatively stabilized.
  • the false wake-up rate of the voice wake-up performed by the terminal 300 can be reduced.
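The replacement step S601 and the two ways of choosing the third voice data described above (lowest signal quality, or earliest stored) can be sketched as follows. This is a minimal illustration: the `RegisteredVoice` record, its field names, and the `strategy` argument are hypothetical, and the signal quality parameter is assumed to be an SNR in dB.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class RegisteredVoice:
    sample_id: str    # hypothetical identifier for a stored recording
    snr_db: float     # signal quality parameter (assumed to be SNR in dB)
    stored_at: float  # time at which the terminal stored the recording


def pick_third_voice(registered: List[RegisteredVoice],
                     strategy: str = "lowest_quality") -> RegisteredVoice:
    """Choose the registered voice data that will be replaced."""
    if len(registered) < 2:
        raise ValueError("the scheme assumes at least two registered voice data")
    if strategy == "lowest_quality":
        return min(registered, key=lambda v: v.snr_db)
    if strategy == "oldest":
        return min(registered, key=lambda v: v.stored_at)
    raise ValueError(f"unknown strategy: {strategy}")


def replace_third_voice(registered: List[RegisteredVoice],
                        first_voice: RegisteredVoice,
                        strategy: str = "lowest_quality") -> List[RegisteredVoice]:
    """S601: return the updated registered voice data with one entry swapped out."""
    third = pick_third_voice(registered, strategy)
    return [first_voice if v is third else v for v in registered]
```

Because only one entry is swapped, the second voiceprint model generated in S602 is still based mostly on the earlier registered voice data, which is the stabilizing effect the text describes.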
  • the method in the embodiment of the present application may further include S701-S702:
  • S701: the terminal 300 generates a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data.
  • the second voiceprint model is equivalent to a function.
  • the terminal 300 may use each of the updated at least two registered voice data as input values, respectively, and substitute them into the second voiceprint model to obtain at least two voiceprint thresholds.
  • the terminal 300 may calculate an average value of the at least two voiceprint thresholds to obtain a second voiceprint threshold.
  • the at least two updated registered voice data include the registered voice data a and the registered voice data b.
  • the terminal 300 may substitute the registered voice data a into the second voiceprint model to obtain the voiceprint threshold A; substitute the registered voice data b into the second voiceprint model to obtain the voiceprint threshold B; and calculate the average of the voiceprint threshold A and the voiceprint threshold B to obtain the second voiceprint threshold.
  • S702: the terminal 300 determines whether a difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold.
  • If the difference is less than the first preset threshold, the terminal 300 may execute S603.
  • If the difference is not less than the first preset threshold, the terminal 300 may execute S703:
  • S703: the terminal 300 deletes the second voiceprint model and the first voice data.
  • the terminal 300 deletes the second voiceprint model and the first voice data, that is, the second voiceprint model is not used to replace the first voiceprint model.
  • When the difference between the second voiceprint threshold and the first voiceprint threshold is large, not replacing the model prevents the wake-up rate of the terminal 300 from fluctuating greatly and degrading the user experience.
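Steps S701-S703 can be sketched as follows, treating the second voiceprint model as a function that maps one registered voice data to a score, as the text suggests. The names, the 0.1 value for the first preset threshold, and the use of an absolute difference are assumptions made for illustration; the source only says "difference".

```python
from statistics import mean
from typing import Callable, Sequence

# Hypothetical value for the "first preset threshold"; the source gives no number.
FIRST_PRESET_THRESHOLD = 0.1

VoiceData = Sequence[float]


def voiceprint_threshold(model: Callable[[VoiceData], float],
                         registered: Sequence[VoiceData]) -> float:
    """S701: feed each registered voice data into the model and average the results."""
    return mean(model(v) for v in registered)


def should_replace_model(first_threshold: float, second_threshold: float,
                         max_diff: float = FIRST_PRESET_THRESHOLD) -> bool:
    """S702: True means execute S603 (replace); False means execute S703 (delete).

    Assumption: the difference is taken as an absolute value.
    """
    return abs(second_threshold - first_threshold) < max_diff
```

With registered voice data a and b, `voiceprint_threshold` returns the average of threshold A and threshold B, matching the worked example in the text.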
  • An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • the method for updating the wake-up voice of the voice assistant by the terminal may include S801-S808:
  • S801: the terminal 300 receives first voice data.
  • S802: the terminal 300 determines whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal.
  • If the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal, the terminal 300 may continue to perform voiceprint verification on the first voice data, that is, the terminal 300 continues to perform S803. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data, that is, the terminal 300 may continue to execute S808.
  • S803: the terminal 300 performs voiceprint verification on the first voice data by using the first voiceprint model.
  • If the first voice data passes the voiceprint verification, the terminal 300 may continue to execute S804. If the first voice data fails the voiceprint verification, the terminal 300 may continue to execute S805.
  • S804: the terminal 300 starts a voice assistant.
  • S805: the terminal 300 performs text verification on the voice data received within the first preset time.
  • S806: the terminal 300 determines whether the terminal 300 receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time.
  • the first preset time is a preset time period that starts when the terminal 300 determines that the text information of the first voice data is the same as that of the wake-up word registered in the terminal 300 (that is, the first voice data passes the text verification) but that the first voice data fails the voiceprint verification.
  • the AP of the terminal 300 is in a sleep state.
  • the DSP of the terminal 300 monitors the first voice data.
  • the DSP hands the monitored voice data to the AP, and the AP is woken up.
  • the AP performs text verification and voiceprint verification on the voice data to determine whether the voice data matches the generated voiceprint model.
  • the AP enters the sleep state until it receives the voice data sent by the DSP again.
  • the DSP will only send to the AP voice data that has a certain degree of similarity with the wake word registered in the terminal 300.
  • the AP only performs text verification and voiceprint verification on the voice data sent by the DSP (that is, the voice data whose similarity with the wake-up word registered in the terminal 300 satisfies certain conditions).
  • the DSP can recognize that the similarity between the first voice data and the wake-up word registered in the terminal 300 meets certain conditions.
  • the DSP may transmit the first voice data to the AP to wake up the AP.
  • the AP performs text verification and voiceprint verification on the first voice data.
  • If the AP determines that the text information of the first voice data is the same as that of the wake word registered in the terminal 300 (that is, the first voice data passes the text verification), but the first voice data fails the voiceprint verification, the AP will not enter the sleep state immediately after obtaining the verification result. Instead, the DSP delivers all voice data monitored in the first preset time to the AP, and the AP can perform text verification on all voice data monitored by the DSP in the first preset time.
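The hand-off between the DSP and the AP described above can be sketched as a small state machine. This is an illustrative model only: the `Utterance` fields, the string results, the 10-second first preset time, and the 0.5 similarity cut-off are hypothetical, and text verification is reduced to exact string comparison.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Utterance:
    text: str              # recognized text (stand-in for text verification input)
    dsp_similarity: float  # coarse similarity computed by the DSP, assumed in [0, 1]
    voiceprint_ok: bool    # stand-in for the voiceprint verification result
    timestamp: float       # seconds


@dataclass
class ApplicationProcessor:
    wake_word: str
    window_s: float = 10.0              # hypothetical first preset time
    asleep: bool = True
    window_end: Optional[float] = None  # end of the first preset time, if one is open
    collected: List[Utterance] = field(default_factory=list)

    def in_window(self, now: float) -> bool:
        return self.window_end is not None and now <= self.window_end

    def on_voice_from_dsp(self, u: Utterance) -> str:
        self.asleep = False  # receiving data from the DSP wakes the AP
        if self.in_window(u.timestamp):
            self.collected.append(u)  # kept for text verification of the window
            return "collected"
        self.window_end = None  # any earlier window has expired
        text_ok = u.text == self.wake_word
        if text_ok and u.voiceprint_ok:
            return "start_assistant"
        if text_ok:
            # Text passes but voiceprint fails: stay awake for the first preset time.
            self.window_end = u.timestamp + self.window_s
            return "stay_awake"
        self.asleep = True
        return "sleep"


def dsp_should_forward(u: Utterance, ap: ApplicationProcessor,
                       min_similarity: float = 0.5) -> bool:
    """The DSP forwards everything during the window, otherwise only similar audio."""
    return ap.in_window(u.timestamp) or u.dsp_similarity >= min_similarity
```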
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened.
  • the first voiceprint model can represent the voiceprint features of the wake-up words registered in the terminal.
  • the text corresponding to the second voice data contains a preset keyword.
  • the second voice data may be voice data in which the user complains that the voice wake-up fails, such as "how to wake up”, “how not”, “not responding”, “unable to wake up”, and "voice wake up failed”.
  • the AP performs text verification on all voice data monitored by the DSP within the first preset time. If, within the first preset time, the AP recognizes second voice data such as "how to wake up", "how not to wake up", "not responding", "unable to wake up", and "voice wake up failure", and also recognizes at least one voice data whose text information is the same as that of the wake-up word registered in the terminal 300, then the terminal 300 may use the first voice data received by the terminal 300 to update the first voiceprint model of the terminal 300.
  • the terminal 300 receives the first voice data in S801 and finds that the voiceprint verification of the first voice data fails. Subsequently, the terminal 300 can receive at least one voice data that passes the text verification within the first preset time, which indicates that the user repeatedly wants to voice wake up the voice assistant of the terminal 300, but the voice wake-up fails. In this case, if the terminal 300 also receives the second voice data within the first preset time, it indicates that the user is dissatisfied with the result of the voice wake-up failure.
  • the terminal 300 receives the second voice data and at least one voice data that passes the text verification within the first preset time, indicating that the user has a strong intention to wake up the voice assistant by voice. The repeated voice wake-up failures may occur because the user's current physical state is very different from the user's physical state when the wake word was registered. They may also occur because the real-time situation of the noise scene in which the user is currently located differs from that of the noise scene in which the user registered the wake-up word. In this case, even if the first voice data fails the voiceprint check, the terminal 300 may use the received first voice data to update the first voiceprint model in the terminal 300. That is, if the terminal 300 receives the second voice data and at least one voice data matching the text of the preset wake-up word within the first preset time, the terminal 300 updates the first voiceprint model in the terminal with the first voice data, that is, executes S807.
  • S807: the terminal 300 updates the first voiceprint model of the terminal 300 with the first voice data.
  • the terminal 300 may delete the first voice data if the terminal 300 does not receive the second voice data and at least one voice data matching the text of the preset wake-up word within the first preset time.
  • S808: the terminal 300 deletes the first voice data.
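The decision flow of S801-S808 can be summarized in a short sketch. It is a simplification under stated assumptions: text verification is reduced to exact string comparison, the voiceprint result is passed in as a boolean, and the function name `decide` and its return strings are hypothetical. The complaint phrases are taken from the examples quoted in the text.

```python
from typing import Iterable

# Example complaint phrases quoted in the text (the "preset keywords").
COMPLAINT_KEYWORDS = ("not responding", "unable to wake up", "voice wake up failed")


def decide(first_text: str, voiceprint_ok: bool,
           later_texts: Iterable[str], wake_word: str) -> str:
    """Return the action taken for the first voice data.

    later_texts holds the recognized text of every voice data received within
    the first preset time after the voiceprint verification failed.
    """
    if first_text != wake_word:   # S802: text verification fails
        return "delete"           # S808
    if voiceprint_ok:             # S803: voiceprint verification passes
        return "start_assistant"  # S804
    later = list(later_texts)     # S805: text verification of the window
    complained = any(k in t for t in later for k in COMPLAINT_KEYWORDS)
    retried = any(t == wake_word for t in later)
    if complained and retried:    # S806: second voice data plus a repeated wake word
        return "update_model"     # S807
    return "delete"               # S808
```

Both conditions in S806 must hold: a complaint alone, or a repeated wake word alone, leaves the first voiceprint model unchanged.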
  • the method in which the terminal 300 uses the first voice data to update the first voiceprint model in the terminal 300 may include: the terminal 300 generates a second voiceprint model according to the first voice data, and uses the second voiceprint model to replace the first voiceprint model.
  • the method for generating the second voiceprint model by the terminal 300 according to the first voice data may refer to the method for generating a voiceprint model by the terminal in the conventional technology. This embodiment of the present application will not repeat them here.
  • the first voice data is the voice data of the user acquired by the terminal 300 in real time; therefore, the first voice data may reflect the physical state of the user and / or the real-time condition of the noise scene in which the user is located. Therefore, by using the first voice data to update the voiceprint model of the terminal 300, the voice wake-up rate of the voice wake-up performed by the terminal can be improved, and the false wake-up rate can be reduced.
  • the received first voice data is voice data uttered by the user with a strong intention to wake up the voice assistant of the terminal 300 by voice. Therefore, the voiceprint model of the terminal 300 is updated by using voice data that can reflect the user's true intention, which can further increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • the received first voice data is automatically acquired by the terminal 300 during the voice wake-up process performed by the terminal 300, instead of prompting the user to manually re-register the wake-up word and receiving user input.
  • updating the voiceprint model by using the received first voice data can also simplify the process of updating the wake word.
  • If the signal quality of the first voice data is poor and the terminal 300 uses the first voice data to update the first voiceprint model, voice wake-up performed with the updated voiceprint model may have a lower success rate.
  • the terminal 300 may determine whether the signal quality parameter of the first voice data is higher than a second preset threshold before updating the first voiceprint model with the first voice data.
  • The signal quality parameter of the voice data is used to characterize the signal quality of the voice data.
  • the signal quality parameter of the voice data may be a signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, it means that the signal quality of the first voice data is relatively high. In this case, the terminal 300 may update the first voiceprint model by using the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.
  • the terminal 300 may also use the at least one voice data that matches the text of the preset wake-up word to update the first voiceprint model. Specifically, the terminal may select voice data with a signal quality parameter higher than the second preset threshold from the first voice data and the at least one voice data that matches the text of the preset wake-up word, and then use the selected voice data to update the first voiceprint model.
  • A malicious user might try to have the first voiceprint model in the terminal 300 updated with the malicious user's own voice, so as to wake up the terminal 300 by voice.
  • To prevent this, the terminal 300 may perform user identity verification before executing S807, and execute S807 only after the user identity verification passes. Specifically, after S806 and before S807, the terminal 300 may perform identity authentication on the user; if the identity authentication passes, the terminal 300 performs S807; if the identity authentication fails, the terminal 300 performs S808.
  • the method for the terminal to authenticate the user may include S901-S903. As shown in FIG. 9, after S806 shown in FIG. 8 and before S807, the method in this embodiment of the present application may further include S901-S903:
  • S901: the terminal 300 displays an identity verification interface.
  • the identity verification interface is used to receive identity verification information input by a user.
  • S902: the terminal 300 receives the identity verification information input by the user on the identity verification interface.
  • S903: the terminal 300 performs user identity verification according to the identity verification information.
  • If the identity verification passes, the terminal 300 updates the first voiceprint model with the first voice data, that is, the terminal 300 executes S807. If the identity verification fails, the terminal 300 deletes the first voice data, that is, the terminal 300 executes S808.
  • the identity verification information may be any one of a digital password, a pattern password, fingerprint information, iris information, and facial feature information.
  • the aforementioned authentication interface may be any one of an interface for inputting a digital password or a pattern password, an interface for entering fingerprint information, an interface for entering iris information, and an interface for entering facial feature information.
  • Take as an example the case where the terminal 300 is the mobile phone 1000 shown in FIG. 10, the above-mentioned identity verification information is a digital password, and the above-mentioned identity verification interface is an interface for entering a digital password.
  • the mobile phone 1000 can display the authentication interface 1001 shown in FIG. 10.
  • the authentication interface 1001 includes a password input box 1002 and a first prompt message "After the user authentication is passed, the mobile phone will automatically update the wake-up word" 1003.
  • the terminal 300 performs user authentication. After the user authentication is passed, the terminal updates the first voiceprint model in the terminal 300. In this way, it is possible to prevent a malicious user from using the voice of the malicious user to trigger the terminal 300 to update the first voiceprint model in the terminal 300, so as to achieve the purpose of waking the terminal 300 by malicious voice. With this solution, the voiceprint model in the terminal 300 can be prevented from being maliciously updated, and the security of the terminal 300 can be improved.
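For the digital-password case, the gate formed by S901-S903 in front of S807 can be sketched as follows. The source does not say how the password is stored or compared; the salted PBKDF2 hash and constant-time comparison below are assumptions chosen for illustration, and the function names are hypothetical.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a stored form of the digital password (assumed storage scheme)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def verify_identity(entered: str, salt: bytes, stored_hash: bytes) -> bool:
    """S903: compare the entered password with the stored one in constant time."""
    return hmac.compare_digest(hash_password(entered, salt), stored_hash)


def next_step_after_s806(entered: str, salt: bytes, stored_hash: bytes) -> str:
    """Pass: S807 (update the voiceprint model). Fail: S808 (delete the voice data)."""
    return "S807" if verify_identity(entered, salt, stored_hash) else "S808"
```

The same gate applies to the other kinds of identity verification information listed in the text, with `verify_identity` replaced by the corresponding check.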
  • If a new voiceprint model is generated directly from the first voice data and replaces the first voiceprint model, the voice wake-up rate of the terminal 300 performing voice wake-up can be improved.
  • However, directly replacing the first voiceprint model with the voiceprint model generated based on the first voice data greatly increases the voice wake-up rate.
  • A substantial increase in the voice wake-up rate may correspondingly increase the false wake-up rate of the voice wake-up performed by the terminal 300.
  • the above S807 may include the above S601-S603.
  • the terminal uses the first voice data to replace part of the voice data in the at least two registered voice data; instead of generating the second voiceprint model completely based on the first voice data.
  • the voice wake-up rate of the voice wake-up performed by the terminal 300 can be relatively stabilized.
  • the false wake-up rate of the voice wake-up performed by the terminal 300 can be reduced.
  • the method in this embodiment of the present application may further include S701-S702:
  • If the difference between the second voiceprint threshold and the first voiceprint threshold is less than the first preset threshold, the terminal 300 may execute S603.
  • If the difference is not less than the first preset threshold, the terminal 300 may execute S703.
  • S703: the terminal 300 deletes the second voiceprint model and the first voice data, that is, the second voiceprint model is not used to replace the first voiceprint model.
  • When the difference between the second voiceprint threshold and the first voiceprint threshold is large, not replacing the model prevents the wake-up rate of the terminal 300 from fluctuating greatly and degrading the user experience.
  • the foregoing terminal and the like include a hardware structure and / or a software module corresponding to performing each function.
  • the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a certain function is performed by hardware or computer software-driven hardware depends on the specific application of the technical solution and design constraints. Professional technicians can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of the embodiments of the present application.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules. It should be noted that the division of the modules in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
  • FIG. 13 shows a possible structural diagram of a terminal involved in the foregoing embodiment.
  • the terminal 1300 includes: a storage unit 1301, an input unit 1302, a text verification unit 1303, a voiceprint verification unit 1304, and an update unit 1305.
  • the storage unit 1301 stores a preset wake-up word registered in the terminal 1300 and a first voiceprint model.
  • the first voiceprint model is used for voiceprint verification when the voice assistant is woken up.
  • the first voiceprint model represents the voiceprint characteristics of the preset wake word.
  • the input unit 1302 is used to support the terminal 1300 to perform S401, S502, S801, and S902 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the text verification unit 1303 is configured to support the terminal 1300 to perform S402, S802, and S805 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the voiceprint verification unit 1304 is configured to support the terminal 1300 to perform S403, S803 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the update unit 1305 is configured to support the terminal 1300 to perform S404, S603, and S807 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include: a starting unit and a determining unit.
  • the starting unit is configured to support the terminal 1300 to perform S501, S804 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the determining unit is configured to support the terminal 1300 to perform S503 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include: an identity authentication unit 1306.
  • the identity authentication unit 1306 is configured to support the terminal 1300 to perform user identity verification on the user.
  • the identity authentication unit 1306 is configured to support the terminal 1300 to perform S903 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include a display unit.
  • the display unit is configured to support the terminal 1300 to execute S901 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include a replacement unit and a generation unit.
  • the replacement unit is configured to support the terminal 1300 to perform S601 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the generating unit is configured to support the terminal 1300 to perform S602, S701 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include: a deleting unit.
  • the deleting unit is configured to support the terminal 1300 to perform S405, S703, and S808 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include a judging unit.
  • the judging unit is configured to support the terminal 1300 to execute S702 and S806 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the terminal 1300 includes, but is not limited to, the unit modules listed above.
  • the terminal 1300 may further include a receiving unit and a sending unit.
  • the receiving unit is used to receive data or instructions sent by other terminals.
  • the sending unit is used to send data or instructions to other terminals.
  • the functions that can be implemented by the above functional units include, but are not limited to, the functions corresponding to the method steps described in the above examples. For detailed descriptions of other units of the terminal 1300, refer to the detailed description of the corresponding method steps. Details are not repeated here.
  • FIG. 15 shows a possible structural diagram of a terminal involved in the foregoing embodiment.
  • the terminal 1500 includes a processing module 1501, a storage module 1502, and a display module 1503.
  • the processing module 1501 is configured to control and manage the actions of the terminal 1500.
  • the display module 1503 is configured to display an image generated by the processing module 1501.
  • the storage module 1502 is configured to store program codes and data of the terminal.
  • the storage module 1502 stores a preset wake-up word registered in the terminal and a first voiceprint model, where the first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up, and the first voiceprint model characterizes the voiceprint characteristics of the preset wake word.
  • the terminal 1500 may further include a communication module for supporting communication between the terminal and other network entities.
  • the processing module 1501 may be a processor or a controller.
  • the processing module 1501 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA), or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute various exemplary logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the communication module may be a transceiver, a transceiver circuit, or a communication interface.
  • the storage module 1502 may be a memory.
  • the processing module 1501 is a processor (such as the processor 310 shown in FIG. 3)
  • the communication module includes a Wi-Fi module and a Bluetooth module (such as the communication module 360 shown in FIG. 3).
  • Communication modules such as Wi-Fi modules and Bluetooth modules can be collectively referred to as communication interfaces.
  • the storage module 1502 is a memory (such as the internal memory 321 shown in FIG. 3 and an external SD card connected to the terminal 1500 through the external memory interface 320).
  • the display module 1503 is a touch screen (including the display screen 394 shown in FIG. 3)
  • the terminal provided in this embodiment of the present application may be the terminal 300 shown in FIG. 3.
  • the processor, the communication interface, the touch screen, and the memory may be coupled together through a bus.
  • An embodiment of the present application further provides a computer storage medium.
  • the computer storage medium stores computer program code.
  • When the processor executes the computer program code, the terminal performs the relevant method steps in any of FIG. 4A, FIG. 5A, FIG. 6, FIG. 7, FIG. 8, FIG. 9, FIG. 11, and FIG. 12 to implement the method in the above embodiments.
  • the embodiment of the present application also provides a computer program product. When the computer program product runs on a computer, the computer is caused to perform the relevant method steps in any of FIG. 4A, FIG. 5A, FIG. 6, FIG. 7, FIG. 8, FIG. 9, FIG. 11, and FIG. 12 to implement the method in the above embodiments.
  • the terminal 1300, the terminal 1500, the computer storage medium, and the computer program product provided in the embodiments of the present application are all used to execute the corresponding methods provided above. Therefore, for the beneficial effects that can be achieved, refer to the beneficial effects of the corresponding methods provided above. Details are not repeated here.
  • the disclosed apparatus and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the modules or units is only a logical function division.
  • In actual implementation, there may be another division manner. For example, multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may be one physical unit or multiple physical units, that is, may be located in one place, or may be distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • the integrated unit When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • the technical solution of the embodiments of the present application is essentially a part that contributes to the existing technology or all or part of the technical solution may be embodied in the form of a software product that is stored in a storage medium Included are several instructions for causing a device (which can be a single-chip microcomputer, a chip, etc.) or a processor to execute all or part of the steps of the method described in the embodiments of the present application.
  • the foregoing storage medium includes various media that can store program codes, such as a U disk, a mobile hard disk, a ROM, a RAM, a magnetic disk, or an optical disk.

Abstract

Disclosed are a method for updating a wake-up voice of a voice assistant by a terminal, and a terminal, which relate to the technical field of voice control and can update a wake-up word of the terminal in real time, thereby improving the voice wake-up rate when the terminal performs voice wake-up and reducing the false wake-up rate. The specific solution is as follows: a terminal receives first voice data input by a user; the terminal determines whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal; if the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal performs identity authentication on the user; and if the identity authentication is passed, the terminal uses the first voice data to update a first voiceprint model in the terminal, wherein the first voiceprint model is used for performing voiceprint verification when the voice assistant is woken up, and the first voiceprint model represents the voiceprint feature of the preset wake-up word.

Description

Method and terminal for updating wake-up voice of voice assistant by terminal
Technical field
The embodiments of the present application relate to the technical field of voice control, and in particular, to a method for updating a wake-up voice of a voice assistant by a terminal, and to a terminal.
Background
A voice assistant is an important application on a mobile phone. The voice assistant can interact intelligently with the user through intelligent dialogue and instant question answering. In addition, the voice assistant can recognize the user's voice command and cause the mobile phone to execute the event corresponding to that command. For example, if the voice assistant receives and recognizes the voice command "Call Bob" input by the user, the mobile phone can automatically call the contact Bob.
Generally, the voice assistant is in a dormant state. Before using the voice assistant, the user can wake it up by voice. Before voice wake-up can be performed, the user needs to register, in the mobile phone, a wake-up word (that is, a wake-up voice) used to wake up the voice assistant. From the wake-up word input by the user, the mobile phone can generate a voiceprint model that characterizes the voiceprint features of the wake-up word. The voice wake-up process may include the following: the mobile phone monitors voice data through a low-power digital signal processor (DSP). When the DSP detects voice data whose similarity to the wake-up word satisfies a certain condition, the DSP passes the detected voice data to an application processor (AP). The AP performs text verification and voiceprint verification on the voice data to determine whether the voice data matches the generated voiceprint model. When the voice data matches the voiceprint model, the mobile phone can start the voice assistant.
After registering a wake-up word in the mobile phone, a user rarely re-registers (that is, updates) it. However, the wake-up word registered in the mobile phone is only voice data that the user recorded in a particular noise scene and in the physical state the user was in at that time. Changes in the user's physical state and changes in the noise scene in which the user is located both affect the voice data the user produces. Therefore, when the user's physical state and/or the noise scene changes, continuing to use the originally registered wake-up word for voice wake-up reduces the voice wake-up rate of the mobile phone and increases its false wake-up rate.
Summary of the Invention
Embodiments of the present application provide a method for updating a wake-up voice of a voice assistant by a terminal, and a terminal, which can update the wake-up voice of the terminal in real time, thereby improving the voice wake-up rate when the terminal performs voice wake-up and reducing the false wake-up rate.
In a first aspect, an embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant. The method may include: the terminal receives first voice data input by a user; the terminal determines whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal; if the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal performs identity authentication on the user; and if the identity authentication is passed, the terminal uses the first voice data to update a first voiceprint model in the terminal. The first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up, and the first voiceprint model represents the voiceprint features of the preset wake-up word.
In the embodiments of the present application, if the text corresponding to the first voice data matches the text of the preset wake-up word and the user's identity authentication is passed, the first voice data is a wake-up voice, uttered by an authenticated user, that can wake up the voice assistant. Moreover, because the first voice data is voice data of the user acquired by the terminal in real time, it can reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located. In summary, using the first voice data to update the voiceprint model of the terminal can improve the voice wake-up rate when the terminal performs voice wake-up and reduce the false wake-up rate.
Further, the first voice data is acquired automatically by the terminal while it performs voice wake-up, rather than being input by the user after the terminal prompts the user to manually re-register the wake-up word. Using the first voice data to update the voiceprint model therefore also simplifies the process of updating the wake-up word.
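The flow of the first aspect (text verification, then identity authentication, then model update) can be sketched as follows. This is a minimal illustration only, not the implementation described in the application: the `Terminal` class, the `handle` method, the `authenticate` callback, and the representation of the voiceprint model as a list of stored utterances are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Terminal:
    wake_word_text: str
    # Hypothetical stand-in for the first voiceprint model:
    # the utterances the model is built from.
    model_samples: List[str] = field(default_factory=list)

    def handle(self, text: str, audio: str,
               authenticate: Callable[[str], bool]) -> bool:
        """Return True if the model was updated with this utterance."""
        if text != self.wake_word_text:   # text verification
            return False
        if not authenticate(audio):       # identity authentication
            return False
        self.model_samples.append(audio)  # update with the first voice data
        return True
```

The `authenticate` callback is left open on purpose, since the application describes more than one way of authenticating the user (voiceprint verification, or an identity verification interface).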
With reference to the first aspect, in a possible design, the terminal performing identity authentication on the user is specifically: the terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data passes the voiceprint verification, the identity authentication is passed.
In the embodiments of the present application, the terminal may obtain first voice data that passes both text verification and voiceprint verification when the terminal performs voice wake-up, and then use the first voice data to update the first voiceprint model in the terminal. Because the first voice data is voice data of the user acquired by the terminal in real time, it can reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located. Moreover, because the first voice data has passed text verification and voiceprint verification, using it to update the voiceprint model of the terminal can improve the voice wake-up rate when the terminal performs voice wake-up and reduce the false wake-up rate.
With reference to the first aspect, in another possible design, if the first voice data passes the voiceprint verification, the terminal may start the voice assistant. After the voice assistant is started, the terminal may or may not receive a valid voice command through the voice assistant. The terminal may decide whether to use the first voice data to update the first voiceprint model by determining whether it has received a valid voice command. Specifically, the method in the embodiments of the present application further includes: when the identity authentication is passed, the terminal starts the voice assistant; the terminal receives second voice data through the voice assistant; and the terminal determines that the second voice data is a valid voice command. In this way, after the identity authentication is passed, if the terminal determines that the second voice data is a valid voice command, the terminal may use the first voice data to update the first voiceprint model in the terminal.
The terminal uses the first voice data to update the first voiceprint model in the terminal only if, after the voice assistant is started, it receives a valid voice command that triggers the terminal to perform a corresponding function. If the terminal receives a valid voice command after the voice assistant is started, this voice wake-up was a valid one that matches the user's intention. Updating the voiceprint model of the terminal with voice data that reflects the user's true intention and successfully woke up the terminal can further improve the voice wake-up rate when the terminal performs voice wake-up and reduce the false wake-up rate.
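The extra condition in this design (update only when a valid voice command follows the wake-up) can be sketched as below. The command set and all names are hypothetical, chosen for illustration; the application does not specify how a voice command is judged to be valid.

```python
from typing import List

# Hypothetical set of commands the assistant treats as valid.
VALID_COMMANDS = {"call bob", "play music"}


def update_after_valid_command(first_voice: str,
                               second_voice_text: str,
                               auth_passed: bool,
                               model_samples: List[str]) -> bool:
    """Append first_voice to the model samples only if the wake-up was
    followed by a valid command. Returns True when an update happened."""
    if not auth_passed:
        return False  # assistant was never started
    if second_voice_text.strip().lower() not in VALID_COMMANDS:
        return False  # wake-up did not lead to a valid command
    model_samples.append(first_voice)
    return True
```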
With reference to the first aspect, in another possible design, the terminal includes a coprocessor and a main processor. The terminal uses the coprocessor to monitor voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data. For example, the coprocessor is a DSP and the main processor is an AP.
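The division of work between the coprocessor and the main processor can be sketched as a two-stage check. The numeric scores, thresholds, and return labels below are assumptions made for illustration; the application only states that the similarity must satisfy a preset condition.

```python
def two_stage_wake(similarity: float, text: str, voiceprint_score: float,
                   *, wake_word: str,
                   coprocessor_threshold: float = 0.5,
                   voiceprint_threshold: float = 0.8) -> str:
    """Return a label describing where the utterance stopped."""
    # Stage 1 (coprocessor, e.g. a DSP): cheap similarity screening.
    if similarity < coprocessor_threshold:
        return "ignored_by_coprocessor"
    # Stage 2 (main processor, e.g. an AP): text verification first,
    # then voiceprint verification against the first voiceprint model.
    if text != wake_word:
        return "text_mismatch"
    if voiceprint_score < voiceprint_threshold:
        return "voiceprint_mismatch"
    return "wake"
```

Keeping the first stage cheap means the main processor only has to run when the coprocessor has already seen a likely wake-up word.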
With reference to the first aspect, in another possible design, before the terminal performs identity authentication on the user, the terminal may use the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data fails the voiceprint verification, the terminal performs text verification on the voice data received within a first preset time. If, within the first preset time, the terminal receives second voice data and at least one piece of voice data that matches the text of the preset wake-up word, the terminal performs identity authentication on the user. The text corresponding to the second voice data contains a preset keyword. For example, the second voice data may be voice data in which the user complains that voice wake-up has failed, such as "why won't it wake up", "why isn't it working", "no response", "can't wake up", and "voice wake-up is broken".
Suppose that, after receiving the first voice data, the terminal finds that the first voice data fails the voiceprint verification. If the terminal then receives at least one piece of voice data that passes text verification within the first preset time, the user has tried several times to wake up the voice assistant of the terminal by voice, but the voice wake-up has failed. In this case, if the terminal also receives the second voice data within the first preset time, the user is dissatisfied with the voice wake-up failure. Receiving both the second voice data and at least one piece of voice data that passes text verification within the first preset time indicates that the user has a strong intention to wake up the voice assistant by voice; the repeated voice wake-up failures may be due to a large difference between the user's current physical state and the user's physical state when the wake-up word was registered. The received first voice data is voice data that the user uttered, with a strong intention to wake up the voice assistant of the terminal, in order to wake up the voice assistant. Therefore, updating the voiceprint model of the terminal with voice data that reflects the user's true intention can further improve the voice wake-up rate when the terminal performs voice wake-up and reduce the false wake-up rate.
Moreover, because the first voice data is voice data of the user acquired by the terminal in real time, it can reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located. Therefore, using the first voice data to update the voiceprint model of the terminal can improve the voice wake-up rate when the terminal performs voice wake-up and reduce the false wake-up rate. Further, the received first voice data is acquired automatically by the terminal while it performs voice wake-up, rather than being input by the user after the terminal prompts the user to manually re-register the wake-up word. Using the received first voice data to update the voiceprint model therefore also simplifies the process of updating the wake-up word.
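The trigger condition of this design can be sketched as a check over the utterances recognized after a failed voiceprint verification: within the first preset time there must be at least one retry whose text matches the wake-up word and at least one utterance containing a preset keyword. The keyword list, the 10-second window, and the `(timestamp, text)` representation are assumptions for illustration only.

```python
from typing import Iterable, Tuple

# Hypothetical preset keywords that signal a complaint about wake-up failure.
COMPLAINT_KEYWORDS = ("why won't", "can't wake", "no response")


def needs_identity_auth(fail_time: float,
                        later_utterances: Iterable[Tuple[float, str]],
                        wake_word: str,
                        window_s: float = 10.0) -> bool:
    """later_utterances holds (timestamp_seconds, recognized_text) pairs."""
    in_window = [text.lower() for t, text in later_utterances
                 if 0.0 <= t - fail_time <= window_s]
    # At least one further attempt whose text matches the wake-up word.
    has_retry = any(text == wake_word.lower() for text in in_window)
    # At least one utterance whose text contains a preset keyword.
    has_complaint = any(k in text for text in in_window
                        for k in COMPLAINT_KEYWORDS)
    return has_retry and has_complaint
```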
With reference to the first aspect, in another possible design, the terminal performing identity authentication on the user includes: the terminal displays an identity verification interface; the terminal receives identity verification information input by the user on the identity verification interface; and the terminal verifies the user's identity according to the identity verification information.
With reference to the first aspect, in another possible design, the terminal includes a coprocessor and a main processor. The terminal uses the coprocessor to monitor voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data. The terminal uses the coprocessor to monitor voice data within the first preset time, and notifies the main processor to determine whether the voice data received within the first preset time includes second voice data and at least one piece of voice data that matches the text of the preset wake-up word, where the text corresponding to the second voice data contains a preset keyword. For example, the coprocessor is a DSP and the main processor is an AP.
With reference to the first aspect, in another possible design, the preset wake-up word includes at least two pieces of registered voice data, which were recorded when the terminal registered the preset wake-up word, and the first voiceprint model is generated from the at least two pieces of registered voice data. After the terminal generates a new voiceprint model from the first voice data, directly replacing the first voiceprint model with the new voiceprint model can improve the voice wake-up rate when the terminal performs voice wake-up. However, directly replacing the first voiceprint model with a voiceprint model generated from the new voice data (that is, the first voice data) raises the voice wake-up rate sharply, and a sharp rise in the voice wake-up rate may correspondingly raise the false wake-up rate when the terminal performs voice wake-up.
To steadily improve the voice wake-up rate of the terminal while reducing the false wake-up rate when the terminal performs voice wake-up, the method in which the terminal uses the first voice data to update the first voiceprint model in the terminal may include: the terminal uses the first voice data to replace third voice data among the at least two pieces of registered voice data, obtaining updated at least two pieces of registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data among the at least two pieces of registered voice data; the terminal generates a second voiceprint model from the updated at least two pieces of registered voice data; and the terminal replaces the first voiceprint model with the second voiceprint model. The second voiceprint model is used to characterize the voiceprint features of the updated at least two pieces of registered voice data.
In the embodiments of the present application, the terminal uses the first voice data to replace part of the at least two pieces of registered voice data, such as the third voice data, instead of generating the second voiceprint model entirely from the first voice data. In this way, the voice wake-up rate when the terminal performs voice wake-up can be improved in a relatively stable manner, and the false wake-up rate can be reduced at the same time.
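The replace-then-regenerate step can be sketched as follows. Two assumptions are made purely for illustration: each registered utterance is represented as a `(signal_quality, feature_vector)` pair, and the "voiceprint model" is taken to be the element-wise mean of the feature vectors. The application does not specify how the voiceprint model is actually computed.

```python
from statistics import fmean
from typing import List, Tuple

Sample = Tuple[float, List[float]]  # (signal quality, e.g. SNR in dB; features)


def replace_weakest(enrolled: List[Sample], new_sample: Sample) -> List[Sample]:
    """Replace the registered sample with the lowest signal quality
    (the 'third voice data') by the new sample."""
    weakest = min(range(len(enrolled)), key=lambda i: enrolled[i][0])
    updated = list(enrolled)
    updated[weakest] = new_sample
    return updated


def build_model(samples: List[Sample]) -> List[float]:
    """Toy stand-in for model generation: mean feature vector."""
    vectors = [vec for _, vec in samples]
    return [fmean(column) for column in zip(*vectors)]
```

Because only one registered sample changes per update, the regenerated model stays close to the previous one, which is the stabilizing effect the paragraph above describes.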
With reference to the first aspect, in another possible design, if the second voiceprint threshold that the terminal generates from the second voiceprint model differs greatly from the first voiceprint threshold, the wake-up rate when the terminal performs voice wake-up fluctuates sharply, which affects user experience. Based on this, the terminal may generate a second voiceprint threshold from the second voiceprint model and the updated at least two pieces of registered voice data; only if the difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold does the terminal replace the first voiceprint model with the second voiceprint model.
When the second voiceprint threshold differs greatly from the first voiceprint threshold, the terminal may delete the second voiceprint model and the first voice data; that is, it does not replace the first voiceprint model with the second voiceprint model. This avoids sharp fluctuations in the wake-up rate when the terminal performs voice wake-up, which would be caused by a large difference between the second voiceprint threshold and the first voiceprint threshold and would affect user experience.
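The threshold check can be sketched as a small selection function. Treating "difference" as an absolute difference is an assumption, as are all names; the application only states that the replacement happens when the difference is less than the first preset threshold.

```python
from typing import Any, Tuple


def choose_model(first_model: Any, first_threshold: float,
                 second_model: Any, second_threshold: float,
                 first_preset_threshold: float) -> Tuple[Any, float]:
    """Return the (model, voiceprint threshold) pair the terminal keeps."""
    # Assumption: the difference is compared as an absolute value.
    if abs(second_threshold - first_threshold) < first_preset_threshold:
        return second_model, second_threshold  # accept the update
    # Otherwise the second model (and the first voice data) is discarded.
    return first_model, first_threshold
```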
With reference to the first aspect, in another possible design, to prevent the terminal from updating the first voiceprint model with voice data of poor signal quality, the terminal may first determine, before using the first voice data to update the first voiceprint model, whether the signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of voice data characterizes how good the signal quality of the voice data is. For example, the signal quality parameter of voice data may be its signal-to-noise ratio. If the signal quality parameter of the first voice data is higher than the second preset threshold, the signal quality of the first voice data is relatively high, and the terminal may use the first voice data to update the first voiceprint model. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal may delete the first voice data.
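The signal quality gate can be sketched with the signal-to-noise ratio as the quality parameter. The sketch assumes that separate speech and noise-only sample sequences are available, which the application does not state, and uses the standard definition of SNR in decibels, 10·log10(P_signal / P_noise).

```python
import math
from typing import Sequence


def snr_db(signal: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in dB from mean signal power and mean noise power."""
    p_signal = sum(x * x for x in signal) / len(signal)
    p_noise = sum(x * x for x in noise) / len(noise)
    return 10.0 * math.log10(p_signal / p_noise)


def accept_for_update(snr: float, second_preset_threshold: float) -> bool:
    """Use the utterance only if its quality is strictly above the threshold;
    at or below the threshold, the utterance is dropped."""
    return snr > second_preset_threshold
```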
In a second aspect, an embodiment of the present application provides a terminal. The terminal includes a storage unit, an input unit, a text verification unit, an identity authentication unit, and an update unit. The storage unit stores a preset wake-up word registered in the terminal and a first voiceprint model. The first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up, and the first voiceprint model represents the voiceprint features of the preset wake-up word. The input unit is configured to receive first voice data input by a user. The text verification unit is configured to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal. The identity authentication unit is configured to perform identity authentication on the user if the text verification unit determines that the text corresponding to the first voice data matches the text of the preset wake-up word. The update unit is configured to use the first voice data to update the first voiceprint model in the terminal if the identity authentication unit determines that the identity authentication is passed.
With reference to the second aspect, in a possible design, the identity authentication unit is specifically configured to use the first voiceprint model to perform voiceprint verification on the first voice data; if the voiceprint verification is passed, the identity authentication is passed.
With reference to the second aspect, in another possible design, the terminal further includes a starting unit and a determining unit. The starting unit is configured to start the voice assistant when the identity authentication unit determines that the identity authentication is passed. The input unit is further configured to receive second voice data through the voice assistant. The determining unit is configured to determine, after the identity authentication unit determines that the identity authentication is passed, that the second voice data received by the input unit is a valid voice command. The update unit is configured to use the first voice data to update the first voiceprint model after the determining unit determines that the second voice data is a valid voice command.
With reference to the second aspect, in another possible design, the terminal further includes a voiceprint verification unit. The voiceprint verification unit is configured to use the first voiceprint model to perform voiceprint verification on the first voice data before the identity authentication unit performs identity authentication on the user. The text verification unit is further configured to perform text verification on the voice data received by the input unit within a first preset time if the voiceprint verification unit determines that the first voice data fails the voiceprint verification. The identity authentication unit is specifically configured to perform identity authentication on the user if the text verification unit determines that the input unit has received, within the first preset time, second voice data and at least one piece of voice data that matches the text of the preset wake-up word. The text corresponding to the second voice data contains a preset keyword.
With reference to the second aspect, in another possible design, the terminal further includes a display unit. The display unit is configured to display an identity verification interface if the text verification unit determines that the input unit has received, within the first preset time, second voice data and at least one piece of voice data that matches the text of the preset wake-up word. The input unit is further configured to receive identity verification information input by the user on the identity verification interface displayed by the display unit. The identity authentication unit is specifically configured to verify the user's identity according to the identity verification information received by the input unit.
With reference to the second aspect, in another possible design, the preset wake-up word includes at least two pieces of registered voice data, which were recorded when the terminal registered the preset wake-up word, and the first voiceprint model is generated from the at least two pieces of registered voice data. The terminal further includes a replacement unit and a generating unit. The replacement unit is configured to use the first voice data to replace third voice data among the at least two pieces of registered voice data, obtaining updated at least two pieces of registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data among the at least two pieces of registered voice data. The generating unit is configured to generate a second voiceprint model from the updated at least two pieces of registered voice data obtained by the replacement unit. The update unit is configured to replace the first voiceprint model with the second voiceprint model generated by the generating unit, where the second voiceprint model is used to characterize the voiceprint features of the updated at least two pieces of registered voice data.
With reference to the second aspect, in another possible design, the storage unit further stores a first voiceprint threshold, which is generated by the generating unit from the first voiceprint model and the at least two pieces of registered voice data. The generating unit is further configured to generate, after generating the second voiceprint model and before the update unit replaces the first voiceprint model with the second voiceprint model, a second voiceprint threshold from the second voiceprint model and the updated at least two pieces of registered voice data.
The update unit is specifically configured to replace the first voiceprint model with the second voiceprint model if the difference between the second voiceprint threshold generated by the generating unit and the first voiceprint threshold is less than a first preset threshold.
With reference to the second aspect, in another possible design, the terminal further includes a deleting unit. The deleting unit is configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold generated by the generating unit and the first voiceprint threshold is greater than or equal to the first preset threshold.
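The replace, regenerate, and compare steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the application: each piece of registered voice data is assumed to be a fixed-length feature vector with a signal-to-noise ratio, the voiceprint model is assumed to be the mean feature vector, the voiceprint threshold is assumed to be the lowest cosine score of the registered items against the model, and the "difference" is taken as an absolute difference. The names `Sample`, `build_model`, `build_threshold`, and `try_update` are hypothetical.

```python
from dataclasses import dataclass
from math import sqrt
from typing import List, Optional, Sequence, Tuple

Vector = Tuple[float, ...]


@dataclass(frozen=True)
class Sample:
    """One piece of registered voice data (hypothetical representation)."""
    features: Vector  # assumed fixed-length voiceprint features
    snr_db: float     # signal quality parameter (signal-to-noise ratio)


def build_model(samples: Sequence[Sample]) -> Vector:
    """Assumed model: element-wise mean of the feature vectors."""
    n = len(samples)
    return tuple(sum(col) / n for col in zip(*(s.features for s in samples)))


def cosine(a: Vector, b: Vector) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def build_threshold(model: Vector, samples: Sequence[Sample]) -> float:
    """Assumed threshold: lowest score of any registered item against the model."""
    return min(cosine(model, s.features) for s in samples)


def try_update(
    registered: List[Sample],
    first_voice: Sample,
    first_threshold: float,
    first_preset_threshold: float,
) -> Optional[Tuple[List[Sample], Vector, float]]:
    """Return (updated samples, second model, second threshold), or None if rejected."""
    # Third voice data: the registered item with the lowest signal quality.
    worst = min(range(len(registered)), key=lambda i: registered[i].snr_db)
    updated = list(registered)  # work on a copy so a rejection changes nothing
    updated[worst] = first_voice
    second_model = build_model(updated)
    second_threshold = build_threshold(second_model, updated)
    if abs(second_threshold - first_threshold) < first_preset_threshold:
        return updated, second_model, second_threshold
    return None  # the second model and the first voice data are discarded
```

Working on a copy of the registered data means a rejected candidate leaves the first voiceprint model and its registered voice data untouched, which matches the deleting unit's role.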
With reference to the second aspect, in another possible design, the updating unit is specifically configured to update the first voiceprint model with the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of the first voice data includes the signal-to-noise ratio of the first voice data.
In a third aspect, an embodiment of the present application provides a terminal. The terminal may include a processor, a memory, and a display. The memory and the display are coupled to the processor. The display is configured to display images generated by the processor. The memory is configured to store computer program code, information related to the voice assistant, the preset wake-up word registered in the terminal, and the first voiceprint model. The computer program code includes computer instructions. When the processor executes the computer instructions, the processor is configured to: receive first voice data input by a user; determine whether the text corresponding to the first voice data matches the text of the preset wake-up word; if the text corresponding to the first voice data matches the text of the preset wake-up word, perform identity authentication on the user; and if the identity authentication succeeds, update the first voiceprint model with the first voice data. The first voiceprint model is used for voiceprint verification when the voice assistant is woken up, and characterizes the voiceprint features of the preset wake-up word.
With reference to the third aspect, in a possible design, the processor may be further configured to perform voiceprint verification on the first voice data using the first voiceprint model. If the voiceprint verification succeeds, the identity authentication succeeds.
With reference to the third aspect, in another possible design, the processor may be further configured to: start the voice assistant when the identity authentication succeeds; receive second voice data through the voice assistant; determine, after the identity authentication succeeds, that the second voice data is a valid voice command; and, after determining that the second voice data is a valid voice command, update the first voiceprint model in the terminal with the first voice data.
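The ordering in this design (authenticate, start the assistant, confirm a valid command, and only then update the model) can be sketched as below. This is an illustrative sketch only: the function name `update_after_valid_command`, the callback parameters, and the idea of checking a recognized command against a set of known commands are assumptions, not details given in the application.

```python
from typing import Callable, Optional, Set


def update_after_valid_command(
    first_voice: bytes,
    authenticated: bool,
    receive_command: Callable[[], Optional[str]],
    valid_commands: Set[str],
    update_model: Callable[[bytes], None],
) -> bool:
    """Update the voiceprint model only after a valid voice command follows the wake-up."""
    if not authenticated:
        return False  # the voice assistant is not started
    # The voice assistant is started and receives the second voice data
    # (represented here, by assumption, as already-recognized text).
    command = receive_command()
    if command is None or command not in valid_commands:
        return False  # no valid voice command, so the model is left unchanged
    update_model(first_voice)
    return True
```

Deferring the update until a valid command arrives treats the command as extra evidence that the wake-up was intentional before the first voice data influences the model.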
With reference to the third aspect, in another possible design, the processor includes a coprocessor and a main processor. The coprocessor monitors voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When the main processor determines that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor performs voiceprint verification on the first voice data using the first voiceprint model.
With reference to the third aspect, in another possible design, the processor is further configured to: before performing identity authentication on the user, perform voiceprint verification on the first voice data using the first voiceprint model; if the first voice data fails the voiceprint verification, perform text verification on the voice data received within a first preset time; and if second voice data and at least one piece of voice data matching the text of the preset wake-up word are received within the first preset time, perform identity authentication on the user. The text corresponding to the second voice data contains a preset keyword.
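The fallback condition in this design can be sketched as a check over the voice data received in the first preset time. This is a sketch under assumptions: voice data is represented by its recognized text and an arrival time measured from the failed voiceprint verification, text matching is exact string equality, and the names `Utterance` and `should_authenticate` are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Sequence


@dataclass(frozen=True)
class Utterance:
    text: str           # assumed output of a speech-to-text step
    received_at: float  # seconds since the failed voiceprint verification


def should_authenticate(
    utterances: Iterable[Utterance],
    wake_text: str,
    keywords: Sequence[str],
    first_preset_time_s: float,
) -> bool:
    """True if the window holds a wake-word match and a keyword-bearing utterance."""
    in_window = [u for u in utterances if 0.0 <= u.received_at <= first_preset_time_s]
    # At least one piece of voice data matching the text of the preset wake-up word.
    has_wake = any(u.text == wake_text for u in in_window)
    # Second voice data: a separate utterance whose text contains a preset keyword.
    has_keyword = any(
        u.text != wake_text and any(k in u.text for k in keywords)
        for u in in_window
    )
    return has_wake and has_keyword
```

For example, with the wake-up word "my little k" and the keyword "call", a repeated wake-up word followed by "please call Bob" inside the window would lead to identity authentication, while the wake-up word alone would not.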
With reference to the third aspect, in another possible design, the processor is further configured to control the display to display an identity verification interface if the second voice data and at least one piece of voice data matching the text of the preset wake-up word are received within the first preset time. The processor is further configured to receive identity verification information that the user inputs on the identity verification interface displayed on the display, and to verify the user's identity according to the identity verification information.
With reference to the third aspect, in another possible design, the processor includes a coprocessor and a main processor. The coprocessor monitors voice data. When the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal. When the main processor determines that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor performs voiceprint verification on the first voice data using the first voiceprint model. The coprocessor monitors the voice data within the first preset time and notifies the main processor to determine whether the voice data received within the first preset time includes the second voice data and at least one piece of voice data matching the text of the preset wake-up word, where the text corresponding to the second voice data contains a preset keyword.
With reference to the third aspect, in another possible design, the preset wake-up word stored in the memory includes at least two pieces of registered voice data, which are recorded when the processor registers the preset wake-up word, and the first voiceprint model is generated by the processor from the at least two pieces of registered voice data. The processor is further configured to: replace third voice data among the at least two pieces of registered voice data with the first voice data, to obtain updated registered voice data, where the signal quality parameter of the third voice data is lower than that of the other registered voice data; generate a second voiceprint model from the updated registered voice data; and replace the first voiceprint model with the second voiceprint model, where the second voiceprint model characterizes the voiceprint features of the updated registered voice data.
With reference to the third aspect, in another possible design, the memory further stores a first voiceprint threshold, which is generated by the processor from the first voiceprint model and the at least two pieces of registered voice data. The processor is further configured to: after generating the second voiceprint model and before replacing the first voiceprint model with the second voiceprint model, generate a second voiceprint threshold from the second voiceprint model and the updated registered voice data; and if the difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold, replace the first voiceprint model with the second voiceprint model.
With reference to the third aspect, in another possible design, the processor is further configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold.
With reference to the third aspect, in another possible design, the processor is further configured to update the first voiceprint model with the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of the first voice data includes the signal-to-noise ratio of the first voice data.
In a fourth aspect, an embodiment of the present application provides a computer storage medium. The computer storage medium includes computer instructions that, when run on a terminal, cause the terminal to perform the method according to the first aspect or any one of its possible designs.
In a fifth aspect, an embodiment of the present application provides a computer program product that, when run on a computer, causes the computer to perform the method according to the first aspect or any one of its possible designs.
In addition, for the technical effects of the terminals according to the second aspect, the third aspect, and any of their designs, of the computer storage medium according to the fourth aspect, and of the computer program product according to the fifth aspect, refer to the technical effects of the first aspect and its different designs. Details are not repeated here.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a first schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
FIG. 2 is a second schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the hardware structure of a terminal according to an embodiment of the present application;
FIG. 4A is a first flowchart of a method for updating a wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
FIG. 4B is a third schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
FIG. 5A is a second flowchart of a method for updating a wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
FIG. 5B is a fourth schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
FIG. 6 is a third flowchart of a method for updating a wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
FIG. 7 is a fourth flowchart of a method for updating a wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
FIG. 8 is a fifth flowchart of a method for updating a wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
FIG. 9 is a sixth flowchart of a method for updating a wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
FIG. 10 is a fifth schematic diagram of an example display interface of a terminal according to an embodiment of the present application;
FIG. 11 is a seventh flowchart of a method for updating a wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
FIG. 12 is an eighth flowchart of a method for updating a wake-up voice of a voice assistant by a terminal according to an embodiment of the present application;
FIG. 13 is a first schematic structural diagram of a terminal according to an embodiment of the present application;
FIG. 14 is a second schematic structural diagram of a terminal according to an embodiment of the present application;
FIG. 15 is a third schematic structural diagram of a terminal according to an embodiment of the present application.
DETAILED DESCRIPTION
Embodiments of the present application provide a method for updating a wake-up voice of a voice assistant by a terminal, and a terminal. The method can be applied while the terminal performs voice wake-up in response to voice data input by a user.
Before performing voice wake-up, the terminal may receive a preset wake-up word registered by the user. The preset wake-up word is used to wake up the voice assistant in the terminal, so that the terminal can provide the user with a voice control service through the voice assistant. In the embodiments of the present application, waking up the voice assistant means that the terminal starts the voice assistant in response to voice data uttered by the user. The voice control service means that, after the voice assistant of the terminal is started, the user can trigger the terminal to execute a corresponding event by issuing a voice command (that is, voice data) to the voice assistant. The preset wake-up word in the embodiments of the present application is a piece of voice data, namely the wake-up voice used to wake up the voice assistant.
The voice assistant may be an application (APP) installed in the terminal. The voice assistant may be an embedded application in the terminal (that is, a system application of the terminal) or a downloadable application. An embedded application is an application provided as part of the implementation of the terminal (such as a mobile phone). For example, the embedded application may be a "Settings" application, a "Short message" application, a "Camera" application, or the like. A downloadable application is an application that can provide its own Internet Protocol Multimedia Subsystem (IMS) connection. The downloadable application may be an application preinstalled in the terminal or a third-party application that the user downloads and installs in the terminal. For example, the downloadable application may be a "WeChat" application, an "Alipay" application, a "Mail" application, or the like.
The embodiments of the present application use the mobile phone 100 shown in FIG. 1 as an example to describe how a terminal registers a preset wake-up word:
The mobile phone 100 may receive a user's tap operation (for example, a single tap) on the "Settings" application icon. In response to the tap operation on the "Settings" application icon, the mobile phone 100 may display the settings interface 101 shown in (a) of FIG. 1. The settings interface 101 may include an "airplane mode" option, a "WLAN" option, a "Bluetooth" option, a "mobile network" option, a "smart assistance" option 102, and the like. For the specific functions of the "airplane mode", "WLAN", "Bluetooth", and "mobile network" options, refer to the descriptions in conventional technologies. Details are not described here.
The mobile phone 100 may receive a user's tap operation (for example, a single tap) on the "smart assistance" option 102. In response to the tap operation on the "smart assistance" option 102, the mobile phone 100 may display the smart assistance interface 103 shown in (b) of FIG. 1. The smart assistance interface 103 includes a "gesture control" option 104, a "voice control" option 105, and the like. The "gesture control" option 104 is used to manage the user gestures that trigger the mobile phone 100 to execute corresponding events. The "voice control" option 105 is used to manage the voice wake-up function of the mobile phone 100. Specifically, on receiving a user's tap operation on the "voice control" option 105, the mobile phone 100 may display the voice control interface 106 shown in (c) of FIG. 1. The voice control interface 106 includes a "voice wake-up" option 107 and an "incoming call voice control" option 108. The "voice wake-up" option 107 is used to enable or disable the voice wake-up function of the mobile phone 100. The voice wake-up function of a terminal (such as the mobile phone 100) is described later in the embodiments of the present application and is not introduced here. The "incoming call voice control" option 108 is used to enable or disable the voice wake-up function of the mobile phone 100 when the mobile phone 100 receives an incoming call.
For example, assume that the "incoming call voice control" option 108 of the mobile phone 100 is enabled. When the mobile phone 100 receives an incoming call from another terminal and plays an incoming call alert, if the mobile phone 100 recognizes the voice data "answer the call" uttered by the owner, the mobile phone 100 can automatically answer the call; if the mobile phone 100 recognizes the voice data "hang up" uttered by the owner, the mobile phone 100 can automatically reject the call.
The mobile phone 100 may receive a user's tap operation (for example, a single tap) on the "voice wake-up" option 107. In response to the tap operation on the "voice wake-up" option 107, the mobile phone 100 may display the voice wake-up interface 109 shown in (d) of FIG. 1. The voice wake-up interface 109 includes a "voice wake-up" switch 110, a "find phone" option 111, a "how to make a call" option 112, a "wake-up word" option 113, and the like. The "voice wake-up" switch 110 is used to trigger the mobile phone 100 to enable or disable the voice wake-up function. The "find phone" option 111, the "how to make a call" option 112, and similar options indicate the voice control functions of the mobile phone 100 that are available after its voice assistant is started. For example, the "find phone" option 111 indicates that, after the voice assistant of the mobile phone 100 is started, the voice assistant can respond to the user's voice data "Where are you?", which helps the user find the mobile phone 100. The "how to make a call" option 112 indicates that, after the voice assistant of the mobile phone 100 is started, the voice assistant can automatically call the contact Bob in response to the user's voice data "call Bob".
The "wake-up word" option 113 is used to register, with the mobile phone 100, the wake-up word used to wake up the mobile phone 100 (for example, the voice assistant of the mobile phone 100). Before the user registers a custom wake-up word in the mobile phone 100, the mobile phone 100 may indicate a default wake-up word to the user. For example, assume that the default wake-up word of the mobile phone 100 is "my little k".
Assume that the "voice wake-up" switch 110 is on and that no user-defined wake-up word has been registered in the mobile phone 100. The mobile phone 100 may receive a user's tap operation (for example, a single tap) on the "wake-up word" option 113 shown in (d) of FIG. 1. In response to the tap operation on the "wake-up word" option 113, the mobile phone 100 may display the default wake-up word registration interface 201 shown in (a) of FIG. 2. The default wake-up word registration interface 201 may include a recording progress bar 202, a "custom wake-up word" option 203, a "microphone" option 204, and recording prompt information 205. The "microphone" option 204 is used to trigger the mobile phone 100 to start recording voice data to serve as the wake-up word. The recording progress bar 202 displays the progress of the mobile phone 100 in recording the wake-up word. The recording prompt information 205 indicates the default wake-up word of the mobile phone 100. For example, the recording prompt information 205 may be "Please help the phone learn the wake-up word (my little k). Tap, then say 'my little k'". Optionally, the default wake-up word registration interface 201 may further include the recording prompt "Please record in a quiet environment, about 30 cm from the phone!". The default wake-up word registration interface 201 further includes a "Cancel" button 206 and an "OK" button 207.
The "OK" button 207 is used to trigger the mobile phone 100 to save the recorded wake-up word. The "Cancel" button 206 is used to trigger the mobile phone 100 to cancel registration of the wake-up word and display the voice wake-up interface 109 shown in (d) of FIG. 1.
In response to the user's tap operation on the "microphone" option 204, the mobile phone 100 may start recording the voice data input by the user. After receiving the voice data input by the user (denoted as voice data 1), the mobile phone 100 may determine whether voice data 1 meets a preset condition. If voice data 1 does not meet the preset condition, the mobile phone 100 may delete voice data 1 and display the default wake-up word registration interface 201 shown in (a) of FIG. 2 again. If voice data 1 meets the preset condition, the mobile phone 100 may save voice data 1.
In the embodiments of the present application, voice data 1 meeting the preset condition may specifically mean that the text information corresponding to voice data 1 is the text information of the default wake-up word, "my little k", and that the signal-to-noise ratio of voice data 1 is higher than a preset threshold.
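The preset condition above combines a text match with a signal-to-noise check, which can be sketched as follows. This is an illustration only: the application does not state how the signal-to-noise ratio is computed or what the threshold is, so the decibel formula from signal and noise power, the exact string comparison, and the names `snr_db` and `meets_preset_condition` are assumptions.

```python
import math


def snr_db(signal_power: float, noise_power: float) -> float:
    """Signal-to-noise ratio in decibels, from average signal and noise power."""
    return 10.0 * math.log10(signal_power / noise_power)


def meets_preset_condition(
    recognized_text: str,
    wake_text: str,
    snr: float,
    min_snr: float,
) -> bool:
    """True if the recognized text equals the wake-up word text and the SNR exceeds the threshold."""
    return recognized_text.strip() == wake_text and snr > min_snr
```

For instance, a recording recognized as "my little k" with a signal power 100 times the noise power has an SNR of about 20 dB and would be kept under a hypothetical 15 dB threshold; the same text at 10 dB would be deleted and the registration interface shown again.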
After receiving voice data 1 that is input by the user and meets the preset condition, the mobile phone 100 may generate, from voice data 1, a voiceprint model used for voiceprint verification when the voice assistant is woken up, and may generate a voiceprint threshold from voice data 1 and the voiceprint model. The voiceprint model characterizes the voiceprint features of the wake-up word registered by the user.
It can be understood that the voiceprint model is equivalent to a function. Different voice data yield different voiceprint models. That is, the mobile phone 100 may generate different voiceprint models for different wake-up words registered by the same user, and different users registering the same wake-up word with the mobile phone 100 also yield different voiceprint models. The mobile phone 100 may use voice data 1 (that is, the voice data that the user inputs when registering the wake-up word and that meets the preset condition) as the input value and substitute it into the voiceprint model to obtain a voiceprint value (for example, voiceprint value a).
Optionally, to improve the accuracy of voice wake-up, the terminal may record multiple pieces of voice data that meet the preset condition and generate, from these pieces of voice data, the voiceprint model used for voiceprint verification when the voice assistant is woken up. For example, after voice data 1 meets the preset condition and is saved, the mobile phone 100 may prompt the user to record voice data again.
The "custom wake-up word" option 203 is used to trigger the mobile phone 100 to display a wake-up word input interface. For example, in response to the user's tap operation (for example, a single tap) on the "custom wake-up word" option 203 shown in (a) of FIG. 2, the mobile phone 100 may display the wake-up word input interface 208 shown in (b) of FIG. 2. The wake-up word input interface 208 may include a "Cancel" button 209, an "OK" button 210, a "wake-up word input box" 211, and wake-up word suggestions 212. The "Cancel" button 209 is used to trigger the mobile phone 100 to cancel the custom wake-up word and display the default wake-up word registration interface 201 shown in (a) of FIG. 2. The "wake-up word input box" 211 is used to receive the custom wake-up word input by the user. The "OK" button 210 is used to save the custom wake-up word that the user inputs in the "wake-up word input box" 211. The wake-up word suggestions 212 inform the user of the mobile phone's requirements for a custom wake-up word.
Assume that the user inputs the custom wake-up word "my super phone" in the "wake-up word input box" 211 shown in (c) of FIG. 2. In response to the user's tap operation (for example, a single tap) on the "OK" button 210 shown in (c) of FIG. 2, the mobile phone 100 may display the custom wake-up word registration interface 213 shown in (d) of FIG. 2, so that the user can register the custom wake-up word on the custom wake-up word registration interface 213. The user registers a custom wake-up word on the custom wake-up word registration interface 213 in the same way as a default wake-up word is registered on the default wake-up word registration interface 201. Details are not described here.
It can be understood that, if a user-defined wake-up word has already been registered in the mobile phone 100, for example "my super phone", then in response to the user's tap operation on the "wake-up word" option 113 shown in (d) of FIG. 1, the mobile phone 100 may display the custom wake-up word registration interface 216 shown in (d) of FIG. 2.
It should be noted that different terminals have different designs. For example, in some terminals, smart assistance may be called accessibility, voice control may be called the voice assistant, and voice wake-up may be called the wake-up function. In addition, the ways in which a user can trigger the terminal to display a wake-up word registration interface (such as the default or the custom wake-up word registration interface) include, but are not limited to, the "Settings > smart assistance > voice control > voice wake-up > wake-up word" path in the terminal. For example, in some terminals the path may be "Settings > voice assistant > voice wake-up > wake-up word".
The embodiments of this application take the case in which the wake-up word of the mobile phone 100 is the default wake-up word "my little k" as an example to describe the voice wake-up process of the mobile phone 100:
When the DSP of the mobile phone 100 detects voice data (such as voice data 2) whose similarity to the default wake-up word "my little k" satisfies a certain condition, the DSP may hand the detected voice data 2 over to the AP. The AP performs text verification on the voice data 2. When the AP recognizes that the text corresponding to the voice data 2 is "my little k", the AP may feed the voice data 2 as input into the voiceprint model of the mobile phone 100 to obtain a voiceprint value (voiceprint value b). If the difference between the voiceprint value b and the above voiceprint threshold (that is, voiceprint value a) is less than a preset threshold, the AP may determine that the voice data 2 matches the wake-up word registered by the user.
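The wake-up check described above can be illustrated with a minimal sketch. It is not the implementation of the embodiments: the class name `WakeVerifier`, the field `dsp_min_similarity`, and the use of an absolute difference between voiceprint value b and voiceprint value a are assumptions made only for illustration, and the similarity score, recognized text, and voiceprint value are taken as already computed inputs.

```python
from dataclasses import dataclass


@dataclass
class WakeVerifier:
    """Hypothetical three-stage check: DSP gate, AP text check, AP voiceprint check."""
    wake_text: str               # text of the registered wake-up word
    voiceprint_value_a: float    # voiceprint threshold obtained at registration
    preset_threshold: float      # largest allowed difference between b and a
    dsp_min_similarity: float    # assumed form of the DSP "certain condition"

    def verify(self, dsp_similarity: float, recognized_text: str,
               voiceprint_value_b: float) -> bool:
        # Stage 1 (DSP): only sufficiently similar voice data reaches the AP.
        if dsp_similarity < self.dsp_min_similarity:
            return False
        # Stage 2 (AP): text verification against the registered wake-up word.
        if recognized_text != self.wake_text:
            return False
        # Stage 3 (AP): voiceprint verification. The absolute difference is an
        # assumption; the description only says "the difference".
        diff = abs(voiceprint_value_b - self.voiceprint_value_a)
        return diff < self.preset_threshold


verifier = WakeVerifier(wake_text="my little k", voiceprint_value_a=0.80,
                        preset_threshold=0.10, dsp_min_similarity=0.60)
matched = verifier.verify(dsp_similarity=0.90, recognized_text="my little k",
                          voiceprint_value_b=0.75)
```

In this sketch, voice data that fails any one stage is rejected without running the later, more expensive stages, which mirrors the division of work between the DSP and the AP.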
To adapt to changes in the user's physical state and/or the noise scene in which the user is located, some mobile phones periodically remind the user to re-register the wake-up word. However, manually registering a wake-up word is cumbersome, and repeated manual registration wastes the user's time and degrades the user experience.
In the embodiments of this application, the terminal may obtain a valid wake-up word during the process of performing voice wake-up, and use that valid wake-up word to update the wake-up word registered by the user. A valid wake-up word in the embodiments of this application may include voice data that successfully wakes up the terminal. Because the terminal automatically obtains valid wake-up words during voice wake-up and uses them to update the registered wake-up word, the cumbersome operations of manually re-registering the wake-up word can be omitted.
The principle of the method, provided in the embodiments of this application, for updating the wake-up voice of the voice assistant by the terminal is as follows. Because a valid wake-up word is voice data that the terminal obtains during the process of performing voice wake-up, the valid wake-up word is voice data related to the user's current physical state and to the noise scene in which the user is currently located. In addition, because the valid wake-up word can successfully wake up the terminal, the degree of matching between the valid wake-up word and the wake-up word registered by the user satisfies the condition for voice wake-up. In summary, if the terminal uses the valid wake-up word to update the wake-up word registered by the user and then performs voice wake-up with the updated wake-up word, the terminal can adapt to changes in the user's physical state and/or the noise scene in which the user is located. This can increase the voice wake-up rate of the terminal and reduce the false wake-up rate when the terminal performs voice wake-up.
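The idea of updating the registered wake-up word with valid wake-up words can be sketched as follows. This is only an illustration under stated assumptions: the class `WakeWordRegistry`, the fixed number of stored samples, and the policy of discarding the oldest sample are hypothetical and are not taken from the embodiments, which describe the update only in general terms at this point.

```python
from collections import deque


class WakeWordRegistry:
    """Hypothetical store of the voice data that represents the registered wake-up word."""

    def __init__(self, registered_samples, max_samples=3):
        # A bounded deque: appending to a full deque drops the oldest sample.
        self.samples = deque(registered_samples, maxlen=max_samples)

    def on_wake_attempt(self, voice_data, woke_terminal):
        """Use voice data as a valid wake-up word only if it woke up the terminal."""
        if not woke_terminal:
            return False              # not a valid wake-up word, ignore it
        self.samples.append(voice_data)
        return True


registry = WakeWordRegistry(["reg1", "reg2", "reg3"])
updated = registry.on_wake_attempt("wake_success_1", woke_terminal=True)
ignored = registry.on_wake_attempt("background_noise", woke_terminal=False)
current = list(registry.samples)
```

Because only voice data that already passed wake-up verification is stored, the stored samples drift toward the user's current voice and noise conditions without any manual re-registration.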
The terminal in the embodiments of this application may be a portable computer (such as a mobile phone), a notebook computer, a personal computer (PC), a wearable electronic device (such as a smart watch), a tablet computer, an augmented reality (AR)/virtual reality (VR) device, an in-vehicle computer, or the like. The following embodiments place no special limitation on the specific form of the terminal.
Refer to FIG. 3, which shows a structural block diagram of a terminal 300 provided in an embodiment of this application. The terminal 300 may include a processor 310, an external memory interface 320, an internal memory 321, a USB interface 330, a charging management module 340, a power management module 341, a battery 342, an antenna 1, an antenna 2, a radio frequency module 350, a communication module 360, an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, a headset interface 370D, a sensor module 380, a button 390, a motor 391, an indicator 392, a camera 393, a display screen 394, a SIM card interface 395, and the like. The sensor module may include a pressure sensor 380A, a gyroscope sensor 380B, a barometric pressure sensor 380C, a magnetic sensor 380D, an acceleration sensor 380E, a distance sensor 380F, a proximity light sensor 380G, a fingerprint sensor 380H, a temperature sensor 380J, a touch sensor 380K, an ambient light sensor 380L, a bone conduction sensor, and the like.
The terminal 300 shown in FIG. 3 is merely one example of a terminal, and the structure illustrated in FIG. 3 does not limit the terminal 300. The terminal may include more or fewer components than shown, combine some components, split some components, or use a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
The processor 310 may include one or more processing units. For example, the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a DSP, a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent devices or may be integrated in the same processor.
In the embodiments of this application, the DSP may monitor voice data in real time. When the similarity between the voice data detected by the DSP and the wake-up word registered in the terminal satisfies a preset condition, the DSP may hand the voice data over to the AP. The AP performs text verification and voiceprint verification on the voice data. When the AP determines that the voice data matches the wake-up word registered by the user, the terminal may start the voice assistant.
The controller may be the decision maker that directs the components of the terminal 300 to work in coordination according to instructions. It is the nerve center and command center of the terminal 300. The controller generates operation control signals based on instruction operation codes and timing signals, and controls instruction fetching and instruction execution.
A memory may also be provided in the processor 310 to store instructions and data. In some embodiments, the memory in the processor is a cache. It can hold instructions or data that the processor has just used or uses repeatedly. If the processor needs the instructions or data again, it can fetch them directly from this memory. This avoids repeated accesses and reduces the processor's waiting time, thereby improving system efficiency.
In some embodiments, the processor 310 may include interfaces. The interfaces may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
The I2C interface is a bidirectional synchronous serial bus that includes a serial data line (SDA) and a serial clock line (SCL). In some embodiments, the processor may include multiple sets of I2C buses. The processor may be coupled to a touch sensor, a charger, a flash, a camera, and the like through different I2C bus interfaces. For example, the processor may be coupled to the touch sensor through an I2C interface, so that the processor and the touch sensor communicate over the I2C bus interface to implement the touch function of the terminal 300.
The I2S interface may be used for audio communication. In some embodiments, the processor may include multiple sets of I2S buses. The processor may be coupled to the audio module through an I2S bus to implement communication between the processor and the audio module. In some embodiments, the audio module may pass audio signals to the communication module through the I2S interface, implementing the function of answering calls through a Bluetooth headset.
The PCM interface may also be used for audio communication; it samples, quantizes, and encodes analog signals. In some embodiments, the audio module and the communication module may be coupled through a PCM bus interface. In some embodiments, the audio module may also pass audio signals to the communication module through the PCM interface, implementing the function of answering calls through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication, and the two interfaces have different sampling rates.
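The three PCM steps named above (sampling, quantizing, and encoding) can be illustrated in software. The following is a minimal sketch and does not describe the PCM bus interface of the terminal 300: the function name `pcm_encode`, the 16-bit little-endian sample format, and the 8000 Hz rate in the usage lines are assumptions chosen for the example.

```python
import math
import struct


def pcm_encode(signal, n_samples, sample_rate_hz):
    """Sample a continuous signal, quantize to 16 bits, and encode as bytes.

    `signal` is a function of time in seconds returning a value in [-1.0, 1.0].
    """
    max_code = 2 ** 15 - 1                       # largest signed 16-bit code
    codes = []
    for i in range(n_samples):
        t = i / sample_rate_hz                   # sampling instant
        x = max(-1.0, min(1.0, signal(t)))       # clip to the valid range
        codes.append(round(x * max_code))        # quantize to an integer code
    # Encode the integer codes as little-endian signed 16-bit values.
    return struct.pack(f"<{n_samples}h", *codes)


def tone(t):
    """A 1 kHz sine wave used as the example analog signal."""
    return math.sin(2 * math.pi * 1000 * t)


pcm_bytes = pcm_encode(tone, n_samples=8, sample_rate_hz=8000)
decoded = struct.unpack("<8h", pcm_bytes)
```

Each sample occupies two bytes here, so eight samples produce sixteen bytes; a different bit depth or sampling rate changes only the constants in the sketch.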
The UART interface is a universal serial data bus used for asynchronous communication. The bus is bidirectional. It converts the data to be transmitted between serial and parallel form. In some embodiments, the UART interface is typically used to connect the processor and the communication module 360. For example, the processor communicates with a Bluetooth module through the UART interface to implement the Bluetooth function. In some embodiments, the audio module may pass audio signals to the communication module through the UART interface, implementing the function of playing music through a Bluetooth headset.
The MIPI interface may be used to connect the processor to peripheral devices such as the display screen and the camera. The MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like. In some embodiments, the processor and the camera communicate through the CSI interface to implement the shooting function of the terminal 300. The processor and the display screen communicate through the DSI interface to implement the display function of the terminal 300.
The GPIO interface can be configured by software. It can be configured to carry control signals or data signals. In some embodiments, the GPIO interface may be used to connect the processor to the camera, the display screen, the communication module, the audio module, sensors, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, or the like.
The USB interface 330 may be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface may be used to connect a charger to charge the terminal 300, or to transfer data between the terminal 300 and a peripheral device. It may also be used to connect a headset and play audio through the headset, or to connect other electronic devices such as an AR device.
The interface connection relationships between the modules illustrated in the embodiments of this application are only schematic and do not limit the structure of the terminal 300. The terminal 300 may use interface connection modes different from those in the embodiments of this application, or a combination of multiple interface connection modes.
The charging management module 340 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module may receive the charging input of a wired charger through the USB interface. In some wireless charging embodiments, the charging management module may receive wireless charging input through a wireless charging coil of the terminal 300. While charging the battery, the charging management module may also supply power to the terminal device through the power management module 341.
The power management module 341 is configured to connect the battery 342, the charging management module 340, and the processor 310. The power management module receives input from the battery and/or the charging management module and supplies power to the processor, the internal memory, the external memory, the display screen, the camera, the communication module, and the like. The power management module may also be used to monitor parameters such as battery capacity, battery cycle count, and battery health (leakage, impedance). In some embodiments, the power management module 341 may also be provided in the processor 310. In some embodiments, the power management module 341 and the charging management module may also be provided in the same device.
The wireless communication function of the terminal 300 may be implemented by the antenna module 1, the antenna module 2, the radio frequency module 350, the communication module 360, the modem, the baseband processor, and the like. The antenna 1 and the antenna 2 are used to transmit and receive electromagnetic wave signals. Each antenna in the terminal 300 may be used to cover one or more communication frequency bands. Different antennas may also be multiplexed to improve antenna utilization. For example, a cellular network antenna may be multiplexed as a wireless local area network diversity antenna. In some embodiments, an antenna may be used in combination with a tuning switch.
The radio frequency module 350 may provide a communication processing module for wireless communication solutions applied on the terminal 300, including 2G/3G/4G/5G. It may include at least one filter, switch, power amplifier, low noise amplifier (LNA), and the like. The radio frequency module receives electromagnetic waves through the antenna 1, filters and amplifies the received electromagnetic waves, and passes them to the modem for demodulation. The radio frequency module may also amplify the signal modulated by the modem and radiate it as electromagnetic waves through the antenna 1. In some embodiments, at least some functional modules of the radio frequency module 350 may be provided in the processor 310. In some embodiments, at least some functional modules of the radio frequency module 350 may be provided in the same device as at least some modules of the processor 310.
The modem may include a modulator and a demodulator. The modulator is used to modulate the low-frequency baseband signal to be transmitted into a medium- or high-frequency signal. The demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low-frequency baseband signal to the baseband processor for processing. After being processed by the baseband processor, the low-frequency baseband signal is passed to the application processor. The application processor outputs sound signals through an audio device (not limited to the speaker, the receiver, and the like), or displays images or video through the display screen. In some embodiments, the modem may be an independent device. In some embodiments, the modem may be independent of the processor and provided in the same device as the radio frequency module or other functional modules.
The communication module 360 may provide a communication processing module for wireless communication solutions applied on the terminal 300, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technology. The communication module 360 may be one or more devices that integrate at least one communication processing module. The communication module receives electromagnetic waves through the antenna 2, performs frequency modulation and filtering on the electromagnetic wave signal, and sends the processed signal to the processor. The communication module 360 may also receive a signal to be transmitted from the processor, perform frequency modulation and amplification on it, and radiate it as electromagnetic waves through the antenna 2.
In some embodiments, the antenna 1 of the terminal 300 is coupled to the radio frequency module, and the antenna 2 is coupled to the communication module 360, so that the terminal 300 can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time division-synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The terminal 300 implements the display function through the GPU, the display screen 394, the application processor, and the like. The GPU is a microprocessor for image processing and connects the display screen and the application processor. The GPU performs mathematical and geometric calculations for graphics rendering. The processor 310 may include one or more GPUs that execute program instructions to generate or change display information.
The display screen 394 is used to display images, video, and the like. The display screen includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), Mini-LED, Micro-LED, Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the terminal 300 may include 1 or N display screens, where N is a positive integer greater than 1.
The terminal 300 may implement the shooting function through the ISP, the camera 393, the video codec, the GPU, the display screen, the application processor, and the like.
The ISP is used to process the data fed back by the camera. For example, when a photo is taken, the shutter opens, light passes through the lens onto the photosensitive element of the camera, the optical signal is converted into an electrical signal, and the photosensitive element passes the electrical signal to the ISP, which processes it into an image visible to the naked eye. The ISP may also algorithmically optimize the noise, brightness, and skin tone of the image, and may optimize parameters such as the exposure and color temperature of the shooting scene. In some embodiments, the ISP may be provided in the camera 393.
The camera 393 is used to capture still images or video. An object is projected through the lens as an optical image onto the photosensitive element. The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The photosensitive element converts the optical signal into an electrical signal and passes the electrical signal to the ISP, which converts it into a digital image signal. The ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV. In some embodiments, the terminal 300 may include 1 or N cameras, where N is a positive integer greater than 1.
The digital signal processor is used to process digital signals. In addition to digital image signals, it can process other digital signals. For example, when the terminal 300 selects a frequency point, the digital signal processor is used to perform a Fourier transform or the like on the energy of the frequency point.
The video codec is used to compress or decompress digital video. The terminal 300 may support one or more codecs, so that it can play or record video in multiple encoding formats, for example MPEG1, MPEG2, MPEG3, and MPEG4.
The NPU is a neural-network (NN) computing processor. By drawing on the structure of biological neural networks, for example the way signals are passed between neurons in the human brain, it processes input information quickly and can also learn continuously. Applications such as intelligent cognition of the terminal 300 can be implemented through the NPU, for example image recognition, face recognition, speech recognition, and text understanding.
The external memory interface 320 may be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the terminal 300. The external memory card communicates with the processor through the external memory interface to implement the data storage function, for example saving files such as music and video on the external memory card.
The internal memory 321 may be used to store computer-executable program code, which includes instructions. The processor 310 executes the various functional applications and data processing of the terminal 300 by running the instructions stored in the internal memory 321. The internal memory 321 may include a program storage area and a data storage area. The program storage area may store the operating system and the application programs required by at least one function (such as a sound playback function or an image playback function). The data storage area may store data created during use of the terminal 300 (such as audio data and the phone book); such data may be referred to as user data. In addition, the internal memory 321 may include high-speed random access memory (RAM) and read-only memory (ROM), and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, other volatile solid-state storage devices, universal flash storage (UFS), and the like.
The internal memory 321 includes the data partition described in the embodiments of this application. The data partition stores the files or data that need to be read and written when the operating system starts, as well as the user data created during use of the terminal. The data partition may be a preset storage area in the internal memory 321. For example, the data partition may be contained in the RAM of the internal memory 321.
The virtual data partition in the embodiments of this application may be a storage area of the RAM in the internal memory 321. Alternatively, the virtual data partition may be a storage area of the ROM in the internal memory 321. Alternatively, the virtual data partition may be an external memory card connected to the external memory interface 320, such as a Micro SD card.
The terminal 300 may implement audio functions, such as music playback and recording, through the audio module 370, the speaker 370A, the receiver 370B, the microphone 370C, the headset interface 370D, the application processor, and the like.
The audio module is used to convert digital audio information into an analog audio signal for output, and to convert analog audio input into a digital audio signal. The audio module may also be used to encode and decode audio signals. In some embodiments, the audio module may be provided in the processor 310, or some functional modules of the audio module may be provided in the processor 310.
The speaker 370A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal. The user can listen to music or to a hands-free call through the speaker of the terminal 300.
The receiver 370B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal. When the terminal 300 answers a call or plays a voice message, the user can listen by holding the receiver close to the ear.
麦克风370C,也称“话筒”,“传声器”,用于将声音信号转换为电信号。当拨打电话或发送语音信息时,用户可以通过人嘴靠近麦克风发声,将声音信号输入到麦克风。终端300可以设置至少一个麦克风。在一些实施例中,终端300可以设置两个麦克风,除了采集声音信号,还可以实现降噪功能。在一些实施例中,终端300还可以设置三个,四个或更多麦克风,实现采集声音信号,降噪,还可以识别声音来源,实现定向录音功能等。Microphone 370C, also called "microphone", "microphone", is used to convert sound signals into electrical signals. When making a call or sending a voice message, the user can make a sound through the mouth close to the microphone, and input the sound signal into the microphone. The terminal 300 may be provided with at least one microphone. In some embodiments, the terminal 300 may be provided with two microphones, and in addition to collecting sound signals, it may also implement a noise reduction function. In some embodiments, the terminal 300 may further be provided with three, four, or more microphones to collect sound signals, reduce noise, and also identify sound sources, and implement a directional recording function.
耳机接口370D用于连接有线耳机。耳机接口可以是USB接口,也可以是3.5mm的开放移动终端平台(Open Mobile Terminal Platform,OMTP)标准接口,美国蜂窝电信工业协会(Cellular Telecommunications Industry Association of the USA,CTIA)标准接口。The headset interface 370D is used to connect a wired headset. The earphone interface can be a USB interface or a 3.5mm Open Mobile Terminal Platform (OMTP) standard interface, and the Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The pressure sensor 380A is configured to sense a pressure signal and can convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor may be disposed on the display screen. There are many types of pressure sensors, such as resistive, inductive, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force acts on the pressure sensor, the capacitance between the electrodes changes, and the terminal 300 determines the intensity of the pressure from the change in capacitance. When a touch operation acts on the display screen, the terminal 300 detects the intensity of the touch operation by means of the pressure sensor. The terminal 300 may also calculate the touch position from the detection signal of the pressure sensor. In some embodiments, touch operations that act on the same touch position but have different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction to view the short message is executed. When a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction to create a new short message is executed.
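The threshold rule in the short message example can be sketched as follows. This is only an illustration: the function name, the instruction labels, and the numeric threshold are hypothetical and do not appear in the application.

```python
# Hypothetical value in arbitrary units; the application gives no number.
FIRST_PRESSURE_THRESHOLD = 0.5


def instruction_for_touch(intensity: float,
                          threshold: float = FIRST_PRESSURE_THRESHOLD) -> str:
    """Map the intensity of a touch on the short message icon to an instruction."""
    if intensity < threshold:
        # Intensity below the first pressure threshold: view the short message.
        return "view_short_message"
    # Intensity greater than or equal to the threshold: create a new short message.
    return "create_short_message"
```

Note that a touch exactly at the threshold falls into the "create" branch, matching the "greater than or equal to" wording of the paragraph.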
The gyroscope sensor 380B may be used to determine the motion posture of the terminal 300. In some embodiments, the angular velocity of the terminal 300 around three axes (that is, the x, y, and z axes) may be determined by the gyroscope sensor. The gyroscope sensor may be used for image stabilization during shooting. For example, when the shutter is pressed, the gyroscope sensor detects the angle by which the terminal 300 shakes, and the distance that the lens module needs to compensate is calculated from that angle, so that the lens cancels the shake of the terminal 300 through reverse motion. The gyroscope sensor may also be used in navigation and motion-sensing game scenarios.
The barometric pressure sensor 380C is configured to measure air pressure. In some embodiments, the terminal 300 calculates the altitude from the air pressure value measured by the barometric pressure sensor to assist positioning and navigation.
The magnetic sensor 380D includes a Hall sensor. The terminal 300 may use the magnetic sensor to detect the opening and closing of a flip leather case. In some embodiments, when the terminal 300 is a flip phone, the terminal 300 may detect the opening and closing of the flip cover by means of the magnetic sensor, and then set features such as automatic unlocking upon flipping open according to the detected open or closed state of the leather case or of the flip cover.
The acceleration sensor 380E can detect the magnitude of the acceleration of the terminal 300 in various directions (generally along three axes). When the terminal 300 is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor may also be used to identify the posture of the terminal, and is applied in landscape/portrait switching, pedometers, and similar applications.
The distance sensor 380F is configured to measure distance. The terminal 300 may measure distance by infrared or laser. In some embodiments, in a shooting scenario, the terminal 300 may use the distance sensor to measure distance to achieve fast focusing.
The proximity light sensor 380G may include, for example, a light-emitting diode (LED) and a light detector such as a photodiode. The light-emitting diode may be an infrared light-emitting diode. Infrared light is emitted outward through the light-emitting diode, and the photodiode detects infrared light reflected from nearby objects. When sufficient reflected light is detected, it can be determined that there is an object near the terminal 300. When insufficient reflected light is detected, it can be determined that there is no object near the terminal 300. The terminal 300 may use the proximity light sensor to detect that the user is holding the terminal 300 close to the ear during a call, so as to turn off the screen automatically and save power. The proximity light sensor may also be used for automatic unlocking and screen locking in leather case mode and pocket mode.
The ambient light sensor 380L is configured to sense the brightness of ambient light. The terminal 300 may adaptively adjust the brightness of the display screen according to the sensed ambient light brightness. The ambient light sensor may also be used to automatically adjust the white balance when taking pictures. The ambient light sensor may further cooperate with the proximity light sensor to detect whether the terminal 300 is in a pocket, so as to prevent accidental touches.
The fingerprint sensor 380H is configured to collect fingerprints. The terminal 300 may use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint-triggered photographing, fingerprint-based call answering, and the like.
The temperature sensor 380J is configured to detect temperature. In some embodiments, the terminal 300 executes a temperature processing policy based on the temperature detected by the temperature sensor. For example, when the temperature reported by the temperature sensor exceeds a threshold, the terminal 300 reduces the performance of a processor located near the temperature sensor, so as to reduce power consumption and implement thermal protection.
The touch sensor 380K is also called a "touch panel". It may be disposed on the display screen and is configured to detect a touch operation acting on or near it. The detected touch operation may be passed to the application processor to determine the type of the touch event, and a corresponding visual output is provided through the display screen.
The bone conduction sensor 380M can acquire vibration signals. In some embodiments, the bone conduction sensor may acquire the vibration signal of the bone mass that vibrates when the human vocal part produces sound. The bone conduction sensor may also be in contact with the human pulse to receive a blood pressure beating signal. In some embodiments, the bone conduction sensor may also be disposed in a headset. The audio module 370 may parse out a voice signal based on the vibration signal of the vocal-part vibrating bone mass acquired by the bone conduction sensor, to implement a voice function. The application processor may parse heart rate information based on the blood pressure beating signal acquired by the bone conduction sensor, to implement a heart rate detection function.
The keys 390 include a power key, volume keys, and the like. A key may be a mechanical key or a touch key. The terminal 300 receives key input and generates key signal input related to user settings and function control of the terminal 300.
The motor 391 can generate vibration alerts. The motor may be used for incoming call vibration alerts and for touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) may correspond to different vibration feedback effects. Touch operations acting on different areas of the display screen may also correspond to different vibration feedback effects. Different application scenarios (such as time reminders, received messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also be customizable.
The indicator 392 may be an indicator light, which may be used to indicate the charging status and battery level changes, and may also be used to indicate messages, missed calls, notifications, and the like.
The SIM card interface 395 is used to connect a subscriber identity module (SIM) card. A SIM card can be brought into contact with or separated from the terminal 300 by being inserted into or removed from the SIM card interface. The terminal 300 may support one or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface may support Nano SIM cards, Micro SIM cards, SIM cards, and the like. Multiple cards may be inserted into the same SIM card interface at the same time, and the types of these cards may be the same or different. The SIM card interface may also be compatible with different types of SIM cards and with external memory cards. The terminal 300 interacts with the network through the SIM card to implement functions such as calls and data communication. In some embodiments, the terminal 300 uses an eSIM, that is, an embedded SIM card. The eSIM card may be embedded in the terminal 300 and cannot be separated from the terminal 300.
The method for updating the wake-up voice of a voice assistant by a terminal provided in the embodiments of this application may be implemented in the terminal 300 described above.
An embodiment of this application provides a method for updating the wake-up voice of a voice assistant by a terminal. The terminal 300 may receive first voice data input by a user and determine whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal 300. If the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal 300 performs identity authentication on the user. If the identity authentication succeeds, the terminal 300 uses the first voice data to update the first voiceprint model in the terminal.
The first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up, and it represents the voiceprint characteristics of the preset wake-up word in the terminal.
In an embodiment of this application, the terminal performs identity authentication on the user as follows: the terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data passes the voiceprint verification, the identity authentication succeeds.
In the embodiments of this application, if the text corresponding to the first voice data matches the text of the preset wake-up word and the user identity authentication succeeds, the first voice data is a wake-up voice, uttered by an authenticated user, that can wake up the voice assistant. In addition, because the first voice data is voice data of the user acquired by the terminal 300 in real time, the first voice data can reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located. In summary, updating the voiceprint model of the terminal 300 with the first voice data can increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
Further, the first voice data is acquired automatically by the terminal 300 while it performs voice wake-up, rather than being input by the user after the terminal prompts the user to manually re-register the wake-up word. Updating the voiceprint model with the first voice data therefore also simplifies the wake-up word update procedure.
An embodiment of this application provides a method for updating the wake-up voice of a voice assistant by a terminal. As shown in FIG. 4A, the method may include S401 to S405:
S401. The terminal 300 receives first voice data.
S402. The terminal 300 determines whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal.
After the DSP of the terminal 300 detects the first voice data, it may notify the AP of the terminal 300 to perform text verification and voiceprint verification on the first voice data. The AP may perform text verification on the first voice data by determining whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal. If the text corresponding to the first voice data matches (for example, is identical to) the text of the preset wake-up word registered in the terminal, the AP may go on to perform voiceprint verification on the first voice data, that is, the terminal 300 continues with S403. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data, that is, the terminal 300 proceeds to S405.
S403. The terminal 300 uses the first voiceprint model to perform voiceprint verification on the first voice data.
The first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up. The first voiceprint model represents the voiceprint characteristics of the wake-up word registered in the terminal 300.
As can be seen from the "terminal registers a wake-up word" procedure described in the embodiments of this application, the terminal 300 records voice data (referred to as registration voice data) when registering the preset wake-up word. The preset wake-up word registered in the terminal 300 may include this registration voice data. The first voiceprint model is generated from the registration voice data. In line with the foregoing description, after generating the first voiceprint model, the terminal 300 may feed the registration voice data into the first voiceprint model as an input value to obtain a first voiceprint threshold.
The method by which the terminal 300 performs voiceprint verification on the first voice data using the first voiceprint model may include the following: after determining that the first voice data passes text verification, the terminal 300 may feed the first voice data into the first voiceprint model as an input value to obtain a voiceprint value. The terminal 300 then determines whether the difference between this voiceprint value and the first voiceprint threshold is less than a preset threshold. If the difference is less than the preset threshold, voiceprint verification passes. If the difference is greater than or equal to the preset threshold, voiceprint verification fails.
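The comparison described above can be sketched as follows. This is a minimal illustration under stated assumptions: the voiceprint model is treated as a plain callable that maps audio bytes to a single number, and the "difference" is taken as an absolute difference. Neither detail is specified in the application, and all names are hypothetical.

```python
from typing import Callable


def passes_voiceprint_check(
    first_voice_data: bytes,
    first_voiceprint_model: Callable[[bytes], float],  # assumed: audio -> score
    first_voiceprint_threshold: float,  # score of the registration voice data
    preset_threshold: float,
) -> bool:
    """Return True when the voiceprint value is close enough to the threshold."""
    # Feed the first voice data into the model to obtain a voiceprint value.
    voiceprint_value = first_voiceprint_model(first_voice_data)
    # Assumption: "difference" is interpreted as an absolute difference.
    difference = abs(voiceprint_value - first_voiceprint_threshold)
    # Passes only when the difference is strictly less than the preset threshold.
    return difference < preset_threshold
```

A difference equal to the preset threshold fails the check, consistent with the "greater than or equal to" case in the paragraph.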
If the first voice data passes voiceprint verification, the terminal 300 may use the first voice data to update the first voiceprint model in the terminal 300, that is, the terminal 300 proceeds to S404. If the first voice data fails voiceprint verification, the terminal 300 may delete the first voice data, that is, the terminal 300 proceeds to S405.
S404. The terminal 300 uses the first voice data to update the first voiceprint model in the terminal 300.
The method by which the terminal 300 uses the first voice data to update the first voiceprint model (that is, S404) may include: the terminal 300 generates a second voiceprint model from the first voice data and replaces the first voiceprint model with the second voiceprint model. For the method by which the terminal 300 generates the second voiceprint model from the first voice data, reference may be made to conventional methods by which a terminal generates a voiceprint model. Details are not repeated here.
S405. The terminal 300 deletes the first voice data.
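The control flow of S401 to S405 can be sketched as follows. This is a hedged illustration, not the implementation described in the application: speech-to-text, voiceprint verification, and model generation are passed in as hypothetical callables, text matching is simplified to string equality, and the returned label only records which step ended the flow.

```python
from typing import Callable, Tuple

Model = Callable[[bytes], float]  # assumed shape of a voiceprint model


def process_first_voice_data(
    first_voice_data: bytes,                    # S401: received voice data
    preset_wake_word: str,
    to_text: Callable[[bytes], str],            # hypothetical speech-to-text
    verify_voiceprint: Callable[[bytes], bool], # hypothetical S403 check
    build_model: Callable[[bytes], Model],      # hypothetical model generator
    first_model: Model,
) -> Tuple[str, Model]:
    """Return the step that ended the flow and the model in effect afterwards."""
    # S402: text verification against the registered preset wake-up word.
    if to_text(first_voice_data) != preset_wake_word:
        return "S405", first_model  # voice data discarded, model unchanged
    # S403: voiceprint verification with the first voiceprint model.
    if not verify_voiceprint(first_voice_data):
        return "S405", first_model  # voice data discarded, model unchanged
    # S404: generate a second model and use it in place of the first one.
    second_model = build_model(first_voice_data)
    return "S404", second_model
```

The text does not say whether the first voiceprint threshold is recomputed after the replacement, so the sketch leaves that question open.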
An embodiment of this application provides a method for updating the wake-up voice of a voice assistant by a terminal. The terminal 300 may obtain first voice data that passes both text verification and voiceprint verification when the terminal 300 performs voice wake-up, and then use the first voice data to update the first voiceprint model in the terminal 300. Because the first voice data is voice data of the user acquired by the terminal 300 in real time, it can reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located. In addition, because the first voice data has passed text verification and voiceprint verification, updating the voiceprint model of the terminal 300 with the first voice data can increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
Further, the first voice data is acquired automatically by the terminal 300 while it performs voice wake-up, rather than being input by the user after the terminal prompts the user to manually re-register the wake-up word. Updating the voiceprint model with the first voice data therefore also simplifies the wake-up word update procedure.
If the first voice data passes voiceprint verification, the terminal 300 may start the voice assistant. In some cases, the user may say the preset wake-up word of the terminal 300 (that is, the voice data) during a conversation with another person. In this case, the user's real purpose in saying the preset wake-up word is not to start the voice assistant, and after the voice assistant of the terminal 300 is started, the user does not use voice to trigger the terminal 300 to perform any function. In the embodiments of this application, this type of voice wake-up is referred to as an invalid voice wake-up. That is, after the voice assistant is started, the terminal 300 does not receive a valid voice command through the voice assistant. Accordingly, the terminal 300 may decide whether to use the first voice data to update the first voiceprint model in the terminal 300 by determining whether it receives a valid voice command through the voice assistant after the voice assistant is started. Specifically, an embodiment of this application provides a method for updating the wake-up voice of a voice assistant by a terminal. As shown in FIG. 5A, the method may include S401 to S403, S501 to S503, S404, and S405:
After S403, if the first voice data passes voiceprint verification, the terminal 300 may continue with S501 to S503. If the first voice data fails voiceprint verification, the terminal 300 may proceed to S405.
S501. The terminal 300 starts the voice assistant.
S502. The terminal 300 receives second voice data through the voice assistant.
After the voice assistant is started, it may receive second voice data input by the user and trigger the terminal 300 to perform the function corresponding to the second voice data.
Take the case in which the terminal is the mobile phone 400 shown in FIG. 4B. After the mobile phone 400 starts the voice assistant, it may display the "Voice Assistant" interface 401 shown in FIG. 4B. The "Voice Assistant" interface 401 includes a "Record" button 403 and a "Settings" option 404. In response to the user's tap operation (such as a long-press operation) on the "Record" button 403, the mobile phone 400 may receive a voice command issued by the user and trigger the mobile phone 400 to execute the event corresponding to the voice command. The "Settings" option 404 is used to set the functions and parameters of the "Voice Assistant" application. The mobile phone 400 may receive the user's tap operation on the "Settings" option 306 in the voice control interface 303. In response to the user's tap operation on the "Settings" option 404, the mobile phone 400 may display the voice control interface 106 shown in (c) of FIG. 1. Optionally, the "Voice Assistant" interface 401 may further include prompt information 402, which indicates common functions of the "Voice Assistant" application to the user.
It should be noted that the "Voice Assistant" interface 401 may not include the "Record" button 403. That is, when the mobile phone 400 displays the "Voice Assistant" interface, the mobile phone 400 can record a voice command issued by the user without the user tapping any button (such as the "Record" button 403) in that interface. The "Voice Assistant" interface of the terminal 300 includes, but is not limited to, the "Voice Assistant" interface 401 shown in FIG. 4B.
S503. The terminal 300 determines whether the second voice data is a valid voice command.
In the embodiments of this application, a valid voice command is an instruction that can trigger the terminal 300 to perform a corresponding function.
It can be understood that if the user says the preset wake-up word of the terminal 300 on purpose, that is, the user's real purpose is to wake up the voice assistant of the terminal 300, then after the voice assistant is started the user will generally use voice to trigger the terminal 300 to perform a corresponding function. In other words, if the terminal 300 receives, through the voice assistant after it is started, an instruction for triggering the terminal 300 to perform a corresponding function (that is, a valid voice command), the terminal will perform that function in response to the valid voice command, and it can be determined that this voice wake-up matches the user's intention. In the embodiments of this application, such a voice wake-up is referred to as a valid voice wake-up.
To safeguard the voice wake-up rate of the terminal 300 after its first voiceprint model is updated with the first voice data, in the embodiments of this application the terminal updates the wake-up word of the terminal 300 only with voice data corresponding to a valid voice wake-up. Specifically, if the voice assistant of the terminal 300 receives a valid voice command after being started, that is, the second voice data is a valid voice command, the user's use of the first voice data to wake up the voice assistant is a valid voice wake-up, and the terminal 300 may perform S404. If the voice assistant of the terminal 300 does not receive a valid voice command after being started (for example, no second voice data is received, or the second voice data is not a valid voice command), the wake-up is an invalid voice wake-up, and the terminal 300 may delete the first voice data, that is, perform S405.
In the embodiments of this application, the terminal 300 uses the first voice data to update the first voiceprint model in the terminal 300 only when, after the voice assistant of the terminal 300 is started, it receives a valid voice command for triggering the terminal 300 to perform a corresponding function. Receiving a valid voice command after the voice assistant is started indicates that the voice wake-up is a valid voice wake-up that matches the user's intention. Updating the voiceprint model of the terminal 300 with voice data that reflects the user's real intention and can successfully wake up the terminal 300 can further increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
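The decision made in S503 can be sketched as follows. This is a simplified illustration: a valid voice command is modeled as membership in a set of recognized command strings, which is an assumption made for the example and not a definition given in the application. The function and parameter names are hypothetical.

```python
from typing import Optional, Set


def is_valid_voice_command(second_voice_text: Optional[str],
                           supported_commands: Set[str]) -> bool:
    """Assumed rule: a command is valid if the terminal can map it to a function."""
    if second_voice_text is None:
        # Nothing was received after the voice assistant started.
        return False
    return second_voice_text.strip().lower() in supported_commands


def step_after_s503(second_voice_text: Optional[str],
                    supported_commands: Set[str]) -> str:
    """Valid wake-up: update the model (S404). Otherwise delete the data (S405)."""
    if is_valid_voice_command(second_voice_text, supported_commands):
        return "S404"
    return "S405"
```

For instance, with `supported_commands = {"play music"}`, the input `"Play music "` leads to S404, while `None` or an unrelated phrase leads to S405.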
It can be understood that if the signal quality of the first voice data is poor, then after the terminal 300 updates the first voiceprint model with the first voice data, performing voice wake-up with the updated voiceprint model will reduce the success rate of voice wake-up.
To prevent the terminal 300 from updating the first voiceprint model with voice data of poor signal quality, the terminal 300 may, before updating the first voiceprint model with the first voice data, first determine whether the signal quality parameter of the first voice data is higher than a second preset threshold. The signal quality parameter of voice data represents the level of signal quality of the voice data. For example, the signal quality parameter may be the signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, the signal quality of the first voice data is relatively high, and the terminal 300 may update the first voiceprint model with the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.
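The quality gate can be sketched as follows, taking the signal-to-noise ratio as the signal quality parameter. This is only an illustration: the application does not say how the noise is estimated, so the sketch assumes that separate speech and noise sample sequences are available, and the 15 dB default for the second preset threshold is a hypothetical value.

```python
import math
from typing import Sequence


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """Signal-to-noise ratio in decibels from mean sample power."""
    speech_power = sum(x * x for x in speech) / len(speech)
    noise_power = sum(x * x for x in noise) / len(noise)
    if noise_power == 0.0:
        return math.inf   # no measurable noise
    if speech_power == 0.0:
        return -math.inf  # no measurable speech
    return 10.0 * math.log10(speech_power / noise_power)


def quality_allows_update(speech: Sequence[float],
                          noise: Sequence[float],
                          second_preset_threshold_db: float = 15.0) -> bool:
    """Update only when the SNR is strictly higher than the threshold."""
    return snr_db(speech, noise) > second_preset_threshold_db
```

An SNR equal to the threshold does not allow the update, matching the "lower than or equal to" case in which the first voice data is deleted.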
Optionally, in this embodiment of the present application, the user may decide whether to use the first voice data to update the first voiceprint model in the terminal 300. Specifically, before updating the first voiceprint model with the first voice data, the terminal 300 may display a first interface that prompts the user whether to update the voiceprint model. The terminal 300 then determines whether to update the voiceprint model according to the user's selection on the first interface. For example, assume that the terminal 300 is the mobile phone 500 shown in FIG. 5B. Before updating the first voiceprint model in the mobile phone 500 with the first voice data, the mobile phone 500 may display the first interface 501 shown in FIG. 5B. The first interface 501 prompts the user whether to update the voiceprint model (that is, the wake-up word). For example, the first interface 501 includes first prompt information, such as "During voice wake-up, the mobile phone obtained voice data that can be used to update the wake-up word" and "Update the wake-up word?". The first interface 501 further includes an "Update" option for triggering the mobile phone 500 to update the voiceprint model and a "Cancel" option for triggering the mobile phone 500 not to update the voiceprint model.
In this embodiment of the present application, before updating the voiceprint model, the terminal 300 displays the first interface that prompts the user whether to update the voiceprint model. In this way, the user decides whether the voiceprint model is updated. That is, the terminal 300 determines whether to update the voiceprint model according to the user's needs, which improves the interaction between the terminal 300 and the user and improves user experience.
As can be seen from the "terminal registers a wake-up word" process described in the embodiments of the present application, the terminal 300 records one or more pieces of voice data (referred to as registered voice data) when registering the preset wake-up word. The first voiceprint model is generated from the one or more pieces of registered voice data. Assume that the first voiceprint model is generated from at least two pieces of registered voice data. After the terminal 300 generates a new voiceprint model from the first voice data, directly replacing the first voiceprint model with the new voiceprint model can improve the voice wake-up rate of the terminal 300. However, directly replacing the first voiceprint model with a voiceprint model generated from new voice data (that is, the first voice data) raises the voice wake-up rate sharply, and a sharp rise in the voice wake-up rate may correspondingly raise the false wake-up rate of the terminal 300. To steadily improve the voice wake-up rate of the terminal 300 while reducing its false wake-up rate, the method in which the terminal 300 updates the first voiceprint model with the first voice data (that is, S404) may include S601-S603. For example, as shown in FIG. 6, S404 shown in FIG. 5A may include S601-S603:
S601. The terminal 300 replaces third voice data in the at least two pieces of registered voice data with the first voice data, to obtain updated at least two pieces of registered voice data.
S602. The terminal 300 generates a second voiceprint model according to the updated at least two pieces of registered voice data.
S603. The terminal 300 replaces the first voiceprint model with the second voiceprint model.
If the terminal 300 receives a valid voice command after its voice assistant is started, the terminal 300 may determine the third voice data from the at least two pieces of registered voice data saved in the terminal 300.
In some embodiments, the third voice data is the voice data, among the at least two pieces of registered voice data, whose signal quality parameter is lower than the signal quality parameters of the other voice data. The terminal 300 replaces this third voice data with the first voice data, and then generates the second voiceprint model according to the updated at least two pieces of registered voice data. Because the replaced voice data has a lower signal quality parameter than the other registered voice data, the voice data that remains (that is, the updated at least two pieces of registered voice data) has higher signal quality parameters. A second voiceprint model that the terminal 300 generates from voice data with higher signal quality parameters can characterize the user's voiceprint features more accurately and clearly. By using the second voiceprint model for voice wake-up, the terminal 300 can increase the voice wake-up rate and reduce the false wake-up rate.
In other embodiments, the third voice data may be the voice data, among the at least two pieces of registered voice data, that was saved by the terminal earliest. Compared with the other registered voice data, the earliest saved voice data (that is, the third voice data) matches the user's current physical state and the real-time conditions of the noise scene in which the user is currently located less closely. Therefore, replacing the third voice data with the first voice data improves how closely the voice data that remains (that is, the updated at least two pieces of registered voice data) matches the user's current physical state and the real-time conditions of the current noise scene. A second voiceprint model that the terminal 300 generates from this better-matching voice data can more accurately and clearly characterize the user's voiceprint features in the user's current physical state and current noise scene. By using the second voiceprint model for voice wake-up, the terminal 300 can increase the voice wake-up rate and reduce the false wake-up rate.
In this embodiment of the present application, the terminal 300 uses the first voice data to replace only part of the at least two pieces of registered voice data, such as the third voice data, instead of generating the second voiceprint model entirely from the first voice data. In this way, the voice wake-up rate of the terminal 300 improves relatively steadily, and the false wake-up rate of the terminal 300 is reduced at the same time.
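The replacement in S601 and the two ways of choosing the third voice data can be sketched as follows. This is an illustrative sketch only: the `Utterance` record, its `quality` and `saved_at` fields, and the function names are assumptions, and generating the second voiceprint model (S602) is deliberately left out because the description does not specify it.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass(frozen=True)
class Utterance:
    name: str
    quality: float   # signal quality parameter, e.g. signal-to-noise ratio
    saved_at: float  # time at which the terminal saved this voice data


def pick_lowest_quality(registered: List[Utterance]) -> int:
    """Index of the registered voice data with the lowest quality parameter."""
    return min(range(len(registered)), key=lambda i: registered[i].quality)


def pick_earliest_saved(registered: List[Utterance]) -> int:
    """Index of the registered voice data that was saved earliest."""
    return min(range(len(registered)), key=lambda i: registered[i].saved_at)


def replace_third_voice_data(
    registered: List[Utterance],
    first: Utterance,
    pick: Callable[[List[Utterance]], int],
) -> List[Utterance]:
    """S601: replace the chosen third voice data with the first voice data."""
    updated = list(registered)  # leave the saved list untouched
    updated[pick(registered)] = first
    return updated


# Hypothetical registered voice data and a newly captured first voice data.
registered = [
    Utterance("a", quality=18.0, saved_at=100.0),
    Utterance("b", quality=12.0, saved_at=300.0),
    Utterance("c", quality=20.0, saved_at=200.0),
]
first = Utterance("new", quality=25.0, saved_at=400.0)
by_quality = replace_third_voice_data(registered, first, pick_lowest_quality)
by_age = replace_third_voice_data(registered, first, pick_earliest_saved)
```

With these example values, the quality-based choice replaces "b" and the age-based choice replaces "a"; in both cases the other registered voice data is kept, which is what distinguishes this update from regenerating the model from the first voice data alone.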
It can be understood that if the second voiceprint threshold that the terminal generates according to the second voiceprint model differs greatly from the first voiceprint threshold, the wake-up rate of the terminal 300 fluctuates greatly, which affects user experience. Based on this, as shown in FIG. 7, after S602 and before S603 shown in FIG. 6, the method in this embodiment of the present application may further include S701-S702:
S701. The terminal 300 generates a second voiceprint threshold according to the second voiceprint model and the updated at least two pieces of registered voice data.
The second voiceprint model is equivalent to a function. For example, the terminal 300 may use each of the updated at least two pieces of registered voice data as an input value of the second voiceprint model, to obtain at least two voiceprint thresholds. The terminal 300 may then calculate the average of the at least two voiceprint thresholds to obtain the second voiceprint threshold. For example, assume that the updated at least two pieces of registered voice data include registered voice data a and registered voice data b. The terminal 300 may input registered voice data a into the second voiceprint model to obtain voiceprint threshold A, input registered voice data b into the second voiceprint model to obtain voiceprint threshold B, and calculate the average of voiceprint threshold A and voiceprint threshold B to obtain the second voiceprint threshold.
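The averaging in S701 can be sketched as follows, treating the second voiceprint model as a plain function from voice data to a per-utterance threshold. The function name `second_voiceprint_threshold`, the representation of voice data as a list of floats, and the toy model are all assumptions made for illustration; a real voiceprint model would be far more involved.

```python
from statistics import fmean
from typing import Callable, Sequence

VoiceData = Sequence[float]  # stand-in for recorded voice data (assumption)


def second_voiceprint_threshold(
    model: Callable[[VoiceData], float],
    updated_registered: Sequence[VoiceData],
) -> float:
    """Feed each registered voice data to the model and average the results."""
    return fmean(model(voice_data) for voice_data in updated_registered)


def toy_model(voice_data: VoiceData) -> float:
    """Toy stand-in for the second voiceprint model: the sum of the samples."""
    return sum(voice_data)


data_a = [1.0, 2.0]  # yields voiceprint threshold A = 3.0
data_b = [3.0, 4.0]  # yields voiceprint threshold B = 7.0
threshold = second_voiceprint_threshold(toy_model, [data_a, data_b])
```

Here the second voiceprint threshold is the mean of A and B, that is, 5.0.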
S702. The terminal 300 determines whether the difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold.
Specifically, if the difference between the second voiceprint threshold and the first voiceprint threshold is less than the first preset threshold, the second voiceprint threshold has changed little relative to the first voiceprint threshold. In this case, using the second voiceprint model for voiceprint verification does not significantly affect the wake-up rate of the terminal 300. The terminal 300 may then perform S603.
If the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, the second voiceprint threshold has changed considerably relative to the first voiceprint threshold. In this case, using the second voiceprint model for voiceprint verification significantly affects the wake-up rate of the terminal 300. As shown in FIG. 7, the terminal 300 may then perform S703:
S703. The terminal 300 deletes the second voiceprint model and the first voice data.
It can be understood that, when the second voiceprint threshold has changed considerably relative to the first voiceprint threshold, the terminal 300 deletes the second voiceprint model and the first voice data; that is, it does not replace the first voiceprint model with the second voiceprint model. This prevents a large difference between the second voiceprint threshold and the first voiceprint threshold from causing the wake-up rate of the terminal 300 to fluctuate greatly and affecting user experience.
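The decision made in S702 between S603 and S703 can be sketched as follows. The function name and the example values are hypothetical, and taking the absolute value of the difference is an assumption: the description speaks only of "the difference" between the two thresholds.

```python
def step_after_s702(
    first_threshold: float,
    second_threshold: float,
    first_preset_threshold: float,
) -> str:
    """Return the step the terminal performs after the comparison in S702."""
    # Assumption: "difference" is read as the magnitude of the change.
    difference = abs(second_threshold - first_threshold)
    if difference < first_preset_threshold:
        return "S603"  # replace the first voiceprint model with the second
    return "S703"      # delete the second voiceprint model and first voice data


small_change = step_after_s702(0.70, 0.72, first_preset_threshold=0.05)
large_change = step_after_s702(0.70, 0.80, first_preset_threshold=0.05)
```

A difference exactly equal to the first preset threshold falls on the S703 side, matching the "greater than or equal to" condition in the text.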
An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant. As shown in FIG. 8, the method may include S801-S808:
S801. The terminal 300 receives first voice data.
S802. The terminal 300 determines whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal.
If the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal, the AP may go on to perform voiceprint verification on the first voice data; that is, the terminal 300 continues with S803. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data; that is, the terminal 300 continues with S808.
S803. The terminal 300 performs voiceprint verification on the first voice data by using the first voiceprint model.
If the first voice data passes voiceprint verification, the terminal 300 may continue with S804. If the first voice data fails voiceprint verification, the terminal 300 may continue with S808.
For a detailed description of S801-S803, refer to the description of S401-S403 in the embodiments of the present application. Details are not repeated here.
S804. The terminal 300 starts the voice assistant.
S805. The terminal 300 performs text verification on voice data received within a first preset time.
S806. The terminal 300 determines whether, within the first preset time, the terminal 300 has received second voice data and at least one piece of voice data that matches the text of the preset wake-up word.
The first preset time is a preset time period that starts when the terminal 300 determines that the text of the first voice data is the same as the text of the wake-up word registered in the terminal 300 (that is, the first voice data passes text verification) but the first voice data fails voiceprint verification.
Generally, the AP of the terminal 300 is in a sleep state, and the DSP of the terminal 300 monitors the first voice data. When the similarity between the monitored voice data and the wake-up word registered in the terminal 300 meets a certain condition, the DSP hands the monitored voice data to the AP, and the AP is woken up. The AP performs text verification and voiceprint verification on the voice data to determine whether the voice data matches the generated voiceprint model. After obtaining the verification result, the AP returns to the sleep state until it receives voice data from the DSP again. That is, the DSP sends to the AP only voice data whose similarity to the wake-up word registered in the terminal 300 meets the condition, and the AP performs text verification and voiceprint verification only on the voice data sent by the DSP.
It can be understood that if the text of the first voice data is the same as the text of the wake-up word registered in the terminal 300 (that is, the first voice data can pass text verification), the DSP can recognize that the similarity between the first voice data and the wake-up word registered in the terminal 300 meets the condition. The DSP may then transmit the first voice data to the AP and wake up the AP, and the AP performs text verification and voiceprint verification on the first voice data.
The difference in this embodiment of the present application is as follows: if the AP determines that the text of the first voice data is the same as the text of the wake-up word registered in the terminal 300 (that is, the first voice data passes text verification) but the first voice data fails voiceprint verification, the AP does not return to the sleep state immediately after obtaining the verification result. Instead, the DSP hands all voice data monitored within the first preset time to the AP, and the AP may perform text verification on all of that voice data.
The first voiceprint model is used for voiceprint verification when the voice assistant is woken up, and may characterize the voiceprint features of the wake-up word registered in the terminal. The text corresponding to the second voice data contains a preset keyword. For example, the second voice data may be voice data in which the user complains that voice wake-up failed, such as "why won't it wake up", "why doesn't it work", "no response", "cannot wake up", and "voice wake-up is broken".
The AP performs text verification on all voice data monitored by the DSP within the first preset time. If, within the first preset time, the AP recognizes second voice data whose text is, for example, "why won't it wake up", "why doesn't it work", "no response", "cannot wake up", or "voice wake-up is broken", and also recognizes at least one piece of voice data whose text is the same as the text of the wake-up word registered in the terminal 300, the terminal 300 may use the first voice data it received to update the first voiceprint model of the terminal 300.
It can be understood that, after receiving the first voice data in S801, the terminal 300 may find that the first voice data fails voiceprint verification. If the terminal 300 then receives at least one piece of voice data that passes text verification within the first preset time, the user has tried several times to wake up the voice assistant of the terminal 300 by voice, but voice wake-up failed. In this case, if the terminal 300 also receives the second voice data within the first preset time, the user is dissatisfied with the failed voice wake-up.
If the terminal 300 receives the second voice data and at least one piece of voice data that passes text verification within the first preset time, the user strongly intends to wake up the voice assistant by voice. However, voice wake-up may have failed several times because the user's current physical state differs considerably from the user's physical state when the wake-up word was registered, or because the real-time conditions of the noise scene in which the user is currently located differ considerably from those of the noise scene in which the wake-up word was registered. In this case, even though the first voice data failed voiceprint verification, the terminal 300 may use the received first voice data to update the first voiceprint model in the terminal 300. That is, if the terminal 300 receives the second voice data and at least one piece of voice data matching the text of the preset wake-up word within the first preset time, the terminal 300 updates the first voiceprint model with the first voice data, that is, performs S807.
S807. The terminal 300 updates the first voiceprint model of the terminal 300 with the first voice data.
If the terminal 300 does not receive the second voice data and at least one piece of voice data matching the text of the preset wake-up word within the first preset time, the terminal 300 may delete the first voice data.
S808. The terminal 300 deletes the first voice data.
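The determination in S806 can be sketched as a check over the text recognized from the voice data received within the first preset time. This is a simplified illustration under several assumptions: speech recognition has already produced text transcripts, a text match is modeled as string equality, complaint detection is modeled as a substring search for preset keywords, and the function name, wake-up word, and keywords are hypothetical.

```python
from typing import Iterable, List


def step_after_s806(
    transcripts: List[str],
    wake_word_text: str,
    preset_keywords: Iterable[str],
) -> str:
    """Return "S807" (update the model) or "S808" (delete first voice data)."""
    keywords = list(preset_keywords)
    # At least one piece of voice data whose text matches the wake-up word.
    heard_wake_word = any(t.strip() == wake_word_text for t in transcripts)
    # Second voice data: text that contains one of the preset keywords.
    heard_complaint = any(k in t for t in transcripts for k in keywords)
    return "S807" if heard_wake_word and heard_complaint else "S808"


window = ["hello assistant", "why does it not wake up"]
decision = step_after_s806(window, "hello assistant", ["wake up", "no response"])
```

Both conditions must hold within the same window; a repeated wake-up word alone, or a complaint alone, leads to S808.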
The method in which the terminal 300 updates the first voiceprint model in the terminal 300 with the first voice data may include the following: the terminal 300 generates a second voiceprint model according to the first voice data, and replaces the first voiceprint model with the second voiceprint model. For the method in which the terminal 300 generates the second voiceprint model according to the first voice data, refer to methods for generating a voiceprint model in the conventional technology. Details are not described here.
Because the first voice data is voice data of the user that the terminal 300 obtains in real time, the first voice data can reflect the user's physical state and/or the real-time conditions of the noise scene in which the user is located. Therefore, updating the voiceprint model of the terminal 300 with the first voice data can improve the voice wake-up rate of the terminal and reduce the false wake-up rate.
In addition, the received first voice data is voice data that the user uttered to start the voice assistant while strongly intending to wake up the voice assistant of the terminal 300 by voice. Therefore, updating the voiceprint model of the terminal 300 with voice data that reflects the user's true intention can further improve the voice wake-up rate of the terminal and reduce the false wake-up rate.
Further, the terminal 300 obtains the received first voice data automatically while performing voice wake-up, rather than receiving it as user input after prompting the user to manually re-register the wake-up word. Updating the voiceprint model with the received first voice data therefore also simplifies the process of updating the wake-up word.
It can be understood that if the signal quality of the first voice data is poor, then after the terminal 300 updates the first voiceprint model with the first voice data and uses the updated voiceprint model for voice wake-up, the success rate of voice wake-up is affected.
To prevent the terminal 300 from updating the first voiceprint model with voice data of poor signal quality, the terminal 300 may first determine, before updating the first voiceprint model with the first voice data, whether the signal quality parameter of the first voice data is higher than the second preset threshold. The signal quality parameter of voice data characterizes how high the signal quality of the voice data is. For example, the signal quality parameter of voice data may be the signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, the signal quality of the first voice data is relatively high. In this case, the terminal 300 may update the first voiceprint model with the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.
Optionally, in addition to the first voice data, the terminal 300 may also use the at least one piece of voice data that matches the text of the preset wake-up word to update the first voiceprint model. Specifically, the terminal may select, from the first voice data and the at least one piece of voice data that matches the text of the preset wake-up word, the voice data whose signal quality parameter is higher than the second preset threshold, and then update the first voiceprint model with the selected voice data.
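The selection described above can be sketched as a simple filter. The representation of each piece of voice data as a `(label, quality)` pair, the function name, and the example values are assumptions made for illustration.

```python
from typing import List, Tuple

Utterance = Tuple[str, float]  # (label, signal quality parameter), hypothetical


def select_for_update(
    first: Utterance,
    text_matched: List[Utterance],
    second_preset_threshold: float,
) -> List[Utterance]:
    """Keep only voice data whose quality is strictly above the threshold."""
    candidates = [first, *text_matched]
    return [u for u in candidates if u[1] > second_preset_threshold]


selected = select_for_update(
    ("first", 22.0),
    [("retry-1", 9.0), ("retry-2", 17.5)],
    second_preset_threshold=15.0,
)
```

If no candidate exceeds the threshold, the result is an empty list and there is nothing to update the model with.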
To prevent a malicious user from triggering the terminal 300 to perform S801-S808 and update the first voiceprint model in the terminal 300, and thereby gaining the ability to wake up the terminal 300 by voice, the terminal 300 may perform user identity verification before performing S807, and perform S807 only after the verification succeeds. Specifically, after S806 and before S807, the terminal 300 may perform identity authentication on the user. If the identity authentication succeeds, the terminal 300 performs S807; if the identity authentication fails, the terminal 300 performs S808. The method in which the terminal performs identity authentication on the user may include S901-S903. As shown in FIG. 9, after S806 and before S807 shown in FIG. 8, the method in this embodiment of the present application may further include S901-S903:
S901. The terminal 300 displays an identity verification interface.
The identity verification interface is used to receive identity verification information entered by the user.
S902. The terminal 300 receives the identity verification information entered by the user on the identity verification interface.
S903. The terminal 300 performs user identity verification according to the identity verification information.
If the identity authentication succeeds, the terminal 300 updates the first voiceprint model with the first voice data; that is, the terminal 300 performs S807. If the identity authentication fails, the terminal 300 deletes the first voice data; that is, the terminal 300 performs S808.
For example, the identity verification information may be any one of a numeric password, a pattern password, fingerprint information, iris information, and facial feature information. Correspondingly, the identity verification interface may be any one of an interface for entering a numeric password or a pattern password, an interface for entering fingerprint information, an interface for entering iris information, and an interface for entering facial feature information.
For example, assume that the terminal 300 is the mobile phone 1000 shown in FIG. 10, the identity verification information is a numeric password, and the identity verification interface is an interface for entering the numeric password. The mobile phone 1000 may display the identity verification interface 1001 shown in FIG. 10. The identity verification interface 1001 includes a password input box 1002 and first prompt information 1003: "After user identity verification succeeds, the mobile phone will automatically update the wake-up word".
The terminal 300 performs user identity verification, and updates the first voiceprint model in the terminal 300 only after the verification succeeds. This prevents a malicious user from using his or her own voice to trigger the terminal 300 to update the first voiceprint model and then maliciously waking up the terminal 300 by voice. With this solution, the voiceprint model in the terminal 300 cannot be maliciously updated, which improves the security of the terminal 300.
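The gating of S807 on identity verification can be sketched for the numeric password case as follows. This is only an illustration under stated assumptions: the description does not say how the terminal stores or checks the password, so the salted PBKDF2 hash, the iteration count, and the function names are all choices made for this sketch.

```python
import hashlib
import hmac


def hash_password(password: str, salt: bytes) -> bytes:
    """Derive a hash from the numeric password (storage scheme is assumed)."""
    return hashlib.pbkdf2_hmac("sha256", password.encode("utf-8"), salt, 100_000)


def step_after_identity_check(entered: str, salt: bytes, stored: bytes) -> str:
    """S903: verify the entered password, then choose S807 or S808."""
    # compare_digest compares in constant time to avoid leaking match length.
    verified = hmac.compare_digest(hash_password(entered, salt), stored)
    return "S807" if verified else "S808"


salt = b"0123456789abcdef"             # hypothetical per-device salt
stored = hash_password("2468", salt)   # hypothetical enrolled password
granted = step_after_identity_check("2468", salt, stored)
denied = step_after_identity_check("0000", salt, stored)
```

The same gate would apply to the other kinds of identity verification information listed above, with the password comparison replaced by the corresponding check.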
It can be understood that generating a new voiceprint model directly from the first voice data and replacing the first voiceprint model with it can improve the voice wake-up rate of the terminal 300. However, directly replacing the first voiceprint model with a voiceprint model generated from the first voice data raises the voice wake-up rate sharply, and a sharp rise in the voice wake-up rate may correspondingly raise the false wake-up rate of the terminal 300. To steadily improve the voice wake-up rate of the terminal 300 while reducing its false wake-up rate, as shown in FIG. 11, S807 may include S601-S603.
In this embodiment of the present application, the terminal replaces part of the at least two pieces of registered voice data with the first voice data, rather than generating the second voiceprint model entirely from the first voice data. In this way, the voice wake-up rate of the terminal 300 can be raised relatively steadily while the false wake-up rate of the terminal 300 is lowered.
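The partial replacement can be illustrated with a short sketch. It assumes, as the claims of this application state, that the replaced sample is the registered sample with the lowest signal quality parameter and that the signal-to-noise ratio serves as that parameter. Everything else is hypothetical: the `VoiceSample` type, the fixed-length feature tuples, and `build_model`, which uses an element-wise mean as a toy substitute for whatever model training the terminal actually performs.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class VoiceSample:
    features: Tuple[float, ...]  # toy fixed-length feature vector (assumption)
    snr_db: float                # signal quality parameter: signal-to-noise ratio


def replace_weakest(
    registered: Sequence[VoiceSample], first_voice: VoiceSample
) -> List[VoiceSample]:
    """S601 (sketch): swap the lowest-SNR registered sample for first_voice."""
    weakest = min(range(len(registered)), key=lambda i: registered[i].snr_db)
    updated = list(registered)  # copy, so the original registration is kept
    updated[weakest] = first_voice
    return updated


def build_model(samples: Sequence[VoiceSample]) -> Tuple[float, ...]:
    """S602 (sketch): toy model, the element-wise mean of the sample features."""
    n = len(samples)
    return tuple(sum(column) / n for column in zip(*(s.features for s in samples)))


registered = [
    VoiceSample((1.0, 2.0), snr_db=20.0),
    VoiceSample((3.0, 4.0), snr_db=8.0),   # lowest SNR: plays the third voice data
    VoiceSample((1.0, 4.0), snr_db=15.0),
]
first_voice = VoiceSample((2.0, 3.0), snr_db=18.0)

updated = replace_weakest(registered, first_voice)
second_model = build_model(updated)
```

Because the other registered samples still contribute to `second_model`, a single new utterance shifts the model only partially, which is the stabilizing effect the paragraph describes.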
It can be understood that if the second voiceprint threshold that the terminal generates from the second voiceprint model differs greatly from the first voiceprint threshold, the wake-up rate of the terminal 300 fluctuates sharply, which affects user experience. Based on this, as shown in FIG. 2, after S602 and before S603 shown in FIG. 12, the method in this embodiment of the present application may further include S701-S702:
After S702, if the difference between the second voiceprint threshold and the first voiceprint threshold is less than the first preset threshold, the threshold has changed little. In this case, using the second voiceprint model for voiceprint verification does not significantly affect the wake-up rate of the terminal 300, and the terminal 300 may perform S603. If the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, the threshold has changed considerably. In this case, using the second voiceprint model for voiceprint verification would significantly affect the wake-up rate of the terminal 300, and the terminal 300 may perform S703.
It can be understood that when the second voiceprint threshold differs considerably from the first voiceprint threshold, the terminal 300 deletes the second voiceprint model and the third voice data, that is, it does not replace the first voiceprint model with the second voiceprint model. This prevents a large difference between the second voiceprint threshold and the first voiceprint threshold from causing the wake-up rate of the terminal 300 to fluctuate sharply and affect user experience.
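The threshold comparison can be sketched as a small decision function. The source speaks only of "the difference" between the two thresholds; treating it as an absolute difference is an assumption made here, and the names `select_model` and `first_preset_threshold` as well as the numeric values are purely illustrative.

```python
from typing import Tuple

Model = Tuple[float, ...]  # hypothetical stand-in for a voiceprint model


def select_model(
    first_model: Model,
    first_threshold: float,
    second_model: Model,
    second_threshold: float,
    first_preset_threshold: float,
) -> Tuple[Model, float]:
    """S702 (sketch): decide between adopting and discarding the second model.

    Returns the (model, threshold) pair the terminal would keep using.
    """
    # Assumption: "difference" is read as the absolute difference.
    if abs(second_threshold - first_threshold) < first_preset_threshold:
        return second_model, second_threshold  # S603: adopt the second model
    return first_model, first_threshold        # S703: discard the second model


m1: Model = (0.2, 0.4)
m2: Model = (0.3, 0.5)

small_change = select_model(m1, 0.70, m2, 0.74, first_preset_threshold=0.05)
large_change = select_model(m1, 0.70, m2, 0.80, first_preset_threshold=0.05)
```

In the first call the thresholds differ by about 0.04, so the second model is adopted; in the second call they differ by about 0.10, so the first model and first threshold stay in place.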
It can be understood that, to implement the foregoing functions, the terminal and the like include corresponding hardware structures and/or software modules for performing each function. A person skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments disclosed herein, the embodiments of the present application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is performed by hardware or by computer software driving hardware depends on the particular application and design constraints of the technical solution. A skilled person may use different methods to implement the described functions for each particular application, but such an implementation should not be considered beyond the scope of the embodiments of the present application.
In the embodiments of the present application, the terminal and the like may be divided into functional modules according to the foregoing method examples. For example, each functional module may correspond to one function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. It should be noted that the division into modules in the embodiments of the present application is illustrative and is merely a division by logical function; other divisions are possible in an actual implementation.
Where each functional module corresponds to one function, FIG. 13 shows a possible schematic structural diagram of the terminal in the foregoing embodiments. The terminal 1300 includes a storage unit 1301, an input unit 1302, a text verification unit 1303, a voiceprint verification unit 1304, and an update unit 1305. The storage unit 1301 stores the preset wake-up word registered in the terminal 1300 and the first voiceprint model. The first voiceprint model is used for voiceprint verification when the voice assistant is woken up, and represents the voiceprint features of the preset wake-up word.
The input unit 1302 is configured to support the terminal 1300 in performing S401, S502, S801, and S902 in the foregoing method embodiments, and/or other processes of the technology described herein. The text verification unit 1303 is configured to support the terminal 1300 in performing S402, S802, and S805 in the foregoing method embodiments, and/or other processes of the technology described herein. The voiceprint verification unit 1304 is configured to support the terminal 1300 in performing S403 and S803 in the foregoing method embodiments, and/or other processes of the technology described herein. The update unit 1305 is configured to support the terminal 1300 in performing S404, S603, and S807 in the foregoing method embodiments, and/or other processes of the technology described herein.
Further, the terminal 1300 may include a starting unit and a determining unit. The starting unit is configured to support the terminal 1300 in performing S501 and S804 in the foregoing method embodiments, and/or other processes of the technology described herein. The determining unit is configured to support the terminal 1300 in performing S503 in the foregoing method embodiments, and/or other processes of the technology described herein.
Further, as shown in FIG. 14, the terminal 1300 may include an identity authentication unit 1306. The identity authentication unit 1306 is configured to support the terminal 1300 in performing user identity verification. For example, the identity authentication unit 1306 is configured to support the terminal 1300 in performing S903 in the foregoing method embodiments, and/or other processes of the technology described herein.
Further, the terminal 1300 may include a display unit. The display unit is configured to support the terminal 1300 in performing S901 in the foregoing method embodiments, and/or other processes of the technology described herein.
Further, the terminal 1300 may include a replacement unit and a generation unit. The replacement unit is configured to support the terminal 1300 in performing S601 in the foregoing method embodiments, and/or other processes of the technology described herein. The generation unit is configured to support the terminal 1300 in performing S602 and S701 in the foregoing method embodiments, and/or other processes of the technology described herein.
Further, the terminal 1300 may include a deleting unit. The deleting unit is configured to support the terminal 1300 in performing S405, S703, and S808 in the foregoing method embodiments, and/or other processes of the technology described herein.
Further, the terminal 1300 may include a judging unit. The judging unit is configured to support the terminal 1300 in performing S702 and S806 in the foregoing method embodiments, and/or other processes of the technology described herein.
All related content of the steps in the foregoing method embodiments may be cited in the function descriptions of the corresponding functional modules, and is not repeated here.
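The assignment of method steps to the units of the terminal 1300 listed above can be collected into a lookup table. The table below only restates the step numbers given in the preceding paragraphs; the dictionary name `UNIT_STEPS` and the helper `unit_for_step` are hypothetical conveniences, not part of the application.

```python
from typing import Dict, List, Optional

# Unit of terminal 1300 -> method steps it supports, as listed above.
UNIT_STEPS: Dict[str, List[str]] = {
    "input unit 1302": ["S401", "S502", "S801", "S902"],
    "text verification unit 1303": ["S402", "S802", "S805"],
    "voiceprint verification unit 1304": ["S403", "S803"],
    "update unit 1305": ["S404", "S603", "S807"],
    "starting unit": ["S501", "S804"],
    "determining unit": ["S503"],
    "identity authentication unit 1306": ["S903"],
    "display unit": ["S901"],
    "replacement unit": ["S601"],
    "generation unit": ["S602", "S701"],
    "deleting unit": ["S405", "S703", "S808"],
    "judging unit": ["S702", "S806"],
}


def unit_for_step(step: str) -> Optional[str]:
    """Return the unit that supports a given step, or None if it is not listed."""
    for unit, steps in UNIT_STEPS.items():
        if step in steps:
            return unit
    return None


all_steps = [step for steps in UNIT_STEPS.values() for step in steps]
```

For instance, `unit_for_step("S703")` yields the deleting unit, and a step that does not appear in the list yields `None`.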
Of course, the terminal 1300 includes but is not limited to the units and modules listed above. For example, the terminal 1300 may further include a receiving unit and a sending unit. The receiving unit is configured to receive data or instructions sent by other terminals. The sending unit is configured to send data or instructions to other terminals. In addition, the functions that the foregoing functional units can implement include but are not limited to the functions corresponding to the method steps described in the foregoing examples. For detailed descriptions of other units of the terminal 1300, refer to the detailed descriptions of the corresponding method steps; details are not repeated here.
Where an integrated unit is used, FIG. 15 shows a possible schematic structural diagram of the terminal in the foregoing embodiments. The terminal 1500 includes a processing module 1501, a storage module 1502, and a display module 1503. The processing module 1501 is configured to control and manage actions of the terminal 1500. The display module 1503 is configured to display images generated by the processing module 1501. The storage module 1502 is configured to store program code and data of the terminal. For example, the storage module 1502 stores the preset wake-up word registered in the terminal and the first voiceprint model, where the first voiceprint model is used for voiceprint verification when the voice assistant is woken up and represents the voiceprint features of the preset wake-up word. Optionally, the terminal 1500 may further include a communication module configured to support communication between the terminal and other network entities. For detailed descriptions of the units included in the terminal 1500, refer to the descriptions in the foregoing method embodiments; details are not repeated here.
The processing module 1501 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various example logical blocks, modules, and circuits described with reference to the disclosure of the present application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module may be a transceiver, a transceiver circuit, a communication interface, or the like. The storage module 1502 may be a memory.
When the processing module 1501 is a processor (such as the processor 310 shown in FIG. 3), the communication module includes a Wi-Fi module and a Bluetooth module (such as the communication module 360 shown in FIG. 3; communication modules such as the Wi-Fi module and the Bluetooth module may be collectively referred to as a communication interface), the storage module 1502 is a memory (such as the internal memory 321 shown in FIG. 3 and an external SD card connected to the terminal 1500 through the external memory interface 320), and the display module 1503 is a touchscreen (including the display screen 394 shown in FIG. 3), the terminal provided in this embodiment of the present application may be the terminal 300 shown in FIG. 3. The processor, the communication interface, the touchscreen, and the memory may be coupled together through a bus.
An embodiment of the present application further provides a computer storage medium. The computer storage medium stores computer program code. When the foregoing processor executes the computer program code, the terminal performs the related method steps in any one of FIG. 4A, FIG. 5A, FIG. 6, FIG. 7, FIG. 8, FIG. 9, FIG. 11, and FIG. 12 to implement the methods in the foregoing embodiments.
An embodiment of the present application further provides a computer program product. When the computer program product runs on a computer, the computer performs the related method steps in any one of FIG. 4A, FIG. 5A, FIG. 6, FIG. 7, FIG. 8, FIG. 9, FIG. 11, and FIG. 12 to implement the methods in the foregoing embodiments.
The terminal 1300, the terminal 1500, the computer storage medium, and the computer program product provided in the embodiments of the present application are all used to perform the corresponding methods provided above. Therefore, for the beneficial effects that they can achieve, refer to the beneficial effects of the corresponding methods provided above; details are not repeated here.
From the foregoing description of the implementations, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the foregoing division into functional modules is used as an example. In practical applications, the foregoing functions may be allocated to different functional modules as required; that is, the internal structure of the apparatus may be divided into different functional modules to perform all or some of the functions described above.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into modules or units is merely a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and a component shown as a unit may be one physical unit or multiple physical units; that is, it may be located in one place or distributed across multiple places. Some or all of the units may be selected according to actual needs to achieve the objective of the solution in this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium. Based on such an understanding, the technical solutions of the embodiments of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The storage medium includes any medium that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.
The foregoing is merely specific implementations of the present application, but the protection scope of the present application is not limited thereto. Any variation or replacement within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (24)

  1. A method for updating a wake-up voice of a voice assistant by a terminal, comprising:
    receiving, by the terminal, first voice data input by a user;
    determining, by the terminal, whether text corresponding to the first voice data matches text of a preset wake-up word registered in the terminal;
    if the text corresponding to the first voice data matches the text of the preset wake-up word, performing, by the terminal, identity authentication on the user; and
    if the identity authentication succeeds, updating, by the terminal, a first voiceprint model in the terminal by using the first voice data;
    wherein the first voiceprint model is used for voiceprint verification when the voice assistant is woken up, and the first voiceprint model represents voiceprint features of the preset wake-up word.
  2. The method for updating a wake-up voice of a voice assistant by a terminal according to claim 1, wherein the performing, by the terminal, identity authentication on the user comprises:
    performing, by the terminal, voiceprint verification on the first voice data by using the first voiceprint model, wherein the identity authentication succeeds if the first voice data passes the voiceprint verification.
  3. The method for updating a wake-up voice of a voice assistant by a terminal according to claim 1 or 2, wherein the method further comprises:
    when the identity authentication succeeds, starting, by the terminal, the voice assistant; and
    receiving, by the terminal, second voice data through the voice assistant;
    wherein after the identity authentication succeeds and before the terminal updates the first voiceprint model in the terminal by using the first voice data, the method further comprises:
    determining, by the terminal, that the second voice data is a valid voice command.
  4. The method for updating a wake-up voice of a voice assistant by a terminal according to any one of claims 1 to 3, wherein the terminal comprises a coprocessor and a main processor; the terminal monitors voice data by using the coprocessor; when the coprocessor detects the first voice data whose similarity to the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal; and when determining that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor performs voiceprint verification on the first voice data by using the first voiceprint model.
  5. The method for updating a wake-up voice of a voice assistant by a terminal according to claim 1, wherein before the terminal performs identity authentication on the user, the method comprises:
    performing, by the terminal, voiceprint verification on the first voice data by using the first voiceprint model;
    if the first voice data fails the voiceprint verification, performing, by the terminal, text verification on voice data received within a first preset time; and
    if the terminal receives, within the first preset time, second voice data and at least one piece of voice data that matches the text of the preset wake-up word, performing, by the terminal, identity authentication on the user;
    wherein text corresponding to the second voice data contains a preset keyword.
  6. The method for updating a wake-up voice of a voice assistant by a terminal according to claim 5, wherein the performing, by the terminal, identity authentication on the user comprises:
    displaying, by the terminal, an identity verification interface;
    receiving, by the terminal, identity verification information input by the user on the identity verification interface; and
    performing, by the terminal, user identity verification on the user according to the identity verification information.
  7. The method for updating a wake-up voice of a voice assistant by a terminal according to claim 5 or 6, wherein the terminal comprises a coprocessor and a main processor; the terminal monitors voice data by using the coprocessor; when the coprocessor detects the first voice data whose similarity to the preset wake-up word satisfies a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal; when determining that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor performs voiceprint verification on the first voice data by using the first voiceprint model;
    the terminal monitors voice data within the first preset time by using the coprocessor, and notifies the main processor to determine whether the voice data received within the first preset time includes second voice data and at least one piece of voice data that matches the text of the preset wake-up word, wherein text corresponding to the second voice data contains a preset keyword.
  8. The method for updating a wake-up voice of a voice assistant by a terminal according to any one of claims 1 to 7, wherein the preset wake-up word comprises at least two pieces of registered voice data, the at least two pieces of registered voice data are recorded when the terminal registers the preset wake-up word, and the first voiceprint model is generated based on the at least two pieces of registered voice data;
    wherein the updating, by the terminal, the first voiceprint model in the terminal by using the first voice data comprises:
    replacing, by the terminal, third voice data in the at least two pieces of registered voice data with the first voice data to obtain updated at least two pieces of registered voice data, wherein a signal quality parameter of the third voice data is lower than signal quality parameters of the other voice data in the at least two pieces of registered voice data;
    generating, by the terminal, a second voiceprint model based on the updated at least two pieces of registered voice data; and
    replacing, by the terminal, the first voiceprint model with the second voiceprint model, wherein the second voiceprint model represents voiceprint features of the updated at least two pieces of registered voice data.
  9. The method for updating a wake-up voice of a voice assistant by a terminal according to claim 8, wherein the terminal further stores a first voiceprint threshold, and the first voiceprint threshold is generated based on the first voiceprint model and the at least two pieces of registered voice data;
    after the terminal generates the second voiceprint model based on the updated at least two pieces of registered voice data and before the terminal replaces the first voiceprint model with the second voiceprint model, the method further comprises:
    generating, by the terminal, a second voiceprint threshold based on the second voiceprint model and the updated at least two pieces of registered voice data;
    wherein the replacing, by the terminal, the first voiceprint model with the second voiceprint model comprises:
    if a difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold, replacing, by the terminal, the first voiceprint model with the second voiceprint model.
  10. The method for updating a wake-up voice of a voice assistant by a terminal according to claim 9, wherein the method further comprises:
    if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, deleting, by the terminal, the second voiceprint model and the first voice data.
  11. The method for updating a wake-up voice of a voice assistant by a terminal according to any one of claims 1 to 10, wherein the updating, by the terminal, the first voiceprint model in the terminal by using the first voice data comprises:
    if a signal quality parameter of the first voice data is higher than a second preset threshold, updating, by the terminal, the first voiceprint model by using the first voice data;
    wherein the signal quality parameter of the first voice data comprises a signal-to-noise ratio of the first voice data.
  12. A terminal, comprising a processor, a memory, and a display, wherein the memory and the display are coupled to the processor; the display is configured to display images generated by the processor; the memory is configured to store computer program code, related information of a voice assistant, a preset wake-up word registered in the terminal, and a first voiceprint model; the computer program code comprises computer instructions; and when the processor executes the computer instructions,
    the processor is configured to: receive first voice data input by a user; determine whether text corresponding to the first voice data matches text of the preset wake-up word; if the text corresponding to the first voice data matches the text of the preset wake-up word, perform identity authentication on the user; and if the identity authentication succeeds, update, by using the first voice data, the first voiceprint model stored in the memory;
    wherein the first voiceprint model is used for voiceprint verification when the voice assistant is woken up, and the first voiceprint model represents voiceprint features of the preset wake-up word.
  13. The terminal according to claim 12, wherein that the processor is configured to perform identity authentication on the user comprises:
    the processor is configured to perform voiceprint verification on the first voice data by using the first voiceprint model, wherein the identity authentication succeeds if the first voice data passes the voiceprint verification.
  14. The terminal according to claim 12 or 13, wherein the processor is further configured to: start the voice assistant when the identity authentication succeeds; and receive second voice data through the voice assistant;
    the processor is further configured to: after the identity authentication succeeds and before the first voiceprint model is updated by using the first voice data, determine that the second voice data is a valid voice command.
  15. The terminal according to any one of claims 12 to 14, wherein the processor comprises a coprocessor and a main processor; the coprocessor is configured to monitor voice data; when the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal; and when determining that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor performs voiceprint verification on the first voice data by using the first voiceprint model.
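The division of work between the coprocessor and the main processor in claim 15 can be pictured as a two-stage pipeline. The sketch below is an assumption-laden illustration: the numeric similarity score, the `>=` comparisons, and all function names and return labels are hypothetical, since the claim only speaks of a "preset condition" without fixing its form.

```python
from typing import Optional


def coprocessor_stage(similarity: float, preset_condition: float) -> bool:
    """Always-on screen: decide whether to notify the main processor.

    Assumption: the "preset condition" is a minimum similarity score.
    """
    return similarity >= preset_condition


def main_processor_stage(
    text: str, wake_word: str, voiceprint_score: float, voiceprint_threshold: float
) -> str:
    """Text check first, then voiceprint verification."""
    if text.strip().lower() != wake_word.strip().lower():
        return "text_mismatch"
    if voiceprint_score >= voiceprint_threshold:
        return "verified"
    return "voiceprint_failed"


def process_utterance(
    similarity: float,
    preset_condition: float,
    text: str,
    wake_word: str,
    voiceprint_score: float,
    voiceprint_threshold: float,
) -> Optional[str]:
    # The main processor is never involved if the coprocessor screen fails.
    if not coprocessor_stage(similarity, preset_condition):
        return None
    return main_processor_stage(text, wake_word, voiceprint_score, voiceprint_threshold)
```

Returning `None` for a screened-out utterance makes explicit that the heavier text and voiceprint checks run only after the coprocessor's notification.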
  16. The terminal according to claim 12, wherein the processor is further configured to: before performing identity authentication on the user, perform voiceprint verification on the first voice data by using the first voiceprint model; if the first voice data fails the voiceprint verification, perform text verification on voice data received within a first preset time; and if the processor receives, within the first preset time, second voice data and at least one piece of voice data that matches the text of the preset wake-up word, perform identity authentication on the user;
    wherein text corresponding to the second voice data comprises a preset keyword.
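The fallback condition in claim 16 (after a failed voiceprint check, look within a time window for both a repeated wake-up word and an utterance containing a preset keyword) can be sketched as below. The function name and parameters are hypothetical, the caller is assumed to have already limited the input to utterances received within the first preset time, and the substring test for keywords is an assumption.

```python
from typing import Iterable, List


def should_fall_back_to_authentication(
    texts_in_window: Iterable[str],
    preset_wake_word: str,
    preset_keywords: List[str],
) -> bool:
    """True when the window holds at least one wake-word match and at least
    one other utterance whose text contains a preset keyword."""
    wake = preset_wake_word.strip().lower()
    saw_wake_word = False
    saw_keyword_utterance = False
    for text in texts_in_window:
        lowered = text.strip().lower()
        if lowered == wake:
            saw_wake_word = True
        elif any(keyword.lower() in lowered for keyword in preset_keywords):
            # Assumption: substring containment counts as "contains a keyword".
            saw_keyword_utterance = True
    return saw_wake_word and saw_keyword_utterance
```

Both conditions must hold before the terminal moves on to identity authentication; a repeated wake-up word alone, or a keyword utterance alone, is not enough in this reading.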
  17. The terminal according to claim 16, wherein that the processor is configured to perform identity authentication on the user comprises:
    the processor is configured to: control the display to display an identity verification interface; receive identity verification information entered by the user on the identity verification interface displayed on the display; and perform user identity verification on the user based on the identity verification information.
  18. The terminal according to claim 16 or 17, wherein the processor comprises a coprocessor and a main processor; the coprocessor is configured to monitor voice data; when the coprocessor detects first voice data whose similarity to the preset wake-up word meets a preset condition, the coprocessor notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal; and when determining that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor performs voiceprint verification on the first voice data by using the first voiceprint model;
    the coprocessor is further configured to: monitor voice data within the first preset time; and notify the main processor to determine whether the voice data received within the first preset time comprises the second voice data and at least one piece of voice data that matches the text of the preset wake-up word, wherein the text corresponding to the second voice data comprises a preset keyword.
  19. The terminal according to any one of claims 12 to 18, wherein the preset wake-up word stored in the memory comprises at least two pieces of registered voice data, the at least two pieces of registered voice data are recorded when the processor registers the preset wake-up word, and the first voiceprint model is generated by the processor based on the at least two pieces of registered voice data;
    wherein that the processor updates the first voiceprint model by using the first voice data comprises:
    the processor is configured to: replace third voice data in the at least two pieces of registered voice data with the first voice data to obtain updated at least two pieces of registered voice data, wherein a signal quality parameter of the third voice data is lower than signal quality parameters of the other voice data in the at least two pieces of registered voice data; generate a second voiceprint model based on the updated at least two pieces of registered voice data; and replace the first voiceprint model with the second voiceprint model, wherein the second voiceprint model represents a voiceprint feature of the updated at least two pieces of registered voice data.
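The replacement step in claim 19 (swap out the registered sample with the lowest signal quality, then regenerate the model) can be illustrated with a small sketch. The `Sample` type, the use of a fixed-length embedding, and building the model as the element-wise mean of the embeddings are all assumptions for illustration; the claim does not say how a voiceprint model is computed.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Sample:
    embedding: Tuple[float, ...]  # hypothetical fixed-length voiceprint features
    snr_db: float                 # signal quality parameter (signal-to-noise ratio)


def build_model(samples: List[Sample]) -> Tuple[float, ...]:
    """Assumed model: element-wise mean of the sample embeddings."""
    dim = len(samples[0].embedding)
    return tuple(sum(s.embedding[i] for s in samples) / len(samples) for i in range(dim))


def replace_weakest(
    registered: List[Sample], new_sample: Sample
) -> Tuple[List[Sample], Tuple[float, ...]]:
    """Replace the lowest-quality registered sample and rebuild the model.

    The input list is left untouched so the caller can still discard the
    candidate model later.
    """
    weakest = min(range(len(registered)), key=lambda i: registered[i].snr_db)
    updated = list(registered)
    updated[weakest] = new_sample
    return updated, build_model(updated)
```

Keeping the original list intact is a design choice for the sketch: it leaves room for a later acceptance check before the first model is actually overwritten.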
  20. The terminal according to claim 19, wherein the memory further stores a first voiceprint threshold, and the first voiceprint threshold is generated by the processor based on the first voiceprint model and the at least two pieces of registered voice data;
    the processor is further configured to: after generating the second voiceprint model based on the updated at least two pieces of registered voice data and before replacing the first voiceprint model with the second voiceprint model, generate a second voiceprint threshold based on the second voiceprint model and the updated at least two pieces of registered voice data;
    wherein that the processor is configured to replace the first voiceprint model with the second voiceprint model comprises:
    the processor is configured to replace the first voiceprint model with the second voiceprint model if a difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold.
  21. The terminal according to claim 20, wherein the processor is further configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold.
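Claims 20 and 21 together describe an acceptance check on the candidate model: compare the new voiceprint threshold with the old one and either commit or discard. The sketch below makes two labeled assumptions. First, the threshold is derived as the lowest cosine similarity between the model and its registered samples, which is one plausible choice and is not stated in the claims. Second, "difference" is read as an absolute difference. All function names are hypothetical.

```python
import math
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def derive_threshold(model: Sequence[float], registered: Sequence[Sequence[float]]) -> float:
    """Assumed rule: the threshold is the worst score among registered samples."""
    return min(cosine(model, sample) for sample in registered)


def commit_or_discard(
    first_threshold: float, second_threshold: float, first_preset_threshold: float
) -> str:
    """Return "replace" (claim 20) or "discard" (claim 21)."""
    # Assumption: the claimed "difference" is the absolute difference.
    drift = abs(second_threshold - first_threshold)
    return "replace" if drift < first_preset_threshold else "discard"
```

The strict `<` keeps the boundary case on the discard side, matching the "greater than or equal to" wording of claim 21.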
  22. The terminal according to any one of claims 12 to 21, wherein that the processor is configured to update the first voiceprint model in the terminal by using the first voice data comprises:
    the processor is configured to update the first voiceprint model by using the first voice data if a signal quality parameter of the first voice data is higher than a second preset threshold;
    wherein the signal quality parameter of the first voice data comprises a signal-to-noise ratio of the first voice data.
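The signal quality gate in claim 22 can be sketched with the standard decibel definition of signal-to-noise ratio, 10·log10(P_signal / P_noise). How the terminal actually measures the ratio is not stated in the claims; estimating noise power from a separate noise-only segment, the function names, and the example threshold are assumptions for illustration.

```python
import math
from typing import Sequence


def mean_power(samples: Sequence[float]) -> float:
    """Average squared amplitude of a segment."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(speech: Sequence[float], noise: Sequence[float]) -> float:
    """SNR in decibels; assumes `noise` is a noise-only segment with nonzero power."""
    return 10.0 * math.log10(mean_power(speech) / mean_power(noise))


def passes_quality_gate(snr: float, second_preset_threshold_db: float) -> bool:
    # "Higher than" in the claim is read as a strict comparison.
    return snr > second_preset_threshold_db
```

For example, a speech segment with ten times the amplitude of the noise segment has one hundred times the power, which is about 20 dB, and would pass a hypothetical 15 dB gate.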
  23. A computer storage medium, wherein the computer storage medium comprises computer instructions, and when the computer instructions are run on a terminal, the terminal is enabled to perform the method for updating a wake-up voice of a voice assistant by a terminal according to any one of claims 1 to 11.
  24. A computer program product, wherein when the computer program product is run on a computer, the computer is enabled to perform the method for updating a wake-up voice of a voice assistant by a terminal according to any one of claims 1 to 11.
PCT/CN2018/096917 2018-07-24 2018-07-24 Method for updating wake-up voice of voice assistant by terminal, and terminal WO2020019176A1 (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/096917 WO2020019176A1 (en) 2018-07-24 2018-07-24 Method for updating wake-up voice of voice assistant by terminal, and terminal
CN201880089912.7A CN111742361B (en) 2018-07-24 2018-07-24 Method for updating wake-up voice of voice assistant by terminal and terminal

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/096917 WO2020019176A1 (en) 2018-07-24 2018-07-24 Method for updating wake-up voice of voice assistant by terminal, and terminal

Publications (1)

Publication Number Publication Date
WO2020019176A1 (en) 2020-01-30

Family

ID=69181102

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/096917 WO2020019176A1 (en) 2018-07-24 2018-07-24 Method for updating wake-up voice of voice assistant by terminal, and terminal

Country Status (2)

Country Link
CN (1) CN111742361B (en)
WO (1) WO2020019176A1 (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627449A (en) * 2020-05-20 2020-09-04 Oppo广东移动通信有限公司 Screen voiceprint unlocking method and device
CN111833869A (en) * 2020-07-01 2020-10-27 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN112489650A (en) * 2020-11-26 2021-03-12 北京小米松果电子有限公司 Wake-up control method and device, storage medium and terminal
CN113593549A (en) * 2021-06-29 2021-11-02 青岛海尔科技有限公司 Method and device for determining awakening rate of voice equipment
WO2023202442A1 (en) * 2022-04-18 2023-10-26 华为技术有限公司 Method for waking up device, electronic device, and storage medium
CN117153166A (en) * 2022-07-18 2023-12-01 荣耀终端有限公司 Voice wakeup method, equipment and storage medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417451B (en) * 2020-11-20 2022-04-12 复旦大学 Malicious software detection method adaptive to intelligent chip hierarchical architecture and based on deep learning
CN117012205A (en) * 2022-04-29 2023-11-07 荣耀终端有限公司 Voiceprint recognition method, graphical interface and electronic equipment
CN115312068B (en) * 2022-07-14 2023-05-09 荣耀终端有限公司 Voice control method, equipment and storage medium
CN115376524B (en) * 2022-07-15 2023-08-04 荣耀终端有限公司 Voice awakening method, electronic equipment and chip system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110196676A1 (en) * 2010-02-09 2011-08-11 International Business Machines Corporation Adaptive voice print for conversational biometric engine
CN106156583A (en) * 2016-06-03 2016-11-23 深圳市金立通信设备有限公司 A kind of method of speech unlocking and terminal
US20170140760A1 (en) * 2015-11-18 2017-05-18 Uniphore Software Systems Adaptive voice authentication system and method
CN107331400A (en) * 2017-08-25 2017-11-07 百度在线网络技术(北京)有限公司 A kind of Application on Voiceprint Recognition performance improvement method, device, terminal and storage medium
CN107919961A (en) * 2017-12-07 2018-04-17 广州势必可赢网络科技有限公司 A kind of identity authentication protocol and server updated based on dynamic code and dynamic vocal print
CN108231082A (en) * 2017-12-29 2018-06-29 广州势必可赢网络科技有限公司 A kind of update method and device of self study Application on Voiceprint Recognition

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107046517A (en) * 2016-02-05 2017-08-15 阿里巴巴集团控股有限公司 A kind of method of speech processing, device and intelligent terminal
CN106653031A (en) * 2016-10-17 2017-05-10 海信集团有限公司 Voice wake-up method and voice interaction device
CN106961418A (en) * 2017-02-08 2017-07-18 北京捷通华声科技股份有限公司 Identity identifying method and identity authorization system
CN107799120A (en) * 2017-11-10 2018-03-13 北京康力优蓝机器人科技有限公司 Service robot identifies awakening method and device

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627449A (en) * 2020-05-20 2020-09-04 Oppo广东移动通信有限公司 Screen voiceprint unlocking method and device
CN111627449B (en) * 2020-05-20 2023-02-28 Oppo广东移动通信有限公司 Screen voiceprint unlocking method and device
CN111833869A (en) * 2020-07-01 2020-10-27 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN111833869B (en) * 2020-07-01 2022-02-11 中关村科学城城市大脑股份有限公司 Voice interaction method and system applied to urban brain
CN112489650A (en) * 2020-11-26 2021-03-12 北京小米松果电子有限公司 Wake-up control method and device, storage medium and terminal
CN113593549A (en) * 2021-06-29 2021-11-02 青岛海尔科技有限公司 Method and device for determining awakening rate of voice equipment
WO2023202442A1 (en) * 2022-04-18 2023-10-26 华为技术有限公司 Method for waking up device, electronic device, and storage medium
CN117153166A (en) * 2022-07-18 2023-12-01 荣耀终端有限公司 Voice wakeup method, equipment and storage medium

Also Published As

Publication number Publication date
CN111742361B (en) 2023-08-22
CN111742361A (en) 2020-10-02

Similar Documents

Publication Publication Date Title
WO2020019176A1 (en) Method for updating wake-up voice of voice assistant by terminal, and terminal
WO2021000876A1 (en) Voice control method, electronic equipment and system
WO2020037795A1 (en) Voice recognition method, wearable device and electronic device
CN111369988A (en) Voice awakening method and electronic equipment
CN115442783A (en) Bluetooth connection method, system and electronic equipment
CN110784830B (en) Data processing method, Bluetooth module, electronic device and readable storage medium
CN110730114B (en) Method and equipment for configuring network configuration information
WO2021023046A1 (en) Electronic device control method and electronic device
WO2021017988A1 (en) Multi-mode identity identification method and device
WO2021052139A1 (en) Gesture input method and electronic device
WO2020029094A1 (en) Method for generating speech control command, and terminal
WO2021175266A1 (en) Identity verification method and apparatus, and electronic devices
CN110572866B (en) Management method of wake-up lock and electronic equipment
WO2020034104A1 (en) Voice recognition method, wearable device, and system
CN112860428A (en) High-energy-efficiency display processing method and equipment
CN114422340A (en) Log reporting method, electronic device and storage medium
WO2020019355A1 (en) Touch control method for wearable device, and wearable device and system
WO2020051852A1 (en) Method for recording and displaying information in communication process, and terminals
CN109285563B (en) Voice data processing method and device in online translation process
CN114120987B (en) Voice wake-up method, electronic equipment and chip system
CN113467747B (en) Volume adjusting method, electronic device and storage medium
CN113676339B (en) Multicast method, device, terminal equipment and computer readable storage medium
WO2021147483A1 (en) Data sharing method and apparatus
CN114116610A (en) Method, device, electronic equipment and medium for acquiring storage information
CN113467735A (en) Image adjusting method, electronic device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18927251

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18927251

Country of ref document: EP

Kind code of ref document: A1