WO2020019176A1 - Method for updating the wake-up voice of a voice assistant by a terminal, and terminal - Google Patents

Method for updating the wake-up voice of a voice assistant by a terminal, and terminal

Info

Publication number
WO2020019176A1
WO2020019176A1 (PCT/CN2018/096917, CN2018096917W)
Authority
WO
WIPO (PCT)
Prior art keywords
voice data
terminal
voice
voiceprint
wake
Prior art date
Application number
PCT/CN2018/096917
Other languages
English (en)
Chinese (zh)
Inventor
许军
Original Assignee
华为技术有限公司
Priority date
Filing date
Publication date
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to PCT/CN2018/096917
Priority to CN201880089912.7A (CN111742361B)
Publication of WO2020019176A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Definitions

  • the embodiments of the present application relate to the technical field of voice control, and in particular, to a method and a terminal for updating a wake-up voice of a voice assistant by a terminal.
  • Voice assistant is an important application on mobile phones. A voice assistant can interact intelligently with the user through dialogue and instant Q&A. In addition, the voice assistant can recognize the user's voice command and cause the mobile phone to execute the event corresponding to that command. For example, if the voice assistant receives and recognizes the voice command "make a call to Bob" input by the user, the mobile phone can automatically make a call to the contact Bob.
  • normally, the voice assistant is dormant. Before using the voice assistant, the user needs to wake it up. Before voice wake-up can be performed, the user needs to register a wake-up word (i.e., a wake-up voice) in the mobile phone for waking up the voice assistant.
  • according to the wake-up word input by the user, the mobile phone can generate a voiceprint model that characterizes the voiceprint of that wake-up word.
  • the voice wake-up process may include: the mobile phone monitors voice data through a low-power digital signal processor (DSP). When the DSP detects that the similarity between the monitored voice data and the wake-up word satisfies a certain condition, the DSP delivers the monitored voice data to the application processor (AP). The AP performs text verification and voiceprint verification on the voice data to determine whether the voice data matches the generated voiceprint model. When the voice data matches the voiceprint model, the phone can start the voice assistant.
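  • The two-stage wake-up flow described above can be illustrated with a short Python sketch. This is only a minimal illustration under assumed helper functions (coarse_similarity, speech_to_text, voiceprint_score) and arbitrary threshold values; it is not the implementation used by the embodiments.

```python
# Minimal sketch of the two-stage voice wake-up flow described above.
# All helper functions and thresholds are illustrative placeholders.

def coarse_similarity(voice_data, wake_word):
    """Stands in for the low-power DSP's rough similarity check."""
    return 0.9  # placeholder score

def speech_to_text(voice_data):
    """Stands in for the AP's speech recognition used for text verification."""
    return "my little k"  # placeholder transcription

def voiceprint_score(voice_data, voiceprint_model):
    """Stands in for scoring voice data against the registered voiceprint model."""
    return 0.8  # placeholder score

def try_wake_up(voice_data, wake_word, voiceprint_model,
                dsp_threshold=0.7, voiceprint_threshold=0.75):
    # Stage 1: DSP-level coarse match on the monitored audio.
    if coarse_similarity(voice_data, wake_word) < dsp_threshold:
        return False  # the AP is never notified

    # Stage 2: AP-level text verification ...
    if speech_to_text(voice_data) != wake_word:
        return False

    # ... followed by voiceprint verification against the registered model.
    if voiceprint_score(voice_data, voiceprint_model) < voiceprint_threshold:
        return False

    return True  # the phone can start the voice assistant
```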
  • in practice, however, the wake-up word is rarely re-registered (i.e., updated).
  • the wake-up word registered in the mobile phone is only the voice data recorded by the user in a particular noise scene and in the user's physical state at that time. Changes in the user's physical state and in the noise scene the user is in will affect the voice data the user produces. Therefore, when the user's physical state and/or noise scene changes, continuing to use the originally registered wake-up word for voice wake-up will reduce the voice wake-up rate of the mobile phone and increase its false wake-up rate.
  • Embodiments of the present application provide a method and a terminal for updating a wake-up voice of a voice assistant by a terminal, which can update a wake-up voice of the terminal in real time, thereby improving a voice wake-up rate of the terminal performing a voice wake-up and reducing a false wake-up rate.
  • an embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • the method may include: the terminal receives first voice data input by the user; the terminal judges whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal; if the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal authenticates the user; and if the identity authentication is passed, the terminal uses the first voice data to update the first voiceprint model in the terminal.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of the preset wakeup word.
  • the first voice data is the wake-up voice for the voice assistant, sent by a user who passed the identity authentication.
  • since the first voice data is user voice data acquired by the terminal in real time, the first voice data can reflect the user's physical state and/or the real-time condition of the noise scene in which the user is located.
  • using the first voice data to update the voiceprint model of the terminal can increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • the first voice data is automatically acquired by the terminal during the voice wake-up process performed by the terminal, instead of prompting the user to manually re-register the wake-up word and receiving user input.
  • using the first voice data to update the voiceprint model can also simplify the process of updating the wake word.
  • the terminal performs identity authentication on the user. Specifically, the terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data passes the voiceprint verification, it means that the identity authentication is passed.
  • in other words, when the terminal performs voice wake-up, it may obtain first voice data that passes both text verification and voiceprint verification, and then update the first voiceprint model in the terminal by using the first voice data.
  • the first voice data is user voice data acquired by the terminal in real time; therefore, the first voice data may reflect a user's physical state and / or a real-time condition of a noise scene in which the user is located.
  • the first voice data passes the text check and voiceprint check; therefore, updating the voiceprint model of the terminal by using the first voice data can improve the voice wake-up rate and reduce the false wake-up rate of the terminal performing voice wake-up.
  • the terminal may then start a voice assistant. After the voice assistant is started, the terminal may or may not receive a valid voice command through the voice assistant. The terminal may determine whether to use the first voice data to update the first voiceprint model by determining whether it has received a valid voice command.
  • the method in the embodiment of the present application further includes: when identity authentication is passed, the terminal starts a voice assistant; the terminal receives the second voice data through the voice assistant; and the terminal determines that the second voice data is a valid voice command. In this way, after the identity authentication is passed, if the terminal determines that the second voice data is a valid voice command, the terminal may use the first voice data to update the first voiceprint model in the terminal.
  • the terminal uses the first voice data to update the first voiceprint model in the terminal only after the voice assistant is activated and receives a valid voice command for triggering the terminal to perform a corresponding function. If the terminal's voice assistant starts and receives a valid voice command, it means that the voice wake-up is a valid voice wake-up that matches the user's intention.
  • the voiceprint model of the terminal is updated by using the voice data that can reflect the user's true intentions and can successfully wake up the terminal, which can further increase the voice wake-up rate of the terminal to perform voice wake-up and reduce the false wake-up rate.
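  • As a rough Python sketch of this gating logic (is_valid_command and update_voiceprint_model are assumed placeholders, not functions defined by the embodiments), the wake-up utterance is folded back into the model only after the started assistant receives a valid command.

```python
# Sketch of updating the voiceprint model only after a valid voice command.
# Both helper functions are assumed placeholders.

def is_valid_command(second_voice_data):
    """Assume the assistant can decide whether the follow-up utterance is a
    recognizable command such as "call Bob"."""
    return True  # placeholder

def update_voiceprint_model(model, voice_data):
    """Placeholder for regenerating the voiceprint model with the new sample."""
    return model

def after_successful_wake_up(first_voice_data, second_voice_data, model):
    # Identity authentication already passed and the assistant was started.
    if is_valid_command(second_voice_data):
        # The wake-up matched the user's real intention, so the wake-up
        # utterance is worth using to update the first voiceprint model.
        return update_voiceprint_model(model, first_voice_data)
    # Otherwise the first voice data is simply not used for the update.
    return model
```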
  • the terminal includes a coprocessor and a main processor. The terminal uses the coprocessor to monitor voice data; when the coprocessor detects first voice data whose similarity with the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal, and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data.
  • the coprocessor is a DSP and the main processor is an AP.
  • the terminal may use the first voiceprint model to perform voiceprint verification on the first voice data; if the first voice data fails the voiceprint verification, the terminal performs text verification on the voice data received within a first preset time; if, within the first preset time, the terminal receives the second voice data and at least one piece of voice data whose text matches the preset wake-up word, the terminal authenticates the user.
  • the text corresponding to the second voice data includes a preset keyword.
  • the second voice data may be voice data in which the user complains that the voice wake-up fails, such as "how to wake up", “how not”, “not responding", “unable to wake up", and "voice wake up failed".
  • that is, the terminal finds that the voiceprint verification of the first voice data fails. If the terminal subsequently receives, within the first preset time, at least one piece of voice data that passes text verification, it means that the user has repeatedly tried to wake up the voice assistant of the terminal by voice, but the voice wake-up has failed. In this case, if the terminal also receives the second voice data within the first preset time, it indicates that the user is dissatisfied with the result of the voice wake-up failure.
  • receiving the second voice data and at least one piece of voice data that passes text verification within the first preset time indicates that the user has a strong intention to wake up the voice assistant by voice; the repeated failures may occur because the user's current physical state differs greatly from the physical state in which the user registered the wake-up word. Because the received first voice data is voice data sent, with this strong intention, to wake up the voice assistant, updating the voiceprint model of the terminal with voice data that reflects the user's true intention can further increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
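  • This fallback path can be sketched as follows in Python. The complaint keyword list and the 10-second window are illustrative assumptions only; the embodiments simply require a first preset time and preset keywords.

```python
# Sketch of the fallback path: voiceprint verification failed, but within a first
# preset time the user repeats the wake word and also complains about the failure,
# so the terminal falls back to explicit identity authentication.

COMPLAINT_KEYWORDS = ["how to wake up", "not responding",
                      "unable to wake up", "voice wake up failed"]

def contains_complaint(text):
    return any(keyword in text for keyword in COMPLAINT_KEYWORDS)

def should_request_identity_auth(utterances, wake_word, window_s=10.0):
    """`utterances` is a list of (timestamp_s, transcribed_text) pairs received
    after the failed voiceprint check."""
    if not utterances:
        return False
    start = utterances[0][0]
    in_window = [text for ts, text in utterances if ts - start <= window_s]
    repeated_wake_word = any(text == wake_word for text in in_window)
    complained = any(contains_complaint(text) for text in in_window)
    # Both signals together suggest a strong, genuine intention to wake the assistant.
    return repeated_wake_word and complained
```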
  • the first voice data is voice data of the user acquired by the terminal in real time; therefore, the first voice data may reflect a user's physical state and / or a real-time condition of a noise scene in which the user is located. Therefore, using the first voice data to update the voiceprint model of the terminal can improve the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up. Further, the received first voice data is obtained automatically by the terminal during the voice wake-up process performed by the terminal, instead of prompting the user to manually re-register the wake-up word and receiving user input. In this way, updating the voiceprint model by using the received first voice data can also simplify the process of updating the wake word.
  • the terminal authenticates the user as follows: the terminal displays an identity authentication interface; the terminal receives the authentication information entered by the user on the authentication interface; and the terminal authenticates the user based on that authentication information.
  • the terminal includes a coprocessor and a main processor. The terminal uses the coprocessor to monitor voice data; when the coprocessor detects first voice data whose similarity with the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal, and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data.
  • the terminal uses the coprocessor to monitor the voice data within the first preset time, and notifies the main processor to determine whether the voice data received within the first preset time includes the second voice data and at least one piece of voice data that matches the text of the preset wake-up word, where the text corresponding to the second voice data contains a preset keyword.
  • the coprocessor is a DSP and the main processor is an AP.
  • the preset wake-up word includes at least two pieces of registered voice data, the at least two pieces of registered voice data are recorded when the terminal registers the preset wake-up word, and the first voiceprint model is generated based on the at least two pieces of registered voice data.
  • after the terminal generates a new voiceprint model according to the first voice data, directly replacing the first voiceprint model with the new voiceprint model can improve the voice wake-up rate of the terminal performing voice wake-up. However, directly replacing the first voiceprint model with a voiceprint model generated only from the new voice data (i.e., the first voice data) may greatly increase the voice wake-up rate and correspondingly increase the false wake-up rate of the terminal performing voice wake-up.
  • the method for the terminal to update the first voiceprint model in the terminal by using the first voice data may include: the terminal uses the first voice data to replace the third voice data among the at least two pieces of registered voice data to obtain the updated at least two pieces of registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data among the at least two pieces of registered voice data; the terminal generates a second voiceprint model according to the updated at least two pieces of registered voice data; and the terminal replaces the first voiceprint model with the second voiceprint model.
  • the second voiceprint model is used to characterize the voiceprint features of the at least two registered voice data after the update.
  • that is, the terminal uses the first voice data to replace only part of the at least two pieces of registered voice data (such as the third voice data), instead of generating the second voiceprint model entirely from the first voice data.
  • the voice wake-up rate of the terminal performing voice wake-up can be relatively stably improved.
  • the false wake-up rate of the terminal performing voice wake-up can be reduced.
  • the terminal may generate a second voiceprint threshold according to the second voiceprint model and the updated at least two pieces of registered voice data; if the difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold, the terminal replaces the first voiceprint model with the second voiceprint model.
  • otherwise, the terminal may delete the second voiceprint model and the first voice data; that is, the second voiceprint model is not used to replace the first voiceprint model.
  • rejecting the update when the difference between the second voiceprint threshold and the first voiceprint threshold is large prevents the voice wake-up rate of the terminal from fluctuating greatly, which would affect the user experience.
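  • The update strategy above (replace the lowest-quality registered recording, rebuild the model, and accept it only if the voiceprint threshold does not drift too far) can be sketched as follows. generate_voiceprint_model and compute_voiceprint_threshold are placeholders for whatever modeling the terminal actually uses; the numeric values are arbitrary.

```python
# Sketch of the conservative model update described above.
# The model/threshold helpers are placeholders, not the embodiments' algorithms.

def generate_voiceprint_model(recordings):
    """Placeholder for training a voiceprint model from the registered recordings."""
    return {"n_samples": len(recordings)}

def compute_voiceprint_threshold(model, recordings):
    """Placeholder: the threshold is derived from the model and the recordings."""
    return 0.75

def try_update(registered, first_voice_data, first_snr,
               old_model, old_threshold, first_preset_threshold=0.05):
    # `registered` is a list of (voice_data, snr) pairs recorded at registration.
    worst = min(range(len(registered)), key=lambda i: registered[i][1])
    updated = list(registered)
    updated[worst] = (first_voice_data, first_snr)  # replace the lowest-SNR sample

    samples = [voice for voice, _ in updated]
    second_model = generate_voiceprint_model(samples)
    second_threshold = compute_voiceprint_threshold(second_model, samples)

    if abs(second_threshold - old_threshold) < first_preset_threshold:
        return second_model, second_threshold, updated      # accept the update
    # Large drift: delete the second model and the new sample, keep the old model.
    return old_model, old_threshold, registered
```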
  • before the terminal updates the first voiceprint model with the first voice data, it may first determine whether a signal quality parameter of the first voice data is higher than a second preset threshold.
  • the signal quality parameters of the voice data are used to characterize the signal quality of the voice data.
  • the signal quality parameter of the voice data may be a signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, it means that the signal quality of the first voice data is relatively high.
  • the terminal may update the first voiceprint model by using the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal may delete the first voice data.
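  • A minimal sketch of this signal-quality gate follows; the 20 dB value is an arbitrary illustrative choice, not a value from the embodiments.

```python
# Only wake-up utterances whose signal quality (here, SNR) exceeds the second
# preset threshold are kept for updating the voiceprint model.

SECOND_PRESET_THRESHOLD_DB = 20.0  # illustrative value

def keep_for_update(first_voice_data, snr_db):
    if snr_db > SECOND_PRESET_THRESHOLD_DB:
        return first_voice_data   # high enough quality: use it for the update
    return None                   # otherwise the terminal deletes the sample
```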
  • an embodiment of the present application provides a terminal.
  • the terminal includes a storage unit, an input unit, a text verification unit, an identity authentication unit, and an update unit.
  • the storage unit stores a preset wake-up word registered in the terminal and a first voiceprint model.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of a preset wakeup word.
  • the input unit is configured to receive first voice data input by a user.
  • the text verification unit is configured to determine whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal.
  • An identity authentication unit is configured to authenticate the user if the text verification unit determines that the text corresponding to the first voice data matches the text of the preset wake-up word.
  • the updating unit is configured to: if the identity authentication unit determines that the identity authentication is passed, the terminal uses the first voice data to update the first voiceprint model in the terminal.
  • the identity authentication unit is specifically configured to: use the first voiceprint model to perform voiceprint verification on the first voice data; if the voiceprint verification is passed, the identity authentication passes .
  • the terminal further includes: a starting unit and a determining unit.
  • the starting unit is configured to start the voice assistant when the identity authentication unit determines that the identity authentication is passed.
  • the input unit is further configured to receive the second voice data through a voice assistant.
  • the determining unit is configured to determine that the second voice data received by the input unit is a valid voice command after the identity authentication unit passes the identity authentication.
  • An updating unit is configured to update the first voiceprint model with the first voice data after the determining unit determines that the second voice data is a valid voice command.
  • the terminal further includes: a voiceprint verification unit.
  • the voiceprint verification unit is configured to perform voiceprint verification on the first voice data using the first voiceprint model before the identity authentication unit authenticates the user.
  • the text verification unit is further configured to perform text verification on the voice data received by the input unit within a first preset time if the voiceprint verification unit determines that the first voice data fails the voiceprint verification.
  • the identity authentication unit is specifically configured to: if the text verification unit determines that the input unit receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time, authenticate the user.
  • the text corresponding to the second voice data includes a preset keyword.
  • the foregoing terminal further includes: a display unit.
  • the display unit is configured to display an authentication interface if the text verification unit determines that the input unit receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time.
  • the input unit is further configured to receive authentication information input by a user on an authentication interface displayed on the display unit.
  • the identity authentication unit is specifically configured to perform user identity verification on the user according to the identity verification information received by the input unit.
  • the preset wake-up word includes at least two pieces of registered voice data, the at least two pieces of registered voice data are recorded when the terminal registers the preset wake-up word, and the first voiceprint model is generated based on the at least two pieces of registered voice data.
  • the terminal also includes: a replacement unit and a generation unit.
  • the replacement unit is configured to replace the third voice data among the at least two pieces of registered voice data with the first voice data to obtain the updated at least two pieces of registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data among the at least two pieces of registered voice data.
  • a generating unit is configured to generate a second voiceprint model according to the updated at least two registered voice data obtained by the replacement unit.
  • the updating unit is configured to replace the first voiceprint model with the second voiceprint model generated by the generating unit, and the second voiceprint model is used to characterize the voiceprint features of the updated at least two registered voice data.
  • the storage unit is configured to save a first voiceprint threshold, and the first voiceprint threshold is generated by the generating unit according to the first voiceprint model and at least two registered voice data.
  • the generating unit is further configured to: after generating the second voiceprint model and before the updating unit replaces the first voiceprint model with the second voiceprint model, generate a second voiceprint threshold according to the second voiceprint model and the updated at least two pieces of registered voice data.
  • the updating unit is specifically configured to use a second voiceprint model to replace the first voiceprint model if the difference between the second voiceprint threshold and the first voiceprint threshold generated by the generating unit is less than the first preset threshold.
  • the foregoing terminal further includes: a deleting unit.
  • the deleting unit is configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold and the first voiceprint threshold generated by the generating unit is greater than or equal to the first preset threshold.
  • the update unit is specifically configured to update the first voiceprint model by using the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold.
  • the signal quality parameter of the first voice data includes a signal-to-noise ratio of the first voice data.
  • an embodiment of the present application provides a terminal.
  • the terminal may include a processor, a memory, and a display.
  • the memory, display and processor are coupled.
  • the display is used to display images generated by the processor.
  • the memory is used to store computer program code, related information of the voice assistant, preset wake-up words registered in the terminal, and the first voiceprint model.
  • the computer program code includes computer instructions.
  • when the processor executes the computer instructions, the processor is configured to: receive the first voice data input by the user; determine whether the text corresponding to the first voice data matches the text of the preset wake-up word; if the text corresponding to the first voice data matches the text of the preset wake-up word, authenticate the user; and if the identity authentication is passed, update the first voiceprint model by using the first voice data.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of the preset wakeup word.
  • the foregoing processor may be further configured to perform voiceprint verification on the first voice data using the first voiceprint model, where, if the voiceprint verification is passed, the identity authentication is passed.
  • the foregoing processor may also be used to: start the voice assistant when the identity authentication is passed; receive the second voice data through the voice assistant; after the identity authentication is passed, determine that the second voice data is a valid voice command; and after determining that the second voice data is a valid voice command, update the first voiceprint model in the terminal with the first voice data.
  • the processor includes a coprocessor and a main processor. The coprocessor monitors voice data; when the coprocessor detects first voice data whose similarity with the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal, and when it is determined that the text corresponding to the first voice data matches the text of the preset wake-up word, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data.
  • the processor is further configured to: perform voiceprint verification on the first voice data using the first voiceprint model before performing user identity authentication; if the first voice data does not pass the voiceprint verification, perform text verification on the voice data received within the first preset time; and if the second voice data and at least one piece of voice data whose text matches the preset wake-up word are received within the first preset time, authenticate the user.
  • the text corresponding to the second voice data includes a preset keyword.
  • the processor is further configured to: if the second voice data and at least one piece of voice data whose text matches the preset wake-up word are received within the first preset time, control the display to display the identity authentication interface.
  • the processor is further configured to receive the authentication information input by the user on the authentication interface displayed on the display; and perform user authentication on the user according to the authentication information.
  • the foregoing processor includes a coprocessor and a main processor. The coprocessor monitors voice data; when the coprocessor detects first voice data whose similarity with the preset wake-up word satisfies a preset condition, it notifies the main processor to determine whether the text corresponding to the first voice data matches the text of the preset wake-up word of the terminal, and when the text matches, the main processor uses the first voiceprint model to perform voiceprint verification on the first voice data.
  • the coprocessor monitors the voice data within the first preset time, and notifies the main processor to determine whether the voice data received within the first preset time includes the second voice data and at least one piece of voice data that matches the text of the preset wake-up word, where the text corresponding to the second voice data contains a preset keyword.
  • the preset wake-up word stored in the memory includes at least two pieces of registered voice data, the at least two pieces of registered voice data are recorded when the processor registers the preset wake-up word, and the first voiceprint model is generated by the processor based on the at least two pieces of registered voice data.
  • the processor is further configured to: use the first voice data to replace the third voice data among the at least two pieces of registered voice data to obtain the updated at least two pieces of registered voice data, where the signal quality parameter of the third voice data is lower than the signal quality parameters of the other voice data among the at least two pieces of registered voice data; generate a second voiceprint model based on the updated at least two pieces of registered voice data; and replace the first voiceprint model with the second voiceprint model, where the second voiceprint model is used to characterize the voiceprint features of the updated at least two pieces of registered voice data.
  • a first voiceprint threshold is also stored in the memory, and the first voiceprint threshold is generated by the processor according to the first voiceprint model and at least two registered voice data.
  • the processor is further configured to: after generating the second voiceprint model and before replacing the first voiceprint model with the second voiceprint model, generate a second voiceprint threshold according to the second voiceprint model and the updated at least two pieces of registered voice data; and if the difference between the second voiceprint threshold and the first voiceprint threshold is less than the first preset threshold, replace the first voiceprint model with the second voiceprint model.
  • the processor is further configured to delete the second voiceprint model and the first voice data if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold.
  • the processor is further configured to update the first voiceprint model by using the first voice data if the signal quality parameter of the first voice data is higher than a second preset threshold.
  • the signal quality parameter of the first voice data includes a signal-to-noise ratio of the first voice data.
  • an embodiment of the present application provides a computer storage medium, where the computer storage medium includes computer instructions, and when the computer instructions are run on a terminal, the terminal is caused to execute the method according to the first aspect and any one of its possible designs.
  • an embodiment of the present application provides a computer program product, and when the computer program product runs on a computer, the computer is caused to execute the method according to the first aspect and any one of possible design manners.
  • FIG. 1 is a first schematic diagram of a display interface example of a terminal according to an embodiment of the present application
  • FIG. 2 is a second schematic diagram of a display interface example of a terminal according to an embodiment of the present application.
  • FIG. 3 is a schematic structural diagram of a hardware structure of a terminal according to an embodiment of the present application.
  • FIG. 4A is a first flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;
  • FIG. 4B is a third schematic diagram of a display interface example of a terminal according to an embodiment of the present application.
  • FIG. 5A is a second flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;
  • FIG. 5B is a fourth schematic view of an example of a display interface of a terminal according to an embodiment of the present application.
  • FIG. 6 is a third flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application
  • FIG. 7 is a fourth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application
  • FIG. 8 is a fifth flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application
  • FIG. 9 is a flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application.
  • FIG. 10 is a fifth schematic diagram of a display interface example of a terminal according to an embodiment of the present application.
  • FIG. 11 is a seventh flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;
  • FIG. 12 is a flowchart of a method for a terminal to update a wake-up voice of a voice assistant according to an embodiment of the present application;
  • FIG. 13 is a first schematic structural composition diagram of a terminal according to an embodiment of the present application.
  • FIG. 14 is a second schematic diagram of the structure and composition of a terminal according to an embodiment of the present application.
  • FIG. 15 is a third schematic structural diagram of a terminal according to an embodiment of the present application.
  • Embodiments of the present application provide a method and a terminal for updating a wake-up voice of a voice assistant by a terminal, which can be applied to a process in which the terminal performs a voice wake-up in response to voice data input by a user.
  • the terminal Before performing the voice wake-up, the terminal may receive a preset wake-up word registered by the user.
  • the preset wake-up word is used to wake up the voice assistant in the terminal, so that the terminal can provide the user with voice control services through the voice assistant.
  • the wake-up voice assistant described in the embodiments of the present application means that the terminal starts the voice assistant in response to the voice data sent by the user.
  • the voice control service means that after the terminal's voice assistant is started, the user can trigger the terminal to execute a corresponding event by sending a voice command (ie, voice data) to the voice assistant.
  • the preset wake-up word in the embodiment of the present application is a piece of voice data.
  • the voice data is a wake-up voice used to wake up the voice assistant.
  • the voice assistant may be an application (Application, APP) installed in the terminal.
  • the voice assistant may be an embedded application in the terminal (that is, a system application of the terminal) or a downloadable application.
  • the embedded application program is an application program provided as part of a terminal (such as a mobile phone) implementation.
  • the embedded application can be a Settings application, a Short Message application, a Camera application, and so on.
  • a downloadable application is an application that can provide its own Internet Protocol Multimedia Subsystem (IMS) connection.
  • the downloadable application may be an application pre-installed in the terminal, or may be a third-party application that the user downloads and installs in the terminal.
  • the downloadable application may be a "WeChat” application, an "Alipay” application, a "Mail” application, and the like.
  • the mobile phone 100 shown in FIG. 1 is used as an example to describe the process of registering a preset wake-up word by a terminal:
  • the mobile phone 100 can receive a user's tap operation on the "Settings" application icon.
  • the mobile phone 100 may display the setting interface 101 shown in (a) in FIG. 1.
  • the setting interface 101 may include an "airplane mode" option, a "WLAN" option, a "Bluetooth" option, a "mobile network" option, a "smart assistance" option 102, and the like.
  • for the "airplane mode" option, the "WLAN" option, the "Bluetooth" option, and the "mobile network" option, reference may be made to the descriptions in conventional technologies, which are not repeated here in the embodiments of the present application.
  • the mobile phone 100 may receive a user's tap operation on the "smart assistance" option 102.
  • the mobile phone 100 may display the smart assistance interface 103 shown in (b) of FIG. 1.
  • the smart assistant interface 103 includes a "gesture control” option 104 and a "voice control” option 105.
  • the “gesture control” option 104 is used to manage a user gesture that triggers the mobile phone 100 to execute a corresponding event.
  • the “voice control” option 105 is used to manage a voice wake-up function of the mobile phone 100.
  • the mobile phone 100 may receive a user's click operation on the “voice control” option 105, and the mobile phone 100 may display the voice control interface 106 shown in (c) of FIG. 1.
  • the voice control interface 106 includes a "voice wakeup” option 107 and an "incoming voice control” option 108.
  • the “voice wakeup” option 107 is used to enable or disable the voice wakeup function of the mobile phone 100.
  • the "caller voice control” option 108 is used to trigger the mobile phone 100 to enable or disable the voice wake-up function when the mobile phone 100 receives an incoming call.
  • assume the "incoming voice control" option 108 of the mobile phone 100 is turned on.
  • when the mobile phone 100 receives an incoming call from another terminal and issues a call reminder, if the mobile phone 100 recognizes the voice data "answer the call" entered by the owner, the mobile phone 100 can automatically answer the call; if the mobile phone 100 recognizes the voice data "hang up the phone" entered by the owner, the mobile phone 100 can automatically reject the call.
  • the mobile phone 100 may receive a user's tap operation on the "voice wakeup" option 107.
  • the mobile phone 100 may display the voice wakeup interface 109 shown in (d) of FIG. 1.
  • the voice wakeup interface 109 includes a "voice wakeup” switch 110, a "find a phone” option 111, a "how to make a call” option 112, a “wake word” option 113, and the like.
  • the “voice wakeup” switch 110 is used to trigger the mobile phone 100 to enable or disable the voice wakeup function.
  • the "Find a phone” option 111 and the "How to make a call” option 112 are used to instruct the voice control function of the mobile phone 100 after the voice assistant of the mobile phone 100 is activated.
  • the "Find a mobile phone” option 111 is used to indicate that after the voice assistant of the mobile phone 100 is activated, the voice assistant of the mobile phone 100 can respond to the user's voice data "Where are you?" To respond to the user to facilitate the user to find the mobile phone 100.
  • the "how to make a call” option 112 is used to indicate that the voice assistant of the mobile phone 100 can automatically make a call to the contact Bob in response to the user's voice data "call Bob" after the voice assistant of the mobile phone 100 is activated.
  • the “wake word” option 113 is used to register a wake up word for the mobile phone 100 to wake up the mobile phone 100 (such as the voice assistant of the mobile phone 100). Before the user has registered a custom wake-up word in the mobile phone 100, the mobile phone 100 may indicate a default wake-up word to the user. For example, it is assumed that the default wake-up word of the mobile phone 100 is "my little k".
  • the mobile phone 100 may receive a user's tap operation on the "wake word" option 113 shown in (d) in FIG. 1.
  • the mobile phone 100 may display the default wake word registration interface 201 shown in (a) of FIG. 2.
  • the default wakeup word registration interface 201 may include a recording progress bar 202, a "custom wakeup word” option 203, a "microphone” option 204, and a recording prompt message 205.
  • the “microphone” option 204 is used to trigger the mobile phone 100 to start recording voice data as the wake-up word.
  • the recording progress bar 202 is used to display the progress of the mobile phone 100 recording the wake-up word.
  • the recording prompt information 205 is used to indicate a default wake-up word of the mobile phone 100.
  • the recording prompt information 205 may be "Please help the mobile phone to learn the wake word (my little k); tap and say 'my little k'".
  • the default wake-up word registration interface 201 may further include a recording prompt message "Please record in a quiet environment, about 30 cm away from the mobile phone!".
  • the default wake-up word registration interface 201 further includes a "Cancel” button 206 and an "OK” button 207.
  • the “OK” button 207 is used to trigger the mobile phone 100 to save the recorded wake-up word.
  • the “Cancel” button 206 is used to trigger the mobile phone to cancel the registration of the wake-up word, and display the voice wake-up interface 109 shown in (d) of FIG. 1.
  • the mobile phone 100 can start recording voice data input by the user. After receiving the voice data (recorded as voice data 1) input by the user, the mobile phone 100 can determine whether the voice data 1 meets a preset condition. If the voice data 1 does not satisfy the preset condition, the mobile phone 100 may delete the voice data 1 and re-display the default wake-up word registration interface 201 shown in (a) of FIG. 2. If the voice data 1 meets a preset condition, the mobile phone 100 can save the voice data 1.
  • the voice data 1 meeting the preset condition may specifically mean that the text information corresponding to the voice data 1 is the text information of the default wake-up word "my little k", and that the signal-to-noise ratio of the voice data 1 is higher than a preset threshold.
  • after the mobile phone 100 receives the voice data 1 input by the user that meets the preset conditions, it can generate, based on that voice data 1, a voiceprint model used for voiceprint verification when the voice assistant is awakened, and generate a voiceprint threshold according to the voiceprint model.
  • the voiceprint model can characterize the voiceprint characteristics of wake words registered by the user.
  • the voiceprint model is equivalent to a function. Different voiceprint models can be generated based on different speech data. That is, the mobile phone 100 can generate different voiceprint models according to different wake-up words registered by the same user. Different users registering the same wake-up word with the mobile phone 100 can also generate different voiceprint models.
  • the mobile phone 100 may use the voice data 1 (that is, the voice data that meets the preset conditions and is input by the user when registering the wake-up word) as an input value and substitute it into the voiceprint model to obtain a voiceprint value (such as the voiceprint value a).
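  • Registration can be sketched as follows in Python; the helper functions (transcribe, snr_db, build_voiceprint_model, voiceprint_value) and the numeric values are assumptions for illustration, not the actual signal processing of the mobile phone 100.

```python
# Sketch of wake-word registration as described above: accept a recording only if
# its transcription equals the wake word and its SNR is above a preset threshold,
# build a voiceprint model from it, then substitute the recording back into the
# model to obtain the voiceprint value "a" used later as the voiceprint threshold.

def transcribe(voice_data):
    return "my little k"                # placeholder transcription

def snr_db(voice_data):
    return 25.0                         # placeholder SNR estimate

def build_voiceprint_model(samples):
    return {"n_samples": len(samples)}  # placeholder model

def voiceprint_value(model, voice_data):
    return 0.8                          # placeholder voiceprint value

def register_wake_word(voice_data_1, wake_word="my little k", min_snr_db=15.0):
    if transcribe(voice_data_1) != wake_word or snr_db(voice_data_1) <= min_snr_db:
        return None, None               # reject and prompt the user to record again
    model = build_voiceprint_model([voice_data_1])
    a = voiceprint_value(model, voice_data_1)  # voiceprint value a = threshold
    return model, a
```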
  • the terminal can record multiple voice data that meet preset conditions.
  • the terminal may generate a voiceprint model for performing voiceprint verification when the voice assistant is awakened based on a plurality of voice data satisfying preset conditions.
  • after the voice data 1 meets the preset condition and is saved, the mobile phone 100 may prompt the user to record voice data again.
  • the “custom wake word” option 203 is used to trigger the mobile phone 100 to display a wake word input interface.
  • the mobile phone 100 may display the wake-up word input interface 208 shown in (b) of FIG. 2 in response to a user's tap operation on the "custom wake-up word" option 203 shown in (a) of FIG. 2.
  • the wake-up word input interface 208 may include a “cancel” button 209, an “OK” button 210, a “wake-up word input box” 211, and a wake-up word suggestion 212.
  • the "Cancel" button 209 is used to trigger the mobile phone to cancel the customized wake-up word and display the default wake-up word registration interface 201 shown in (a) of FIG. 2.
  • the “wake word input box” 211 is used to receive a custom wake word input by a user.
  • the "OK” button 210 is used to save a custom wake-up word entered by the user in the "wake-up word input box” 211.
  • the wake-up word suggestion 212 is used to prompt the user of the mobile phone's request for a custom wake-up word.
  • the mobile phone 100 may display the custom wake-up word registration interface 213 shown in (d) of FIG. 2 in response to a user's tap operation on the "OK" button 210 shown in (c) of FIG. 2, so that the user can register a custom wake-up word on the custom wake-up word registration interface 213.
  • the method for a user to register a custom wake-up word on the custom wake-up word registration interface 213 is the same as the method for registering a default wake-up word on the default wake-up word registration interface 201, which is not described in the embodiment of the present application.
  • the mobile phone 100 may display the customized wake-up word registration interface 216 shown in (d) of FIG. 2.
  • the above-mentioned intelligent assistance may be referred to as an auxiliary function
  • the above-mentioned voice control may be referred to as a voice assistant
  • the above-mentioned voice wakeup may be referred to as a wake-up function.
  • the manner in which the user triggers the terminal to display the wake-up word registration interface includes, but is not limited to, the user operating "Settings - Smart assistance - Voice control - Voice wakeup - Wake word".
  • for example, the manner in which the user triggers the terminal to display the wake-up word registration interface may alternatively be "Settings - Voice assistant - Voice wakeup - Wake word".
  • the following takes the default wake-up word "my little k" of the mobile phone 100 as an example to describe the voice wake-up process of the mobile phone 100:
  • when the DSP of the mobile phone 100 monitors voice data (recorded as voice data 2) whose similarity with the wake-up word satisfies the condition, the monitored voice data 2 may be delivered to the AP.
  • the AP performs text verification on the voice data 2.
  • the AP may use the voice data 2 as an input value and substitute it into the voiceprint model of the mobile phone 100 to obtain a voiceprint value (voiceprint value b). If the difference between the voiceprint value b and the voiceprint threshold (ie, the voiceprint value a) is less than a preset threshold, the AP may determine that the voice data 2 matches the wake-up word registered by the user.
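  • The comparison performed by the AP can be sketched in the same style; voiceprint_value and the 0.1 margin are illustrative placeholders. Voice data 2 matches the registered wake-up word if its voiceprint value b is close enough to the stored voiceprint value a.

```python
# Sketch of the AP-side voiceprint comparison during voice wake-up.

def voiceprint_value(model, voice_data):
    """Placeholder for substituting voice data into the voiceprint model."""
    return 0.8

def matches_registered_wake_word(model, a, voice_data_2, preset_threshold=0.1):
    b = voiceprint_value(model, voice_data_2)      # voiceprint value b
    return abs(b - a) < preset_threshold           # match if b is close to a
```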
  • some mobile phones can periodically remind the user to re-register the wake-up word.
  • the process of manually registering the wake-up word is cumbersome, and the manual registration of the wake-up word multiple times will waste the user's time and affect the user experience.
  • the terminal may obtain a valid wake-up word in the process of performing a voice wake-up, and the terminal uses the valid wake-up word to update the registered wake-up word of the user.
  • the effective wake-up word in the embodiment of the present application may include voice data of a terminal that is successfully awakened.
  • the terminal automatically obtains a valid wake-up word to update the registered wake-up word of the user, which can omit the tedious operation of the user when manually re-registering the wake-up word.
  • the principle of the method for the terminal to update the wake-up voice of the voice assistant is as follows: since the effective wake-up word is voice data obtained by the terminal during the voice wake-up process, the effective wake-up word is voice data related to the user's current physical state and the noise scene the user is currently in. Moreover, since the effective wake-up word can successfully wake up the terminal, the degree of matching between the effective wake-up word and the wake-up word registered by the user satisfies the condition for voice wake-up.
  • therefore, if the terminal uses the effective wake-up word to update the wake-up word registered by the user and then uses the updated wake-up word for voice wake-up, the wake-up can adapt to the user's physical state and/or the noise scene in which the user is located, which further increases the voice wake-up rate and reduces the false wake-up rate when the terminal performs voice wake-up.
  • the terminal in the embodiments of the present application may be a portable computer (such as a mobile phone), a notebook computer, a personal computer (PC), a wearable electronic device (such as a smart watch), a tablet computer, augmented reality (AR) / virtual reality (VR) equipment, an on-board computer, and the like; the following embodiments do not specifically limit the specific form of the terminal.
  • FIG. 3 shows a structural block diagram of a terminal 300 provided by an embodiment of the present application.
  • the terminal 300 may include a processor 310, an external memory interface 320, an internal memory 321, a USB interface 330, a charge management module 340, a power management module 341, a battery 342, an antenna 1, an antenna 2, a radio frequency module 350, a communication module 360, Audio module 370, speaker 370A, receiver 370B, microphone 370C, headphone interface 370D, sensor module 380, button 390, motor 391, indicator 392, camera 393, display 394, and SIM card interface 395.
  • the sensor module can include pressure sensor 380A, gyroscope sensor 380B, barometric pressure sensor 380C, magnetic sensor 380D, acceleration sensor 380E, distance sensor 380F, proximity light sensor 380G, fingerprint sensor 380H, temperature sensor 380J, touch sensor 380K, ambient light sensor 380L, bone conduction sensor, etc.
  • the terminal 300 shown in FIG. 3 is only an example of the terminal.
  • the structure shown in FIG. 3 does not limit the terminal 300. It may include more or fewer parts than shown, or some parts may be combined, or some parts may be split, or different parts may be arranged.
  • the illustrated components can be implemented in hardware, software, or a combination of software and hardware.
  • the processor 310 may include one or more processing units.
  • the processor 310 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a DSP, a baseband processor, and/or a neural network processing unit (NPU), etc.
  • different processing units can be independent devices or integrated in the same processor.
  • the DSP can monitor the voice data in real time.
  • the voice data can be handed over to the AP.
  • the AP performs text verification and voiceprint verification on the voice data.
  • the terminal can start the voice assistant.
  • the controller may be a decision maker that directs the various components of the terminal 300 to coordinate work according to the instructions. It is the nerve center and command center of the terminal 300.
  • the controller generates operation control signals according to the instruction operation code and timing signals, and completes the control of fetching and executing the instructions.
  • the processor 310 may further include a memory for storing instructions and data.
  • the memory in the processor is a cache memory. It can hold instructions or data that the processor has just used or will reuse. If the processor needs to use the instructions or data again, they can be called directly from this memory, which avoids repeated accesses, reduces the processor's waiting time, and improves system efficiency.
  • the processor 310 may include an interface.
  • the interface may include an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface, and/or a universal serial bus (USB) interface.
  • the I2C interface is a two-way synchronous serial bus, including a serial data line (SDA) and a serial clock line (SCL).
  • the processor may include multiple sets of I2C buses.
  • the processor can be coupled to touch sensors, chargers, flashes, cameras, etc. through different I2C bus interfaces.
  • the processor may couple the touch sensor through the I2C interface, so that the processor and the touch sensor communicate through the I2C bus interface to implement the touch function of the terminal 300.
  • the I2S interface can be used for audio communication.
  • the processor may include multiple sets of I2S buses.
  • the processor may be coupled to the audio module through an I2S bus to implement communication between the processor and the audio module.
  • the audio module can transmit audio signals to the communication module through the I2S interface, so as to implement the function of receiving calls through a Bluetooth headset.
  • the PCM interface can also be used for audio communications, sampling, quantizing, and encoding analog signals.
  • the audio module and the communication module may be coupled through a PCM bus interface.
  • the audio module can also transmit audio signals to the communication module through the PCM interface, so as to implement the function of receiving calls through a Bluetooth headset. Both the I2S interface and the PCM interface can be used for audio communication, and the sampling rates of the two interfaces are different.
  • the UART interface is a universal serial data bus for asynchronous communication. This bus is a two-way communication bus. It converts the data to be transferred between serial and parallel communications.
  • a UART interface is typically used to connect the processor and the communication module 360.
  • the processor communicates with the Bluetooth module through a UART interface to implement the Bluetooth function.
  • the audio module can transmit audio signals to the communication module through the UART interface, so as to implement the function of playing music through a Bluetooth headset.
  • the MIPI interface can be used to connect processors with peripheral devices such as displays, cameras, etc.
  • the MIPI interface includes a camera serial interface (CSI), a display serial interface (DSI), and the like.
  • the processor and the camera communicate through a CSI interface to implement a shooting function of the terminal 300.
  • the processor and the display screen communicate through a DSI interface to implement a display function of the terminal 300.
  • the GPIO interface can be configured by software.
  • the GPIO interface can be configured as a control signal or as a data signal.
  • the GPIO interface may be used to connect the processor with a camera, a display screen, a communication module, an audio module, a sensor, and the like.
  • GPIO interface can also be configured as I2C interface, I2S interface, UART interface, MIPI interface, etc.
  • the USB interface 330 may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface can be used to connect a charger to charge the terminal 300, and can also be used to transfer data between the terminal 300 and a peripheral device. It can also be used to connect headphones and play audio through headphones. It can also be used to connect other electronic devices, such as AR devices.
  • the interface connection relationship between the modules illustrated in the embodiments of the present application is only a schematic description, and does not constitute a limitation on the structure of the terminal 300.
  • the terminal 300 may use different interface connection modes or a combination of multiple interface connection modes in the embodiments of the present application.
  • the charging management module 340 is configured to receive a charging input from a charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module may receive a charging input of a wired charger through a USB interface.
  • the charging management module may receive a wireless charging input through a wireless charging coil of the terminal 300. While the charging management module is charging the battery, it can also supply power to the terminal device through the power management module 341.
  • the power management module 341 is used to connect the battery 342, the charge management module 340, and the processor 310.
  • the power management module receives inputs from the battery and / or charge management module, and supplies power to a processor, an internal memory, an external memory, a display screen, a camera, and a communication module.
  • the power management module can also be used to monitor battery capacity, battery cycle times, battery health (leakage, impedance) and other parameters.
  • the power management module 341 may also be disposed in the processor 310.
  • the power management module 341 and the charge management module may also be provided in the same device.
  • the wireless communication function of the terminal 300 may be implemented by the antenna module 1, the antenna module 2, the radio frequency module 350, the communication module 360, a modem, and a baseband processor.
  • the antenna 1 and the antenna 2 are used for transmitting and receiving electromagnetic wave signals.
  • Each antenna in the terminal 300 may be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization. For example, a cellular network antenna can be multiplexed into a wireless LAN diversity antenna. In some embodiments, the antenna may be used in conjunction with a tuning switch.
  • the radio frequency module 350 may provide a communication processing module for a wireless communication solution including 2G / 3G / 4G / 5G and the like applied on the terminal 300. It may include at least one filter, switch, power amplifier, Low Noise Amplifier (LNA), and the like.
  • the radio frequency module receives electromagnetic waves from the antenna 1, and processes the received electromagnetic waves by filtering, amplifying, etc., and transmitting them to the modem for demodulation.
  • the radio frequency module can also amplify the signal modulated by the modem and turn it into electromagnetic wave radiation through the antenna 1.
  • at least part of the functional modules of the radio frequency module 350 may be disposed in the processor 310. In some embodiments, at least part of the functional modules of the radio frequency module 350 may be provided in the same device as at least part of the modules of the processor 310.
  • the modem may include a modulator and a demodulator.
  • the modulator is used for modulating the low-frequency baseband signal to be transmitted into a medium-high frequency signal.
  • the demodulator is used to demodulate the received electromagnetic wave signal into a low-frequency baseband signal.
  • the demodulator then transmits the demodulated low-frequency baseband signal to the baseband processor for processing.
  • the low-frequency baseband signal is processed by the baseband processor and then passed to the application processor.
  • the application processor outputs sound signals through audio equipment (not limited to speakers, receivers, etc.), or displays images or videos through a display screen.
  • the modem may be a separate device.
  • the modem may be independent of the processor and disposed in the same device as the radio frequency module or other functional modules.
  • the communication module 360 can provide wireless communication solutions applied to the terminal 300, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the communication module 360 may be one or more devices that integrate at least one communication processing module.
  • the communication module receives the electromagnetic wave through the antenna 2, frequency-modulates and filters the electromagnetic wave signal, and sends the processed signal to the processor.
  • the communication module 360 may also receive a signal to be transmitted from the processor, frequency-modulate it, amplify it, and turn it into electromagnetic wave radiation through the antenna 2.
  • the antenna 1 of the terminal 300 is coupled to a radio frequency module, and the antenna 2 is coupled to a communication module 360.
  • the wireless communication technology may include Global System for Mobile Communications (GSM), General Packet Radio Service (GPRS), Code Division Multiple Access (CDMA), Wideband Code Division Multiple Access (WCDMA), Time-Division Synchronous Code Division Multiple Access (TD-SCDMA), Long Term Evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology.
  • the GNSS may include a Global Positioning System (GPS), a Global Navigation Satellite System (GLONASS), a BeiDou Navigation Satellite System (BDS), a Quasi-Zenith Satellite System (QZSS), and/or a Satellite Based Augmentation System (SBAS).
  • the terminal 300 implements a display function through a GPU, a display screen 394, and an application processor.
  • the GPU is a microprocessor for image processing, which connects the display screen and the application processor.
  • the GPU is used to perform mathematical and geometric calculations for graphics rendering.
  • the processor 310 may include one or more GPUs that execute program instructions to generate or change display information.
  • the display 394 is used to display images, videos, and the like.
  • the display includes a display panel.
  • the display panel can use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, quantum dot light-emitting diodes (QLED), and the like.
  • the terminal 300 may include one or N display screens, where N is a positive integer greater than 1.
  • the terminal 300 can implement a shooting function through an ISP, a camera 393, a video codec, a GPU, a display screen, and an application processor.
  • the ISP is used to process data fed back by the camera. For example, when taking a picture, the shutter is opened and light is transmitted to the photosensitive element of the camera through the lens; the optical signal is converted into an electrical signal, and the photosensitive element of the camera passes the electrical signal to the ISP for processing, converting it into an image visible to the naked eye. The ISP can also optimize the noise, brightness, and skin tone of the image, and optimize the exposure, color temperature, and other parameters of the shooting scene. In some embodiments, an ISP may be provided in the camera 393.
  • the camera 393 is used to capture still images or videos.
  • An object generates an optical image through a lens and projects it onto a photosensitive element.
  • the photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • the ISP outputs digital image signals to the DSP for processing.
  • DSP converts digital image signals into image signals in standard RGB, YUV and other formats.
  • the terminal 300 may include one or N cameras, where N is a positive integer greater than 1.
  • a digital signal processor is used to process digital signals. In addition to digital image signals, it can also process other digital signals. For example, when the terminal 300 selects a frequency point, the digital signal processor is used to perform a Fourier transform on the frequency-point energy and the like.
  • Video codecs are used to compress or decompress digital video.
  • the terminal 300 may support one or more codecs. In this way, the terminal 300 can play or record videos in multiple encoding formats, such as: MPEG1, MPEG2, MPEG3, MPEG4, and so on.
  • NPU is a neural network (Neural-Network, NN) computing processor.
  • the NPU can quickly process input information and continuously learn.
  • applications such as intelligent cognition of the terminal 300, for example image recognition, face recognition, speech recognition, and text understanding, can be implemented through the NPU.
  • the external memory interface 320 may be used to connect an external memory card, such as a Micro SD card, to realize the expansion of the storage capacity of the terminal 300.
  • the external memory card communicates with the processor through an external memory interface to implement a data storage function. For example, save music, videos and other files on an external memory card.
  • the internal memory 321 may be used to store computer executable program code, where the executable program code includes instructions.
  • the processor 310 executes various functional applications and data processing of the terminal 300 by running instructions stored in the internal memory 321.
  • the memory 321 may include a storage program area and a storage data area.
  • the storage program area may store an operating system, at least one application required by a function (such as a sound playback function, an image playback function, etc.) and the like.
  • the storage data area can store data (such as audio data, phone book, etc.) created during the use of the terminal 300.
  • the data (such as audio data, phone book, etc.) created during the use of the terminal 300 may be referred to as user data.
  • the internal memory 321 may include a high-speed random access memory (RAM) and a read-only memory (ROM), and may also include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, another non-volatile solid-state storage device, a universal flash storage (UFS), or the like.
  • the internal memory 321 includes the data partition described in the embodiments of the present application.
  • the data partition stores files or data that need to be read and written when the operating system starts, and user data created during terminal use.
  • the data partition may be a storage area set in advance in the internal memory 321.
  • the data partition may be contained in a RAM in the internal memory 321.
  • the virtual data partition in the embodiment of the present application may be a storage area of the RAM in the internal memory 321.
  • the virtual data partition may be a storage area of a ROM in the internal memory 321.
  • the virtual data partition may be an external memory card connected to the external memory interface 320, such as a Micro SD card.
  • the terminal 300 can implement audio functions through an audio module 370, a speaker 370A, a receiver 370B, a microphone 370C, a headphone interface 370D, and an application processor. Such as music playback, recording, etc.
  • the audio module is used to convert digital audio information into an analog audio signal output, and is also used to convert an analog audio input into a digital audio signal.
  • the audio module can also be used to encode and decode audio signals.
  • the audio module may be disposed in the processor 310, or some functional modules of the audio module may be disposed in the processor 310.
  • the speaker 370A, also called a "horn", is used to convert an audio electrical signal into a sound signal.
  • the terminal 300 can listen to music through a speaker or listen to a hands-free call.
  • the receiver 370B, also known as the "earpiece", is used to convert an audio electrical signal into a sound signal.
  • when the terminal 300 answers a call or listens to a voice message, the receiver can be held close to the human ear to hear the voice.
  • the microphone 370C, also called a "mic", is used to convert a sound signal into an electrical signal.
  • the user can speak with the mouth close to the microphone to input a sound signal into the microphone.
  • the terminal 300 may be provided with at least one microphone.
  • the terminal 300 may be provided with two microphones, and in addition to collecting sound signals, it may also implement a noise reduction function.
  • the terminal 300 may further be provided with three, four, or more microphones to collect sound signals, reduce noise, and also identify sound sources, and implement a directional recording function.
  • the headset interface 370D is used to connect a wired headset.
  • the earphone interface can be a USB interface, or it can be a 3.5 mm Open Mobile Terminal Platform (OMTP) standard interface or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
  • the pressure sensor 380A is used to sense the pressure signal, and can convert the pressure signal into an electrical signal.
  • the pressure sensor may be disposed on the display screen.
  • the capacitive pressure sensor may be at least two parallel plates having a conductive material. When a force is applied to the pressure sensor, the capacitance between the electrodes changes.
  • the terminal 300 determines the intensity of the pressure according to the change in capacitance.
  • the terminal 300 detects the intensity of the touch operation according to a pressure sensor.
  • the terminal 300 may also calculate the touched position based on the detection signal of the pressure sensor.
  • touch operations acting on the same touch position but different touch operation intensities may correspond to different operation instructions. For example, when a touch operation with a touch operation intensity lower than the first pressure threshold is applied to the short message application icon, an instruction for viewing the short message is executed. When a touch operation with a touch operation intensity greater than or equal to the first pressure threshold is applied to the short message application icon, an instruction for creating a short message is executed.
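  • As an illustration of the intensity-threshold logic above, the following sketch maps touch intensity to an operation instruction. It is a minimal sketch, assuming a normalized intensity scale; the threshold value, function name, and instruction strings are illustrative, not values defined by this application.

```python
# Minimal sketch of intensity-dependent touch handling; the threshold value and
# instruction strings are assumptions for illustration only.
FIRST_PRESSURE_THRESHOLD = 0.5  # assumed normalized touch intensity

def handle_touch_on_sms_icon(touch_intensity: float) -> str:
    # A light press (below the first pressure threshold) views the message;
    # a press at or above the threshold creates a new message.
    if touch_intensity < FIRST_PRESSURE_THRESHOLD:
        return "view short message"
    return "create new short message"
```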
  • the gyro sensor 380B may be used to determine a motion posture of the terminal 300.
  • the angular velocity of the terminal 300 around three axes may be determined by a gyro sensor.
  • a gyroscope sensor can be used for image stabilization. Exemplarily, when the shutter is pressed, the gyro sensor detects the angle at which the terminal 300 shakes, calculates, according to the angle, the distance that the lens module needs to compensate, and allows the lens to counteract the shake of the terminal 300 through reverse movement, so as to achieve image stabilization.
  • the gyroscope sensor can also be used for navigation and somatosensory game scenes.
  • the barometric pressure sensor 380C is used to measure air pressure.
  • the terminal 300 calculates the altitude through the air pressure value measured by the air pressure sensor to assist in positioning and navigation.
  • the magnetic sensor 380D includes a Hall sensor.
  • the terminal 300 can detect the opening and closing of the flip leather case by using a magnetic sensor.
  • the terminal 300 may detect the opening and closing of the flip according to a magnetic sensor. Further, according to the opened and closed state of the holster or the opened and closed state of the flip cover, characteristics such as automatic unlocking of the flip cover are set.
  • the acceleration sensor 380E can detect the magnitude of the acceleration of the terminal 300 in various directions (generally three axes).
  • the magnitude and direction of gravity can be detected when the terminal 300 is stationary. It can also be used to identify the posture of the terminal, and is used in applications such as switching between horizontal and vertical screens, and pedometers.
  • a distance sensor is used to measure distance. The terminal 300 can measure distance by infrared or laser. In some embodiments, in a shooting scene, the terminal 300 may use the distance sensor to measure distance to achieve fast focusing.
  • the proximity light sensor 380G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode.
  • the light emitting diode may be an infrared light emitting diode. Infrared light is emitted outward through a light emitting diode.
  • the terminal 300 may use a proximity light sensor to detect that the user is holding the terminal 300 close to the ear to talk, so as to automatically turn off the screen to save power.
  • the proximity light sensor can also be used in a holster mode or a pocket mode to automatically unlock and lock the screen.
  • Ambient light sensor 380L is used to sense ambient light brightness.
  • the terminal 300 can adaptively adjust the brightness of the display screen according to the perceived ambient light brightness.
  • the ambient light sensor can also be used to automatically adjust the white balance when taking pictures.
  • the ambient light sensor can also cooperate with the proximity light sensor to detect whether the terminal 300 is in a pocket to prevent accidental touch.
  • the fingerprint sensor 380H is used to collect fingerprints.
  • the terminal 300 may use the collected fingerprint characteristics to realize fingerprint unlocking, access application lock, fingerprint photographing, fingerprint answering an incoming call, and the like.
  • the temperature sensor 380J is used to detect the temperature.
  • the terminal 300 executes a temperature processing strategy by using the temperature detected by the temperature sensor. For example, when the temperature reported by the temperature sensor exceeds a threshold, the terminal 300 reduces the performance of a processor located near the temperature sensor, so as to reduce power consumption and implement thermal protection.
  • the touch sensor 380K is also called a "touch panel". It can be disposed on the display screen and is used to detect a touch operation performed on or near it. The detected touch operation can be passed to the application processor to determine the type of the touch event, and a corresponding visual output is provided through the display screen.
  • the bone conduction sensor 380M can acquire vibration signals.
  • the bone conduction sensor may obtain a vibration signal of a human voice oscillating bone mass.
  • a bone conduction sensor can also be in contact with the human pulse and receive a blood pressure pulse signal.
  • a bone conduction sensor may also be provided in the headset.
  • the audio module 370 may analyze a voice signal based on a vibration signal of the oscillating bone mass obtained by the bone conduction sensor to implement a voice function.
  • the application processor may analyze the heart rate information based on the blood pressure beating signal obtained by the bone conduction sensor to implement a heart rate detection function.
  • the keys 390 include a start key, a volume key, and the like.
  • the keys can be mechanical keys or touch keys.
  • the terminal 300 receives key input, and generates key signal inputs related to user settings and function control of the terminal 300.
  • the motor 391 may generate a vibration alert.
  • the motor can be used for incoming vibration alert and touch vibration feedback.
  • the touch operation applied to different applications can correspond to different vibration feedback effects.
  • Touch operations on different areas of the display can also correspond to different vibration feedback effects.
  • Different application scenarios (such as time reminders, receiving information, alarm clocks, games, etc.) can also correspond to different vibration feedback effects.
  • Touch vibration feedback effect can also support customization.
  • the indicator 392 can be an indicator light, which can be used to indicate the charging status, power change, and can also be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 395 is used to connect to a Subscriber Identity Module (SIM).
  • a SIM card can be brought into contact with or separated from the terminal 300 by being inserted into or removed from the SIM card interface.
  • the terminal 300 may support one or N SIM card interfaces, and N is a positive integer greater than 1.
  • the SIM card interface can support Nano SIM cards, Micro SIM cards, SIM cards, etc. Multiple SIM cards can be inserted into the same SIM card interface at the same time. The types of the multiple cards may be the same or different.
  • the SIM card interface is also compatible with different types of SIM cards.
  • the SIM card interface is also compatible with external memory cards.
  • the terminal 300 interacts with the network through the SIM card, and realizes functions such as calling and data communication.
  • the terminal 300 uses an eSIM, that is, an embedded SIM card.
  • the eSIM card can be embedded in the terminal 300 and cannot be separated from the terminal 300.
  • the method for a terminal to update the wake-up voice of a voice assistant provided in the embodiments of the present application may be implemented in the terminal 300 described above.
  • An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • the terminal 300 may receive first voice data input by the user, and determine whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal 300. If the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal 300 authenticates the user. If the authentication succeeds, the terminal 300 uses the first voice data to update the first voiceprint model in the terminal.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened, and the first voiceprint model represents the voiceprint characteristics of the preset wakeup word in the terminal.
  • the terminal performs identity authentication on the user. Specifically, the terminal uses the first voiceprint model to perform voiceprint verification on the first voice data. If the first voice data passes the voiceprint verification, it means that the identity authentication is passed.
  • in this case, the first voice data is a wake-up voice for the voice assistant uttered by a user who passed the identity authentication.
  • moreover, since the first voice data is voice data of the user acquired by the terminal 300 in real time, the first voice data may reflect the physical state of the user and/or the real-time condition of the noise scene in which the user is located.
  • using the first voice data to update the voiceprint model of the terminal 300 can improve the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • the first voice data is automatically acquired by the terminal 300 during the voice wake-up process performed by the terminal 300, instead of prompting the user to manually re-register the wake-up word to receive user input.
  • using the first voice data to update the voiceprint model can also simplify the process of updating the wake word.
  • An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • the method for updating the wake-up voice of the voice assistant by the terminal may include S401-S405:
  • the terminal 300 receives first voice data.
  • the terminal 300 determines whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal.
  • the DSP of the terminal 300 may notify the AP of the terminal 300 to perform text verification and voice print verification on the first voice data.
  • the AP may perform text verification on the first voice data by determining whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal (for example, whether they are the same), as sketched below. If the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal, the AP may continue to perform voiceprint verification on the first voice data, that is, the terminal 300 continues to execute S403. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data, that is, the terminal 300 may continue to execute S405.
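  • The text verification in S402 can be sketched as follows. This is a minimal sketch, assuming the first voice data has already been transcribed to text by a speech recognizer; the normalization rule and function names are assumptions, not the terminal's actual implementation.

```python
# Sketch of S402: compare the recognized text of the first voice data with the
# text of the preset wake-up word registered in the terminal. The simple
# normalization (lowercasing, removing spaces) is an assumption.

def text_matches_wake_word(recognized_text: str, registered_wake_word_text: str) -> bool:
    def normalize(s: str) -> str:
        return "".join(s.lower().split())
    return normalize(recognized_text) == normalize(registered_wake_word_text)

# If the texts match, continue to voiceprint verification (S403);
# otherwise delete the first voice data (S405).
```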
  • the terminal 300 performs voiceprint verification on the first voice data using the first voiceprint model.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up.
  • the first voiceprint model is used to characterize the voiceprint features of the wake-up words registered in the terminal 300.
  • it can be learned from the foregoing description of registering a wake-up word that, when the terminal 300 registers a preset wake-up word, voice data (referred to as registered voice data) is recorded.
  • the preset wake-up word registered in the terminal 300 may include the registered voice data.
  • the first voiceprint model is generated based on the registered voice data.
  • the registered voice data can be used as an input value to substitute the first voiceprint model to obtain the first voiceprint threshold.
  • the method for the terminal 300 to perform voiceprint verification on the first voice data using the first voiceprint model may include: after the terminal 300 determines that the first voice data passes the text verification, the terminal 300 may use the first voice data as an input value and substitute it into the first voiceprint model to obtain a voiceprint value. The terminal 300 then determines whether the difference between the voiceprint value and the first voiceprint threshold is less than a preset threshold. If the difference between the voiceprint value and the first voiceprint threshold is less than the preset threshold, the voiceprint verification passes; if the difference is greater than or equal to the preset threshold, the voiceprint verification fails.
  • if the first voice data passes the voiceprint verification, the terminal 300 may use the first voice data to update the first voiceprint model in the terminal 300, that is, the terminal 300 may continue to execute S404, as sketched below. If the first voice data fails the voiceprint verification, the terminal 300 may delete the first voice data, that is, the terminal 300 may continue to execute S405.
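  • The voiceprint verification rule described above can be sketched as follows. This is a minimal sketch, assuming the voiceprint model can be reduced to a scoring function over a feature vector; the toy scoring and the threshold handling are assumptions, not the model actually used by the terminal.

```python
# Sketch of S403-S405: substitute the first voice data into the first voiceprint
# model to get a voiceprint value, then compare its distance to the first
# voiceprint threshold with a preset threshold. All numeric details are assumed.

def voiceprint_score(model_vector, voice_features):
    # Toy scoring: inner product of a reference vector and the voice features.
    return sum(m * v for m, v in zip(model_vector, voice_features))

def voiceprint_verification(first_voice_features, first_model_vector,
                            first_voiceprint_threshold, preset_threshold):
    value = voiceprint_score(first_model_vector, first_voice_features)
    if abs(value - first_voiceprint_threshold) < preset_threshold:
        return "S404: update the first voiceprint model with the first voice data"
    return "S405: delete the first voice data"
```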
  • the terminal 300 updates the first voiceprint model in the terminal 300 with the first voice data.
  • the method in which the terminal 300 uses the first voice data to update the first voiceprint model may include: the terminal 300 generates a second voiceprint model according to the first voice data, and uses the second voiceprint model to replace the first voiceprint model .
  • the method for generating the second voiceprint model by the terminal 300 according to the first voice data may refer to the method for generating a voiceprint model by the terminal in the conventional technology. This embodiment of the present application will not repeat them here.
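  • A hedged sketch of this simple form of S404 is shown below: a second voiceprint model is generated from the first voice data and swapped in for the first voiceprint model. The training step is a toy stand-in for whatever conventional voiceprint-training method the terminal uses; all names are illustrative.

```python
# Sketch of S404 in its simplest form: generate a second voiceprint model from
# the first voice data and replace the first voiceprint model with it.

def train_voiceprint_model(voice_samples):
    # Toy "model": the element-wise average of the feature vectors.
    n = len(voice_samples)
    return [sum(column) / n for column in zip(*voice_samples)]

def update_voiceprint_model(terminal_state: dict, first_voice_features) -> dict:
    second_model = train_voiceprint_model([first_voice_features])
    terminal_state["voiceprint_model"] = second_model  # replace the first model
    return terminal_state
```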
  • the terminal 300 deletes the first voice data.
  • An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • the terminal 300 may obtain the first voice data that passes the text verification and the voiceprint verification when the terminal 300 performs voice wake-up. Then, the first voiceprint model in the terminal 300 is updated using the first voice data.
  • the first voice data is the voice data of the user obtained by the terminal 300 in real time; therefore, the first voice data may reflect the physical state of the user and / or the real-time condition of the noise scene in which the user is located.
  • the first voice data passes the text check and voiceprint check; therefore, using the first voice data to update the voiceprint model of the terminal 300 can improve the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • the first voice data is automatically acquired by the terminal 300 during the voice wake-up process performed by the terminal 300, instead of prompting the user to manually re-register the wake-up word to receive user input.
  • using the first voice data to update the voiceprint model can also simplify the process of updating the wake word.
  • the terminal 300 may start a voice assistant.
  • the user may speak a preset wake-up word (ie, voice data) of the terminal 300 during a conversation with others.
  • however, the real purpose of the user speaking the preset wake-up word (that is, the voice data) of the terminal 300 is not to start the voice assistant.
  • in this case, even if the voice assistant of the terminal 300 is activated, the user will not trigger the terminal 300 to perform any function through voice.
  • this type of voice wakeup is referred to as invalid voice wakeup. That is, after the voice assistant is started, the terminal 300 does not receive a valid voice command through the voice assistant.
  • the terminal 300 can determine whether to use the first voice data to update the first voiceprint model in the terminal 300 by determining whether the voice assistant has received a valid voice command after the voice assistant is started.
  • an embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • the method for updating the wake-up voice of the voice assistant by the terminal may include S401-S403, S501-S503, S404, and S405:
  • if the first voice data passes the voiceprint verification, the terminal 300 may continue to execute S501-S503. If the first voice data fails the voiceprint verification, the terminal 300 may continue to execute S405.
  • the terminal 300 starts a voice assistant.
  • the terminal 300 receives the second voice data through a voice assistant.
  • after the voice assistant is started, it can receive the second voice data input by the user and trigger the terminal 300 to execute the function corresponding to the second voice data.
  • take the terminal 300 being the mobile phone 400 shown in FIG. 4B as an example.
  • the mobile phone 400 may display the "voice assistant" interface 401 shown in FIG. 4B.
  • the "Voice Assistant" interface 401 includes a "Record” button 403 and a "Setting” option 404.
  • the mobile phone 400 may receive a voice command issued by the user in response to a user's click operation (such as a long-press operation) on the "Record” button 403, and trigger the mobile phone 400 to execute an event corresponding to the voice command.
  • the “setting” option 404 is used to set various functions and parameters of the “Voice Assistant” application.
  • the mobile phone 400 may receive a user's click operation on the "setting" option 404. In response to the user's click operation on the "setting" option 404, the mobile phone 400 may display the voice control interface 106 shown in (c) in FIG. 1.
  • the "voice assistant" interface 401 may further include prompt information 402. The prompt information 402 is used to indicate a common function of the "Voice Assistant" application to the user.
  • the "voice assistant" interface 401 may not include a “record” button 403.
  • the user does not need to click any button (such as the "Record” button 403) in the "Voice Assistant” interface, and the mobile phone 400 can also record voice commands issued by the user.
  • the "Voice Assistant" interface of the terminal 300 includes, but is not limited to, the “Voice Assistant” interface 401 shown in FIG. 4B.
  • the terminal 300 determines whether the second voice data is a valid voice command.
  • the effective voice command described in the embodiment of the present application refers to an instruction capable of triggering the terminal 300 to perform a corresponding function.
  • when the terminal 300 receives, through the voice assistant, an instruction for triggering the terminal 300 to perform a corresponding function (that is, a valid voice command), it means that the terminal will execute the corresponding function in response to the valid voice command, and it can be determined that this voice wakeup is a voice wakeup that matches the user's intention. In the embodiments of the present application, this voice wakeup is referred to as effective voice wakeup.
  • in order to improve the voice wake-up rate of the voice wake-up performed by the terminal 300, the terminal updates the wake-up word of the terminal 300 only with voice data corresponding to effective voice wake-up.
  • if the voice assistant of the terminal 300 receives a valid voice command after being started, it means that the wake-up of the voice assistant of the terminal 300 by the user using the first voice data is an effective voice wakeup, that is, the second voice data is a valid voice command, and the terminal 300 can execute S404.
  • if the second voice data is not received after the voice assistant of the terminal 300 is started, or the received second voice data is not a valid voice command, it means that the wake-up of the voice assistant of the terminal 300 by the user using the first voice data is an invalid voice wakeup.
  • the terminal 300 may delete the first voice data, that is, execute S405.
  • the terminal 300 uses the first voice data to update the first voiceprint model in the terminal 300 only after receiving a valid voice command for triggering the terminal 300 to perform a corresponding function.
  • if a valid voice command is received after the voice assistant of the terminal 300 is started, it means that the voice wakeup is an effective voice wakeup in accordance with the user's intention.
  • the voiceprint model of the terminal 300 is updated by using the voice data that can reflect the user's true intention and can successfully wake up the terminal 300, which can further improve the voice wake-up rate of the terminal to perform voice wake-up and reduce the false wake-up rate.
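  • The decision in S501-S503 can be sketched as follows. It is a minimal sketch in which the terminal's intent check is abstracted into a callable; the parameter names are assumptions for illustration.

```python
# Sketch of S501-S503: after the voice assistant is started, update the first
# voiceprint model only if the received second voice data is a valid voice
# command (one that triggers a corresponding function); otherwise delete the
# first voice data.

def after_voice_assistant_started(second_voice_data, is_valid_voice_command,
                                  update_first_voiceprint_model, delete_first_voice_data):
    if second_voice_data is not None and is_valid_voice_command(second_voice_data):
        update_first_voiceprint_model()   # S404: effective voice wake-up
    else:
        delete_first_voice_data()         # S405: invalid voice wake-up
```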
  • after the terminal 300 uses the first voice data to update the first voiceprint model, the terminal 300 uses the updated voiceprint model to perform voice wake-up, which will affect the success rate of voice wake-up.
  • therefore, before using the first voice data to update the first voiceprint model, the terminal 300 may determine whether the signal quality parameter of the first voice data is higher than a second preset threshold.
  • the signal quality parameter of voice data is used to characterize the signal quality of the voice data.
  • the signal quality parameter of the voice data may be a signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, it means that the signal quality of the first voice data is relatively high. In this case, the terminal 300 may update the first voiceprint model by using the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.
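  • The signal-quality gate above can be sketched as follows, taking the signal-to-noise ratio as the signal quality parameter. The SNR formula and the value of the second preset threshold are assumptions for illustration.

```python
# Sketch of the signal-quality check: the first voice data is used to update the
# model only when its SNR exceeds the second preset threshold (value assumed).
import math

SECOND_PRESET_THRESHOLD_DB = 20.0  # assumed threshold

def snr_db(signal_power: float, noise_power: float) -> float:
    return 10.0 * math.log10(signal_power / noise_power)

def quality_allows_update(signal_power: float, noise_power: float) -> bool:
    return snr_db(signal_power, noise_power) > SECOND_PRESET_THRESHOLD_DB
```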
  • the user may also decide whether to use the first voice data to update the first voiceprint model in the terminal 300.
  • the terminal 300 may further display a first interface for prompting the user whether to update the voiceprint model.
  • the terminal 300 determines whether to update the voiceprint model according to the user's selection in the first interface.
  • take the terminal 300 being the mobile phone 500 shown in FIG. 5B as an example.
  • the mobile phone 500 may display the first interface 501 shown in FIG. 5B before the first voiceprint model in the mobile phone 500 is updated with the first voice data.
  • the first interface 501 is used to prompt the user whether to update the voiceprint model (that is, the wake word).
  • the first interface 501 includes first prompt information, such as "the mobile phone obtains voice data that can update the wake-up word during the voice wake-up process” and "is the wake-up word updated?"
  • the first interface 501 further includes: an "update” option for triggering the mobile phone 500 to update the voiceprint model and a "cancel” option for triggering the mobile phone 500 not to update the voiceprint model.
  • before updating the voiceprint model, the terminal 300 displays a first interface for prompting the user whether to update the voiceprint model. In this way, the user can decide, on the first interface, whether to update the voiceprint model. That is, the terminal 300 can determine whether to update the voiceprint model according to user requirements, which can improve the interaction performance between the terminal 300 and the user and improve the user experience.
  • it can be learned from the foregoing description of registering a wake-up word that, when the terminal 300 registers a preset wake-up word, one or more pieces of voice data (referred to as registered voice data) are recorded.
  • the first voiceprint model is generated based on the one or more pieces of registered voice data. It is assumed that the first voiceprint model is generated based on at least two pieces of registered voice data. Then, after the terminal 300 generates a new voiceprint model according to the first voice data, if the first voiceprint model is directly replaced with the new voiceprint model, although the voice wake-up rate of the voice wake-up performed by the terminal 300 can be improved, the false wake-up rate may also increase accordingly.
  • the method in which the terminal 300 uses the first voice data to update the first voiceprint model may include S601-S603.
  • S404 shown in FIG. 5A may include S601-S603:
  • the terminal 300 uses the first voice data to replace the third voice data in the at least two registered voice data, and obtains at least two updated registered voice data.
  • the terminal 300 generates a second voiceprint model according to the updated at least two registered voice data.
  • the terminal 300 replaces the first voiceprint model with the second voiceprint model.
  • the terminal 300 may determine the third voice data from the at least two registered voice data saved by the terminal 300.
  • the third voice data is the voice data, among the at least two registered voice data, whose signal quality parameter is lower than the signal quality parameters of the other voice data.
  • the terminal 300 uses the first voice data to replace the third voice data whose signal quality parameter is lower than the signal quality parameters of other voice data; and then generates a second voiceprint model according to the updated at least two registered voice data.
  • the voice data replaced by the first voice data in the at least two registered voice data has lower signal quality parameters. That is, the signal quality parameters of the retained voice data (that is, at least two updated registered voice data) are higher.
  • the second voiceprint model generated by the terminal 300 based on the voice data with higher signal quality parameters can more accurately and clearly characterize the voiceprint characteristics of the user.
  • the terminal 300 uses the second voiceprint model to perform voice wake-up, which can increase the voice wake-up rate and reduce the false wake-up rate of the terminal performing voice wake-up.
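  • S601-S603 can be sketched as follows. This is a minimal sketch in which each registered sample carries a signal quality value and the voiceprint training is again a toy average; the data layout and helper names are assumptions.

```python
# Sketch of S601-S603: replace the registered voice data with the lowest signal
# quality parameter by the first voice data, then regenerate the voiceprint
# model from the updated registered voice data.

def train_voiceprint_model(feature_vectors):
    n = len(feature_vectors)
    return [sum(column) / n for column in zip(*feature_vectors)]

def update_with_first_voice_data(registered_samples, first_sample):
    # registered_samples / first_sample: (features, signal_quality) tuples
    worst_index = min(range(len(registered_samples)),
                      key=lambda i: registered_samples[i][1])
    updated = list(registered_samples)
    updated[worst_index] = first_sample                        # S601
    second_model = train_voiceprint_model(
        [features for features, _ in updated])                 # S602
    return updated, second_model                               # S603: caller swaps models
```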
  • the third voice data may be the earliest voice data stored by the terminal among the at least two registered voice data.
  • it can be understood that the degree of conformity between the earliest stored voice data (that is, the third voice data) and the user's current physical state and the real-time condition of the noise scene in which the user is currently located is relatively low. Therefore, after the first voice data is used to replace the third voice data, the degree of conformity between the retained voice data (that is, the at least two updated registered voice data) and the user's current physical state and the noise scene in which the user is currently located can be improved.
  • the second voiceprint model generated by the terminal 300 according to the voice data with a higher degree of conformity can more accurately and clearly characterize the voiceprint characteristics of the user under the user's current body state and the current noise scene.
  • the terminal 300 uses the second voiceprint model to perform voice wake-up, which can increase the voice wake-up rate and reduce the false wake-up rate of the terminal performing voice wake-up.
  • the terminal 300 uses the first voice data to replace part of the voice data in the at least two registered voice data, such as the third voice data; instead of generating the second voiceprint model completely based on the first voice data.
  • the voice wake-up rate of the voice wake-up performed by the terminal 300 can be relatively stabilized.
  • the false wake-up rate of the voice wake-up performed by the terminal 300 can be reduced.
  • the method in the embodiment of the present application may further include S701-S702:
  • the terminal 300 generates a second voiceprint threshold according to the second voiceprint model and the updated at least two registered voice data.
  • the second voiceprint model is equivalent to a function.
  • the terminal 300 may use each of the updated at least two registered voice data as input values, respectively, and substitute them into the second voiceprint model to obtain at least two voiceprint thresholds.
  • the terminal 300 may calculate an average value of the at least two voiceprint thresholds to obtain a second voiceprint threshold.
  • the at least two updated registered voice data include the registered voice data a and the registered voice data b.
  • the terminal 300 may substitute the registered voice data a into the second voiceprint model to obtain the voiceprint threshold A, substitute the registered voice data b into the second voiceprint model to obtain the voiceprint threshold B, and calculate the average of the voiceprint threshold A and the voiceprint threshold B to obtain the second voiceprint threshold.
  • the terminal 300 determines whether a difference between the second voiceprint threshold and the first voiceprint threshold is less than a first preset threshold.
  • if the difference between the second voiceprint threshold and the first voiceprint threshold is less than the first preset threshold, the terminal 300 may execute S603.
  • if the difference between the second voiceprint threshold and the first voiceprint threshold is greater than or equal to the first preset threshold, the terminal 300 may execute S703:
  • the terminal 300 deletes the second voiceprint model and the first voice data.
  • in other words, if the difference between the second voiceprint threshold and the first voiceprint threshold is relatively large, the terminal 300 deletes the second voiceprint model and the first voice data, that is, the second voiceprint model is not used to replace the first voiceprint model.
  • in this way, when the difference between the second voiceprint threshold and the first voiceprint threshold is large, not replacing the first voiceprint model can prevent the wake-up rate of the voice wake-up performed by the terminal 300 from fluctuating greatly and affecting the user experience. The decision is sketched below.
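  • Under the same toy model, S701-S703 can be sketched as follows: each updated registered sample is scored by the second voiceprint model, the scores are averaged into the second voiceprint threshold, and the first voiceprint model is replaced only when the threshold shift stays within the first preset threshold. The scoring function and threshold value are assumptions.

```python
# Sketch of S701-S703: compute the second voiceprint threshold as the average
# score of the updated registered voice data, and replace the first voiceprint
# model only if the threshold change is small enough.

FIRST_PRESET_THRESHOLD = 0.1  # assumed value

def voiceprint_score(model_vector, voice_features):
    return sum(m * v for m, v in zip(model_vector, voice_features))

def decide_replacement(second_model, updated_feature_vectors, first_voiceprint_threshold):
    scores = [voiceprint_score(second_model, v) for v in updated_feature_vectors]
    second_voiceprint_threshold = sum(scores) / len(scores)                 # S701
    if abs(second_voiceprint_threshold - first_voiceprint_threshold) < FIRST_PRESET_THRESHOLD:  # S702
        return "S603: replace the first voiceprint model with the second voiceprint model"
    return "S703: delete the second voiceprint model and the first voice data"
```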
  • An embodiment of the present application provides a method for a terminal to update a wake-up voice of a voice assistant.
  • the method for updating the wake-up voice of the voice assistant by the terminal may include S801-S808:
  • the terminal 300 receives first voice data.
  • the terminal 300 determines whether the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal.
  • if the text corresponding to the first voice data matches the text of the preset wake-up word registered in the terminal, the AP may continue to perform voiceprint verification on the first voice data, that is, the terminal 300 continues to execute S803. If the text corresponding to the first voice data does not match the text of the preset wake-up word registered in the terminal, the terminal 300 may delete the first voice data, that is, the terminal 300 may continue to execute S808.
  • the terminal 300 performs voiceprint verification on the first voice data by using the first voiceprint model.
  • if the first voice data passes the voiceprint verification, the terminal 300 may continue to execute S804. If the first voice data fails the voiceprint verification, the terminal 300 may continue to execute S805.
  • the terminal 300 starts a voice assistant.
  • the terminal 300 performs text verification on the voice data received within the first preset time.
  • the terminal 300 determines whether the terminal 300 receives the second voice data and at least one voice data that matches the text of the preset wake-up word within the first preset time.
  • the first preset time is a preset time period starting from the moment when the terminal 300 determines that the first voice data is the same as the text information of the wake-up word registered in the terminal 300 (that is, the first voice data passes the text verification) but fails the voiceprint verification.
  • the AP of the terminal 300 is in a sleep state.
  • the DSP of the terminal 300 monitors the first voice data.
  • the DSP hands the monitored voice data to the AP, and the AP is woken up.
  • the AP performs text verification and voiceprint verification on the voice data to determine whether the voice data matches the generated voiceprint model.
  • after completing the verification, the AP enters the sleep state again until it receives voice data sent by the DSP again.
  • the DSP will only send to the AP voice data that has a certain degree of similarity with the wake word registered in the terminal 300.
  • the AP only performs text verification and voiceprint verification on the voice data sent by the DSP (that is, the voice data whose similarity with the wake-up word registered in the terminal 300 satisfies certain conditions).
  • the DSP can recognize that the similarity between the first voice data and the wake-up word registered in the terminal 300 meets certain conditions.
  • the DSP may transmit the first voice data to the AP to wake up the AP.
  • the AP performs text verification and voiceprint verification on the first voice data.
  • if the AP determines that the first voice data is the same as the text information of the wake-up word registered in the terminal 300 (that is, the first voice data can pass the text verification), but the first voice data fails the voiceprint verification, the AP will not enter the sleep state immediately after obtaining the verification result. Instead, the DSP delivers all voice data monitored within the first preset time to the AP, and the AP can perform text verification on all the voice data monitored by the DSP within the first preset time.
  • the first voiceprint model is used to perform voiceprint verification when the voice assistant is awakened.
  • the first voiceprint model can represent the voiceprint features of the wake-up words registered in the terminal.
  • the text corresponding to the second voice data contains a preset keyword.
  • the second voice data may be voice data in which the user complains that the voice wake-up fails, such as "how to wake up”, “how not”, “not responding”, “unable to wake up”, and "voice wake up failed”.
  • the AP performs text verification on all voice data monitored by the DSP within the first preset time. If, within the first preset time, the AP recognizes second voice data such as "how to wake up", "not responding", "unable to wake up", or "voice wake up failed", and at least one piece of voice data whose text information is the same as the text information of the wake-up word registered in the terminal 300, the terminal 300 may use the first voice data received by the terminal 300 to update the first voiceprint model of the terminal 300.
  • the terminal 300 receives the first voice data in S801 and finds that the voiceprint verification of the first voice data fails. Subsequently, the terminal 300 can receive at least one voice data that passes the text verification within the first preset time, which indicates that the user repeatedly wants to voice wake up the voice assistant of the terminal 300, but the voice wake-up fails. In this case, if the terminal 300 also receives the second voice data within the first preset time, it indicates that the user is dissatisfied with the result of the voice wake-up failure.
  • if the terminal 300 receives the second voice data and at least one piece of voice data that passes the text verification within the first preset time, it indicates that the user has a strong willingness to wake up the voice assistant by voice; however, the wake-up may fail multiple times because the user's current physical state differs greatly from the physical state when the user registered the wake-up word, or because the real-time condition of the noise scene in which the user is currently located differs from that when the user registered the wake-up word. In this case, even if the first voice data fails the voiceprint verification, the terminal 300 may use the received first voice data to update the first voiceprint model in the terminal 300. That is, if the terminal 300 receives, within the first preset time, the second voice data and at least one piece of voice data matching the text of the preset wake-up word, the terminal 300 updates the first voiceprint model in the terminal with the first voice data, that is, executes S807. This decision is sketched below.
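  • The S805-S806 decision can be sketched as follows. This is a minimal sketch, assuming the monitored voice data within the first preset time has been transcribed to text; the keyword list paraphrases the complaint phrases above and, like the helper names, is illustrative only.

```python
# Sketch of S805-S806: within the first preset time, look for (a) second voice
# data whose text contains a preset complaint keyword and (b) at least one piece
# of voice data whose text matches the registered wake-up word.

COMPLAINT_KEYWORDS = ["how to wake up", "not responding",
                      "unable to wake up", "voice wake up failed"]

def should_update_after_failed_wakeup(transcripts, wake_word_text):
    def normalize(s: str) -> str:
        return " ".join(s.lower().split())
    has_complaint = any(keyword in normalize(text)
                        for text in transcripts for keyword in COMPLAINT_KEYWORDS)
    has_retry = any(normalize(text) == normalize(wake_word_text) for text in transcripts)
    return has_complaint and has_retry   # True -> S807 (update), False -> S808 (delete)
```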
  • the terminal 300 updates the first voiceprint model of the terminal 300 with the first voice data.
  • the terminal 300 may delete the first voice data if the terminal 300 does not receive the second voice data and at least one voice data matching the text of the preset wake-up word within the first preset time.
  • the terminal 300 deletes the first voice data.
  • the method in which the terminal 300 uses the first voice data to update the first voiceprint model in the terminal 300 may include: the terminal 300 generates a second voiceprint model according to the first voice data, and uses the second voiceprint model to replace the first voiceprint model. .
  • the method for generating the second voiceprint model by the terminal 300 according to the first voice data may refer to the method for generating a voiceprint model by the terminal in the conventional technology. This embodiment of the present application will not repeat them here.
  • the first voice data is the voice data of the user acquired by the terminal 300 in real time; therefore, the first voice data may reflect the physical state of the user and / or the real-time condition of the noise scene in which the user is located. Therefore, by using the first voice data to update the voiceprint model of the terminal 300, the voice wake-up rate of the voice wake-up performed by the terminal can be improved, and the false wake-up rate can be reduced.
  • moreover, the received first voice data is voice data uttered by the user, under a strong willingness to wake up the voice assistant of the terminal 300 by voice, for activating the voice assistant. Therefore, the voiceprint model of the terminal 300 is updated by using voice data that can reflect the user's true intention, which can further increase the voice wake-up rate and reduce the false wake-up rate when the terminal performs voice wake-up.
  • the received first voice data is automatically acquired by the terminal 300 during the voice wake-up process performed by the terminal 300, instead of prompting the user to manually re-register the wake-up word and receiving user input.
  • updating the voiceprint model by using the received first voice data can also simplify the process of updating the wake word.
  • after the terminal 300 uses the first voice data to update the first voiceprint model, the terminal 300 uses the updated voiceprint model to perform voice wake-up, which will affect the success rate of voice wake-up.
  • therefore, before updating the first voiceprint model with the first voice data, the terminal 300 may determine whether the signal quality parameter of the first voice data is higher than a second preset threshold.
  • the signal quality parameter of voice data is used to characterize the signal quality of the voice data.
  • the signal quality parameter of the voice data may be a signal-to-noise ratio of the voice data. If the signal quality parameter of the first voice data is higher than the second preset threshold, it means that the signal quality of the first voice data is relatively high. In this case, the terminal 300 may update the first voiceprint model by using the first voice data. If the signal quality parameter of the first voice data is lower than or equal to the second preset threshold, the terminal 300 may delete the first voice data.
  • the terminal 300 may also use the at least one piece of voice data that matches the text of the preset wake-up word to update the first voiceprint model. Specifically, the terminal may select, from the first voice data and the at least one piece of voice data that matches the text of the preset wake-up word, voice data whose signal quality parameter is higher than the second preset threshold, and then use the selected voice data to update the first voiceprint model, as sketched below.
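  • This selection step can be sketched as follows; the signal quality values and the second preset threshold are assumptions, and the candidate list is simply the first voice data plus the text-matching voice data.

```python
# Sketch: from the first voice data and the voice data matching the wake-up word
# text, keep only those whose signal quality parameter exceeds the second preset
# threshold, and use them to update the first voiceprint model.

SECOND_PRESET_THRESHOLD = 20.0  # assumed SNR threshold in dB

def select_update_candidates(candidates):
    # candidates: list of (voice_features, signal_quality) tuples
    return [features for features, quality in candidates
            if quality > SECOND_PRESET_THRESHOLD]
```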
  • to prevent the first voiceprint model in the terminal 300 from being updated maliciously so as to achieve the purpose of waking up the terminal 300 by voice, the terminal 300 may perform user identity verification before executing S807. After the user identity verification is passed, S807 is performed. Specifically, after S806 and before S807, the terminal 300 may perform identity authentication on the user; if the identity authentication passes, the terminal 300 performs S807; if the identity authentication fails, the terminal 300 performs S808.
  • the method for the terminal to authenticate the user may include S901-S903. As shown in FIG. 9, after S806 shown in FIG. 8 and before S807, the method in this embodiment of the present application may further include S901-S903:
  • the terminal 300 displays an identity verification interface.
  • the authentication interface is used to receive authentication information input by a user.
  • the terminal 300 receives the authentication information input by the user on the authentication interface.
  • the terminal 300 performs user identity verification according to the identity verification information.
  • if the identity authentication passes, the terminal 300 updates the first voiceprint model with the first voice data, that is, the terminal 300 executes S807. If the identity authentication fails, the terminal 300 deletes the first voice data, that is, the terminal 300 executes S808.
  • the identity verification information may be any one of a digital password, a pattern password, fingerprint information, iris information, and facial feature information.
  • the aforementioned authentication interface may be any one of an interface for inputting a digital password or a pattern password, an interface for entering fingerprint information, an interface for entering iris information, and an interface for entering facial feature information.
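  • the sketch below illustrates this verification gate, assuming a platform-provided verify_identity callback that displays the verification interface (S901), receives the user's input (S902), and checks it (S903); the enum values simply mirror the kinds of identity verification information listed above.

```python
from enum import Enum, auto


class AuthMethod(Enum):
    """Kinds of identity verification information mentioned in this embodiment."""
    DIGITAL_PASSWORD = auto()
    PATTERN_PASSWORD = auto()
    FINGERPRINT = auto()
    IRIS = auto()
    FACIAL_FEATURES = auto()


def gated_voiceprint_update(first_voice_data, verify_identity, update_model, delete_data) -> bool:
    """Update the first voiceprint model only after the user passes identity verification."""
    if verify_identity():                 # S901-S903: show the interface, receive input, verify it
        update_model(first_voice_data)    # S807: update the first voiceprint model
        return True
    delete_data(first_voice_data)         # S808: discard the first voice data
    return False
```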
  • take as an example the case where the terminal 300 is the mobile phone 1000 shown in FIG. 10, the identity verification information is a digital password, and the identity verification interface is an interface for entering a digital password.
  • the mobile phone 1000 can display the authentication interface 1001 shown in FIG. 10.
  • the authentication interface 1001 includes a password input box 1002 and a first prompt message "After the user authentication is passed, the mobile phone will automatically update the wake-up word" 1003.
  • the terminal 300 performs user identity verification, and only after the verification is passed does it update the first voiceprint model in the terminal 300. This prevents a malicious user from using his or her own voice to trigger the terminal 300 to update the first voiceprint model and thereby wake up the terminal 300 with that voice. With this solution, the voiceprint model in the terminal 300 is protected against malicious updates, which improves the security of the terminal 300.
  • if a new voiceprint model were generated directly from the first voice data and used to replace the first voiceprint model, the voice wake-up rate of the terminal 300 would indeed improve; however, such a sharp increase in the wake-up rate may also raise the false wake-up rate of the voice wake-up performed by the terminal 300.
  • the above S807 may include the above S601-S603.
  • in S601-S603, the terminal uses the first voice data to replace only part of the voice data in the at least two registered voice data, instead of generating the second voiceprint model entirely from the first voice data. In this way, the voice wake-up rate of the voice wake-up performed by the terminal 300 stays relatively stable, and the false wake-up rate can be reduced (see the partial-replacement sketch below).
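  • the following sketch shows one way such a partial replacement could be organized. Replacing the lowest-quality registered recording is an assumption made for illustration; the embodiment only states that part of the registered voice data is replaced before the second voiceprint model is generated.

```python
from typing import Callable, List
import numpy as np


def update_by_partial_replacement(
    registered_voice_data: List[np.ndarray],
    first_voice_data: np.ndarray,
    quality_fn: Callable[[np.ndarray], float],
    train_voiceprint_model: Callable[[List[np.ndarray]], object],
) -> object:
    """Replace part of the registered data with the first voice data, then retrain the model."""
    assert len(registered_voice_data) >= 2, "at least two registered recordings are expected"
    worst = min(range(len(registered_voice_data)),
                key=lambda i: quality_fn(registered_voice_data[i]))
    updated = list(registered_voice_data)
    updated[worst] = first_voice_data
    # The second voiceprint model is trained on the mixed set rather than on the
    # first voice data alone, which keeps the wake-up rate relatively stable.
    return train_voiceprint_model(updated)
```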
  • the method in this embodiment of the present application may further include S701-S702:
  • the terminal 300 may execute S603.
  • the terminal 300 may execute S703.
  • in S703, the terminal 300 deletes the second voiceprint model and the third voice data; that is, the second voiceprint model is not used to replace the first voiceprint model.
  • avoiding a large difference between the second voiceprint threshold and the first voiceprint threshold can prevent the wake-up rate of the voice wake-up performed by the terminal 300 from fluctuating greatly and affecting the user experience (see the model-adoption sketch below).
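  • purely as an illustration of that stability check, the sketch below only adopts the newly generated model when its average match score over the registered recordings stays close to that of the old model; the concrete criterion used in S702 is not reproduced here, and the score_fn and max_gap parameters are assumptions.

```python
from typing import Callable, List


def adopt_new_model_if_stable(
    first_model: object,
    second_model: object,
    registered_voice_data: List[object],
    score_fn: Callable[[object, object], float],  # score_fn(model, voice_data) -> match score
    max_gap: float,                               # assumed bound on the allowed difference
) -> object:
    """Keep the old voiceprint model unless the new one behaves similarly on registered data."""
    n = len(registered_voice_data)
    avg_first = sum(score_fn(first_model, v) for v in registered_voice_data) / n
    avg_second = sum(score_fn(second_model, v) for v in registered_voice_data) / n
    if abs(avg_second - avg_first) <= max_gap:
        return second_model   # comparable behaviour: replace the first voiceprint model (S603)
    return first_model        # too large a difference: discard the new model (S703)
```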
  • the foregoing terminal and the like include corresponding hardware structures and/or software modules for performing each function.
  • the embodiments of the present application can be implemented in the form of hardware or a combination of hardware and computer software. Whether a function is performed by hardware or by computer-software-driven hardware depends on the specific application and design constraints of the technical solution. A person skilled in the art may use different methods to implement the described functions for each specific application, but such implementations should not be considered to be beyond the scope of the embodiments of the present application.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above integrated modules may be implemented in the form of hardware or software functional modules. It should be noted that the division of the modules in the embodiments of the present application is schematic, and is only a logical function division. In actual implementation, there may be another division manner.
  • FIG. 13 shows a possible structural diagram of a terminal involved in the foregoing embodiment.
  • the terminal 1300 includes: a storage unit 1301, an input unit 1302, a text verification unit 1303, a voiceprint verification unit 1304, and an update unit 1305.
  • the storage unit 1301 stores a preset wake-up word registered in the terminal 1300 and a first voiceprint model.
  • the first voiceprint model is used for voiceprint verification when the voice assistant is woken up.
  • the first voiceprint model represents the voiceprint characteristics of the preset wake word.
  • the input unit 1302 is used to support the terminal 1300 to perform S401, S502, S801, and S902 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the text verification unit 1303 is configured to support the terminal 1300 to perform S402, S802, and S805 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the voiceprint verification unit 1304 is configured to support the terminal 1300 to perform S403, S803 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the update unit 1305 is configured to support the terminal 1300 to perform S404, S603, and S807 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include: a starting unit and a determining unit.
  • the starting unit is configured to support the terminal 1300 to perform S501, S804 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the determining unit is configured to support the terminal 1300 to perform S503 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include: an identity authentication unit 1306.
  • the identity authentication unit 1306 is configured to support the terminal 1300 to perform user identity verification on the user.
  • the identity authentication unit 1306 is configured to support the terminal 1300 to perform S903 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include a display unit.
  • the display unit is configured to support the terminal 1300 to execute S901 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include a replacement unit and a generation unit.
  • the replacement unit is configured to support the terminal 1300 to perform S601 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the generating unit is configured to support the terminal 1300 to perform S602, S701 in the foregoing method embodiment, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include: a deleting unit.
  • the deleting unit is configured to support the terminal 1300 to perform S405, S703, and S808 in the foregoing method embodiments, and / or other processes used in the technology described herein.
  • the terminal 1300 may further include a judging unit.
  • the judging unit is configured to support the terminal 1300 to execute S702 and S806 in the foregoing method embodiments, and / or other processes used in the technology described herein.
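  • the functional units above are a logical division; one purely illustrative way to compose them in software is sketched below. The field names and callables are placeholders introduced for this sketch, not an API defined by the embodiment, and the simplified flow omits several steps of FIG. 8.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Terminal1300Sketch:
    """Hypothetical composition of the unit modules described for FIG. 13."""
    input_unit: Callable[[], bytes]             # receives voice data (e.g. S801)
    text_check_unit: Callable[[bytes], bool]    # text verification (e.g. S802, S805)
    voiceprint_unit: Callable[[bytes], bool]    # voiceprint verification (e.g. S803)
    identity_auth_unit: Callable[[], bool]      # user identity verification (e.g. S903)
    update_unit: Callable[[bytes], None]        # updates the first voiceprint model (e.g. S807)
    delete_unit: Callable[[bytes], None]        # deletes the first voice data (e.g. S808)

    def on_wake_attempt(self) -> bool:
        voice = self.input_unit()
        if not (self.text_check_unit(voice) and self.voiceprint_unit(voice)):
            return False                        # wake-up word not recognized; nothing to update
        if self.identity_auth_unit():
            self.update_unit(voice)             # verified owner: refresh the voiceprint model
        else:
            self.delete_unit(voice)             # verification failed: discard the voice data
        return True
```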
  • the terminal 1300 includes, but is not limited to, the unit modules listed above.
  • the terminal 1300 may further include a receiving unit and a sending unit.
  • the receiving unit is used to receive data or instructions sent by other terminals.
  • the sending unit is used to send data or instructions to other terminals.
  • the functions that can be implemented by the above functional units include, but are not limited to, the functions corresponding to the method steps described in the above examples. For detailed descriptions of the other units of the terminal 1300, refer to the detailed descriptions of the corresponding method steps, which are not repeated here in this embodiment.
  • FIG. 15 shows a possible structural diagram of a terminal involved in the foregoing embodiment.
  • the terminal 1500 includes a processing module 1501, a storage module 1502, and a display module 1503.
  • the processing module 1501 is configured to control and manage the actions of the terminal 1500.
  • the display module 1503 is configured to display an image generated by the processing module 1501.
  • the storage module 1502 is configured to store program codes and data of the terminal.
  • the storage module 1502 stores a preset wake-up word registered in the terminal and a first voiceprint model, where the first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up, and the first The voiceprint model characterizes the voiceprint characteristics of the preset wake word.
  • the terminal 1500 may further include a communication module for supporting communication between the terminal and other network entities.
  • the processing module 1501 may be a processor or a controller.
  • the processing module 1501 may be a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the present disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination including one or more microprocessors, a combination of a DSP and a microprocessor, and so on.
  • the communication module may be a transceiver, a transceiver circuit, or a communication interface.
  • the storage module 1502 may be a memory.
  • for example, when the processing module 1501 is a processor (such as the processor 310 shown in FIG. 3), the communication module includes a Wi-Fi module and a Bluetooth module (such as the communication module 360 shown in FIG. 3; these communication modules may be collectively referred to as a communication interface), the storage module 1502 is a memory (such as the internal memory 321 shown in FIG. 3 and an external SD card connected to the terminal 1500 through the external memory interface 320), and the display module 1503 is a touch screen (including the display screen 394 shown in FIG. 3), the terminal provided in this embodiment of the present application may be the terminal 300 shown in FIG. 3.
  • the processor, the communication interface, the touch screen, and the memory may be coupled together through a bus.
  • An embodiment of the present application further provides a computer storage medium.
  • the computer storage medium stores computer program code. When the processor executes the computer program code, the terminal executes the relevant method steps in any of FIG. 4A, FIG. 5A, FIG. 6, FIG. 7, FIG. 8, FIG. 9, FIG. 11, and FIG. 12, so as to implement the method in the foregoing embodiments.
  • the embodiment of the present application further provides a computer program product. When the computer program product runs on a computer, it causes the computer to execute the relevant method steps in any of FIG. 4A, FIG. 5A, FIG. 6, FIG. 7, FIG. 9, FIG. 11, and FIG. 12, so as to implement the method in the foregoing embodiments.
  • the terminal 1300, the terminal 1500, the computer storage medium, and the computer program product provided in the embodiments of the present application are all used to execute the corresponding methods provided above. Therefore, for the beneficial effects that they can achieve, refer to the beneficial effects of the corresponding methods provided above, which are not repeated here.
  • the disclosed apparatus and method may be implemented in other ways.
  • the device embodiments described above are only schematic.
  • the division of the modules or units is only a logical function division.
  • multiple units or components may be combined or integrated into another device, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or units, which may be electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and the components displayed as units may be one physical unit or multiple physical units, that is, they may be located in one place or distributed to multiple different places. Some or all of the units may be selected according to actual needs to achieve the objective of the solution of this embodiment.
  • each functional unit in each embodiment of the present application may be integrated into one processing unit, or each of the units may exist separately physically, or two or more units may be integrated into one unit.
  • the above integrated unit may be implemented in the form of hardware or in the form of software functional unit.
  • the integrated unit When the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a readable storage medium.
  • the technical solutions of the embodiments of the present application, or the part thereof that contributes to the existing technology, or all or part of the technical solutions, may essentially be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for causing a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application.
  • the foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a ROM, a RAM, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

Disclosed are a method for a terminal to update the wake-up voice of a voice assistant, and a terminal, which relate to the technical field of voice control and can update the wake-up word of a terminal in real time, thereby improving the voice wake-up rate of the terminal performing voice wake-up and reducing the false wake-up rate. The specific solution includes: a terminal receives first voice data input by a user; the terminal determines whether the text corresponding to the first voice data matches the text of a preset wake-up word registered in the terminal; if the text corresponding to the first voice data matches the text of the preset wake-up word, the terminal performs identity authentication on the user; and if the identity authentication is passed, the terminal uses the first voice data to update a first voiceprint model in the terminal, where the first voiceprint model is used to perform voiceprint verification when the voice assistant is woken up, and the first voiceprint model characterizes the voiceprint feature of the preset wake-up word.
PCT/CN2018/096917 2018-07-24 2018-07-24 Procédé de mise à jour de voix de réveil d'un assistant vocal par un terminal, et terminal WO2020019176A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/096917 WO2020019176A1 (fr) 2018-07-24 2018-07-24 Procédé de mise à jour de voix de réveil d'un assistant vocal par un terminal, et terminal
CN201880089912.7A CN111742361B (zh) 2018-07-24 2018-07-24 一种终端更新语音助手的唤醒语音的方法及终端

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/096917 WO2020019176A1 (fr) 2018-07-24 2018-07-24 Procédé de mise à jour de voix de réveil d'un assistant vocal par un terminal, et terminal

Publications (1)

Publication Number Publication Date
WO2020019176A1 true WO2020019176A1 (fr) 2020-01-30

Family

ID=69181102

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/096917 WO2020019176A1 (fr) 2018-07-24 2018-07-24 Procédé de mise à jour de voix de réveil d'un assistant vocal par un terminal, et terminal

Country Status (2)

Country Link
CN (1) CN111742361B (fr)
WO (1) WO2020019176A1 (fr)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627449A (zh) * 2020-05-20 2020-09-04 Oppo广东移动通信有限公司 屏幕的声纹解锁方法和装置
CN111833869A (zh) * 2020-07-01 2020-10-27 中关村科学城城市大脑股份有限公司 一种应用于城市大脑的语音交互方法及系统
CN112489650A (zh) * 2020-11-26 2021-03-12 北京小米松果电子有限公司 唤醒控制方法、装置、存储介质及终端
CN113593549A (zh) * 2021-06-29 2021-11-02 青岛海尔科技有限公司 确定语音设备的唤醒率的方法及装置
WO2023202442A1 (fr) * 2022-04-18 2023-10-26 华为技术有限公司 Procédé de réveil de dispositif, dispositif électronique et support de stockage
CN117153166A (zh) * 2022-07-18 2023-12-01 荣耀终端有限公司 语音唤醒方法、设备及存储介质

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112417451B (zh) * 2020-11-20 2022-04-12 复旦大学 适配智能芯片分级架构的基于深度学习的恶意软件检测方法
CN117012205A (zh) * 2022-04-29 2023-11-07 荣耀终端有限公司 声纹识别方法、图形界面及电子设备
CN115312068B (zh) * 2022-07-14 2023-05-09 荣耀终端有限公司 语音控制方法、设备及存储介质
CN115376524B (zh) * 2022-07-15 2023-08-04 荣耀终端有限公司 一种语音唤醒方法、电子设备及芯片系统

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110196676A1 (en) * 2010-02-09 2011-08-11 International Business Machines Corporation Adaptive voice print for conversational biometric engine
CN106156583A (zh) * 2016-06-03 2016-11-23 深圳市金立通信设备有限公司 一种语音解锁的方法及终端
US20170140760A1 (en) * 2015-11-18 2017-05-18 Uniphore Software Systems Adaptive voice authentication system and method
CN107331400A (zh) * 2017-08-25 2017-11-07 百度在线网络技术(北京)有限公司 一种声纹识别性能提升方法、装置、终端及存储介质
CN107919961A (zh) * 2017-12-07 2018-04-17 广州势必可赢网络科技有限公司 一种基于动态码和动态声纹更新的身份认证协议及服务器
CN108231082A (zh) * 2017-12-29 2018-06-29 广州势必可赢网络科技有限公司 一种自学习声纹识别的更新方法和装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107046517A (zh) * 2016-02-05 2017-08-15 阿里巴巴集团控股有限公司 一种语音处理方法、装置和智能终端
CN106653031A (zh) * 2016-10-17 2017-05-10 海信集团有限公司 语音唤醒方法及语音交互装置
CN106961418A (zh) * 2017-02-08 2017-07-18 北京捷通华声科技股份有限公司 身份认证方法和身份认证系统
CN107799120A (zh) * 2017-11-10 2018-03-13 北京康力优蓝机器人科技有限公司 服务机器人识别唤醒方法及装置

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110196676A1 (en) * 2010-02-09 2011-08-11 International Business Machines Corporation Adaptive voice print for conversational biometric engine
US20170140760A1 (en) * 2015-11-18 2017-05-18 Uniphore Software Systems Adaptive voice authentication system and method
CN106156583A (zh) * 2016-06-03 2016-11-23 深圳市金立通信设备有限公司 一种语音解锁的方法及终端
CN107331400A (zh) * 2017-08-25 2017-11-07 百度在线网络技术(北京)有限公司 一种声纹识别性能提升方法、装置、终端及存储介质
CN107919961A (zh) * 2017-12-07 2018-04-17 广州势必可赢网络科技有限公司 一种基于动态码和动态声纹更新的身份认证协议及服务器
CN108231082A (zh) * 2017-12-29 2018-06-29 广州势必可赢网络科技有限公司 一种自学习声纹识别的更新方法和装置

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111627449A (zh) * 2020-05-20 2020-09-04 Oppo广东移动通信有限公司 屏幕的声纹解锁方法和装置
CN111627449B (zh) * 2020-05-20 2023-02-28 Oppo广东移动通信有限公司 屏幕的声纹解锁方法和装置
CN111833869A (zh) * 2020-07-01 2020-10-27 中关村科学城城市大脑股份有限公司 一种应用于城市大脑的语音交互方法及系统
CN111833869B (zh) * 2020-07-01 2022-02-11 中关村科学城城市大脑股份有限公司 一种应用于城市大脑的语音交互方法及系统
CN112489650A (zh) * 2020-11-26 2021-03-12 北京小米松果电子有限公司 唤醒控制方法、装置、存储介质及终端
CN113593549A (zh) * 2021-06-29 2021-11-02 青岛海尔科技有限公司 确定语音设备的唤醒率的方法及装置
WO2023202442A1 (fr) * 2022-04-18 2023-10-26 华为技术有限公司 Procédé de réveil de dispositif, dispositif électronique et support de stockage
CN117153166A (zh) * 2022-07-18 2023-12-01 荣耀终端有限公司 语音唤醒方法、设备及存储介质

Also Published As

Publication number Publication date
CN111742361B (zh) 2023-08-22
CN111742361A (zh) 2020-10-02

Similar Documents

Publication Publication Date Title
WO2020019176A1 (fr) Procédé de mise à jour de voix de réveil d'un assistant vocal par un terminal, et terminal
WO2021000876A1 (fr) Procédé de commande vocale, équipement électronique et système
WO2020037795A1 (fr) Procédé de reconnaissance vocale, dispositif portable et dispositif électronique
CN111369988A (zh) 一种语音唤醒方法及电子设备
CN110784830B (zh) 数据处理方法、蓝牙模块、电子设备与可读存储介质
CN110730114B (zh) 一种网络配置信息的配置方法及设备
WO2021023046A1 (fr) Procédé de commande de dispositif électronique et dispositif électronique
CN110572866B (zh) 一种唤醒锁的管理方法及电子设备
WO2021017988A1 (fr) Procédé et dispositif d'identification d'identité multi-mode
WO2021052139A1 (fr) Procédé d'entrée de geste et dispositif électronique
WO2020029094A1 (fr) Procédé de génération d'instruction de commande vocale et terminal
WO2020019355A1 (fr) Procédé de commande tactile pour dispositif vestimentaire, et système et dispositif vestimentaire
WO2021175266A1 (fr) Procédé et appareil de vérification d'identité, et dispositifs électroniques
CN112860428A (zh) 一种高能效的显示处理方法及设备
WO2020034104A1 (fr) Procédé de reconnaissance vocale, dispositif pouvant être porté et système
CN113676339B (zh) 组播方法、装置、终端设备及计算机可读存储介质
CN114422340A (zh) 日志上报方法、电子设备及存储介质
WO2022161077A1 (fr) Procédé de commande vocale et dispositif électronique
WO2020051852A1 (fr) Procédé d'enregistrement et d'affichage d'informations dans un processus de communication, et terminaux
CN113467735A (zh) 图像调整方法、电子设备及存储介质
CN109285563B (zh) 在线翻译过程中的语音数据处理方法及装置
CN114120987B (zh) 一种语音唤醒方法、电子设备及芯片系统
CN113467747B (zh) 音量调节方法、电子设备及存储介质
CN115119336A (zh) 耳机连接系统、方法、耳机、电子设备及可读存储介质
WO2021147483A1 (fr) Procédé et appareil de partage de données

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18927251

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18927251

Country of ref document: EP

Kind code of ref document: A1