CN114444042A - Method and apparatus for unlocking an electronic device


Info

Publication number
CN114444042A
Authority
CN
China
Prior art keywords
voiceprint feature
voice data
electronic device
user
voiceprint
Prior art date
Legal status
Pending
Application number
CN202011188127.2A
Other languages
Chinese (zh)
Inventor
吴大
毛峰
唐吴全
王斌
孙峰
Current Assignee
Huawei Device Co Ltd
Original Assignee
Huawei Device Co Ltd
Priority date
Filing date
Publication date
Application filed by Huawei Device Co Ltd filed Critical Huawei Device Co Ltd
Priority to CN202011188127.2A priority Critical patent/CN114444042A/en
Priority to PCT/CN2021/116073 priority patent/WO2022088963A1/en
Publication of CN114444042A publication Critical patent/CN114444042A/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F 21/30 - Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F 21/31 - User authentication
    • G06F 21/32 - User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification
    • G10L 17/04 - Training, enrolment or model building

Abstract

This application provides a method and apparatus for unlocking an electronic device, relates to the field of intelligent terminals, and aims to improve the unlocking security of terminal devices. In the method, the electronic device receives first voice data input by a user in a screen-locked state and extracts a first voiceprint feature from it. When the first voiceprint feature matches a preset reference voiceprint feature and the first voice data contains a specified text, the electronic device remains in the screen-locked state and receives second voice data input by the user. The electronic device then extracts a second voiceprint feature from the second voice data, and unlocks the screen and executes the control instruction carried in the second voice data when the second voiceprint feature also matches the preset reference voiceprint feature. In this way, a user who needs to unlock the electronic device can simply speak to it: the electronic device separately compares the voiceprint features of the wake-up word and of the control command input by the user, and unlocks only when both voiceprint comparisons succeed, which improves the security of voiceprint unlocking.

Description

Method and apparatus for unlocking an electronic device
Technical Field
This application relates to the field of intelligent terminals, and in particular to a method and apparatus for unlocking an electronic device.
Background
Current unlocking schemes for terminal devices in the industry are based on the face, fingerprint, iris, or voice. Most existing voice-based schemes perform voiceprint recognition on a wake-up word input by the user and unlock the terminal device accordingly. However, because a wake-up word is a short piece of text, voiceprint recognition on it alone has low accuracy and cannot meet the security requirements of unlocking. Such schemes also cannot defend against attacks such as replayed recordings, speech synthesis, and voice imitation, so their security is low.
To improve unlocking security, some schemes have the user unlock the terminal device manually after a wake-up word has been input for voiceprint recognition. However, this requires manual intervention by the user, loses the convenience of voice unlocking, and is not intelligent enough.
Disclosure of Invention
The embodiments of this application provide a method and apparatus for unlocking an electronic device, which reduce manual operations by the user while improving the unlocking security of the terminal device, thereby making the unlocking process more efficient.
In a first aspect, an embodiment of the present application provides a method for unlocking an electronic device. The method may be executed by the electronic device provided in this application, or by a chip with similar functions. In the method, the electronic device receives first voice data input by a user in a screen-locked state and extracts a first voiceprint feature from the first voice data. When the first voiceprint feature matches a preset reference voiceprint feature and the first voice data contains the specified text, the electronic device remains in the screen-locked state and receives second voice data input by the user. The second voice data may include a control command, which can be used to trigger at least one function required by the user. The electronic device extracts a second voiceprint feature from the second voice data. When the second voiceprint feature also matches the preset reference voiceprint feature, the electronic device unlocks the screen and executes the control command to trigger the at least one required function.
Based on this scheme, a user who needs to unlock the electronic device can simply speak to it. The electronic device separately compares the voiceprint features of the two pieces of voice data input by the user, namely the wake-up word and the control command, and unlocks only when both voiceprint comparisons indicate a successful match. This improves the accuracy of the voiceprint comparison result, effectively resists attacks such as voice imitation, replayed recordings, and speech synthesis, and therefore improves the security of voiceprint unlocking. Moreover, because the voiceprint comparison is performed on voice data the user inputs in two turns, no manual operation is needed, which is convenient for the user. In addition, the second piece of voice data carries a control command, so the electronic device can execute the function the user needs immediately after being unlocked, improving the efficiency with which the user accomplishes a task.
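For illustration only, the following is a minimal Python sketch of this two-stage flow. The helper functions, the wake-word text, the embedding form, and the match threshold are all assumptions made for the example; they are not the implementation defined in this application.

```python
# Minimal sketch of the two-stage voiceprint unlock flow (assumptions noted above).
import numpy as np

WAKE_WORDS = {"hello, art"}   # assumed specified text
MATCH_THRESHOLD = 0.7         # assumed decision threshold

def extract_voiceprint(voice_data):
    """Stand-in for a pre-trained voiceprint feature model: returns an
    L2-normalized embedding of the input audio samples."""
    v = np.asarray(voice_data, dtype=float)
    return v / (np.linalg.norm(v) + 1e-12)

def matches(feature, reference):
    """Assumed match rule: cosine similarity against the reference feature."""
    return float(feature @ reference) >= MATCH_THRESHOLD

def try_unlock(first_voice, first_text, second_voice, reference):
    # Stage 1: voiceprint check on the wake-up word, plus a text check.
    if not matches(extract_voiceprint(first_voice), reference):
        return "locked"
    if first_text not in WAKE_WORDS:
        return "locked"
    # The device stays locked here; stage 2 checks the command utterance.
    if not matches(extract_voiceprint(second_voice), reference):
        return "locked"
    return "unlocked; execute control command"

# Example run with random audio standing in for microphone input.
reference = extract_voiceprint(np.random.rand(16000))
print(try_unlock(np.random.rand(16000), "hello, art",
                 np.random.rand(16000), reference))
```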
In one possible implementation, the electronic device may receive first registration voice data input by the user, where the first registration voice data contains the specified text, and obtain a first registered voiceprint feature from it. The electronic device may then receive second registration voice data input by the user, where the second registration voice data comes from the same user as the first, and obtain a second registered voiceprint feature from it. The electronic device may store the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature.
Based on this scheme, the electronic device can obtain the user's voiceprint features from the first and second registration voice data and use them as the reference voiceprint feature, so that the user can later unlock the electronic device by voiceprint.
In a possible implementation, the electronic device may combine the first registered voiceprint feature and the second registered voiceprint feature into a third registered voiceprint feature, and store the third registered voiceprint feature as the reference voiceprint feature.
Based on this scheme, the electronic device fuses the two voiceprint features, which improves the accuracy of voiceprint comparison and effectively resists attacks such as voice imitation, speech synthesis, and replayed recordings.
In a possible implementation, the electronic device may combine the second voiceprint feature and the first voiceprint feature into a third voiceprint feature, and unlock the screen when the comparison result for the third voiceprint feature is consistent.
Based on this scheme, during voiceprint unlocking the electronic device can combine the first and second voiceprint features extracted from the first and second voice data input by the user and compare the combination with the stored reference voiceprint feature, which improves the accuracy of voiceprint comparison.
In a possible implementation, the electronic device may obtain the first voiceprint feature of the first voice data using a pre-trained first voiceprint feature model, where the first voiceprint feature model is trained on a set of first voice data samples labeled with their speakers. Similarly, the electronic device may obtain the second voiceprint feature of the second voice data using a pre-trained second voiceprint feature model, where the second voiceprint feature model is trained on a set of second voice data samples labeled with their speakers.
Based on this scheme, the electronic device can obtain the voiceprint features of the voice data input by the user from pre-trained voiceprint feature models, which allows the user's voiceprint features to be extracted quickly and accurately.
In a second aspect, an embodiment of the present application provides an electronic device; the electronic device is a folding-screen electronic device. The electronic device includes: one or more processors; a memory; a plurality of application programs; and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions that, when executed by the electronic device, cause the electronic device to perform the technical solution of the first aspect or any possible implementation of the first aspect.
In a third aspect, an embodiment of the present application provides a chip, where the chip is coupled to a memory in an electronic device and is configured to call a computer program stored in the memory to execute the technical solution of the first aspect or any possible design of the first aspect. In this application, "coupled" means that two components are connected to each other directly or indirectly.
In a fourth aspect, the present application further provides a circuit system. The circuit system may be one or more chips, for example a system-on-a-chip (SoC). The circuit system includes at least one processing circuit, and the at least one processing circuit is configured to execute the technical solution of the first aspect or any possible design of the first aspect.
In a fifth aspect, embodiments of the present application further provide an electronic device, where the electronic device includes a module/unit that performs the method of the first aspect or any one of the possible designs of the first aspect; these modules/units may be implemented by hardware, or by hardware executing corresponding software.
In a sixth aspect, an embodiment of the present application further provides a computer-readable storage medium that includes a computer program. When the computer program runs on an electronic device, the electronic device is caused to execute the technical solution of the first aspect or any possible design of the first aspect.
In a seventh aspect, an embodiment of the present application provides a program product that includes instructions. When the program product runs on an electronic device, the electronic device is caused to execute the technical solution of the first aspect or any possible design of the first aspect.
In addition, for the beneficial effects of the second to seventh aspects, refer to the beneficial effects of the first aspect; they are not repeated here.
Drawings
Fig. 1A is a schematic diagram of recording an unlocking voice of an electronic device according to an embodiment of the present application;
fig. 1B is one of unlocking schematic diagrams of an electronic device according to an embodiment of the present application;
fig. 1C is one of the unlocking schematic diagrams of the electronic device provided in the embodiment of the present application;
fig. 2 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram of a software structure of an electronic device according to an embodiment of the present application;
FIG. 4 is a schematic flowchart of training a voiceprint feature model according to an embodiment of the present application;
fig. 5A is a schematic view of recording an unlocking voice of an electronic device according to an embodiment of the present application;
fig. 5B is a schematic view of recording an unlocking voice of an electronic device according to an embodiment of the present application;
FIG. 6 is a schematic flow chart illustrating fusion of voiceprint features provided by an embodiment of the present application;
fig. 7 is an exemplary flowchart of an electronic device unlocking method provided in an embodiment of the present application;
fig. 8A is one of unlocking schematic diagrams of an electronic device according to an embodiment of the present application;
fig. 8B is one of unlocking schematic diagrams of an electronic device according to an embodiment of the present application;
fig. 9 is one of unlocking schematic diagrams of an electronic device according to an embodiment of the present application;
fig. 10 is one of unlocking schematic diagrams of an electronic device according to an embodiment of the present application;
fig. 11A is one of unlocking schematic diagrams of an electronic device according to an embodiment of the present application;
fig. 11B is a schematic diagram of an electronic device lock provided in an embodiment of the application;
fig. 12 is one of unlocking schematic diagrams of an electronic device according to an embodiment of the present application;
fig. 13 is one of unlocking schematic diagrams of an electronic device according to an embodiment of the present application;
fig. 14 is one of unlocking schematic diagrams of an electronic device according to an embodiment of the present application;
fig. 15 is an exemplary flowchart of an electronic device unlocking method provided in an embodiment of the present application;
fig. 16 is a scene schematic diagram of unlocking an electronic device according to an embodiment of the present application;
fig. 17 is a block diagram of an electronic device provided in an embodiment of the present application.
Detailed Description
Currently, a terminal device can be unlocked using the face, iris, fingerprint, or voice. The voice unlocking process is as follows:
referring to fig. 1A, a user may record a segment of unlocking voice in advance, and the terminal device may obtain a voiceprint feature of the user according to the unlocking voice recorded by the user and store the voiceprint feature. When the user needs to unlock the terminal device, referring to fig. 1B, the user may input the unlocking voice, and the terminal device may obtain the voiceprint feature according to the unlocking voice. The terminal device may compare the acquired voiceprint features with the stored voiceprint features, determine to unlock when the acquired voiceprint features are consistent with the stored voiceprint features, and present a main interface to the user, where the main interface includes a plurality of application icons, as specifically shown in fig. 1B.
However, in the voiceprint unlocking scheme above, the unlocking voice is generally a voice of specified short text content such as a wake-up word; for example, only a preset phrase such as "please open the screen" can be used. As a result, only limited voiceprint features can be extracted from the unlocking voice. In addition, because sound is strongly affected by the external environment, the voiceprint feature obtained when the unlocking voice is recorded in advance may itself be inaccurate. The terminal device may therefore fail to unlock correctly; that is, the accuracy of voice-based unlocking is low. Moreover, because the unlocking voice is a specified short text, the scheme cannot defend against attacks such as replayed recordings, speech synthesis, and voice imitation, so the security of unlocking is also low.
To improve the security of voice-based unlocking, some current schemes fall back to manual verification: after the user inputs the unlocking voice and the voiceprint comparison fails, the user must manually enter a password or fingerprint for further verification. As shown in fig. 1C, the terminal device may display "voice unlocking unsuccessful, please select fingerprint unlocking or password unlocking", and the user then enters a numeric password on a keypad or a fingerprint in the fingerprint identification area to unlock. However, this scheme is not intelligent for the user and is rather tedious, losing the convenience and efficiency of voiceprint unlocking.
Based on this, this application provides a new scheme for unlocking an electronic device that avoids the problems above and improves both the security and the efficiency of voice-based unlocking. The embodiments of this application can be applied to various types of electronic devices, such as devices with a curved screen, a full screen, or a folding screen, including mobile phones, tablet computers, wearable devices (e.g., watches, bands, smart helmets), in-vehicle devices, smart home devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPCs), netbooks, and personal digital assistants (PDAs), which are not limited here.
The method provided in the embodiments of this application takes into account that a wake-up word is usually specified short text, so comparing voiceprint features on the wake-up word alone is not accurate enough and cannot resist attacks such as replayed recordings, speech synthesis, or voice imitation. The accuracy of voiceprint comparison can therefore be improved by comparing voiceprint features across multiple utterances input by the user. In the embodiments of this application, a first voiceprint comparison is performed on the wake-up word input by the user to identify whether the person speaking it is the same user who enrolled. If so, the electronic device performs a second voiceprint comparison on the command word input by the user to identify whether the person speaking it is also the enrolled user. If both comparisons indicate the same person, the device unlocks and parses the text information contained in the command word so that the task related to the command word can be executed. This improves the accuracy of voiceprint comparison and the security of unlocking the electronic device. Furthermore, the voiceprint comparison process requires no manual participation by the user, which is convenient and makes the unlocking process more intelligent and efficient.
To aid full understanding of the technical solutions provided in the embodiments of this application, the terms appearing in the embodiments are explained first.
1) Wake-up word: a voice with fixed short text content.
2) Voiceprint recognition: one of the biometric recognition technologies, also called speaker recognition (covering speaker identification and speaker verification). It identifies a speaker by converting the speaker's voice signal into an electrical signal and extracting voice features using computer technology.
3) Voiceprint feature comparison: extracting voiceprint features from the voice input by a speaker and comparing them with the voiceprint features of the voice the user input at enrollment.
The technical solutions in the embodiments of this application are described in detail below with reference to the accompanying drawings.
The terminology used in the following examples is for the purpose of describing particular embodiments only and is not intended to limit this application. As used in the specification of this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments of this application, "one or more" means one, two, or more than two; "and/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, "A and/or B" may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise.
The embodiments of the present application relate to at least one, including one or more; wherein a plurality means greater than or equal to two. In addition, it is to be understood that the terms first, second, etc. in the description of the present application are used for distinguishing between the descriptions and not necessarily for describing a sequential or chronological order.
In the following embodiments, a mobile phone is used as an example of the electronic device. Various application programs (apps), or simply applications, may be installed on the mobile phone, each implementing one or more software programs with specific functions. For example, the applications include instant messaging applications, video applications, audio applications, image capture applications, and the like. Instant messaging applications include, for example, the short message application, WeChat, WhatsApp Messenger, Line, Instagram, Kakao Talk, and DingTalk. Image capture applications include, for example, a camera application (the system camera or a third-party camera application). Video applications include, for example, YouTube, Twitter, TikTok (Douyin), iQIYI, and Tencent Video. Audio applications include, for example, Kugou Music, Xiami Music, and QQ Music. The applications mentioned in the following embodiments may be installed when the electronic device leaves the factory, or may be downloaded from a network or obtained from another electronic device while the user uses the device.
An embodiment of this application provides a method for unlocking an electronic device, which can be applied to any electronic device. Fig. 2 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of this application. The electronic device may be a mobile phone (folding-screen or non-folding-screen), a tablet computer (folding or non-folding), and the like. As shown in fig. 2, the electronic device may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors. The controller can be a neural center and a command center of the electronic device. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution. A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system. The processor 110 may receive voice data input by the audio module 170, and the processor 110 may obtain a voiceprint feature of the voice data.
The USB interface 130 is an interface conforming to the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The charging management module 140 is configured to receive charging input from a charger. The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like. The mobile communication module 150 may provide a solution including wireless communication of 2G/3G/4G/5G, etc. applied to the electronic device. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave.
The wireless communication module 160 may provide solutions for wireless communication applied to electronic devices, including Wireless Local Area Networks (WLANs) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite Systems (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like.
In some embodiments, antenna 1 of the electronic device is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160, so that the electronic device can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The display screen 194 is used to display the interface of an application, and so on. The display screen 194 includes a display panel. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the electronic device may include 1 or N display screens 194, where N is a positive integer greater than 1. In some embodiments, the display screen 194 may light up when the electronic device enters the wake mode, or may present the main interface to the user after the electronic device is unlocked.
The camera 193 is used to capture still images or video. In some embodiments, camera 193 may include at least one camera, such as a front camera and a rear camera.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes the various functional applications and data processing of the electronic device by executing the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, the software code of at least one application program (such as the iQIYI application or the WeChat application), and the like. The data storage area may store data (such as images and videos) generated during use of the electronic device. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). In some embodiments, the internal memory 121 may store the voiceprint features obtained at user enrollment and the voiceprint feature models used to extract voiceprint features from voice data; for example, it may store the first voiceprint feature model and the second voiceprint feature model in the embodiments of this application.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as pictures, videos, and the like are saved in an external memory card.
The electronic device may implement audio functions via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. For example, the receiver 170B and the microphone 170C may receive voice data input by a user, and the like. The receiver 170B or the microphone 170C may be turned on when the electronic device is in the sleep mode, or may be turned on when the electronic device is in the wake mode.
The sensor module 180 may include a fingerprint sensor 180H, a touch sensor 180K, and the like.
The fingerprint sensor 180H is used to collect a fingerprint. The electronic equipment can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and the touch sensor 180K and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180K is used to detect a touch operation applied thereto or nearby. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided via the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device at a different position than the display screen 194.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The electronic device may receive a key input, and generate a key signal input related to user settings and function control of the electronic device. The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc. The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into and out of contact with the electronic device by being inserted into the SIM card interface 195 or pulled out of the SIM card interface 195.
It will be understood that the components shown in fig. 2 are not intended to be limiting, and that the handset may include more or fewer components than those shown, or some components may be combined, some components may be separated, or a different arrangement of components may be used. In the following embodiments, the electronic device shown in fig. 2 is taken as an example for description.
Fig. 3 shows a block diagram of a software structure of an electronic device according to an embodiment of the present application. As shown in fig. 3, the software structure of the electronic device may be a layered architecture, for example, the software may be divided into several layers, each layer having a clear role and division of labor. The layers communicate with each other through a software interface. In some embodiments, the Android system is divided into four layers, an application layer, an application framework layer (FWK), an Android runtime (Android runtime) and system library, and a kernel layer from top to bottom.
The application layer may include a series of application packages. As shown in fig. 3, the application layer may include a camera, settings, a skin module, a User Interface (UI), a three-party application, and the like. The three-party application program may include WeChat, QQ, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, etc.
The application framework layer provides an application programming interface (API) and a programming framework for the applications in the application layer. The application framework layer may include some predefined functions. As shown in fig. 3, the application framework layer may include a window manager, a content provider, a view system, a phone manager, a resource manager, a notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like. The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide communication functions of the electronic device 100. Such as management of call status (including on, off, etc.).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables an application to display notification information in the status bar and can be used to convey notification-type messages, which can disappear automatically after a short stay without user interaction; for example, it is used to notify download completion, message alerts, and so on. The notification manager may also present notifications in the form of a chart or scroll-bar text in the status bar at the top of the system, such as notifications of applications running in the background, or in the form of a dialog window on the screen. For example, it may prompt text information in the status bar, play a prompt tone, vibrate the electronic device, or flash the indicator light.
The Android runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library consists of two parts: the functions that the java language needs to call, and the core library of Android. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the java files of the application layer and the application framework layer as binary files, and performs functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), media libraries (media libraries), three-dimensional graphics processing libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of many commonly used audio and video formats, as well as still image files. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
In addition, the system library may further include a display/rendering service and an energy-saving display control service. The display/rendering service is configured to determine the display digital code stream, which includes the display information of each pixel unit (hereinafter, pixel) on the display screen; the display information may include display brightness, display time, text information, image information, and the like. The kernel layer lies between hardware and software and includes at least a display driver, a camera driver, an audio driver, and a sensor driver.
The hardware layer may include various sensors, such as an acceleration sensor, a gyro sensor, a touch sensor, and the like, which are referred to in the embodiments of the present application.
The following describes an exemplary workflow of software and hardware of an electronic device in conjunction with an electronic device unlocking method according to an embodiment of the present application.
In the embodiments of this application, the speaker's input speech is verified twice, so the voiceprint feature model is divided into a first voiceprint feature model and a second voiceprint feature model. The first voiceprint feature model is trained on specified voice data input by speakers, and the second voiceprint feature model is trained on random voice data input by speakers. The method for training the first voiceprint feature model is described first; referring to fig. 4, it may include the following steps.
Step 401: first voice data of specified text content is acquired.
The first voice data of specified text content here may be specified short-text voice data such as a wake-up word, for example a phrase such as "hello, art" that can wake up the electronic device. First voice data input by different speakers can be collected, and each piece of collected first voice data is labeled with its speaker.
In one example, first speech data input by a speaker in different environments may be obtained. For example, the first voice data may be input in an environment such as inside a car, indoors, outdoors, or the like.
Step 402: and learning the first voice data marked with the speaker through a deep neural network model to obtain a first voiceprint feature model.
In the embodiments of this application, each piece of first voice data labeled with its speaker can be used as an input, and the voiceprint feature of each piece of first voice data is obtained through deep learning. Because each piece of first voice data is labeled with its speaker, the parameters of the first voiceprint feature model can be learned from the correspondence between the obtained voiceprint features and the speakers, and the first voiceprint feature model can then be constructed from these parameters. The first voiceprint feature model can extract the voiceprint feature of input first voice data.
Step 403: and transplanting the first voiceprint feature model obtained through training to the electronic equipment.
Through steps 401 and 402, the first voiceprint feature model in the embodiments of this application is trained. The trained model can then be deployed to the electronic device in step 403, and the electronic device can use it to extract the voiceprint feature of first voice data input by the user.
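As a sketch of what steps 401 and 402 could look like in practice, the following PyTorch-style example trains a small speaker-embedding network on speaker-labeled utterances. The architecture, acoustic features, loss, and all sizes are assumptions made for illustration; the application does not specify them.

```python
# Illustrative training of a voiceprint feature model on speaker-labeled data.
import torch
import torch.nn as nn

NUM_SPEAKERS = 100   # assumed number of labeled speakers in the training set
FEATURE_DIM = 40     # assumed per-frame acoustic feature size (e.g. filterbanks)
EMBED_DIM = 128      # size of the voiceprint embedding

class VoiceprintModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.GRU(FEATURE_DIM, EMBED_DIM, batch_first=True)
        self.classifier = nn.Linear(EMBED_DIM, NUM_SPEAKERS)

    def embed(self, x):          # x: (batch, frames, FEATURE_DIM)
        _, h = self.encoder(x)
        return h[-1]             # (batch, EMBED_DIM) voiceprint feature

    def forward(self, x):
        return self.classifier(self.embed(x))

model = VoiceprintModel()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy batch standing in for speaker-labeled wake-word utterances.
features = torch.randn(8, 50, FEATURE_DIM)
speaker_ids = torch.randint(0, NUM_SPEAKERS, (8,))

optimizer.zero_grad()
loss = loss_fn(model(features), speaker_ids)
loss.backward()
optimizer.step()

# After training, model.embed(...) extracts voiceprint features on-device
# (step 403); the speaker-classification head is only used during training.
```

The second voiceprint feature model would be trained the same way, on random-text command utterances instead of wake words.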
Having introduced the method for training the first voiceprint feature model, the method for training the second voiceprint feature model in the embodiments of this application is introduced below. Referring again to fig. 4, it may include the following steps:
Step 401: second voice data of random text content is acquired.
The second voice data here may be voice data of random text content containing a control command. For example, the control command may be "check today's weather" or "play music". Second voice data input by different speakers can be collected, and each piece of second voice data is labeled with its speaker.
In one example, second voice data of speakers in different application scenarios may be obtained, such as an in-vehicle scenario, an encyclopedia query scenario, a taxi-hailing scenario, or a food delivery scenario. For example, in an in-vehicle scenario, the speaker may input second voice data such as "please turn on navigation" or "go to a certain place".
In another example, second voice data of speakers in different environments, such as in a car, indoors, or outdoors, may also be obtained. For example, in a car the speaker may input second voice data such as "please turn on navigation" or "please play music" for the in-vehicle scenario.
For another example, the speaker may input second voice data such as "how is the weather today" indoors for the encyclopedia query scenario.
Step 402: and learning the second voice data marked with the speaker through the deep neural network model to obtain a second voiceprint feature model.
In the embodiments of this application, each piece of second voice data labeled with its speaker may be used as an input, and the voiceprint feature of each piece of second voice data is obtained through deep learning. Because each piece of second voice data is labeled with its speaker, the parameters of the second voiceprint feature model can be learned from the correspondence between the obtained voiceprint features and the speakers, and the second voiceprint feature model can be constructed from the learned parameters. The second voiceprint feature model can extract the voiceprint feature of input second voice data.
Step 403: and transplanting the trained second acoustic pattern feature model to the electronic equipment.
Through steps 401 and 402, the second voiceprint feature model in the embodiments of this application is trained. The trained model can then be deployed to the electronic device in step 403, and the electronic device can use it to extract the voiceprint feature of second voice data input by the user.
After the training method of the voiceprint feature model in the embodiment of the present application is introduced, the method for unlocking an electronic device provided in the embodiment of the present application is described below with reference to the drawings.
First, the user may pre-register the voiceprint feature on the electronic device. The electronic device may compare the voiceprint features previously registered by the user with the voiceprint features extracted from the first voice data through voiceprint recognition, thereby determining whether the voiceprint recognition is successful. Hereinafter, a manner in which the user registers the voiceprint feature is described.
Referring to fig. 5A, a user may register voiceprint features on the electronic device based on its prompts. As shown in fig. 5A, after voiceprint feature registration is activated, the electronic device can display "please say: hello, art" on the display screen. The user then says "hello, art" according to the prompt, and the electronic device can capture the first registration voice data with the content "hello, art" through a sensor such as the receiver or the microphone. The electronic device may extract a first registered voiceprint feature from this first registration voice data using the pre-trained first voiceprint feature model. In this way, the electronic device obtains the user's first registered voiceprint feature from first registration voice data of the specified short text, and may store it in memory.
Referring to fig. 5B, the electronic device may then display "please say: art, send a message" on the display screen. The user says "art, send a message" according to the prompt, and the electronic device can capture the second registration voice data with this content through a sensor such as the receiver or the microphone. The electronic device may extract a second registered voiceprint feature from this second registration voice data using the pre-trained second voiceprint feature model. In this way, the electronic device obtains the user's second registered voiceprint feature from second registration voice data of random text content, and may store it in memory.
Optionally, the electronic device may prompt the user through the display screen to input second registration voice data several times, so that more second registered voiceprint features can be obtained and the accuracy of voiceprint comparison improved. For example, after the user inputs the second registration voice data "art, send a message", the electronic device may further display prompts such as "please say: art, how is the weather today" or "please say: art, play some music". The user inputs the corresponding second registration voice data according to each prompt, and the electronic device extracts the second registered voiceprint feature of each piece of second registration voice data.
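The enrollment flow just described can be summarized in the following sketch. The prompt strings and helper names are illustrative assumptions; in particular, record, first_model, and second_model are hypothetical stand-ins for audio capture and the two pre-trained voiceprint feature models.

```python
# Illustrative sketch of voiceprint enrollment (assumptions noted above).
import numpy as np

def enroll(record, first_model, second_model):
    # First registration: the specified wake-word text.
    first_feature = first_model(record("please say: hello, art"))
    # Second registration: one or more random-text command utterances.
    prompts = [
        "please say: art, send a message",
        "please say: art, how is the weather today",
    ]
    second_features = [second_model(record(p)) for p in prompts]
    # Both kinds of registered voiceprint features form the reference.
    return {"first": first_feature, "second": second_features}

def dummy_record(prompt):
    print(prompt)                     # the device would display this prompt
    return np.random.rand(16000)      # stand-in for the captured audio

def dummy_model(audio):
    v = audio[:128]
    return v / np.linalg.norm(v)      # stand-in for a voiceprint feature model

reference_store = enroll(dummy_record, dummy_model, dummy_model)
```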
In a possible implementation, the electronic device may combine the user's first registered voiceprint feature and second registered voiceprint feature into a third registered voiceprint feature. Referring to fig. 6, the electronic device obtains the user's first and second registered voiceprint features through the first and second voiceprint feature models, merges them in a specified way, and stores the merged result, that is, stores the third registered voiceprint feature.
Optionally, the merge of the first registered voiceprint feature and the second registered voiceprint feature may be a simple combination. For example, if the first registered voiceprint feature is A and the second registered voiceprint feature is B, the two can be combined into a third registered voiceprint feature A + B. Alternatively, weights may be assigned to the two features; for example, the first registered voiceprint feature may be given a weight of 0.4 and the second a weight of 0.6, each feature is multiplied by its weight, and the two weighted features are then combined.
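A sketch of the two merge options described above follows. The text leaves open whether "A + B" means concatenation or element-wise addition; this example uses concatenation, and the 0.4/0.6 weights are the example values from the text.

```python
# Illustrative merging of enrollment voiceprint features into a third feature.
import numpy as np

def merge_simple(first, second):
    """'A + B' read here as concatenation of the two registered features."""
    return np.concatenate([first, second])

def merge_weighted(first, second, w1=0.4, w2=0.6):
    """Each registered feature scaled by its weight before combining."""
    return np.concatenate([w1 * first, w2 * second])

first_enrollment = np.random.rand(128)   # from the first voiceprint model
second_enrollment = np.random.rand(128)  # from the second voiceprint model
reference = merge_weighted(first_enrollment, second_enrollment)
# 'reference' would be stored as the third registered voiceprint feature.
# The same merge would apply at unlock time, when the first and second
# extracted voiceprint features are combined for comparison.
```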
Fig. 7 is a schematic flowchart of the method for unlocking an electronic device provided in an embodiment of this application. The method may be executed by the electronic device shown in fig. 2 or fig. 3, and its flow includes the following steps:
701: the electronic equipment receives first voice data input by a user and acquires a first voiceprint feature of the first voice data.
Wherein, the electronic device can receive the first voice data input by the user through the receiver 170B or the microphone 170C as shown in fig. 2. The first voice data here may be designated short text voice data such as a wakeup word or the like. For example, the user may say "hello, art", and the electronic device may receive the voice data of "hello, art" input by the user through the receiver 170B or the microphone 170C.
In the embodiments of this application, the user can input the first voice data while the display screen of the electronic device is not lit, that is, while the screen is dark. It should be understood that the receiver 170B or the microphone 170C may be on even when the display is not lit, so the user can simply say the specified short-text phrase such as "hello".
To save power, the user may trigger the electronic device to turn on the receiver 170B or the microphone 170C before inputting the first voice data, by touching the display screen or pressing a key of the electronic device (for example, the key 190 shown in fig. 2). Referring to fig. 8A, after the screen of the electronic device lights up, the user may input the first voice data.
Optionally, after the screen lights up, the electronic device may display a prompt message to ask the user to input the first voice data. Referring to fig. 8B, after the user lights up the display screen by touching it or pressing a key, a prompt such as "please input a voice password" may be displayed, prompting the user to input voice data to unlock the electronic device.
In one possible implementation, if the external environment is noisy, the electronic device may fail to receive the first voice data input by the user. In that case, the electronic device can display a prompt message asking the user to input the first voice data again. Referring to fig. 9, the user inputs the first voice data but the electronic device fails to receive it; the electronic device may then display a prompt such as "what did you say" to ask the user to re-enter the first voice data. Optionally, if the number of times the electronic device fails to receive the first voice data reaches a specified count, the electronic device may prompt the user to unlock by fingerprint or by entering a password, as shown in fig. 1C. The specified count may be preset, for example to 3 or 4 times; this application does not limit it.
After the electronic device receives first voice data input by a user, a first voiceprint feature of the first voice data can be acquired. The electronic device may perform voiceprint feature extraction on the first voice data by using the first voiceprint feature model obtained in steps 401 to 403, so as to obtain a first voiceprint feature of the first voice data.
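For concreteness, a minimal sketch of this extraction step, assuming a hypothetical embedding-model interface (model.embed is invented for the example; the actual first voiceprint feature model is the one described in steps 401 to 403):

```python
import numpy as np

def extract_voiceprint(model, waveform: np.ndarray) -> np.ndarray:
    """Run a pre-trained voiceprint feature model on recorded audio and
    return an L2-normalized embedding vector (the voiceprint feature)."""
    embedding = np.asarray(model.embed(waveform), dtype=np.float32)  # hypothetical API
    return embedding / np.linalg.norm(embedding)
```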
702: The electronic device performs voiceprint feature comparison on the first voice data according to the first voiceprint feature.
The electronic device can compare the first voiceprint feature with the stored first registered voiceprint feature of the user to determine whether the speakers are the same person.
In one example, the electronic device may determine whether the voiceprint features match by computing the cosine distance d between the first voiceprint feature and the first registered voiceprint feature, where d satisfies the following formula (1):

d = cos(x_i, x_j) = x_i^T x_j    (1)

where x_i represents the first voiceprint feature extracted in step 701, x_j represents the first registered voiceprint feature obtained at registration, and T denotes the matrix transpose.
In the embodiment of the present application, a relationship between the value of d and the voiceprint feature comparison result may be maintained in advance. Note that formula (1) yields a cosine similarity, so a larger d indicates a closer match: matching may be deemed successful when d is greater than or equal to a specified value, and failed when d is smaller than the specified value. Therefore, d can be obtained by formula (1) and compared with the specified value to determine whether the matching succeeds, that is, whether the speakers are the same person.
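A minimal sketch of this comparison, assuming L2-normalized feature vectors and an illustrative threshold (the application does not fix a numeric value):

```python
import numpy as np

THRESHOLD = 0.7  # assumed specified value for the example

def voiceprints_match(x_i: np.ndarray, x_j: np.ndarray) -> bool:
    """Formula (1): d = cos(x_i, x_j) = x_i^T x_j for L2-normalized vectors.
    A larger d means a closer match, so success is d >= THRESHOLD."""
    d = float(x_i @ x_j)
    return d >= THRESHOLD
```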
703: When the first voiceprint feature comparison result of the first voice data indicates that the matching is successful, the electronic device performs text recognition on the first voice data.
The electronic device may perform text recognition on the first voice data upon determining that the first voiceprint feature matches the first registered voiceprint feature successfully. The electronic device can recognize whether the specified text content exists in the first voice data. The specified text content here may be a short-text wake-up word or the like.
The electronic device may pre-store the specified text content in memory. For example, the specified text content may contain a short wake-up word such as "hello", "hello, art", or "are you there".
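A minimal sketch of this check; the phrase set and the case-insensitive substring rule are illustrative assumptions:

```python
# Example phrases taken from the description above.
SPECIFIED_TEXTS = ("hello", "hello, art", "are you there")

def contains_specified_text(recognized_text: str) -> bool:
    """Check whether the ASR transcript of the first voice data contains
    any preset short wake-up phrase."""
    text = recognized_text.strip().lower()
    return any(phrase in text for phrase in SPECIFIED_TEXTS)
```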
In a possible implementation manner, if the electronic device determines that the first voiceprint feature and the first registered voiceprint feature failed to match, the electronic device may display a prompt message indicating the failure through the display screen. Referring to fig. 10, the electronic device may display a prompt message of "unlock failure". Optionally, also referring to fig. 10, the electronic device may prompt the user to re-input the first voice data, so that the electronic device can perform voiceprint feature comparison on the re-input data.
In another possible implementation manner, if the number of times that the voiceprint feature comparison on the first voice data results in a matching failure is greater than or equal to a specified threshold, the electronic device may prompt the user to unlock it by means of fingerprint unlocking, face unlocking or password unlocking. Referring to fig. 1C, the electronic device may display an interface for inputting a password or an interface for inputting a fingerprint through the display screen. Optionally, after the user unlocks the electronic device by fingerprint or password, the voiceprint unlocking mode of the electronic device may be restarted. Referring to fig. 11A, after the user unlocks the electronic device by fingerprint or password and then locks it again, the voiceprint unlocking mode is restarted, and the user can input the first voice data again so that the electronic device can perform voiceprint feature comparison on it.
In another possible implementation manner, if the number of times that the voiceprint feature comparison on the first voice data fails is greater than or equal to the specified threshold, the electronic device may lock itself for a specified duration. Referring to fig. 11B, the electronic device may display the specified locking duration on the display screen and may display the remaining locking time in a countdown manner. When the locking duration reaches the specified duration, the user can unlock the electronic device by voiceprint, password, fingerprint or face unlocking.
704: The electronic device enters a wake-up mode when the specified text content exists in the first voice data.
The electronic device may enter the wake-up mode when it determines that the specified text content is present in the first voice data. In the wake-up mode, the electronic device may turn on an audio module, such as a receiver or a microphone, to record. Referring to fig. 12, if the electronic device determines that the specified text content exists in the first speech recognition result, the electronic device may light up the display screen, turn on the audio module, and display prompt content such as "please speak" or "I am here" through the display screen to prompt the user to continue inputting the control command. It should be appreciated that the electronic device is still in the locked state at this time.
The electronic device can remain in a sleep state when it determines that the specified text content is not present in the first voice data. For example, the electronic device may turn off the display screen or the audio module. The electronic device can also display prompt information such as "what did you say?" or "I did not understand" on the display screen to indicate that the electronic device has not been woken up and that the first voice data needs to be input again for voiceprint feature matching.
705: The electronic device receives second voice data input by the user and acquires a second voiceprint feature of the second voice data.
The electronic device may enter the wake-up mode after determining that the first voiceprint feature matches the first registered voiceprint feature successfully. In the wake-up mode, the electronic device may turn on the audio module to receive the second voice data input by the user. The second voice data here may be arbitrary voice data; for example, it may be voice data containing a control command.
In one possible implementation, if the external environment is noisy, the electronic device may fail to receive the second voice data input by the user. In that case, the electronic device can display prompt information through the display screen to prompt the user to re-input the second voice data. Referring to fig. 13, the user inputs the second voice data, but the electronic device fails to receive it, so the electronic device may display a prompt message such as "please say it again". Optionally, if the number of times that the electronic device fails to receive the second voice data input by the user reaches a specified number, the electronic device may prompt the user to unlock through a fingerprint, face or password, as shown in fig. 1C.
Optionally, if the number of times that the electronic device fails to receive the second voice data reaches a specified number of times, the electronic device may enter a sleep mode. For example, the electronic device may turn off the display screen or turn off the audio module.
After the electronic device receives second voice data input by the user, a second voiceprint feature of the second voice data can be acquired. The electronic device may perform voiceprint feature extraction on the second voice data by using the second voiceprint feature model obtained in steps 401 to 403, so as to obtain a second voiceprint feature of the second voice data.
706: The electronic device performs voiceprint feature comparison on the second voice data according to the second voiceprint feature.
The electronic device can compare the second voiceprint feature with the stored second registered voiceprint feature of the user to determine whether the speakers are the same person.
In one example, the electronic device may compute the cosine distance d between the second voiceprint feature and the second registered voiceprint feature by formula (1), where x_i represents the second voiceprint feature extracted in step 705, x_j represents the second registered voiceprint feature obtained at registration, and T denotes the matrix transpose.
In another example, the electronic device may combine the first voiceprint feature obtained in step 701 with the second voiceprint feature obtained in step 705 in a specified manner. It should be appreciated that the manner in which the first and second voiceprint features are combined here must be the same as the manner in which the first registered voiceprint feature and the second registered voiceprint feature were combined when the user registered the voiceprint features. The electronic device may then compare the third voiceprint feature obtained by the combination with the stored third registered voiceprint feature of the user to determine whether the speakers are the same person.
For example, suppose that when the user registered the voiceprint features, the first registered voiceprint feature was weighted 0.4 and the second registered voiceprint feature 0.6, and the two were combined according to these weights to obtain the third registered voiceprint feature. In this case, the weight of the first voiceprint feature acquired in step 701 is also 0.4, and the weight of the second voiceprint feature acquired in step 705 is also 0.6. The electronic device combines the two features according to these weights to obtain the third voiceprint feature, which it then compares with the stored third registered voiceprint feature.
The electronic device may calculate the cosine distance d between the third voiceprint feature and the third registered voiceprint feature by formula (1), where x_i represents the third voiceprint feature, x_j represents the third registered voiceprint feature obtained at registration, and T denotes the matrix transpose.
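Putting these steps together, a minimal sketch of the merged verification, reusing the example weights 0.4/0.6 and an assumed threshold:

```python
import numpy as np

def verify_merged_voiceprint(v1: np.ndarray, v2: np.ndarray,
                             third_registered: np.ndarray,
                             w1: float = 0.4, w2: float = 0.6,
                             threshold: float = 0.7) -> bool:
    """Merge the live first and second voiceprint features with the same
    weights used at registration, then apply formula (1) against the stored
    third registered voiceprint feature."""
    v3 = w1 * v1 + w2 * v2
    v3 = v3 / np.linalg.norm(v3)       # normalize the merged feature
    d = float(v3 @ third_registered)   # cosine similarity per formula (1)
    return d >= threshold
```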
707: When the second voiceprint feature comparison result of the second voice data indicates that the matching is successful, the electronic device performs text recognition on the second voice data.
If the electronic device determines that the second voiceprint feature comparison result of the second voice data indicates that the matching is successful, it can identify whether a control instruction exists in the second voice data, and may respond to the control instruction when one is present.
Referring to fig. 14, the user inputs the second voice data with the content of "play music", and when the electronic device determines that the second voiceprint feature comparison result of the second voice data indicates that the matching is successful, the electronic device may perform text recognition on the second voice data. The electronic device determines that the second voice data contains a control instruction, so it responds to the instruction by opening an application program capable of playing music and playing music at random.
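As a sketch of how the recognized command text might be dispatched; the command table and action strings are illustrative, not the application's actual command set:

```python
def handle_control_instruction(transcript: str) -> str:
    """Map the recognized text of the second voice data to a device action."""
    commands = {
        "play music": "open a music-capable app and play music at random",
        "open navigation": "launch a navigation-capable app",
        "query weather": "open the weather service",
    }
    text = transcript.lower()
    for phrase, action in commands.items():
        if phrase in text:
            return action
    return "no control instruction recognized"
```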
In a possible implementation manner, if the second voiceprint feature comparison result of the electronic device for the second voice data indicates that matching fails, the electronic device may display a prompt message indicating that matching fails through a display screen. Referring to fig. 10, the electronic device may display a prompt message of "unlocking failure" through the display screen, so as to enable the user to know that the voiceprint unlocking is failed this time. Alternatively, referring to fig. 10, the electronic device may prompt the user to re-enter the second voice data. The user can input the second voice data again according to the prompt of the electronic equipment to unlock the electronic equipment.
In another possible implementation manner, if the second voiceprint feature comparison result of the electronic device for the second voice data indicates that matching fails, the electronic device may end the voiceprint unlocking operation this time, and close the display screen. At this time, the user may input the first voice data, and the electronic device may compare the voiceprint characteristics of the first voice data and the second voice data input by the user through the methods shown in steps 701 to 707, and determine whether to unlock the electronic device.
In another possible implementation manner, if the number of times that the voiceprint feature comparison on the second voice data fails is greater than or equal to a specified threshold, the electronic device may prompt the user to unlock it by means of fingerprint unlocking, face unlocking or password unlocking. Referring to fig. 1C, the electronic device may display an interface for inputting a password and/or an interface for inputting a fingerprint on the display screen.
In another possible implementation manner, if the number of times that the voiceprint feature comparison on the second voice data fails is greater than or equal to the specified threshold, the electronic device may lock itself for a specified duration. Referring to fig. 11B, the electronic device may display the specified locking duration on the display screen and may display the remaining locking time in a countdown manner. When the locking duration reaches the specified duration, the user can unlock the electronic device by voiceprint, password, fingerprint or face unlocking.
Based on this scheme, when the user needs to unlock the electronic device, voice data can be input into it, and the electronic device compares the voiceprint features of the two voice inputs, namely the wake-up word and the control command, unlocking only when both comparison results indicate a successful match. This improves the accuracy of the voiceprint comparison, effectively resists attacks such as imitation, recording and synthesis, and thus improves the security of voiceprint unlocking. In addition, because the comparison is performed on voice data the user inputs anyway, no manual operation is needed, which is convenient for the user.
The electronic device unlocking method provided by the embodiment of the present application is described below by specific embodiments.
Example 1
Referring to fig. 15, an exemplary flowchart of an unlocking method for an electronic device provided in an embodiment of the present application may include the following steps.
Step 1501: The electronic device acquires the wake-up word input by the user through the microphone (MIC). The wake-up word here may be a specified short text content set in advance, such as "hello", "hello, xiao", "hi", or "morning".
Step 1502: the electronic device performs text recognition on the wake-up word.
Text recognition here determines whether the wake-up word is identical to the preset specified short text content.
Step 1503: the electronic device extracts a first voiceprint feature in a wake-up word input by a user.
The electronic device can extract the first voiceprint feature from the wake-up word through the pre-trained first voiceprint feature model.
Optionally, in the embodiment of the present application, step 1502 may be performed first, and then step 1503 may be performed, or step 1503 may be performed first, and then step 1502 may be performed, or step 1502 and step 1503 may be performed simultaneously.
Step 1504: and the electronic equipment performs text comparison and voiceprint feature comparison.
The electronic device may determine whether the wake-up word input by the user matches the preset specified short text content, and may determine whether the first voiceprint feature extracted in step 1503 matches the voiceprint feature obtained at the time of the user's registration.
If the voiceprint feature comparison and the text comparison both pass, step 1505 may be executed, and if any one of the voiceprint feature comparison and the text comparison does not pass, the operation is ended.
Step 1505: The electronic device enters a wake-up mode and lights up the screen. In the wake-up mode, the electronic device can start MIC recording to acquire voice data input by the user.
Step 1506: the electronic equipment obtains the command words input by the user through microphone MIC recording.
The command word may be a word for controlling the electronic device to perform a corresponding operation. For example, words such as "open navigation", "play music", or "inquire weather" related to the control command may be used.
Step 1507: the electronic device extracts a second voiceprint feature in the command word input by the user.
The electronic device can extract the second voiceprint feature from the command word through a pre-trained second voiceprint feature model.
Step 1508: The electronic device combines the first voiceprint feature and the second voiceprint feature and verifies the combined feature.
The electronic device may combine the first voiceprint feature and the second voiceprint feature in a specified manner to obtain a third voiceprint feature, and compare it with the third registered voiceprint feature obtained during registration to determine whether the speakers are the same person.
If the electronic device determines that the speakers are the same person, step 1509 may be performed; if the electronic device determines that the speakers are not the same person, the operation may be ended.
Step 1509: The electronic device identifies the command word and analyzes the task to be executed. The electronic device can perform automatic speech recognition (ASR) on the command word to determine the text-related information it contains.
Step 1510: and unlocking the electronic equipment and executing the task corresponding to the command word.
Alternatively, the electronic device may unlock directly after determining that the speaker is the same person in step 1508, i.e., directly perform step 1509.
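The steps above can be condensed into the following end-to-end sketch; mic, asr, and the two extractor callables are hypothetical stand-ins for the device's audio, speech-recognition, and voiceprint-model components:

```python
import numpy as np

def unlock_flow(mic, asr, extract_first, extract_second,
                first_registered, third_registered,
                wake_words=("hello",), w1=0.4, w2=0.6, threshold=0.7):
    """Condensed sketch of steps 1501-1510 under assumed interfaces."""
    wake_audio = mic.record()                        # 1501: capture the wake-up word
    wake_text = asr.transcribe(wake_audio).lower()   # 1502: text recognition
    v1 = extract_first(wake_audio)                   # 1503: first voiceprint feature
    # 1504: both the text comparison and the voiceprint comparison must pass.
    if not any(w in wake_text for w in wake_words):
        return "end"
    if float(v1 @ first_registered) < threshold:
        return "end"
    cmd_audio = mic.record()                         # 1505-1506: wake mode, record command
    v2 = extract_second(cmd_audio)                   # 1507: second voiceprint feature
    v3 = w1 * v1 + w2 * v2                           # 1508: merge with enrollment weights
    v3 = v3 / np.linalg.norm(v3)
    if float(v3 @ third_registered) < threshold:
        return "end"
    task = asr.transcribe(cmd_audio)                 # 1509: recognize the command word
    return f"unlock and execute: {task}"             # 1510: unlock, run the task
```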
Example 2
Referring to fig. 16, user A is driving a vehicle and needs to turn on navigation in the electronic device, so user A says "hello, please turn on navigation". Here, "hello" can be regarded as the first voice data and "please turn on navigation" as the second voice data. The microphone of the electronic device is on, so when user A says "hello", the electronic device acquires the first voice data through the microphone. The electronic device may turn off the microphone and extract the first voiceprint feature from the first voice data through the pre-trained first voiceprint feature model, then compare it with the first registered voiceprint feature obtained at registration, thereby determining that user A is the same person as the registered user. The electronic device may perform text recognition on the first voice data and determine that the specified text content exists in it. At this point, the electronic device may enter the wake-up mode, turn on the microphone again, acquire the second voice data with the content "please turn on navigation", and extract the second voiceprint feature through the pre-trained second voiceprint feature model. The electronic device may merge the first voiceprint feature with the second voiceprint feature and compare the merged feature with the third registered voiceprint feature obtained at registration, again determining that user A is the same person as the registered user. The electronic device may then unlock, perform text recognition on "please turn on navigation", recognize that the control command is "open navigation", and open an application program capable of navigation.
As shown in fig. 17, some further embodiments of the present application disclose an electronic device 1700, which may include one or more processors 1701, one or more memories 1702, and one or more displays 1703, where the one or more memories 1702 store one or more computer programs comprising instructions. Illustratively, one processor 1701 and one memory 1702 are shown in fig. 17. The instructions, when executed by the one or more processors 1701, cause the electronic device 1700 to perform the following steps:
receiving first voice data input by a user in a screen locking state, and extracting a first voiceprint feature of the first voice data; remaining in the screen locking state when the comparison result of the first voiceprint feature is consistent with a preset reference voiceprint feature and the first voice data contains a designated text; receiving second voice data input by the user, wherein the second voice data contains a control instruction, and the control instruction is used for triggering at least one function requirement of the user; extracting a second voiceprint feature of the second voice data; and when the second voiceprint feature is consistent with the preset reference voiceprint feature comparison result, unlocking the screen and executing the control instruction to trigger the at least one function requirement. For the specified text and the control instruction, reference may be made to the related description in the method embodiment shown in fig. 7, which is not repeated here.
In one design, the processor 1701 also performs the following steps: receiving first registration voice data input by the user; the first registered voice data contains the specified text; acquiring a first registration voiceprint feature of the first registration voice data; receiving second registration voice data input by the user; the second registration voice data and the first registration voice data come from the same user; acquiring a second registration voiceprint feature of the second registration voice data; the memory 1702 stores the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature. For the first registration voiceprint feature and the second registration voiceprint feature, reference may be made to the related description in the method embodiment shown in fig. 7, and details are not repeated here.
In one design, the processor 1701 also performs the following steps: and merging the first registration voiceprint feature and the second registration voiceprint feature to obtain a third registration voiceprint feature. The memory 1702 stores the third registered voiceprint feature as the reference voiceprint feature. The description of the third voiceprint feature may refer to the related description in the method embodiment shown in fig. 7, and is not repeated here.
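For illustration, a minimal sketch of this registration design, assuming hypothetical extractor callables for the two pre-trained models and the example weights from the method embodiment:

```python
import numpy as np

def enroll_user(first_utterance, second_utterance,
                extract_first, extract_second, w1=0.4, w2=0.6):
    """Obtain both registered voiceprint features and precompute the merged
    third registered feature that is stored as the reference."""
    r1 = extract_first(first_utterance)    # wake-word (specified text) utterance
    r2 = extract_second(second_utterance)  # free-text utterance from the same user
    r3 = w1 * r1 + w2 * r2                 # merged reference feature
    r3 = r3 / np.linalg.norm(r3)
    return {"first": r1, "second": r2, "third": r3}
```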
In one design, the processor 1701 is specifically configured to perform the following steps: combining the second voiceprint feature and the first voiceprint feature to obtain a third voiceprint feature; and when the third voiceprint feature comparison result is consistent, unlocking the screen. The description of the third voiceprint feature may refer to the related description in the method embodiment shown in fig. 7, and is not repeated here.
In one design, the processor 1701 is specifically configured to perform the following steps: acquiring the first voiceprint feature of the first voice data by adopting a pre-trained first voiceprint feature model; and acquiring the second voiceprint characteristics of the second voice data by adopting a pre-trained second voiceprint characteristic model. The description of the first voiceprint feature model and the second voiceprint feature model may refer to the related description in the method embodiment shown in fig. 7, and is not described herein again.
It should be noted that the division of units in the embodiments of the present application is schematic and is only a logical function division; other division manners are possible in actual implementation. Each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. For example, in the above embodiments, the first obtaining unit and the second obtaining unit may be the same unit or different units. The integrated unit can be realized in the form of hardware or in the form of a software functional unit.
As used in the above embodiments, the term "when …" may be interpreted to mean "if …" or "after …" or "in response to determination …" or "in response to detection …", depending on the context. Similarly, depending on the context, the phrase "at the time of determination …" or "if (a stated condition or event) is detected" may be interpreted to mean "if the determination …" or "in response to the determination …" or "upon detection (a stated condition or event)" or "in response to detection (a stated condition or event)".
In the above embodiments, the implementation may be realized wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, it may be realized wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, they may be transmitted from one website, computer, server, or data center to another by wire (e.g., coaxial cable, optical fiber, digital subscriber line) or wirelessly (e.g., infrared, radio, microwave). The computer-readable storage medium can be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid-state disk).
The foregoing description, for purpose of explanation, has been described with reference to specific embodiments. However, the exemplary discussions above are not intended to be exhaustive or to limit the application to the precise forms disclosed. Many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the application and its practical application, to thereby enable others skilled in the art to best utilize the application and various embodiments with various modifications as are suited to the particular use contemplated.

Claims (12)

1. A method for unlocking an electronic device, the method comprising:
the method comprises the steps that the electronic equipment receives first voice data input by a user in a screen locking state, and first voiceprint features of the first voice data are extracted;
the electronic equipment is still in a screen locking state when the comparison result of the first voiceprint feature is consistent with a preset reference voiceprint feature and the first voice data contains a specified text;
the electronic equipment receives second voice data input by the user, wherein the second voice data comprise a control instruction, and the control instruction is used for triggering at least one function requirement of the user;
the electronic equipment extracts a second voiceprint feature of the second voice data;
and when the second voiceprint feature is consistent with the preset reference voiceprint feature comparison result, the electronic equipment unlocks the screen and executes the control instruction to trigger the at least one function requirement.
2. The method of claim 1, further comprising:
the electronic equipment receives first registration voice data input by the user; the first registered voice data contains the specified text;
the electronic equipment acquires a first registration voiceprint feature of the first registration voice data;
the electronic equipment receives second registration voice data input by the user; the second registration voice data and the first registration voice data come from the same user;
the electronic equipment acquires a second registration voiceprint feature of the second registration voice data;
the electronic device stores the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature.
3. The method of claim 2, further comprising:
the electronic equipment combines the first registration voiceprint feature and the second registration voiceprint feature to obtain a third registration voiceprint feature;
the electronic device stores the third registered voiceprint feature as the reference voiceprint feature.
4. The method according to claim 2, wherein the electronic device performs screen unlocking when the second voiceprint feature is consistent with the preset reference voiceprint feature comparison result, and the method comprises:
the electronic equipment combines the second voiceprint feature and the first voiceprint feature to obtain a third voiceprint feature;
and the electronic equipment unlocks the screen when the third voiceprint characteristic comparison result is consistent.
5. The method according to any one of claims 1 to 4, wherein the electronic device obtains the first voiceprint feature of the first voice data by using a first voiceprint feature model trained in advance; the first voiceprint feature model is obtained by training according to a plurality of first voice data marked with speakers;
the electronic equipment acquires the second voiceprint feature of the second voice data by adopting a pre-trained second voiceprint feature model; the second voiceprint feature model is obtained by training according to a plurality of second voice data marked with speakers.
6. An electronic device, comprising:
one or more processors;
a memory;
a plurality of application programs;
and one or more computer programs, wherein the one or more computer programs are stored in the memory, the one or more computer programs comprising instructions, which when executed by the electronic device, cause the electronic device to perform the steps of:
receiving first voice data input by a user in a screen locking state, and extracting first voiceprint characteristics of the first voice data;
remaining in a screen locking state when the comparison result of the first voiceprint feature is consistent with a preset reference voiceprint feature and the first voice data comprises a designated text;
receiving second voice data input by the user, wherein the second voice data comprise a control instruction, and the control instruction is used for triggering at least one function requirement of the user;
extracting a second voiceprint feature of the second voice data;
and when the second voiceprint feature is consistent with the preset reference voiceprint feature comparison result, unlocking the screen, and executing the control instruction to trigger the at least one function requirement.
7. The electronic device of claim 6, wherein the instructions, when executed by the electronic device, cause the electronic device to further perform the steps of:
receiving first registration voice data input by the user; the first registered voice data contains the specified text;
acquiring a first registration voiceprint feature of the first registration voice data;
receiving second registration voice data input by the user; the second registration voice data and the first registration voice data come from the same user;
acquiring a second registration voiceprint feature of the second registration voice data;
storing the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature.
8. The electronic device of claim 7, wherein the instructions, when executed by the electronic device, cause the electronic device to further perform the steps of:
merging the first registration voiceprint feature and the second registration voiceprint feature to obtain a third registration voiceprint feature;
storing the third registered voiceprint feature as the reference voiceprint feature.
9. The electronic device of claim 7, wherein the instructions, when executed by the electronic device, cause the electronic device to perform the steps of:
combining the second voiceprint feature and the first voiceprint feature to obtain a third voiceprint feature;
and when the third voiceprint feature comparison result is consistent, unlocking the screen.
10. The electronic device of any of claims 6-9, wherein the instructions, when executed by the electronic device, cause the electronic device to perform the steps of:
acquiring the first voiceprint feature of the first voice data by adopting a pre-trained first voiceprint feature model; the first voiceprint feature model is obtained by training according to a plurality of first voice data marked with speakers;
acquiring the second voiceprint feature of the second voice data by adopting a pre-trained second voiceprint feature model; the second voiceprint feature model is obtained by training according to a plurality of second voice data marked with speakers.
11. A computer-readable storage medium comprising instructions that, when executed on an electronic device, cause the electronic device to perform the method of any of claims 1-5.
12. A computer program product comprising instructions for causing an electronic device to perform the method according to any of claims 1-5 when the computer program product is run on the electronic device.
CN202011188127.2A 2020-10-30 2020-10-30 Electronic equipment unlocking method and device Pending CN114444042A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011188127.2A CN114444042A (en) 2020-10-30 2020-10-30 Electronic equipment unlocking method and device
PCT/CN2021/116073 WO2022088963A1 (en) 2020-10-30 2021-09-01 Method and apparatus for unlocking electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011188127.2A CN114444042A (en) 2020-10-30 2020-10-30 Electronic equipment unlocking method and device

Publications (1)

Publication Number Publication Date
CN114444042A true CN114444042A (en) 2022-05-06

Family

ID=81358452

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011188127.2A Pending CN114444042A (en) 2020-10-30 2020-10-30 Electronic equipment unlocking method and device

Country Status (2)

Country Link
CN (1) CN114444042A (en)
WO (1) WO2022088963A1 (en)


Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701887A (en) * 2014-11-26 2016-06-22 常州峰成科技有限公司 Voiceprint lock and unlocking method thereof
CN108766441B (en) * 2018-05-29 2020-11-10 广东声将军科技有限公司 Voice control method and device based on offline voiceprint recognition and voice recognition
CN109325337A (en) * 2018-11-05 2019-02-12 北京小米移动软件有限公司 Unlocking method and device
CN109515385A (en) * 2018-12-05 2019-03-26 上海博泰悦臻电子设备制造有限公司 Prevent the method, vehicle device and vehicle of vocal print Replay Attack
CN109979438A (en) * 2019-04-04 2019-07-05 Oppo广东移动通信有限公司 Voice awakening method and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131491A (en) * 2023-10-27 2023-11-28 荣耀终端有限公司 Unlocking control method and related device
CN117131491B (en) * 2023-10-27 2024-04-02 荣耀终端有限公司 Unlocking control method and related device

Also Published As

Publication number Publication date
WO2022088963A1 (en) 2022-05-05


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination