WO2022088963A1 - Method and apparatus for unlocking an electronic device - Google Patents

Method and apparatus for unlocking an electronic device

Info

Publication number
WO2022088963A1
Authority
WO
WIPO (PCT)
Prior art keywords
voiceprint feature
electronic device
voice data
registered
user
Prior art date
Application number
PCT/CN2021/116073
Other languages
English (en)
French (fr)
Inventor
吴大
毛峰
唐吴全
王斌
孙峰
Original Assignee
Huawei Technologies Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2022088963A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00: Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30: Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31: User authentication
    • G06F21/32: User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00: Speaker identification or verification
    • G10L17/04: Training, enrolment or model building

Definitions

  • the present application relates to the technical field of intelligent terminals, and in particular, to a method and apparatus for unlocking an electronic device.
  • the unlocking scheme of the terminal device can also be implemented by manually unlocking the terminal device after the user inputs a wake-up word for voiceprint recognition.
  • however, the above solution requires manual operation by the user, which loses the convenience of voice unlocking and is not intelligent enough.
  • Embodiments of the present application provide a method and apparatus for unlocking an electronic device, which are used to reduce the manual operation of a user on the basis of improving the unlocking security of a terminal device, so as to improve the efficiency of the unlocking process.
  • an embodiment of the present application provides a method for unlocking an electronic device.
  • the method can be executed by the electronic device provided in this application, or executed by a chip with similar functions of the electronic device.
  • the electronic device can receive the first voice data input by the user in a locked screen state, and extract the first voiceprint feature of the first voice data.
  • the electronic device may remain in the locked screen state when the comparison result between the first voiceprint feature and the preset reference voiceprint feature indicates a match and the first voice data includes the specified text.
  • the electronic device may receive the second voice data input by the user.
  • the second voice data may contain control instructions, and the control instructions may be used to trigger at least one functional requirement of the user.
  • the electronic device may extract the second voiceprint feature of the second voice data.
  • the electronic device may unlock the screen when the comparison result between the second voiceprint feature and the preset reference voiceprint feature indicates a match, and execute the control instruction to trigger the at least one functional requirement.
  • in this way, when the user needs to unlock the electronic device, the user can input voice data to the electronic device, and the electronic device performs a voiceprint feature comparison on each of the two pieces of voice data input by the user, that is, on the wake-up word and on the control command, respectively.
  • only when both comparisons succeed can the electronic device be unlocked. Therefore, the accuracy of the voiceprint feature comparison result can be improved, attacks such as imitation, recording, and synthesis can be effectively avoided, and the security of voiceprint unlocking can be improved.
  • since the two voiceprint feature comparisons are performed on the voice data input by the user, manual operation by the user is not required, which provides convenience for the user.
  • the second piece of input voice data carries the control command, which enables the electronic device to perform the function required by the user once it is successfully unlocked, thus improving the efficiency with which the user accomplishes a given task.
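  • For illustration only, the following minimal Python sketch mirrors this two-stage flow; every helper name, the wake word string, and the threshold are placeholders invented for this example rather than APIs defined by the patent.

```python
import numpy as np

def extract_voiceprint(voice: str, model: str) -> np.ndarray:
    """Stand-in for the pre-trained first/second voiceprint feature models."""
    rng = np.random.default_rng(abs(hash((voice, model))) % 2**32)
    return rng.random(128)

def matches(feature: np.ndarray, reference: np.ndarray, thr: float = 0.2) -> bool:
    """Stand-in for the voiceprint feature comparison."""
    cos = feature @ reference / (np.linalg.norm(feature) * np.linalg.norm(reference))
    return (1.0 - cos) < thr  # match when the distance is small enough

WAKE_WORD = "hello xiaoyi"  # the specified text

def try_voice_unlock(first_voice: str, second_voice: str,
                     first_ref: np.ndarray, second_ref: np.ndarray):
    # Stage 1: wake-up word, checked while the screen stays locked.
    if WAKE_WORD not in first_voice.lower():
        return None
    if not matches(extract_voiceprint(first_voice, "first"), first_ref):
        return None
    # Stage 2: control command, still locked until this check passes.
    if not matches(extract_voiceprint(second_voice, "second"), second_ref):
        return None
    return second_voice  # the control command to execute after unlocking
```

  • the key design point, as the embodiments emphasize, is that the screen is unlocked only after both voiceprint comparisons succeed, and the second utterance doubles as the control command to execute.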
  • the electronic device may receive first registered voice data input by the user; the first registered voice data includes the specified text.
  • the electronic device can acquire the first registered voiceprint feature of the first registered voice data; the electronic device can receive the second registered voice data input by the user, where the second registered voice data is from the same user as the first registered voice data; the electronic device can acquire the second registered voiceprint feature of the second registered voice data; the electronic device can store the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature.
  • the electronic device can obtain the user's voiceprint feature as a reference voiceprint feature through the first registered voice data and the second registered voice data input by the user, which is used for the user to unlock the electronic device's voiceprint.
  • the electronic device may combine the first registered voiceprint feature and the second registered voiceprint feature to obtain a third registered voiceprint feature; the electronic device may store the third registered voiceprint feature as the reference voiceprint feature.
  • the electronic device can fuse the two voiceprint features to improve the accuracy of the voiceprint feature comparison, and effectively prevent attacks such as speech imitation, speech synthesis, and recording.
  • the electronic device may combine the second voiceprint feature and the first voiceprint feature to obtain a third voiceprint feature; the electronic device may unlock the screen when the comparison result of the third voiceprint feature is consistent.
  • the electronic device can combine the first voiceprint feature and the second voiceprint feature extracted from the first voice data and the second voice data input when the user performs voiceprint unlocking, and compare the result with the stored reference voiceprint feature, so that the accuracy of voiceprint feature comparison can be improved.
  • the electronic device may acquire the first voiceprint feature of the first voice data by using a pre-trained first voiceprint feature model, where the first voiceprint feature model is obtained by training on a plurality of pieces of first voice data marked with speakers; the electronic device may acquire the second voiceprint feature of the second voice data by using a pre-trained second voiceprint feature model, where the second voiceprint feature model is obtained by training on a plurality of pieces of second voice data marked with speakers.
  • the electronic device can acquire the voiceprint feature of the voice data input by the user based on the pre-trained voiceprint feature model, and can quickly and accurately extract the user's voiceprint feature.
  • An embodiment of the present application provides an electronic device, for example, an electronic device with a folding screen.
  • An electronic device includes: one or more processors; a memory; a plurality of application programs; and one or more computer programs, wherein the one or more computer programs are stored in the memory and include instructions which, when executed by the electronic device, cause the electronic device to execute the technical solution of the first aspect and any possible implementation manner of the first aspect.
  • an embodiment of the present application provides a chip, which is coupled to a memory in an electronic device, and is used to call a computer program stored in the memory and execute the first aspect of the embodiment of the present application and any possible design of the first aspect thereof.
  • "coupling” means that two components are directly or indirectly combined with each other.
  • an embodiment of the present application further provides a circuit system.
  • the circuitry may be one or more chips, such as a system-on-a-chip (SoC).
  • the circuit system includes: at least one processing circuit; the at least one processing circuit is configured to execute the technical solution of the first aspect and any possible design of the first aspect of the embodiments of the present application.
  • an embodiment of the present application further provides an electronic device, the electronic device includes modules/units for performing the above-mentioned first aspect or any possible design method of the first aspect; these modules/units can be configured by It can be realized by hardware, and can also be realized by executing corresponding software by hardware.
  • an embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium includes a computer program, and when the computer program is run on an electronic device, the electronic device is caused to execute the technical solution of the first aspect of the embodiments of the present application and any possible design thereof.
  • a program product of the embodiments of the present application includes instructions; when the program product is run on an electronic device, the electronic device is caused to execute the technical solution of the first aspect of the embodiments of the present application and any possible design thereof.
  • FIG. 1A is one of the schematic diagrams of recording an electronic device unlocking voice provided by an embodiment of the present application
  • FIG. 1B is one of the schematic diagrams of unlocking an electronic device provided by an embodiment of the present application.
  • FIG. 1C is one of the schematic diagrams of unlocking an electronic device provided by an embodiment of the present application.
  • FIG. 2 is a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a schematic diagram of a software structure of an electronic device provided by an embodiment of the present application.
  • FIG. 4 is a schematic flowchart of training a voiceprint feature model provided by an embodiment of the present application.
  • FIG. 5A is one of the schematic diagrams of recording an electronic device unlocking voice provided by an embodiment of the present application.
  • FIG. 5B is one of the schematic diagrams of recording an electronic device unlocking voice provided by an embodiment of the present application.
  • FIG. 6 is a schematic flowchart of a fusion voiceprint feature provided by an embodiment of the present application.
  • FIG. 7 is one of exemplary flowcharts of a method for unlocking an electronic device provided by an embodiment of the present application.
  • FIG. 8A is one of the schematic diagrams of unlocking an electronic device provided by an embodiment of the present application.
  • FIG. 8B is one of the schematic diagrams of unlocking an electronic device provided by an embodiment of the present application.
  • FIG. 9 is one of the schematic diagrams of unlocking an electronic device provided by an embodiment of the present application.
  • FIG. 11A is one of the schematic diagrams of unlocking an electronic device provided by an embodiment of the present application.
  • FIG. 11B is a schematic diagram of electronic device locking provided by an embodiment of the present application.
  • FIG. 13 is one of the schematic diagrams of unlocking an electronic device provided by an embodiment of the present application.
  • FIG. 15 is one of the exemplary flowcharts of a method for unlocking an electronic device provided by an embodiment of the present application.
  • FIG. 16 is a schematic diagram of a scenario in which an electronic device is unlocked according to an embodiment of the present application.
  • FIG. 17 is a block diagram of an electronic device provided by an embodiment of the present application.
  • terminal devices can be unlocked using a face, iris, fingerprint or voice.
  • voice unlocking is as follows:
  • the user may record an unlocking voice in advance, and the terminal device may acquire the user's voiceprint feature according to the unlocking voice recorded by the user, and store the voiceprint feature.
  • when unlocking, the user can input the aforementioned unlocking voice to the terminal device, and the terminal device can obtain the voiceprint feature from the unlocking voice.
  • the terminal device can compare the acquired voiceprint feature with the stored voiceprint feature, and when the two are consistent, determine to unlock and present the user with a main interface, which includes multiple application icons, as shown in FIG. 1B.
  • the unlocking voice is generally a voice with the short text content specified by the wake-up word; for example, it may be specified that only set phrases such as "Please turn on the screen" can be input. Therefore, fewer voiceprint features can be obtained from the unlocking voice.
  • in addition, since sound is greatly affected by the external environment, the voiceprint features acquired when the unlocking voice is recorded in advance may also be inaccurate. Therefore, in the above solution, the terminal device may not be unlocked correctly; that is to say, the method of unlocking the terminal device based on voice has a low accuracy rate.
  • moreover, because the unlocking voice is the voice of the specified short text content, the above solution cannot defend against attacks such as recording, speech synthesis, and voice imitation, and the security of unlocking is also low.
  • the terminal device can display "Unsuccessful voice unlocking, please select fingerprint unlocking or password unlocking", at which time the user can choose to input a digital password through the numeric keyboard, or input fingerprint information through the fingerprint recognition area, to unlock.
  • the above solution is not smart enough for users, and it is also cumbersome, losing the convenience and efficiency of voiceprint unlocking.
  • the present application proposes a new solution for unlocking an electronic device to avoid the above-mentioned problems and improve the security and efficiency of unlocking an electronic device based on voice.
  • the embodiments of the present application may be applied to various types of electronic devices, for example, electronic devices having a curved screen, a full screen, a folding screen, and the like.
  • Electronic devices such as mobile phones, tablets, wearable devices (eg, watches, wristbands, smart helmets, etc.), in-vehicle devices, smart homes, augmented reality (AR)/virtual reality (VR) devices, notebooks Computers, ultra-mobile personal computers (UMPCs), netbooks, personal digital assistants (personal digital assistants, PDAs), etc., are not limited in this application.
  • the accuracy of a voiceprint feature comparison that uses only the wake-up word is not high enough, and at the same time it cannot defend against recording, speech synthesis, or speech imitation attacks.
  • the accuracy of voiceprint feature comparison can be improved by comparing the multiple voiceprint features input by the user.
  • a first voiceprint feature comparison may be performed on the wake-up word input by the user to identify whether the person who uttered the wake-up word is the same person as the user at the time of registration.
  • the electronic device may perform a second voiceprint feature comparison on the command word input by the user to identify whether the person who uttered the command word is the same person as the registered user.
  • after unlocking, the text information contained in the command word can be parsed, so that tasks related to the command word can be performed. Therefore, the accuracy of voiceprint feature comparison can be improved, and the security of unlocking the electronic device can be improved.
  • the voiceprint feature comparison process does not require the user to manually participate, it can provide convenience for the user and make the unlocking process more intelligent and efficient.
  • Wake-up word: refers to a voice with fixed short text content.
  • Voiceprint recognition: a kind of biometric technology, also called speaker recognition, speaker identification, or speaker verification. It converts sound signals into electrical signals and then extracts voice features through computer technology, so as to identify the speaker.
  • Voiceprint feature comparison: refers to extracting voiceprint features from the voice input by the speaker and comparing them with the voiceprint features of the voice input by the user during registration.
  • references in this specification to "one embodiment” or “some embodiments” and the like mean that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application.
  • appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," etc. in various places in this specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments" unless specifically emphasized otherwise.
  • the terms "comprising", "including", "having" and their variants mean "including but not limited to" unless specifically emphasized otherwise.
  • "at least one" in the embodiments of the present application means one or more, and "multiple" means two or more.
  • words such as "first" and "second" are used only for the purpose of distinguishing the description, and should not be understood as indicating or implying relative importance, nor as indicating or implying order.
  • the electronic device is a mobile phone as an example for description.
  • Various application programs, which may be referred to as apps for short, can be installed in the mobile phone; an application is a software program capable of realizing one or more specific functions.
  • the applications include instant messaging applications, video applications, audio applications, image capturing applications, and the like.
  • instant messaging applications, for example, may include SMS applications, WeChat, WhatsApp Messenger, Line, photo sharing (Instagram), Kakao Talk, DingTalk, and the like.
  • Image capturing applications for example, may include camera applications (system cameras or third-party camera applications).
  • Video applications for example, may include applications such as Youtube, Twitter, Douyin, iQiyi, and Tencent Video.
  • Audio applications may include applications such as Kugou Music, Xiami, and QQ Music.
  • the applications mentioned in the following embodiments may be applications that have been installed when the electronic device leaves the factory, or may be applications downloaded by the user from the network or obtained from other electronic devices during the use of the electronic device.
  • An embodiment of the present application provides a method for unlocking an electronic device, and the method can be applied to any electronic device.
  • FIG. 2 a schematic diagram of a hardware structure of an electronic device provided by an embodiment of the present application is shown.
  • the electronic device may be a mobile phone (a mobile phone with a folding screen or a mobile phone with a non-folding screen), a tablet computer (a foldable tablet computer or a non-foldable tablet computer), and the like, as shown in FIG. 2.
  • the electronic device may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charge management module 140, a power management module 141, a battery 142, Antenna 1, Antenna 2, Mobile Communication Module 150, Wireless Communication Module 160, Audio Module 170, Speaker 170A, Receiver 170B, Microphone 170C, Headphone Interface 170D, Sensor Module 180, Key 190, Motor 191, Indicator 192, Camera 193, Display screen 194, and subscriber identification module (subscriber identification module, SIM) card interface 195 and so on.
  • the processor 110 may include one or more processing units; for example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), and the like.
  • different processing units may be independent devices, or may be integrated in one or more processors.
  • the controller can be the nerve center and command center of the electronic device. The controller can generate operation control signals according to the instruction opcode and timing signal, and complete the control of fetching and executing instructions.
  • a memory may also be provided in the processor 110 for storing instructions and data.
  • the memory in processor 110 is cache memory. This memory may hold instructions or data that have just been used or recycled by the processor 110 . If the processor 110 needs to use the instruction or data again, it can be called directly from the memory. Repeated accesses are avoided and the latency of the processor 110 is reduced, thereby increasing the efficiency of the system.
  • the processor 110 may receive the voice data input by the audio module 170, and the processor 110 may acquire the voiceprint feature of the voice data.
  • the USB interface 130 is an interface that conforms to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the charging management module 140 is used to receive charging input from the charger.
  • the power management module 141 is used for connecting the battery 142 , the charging management module 140 and the processor 110 .
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110 , the internal memory 121 , the external memory, the display screen 194 , the camera 193 , and the wireless communication module 160 .
  • the wireless communication function of the electronic device can be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modulation and demodulation processor, the baseband processor, and the like.
  • the mobile communication module 150 can provide a wireless communication solution including 2G/3G/4G/5G etc. applied on the electronic device.
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA) and the like.
  • the mobile communication module 150 can receive electromagnetic waves from the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modulation and demodulation processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modulation and demodulation processor, and then turn it into an electromagnetic wave for radiation through the antenna 1 .
  • the wireless communication module 160 can provide applications on electronic devices including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), bluetooth (BT), global navigation satellite systems (global navigation satellite system, GNSS), frequency modulation (frequency modulation, FM), near field communication technology (near field communication, NFC), infrared technology (infrared, IR) and other wireless communication solutions.
  • the antenna 1 of the electronic device is coupled with the mobile communication module 150, and the antenna 2 is coupled with the wireless communication module 160, so that the electronic device can communicate with the network and other devices through wireless communication technology.
  • the wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • the GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
  • the display screen 194 is used to display the display interface of the application and the like.
  • Display screen 194 includes a display panel.
  • the display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Miniled, a MicroLed, a Micro-oLed, a quantum dot light-emitting diode (QLED), and so on.
  • the electronic device may include 1 or N display screens 194 , where N is a positive integer greater than 1.
  • the display screen 194 may illuminate when the electronic device enters a wake-up mode, or may present a home interface to the user after the electronic device is unlocked.
  • Camera 193 is used to capture still images or video.
  • the camera 193 may include at least one camera, eg, a front-facing camera and a rear-facing camera.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes various functional applications and data processing of the electronic device by executing the instructions stored in the internal memory 121 .
  • the internal memory 121 may include a storage program area and a storage data area.
  • the storage program area may store the operating system, and the software code of at least one application (eg, iQIYI application, WeChat application, etc.).
  • the storage data area can store data (such as images, videos, etc.) generated during the use of the electronic device.
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, universal flash storage (UFS), and the like.
  • the internal memory 121 may store the voiceprint features at the time of user registration and a model for extracting the voiceprint features of the voice data.
  • the internal memory 121 may store the first voiceprint feature model and the second voiceprint feature model in this embodiment of the present application.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function, such as saving pictures, videos, and other files in the external memory card.
  • the electronic device can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone jack 170D, and the application processor.
  • the receiver 170B and the microphone 170C can receive voice data and the like input by the user.
  • the receiver 170B or the microphone 170C may be turned on when the electronic device is in the sleep mode, or may be turned on when the electronic device is in the awake mode.
  • the sensor module 180 may include a fingerprint sensor 180H, a touch sensor 180K, and the like.
  • the fingerprint sensor 180H is used to collect fingerprints. The electronic device can use the collected fingerprint characteristics to implement fingerprint unlocking, application lock access, fingerprint photographing, fingerprint call answering, and the like.
  • Touch sensor 180K also called “touch panel”.
  • the touch sensor 180K may be disposed on the display screen 194 , and the touch sensor 180K and the display screen 194 form a touch screen, also called a “touch screen”.
  • the touch sensor 180K is used to detect a touch operation on or near it.
  • the touch sensor can pass the detected touch operation to the application processor to determine the type of touch event.
  • Visual output related to touch operations may be provided through display screen 194 .
  • the touch sensor 180K may also be disposed on the surface of the electronic device, which is different from the location where the display screen 194 is located.
  • the keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys or touch keys.
  • the electronic device may receive key input and generate key signal input related to user settings and function control of the electronic device.
  • Motor 191 can generate vibrating cues.
  • the motor 191 can be used for vibrating alerts for incoming calls, and can also be used for touch vibration feedback. For example, touch operations acting on different applications (such as taking pictures, playing audio, etc.) can correspond to different vibration feedback effects.
  • the indicator 192 can be an indicator light, which can be used to indicate the charging state, the change of the power, and can also be used to indicate a message, a missed call, a notification, and the like.
  • the SIM card interface 195 is used to connect a SIM card. The SIM card can be inserted into the SIM card interface 195 or pulled out from the SIM card interface 195 to achieve contact and separation with the electronic device.
  • the components shown in FIG. 2 do not constitute a specific limitation on the mobile phone; the mobile phone may include more or fewer components than shown, or combine some components, or split some components, or use a different component layout.
  • the electronic device shown in FIG. 2 is used as an example for description.
  • FIG. 3 shows a block diagram of a software structure of an electronic device provided by an embodiment of the present application.
  • the software structure of an electronic device can be a layered architecture, for example, the software can be divided into several layers, and each layer has a clear role and division of labor. Layers communicate with each other through software interfaces.
  • the Android system is divided into four layers, which are, from top to bottom, an application layer, an application framework layer (framework, FWK), an Android runtime (Android runtime) and system libraries, and a kernel layer.
  • the application layer can include a series of application packages. As shown in FIG. 3, the application layer may include camera, settings, skin modules, user interface (UI), third-party applications, and the like. Among them, the third-party applications can include WeChat, QQ, gallery, calendar, call, map, navigation, WLAN, Bluetooth, music, video, short message, and so on.
  • the application framework layer provides an application programming interface (application programming interface, API) and a programming framework for applications in the application layer.
  • the application framework layer can include some predefined functions. As shown in FIG. 3, the application framework layer may include a window manager, content providers, a view system, a telephony manager, a resource manager, a notification manager, and the like.
  • a window manager is used to manage window programs.
  • the window manager can get the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, etc.
  • Content providers are used to store and retrieve data and make these data accessible to applications.
  • the data may include video, images, audio, calls made and received, browsing history and bookmarks, phone book, etc.
  • the view system includes visual controls, such as controls for displaying text, controls for displaying pictures, and so on. View systems can be used to build applications.
  • a display interface can consist of one or more views.
  • the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
  • the phone manager is used to provide the communication function of the electronic device 100, for example, the management of call status (connecting, hanging up, etc.).
  • the resource manager provides various resources for the application, such as localization strings, icons, pictures, layout files, video files and other resources.
  • the notification manager enables applications to display notification information in the status bar, which can be used to convey notification-type messages, and can disappear automatically after a brief pause without user interaction. For example, the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also display notifications in the status bar at the top of the system in the form of graphs or scroll bar text, such as notifications of applications running in the background, and notifications on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a prompt sound is issued, the electronic device vibrates, and the indicator light flashes.
  • the Android runtime includes core libraries and a virtual machine. Android runtime is responsible for scheduling and management of the Android system.
  • the core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core library of Android.
  • the application layer and the application framework layer run in virtual machines.
  • the virtual machine executes the java files of the application layer and the application framework layer as binary files.
  • the virtual machine is used to perform functions such as object lifecycle management, stack management, thread management, safety and exception management, and garbage collection.
  • a system library can include multiple functional modules. For example: surface manager (surface manager), media library (media library), 3D graphics processing library (eg: OpenGL ES), 2D graphics engine (eg: SGL), etc.
  • the Surface Manager is used to manage the display subsystem and provides a fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, and layer processing.
  • 2D graphics engine is a drawing engine for 2D drawing.
  • the system library can also include display/rendering services, energy-saving display control services.
  • the display/rendering service is used to determine the display digital stream, and the display digital stream includes display information of each pixel unit (hereinafter referred to as a pixel) on the display screen; the display information may include display brightness, display time, text information, image information, etc.
  • the kernel layer is the layer between hardware and software. The kernel layer contains at least display drivers, camera drivers, audio drivers, and sensor drivers.
  • the hardware layer may include various types of sensors, for example, an acceleration sensor, a gyroscope sensor, and a touch sensor involved in the embodiments of the present application.
  • the input speech of the speaker needs to be verified twice, so the voiceprint feature acquisition model in the embodiment of the present application can be divided into a first voiceprint feature model and a second voiceprint feature model.
  • the first voiceprint feature model is obtained by training according to the specified voice data input by the speaker
  • the second voiceprint feature model is obtained by training according to the random voice data input by the speaker.
  • the training method of the first voiceprint feature model in the embodiment of the present application is first introduced. Referring to FIG. 4 , the training method may include the following steps.
  • Step 401 Acquire first voice data of specified text content.
  • the first voice data specifying the text content here may be short text voice data specified by a wake-up word or the like.
  • a wake-up word such as "Hello Xiaoyi” that can wake up the electronic device.
  • the first voice data input by different speakers can be acquired, and each piece of acquired first voice data has been marked with the speaker.
  • the first speech data input by the speaker in different environments may be obtained.
  • the first voice data may be input in an environment such as a vehicle, indoors, or outdoors.
  • Step 402 Learning the first voice data marked with the speaker through the deep neural network model to obtain a first voiceprint feature model.
  • each piece of first voice data marked with a speaker may be used as input, and a voiceprint feature of each piece of first voice data may be obtained through deep learning. Since each piece of first voice data is marked with a speaker, the parameters of the first voiceprint feature model can be learned from the obtained correspondence between voiceprint features and speakers, so that the first voiceprint feature model can be constructed from the learned parameters. The first voiceprint feature model can extract the voiceprint feature of the input first voice data.
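  • As an illustration of this step, the following minimal PyTorch sketch trains a small network on speaker-labeled utterance features and uses an intermediate embedding as the voiceprint feature; the architecture, dimensions, and speaker-classification loss are assumptions for the example, since the patent only specifies a deep neural network trained on speaker-labeled voice data.

```python
import torch
import torch.nn as nn

NUM_SPEAKERS = 100  # hypothetical number of labeled speakers
FEAT_DIM = 40       # hypothetical acoustic feature dimension per utterance
EMBED_DIM = 128     # dimension of the extracted voiceprint feature

class VoiceprintNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(FEAT_DIM, 256), nn.ReLU(),
            nn.Linear(256, EMBED_DIM), nn.ReLU(),
        )
        self.classifier = nn.Linear(EMBED_DIM, NUM_SPEAKERS)

    def forward(self, x):
        emb = self.encoder(x)              # the voiceprint feature
        return self.classifier(emb), emb   # speaker logits + embedding

model = VoiceprintNet()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

# Dummy speaker-labeled batch standing in for the collected first voice data.
features = torch.randn(32, FEAT_DIM)
speakers = torch.randint(0, NUM_SPEAKERS, (32,))

optimizer.zero_grad()
logits, _ = model(features)
loss_fn(logits, speakers).backward()
optimizer.step()

# After training, the encoder alone serves as the voiceprint feature model.
_, voiceprint_feature = model(features[:1])
```

  • the same sketch applies in spirit to the second voiceprint feature model, only trained on the random-text command data instead of the wake-up word data.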
  • Step 403 Transplant the first voiceprint feature model obtained by training to the electronic device.
  • the first voiceprint feature model in the voiceprint feature models in the embodiment of the present application can be obtained by training. Therefore, the trained first voiceprint feature model can be transplanted to the electronic device through step 403, and the electronic device can use the first voiceprint feature model to extract the voiceprint feature from the first voice data input by the user.
  • now that the training method of the first voiceprint feature model in the embodiment of the present application has been introduced, the following describes the training method of the second voiceprint feature model. Referring to FIG. 4, the following steps may be included:
  • Step 401 Acquire second voice data of random text content.
  • the second voice data here may be voice data containing random text content of the control command.
  • it can be a control command such as "query today's weather", "play music” and so on.
  • the second voice data input by different speakers can be obtained, and the speaker can be marked for each piece of second voice data.
  • the second speech data of the speaker in different application scenarios can be obtained.
  • for example, in-vehicle scenarios, encyclopedia query scenarios, taxi-hailing scenarios, take-out scenarios, and other scenarios.
  • the speaker may input second voice data such as "Please turn on the navigation" or "Go to place A”.
  • the second voice data of the speaker in different environments can also be acquired.
  • the speaker can input second voice data such as "Please turn on the navigation", “Please play music”, etc. for the vehicle scene in the car.
  • the speaker can enter second voice data such as "how is the weather today" for the encyclopedia query scenario when indoors.
  • Step 402 Learning the second voice data marked with the speaker through the deep neural network model to obtain a second voiceprint feature model.
  • each piece of second voice data marked with a speaker may be used as input, and a voiceprint feature of each piece of second voice data may be obtained through deep learning. Since each piece of second voice data is marked with a speaker, the parameters of the second voiceprint feature model can be learned from the obtained correspondence between voiceprint features and speakers, so that the second voiceprint feature model can be constructed from the learned parameters. The second voiceprint feature model can extract the voiceprint feature of the input second voice data.
  • Step 403 Transplant the second voiceprint feature model obtained by training to the electronic device.
  • the second voiceprint feature model in the voiceprint feature model in the embodiment of the present application can be obtained by training. Therefore, the trained second voiceprint feature model can be transplanted to the electronic device through step 403, and the electronic device can use the second voiceprint feature model to extract the voiceprint feature from the second voice data input by the user.
  • the user can pre-register the voiceprint feature on the electronic device.
  • the electronic device may compare the voiceprint feature pre-registered by the user with the voiceprint feature extracted from the first voice data through voiceprint recognition, so as to determine whether the voiceprint recognition is successful.
  • the method of registering the voiceprint feature of the user is introduced.
  • the user can perform voiceprint feature registration on the electronic device according to the prompt.
  • the electronic device can display “please say, hello Xiaoyi” through the display screen.
  • the user can say "Hello, Xiaoyi” according to the prompt. Therefore, the electronic device can obtain the first registered voice data whose content input by the user is "Hello, Xiaoyi” through sensors such as a receiver and a microphone.
  • the electronic device may extract the first registered voiceprint feature from the first registered voice data input by the user by using the pre-trained first voiceprint feature model. In this way, the electronic device can acquire the first registered voiceprint feature of the user when he speaks the first registered voice data of the specified short text content.
  • the electronic device may store the first registered voiceprint feature in memory.
  • the electronic device can display "please say, Xiaoyi, send a message" through the display screen.
  • the user can say “Xiaoyi, send a message” according to the prompt. Therefore, the electronic device can obtain the second registered voice data input by the user as "Xiaoyi, send a message" through sensors such as a receiver and a microphone.
  • the electronic device can extract the second registered voiceprint feature from the second registered voice data input by the user by using the pre-trained second voiceprint feature model. In this way, the electronic device can acquire the second registered voiceprint feature when the user speaks the second registered voice data of random text content.
  • the electronic device may store the second registered voiceprint feature in memory.
  • the electronic device may also prompt the user to input the second registered voice data multiple times through the display screen, so as to obtain more second registered voiceprint features and improve the accuracy of voiceprint feature comparison.
  • the electronic device can also display prompts such as "Please say, Xiaoyi, how is the weather today" and "Please say, Xiaoyi, play some music".
  • the user can input the corresponding second registered voice data according to the prompt content displayed on the display screen, so that the electronic device can extract the second registered voiceprint feature of the second registered voice data.
  • the electronic device may combine the user's first registered voiceprint feature with the second registered voiceprint feature to obtain a third registered voiceprint feature.
  • specifically, the electronic device obtains the user's first registered voiceprint feature and second registered voiceprint feature through the first voiceprint feature model and the second voiceprint feature model, combines the two registered voiceprint features in a specified manner, and stores the combined result, that is, the third registered voiceprint feature.
  • when the first registered voiceprint feature and the second registered voiceprint feature are combined, the combination may be a simple one. For example, if the first registered voiceprint feature is A and the second registered voiceprint feature is B, the two can be combined to obtain the third registered voiceprint feature A+B.
  • alternatively, weights may be assigned to the first registered voiceprint feature and the second registered voiceprint feature; for example, the first registered voiceprint feature is assigned a weight of 0.4 and the second registered voiceprint feature a weight of 0.6. Each registered voiceprint feature is multiplied by its corresponding weight, and the weighted features are then merged.
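  • As a minimal sketch of this fusion step (numpy; the 0.4/0.6 weights follow the example above, while the use of weighted concatenation and all names are assumptions for illustration):

```python
import numpy as np

def fuse_voiceprints(feat_a: np.ndarray, feat_b: np.ndarray,
                     w_a: float = 0.4, w_b: float = 0.6) -> np.ndarray:
    """Combine two voiceprint features into a third one.

    A simple combination corresponds to w_a = w_b = 1.0; the weighted
    variant scales each feature by its weight before merging.
    """
    return np.concatenate([w_a * feat_a, w_b * feat_b])

# Registration: fuse the two registered features and store the result
# as the third registered (reference) voiceprint feature.
first_registered = np.random.rand(128)   # stand-in for the first model's output
second_registered = np.random.rand(128)  # stand-in for the second model's output
reference = fuse_voiceprints(first_registered, second_registered)
```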
  • FIG. 7 a schematic flowchart of a method for unlocking an electronic device provided by an embodiment of the present application, the method may be executed by the electronic device shown in FIG. 2 or FIG. 3 , and the process of the method includes:
  • Step 701 The electronic device receives the first voice data input by the user, and acquires the first voiceprint feature of the first voice data.
  • the electronic device may receive the first voice data input by the user through the receiver 170B or the microphone 170C as shown in FIG. 2 .
  • the first voice data here may be specified short text voice data, such as wake-up words and the like.
  • the user can say "Hello, Xiaoyi”
  • the electronic device can receive the voice data of "Hello, Xiaoyi” input by the user through the receiver 170B or the microphone 170C.
  • the user may input the first voice data when the display screen of the electronic device is not lit, that is, the screen is black. It should be understood that the receiver 170B or the microphone 170C of the electronic device may be on even if the display screen of the electronic device is not turned on. The user can say "hello" and other designated short text voice data.
  • the user can trigger the electronic device to turn on the receiver 170B or the microphone 170C by touching the display screen or pressing the button of the electronic device (the button 190 shown in FIG. 2 ) before inputting the first voice data.
  • the user can input the first voice data.
  • the electronic device may display prompt information on the display screen to prompt the user to input the first voice data.
  • prompt information such as “Please enter the voice password” can be displayed, prompting the user to input voice data to unlock the electronic device.
  • in some cases, the electronic device may fail to receive the first voice data input by the user. Therefore, the electronic device can also display prompt information through the display screen to prompt the user to re-input the first voice data.
  • the electronic device may display prompt information of "what did you say", prompting the user to re-input the first voice data.
  • if the user fails a specified number of times, the electronic device may prompt the user to unlock by fingerprint or by inputting a password, as shown in FIG. 1C.
  • the specified number of times here may be preset, for example, may be set to 3 times, 4 times, etc., which is not specifically limited in this application.
  • the electronic device may acquire the first voiceprint feature of the first voice data.
  • the electronic device may use the first voiceprint feature model obtained through steps 401 to 403 to perform voiceprint feature extraction on the first voice data to obtain the first voiceprint feature of the first voice data.
  • Step 702 The electronic device performs voiceprint feature comparison on the first voice data according to the first voiceprint feature.
  • the electronic device may compare the first voiceprint feature with the stored first registered voiceprint feature of the user to determine whether the speaker is the same person.
  • the electronic device may determine whether the above-mentioned voiceprint features match by computing the cosine distance d between the first voiceprint feature and the first registered voiceprint feature, where d satisfies the following formula (1):

    $d = 1 - \dfrac{x_i^{T} x_j}{\lVert x_i \rVert \, \lVert x_j \rVert}$    (1)

  • where x_i represents the first voiceprint feature extracted in step 701, x_j represents the first registered voiceprint feature obtained during registration, and T represents the transpose of the matrix.
  • a relationship between the cosine distance and the voiceprint feature comparison result may be maintained in advance. For example, when the cosine distance is less than a specified value, the matching is successful, and when the cosine distance is greater than or equal to the specified value, the matching fails. Therefore, the cosine distance can be obtained by the above formula (1) and compared with the specified value to judge whether the matching is successful, that is, whether the speaker is the same person.
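  • As an illustration, here is a minimal numpy sketch of formula (1) and the threshold check (the helper names and the 0.2 threshold are assumptions for the example, not values from the patent):

```python
import numpy as np

def cosine_distance(x_i: np.ndarray, x_j: np.ndarray) -> float:
    """Formula (1): d = 1 - (x_i^T x_j) / (||x_i|| * ||x_j||)."""
    cos_sim = x_i @ x_j / (np.linalg.norm(x_i) * np.linalg.norm(x_j))
    return 1.0 - float(cos_sim)

def is_same_speaker(extracted: np.ndarray, registered: np.ndarray,
                    threshold: float = 0.2) -> bool:
    # Matching succeeds when the distance is below the specified value.
    return cosine_distance(extracted, registered) < threshold
```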
  • Step 703 The electronic device performs text recognition on the first voice data when the first voiceprint feature comparison result of the first voice data indicates that the matching is successful.
  • the electronic device may perform text recognition on the first voice data when it is determined that the first voiceprint feature is successfully matched with the first registered voiceprint feature.
  • the electronic device can identify whether the specified text content exists in the first voice data.
  • the specified text content here may be a wake-up word of a short text, or the like.
  • the electronic device may pre-store the specified text content in the memory.
  • the specified text content may contain wake words with shorter text content such as "Hello”, “Hello, Xiaoyi” or "Are you there".
  • the electronic device may display a prompt message of the matching failure through the display screen.
  • the electronic device can display a prompt message of "unlock failure" through the display screen.
  • the electronic device can prompt the user to re-input the first voice data, so that the electronic device can perform voiceprint feature comparison according to the first voice data re-input by the user.
  • the electronic device may prompt the user to unlock the electronic device by fingerprint unlocking, face unlocking, or password unlocking.
  • the electronic device may display an interface for inputting a password or an interface for inputting a fingerprint through the display screen, which is used to prompt the user to unlock the electronic device through fingerprint unlocking or password unlocking.
  • alternatively, the voiceprint unlocking mode of the electronic device can be restarted. Referring to the accompanying figures, after the electronic device is locked and the voiceprint unlocking mode of the electronic device is restarted, the user can re-input the first voice data, so that the electronic device can perform voiceprint feature comparison on the first voice data.
  • the electronic device can lock for a specified duration.
  • the electronic device may display the specified locking duration on the display screen, and may display the remaining locking duration of the electronic device in a countdown manner.
  • the user can unlock the electronic device by means of voiceprint unlocking, password unlocking, fingerprint unlocking or face unlocking.
  • Step 704 The electronic device enters a wake-up mode when the specified text content exists in the first voice data.
  • when the electronic device determines that the specified text content exists in the first voice data, it can enter the wake-up mode. In the wake-up mode, the electronic device can turn on an audio module, such as a receiver or a microphone, for recording. Referring to FIG. 12, if the electronic device determines that the specified text content exists in the first speech recognition result, the electronic device can light up the display screen, turn on the audio module, and display prompt content such as "please say" or "I am here" through the display screen to prompt the user to continue entering control commands. It should be understood that the electronic device is still in a locked state at this time.
  • the electronic device may maintain a sleep state. For example, the electronic device can turn off the display screen, or the electronic device can turn off the audio module.
  • the electronic device may display prompt information such as "what did you say" or "I don't understand" on the display screen, prompting the user that the electronic device has not been awakened and the first voice data needs to be re-input so that the electronic device can perform voiceprint feature matching on the first voice data.
  • Step 705 The electronic device receives the second voice data input by the user, and acquires the second voiceprint feature of the second voice data.
  • the electronic device may enter the wake-up mode.
  • the electronic device can turn on the audio module for receiving the second voice data input by the user.
  • the second voice data here may be random voice data.
  • the second voice data may be voice data containing control commands.
  • in some cases, the electronic device may fail to receive the second voice data input by the user. Therefore, the electronic device can also display prompt information through the display screen to prompt the user to re-input the second voice data.
  • the electronic device may display the prompt information of "please say one more time" to prompt the user to re-input the second voice data.
  • the electronic device may prompt the user to unlock by fingerprint, face unlock or password, as shown in FIG. 1C .
  • the electronic device may enter a sleep mode. For example, the electronic device can turn off the display screen or turn off the audio module.
  • the electronic device may acquire the second voiceprint feature of the second voice data.
  • the electronic device may use the second voiceprint feature model obtained through steps 401 to 403 to perform voiceprint feature extraction on the second voice data to obtain the second voiceprint feature of the second voice data.
  • Step 706: The electronic device performs voiceprint feature comparison on the second voice data according to the second voiceprint feature.
  • The electronic device may compare the second voiceprint feature with the stored second registered voiceprint feature of the user to determine whether the speaker is the same person.
  • In one example, the electronic device may obtain the cosine distance d between the second voiceprint feature and the second registered voiceprint feature using formula (1), where x_i represents the second voiceprint feature extracted in step 705, x_j represents the second registered voiceprint feature obtained at registration, and T represents the matrix transpose. A small illustrative sketch of this computation follows.
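  • As an illustration of this scoring step, the following is a minimal sketch (not taken from the patent) of how the comparison in formula (1) could be computed. The threshold value is a hypothetical placeholder, since the patent only states that a preset value is maintained. Note that formula (1) is an inner product, which grows with similarity; the sketch therefore treats a larger d as a better match, whereas the surrounding text speaks of a "distance" below a preset value. That reading is an assumption, not the patent's exact wording.

    import numpy as np

    def voiceprint_score(x_i: np.ndarray, x_j: np.ndarray) -> float:
        """Formula (1): d = cos(x_i, x_j) = x_i^T * x_j.

        Assumes the embeddings were L2-normalized by the feature model,
        so the inner product equals the cosine of the angle between them.
        """
        return float(x_i @ x_j)

    def is_same_speaker(x_i: np.ndarray, x_j: np.ndarray,
                        threshold: float = 0.7) -> bool:
        # Hypothetical decision rule: the patent maintains a preset value
        # against which d is compared; 0.7 is an illustrative choice only.
        return voiceprint_score(x_i, x_j) >= threshold

    # Example with toy L2-normalized embeddings
    probe = np.array([0.6, 0.8])
    enrolled = np.array([0.8, 0.6])
    print(is_same_speaker(probe, enrolled))  # score 0.96 -> True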
  • In another example, the electronic device may merge the first voiceprint feature acquired in step 701 with the second voiceprint feature acquired in step 705 in a specified manner. It should be understood that the manner of merging the two features must be the same as the manner in which the first registered voiceprint feature and the second registered voiceprint feature were merged when the user registered. The electronic device may then compare the third voiceprint feature obtained by merging with the stored third registered voiceprint feature of the user to determine whether the speaker is the same person.
  • For example, if at registration the weight of the first registered voiceprint feature was 0.4 and the weight of the second registered voiceprint feature was 0.6, and the two were merged according to these weights to obtain the third registered voiceprint feature, then the first voiceprint feature acquired in step 701 is likewise weighted 0.4 and the second voiceprint feature acquired in step 705 is weighted 0.6. The electronic device merges them according to these weights to obtain the third voiceprint feature, and compares it with the stored third registered voiceprint feature.
  • Here too the electronic device can obtain the cosine distance d between the third voiceprint feature and the third registered voiceprint feature through formula (1), where x_i represents the third voiceprint feature, x_j represents the third registered voiceprint feature obtained at registration, and T represents the matrix transpose.
  • Step 707: The electronic device performs text recognition on the second voice data when the second voiceprint feature comparison result of the second voice data indicates a successful match.
  • The electronic device may then identify whether a control instruction exists in the second voice data and, if so, respond to the control instruction.
  • Referring to FIG. 14, the user inputs the second voice data "Play some music". When the electronic device determines that the second voiceprint feature comparison result indicates a successful match, it performs text recognition on the second voice data, determines that a control instruction exists, and responds by opening an application that can play music and playing music at random.
  • If the second voiceprint feature comparison result indicates a failed match, the electronic device may display a prompt such as "unlock failed" on the display screen to let the user know that this voiceprint unlocking attempt failed, and may prompt the user to re-input the second voice data so that the user can unlock the device again.
  • Alternatively, the electronic device may end this voiceprint unlocking operation and turn off the display screen.
  • In that case, the user can input the first voice data again, and the electronic device can perform voiceprint feature comparison on the first and second voice data input by the user through the method shown in steps 701 to 707 and determine whether to unlock.
  • If the number of failed second voiceprint feature comparisons reaches a specified threshold, the electronic device may prompt the user to unlock the device by fingerprint, face, or password. Referring to FIG. 1C, the electronic device may display an interface for entering a password and/or an interface for entering a fingerprint on the display screen.
  • The electronic device may also lock itself for a specified duration. Referring to FIG. 11B, it may display the specified locking duration on the display screen and show the remaining locking duration as a countdown. When the locking duration expires, the user can unlock the electronic device by voiceprint, password, fingerprint, or face.
  • Based on the above solution, when the user needs to unlock the electronic device, the user can input voice data to it. The electronic device performs voiceprint feature comparison separately on the two pieces of voice data input by the user, that is, on the wake-up word and on the control command, and unlocks only when both comparisons indicate a successful match. This improves the accuracy of the voiceprint comparison results, effectively defends against attacks such as imitation, recording, and synthesis, and improves the security of voiceprint unlocking.
  • In addition, because both voiceprint feature comparisons are performed on voice data the user has already input, no manual operation is required, which is convenient for the user.
  • Referring to FIG. 15, an exemplary flowchart of a method for unlocking an electronic device may include the following steps.
  • Step 1501: The electronic device obtains the wake-up word input by the user through the microphone (MIC). The wake-up word here can be preset specified short text content, such as "Hello", "Hello, Xiaoyi", "Hi", or "Good morning".
  • Step 1502: The electronic device performs text recognition on the wake-up word, that is, it determines whether the wake-up word is the same as the preset specified short text content.
  • Step 1503: The electronic device extracts the first voiceprint feature from the wake-up word input by the user, which it can do using the pre-trained first voiceprint feature model.
  • Optionally, step 1502 may be performed before step 1503, step 1503 may be performed before step 1502, or the two steps may be performed simultaneously.
  • Step 1504: The electronic device performs text comparison and voiceprint feature comparison. It determines whether the wake-up word input by the user is the preset specified short text content, and whether the first voiceprint feature extracted in step 1503 matches the voiceprint feature from the user's registration. If both comparisons pass, step 1505 may be performed; if either fails, the operation ends.
  • Step 1505: The electronic device enters the wake-up mode and turns on the screen. In the wake-up mode, the electronic device can start MIC recording to obtain voice data input by the user.
  • Step 1506: The electronic device acquires the command word input by the user through MIC recording. The command word here may be a word for controlling the electronic device to perform a corresponding operation, for example words related to control commands such as "turn on navigation", "play some music", or "check the weather".
  • Step 1507: The electronic device extracts the second voiceprint feature from the command word, which it can do using the pre-trained second voiceprint feature model.
  • Step 1508: The electronic device merges the first voiceprint feature and the second voiceprint feature in the specified manner to obtain a third voiceprint feature, and verifies it by comparing it with the third voiceprint feature obtained during registration to determine whether the speaker is the same person. If the speaker is the same person, step 1509 may be performed; if not, the operation ends.
  • Step 1509: The electronic device recognizes the command word and parses the task to be performed. The electronic device may perform ASR on the command word to determine the text-related information it contains.
  • Step 1510: The electronic device unlocks and executes the task corresponding to the command word.
  • Optionally, in step 1508, the electronic device may unlock directly after determining that the speaker is the same person, that is, proceed directly to step 1509.
  • Referring to FIG. 16, user A is driving a vehicle and needs to turn on navigation on the electronic device. User A can therefore say "Hello, please open the navigation". Here, "Hello" may be considered the first voice data, and "please open the navigation" may be considered the second voice data.
  • The microphone of the electronic device is on, so when user A says "hello", the electronic device obtains the first voice data "hello" through the microphone. The electronic device can then turn off the microphone and extract the first voiceprint feature from the first voice data through the pre-trained first voiceprint feature model.
  • The first voiceprint feature is compared with the first registered voiceprint feature obtained during registration, establishing that user A and the user at registration are the same person.
  • The electronic device can perform text recognition on the first voice data and determine that the specified text content exists in it.
  • The electronic device can then enter the wake-up mode, turn on the microphone again, obtain the second voice data "please open the navigation" input by user A, and extract the second voiceprint feature through the pre-trained second voiceprint feature model.
  • The electronic device may merge the first voiceprint feature with the second voiceprint feature and compare the result with the third registered voiceprint feature from registration, again determining that user A and the user at registration are the same person.
  • The electronic device can then unlock and perform text recognition on the input "please open the navigation". Recognizing that the user needs to open navigation (that is, the control command is "open navigation"), the electronic device can open an application capable of navigation.
  • As shown in FIG. 17, the electronic device may include one or more processors 1701, one or more memories 1702, and one or more display screens 1703.
  • The one or more memories 1702 store one or more computer programs, and the one or more computer programs include instructions. By way of example, one processor 1701 and one memory 1702 are shown in FIG. 17.
  • When executed by the one or more processors 1701, the instructions cause the electronic device 1700 to perform the following steps:
  • in a locked-screen state, receiving the first voice data input by the user and extracting the first voiceprint feature of the first voice data; remaining in the locked-screen state when the comparison result between the first voiceprint feature and the preset reference voiceprint feature indicates a match and the first voice data includes the specified text; receiving the second voice data input by the user, where the second voice data includes a control instruction used to trigger at least one functional requirement of the user; extracting the second voiceprint feature of the second voice data; and when the comparison result between the second voiceprint feature and the preset reference voiceprint feature indicates a match, unlocking the screen and executing the control instruction to trigger the at least one functional requirement.
  • For the specified text and the control instruction, reference may be made to the relevant descriptions in the method embodiment shown in FIG. 7, which are not repeated here.
  • In one design, the processor 1701 further performs the following steps: receiving the first registration voice data input by the user, where the first registration voice data contains the specified text; acquiring the first registered voiceprint feature of the first registration voice data; receiving the second registration voice data input by the user, where the second registration voice data and the first registration voice data come from the same user; and acquiring the second registered voiceprint feature of the second registration voice data. The memory 1702 stores the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature. For the first and second registered voiceprint features, reference may be made to the relevant descriptions in the method embodiment shown in FIG. 7.
  • In one design, the processor 1701 further performs the following step: merging the first registered voiceprint feature and the second registered voiceprint feature to obtain a third registered voiceprint feature. The memory 1702 stores the third registered voiceprint feature as the reference voiceprint feature. For the third registered voiceprint feature, reference may be made to the relevant descriptions in the method embodiment shown in FIG. 7.
  • In one design, the processor 1701 is specifically configured to perform the following steps: merging the second voiceprint feature and the first voiceprint feature to obtain a third voiceprint feature, and unlocking the screen when the comparison result of the third voiceprint feature indicates a match. For the third voiceprint feature, reference may be made to the relevant descriptions in the method embodiment shown in FIG. 7.
  • In one design, the processor 1701 is specifically configured to perform the following steps: acquiring the first voiceprint feature of the first voice data using the pre-trained first voiceprint feature model, and acquiring the second voiceprint feature of the second voice data using the pre-trained second voiceprint feature model.
  • It should be noted that the division of units in the embodiments of this application is schematic and is only a logical function division; other division manners are possible in actual implementation. The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. For example, the first acquisition unit and the second acquisition unit may be the same unit or different units. The integrated units may be implemented in the form of hardware, or may be implemented in the form of software functional units.
  • As used in the above embodiments, depending on the context, the term "when" may be interpreted to mean "if", "after", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrases "when determining" or "if detecting (the stated condition or event)" may be interpreted to mean "if determining", "in response to determining", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)".
  • The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented in whole or in part in the form of a computer program product.
  • The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, all or part of the processes or functions described in the embodiments of this application are produced.
  • The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus.
  • The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another. For example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line) or wirelessly (for example, infrared, radio, or microwave).
  • The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media.
  • The available media may be magnetic media (for example, floppy disks, hard disks, or magnetic tapes), optical media (for example, DVDs), or semiconductor media (for example, solid-state drives), and the like.

Abstract

This application provides a method and apparatus for unlocking an electronic device, relating to the field of intelligent terminal technologies and intended to improve the unlocking security of terminal devices. In the method, an electronic device can, in a locked-screen state, receive first voice data input by a user and extract a first voiceprint feature, and can remain in the locked-screen state when the comparison result between the first voiceprint feature and a preset reference voiceprint feature indicates a match and the first voice data contains a specified text. The electronic device can extract a second voiceprint feature from second voice data, and when the comparison result between the second voiceprint feature and the preset reference voiceprint feature indicates a match, unlock the screen and execute a control instruction. In this way, when the user needs to unlock the electronic device, the user can input voice data to it; the electronic device performs voiceprint feature comparison separately on the wake-up word and the control command input by the user, and unlocks only when both comparisons indicate a successful match, thereby improving the security of voiceprint unlocking.

Description

Method and Apparatus for Unlocking an Electronic Device
CROSS-REFERENCE TO RELATED APPLICATION
This application claims priority to Chinese Patent Application No. 202011188127.2, filed with the Chinese Patent Office on October 30, 2020 and entitled "Method and Apparatus for Unlocking an Electronic Device", which is incorporated herein by reference in its entirety.
TECHNICAL FIELD
This application relates to the field of intelligent terminal technologies, and in particular, to a method and apparatus for unlocking an electronic device.
BACKGROUND
Current unlocking schemes for terminal devices in the industry are based on faces, fingerprints, irises, or voice. Most existing voice-based unlocking schemes perform voiceprint recognition on a wake-up word input by the user and unlock the terminal device accordingly. However, because wake-up words are short text content, the accuracy of voiceprint recognition is low and cannot meet unlocking security requirements. Such schemes also cannot defend against attacks such as recording, speech synthesis, and voice imitation, so their security is low.
To improve unlocking security, the terminal device may alternatively be unlocked manually by the user after the user inputs the wake-up word for voiceprint recognition. However, this requires manual intervention by the user, which sacrifices the convenience of voice unlocking and is not intelligent enough.
SUMMARY
Embodiments of this application provide a method and apparatus for unlocking an electronic device, which reduce the user's manual operations while improving the unlocking security of a terminal device, so as to make the unlocking process more efficient.
According to a first aspect, an embodiment of this application provides a method for unlocking an electronic device. The method may be performed by the electronic device provided in this application, or by a chip with functions similar to those of the electronic device. In the method, the electronic device may, in a locked-screen state, receive first voice data input by a user and extract a first voiceprint feature of the first voice data. The electronic device may remain in the locked-screen state when the comparison result between the first voiceprint feature and a preset reference voiceprint feature indicates a match and the first voice data contains a specified text. The electronic device may receive second voice data input by the user. The second voice data may contain a control instruction, and the control instruction may be used to trigger at least one functional requirement of the user. The electronic device may extract a second voiceprint feature of the second voice data. When the comparison result between the second voiceprint feature and the preset reference voiceprint feature indicates a match, the electronic device may unlock the screen and execute the control instruction to trigger the at least one functional requirement.
Based on the above solution, when the user needs to unlock the electronic device, the user can input voice data to it, and the electronic device performs voiceprint feature comparison separately on the two pieces of voice data input by the user, that is, on the wake-up word and on the control command. The electronic device unlocks only when both voiceprint comparisons indicate a successful match. This improves the accuracy of the voiceprint comparison results, effectively defends against attacks such as imitation, recording, and synthesis, and improves the security of voiceprint unlocking. In addition, because both voiceprint comparisons are performed on voice data input by the user, no manual operation is required, which is convenient for the user. Furthermore, because the second piece of voice data carries a control instruction, the electronic device can execute the function the user needs immediately after successful unlocking, which also makes fulfilling a user requirement more efficient.
In a possible implementation, the electronic device may receive first registration voice data input by the user, where the first registration voice data contains the specified text. The electronic device may obtain a first registered voiceprint feature of the first registration voice data; receive second registration voice data input by the user, where the second registration voice data and the first registration voice data come from the same user; obtain a second registered voiceprint feature of the second registration voice data; and store the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature.
Based on this solution, the electronic device can obtain the user's voiceprint features from the first and second registration voice data as the reference voiceprint feature, to be used when the user unlocks the electronic device by voiceprint.
In a possible implementation, the electronic device may merge the first registered voiceprint feature and the second registered voiceprint feature to obtain a third registered voiceprint feature, and store the third registered voiceprint feature as the reference voiceprint feature.
Based on this solution, the electronic device fuses the two voiceprint features, which improves the accuracy of voiceprint comparison and effectively defends against attacks such as voice imitation, speech synthesis, and recording.
In a possible implementation, the electronic device may merge the second voiceprint feature and the first voiceprint feature to obtain a third voiceprint feature, and unlock the screen when the comparison result of the third voiceprint feature indicates a match.
Based on this solution, the electronic device merges the first and second voiceprint features extracted from the first and second voice data input during voiceprint unlocking and compares the result with the stored reference voiceprint feature, which improves the accuracy of voiceprint comparison.
In a possible implementation, the electronic device may use a pre-trained first voiceprint feature model to obtain the first voiceprint feature of the first voice data, where the first voiceprint feature model is trained on multiple pieces of the first voice data annotated with speakers; and use a pre-trained second voiceprint feature model to obtain the second voiceprint feature of the second voice data, where the second voiceprint feature model is trained on multiple pieces of the second voice data annotated with speakers.
Based on this solution, the electronic device obtains the voiceprint features of the user's input voice data from pre-trained voiceprint feature models, so the user's voiceprint features can be extracted quickly and accurately.
An embodiment of this application provides an electronic device, for example a foldable-screen electronic device. The electronic device includes one or more processors, a memory, multiple application programs, and one or more computer programs, where the one or more computer programs are stored in the memory and include instructions that, when executed by the electronic device, cause the electronic device to perform the technical solution of the first aspect or any possible implementation of the first aspect.
According to a third aspect, an embodiment of this application provides a chip, where the chip is coupled to a memory in an electronic device and is configured to invoke a computer program stored in the memory and perform the technical solution of the first aspect or any possible design of the first aspect. In the embodiments of this application, "coupled" means that two components are joined to each other directly or indirectly.
According to a fourth aspect, an embodiment of this application further provides a circuit system. The circuit system may be one or more chips, for example a system-on-a-chip (SoC). The circuit system includes at least one processing circuit configured to perform the technical solution of the first aspect or any possible design of the first aspect.
According to a fifth aspect, an embodiment of this application further provides an electronic device, where the electronic device includes modules/units that perform the method of the first aspect or any possible design of the first aspect; these modules/units may be implemented by hardware, or by hardware executing corresponding software.
According to a sixth aspect, an embodiment of this application further provides a computer-readable storage medium, where the computer-readable storage medium includes a computer program that, when run on an electronic device, causes the electronic device to perform the technical solution of the first aspect or any possible design of the first aspect.
According to a seventh aspect, a program product in an embodiment of this application includes instructions that, when the program product runs on an electronic device, cause the electronic device to perform the technical solution of the first aspect or any possible design of the first aspect.
In addition, for the beneficial effects of the second to seventh aspects, refer to the beneficial effects of the first aspect; details are not repeated here.
BRIEF DESCRIPTION OF DRAWINGS
FIG. 1A is a schematic diagram of recording an unlocking voice for an electronic device according to an embodiment of this application;
FIG. 1B is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 1C is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 2 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of this application;
FIG. 3 is a schematic diagram of a software structure of an electronic device according to an embodiment of this application;
FIG. 4 is a schematic flowchart of training a voiceprint feature model according to an embodiment of this application;
FIG. 5A is a schematic diagram of recording an unlocking voice for an electronic device according to an embodiment of this application;
FIG. 5B is a schematic diagram of recording an unlocking voice for an electronic device according to an embodiment of this application;
FIG. 6 is a schematic flowchart of fusing voiceprint features according to an embodiment of this application;
FIG. 7 is an exemplary flowchart of a method for unlocking an electronic device according to an embodiment of this application;
FIG. 8A is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 8B is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 9 is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 10 is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 11A is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 11B is a schematic diagram of an electronic device being locked according to an embodiment of this application;
FIG. 12 is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 13 is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 14 is a schematic diagram of unlocking an electronic device according to an embodiment of this application;
FIG. 15 is an exemplary flowchart of a method for unlocking an electronic device according to an embodiment of this application;
FIG. 16 is a schematic diagram of a scenario of unlocking an electronic device according to an embodiment of this application;
FIG. 17 is a block diagram of an electronic device according to an embodiment of this application.
DESCRIPTION OF EMBODIMENTS
At present, a terminal device can be unlocked by face, iris, fingerprint, or voice. A voice unlocking procedure is as follows:
Referring to FIG. 1A, the user can record an unlocking voice in advance, and the terminal device can obtain the user's voiceprint feature from the recorded unlocking voice and store it. When the user needs to unlock the terminal device, referring to FIG. 1B, the user inputs the aforementioned unlocking voice; the terminal device obtains a voiceprint feature from it and compares it with the stored voiceprint feature. When the two are consistent, the terminal device unlocks and presents the home screen to the user, which includes multiple application icons, as shown in FIG. 1B.
However, in the above voiceprint unlocking scheme, the unlocking voice is generally a voice of specified short text content such as a wake-up word; for example, only a set phrase such as "please turn on the screen" may be input. Relatively few voiceprint features can therefore be extracted from the unlocking voice. Moreover, because voice is strongly affected by the external environment, the voiceprint feature obtained when the unlocking voice was recorded in advance may also be inaccurate. As a result, the terminal device may fail to unlock correctly; that is, voice-based unlocking has low accuracy. Furthermore, because the unlocking voice is specified short text, the scheme also cannot defend against attacks such as recording, speech synthesis, and voice imitation, so its security is low as well.
To improve the security of voice-based unlocking, currently, if unlocking fails after the user inputs the unlocking voice and voiceprint comparison is performed, the user may further need to manually input information such as a password or a fingerprint for additional verification. As shown in FIG. 1C, the terminal device may display "Voice unlocking failed; please choose fingerprint unlocking or password unlocking", and the user may then enter a numeric password on the numeric keypad or enter fingerprint information in the fingerprint recognition area. However, this is not intelligent and is rather cumbersome for the user, losing the convenience and efficiency that characterize voiceprint unlocking.
In view of this, this application proposes a new unlocking scheme for electronic devices to avoid the above problems and improve the security and efficiency of voice-based unlocking. The embodiments of this application are applicable to various types of electronic devices, for example electronic devices with curved screens, full screens, or foldable screens, such as mobile phones, tablet computers, wearable devices (for example, watches, wristbands, and smart helmets), in-vehicle devices, smart home devices, augmented reality (AR)/virtual reality (VR) devices, notebook computers, ultra-mobile personal computers (UMPC), netbooks, and personal digital assistants (PDA); no limitation is imposed here.
In the method provided by the embodiments of this application, considering that wake-up words are mostly specified short text content, voiceprint comparison based on a wake-up word alone is not accurate enough, and it cannot solve the low-security problem caused by attacks such as recording, speech synthesis, or voice imitation. It is therefore proposed to compare voiceprint features extracted from multiple user inputs to improve comparison accuracy. In the embodiments of this application, a first voiceprint comparison may first be performed on the wake-up word input by the user to identify whether the person who spoke the wake-up word is the same person as the user at registration. When the recognition result indicates the same person, the electronic device may perform a second voiceprint comparison on the command word input by the user to identify whether the person who spoke the command word is the same person as the user at registration. When this recognition result again indicates the same person, the device may unlock and parse the text-related information contained in the command word, so that the task related to the command word can be executed. Accordingly, the accuracy of voiceprint comparison and the security of unlocking the electronic device are improved. In addition, because the voiceprint comparison process requires no manual participation by the user, it is convenient for the user and makes the unlocking process more intelligent and efficient.
To facilitate a detailed understanding of the technical solutions provided in the embodiments of this application, terms appearing in the embodiments are explained first.
1) Wake-up word: a voice of fixed short text content.
2) Voiceprint recognition: a type of biometric recognition, also called speaker recognition, speaker identification, or speaker verification. It converts a person's voice signal into an electrical signal and then uses computer technology to extract voice features so as to identify the speaker.
3) Voiceprint feature comparison: extracting a voiceprint feature from the voice input by a speaker and comparing it with the voiceprint feature of the voice input by the user at registration time.
The technical solutions in the embodiments of this application are described in detail below with reference to the accompanying drawings of the following embodiments.
The terms used in the following embodiments are only for the purpose of describing particular embodiments and are not intended to limit this application. As used in the specification and the appended claims of this application, the singular expressions "a", "an", "the", "the above", "this", and "the one" are intended to also include expressions such as "one or more", unless the context clearly indicates otherwise. It should also be understood that in the embodiments of this application, "one or more" means one, two, or more; "and/or" describes an association relationship between associated objects and indicates that three relationships may exist. For example, A and/or B may represent: A alone, both A and B, and B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects.
Reference to "one embodiment" or "some embodiments" and the like in this specification means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of this application. Thus, the statements "in one embodiment", "in some embodiments", "in some other embodiments", "in still other embodiments", and the like appearing in different places in this specification do not necessarily all refer to the same embodiment, but rather mean "one or more but not all embodiments", unless specifically emphasized otherwise. The terms "include", "comprise", "have", and their variants all mean "including but not limited to", unless specifically emphasized otherwise.
"At least one" in the embodiments of this application includes one or more, where "multiple" means two or more. In addition, it should be understood that in the description of this application, words such as "first" and "second" are used only for the purpose of distinguishing the description, and cannot be understood as indicating or implying relative importance or as indicating or implying an order.
In the following embodiments, an electronic device that is a mobile phone is used as an example. Various application programs (apps) may be installed on the phone, that is, software programs capable of implementing one or more specific functions. For example, applications include instant messaging applications, video applications, audio applications, and image capture applications. Instant messaging applications may include, for example, Messages, WeChat, WhatsApp Messenger, Line, Instagram, Kakao Talk, and DingTalk. Image capture applications may include, for example, camera applications (the system camera or third-party camera applications). Video applications may include, for example, YouTube, Twitter, TikTok, iQIYI, and Tencent Video. Audio applications may include, for example, KuGou Music, Xiami Music, and QQ Music. The applications mentioned in the following embodiments may be applications already installed when the electronic device leaves the factory, or applications downloaded from the network or obtained from other electronic devices by the user in the course of using the electronic device.
An embodiment of this application provides a method for unlocking an electronic device, and the method is applicable to any electronic device. FIG. 2 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of this application. The electronic device may be a mobile phone (a foldable-screen or non-foldable-screen phone), a tablet computer (foldable or non-foldable), or the like. As shown in FIG. 2, the electronic device may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headset jack 170D, a sensor module 180, buttons 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, and the like.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). Different processing units may be independent components or may be integrated into one or more processors. The controller may be the nerve center and command center of the electronic device. The controller can generate operation control signals according to instruction opcodes and timing signals, and control instruction fetching and instruction execution. A memory may further be provided in the processor 110 to store instructions and data. In some embodiments, the memory in the processor 110 is a cache, which can store instructions or data the processor 110 has just used or uses cyclically. If the processor 110 needs to use the instructions or data again, it can call them directly from this memory, avoiding repeated accesses, reducing the waiting time of the processor 110, and thereby improving system efficiency. The processor 110 may receive voice data input through the audio module 170, and may obtain the voiceprint feature of the voice data.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The charging management module 140 is configured to receive charging input from a charger. The power management module 141 is configured to connect the battery 142 and the charging management module 140 to the processor 110. The power management module 141 receives input from the battery 142 and/or the charging management module 140 and supplies power to the processor 110, the internal memory 121, the external memory, the display screen 194, the camera 193, the wireless communication module 160, and the like.
The wireless communication function of the electronic device may be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like. The mobile communication module 150 can provide solutions for wireless communication, including 2G/3G/4G/5G, applied to the electronic device. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like. The mobile communication module 150 may receive electromagnetic waves through the antenna 1, filter and amplify the received electromagnetic waves, and transmit them to the modem processor for demodulation. The mobile communication module 150 may also amplify signals modulated by the modem processor and convert them into electromagnetic waves radiated through the antenna 1.
The wireless communication module 160 can provide solutions for wireless communication applied to the electronic device, including wireless local area networks (WLAN) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), global navigation satellite systems (GNSS), frequency modulation (FM), near field communication (NFC), and infrared (IR) technologies.
In some embodiments, the antenna 1 of the electronic device is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device can communicate with networks and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies. The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the BeiDou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
The display screen 194 is configured to display the display interfaces of applications and the like. The display screen 194 includes a display panel. The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, quantum dot light-emitting diodes (QLED), or the like. In some embodiments, the electronic device may include 1 or N display screens 194, where N is a positive integer greater than 1. In some embodiments, the display screen 194 may light up when the electronic device enters the wake-up mode, and may present the home screen to the user after the electronic device is unlocked.
The camera 193 is configured to capture still images or video. In some embodiments, the camera 193 may include at least one camera, for example one front camera and one rear camera.
The internal memory 121 may be configured to store computer-executable program code, where the executable program code includes instructions. The processor 110 executes various functional applications and data processing of the electronic device by running the instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store the operating system and the software code of at least one application program (for example, the iQIYI application or the WeChat application). The data storage area may store data generated during use of the electronic device (for example, images and video). In addition, the internal memory 121 may include high-speed random access memory, and may further include nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, or universal flash storage (UFS). In some embodiments, the internal memory 121 may store the voiceprint features from the user's registration and the models used to extract voiceprint features from voice data; for example, the internal memory 121 may store the first voiceprint feature model and the second voiceprint feature model in the embodiments of this application.
The external memory interface 120 may be configured to connect an external memory card, such as a Micro SD card, to extend the storage capacity of the electronic device. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example saving files such as pictures and videos in the external memory card.
The electronic device can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset jack 170D, the application processor, and the like. For example, the receiver 170B and the microphone 170C can receive voice data input by the user. The receiver 170B or the microphone 170C may be on while the electronic device is in the sleep mode, or may be turned on when the electronic device is in the wake-up mode.
The sensor module 180 may include a fingerprint sensor 180H, a touch sensor 180K, and the like.
The fingerprint sensor 180H is configured to collect fingerprints. The electronic device can use the collected fingerprint characteristics to implement fingerprint unlocking, application-lock access, fingerprint photographing, fingerprint call answering, and the like.
The touch sensor 180K is also called a "touch panel". The touch sensor 180K may be disposed on the display screen 194; the touch sensor 180K and the display screen 194 form a touchscreen, also called a "touch screen". The touch sensor 180K is configured to detect touch operations on or near it. The touch sensor may pass the detected touch operation to the application processor to determine the touch event type. Visual output related to the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may also be disposed on the surface of the electronic device at a position different from that of the display screen 194.
The buttons 190 include a power button, volume buttons, and the like. The buttons 190 may be mechanical buttons or touch buttons. The electronic device can receive button input and generate key signal input related to user settings and function control of the electronic device. The motor 191 can generate vibration prompts; it can be used for incoming-call vibration prompts as well as for touch vibration feedback. For example, touch operations acting on different applications (such as photographing and audio playback) can correspond to different vibration feedback effects. The indicator 192 may be an indicator light, and may be used to indicate the charging status and battery level changes, or to indicate messages, missed calls, notifications, and the like. The SIM card interface 195 is configured to connect a SIM card. The SIM card can be inserted into or removed from the SIM card interface 195 to contact or separate from the electronic device.
It can be understood that the components shown in FIG. 2 do not constitute a specific limitation on the mobile phone; the phone may include more or fewer components than shown, combine some components, split some components, or arrange the components differently. In the following embodiments, the electronic device shown in FIG. 2 is used as an example.
FIG. 3 shows a block diagram of the software structure of an electronic device according to an embodiment of this application. As shown in FIG. 3, the software structure of the electronic device may be a layered architecture; for example, the software may be divided into several layers, each with a clear role and division of labor, and the layers communicate through software interfaces. In some embodiments, the Android system is divided into four layers: from top to bottom, the application layer, the application framework layer (FWK), the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages. As shown in FIG. 3, the application layer may include Camera, Settings, a skin module, a user interface (UI), third-party applications, and the like. The third-party applications may include WeChat, QQ, Gallery, Calendar, Phone, Maps, Navigation, WLAN, Bluetooth, Music, Video, Messages, and the like.
The application framework layer provides application programming interfaces (APIs) and a programming framework for applications in the application layer. The application framework layer may include some predefined functions. As shown in FIG. 3, the application framework layer may include a window manager, content providers, a view system, a telephony manager, a resource manager, a notification manager, and the like.
The window manager is used to manage window programs; it can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, and so on. Content providers are used to store and retrieve data and make the data accessible to applications. The data may include videos, images, audio, calls made and received, browsing history and bookmarks, contacts, and the like.
The view system includes visual controls, such as controls for displaying text and controls for displaying pictures, and can be used to build applications. A display interface may consist of one or more views; for example, a display interface including an SMS notification icon may include a view for displaying text and a view for displaying pictures.
The telephony manager is used to provide the communication functions of the electronic device 100, for example management of call states (including connected, hung up, and so on).
The resource manager provides applications with various resources, such as localized strings, icons, pictures, layout files, and video files.
The notification manager enables applications to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without user interaction, for example to announce that a download is complete or as a message reminder. The notification manager may also provide notifications that appear in the system's top status bar in the form of charts or scroll-bar text, such as notifications of applications running in the background, or notifications that appear on the screen in the form of dialog windows, for example text prompts in the status bar, prompt sounds, device vibration, or blinking indicator lights.
The Android runtime includes core libraries and a virtual machine, and is responsible for the scheduling and management of the Android system.
The core libraries consist of two parts: one part is the functional functions that the Java language needs to call, and the other part is the Android core libraries. The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files, and performs functions such as object lifecycle management, stack management, thread management, security and exception management, and garbage collection.
The system libraries may include multiple functional modules, for example: the surface manager, media libraries, a 3D graphics processing library (for example, OpenGL ES), and a 2D graphics engine (for example, SGL).
The surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
The media libraries support playback and recording of multiple common audio and video formats, as well as still image files. The media libraries can support multiple audio and video encoding formats, for example MPEG4, H.264, MP3, AAC, AMR, JPG, and PNG.
The 3D graphics processing library is used to implement 3D graphics drawing, image rendering, compositing, layer processing, and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
In addition, the system libraries may further include a display/rendering service and a power-saving display control service. The display/rendering service is used to determine the display digital stream, which includes the display information of each pixel unit (hereinafter referred to as a pixel) on the display screen; the display information may include display brightness, display time, text information, image information, and the like. The kernel layer is the layer between hardware and software, and contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
The hardware layer may include various sensors, for example the acceleration sensor, the gyroscope sensor, and the touch sensor involved in the embodiments of this application.
The following exemplarily describes the workflow of the software and hardware of the electronic device with reference to the method for unlocking an electronic device in the embodiments of this application.
In the embodiments of this application, the voice input by the speaker needs to be verified twice, so the voiceprint feature models in the embodiments of this application can be divided into a first voiceprint feature model and a second voiceprint feature model. The first voiceprint feature model is trained on specified voice data input by speakers, and the second voiceprint feature model is trained on random voice data input by speakers. The training method of the first voiceprint feature model in the embodiments of this application is introduced first; referring to FIG. 4, it may include the following steps.
Step 401: Obtain first voice data of specified text content.
The first voice data of specified text content here may be specified short-text voice data such as a wake-up word, for example a wake-up word such as "Hello, Xiaoyi" that can wake the electronic device. First voice data input by different speakers may be obtained, and each piece of first voice data obtained is annotated with its speaker.
In an example, first voice data input by speakers in different environments may be obtained, for example in vehicles, indoors, and outdoors.
Step 402: Learn from the speaker-annotated first voice data through a deep neural network model to obtain the first voiceprint feature model.
In this embodiment of this application, each piece of speaker-annotated first voice data may be used as input, and the voiceprint feature of each piece of first voice data is obtained through deep learning. Because each piece of first voice data is annotated with its speaker, the parameters of the first voiceprint feature model can be learned from the correspondence between the obtained voiceprint features and the speakers, and the first voiceprint feature model can then be constructed from these parameters. The first voiceprint feature model can extract the voiceprint feature of input first voice data.
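As a rough illustration of the kind of speaker-labelled training described in step 402, the following is a minimal sketch, not taken from the patent. The network architecture, feature dimensions, optimizer, and the use of a classification head whose penultimate output serves as the voiceprint feature are all assumptions made for the example.

    import torch
    import torch.nn as nn

    # Illustrative dimensions only; the patent does not specify any of these.
    NUM_SPEAKERS, FEAT_DIM, EMBED_DIM = 10, 40, 32

    backbone = nn.Sequential(
        nn.Linear(FEAT_DIM, 64), nn.ReLU(),
        nn.Linear(64, EMBED_DIM),   # backbone output = voiceprint feature
    )
    classifier = nn.Linear(EMBED_DIM, NUM_SPEAKERS)
    opt = torch.optim.Adam(
        list(backbone.parameters()) + list(classifier.parameters()), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()

    # Toy batch standing in for speaker-annotated first voice data.
    feats = torch.randn(16, FEAT_DIM)
    speaker_ids = torch.randint(0, NUM_SPEAKERS, (16,))

    for _ in range(5):  # a few illustrative training steps
        emb = backbone(feats)
        loss = loss_fn(classifier(emb), speaker_ids)
        opt.zero_grad()
        loss.backward()
        opt.step()

    # After training, `backbone` plays the role of the first voiceprint
    # feature model: its output embedding is the extracted voiceprint feature.

The same recipe, run on random command-word utterances instead of wake-word utterances, would correspond to training the second voiceprint feature model described below.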
Step 403: Port the trained first voiceprint feature model to the electronic device.
Through steps 401 and 402, the first voiceprint feature model among the voiceprint feature models in the embodiments of this application can be trained. The trained first voiceprint feature model can then be ported to the electronic device through step 403, and the electronic device can use this model to extract voiceprint features from first voice data input by the user.
Having introduced the training method of the first voiceprint feature model in the embodiments of this application, the training method of the second voiceprint feature model is introduced next; referring to FIG. 4, it may include the following steps:
Step 401: Obtain second voice data of random text content.
The second voice data here may be voice data of random text content containing control commands, for example control commands such as "check today's weather" or "play music". Second voice data input by different speakers may be obtained, and each piece of second voice data is annotated with its speaker.
In an example, second voice data of speakers in different application scenarios may be obtained, for example in-vehicle scenarios, encyclopedia-query scenarios, ride-hailing scenarios, and food-delivery scenarios. For an in-vehicle scenario, for example, the speaker may input second voice data such as "please open the navigation" or "go to place A".
In another example, second voice data of speakers in different environments may also be obtained, for example in vehicles, indoors, and outdoors. For instance, the speaker may, inside a vehicle, input second voice data for the in-vehicle scenario such as "please open the navigation" or "please play music".
As another example, the speaker may, indoors, input second voice data for an encyclopedia-query scenario such as "what's the weather like today".
Step 402: Learn from the speaker-annotated second voice data through a deep neural network model to obtain the second voiceprint feature model.
In this embodiment of this application, each piece of speaker-annotated second voice data may be used as input, and the voiceprint feature of each piece of second voice data is obtained through deep learning. Because each piece of second voice data is annotated with its speaker, the parameters of the second voiceprint feature model can be learned from the correspondence between the obtained voiceprint features and the speakers, and the second voiceprint feature model can then be constructed from the learned parameters. The second voiceprint feature model can extract the voiceprint feature of input second voice data.
Step 403: Port the trained second voiceprint feature model to the electronic device.
Through steps 401 and 402, the second voiceprint feature model among the voiceprint feature models in the embodiments of this application can be trained. The trained second voiceprint feature model can then be ported to the electronic device through step 403, and the electronic device can use this model to extract voiceprint features from second voice data input by the user.
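To make the porting step concrete, the following is a minimal sketch, not taken from the patent, of how an on-device wrapper around the two ported models might look. The model file names, the fixed embedding size, and the stand-in extraction logic are hypothetical; the patent only specifies that one model handles the specified short text and the other handles random command text.

    import numpy as np

    class VoiceprintExtractor:
        """Thin wrapper around a ported voiceprint feature model (sketch).

        `model_path` and `embed_dim` are illustrative placeholders; a real
        deployment would load and run the exported DNN here.
        """

        def __init__(self, model_path: str, embed_dim: int = 256):
            self.model_path = model_path  # hypothetical exported model file
            self.embed_dim = embed_dim

        def extract(self, waveform: np.ndarray) -> np.ndarray:
            # Placeholder for running the DNN on audio features; a stand-in
            # embedding is produced here just so the sketch executes.
            seed = abs(hash(waveform.tobytes())) % 2**32
            emb = np.random.default_rng(seed).standard_normal(self.embed_dim)
            return emb / np.linalg.norm(emb)  # L2-normalize for formula (1)

    # One extractor per ported model, mirroring steps 401-403 run twice:
    wake_word_model = VoiceprintExtractor("first_voiceprint_model.bin")
    command_model = VoiceprintExtractor("second_voiceprint_model.bin")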
Having introduced the training method of the voiceprint feature models in the embodiments of this application, the method for unlocking an electronic device provided in the embodiments of this application is introduced below with reference to the accompanying drawings.
First, the user may register a voiceprint feature on the electronic device in advance. The electronic device can compare the user's pre-registered voiceprint feature with the voiceprint feature extracted from the first voice data through voiceprint recognition, so as to determine whether voiceprint recognition succeeds. The way the user registers voiceprint features is introduced below.
Referring to FIG. 5A, the user can register voiceprint features on the electronic device according to the prompts. As shown in FIG. 5A, after voiceprint registration is started, the electronic device can display "Please say: Hello, Xiaoyi" on the display screen, and the user can say "Hello, Xiaoyi" according to the prompt. The electronic device can thus obtain, through sensors such as the receiver and the microphone, the first registration voice data whose content is "Hello, Xiaoyi". The electronic device can extract the first registered voiceprint feature from the user's first registration voice data through the pre-trained first voiceprint feature model. In this way, the electronic device obtains the first registered voiceprint feature of the user speaking the first registration voice data of specified short text content, and can store this first registered voiceprint feature in the memory.
Referring to FIG. 5B, the electronic device can display "Please say: Xiaoyi, send a message" on the display screen, and the user can say "Xiaoyi, send a message" according to the prompt. The electronic device can thus obtain, through sensors such as the receiver and the microphone, the second registration voice data whose content is "Xiaoyi, send a message". The electronic device can extract the second registered voiceprint feature from the user's second registration voice data through the pre-trained second voiceprint feature model. In this way, the electronic device obtains the second registered voiceprint feature of the user speaking the second registration voice data of random text content, and can store this second registered voiceprint feature in the memory.
Optionally, the electronic device may also prompt the user through the display screen to input second registration voice data multiple times, so that more second registered voiceprint features can be obtained and the accuracy of voiceprint comparison improved. For example, after the user inputs the second registration voice data "Xiaoyi, send a message", the electronic device may further display prompts such as "Please say: Xiaoyi, what's the weather like today" or "Please say: Xiaoyi, play some music" on the display screen. The user can input the corresponding second registration voice data according to the prompts displayed on the screen, so that the electronic device can extract the second registered voiceprint features of the second registration voice data.
In a possible implementation, the electronic device may merge the user's first registered voiceprint feature with the second registered voiceprint feature to obtain a third registered voiceprint feature. Referring to FIG. 6, the electronic device obtains the user's first and second registered voiceprint features through the first and second voiceprint feature models, merges the two features in a specified manner, and stores the merged first and second registered voiceprint features, that is, stores the third registered voiceprint feature.
Optionally, when the first registered voiceprint feature and the second registered voiceprint feature are merged, the merge may be a simple one. For example, if the first registered voiceprint feature is A and the second registered voiceprint feature is B, the two can be merged to obtain the third registered voiceprint feature A+B. Alternatively, weights may be assigned to the two features, for example 0.4 for the first registered voiceprint feature and 0.6 for the second, with each feature multiplied by its corresponding weight before the merge.
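As an illustration of the weighted merge just described, here is a minimal sketch, not the patent's implementation, that fuses two L2-normalized embeddings with the 0.4/0.6 weights. The text leaves open whether the merge is a weighted sum or a weighted concatenation; this sketch assumes a weighted sum, and the re-normalization afterwards is an added assumption so formula (1) can still be read as a cosine. The same function would be applied at registration and at verification so that the two third features are comparable.

    import numpy as np

    def merge_voiceprints(first, second, w_first=0.4, w_second=0.6):
        """Weighted merge of two voiceprint features into a third feature."""
        third = w_first * np.asarray(first) + w_second * np.asarray(second)
        return third / np.linalg.norm(third)

    # Registration time: fuse the two registered features and store the result.
    first_registered = np.array([0.6, 0.8])   # toy L2-normalized embeddings
    second_registered = np.array([0.8, 0.6])
    third_registered = merge_voiceprints(first_registered, second_registered)

    # Unlock time: fuse the probe features with the SAME weights, then compare
    # via formula (1): d = third_probe @ third_registered.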
FIG. 7 is a schematic flowchart of a method for unlocking an electronic device according to an embodiment of this application. The method may be performed by the electronic device shown in FIG. 2 or FIG. 3, and its procedure includes:
Step 701: The electronic device receives first voice data input by the user and obtains a first voiceprint feature of the first voice data.
The electronic device may receive the first voice data input by the user through the receiver 170B or the microphone 170C shown in FIG. 2. The first voice data here may be specified short-text voice data, such as a wake-up word. For example, the user may say "Hello, Xiaoyi", and the electronic device can receive the voice data "Hello, Xiaoyi" input by the user through the receiver 170B or the microphone 170C.
In this embodiment of this application, the user may input the first voice data while the display screen of the electronic device is not lit, that is, while the screen is dark. It should be understood that even if the display screen is not lit, the receiver 170B or the microphone 170C of the electronic device may be on. The user may say specified short-text voice data such as "Hello".
To save power, before inputting the first voice data, the user may touch the display screen or press a button of the electronic device (such as the buttons 190 shown in FIG. 2) to trigger the electronic device to turn on the receiver 170B or the microphone 170C. Referring to FIG. 8A, after the screen of the electronic device lights up, the user can input the first voice data.
Optionally, after the screen lights up, the electronic device can display prompt information on the display screen to prompt the user to input the first voice data. Referring to FIG. 8B, after the user lights up the display screen by touching it or pressing a button of the electronic device, the device can display prompt information such as "Please enter your voice password" to prompt the user to input voice data to unlock the electronic device.
In a possible implementation, if the external environment is noisy, the electronic device may not receive the first voice data input by the user. The electronic device may therefore display prompt information on the display screen to prompt the user to re-input the first voice data. Referring to FIG. 9, the user inputs the first voice data but the electronic device fails to receive it; the electronic device can therefore display the prompt "What did you say" to prompt the user to re-input the first voice data. Optionally, if the number of times the electronic device fails to receive the user's first voice data reaches a specified number, the electronic device may prompt the user to unlock by fingerprint or by entering a password, as shown in FIG. 1C. The specified number here may be preset, for example 3 or 4 times; this application does not specifically limit it.
After receiving the first voice data input by the user, the electronic device may obtain the first voiceprint feature of the first voice data. The electronic device may use the first voiceprint feature model obtained through steps 401 to 403 to perform voiceprint feature extraction on the first voice data to obtain its first voiceprint feature.
Step 702: The electronic device performs voiceprint feature comparison on the first voice data according to the first voiceprint feature.
The electronic device may compare the first voiceprint feature with the stored first registered voiceprint feature of the user to determine whether the speaker is the same person.
In an example, the electronic device may determine whether the above voiceprint features match by computing the cosine distance d between the first voiceprint feature and the first registered voiceprint feature, where d satisfies the following formula (1):
d = cos(x_i, x_j) = x_i^T * x_j        Formula (1)
where x_i represents the first voiceprint feature extracted in step 701, x_j represents the first registered voiceprint feature obtained at registration, and T represents the matrix transpose.
In this embodiment of this application, a relationship between the cosine distance and the voiceprint comparison result may be maintained in advance. For example, when the cosine distance is smaller than a specified value, the match is successful; when the cosine distance is greater than or equal to the specified value, the match fails. Therefore, the cosine distance can be obtained through formula (1) and compared with the specified value to determine whether the match succeeds, that is, whether the speaker is the same person.
Step 703: When the first voiceprint feature comparison result of the first voice data indicates a successful match, the electronic device performs text recognition on the first voice data.
When the electronic device determines that the first voiceprint feature matches the first registered voiceprint feature, it may perform text recognition on the first voice data to identify whether the specified text content exists in the first voice data. The specified text content here may be a short-text wake-up word or the like.
The electronic device may pre-store the specified text content in the memory. For example, the specified text content may include wake-up words with short text content such as "Hello", "Hello, Xiaoyi", or "Are you there".
In a possible implementation, if the electronic device determines that the first voiceprint feature fails to match the first registered voiceprint feature, the electronic device may display a match-failure prompt on the display screen. Referring to FIG. 10, the electronic device can display the prompt "Unlock failed". Optionally, referring to FIG. 10, the electronic device can prompt the user to re-input the first voice data, so that the electronic device can perform voiceprint feature comparison on the re-input first voice data.
In another possible implementation, if the number of times the first voiceprint feature comparison of the first voice data fails is greater than or equal to a specified threshold, the electronic device may prompt the user to unlock the device by fingerprint, face, or password. Referring to FIG. 1C, the electronic device can display on the display screen an interface for entering a password or an interface for entering a fingerprint, to prompt the user to unlock the device by fingerprint or password. Optionally, after the user unlocks the device by fingerprint or password, the voiceprint unlocking mode of the electronic device may be restarted. Referring to FIG. 11A, after the user unlocks the device by fingerprint or password and then locks it again, the voiceprint unlocking mode is restarted, and the user can re-input the first voice data so that the electronic device can perform voiceprint feature comparison on it.
In still another possible implementation, if the number of times the first voiceprint feature comparison of the first voice data fails is greater than or equal to a specified threshold, the electronic device may lock itself for a specified duration. Referring to FIG. 11B, the electronic device can display the specified locking duration on the display screen, and can display the remaining locking duration in a countdown manner. When the locking duration reaches the specified duration, the user can unlock the electronic device by voiceprint, password, fingerprint, or face.
Step 704: When the specified text content exists in the first voice data, the electronic device enters the wake-up mode.
When the electronic device determines that the specified text content exists in the first voice data, it can enter the wake-up mode. In the wake-up mode, the electronic device can turn on an audio module, such as the receiver or the microphone, for recording. Referring to FIG. 12, if the electronic device determines that the specified text content exists in the first speech recognition result, it can light up the display screen, turn on the audio module, and display prompts such as "Please say" or "I'm here" on the display screen to prompt the user to continue inputting a control command. It should be understood that the electronic device is still locked at this point.
When the electronic device determines that the specified text content does not exist in the first voice data, it can remain in the sleep state. For example, the electronic device can turn off the display screen, or it can turn off the audio module. The electronic device may display prompt information such as "What did you say" or "I don't understand" on the display screen to inform the user that the device has not been woken up and that the first voice data needs to be re-input so that the electronic device can perform voiceprint feature matching on it.
Step 705: The electronic device receives second voice data input by the user and obtains a second voiceprint feature of the second voice data.
After determining that the first voiceprint feature matches the first registered voiceprint feature, the electronic device may enter the wake-up mode. In the wake-up mode, the electronic device can turn on the audio module to receive the second voice data input by the user. The second voice data here may be random voice data, for example voice data containing a control command.
In a possible implementation, if the external environment is noisy, the electronic device may not receive the second voice data input by the user. The electronic device may therefore display prompt information on the display screen to prompt the user to re-input the second voice data. Referring to FIG. 13, the user inputs the second voice data but the electronic device fails to receive it; the electronic device can therefore display the prompt "Please say it again" to prompt the user to re-input the second voice data. Optionally, if the number of times the electronic device fails to receive the user's second voice data reaches a specified number, the electronic device may prompt the user to unlock by fingerprint, face, or password, as shown in FIG. 1C.
Optionally, if the number of times the electronic device fails to receive the second voice data reaches a specified number, the electronic device may enter the sleep mode. For example, it can turn off the display screen or turn off the audio module.
After receiving the second voice data input by the user, the electronic device may obtain the second voiceprint feature of the second voice data. The electronic device may use the second voiceprint feature model obtained through steps 401 to 403 to perform voiceprint feature extraction on the second voice data to obtain its second voiceprint feature.
Step 706: The electronic device performs voiceprint feature comparison on the second voice data according to the second voiceprint feature.
The electronic device may compare the second voiceprint feature with the stored second registered voiceprint feature of the user to determine whether the speaker is the same person.
In an example, the electronic device may obtain the cosine distance d between the second voiceprint feature and the second registered voiceprint feature through formula (1), where x_i represents the second voiceprint feature extracted in step 705, x_j represents the second registered voiceprint feature obtained at registration, and T represents the matrix transpose.
In another example, the electronic device may merge the first voiceprint feature obtained in step 701 with the second voiceprint feature obtained in step 705 in a specified manner. It should be understood that the manner of merging the first and second voiceprint features here should be the same as the manner in which the first and second registered voiceprint features were merged when the user registered. The electronic device may compare the third voiceprint feature obtained by merging with the stored third registered voiceprint feature of the user to determine whether the speaker is the same person.
For example, if at registration the weight of the first registered voiceprint feature was 0.4 and the weight of the second registered voiceprint feature was 0.6, and the two were merged according to these weights to obtain the third registered voiceprint feature, then the weight of the first voiceprint feature obtained in step 701 is also 0.4 and the weight of the second voiceprint feature obtained in step 705 is also 0.6. The electronic device may merge the two features according to these weights to obtain the third voiceprint feature, and compare it with the stored third registered voiceprint feature.
The electronic device can obtain the cosine distance d between the third voiceprint feature and the third registered voiceprint feature through formula (1), where x_i represents the third voiceprint feature, x_j represents the third registered voiceprint feature obtained at registration, and T represents the matrix transpose.
Step 707: When the second voiceprint feature comparison result of the second voice data indicates a successful match, the electronic device performs text recognition on the second voice data.
If the electronic device determines that the second voiceprint feature comparison result of the second voice data indicates a successful match, the electronic device may identify whether a control instruction exists in the second voice data. When a control instruction exists in the second voice data, the electronic device may respond to the control instruction.
Referring to FIG. 14, the user inputs the second voice data "Play some music". When the electronic device determines that the second voiceprint feature comparison result indicates a successful match, it can perform text recognition on the second voice data, determine that a control instruction exists, and respond to the control instruction by opening an application that can play music and playing music at random.
In a possible implementation, if the second voiceprint feature comparison result of the second voice data indicates a failed match, the electronic device may display a match-failure prompt on the display screen. Referring to FIG. 10, the electronic device can display the prompt "Unlock failed" to let the user know that this voiceprint unlocking attempt failed. Optionally, referring to FIG. 10, the electronic device can prompt the user to re-input the second voice data, and the user can re-input it according to the prompt to unlock the device.
In yet another possible implementation, if the second voiceprint feature comparison result indicates a failed match, the electronic device may end this voiceprint unlocking operation and turn off the display screen. The user can then input the first voice data again, and the electronic device can perform voiceprint feature comparison on the first and second voice data input by the user through the method shown in steps 701 to 707 and determine whether to unlock.
In another possible implementation, if the number of times the second voiceprint feature comparison of the second voice data fails is greater than or equal to a specified threshold, the electronic device may prompt the user to unlock the device by fingerprint, face, or password. Referring to FIG. 1C, the electronic device can display an interface for entering a password and/or an interface for entering a fingerprint on the display screen.
In still another possible implementation, if the number of failed voiceprint feature comparisons is greater than or equal to a specified threshold, the electronic device may lock itself for a specified duration. Referring to FIG. 11B, the electronic device can display the specified locking duration on the display screen, and can display the remaining locking duration in a countdown manner. When the locking duration reaches the specified duration, the user can unlock the electronic device by voiceprint, password, fingerprint, or face.
Based on the above solution, when the user needs to unlock the electronic device, the user can input voice data to it. The electronic device performs voiceprint feature comparison separately on the two pieces of voice data input by the user, that is, on the wake-up word and on the control command, and unlocks only when both comparisons indicate a successful match. This improves the accuracy of the voiceprint comparison results, effectively defends against attacks such as imitation, recording, and synthesis, and improves the security of voiceprint unlocking. In addition, because both comparisons are performed on voice data input by the user, no manual operation is required, which is convenient for the user.
The method for unlocking an electronic device provided in the embodiments of this application is introduced below through specific embodiments.
Embodiment 1:
FIG. 15 is an exemplary flowchart of a method for unlocking an electronic device according to an embodiment of this application, which may include the following steps.
Step 1501: The electronic device obtains the wake-up word input by the user through the microphone (MIC). The wake-up word here may be preset specified short text content, for example "Hello", "Hello, Xiaoyi", "Hi", or "Good morning".
Step 1502: The electronic device performs text recognition on the wake-up word.
The text recognition here determines whether the wake-up word is the same as the preset specified short text content; a toy sketch of this check follows.
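As a toy illustration of this check, here is a sketch of a text comparison against a preset wake-word list. The normalization and the word list are assumptions made for the example; in practice the recognized text would come from a speech recognition result.

    PRESET_WAKE_WORDS = {"hello", "hello, xiaoyi", "hi", "good morning"}  # illustrative

    def wake_word_matches(recognized_text: str) -> bool:
        """Step 1502 (sketch): is the recognized wake word one of the
        preset specified short texts?"""
        normalized = recognized_text.strip().lower()
        return normalized in PRESET_WAKE_WORDS

    print(wake_word_matches("Hello, Xiaoyi"))  # True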
Step 1503: The electronic device extracts the first voiceprint feature from the wake-up word input by the user.
The electronic device can extract the first voiceprint feature from the wake-up word through the pre-trained first voiceprint feature model.
Optionally, in this embodiment of this application, step 1502 may be performed before step 1503, step 1503 may be performed before step 1502, or steps 1502 and 1503 may be performed simultaneously.
Step 1504: The electronic device performs text comparison and voiceprint feature comparison.
The electronic device can determine whether the wake-up word input by the user is the preset specified short text content, and can determine whether the first voiceprint feature extracted in step 1503 is the same as the voiceprint feature from the user's registration.
If both the voiceprint feature comparison and the text comparison pass, step 1505 may be performed; if either fails, this operation ends.
Step 1505: The electronic device enters the wake-up mode and turns on the screen. In the wake-up mode, the electronic device can start MIC recording to obtain voice data input by the user.
Step 1506: The electronic device obtains the command word input by the user through MIC recording.
The command word here may be a word that controls the electronic device to perform a corresponding operation, for example words related to control commands such as "turn on navigation", "play some music", or "check the weather".
Step 1507: The electronic device extracts the second voiceprint feature from the command word input by the user.
The electronic device can extract the second voiceprint feature from the command word through the pre-trained second voiceprint feature model.
Step 1508: The electronic device merges the first voiceprint feature and the second voiceprint feature, and verifies the result.
The electronic device may merge the first and second voiceprint features in the specified manner to obtain a third voiceprint feature, and may compare it with the third voiceprint feature obtained at registration to determine whether the speaker is the same person.
If the electronic device determines that the speaker is the same person, step 1509 may be performed; if not, this operation may be ended.
Step 1509: The electronic device recognizes the command word and parses the task to be performed. The electronic device may perform ASR on the command word to determine the text-related information it contains.
Step 1510: The electronic device unlocks and executes the task corresponding to the command word.
Optionally, in step 1508, the electronic device may unlock directly after determining that the speaker is the same person, that is, proceed directly to step 1509.
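Putting steps 1501 to 1510 together, the following is a condensed sketch of the control flow. It reuses the helper sketches shown earlier (wake_word_matches, is_same_speaker, merge_voiceprints, and the two extractor objects); `device` and all of its methods (record_wake_word, asr, enter_wake_mode, parse_task, execute_task, and so on) are hypothetical stand-ins for the components described above, not APIs named in the patent.

    def try_voiceprint_unlock(device) -> bool:
        """Two-stage voiceprint unlock mirroring FIG. 15 (sketch)."""
        wake_audio = device.record_wake_word()                  # step 1501
        if not wake_word_matches(device.asr(wake_audio)):       # steps 1502/1504
            return False
        first = wake_word_model.extract(wake_audio)             # step 1503
        if not is_same_speaker(first, device.first_registered): # step 1504
            return False

        device.enter_wake_mode()                                # step 1505
        cmd_audio = device.record_command()                     # step 1506
        second = command_model.extract(cmd_audio)               # step 1507

        third = merge_voiceprints(first, second)                # step 1508
        if not is_same_speaker(third, device.third_registered):
            return False

        task = device.parse_task(device.asr(cmd_audio))         # step 1509
        device.unlock()                                         # step 1510
        device.execute_task(task)
        return True

Note that the function returns False at every failed check, matching the flowchart's early exits, and that unlocking happens only after both voiceprint comparisons succeed.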
Embodiment 2:
Referring to FIG. 16, user A is driving a vehicle and needs to turn on navigation on the electronic device. User A can therefore say "Hello, please open the navigation". Here, "Hello" can be regarded as the first voice data, and "please open the navigation" can be regarded as the second voice data. The microphone of the electronic device is on, so when user A says "Hello", the electronic device can obtain the first voice data "Hello" through the microphone. The electronic device can turn off the microphone and extract the first voiceprint feature from the first voice data through the pre-trained first voiceprint feature model, then compare it with the first registered voiceprint feature obtained at registration, thereby determining that user A and the user at registration are the same person. The electronic device can perform text recognition on the first voice data and thus determine that the specified text content exists in it. The electronic device can then enter the wake-up mode, turn on the microphone again, obtain the second voice data "please open the navigation" input by user A, and extract the second voiceprint feature through the pre-trained second voiceprint feature model. The electronic device can merge the first voiceprint feature with the second voiceprint feature and compare the result with the third registered voiceprint feature from registration, again determining that user A and the user at registration are the same person. At this point, the electronic device can unlock and perform text recognition on the input "please open the navigation". Recognizing that the user needs to open navigation, that is, that the control command is "open navigation", the electronic device can open an application capable of navigation.
As shown in FIG. 17, some other embodiments of this application disclose an electronic device 1700, which may include one or more processors 1701, one or more memories 1702, and one or more display screens 1703, where the one or more memories 1702 store one or more computer programs, and the one or more computer programs include instructions. By way of example, one processor 1701 and one memory 1702 are shown in FIG. 17. When the instructions are executed by the one or more processors 1701, the electronic device 1700 is caused to perform the following steps:
in a locked-screen state, receiving first voice data input by a user and extracting a first voiceprint feature of the first voice data; remaining in the locked-screen state when the comparison result between the first voiceprint feature and a preset reference voiceprint feature indicates a match and the first voice data contains a specified text; receiving second voice data input by the user, where the second voice data contains a control instruction used to trigger at least one functional requirement of the user; extracting a second voiceprint feature of the second voice data; and when the comparison result between the second voiceprint feature and the preset reference voiceprint feature indicates a match, unlocking the screen and executing the control instruction to trigger the at least one functional requirement. For the specified text and the control instruction, refer to the related descriptions in the method embodiment shown in FIG. 7; details are not repeated here.
In one design, the processor 1701 further performs the following steps: receiving first registration voice data input by the user, where the first registration voice data contains the specified text; obtaining a first registered voiceprint feature of the first registration voice data; receiving second registration voice data input by the user, where the second registration voice data and the first registration voice data come from the same user; and obtaining a second registered voiceprint feature of the second registration voice data. The memory 1702 stores the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature. For the first and second registered voiceprint features, refer to the related descriptions in the method embodiment shown in FIG. 7; details are not repeated here.
In one design, the processor 1701 further performs the following step: merging the first registered voiceprint feature and the second registered voiceprint feature to obtain a third registered voiceprint feature. The memory 1702 stores the third registered voiceprint feature as the reference voiceprint feature. For the third registered voiceprint feature, refer to the related descriptions in the method embodiment shown in FIG. 7; details are not repeated here.
In one design, the processor 1701 is specifically configured to perform the following steps: merging the second voiceprint feature and the first voiceprint feature to obtain a third voiceprint feature; and unlocking the screen when the comparison result of the third voiceprint feature indicates a match. For the third voiceprint feature, refer to the related descriptions in the method embodiment shown in FIG. 7; details are not repeated here.
In one design, the processor 1701 is specifically configured to perform the following steps: obtaining the first voiceprint feature of the first voice data using the pre-trained first voiceprint feature model; and obtaining the second voiceprint feature of the second voice data using the pre-trained second voiceprint feature model. For the first and second voiceprint feature models, refer to the related descriptions in the method embodiment shown in FIG. 7; details are not repeated here.
It should be noted that the division of units in the embodiments of this application is schematic and is only a logical function division; there may be other division manners in actual implementation. The functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. For example, in the above embodiments, the first obtaining unit and the second obtaining unit may be the same unit or different units. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
As used in the above embodiments, depending on the context, the term "when" may be interpreted to mean "if", "after", "in response to determining", or "in response to detecting". Similarly, depending on the context, the phrases "when determining" or "if detecting (the stated condition or event)" may be interpreted to mean "if determining", "in response to determining", "upon detecting (the stated condition or event)", or "in response to detecting (the stated condition or event)".
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When software is used, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer program instructions are loaded and executed on a computer, the processes or functions described in the embodiments of this application are produced in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (for example, coaxial cable, optical fiber, or digital subscriber line) or wirelessly (for example, infrared, radio, or microwave). The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center integrating one or more available media. The available medium may be a magnetic medium (for example, a floppy disk, hard disk, or magnetic tape), an optical medium (for example, a DVD), a semiconductor medium (for example, a solid-state drive), or the like.
For the purpose of explanation, the foregoing description has been given with reference to specific embodiments. However, the above exemplary discussion is not intended to be exhaustive or to limit this application to the precise forms disclosed. Many modifications and variations are possible in light of the above teachings. The embodiments were chosen and described to fully illustrate the principles of this application and its practical applications, thereby enabling others skilled in the art to make full use of this application and the various embodiments with various modifications suited to the particular use contemplated.

Claims (12)

  1. A method for unlocking an electronic device, wherein the method comprises:
    receiving, by the electronic device in a locked-screen state, first voice data input by a user, and extracting a first voiceprint feature of the first voice data;
    remaining, by the electronic device, in the locked-screen state when a comparison result between the first voiceprint feature and a preset reference voiceprint feature indicates a match and the first voice data contains a specified text;
    receiving, by the electronic device, second voice data input by the user, wherein the second voice data contains a control instruction, and the control instruction is used to trigger at least one functional requirement of the user;
    extracting, by the electronic device, a second voiceprint feature of the second voice data; and
    unlocking, by the electronic device, the screen when a comparison result between the second voiceprint feature and the preset reference voiceprint feature indicates a match, and executing the control instruction to trigger the at least one functional requirement.
  2. The method according to claim 1, further comprising:
    receiving, by the electronic device, first registration voice data input by the user, wherein the first registration voice data contains the specified text;
    obtaining, by the electronic device, a first registered voiceprint feature of the first registration voice data;
    receiving, by the electronic device, second registration voice data input by the user, wherein the second registration voice data and the first registration voice data come from the same user;
    obtaining, by the electronic device, a second registered voiceprint feature of the second registration voice data; and
    storing, by the electronic device, the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature.
  3. The method according to claim 2, further comprising:
    merging, by the electronic device, the first registered voiceprint feature and the second registered voiceprint feature to obtain a third registered voiceprint feature; and
    storing, by the electronic device, the third registered voiceprint feature as the reference voiceprint feature.
  4. The method according to claim 2, wherein the unlocking, by the electronic device, the screen when the comparison result between the second voiceprint feature and the preset reference voiceprint feature indicates a match comprises:
    merging, by the electronic device, the second voiceprint feature and the first voiceprint feature to obtain a third voiceprint feature; and
    unlocking, by the electronic device, the screen when a comparison result of the third voiceprint feature indicates a match.
  5. The method according to any one of claims 1 to 4, wherein the electronic device obtains the first voiceprint feature of the first voice data using a pre-trained first voiceprint feature model, the first voiceprint feature model being trained on multiple pieces of the first voice data annotated with speakers; and
    the electronic device obtains the second voiceprint feature of the second voice data using a pre-trained second voiceprint feature model, the second voiceprint feature model being trained on multiple pieces of the second voice data annotated with speakers.
  6. An electronic device, comprising:
    one or more processors;
    a memory;
    multiple application programs;
    and one or more computer programs, wherein the one or more computer programs are stored in the memory and comprise instructions that, when executed by the electronic device, cause the electronic device to perform the following steps:
    in a locked-screen state, receiving first voice data input by a user, and extracting a first voiceprint feature of the first voice data;
    remaining in the locked-screen state when a comparison result between the first voiceprint feature and a preset reference voiceprint feature indicates a match and the first voice data contains a specified text;
    receiving second voice data input by the user, wherein the second voice data contains a control instruction, and the control instruction is used to trigger at least one functional requirement of the user;
    extracting a second voiceprint feature of the second voice data; and
    unlocking the screen when a comparison result between the second voiceprint feature and the preset reference voiceprint feature indicates a match, and executing the control instruction to trigger the at least one functional requirement.
  7. The electronic device according to claim 6, wherein when the instructions are executed by the electronic device, the electronic device is caused to further perform the following steps:
    receiving first registration voice data input by the user, wherein the first registration voice data contains the specified text;
    obtaining a first registered voiceprint feature of the first registration voice data;
    receiving second registration voice data input by the user, wherein the second registration voice data and the first registration voice data come from the same user;
    obtaining a second registered voiceprint feature of the second registration voice data; and
    storing the first registered voiceprint feature and the second registered voiceprint feature as the reference voiceprint feature.
  8. The electronic device according to claim 7, wherein when the instructions are executed by the electronic device, the electronic device is caused to further perform the following steps:
    merging the first registered voiceprint feature and the second registered voiceprint feature to obtain a third registered voiceprint feature; and
    storing the third registered voiceprint feature as the reference voiceprint feature.
  9. The electronic device according to claim 7, wherein when the instructions are executed by the electronic device, the electronic device is caused to specifically perform the following steps:
    merging the second voiceprint feature and the first voiceprint feature to obtain a third voiceprint feature; and
    unlocking the screen when a comparison result of the third voiceprint feature indicates a match.
  10. The electronic device according to any one of claims 6 to 9, wherein when the instructions are executed by the electronic device, the electronic device is caused to specifically perform the following steps:
    obtaining the first voiceprint feature of the first voice data using a pre-trained first voiceprint feature model, the first voiceprint feature model being trained on multiple pieces of the first voice data annotated with speakers; and
    obtaining the second voiceprint feature of the second voice data using a pre-trained second voiceprint feature model, the second voiceprint feature model being trained on multiple pieces of the second voice data annotated with speakers.
  11. A computer-readable storage medium comprising instructions, wherein when the instructions are run on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 5.
  12. A computer program product comprising instructions, wherein when the computer program product runs on an electronic device, the electronic device is caused to perform the method according to any one of claims 1 to 5.
PCT/CN2021/116073 2020-10-30 2021-09-01 Method and apparatus for unlocking an electronic device WO2022088963A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011188127.2A CN114444042A (zh) 2020-10-30 2020-10-30 Method and apparatus for unlocking an electronic device
CN202011188127.2 2020-10-30

Publications (1)

Publication Number Publication Date
WO2022088963A1 true WO2022088963A1 (zh) 2022-05-05

Family

ID=81358452

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/116073 WO2022088963A1 (zh) 2020-10-30 2021-09-01 一种电子设备解锁方法和装置

Country Status (2)

Country Link
CN (1) CN114444042A (zh)
WO (1) WO2022088963A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117131491B * 2023-10-27 2024-04-02 荣耀终端有限公司 Unlock control method and related apparatus


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105701887A * 2014-11-26 2016-06-22 常州峰成科技有限公司 Voiceprint lock and unlocking method thereof
CN108766441A * 2018-05-29 2018-11-06 广东声将军科技有限公司 Voice control method and apparatus based on offline voiceprint recognition and speech recognition
CN109325337A * 2018-11-05 2019-02-12 北京小米移动软件有限公司 Unlocking method and apparatus
CN109515385A * 2018-12-05 2019-03-26 上海博泰悦臻电子设备制造有限公司 Method for preventing voiceprint replay attacks, vehicle head unit, and vehicle
CN109979438A * 2019-04-04 2019-07-05 Oppo广东移动通信有限公司 Voice wake-up method and electronic device

Also Published As

Publication number Publication date
CN114444042A (zh) 2022-05-06

Similar Documents

Publication Publication Date Title
WO2021052263A1 (zh) Voice assistant display method and apparatus
RU2766255C1 (ru) Voice control method and electronic device
WO2021063343A1 (zh) Voice interaction method and apparatus
CN110138959B (zh) Method for displaying prompts of human-machine interaction instructions, and electronic device
US20220269762A1 (en) Voice control method and related apparatus
CN111724775B (zh) Voice interaction method and electronic device
US20220147207A1 (en) Application Quick Start Method and Related Apparatus
CN111819533B (zh) Method for triggering an electronic device to execute a function, and electronic device
CN114173000B (zh) Message reply method, electronic device, system, and storage medium
CN113496426A (zh) Service recommendation method, electronic device, and system
CN114840842A (zh) Login method for an intelligent terminal, and electronic device
CN114255745A (zh) Human-machine interaction method, electronic device, and system
CN110866254A (zh) Vulnerability detection method and electronic device
WO2022088964A1 (zh) Electronic device control method and apparatus
WO2022143258A1 (zh) Voice interaction processing method and related apparatus
WO2022088963A1 (zh) Method and apparatus for unlocking an electronic device
CN113196732B (zh) Cross-device authentication method and related apparatus
WO2021147483A1 (zh) Data sharing method and apparatus
CN116032942A (zh) Method, apparatus, device, and storage medium for synchronizing navigation tasks across devices
CN114765026A (zh) Voice control method, apparatus, and system
CN113742460A (zh) Method and apparatus for generating a virtual character
CN114528538A (zh) Fingerprint verification method, electronic device, and server
US20240126897A1 (en) Access control method and related apparatus
CN116030817B (zh) Voice wake-up method, device, and storage medium
CN116679900B (zh) Audio service processing method, firmware offloading method, and related apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21884694

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21884694

Country of ref document: EP

Kind code of ref document: A1