WO2022233239A1 - An upgrade method, apparatus, and electronic device - Google Patents

An upgrade method, apparatus, and electronic device

Info

Publication number
WO2022233239A1
WO2022233239A1 (PCT/CN2022/088237)
Authority
WO
WIPO (PCT)
Prior art keywords: model, user, electronic device, verification, threshold
Application number: PCT/CN2022/088237
Other languages: English (en), French (fr)
Inventors: 徐嘉明, 郎玥, 杜云凡
Original Assignee: 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to JP2023568018A (published as JP2024517830A)
Priority to EP22798580.1A (published as EP4318465A1)
Publication of WO2022233239A1
Priority to US18/502,517 (published as US20240071392A1)

Classifications

    • G PHYSICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04 Training, enrolment or model building
    • G10L 17/06 Decision making techniques; Pattern matching strategies
    • G10L 17/08 Use of distortion metrics or a particular distance between probe pattern and reference templates
    • G10L 17/18 Artificial neural networks; Connectionist approaches
    • G10L 17/22 Interactive procedures; Man-machine interfaces
    • G10L 17/24 Interactive procedures; Man-machine interfaces: the user being prompted to utter a password or a predefined phrase
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G06F 8/65 Software deployment: Updates

Definitions

  • the present application relates to the field of computer technology, and in particular, to an upgrade method, apparatus, and electronic device.
  • Voiceprint recognition is a technology that automatically recognizes and confirms the speaker's identity through speech signals.
  • the basic scheme of voiceprint recognition includes two stages: registration process and verification process.
  • In the registration process, the voiceprint recognition system on the electronic device uses a pre-trained deep model (referred to as the "voiceprint feature extraction model" or "model" in this application) to extract a voiceprint feature from the registered voice entered by the user, and stores it in the electronic device as a user feature template.
  • In the verification process, the voiceprint recognition system on the electronic device uses the same voiceprint feature extraction model as in the registration process to extract a voiceprint feature from the verification voice as the feature to be verified, and then verifies the user's identity based on the feature to be verified and the user feature template obtained in the registration process.
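As a rough illustration of this two-stage scheme, the following Python sketch registers a user by extracting a voiceprint feature and later verifies speech against the stored template using cosine similarity. The helper names, the callable `model`, and the threshold value are assumptions for illustration; the application does not fix a particular model or scoring method.

```python
import numpy as np

def extract_voiceprint(model, speech: np.ndarray) -> np.ndarray:
    """Run the pre-trained voiceprint feature extraction model on a speech signal."""
    return model(speech)  # assumed: model maps raw speech to a fixed-length embedding

def register(model, enrollment_speech: np.ndarray) -> np.ndarray:
    """Registration stage: store the extracted feature as the user feature template."""
    return extract_voiceprint(model, enrollment_speech)

def verify(model, template: np.ndarray, verification_speech: np.ndarray,
           threshold: float = 0.7) -> bool:
    """Verification stage: compare the feature to be verified against the template."""
    probe = extract_voiceprint(model, verification_speech)
    similarity = float(np.dot(probe, template) /
                       (np.linalg.norm(probe) * np.linalg.norm(template)))
    return similarity >= threshold  # placeholder threshold, not from the application
```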
  • When upgrading the voiceprint recognition system on the electronic device (for example, updating the voiceprint feature extraction model), the registration process needs to be re-executed (that is, the user re-enters the registered voice, and the electronic device uses the new voiceprint feature extraction model to extract the voiceprint feature from the newly entered registered voice as a new user feature template). If the user does not re-register, in the subsequent verification process the feature to be verified extracted by the electronic device with the new voiceprint feature extraction model cannot be matched against the old user feature template, and the recognition performance of the voiceprint recognition system deteriorates; however, if the registration process is re-executed for every upgrade, the user experience suffers greatly.
  • Embodiments of the present application provide an upgrade method, apparatus, and electronic device, which can realize the upgrade of the voiceprint recognition system without the user's perception, and take into account both the voiceprint recognition performance and the user experience.
  • an upgrade method is provided, which is applied to an electronic device.
  • The method includes: the electronic device collects a first verification voice entered by a user; processes the first verification voice using a first model saved in the electronic device to obtain a first voiceprint feature; and verifies the identity of the user based on the first voiceprint feature and a first user feature template saved in the electronic device, where the first user feature template is a voiceprint feature obtained by the electronic device processing the user's historical verification voice or registered voice with the first model. After the user's identity is verified, if the electronic device has received a second model, the second model is used to process the first verification voice to obtain a second voiceprint feature; the second voiceprint feature is used to update the first user feature template saved in the electronic device, and the second model is used to update the first model saved in the electronic device.
  • In this way, the verification voice obtained in the verification process is used as the new registration voice to complete the upgrade registration, so the voiceprint recognition system can be upgraded without the user's perception, taking into account both voiceprint recognition performance and user experience.
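A minimal sketch of this upgrade step, assuming the helper functions from the earlier sketch and treating the second (new) model as a drop-in replacement: the verification voice that just passed verification under the first model is reused as the registration voice for the second model. This is a reading of the claim summary, not the exact claimed logic.

```python
def registration_free_upgrade(first_model, second_model,
                              first_template, verification_speech,
                              first_threshold):
    """Upgrade without asking the user to re-enroll (illustrative sketch)."""
    # 1. Verify identity with the old model and the old template.
    if not verify(first_model, first_template, verification_speech, first_threshold):
        return first_model, first_template  # verification failed: keep everything as-is

    # 2. Re-extract the same verification voice with the new (second) model.
    second_feature = extract_voiceprint(second_model, verification_speech)

    # 3. Replace the old template and the old model.
    return second_model, second_feature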
  • The electronic device can calculate the similarity between the first voiceprint feature and the first user feature template and verify the user's identity by judging whether the similarity is greater than the first verification threshold corresponding to the first model: if so, the verification passes; otherwise, the verification fails.
  • After the electronic device uses the second model to process the first verification voice, if the electronic device has received a second verification threshold corresponding to the second model, the electronic device also uses the second verification threshold to update the first verification threshold.
  • the electronic device may use the second model to process the first verification voice only when the quality of the first verification voice satisfies the first preset condition.
  • The first preset condition includes, but is not limited to: the similarity between the first voiceprint feature and the first user feature template is greater than or equal to a first registration-free threshold; and/or, the signal-to-noise ratio of the first verification voice is greater than or equal to a first signal-to-noise ratio threshold.
  • the quality of the second voiceprint feature can be guaranteed, thereby ensuring the performance of the upgraded voiceprint recognition system.
  • the first registration-free threshold is greater than or equal to the first verification threshold corresponding to the first model.
  • the quality of the second voiceprint feature can be further improved, and the performance of the upgraded voiceprint recognition system can be improved.
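The quality gate described in the preceding paragraphs could look like the following sketch. The threshold values are placeholders, and how the signal-to-noise ratio is measured is not specified by the application.

```python
def passes_quality_gate(similarity: float, snr_db: float,
                        registration_free_threshold: float = 0.8,
                        snr_threshold_db: float = 20.0) -> bool:
    """First preset condition: both the similarity and the SNR must clear their thresholds
    (the application also allows using either condition on its own)."""
    return similarity >= registration_free_threshold and snr_db >= snr_threshold_db
```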
  • After the electronic device uses the second model to process the first verification voice, if the electronic device has received a second registration-free threshold, it can also use the second registration-free threshold to update the first registration-free threshold; and/or, after the electronic device uses the second model to process the first verification voice, if the electronic device has received a second signal-to-noise ratio threshold, the second signal-to-noise ratio threshold may also be used to update the first signal-to-noise ratio threshold.
  • the registration-free threshold, the signal-to-noise ratio threshold, etc. can also be automatically updated, which can further improve the quality of the second voiceprint feature and improve the performance of the upgraded voiceprint recognition system.
  • The electronic device may update the first user feature template stored in the electronic device with the preset number of second voiceprint features, and update the first model saved in the electronic device with the second model, only after the number of second voiceprint features accumulated by the electronic device reaches a preset number.
  • the upgraded voiceprint recognition system has multiple user feature templates (ie, second voiceprint features), which can further improve the performance of the upgraded voiceprint recognition system.
  • After the electronic device uses the second voiceprint feature to update the first user feature template saved in the electronic device and uses the second model to update the first model saved in the electronic device, the electronic device also collects a second verification voice entered by the user, uses the second model to process the second verification voice to obtain a third voiceprint feature, and verifies the identity of the user based on the third voiceprint feature and the second voiceprint feature.
  • the electronic device will use the new model and the new user feature template to perform the verification process, which can further improve the voiceprint recognition performance of the electronic device.
  • the electronic device may also prompt the user to enter the verification voice.
  • the display screen displays prompt information, or the speaker outputs prompt voice and so on.
  • In a second aspect, an upgrade apparatus is provided. The apparatus may be an electronic device or a chip in the electronic device, and the apparatus includes units/modules for executing the method described in the first aspect or any possible implementation manner of the first aspect.
  • For example, the apparatus may include: a data collection unit for collecting the first verification voice entered by the user; and a computing unit for processing the first verification voice using the first model saved in the apparatus to obtain the first voiceprint feature, and verifying the identity of the user based on the first voiceprint feature and the first user feature template saved in the apparatus, where the first user feature template is a voiceprint feature obtained by the apparatus processing the user's historical verification voice or registered voice with the first model. After the user's identity is verified, if the apparatus has received the second model, the second model is used to process the first verification voice to obtain the second voiceprint feature; the second voiceprint feature is used to update the first user feature template saved in the apparatus, and the second model is used to update the first model saved in the apparatus.
  • In a third aspect, an electronic device is provided, comprising a microphone and a processor. The microphone is used for collecting a first verification voice entered by a user. The processor is used for: processing the first verification voice using a first model saved in the electronic device to obtain a first voiceprint feature; verifying the identity of the user based on the first voiceprint feature and a first user feature template saved in the electronic device, where the first user feature template is a voiceprint feature obtained by the electronic device processing the user's historical verification voice or registered voice with the first model; after verifying the user's identity, if the electronic device has received a second model, using the second model to process the first verification voice to obtain a second voiceprint feature; and using the second voiceprint feature to update the first user feature template saved in the electronic device, and using the second model to update the first model saved in the electronic device.
  • In a fourth aspect, a chip is provided. The chip is coupled with a memory in an electronic device, and performs the method described in the first aspect or any possible implementation manner of the first aspect.
  • In a fifth aspect, a computer storage medium is provided, where computer instructions are stored in the computer storage medium; when the computer instructions are executed by one or more processing modules, the method described in the first aspect or any possible implementation manner of the first aspect is implemented.
  • In a sixth aspect, a computer program product comprising instructions is provided; when the instructions run on a computer, the computer is caused to execute the method described in the first aspect or any possible implementation manner of the first aspect.
  • FIG. 1 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
  • FIG. 2 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 3 is a flowchart of an upgrade method provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of an electronic device prompting a user to enter a registered voice;
  • FIG. 5 is a schematic diagram of a user triggering an electronic device to collect a verification voice;
  • FIG. 6A is a schematic diagram of a specific registration-free upgrade processing manner provided by an embodiment of the present application;
  • FIG. 6B is a schematic diagram of another specific registration-free upgrade processing manner provided by an embodiment of the present application;
  • FIG. 6C is a schematic diagram of another specific registration-free upgrade processing manner provided by an embodiment of the present application;
  • FIG. 6D is a schematic diagram of another specific registration-free upgrade processing manner provided by an embodiment of the present application.
  • A voiceprint is the spectrum of sound waves carrying speech information as displayed by electroacoustic instruments. Voiceprints are stable, measurable, and unique. After adulthood, the human voice can remain relatively stable for a long time. The vocal organs people use when speaking differ greatly in size and shape, so no two people have the same voiceprint, and different people's voices show different formant distributions in the spectrogram. Voiceprint recognition judges whether two utterances come from the same person by comparing the speakers' voices on the same phonemes, thereby realizing the function of "recognizing people by hearing their voices".
  • Voiceprint recognition can be divided into two types: text-dependent and text-independent.
  • A text-dependent voiceprint recognition system requires users to pronounce specified content, and each person's voiceprint model is established precisely; recognition also requires pronouncing the specified content, so better recognition results can be achieved, but the system requires the user's cooperation. If the user's pronunciation does not conform to the specified content, the user cannot be correctly identified.
  • A text-independent recognition system does not specify the content of the speaker's speech; building the model is relatively difficult, but it is easy for users to use and has a wide range of applications. For practical reasons, text-dependent voiceprint recognition algorithms are currently the ones generally used on terminal devices.
  • voiceprint recognition can be divided into two types: speaker identification (SI) and speaker verification (SV).
  • Speaker identification is used to determine which one of several people spoke a given piece of speech and is a "one-to-many" problem; speaker verification is used to confirm whether a given piece of speech was spoken by a designated person and is a "one-to-one" problem.
  • the speaker confirmation function includes two parts: the registration process and the verification process.
  • The registration process includes: before the user officially uses the voiceprint recognition function, the voiceprint recognition system collects the registered voice entered by the user, extracts a voiceprint feature from the registered voice according to a pre-trained deep model (called the "voiceprint feature extraction model" or "model" in this application), and saves the voiceprint feature in the electronic device as a user feature template.
  • The verification process includes: when the user uses the voiceprint recognition function, the voiceprint recognition system collects the verification voice entered by the user, uses the same voiceprint feature extraction model as in the registration process to extract a voiceprint feature from the verification voice as the feature to be verified, then scores the similarity between the feature to be verified and the user feature template obtained in the registration process, and confirms the user's identity according to the scoring result.
  • The voiceprint recognition system generally runs offline on the electronic device, and the pre-trained voiceprint feature extraction model needs to be stored on the electronic device.
  • As described above, when the voiceprint recognition system is upgraded (for example, the voiceprint feature extraction model is updated), the registration process needs to be re-executed (that is, the new voiceprint feature extraction model extracts a voiceprint feature from a newly entered registered voice as a new user feature template). If the user does not re-register, in the subsequent verification process the features to be verified extracted by the voiceprint recognition system with the new voiceprint feature extraction model cannot be matched against the old user feature template, and the recognition performance of the voiceprint recognition system instead deteriorates; however, if the registration process is re-executed for every upgrade, the user experience suffers greatly.
  • For this reason, the embodiments of the present application provide an upgrade solution: when it is detected that the electronic device has a new feature extraction model, the user does not need to provide a registration voice for user registration; instead, user registration is performed directly based on the verification voice obtained in the verification process.
  • the voiceprint recognition system can be upgraded without the user's perception, taking into account the voiceprint recognition performance and user experience.
  • the electronic device in the embodiment of the present application includes at least a data acquisition unit 01, a storage unit 02, a communication unit 03 and a calculation unit 04, and each unit can be connected and communicated through an input output (IO) interface.
  • the data collection unit 01 is used to collect the voice input by the user (registration voice, verification voice, etc.). Its specific implementation can be a microphone, a sound sensor, and the like.
  • the storage unit 02 is used for storing the voiceprint feature extraction model and threshold used by the voiceprint recognition function, and the user template features obtained by the user registration module in the computing unit 04 .
  • the communication unit 03 is used to receive a new voiceprint feature extraction model, and can also be used to receive a new threshold, and provide it to the calculation unit 04 .
  • Computing unit 04 includes:
  • the user registration module 401 is used for extracting user template features according to the registration voice obtained by the data collection unit 01, and providing them to the verification module 402;
  • the verification module 402 is used to verify the identity of the speaker according to the verification voice obtained by the data acquisition unit 01 and the user template features, models and thresholds stored in the storage unit 02 to obtain a verification result;
  • The registration-free upgrade module 403 is used to determine a new user template feature according to the new model received by the communication unit 03 and the verification voice and (optionally) the scoring result obtained by the verification module, and to update the old user template feature and the old model in the storage unit 02 based on the new user template feature and the new model.
  • The registration-free upgrade module further updates the threshold stored in the storage unit 02.
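One way to read the unit/module split described above is as the following class skeleton. The class and attribute names are hypothetical and mirror units 01 to 04; the actual partitioning between hardware and software is left open by the application.

```python
import numpy as np

class VoiceprintSystem:
    """Skeleton mirroring data collection (01), storage (02), communication (03), computing (04)."""

    def __init__(self, model, threshold: float):
        self.model = model              # storage unit 02: current feature extraction model
        self.threshold = threshold      # storage unit 02: current verification threshold
        self.templates = []             # storage unit 02: user feature templates
        self._pending_model = None      # received by communication unit 03
        self._pending_threshold = None

    def collect(self, microphone):      # data collection unit 01 (microphone assumed to expose record())
        return microphone.record()

    def enroll(self, speech):           # user registration module 401
        self.templates.append(self.model(speech))

    def verify(self, speech) -> bool:   # verification module 402
        if not self.templates:
            return False
        feature = self.model(speech)
        sims = [float(np.dot(feature, t) / (np.linalg.norm(feature) * np.linalg.norm(t)))
                for t in self.templates]
        return max(sims) >= self.threshold

    def receive_update(self, new_model, new_threshold=None):   # communication unit 03
        self._pending_model, self._pending_threshold = new_model, new_threshold

    def registration_free_upgrade(self, verified_speech):      # registration-free upgrade module 403
        if self._pending_model is None:
            return
        self.templates = [self._pending_model(verified_speech)]
        self.model = self._pending_model
        if self._pending_threshold is not None:
            self.threshold = self._pending_threshold
        self._pending_model = None
```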
  • The embodiments of the present application do not limit the specific form of the electronic device, which includes but is not limited to: mobile phones, tablet computers, artificial intelligence (AI) smart voice terminals, wearable devices, augmented reality (AR)/virtual reality (VR) devices, in-vehicle terminals, laptop computers, desktop computers, smart home devices (e.g., smart TVs, smart speakers), and the like.
  • FIG. 2 it is a schematic diagram of the hardware structure of a mobile phone 100 according to an embodiment of the present application.
  • the mobile phone 100 includes a processor 110, an internal memory 121, an external memory interface 122, a camera 131, a display screen 132, a sensor module 140, a subscriber identification module (SIM) card interface 151, buttons 152, an audio module 160, and a speaker 161 , receiver 162, microphone 163, headphone jack 164, universal serial bus (USB) interface 170, charging management module 180, power management module 181, battery 182, mobile communication module 191 and wireless communication module 192.
  • the cell phone 100 may further include motors, indicators, buttons, and the like.
  • FIG. 2 is only an example.
  • the mobile phone 100 of the embodiment of the present application may have more or fewer components than the mobile phone 100 shown in the figure, two or more components may be combined, or may have different component configurations.
  • the various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
  • the processor 110 may include one or more processing units.
  • The processor 110 may include an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • a buffer may also be provided in the processor 110 for storing instructions and/or data.
  • the buffer in the processor 110 may be a cache memory.
  • the cache may be used to hold instructions and/or data that have just been used, generated, or recycled by the processor 110 . If the processor 110 needs to use the instruction or data, it can be called directly from the buffer. This helps reduce the time it takes for the processor 110 to fetch instructions or data, thereby helping to improve the efficiency of the system.
  • Internal memory 121 may be used to store programs and/or data.
  • the internal memory 121 includes a stored program area and a stored data area.
  • The stored program area may be used to store an operating system (such as Android, iOS, etc.), a computer program required for at least one function, and the like.
  • the stored program area may store a computer program (such as a voiceprint recognition system) required for the voiceprint recognition function, and the like.
  • the storage data area may be used to store data (such as audio data) created and/or collected during the use of the mobile phone 100 .
  • the processor 110 may cause the mobile phone 100 to execute a corresponding method by calling programs and/or data stored in the internal memory 121, thereby implementing one or more functions.
  • the processor 110 invokes certain programs and/or data in the internal memory, so that the mobile phone 100 executes the upgrade method provided in the embodiments of the present application.
  • the internal memory 121 may adopt a high-speed random access memory, and/or a non-volatile memory, or the like.
  • the non-volatile memory may include at least one of one or more magnetic disk storage devices, flash memory devices, and/or universal flash storage (UFS), among others.
  • the external memory interface 122 can be used to connect an external memory card (eg, a Micro SD card) to expand the storage capacity of the mobile phone 100 .
  • the external memory card communicates with the processor 110 through the external memory interface 122 to realize the data storage function.
  • the mobile phone 100 can save images, music, videos and other files in the external memory card through the external memory interface 122 .
  • The camera 131 may be used to capture moving images, still images, and the like.
  • the camera 131 includes a lens and an image sensor.
  • the optical image generated by the object through the lens is projected onto the image sensor, and then converted into an electrical signal for subsequent processing.
  • the image sensor may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the image sensor converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • the mobile phone 100 may include one or N cameras 131 , where N is a positive integer greater than one.
  • Display screen 132 may include a display panel for displaying a user interface.
  • The display panel can be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like.
  • the mobile phone 100 may include one or M display screens 132 , where M is a positive integer greater than one.
  • the mobile phone 100 may implement a display function through a GPU, a display screen 132, an application processor, and the like.
  • Sensor module 140 may include one or more sensors. For example, a touch sensor 140A, a gyroscope 140B, an acceleration sensor 140C, a fingerprint sensor 140D, a pressure sensor 140E, and the like. In some embodiments, the sensor module 140 may also include an ambient light sensor, a distance sensor, a proximity light sensor, a bone conduction sensor, a temperature sensor, and the like.
  • the SIM card interface 151 is used for connecting a SIM card.
  • the SIM card can be contacted and separated from the mobile phone 100 by inserting into the SIM card interface 151 or pulling out from the SIM card interface 151 .
  • the mobile phone 100 may support one or K SIM card interfaces 151 , where K is a positive integer greater than one.
  • the SIM card interface 151 may support Nano SIM cards, Micro SIM cards, and/or SIM cards, and the like. Multiple cards can be inserted into the same SIM card interface 151 at the same time.
  • the types of the plurality of cards may be the same or different.
  • the SIM card interface 151 can also be compatible with different types of SIM cards.
  • the SIM card interface 151 is also compatible with external memory cards.
  • the mobile phone 100 interacts with the network through the SIM card to realize functions such as call and data communication.
  • the handset 100 may also employ an eSIM, i.e. an embedded SIM card.
  • the eSIM card can be embedded in the mobile phone 100 and cannot be separated from the mobile phone 100 .
  • the keys 152 may include a power key, a volume key, and the like.
  • the keys 152 may be mechanical keys or touch keys.
  • the cell phone 100 can receive key input and generate key signal input related to user settings and function control of the cell phone 100 .
  • the mobile phone 100 can implement audio functions through an audio module 160, a speaker 161, a receiver 162, a microphone 163, an earphone interface 164, an application processor, and the like. For example, audio playback function, recording function, voiceprint registration function, voiceprint verification function, voiceprint recognition function, etc.
  • the audio module 160 may be used to perform digital-to-analog conversion and/or analog-to-digital conversion on audio data, and may also be used to encode and/or decode audio data.
  • the audio module 160 may be provided independently of the processor, or may be provided in the processor 110 , or some functional modules of the audio module 160 may be provided in the processor 110 .
  • The speaker 161, also called a "loudspeaker", is used to convert audio data into sound and play the sound.
  • For example, the mobile phone 100 can play music through the speaker 161, answer a hands-free call, issue a voice prompt, and so on.
  • The receiver 162, also referred to as an "earpiece", is used to convert audio data into sound and play the sound. For example, when the mobile phone 100 answers a call, the call can be answered by placing the receiver 162 close to the ear.
  • The microphone 163, also called a "mic", is used to collect sound (e.g., ambient sound, including sounds made by people and sounds made by equipment) and convert the sound into audio electrical data.
  • the user can make a sound through the human mouth close to the microphone 163, and the microphone 163 collects the sound made by the user.
  • the microphone 163 can collect ambient sound in real time to obtain audio data.
  • the mobile phone 100 may be provided with at least one microphone 163 .
  • two microphones 163 are provided in the mobile phone 100, which can realize noise reduction function in addition to collecting sound.
  • three, four or more microphones 163 may be provided in the mobile phone 100, so that on the basis of sound collection and noise reduction, sound source identification, or directional recording functions, etc. can be realized.
  • the earphone jack 164 is used to connect wired earphones.
  • The earphone interface 164 can be a USB interface 170, a 3.5mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface, or the like.
  • the USB interface 170 is an interface that conforms to the USB standard specification, and specifically may be a Mini USB interface, a Micro USB interface, a USB Type C interface, and the like.
  • the USB interface 170 can be used to connect a charger to charge the mobile phone 100, and can also be used to transfer data between the mobile phone 100 and peripheral devices. It can also be used to connect headphones to play audio through the headphones.
  • the USB interface 170 can be used to connect other mobile phones 100, such as AR devices, computers, and the like, in addition to the headphone interface 164.
  • the charging management module 180 is used to receive charging input from the charger.
  • the charger may be a wireless charger or a wired charger.
  • the charging management module 180 may receive charging input from the wired charger through the USB interface 170 .
  • The charging management module 180 may receive wireless charging input through the wireless charging coil of the mobile phone 100. While the charging management module 180 charges the battery 182, the mobile phone 100 can also be powered through the power management module 181.
  • the power management module 181 is used for connecting the battery 182 , the charging management module 180 and the processor 110 .
  • the power management module 181 receives input from the battery 182 and/or the charge management module 180, and supplies power to the processor 110, the internal memory 121, the display screen 132, the camera 131, and the like.
  • the power management module 181 can also be used to monitor parameters such as battery capacity, battery cycle times, battery health status (leakage, impedance).
  • the power management module 181 may also be provided in the processor 110 .
  • the power management module 181 and the charging management module 180 may also be provided in the same device.
  • the mobile communication module 191 can provide wireless communication solutions including 2G/3G/4G/5G etc. applied on the mobile phone 100 .
  • the mobile communication module 191 may include a filter, a switch, a power amplifier, a low noise amplifier (LNA), and the like.
  • The wireless communication module 192 may provide wireless communication solutions applied on the mobile phone 100, including WLAN (such as a Wi-Fi network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), and the like.
  • the wireless communication module 192 may be one or more devices integrating at least one communication processing module.
  • the antenna 1 of the cell phone 100 is coupled with the mobile communication module 191, and the antenna 2 is coupled with the wireless communication module 192, so that the cell phone 100 can communicate with other devices.
  • the mobile communication module 191 can communicate with other devices through the antenna 1
  • the wireless communication module 192 can communicate with other devices through the antenna 2 .
  • The mobile phone 100 may receive upgrade information (including a new voiceprint feature extraction model, a new threshold, etc.) from other devices via the wireless communication module 192, and then update the voiceprint recognition system on the electronic device based on the upgrade information (e.g., update the voiceprint feature extraction model, update the threshold, etc.).
  • The other device may be a server of a cloud service provider, such as a platform established and maintained by the manufacturer or operator of the mobile phone 100 to provide required services on demand in an easily scalable manner through the network, for example, the server of the mobile phone manufacturer "Huawei".
  • other devices may also be other electronic devices, which are not limited in this application.
  • FIG. 3 a flow of an upgrade method provided by an embodiment of the present application is exemplarily shown.
  • the electronic device collects the first registered voice entered by the user
  • the electronic device may collect ambient sounds through the microphone 163 to obtain the first registered voice entered by the user.
  • the user can speak the first registration voice under the prompt of the electronic device.
  • the electronic device may display text on the display screen 132 to prompt the user to speak the registered phrase "1234567".
  • the electronic device can also give voice prompts through the speaker 161, and so on.
  • the electronic device prompts the user to enter the registration voice.
  • For example, the electronic device may automatically prompt the user to speak the registration voice when the user activates the voiceprint recognition function of the electronic device for the first time; or, when the user activates the voiceprint recognition function for the first time, the user may operate the electronic device so that it prompts the user to speak the registration voice; or, when the user subsequently enables the voiceprint recognition function, the user may trigger the electronic device to prompt for the registration voice as required.
  • the user can input the registration voice for multiple times when registering the voiceprint, so that the accuracy of the voiceprint recognition can be improved.
  • the electronic device uses the pre-saved first model to process the first registered voice, obtains a first user feature template, and saves the first user feature template;
  • the first model is a voiceprint feature extraction model trained in advance with a neural network, the input of the first model is speech, and the output is the voiceprint feature corresponding to the input speech.
  • the algorithm on which the first model is based may be, but not limited to, algorithms such as filter bank (filter bank, FBank), mel-frequency cepstral coefficients (mel-frequency cepstral coefficients, MFCC), D-vector, and the like.
  • In some embodiments, the quality of the first registered voice can be checked first. Only when the quality of the first registered voice meets the first preset requirement is the first registered voice used for registration (that is, the pre-saved first model processes the first registered voice to obtain the first user feature template, and the first user feature template is saved). If the quality is not good, the first registered voice may be rejected for registration, and the user may be prompted to re-enter a registration voice and try registration again.
  • the electronic device prompts the user to say the keyword of the voice assistant three times on the display screen 132, such as "Xiaoyi Xiaoyi". Every time the user speaks Xiaoyi Xiaoyi, the microphone 163 of the electronic device will send the collected voice to the processor 110 of the mobile phone.
  • the processor 110 separates the segment of speech corresponding to the keyword as the registered speech. Then the processor 110 determines the signal-to-noise ratio of the registered speech, and judges whether the signal-to-noise ratio meets the requirements. When the signal-to-noise ratio is lower than the set threshold (ie, the noise is too large), the registration is rejected.
  • the processor 110 uses the first model to calculate the speech to obtain user template features, which are stored in the internal memory 121 .
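The SNR-gated enrollment described in the two paragraphs above might be sketched as follows. The application does not specify how the signal-to-noise ratio is estimated, so the energy-based estimate below (treating the leading samples as noise-only, roughly 0.1 s at 16 kHz) is purely an assumption, as are the helper names.

```python
import numpy as np

def estimate_snr_db(speech: np.ndarray, noise_head: int = 1600) -> float:
    """Crude SNR estimate: treat the first `noise_head` samples as noise-only (assumption)."""
    speech = speech.astype(np.float64)
    noise_power = np.mean(speech[:noise_head] ** 2) + 1e-12
    signal_power = np.mean(speech[noise_head:] ** 2) + 1e-12
    return 10.0 * np.log10(signal_power / noise_power)

def try_enroll(model, keyword_speech: np.ndarray, templates: list,
               snr_threshold_db: float = 20.0) -> bool:
    """Reject enrollment when the noise is too large; otherwise store a new template."""
    if estimate_snr_db(keyword_speech) < snr_threshold_db:
        return False            # registration rejected, prompt the user to re-enter
    templates.append(model(keyword_speech))
    return True
```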
  • the number of the first user feature templates stored by the electronic device may be multiple, so that the accuracy of voiceprint recognition can be improved.
  • S301 to S302 are the first time the electronic device registers the user's voice, that is, it is executed before the user uses the voiceprint recognition function for the first time. After the first registration is completed, the user can start to use the voiceprint recognition function, as shown in S303-S304.
  • the electronic device collects the first verification voice entered by the user
  • the user can speak the verification voice under the prompt of the electronic device.
  • The method by which the electronic device prompts the user to speak the verification voice is similar to the method by which it prompts the user to speak the registration voice, and the repeated parts will not be described again.
  • the electronic device prompts the user to enter the verification voice
  • For example, the electronic device may automatically prompt the user to speak the first verification voice after the electronic device is turned on, where the first verification voice is used to verify the user's identity in order to unlock the electronic device; or, when the user opens an encrypted application (such as a diary), the electronic device automatically prompts the user to speak the first verification voice, where the first verification voice is used to verify the user's identity in order to unlock the application; or, when the user opens an application and prepares to log in to an account, the electronic device automatically prompts the user to speak the first verification voice, where the first verification voice is used to verify the user's identity in order to automatically fill in the user's account and password.
  • The electronic device may collect the first verification voice entered by the user when triggered by a user operation. For example, the user triggers a verification instruction by operating the electronic device, so that after receiving the verification instruction, the electronic device prompts the user to enter the first verification voice and collects the first verification voice entered by the user.
  • For example, the user can trigger the verification instruction by tapping the position of the icon corresponding to the voiceprint recognition function on the touch screen of the electronic device, so that the electronic device prompts the user to speak the first verification voice; for another example, the user can trigger the verification instruction by operating a physical control (such as a physical key, mouse, or joystick); for another example, the user can trigger the verification instruction through a specific gesture (such as double-tapping the touch screen of the electronic device), so that the electronic device prompts the user to speak the first verification voice.
  • the user can speak the keyword "voiceprint recognition” to an electronic device (such as a smart phone, a vehicle-mounted device, etc.), and the electronic device collects the keyword "voiceprint recognition” sent by the user through the microphone 163 and triggers a verification instruction, And prompt the user to speak the first verification voice.
  • the electronic device collects the control command, and uses the control command as the first verification voice to perform voiceprint recognition. That is, the electronic device triggers the verification instruction when receiving the control command, and uses the control instruction as the first verification voice to perform voiceprint recognition.
  • For example, a user can issue a control command "turn on music" to an electronic device (such as a smart phone or a vehicle-mounted device); after the electronic device collects the user's voice "turn on music" through the microphone 163, this voice is used as the first verification voice for voiceprint recognition.
  • For another example, a user can issue a control command "turn to 27°C" to an electronic device (such as a smart air conditioner); the electronic device collects the user's voice "turn to 27°C" through the microphone 163 and uses this voice as the first verification voice to perform voiceprint recognition.
  • the verification voice can be input multiple times, so that the accuracy of the voiceprint recognition can be improved.
  • the electronic device uses the first model to process the first verification voice to obtain the first voiceprint feature; verify the identity of the user based on the first voiceprint feature and the first user feature template saved in the electronic device;
  • the electronic device inputs the first verification voice into the same model (ie, the first model) as in the registration process in S302, and the first model outputs the voiceprint feature.
  • the electronic device calculates the similarity between the first voiceprint feature and the first user feature template.
  • The method for calculating the similarity may include, but is not limited to, algorithms such as cosine distance scoring (CDS), linear discriminant analysis (LDA), and probabilistic linear discriminant analysis (PLDA).
  • Cosine distance scoring: calculate the cosine value between the feature vector of the first voiceprint feature to be verified and the feature vector of the user template feature as the similarity score (i.e., the scoring result).
  • Probabilistic linear discriminant analysis scoring: use a pre-trained probabilistic linear discriminant analysis model to calculate the similarity score (i.e., the scoring result) between the first voiceprint feature to be verified and the user template feature. It should be understood that if the user has registered multiple user template features, fusion matching and scoring can be performed according to the first voiceprint feature to be verified and the multiple user template features.
  • The electronic device chooses to accept or reject the control instruction corresponding to the verification voice according to the scoring result. For example, the electronic device determines whether the similarity is greater than the first verification threshold corresponding to the first model; if so, the verification passes, that is, the speaker of the verification voice is the same as the speaker of the registered voice, and the corresponding control operation is performed (such as unlocking the electronic device, opening an application, or logging in with the account and password); otherwise, the verification fails, that is, the speaker of the verification voice is not the speaker of the registered voice, and the corresponding control operation is not performed.
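A sketch of this scoring-and-decision step, assuming cosine scoring and a simple mean fusion over multiple registered templates (the application mentions fusion matching but leaves the fusion rule open, so the mean is an illustrative choice).

```python
import numpy as np

def cosine_score(probe: np.ndarray, template: np.ndarray) -> float:
    """Cosine similarity between the feature to be verified and one user template feature."""
    return float(np.dot(probe, template) /
                 (np.linalg.norm(probe) * np.linalg.norm(template)))

def verify_and_decide(probe: np.ndarray, templates: list,
                      verification_threshold: float) -> bool:
    """Fuse scores over all registered templates, then accept or reject the control instruction."""
    fused = float(np.mean([cosine_score(probe, t) for t in templates]))
    return fused >= verification_threshold   # True: perform the corresponding control operation
```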
  • the electronic device may display the verification result on the display screen 132 to prompt the user that the verification fails, or the electronic device may also prompt the user to re-enter the verification voice and try the verification again.
  • the electronic device verifies the identity of the user, if the electronic device has received the second model, the second model is used to process the first verification voice to obtain the second voiceprint feature, and the second voiceprint feature is used to update The first user feature template saved in the electronic device, and the first model saved in the electronic device is updated using the second model.
  • the time when the electronic device receives the second model is later than the time when the electronic device receives the first model, in other words, the second model is a newer model than the first model.
  • The second model is a voiceprint feature extraction model trained by a neural network in advance; the input of the second model is speech, and the output is the voiceprint feature corresponding to the input speech.
  • the algorithm based on the second model may be, but not limited to, FBank, MFCC, D-vector and other algorithms.
  • The second model may be actively pushed to the electronic device by a cloud server.
  • the cloud server may push a new model (eg, a second model) to the electronic device when the voiceprint recognition model on the electronic device needs to be upgraded.
  • After receiving the second model, the electronic device uses the first verification voice obtained in the previous verification process as the new registration voice (provided that the verification result of that verification process is a pass, so as to ensure that the verification voice, such as the first verification voice, was spoken by the registrant), and uses the second model to process the first verification voice to obtain the second voiceprint feature.
  • Then, the electronic device uses the second voiceprint feature to update the first user feature template saved in the electronic device, and uses the second model to update the first model saved in the electronic device, thereby completing the upgrade registration without the user's perception (the user does not need to perform any operation of entering a registration voice).
  • Mode 1: the electronic device directly uses the second voiceprint feature as a new user feature template (to distinguish it from the first user feature template, the second voiceprint feature is referred to here as the second user feature template), and replaces the first user feature template saved in the electronic device with the second user feature template.
  • Mode 2: the electronic device performs weighting/merging processing on the second voiceprint feature and the first user feature template to obtain a third user feature template, and uses the third user feature template to replace the first user feature template saved in the electronic device.
  • the electronic device may directly use the second model to replace the first model, or the electronic device may perform weighting/combination processing on the first model and the second model, which is not limited in this application.
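The two template-update modes might be sketched as below; the weight value is an illustrative assumption, since the application allows weighting/merging but does not fix the weights. The same pattern could apply to merging model parameters or thresholds.

```python
import numpy as np

def update_template_mode1(second_feature: np.ndarray) -> np.ndarray:
    """Mode 1: the second voiceprint feature directly becomes the new user feature template."""
    return second_feature

def update_template_mode2(first_template: np.ndarray, second_feature: np.ndarray,
                          weight: float = 0.5) -> np.ndarray:
    """Mode 2: weighted merge of the old template and the new feature (weight is illustrative)."""
    return weight * second_feature + (1.0 - weight) * first_template
```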
  • In some embodiments, the electronic device only receives some updated parameters of the model, and then updates the corresponding parameters of the first model based on the updated parameters, instead of directly updating the entire model.
  • Different models may correspond to different verification thresholds.
  • After the electronic device uses the second model to process the first verification voice, if the electronic device has received the second verification threshold corresponding to the second model, it can also use the second verification threshold to update the first verification threshold, so that the verification threshold is updated as well.
  • the electronic device may also use the second model to process the first verification voice after determining that both the second model and the second verification threshold have been received.
  • The verification threshold may be updated by replacing the first verification threshold with the second verification threshold, or by performing weighting/combination processing on the second verification threshold and the first verification threshold and replacing the first verification threshold with the verification threshold obtained after the weighting/combination processing; this is not limited in this application.
  • In some embodiments, the electronic device may use the second model to process the first verification voice (that is, use the first verification voice as the new registration voice) only after determining that the quality of the first verification voice meets the second preset requirement.
  • the first preset conditions include but are not limited to the following two:
  • Condition 1: the similarity between the first voiceprint feature and the first user feature template is greater than or equal to the first registration-free threshold. The first registration-free threshold may be calculated according to the first verification threshold used in the verification process (for example, the first registration-free threshold is several decibels higher than the first verification threshold), or it may be preset on the electronic device (for example, received from a cloud server and saved in advance); this is not limited in this application.
  • Condition 2: the signal-to-noise ratio of the first verification voice is greater than or equal to the first signal-to-noise ratio threshold.
  • the first signal-to-noise ratio threshold may be obtained according to the set threshold used in the registration process (S301) (for example, the first signal-to-noise ratio threshold is consistent with the set threshold, or the first signal-to-noise ratio threshold is higher than the set threshold several decibels, etc.), or it may be preset by the electronic device (for example, it is received and saved from a cloud server in advance), which is not limited in this application.
  • the first signal-to-noise ratio threshold is greater than or equal to 20dB.
  • the value of the first signal-to-noise ratio threshold can also be fine-tuned according to the specific form of the electronic device. For example, for a mobile phone, the first signal-to-noise ratio threshold can be set to 22dB, and for a smart speaker, The first SNR threshold may be set to 20dB.
  • the first registration-free threshold is greater than or equal to the first verification threshold corresponding to the first model. In this way, the quality of the verification voice as the new registration voice can be guaranteed to be high, and the performance of the upgraded voiceprint recognition system can be further improved.
  • The cloud server may also push a new registration-free threshold to the electronic device, and the electronic device updates the registration-free threshold. For example, after the electronic device determines, based on the first registration-free threshold, that the quality of the first verification voice meets the requirements and uses the second model to process the first verification voice, if the electronic device has received the second registration-free threshold, the second registration-free threshold is used to update the first registration-free threshold.
  • The registration-free threshold may be updated by replacing the first registration-free threshold with the second registration-free threshold, or by performing weighting/combination processing on the second registration-free threshold and the first registration-free threshold and replacing the first registration-free threshold with the registration-free threshold obtained after the weighting/combination processing; this is not limited in this application. In this way, the performance of the upgraded voiceprint recognition system can be further improved.
  • the cloud server may also push a new signal-to-noise ratio threshold to the electronic device, and the electronic device updates the signal-to-noise ratio threshold. For example, after the electronic device determines that the quality of the first verification voice meets the requirements based on the first signal-to-noise ratio threshold, and uses the second model to process the first verification voice, if the electronic device has received the second signal-to-noise ratio threshold, then The first signal-to-noise ratio threshold is updated using the second signal-to-noise ratio threshold.
  • The signal-to-noise ratio threshold may be updated by replacing the first signal-to-noise ratio threshold with the second signal-to-noise ratio threshold, or by performing weighting/combination processing on the second signal-to-noise ratio threshold and the first signal-to-noise ratio threshold and replacing the first signal-to-noise ratio threshold with the signal-to-noise ratio threshold obtained after the weighting/combination processing; this is not limited in this application. In this way, the performance of the upgraded voiceprint recognition system can be further improved.
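The threshold-update choice (replacement versus weighted merge) described for the verification, registration-free, and signal-to-noise ratio thresholds could all be expressed with one small helper; the weight value is an illustrative assumption.

```python
def update_threshold(old: float, new: float, mode: str = "replace",
                     weight: float = 0.5) -> float:
    """Replace the old threshold, or merge old and new by weighting (both options allowed above)."""
    if mode == "replace":
        return new
    return weight * new + (1.0 - weight) * old
```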
  • the above two conditions (i.e., the registration-free threshold condition and the signal-to-noise ratio threshold condition) may be implemented separately or in combination, which is not limited in this application; they are only examples and not limitations.
  • the first preset condition may also have other implementation manners.
  • when multiple user feature templates are stored in the electronic device, the electronic device may wait until the cumulatively obtained number of second user feature templates reaches a preset number, and only then use the preset number of second user feature templates to update the first user feature templates stored in the electronic device and use the second model to update the first model stored in the electronic device.
  • specifically, after a verification process passes, the electronic device uses the second model to process the verification voice obtained in that process, obtains at least one second voiceprint feature, and stores each second voiceprint feature in the internal memory 121 as a second user feature template; it then judges whether the number of second user feature templates accumulated in the internal memory 121 reaches a preset number (such as 3). If the preset number has not been reached, the electronic device waits for the next verification process, in which another second user feature template is obtained based on the second model and the verification voice of that process; if the preset number has been reached, all the first user feature templates are updated using all the second user feature templates, and the first model stored in the electronic device is updated using the second model. A minimal sketch of this accumulation logic follows below.
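  • The sketch below models the internal memory as a plain list and takes the preset number to be 3, as in the example above; all names are assumptions, and the second_model.extract call merely stands in for running the new voiceprint feature extraction model.

    PRESET_NUMBER = 3  # example value from the description

    class RegistrationFreeUpgrader:
        """Collects second user feature templates until enough are available."""

        def __init__(self, second_model):
            self.second_model = second_model
            self.pending_templates = []  # accumulated second user feature templates

        def on_verification_passed(self, verification_voice, store):
            """Called after a verification process that passed with the first model."""
            # Extract a second voiceprint feature with the second model and keep it
            # as a candidate second user feature template.
            self.pending_templates.append(self.second_model.extract(verification_voice))
            if len(self.pending_templates) < PRESET_NUMBER:
                return False  # wait for the next verification process
            # Enough templates: replace all first user feature templates and the model.
            store.user_feature_templates = list(self.pending_templates)
            store.model = self.second_model
            self.pending_templates.clear()
            return True  # the registration-free upgrade is completed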
  • each of the above preconditions used to trigger the electronic device to use the verification voice as a new registration voice (that is, to use the second model to process the first verification voice to obtain the second voiceprint feature, i.e., the second user feature template), such as judging whether the electronic device has received the second model and/or the second verification threshold and judging whether the quality of the first verification voice meets the second preset requirements, can be implemented in combination with the others, and the order in which the electronic device evaluates these preconditions can be interchanged.
  • for example, the following are several possible specific implementations of step S305:
  • in the first implementation (FIG. 6A), after the electronic device executes the verification process and the verification passes, it judges whether the scoring result obtained in the verification process is greater than or equal to the first registration-free threshold; if the scoring result is less than the first registration-free threshold, it enters the next verification process; if the scoring result is greater than or equal to the first registration-free threshold, it continues to judge whether it has received the second model and the second verification threshold; if not, it enters the next verification process; if it has, it uses the second model to process the first verification voice to obtain a second user feature template; it then judges whether the number of accumulated second user feature templates reaches the preset number; if not, it enters the next verification process; if it has, it uses all the second user feature templates to update all the first user feature templates stored in the electronic device, uses the second model to update the first model stored in the electronic device, and uses the second verification threshold to update the first verification threshold stored in the electronic device.
  • in the second implementation (FIG. 6B), after the verification process passes, the electronic device first judges whether it has received the second model and the second verification threshold; if not, it enters the next verification process; if it has, it continues to judge whether the scoring result obtained in the verification process is greater than or equal to the first registration-free threshold; if the scoring result is less than the first registration-free threshold, it enters the next verification process; otherwise it uses the second model to process the first verification voice to obtain a second user feature template, and then performs the same accumulation and update steps as in the first implementation.
  • in the third implementation (FIG. 6C), after the verification process passes, the electronic device judges whether the signal-to-noise ratio of the first verification voice is greater than or equal to the first signal-to-noise ratio threshold; if it is less, the electronic device enters the next verification process; otherwise it continues to judge whether it has received the second model and the second verification threshold; if not, it enters the next verification process; if it has, it uses the second model to process the first verification voice to obtain a second user feature template, and then performs the same accumulation and update steps as in the first implementation.
  • in the fourth implementation (FIG. 6D), after the verification process passes, the electronic device judges whether the signal-to-noise ratio of the first verification voice is greater than or equal to the first signal-to-noise ratio threshold; if it is less, the electronic device enters the next verification process; otherwise it continues to judge whether the scoring result is greater than or equal to the first registration-free threshold; if it is less, the electronic device enters the next verification process; otherwise it continues to judge whether it has received the second model and the second verification threshold; if not, it enters the next verification process; if it has, it uses the second model to process the first verification voice to obtain a second user feature template, and then performs the same accumulation and update steps as in the first implementation. The sketch after this list summarizes the four check orderings.
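  • The four implementations above differ only in which pre-checks run and in what order; the extraction, accumulation and final update steps are identical. A non-authoritative Python sketch of that ordering, with assumed names:

    # Pre-check orderings corresponding to FIG. 6A to FIG. 6D (assumed encoding).
    CHECK_ORDERS = {
        "fig_6a": ["score_ok", "model_received"],
        "fig_6b": ["model_received", "score_ok"],
        "fig_6c": ["snr_ok", "model_received"],
        "fig_6d": ["snr_ok", "score_ok", "model_received"],
    }

    def run_prechecks(order_name, checks):
        """'checks' maps each check name to a zero-argument callable returning bool."""
        for name in CHECK_ORDERS[order_name]:
            if not checks[name]():
                return False  # enter the next verification process
        return True  # proceed to extract, accumulate and finally update

  • For example, "score_ok" might be bound to a comparison of the verification score with the first registration-free threshold, "snr_ok" to a comparison of the SNR with the first SNR threshold, and "model_received" to a check that both the second model and the second verification threshold have arrived; these bindings are illustrative assumptions.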
  • after the electronic device uses the second user feature template to update the first user feature template saved in the electronic device and uses the second model to update the first model saved in the electronic device (that is, after the first registration-free upgrade is completed), when the user uses the voiceprint verification function again, the electronic device can use the updated model to perform the verification process, for example: collect the second verification voice entered by the user, use the second model to process the second verification voice to obtain a third voiceprint feature, and verify the identity of the user based on the third voiceprint feature and the second user feature template.
  • for a specific implementation, reference may be made to S303 to S304, which will not be repeated here.
  • after receiving a model newer than the second model, the electronic device performs a new round of registration-free upgrade processing. For example, after the above verification process based on the second model is completed and the user identity verification passes, if the electronic device has received a third model, the third model is used to process the second verification voice to obtain a third user feature template; the second user feature template stored in the electronic device is then updated using the third user feature template, and the second model stored in the electronic device is updated using the third model.
  • for a specific implementation method, reference may be made to S305, which will not be repeated here.
  • the embodiments of the present application are also applicable to a multi-user scenario.
  • the main differences in the multi-user scenario are: in the first registration process, the first user feature templates of multiple users need to be registered at the same time; in the verification process, the user feature template of the current user needs to be determined from the user feature templates of the multiple users and used to authenticate the current user; and in the registration-free upgrade process, the user feature templates of the multiple users need to be updated at the same time, as sketched below.
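  • A minimal sketch (all names assumed) of those multi-user differences: feature templates are kept per user, verification selects the best-matching enrolled user, and the upgrade helper rebuilds one user's templates with the second model from a qualifying verification voice of that user; how the templates of all enrolled users are refreshed together is an implementation detail this sketch does not fix.

    def identify_user(feature, templates_by_user, score_fn):
        """Pick the enrolled user whose stored templates best match 'feature'."""
        best_user, best_score = None, float("-inf")
        for user, templates in templates_by_user.items():
            score = max(score_fn(feature, t) for t in templates)
            if score > best_score:
                best_user, best_score = user, score
        return best_user, best_score

    def upgrade_user_templates(second_model, user, verification_voice, templates_by_user):
        """Rebuild one user's feature templates with the second model from that
        user's qualifying verification voice (helper for the multi-user upgrade)."""
        templates_by_user[user] = [second_model.extract(verification_voice)]
        return templates_by_user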
  • when the voiceprint recognition system is upgraded, the verification voice obtained in the verification process is used as the new registration voice to complete the upgrade registration, so the upgrade of the voiceprint recognition system can be realized without the user's perception.
  • the system upgrade can therefore take into account both voiceprint recognition performance and user experience.
  • an embodiment of the present application further provides a chip, which is coupled with a memory in an electronic device and can execute the methods shown in FIG. 3 and FIG. 6A to FIG. 6D.
  • an embodiment of the present application also provides a computer storage medium, where computer instructions are stored in the computer storage medium; when the computer instructions are executed by one or more processing modules, the methods shown in FIG. 3 and FIG. 6A to FIG. 6D are implemented.
  • an embodiment of the present application also provides a computer program product containing instructions; when the computer program product runs on a computer, the computer is caused to execute the methods shown in FIG. 3 and FIG. 6A to FIG. 6D.
  • the embodiments of the present application may be provided as a method, a system, or a computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising an instruction apparatus, where the instruction apparatus implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • General Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computer Hardware Design (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephone Function (AREA)

Abstract

Embodiments of this application provide an upgrade method, an upgrade apparatus, and an electronic device. The method includes: the electronic device collects a first verification voice entered by a user; processes the first verification voice using a first model stored in the electronic device to obtain a first voiceprint feature; verifies the identity of the user based on the first voiceprint feature and a first user feature template stored in the electronic device; after the identity of the user is verified, if the electronic device has received a second model, processes the first verification voice using the second model to obtain a second voiceprint feature; and updates the first user feature template using the second voiceprint feature and updates the first model using the second model. In the embodiments of this application, the verification voice obtained in the verification process is used as a new registration voice to complete the upgrade registration of the voiceprint recognition system, so that the voiceprint recognition system can be upgraded without the user's perception, taking into account both voiceprint recognition performance and user experience.

Description

一种升级方法、装置及电子设备
相关申请的交叉引用
本申请要求在2021年05月07日提交中国专利局、申请号为202110493970.X、申请名称为“一种升级方法、装置及电子设备”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及计算机技术领域,尤其涉及一种升级方法、装置及电子设备。
背景技术
声纹识别是一种通过语音信号来自动辨识和确认说话人身份的技术。声纹识别的基本方案包括注册流程和验证流程两个阶段。在注册流程阶段,电子设备上的声纹识别系统采用预先训练好的深度模型(本文称之为“声纹特征提取模型”或“模型”)从用户录入的注册语音中提取声纹特征,将其作为用户特征模板保存在电子设备中;在验证流程阶段,电子设备上的声纹识别系统采用与注册流程中相同的声纹特征提取模型从验证语音中提取声纹特征作为待验证特征,然后基于待验证特征和注册流程中获取的用户特征模板对用户的身份进行验证。
目前,在对电子设备上的声纹识别系统进行升级(例如更新声纹特征提取模型)时,需要重新执行注册流程(即用户重新录入注册语音,级电子设备使用新的声纹特征提取模型从新的注册语音中提取声纹特征作为新的用户特征模板)。如果不重新注册,后续的验证流程中,电子设备使用新的声纹特征提取模型提取的待验证特征无法跟旧的用户模板特征匹配,导致声纹识别系统的识别性能反而会变差;但是,如果每次升级都重新执行注册流程,又会对用户的使用体验产生很大的负面影响。
因此,如何兼顾声纹识别性能和用户体验,是亟需解决的问题。
发明内容
本申请实施例提供一种升级方法、装置及电子设备,可以在用户无感知的情况下实现对声纹识别系统的升级,兼顾声纹识别性能和用户体验。
第一方面,提供一种升级方法,应用于电子设备,该方法包括:电子设备采集用户录入的第一验证语音;使用电子设备中保存的第一模型对第一验证语音进行处理,获得第一声纹特征;基于第一声纹特征、以及电子设备中保存的第一用户特征模板验证该用户的身份;其中,第一用户特征模板为电子设备使用第一模型对该用户的历史验证语音或注册语音进行处理所获得的声纹特征;在验证该用户的身份通过之后,若电子设备已接收到第二模型,则使用第二模型对第一验证语音进行处理,以获得第二声纹特征;使用第二声纹特征更新电子设备中保存的第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型。
本申请实施例在对声纹识别系统进行升级时,将验证过程中获取的验证语音作为新的 注册语音,完成升级注册,可以在用户无感知的情况下实现对声纹识别系统的升级,能够兼顾声纹识别性能和用户体验。
一种可能的实现方式中,电子设备可以计算第一声纹特征和第一用户特征模板的相似度;通过判断相似度是否大于第一模型对应的第一验证门限来验证用户的身份;若为是,则验证通过;否则,验证不通过。在电子设备使用第二模型对第一验证语音进行处理之后,若电子设备已接收到第二模型对应的第二验证门限,则还使用第二验证门限更新第一验证门限。
如此，不同的模型对应不同的验证门限，电子设备在升级系统时可以自动更新验证门限，可以进一步提高声纹识别系统的性能。
一种可能的实现方式中,电子设备可以在第一验证语音的质量满足第一预设条件时,才使用第二模型对第一验证语音进行处理。其中,第一预设条件例如包括但不限于:第一声纹特征和第一用户特征模板的相似度大于或等于第一免注册门限;和/或,第一验证语音的信噪比大于或等于第一信噪比门限。
如此,可以保证第二声纹特征的质量,进而保证升级后的声纹识别系统的性能。
一种可能的实现方式中,第一免注册门限大于或等于第一模型对应的第一验证门限。
如此,可进一步提高第二声纹特征的质量,提高升级后的声纹识别系统的性能。
一种可能的实现方式中,电子设备在使用第二模型对第一验证语音进行处理之后,若电子设备已接收到第二免注册门限,则还可以使用第二免注册门限更新第一免注册门限;和/或,电子设备在使用第二模型对第一验证语音进行处理之后,若电子设备已接收到第二信噪比门限,则还可以使用第二信噪比门限更新第一信噪比门限。
如此,免注册门限、信噪比门限等也可以自动更新,可进一步提高第二声纹特征质量,提高升级后的声纹识别系统的性能。
一种可能的实现方式中,电子设备可以在电子设备累计获得的第二声纹特征的数量达到预设数量之后,才使用预设数量的第二声纹特征更新电子设备中保存的第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型。
如此,可以保证升级后的声纹识别系统中具有多个用户特征模板(即第二声纹特征),可进一步提高升级后的声纹识别系统的性能。
一种可能的实现方式中,电子设备在使用第二声纹特征更新电子设备中保存的第一用户特征模板,使用第二模型更新电子设备中保存的第一模型之后,还采集用户录入的第二验证语音;使用第二模型对第二验证语音进行处理,获得第三声纹特征;基于第三声纹特征、以及第二声纹特征验证用户的身份。
如此,在完成升级之后,电子设备会使用新的模型和新的用户特征模板来执行验证流程,可进一步提高电子设备的声纹识别性能。
一种可能的实现方式中,电子设备在采集用户录入的第一验证语音之前,还可以提示用户录入验证语音。例如显示屏显示提示信息,或者扬声器输提示语音等。
如此,可提高用户体验。
第二方面,提供一种升级装置,该装置可以是电子设备或电子设备中的芯片,该装置包括用于执行上述第一方面或第一方面任一种可能的实现方式中所述的方法的单元/模块。
示例性的,该装置可以包括:数据采集单元,用于采集用户录入的第一验证语音;计算单元,用于使用装置中保存的第一模型对第一验证语音进行处理,获得第一声纹特征; 基于第一声纹特征、以及装置中保存的第一用户特征模板验证该用户的身份;其中,第一用户特征模板为装置使用第一模型对该用户的历史验证语音或注册语音进行处理所获得的声纹特征;在验证该用户的身份通过之后,若装置已接收到第二模型,则使用第二模型对第一验证语音进行处理,以获得第二声纹特征;使用第二声纹特征更新装置中保存的第一用户特征模板,以及使用第二模型更新装置中保存的第一模型。
第三方面,提供一种电子设备,包括:麦克风和处理器;其中,麦克风用于:采集用户录入的第一验证语音;处理器用于:使用电子设备中保存的第一模型对第一验证语音进行处理,获得第一声纹特征;基于第一声纹特征、以及电子设备中保存的第一用户特征模板验证该用户的身份;其中,第一用户特征模板为电子设备使用第一模型对该用户的历史验证语音或注册语音进行处理所获得的声纹特征;在验证该用户的身份通过之后,若电子设备已接收到第二模型,则使用第二模型对第一验证语音进行处理,以获得第二声纹特征;使用第二声纹特征更新电子设备中保存的第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型。
第四方面,提供一种芯片,该芯片与电子设备中的存储器耦合,执行如上述第一方面或第一方面任一种可能的实现方式中所述的方法。
第五方面,提供一种计算机存储介质,计算机存储介质中存储计算机指令,该计算机指令在被一个或多个处理模块执行时实现上述第一方面或第一方面任一种可能的实现方式中所述的方法。
第六方面,提供一种包含指令的计算机程序产品,所述计算机程序产品中存储有指令,当其在计算机上运行时,使得计算机执行上述第一方面或第一方面任一种可能的实现方式中所述的方法。
附图说明
图1为本申请实施例提供的一种电子设备的结构示意图;
图2为本申请实施例提供的一种电子设备的结构示意图;
图3为本申请实施例提供的一种升级方法的流程图;
图4为电子设备提示用户录入注册语音的示意图;
图5为用户触发电子设备采集验证语音的示意图;
图6A为本申请实施例提供的一种具体的免注册升级处理方式的示意图;
图6B为本申请实施例提供的另一种具体的免注册升级处理方式的示意图;
图6C为本申请实施例提供的另一种具体的免注册升级处理方式的示意图;
图6D本申请实施例提供的另一种具体的免注册升级处理方式的示意图。
具体实施方式
声纹(Voiceprint),是用电声学仪器显示的携带言语信息的声波频谱。声纹具有稳定性、可测量性、唯一性等特点。人成年以后,人的声音可保持长期相对稳定不变。人在讲话时使用的发声器官在尺寸和形态方面每个人的差异很大,所以任何两个人的声纹图谱都有差异,不同人的声音在语谱图中共振峰的分布情况不同。声纹识别正是通过比对两段语音的说话人在相同音素上的发声来判断是否为同一个人,从而实现“闻声识人”的功能。
声纹识别从算法上看,还可分为文本相关(Text-Dependent)和文本无关(Text-1ndependent)两种。与文本有关的声纹识别系统要求用户按照规定的内容发音,每个人的声纹模型逐个被精确地建立,而识别时也必须按规定的内容发音,因此可以达到较好的识别效果,但系统需要用户配合,如果用户的发音与规定的内容不符合,则无法正确识别该用户。与文本无关的识别系统则不规定说话人的发音内容,模型建立相对困难,但用户使用方便,可应用范围较宽。因为要考虑到实用性,目前终端设备上一般都是使用文本相关的声纹识别算法。
声纹识别从应用上看,可分为说话人辨认(SI)和说话人确认(SV)两种。其中,说话人辨认,用以判断某段语音是若干人中的哪一个人所说的,是“多选一”问题。说话人确认,用以确认某段语音是否是指定的某个人所说的,是“一对一判别”问题。
本文主要涉及说话人确认功能。在后文中,除非有特别说明之外,出现的声纹识别功能均指说话人确认功能,即“说话人确认功能”与“声纹识别功能”可以相互替换。
说话人确认功能包括注册流程和验证流程两部分。注册流程包括:在用户正式使用声纹识别功能之前,声纹识别系统采集用户录入的注册语音,然后根据预先训练好的深度模型(本文称之为“声纹特征提取模型”或“模型”)从注册语音中提取声纹特征,将该声纹特征作为用户特征模板保存在电子设备中;验证流程包括:在用户使用声纹识别功能时,声纹识别系统采集用户录入的验证语音,使用与注册流程中相同的声纹特征提取模型从验证语音中提取声纹特征作为待验证特征,然后对待验证特征和注册流程中获取的用户特征模板进行相似度打分,根据打分结果确认用户身份。
因为用户的语音是比较敏感的个人信息,不能存储,更不能上传到云端,所以基于隐私安全的考虑,声纹识别系统一般是离线运行在电子设备上的,需要预先把训练好的声纹特征提取模型存储在电子设备上。
然而,具有声纹识别功能的电子设备一直在推陈出新,型号迭代非常快,基本一年更新一次。随着电子设备更新,对说话人确认技术的要求也会不断提高。当需要对说话人确认技术进行升级又要保证升级后的新算法在旧设备上兼容的时候,就需要对旧设备上的声纹识别系统进行升级,即远程推送新的声纹特征提取模型至电子设备。电子设备接收到新的声纹特征提取模型后,需要基于新的声纹特征提取模型重新执行注册流程(即需要用户重新录入注册语音,声纹识别系统使用新的声纹特征提取模型从用户新录入的注册语音中提取声纹特征作为新的用户特征模板)。如果不重新注册,则后续的验证流程中,声纹识别系统使用新的声纹特征提取模型提取的待验证特征无法跟旧的用户模板特征匹配,声纹识别系统的识别性能反而会变差;但是,如果每次升级都重新执行注册流程,又会对用户的使用体验产生很大的负面影响。
鉴于此,本申请实施例提供一种升级方案,当电子设被检测到有新的特征提取模型后,不需要用户重新提供注册语音进行用户注册,而是直接根据验证过程中获取的验证语音进行用户注册。如此,可以在用户无感知的情况下实现对声纹识别系统的升级,兼顾声纹识别性能和用户体验。
应理解,本申请实施例技术方案可以应用于具有声纹识别功能的任何电子设备。参见图1,本申请实施例中的电子设备至少具备数据采集单元01、存储单元02、通信单元03以及计算单元04,各单元之间可以通过输入输出(IO)接口连接和通信。
其中,数据采集单元01用于采集用户录入的语音(注册语音、验证语音等)。其具体 实现可以是麦克风、声音传感器等。
存储单元02,用于存储声纹识别功能使用的声纹特征提取模型、门限、以及计算单元04中用户注册模块获得的用户模板特征。
通信单元03,用于接收新的声纹特征提取模型,还可以用于接收新的门限,提供给计算单元04。
计算单元04包括:
用户注册模块401,用于根据数据采集单元01获取的注册语音提取用户模板特征,提供给验证模块402;
验证模块402,用于根据数据采集单元01获取的验证语音及存储单元02中存储的用户模板特征、模型和门限,对说话人的身份进行验证,得到验证结果;
免注册升级模块403,用于根据通信单元03接收的新模型,以及验证模块获取的验证语音、打分结果(可选的),确定新用户模板特征,并基于新用户模板特征、新模型更新存储单元02中的旧用户模板特征、旧模型。可选的,免注册升级模块还对存储单元02中存储的门限进行更新。
本申请实施例中电子设备的具体产品形态可以有多种。例如,包括但不限于:手机、平板电脑、人工智能(artificial intelligence,AI)智能语音终端、可穿戴设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、车载终端、膝上型计算机(Laptop)、台式计算机、智能家居设备(例如智能电视、智能音箱)等。
以电子设备是手机为例,如图2所示,为本申请实施例的一种手机100的硬件结构示意图。
手机100包括处理器110、内部存储器121、外部存储器接口122、摄像头131、显示屏132、传感器模块140、用户标识模块(subscriber identification module,SIM)卡接口151、按键152、音频模块160、扬声器161、受话器162、麦克风163、耳机接口164、通用串行总线(universal serial bus,USB)接口170、充电管理模块180、电源管理模块181、电池182、移动通信模块191和无线通信模块192。在另一些实施例中,手机100还可以包括马达、指示器、按键等。
应理解,图2所示的硬件结构仅是一个示例。本申请实施例的手机100可以具有比图中所示手机100更多的或者更少的部件,可以组合两个或更多的部件,或者可以具有不同的部件配置。图中所示出的各种部件可以在包括一个或多个信号处理和/或专用集成电路在内的硬件、软件、或硬件和软件的组合中实现。
其中,处理器110可以包括一个或多个处理单元。例如:处理器110可以包括应用处理器(application processor,AP)、调制解调器、图形处理器(graphics processing unit,GPU)、图像信号处理器(image signal processor,ISP)、控制器、视频编解码器、数字信号处理器(digital signal processor,DSP)、基带处理器、和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。
在一些实施例中,处理器110中还可以设置缓存器,用于存储指令和/或数据。示例的,处理器110中的缓存器可以为高速缓冲存储器。该缓存器可以用于保存处理器110刚用过的、生成的、或循环使用的指令和/或数据。如果处理器110需要使用该指令或数据,可从该缓存器中直接调用。有助于减少了处理器110获取指令或数据的时间,从而有助于提高 系统的效率。
内部存储器121可以用于存储程序和/或数据。在一些实施例中,内部存储器121包括存储程序区和存储数据区。
其中,存储程序区可以用于存储操作系统(如Android、IOS等操作系统)、至少一个功能所需的计算机程序等。例如,存储程序区可以存储声纹识别功能所需的计算机程序(如声纹识别系统)等。存储数据区可以用于存储手机100使用过程中所创建、和/或采集的数据(比如音频数据)等。示例的,处理器110可以通过调用内部存储器121中存储的程序和/或数据,使得手机100执行相应的方法,从而实现一种或多种功能。例如,处理器110调用内部存储器中的某些程序和/或数据,使得手机100执行本申请实施例中所提供的升级方法。
其中,内部存储器121可以采用高速随机存取存储器、和/或非易失性存储器等。例如,非易失性存储器可以包括一个或多个磁盘存储器件、闪存器件、和/或通用闪存存储器(universal flash storage,UFS)等中的至少一个。
外部存储器接口122可以用于连接外部存储卡(例如,Micro SD卡),实现扩展手机100的存储能力。外部存储卡通过外部存储器接口122与处理器110通信,实现数据存储功能。例如手机100可以通过外部存储器接口122将图像、音乐、视频等文件保存在外部存储卡中。
摄像头131可以用于捕获动、静态图像等。通常情况下,摄像头131包括镜头和图像传感器。其中,物体通过镜头生成的光学图像投射到图像传感器上,然后转换为电信号,在进行后续处理。示例的,图像传感器可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。图像传感器把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。需要说明的是,手机100可以包括1个或N个摄像头131,其中,N为大于1的正整数。
显示屏132可以包括显示面板,用于显示用户界面。显示面板可以采用液晶显示屏(liquid crystal display,LCD)、有机发光二极管(organic light-emitting diode,OLED)、有源矩阵有机发光二极体或主动矩阵有机发光二极体(active-matrix organic light emitting diode,AMOLED)、柔性发光二极管(flex light-emitting diode,FLED)、Miniled、MicroLed、Micro-oLed、量子点发光二极管(quantum dot light emitting diodes,QLED)等。需要说明的是,手机100可以包括1个或M个显示屏132,M为大于1的正整数。示例的,手机100可以通过GPU、显示屏132、应用处理器等实现显示功能。
传感器模块140可以包括一个或多个传感器。例如,触摸传感器140A、陀螺仪140B、加速度传感器140C、指纹传感器140D、压力传感器140E等。在一些实施例中,传感器模块140还可以包括环境光传感器、距离传感器、接近光传感器、骨传导传感器、温度传感器等。
SIM卡接口151用于连接SIM卡。SIM卡可以通过插入SIM卡接口151,或从SIM卡接口151拔出,实现和手机100的接触和分离。手机100可以支持1个或K个SIM卡接口151,K为大于1的正整数。SIM卡接口151可以支持Nano SIM卡、Micro SIM卡、和/或SIM卡等。同一个SIM卡接口151可以同时插入多张卡。所述多张卡的类型可以相同,也可以不同。SIM卡接口151也可以兼容不同类型的SIM卡。SIM卡接口151也可以兼容外部存储卡。手机100通过SIM卡和网络交互,实现通话以及数据通信等功能。在一 些实施例中,手机100还可以采用eSIM,即:嵌入式SIM卡。eSIM卡可以嵌在手机100中,不能和手机100分离。
按键152可以包括开机键、音量键等。按键152可以是机械按键,也可以是触摸式按键。手机100可以接收按键输入,产生与手机100的用户设置以及功能控制有关的键信号输入。
手机100可以通过音频模块160、扬声器161、受话器162、麦克风163、耳机接口164以及应用处理器等实现音频功能。例如,音频播放功能、录音功能、声纹注册功能、声纹验证功能、声纹识别功能等。
音频模块160可以用于对音频数据进行数模转换、和/或模数转换,还可以用于对音频数据进行编码和/或解码。示例的,音频模块160可以独立于处理器设置,也可以设置于处理器110中,或将音频模块160的部分功能模块设置于处理器110中。
扬声器161，也称“喇叭”，用于将音频数据转换为声音，并播放声音。例如，手机100可以通过扬声器161收听音乐、接听免提电话、或者发出语音提示等。
受话器162，也称“听筒”，用于将音频数据转换成声音，并播放声音。例如，当手机100接听电话时，可以通过将受话器162靠近人耳进行接听。
麦克风163,也称“话筒”、“传声器”,用于采集声音(例如周围环境声音,包括人发出的声音、设备发出的声音等),并将声音转换为音频电数据。当拨打电话或发送语音时,用户可以通过人嘴靠近麦克风163发出声音,麦克风163采集用户发出的声音。当手机100的声纹识别功能已开启的情况下,麦克风163可以实时采集周围环境声音,获取音频数据。
需要说明的是,手机100可以设置至少一个麦克风163。例如,手机100中设置两个麦克风163,除了采集声音,还可以实现降噪功能。又示例如,手机100中还可以设置三个、四个或更多个麦克风163,从而可以在实现声音采集、降噪的基础上,还可以实现声音来源的识别、或定向录音功能等。
耳机接口164用于连接有线耳机。耳机接口164可以是USB接口170,也可以是3.5mm的开放移动手机100平台(open mobile terminal platform,OMTP)标准接口、美国蜂窝电信工业协会(cellular telecommunications industry association of the USA,CTIA)标准接口等。
USB接口170是符合USB标准规范的接口,具体可以是Mini USB接口、Micro USB接口、USB Type C接口等。USB接口170可以用于连接充电器为手机100充电,也可以用于手机100与外围设备之间传输数据。也可以用于连接耳机,通过耳机播放音频。示例的,USB接口170除了可以为耳机接口164以外,还可以用于连接其他手机100,例如AR设备、计算机等。
充电管理模块180用于从充电器接收充电输入。其中,充电器可以是无线充电器,也可以是有线充电器。在一些有线充电的实施例中,充电管理模块180可以通过USB接口170接收有线充电器的充电输入。在一些无线充电的实施例中,充电管理模块180可以通过手机100的无线充电线圈接收无线充电输入。充电管理模块180为电池182充电的同时,还可以通过电源管理模块180为手机100供电。
电源管理模块181用于连接电池182、充电管理模块180与处理器110。电源管理模块181接收电池182和/或充电管理模块180的输入,为处理器110、内部存储器121、显示屏132、摄像头131等供电。电源管理模块181还可以用于监测电池容量、电池循环次数、 电池健康状态(漏电、阻抗)等参数。在其他一些实施例中,电源管理模块181也可以设置于处理器110中。在另一些实施例中,电源管理模块181和充电管理模块180也可以设置于同一个器件中。
移动通信模块191可以提供应用在手机100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块191可以包括滤波器、开关、功率放大器、低噪声放大器(low noise amplifier,LNA)等。
无线通信模块192可以提供应用在手机100上的包括WLAN(如Wi-Fi网络)、蓝牙(Bluetooth,BT)、全球导航卫星系统(global navigation satellite system,GNSS)、调频(frequency modulation,FM)、近距离无线通信技术(near field communication,NFC)、红外技术(infrared,IR)等无线通信的解决方案。无线通信模块192可以是集成至少一个通信处理模块的一个或多个器件。
[根据细则91更正 07.05.2022] 
在一些实施例中,手机100的天线1和移动通信模块191耦合,天线2和无线通信模块192耦合,使得手机100可以与其他设备通信。具体的,移动通信模块191可以通过天线1与其它设备通信,无线通信模块192可以通过天线2与其它设备通信。
例如,手机100可以基于无线通信模块192从其它设备接收升级信息(包括新的声纹特征提取模型、新的门限等),进而基于升级信息对电子设备上的声纹识别系统进行更新(例如更新声纹特征提取模型、更新门限等)。可选的,该其它设备可以是云服务厂商的服务器,例如是由手机100的厂商或运营商建立、维护、通过网络以按需、按易扩展的方式提供所需服务的平台,具体例如是手机厂商“华为”的服务器。当然,其它设备也可以是其它电子设备,本申请对此不做限制。
本申请实施例以下将结合附图和应用场景,对本申请实施例提供的声纹识别方法进行详细介绍。以下实施例均可以在具有上述硬件结构的手机100中实现。
参见图3,示例性的示出了本申请实施例提供的一种升级方法的流程。
S301、电子设备采集用户录入的第一注册语音;
具体的,电子设备可以通过麦克风163采集周围环境声音,获得用户录入的第一注册语音。
在具体实施时,用户可以在电子设备的提示下说出第一注册语音。例如,如图4所示,电子设备可以在显示屏132上显示文字提示用户说出注册语“1234567”。又例如,电子设备也可以通过扬声器161进行语音提示,等等。其中,电子设备提示用户录入注册语音的场景可以有多种,例如,可以是用户首次启动电子设备的声纹识别功能时电子设备自动提示用户说出注册语音,或者,也可以是用户首次启动电子设备的声纹识别功能时由用户操作电子设备提示用户说出注册语音,或者,也可以是用户在后续启动声纹识别功能时用户根据需求触发电子设备提示用户说出注册语音。
可选的,用户在进行声纹注册时可以多次输入注册语音,从而可以提高声纹识别的准确性。
S302、电子设备使用预先保存的第一模型对第一注册语音进行处理,获得第一用户特征模板,并保存第一用户特征模板;
其中,第一模型是预先用神经网络训练好的声纹特征提取模型,第一模型的输入为语音,输出为输入语音对应的声纹特征。第一模型基于的算法可以但不限于采用滤波器组(filter bank,FBank)、梅尔频率倒谱系数(mel-frequency cepstral coefficients,MFCC)、 D-vector等算法。
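The paragraph above treats the first model as a function that maps input speech to a voiceprint feature (for example based on FBank, MFCC or D-vector front ends). The following toy, numpy-only sketch is not any of those algorithms; it only illustrates that interface (waveform in, fixed-length feature vector out), and every name and parameter value is an assumption made for illustration.

    import numpy as np

    def extract_voiceprint_feature(waveform, frame_len=400, hop=160, n_bands=32):
        """Toy stand-in for a voiceprint feature extraction model (16 kHz assumed:
        400-sample frames are 25 ms, 160-sample hop is 10 ms)."""
        window = np.hanning(frame_len)
        frames = []
        for start in range(0, len(waveform) - frame_len + 1, hop):
            spectrum = np.abs(np.fft.rfft(waveform[start:start + frame_len] * window)) ** 2
            # Average the power spectrum into coarse bands and take the log.
            bands = np.array_split(spectrum, n_bands)
            frames.append(np.log(np.array([b.mean() for b in bands]) + 1e-10))
        if not frames:
            return np.zeros(n_bands)
        # Mean-pool over time to obtain one fixed-length "voiceprint feature".
        return np.mean(frames, axis=0)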
可选的,因为注册语音的好坏会对识别精度有比较大的影响,所以在电子设备使用预先保存的第一模型对第一注册语音进行处理之前,可以先对第一注册语音做质量检测。只有第一注册语音的质量满足第一预设要求时,才使用第一注册语音进行注册(即预先保存的第一模型对第一注册语音进行处理,获得第一用户特征模板,并保存第一用户特征模板)。如果质量不佳的话,可以拒绝使用该第一注册语音进行注册,还可以提示用户重新录入注册语音再次尝试注册等。
例如,电子设备在显示屏132上提示用户说三遍语音助手的关键词,如“小艺小艺”。用户每说一遍小艺小艺,电子设备的麦克风163都会把采集到的语音发给手机的处理器110。处理器110把关键字对应的那段语音切分出来,作为注册语音。然后处理器110确定注册语音的信噪比,并判断信噪比是否满足要求,当信噪比低于设定阈值(即噪声过大),则会拒绝注册。对通过信噪比检测的语音,处理器110使用第一模型对语音进行计算得到用户模板特征,存储在内部存储器121中。
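As a worked sketch of the SNR check described above (assumed names, numpy only): the signal power is estimated from the segmented keyword audio, the noise power from surrounding non-keyword audio, and registration is refused when the resulting SNR falls below the set threshold; the first_model.extract call merely stands in for the stored feature extraction model.

    import numpy as np

    def estimate_snr_db(keyword_segment, noise_segment):
        """Rough SNR estimate in dB from a keyword segment and a noise-only segment."""
        p_signal = float(np.mean(np.square(keyword_segment))) + 1e-12
        p_noise = float(np.mean(np.square(noise_segment))) + 1e-12
        return 10.0 * np.log10(p_signal / p_noise)

    def try_register(keyword_segment, noise_segment, first_model, snr_threshold_db=20.0):
        """Refuse registration when the registration voice is too noisy; otherwise
        return the first user feature template extracted by the first model."""
        if estimate_snr_db(keyword_segment, noise_segment) < snr_threshold_db:
            return None  # reject, and the device may prompt the user to record again
        return first_model.extract(keyword_segment)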
可选的,电子设备保存的第一用户特征模板的数量可以是多个,从而可以提高声纹识别的准确性。
应理解,上述S301~S302是电子设备首次对用户的语音进行注册,即在用户首次使用声纹识别功能之前执行。在首次注册完成之后,用户就可以开始使用声纹识别功能,如S303~S304所示。
S303、电子设备采集用户录入的第一验证语音;
在具体实施时,用户可以在电子设备的提示下说出验证语音。其中,电子设备提示用户说出验证语的方法与电子设备提示用户说出注册语的方法类似,重复之处不再一一赘述。
电子设备提示用户录入验证语音的场景可以有多种,例如:可以是用户开机后电子设备自动提示用户说出第一验证语音,第一验证语音用于验证用户身份以解锁电子设备,或者,也可以是用户打开加密的应用(如日记本)时电子设备自动提示用户说出第一验证语音,第一验证语音用于验证用户身份以解锁应用,或者,还可以是也可以是用户打开应用准备登录账号时电子设备自动提示用户说出第一验证语音,第一验证语音用于验证用户身份以自动填充用户的账号和密码。
其中,电子设备可以是在用户的操作触发下采集用户录入的第一验证语音,例如,用户通过操作电子设备触发验证指令,从而电子设备在收到验证指令后采集提示用户录入第一验证语音,并采集用户录入的第一验证语音。例如,用户可以通过点击电子设备的触摸屏上声纹识别功能对应图标的相应位置触发验证指令,从而电子设备提示用户说出第一验证语音;又例如,用户可以通过操作物理实体(如物理键、鼠标、摇杆等)进行触发;又例如,用户可以通过特定手势(如双击电子设备的触摸屏等等)进行触发验证指令,从而电子设备提示用户说出第一验证语音。又例如,用户可以向电子设备(如智能手机、车载装置等等)说出关键词“声纹识别”,电子设备通过麦克风163采集到用户发出的关键词“声纹识别”后触发验证指令,并提示用户说出第一验证语音。
或者,用户也可以在向电子设备说出用于控制电子设备的控制命令时,电子设备采集该控制命令,并将该控制命令作为第一验证语音进行声纹识别。即,电子设备在接收到控制命令时触发验证指令,并将该控制指令作为第一验证语音进行声纹识别。例如,如图5所示,用户可以向电子设备(如智能手机、车载装置等等)发出控制命令“打开音乐”, 电子设备通过麦克风163采集到用户发出的语音“打开音乐”后,将该语音作为第一验证语音进行声纹识别。又例如,用户可以向电子设备(如智能空调)发出控制命令“调到27℃”,电子设备通过麦克风163采集到用户发出的语音“调到27℃”后,将该语音作为第一验证语音进行声纹识别。
可选的,用户在进行声纹验证时,可以多次输入验证语音,从而可以提高声纹识别的准确性。
S304、电子设备使用第一模型对第一验证语音进行处理,获得第一声纹特征;基于第一声纹特征、以及电子设备中保存的第一用户特征模板验证用户的身份;
首先,电子设备将第一验证语音输入与S302注册流程中相同的模型(即第一模型)中,第一模型输出声纹特征。
然后,电子设备计算第一声纹特征和第一用户特征模板的相似度。其中,计算相似度的方法可以但不限于包括:余弦距离(cosine distance,CDS)、线性判别分析(linear discriminant analysis,LDA)、概率线性判别分析(prob-ailistic linear discriminant analysis,PLDA)等算法。例如,余弦距离模型打分:计算待验证的第一声纹特征的特征向量和用户模板特征的特征向量之间的余弦值,作为相似度得分(即打分结果);例如,概率线性判别分析模型打分:使用预先训练好的概率线性判别分析模型来计算待验证的第一声纹特征和用户模板特征之间的相似度得分(即打分结果)。应理解,如果用户注册了多条用户模板特征,则可以根据待验证的第一声纹特征和多条用户模板特征进行融合匹配打分。
之后,电子设备根据打分结果,选择接受或者拒绝该验证语音对应的控制指令。例如,电子设备判断相似度是否大于第一模型对应的第一验证门限;若为是,则验证通过,即验证语音的说话人和注册语音的说话人一致,之后执行相应的控制操作(例如解锁电子设备、打开应用或登陆账号密码等);否则,验证不通过,即验证语音的说话人和注册语音的说话人不一致,不执行相应的控制操作。可选的,验证不通过的情况下,电子设备可以在显示屏132上显示验证结果,提示用户验证未通过,或者电子设备还可以提示用户重新录入验证语音再次尝试验证。
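A non-authoritative sketch of the cosine-distance scoring and threshold decision described above (numpy only); fusing the scores of multiple registered templates by taking their maximum is an assumption of this sketch, not a statement of the disclosed fusion method.

    import numpy as np

    def cosine_score(feature, template):
        """Cosine similarity between the feature to be verified and one template."""
        denom = np.linalg.norm(feature) * np.linalg.norm(template) + 1e-12
        return float(np.dot(feature, template) / denom)

    def verify_identity(first_voiceprint_feature, user_feature_templates,
                        first_verification_threshold):
        """Return (passed, score): verification passes when the fused similarity
        is greater than the first verification threshold."""
        score = max(cosine_score(first_voiceprint_feature, t)
                    for t in user_feature_templates)
        return score > first_verification_threshold, score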
S305、电子设备在验证用户的身份通过之后,若电子设备已经接收到第二模型,则使用第二模型对第一验证语音进行处理,以获得第二声纹特征,使用第二声纹特征更新电子设备中保存的第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型。
应理解,电子设备接收第二模型的时间晚于电子设备接收第一模型的时间,换而言之,第二模型是比第一模型更新的模型。
第二模型是预先用神经网络训练好的声纹特征提取模型,第二模型的输入为语音,输出为输入语音对应的声纹特征。第二模型基于的算法可以但不限于采用FBank、MFCC、D-vector等算法。
第二模型的来源,可以是云服务器主动推送。例如,云服务器可以在需要对电子设备上的声纹识别模型进行升级时,将新的模型(例如第二模型)推送给电子设备。
电子设备在接收到第二模型之后,使用之前验证流程(前提是该次验证流程的验证结果是通过,以确保该次验证流程中获取的验证语音(如第一验证语音)是注册人说出的)中获取的第一验证语音,将其作为新的注册语音,使用第二模型对第一验证语音进行处理,获得第二声纹特征。之后,电子设备使用第二声纹特征更新电子设备中保存的第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型,从而实现用户无感知(用户 不需要执行录入注册语音的操作)的升级注册。
其中,电子设备使用第二声纹特征更新电子设备中保存的第一用户特征模板的具体实现方式包括但不限于以下两种:
方式1、电子设备直接将第二声纹特征作为新的用户特征模板(为了与第一用户特征模板相区分,这里将第二声纹特征称为第二用户特征模板),使用第二用户特征模板替换电子设备中保存的第一用户特征模板。
方式2、电子设备将第二声纹特征和第一用户模板进行加权/合并处理,获得第三用户特征模板,使用第三用户特征模板替换电子设备中保存的第一用户特征模板。
应理解,以上两种方式仅为示例而非限定,实际不限于此。
同理,对于模型的更新,可以是电子设备直接使用第二模型替换第一模型,也可以是电子设备对第一模型和第二模型进行加权/合并处理,本申请不做限制。另外,还可以是电子设备只接收到模型的部分更新参数,然后基于该些更新参数对第一模型的相关参数进行更新,而不是直接更新整个模型。
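An illustrative sketch of the three model-update styles mentioned above, assuming a model can be represented as a dict of named numpy parameter arrays; this representation and all names are assumptions for illustration only.

    def update_model(old_params, new_params=None, partial_params=None,
                     mode="replace", weight=0.5):
        """old_params / new_params / partial_params map parameter names to numpy arrays."""
        if mode == "replace" and new_params is not None:
            return dict(new_params)  # the second model directly replaces the first
        if mode == "merge" and new_params is not None:
            # Weighted combination of corresponding parameters of the two models.
            return {name: weight * new_params[name] + (1.0 - weight) * old_params[name]
                    for name in old_params}
        if mode == "partial" and partial_params is not None:
            updated = dict(old_params)
            updated.update(partial_params)  # only the pushed parameters are changed
            return updated
        raise ValueError("inconsistent update mode and arguments")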
可选的,对于不同的模型,可以对应不同的验证门限。电子设备在使用第二模型对第一验证语音进行处理之后,若电子设备已接收到第二模型对应的第二验证门限,则还可以使用第二验证门限更新第一验证门限,实现对验证门限的更新。在这种情况下,电子设备还可以在确定第二模型以及第二验证门限均已经接收到之后,才使用第二模型对第一验证语音进行处理。其中,验证门限的更新方式可以是使用第二验证门限替换第一验证门限,也可以是将第二验证门限和第一验证门限进行加权/合并处理,使用加权/合并处理后获得的验证门限替换第一验证门限,本申请对此不做限制。
可选的,为保证升级后的声纹识别系统的性能,电子设备可以在确定第一验证语音的质量满足第二预设要求之后,才使用第二模型对第一验证语音进行处理(即使用第一验证语音作为新的注册语音)。
第一预设条件包括但不限于以下两种:
1)第一声纹特征和第一用户特征模板的相似度大于或等于第一免注册门限。
其中,第一免注册门限可以是根据验证流程中使用的第一验证门限值计算得到的(例如,第一免注册门限比第一验证门限值高几个分贝),也可以是电子设备预先设置(例如预先从云服务器接收并保存),本申请不做限定。
2)第一验证语音的信噪比大于或等于第一信噪比门限。
其中,第一信噪比门限可以是根据注册流程(S301)中使用的设定阈值得到(例如,第一信噪比门限与设定阈值一致,或者第一信噪比门限比设定阈值高几个分贝等),也可以是电子设备预先设置(例如预先从云服务器接收并保存),本申请不做限定。
一般情况下,第一信噪比门限大于或等于20dB。可选的,在具体实施时,还可以根据电子设备的具体形态对第一信噪比门限的数值进行微调,例如,对于手机,第一信噪比门限可以设置为22dB,而对于智能音箱,第一信噪比门限可以设置为20dB。
进一步可选的,第一免注册门限大于或等于第一模型对应的第一验证门限。如此,可保证作为新的注册语音的验证语音的质量较高,可以进一步提高升级后的声纹识别系统的性能。
进一步可选的,云服务器还可以向电子设备推送新的免注册门限,电子设备对免注册门限进行更新。例如,在电子设备基于第一免注册门限判断第一验证语音的质量满足要求, 且使用第二模型对第一验证语音进行处理之后,若电子设备已接收到第二免注册门限,则使用第二免注册门限更新第一免注册门限。其中,免注册门限的更新方式可以是使用第二免注册门限替换第一免注册门限,也可以是将第二免注册门限和第一免注册门限进行加权/合并处理,使用加权/合并处理后获得的免注册门限替换第一免注册门限,本申请对此不做限制。如此,可以进一步提高升级后的声纹识别系统的性能。
进一步的可选的,云服务器还可以向电子设备推送新的信噪比门限,电子设备对信噪比门限进行更新。例如,在电子设备基于第一信噪比门限判断第一验证语音的质量满足要求,且使用第二模型对第一验证语音进行处理之后,若电子设备已接收到第二信噪比门限,则使用第二信噪比门限更新第一信噪比门限。其中,信噪比门限的更新方式可以是使用第二信噪比门限替换第一信噪比门限,也可以是将第二信噪比门限和第一信噪比门限进行加权/合并处理,使用加权/合并处理后获得的信噪比门限替换第一信噪比门限,本申请对此不做限制。如此,可以进一步提高升级后的声纹识别系统的性能。
应理解,以上两种条件(即免注册门限和信噪比门限)可以分别单独实施,也可以同时实施,本申请不做限制。并且,以上两种条件仅为示例而非限定,具体实施时,第一预设条件还可以有其它实现方式。
可选的,在电子设备中保存的用户特征模板的数量为多个时,电子设备可以在累计获得的第二用户特征模板的数量达到预设数量之后,才使用该预设数量的第二用户特征模板更新电子设备中保存的第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型。
具体的,电子设备在执行完验证流程验证用户身份通过之后,使用第二模型对本次获得的验证语音进行处理,获得至少一个第二声纹特征,将每个第二声纹特征作为第二用户特征模板存储到内部存储器121中,然后判断内部存储器121中累计的第二用户特征模板是否达到预设数量(如3);若未达到预设数量,则等待下一次验证流程,在下一次验证流程中,基于第二模型和该下一次验证流程中的验证语音获取第二用户特征模板;若已达到预设数量,则使用所有第二用户特征模板更新所有的第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型。
如此,可以保证升级后的声纹识别系统中有多个第二用户特征模板可用,可以进一步提高升级后的声纹识别系统的性能。
应理解,上述各个用于触发电子设备将验证语音作为新的注册语音(即电子设备使用第二模型对第一验证语音进行处理,获得第二声纹特征(即第二用户特征模板))的前提条件(如判断电子设备是否收到第二模型和/或第二验证门限,判断第一验证语音的质量是否满足第二预设要求等),可以互结合实施,并且电子设备判断各前提条件的先后顺序可以相互调换。
例如,以下是步骤S305的几种可能的具体实现方式:
第一种实现方式中,如图6A所示:电子设备在执行验证流程且验证通过之后,判断验证流程获得的打分结果是否大于或等于第一免注册门限;若打分结果小于第一免注册门限,则进入下一次验证流程;若打分结果大于或等于第一免注册门限,则继续判断电子设备是否已经接收到第二模型和第二验证门限;若电子设备还未接收到第二模型和第二验证门限,则进入下一次验证流程;若电子设备已经接收到第二模型和第二验证门限,则使用第二模型对第一验证语音进行处理,获得第二用户特征模板;电子设备判断累计的第二用 户特征模板是否达到预设数量;若累计的第二用户特征模板未达到预设数量,则进入下一次验证流程;若累计的第二用户特征模板已达到预设数量,则使用所有第二用户特征模板更新电子设备中保存的所有第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型,以及使用第二验证门限更新电子设备中保存的第一验证门限。
第二种实现方式中,如图6B所示:电子设备在执行验证流程且验证通过之后,判断电子设备是否已经接收到第二模型和第二验证门限;若电子设备还未接收到第二模型和第二验证门限,则进入下一次验证流程;若电子设备已经接收到第二模型和第二验证门限,则继续判断验证流程获得的打分结果是否大于或等于第一免注册门限;若打分结果小于第一免注册门限,则进入下一次验证流程;若打分结果大于或等于第一免注册门限,则使用第二模型对第一验证语音进行处理,获得第二用户特征模板;电子设备判断累计的第二用户特征模板是否达到预设数量;若累计的第二用户特征模板未达到预设数量,则进入下一次验证流程;若累计的第二用户特征模板已达到预设数量,则使用所有第二用户特征模板更新电子设备中保存的所有第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型,以及使用第二验证门限更新电子设备中保存的第一验证门限。
第三种实现方式中,如图6C所示:电子设备在执行验证流程且验证通过之后,判断第一验证语音的信噪比是否大于或等于第一信噪比门限;若第一验证语音的信噪比小于第一信噪比门限,则进入下一次验证流程;若第一验证语音的信噪比大于或等于第一信噪比门限,则继续判断电子设备是否已经接收到第二模型和第二验证门限;若电子设备还未接收到第二模型和第二验证门限,则进入下一次验证流程;若电子设备已经接收到第二模型和第二验证门限,则使用第二模型对第一验证语音进行处理,获得第二用户特征模板;电子设备判断累计的第二用户特征模板是否达到预设数量;若累计的第二用户特征模板未达到预设数量,则进入下一次验证流程;若累计的第二用户特征模板已达到预设数量,则使用所有第二用户特征模板更新电子设备中保存的所有第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型,以及使用第二验证门限更新电子设备中保存的第一验证门限。
第四种实现方式中,如图6D所示:电子设备在执行验证流程且验证通过之后,判断第一验证语音的信噪比是否大于或等于第一信噪比门限;若第一验证语音的信噪比小于第一信噪比门限,则进入下一次验证流程;若第一验证语音的信噪比大于或等于第一信噪比门限,则继续判断验证流程获得的打分结果是否大于或等于第一免注册门限;若打分结果小于第一免注册门限,则进入下一次验证流程;若打分结果大于或等于第一免注册门限,则继续判断电子设备是否已经接收到第二模型和第二验证门限;若电子设备还未接收到第二模型和第二验证门限,则进入下一次验证流程;若电子设备已经接收到第二模型和第二验证门限,则使用第二模型对第一验证语音进行处理,获得第二用户特征模板;电子设备判断累计的第二用户特征模板是否达到预设数量;若累计的第二用户特征模板未达到预设数量,则进入下一次验证流程;若累计的第二用户特征模板已达到预设数量,则使用所有第二用户特征模板更新电子设备中保存的所有第一用户特征模板,以及使用第二模型更新电子设备中保存的第一模型,以及使用第二验证门限更新电子设备中保存的第一验证门限。
应理解,以上仅例举了四种可能的组合方式,实际不仅限于此。
当电子设备在使用第二用户特征模板更新电子设备中保存的第一用户特征模板,使用第二模型更新电子设备中保存的第一模型之后(即完成第一次免注册升级之后),用户再 次使用声纹验证功能时,电子设备可使用更新后的模型执行验证流程,例如:采集用户录入的第二验证语音,并使用第二模型对第二验证语音进行处理,获得第三声纹特征;基于第三声纹特征、以及第二用户特征模板验证用户的身份。具体实现方法可参考S303~S304,这里不再赘述。
当然,电子设备在收到比第二模型更新的模型之后,则进行新一轮的免注册升级处理。例如,在上述基于第二模型的验证流程完成且验证用户身份通过之后,若电子设备已收到第三模型,则使用第三模型对第二验证语音进行处理,获得第三用户特征模板,之后使用第三用户特征模板更新电子设备中保存的第二用户特征模板,以及使用第三模型更新电子设备中保存的第二模型。具体实现方法可参考S305,这里不再赘述。
以上,是以一个用户场景为例,对该一个用户的注册、验证和免注册升级进行了详细说明。在具体实施时,本申请实施例同样适用于多用户场景。多用户的场景下的主要区别有:在首次注册流程中,需要同时注册多个用户的第一用户特征模板;在验证流程中,需要从多个用户的用户特征模板中确定出当前用户的用户特征模板,来对当前用户进行身份验证;在升级免注册流程中,需要同时对多个用户的用户特征模板进行更新。
基于上述可知,本申请实施例在对声纹识别系统进行升级时,将验证过程中获取的验证语音作为新的注册语音,以完成升级注册,可以在用户无感知的情况下实现对声纹识别系统的升级,能够兼顾声纹识别性能和用户体验。
基于同一技术构思,本申请实施例还提供一种芯片,该芯片与电子设备中的存储器耦合,可以执行如图3、图6A~图6D中所示的方法。
基于同一技术构思,本申请实施例还提供一种计算机存储介质,该计算机存储介质中存储计算机指令,该计算机指令在被一个或多个处理模块执行时实现图3、图6A~图6D中所示的方法。
基于同一技术构思,本申请实施例还提供一种包含指令的计算机程序产品,所述计算机程序产品中存储有指令,当其在计算机上运行时,使得计算机执行图3、图6A~图6D中所示的方法。
应理解,在本申请中除非另有说明,“/”表示或的意思,例如,A/B可以表示A或B;本申请中的“和/或”仅仅是一种描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B这三种情况。“至少一个”是指一个或者多个,“多个”是指两个或两个以上。
在本申请中,“示例的”、“在一些实施例中”、“在另一些实施例中”等用于表示作例子、例证或说明。本申请中被描述为“示例”的任何实施例或设计方案不应被解释为比其它实施例或设计方案更优选或更具优势。确切而言,使用示例的一词旨在以具体方式呈现概念。
另外,本申请中涉及的“第一”、“第二”等词汇,仅用于区分描述的目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量,也不能理解为指示或暗示顺序。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程 序产品的形式。
本申请是参照根据本申请的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的精神和范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (27)

  1. 一种升级方法,其特征在于,应用于电子设备,所述方法包括:
    采集用户录入的第一验证语音;
    使用所述电子设备中保存的第一模型对所述第一验证语音进行处理,获得第一声纹特征;基于所述第一声纹特征、以及所述电子设备中保存的第一用户特征模板验证所述用户的身份;其中,所述第一用户特征模板为所述电子设备使用所述第一模型对所述用户的历史验证语音或注册语音进行处理所获得的声纹特征;
    在验证所述用户的身份通过之后,若所述电子设备已接收到第二模型,则使用所述第二模型对所述第一验证语音进行处理,以获得第二声纹特征;使用所述第二声纹特征更新所述电子设备中保存的所述第一用户特征模板,以及使用所述第二模型更新所述电子设备中保存的所述第一模型。
  2. 如权利要求1所述的方法,其特征在于,
    基于所述第一声纹特征、以及所述电子设备中保存的第一用户特征模板验证所述用户的身份,包括:
    计算所述第一声纹特征和所述第一用户特征模板的相似度;判断所述相似度是否大于所述第一模型对应的第一验证门限;若为是,则验证通过;否则,验证不通过;
    在使用所述第二模型对所述第一验证语音进行处理之后,所述方法还包括:
    若所述电子设备已接收到所述第二模型对应的第二验证门限,则使用所述第二验证门限更新所述第一验证门限。
  3. 如权利要求1或2所述的方法,其特征在于,使用所述第二模型对所述第一验证语音进行处理,包括:
    在所述第一验证语音的质量满足第一预设条件时,使用所述第二模型对所述第一验证语音进行处理;
    其中,所述第一预设条件包括:所述第一声纹特征和所述第一用户特征模板的相似度大于或等于第一免注册门限;和/或,所述第一验证语音的信噪比大于或等于第一信噪比门限。
  4. 如权利要求3所述的方法,其特征在于,所述第一免注册门限大于或等于所述第一模型对应的第一验证门限。
  5. 如权利要求3所述的方法,其特征在于,在使用所述第二模型对所述第一验证语音进行处理之后,所述方法还包括:
    若所述电子设备已接收到第二免注册门限,则使用所述第二免注册门限更新所述第一免注册门限;和/或,
    若所述电子设备已接收到第二信噪比门限,则使用所述第二信噪比门限更新所述第一信噪比门限。
  6. 如权利要求1-5任一项所述的方法,其特征在于,使用所述第二声纹特征更新所述电子设备中保存的所述第一用户特征模板,以及使用所述第二模型更新所述电子设备中保存的所述第一模型,包括:
    在所述电子设备累计获得的所述第二声纹特征的数量达到预设数量之后,使用所述预 设数量的所述第二声纹特征更新所述电子设备中保存的所述第一用户特征模板,以及使用所述第二模型更新所述电子设备中保存的所述第一模型。
  7. 如权利要求1-6任一项所述的方法,其特征在于,在使用所述第二声纹特征更新所述电子设备中保存的所述第一用户特征模板,使用所述第二模型更新所述电子设备中保存的所述第一模型之后,还包括:
    采集所述用户录入的第二验证语音;
    使用所述第二模型对所述第二验证语音进行处理,获得第三声纹特征;基于所述第三声纹特征、以及所述第二声纹特征验证所述用户的身份。
  8. 如权利要求1-7任一项所述的方法,其特征在于,在采集用户录入的第一验证语音之前,还包括:
    提示用户录入验证语音。
  9. 一种升级装置,其特征在于,所述装置包括:
    数据采集单元,用于采集用户录入的第一验证语音;
    计算单元,用于使用所述装置中保存的第一模型对所述第一验证语音进行处理,获得第一声纹特征;基于所述第一声纹特征、以及所述装置中保存的第一用户特征模板验证所述用户的身份;其中,所述第一用户特征模板为所述装置使用所述第一模型对所述用户的历史验证语音或注册语音进行处理所获得的声纹特征;在验证所述用户的身份通过之后,若所述装置已接收到第二模型,则使用所述第二模型对所述第一验证语音进行处理,以获得第二声纹特征;使用所述第二声纹特征更新所述装置中保存的所述第一用户特征模板,以及使用所述第二模型更新所述装置中保存的所述第一模型。
  10. 如权利要求9所述的装置,其特征在于,所述计算单元在基于所述第一声纹特征、以及所述装置中保存的第一用户特征模板验证所述用户的身份时,具体用于:
    计算所述第一声纹特征和所述第一用户特征模板的相似度;判断所述相似度是否大于所述第一模型对应的第一验证门限;若为是,则验证通过;否则,验证不通过;
    所述计算单元还用于:在使用所述第二模型对所述第一验证语音进行处理之后,若所述装置已接收到所述第二模型对应的第二验证门限,则使用所述第二验证门限更新所述第一验证门限。
  11. 如权利要求9或10所述的装置,其特征在于,所述计算单元在使用所述第二模型对所述第一验证语音进行处理时,具体用于:
    在所述第一验证语音的质量满足第一预设条件时,使用所述第二模型对所述第一验证语音进行处理;
    其中,所述第一预设条件包括:所述第一声纹特征和所述第一用户特征模板的相似度大于或等于第一免注册门限;和/或,所述第一验证语音的信噪比大于或等于第一信噪比门限。
  12. 如权利要求11所述的装置,其特征在于,所述第一免注册门限大于或等于所述第一模型对应的第一验证门限。
  13. 如权利要求11所述的装置,其特征在于,所述计算单元还用于:
    在使用所述第二模型对所述第一验证语音进行处理之后,若所述装置已接收到第二免注册门限,则使用所述第二免注册门限更新所述第一免注册门限;和/或,若所述装置已接收到第二信噪比门限,则使用所述第二信噪比门限更新所述第一信噪比门限。
  14. 如权利要求9-13任一项所述的装置,其特征在于,所述计算单元在使用所述第二声纹特征更新所述装置中保存的所述第一用户特征模板,以及使用所述第二模型更新所述装置中保存的所述第一模型时,具体用于:
    在累计获得的所述第二声纹特征的数量达到预设数量之后,使用所述预设数量的所述第二声纹特征更新所述装置中保存的所述第一用户特征模板,以及使用所述第二模型更新所述装置中保存的所述第一模型。
  15. 如权利要求9-14任一项所述的装置,其特征在于,所述计算单元还用于:
    在使用所述第二声纹特征更新所述装置中保存的所述第一用户特征模板,使用所述第二模型更新所述装置中保存的所述第一模型之后,采集所述用户录入的第二验证语音;
    使用所述第二模型对所述第二验证语音进行处理,获得第三声纹特征;基于所述第三声纹特征、以及所述第二声纹特征验证所述用户的身份。
  16. 如权利要求9-15任一项所述的装置,其特征在于,所述计算单元还用于:
    在所述数据采集单元采集用户录入的第一验证语音之前,提示用户录入验证语音。
  17. 一种电子设备,其特征在于,包括:麦克风和处理器;
    所述麦克风用于:采集用户录入的第一验证语音;
    所述处理器用于:使用所述电子设备中保存的第一模型对所述第一验证语音进行处理,获得第一声纹特征;基于所述第一声纹特征、以及所述电子设备中保存的第一用户特征模板验证所述用户的身份;其中,所述第一用户特征模板为所述电子设备使用所述第一模型对所述用户的历史验证语音或注册语音进行处理所获得的声纹特征;在验证所述用户的身份通过之后,若所述电子设备已接收到第二模型,则使用所述第二模型对所述第一验证语音进行处理,以获得第二声纹特征;使用所述第二声纹特征更新所述电子设备中保存的所述第一用户特征模板,以及使用所述第二模型更新所述电子设备中保存的所述第一模型。
  18. 如权利要求17所述的电子设备,其特征在于,所述处理器在基于所述第一声纹特征、以及所述电子设备中保存的第一用户特征模板验证所述用户的身份时,具体用于:
    计算所述第一声纹特征和所述第一用户特征模板的相似度;判断所述相似度是否大于所述第一模型对应的第一验证门限;若为是,则验证通过;否则,验证不通过;
    所述处理器还用于:在使用所述第二模型对所述第一验证语音进行处理之后,若所述电子设备已接收到所述第二模型对应的第二验证门限,则使用所述第二验证门限更新所述第一验证门限。
  19. 如权利要求17或18所述的电子设备,其特征在于,所述处理器在使用所述第二模型对所述第一验证语音进行处理时,具体用于:
    在所述第一验证语音的质量满足第一预设条件时,使用所述第二模型对所述第一验证语音进行处理;
    其中,所述第一预设条件包括:所述第一声纹特征和所述第一用户特征模板的相似度大于或等于第一免注册门限;和/或,所述第一验证语音的信噪比大于或等于第一信噪比门限。
  20. 如权利要求19所述的电子设备,其特征在于,所述第一免注册门限大于或等于所述第一模型对应的第一验证门限。
  21. 如权利要求19所述的电子设备,其特征在于,所述处理器还用于:
    在使用所述第二模型对所述第一验证语音进行处理之后,若所述电子设备已接收到第 二免注册门限,则使用所述第二免注册门限更新所述第一免注册门限;和/或,若所述电子设备已接收到第二信噪比门限,则使用所述第二信噪比门限更新所述第一信噪比门限。
  22. 如权利要求17-21任一项所述的电子设备,其特征在于,所述处理器在使用所述第二声纹特征更新所述电子设备中保存的所述第一用户特征模板,以及使用所述第二模型更新所述电子设备中保存的所述第一模型时,具体用于:
    在所述电子设备累计获得的所述第二声纹特征的数量达到预设数量之后,使用所述预设数量的所述第二声纹特征更新所述电子设备中保存的所述第一用户特征模板,以及使用所述第二模型更新所述电子设备中保存的所述第一模型。
  23. 如权利要求17-22任一项所述的电子设备,其特征在于,所述处理器还用于:
    在使用所述第二声纹特征更新所述电子设备中保存的所述第一用户特征模板,使用所述第二模型更新所述电子设备中保存的所述第一模型之后,采集所述用户录入的第二验证语音;
    使用所述第二模型对所述第二验证语音进行处理,获得第三声纹特征;基于所述第三声纹特征、以及所述第二声纹特征验证所述用户的身份。
  24. 如权利要求17-23任一项所述的电子设备,其特征在于,所述处理器还用于:
    在所述麦克风采集用户录入的第一验证语音之前,提示用户录入验证语音。
  25. 一种芯片,其特征在于,所述芯片与电子设备中的存储器耦合,执行如权利要求1至8中任一项所述的方法。
  26. 一种计算机存储介质,其特征在于,所述计算机存储介质中存储计算机指令,该计算机指令在被一个或多个处理模块执行时实现如权利要求1至8中任一项所述的方法。
  27. 一种包含指令的计算机程序产品,其特征在于,所述计算机程序产品中存储有指令,当其在计算机上运行时,使得计算机执行如权利要求1至8中任一项所述的方法。
PCT/CN2022/088237 2021-05-07 2022-04-21 一种升级方法、装置及电子设备 WO2022233239A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2023568018A JP2024517830A (ja) 2021-05-07 2022-04-21 アップグレード方法、アップグレード装置、および電子デバイス
EP22798580.1A EP4318465A1 (en) 2021-05-07 2022-04-21 Upgrading method and apparatus, and electronic device
US18/502,517 US20240071392A1 (en) 2021-05-07 2023-11-06 Upgrade method, upgrade apparatus, and electronic device

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110493970.X 2021-05-07
CN202110493970.XA CN115310066A (zh) 2021-05-07 2021-05-07 一种升级方法、装置及电子设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US18/502,517 Continuation US20240071392A1 (en) 2021-05-07 2023-11-06 Upgrade method, upgrade apparatus, and electronic device

Publications (1)

Publication Number Publication Date
WO2022233239A1 true WO2022233239A1 (zh) 2022-11-10

Family

ID=83854270

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/088237 WO2022233239A1 (zh) 2021-05-07 2022-04-21 一种升级方法、装置及电子设备

Country Status (5)

Country Link
US (1) US20240071392A1 (zh)
EP (1) EP4318465A1 (zh)
JP (1) JP2024517830A (zh)
CN (1) CN115310066A (zh)
WO (1) WO2022233239A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106782564A (zh) * 2016-11-18 2017-05-31 百度在线网络技术(北京)有限公司 用于处理语音数据的方法和装置
US20170301353A1 (en) * 2016-04-15 2017-10-19 Sensory, Incorporated Unobtrusive training for speaker verification
CN110047490A (zh) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 声纹识别方法、装置、设备以及计算机可读存储介质
WO2020192890A1 (en) * 2019-03-25 2020-10-01 Omilia Natural Language Solutions Ltd. Systems and methods for speaker verification
CN112735438A (zh) * 2020-12-29 2021-04-30 科大讯飞股份有限公司 一种在线声纹特征更新方法及设备、存储设备和建模设备

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170301353A1 (en) * 2016-04-15 2017-10-19 Sensory, Incorporated Unobtrusive training for speaker verification
CN106782564A (zh) * 2016-11-18 2017-05-31 百度在线网络技术(北京)有限公司 用于处理语音数据的方法和装置
CN110047490A (zh) * 2019-03-12 2019-07-23 平安科技(深圳)有限公司 声纹识别方法、装置、设备以及计算机可读存储介质
WO2020192890A1 (en) * 2019-03-25 2020-10-01 Omilia Natural Language Solutions Ltd. Systems and methods for speaker verification
CN112735438A (zh) * 2020-12-29 2021-04-30 科大讯飞股份有限公司 一种在线声纹特征更新方法及设备、存储设备和建模设备

Also Published As

Publication number Publication date
US20240071392A1 (en) 2024-02-29
JP2024517830A (ja) 2024-04-23
EP4318465A1 (en) 2024-02-07
CN115310066A (zh) 2022-11-08

Similar Documents

Publication Publication Date Title
EP3968144A1 (en) Voice control method and related apparatus
CN108615526B (zh) 语音信号中关键词的检测方法、装置、终端及存储介质
US9685161B2 (en) Method for updating voiceprint feature model and terminal
CN111131601B (zh) 一种音频控制方法、电子设备、芯片及计算机存储介质
EP3264333A1 (en) Method and system for authenticating user of a mobile device via hybrid biometics information
WO2021013255A1 (zh) 一种声纹识别方法及装置
US10916249B2 (en) Method of processing a speech signal for speaker recognition and electronic apparatus implementing same
WO2022033556A1 (zh) 电子设备及其语音识别方法和介质
CN110660398B (zh) 声纹特征更新方法、装置、计算机设备及存储介质
CN110364156A (zh) 语音交互方法、系统、终端及可读存储介质
US20220180859A1 (en) User speech profile management
CN115312068B (zh) 语音控制方法、设备及存储介质
CN115881118A (zh) 一种语音交互方法及相关电子设备
KR20140067687A (ko) 대화형 음성인식이 가능한 차량 시스템
US20240013789A1 (en) Voice control method and apparatus
CN110337030B (zh) 视频播放方法、装置、终端和计算机可读存储介质
WO2022233239A1 (zh) 一种升级方法、装置及电子设备
CN115019806A (zh) 声纹识别方法和装置
KR102622350B1 (ko) 전자 장치 및 그 제어 방법
CN115035886B (zh) 声纹识别方法及电子设备
CN110164450B (zh) 登录方法、装置、播放设备及存储介质
CN117953872A (zh) 语音唤醒模型更新方法、存储介质、程序产品及设备
CN116189718A (zh) 语音活性检测方法、装置、设备及存储介质
WO2017219925A1 (zh) 一种信息发送方法、装置及计算机存储介质
CN116153291A (zh) 一种语音识别方法及设备

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22798580

Country of ref document: EP

Kind code of ref document: A1

WWE Wipo information: entry into national phase

Ref document number: 2022798580

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2023568018

Country of ref document: JP

ENP Entry into the national phase

Ref document number: 2022798580

Country of ref document: EP

Effective date: 20231025

NENP Non-entry into the national phase

Ref country code: DE