WO2020000427A1 - Voice control method, wearable device, and terminal - Google Patents

Voice control method, wearable device, and terminal

Info

Publication number
WO2020000427A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
terminal
user
component
voice component
Prior art date
Application number
PCT/CN2018/093829
Other languages
English (en)
French (fr)
Inventor
张龙
黎椿键
仇存收
常青
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Priority date
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd.
Priority to US17/256,845 priority Critical patent/US20210256979A1/en
Priority to CN202011060047.9A priority patent/CN112420035A/zh
Priority to EP18924696.0A priority patent/EP3790006A4/en
Priority to CN201880024906.3A priority patent/CN110574103B/zh
Priority to RU2021101686A priority patent/RU2763392C1/ru
Priority to PCT/CN2018/093829 priority patent/WO2020000427A1/zh
Priority to KR1020207037501A priority patent/KR102525294B1/ko
Publication of WO2020000427A1 publication Critical patent/WO2020000427A1/zh

Classifications

    • G10L 15/22 Speech recognition; procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/221 Announcement of recognition results
    • G10L 2015/223 Execution procedure of a spoken command
    • G10L 2015/225 Feedback of the input speech
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/02 Preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis (LDA) or principal components; feature selection or extraction
    • G10L 17/06 Decision making techniques; pattern matching strategies
    • G10L 17/22 Interactive procedures; man-machine interfaces
    • G06F 1/163 Wearable computers, e.g. on a belt
    • G06F 21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • H04R 1/08 Mouthpieces; microphones; attachments therefor
    • H04R 1/1041 Earpieces; earphones; mechanical or electronic switches, or control elements
    • H04R 2460/13 Hearing devices using bone conduction transducers

Definitions

  • The present application relates to the field of terminals, and in particular, to a voice control method, a wearable device, and a terminal.
  • A voiceprint refers to the sound wave spectrum that carries speech information when a user speaks, and it reflects the user's audio characteristics. Because different people's vocal organs (e.g., tongue, teeth, throat, lungs, and nasal cavity) differ in size and shape, the sound wave spectra of any two people are generally different. Therefore, voiceprint recognition (speaker recognition) can analyze one or more types of speech information to identify an unknown voice.
  • The traditional voiceprint recognition method mainly uses a conventional microphone to collect the speaker's sound signal transmitted through the air, and then uses the collected signal to identify the speaker. However, the collected sound signal is relatively noisy, which easily degrades the accuracy of voiceprint recognition, and terminals such as mobile phones face increased security risks when the speaker cannot be accurately identified.
  • In view of this, the present application provides a voice control method, a wearable device, and a terminal, which can improve the accuracy and security of voiceprint recognition when a user controls a terminal by voice.
  • In a first aspect, the present application provides a voice control method, including: establishing a communication connection between a terminal and a wearable device; when a voiced user inputs voice information to the wearable device, the terminal authenticates the voiced user according to a first voiceprint recognition result of a first voice component in the voice information and a second voiceprint recognition result of a second voice component in the voice information, where the first voice component is collected by a first voice sensor of the wearable device and the second voice component is collected by a second voice sensor of the wearable device; and if the identity authentication result is that the voiced user is a legitimate user, the terminal executes an operation instruction corresponding to the voice information.
  • In other words, when the voiced user inputs voice information, the two voice sensors of the wearable device collect two channels of voice information (that is, the first voice component and the second voice component described above).
  • The terminal can then perform voiceprint recognition on the two channels of voice information respectively.
  • If the voiceprint recognition results of both channels match the legitimate user, it can be confirmed that the voiced user is a legitimate user.
  • Obviously, compared with voiceprint recognition on a single channel of voice information, this dual voiceprint recognition on two channels can significantly improve the accuracy and security of user identity authentication.
  • In addition, because the second voice component is collected by the bone conduction microphone of the wearable device, a successful match also shows that the user was actually wearing the wearable device when speaking, which prevents an illegal user from maliciously controlling a legitimate user's terminal by playing back a recording of the legitimate user.
  • In a possible design, before the terminal authenticates the voiced user according to the first voiceprint recognition result of the first voice component and the second voiceprint recognition result of the second voice component, the method further includes: the terminal obtains the first voiceprint recognition result and the second voiceprint recognition result from the wearable device, where the first voiceprint recognition result is obtained after the wearable device performs voiceprint recognition on the first voice component, and the second voiceprint recognition result is obtained after the wearable device performs voiceprint recognition on the second voice component.
  • That is, after the wearable device collects the first voice component and the second voice component of the voiced user's voice information, it can perform voiceprint recognition on the two components locally and then send the recognition results to the terminal, which reduces the implementation complexity of voice control on the terminal.
  • In a possible design, before the terminal authenticates the voiced user according to the first and second voiceprint recognition results, the method further includes: the terminal acquires the first voice component and the second voice component from the wearable device, and the terminal performs voiceprint recognition on the first voice component and the second voice component respectively, to obtain the first voiceprint recognition result corresponding to the first voice component and the second voiceprint recognition result corresponding to the second voice component.
  • That is, after the wearable device collects the first voice component and the second voice component of the voiced user's voice information, it can send the two components to the terminal for voiceprint recognition, thereby reducing the power consumption and implementation complexity of the wearable device.
  • In a possible design, the terminal performs voiceprint recognition on the first voice component and the second voice component when the voice information includes a preset keyword, or when a preset operation input by the user is received. Otherwise, the user does not need voiceprint recognition at this time, so the terminal does not need to enable the voiceprint recognition function, which reduces the power consumption of the terminal.
  • In a possible design, the terminal performing voiceprint recognition on the first voice component and the second voice component respectively includes: the terminal determines whether the first voice component matches a first voiceprint model of the legitimate user, where the first voiceprint model reflects the audio characteristics of the legitimate user as collected by the first voice sensor; and the terminal determines whether the second voice component matches a second voiceprint model of the legitimate user, where the second voiceprint model reflects the audio characteristics of the legitimate user as collected by the second voice sensor.
  • Correspondingly, the terminal authenticating the voiced user according to the first and second voiceprint recognition results includes: if the first voice component matches the first voiceprint model of the legitimate user and the second voice component matches the second voiceprint model of the legitimate user, the terminal determines that the voiced user is a legitimate user; otherwise, the terminal determines that the voiced user is an illegal user.
  • In a possible design, the terminal determining whether the first voice component matches the first voiceprint model of the legitimate user includes: the terminal calculates a first matching degree between the first voice component and the first voiceprint model; if the first matching degree is greater than a first threshold, the terminal determines that the first voice component matches the first voiceprint model. Likewise, the terminal determining whether the second voice component matches the second voiceprint model includes: the terminal calculates a second matching degree between the second voice component and the second voiceprint model; if the second matching degree is greater than a second threshold, the terminal determines that the second voice component matches the second voiceprint model. A minimal sketch of this decision follows.
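  • As a concrete illustration, the following is a minimal sketch of the dual-threshold decision described above. The patent does not prescribe a scoring algorithm; the cosine-similarity score, the embedding representation, and the threshold values of 0.8 are illustrative assumptions.

```python
import numpy as np

def matching_degree(component_embedding: np.ndarray,
                    model_embedding: np.ndarray) -> float:
    """Hypothetical matching degree: cosine similarity between a feature
    embedding of a collected voice component and the stored voiceprint
    model. Any speaker-verification score could play this role."""
    denom = np.linalg.norm(component_embedding) * np.linalg.norm(model_embedding)
    return float(component_embedding @ model_embedding / denom) if denom else 0.0

def authenticate(first_component: np.ndarray, second_component: np.ndarray,
                 first_model: np.ndarray, second_model: np.ndarray,
                 first_threshold: float = 0.8,
                 second_threshold: float = 0.8) -> bool:
    """Accept the voiced user as legitimate only if BOTH the
    air-conduction and the bone-conduction components match their
    respective voiceprint models."""
    return (matching_degree(first_component, first_model) > first_threshold
            and matching_degree(second_component, second_model) > second_threshold)
```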
  • In a possible design, before the terminal authenticates the voiced user according to the first and second voiceprint recognition results, the method further includes: the terminal obtains a startup instruction sent by the wearable device, where the startup instruction is generated by the wearable device in response to a wake-up voice input by the user; and in response to the startup instruction, the terminal turns on the voiceprint recognition function.
  • In a possible design, the method further includes: the terminal determines, according to the first voice component and the second voice component, whether the voice information includes a preset wake-up word; if the preset wake-up word is included, the terminal turns on the voiceprint recognition function.
  • In other words, the user can trigger the terminal to enable the voiceprint recognition function by saying the wake-up word (a minimal sketch follows); otherwise, the user does not need voiceprint recognition at this time, the terminal does not need to enable the function, and the power consumption of the terminal is reduced.
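  • The sketch below shows one way such a wake-up-word gate could be implemented. The wake-up word and the use of a text transcript are assumptions for illustration; a real device would run a low-power keyword spotter on the raw audio instead.

```python
WAKE_WORD = "xiao yi xiao yi"  # hypothetical preset wake-up word

class VoiceprintGate:
    """Keeps the voiceprint recognition function off until the preset
    wake-up word is detected, mirroring the power-saving design above."""

    def __init__(self) -> None:
        self.voiceprint_enabled = False

    def feed_transcript(self, transcript: str) -> None:
        # `transcript` stands in for the decoded first/second voice
        # components; detecting the wake-up word enables voiceprint
        # recognition for the subsequent voice information.
        if WAKE_WORD in transcript.lower():
            self.voiceprint_enabled = True

gate = VoiceprintGate()
gate.feed_transcript("Xiao yi xiao yi, navigate home")
assert gate.voiceprint_enabled
```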
  • In a possible design, the method further includes: if the voiced user is a legitimate user, the terminal automatically performs an unlock operation.
  • In this way, the user only needs to input voice information once to complete a series of operations such as user identity authentication, unlocking the mobile phone, and opening a certain function of the mobile phone, which greatly improves the user's efficiency in controlling the mobile phone and the user experience.
  • In a possible design, before the terminal executes the operation instruction corresponding to the voice information, the method further includes: the terminal obtains a device identifier of the wearable device. Correspondingly, the terminal executes the operation instruction corresponding to the voice information only if the device identifier of the wearable device is a preset legal device identifier (see the sketch below).
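  • A minimal sketch of this device check follows; the identifier format (a Bluetooth MAC address) and the whitelist storage are assumptions, since the text only requires that the identifier match a preset legal device identifier.

```python
# Hypothetical whitelist of legal device identifiers, e.g. the Bluetooth
# MAC addresses of wearable devices the user has authorized.
LEGAL_DEVICE_IDS = {"22:33:44:55:66:77"}

def execute_if_legal_device(device_id: str, operation) -> bool:
    """Run the voice-derived operation only when the instruction comes
    from a preset legal wearable device; otherwise refuse it."""
    if device_id in LEGAL_DEVICE_IDS:
        operation()
        return True
    return False
```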
  • In a second aspect, the present application provides a voice control method, including: establishing a communication connection between a wearable device and a terminal; the wearable device uses a first voice sensor to collect a first voice component in voice information; the wearable device uses a second voice sensor to collect a second voice component in the voice information; and the wearable device performs voiceprint recognition on the first voice component and the second voice component respectively, so as to authenticate the voiced user.
  • In a possible design, the first voice sensor is located on a side of the wearable device that is not in contact with the user, and the second voice sensor is located on a side of the wearable device that is in contact with the user.
  • In a possible design, the first voice sensor is an air conduction microphone and the second voice sensor is a bone conduction microphone.
  • In a possible design, before the wearable device uses the first voice sensor to collect the first voice component, the method further includes: the wearable device uses a proximity light sensor to detect the ambient light intensity and uses an acceleration sensor to detect the acceleration value; if the ambient light intensity is less than a preset light intensity threshold, or the acceleration value is greater than a preset acceleration threshold, or both conditions hold, the wearable device determines that it is in a wearing state.
  • In a possible design, after the wearable device uses the second voice sensor to collect the second voice component, the method further includes: the wearable device performs voice activity detection (VAD) on the first voice component to obtain a first VAD value, and performs VAD on the second voice component to obtain a second VAD value. Correspondingly, the wearable device performs voiceprint recognition on the first voice component and the second voice component only when both the first VAD value and the second VAD value meet a preset condition (see the sketch below).
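  • The following is a minimal energy-based VAD gate for this step. Real devices typically use more robust VAD (statistical or model-based); the frame length, energy threshold, and the "fraction of voiced frames" criterion are illustrative assumptions.

```python
import numpy as np

FRAME_ENERGY_THRESHOLD = 1e-3  # hypothetical; depends on mic calibration

def vad_value(signal: np.ndarray, frame_len: int = 320) -> float:
    """Crude VAD score: the fraction of frames whose mean energy exceeds
    a threshold (1.0 = all speech, 0.0 = all silence)."""
    n_frames = len(signal) // frame_len
    if n_frames == 0:
        return 0.0
    frames = signal[:n_frames * frame_len].reshape(n_frames, frame_len)
    energies = (frames.astype(np.float64) ** 2).mean(axis=1)
    return float((energies > FRAME_ENERGY_THRESHOLD).mean())

def should_run_voiceprint(first_component: np.ndarray,
                          second_component: np.ndarray,
                          min_vad: float = 0.5) -> bool:
    """Run voiceprint recognition only when both channels carry speech."""
    return (vad_value(first_component) >= min_vad
            and vad_value(second_component) >= min_vad)
```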
  • In a possible design, the wearable device performs voiceprint recognition on the first voice component and the second voice component when the voice information includes a preset keyword, or when a preset operation input by the user is received.
  • In a possible design, the wearable device performing voiceprint recognition on the first voice component and the second voice component includes: the wearable device determines whether the first voice component matches the first voiceprint model of the legitimate user, where the first voiceprint model reflects the audio characteristics of the legitimate user as collected by the first voice sensor; and the wearable device determines whether the second voice component matches the second voiceprint model of the legitimate user, where the second voiceprint model reflects the audio characteristics of the legitimate user as collected by the second voice sensor. If the first voice component matches the first voiceprint model and the second voice component matches the second voiceprint model, the wearable device determines that the voiced user is a legitimate user; otherwise, the wearable device determines that the voiced user is an illegal user.
  • In a possible design, the method further includes: the wearable device uses the first voice sensor to collect a first registered component of a registered voice input by the legitimate user, so as to establish the first voiceprint model of the legitimate user; and the wearable device uses the second voice sensor to collect a second registered component of the registered voice, so as to establish the second voiceprint model of the legitimate user.
  • In a possible design, the wearable device determining whether the first voice component matches the first voiceprint model of the legitimate user includes: the wearable device calculates a first matching degree between the first voice component and the first voiceprint model; if the first matching degree is greater than a first threshold, the wearable device determines that the first voice component matches the first voiceprint model. Likewise, the wearable device determining whether the second voice component matches the second voiceprint model includes: the wearable device calculates a second matching degree between the second voice component and the second voiceprint model; if the second matching degree is greater than a second threshold, the wearable device determines that the second voice component matches the second voiceprint model.
  • In a possible design, after the wearable device performs voiceprint recognition on the first voice component and the second voice component, the method further includes: if the voiced user is a legitimate user, the wearable device sends an authentication pass message or an unlock instruction to the terminal.
  • In a possible design, after the wearable device performs voiceprint recognition on the first voice component and the second voice component, the method further includes: if the voiced user is a legitimate user, the wearable device sends an operation instruction corresponding to the voice information to the terminal.
  • In a possible design, before the wearable device performs voiceprint recognition on the first voice component and the second voice component, the method further includes: the wearable device performs noise reduction processing on the first voice component and the second voice component; and/or the wearable device uses an echo cancellation algorithm to eliminate echo signals in the first voice component and the second voice component (a sketch of one such algorithm follows).
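  • The patent does not name a specific echo cancellation algorithm. The following sketch uses one common choice, a normalized LMS (NLMS) adaptive filter; the filter length and step size are illustrative assumptions.

```python
import numpy as np

def nlms_echo_cancel(mic: np.ndarray, ref: np.ndarray,
                     filt_len: int = 128, mu: float = 0.5,
                     eps: float = 1e-8) -> np.ndarray:
    """Subtract an adaptive estimate of the speaker echo (driven by the
    playback reference signal `ref`) from the microphone signal `mic`.
    Assumes `mic` and `ref` are time-aligned and of equal length."""
    w = np.zeros(filt_len)               # adaptive filter taps
    out = np.zeros(len(mic))
    for n in range(filt_len, len(mic)):
        x = ref[n - filt_len:n][::-1]    # most recent reference samples
        e = mic[n] - w @ x               # error = mic minus estimated echo
        w += mu * e * x / (x @ x + eps)  # normalized LMS tap update
        out[n] = e                       # residual, echo-cancelled signal
    return out
```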
  • In a possible design, before the wearable device uses the first voice sensor to collect the first voice component, the method further includes: the wearable device receives a wake-up voice input by the user, where the wake-up voice includes a preset wake-up word; and in response to the wake-up voice, the wearable device sends a startup instruction to the terminal, where the startup instruction is used to instruct the terminal to turn on the voiceprint recognition function.
  • In a third aspect, the present application provides a terminal, including a connection unit, an acquisition unit, a recognition unit, an authentication unit, and an execution unit.
  • The connection unit is configured to establish a communication connection with the wearable device.
  • The authentication unit is configured to: when the voiced user inputs voice information to the wearable device, authenticate the voiced user according to the first voiceprint recognition result of the first voice component in the voice information and the second voiceprint recognition result of the second voice component in the voice information, where the first voice component is collected by the first voice sensor of the wearable device and the second voice component is collected by the second voice sensor of the wearable device.
  • The execution unit is configured to execute an operation instruction corresponding to the voice information if the identity authentication result is that the voiced user is a legitimate user.
  • In a possible design, the acquisition unit is configured to obtain the first voiceprint recognition result and the second voiceprint recognition result from the wearable device, where the first voiceprint recognition result is obtained after the wearable device performs voiceprint recognition on the first voice component, and the second voiceprint recognition result is obtained after the wearable device performs voiceprint recognition on the second voice component.
  • In a possible design, the acquisition unit is configured to acquire the first voice component and the second voice component from the wearable device, and the recognition unit is configured to perform voiceprint recognition on the first voice component and the second voice component respectively, to obtain the first voiceprint recognition result corresponding to the first voice component and the second voiceprint recognition result corresponding to the second voice component.
  • In a possible design, the recognition unit is specifically configured to perform voiceprint recognition on the first voice component and the second voice component when the voice information includes a preset keyword, or when a preset operation input by the user is received.
  • In a possible design, the recognition unit is specifically configured to: determine whether the first voice component matches the first voiceprint model of the legitimate user, where the first voiceprint model reflects the audio characteristics of the legitimate user as collected by the first voice sensor; and determine whether the second voice component matches the second voiceprint model of the legitimate user, where the second voiceprint model reflects the audio characteristics of the legitimate user as collected by the second voice sensor. The authentication unit is specifically configured to: if the first voice component matches the first voiceprint model and the second voice component matches the second voiceprint model, determine that the voiced user is a legitimate user; otherwise, determine that the voiced user is an illegal user.
  • In a possible design, the recognition unit is specifically configured to: calculate a first matching degree between the first voice component and the first voiceprint model of the legitimate user, and determine that they match if the first matching degree is greater than a first threshold; and calculate a second matching degree between the second voice component and the second voiceprint model of the legitimate user, and determine that they match if the second matching degree is greater than a second threshold.
  • In a possible design, the acquisition unit is further configured to obtain a startup instruction sent by the wearable device, where the startup instruction is generated by the wearable device in response to a wake-up voice input by the user; and the execution unit is further configured to turn on the voiceprint recognition function in response to the startup instruction.
  • In a possible design, the recognition unit is further configured to determine, according to the first voice component and the second voice component, whether the voice information includes a preset wake-up word; and the execution unit is further configured to turn on the voiceprint recognition function if the preset wake-up word is included.
  • In a possible design, the execution unit is further configured to automatically perform an unlock operation if the voiced user is a legitimate user.
  • In a possible design, the acquisition unit is further configured to obtain the device identifier of the wearable device, and the execution unit is specifically configured to execute the operation instruction corresponding to the voice information if the device identifier of the wearable device is a preset legal device identifier.
  • In a fourth aspect, the present application provides a wearable device, including a connection unit, a detection unit, a recognition unit, an authentication unit, and a sending unit.
  • The connection unit is configured to establish a communication connection with the terminal.
  • The detection unit is configured to use the first voice sensor to collect a first voice component in the voice information, and use the second voice sensor to collect a second voice component in the voice information.
  • The recognition unit is configured to perform voiceprint recognition on the first voice component and the second voice component respectively, so as to authenticate the voiced user.
  • In a possible design, the detection unit is further configured to: use a proximity light sensor on the wearable device to detect the ambient light intensity, and use an acceleration sensor on the wearable device to detect the acceleration value; if the ambient light intensity is less than a preset light intensity threshold, or the acceleration value is greater than a preset acceleration threshold, or both, determine that the wearable device is in a wearing state.
  • In a possible design, the detection unit is further configured to perform voice activity detection (VAD) on the first voice component to obtain a first VAD value, and perform VAD on the second voice component to obtain a second VAD value; the recognition unit is specifically configured to perform voiceprint recognition on the first voice component and the second voice component when both the first VAD value and the second VAD value satisfy a preset condition.
  • In a possible design, the recognition unit is specifically configured to perform voiceprint recognition on the first voice component and the second voice component when the voice information includes a preset keyword, or when a preset operation input by the user is received.
  • In a possible design, the recognition unit is specifically configured to: determine whether the first voice component matches the first voiceprint model of the legitimate user, where the first voiceprint model reflects the audio characteristics of the legitimate user as collected by the first voice sensor; and determine whether the second voice component matches the second voiceprint model of the legitimate user, where the second voiceprint model reflects the audio characteristics of the legitimate user as collected by the second voice sensor. The authentication unit is specifically configured to: if both voice components match their respective voiceprint models, determine that the voiced user is a legitimate user; otherwise, determine that the voiced user is an illegal user.
  • In a possible design, the recognition unit is specifically configured to: calculate a first matching degree between the first voice component and the first voiceprint model of the legitimate user, and determine that they match if the first matching degree is greater than a first threshold; and calculate a second matching degree between the second voice component and the second voiceprint model of the legitimate user, and determine that they match if the second matching degree is greater than a second threshold.
  • In a possible design, the sending unit is further configured to send an authentication pass message or an unlock instruction to the terminal if the voiced user is a legitimate user.
  • In a possible design, the sending unit is further configured to send an operation instruction corresponding to the voice information to the terminal if the voiced user is a legitimate user.
  • In a possible design, the detection unit is further configured to detect a wake-up voice input by the user, where the wake-up voice includes a preset wake-up word; and the sending unit is further configured to send a startup instruction to the terminal, where the startup instruction is used to instruct the terminal to enable the voiceprint recognition function.
  • In a fifth aspect, the present application provides a terminal, including a touch screen, one or more processors, a memory, and one or more programs, where the processor is coupled to the memory and the one or more programs are stored in the memory; when the terminal runs, the processor executes the one or more programs stored in the memory, so that the terminal executes any one of the foregoing voice control methods.
  • In a sixth aspect, the present application provides a wearable device, including a first voice sensor provided outside the wearable device, a second voice sensor provided inside the wearable device, one or more processors, a memory, and one or more programs, where the processor is coupled to the memory and the one or more programs are stored in the memory; when the wearable device runs, the processor executes the one or more programs stored in the memory, so that the wearable device performs any one of the foregoing voice control methods.
  • In a seventh aspect, the present application provides a computer storage medium including computer instructions that, when run on a terminal or a wearable device, cause the terminal or wearable device to execute the voice control method according to any one of the foregoing.
  • In an eighth aspect, the present application provides a computer program product that, when run on a computer, causes the computer to execute the voice control method according to any one of the first aspect or its possible implementations.
  • It can be understood that the terminals described in the third and fifth aspects, the wearable devices described in the fourth and sixth aspects, the computer storage medium described in the seventh aspect, and the computer program product described in the eighth aspect are all used to execute the corresponding methods provided above. Therefore, for the beneficial effects they can achieve, reference may be made to the beneficial effects of the corresponding methods provided above, which are not repeated here.
  • FIG. 1 is a first scenario architecture diagram of a voice control method according to an embodiment of the present application;
  • FIG. 2 is a first schematic structural diagram of a wearable device according to an embodiment of the present application;
  • FIG. 3 is a first schematic structural diagram of a terminal according to an embodiment of the present application;
  • FIG. 4 is a first interaction diagram of a voice control method according to an embodiment of the present application;
  • FIG. 5 is a second scenario architecture diagram of a voice control method according to an embodiment of the present application;
  • FIG. 6 is a second interaction diagram of a voice control method according to an embodiment of the present application;
  • FIG. 7 is a third scenario architecture diagram of a voice control method according to an embodiment of the present application;
  • FIG. 8 is a second schematic structural diagram of a terminal according to an embodiment of the present application;
  • FIG. 9 is a second schematic structural diagram of a wearable device according to an embodiment of the present application;
  • FIG. 10 is a third schematic structural diagram of a terminal according to an embodiment of the present application.
  • As shown in FIG. 1, a voice control method provided by an embodiment of the present application can be applied to a voice control system composed of a wearable device 11 and a terminal 12.
  • The wearable device 11 may be a device with a voice collection function, such as a wireless headset, a wired headset, smart glasses, a smart helmet, or a smart watch.
  • In the embodiments of the present application, the terminal 12 may be a device such as a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), or a personal digital assistant (PDA).
  • As shown in FIG. 2, the wearable device 11 may specifically include a first voice sensor 201 provided outside the wearable device 11 and a second voice sensor 202 provided inside the wearable device 11.
  • Here, the inside of the wearable device 11 refers to the side that directly contacts the user when the device is worn, and the outside of the wearable device 11 refers to the side that does not directly contact the user.
  • For example, the first voice sensor 201 may be an air conduction microphone, and the second voice sensor 202 may be any sensor capable of collecting the vibration signal generated when the user speaks, such as a bone conduction microphone, an optical vibration sensor, an acceleration sensor, or an air conduction microphone.
  • An air conduction microphone collects voice information whose vibration signal reaches the microphone through the air, whereas a bone conduction microphone collects voice information whose vibration signal reaches the microphone through the bones.
  • When the user speaks while wearing the wearable device 11, the wearable device 11 can collect, through the first voice sensor 201, the user's voice information transmitted through the air, and can also collect, through the second voice sensor 202, the user's voice information transmitted through the bones.
  • There may be a plurality of first voice sensors 201 in the wearable device 11. Taking the first voice sensor 201 being an air conduction microphone as an example, two air conduction microphones can be provided outside the wearable device 11 and jointly collect the voice information transmitted through the air, thereby obtaining the first voice component of the voice information.
  • Correspondingly, the bone conduction microphone can collect the voice information transmitted through the user's bones, thereby obtaining the second voice component of the voice information.
  • In addition, the wearable device 11 may further include an acceleration sensor 203 (which may also serve as the second voice sensor 202 described above), a proximity light sensor 204, a communication module 205, a speaker 206, a computing module 207, a storage module 208, a power supply 209, and other components. It can be understood that the wearable device 11 may have more or fewer components than those shown in FIG. 2, may combine two or more components, or may have a different component configuration.
  • The various components shown in FIG. 2 may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing or application-specific integrated circuits.
  • As shown in FIG. 3, the terminal 12 in the voice control system may be a mobile phone 100.
  • The mobile phone 100 may specifically include a processor 101, a radio frequency (RF) circuit 102, a memory 103, a touch screen 104, a Bluetooth device 105, one or more sensors 106, a Wi-Fi device 107, a positioning device 108, an audio circuit 109, a peripheral interface 110, and a power supply device 111.
  • It can be understood that the mobile phone 100 may include more or fewer components than shown in the figure, combine certain components, or use a different arrangement of components.
  • The processor 101 is the control center of the mobile phone 100. It connects the various parts of the mobile phone 100 through various interfaces and lines, and performs the various functions of the mobile phone 100 and processes data by running or executing the applications stored in the memory 103 and calling the data and instructions stored in the memory 103.
  • In some embodiments, the processor 101 may include one or more processing units; the processor 101 may further integrate an application processor and a modem processor, where the application processor mainly handles the operating system, user interface, applications, and the like, and the modem processor mainly handles wireless communication. It can be understood that the modem processor may alternatively not be integrated into the processor 101.
  • For example, the processor 101 may be a Kirin 960 multi-core processor developed by Huawei Technologies Co., Ltd.
  • The radio frequency circuit 102 may be used to receive and transmit radio signals during information transmission or a call. Specifically, the radio frequency circuit 102 may receive downlink data from a base station and deliver it to the processor 101 for processing, and may also send uplink data to the base station. Generally, the radio frequency circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, and a duplexer. In addition, the radio frequency circuit 102 can also communicate with other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to the global system for mobile communications, general packet radio service, code division multiple access, wideband code division multiple access, long term evolution, email, and the short message service.
  • The memory 103 is configured to store applications and data, and the processor 101 performs the various functions and data processing of the mobile phone 100 by running the applications and data stored in the memory 103.
  • The memory 103 mainly includes a program storage area and a data storage area, where the program storage area can store the operating system and at least one application required by a function (such as a sound playback function or an image playback function), and the data storage area can store data created according to the use of the mobile phone 100 (such as audio data or a phone book).
  • In addition, the memory 103 may include a high-speed random access memory, and may also include a non-volatile memory, such as a magnetic disk storage device, a flash memory device, or another non-volatile solid-state storage device.
  • The memory 103 can store various operating systems, for example, the iOS operating system developed by Apple Inc. or the Android operating system developed by Google.
  • The touch screen 104 may include a touch-sensitive surface 104-1 and a display 104-2.
  • The touch-sensitive surface 104-1 (for example, a touch panel) can capture touch events performed by the user of the mobile phone 100 on or near it (for example, operations performed by the user with a finger, a stylus, or any other suitable object on or near the touch-sensitive surface 104-1), and send the collected touch information to another device such as the processor 101.
  • A touch event performed near the touch-sensitive surface 104-1 without physical contact may be called a floating touch; a floating touch means that the user does not need to directly touch the touchpad to select, move, or drag an object (such as an icon), but only needs to be near the terminal to perform the desired function. A touch-sensitive surface capable of floating touch can be implemented using capacitive, infrared-light, or ultrasonic technologies.
  • The touch-sensitive surface 104-1 may include two parts: a touch detection device and a touch controller. The touch detection device detects the user's touch position and the signal brought by the touch operation, and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into contact coordinates, and sends them to the processor 101. The touch controller can also receive and execute instructions sent by the processor 101.
  • In addition, the touch-sensitive surface 104-1 can be implemented in various types such as resistive, capacitive, infrared, and surface acoustic wave.
  • The display (also referred to as a display screen) 104-2 may be used to display information input by the user or provided to the user, as well as the various menus of the mobile phone 100.
  • The display 104-2 may be configured in the form of a liquid crystal display, an organic light-emitting diode, or the like.
  • The touch-sensitive surface 104-1 may be overlaid on the display 104-2. When the touch-sensitive surface 104-1 detects a touch event on or near it, it transmits the event to the processor 101 to determine the type of the touch event, and the processor 101 can then provide a corresponding visual output on the display 104-2 according to the type of the touch event.
  • Although in FIG. 3 the touch-sensitive surface 104-1 and the display screen 104-2 are shown as two independent components implementing the input and output functions of the mobile phone 100, in some embodiments the touch-sensitive surface 104-1 and the display screen 104-2 may be integrated to implement the input and output functions of the mobile phone 100. It can be understood that the touch screen 104 is formed by stacking multiple layers of materials; only the touch-sensitive surface (layer) and the display screen (layer) are described in the embodiments of the present application, and the other layers are not described.
  • In addition, the touch-sensitive surface 104-1 may cover the display 104-2 with a size larger than that of the display screen 104-2, so that the display screen 104-2 is entirely covered by the touch-sensitive surface 104-1; alternatively, the touch-sensitive surface 104-1 may be configured on the front of the mobile phone 100 as a full panel, so that every touch on the front of the mobile phone can be sensed, achieving a full-touch experience on the front of the phone.
  • In other embodiments, the touch-sensitive surface 104-1 is configured on the front of the mobile phone 100 as a full panel, and the display screen 104-2 may also be configured on the front of the mobile phone 100 as a full panel, achieving a bezel-less structure.
  • The touch screen 104 may further include one or more sets of sensor arrays, so that while sensing a touch event, the touch screen 104 can also sense the pressure exerted on it by the user, and so on.
  • The mobile phone 100 may further include a Bluetooth device 105 configured to implement data exchange between the mobile phone 100 and other short-range terminals (for example, the aforementioned wearable device 11). In the embodiments of the present application, the Bluetooth device may be an integrated circuit or a Bluetooth chip.
  • The mobile phone 100 may further include at least one sensor 106, such as a light sensor and a motion sensor.
  • Specifically, the light sensor may include an ambient light sensor and a proximity sensor; the ambient light sensor can adjust the brightness of the display of the touch screen 104 according to the ambient light, and the proximity sensor can turn off the power of the display when the mobile phone 100 is moved to the ear.
  • As one type of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally on three axes) and can detect the magnitude and direction of gravity when stationary. It can be used in applications that recognize the attitude of the mobile phone (such as switching between landscape and portrait modes, related games, and magnetometer attitude calibration) and in vibration-recognition related functions (such as a pedometer or tap detection).
  • The mobile phone 100 may also be configured with other sensors such as a fingerprint recognition device, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described here.
  • The Wi-Fi device 107 is configured to provide the mobile phone 100 with network access complying with Wi-Fi related standard protocols. The mobile phone 100 can access a Wi-Fi access point through the Wi-Fi device 107, helping the user send and receive email, browse web pages, access streaming media, and so on; it provides the user with wireless broadband Internet access. In some other embodiments, the Wi-Fi device 107 can also serve as a Wi-Fi wireless access point and provide Wi-Fi network access to other terminals.
  • The positioning device 108 is configured to provide a geographic location for the mobile phone 100. It can be understood that the positioning device 108 may specifically be a receiver of a positioning system such as the global positioning system (GPS) or the BeiDou satellite navigation system. After receiving the geographic location sent by the positioning system, the positioning device 108 sends the information to the processor 101 for processing, or sends it to the memory 103 for storage.
  • In some other embodiments, the positioning device 108 may be a receiver of an assisted global positioning system (AGPS). AGPS is an operation mode that performs GPS positioning with certain assistance: by using base station signals together with GPS satellite signals, it can make the mobile phone 100 locate faster. In the AGPS system, the positioning device 108 can obtain positioning assistance by communicating with an assisted positioning server (such as a mobile phone positioning server).
  • The assisted positioning server acts as an auxiliary server to help the positioning device 108 complete ranging and positioning services; in this case, it communicates with the positioning device 108 (that is, the GPS receiver) of a terminal such as the mobile phone 100 through a wireless communication network to provide positioning assistance.
  • The audio circuit 109, the speaker 113, and the microphone 114 may provide an audio interface between the user and the mobile phone 100.
  • The audio circuit 109 may convert received audio data into an electrical signal and transmit it to the speaker 113, and the speaker 113 converts the electrical signal into a sound signal for output.
  • Conversely, the microphone 114 converts a collected sound signal into an electrical signal, which the audio circuit 109 receives and converts into audio data; the audio data is then output to the RF circuit 102 to be sent to, for example, another mobile phone, or output to the memory 103 for further processing.
  • The peripheral interface 110 is configured to provide various interfaces for external input/output devices (such as a keyboard, a mouse, an external display, an external memory, or a subscriber identity module card). For example, the mobile phone is connected to a mouse through a universal serial bus interface, and is electrically connected to a subscriber identity module (SIM) card provided by a telecommunications operator through metal contacts in the SIM card slot.
  • The peripheral interface 110 may be used to couple the above external input/output peripherals to the processor 101 and the memory 103.
  • The mobile phone 100 may further include a power supply device 111 (such as a battery and a power management chip) that supplies power to the various components. The battery may be logically connected to the processor 101 through the power management chip, so that functions such as charging, discharging, and power consumption management are implemented through the power supply device 111.
  • Although not shown in FIG. 3, the mobile phone 100 may further include a camera, a flash, a micro-projection device, a near field communication (NFC) device, and the like, which are not described here.
  • In the following embodiments, the wearable device 11 is a Bluetooth headset and the terminal 12 is a mobile phone, and the Bluetooth headset and the mobile phone communicate through a Bluetooth connection.
  • While wearing the Bluetooth headset, the user can input voice information into it, and the Bluetooth headset collects the voice information through the externally placed first voice sensor 201 and the internally placed second voice sensor 202.
  • The voice information collected by the first voice sensor 201 is the first voice component, and the voice information collected by the second voice sensor 202 is the second voice component.
  • Furthermore, the Bluetooth headset can perform voiceprint recognition on the first voice component and the second voice component respectively, to obtain a first voiceprint recognition result corresponding to the first voice component and a second voiceprint recognition result corresponding to the second voice component.
  • For example, a first voiceprint model and a second voiceprint model of a legitimate user may be stored in the Bluetooth headset in advance, where the first voiceprint model is generated based on a registered voice previously input by the legitimate user into the first voice sensor 201, and the second voiceprint model is generated based on a registered voice previously input by the legitimate user into the second voice sensor 202 (see the enrollment sketch below).
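  • As a concrete illustration, the sketch below shows one way such enrollment could work. Representing a voiceprint model as the mean feature embedding over several registration utterances is an assumption; the patent does not specify the modelling technique, which could equally be a GMM or a neural speaker embedding.

```python
import numpy as np

def extract_embedding(audio: np.ndarray) -> np.ndarray:
    """Placeholder feature extractor: a real system would compute, e.g.,
    MFCC statistics or a neural speaker embedding. A toy magnitude
    spectrum is used here so the sketch runs end to end."""
    return np.abs(np.fft.rfft(audio, n=512))

def enroll(registration_utterances: list) -> np.ndarray:
    """Build a voiceprint model by averaging the embeddings of the
    legitimate user's registered voice; done once per sensor path."""
    embeddings = [extract_embedding(u) for u in registration_utterances]
    return np.mean(embeddings, axis=0)

# One model per sensor, from the same registration session:
# first_model  = enroll(utterances_from_air_conduction_mic)
# second_model = enroll(utterances_from_bone_conduction_mic)
```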
  • Then, when voice information is collected, the Bluetooth headset may match the first voiceprint model against the collected first voice component, and match the second voiceprint model against the collected second voice component.
  • For example, the Bluetooth headset may use a certain algorithm to calculate a first matching degree between the first voice component and the first voiceprint model, and a second matching degree between the second voice component and the second voiceprint model.
  • A higher matching degree indicates that the voice component is more consistent with the corresponding voiceprint model, and hence that the voiced user is more likely to be the legitimate user.
  • When the first matching degree is greater than the first threshold and the second matching degree is greater than the second threshold, the Bluetooth headset may determine that the first voice component matches the first voiceprint model and that the second voice component matches the second voiceprint model, that is, that the voiced user is a legitimate user. Further, the Bluetooth headset may send an operation instruction corresponding to the voice information to the mobile phone, for example, an unlock instruction, a shutdown instruction, or an instruction to call a specific contact. The mobile phone can then perform the corresponding operation, so that the user controls the mobile phone by voice.
  • Alternatively, the Bluetooth headset can send the collected first voice component and second voice component to the mobile phone, and the mobile phone performs voiceprint recognition on the two components and judges, according to the recognition results, whether the user who input the voice information is a legitimate user. If the user is a legitimate user, the mobile phone can execute the operation instruction corresponding to the voice information.
  • The legitimate user mentioned above refers to a user who can pass the identity authentication measures preset on the mobile phone.
  • For example, if the identity authentication measures preset on a terminal are password input, fingerprint recognition, and voiceprint recognition, then users who can enter the correct password, or whose fingerprint information or voiceprint model is stored in the terminal in advance and passes authentication, can be considered legitimate users of the terminal.
  • A terminal may have one or more legitimate users, and any user other than a legitimate user may be regarded as an illegal user of the terminal.
  • An illegal user can become a legitimate user after passing certain identity authentication measures, and this embodiment of the present application places no restriction on this.
  • It can be seen that when a user inputs voice information to the wearable device 11 to control the terminal 12, the wearable device 11 can collect both the voice information generated in the ear canal when the user speaks and the voice information generated outside the ear canal, so that two channels of voice information (that is, the first voice component and the second voice component described above) are obtained. In this way, the wearable device 11 (or the terminal 12) can perform voiceprint recognition on the two channels of voice information respectively, and when both voiceprint recognition results match the voiceprint models of the legitimate user, it can be confirmed that the user who input the voice information is a legitimate user. Obviously, compared with voiceprint recognition on a single channel of voice information, this dual voiceprint recognition on two channels significantly improves the accuracy and security of user identity authentication.
  • Moreover, because the wearable device 11 must be worn for the bone conduction path to pick up a signal, a successful recognition of the bone-conducted voice component also shows that the voice information was produced by a user who was actually wearing the wearable device 11, which prevents an illegal user from maliciously controlling a legitimate user's terminal by playing back a recording of the legitimate user.
For ease of understanding, the following embodiments take a mobile phone as the terminal and a Bluetooth headset as the wearable device.

FIG. 4 is a schematic flowchart of a voice control method according to an embodiment of the present application. As shown in FIG. 4, the voice control method may include:
S401. The mobile phone establishes a Bluetooth connection with the Bluetooth headset.

When the user wishes to use the Bluetooth headset, the headset's Bluetooth function can be turned on, and the headset then sends a pairing broadcast. If Bluetooth is enabled on the mobile phone, the phone can receive the broadcast and prompt the user that a Bluetooth device has been scanned. After the user selects the Bluetooth headset on the phone, the phone can pair with it and establish a Bluetooth connection, over which the two devices subsequently communicate. Of course, if the phone and the headset were successfully paired before this connection, the phone can automatically establish a Bluetooth connection with the scanned headset.
In addition, if the earphone the user wishes to use has a Wi-Fi function, the user can operate the mobile phone to establish a Wi-Fi connection with the earphone; or, if the earphone is wired, the user can insert the plug of the earphone cable into the corresponding earphone jack of the phone to establish a wired connection. The embodiments of the present application place no restriction on this.
S402 (optional). The Bluetooth headset detects whether it is in the wearing state.

A proximity light sensor and an acceleration sensor can be arranged in the Bluetooth headset, with the proximity light sensor on the side that contacts the user when the headset is worn. The two sensors can be activated periodically to obtain the current measurements.

Because wearing the headset blocks the light entering the proximity light sensor, the headset can determine that it is being worn when the detected light intensity is less than a preset light-intensity threshold. Because the headset moves together with the user when worn, the headset can likewise determine that it is being worn when the acceleration detected by the acceleration sensor is greater than a preset acceleration threshold. Alternatively, the headset may determine that it is being worn only when the detected light intensity is below the preset light-intensity threshold and the detected acceleration also exceeds the preset acceleration threshold.
Further, because the Bluetooth headset also contains a second voice sensor that collects voice information through bone conduction (for example, a bone conduction microphone or an optical vibration sensor), the headset can additionally use the second voice sensor to collect the vibration signals produced in the current environment. When the headset is worn it is in direct contact with the user, so the vibration signal collected by the second voice sensor is stronger than when the headset is not worn; if the energy of the collected vibration signal is greater than an energy threshold, the headset can determine that it is being worn. Alternatively, because spectral features such as harmonics and resonance in the vibration signal collected while the headset is worn differ markedly from those collected while it is not worn, the headset can also determine that it is being worn when the collected vibration signal satisfies preset spectral features. This reduces the chance that the headset fails to detect the wearing state through the proximity light sensor or the acceleration sensor alone, for example when the headset is placed in a pocket.
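A minimal sketch of how these wearing-state cues might be combined is shown below, assuming lux, acceleration, and a short window of bone-conduction samples have already been read from the sensors; all threshold values here are placeholders rather than values from the patent:

```python
# Simplified wearing-state check combining the three cues described above.
# Thresholds are illustrative; real values would be derived statistically
# from many users, as the text explains next.

LIGHT_THRESHOLD_LUX = 10.0        # below this, the sensor is likely covered
ACCEL_THRESHOLD_G = 1.2           # above this, the headset is moving
VIBRATION_ENERGY_THRESHOLD = 0.01

def is_worn(lux: float, accel_g: float, vibration_window: list) -> bool:
    light_covered = lux < LIGHT_THRESHOLD_LUX
    moving = accel_g > ACCEL_THRESHOLD_G
    # Mean-square energy of the second voice sensor's vibration signal.
    energy = sum(x * x for x in vibration_window) / max(len(vibration_window), 1)
    vibrating = energy > VIBRATION_ENERGY_THRESHOLD
    # Any cue can confirm wearing; requiring light plus motion reduces false
    # positives such as a headset lying in a pocket.
    return (light_covered and moving) or vibrating

print(is_worn(lux=2.0, accel_g=1.5, vibration_window=[0.0] * 160))  # True
```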
The above energy threshold or preset spectral features may be obtained statistically by capturing the various vibration signals produced when a large number of users speak or move while wearing a Bluetooth headset; these differ clearly in energy and spectral features from the signals the second voice sensor detects when the headset is not worn. In addition, because the first voice sensor on the outside of the headset (for example, an air conduction microphone) generally consumes more power, it does not need to be turned on before the headset detects that it is being worn; once the wearing state is detected, the first voice sensor can be turned on to collect the voice information produced when the user speaks, thereby reducing the headset's power consumption.
When the Bluetooth headset detects that it is currently being worn, it can continue with steps S403-S407 below; otherwise, it can enter a sleep state until the wearing state is detected and then continue with steps S403-S407. That is, only when the headset detects that the user is wearing it, and therefore intends to use it, does it trigger the collection of the user's voice information and the voiceprint recognition process, which reduces the headset's power consumption. Of course, step S402 is optional: whether or not the user is wearing the headset, it may continue to perform steps S403-S407; the embodiments of the present application place no restriction on this.
S403. If the headset is being worn, the Bluetooth headset collects the first voice component of the voice information input by the user through the first voice sensor, and collects the second voice component of that voice information through the second voice sensor.

When it is determined that the Bluetooth headset is being worn, the headset can start its voice detection module and use the first voice sensor and the second voice sensor to collect the voice information input by the user, obtaining the first voice component and the second voice component of that voice information. Taking the first voice sensor as an air conduction microphone and the second voice sensor as a bone conduction microphone as an example, the user may input the voice information 'Little E, use WeChat to pay' while using the Bluetooth headset. Because the air conduction microphone is exposed to the air, the headset can use it to receive the vibration signal produced by air vibration after the user speaks (that is, the first voice component of the voice information). Meanwhile, because the bone conduction microphone contacts the user's ear bone through the skin, the headset can use it to receive the vibration signal produced by the vibration of the ear bone and skin after the user speaks (that is, the second voice component of the voice information).
In some embodiments of the present application, after the Bluetooth headset detects the voice information input by the user, it can also distinguish the speech signal from the background noise in the voice information using a voice activity detection (VAD) algorithm. Specifically, the headset can input the first voice component and the second voice component into the corresponding VAD algorithm to obtain a first VAD value corresponding to the first voice component and a second VAD value corresponding to the second voice component. A VAD value can be used to reflect whether the voice information is a normal speech signal of the speaker or a noise signal. For example, the VAD value range can be set to the interval 0 to 100: a VAD value greater than a certain VAD threshold indicates that the voice information is a normal speech signal of the speaker, and a value below that threshold indicates a noise signal. As another example, the VAD value can be set to 0 or 1: a value of 1 indicates that the voice information is a normal speech signal of the speaker, and a value of 0 indicates that it is a noise signal.
The Bluetooth headset can then determine whether the voice information is a noise signal by combining the first VAD value and the second VAD value. For example, when the first VAD value and the second VAD value are both 1, the headset can determine that the voice information is not a noise signal but a normal speech signal of the speaker; likewise when the two values are each greater than a preset value. In addition, when the second VAD value is 1, or is greater than the preset value, this in itself indicates to some extent that the collected voice information was produced by a live user, so the headset may also determine whether the voice information is a noise signal based on the second VAD value alone.
By performing voice activity detection on the first voice component and the second voice component, the Bluetooth headset can discard the voice information if it determines that the information is a noise signal, and continue with steps S404-S407 below if it is not. That is, subsequent processes such as voiceprint recognition are triggered only when the user inputs valid voice information to the headset, which reduces the headset's power consumption.
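For illustration, a toy energy-based stand-in for this dual-channel gating might look as follows; a real headset would use a proper VAD algorithm, and only the 0/1 convention and the "both channels must look like speech" rule come from the text:

```python
# Toy per-channel VAD and the dual-channel gate described above.

def vad_value(frame: list, energy_threshold: float = 1e-4) -> int:
    """Return 1 for speech-like frame energy, 0 for noise (toy criterion)."""
    energy = sum(x * x for x in frame) / max(len(frame), 1)
    return 1 if energy > energy_threshold else 0

def is_valid_voice(first_frame: list, second_frame: list) -> bool:
    first_vad = vad_value(first_frame)    # air-conduction channel
    second_vad = vad_value(second_frame)  # bone-conduction channel
    # Only when both channels are classified as speech is the voice
    # information forwarded for voiceprint recognition; the bone-conduction
    # VAD alone may also serve as a liveness cue, as noted above.
    return first_vad == 1 and second_vad == 1
```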
In addition, after obtaining the first VAD value and the second VAD value, the Bluetooth headset can also measure the noise values in the voice information using a noise estimation algorithm (for example, a minimum statistics algorithm or a minima-controlled recursive averaging algorithm). For example, the headset may reserve a storage space dedicated to the noise value, and each time the headset calculates a new noise value it updates that storage space, so that the most recently measured noise value is always kept there. In this way, after the headset determines through the VAD algorithm that the voice information is valid, it can use the noise value in the storage space to perform noise reduction on the first voice component and the second voice component, so that the subsequent voiceprint recognition performed on each component by the headset (or the mobile phone) is more accurate.
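A minimal, "minimum statistics"-flavored sketch of such a noise tracker, including the dedicated storage for the latest noise value, might look like this; the window length and the energy measure are assumptions for illustration:

```python
# Rolling noise-floor estimate: track recent frame energies and keep the
# minimum as the noise value, always storing the latest estimate, mirroring
# the dedicated noise-value storage described above.
from collections import deque

class NoiseEstimator:
    def __init__(self, window_frames: int = 100):
        self.history = deque(maxlen=window_frames)
        self.noise_value = 0.0  # the most recently measured noise value

    def update(self, frame: list) -> float:
        energy = sum(x * x for x in frame) / max(len(frame), 1)
        self.history.append(energy)
        self.noise_value = min(self.history)  # overwrite the stored value
        return self.noise_value
```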
S404. The Bluetooth headset sends the first voice component and the second voice component to the mobile phone over the Bluetooth connection.

After the Bluetooth headset obtains the first voice component and the second voice component, it can send them to the mobile phone, and the phone performs steps S405-S407 below to carry out voiceprint recognition of the voice information input by the user, user identity authentication, and related operations.
S405. The mobile phone performs voiceprint recognition on the first voice component and the second voice component respectively, obtaining a first voiceprint recognition result corresponding to the first voice component and a second voiceprint recognition result corresponding to the second voice component.

Voiceprint models of one or more legitimate users can be stored in the phone in advance. Each legitimate user has two voiceprint models: a first voiceprint model established from the user's voice features collected while the air conduction microphone (that is, the first voice sensor) is working, and a second voiceprint model established from the user's voice features collected while the bone conduction microphone (that is, the second voice sensor) is working.
Establishing the first voiceprint model and the second voiceprint model requires two phases. The first phase is background model training. In this phase, developers collect a large number of utterances of the relevant text (for example, 'Hello, Little E') produced by speakers wearing the Bluetooth headset. The mobile phone can filter and denoise these utterances, extract the audio features of the background speech (for example, time-frequency spectrograms or gammatone-like spectrograms), and use a machine learning algorithm such as a GMM (Gaussian mixture model), an SVM (support vector machine), or a deep-neural-network framework to establish a background model for voiceprint recognition. Based on this background model, the mobile phone or the Bluetooth headset can establish the first and second voiceprint models belonging to a user from the registration voice input by that user. The deep-neural-network frameworks include, but are not limited to, the DNN (deep neural network), RNN (recurrent neural network), and LSTM (long short-term memory) algorithms.
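Where the GMM route is taken, the background-model training step might be sketched as follows using scikit-learn; the feature matrix, hyperparameters, and function name are illustrative assumptions, not details fixed by the patent:

```python
# Sketch of background (universal) model training for the GMM route.
# `training_features` is assumed to be an (n_frames, n_dims) array of audio
# features (e.g., spectrogram- or gammatone-derived frames) pooled from many
# speakers recorded while wearing the headset.
import numpy as np
from sklearn.mixture import GaussianMixture

def train_background_model(training_features: np.ndarray) -> GaussianMixture:
    ubm = GaussianMixture(n_components=64, covariance_type="diag",
                          max_iter=200, random_state=0)
    ubm.fit(training_features)
    return ubm

# One background model would be trained per channel: one from air-conduction
# recordings and one from bone-conduction recordings.
```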
The second phase takes place when the user uses the voice control function on the mobile phone for the first time: the user inputs a registration voice, from which the first and second voiceprint models belonging to that user are established. For example, when legitimate user 1 first uses the voice assistant APP installed on the phone, the APP may prompt the user to wear the Bluetooth headset and say the registration voice 'Hello, Little E'. Because the Bluetooth headset includes both an air conduction microphone and a bone conduction microphone, it can obtain the first registration component collected by the air conduction microphone and the second registration component collected by the bone conduction microphone from the registration voice. After the headset sends the two registration components to the phone, the phone can extract user 1's audio features from the first registration component and the second registration component respectively, input those features into the above background model, and thus obtain user 1's first voiceprint model and second voiceprint model. The phone may save legitimate user 1's first and second voiceprint models locally, or send them to the Bluetooth headset for storage.
In addition, when establishing legitimate user 1's first and second voiceprint models, the phone can also record the currently connected Bluetooth headset as a legal Bluetooth device, for example by saving the legal device's identifier (such as the headset's MAC address) locally on the phone. In this way, the phone can receive and execute the relevant operation instructions sent by a legal Bluetooth device, and when an illegal Bluetooth device sends an operation instruction, the phone can discard it to improve security. One phone can manage one or more legal Bluetooth devices. As shown in (a) of FIG. 7, the user can enter the settings interface 701 of the voiceprint recognition function from the settings function; after tapping the settings button 705, the user enters the legal device management interface 706 shown in (b) of FIG. 7, where legal Bluetooth devices can be added or deleted.
In step S405, after the mobile phone obtains the first voice component and the second voice component of the voice information, it can extract the audio features of each component, match the audio features of the first voice component against legitimate user 1's first voiceprint model, and match the audio features of the second voice component against legitimate user 1's second voiceprint model. For example, the phone may use a predetermined algorithm to calculate a first matching degree between the first voiceprint model and the first voice component (that is, the first voiceprint recognition result) and a second matching degree between the second voiceprint model and the second voice component (that is, the second voiceprint recognition result). Generally, the higher a matching degree, the more similar the audio features of the voice information are to those of legitimate user 1, and the more likely it is that the user who input the voice information is legitimate user 1.
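The patent leaves the matching-degree algorithm open; one common realization, assumed here purely for illustration, is a log-likelihood ratio between the user's model and the background model, squashed onto the 0-100 scale used in these examples:

```python
# Sketch: turn GMM scores into a 0-100 "matching degree". The logistic
# squashing and its scale are illustrative choices, not from the patent.
import numpy as np
from sklearn.mixture import GaussianMixture

def matching_degree(features: np.ndarray,
                    user_model: GaussianMixture,
                    background_model: GaussianMixture) -> float:
    # score() returns the average per-frame log-likelihood of the features.
    llr = user_model.score(features) - background_model.score(features)
    return 100.0 / (1.0 + np.exp(-llr))  # higher means a closer match

# The first matching degree uses the air-conduction features and models;
# the second uses the bone-conduction features and models.
```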
If the phone stores voiceprint models of multiple legitimate users, it can also calculate, one by one in the same way, the first matching degree of the first voice component and the second matching degree of the second voice component against the other legitimate users (such as legitimate user 2 and legitimate user 3). The mobile phone can then determine the legitimate user with the highest matching degree (for example, legitimate user A) as the uttering user.
In addition, before the phone performs voiceprint recognition on the first voice component and the second voice component, it may first determine whether voiceprint recognition is needed at all. For example, if the Bluetooth headset or the phone recognizes preset keywords in the voice information input by the user, such as 'transfer', 'payment', '** bank', or 'chat history', which involve user privacy or financial behavior, this indicates that controlling the phone by voice requires higher security at this moment, so the phone can perform the voiceprint recognition of step S405. As another example, if the headset receives a preset operation by the user for enabling the voiceprint recognition function (for example, tapping the headset, or pressing the volume + and volume - buttons at the same time), this indicates that the user needs identity verification by voiceprint, so the headset can notify the phone to perform step S405.
Alternatively, keywords corresponding to different security levels can be preset in the mobile phone. For example, the keywords with the highest security level include 'payment' and 'pay'; keywords with a higher security level include 'photograph' and 'call'; and keywords with the lowest security level include 'listen to music' and 'navigation'.
In this way, when the collected voice information is detected to contain a keyword of the highest security level, the mobile phone can be triggered to perform voiceprint recognition on the first voice component and the second voice component separately, that is, on both collected audio sources, improving the security of controlling the phone by voice. When the voice information contains a keyword of a higher security level, the security requirement for voice control is moderate, so the phone may be triggered to perform voiceprint recognition on only the first voice component or only the second voice component. When the voice information contains a keyword of the lowest security level, the phone need not perform voiceprint recognition on either component, which reduces the phone's power consumption.
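A sketch of this keyword-to-security-level lookup is shown below; the level names and keyword lists simply mirror the examples above and would be configurable in practice:

```python
# Map recognized text to the amount of voiceprint recognition required.
SECURITY_LEVELS = {
    "high":   ["payment", "pay"],
    "medium": ["photograph", "call"],
    "low":    ["listen to music", "navigation"],
}

def required_recognition(recognized_text: str) -> str:
    text = recognized_text.lower()
    if any(k in text for k in SECURITY_LEVELS["high"]):
        return "both components"   # dual voiceprint recognition
    if any(k in text for k in SECURITY_LEVELS["medium"]):
        return "one component"     # single-channel recognition suffices
    return "none"                  # no voiceprint recognition needed

print(required_recognition("Little E, use WeChat to pay"))  # both components
```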
In addition, one or more wake words can be preset in the mobile phone to wake up the phone and turn on its voiceprint recognition function; for example, the wake word could be 'Hello, Little E'. After the user inputs voice information to the Bluetooth headset, the headset or the phone can identify whether the voice information is a wake-up voice containing the wake word. For example, the headset may send the first and second voice components of the collected voice information to the phone, and if the phone recognizes that the voice information contains the wake word, the phone can turn on the voiceprint recognition function (for example, power on the voiceprint recognition chip); subsequently, if voice information collected by the headset contains the keywords described above, the phone can use the enabled voiceprint recognition function to perform voiceprint recognition according to the method of step S405. As another example, after collecting the voice information, the headset itself can identify whether it contains the wake word; if so, subsequent use of the voiceprint recognition function may be needed, and the headset can send a start instruction to the phone, which turns on the voiceprint recognition function in response.
S406. The mobile phone authenticates the user's identity according to the first voiceprint recognition result and the second voiceprint recognition result.

In step S406, after the phone obtains the first voiceprint recognition result corresponding to the first voice component and the second voiceprint recognition result corresponding to the second voice component, it can combine the two results to authenticate the identity of the user who input the voice information, improving the accuracy and security of user identity authentication.
For example, the first matching degree between the legitimate user's first voiceprint model and the first voice component serves as the first voiceprint recognition result, and the second matching degree between the legitimate user's second voiceprint model and the second voice component serves as the second voiceprint recognition result. One authentication policy is as follows: when the first matching degree is greater than a first threshold and the second matching degree is greater than a second threshold (which may be the same as or different from the first threshold), the mobile phone determines that the user who uttered the first voice component and the second voice component is a legitimate user; otherwise, the phone determines that the user is an illegitimate user.
Alternatively, the mobile phone may calculate a weighted average of the first matching degree and the second matching degree; when the weighted average is greater than a preset threshold, the phone determines that the user who uttered the first and second voice components is a legitimate user, and otherwise that the user is an illegitimate user.
Alternatively, the mobile phone can use different authentication policies in different voiceprint recognition scenarios. For example, when the collected voice information contains keywords of the highest security level, the phone may set both the first threshold and the second threshold to 99 points, so that the current uttering user is determined to be legitimate only when both matching degrees exceed 99 points. When the voice information contains keywords of a lower security level, the phone may set both thresholds to 85 points, so that the user is determined to be legitimate when both matching degrees exceed 85 points. In other words, for voiceprint recognition scenarios of different security levels, the phone can authenticate the user's identity with different degrees of strictness.
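The two policies above, per-channel thresholds chosen by security level and the weighted average, can be sketched together as follows; the 99- and 85-point values come from the example, while the weighting and function shape are assumptions:

```python
# Sketch of scenario-dependent authentication policies.

def authenticate(first_degree: float, second_degree: float,
                 security_level: str,
                 use_weighted_average: bool = False,
                 weight_first: float = 0.5) -> bool:
    # Stricter scenarios get a stricter threshold, as in the 99/85 example.
    threshold = 99.0 if security_level == "high" else 85.0
    if use_weighted_average:
        score = weight_first * first_degree + (1.0 - weight_first) * second_degree
        return score > threshold
    return first_degree > threshold and second_degree > threshold

print(authenticate(90.0, 88.0, "low"))   # True: both degrees exceed 85
print(authenticate(90.0, 88.0, "high"))  # False: neither exceeds 99
```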
When the mobile phone stores voiceprint models of more than one legitimate user, for example of legitimate users A, B, and C, each with a first voiceprint model and a second voiceprint model, the phone can match the collected first and second voice components against each legitimate user's voiceprint models in the manner described above. The phone can then determine the legitimate user who satisfies the above authentication policy with the highest matching degree (for example, legitimate user A) as the uttering user.
In other embodiments, the voiceprint model of a legitimate user stored in the phone may instead be established by fusing the first registration component and the second registration component of the registration voice. In that case each legitimate user has a single voiceprint model, which reflects both the audio features of the legitimate user's speech conducted through the air and those of the speech conducted through bone. Accordingly, after the phone receives the first and second voice components of the voice information sent by the Bluetooth headset, it can fuse the two components and perform voiceprint recognition on the fusion, for example by calculating the matching degree between the fused components and the legitimate user's fused voiceprint model, and then authenticate the user's identity based on that matching degree. Because the legitimate user's voiceprint models are merged into one in this identity authentication method, the complexity of the voiceprint model and the storage space it requires are reduced accordingly, while the voiceprint feature information of the second voice component still provides dual voiceprint protection and liveness detection.
S407. The mobile phone executes the operation instruction corresponding to the voice information.

If the identity authentication succeeds, the phone can generate the operation instruction corresponding to the voice information. For example, when the voice information is 'Little E, use WeChat to pay', the corresponding operation instruction is to open the payment interface of the WeChat APP; after generating this instruction, the phone can automatically open the WeChat APP and display its payment interface. If the phone is in a locked state at this time, it can also unlock the screen before executing the operation instruction to open the payment interface of the WeChat APP, and then display the payment interface 501 of the WeChat APP.
It should be noted that the voice control method provided in steps S401-S407 may be a function provided by a voice assistant APP. When the Bluetooth headset interacts with the phone, if voiceprint recognition determines that the uttering user is a legitimate user, the phone can send the generated operation instruction or the voice information to the voice assistant APP running at the application layer, and the voice assistant APP then calls the relevant interface or service of the application framework layer to execute the operation instruction corresponding to the voice information.

It can be seen that the voice control method provided in this embodiment of the present application can unlock the phone and execute the relevant operation instruction in the voice information at the same time as the user's identity is verified by voiceprint. The user therefore only needs to input voice information once to complete a series of operations such as identity authentication, unlocking the phone, and opening a particular phone function, which greatly improves the user's efficiency in controlling the phone and the user experience.
In some other embodiments of the present application, as shown in FIG. 6, the voice control method may include:

S601. The mobile phone establishes a Bluetooth connection with the Bluetooth headset.

S602. The Bluetooth headset detects whether it is in the wearing state.

S603. If it is being worn, the Bluetooth headset collects the first voice component of the voice information input by the user through the first voice sensor, and collects the second voice component of that voice information through the second voice sensor.
After detecting the first voice component and the second voice component, the Bluetooth headset may further perform operations such as VAD detection, noise reduction, or filtering on them, as described above.
It should be noted that, when the speaker of the Bluetooth headset is working, the air conduction microphone and the bone conduction microphone on the headset may pick up an echo signal of the sound source played by the speaker. Therefore, after the headset obtains the first voice component and the second voice component, an acoustic echo cancellation (AEC) algorithm can also be used to remove the echo signals from the two components, improving the accuracy of the subsequent voiceprint recognition.
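As an illustration of the AEC step (the patent names no specific algorithm), the following is a textbook normalized-LMS echo canceller that adapts an estimate of the echo path from the speaker's reference signal and subtracts the estimated echo from the microphone signal; tap count and step size are placeholder values:

```python
# Compact NLMS adaptive filter, the classic AEC building block. This is a
# textbook sketch operating on float arrays, not the headset's actual AEC.
import numpy as np

def nlms_echo_cancel(mic: np.ndarray, far_end: np.ndarray,
                     taps: int = 128, mu: float = 0.5,
                     eps: float = 1e-8) -> np.ndarray:
    w = np.zeros(taps)                      # adaptive echo-path estimate
    out = np.zeros_like(mic)
    padded = np.concatenate([np.zeros(taps - 1), far_end])
    for n in range(len(mic)):
        x = padded[n:n + taps][::-1]        # most recent far-end samples
        e = mic[n] - w @ x                  # echo-cancelled output sample
        out[n] = e
        w += (mu / (x @ x + eps)) * e * x   # normalized LMS update
    return out
```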
S604. The Bluetooth headset performs voiceprint recognition on the first voice component and the second voice component respectively, obtaining a first voiceprint recognition result corresponding to the first voice component and a second voiceprint recognition result corresponding to the second voice component.

In this embodiment, the voiceprint models of one or more legitimate users can be stored in the Bluetooth headset in advance, so that after the headset obtains the first and second voice components of the voice information input by the user, the locally stored voiceprint models can be used to perform voiceprint recognition on the two components. For the specific method by which the headset performs voiceprint recognition on the first and second voice components, reference may be made to the way the mobile phone does so in step S405 above; details are not repeated here.
S605. The Bluetooth headset authenticates the user's identity according to the first voiceprint recognition result and the second voiceprint recognition result.

S606. If the user is authenticated as a legitimate user, the Bluetooth headset sends the operation instruction corresponding to the voice information to the mobile phone over the Bluetooth connection.

S607. The mobile phone executes the operation instruction.
Specifically, after determining that the uttering user is a legitimate user, the Bluetooth headset can generate the operation instruction corresponding to the voice information. For example, when the voice information is 'Little E, use WeChat to pay', the corresponding operation instruction is to open the payment interface of the WeChat APP. The headset can send this instruction to the phone over the established Bluetooth connection; as shown in FIG. 5, after receiving the instruction, the phone can automatically open the WeChat APP and display the payment interface 501 of the WeChat APP.

In addition, because the headset has already determined that the user is legitimate, when the phone is in a locked state the headset can also send a message indicating that the user's identity has been authenticated, or an unlock instruction, so that the phone can unlock the screen before executing the operation instruction corresponding to the voice information.

Of course, the headset can instead send the collected voice information to the phone, and the phone generates the corresponding operation instruction from the voice information and executes it.
It should be noted that, when the Bluetooth headset sends the voice information or the corresponding operation instruction to the phone, it can also send its own device identifier (such as its MAC address). Because the phone stores the identifiers of authenticated legal Bluetooth devices, it can determine from the received identifier whether the currently connected headset is a legal Bluetooth device. If it is, the phone can proceed to execute the operation instruction sent by the headset, or perform voice recognition on the voice information the headset sent; otherwise, the phone can discard the operation instruction sent by the headset, avoiding the security problems caused by an illegal Bluetooth device maliciously manipulating the phone.
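A minimal sketch of this legal-device check, with a placeholder identifier, might be:

```python
# Allowlist of enrolled device identifiers, stored when the legitimate
# user's voiceprint models were established. The MAC address is a placeholder.
LEGAL_DEVICE_IDS = {"22:33:44:55:66:77"}

def accept_instruction(device_id: str) -> bool:
    """Return True only for instructions from an enrolled (legal) device."""
    return device_id in LEGAL_DEVICE_IDS

if accept_instruction("22:33:44:55:66:77"):
    print("execute instruction")   # hand off to the phone's dispatcher
else:
    print("discard instruction")
```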
Alternatively, the phone and the legal Bluetooth device may agree in advance on a password or passphrase for transmitting the above operation instruction. In that case, when the headset sends the voice information or the corresponding operation instruction to the phone, it also sends the pre-agreed password or passphrase, so that the phone can determine whether the currently connected headset is a legal Bluetooth device.
Alternatively, the phone and the legal Bluetooth device may agree in advance on an encryption and decryption algorithm for transmitting the operation instruction. After the headset generates the operation instruction, it can encrypt the instruction using the agreed encryption algorithm. When the phone receives the encrypted instruction, it attempts to decrypt it with the agreed decryption algorithm: if decryption succeeds, the currently connected headset is a legal Bluetooth device and the phone can go on to execute the instruction; otherwise, the headset is an illegal Bluetooth device and the phone can discard the operation instruction it sent.
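The patent does not name an encryption scheme. As one possible realization, the sketch below uses Fernet authenticated encryption from the third-party `cryptography` package; decryption succeeds only for tokens produced with the shared key, so a failed decryption marks the sender as an illegal device:

```python
# Sketch of the pre-agreed encryption check; the scheme choice is an
# assumption, not specified by the patent.
from cryptography.fernet import Fernet, InvalidToken

shared_key = Fernet.generate_key()   # agreed in advance by phone and headset

def headset_send(instruction: bytes) -> bytes:
    return Fernet(shared_key).encrypt(instruction)

def phone_receive(token: bytes):
    try:
        return Fernet(shared_key).decrypt(token)  # legal device
    except InvalidToken:
        return None                               # discard the instruction

print(phone_receive(headset_send(b"open_wechat_payment")))
```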
It should be noted that steps S401-S407 and steps S601-S607 are only two implementations of the voice control method provided in this application. A person skilled in the art may set, according to the actual application scenario or practical experience, which of the above steps are performed by the Bluetooth headset and which by the mobile phone; the embodiments of the present application place no restriction on this.

For example, after performing voiceprint recognition on the first and second voice components, the Bluetooth headset may send the first and second voiceprint recognition results to the phone, which subsequently performs user identity authentication and other operations based on those results.

As another example, the Bluetooth headset may first determine whether voiceprint recognition of the first and second voice components is necessary. If so, the headset sends the two components to the phone, which completes the subsequent voiceprint recognition, user identity authentication, and other operations; otherwise, the headset need not send the components at all, avoiding the extra power the phone would consume processing them.
In addition, the user can enter the settings interface 701 of the mobile phone to enable or disable the above voice control function. There, the user can set the keywords that trigger voice control, such as 'Little E' or 'payment', through the settings button 702; manage the voiceprint models of legitimate users, for example adding or deleting a legitimate user's voiceprint model, through the settings button 703; and set the operation instructions the voice assistant can support, such as paying, making a call, or ordering food, through the settings button 704. In this way, users get a customized voice control experience.
As shown in FIG. 8, an embodiment of the present application discloses a terminal configured to implement the methods described in the foregoing method embodiments. The terminal includes a connection unit 801, an acquisition unit 802, a recognition unit 803, an authentication unit 804, and an execution unit 805. The connection unit 801 is used to support the terminal in performing the process S401 in FIG. 4 and the process S601 in FIG. 6; the acquisition unit 802 supports the terminal in performing the process S404 in FIG. 4 and the process S606 in FIG. 6; the recognition unit 803 supports the terminal in performing the process S405 in FIG. 4; the authentication unit 804 supports the terminal in performing the process S406 in FIG. 4; and the execution unit 805 supports the terminal in performing the process S407 in FIG. 4 and the process S607 in FIG. 6. All relevant details of the steps involved in the above method embodiments can be found in the functional descriptions of the corresponding functional modules and are not repeated here.
As shown in FIG. 9, an embodiment of the present application discloses a wearable device used to implement the methods described in the foregoing method embodiments. The wearable device includes a connection unit 901, a detection unit 902, a sending unit 903, a recognition unit 904, and an authentication unit 905. The connection unit 901 is used to support the wearable device in performing the process S401 in FIG. 4 and the process S601 in FIG. 6; the detection unit 902 supports the processes S402-S403 in FIG. 4 and the processes S602-S603 in FIG. 6; and the recognition unit 904 supports the process S604 in FIG. 6. All relevant details of the steps involved in the above method embodiments can be found in the functional descriptions of the corresponding functional modules and are not repeated here.
As shown in FIG. 10, an embodiment of the present application discloses a terminal. The terminal may include: a touch screen 1001, where the touch screen 1001 includes a touch-sensitive surface 1006 and a display screen 1007; one or more processors 1002; a memory 1003; one or more application programs (not shown); and one or more computer programs 1004, where these components may be connected through one or more communication buses 1005. The one or more computer programs 1004 are stored in the memory 1003 and are configured to be executed by the one or more processors 1002; the one or more computer programs 1004 include instructions that can be used to perform the steps in FIG. 4, FIG. 6, and the corresponding embodiments.
Each functional unit in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.

If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the embodiments of the present application, or the part that contributes to the prior art, or all or part of the technical solutions, may in essence be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) or a processor to perform all or some of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes any medium that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Computer Security & Cryptography (AREA)
  • Computer Hardware Design (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephone Function (AREA)
  • User Interface Of Digital Computer (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the present application disclose a voice control method, a wearable device, and a terminal, relating to the field of terminals, which can improve the accuracy and security of voiceprint recognition when a user controls a terminal by voice. The method includes: the terminal establishes a communication connection with the wearable device; when an uttering user inputs voice information to the wearable device, the terminal authenticates the identity of the uttering user according to a first voiceprint recognition result of a first voice component in the voice information and a second voiceprint recognition result of a second voice component in the voice information, where the first voice component is collected by a first voice sensor of the wearable device and the second voice component is collected by a second voice sensor of the wearable device; and if the terminal's identity authentication result for the uttering user is that the uttering user is a legitimate user, the terminal executes an operation instruction corresponding to the voice information.

Description

Voice Control Method, Wearable Device, and Terminal

Technical Field

The present application relates to the field of terminals, and in particular to a voice control method, a wearable device, and a terminal.

Background

A voiceprint is the spectrum of the sound waves that carry speech information when a user speaks, and it reflects the user's audio features. Because the vocal organs different people use when speaking (for example, the tongue, teeth, larynx, lungs, and nasal cavity) differ in size and shape, the sound-wave spectra of any two people generally differ. Speaker recognition (SR), or voiceprint recognition, can therefore analyze one or more kinds of voice information to identify an unknown voice.

At present, conventional voiceprint recognition mainly uses an ordinary microphone to collect the speaker's voice signal transmitted through the air and then identifies the speaker from the collected signal. However, if the speaker is in a noisy environment, the collected voice signal carries considerable noise, which easily degrades the accuracy of voiceprint recognition. Moreover, if someone maliciously imitates the speaker's voice signal with a recording of the speaker, a terminal such as a mobile phone cannot reliably tell the difference, which increases security risks.
Summary

The present application provides a voice control method, a wearable device, and a terminal, which can improve the accuracy and security of voiceprint recognition when a user controls a terminal by voice.

To achieve the above objective, the present application adopts the following technical solutions.

In a first aspect, the present application provides a voice control method, including: a terminal establishes a communication connection with a wearable device; when an uttering user inputs voice information to the wearable device, the terminal authenticates the identity of the uttering user according to a first voiceprint recognition result of a first voice component in the voice information and a second voiceprint recognition result of a second voice component in the voice information, where the first voice component is collected by a first voice sensor of the wearable device and the second voice component by a second voice sensor of the wearable device; and if the terminal's authentication result is that the uttering user is a legitimate user, the terminal executes an operation instruction corresponding to the voice information.

It can be seen that, when collecting the uttering user's voice information, the wearable device uses two voice sensors to collect two channels of voice information (the first voice component and the second voice component above). The terminal can perform voiceprint recognition on each channel separately, and only when the voiceprint recognition results of both channels match a legitimate user can the uttering user be confirmed as legitimate. Obviously, compared with voiceprint recognition of a single channel of voice information, this dual voiceprint recognition of two channels significantly improves the accuracy and security of user identity authentication.

In addition, if the second voice component is collected by a bone conduction microphone of the wearable device, this shows that the user was actually wearing the device while speaking, which prevents an illegitimate user from maliciously controlling the legitimate user's terminal with a recording of the legitimate user.
In a possible design, before the terminal authenticates the uttering user according to the first voiceprint recognition result of the first voice component and the second voiceprint recognition result of the second voice component, the method further includes: the terminal obtains the first voiceprint recognition result and the second voiceprint recognition result from the wearable device, where the first result is obtained by the wearable device performing voiceprint recognition on the first voice component and the second result by the wearable device performing voiceprint recognition on the second voice component. That is, after collecting the first and second voice components of the uttering user's voice information, the wearable device can perform voiceprint recognition on the two channels locally and send the recognition results to the terminal, which reduces the terminal's implementation complexity for voice control.

In a possible design, before the terminal authenticates the uttering user according to the two voiceprint recognition results, the method further includes: the terminal obtains the first voice component and the second voice component from the wearable device, and performs voiceprint recognition on each to obtain the first voiceprint recognition result corresponding to the first voice component and the second voiceprint recognition result corresponding to the second voice component. That is, after collecting the two voice components, the wearable device can send them to the terminal for voiceprint recognition, which reduces the wearable device's power consumption and implementation complexity.

In a possible design, the terminal performs voiceprint recognition on the first and second voice components when the voice information includes a preset keyword, or when a preset operation input by the user is received. Otherwise, the user has no need for voiceprint recognition at that moment, and the terminal need not enable the voiceprint recognition function, which reduces the terminal's power consumption.

In a possible design, the terminal's voiceprint recognition of the two components includes: the terminal determines whether the first voice component matches a first voiceprint model of a legitimate user, the first voiceprint model reflecting the legitimate user's audio features as collected by the first voice sensor; and the terminal determines whether the second voice component matches a second voiceprint model of the legitimate user, the second voiceprint model reflecting the legitimate user's audio features as collected by the second voice sensor. In this case, authenticating the uttering user according to the two voiceprint recognition results includes: if the first voice component matches the legitimate user's first voiceprint model and the second voice component matches the legitimate user's second voiceprint model, the terminal determines that the uttering user is a legitimate user; otherwise, the terminal determines that the uttering user is an illegitimate user.

In a possible design, determining whether the first voice component matches the legitimate user's first voiceprint model includes: the terminal calculates a first matching degree between the first voice component and the legitimate user's first voiceprint model, and determines a match if the first matching degree is greater than a first threshold. Determining whether the second voice component matches the legitimate user's second voiceprint model includes: the terminal calculates a second matching degree between the second voice component and the legitimate user's second voiceprint model, and determines a match if the second matching degree is greater than a second threshold.

In a possible design, before the terminal authenticates the uttering user, the method further includes: the terminal obtains a start instruction sent by the wearable device, the start instruction being generated by the wearable device in response to a wake-up voice input by the user; and in response to the start instruction, the terminal turns on the voiceprint recognition function.

In a possible design, after the terminal obtains the first and second voice components from the wearable device, the method further includes: the terminal determines from the two components whether the voice information contains a preset wake word, and if so, turns on the voiceprint recognition function. That is, the user can trigger the terminal to enable voiceprint recognition by saying the wake word; otherwise, the user has no need for voiceprint recognition at that moment and the terminal need not enable it, which reduces the terminal's power consumption.

In a possible design, if the uttering user is a legitimate user, the method further includes: the terminal automatically performs an unlock operation. In this way, the user only needs to input voice information once to complete a series of operations such as identity authentication, unlocking the phone, and opening a particular phone function, which greatly improves the user's efficiency in controlling the phone and the user experience.

In a possible design, before the terminal executes the operation instruction corresponding to the voice information, the method further includes: the terminal obtains the device identifier of the wearable device; and executing the operation instruction includes: the terminal executes the operation instruction corresponding to the voice information if the wearable device's identifier is a preset legal device identifier. In this way, the terminal can receive and execute the relevant operation instructions sent by legal Bluetooth devices, and can discard operation instructions sent by illegal Bluetooth devices, which improves security.
In a second aspect, the present application provides a voice control method, including: a wearable device establishes a communication connection with a terminal; the wearable device collects a first voice component of voice information using a first voice sensor and a second voice component of the voice information using a second voice sensor; and the wearable device performs voiceprint recognition on the first and second voice components separately, so as to authenticate the identity of the uttering user.

With reference to the second aspect, in a first possible design of the second aspect, the first voice sensor is located on the side of the wearable device not in contact with the user, and the second voice sensor on the side in contact with the user; for example, the first voice sensor is an air conduction microphone and the second is a bone conduction microphone.

With reference to the first possible design of the second aspect, in a second possible design, before the wearable device collects the first voice component, the method further includes: detecting the ambient light intensity with a proximity light sensor on the wearable device, and detecting the acceleration with an acceleration sensor on the wearable device; if the ambient light intensity is less than a preset light-intensity threshold, or the acceleration is greater than a preset acceleration threshold, or both conditions hold, the wearable device is determined to be in the wearing state.

With reference to the second aspect or any of the foregoing possible designs, in a third possible design, after the wearable device collects the second voice component, the method further includes: the wearable device performs voice activity detection (VAD) on the first voice component to obtain a first VAD value, and performs VAD on the second voice component to obtain a second VAD value; the wearable device then performs voiceprint recognition on the two components when both VAD values satisfy a preset condition.

With reference to the second aspect or any of the foregoing possible designs, in a fourth possible design, the wearable device performs voiceprint recognition on the two components when the voice information includes a preset keyword, or when a preset operation input by the user is received.

With reference to the second aspect or any of the foregoing possible designs, in a fifth possible design, the wearable device's voiceprint recognition includes: determining whether the first voice component matches a legitimate user's first voiceprint model, which reflects the legitimate user's audio features as collected by the first voice sensor; and determining whether the second voice component matches the legitimate user's second voiceprint model, which reflects the legitimate user's audio features as collected by the second voice sensor. After the voiceprint recognition, if the first voice component matches the legitimate user's first voiceprint model and the second voice component matches the second voiceprint model, the wearable device determines that the uttering user is a legitimate user; otherwise, it determines that the uttering user is an illegitimate user.

With reference to the fifth possible design of the second aspect, in a sixth possible design, the method further includes: the wearable device collects, using the first voice sensor, a first registration component of the registration voice input by the legitimate user so as to establish the legitimate user's first voiceprint model, and collects, using the second voice sensor, a second registration component of that registration voice so as to establish the legitimate user's second voiceprint model.

With reference to the fifth or sixth possible design of the second aspect, in a seventh possible design, the wearable device calculates a first matching degree between the first voice component and the legitimate user's first voiceprint model, determining a match if it is greater than a first threshold, and calculates a second matching degree between the second voice component and the legitimate user's second voiceprint model, determining a match if it is greater than a second threshold.

With reference to the second aspect or any of the foregoing possible designs, in an eighth possible design, after the voiceprint recognition, if the uttering user is a legitimate user, the wearable device sends an authentication-pass message or an unlock instruction to the terminal.

With reference to the second aspect or any of the foregoing possible designs, in a ninth possible design, after the voiceprint recognition, if the uttering user is a legitimate user, the wearable device sends the terminal an operation instruction corresponding to the voice information.

With reference to the second aspect or any of the foregoing possible designs, in a tenth possible design, before the voiceprint recognition, the method further includes: the wearable device performs noise reduction on the first and second voice components, and/or uses an echo cancellation algorithm to remove echo signals from the two components.

With reference to the second aspect or any of the foregoing possible designs, in an eleventh possible design, before the wearable device collects the first voice component, the method further includes: the wearable device receives a wake-up voice input by the user that contains a preset wake word, and in response sends the terminal a start instruction instructing the terminal to turn on the voiceprint recognition function.
In a third aspect, the present application provides a terminal including a connection unit, an acquisition unit, a recognition unit, an authentication unit, and an execution unit. The connection unit is used to establish a communication connection with a wearable device. The authentication unit is used to authenticate the uttering user's identity when the uttering user inputs voice information to the wearable device, according to a first voiceprint recognition result of the first voice component of the voice information and a second voiceprint recognition result of the second voice component, the first voice component being collected by the wearable device's first voice sensor and the second by its second voice sensor. The execution unit is used to execute the operation instruction corresponding to the voice information if the terminal's authentication result is that the uttering user is a legitimate user.

In a possible design, the acquisition unit is used to obtain the first and second voiceprint recognition results from the wearable device, the first obtained by the wearable device performing voiceprint recognition on the first voice component and the second by the wearable device performing voiceprint recognition on the second voice component.

In a possible design, the acquisition unit is used to obtain the first and second voice components from the wearable device, and the recognition unit is used to perform voiceprint recognition on each component to obtain the corresponding first and second voiceprint recognition results.

In a possible design, the recognition unit is specifically used to perform voiceprint recognition on the two components when the voice information includes a preset keyword, or when a preset operation input by the user is received.

In a possible design, the recognition unit is specifically used to determine whether the first voice component matches a legitimate user's first voiceprint model, which reflects the legitimate user's audio features as collected by the first voice sensor, and whether the second voice component matches the legitimate user's second voiceprint model, which reflects the legitimate user's audio features as collected by the second voice sensor; and the authentication unit is specifically used to determine that the uttering user is a legitimate user if both components match the corresponding models, and an illegitimate user otherwise.

In a possible design, the recognition unit is specifically used to calculate a first matching degree between the first voice component and the legitimate user's first voiceprint model, determining a match if it is greater than a first threshold, and a second matching degree between the second voice component and the legitimate user's second voiceprint model, determining a match if it is greater than a second threshold.

In a possible design, the acquisition unit is further used to obtain a start instruction sent by the wearable device, generated in response to a wake-up voice input by the user, and the execution unit is further used to turn on the voiceprint recognition function in response to the start instruction.

In a possible design, the recognition unit is further used to determine from the two voice components whether the voice information contains a preset wake word, and the execution unit is further used to turn on the voiceprint recognition function if it does.

In a possible design, the execution unit is further used to automatically perform an unlock operation if the uttering user is a legitimate user.

In a possible design, the acquisition unit is further used to obtain the wearable device's device identifier, and the execution unit is specifically used to execute the operation instruction corresponding to the voice information if that identifier is a preset legal device identifier.
In a fourth aspect, the present application provides a wearable device including a connection unit, a detection unit, a recognition unit, an authentication unit, and a sending unit. The connection unit is used to establish a communication connection with a terminal; the detection unit is used to collect the first voice component of voice information using the first voice sensor and the second voice component of that voice information using the second voice sensor; and the recognition unit is used to perform voiceprint recognition on the first and second voice components separately.

In a possible design, the detection unit is further used to detect the ambient light intensity with a proximity light sensor on the wearable device and the acceleration with an acceleration sensor on the wearable device; if the ambient light intensity is less than a preset light-intensity threshold, or the acceleration is greater than a preset acceleration threshold, or both conditions hold, the wearable device is determined to be in the wearing state.

In a possible design, the detection unit is further used to perform voice activity detection (VAD) on the first voice component to obtain a first VAD value and on the second voice component to obtain a second VAD value, and the recognition unit is specifically used to perform voiceprint recognition on the two components when both VAD values satisfy a preset condition.

In a possible design, the recognition unit is specifically used to perform voiceprint recognition on the two components when the voice information includes a preset keyword, or when a preset operation input by the user is received.

In a possible design, the recognition unit is specifically used to determine whether the first voice component matches a legitimate user's first voiceprint model, which reflects the legitimate user's audio features as collected by the first voice sensor, and whether the second voice component matches the legitimate user's second voiceprint model, which reflects the legitimate user's audio features as collected by the second voice sensor; and the authentication unit is specifically used to determine that the uttering user is a legitimate user if both components match the corresponding models, and an illegitimate user otherwise.

In a possible design, the recognition unit is specifically used to calculate a first matching degree between the first voice component and the legitimate user's first voiceprint model, determining a match if it is greater than a first threshold, and a second matching degree between the second voice component and the legitimate user's second voiceprint model, determining a match if it is greater than a second threshold.

In a possible design, the sending unit is further used to send the terminal an authentication-pass message or an unlock instruction if the uttering user is a legitimate user.

In a possible design, the sending unit is further used to send the terminal an operation instruction corresponding to the voice information if the uttering user is a legitimate user.

In a possible design, the detection unit is further used to detect a wake-up voice input by the user that contains a preset wake word, and the sending unit is further used to send the terminal a start instruction instructing the terminal to turn on the voiceprint recognition function.
In a fifth aspect, the present application provides a terminal including a touch screen, one or more processors, a memory, and one or more programs, where the processors are coupled to the memory and the one or more programs are stored in the memory; when the terminal runs, the processors execute the one or more programs stored in the memory, so that the terminal performs any of the above voice control methods.

In a sixth aspect, the present application provides a wearable device including a first voice sensor disposed on the outside of the wearable device, a second voice sensor disposed on the inside, one or more processors, a memory, and one or more programs, where the processors are coupled to the memory and the one or more programs are stored in the memory; when the wearable device runs, the processors execute the one or more programs, so that the wearable device performs any of the above voice control methods.

In a seventh aspect, the present application provides a computer storage medium including computer instructions which, when run on a terminal, cause the terminal or wearable device to perform any of the voice control methods described above.

In an eighth aspect, the present application provides a computer program product which, when run on a computer, causes the computer to perform the voice control method described in the first aspect or any of its possible implementations.

It can be understood that the terminals of the third and fifth aspects, the wearable devices of the fourth and sixth aspects, the computer storage medium of the seventh aspect, and the computer program product of the eighth aspect are all used to perform the corresponding methods provided above; their achievable beneficial effects can therefore be found in the beneficial effects of the corresponding methods and are not repeated here.
Brief Description of the Drawings

FIG. 1 is a first scenario architecture diagram of a voice control method according to an embodiment of the present application;

FIG. 2 is a first schematic structural diagram of a wearable device according to an embodiment of the present application;

FIG. 3 is a first schematic structural diagram of a terminal according to an embodiment of the present application;

FIG. 4 is a first interaction diagram of a voice control method according to an embodiment of the present application;

FIG. 5 is a second scenario architecture diagram of a voice control method according to an embodiment of the present application;

FIG. 6 is a second interaction diagram of a voice control method according to an embodiment of the present application;

FIG. 7 is a third scenario architecture diagram of a voice control method according to an embodiment of the present application;

FIG. 8 is a second schematic structural diagram of a terminal according to an embodiment of the present application;

FIG. 9 is a second schematic structural diagram of a wearable device according to an embodiment of the present application;

FIG. 10 is a third schematic structural diagram of a terminal according to an embodiment of the present application.
Detailed Description

Implementations of the embodiments of the present application are described in detail below with reference to the accompanying drawings.

As shown in FIG. 1, a voice control method provided by an embodiment of the present application can be applied to a voice control system composed of a wearable device 11 and a terminal 12.

The wearable device 11 may be a device with a voice collection function, such as a wireless earphone, a wired earphone, smart glasses, a smart helmet, or a smart wristwatch. The terminal 12 may be a device such as a mobile phone, a tablet computer, a notebook computer, an ultra-mobile personal computer (UMPC), or a personal digital assistant (PDA); the embodiments of the present application place no restriction on this.

As shown in FIG. 2, the wearable device 11 may specifically include a first voice sensor 201 disposed on the outside of the wearable device 11 and a second voice sensor 202 disposed on the inside. The inside of the wearable device 11 is the side that directly contacts the user during use, and the outside is the side that does not. For example, the first voice sensor 201 may be an air conduction microphone, and the second voice sensor 202 may be a bone conduction microphone, an optical vibration sensor, an acceleration sensor, an air conduction microphone, or any other sensor capable of collecting the vibration signal produced when the user speaks. An air conduction microphone collects voice information via vibration signals transmitted to the microphone through the air; a bone conduction microphone collects voice information via vibration signals transmitted to the microphone through bone.

Taking the first voice sensor 201 as an air conduction microphone and the second voice sensor 202 as a bone conduction microphone as an example, in the embodiments of the present application, when a user wearing the wearable device 11 speaks, the device can collect the user's voice information propagated through the air via the first voice sensor 201 and the voice information propagated through bone via the second voice sensor 202.

In addition, the wearable device 11 may have more than one first voice sensor 201. Taking the first voice sensor 201 as an air conduction microphone as an example, two air conduction microphones can be arranged on the outside of the wearable device 11 to jointly collect the user's voice information propagated through the air, yielding the first voice component of that voice information, while a bone conduction microphone collects the voice information propagated through bone, yielding the second voice component.

Still as shown in FIG. 2, the wearable device 11 may further include an acceleration sensor 203 (which may also serve as the second voice sensor 202), a proximity light sensor 204, a communication module 205, a speaker 206, a computation module 207, a storage module 208, a power supply 209, and other components. It can be understood that the wearable device 11 may have more or fewer components than shown in FIG. 2, may combine two or more components, or may configure the components differently. The components shown in FIG. 2 may be implemented in hardware including one or more signal-processing or application-specific integrated circuits, in software, or in a combination of hardware and software.

As shown in FIG. 3, the terminal 12 in the voice control system may specifically be a mobile phone 100, which may include a processor 101, a radio frequency (RF) circuit 102, a memory 103, a touch screen 104, a Bluetooth apparatus 105, one or more sensors 106, a Wi-Fi apparatus 107, a positioning apparatus 108, an audio circuit 109, a peripheral interface 110, and a power supply apparatus 111. These components may communicate over one or more communication buses or signal lines (not shown in FIG. 3). A person skilled in the art can understand that the hardware structure shown in FIG. 3 does not limit the mobile phone 100: it may include more or fewer components, combine certain components, or arrange the components differently.

The components of the mobile phone 100 are introduced below with reference to FIG. 3:
The processor 101 is the control center of the mobile phone 100. It connects the parts of the phone through various interfaces and lines, and performs the phone's functions and processes data by running or executing application programs stored in the memory 103 and calling data and instructions stored there. In some embodiments, the processor 101 may include one or more processing units, and may integrate an application processor, which mainly handles the operating system, user interface, and application programs, with a modem processor, which mainly handles wireless communication; the modem processor may also be left out of the processor 101. For example, the processor 101 may be a Kirin 960 multi-core processor manufactured by Huawei Technologies Co., Ltd.

The radio frequency circuit 102 may be used to receive and send wireless signals while information is received or sent, or during a call. Specifically, it may receive a base station's downlink data and hand it to the processor 101 for processing, and send uplink data to the base station. Generally, the RF circuit includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low-noise amplifier, and a duplexer. The RF circuit 102 may also communicate with other devices by wireless communication using any communication standard or protocol, including but not limited to the Global System for Mobile Communications, General Packet Radio Service, Code Division Multiple Access, Wideband Code Division Multiple Access, Long Term Evolution, email, and the short message service.

The memory 103 stores application programs and data; the processor 101 performs the phone's functions and processes data by running them. The memory 103 mainly includes a program storage area, which can store the operating system and the application program required by at least one function (such as a sound playback function or an image playback function), and a data storage area, which can store data created through use of the phone 100 (such as audio data or a phone book). In addition, the memory 103 may include high-speed random access memory, and may also include non-volatile memory such as a magnetic disk storage device or a flash memory device, or another volatile solid-state storage device. The memory 103 can store various operating systems, for example the iOS operating system developed by Apple Inc. or the Android operating system developed by Google Inc.
The touch screen 104 may include a touch-sensitive surface 104-1 and a display 104-2.

The touch-sensitive surface 104-1 (for example, a touch panel) can capture touch events performed by the user of the phone 100 on or near it (for example, operations performed with a finger, a stylus, or any other suitable object on or near the surface 104-1) and send the captured touch information to another component such as the processor 101. A touch event near the surface 104-1 may be called a floating touch: the user does not need to directly contact the touchpad to select, move, or drag a target (such as an icon), but only needs to be near the terminal to perform the desired function. In floating-touch scenarios, the terms 'touch' and 'contact' do not imply direct contact with the touch screen but contact near it. A touch-sensitive surface 104-1 supporting floating touch can be implemented with capacitive, infrared, or ultrasonic sensing. The surface 104-1 may include two parts: a touch detection apparatus, which detects the position of the user's touch and the signal brought by the touch operation and passes the signal on; and a touch controller, which receives the touch information from the touch detection apparatus, converts it into contact coordinates, sends them to the processor 101, and can also receive and execute instructions sent by the processor 101. The surface 104-1 may be implemented in multiple types, such as resistive, capacitive, infrared, and surface acoustic wave.

The display (also called the display screen) 104-2 may be used to display information input by the user or provided to the user, and the various menus of the phone 100. The display 104-2 may be configured as a liquid crystal display, an organic light-emitting diode display, or the like. The touch-sensitive surface 104-1 may cover the display 104-2; when the surface detects a touch event on or near it, it passes the event to the processor 101 to determine its type, and the processor 101 can then provide the corresponding visual output on the display 104-2. Although in FIG. 3 the touch-sensitive surface 104-1 and the display 104-2 are two independent components implementing the input and output functions of the phone 100, in some embodiments they may be integrated. It can be understood that the touch screen 104 is a stack of multiple layers of material; only the touch-sensitive surface (layer) and the display (layer) are shown in the embodiments of the present application, and other layers are not described. In some other embodiments, the surface 104-1 may cover the display 104-2 and be larger than it, so that the display is entirely covered beneath it; or the surface 104-1 may be configured on the front of the phone 100 as a full panel, so that every touch on the front of the phone can be sensed, enabling a full-touch experience on the front of the phone. In other embodiments, both the touch-sensitive surface 104-1 and the display 104-2 are configured on the front of the phone as full panels, enabling a bezel-less structure on the front of the phone. In still other embodiments, the touch screen 104 may further include one or more groups of sensor arrays, so that while sensing a touch event it can also sense the pressure the user applies.
The mobile phone 100 may also include a Bluetooth apparatus 105 for exchanging data between the phone 100 and other short-range terminals (for example, the wearable device 11 described above). The Bluetooth apparatus in the embodiments of the present application may be an integrated circuit, a Bluetooth chip, or the like.

The mobile phone 100 may also include at least one sensor 106, such as a light sensor, a motion sensor, and other sensors. Specifically, the light sensor may include an ambient light sensor, which can adjust the brightness of the display of the touch screen 104 according to the ambient light, and a proximity sensor, which can turn off the display's power when the phone 100 is moved to the ear. As one kind of motion sensor, an accelerometer can detect the magnitude of acceleration in all directions (generally on three axes) and, at rest, the magnitude and direction of gravity; it can be used in applications that recognize the phone's attitude (such as landscape/portrait switching, related games, and magnetometer attitude calibration) and in vibration-recognition functions (such as a pedometer or tap detection). Other sensors that may also be configured on the phone 100, such as a fingerprint recognition device, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, are not described here.

The Wi-Fi apparatus 107 provides the phone 100 with network access complying with Wi-Fi standards and protocols. Through it, the phone 100 can connect to a Wi-Fi access point and thereby help the user send and receive email, browse web pages, access streaming media, and so on, providing wireless broadband Internet access. In some other embodiments, the Wi-Fi apparatus 107 can also serve as a Wi-Fi wireless access point and provide Wi-Fi network access to other terminals.

The positioning apparatus 108 provides the phone 100 with a geographical location. It can be understood that the positioning apparatus 108 may specifically be a receiver for a positioning system such as the global positioning system (GPS) or the BeiDou satellite navigation system. After receiving the geographical location from the positioning system, the positioning apparatus 108 sends the information to the processor 101 for processing or to the memory 103 for storage. In some other embodiments, the positioning apparatus 108 may be an assisted GPS (AGPS) receiver. AGPS performs GPS positioning with assistance: using base-station signals together with GPS satellite signals, it allows the phone 100 to be positioned faster. In an AGPS system, the positioning apparatus 108 obtains positioning assistance by communicating with an assisted positioning server (for example, a phone positioning server); the assisted positioning server communicates with the positioning apparatus 108 (that is, the GPS receiver) of a terminal such as the phone 100 over a wireless communication network to assist with ranging and positioning services.

The audio circuit 109, a speaker 113, and a microphone 114 provide an audio interface between the user and the phone 100. The audio circuit 109 can transmit an electrical signal converted from received audio data to the speaker 113, which converts it into a sound signal for output; conversely, the microphone 114 converts collected sound signals into electrical signals, which the audio circuit 109 receives and converts into audio data, and the audio data is then output to the RF circuit 102 to be sent, for example, to another phone, or output to the memory 103 for further processing.

The peripheral interface 110 provides various interfaces for external input/output devices (such as a keyboard, a mouse, an external display, external storage, and a subscriber identity module card), for example connecting a mouse through a universal serial bus interface, or electrically connecting a subscriber identity module (SIM) card provided by a telecommunications operator via the metal contacts in the card slot. The peripheral interface 110 can be used to couple the external input/output peripherals to the processor 101 and the memory 103.

The mobile phone 100 may also include a power supply apparatus 111 (such as a battery and a power management chip) supplying power to the components; the battery may be logically connected to the processor 101 through the power management chip, so that charging, discharging, power-consumption management, and other functions are managed through the power supply apparatus 111.

Although not shown in FIG. 3, the mobile phone 100 may also include a camera, a flash, a micro-projection apparatus, a near field communication (NFC) apparatus, and the like, which are not described here.
With reference to FIGS. 1-3 above, taking the wearable device 11 as a Bluetooth headset and the terminal 12 as a mobile phone as an example, the headset and the phone can communicate over a Bluetooth connection. In the embodiments of the present application, the user can input voice information to the Bluetooth headset while wearing it, and the headset can collect that voice information through both the externally arranged first voice sensor 201 and the internally arranged second voice sensor 202; for example, the voice information collected by the first voice sensor 201 is the first voice component, and that collected by the second voice sensor 202 is the second voice component.

The Bluetooth headset can then perform voiceprint recognition on the first voice component and the second voice component separately, obtaining a first voiceprint recognition result corresponding to the first voice component and a second voiceprint recognition result corresponding to the second voice component. For example, a legitimate user's first voiceprint model and second voiceprint model can be stored in the headset in advance, where the first voiceprint model is generated from the registration voice the legitimate user previously input to the first voice sensor 201, and the second from the registration voice previously input to the second voice sensor 202. The headset can then match the first voiceprint model against the collected first voice component and the second voiceprint model against the collected second voice component.

When the first voice component matches the first voiceprint model and the second voice component matches the second voiceprint model, the voice information collected by the headset was input by a legitimate user. For example, the headset can use a predetermined algorithm to calculate a first matching degree between the first voice component and the first voiceprint model and a second matching degree between the second voice component and the second voiceprint model; the higher a matching degree, the better the voice component fits the corresponding voiceprint model and the more likely the uttering user is legitimate. For example, when the average of the first and second matching degrees is greater than 80 points, or when the first and second matching degrees are each greater than 85 points, the headset can determine that the first voice component matches the first voiceprint model and the second voice component matches the second voiceprint model. The headset can then send the phone the operation instruction corresponding to the voice information, for example an unlock instruction, a shutdown instruction, or an instruction to call a specific contact, and the phone can perform the corresponding operation, implementing voice control of the phone by the user.

Of course, the Bluetooth headset can also send the collected first and second voice components to the phone, which performs voiceprint recognition on each and judges from the results whether the user who input the voice information is a legitimate user; if so, the phone can execute the operation instruction corresponding to the voice information.

The above legitimate user refers to a user who can pass the identity authentication measures preset on the phone. For example, if a terminal's preset identity authentication measures are password input, fingerprint recognition, and voiceprint recognition, then a user who can enter the password, or whose authenticated fingerprint information and voiceprint models are stored in the terminal in advance, can be regarded as a legitimate user of the terminal. Of course, a terminal may have one or more legitimate users, and any user other than a legitimate user can be regarded as an illegitimate user of that terminal. An illegitimate user can become legitimate after passing certain identity authentication measures; the embodiments of the present application place no restriction on this.

It can be seen that, in the embodiments of the present application, when the user controls the terminal 12 by inputting voice information to the wearable device 11, the device can collect the voice information produced inside the ear canal and the voice information produced outside the ear canal when the user speaks, producing two channels of voice information (the first and second voice components) in the wearable device 11. The wearable device 11 (or the terminal 12) can perform voiceprint recognition on each channel, and only when both channels' voiceprint recognition results match a legitimate user's voiceprint models is the user who input the voice information confirmed as legitimate. Obviously, this dual voiceprint recognition of two channels of voice information significantly improves the accuracy and security of user identity authentication compared with single-channel voiceprint recognition.

Moreover, because the wearable device 11 can collect the user's voice information through bone conduction only when the user is actually wearing it, voice information collected through bone conduction that passes voiceprint recognition also shows that the voice originated from a legitimate user wearing the wearable device 11, preventing an illegitimate user from maliciously controlling the legitimate user's terminal with a recording of the legitimate user.
For ease of understanding, a voice control method provided by the embodiments of the present application is introduced in detail below with reference to the accompanying drawings. In the following embodiments, a mobile phone serves as the terminal and a Bluetooth headset as the wearable device.

FIG. 4 is a schematic flowchart of a voice control method according to an embodiment of the present application. As shown in FIG. 4, the voice control method may include:

S401. The mobile phone establishes a Bluetooth connection with the Bluetooth headset.

When the user wishes to use the Bluetooth headset, the headset's Bluetooth function can be turned on, and the headset then sends a pairing broadcast. If Bluetooth is enabled on the phone, the phone receives the broadcast and prompts the user that a Bluetooth device has been scanned. After the user selects the headset on the phone, the phone pairs with it and establishes a Bluetooth connection, over which the two subsequently communicate. Of course, if the phone and the headset were successfully paired before, the phone can automatically establish a Bluetooth connection with the scanned headset.

In addition, if the earphone the user wishes to use has a Wi-Fi function, the user can operate the phone to establish a Wi-Fi connection with it; or, if the earphone is wired, the user can insert the plug of the earphone cable into the phone's earphone jack to establish a wired connection. The embodiments of the present application place no restriction on this.

S402 (optional). The Bluetooth headset detects whether it is in the wearing state.

A proximity light sensor and an acceleration sensor can be arranged in the Bluetooth headset, with the proximity light sensor on the side that contacts the user when the headset is worn. The two sensors can be activated periodically to obtain current measurements.

Because wearing the headset blocks the light entering the proximity light sensor, the headset can determine that it is being worn when the detected light intensity is below a preset light-intensity threshold. Because the headset moves with the user when worn, it can likewise determine that it is being worn when the acceleration detected by the acceleration sensor exceeds a preset acceleration threshold. Alternatively, the headset may conclude that it is being worn only when the detected light intensity is below the light-intensity threshold and the detected acceleration also exceeds the acceleration threshold.

Further, because the headset also contains a second voice sensor that collects voice information through bone conduction (for example, a bone conduction microphone or an optical vibration sensor), it can additionally use the second voice sensor to collect the vibration signals produced in the current environment. When worn, the headset is in direct contact with the user, so the collected vibration signal is stronger than when it is not worn; if the energy of that signal exceeds an energy threshold, the headset can determine that it is being worn. Alternatively, because spectral features such as harmonics and resonance in the vibration signal collected while worn differ markedly from those collected while not worn, the headset can determine that it is being worn when the collected vibration signal satisfies preset spectral features. This reduces the chance that the wearing state cannot be detected accurately by the proximity light sensor or the acceleration sensor alone, for example when the headset is put in a pocket.

The energy threshold or preset spectral features can be obtained statistically by capturing the various vibration signals produced when a large number of users speak or move while wearing a Bluetooth headset; these differ clearly in energy and spectral features from the signals the second voice sensor detects when the headset is not worn. In addition, because the first voice sensor on the outside of the headset (for example, an air conduction microphone) generally consumes more power, it need not be turned on before the headset detects that it is being worn; once the wearing state is detected, the first voice sensor can be turned on to collect the voice information produced when the user speaks, reducing the headset's power consumption.

When the headset detects that it is currently being worn, it continues with steps S403-S407 below; otherwise it can enter a sleep state until the wearing state is detected. That is, only when the headset detects that the user is wearing it, and therefore intends to use it, are the collection of the user's voice information and the voiceprint recognition process triggered, reducing the headset's power consumption. Of course, step S402 is optional: whether or not the user is wearing the headset, it may continue to perform steps S403-S407; the embodiments of the present application place no restriction on this.
S403、若处于佩戴状态,则蓝牙耳机通过第一语音传感器采集用户输入的语音信息中的第一语音分量,并通过第二语音传感器采集上述语音信息中的第二语音分量。
当确定出蓝牙耳机处于佩戴状态时,蓝牙耳机可启动语音检测模块,分别使用上述第 一语音传感器和第二语音传感器采集用户输入的语音信息,得到该语音信息中的第一语音分量和第二语音分量。以第一语音传感器为气传导麦克风,第二语音传感器为骨传导麦克风举例,用户在使用蓝牙耳机的过程中可以输入语音信息“小E,使用微信支付”。此时,由于气传导麦克风暴露在空气中,因此,蓝牙耳机可使用气传导麦克风接收用户发声后由空气振动产生的振动信号(即上述语音信息中的第一语音分量)。同时,由于骨传导麦克风能够通过皮肤与用户耳骨接触,因此,蓝牙耳机可使用骨传导麦克风接收用户发声后由耳骨和皮肤振动产生的振动信号(即上述语音信息中的第二语音分量)。
在本申请的一些实施例中,当蓝牙耳机检测到用户输入的语音信息后,还可以通过VAD(voice activity detection,语音活动检测)算法区分上述语音信息中的语音信号和背景噪音。具体的,蓝牙耳机可以分别将上述语音信息中的第一语音分量和第二语音分量输入至相应的VAD算法中,得到与第一语音分量对应的第一VAD取值以及与第二语音分量对应的第二VAD取值。其中,VAD取值可用于反映上述语音信息是说话人正常的语音信号还是噪音信号。例如,可将VAD取值范围设置在0至100的区间内,当VAD取值大于某一VAD阈值时可说明该语音信息是说话人正常的语音信号,当VAD取值小于某一VAD阈值时可说明该语音信息是噪音信号。又例如,可将VAD取值设置为0或1,当VAD取值为1时,说明该语音信息是说话人正常的语音信号,当VAD取值为0时,说明该语音信息是噪音信号。
那么,蓝牙耳机可结合上述第一VAD取值和第二VAD取值这两个VAD取值确定上述语音信息是否为噪音信号。例如,当第一VAD取值和第二VAD取值均为1时,蓝牙耳机可确定上述语音信息不是噪音信号,而是说话人正常的语音信号。又例如,当第一VAD取值和第二VAD取值分别大于预设取值时,蓝牙耳机可确定上述语音信息不是噪音信号,而是说话人正常的语音信号。
另外,当第二VAD取值为1或者第二VAD取值大于预设取值时,可一定程度上说明此时采集到的语音信息为活体用户发出的,因此,蓝牙耳机也可以仅根据第二VAD取值确定上述语音信息是否为噪音信号。
通过对上述第一语音分量和第二语音分量分别进行语音活动检测,如果蓝牙耳机确定出上述语音信息是噪音信号,则蓝牙耳机可丢弃该语音信息;如果蓝牙耳机确定出上述语音信息不是噪音信号,则蓝牙耳机可继续执行下述步骤S404-S407。即用户向蓝牙耳机输入有效的语音信息时,才会触发蓝牙耳机进行后续声纹识别等过程,从而降低蓝牙耳机的功耗。
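A rough sketch of the dual-channel VAD gating above follows, with a toy energy-based detector standing in for whatever VAD algorithm the headset actually runs; the frame energy threshold is an illustrative assumption.

```python
import numpy as np

def energy_vad(frame: np.ndarray, threshold: float = 1e-4) -> int:
    """Toy binary VAD: 1 for speech, 0 for noise, based on frame energy only.
    A real headset would run a proper VAD algorithm here."""
    return int(np.mean(frame ** 2) > threshold)

def is_valid_speech(first_component: np.ndarray,
                    second_component: np.ndarray) -> bool:
    """Gate the pipeline: proceed only when both channels look like speech
    (both VAD values equal 1), as in the first example policy above."""
    return energy_vad(first_component) == 1 and energy_vad(second_component) == 1
```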
In addition, after the Bluetooth headset obtains the first VAD value and the second VAD value corresponding to the first voice component and the second voice component, it can also use a noise estimation algorithm (for example, the minimum statistics algorithm or the minima controlled recursive averaging algorithm) to estimate the noise value in the voice information. For example, the headset can set aside storage space dedicated to the noise value; each time the headset computes a new noise value, it can update it in that storage space, so that the space always holds the most recently estimated noise value.
In this way, after the headset determines via the VAD algorithm that the voice information is valid, it can use the noise value in that storage space to denoise the first voice component and the second voice component separately, making the subsequent voiceprint recognition results of the headset (or the mobile phone) on the two components more accurate.
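A minimal sketch of the "store the latest noise estimate, then denoise" flow, assuming a running-minimum tracker as a crude stand-in for the minimum statistics or minima controlled recursive averaging algorithms named above, and crude spectral subtraction as the denoising step:

```python
import numpy as np

class NoiseTracker:
    """Keeps only the most recent noise estimate, like the dedicated storage
    space described above; a running minimum stands in for minimum statistics."""
    def __init__(self):
        self.noise_power = None  # latest noise estimate

    def update(self, frame: np.ndarray) -> None:
        power = float(np.mean(frame ** 2))
        self.noise_power = power if self.noise_power is None \
            else min(self.noise_power, power)

def denoise(frame: np.ndarray, noise_power: float) -> np.ndarray:
    """Crude spectral subtraction using the stored noise estimate."""
    spectrum = np.fft.rfft(frame)
    mag = np.abs(spectrum)
    cleaned = np.maximum(mag - np.sqrt(noise_power), 0.0)
    return np.fft.irfft(cleaned * np.exp(1j * np.angle(spectrum)), n=len(frame))
```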
S404. The Bluetooth headset sends the first voice component and the second voice component to the mobile phone over the Bluetooth connection.
After the Bluetooth headset obtains the first voice component and the second voice component, it can send them to the mobile phone, which then performs steps S405-S407 below to carry out voiceprint recognition on the voice information input by the user, user identity authentication, and related operations.
S405. The mobile phone performs voiceprint recognition on the first voice component and the second voice component separately, obtaining a first voiceprint recognition result corresponding to the first voice component and a second voiceprint recognition result corresponding to the second voice component.
Voiceprint models of one or more legitimate users may be pre-stored in the mobile phone. Each legitimate user has two voiceprint models: a first voiceprint model built from the user's voice features collected while the air conduction microphone (i.e., the first voice sensor) is working, and a second voiceprint model built from the user's voice features collected while the bone conduction microphone (i.e., the second voice sensor) is working.
Building the first voiceprint model and the second voiceprint model takes two phases. The first phase is background model training. In this phase, developers collect speech of related text (for example, "Hello, Xiao E") produced by a large number of speakers wearing the Bluetooth headset. The mobile phone can then filter and denoise this speech, extract audio features from the background speech (for example, time-frequency spectrograms or gammatone-like spectrograms), and build a background model for voiceprint recognition using machine learning algorithms such as a GMM (Gaussian mixed model), an SVM (support vector machine), or a deep neural network framework. Based on this background model, the mobile phone or the Bluetooth headset can build the first and second voiceprint models belonging to a given user from enrollment speech input by that user. The deep neural network frameworks include but are not limited to the DNN (deep neural network) algorithm, the RNN (recurrent neural network) algorithm, and the LSTM (long short-term memory) algorithm.
The second phase takes place the first time the user uses the voice control function on the mobile phone: the first and second voiceprint models belonging to that user are built from enrollment speech the user inputs. For example, when legitimate user 1 uses the voice assistant APP installed on the phone for the first time, the APP may prompt the user to wear the Bluetooth headset and speak the enrollment phrase "Hello, Xiao E". Because the headset contains both an air conduction microphone and a bone conduction microphone, it can obtain the first enrollment component of the enrollment speech collected by the air conduction microphone and the second enrollment component collected by the bone conduction microphone. After the headset sends the two enrollment components to the phone, the phone can extract user 1's audio features from each component and feed them into the background model, obtaining user 1's first voiceprint model and second voiceprint model. The phone may store legitimate user 1's first and second voiceprint models locally, or send them to the Bluetooth headset for storage.
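To make the two-phase modeling concrete, the following hedged sketch uses scikit-learn's GaussianMixture as the background model and a per-user GMM warm-started from it as the "voiceprint model"; the feature extractor and the model sizes are stand-ins, and a production system might instead use MAP adaptation or a neural embedding.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def extract_features(audio: np.ndarray) -> np.ndarray:
    """Stand-in feature extractor; a real system would compute e.g.
    time-frequency spectrogram or gammatone-like features."""
    frames = audio[:len(audio) // 160 * 160].reshape(-1, 160)
    return np.log1p(np.abs(np.fft.rfft(frames))[:, :20])  # (n_frames, 20)

# Phase 1: train one background model per channel on many speakers' data.
def train_background_model(feature_matrix: np.ndarray) -> GaussianMixture:
    ubm = GaussianMixture(n_components=8, covariance_type="diag",
                          random_state=0)
    ubm.fit(feature_matrix)
    return ubm

# Phase 2: enroll one user. Here the user GMM is simply warm-started from the
# background model's means, a crude stand-in for MAP adaptation.
def enroll_user(ubm: GaussianMixture,
                enrollment_features: np.ndarray) -> GaussianMixture:
    user_model = GaussianMixture(n_components=8, covariance_type="diag",
                                 means_init=ubm.means_, random_state=0)
    user_model.fit(enrollment_features)
    return user_model
```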
In addition, when building legitimate user 1's first and second voiceprint models, the mobile phone can also treat the Bluetooth headset connected at that time as a legitimate Bluetooth device. For example, the phone can store the legitimate Bluetooth device's identifier (such as the headset's MAC address) locally. In this way, the phone can receive and execute relevant operation instructions from legitimate Bluetooth devices, and when an illegitimate Bluetooth device sends the phone an operation instruction, the phone can discard it to improve security. A phone can manage one or more legitimate Bluetooth devices. As shown in (a) of FIG. 7, the user can enter the voiceprint recognition settings interface 701 from the settings function; after tapping the settings button 705, the user can enter the legitimate device management interface 706 shown in (b) of FIG. 7, where legitimate Bluetooth devices can be added or deleted.
In step S405, after the mobile phone obtains the first voice component and the second voice component of the voice information, it can extract the audio features from each component, then match legitimate user 1's first voiceprint model against the audio features of the first voice component, and match legitimate user 1's second voiceprint model against the audio features of the second voice component. For example, the phone can use an algorithm to compute a first matching degree between the first voiceprint model and the first voice component (i.e., the first voiceprint recognition result), and a second matching degree between the second voiceprint model and the second voice component (i.e., the second voiceprint recognition result). Generally, the higher the matching degree, the more similar the audio features in the voice information are to legitimate user 1's audio features, and the more likely the user who input the voice information is legitimate user 1.
If voiceprint models of multiple legitimate users are stored in the phone, the phone can also compute, in the same way, the first matching degree between the first voice component and each other legitimate user (for example, legitimate user 2 and legitimate user 3), as well as the second matching degree between the second voice component and each other legitimate user. The phone can then determine the legitimate user with the highest matching degree (for example, legitimate user A) as the current speaking user.
In addition, before the phone performs voiceprint recognition on the first voice component and the second voice component, it can first determine whether voiceprint recognition on the two components is needed at all. For example, if the Bluetooth headset or the phone can recognize a preset keyword in the voice information input by the user, such as "transfer", "pay", "** bank", or "chat history", that is, keywords involving user privacy or financial actions, this indicates that the user's security requirement for controlling the phone by voice is high, so the phone can perform step S405 for voiceprint recognition. As another example, if the headset receives a preset user operation for enabling the voiceprint recognition function, such as tapping the headset or pressing the volume-up and volume-down buttons simultaneously, this indicates that the user wants identity verification via voiceprint recognition, so the headset can notify the phone to perform step S405 for voiceprint recognition.
Alternatively, keywords corresponding to different security levels can be preset in the phone. For example, the highest-security keywords include "pay" and "payment"; higher-security keywords include "take a photo" and "make a call"; and the lowest-security keywords include "play music" and "navigate".
In this way, when the collected voice information is detected to contain a highest-security keyword, the phone can be triggered to perform voiceprint recognition on both the first voice component and the second voice component, that is, both collected audio sources undergo voiceprint recognition to improve the security of controlling the phone by voice. When the collected voice information contains a higher-security keyword, the user's security requirement for voice control is moderate, so the phone can be triggered to perform voiceprint recognition on only the first voice component or the second voice component. When the collected voice information contains a lowest-security keyword, the phone need not perform voiceprint recognition on either the first or the second voice component.
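The keyword-to-security-level dispatch could be organized as in the sketch below; the English keyword tables and the function name are illustrative stand-ins for the preset keywords described above.

```python
# Illustrative keyword tables for the three security levels described above.
HIGH_SECURITY = {"pay", "payment"}
MEDIUM_SECURITY = {"take a photo", "make a call"}
LOW_SECURITY = {"play music", "navigate"}

def recognition_plan(transcript: str) -> str:
    """Decide which voice components need voiceprint recognition."""
    words = transcript.lower()
    if any(k in words for k in HIGH_SECURITY):
        return "both_components"      # dual voiceprint recognition
    if any(k in words for k in MEDIUM_SECURITY):
        return "single_component"     # first or second component only
    if any(k in words for k in LOW_SECURITY):
        return "none"                 # no voiceprint recognition needed
    return "none"                     # no keyword: likely normal conversation
```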
Of course, if the voice information collected by the Bluetooth headset contains no keyword, the collected voice information is probably just speech from the user's normal conversation, so the phone need not perform voiceprint recognition on the first and second voice components, which reduces the phone's power consumption.
Alternatively, the phone can preset one or more wake-up words used to wake the phone and enable the voiceprint recognition function. For example, the wake-up word may be "Hello, Xiao E". After the user inputs voice information into the headset, the headset or the phone can recognize whether the voice information is a wake-up utterance containing the wake-up word. For example, the headset can send the first and second voice components of the collected voice information to the phone; if the phone further recognizes that the voice information contains the wake-up word, the phone can enable the voiceprint recognition function (for example, power on the voiceprint recognition chip). Subsequently, if voice information collected by the headset contains the keywords above, the phone can use the already-enabled voiceprint recognition function to perform voiceprint recognition following the method of step S405.
As another example, after collecting the voice information, the Bluetooth headset itself can also recognize whether it contains the wake-up word. If it does, this indicates that the user may need the voiceprint recognition function later, so the headset can send the phone a start instruction, causing the phone to enable the voiceprint recognition function in response.
S406. The mobile phone authenticates the user's identity according to the first voiceprint recognition result and the second voiceprint recognition result.
In step S406, after the phone obtains, through voiceprint recognition, the first voiceprint recognition result corresponding to the first voice component and the second voiceprint recognition result corresponding to the second voice component, it can combine the two results to authenticate the identity of the user who input the voice information, thereby improving the accuracy and security of user identity authentication.
For example, the first matching degree between the legitimate user's first voiceprint model and the first voice component is the first voiceprint recognition result, and the second matching degree between the legitimate user's second voiceprint model and the second voice component is the second voiceprint recognition result. During identity authentication, if the first matching degree and the second matching degree satisfy a preset authentication policy, for example a policy requiring the first matching degree to exceed a first threshold and the second matching degree to exceed a second threshold (which may be the same as or different from the first threshold), the phone determines that the user who produced the first and second voice components is a legitimate user; otherwise, the phone can determine that the user who produced the first and second voice components is an illegitimate user.
As another example, the phone can compute a weighted average of the first matching degree and the second matching degree; when the weighted average exceeds a preset threshold, the phone can determine that the user who produced the first and second voice components is a legitimate user; otherwise, the phone can determine that the user is an illegitimate user.
Alternatively, the phone can apply different authentication policies in different voiceprint recognition scenarios. For example, when the collected voice information contains a highest-security keyword, the phone can set both the first threshold and the second threshold to 99 points, so the current speaking user is determined to be legitimate only when both matching degrees exceed 99 points. When the collected voice information contains a lower-security keyword, the phone can set both thresholds to 85 points, so the speaking user is determined to be legitimate once both matching degrees exceed 85 points. That is, for voiceprint recognition scenarios of different security levels, the phone can authenticate the user's identity with authentication policies of different security levels.
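Putting the policies above together (per-component thresholds, a weighted average, and per-security-level thresholds), one possible sketch follows; the 99- and 85-point thresholds come from the examples above, while the 0.5/0.5 weights are assumptions for illustration.

```python
# Thresholds per security level; 99 and 85 come from the examples above.
THRESHOLDS = {"high": (99, 99), "medium": (85, 85)}

def authenticate(match_first: float, match_second: float,
                 level: str = "high", use_weighted: bool = False) -> bool:
    """Authenticate the speaking user from the two matching degrees."""
    if use_weighted:
        # Weighted-average variant; equal weights are an assumption.
        weighted = 0.5 * match_first + 0.5 * match_second
        return weighted > THRESHOLDS[level][0]
    t1, t2 = THRESHOLDS[level]
    return match_first > t1 and match_second > t2
```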
In addition, if voiceprint models of one or more legitimate users are stored in the phone, for example the voiceprint models of legitimate user A, legitimate user B, and legitimate user C, each comprising a first voiceprint model and a second voiceprint model, then the phone can match the collected first and second voice components against each legitimate user's voiceprint models as described above. The phone can then determine the legitimate user who satisfies the authentication policy and has the highest matching degree (for example, legitimate user A) as the current speaking user.
In some other embodiments of this application, the legitimate user's voiceprint model stored in the phone may also be built by the phone after fusing the first enrollment component and the second enrollment component of the enrollment speech. In this case, each legitimate user has a single voiceprint model, and that voiceprint model reflects both the audio features of the legitimate user's voice transmitted through the air and those transmitted through bone conduction.
Thus, after receiving the first and second voice components of the voice information sent by the Bluetooth headset, the phone can fuse the two components and then perform voiceprint recognition, for example by computing the matching degree between the fused components and the legitimate user's voiceprint model, and can then authenticate the user's identity from that matching degree. Because the legitimate user's voiceprint models are fused into one in this authentication method, the complexity and storage requirements of the voiceprint model are correspondingly reduced, while the voiceprint features of the second voice component are still used, so dual voiceprint assurance and liveness detection are retained.
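As one plausible, non-authoritative reading of the fused-model variant, the sketch below concatenates per-channel features frame by frame and scores them against a single model; the text does not specify the fusion operation, so concatenation is an assumption, and the model here is assumed to expose a scikit-learn-style score method.

```python
import numpy as np

def fuse_features(first_feats: np.ndarray,
                  second_feats: np.ndarray) -> np.ndarray:
    """Fuse air-conduction and bone-conduction features frame by frame by
    concatenation; one plausible fusion among several."""
    n = min(len(first_feats), len(second_feats))
    return np.concatenate([first_feats[:n], second_feats[:n]], axis=1)

def fused_match_score(fused_feats: np.ndarray, user_model) -> float:
    """Score fused features against the single fused voiceprint model,
    e.g. a GaussianMixture's average log-likelihood used as the score."""
    return float(user_model.score(fused_feats))
```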
S407. If the user is a legitimate user, the mobile phone executes the operation instruction corresponding to the voice information.
Through the authentication process of step S406 above, if the phone determines that the speaking user who input the voice information in step S403 is a legitimate user, the phone can generate the operation instruction corresponding to the voice information. For example, when the voice information is "Xiao E, pay with WeChat", the corresponding operation instruction is to open the payment interface of the WeChat APP. After generating the operation instruction to open the payment interface in the WeChat APP, the phone can automatically open the WeChat APP and display its payment interface.
In addition, because the phone has already determined that the user is a legitimate user, as shown in FIG. 5, if the phone is currently in the locked state, it can first unlock the screen and then execute the operation instruction to open the payment interface in the WeChat APP, displaying the payment interface 501 of the WeChat APP.
For example, the voice control method provided in steps S401-S407 above may be a function provided by a voice assistant APP. When the Bluetooth headset interacts with the phone, if voiceprint recognition determines that the current speaking user is a legitimate user, the phone can send data such as the generated operation instruction or the voice information to the voice assistant APP running at the application layer, which then calls the relevant interfaces or services of the application framework layer to execute the operation instruction corresponding to the voice information.
It can be seen that the voice control method provided in the embodiments of this application can, while using the voiceprint to identify the user, unlock the phone and execute the relevant operation instruction in the voice information. That is, the user only needs to input voice information once to complete a series of operations such as user identity authentication, phone unlocking, and opening a particular phone function, which greatly improves the user's efficiency in controlling the phone and the user experience.
In steps S401-S407 above, the mobile phone acts as the execution body for voiceprint recognition, user identity authentication, and related operations. It can be understood that some or all of the content of steps S401-S407 may also be completed by the Bluetooth headset, which can reduce the phone's implementation complexity and power consumption. As shown in FIG. 6, the voice control method may include:
S601. The mobile phone establishes a Bluetooth connection with the Bluetooth headset.
S602 (optional). The Bluetooth headset detects whether it is in the worn state.
S603. If the headset is in the worn state, the Bluetooth headset collects the first voice component of the voice information input by the user through the first voice sensor, and collects the second voice component of the voice information through the second voice sensor.
For the specific methods in steps S601-S603 by which the Bluetooth headset establishes a Bluetooth connection with the phone, detects whether the headset is in the worn state, and detects the first and second voice components of the voice information, refer to the relevant descriptions of steps S401-S403 above; details are not repeated here.
It should be noted that, after obtaining the first and second voice components, the Bluetooth headset can also perform operations such as VAD detection, denoising, or filtering on the detected first and second voice components; the embodiments of this application impose no limitation on this.
In some embodiments of this application, because the Bluetooth headset has an audio playback function, when the headset's speaker is working, the air conduction microphone and the bone conduction microphone on the headset may pick up echo signals of the audio being played. Therefore, after obtaining the first and second voice components, the headset can also use an echo cancellation algorithm (AEC, adaptive echo cancellation) to remove the echo signals from the first and second voice components, improving the accuracy of subsequent voiceprint recognition.
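A compact sketch of adaptive echo cancellation with a least-mean-squares (LMS) filter, assuming the headset has access to the playback signal as a reference; the filter length and step size are illustrative, and real headsets typically use more elaborate adaptive filters.

```python
import numpy as np

def lms_echo_cancel(mic: np.ndarray, playback: np.ndarray,
                    taps: int = 32, mu: float = 0.01) -> np.ndarray:
    """Adaptive echo cancellation via LMS: estimate the echo path from the
    playback reference and subtract the predicted echo from the mic signal.
    Assumes float signals and len(playback) >= len(mic)."""
    w = np.zeros(taps)                      # adaptive echo-path estimate
    out = np.zeros(len(mic))
    padded = np.concatenate([np.zeros(taps - 1), playback])
    for n in range(len(mic)):
        x = padded[n:n + taps][::-1]        # most recent playback samples
        echo_hat = w @ x                    # predicted echo at this sample
        e = mic[n] - echo_hat               # echo-cancelled output sample
        w += mu * e * x                     # LMS weight update
        out[n] = e
    return out
```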
S604. The Bluetooth headset performs voiceprint recognition on the first voice component and the second voice component separately, obtaining a first voiceprint recognition result corresponding to the first voice component and a second voiceprint recognition result corresponding to the second voice component.
Unlike steps S401-S407 above, in step S604 voiceprint models of one or more legitimate users may be pre-stored in the Bluetooth headset. In this way, after obtaining the first and second voice components, the headset can use its locally stored voiceprint models to perform voiceprint recognition on the first and second voice components. For the specific method by which the headset performs voiceprint recognition on each of the two components, refer to the specific method by which the phone does so in step S405 above; details are not repeated here.
S605. The Bluetooth headset authenticates the user's identity according to the first voiceprint recognition result and the second voiceprint recognition result.
For the process by which the Bluetooth headset authenticates the user's identity according to the first and second voiceprint recognition results, refer to the relevant description in step S406 above of how the mobile phone authenticates the user's identity according to the two results; details are not repeated here.
S606. If the user is a legitimate user, the Bluetooth headset sends the mobile phone the operation instruction corresponding to the voice information over the Bluetooth connection.
S607. The mobile phone executes the operation instruction.
If the Bluetooth headset determines that the speaking user who input the voice information is a legitimate user, the headset can generate the operation instruction corresponding to the voice information. For example, when the voice information is "Xiao E, pay with WeChat", the corresponding operation instruction is to open the payment interface of the WeChat APP. The headset can then send the phone, over the established Bluetooth connection, the operation instruction to open the payment interface in the WeChat APP; as shown in FIG. 5, after receiving the instruction, the phone can automatically open the WeChat APP and display the payment interface 501 of the WeChat APP.
In addition, because the Bluetooth headset has already determined that the user is a legitimate user, when the phone is in the locked state, the headset can also send the phone a message indicating that user identity authentication has passed, or an unlock instruction, so that the phone can first unlock the screen and then execute the operation instruction corresponding to the voice information. Of course, the headset can also send the collected voice information to the phone, which then generates the corresponding operation instruction from the voice information and executes it.
In some embodiments of this application, when the Bluetooth headset sends the phone the voice information or the corresponding operation instruction, it can also send the phone its own device identifier (for example, its MAC address). Because the phone stores the identifiers of legitimate Bluetooth devices that have passed authentication, the phone can determine from the received device identifier whether the currently connected headset is a legitimate Bluetooth device. If the headset is a legitimate Bluetooth device, the phone can proceed to execute the operation instruction sent by the headset, or perform operations such as speech recognition on the voice information sent by the headset; otherwise, the phone can discard the operation instruction sent by the headset, avoiding the security problems caused by an illegitimate Bluetooth device maliciously controlling the phone.
Alternatively, the phone and the legitimate Bluetooth device can agree in advance on a passphrase or password used when transmitting the operation instruction. In this way, when the headset sends the phone the voice information or the corresponding operation instruction, it can also send the phone the pre-agreed passphrase or password, allowing the phone to determine whether the currently connected headset is a legitimate Bluetooth device.
Alternatively, the phone and the legitimate Bluetooth device can agree in advance on the encryption and decryption algorithms used when transmitting the operation instruction. In this way, before sending the phone the voice information or the corresponding operation instruction, the headset can encrypt the operation instruction with the agreed encryption algorithm. After receiving the encrypted instruction, if the phone can decrypt the operation instruction with the agreed decryption algorithm, the currently connected headset is a legitimate Bluetooth device, and the phone can proceed to execute the operation instruction sent by the headset; otherwise, the currently connected headset is an illegitimate Bluetooth device, and the phone can discard the operation instruction it sent.
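The pre-agreed passphrase and encryption variants above could both be realized by authenticating each instruction with a shared key, for example with an HMAC as sketched below; the key provisioning and the message layout are assumptions for illustration.

```python
import hmac
import hashlib

SHARED_KEY = b"pre-agreed secret"  # provisioned during enrollment (assumed)

def sign_instruction(instruction: bytes, device_id: bytes) -> bytes:
    """Headset side: authenticate the instruction with the shared key."""
    return hmac.new(SHARED_KEY, device_id + instruction, hashlib.sha256).digest()

def verify_instruction(instruction: bytes, device_id: bytes,
                       tag: bytes, known_devices: set) -> bool:
    """Phone side: accept only instructions from known devices with valid tags."""
    if device_id not in known_devices:
        return False                      # not a legitimate Bluetooth device
    expected = hmac.new(SHARED_KEY, device_id + instruction,
                        hashlib.sha256).digest()
    return hmac.compare_digest(expected, tag)
```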
It should be noted that steps S401-S407 and steps S601-S607 above are merely two implementations of the voice control method provided in this application. It can be understood that a person skilled in the art can decide, according to actual application scenarios or practical experience, which steps in the above embodiments are performed by the Bluetooth headset and which are performed by the mobile phone; the embodiments of this application impose no limitation on this.
For example, after performing voiceprint recognition on the first and second voice components, the Bluetooth headset can also send the resulting first and second voiceprint recognition results to the phone, which then performs user identity authentication and related operations based on the voiceprint recognition results.
As another example, after obtaining the first and second voice components, the Bluetooth headset can first determine whether voiceprint recognition on the two components is needed. If voiceprint recognition on the two components is needed, the headset can send the first and second voice components to the phone, which then completes the subsequent voiceprint recognition, user identity authentication, and related operations; otherwise, the headset need not send the two components to the phone, avoiding the extra power the phone would consume processing them.
In addition, as shown in (a) of FIG. 7, the user can also enter the phone's settings interface 701 to enable or disable the voice control function above. If the user enables the voice control function, the user can set the keywords that trigger voice control, such as "Xiao E" or "pay", via settings button 702; manage legitimate users' voiceprint models, for example adding or deleting them, via settings button 703; and set the operation instructions that the voice assistant can support, such as payment, making calls, and ordering food, via settings button 704. In this way, the user can get a customized voice control experience.
In some embodiments of this application, an embodiment of this application discloses a terminal which, as shown in FIG. 8, is configured to implement the methods recorded in the method embodiments above and includes: a connection unit 801, an obtaining unit 802, a recognition unit 803, an authentication unit 804, and an execution unit 805. The connection unit 801 is configured to support the terminal in performing process S401 in FIG. 4 and process S601 in FIG. 6; the obtaining unit 802 supports the terminal in performing process S404 in FIG. 4 and process S606 in FIG. 6; the recognition unit 803 is configured to support the terminal in performing process S405 in FIG. 4; the authentication unit 804 is configured to support the terminal in performing process S406 in FIG. 4; and the execution unit 805 is configured to support the terminal in performing process S407 in FIG. 4 and process S607 in FIG. 6. All relevant content of the steps involved in the method embodiments above can be cited in the function descriptions of the corresponding functional modules; details are not repeated here.
In some embodiments of this application, an embodiment of this application discloses a wearable device which, as shown in FIG. 9, is configured to implement the methods recorded in the method embodiments above and includes: a connection unit 901, a detection unit 902, a sending unit 903, a recognition unit 904, and an authentication unit 905. The connection unit 901 is configured to support the wearable device in performing process S401 in FIG. 4 and process S601 in FIG. 6; the detection unit 902 is configured to support the wearable device in performing processes S402-S403 in FIG. 4 and processes S602-S603 in FIG. 6; the recognition unit 904 is configured to support the wearable device in performing process S604 in FIG. 6; the authentication unit 905 is configured to support the wearable device in performing process S605 in FIG. 6; and the sending unit 903 is configured to support the wearable device in performing process S404 in FIG. 4 and process S606 in FIG. 6. All relevant content of the steps involved in the method embodiments above can be cited in the function descriptions of the corresponding functional modules; details are not repeated here.
In some other embodiments of this application, an embodiment of this application discloses a terminal which, as shown in FIG. 10, may include: a touchscreen 1001, where the touchscreen 1001 includes a touch-sensitive surface 1006 and a display 1007; one or more processors 1002; a memory 1003; one or more application programs (not shown); and one or more computer programs 1004, where the foregoing components may be connected through one or more communication buses 1005. The one or more computer programs 1004 are stored in the memory 1003 and configured to be executed by the one or more processors 1002; the one or more computer programs 1004 include instructions, and the instructions may be used to perform the steps in FIG. 4, FIG. 6, and the corresponding embodiments.
From the description of the implementations above, a person skilled in the art can clearly understand that, for convenience and brevity of description, only the division of the functional modules above is used as an example. In actual applications, the functions above can be assigned to different functional modules as needed, that is, the internal structure of the apparatus is divided into different functional modules to complete all or some of the functions described above. For the specific working processes of the system, apparatus, and units described above, refer to the corresponding processes in the method embodiments above; details are not repeated here.
The functional units in the embodiments of this application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware, or may be implemented in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on such an understanding, the technical solutions of the embodiments of this application essentially, or the part contributing to the prior art, or all or some of the technical solutions, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for instructing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to perform all or some of the steps of the methods described in the embodiments of this application. The foregoing storage medium includes any medium that can store program code, such as a flash memory, a removable hard disk, a read-only memory, a random access memory, a magnetic disk, or an optical disc.
The foregoing descriptions are merely specific implementations of the embodiments of this application, but the protection scope of the embodiments of this application is not limited thereto. Any variation or replacement within the technical scope disclosed in the embodiments of this application shall fall within the protection scope of the embodiments of this application. Therefore, the protection scope of the embodiments of this application shall be subject to the protection scope of the claims.

Claims (22)

  1. A voice control method, characterized by comprising:
    when a speaking user inputs voice information into a wearable device, authenticating, by a terminal, the identity of the speaking user according to a first voiceprint recognition result of a first voice component in the voice information and a second voiceprint recognition result of a second voice component in the voice information, wherein the wearable device is communicatively connected to the terminal, the first voice component is collected by a first voice sensor of the wearable device, and the second voice component is collected by a second voice sensor of the wearable device;
    if the terminal's identity authentication result for the speaking user is that the speaking user is a legitimate user, executing, by the terminal, an operation instruction corresponding to the voice information.
  2. The voice control method according to claim 1, characterized in that, before the terminal authenticates the identity of the speaking user according to the first voiceprint recognition result of the first voice component in the voice information and the second voiceprint recognition result of the second voice component in the voice information, the method further comprises:
    obtaining, by the terminal from the wearable device, the first voiceprint recognition result and the second voiceprint recognition result, wherein the first voiceprint recognition result is obtained by the wearable device performing voiceprint recognition on the first voice component, and the second voiceprint recognition result is obtained by the wearable device performing voiceprint recognition on the second voice component.
  3. The voice control method according to claim 1, characterized in that, before the terminal authenticates the identity of the speaking user according to the first voiceprint recognition result of the first voice component in the voice information and the second voiceprint recognition result of the second voice component in the voice information, the method further comprises:
    obtaining, by the terminal from the wearable device, the first voice component and the second voice component;
    performing, by the terminal, voiceprint recognition on the first voice component and the second voice component separately, to obtain the first voiceprint recognition result corresponding to the first voice component and the second voiceprint recognition result corresponding to the second voice component.
  4. The voice control method according to claim 3, characterized in that the terminal performing voiceprint recognition on the first voice component and the second voice component separately comprises:
    when the voice information includes a preset keyword, performing, by the terminal, voiceprint recognition on the first voice component and the second voice component; or
    when a preset operation input by the user is received, performing, by the terminal, voiceprint recognition on the first voice component and the second voice component.
  5. The voice control method according to claim 3 or 4, characterized in that the terminal performing voiceprint recognition on the first voice component and the second voice component separately comprises:
    determining, by the terminal, whether the first voice component matches a first voiceprint model of a legitimate user, wherein the first voiceprint model reflects the audio features of the legitimate user collected by the first voice sensor;
    determining, by the terminal, whether the second voice component matches a second voiceprint model of the legitimate user, wherein the second voiceprint model reflects the audio features of the legitimate user collected by the second voice sensor;
    wherein the terminal authenticating the identity of the speaking user according to the first voiceprint recognition result of the first voice component in the voice information and the second voiceprint recognition result of the second voice component in the voice information comprises:
    if the first voice component matches the first voiceprint model of the legitimate user and the second voice component matches the second voiceprint model of the legitimate user, determining, by the terminal, that the speaking user is a legitimate user; otherwise, determining, by the terminal, that the speaking user is an illegitimate user.
  6. The voice control method according to claim 5, characterized in that the terminal determining whether the first voice component matches the first voiceprint model of the legitimate user comprises:
    computing, by the terminal, a first matching degree between the first voice component and the first voiceprint model of the legitimate user;
    if the first matching degree is greater than a first threshold, determining, by the terminal, that the first voice component matches the first voiceprint model of the legitimate user;
    wherein the terminal determining whether the second voice component matches the second voiceprint model of the legitimate user comprises:
    computing, by the terminal, a second matching degree between the second voice component and the second voiceprint model of the legitimate user;
    if the second matching degree is greater than a second threshold, determining, by the terminal, that the second voice component matches the second voiceprint model of the legitimate user.
  7. The voice control method according to any one of claims 1-6, characterized in that, before the terminal authenticates the identity of the speaking user according to the first voiceprint recognition result of the first voice component in the voice information and the second voiceprint recognition result of the second voice component in the voice information, the method further comprises:
    obtaining, by the terminal, a start instruction sent by the wearable device, wherein the start instruction is generated by the wearable device in response to a wake-up utterance input by the user;
    in response to the start instruction, enabling, by the terminal, a voiceprint recognition function.
  8. The voice control method according to any one of claims 3-6, characterized in that, after the terminal obtains the first voice component and the second voice component from the wearable device, the method further comprises:
    determining, by the terminal according to the first voice component and the second voice component, whether the voice information contains a preset wake-up word;
    if the preset wake-up word is contained, enabling, by the terminal, a voiceprint recognition function.
  9. The voice control method according to any one of claims 1-8, characterized in that, if the speaking user is a legitimate user, the method further comprises:
    automatically performing, by the terminal, an unlock operation.
  10. The voice control method according to any one of claims 1-9, characterized in that, before the terminal executes the operation instruction corresponding to the voice information, the method further comprises:
    obtaining, by the terminal, a device identifier of the wearable device;
    wherein the terminal executing the operation instruction corresponding to the voice information comprises:
    if the device identifier of the wearable device is a preset legitimate device identifier, executing, by the terminal, the operation instruction corresponding to the voice information.
  11. A terminal, characterized by comprising:
    a touchscreen, wherein the touchscreen comprises a touch-sensitive surface and a display;
    one or more processors;
    one or more memories;
    and one or more computer programs, wherein the one or more computer programs are stored in the one or more memories, the one or more computer programs comprise instructions, and when the instructions are executed by the terminal, the terminal is caused to perform the following steps:
    when a speaking user inputs voice information into a wearable device, authenticating the identity of the speaking user according to a first voiceprint recognition result of a first voice component in the voice information and a second voiceprint recognition result of a second voice component in the voice information, wherein the wearable device is communicatively connected to the terminal, the first voice component is collected by a first voice sensor of the wearable device, and the second voice component is collected by a second voice sensor of the wearable device;
    if the identity authentication result for the speaking user is that the speaking user is a legitimate user, executing an operation instruction corresponding to the voice information.
  12. The terminal according to claim 11, characterized in that, before the terminal authenticates the identity of the speaking user according to the first voiceprint recognition result of the first voice component in the voice information and the second voiceprint recognition result of the second voice component in the voice information, the terminal is further configured to perform:
    obtaining, from the wearable device, the first voiceprint recognition result and the second voiceprint recognition result, wherein the first voiceprint recognition result is obtained by the wearable device performing voiceprint recognition on the first voice component, and the second voiceprint recognition result is obtained by the wearable device performing voiceprint recognition on the second voice component.
  13. The terminal according to claim 11, characterized in that, before the terminal authenticates the identity of the speaking user according to the first voiceprint recognition result of the first voice component in the voice information and the second voiceprint recognition result of the second voice component in the voice information, the terminal is further configured to perform:
    obtaining, from the wearable device, the first voice component and the second voice component;
    performing voiceprint recognition on the first voice component and the second voice component separately, to obtain the first voiceprint recognition result corresponding to the first voice component and the second voiceprint recognition result corresponding to the second voice component.
  14. The terminal according to claim 13, characterized in that the terminal performing voiceprint recognition on the first voice component and the second voice component separately specifically comprises:
    when the voice information includes a preset keyword, performing voiceprint recognition on the first voice component and the second voice component; or
    when a preset operation input by the user is received, performing voiceprint recognition on the first voice component and the second voice component.
  15. The terminal according to claim 13 or 14, characterized in that the terminal performing voiceprint recognition on the first voice component and the second voice component separately specifically comprises:
    determining whether the first voice component matches a first voiceprint model of a legitimate user, wherein the first voiceprint model reflects the audio features of the legitimate user collected by the first voice sensor;
    determining whether the second voice component matches a second voiceprint model of the legitimate user, wherein the second voiceprint model reflects the audio features of the legitimate user collected by the second voice sensor;
    wherein the terminal authenticating the identity of the speaking user according to the first voiceprint recognition result of the first voice component in the voice information and the second voiceprint recognition result of the second voice component in the voice information specifically comprises:
    if the first voice component matches the first voiceprint model of the legitimate user and the second voice component matches the second voiceprint model of the legitimate user, determining that the speaking user is a legitimate user; otherwise, determining that the speaking user is an illegitimate user.
  16. The terminal according to claim 15, characterized in that the terminal determining whether the first voice component matches the first voiceprint model of the legitimate user specifically comprises:
    computing a first matching degree between the first voice component and the first voiceprint model of the legitimate user;
    if the first matching degree is greater than a first threshold, determining that the first voice component matches the first voiceprint model of the legitimate user;
    and the terminal determining whether the second voice component matches the second voiceprint model of the legitimate user specifically comprises:
    computing a second matching degree between the second voice component and the second voiceprint model of the legitimate user;
    if the second matching degree is greater than a second threshold, determining that the second voice component matches the second voiceprint model of the legitimate user.
  17. The terminal according to any one of claims 11-16, characterized in that, before the terminal authenticates the identity of the speaking user according to the first voiceprint recognition result of the first voice component in the voice information and the second voiceprint recognition result of the second voice component in the voice information, the terminal is further configured to perform:
    obtaining a start instruction sent by the wearable device, wherein the start instruction is generated by the wearable device in response to a wake-up utterance input by the user;
    in response to the start instruction, enabling a voiceprint recognition function.
  18. The terminal according to any one of claims 13-16, characterized in that, after the terminal obtains the first voice component and the second voice component from the wearable device, the terminal is further configured to perform:
    determining, according to the first voice component and the second voice component, whether the voice information contains a preset wake-up word;
    if the preset wake-up word is contained, enabling a voiceprint recognition function.
  19. The terminal according to any one of claims 11-18, characterized in that, if the speaking user is a legitimate user, the terminal is further configured to perform:
    automatically performing an unlock operation.
  20. The terminal according to any one of claims 11-19, characterized in that, before the terminal executes the operation instruction corresponding to the voice information, the terminal is further configured to perform:
    obtaining a device identifier of the wearable device;
    wherein the terminal executing the operation instruction corresponding to the voice information specifically comprises:
    if the device identifier of the wearable device is a preset legitimate device identifier, executing the operation instruction corresponding to the voice information.
  21. A computer-readable storage medium storing instructions, characterized in that, when the instructions are run on a terminal, the terminal is caused to perform the voice control method according to any one of claims 1-10.
  22. A computer program product containing instructions, characterized in that, when the computer program product is run on a terminal, the terminal is caused to perform the voice control method according to any one of claims 1-10.
PCT/CN2018/093829 2018-06-29 2018-06-29 Voice control method, wearable device and terminal WO2020000427A1 (zh)

Legal Events

121 (EP): The EPO has been informed by WIPO that EP was designated in this application. Ref document number: 18924696; country of ref document: EP; kind code of ref document: A1.
ENP (EP): Entry into the national phase. Ref document number: 2018924696; country of ref document: EP; effective date: 2020-12-04.
ENP (KR): Entry into the national phase. Ref document number: 20207037501; country of ref document: KR; kind code of ref document: A.
NENP (DE): Non-entry into the national phase.
WWE (RU): WIPO information: entry into national phase. Ref document number: 2021101686; country of ref document: RU.