CN114360546A - Electronic equipment and awakening method thereof - Google Patents


Info

Publication number
CN114360546A
Authority
CN
China
Prior art keywords
voiceprint
sound
electronic equipment
confidence
awakening
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011063583.4A
Other languages
Chinese (zh)
Inventor
孙渊
屈伸
许天亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN202011063583.4A
Priority to PCT/CN2021/120305 (WO2022068694A1)
Publication of CN114360546A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22 Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L15/26 Speech to text systems
    • G10L17/00 Speaker identification or verification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04W WIRELESS COMMUNICATION NETWORKS
    • H04W4/00 Services specially adapted for wireless communication networks; Facilities therefor
    • H04W4/02 Services making use of location information
    • H04W4/021 Services related to particular areas, e.g. point of interest [POI] services, venue services or geofences

Abstract

The present application relates to an electronic device and a wake-up method thereof. The method includes: receiving a sound and calculating the wake-up word confidence of the sound; if the wake-up word confidence is greater than or equal to a first threshold, calculating the sound source orientation of the sound and determining whether it matches an orientation in a first orientation set or in a second orientation set; if the sound source orientation matches a first orientation in the first orientation set, determining whether to wake up the electronic device according to the first orientation confidence corresponding to the matched first orientation; and if the sound source orientation matches a second orientation in the second orientation set, determining whether to wake up the electronic device according to the second orientation confidence corresponding to the matched second orientation. The present application can reduce the probability of false wake-ups of the electronic device and improve the user experience.

Description

Electronic equipment and awakening method thereof
Technical Field
The present application relates to the field of terminal technologies, and in particular, to an electronic device and a wake-up method thereof.
Background
An electronic device can perform functions through voice interaction with a user. Such an electronic device includes a sound pickup (e.g., a microphone array) and a loudspeaker, and therefore has both sound pickup and playback functions; examples include smart speakers, smartphones, and smart televisions. Before a user can interact with the electronic device by voice, the electronic device needs to be woken up, which switches it from a standby state to an operating state. Generally, the electronic device determines whether to wake up by recognizing whether the received sound includes a preset wake-up word.
Take a smart speaker whose wake-up word is "Xiaoyi Xiaoyi" as an example. If the user utters a sound containing "Xiaoyi Xiaoyi", the smart speaker detects the wake-up word in the received sound and wakes up. The smart speaker may also play a wake-up response to continue the voice interaction with the user, for example, "Xiaoyi is here, what can I do for you?" However, in some scenarios a user or another device makes a sound that does not include "Xiaoyi Xiaoyi", and the smart speaker is nevertheless woken up by mistake. For example, while the user is watching television, the user does not say "Xiaoyi Xiaoyi" and the sound from the television does not contain it either, yet the smart speaker is falsely woken up. This disturbs the user's daily life and forces the user to turn off the smart speaker manually, resulting in a poor user experience.
Disclosure of Invention
To solve the above technical problems in the prior art, the present application provides an electronic device and a wake-up method thereof, which can reduce the false wake-up rate of the electronic device and improve the user experience.
In a first aspect, a wake-up method is provided. The method is applied to an electronic device that includes a sound pickup and a loudspeaker, the sound pickup including a plurality of microphones. The method includes: receiving a sound; calculating a wake-up word confidence of the sound, where the wake-up word confidence represents the probability that the sound includes the wake-up word; after the wake-up word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound; and, after the sound source orientation matches a first orientation in a first orientation set, waking up the electronic device if the first orientation confidence corresponding to the matched first orientation is greater than or equal to a third threshold, or not waking up the electronic device if that first orientation confidence is less than the third threshold. The wake-up word is used to wake up the electronic device. The sound source orientation is the direction and position of the sound source relative to the electronic device. The first orientation set includes M first orientation elements, each including a first orientation and a first orientation confidence. A first orientation is the direction and position, relative to the electronic device, of a sound source that woke up the electronic device, and indicates that the electronic device was woken up from that orientation; the first orientation confidence represents the probability that the electronic device is woken up from the first orientation. M is a positive integer greater than or equal to 1.
Therefore, when the wake-up word confidence of the sound is greater than or equal to the first threshold, possible false wake-ups are further screened out using the first orientation set, which reduces the probability of false wake-ups of the electronic device and improves the user experience.
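As an illustrative sketch only (not the patented implementation), the decision rule of the first aspect can be written in Python as follows. The function name `decide_first_aspect`, the callable that returns the matched first orientation confidence (or `None` when no first orientation matches), and the tri-state return value are all hypothetical. A callable is passed so that sound source localization runs only after the wake-up word confidence has reached the first threshold, as the text describes.

```python
from typing import Callable, Optional


def decide_first_aspect(wake_word_conf: float,
                        match_first_orientation: Callable[[], Optional[float]],
                        first_threshold: float,
                        third_threshold: float) -> Optional[bool]:
    """Return True (wake), False (do not wake) or None (not decided by this rule).

    match_first_orientation is a hypothetical hook that localizes the sound source
    and returns the first orientation confidence of the matched element, or None.
    """
    if wake_word_conf < first_threshold:
        # The method goes no further below the first threshold, so the device stays asleep.
        return False
    matched_conf = match_first_orientation()  # localization happens only at this point
    if matched_conf is None:
        return None  # no matching first orientation: a later stage must decide
    return matched_conf >= third_threshold
```

Returning `None` rather than `False` keeps "rejected" distinct from "not covered by this rule", which leaves room for the voiceprint fallback described further on.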
According to the first aspect, calculating the sound source orientation of the sound after the wake-up word confidence is greater than or equal to the first threshold includes: calculating the sound source orientation after the wake-up word confidence is greater than or equal to the first threshold and less than a second threshold. By setting the second threshold, only sounds whose wake-up word confidence lies between the first threshold and the second threshold go through the orientation check, which improves the processing efficiency of the electronic device.
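The two thresholds split the wake-up word confidence into three bands. A minimal Python sketch, assuming a hypothetical `Gate` enum and `gate_wake_word` helper; the direct wake-up at or above the second threshold follows the implementation described later in this aspect.

```python
from enum import Enum


class Gate(Enum):
    REJECT = "reject"             # below the first threshold: do not wake
    CHECK_ORIENTATION = "check"   # between the thresholds: localize and consult the sets
    WAKE = "wake"                 # at or above the second threshold: wake directly


def gate_wake_word(conf: float, first_threshold: float, second_threshold: float) -> Gate:
    """Classify a wake-up word confidence into one of three processing bands."""
    if first_threshold > second_threshold:
        raise ValueError("the first threshold must not exceed the second threshold")
    if conf < first_threshold:
        return Gate.REJECT
    if conf < second_threshold:
        return Gate.CHECK_ORIENTATION
    return Gate.WAKE
```

Only the middle band pays for sound source localization, which is where the efficiency gain described above comes from.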
According to the first aspect or any implementation of the first aspect above, the sound source orientation matching a first orientation in the first orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that first orientation relative to the electronic device is within a preset fourth threshold; and the deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within a preset fifth threshold.
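The two-part matching test can be sketched in Python as below. The encoding of an orientation as an azimuth in degrees plus a range in metres is an assumption, as is measuring the position deviation as the straight-line distance between the two source positions; the text only requires both deviations to fall within preset thresholds. The names `Orientation`, `matches` and `find_first_match` are hypothetical.

```python
import math
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Orientation:
    azimuth_deg: float  # direction of the source relative to the device (degrees)
    distance_m: float   # range from the device (metres), a hypothetical position encoding


def angular_deviation_deg(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths, handling the 0/360 wrap-around."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def position_offset_m(src: Orientation, ref: Orientation) -> float:
    """Straight-line distance between the two source positions (law of cosines)."""
    theta = math.radians(angular_deviation_deg(src.azimuth_deg, ref.azimuth_deg))
    sq = (src.distance_m ** 2 + ref.distance_m ** 2
          - 2.0 * src.distance_m * ref.distance_m * math.cos(theta))
    return math.sqrt(max(sq, 0.0))  # clamp tiny negative rounding errors


def matches(src: Orientation, ref: Orientation,
            fourth_threshold_deg: float, fifth_threshold_m: float) -> bool:
    """Both the direction and the position must be within their preset thresholds."""
    return (angular_deviation_deg(src.azimuth_deg, ref.azimuth_deg) <= fourth_threshold_deg
            and position_offset_m(src, ref) <= fifth_threshold_m)


def find_first_match(src: Orientation, first_orientations: List[Orientation],
                     fourth_threshold_deg: float, fifth_threshold_m: float) -> Optional[int]:
    """Index of the first matching first orientation, or None if none matches."""
    for i, ref in enumerate(first_orientations):
        if matches(src, ref, fourth_threshold_deg, fifth_threshold_m):
            return i
    return None
```

The wrap-around handling matters in practice: a source at 10° and a stored orientation at 350° are only 20° apart, not 340°.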
According to the first aspect or any implementation of the first aspect above, after the sound source orientation matches no first orientation in the first orientation set, a voiceprint is extracted from the sound; after the voiceprint matches a first voiceprint in a first voiceprint set, the electronic device is woken up if the first voiceprint confidence corresponding to that first voiceprint is greater than or equal to a preset sixth threshold, or is not woken up if that first voiceprint confidence is less than the sixth threshold. The first voiceprint set includes L voiceprint elements, each including a first voiceprint and a first voiceprint confidence. A first voiceprint represents a voiceprint that woke up the electronic device, and the first voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device; L is a positive integer greater than or equal to 1. Therefore, if a possible false wake-up cannot be screened out using the first orientation set, it is further screened out using the first voiceprint set, which reduces the probability of false wake-ups of the electronic device and improves the user experience.
According to the first aspect or any implementation of the first aspect above, after the electronic device is woken up, the method further includes: updating the first orientation set and the first voiceprint set.
According to the first aspect or any implementation of the first aspect above, after the wake-up word confidence is greater than or equal to the second threshold, the electronic device is woken up, and the first orientation set and the first voiceprint set are updated.
According to the first aspect or any implementation of the first aspect above, waking up the electronic device and updating the first orientation set and the first voiceprint set after the wake-up word confidence is greater than or equal to the second threshold includes: after the wake-up word confidence is greater than or equal to the second threshold, waking up the electronic device and creating the first orientation set and the first voiceprint set; adding the orientation of the sound that woke up the electronic device to the first orientation set and the voiceprint of that sound to the first voiceprint set; and assigning an initial orientation confidence to the orientation added to the first orientation set and an initial voiceprint confidence to the voiceprint added to the first voiceprint set.
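The voiceprint fallback can be sketched as follows. The text does not say how a voiceprint is represented or matched, so this sketch assumes, hypothetically, that a voiceprint is a fixed-length speaker embedding and that two voiceprints match when their cosine similarity reaches a `match_similarity` value; `VoiceprintElement` and `voiceprint_fallback` are illustrative names.

```python
import math
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class VoiceprintElement:
    embedding: List[float]  # hypothetical fixed-length speaker embedding
    confidence: float       # first voiceprint confidence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def voiceprint_fallback(embedding: Sequence[float],
                        first_voiceprints: List[VoiceprintElement],
                        sixth_threshold: float,
                        match_similarity: float = 0.8) -> Optional[bool]:
    """Decide from the first voiceprint set once no first orientation matched.

    Returns True (wake), False (do not wake) or None (no first voiceprint matched,
    a case this implementation of the text leaves open).
    """
    best: Optional[VoiceprintElement] = None
    best_sim = match_similarity
    for element in first_voiceprints:
        sim = cosine_similarity(embedding, element.embedding)
        if sim >= best_sim:  # keep the most similar voiceprint above the match bar
            best, best_sim = element, sim
    if best is None:
        return None
    return best.confidence >= sixth_threshold
```

Picking the most similar enrolled voiceprint, rather than the first one above the bar, avoids order-dependent results when two household members have similar voices.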
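The enrollment step for a high-confidence wake-up can be sketched as below. The `Entry` and `WakeSets` containers, the tuple encoding of orientations and voiceprints, and the default initial confidences of 0.5 are all hypothetical; the text only states that the sets are created and that initial confidences are assigned.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Sequence, Tuple


@dataclass
class Entry:
    value: Tuple[float, ...]  # an orientation or a voiceprint, in a hypothetical encoding
    confidence: float


@dataclass
class WakeSets:
    first_orientations: List[Entry] = field(default_factory=list)
    first_voiceprints: List[Entry] = field(default_factory=list)


def wake_and_enroll(sets: Optional[WakeSets],
                    orientation: Sequence[float],
                    voiceprint: Sequence[float],
                    initial_orientation_conf: float = 0.5,
                    initial_voiceprint_conf: float = 0.5) -> WakeSets:
    """Called after the device was woken because the wake-up word confidence
    reached the second threshold: record where the sound came from and who spoke."""
    if sets is None:
        sets = WakeSets()  # create the first orientation set and first voiceprint set
    sets.first_orientations.append(Entry(tuple(orientation), initial_orientation_conf))
    sets.first_voiceprints.append(Entry(tuple(voiceprint), initial_voiceprint_conf))
    return sets
```

Only confident wake-ups seed the sets in this sketch, so the sets start from examples that are unlikely to be false wake-ups.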
In a second aspect, a wake-up method is provided. The method is applied to an electronic device that includes a sound pickup and a loudspeaker, the sound pickup including a plurality of microphones. The method includes: receiving a sound; calculating a wake-up word confidence of the sound, where the wake-up word confidence represents the probability that the sound includes the wake-up word; after the wake-up word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound; and, after the sound source orientation matches a second orientation in a second orientation set, waking up the electronic device if the second orientation confidence corresponding to the matched second orientation is greater than or equal to a seventh threshold, or not waking up the electronic device if that second orientation confidence is less than the seventh threshold. The wake-up word is used to wake up the electronic device. The sound source orientation is the direction and position of the sound source relative to the electronic device. The second orientation set includes N second orientation elements, each including a second orientation and a second orientation confidence. A second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up from that orientation; the second orientation confidence represents the probability that the electronic device is not woken up from the second orientation. N is a positive integer greater than or equal to 1.
Therefore, when the wake-up word confidence of the sound is greater than or equal to the first threshold, possible false wake-ups are further screened out using the second orientation set, which reduces the probability of false wake-ups of the electronic device and improves the user experience.
According to the second aspect, the sound source orientation matching a second orientation in the second orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that second orientation relative to the electronic device is within a preset eighth threshold; and the deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset ninth threshold.
According to the second aspect or any implementation of the second aspect above, the method further includes: extracting a voiceprint from the sound after the sound source orientation matches no second orientation in the second orientation set; and updating the second orientation set after the voiceprint matches no first voiceprint in a first voiceprint set. The first voiceprint set includes L voiceprint elements, each including a first voiceprint and a first voiceprint confidence; a first voiceprint represents a voiceprint that woke up the electronic device, and the first voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device; L is a positive integer greater than or equal to 1. Therefore, if a possible false wake-up cannot be screened out using the second orientation set, it is further screened out using the first voiceprint set, which reduces the probability of false wake-ups of the electronic device and improves the user experience.
According to the second aspect or any implementation of the second aspect above, the method further includes: extracting a voiceprint from the sound after the sound source orientation matches no second orientation in the second orientation set; and, after the voiceprint matches a first voiceprint in the first voiceprint set, waking up the electronic device if the first voiceprint confidence corresponding to that first voiceprint is greater than or equal to a preset tenth threshold, or, if that first voiceprint confidence is less than the tenth threshold, not waking up the electronic device and updating the second orientation set. The first voiceprint set includes L voiceprint elements, each including a first voiceprint and a first voiceprint confidence; the first voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device, and a first voiceprint represents a voiceprint that woke up the electronic device; L is a positive integer greater than or equal to 1.
According to the second aspect or any implementation of the second aspect above, after the electronic device is woken up, the method further includes: updating the first voiceprint set; and after the electronic device is not woken up, the method further includes: updating the second orientation set.
According to the second aspect or any implementation of the second aspect above, after the wake-up word confidence is greater than or equal to the second threshold, the electronic device is woken up and the first voiceprint set is updated. By setting the second threshold, only sounds whose wake-up word confidence lies between the first threshold and the second threshold go through the orientation check, which improves the processing efficiency of the electronic device.
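The voiceprint fallback of the second aspect differs from the first aspect in that a rejection is remembered in the second orientation set. A Python sketch under stated assumptions: the tuple orientation encoding, the `SecondOrientationSet.record_rejection` update rule with an initial confidence of 0.5, and treating "no first voiceprint matched" as "do not wake" are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Orientation = Tuple[float, float]  # hypothetical (azimuth_deg, distance_m) encoding


@dataclass
class SecondOrientationSet:
    entries: List[Tuple[Orientation, float]] = field(default_factory=list)

    def record_rejection(self, source: Orientation, initial_conf: float = 0.5) -> None:
        """Hypothetical update rule: remember where a non-wake-up sound came from."""
        self.entries.append((source, initial_conf))


def voiceprint_fallback_second_aspect(source: Orientation,
                                      matched_voiceprint_conf: Optional[float],
                                      tenth_threshold: float,
                                      second_set: SecondOrientationSet) -> bool:
    """Run after the sound source orientation matched no second orientation.

    matched_voiceprint_conf is the first voiceprint confidence of the matched first
    voiceprint, or None if the extracted voiceprint matched no first voiceprint.
    Returns True to wake the device, False otherwise.
    """
    if matched_voiceprint_conf is not None and matched_voiceprint_conf >= tenth_threshold:
        return True
    # Below the tenth threshold, or no first voiceprint matched at all: the text
    # updates the second orientation set (treating the no-match case as
    # "do not wake" is an assumption of this sketch).
    second_set.record_rejection(source)
    return False
```

Over time, directions such as the television's position accumulate in the second orientation set, so later sounds from there can be judged by the orientation check alone.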
In a third aspect, a wake-up method is provided. The method is applied to an electronic device that includes a sound pickup and a loudspeaker, the sound pickup including a plurality of microphones. The method includes: receiving a sound; calculating a wake-up word confidence of the sound, where the wake-up word confidence represents the probability that the sound includes the wake-up word; after the wake-up word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound; and, after the sound source orientation matches a second orientation in a second orientation set and matches no first orientation in a first orientation set, waking up the electronic device if the second orientation confidence corresponding to the matched second orientation is greater than or equal to an eleventh threshold, or not waking up the electronic device if that second orientation confidence is less than the eleventh threshold.
The wake-up word is used to wake up the electronic device. The sound source orientation is the direction and position of the sound source relative to the electronic device. The first orientation set includes M first orientation elements, each including a first orientation and a first orientation confidence; a first orientation is the direction and position, relative to the electronic device, of a sound source that woke up the electronic device, and indicates that the electronic device was woken up from that orientation; the first orientation confidence represents the probability that the electronic device is woken up from the first orientation. The second orientation set includes N second orientation elements, each including a second orientation and a second orientation confidence; a second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up from that orientation; the second orientation confidence represents the confidence that the electronic device is not woken up from the second orientation. M and N are positive integers greater than or equal to 1. Therefore, when the wake-up word confidence of the sound is greater than or equal to the first threshold, possible false wake-ups are further screened out using both the first orientation set and the second orientation set, which reduces the probability of false wake-ups of the electronic device and improves the user experience.
According to the third aspect, the sound source orientation matching a second orientation in the second orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that second orientation relative to the electronic device is within a preset twelfth threshold, and the deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset thirteenth threshold. The sound source orientation matching no first orientation in the first orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of any first orientation in the first orientation set relative to the electronic device is not within a preset fourteenth threshold, and the deviation between the position of the sound source orientation relative to the electronic device and the position of any first orientation in the first orientation set relative to the electronic device is not within a preset fifteenth threshold.
According to the third aspect or any implementation of the third aspect above, the method further includes: after the sound source orientation matches a first orientation in the first orientation set and matches no second orientation in the second orientation set, waking up the electronic device if the first orientation confidence corresponding to the matched first orientation is greater than or equal to a sixteenth threshold, or not waking up the electronic device if that first orientation confidence is less than the sixteenth threshold.
According to the third aspect or any implementation of the third aspect above, the sound source orientation matching a first orientation in the first orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that first orientation relative to the electronic device is within the preset fourteenth threshold, and the deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within the preset fifteenth threshold. The sound source orientation matching no second orientation in the second orientation set includes: the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of any second orientation in the second orientation set relative to the electronic device is not within the preset twelfth threshold, and the deviation between the position of the sound source orientation relative to the electronic device and the position of any second orientation in the second orientation set relative to the electronic device is not within the preset thirteenth threshold.
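The third-aspect decision, which consults both orientation sets, can be sketched as a tri-state function. This is a simplification: "matches no element of a set" is modeled as the matched confidence being `None`, whereas the text defines non-matching with its own angular and positional thresholds. The function name `decide_third_aspect` is hypothetical, and the case where both sets match is left undecided because the text does not specify it.

```python
from typing import Optional


def decide_third_aspect(first_match_conf: Optional[float],
                        second_match_conf: Optional[float],
                        eleventh_threshold: float,
                        sixteenth_threshold: float) -> Optional[bool]:
    """Tri-state decision once the wake-up word confidence has reached the first threshold.

    Each *_match_conf is the confidence of the matched element in that orientation
    set, or None if the sound source orientation matched no element of that set.
    """
    if second_match_conf is not None and first_match_conf is None:
        # Matched only the second orientation set: compare with the eleventh threshold.
        return second_match_conf >= eleventh_threshold
    if first_match_conf is not None and second_match_conf is None:
        # Matched only the first orientation set: compare with the sixteenth threshold.
        return first_match_conf >= sixteenth_threshold
    # Matched neither set (the text falls back to voiceprints) or both sets
    # (a case the text does not specify): leave the decision to a later stage.
    return None
```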
According to the third aspect or any implementation of the third aspect above, the method further includes: extracting a voiceprint from the sound after the sound source orientation matches no second orientation in the second orientation set and no first orientation in the first orientation set; after the voiceprint matches a first voiceprint in the first voiceprint set and the corresponding first voiceprint confidence is greater than or equal to a preset sixteenth threshold, waking up the electronic device and updating the first orientation set and the first voiceprint set; or, after that first voiceprint confidence is less than the preset sixteenth threshold, not waking up the electronic device and updating the second orientation set. The first voiceprint set includes L voiceprint elements, each including a first voiceprint and a first voiceprint confidence; the first voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device, and a first voiceprint represents a voiceprint that woke up the electronic device; L is a positive integer greater than or equal to 1. Therefore, if a possible false wake-up cannot be screened out using the first orientation set and the second orientation set, it is further screened out using the first voiceprint set, which reduces the probability of false wake-ups of the electronic device and improves the user experience.
According to the third aspect or any implementation of the third aspect above, the method further includes: updating the second orientation set after the voiceprint matches no first voiceprint in the first voiceprint set.
According to the third aspect or any implementation of the third aspect above, the method further includes: updating the first orientation set after the electronic device is woken up; and updating the second orientation set after the electronic device is not woken up.
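The text does not say how the orientation sets are updated. One plausible rule, sketched here purely as an assumption, is to raise the confidence of a matching entry by a fixed step (capped at 1.0) or otherwise add a new entry with an initial confidence, applying it to the first orientation set after a wake-up and to the second orientation set after a rejection. `OrientationSet`, `reinforce`, `update_after_decision`, the step of 0.1 and the initial confidence of 0.5 are hypothetical; `same_place` stands for whichever threshold-based match test is in use.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple

Orientation = Tuple[float, float]  # hypothetical (azimuth_deg, distance_m) encoding
SamePlace = Callable[[Orientation, Orientation], bool]


@dataclass
class OrientationSet:
    entries: List[Tuple[Orientation, float]] = field(default_factory=list)

    def reinforce(self, source: Orientation, same_place: SamePlace,
                  initial_conf: float = 0.5, step: float = 0.1) -> None:
        """Hypothetical update rule: raise a matching entry's confidence, else add one."""
        for i, (orientation, conf) in enumerate(self.entries):
            if same_place(source, orientation):
                self.entries[i] = (orientation, min(1.0, conf + step))
                return
        self.entries.append((source, initial_conf))


def update_after_decision(woke: bool, source: Orientation,
                          first_set: OrientationSet, second_set: OrientationSet,
                          same_place: SamePlace) -> None:
    """Wake-ups update the first orientation set; rejections update the second."""
    (first_set if woke else second_set).reinforce(source, same_place)
```

Capping the confidence at 1.0 keeps the values interpretable as probabilities, matching how the text describes the orientation confidences.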
In a fourth aspect, an electronic device is provided. The electronic device includes a sound pickup and a loudspeaker, the sound pickup including a plurality of microphones, and further includes: a processor; a memory; and a computer program stored in the memory. When executed by the processor, the computer program causes the electronic device to perform the method according to the first aspect or any implementation of the first aspect, the second aspect or any implementation of the second aspect, or the third aspect or any implementation of the third aspect.
For the technical effects of the fourth aspect and any implementation of the fourth aspect, refer to the technical effects of the first aspect, the second aspect, the third aspect, and their respective implementations. Details are not described herein again.
In a fifth aspect, a computer-readable storage medium is provided. The computer-readable storage medium includes a computer program that, when run on an electronic device, causes the electronic device to perform the method according to the first aspect or any implementation of the first aspect, the second aspect or any implementation of the second aspect, or the third aspect or any implementation of the third aspect. The electronic device includes a sound pickup and a loudspeaker, and the sound pickup includes a plurality of microphones.
For the technical effects of the fifth aspect and any implementation of the fifth aspect, refer to the technical effects of the first aspect, the second aspect, the third aspect, and their respective implementations. Details are not described herein again.
In a sixth aspect, a computer program product is provided. When run on a computer, the computer program product causes the computer to perform the method according to the first aspect or any implementation of the first aspect, the second aspect or any implementation of the second aspect, or the third aspect or any implementation of the third aspect.
For the technical effects of the sixth aspect and any implementation of the sixth aspect, refer to the technical effects of the first aspect, the second aspect, the third aspect, and their respective implementations. Details are not described herein again.
Drawings
Fig. 1 is a schematic diagram of the hardware structure of an electronic device according to an embodiment of the present application;
Fig. 2 is a schematic diagram of the software structure of an electronic device according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a scenario of a wake-up method according to an embodiment of the present application;
Fig. 4 is a schematic diagram of a graphical user interface for user settings in the wake-up method according to an embodiment of the present application;
Fig. 5 is a flowchart of an embodiment of a wake-up method according to an embodiment of the present application;
Fig. 6 is a flowchart of another embodiment of a wake-up method according to an embodiment of the present application;
Fig. 7 is a flowchart of a wake-up method according to another embodiment of the present application;
Fig. 8 is a flowchart of a wake-up method according to another embodiment of the present application;
Fig. 9 is a flowchart of a wake-up method according to another embodiment of the present application;
Fig. 10 is a flowchart of a wake-up method according to another embodiment of the present application;
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The terminology used in the following embodiments is for the purpose of describing particular embodiments only and is not intended to limit the present application. As used in the specification and the appended claims of this application, the singular forms "a", "an", and "the" are intended to include plural forms such as "one or more" as well, unless the context clearly indicates otherwise. It should also be understood that in the following embodiments of the present application, "at least one" and "one or more" mean one, two, or more than two. The term "and/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may represent: A alone, both A and B, or B alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects before and after it.
Reference throughout this specification to "one embodiment" or "some embodiments," or the like, means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the present application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," or the like, in various places throughout this specification are not necessarily all referring to the same embodiment, but rather "one or more but not all embodiments" unless specifically stated otherwise. The terms "comprising," "including," "having," and variations thereof mean "including, but not limited to," unless expressly specified otherwise. The term "coupled" includes both direct and indirect connections, unless otherwise noted.
The terminology used in the description of the embodiments section of the present application is for the purpose of describing particular embodiments of the present application only and is not intended to be limiting of the present application.
In one approach, the probability of false wake-ups is reduced by optimizing the wake word model preset in the electronic device. The main function of the wake word model is to detect the wake word in the sound collected by the electronic device, that is, to obtain the probability that the sound contains the wake word. The wake word model is a trained machine learning model: for example, a model for detecting the wake word may be established in advance and trained with samples to obtain the wake word model. The pre-established model may be a neural network model, a Gaussian mixture model, a hidden Markov model, or the like. A sample may be a sound containing the wake word, the phoneme sequence of such a sound, or the audio features of such a sound. Sounds containing the wake word may be recorded by different people in different scenes, so that the trained wake word model can detect the wake word in sounds from a variety of scenes. However, sounds recorded in different scenes may contain noise (for example, non-wake words) in addition to the wake word. When such sounds are used as training samples, they contaminate the wake word model, which may then recognize sounds containing non-wake words as wake word sounds, causing false wake-ups.
Take a smart speaker whose wake word is "Xiaoyi" as an example. After a wake word model trained in the above manner is deployed in the smart speaker, the model may classify sounds picked up by the smart speaker that are pronounced similarly to, or even completely differently from, "Xiaoyi" as sounds containing the wake word, so that the smart speaker is falsely woken up.
To minimize false wake-ups caused by a wake word model contaminated by unclean samples, the wake word model needs to be optimized continuously, specifically by iterating on it with labeled data. However, data labeling requires the sounds used as samples to be labeled manually, which consumes considerable human resources, and the optimized wake word model may still falsely wake the device with a certain probability. In view of this, the electronic device and the wake-up method provided in the embodiments of the present application can reduce the probability of false wake-ups of the electronic device and improve the user experience.
The electronic device provided in the embodiments of the present application is an electronic device with a sound pickup function and a sound playback function, for example: a smart speaker, a smartphone, a tablet computer, a personal computer (PC), a wearable device (such as smart glasses, a smart watch, or a smart band), a smart home appliance such as a smart TV or a smart screen, an intelligent connected vehicle (ICV), a smart car, or an in-vehicle unit.
Fig. 1 shows a schematic structural diagram of an electronic device 100. The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a key 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like.
It is to be understood that the illustrated structure of the embodiment of the present invention does not specifically limit the electronic device 100. In other embodiments of the present application, electronic device 100 may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. For example, the electronic device 100 may be a smart speaker. The smart speaker may include: a processor 110, an internal memory 121, a speaker 170A, and a microphone 170C.
Processor 110 may include one or more processing units, such as: the processor 110 may include some or all of an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
In some embodiments, the processor 110 may include one or more interfaces. The interfaces may include an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, and the like.
The I2S interface may be used for audio communication. In some embodiments, the processor 110 may include multiple groups of I2S buses. The processor 110 may be coupled to the audio module 170 through an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may transmit audio signals to the wireless communication module 160 through the I2S interface, enabling calls to be answered through a Bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing, and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled through a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, enabling calls to be answered through a Bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
It should be understood that the interface connections between the modules shown in this embodiment of the present invention are merely illustrative and do not constitute a structural limitation on the electronic device 100. In other embodiments of the present application, the electronic device 100 may also use interface connection manners different from those in the above embodiment, or a combination of multiple interface connection manners.
The electronic device 100 implements display functions via the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capacity of the electronic device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function, for example, saving files such as music and videos on the external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The internal memory 121 may include a program storage area and a data storage area. The program storage area may store an operating system, applications required by at least one function (such as a sound playing function or an image playing function), and the like. The data storage area may store data created during use of the electronic device 100 (such as audio data and a phone book), and the like. In addition, the internal memory 121 may include a high-speed random access memory and may further include a non-volatile memory, such as at least one magnetic disk storage device, a flash memory device, or a universal flash storage (UFS). The processor 110 executes the various functional applications and data processing of the electronic device 100 by running instructions stored in the internal memory 121 and/or instructions stored in a memory provided in the processor.
The electronic device 100 may implement audio functions, such as music playback and recording, through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headset interface 170D, the application processor, and the like.
The audio module 170 is used to convert digital audio information into an analog audio signal for output, and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "loudspeaker", is used to convert an audio electrical signal into a sound signal. The user can listen to music or a hands-free call through the speaker 170A of the electronic device 100.
The receiver 170B, also called an "earpiece", is used to convert an audio electrical signal into a sound signal. When the electronic device 100 answers a call or receives voice information, the user can listen to the voice by placing the receiver 170B close to the ear.
The microphone 170C, also called a "mic", is used to convert a sound signal into an electrical signal. When making a call or sending voice information, the user can speak with the mouth close to the microphone 170C to input a sound signal into it. The electronic device 100 may be provided with at least one microphone 170C. In some other embodiments, the electronic device 100 may be provided with two microphones 170C, which implement a noise reduction function in addition to collecting sound signals. In still other embodiments, the electronic device 100 may be provided with three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and so on.
The headset interface 170D is used to connect a wired headset. The headset interface 170D may be the USB interface 130, a 3.5 mm open mobile terminal platform (OMTP) standard interface, or a Cellular Telecommunications Industry Association of the USA (CTIA) standard interface.
The motor 191 may generate vibration alerts. The motor 191 may be used for incoming-call vibration alerts as well as for touch vibration feedback. For example, touch operations applied to different applications (such as photographing and audio playback) may correspond to different vibration feedback effects. Touch operations applied to different areas of the display screen 194 may also cause the motor 191 to produce different vibration feedback effects. Different application scenarios (such as time reminders, receiving messages, alarm clocks, and games) may also correspond to different vibration feedback effects. The touch vibration feedback effects may also be customized.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The software system of the electronic device 100 may use a layered architecture, an event-driven architecture, a microkernel architecture, a microservice architecture, or a cloud architecture. This embodiment of the present invention takes an Android system with a layered architecture as an example to illustrate the software structure of the electronic device 100.
Fig. 2 is a block diagram of the software structure of the electronic device 100 according to an embodiment of the present invention.
The layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with each other through software interfaces. In some embodiments, the Android system is divided into four layers, from top to bottom: the application layer, the application framework layer, the Android runtime and system libraries, and the kernel layer.
The application layer may include a series of application packages.
As shown in fig. 2, the application packages may include applications such as camera, gallery, calendar, phone, maps, navigation, WLAN, Bluetooth, music, video, and messages.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions.
As shown in FIG. 2, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used to manage window programs. The window manager can obtain the size of the display screen, determine whether there is a status bar, lock the screen, take screenshots, and so on.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may be composed of one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The phone manager is used to provide the communication functions of the electronic device 100, for example, management of the call status (including connected, hung up, and so on).
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables applications to display notification information in the status bar. It can be used to convey notification-type messages that disappear automatically after a short stay without requiring user interaction, for example, notifications of download completion or message alerts. The notification manager may also present notifications in the top status bar of the system in the form of a graph or scrolling text, such as notifications of applications running in the background, or present notifications on the screen in the form of a dialog window. Other examples include prompting text information in the status bar, playing a prompt tone, vibrating the electronic device, and flashing an indicator light.
The Android Runtime comprises a core library and a virtual machine. The Android runtime is responsible for scheduling and managing an Android system.
The core library consists of two parts: one part is the functions that the Java language needs to call, and the other part is the core libraries of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files. The virtual machine is used to perform functions such as object life-cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), Media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files. The media library may support a variety of audio and video encoding formats, such as MPEG-4, H.264, MP3, AAC, AMR, JPG, and PNG.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like.
The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is the layer between hardware and software. The kernel layer contains at least a display driver, a camera driver, an audio driver, and a sensor driver.
For convenience of understanding, the following embodiments of the present application will specifically describe a method provided by the embodiments of the present application by taking an electronic device having a structure shown in fig. 1 and fig. 2 as an example, with reference to the accompanying drawings and application scenarios. It should be noted that, although the software structure of the electronic device is illustrated in fig. 2 as an example, the software structure illustrated in fig. 2 is only an illustrative example, and software structures of other operating systems are also applicable to the wake-up method provided in the embodiment of the present application.
For ease of description, the wake-up method provided in the embodiments of the present application is explained by taking as an example a smart speaker located in a home environment. Fig. 3 is a schematic diagram of a scene of the wake-up method according to an embodiment of the present application. As shown in fig. 3, in addition to the smart speaker, the home environment contains other devices capable of playing sound, such as a television and a conventional speaker, as well as furniture such as a sofa and a dining table. The user may move around the sofa, the dining table, and so on, and wakes up the smart speaker by uttering speech containing the wake word. The smart speaker may also be placed in other scenes, such as shopping malls and offices. By executing the wake-up method provided in the embodiments of the present application, the probability that the smart speaker is falsely woken up can be reduced. Specific implementations of the wake-up method provided in the embodiments of the present application are described below.
When the smart speaker has not been woken up and is in a standby state, it picks up sound from the environment. The picked-up sound includes the sound of the target speaker, that is, the user, as well as noise in the environment. Therefore, noise reduction is generally performed on the picked-up sound to obtain a clean sound, which is then used to trigger execution of the wake-up method according to the embodiments of the present application.
In the wake-up method of the embodiments of the present application, at least one of the following sets is configured in the smart speaker: a wake-up orientation set, a false wake-up orientation set, and a voiceprint set. The false wake-up orientation set includes false wake-up orientations and the confidence of each false wake-up orientation. Below, an element of the false wake-up orientation set is written as (false wake-up orientation, confidence). A false wake-up orientation records the sound source orientation of a sound that did not wake up the smart speaker. The confidence of a false wake-up orientation describes the probability that speech for waking up the smart speaker is uttered at that false wake-up orientation. The magnitude of the confidence value indicates the probability: for example, the larger the value, the higher the probability, and the smaller the value, the lower the probability. In the embodiments of the present application, an orientation refers to a direction and position relative to the smart speaker; for example, a sound source orientation refers to the direction and position of the sound source relative to the smart speaker.
The wake-up orientation set includes wake-up orientations and the confidence of each wake-up orientation. Below, an element of the wake-up orientation set is written as (wake-up orientation, confidence). A wake-up orientation records the sound source orientation of a sound that woke up the smart speaker. The confidence of a wake-up orientation describes the probability that speech that wakes up the smart speaker is uttered at that wake-up orientation.
The voiceprint set includes wake-up voiceprints and the confidence of each wake-up voiceprint; the confidence of a wake-up voiceprint may be represented by its number of hits. Below, an element of the voiceprint set is written as (wake-up voiceprint, confidence). A wake-up voiceprint records the user voiceprint of a sound that woke up the smart speaker. The confidence of a wake-up voiceprint records the probability that a sound with that voiceprint wakes up the smart speaker, and the number of hits records how many times sounds with that voiceprint have woken up the smart speaker. User voiceprints and wake-up voiceprints can be represented by the values of voiceprint characteristic parameters, which may include, but are not limited to, intensity, wavelength, frequency, and cadence. Different voiceprints differ in the value of at least one voiceprint characteristic parameter.
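The three sets described above can be modeled as simple records. The following Python sketch is only illustrative: the class and field names (`OrientationEntry`, `VoiceprintEntry`, `WakeSets`), the units (metres, degrees), and the use of a tuple of floats for the voiceprint parameter values are assumptions, not structures defined by the application. Each set defaults to empty.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class OrientationEntry:
    """One (orientation, confidence) element of a wake-up or false wake-up set."""
    distance: float                 # source-to-origin distance (metres assumed)
    angle: float                    # angle from the speaker's +x axis (degrees assumed)
    confidence: float               # larger value means higher probability
    height: Optional[float] = None  # optional vertical offset from the origin


@dataclass
class VoiceprintEntry:
    """One (wake-up voiceprint, confidence) element; confidence is a hit count."""
    params: Tuple[float, ...]       # values of the voiceprint characteristic parameters
    hits: int = 0


@dataclass
class WakeSets:
    """The three sets kept by the smart speaker; each defaults to an empty list."""
    wake_orientations: List[OrientationEntry] = field(default_factory=list)
    false_wake_orientations: List[OrientationEntry] = field(default_factory=list)
    voiceprints: List[VoiceprintEntry] = field(default_factory=list)
```

For example, a false wake-up orientation toward a television could be recorded with `WakeSets().false_wake_orientations.append(OrientationEntry(3.0, 45.0, 0.8))`; `default_factory=list` keeps each `WakeSets` instance's lists independent.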
A possible representation and calculation of the sound source orientation is described as follows. Optionally, a coordinate system of the smart speaker may be established: for example, the origin may be the physical center point of the smart speaker, and the positive x-axis may point horizontally straight ahead of the smart speaker. This way of establishing the coordinate system is only an example and does not limit how the coordinate system of the smart speaker is established. In this coordinate system, the sound source orientation can be identified by a distance and an angle. Specifically, the distance records the distance between the sound source and the origin of the smart speaker's coordinate system, and the angle records the angle between the positive x-axis of the smart speaker and the ray pointing from the origin toward the sound source. Optionally, a height dimension may also be added to the sound source orientation, recording the vertical distance between the sound source and the origin of the coordinate system. The distance, angle, and other information of the sound source orientation can be calculated by the smart speaker using a related sound source localization method, which calculates the relative position between the sound source and the smart speaker, such as the distance and angle, based on a microphone array composed of at least two microphones in the smart speaker.
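To make the (distance, angle, optional height) representation concrete, the sketch below converts an orientation into Cartesian coordinates in the speaker's frame. It assumes the angle is an azimuth in degrees measured counter-clockwise from the +x axis in the horizontal plane, and that the distance is the straight-line distance from the source to the origin; `orientation_to_xyz` is a hypothetical helper, not part of the application.

```python
import math
from typing import Optional, Tuple


def orientation_to_xyz(distance: float, angle_deg: float,
                       height: Optional[float] = None) -> Tuple[float, float, float]:
    """Convert a (distance, angle[, height]) sound source orientation into
    Cartesian (x, y, z) coordinates in the smart speaker's frame.

    Assumptions: angle_deg is measured counter-clockwise from the +x axis in
    the horizontal plane, and distance is the straight-line distance from the
    sound source to the origin.
    """
    z = 0.0 if height is None else height
    if abs(z) > distance:
        raise ValueError("height cannot exceed the source-to-origin distance")
    # The horizontal range is the remaining leg of the right triangle (distance, height).
    horizontal = math.sqrt(distance ** 2 - z ** 2)
    theta = math.radians(angle_deg)
    return horizontal * math.cos(theta), horizontal * math.sin(theta), z
```

For instance, a source 5 m away and 3 m above the origin, straight ahead of the speaker, maps to `(4.0, 0.0, 3.0)`.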
Specifically, sound source localization methods may include, but are not limited to, steerable beamforming based on maximum output power, high-resolution spectral estimation, and time delay estimation (TDE) based localization. Taking a TDE-based algorithm as an example, its core is to accurately estimate the propagation delay, which is generally obtained by cross-correlating the sounds picked up by the microphone array of the smart speaker. The distance between the smart speaker and the sound source can then be calculated by simple delay-and-sum, geometric calculation, or a steered response power search performed directly on the cross-correlation results. The embodiments of the present application do not elaborate on the specific algorithms.
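The TDE core can be illustrated for the simplest case of two microphones. The sketch below estimates the inter-microphone delay from the peak of the cross-correlation (`numpy.correlate` in `"full"` mode) and converts it to a far-field arrival angle with θ = arcsin(c·τ/d). This is a generic textbook formulation, not the application's algorithm; the function names, the 343 m/s speed of sound, and the far-field assumption are illustrative. A two-microphone array yields only an angle; estimating distance needs more microphones and is not shown.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at about 20 degrees C


def estimate_delay(sig_a: np.ndarray, sig_b: np.ndarray, fs: float) -> float:
    """Delay of sig_b relative to sig_a in seconds (positive: sig_b lags).

    Uses the peak of the full cross-correlation; index 0 of the output
    corresponds to a lag of -(len(sig_a) - 1) samples.
    """
    corr = np.correlate(sig_b, sig_a, mode="full")
    lag = int(np.argmax(corr)) - (len(sig_a) - 1)
    return lag / fs


def angle_from_delay(delay_s: float, mic_spacing_m: float) -> float:
    """Far-field arrival angle in degrees, measured from the array broadside."""
    ratio = SPEED_OF_SOUND * delay_s / mic_spacing_m
    ratio = max(-1.0, min(1.0, ratio))  # clip small numerical overshoot
    return float(np.degrees(np.arcsin(ratio)))
```

With microphones 0.1 m apart sampled at 16 kHz, a 3-sample lag corresponds to an angle of roughly 40 degrees. Real systems usually add spectral weighting (for example, GCC-PHAT) to sharpen the correlation peak in reverberant rooms.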
Initial setting of the sets: initially, for example before the smart speaker leaves the factory and has not been used, or after it is restored to factory settings, at least one of the default false wake-up orientation set, wake-up orientation set, and voiceprint set in the smart speaker may be empty. While using the smart speaker, the user may set at least one of the false wake-up orientation set, the wake-up orientation set, and the voiceprint set based on the environment in which the smart speaker is located, or may leave them unset. Leaving them unset reduces user operations and improves the user experience.
An example of how these sets may be set follows. Because a false wake-up orientation records the sound source orientation of a sound that did not wake up the smart speaker, false wake-up orientations generally correspond to the orientations, relative to the smart speaker, of other sound-producing devices in the environment. On this basis, a false wake-up orientation may be set according to the orientation of such a device relative to the smart speaker, and an initial confidence may be set for it by the user or by the smart speaker. Taking the home environment shown in fig. 3 as an example, when the smart speaker has a display screen, a setting interface for false wake-up orientations may be provided to the user on that display screen. For example, as shown in fig. 4, the user may set a false wake-up orientation and its confidence based on the relative position between the television and the smart speaker, set another false wake-up orientation and its confidence based on the relative position between the conventional speaker and the smart speaker, and then tap the "Confirm" control. Correspondingly, upon detecting the user's operation on the "Confirm" control in the setting interface, the smart speaker obtains the false wake-up orientations, confidences, and other information entered in the setting interface and stores them in the false wake-up orientation set.
Optionally, if the smart speaker has no display screen or its display screen is inconvenient for the user to operate, the setting interface may be displayed to the user by another device associated with the smart speaker (for example, the user's smartphone), and that device sends the false wake-up orientations, confidences, and other information obtained from the setting interface to the smart speaker.
Because a wake-up orientation records the sound source orientation of a sound that woke up the smart speaker, wake-up orientations generally correspond to the orientations, relative to the smart speaker, of places in the environment from which the user often wakes up the smart speaker. On this basis, the user may set wake-up orientations according to the orientations of such places relative to the smart speaker, and an initial confidence may be set for each wake-up orientation by the user or by the smart speaker. Taking the home environment shown in fig. 3 as an example, a user typically moves around the sofa, the dining table, and so on when waking up the smart speaker. One or more wake-up orientations and corresponding confidences may therefore be set based on the orientation of positions on the sofa relative to the smart speaker, and one or more wake-up orientations and corresponding confidences may be set based on the orientation of positions near the dining table, such as the dining chairs, relative to the smart speaker. For the specific setting manner, refer to the setting of false wake-up orientations shown in fig. 4; details are not repeated here.
A wake-up voiceprint may be set by the user recording speech. Correspondingly, the smart speaker extracts the user voiceprint from the recorded sound and sets it as a wake-up voiceprint. The initial confidence of the wake-up voiceprint is set by the user or by the smart speaker; for example, if the confidence is the number of hits, the initial confidence may be 0.
Updating the set: in the using process of the intelligent sound box, when the intelligent sound box is awakened, the awakening position set can be updated according to the sound source position of the sound awakening the intelligent sound box; updating the voiceprint set according to the user voiceprint extracted from the voice; and when the intelligent sound box is not awakened, updating the mistaken awakening position set according to the sound source position of the sound which does not awaken the intelligent sound box.
For a sound that wakes up the smart speaker, the smart speaker calculates the sound source orientation of the sound and determines whether the wake-up orientation set includes that orientation. If it does, the confidence of the wake-up orientation corresponding to the sound source orientation is increased; if not, the sound source orientation is added to the wake-up orientation set as a new wake-up orientation and given an initial confidence. In this determination the sound source orientation need not match a wake-up orientation exactly; a certain deviation is allowed. For example, when wake-up orientations and sound source orientations are both represented as (distance, angle), a distance threshold and an angle threshold can be preset. If the distance difference between the sound source orientation and wake-up orientation 1 is within the distance threshold and the angle difference is within the angle threshold, the wake-up orientation set is judged to include the sound source orientation. Wake-up orientation 1 is then called the wake-up orientation corresponding to (or containing) the sound source orientation, and its confidence is increased. This embodiment does not limit the value of the initial confidence, nor the amount by which the confidence of a wake-up orientation is increased each time; the increment may, for example, be a fixed value or a fixed percentage of the current confidence. Likewise, the embodiments of this application do not limit the specific values of the preset distance threshold and angle threshold.
The distance threshold and the angle threshold may be determined based on, for example, the accuracy of the wake-up method and the accuracy of the sound source orientation calculation method: the more accurate either method is, the smaller the two thresholds generally are. In addition, setting these thresholds expands each wake-up orientation in the set from a single point into a region, so the thresholds can also be chosen according to the desired size of that region. The user of the smart speaker may adjust the distance threshold and the angle threshold as needed.
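The tolerance matching and wake-up orientation update described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `(distance, angle)` representation comes from the text, but the units, the use of "less than or equal to" for "within the threshold", the default thresholds, the initial confidence of 0.5 and the fixed increment of 0.1 are all assumptions, and every function name is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Orientation:
    distance: float  # distance from the speaker (unit assumed, e.g. metres)
    angle: float     # bearing relative to the speaker, in degrees (assumed)


@dataclass
class OrientationEntry:
    orientation: Orientation
    confidence: float


def angle_diff(a: float, b: float) -> float:
    """Smallest absolute difference between two bearings (355 vs 5 gives 10)."""
    d = abs(a - b) % 360.0
    return min(d, 360.0 - d)


def find_match(entries: List[OrientationEntry], src: Orientation,
               dist_thr: float, angle_thr: float) -> Optional[OrientationEntry]:
    """Return the first stored orientation within both thresholds of src, if any."""
    for entry in entries:
        o = entry.orientation
        if (abs(o.distance - src.distance) <= dist_thr
                and angle_diff(o.angle, src.angle) <= angle_thr):
            return entry
    return None


def update_wake_orientations(entries: List[OrientationEntry], src: Orientation,
                             dist_thr: float = 0.5, angle_thr: float = 10.0,
                             initial: float = 0.5, step: float = 0.1) -> OrientationEntry:
    """Raise the confidence of the matching wake-up orientation, or add src as a new one."""
    match = find_match(entries, src, dist_thr, angle_thr)
    if match is not None:
        match.confidence += step  # fixed increment; a percentage would also fit the text
        return match
    new_entry = OrientationEntry(src, initial)
    entries.append(new_entry)
    return new_entry
```

The same `find_match` could serve the false wake-up orientation set, with the confidence decreased instead of increased on a match.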
For a sound that wakes up the smart speaker, the smart speaker can extract the user voiceprint of the sound and determine whether the voiceprint set contains a wake-up voiceprint matching it. If so, the confidence of that wake-up voiceprint is increased; otherwise, the user voiceprint is added to the voiceprint set as a new wake-up voiceprint and given a confidence. As with the wake-up orientation set, some error between the user voiceprint and a wake-up voiceprint may be tolerated when making this determination. For example, a threshold may be set for each voiceprint feature contained in a voiceprint. If, for every feature, the difference between the user voiceprint's value and the corresponding value of some wake-up voiceprint is smaller than that feature's threshold, the voiceprint set is considered to include the user voiceprint, and that wake-up voiceprint is the wake-up voiceprint corresponding to the user voiceprint.
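The per-feature voiceprint comparison can be sketched in the same way. This sketch assumes a voiceprint is a fixed-length list of numeric features and that confidence is a hit count starting at 0 (as in the hit-count example given earlier); the feature layout, thresholds and function names are hypothetical, and real voiceprint systems often compare embeddings with a single similarity score instead.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class VoiceprintEntry:
    features: List[float]  # hypothetical fixed-length voiceprint feature vector
    confidence: float      # e.g. a hit count


def voiceprint_matches(user: Sequence[float], stored: Sequence[float],
                       thresholds: Sequence[float]) -> bool:
    """True if every feature differs by less than its own threshold."""
    if not (len(user) == len(stored) == len(thresholds)):
        return False
    return all(abs(u - s) < t for u, s, t in zip(user, stored, thresholds))


def update_voiceprints(entries: List[VoiceprintEntry], user: Sequence[float],
                       thresholds: Sequence[float], initial: float = 0.0,
                       step: float = 1.0) -> VoiceprintEntry:
    """Count a hit on the matching wake-up voiceprint, or store user as a new one."""
    for entry in entries:
        if voiceprint_matches(user, entry.features, thresholds):
            entry.confidence += step
            return entry
    new_entry = VoiceprintEntry(list(user), initial)
    entries.append(new_entry)
    return new_entry
```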
For a sound that does not wake up the smart speaker, the smart speaker calculates the sound source orientation of the sound and determines whether the false wake-up orientation set includes it. If so, the confidence of the false wake-up orientation corresponding to the sound source orientation is reduced; if not, the sound source orientation is added to the false wake-up orientation set as a new false wake-up orientation and given an initial confidence. The update of the false wake-up orientation set is otherwise implemented in the same way as the update of the wake-up orientation set described above and is not repeated here.
The wake-up method in the embodiments of this application decides whether to execute the wake-up flow based on the wake-up word confidence output by the wake-up word model, together with the false wake-up orientation set and/or the wake-up orientation set, and the voiceprint set, thereby reducing the probability of false wake-ups. The wake-up method is described in detail below.
In one embodiment, the smart speaker includes a sound pickup and a loudspeaker. The sound pickup includes a microphone array made up of a plurality of microphones.
As shown in fig. 5, the smart speaker is preset with a wake-up orientation set (also referred to as a first orientation set), a false wake-up orientation set (also referred to as a second orientation set), and a voiceprint set (also referred to as a first voiceprint set). The wake-up method in this embodiment may include the following steps:
Step 501: The smart speaker picks up sound from the environment.
Because the smart speaker generally picks up sound from the environment continuously, it usually divides the continuously picked-up sound into audio segments of a certain duration. In the embodiments of this application, "sound" generally refers to one such audio segment. The embodiments do not limit the specific duration of an audio segment.
To reduce the influence of noise on subsequent processing, the smart speaker generally performs noise reduction on the sound before step 502, suppressing the noise signal in it to obtain a cleaner sound. The sound used in step 502 is therefore generally the noise-reduced sound.
Because the smart speaker picks up sound continuously, a preset condition, such as a sound intensity threshold, can be applied to the picked-up sound to reduce the smart speaker's data processing load and power consumption: only sounds that satisfy the preset condition have their wake-up word confidence calculated by the wake-up word model and trigger the subsequent processing. The embodiments of this application do not limit the specific preset condition.
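Segmenting the continuous stream and gating each segment on a sound intensity threshold might look like the sketch below. It assumes mono samples normalised to [-1, 1], uses RMS level in dBFS as the "sound intensity" (one common choice, not necessarily the patent's), and picks a 1-second segment and a -40 dBFS threshold purely for illustration; the function names are hypothetical.

```python
import math
from typing import Iterator, List, Sequence


def split_segments(samples: Sequence[float], sample_rate: int,
                   segment_s: float) -> Iterator[List[float]]:
    """Cut a continuous sample stream into fixed-length segments.

    A trailing partial segment is dropped (it would be completed by later input).
    """
    n = int(sample_rate * segment_s)
    for start in range(0, len(samples) - n + 1, n):
        yield list(samples[start:start + n])


def rms_dbfs(segment: Sequence[float]) -> float:
    """Root-mean-square level in dB relative to full scale."""
    rms = math.sqrt(sum(x * x for x in segment) / len(segment))
    return 20.0 * math.log10(rms) if rms > 0 else float("-inf")


def segments_to_score(samples: Sequence[float], sample_rate: int,
                      segment_s: float = 1.0,
                      threshold_dbfs: float = -40.0) -> List[List[float]]:
    """Keep only segments loud enough to be passed to the wake-up word model."""
    return [seg for seg in split_segments(samples, sample_rate, segment_s)
            if rms_dbfs(seg) >= threshold_dbfs]
```

Only the segments returned by `segments_to_score` would go on to step 502, so quiet periods cost almost no model inference.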
Step 502: The smart speaker calculates the wake-up word confidence of the sound based on the wake-up word model.
The wake-up word confidence describes the probability that the sound contains the wake-up word.
Step 503: The smart speaker determines whether the wake-up word confidence is smaller than a first threshold; if it is not smaller than the first threshold, step 504 is performed.
Step 503 further includes: if the wake-up word confidence is smaller than the first threshold, the wake-up flow is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
Note that the determination in step 503 may also be performed by the wake-up word model itself, in which case the model outputs two values: a determination of whether to wake up, and the wake-up word confidence. The embodiments of this application do not limit this.
The higher the wake-up word confidence, the more likely the sound contains the wake-up word. The wake-up method in this embodiment further performs steps 504 to 511 below to decide whether to execute the wake-up flow, thereby filtering out false wake-ups and reducing their probability.
Step 504: The smart speaker determines whether the wake-up word confidence is smaller than a second threshold, where the second threshold is greater than the first threshold. If the wake-up word confidence is not smaller than the second threshold, the wake-up flow is executed and step 511 is performed; if it is smaller than the second threshold, step 505 is performed.
In this embodiment, the second threshold further divides the case in which the wake-up word confidence is not smaller than the first threshold into two sub-cases. If the wake-up word confidence is not smaller than the second threshold, the sound is very likely to contain the wake-up word and a false wake-up is unlikely, so the wake-up flow is executed directly and the smart speaker is woken up. If the wake-up word confidence is smaller than the second threshold but not smaller than the first threshold, the sound is less likely to contain the wake-up word and a false wake-up is relatively likely, so the subsequent steps starting from step 505 are performed, using the wake-up orientation set, the false wake-up orientation set, or the voiceprint set to decide whether to execute the wake-up flow. For example, suppose the wake-up word confidence ranges over (0, 100), the first threshold is 30, and the second threshold is 80. If the wake-up word confidence is below 30, the wake-up flow is not executed; if it is 80 or above, the wake-up flow is executed directly; if it is at least 30 but below 80, the steps starting from step 505 are performed to further screen out possible false wake-ups.
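The three-way split produced by the first and second thresholds can be written as a small classifier. This is an illustrative sketch using the example values 30 and 80 from the paragraph above; the enum and function names are hypothetical.

```python
from enum import Enum


class Decision(Enum):
    REJECT = "do not wake"          # below the first threshold (step 503)
    CHECK_CONTEXT = "check sets"    # between the thresholds: continue from step 505
    WAKE = "wake"                   # at or above the second threshold (step 504)


def classify_wake_word(confidence: float, first: float = 30.0,
                       second: float = 80.0) -> Decision:
    """Map a wake-up word confidence onto the two-threshold decision."""
    if not first < second:
        raise ValueError("the second threshold must exceed the first")
    if confidence < first:
        return Decision.REJECT
    if confidence >= second:
        return Decision.WAKE
    return Decision.CHECK_CONTEXT
```

Only `Decision.CHECK_CONTEXT` requires the orientation and voiceprint sets, which keeps the extra work confined to the ambiguous middle band.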
The wake-up orientation set, the false wake-up orientation set and the voiceprint set may each include at least one set element, and each set element includes at least two items. For example, an element of the wake-up orientation set includes a wake-up orientation and the confidence corresponding to it; an element of the false wake-up orientation set includes a false wake-up orientation and the confidence corresponding to it; and an element of the voiceprint set includes a voiceprint and the confidence corresponding to it.
Note that the execution order between executing the wake-up flow in step 504 and step 511 is not limited; fig. 5 shows step 511 being executed after the wake-up flow as an example.
Both the first threshold and the second threshold may be preset.
Step 505: The smart speaker calculates the sound source orientation of the sound.
The method for calculating the sound source orientation has been described above and is not repeated here.
Step 506: The smart speaker determines whether the false wake-up orientation set includes the sound source orientation of the sound, and whether the wake-up orientation set includes it. If only the false wake-up orientation set includes the sound source orientation, step 507 is performed; if only the wake-up orientation set includes it, step 508 is performed; otherwise, step 509 is performed.
Step 507: The smart speaker determines whether to execute the wake-up flow according to the confidence of the false wake-up orientation corresponding to the sound source orientation. If yes, the wake-up flow is executed and step 511 is performed; if not, the wake-up flow is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
Determining, according to the confidence of the false wake-up orientation corresponding to the sound source orientation, whether to execute the wake-up flow may include:
if the confidence of the false wake-up orientation is smaller than threshold a, determining not to execute the wake-up flow;
if the confidence of the false wake-up orientation is not smaller than threshold a, determining to execute the wake-up flow.
If the confidence of the false wake-up orientation is smaller than threshold a, the sound is relatively likely to be noise, so the wake-up flow is not executed; that is, the smart speaker is not woken up, which reduces the false wake-up probability.
Step 508: The smart speaker determines whether to execute the wake-up flow according to the confidence of the wake-up orientation corresponding to the sound source orientation. If yes, the wake-up flow is executed and step 511 is performed; if not, the wake-up flow is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
Determining, according to the confidence of the wake-up orientation corresponding to the sound source orientation, whether to execute the wake-up flow may include:
if the confidence of the wake-up orientation is smaller than threshold b, determining not to execute the wake-up flow;
if the confidence of the wake-up orientation is not smaller than threshold b, determining to execute the wake-up flow.
If the confidence of the wake-up orientation is smaller than threshold b, the sound is relatively likely to be noise, so the wake-up flow is not executed; that is, the smart speaker is not woken up, which reduces the false wake-up probability.
Step 509: The smart speaker extracts the user voiceprint of the sound and determines whether the voiceprint set includes a wake-up voiceprint matching the extracted user voiceprint. If so, step 510 is performed; if not, the wake-up flow is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
Step 510: The smart speaker determines whether to execute the wake-up flow according to the confidence of the wake-up voiceprint corresponding to the user voiceprint. If yes, the wake-up flow is executed and step 511 is performed; if not, the wake-up flow is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
When the determination result is yes, the execution order between the wake-up flow and step 511 is not limited.
Determining, according to the confidence of the wake-up voiceprint corresponding to the user voiceprint, whether to execute the wake-up flow may include:
if the confidence of the wake-up voiceprint is smaller than threshold c, determining not to execute the wake-up flow;
if the confidence of the wake-up voiceprint is not smaller than threshold c, determining to execute the wake-up flow.
If the confidence of the wake-up voiceprint is smaller than threshold c, the sound is likely to come from a person who does not frequently appear in the environment, so the wake-up flow is not executed; that is, the smart speaker is not woken up, which reduces the false wake-up probability.
Step 511: The smart speaker updates the wake-up orientation set according to the sound source orientation of the sound and updates the voiceprint set according to the user voiceprint of the sound, and this branch ends.
This step may be implemented as described above for updating the sets and is not repeated here.
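The decision logic of steps 503 to 510 can be condensed into a single function. This is a minimal sketch under stated assumptions, not the patented implementation: it assumes the set lookups, the sound source orientation calculation and the voiceprint extraction are done elsewhere, with their results passed in as optional confidences (`None` meaning "no matching element"). The function name is hypothetical, the default thresholds are the example values used later in this section (0.4, 0.7, 0.5, 0.6 and 5), and treating an orientation found in both orientation sets as falling through to step 509 is one reading of step 506.

```python
from typing import Optional


def fig5_should_wake(word_conf: float,
                     false_orient_conf: Optional[float],
                     wake_orient_conf: Optional[float],
                     voiceprint_conf: Optional[float],
                     first: float = 0.4, second: float = 0.7,
                     a: float = 0.5, b: float = 0.6, c: float = 5) -> bool:
    """Return True to execute the wake-up flow.

    false_orient_conf / wake_orient_conf: confidence of the matching element in the
    false wake-up / wake-up orientation set, or None if the orientation matches none.
    voiceprint_conf: confidence of the matching wake-up voiceprint, or None.
    Assumed caller: on True, run step 511 (update the wake-up orientation set and
    voiceprint set); on False, update the false wake-up orientation set.
    """
    if word_conf < first:                      # step 503: clearly not the wake-up word
        return False
    if word_conf >= second:                    # step 504: confident enough to wake directly
        return True
    in_false = false_orient_conf is not None   # step 506
    in_wake = wake_orient_conf is not None
    if in_false and not in_wake:               # step 507: compare with threshold a
        return false_orient_conf >= a
    if in_wake and not in_false:               # step 508: compare with threshold b
        return wake_orient_conf >= b
    if voiceprint_conf is None:                # step 509: no matching wake-up voiceprint
        return False
    return voiceprint_conf >= c                # step 510: compare with threshold c
```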
Note that in the embodiment shown in fig. 5, the second threshold may also be omitted; that is, step 504 is skipped and step 505 is performed directly. Alternatively, in another possible implementation, the determination in step 504 may be moved into steps 507, 508 and 510 of fig. 5, so that the smart speaker also uses the wake-up word confidence when deciding whether to execute the wake-up flow; the specific criteria can follow those shown in fig. 5. Taking step 507 as an example, it becomes: the smart speaker determines whether to execute the wake-up flow according to the wake-up word confidence and the confidence of the false wake-up orientation corresponding to the sound source orientation; if yes, the wake-up flow is executed and step 511 is performed; if not, the wake-up flow is not executed and step 512 is performed. In this case, the determination in step 507 may include:
if the wake-up word confidence is not smaller than the second threshold, determining to execute the wake-up flow;
if the wake-up word confidence is smaller than the second threshold and the confidence of the false wake-up orientation is smaller than a first orientation confidence threshold, determining not to execute the wake-up flow;
if the wake-up word confidence is smaller than the second threshold and the confidence of the false wake-up orientation is not smaller than the first orientation confidence threshold, determining to execute the wake-up flow.
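The modified step 507 can be expressed as a small predicate. A minimal sketch: the function and parameter names are hypothetical, and it assumes the caller has already passed step 503 (wake-up word confidence not below the first threshold) and found the sound source orientation only in the false wake-up orientation set.

```python
def step_507_variant(word_conf: float, false_orient_conf: float,
                     second_threshold: float,
                     first_orient_threshold: float) -> bool:
    """Return True to execute the wake-up flow (modified step 507)."""
    if word_conf >= second_threshold:
        return True  # strong wake-up word: wake regardless of orientation history
    # Weaker wake-up word: defer to the false wake-up orientation's confidence.
    return false_orient_conf >= first_orient_threshold
```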
Note that fig. 5 takes as an example the case in which the smart speaker updates the wake-up orientation set and the voiceprint set every time it decides to execute the wake-up flow, and updates the false wake-up orientation set every time it decides not to. To reduce the smart speaker's data processing load and power consumption, the sets need not be updated after every such decision; instead, they may be updated only after some decisions selected according to a certain rule. The embodiments of this application do not limit this.
Because the smart speaker updates the wake-up orientation set and the voiceprint set after deciding to execute the wake-up flow, and updates the false wake-up orientation set after deciding not to, the sets become more representative with continued use: the wake-up orientations in the wake-up orientation set come to correspond to the places in the environment from which the user often speaks the wake-up word, the false wake-up orientations in the false wake-up orientation set come to correspond to the locations of other sound-emitting devices in the environment, and the high-confidence wake-up voiceprints in the voiceprint set come to correspond to the voiceprints of users who often wake up the smart speaker. The wake-up method provided in the embodiments of this application therefore becomes increasingly effective at reducing the false wake-up probability.
For example, assume the smart speaker is placed in a home environment after leaving the factory, and the wake-up orientation set, false wake-up orientation set and voiceprint set preset in it are all empty. Let the first threshold be 0.4, the second threshold 0.7, threshold a 0.5, threshold b 0.6 and threshold c 5. Then:
Suppose a television in the environment is playing loudly. The smart speaker picks up the television sound as sound 1 and, based on the wake-up word model, calculates a wake-up word confidence of 0.1, which is below the first threshold of 0.4. The "yes" branch of step 503 is therefore taken: the wake-up flow is not executed, sound source orientation 1 of sound 1 is added to the false wake-up orientation set as false wake-up orientation 1, and false wake-up orientation 1 is given an initial confidence of, for example, 0.8. Because false wake-ups are generally rare, television sound usually yields a low wake-up word confidence: when the smart speaker picks up the television sound again as sound 2 and calculates a wake-up word confidence of 0.2, the "yes" branch of step 503 is taken again and the confidence of false wake-up orientation 1 is reduced. As this process repeats, the confidence of false wake-up orientation 1 keeps decreasing. Once it falls below threshold a (0.5), for example to 0.45, then even if the smart speaker occasionally picks up a television sound n whose wake-up word confidence lies between the first threshold 0.4 and the second threshold 0.7, for example 0.55, the smart speaker performs steps 503 to 507 in sequence, finds that the confidence of false wake-up orientation 1 is below threshold a, and does not execute the wake-up flow. A possible false wake-up that the prior art would have let through is thus screened out, reducing the false wake-up probability.
A false wake-up caused by the television may still occur before the confidence of false wake-up orientation 1 drops below threshold a (0.5). For example, the smart speaker picks up sound m, calculates a wake-up word confidence of 0.65, performs steps 503 to 507 in sequence, finds that the confidence of false wake-up orientation 1 (for example, 0.55) is not below threshold a, and executes the wake-up flow; the wake-up orientation set and voiceprint set are then updated, so the wake-up orientation set also comes to contain sound source orientation 1. However, during normal use the smart speaker also updates the wake-up orientation set and voiceprint set whenever the user wakes it up, so among the wake-up voiceprints recorded in the voiceprint set, the confidence of the voiceprints of users who frequently wake up the smart speaker gradually increases. When a sound with a wake-up word confidence between 0.4 and 0.7 is later picked up from the orientation of the television, that orientation is now in both orientation sets, so step 506 leads to step 509 and the voiceprint set is used to decide whether to execute the wake-up flow, screening out possible false wake-ups and reducing the false wake-up probability.
In the wake-up method of the embodiment shown in fig. 5, after the wake-up word confidence of the sound is calculated based on the wake-up word model, the wake-up orientation set, the false wake-up orientation set and the voiceprint set are further used to decide whether to wake up the smart speaker. Possible false wake-ups among the cases in which the wake-up word model alone would wake the smart speaker are thereby screened out, which reduces the smart speaker's false wake-up probability and improves the user experience.
Alternatively, the wake-up orientation set, the false wake-up orientation set and the voiceprint set need not be preset; they may instead be created through machine learning during use and, once created, continue to be enriched and adjusted through machine learning.
Whereas fig. 5 takes as an example a smart speaker preset with a wake-up orientation set, a false wake-up orientation set and a voiceprint set, the embodiment shown in fig. 6 takes as an example a smart speaker preset with only a false wake-up orientation set and a voiceprint set. In this case, steps 506 to 511 are replaced with the following steps 601 to 605:
Step 601: The smart speaker determines whether the false wake-up orientation set includes the sound source orientation of the sound; if so, step 602 is performed; if not, step 603 is performed.
This determination may use the matching method described above for updating the false wake-up orientation set, and is not repeated here.
Step 602: The smart speaker determines whether to execute the wake-up flow according to the confidence of the false wake-up orientation corresponding to the sound source orientation. If yes, the wake-up flow is executed and step 605 is performed; if not, the wake-up flow is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
For the implementation of this step, refer to the description of step 507; details are not repeated here.
Step 603: The smart speaker extracts the user voiceprint of the sound and determines whether the voiceprint set includes a wake-up voiceprint matching the extracted user voiceprint. If so, step 604 is performed; if not, the wake-up flow is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
Step 604: The smart speaker determines whether to execute the wake-up flow according to the confidence of the wake-up voiceprint corresponding to the user voiceprint. If yes, the wake-up flow is executed and step 605 is performed; if not, the wake-up flow is not executed, the false wake-up orientation set is updated according to the sound source orientation of the sound, and this branch ends.
Step 605: The smart speaker updates the voiceprint set according to the user voiceprint.
For the implementation of this step, refer to the description of step 511; details are not repeated here.
In the wake-up method of the embodiment shown in fig. 6, after the wake-up word confidence of the sound is calculated based on the wake-up word model, the false wake-up orientation set and the voiceprint set are further used to decide whether to wake up the smart speaker. Possible false wake-ups among the cases in which the wake-up word model alone would wake the smart speaker are thereby screened out, which reduces the smart speaker's false wake-up probability and improves the user experience.
Whereas the wake-up method shown in fig. 6 presets a false wake-up orientation set and a voiceprint set in the smart speaker, the wake-up method shown in fig. 7 presets a wake-up orientation set and a voiceprint set. The main differences from fig. 6 are: the false wake-up orientation set is replaced with the wake-up orientation set; the step of updating the false wake-up orientation set is omitted; and in step 705 the wake-up orientation set is updated according to the sound source orientation of the sound, and the voiceprint set is updated according to the user voiceprint of the sound.
For step 702, whether to execute the wake-up flow according to the confidence of the wake-up orientation can be determined as described in step 508; details are not repeated here.
When step 705 is performed for the first time, updating the first orientation set and the first voiceprint set includes: creating the first orientation set and the first voiceprint set; adding the orientation from which the electronic device was woken up to the first orientation set and assigning it an initial first orientation confidence; and adding the voiceprint that woke up the electronic device to the first voiceprint set and assigning it an initial first voiceprint confidence.
When step 705 is performed subsequently, updating the first orientation set and the first voiceprint set includes at least one of the following: creating a new first orientation in the first orientation set and assigning it an initial first orientation confidence; creating a new first voiceprint in the first voiceprint set and assigning it an initial first voiceprint confidence; for an existing first orientation that is matched in the first orientation set, increasing its first orientation confidence; and for an existing first voiceprint that is matched in the first voiceprint set, increasing its first voiceprint confidence.
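The first-time creation and later updating of the two sets in step 705 can be sketched as a small stateful class. Everything here is an illustrative assumption: the class name, the `(distance, angle)` orientation tuple, the per-feature tolerance, the initial confidences (0.5 for orientations, 0 for voiceprints) and the increments (0.1 and 1) are hypothetical choices, not values from the text.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Orientation = Tuple[float, float]  # (distance, angle in degrees), assumed representation


class WakeHistory:
    """Hypothetical holder of the first orientation set and first voiceprint set (fig. 7).

    Both sets start as None (not yet created) and are created the first time
    step 705 runs.
    """

    def __init__(self, dist_thr: float = 0.5, angle_thr: float = 10.0,
                 feat_thr: float = 0.5, init_orient_conf: float = 0.5,
                 orient_step: float = 0.1, init_vp_conf: float = 0.0,
                 vp_step: float = 1.0) -> None:
        self.dist_thr, self.angle_thr, self.feat_thr = dist_thr, angle_thr, feat_thr
        self.init_orient_conf, self.orient_step = init_orient_conf, orient_step
        self.init_vp_conf, self.vp_step = init_vp_conf, vp_step
        self.orientations: Optional[List[list]] = None  # elements: [orientation, confidence]
        self.voiceprints: Optional[List[list]] = None   # elements: [features, confidence]

    def _orient_match(self, a: Orientation, b: Orientation) -> bool:
        d = abs(a[1] - b[1]) % 360.0  # bearing difference with wrap-around
        return abs(a[0] - b[0]) <= self.dist_thr and min(d, 360.0 - d) <= self.angle_thr

    def _vp_match(self, a: Sequence[float], b: Sequence[float]) -> bool:
        return len(a) == len(b) and all(abs(x - y) < self.feat_thr for x, y in zip(a, b))

    @staticmethod
    def _bump_or_add(entries: List[list], item, match: Callable[..., bool],
                     initial: float, step: float) -> None:
        for entry in entries:
            if match(entry[0], item):
                entry[1] += step  # existing element matched: raise its confidence
                return
        entries.append([item, initial])  # no match: new element with initial confidence

    def record_wake(self, orientation: Orientation, voiceprint: Sequence[float]) -> None:
        """Step 705: update both sets after a wake-up, creating them on first use."""
        if self.orientations is None:
            self.orientations = []
        if self.voiceprints is None:
            self.voiceprints = []
        self._bump_or_add(self.orientations, orientation, self._orient_match,
                          self.init_orient_conf, self.orient_step)
        self._bump_or_add(self.voiceprints, tuple(voiceprint), self._vp_match,
                          self.init_vp_conf, self.vp_step)
```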
Although step 705 in fig. 7 is described as an example, the update false wake-up azimuth set, the update voiceprint set, and the like in steps 602 to 605 in fig. 6 are similar to this, and the update false wake-up azimuth set, the update voiceprint set, and the like in steps 507 to 511 in fig. 5 are similar to this; and will not be described one by one here.
According to the awakening method in the embodiment of the application shown in fig. 7, the awakening word confidence of the sound is calculated based on the awakening word model, and whether the intelligent sound box is awakened or not is further judged by combining the awakening direction set and the sound pattern set, so that possible mistaken awakening is further screened out under the condition that the intelligent sound box is awakened according to the judgment result output by the awakening word model, the mistaken awakening probability of the intelligent sound box is reduced, and the user experience is improved.
Fig. 8 is a schematic flowchart of a wake-up method according to another embodiment of the present application. The method may be applied to an electronic device such as the smart speaker described above, and may include the following steps:
step 801: receiving a sound, and calculating the wake-up word confidence of the sound, where the wake-up word confidence describes the probability that the sound contains the wake-up word;
step 802: if the wake-up word confidence is greater than or equal to a first threshold, calculating the sound source azimuth of the sound;
step 803: determining whether the sound source azimuth is in a first azimuth set or a second azimuth set, where the first azimuth set includes a plurality of first azimuths that record the sound source azimuths of sounds that did not wake up the smart speaker, and the second azimuth set includes a plurality of second azimuths that record the sound source azimuths of sounds that woke up the smart speaker;
step 804: if the sound source azimuth is only in the first azimuth set, determining whether to wake up the smart speaker according to the confidence of the first azimuth corresponding to the sound source azimuth, where the first azimuth confidence describes the probability that a voice for waking up the smart speaker is uttered from the first azimuth;
step 805: if the sound source azimuth is only in the second azimuth set, determining whether to wake up the smart speaker according to the confidence of the second azimuth corresponding to the sound source azimuth, where the second azimuth confidence describes the probability that a voice for waking up the smart speaker is uttered from the second azimuth.
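Steps 803 to 805 branch on which set or sets contain the sound source azimuth. The sketch below shows that four-way classification. It assumes azimuths in degrees and a hypothetical matching tolerance, because the text does not say how an azimuth is judged to be "in" a set. The two mixed cases (both sets or neither) are handled by the voiceprint fallback described in the implementations that follow.

```python
from enum import Enum
from typing import Iterable

TOLERANCE_DEG = 15.0  # hypothetical matching tolerance


class Membership(Enum):
    ONLY_FIRST = "only_first"    # step 804: decide by the first azimuth confidence
    ONLY_SECOND = "only_second"  # step 805: decide by the second azimuth confidence
    BOTH = "both"                # ambiguous direction: needs another cue
    NEITHER = "neither"          # unknown direction: needs another cue


def angular_distance(a: float, b: float) -> float:
    """Smallest absolute difference between two azimuths in degrees."""
    return abs((a - b + 180.0) % 360.0 - 180.0)


def contains(azimuth_set: Iterable[float], azimuth: float) -> bool:
    return any(angular_distance(s, azimuth) <= TOLERANCE_DEG for s in azimuth_set)


def classify(azimuth: float, first_set: Iterable[float],
             second_set: Iterable[float]) -> Membership:
    in_first = contains(first_set, azimuth)
    in_second = contains(second_set, azimuth)
    if in_first and in_second:
        return Membership.BOTH
    if in_first:
        return Membership.ONLY_FIRST
    if in_second:
        return Membership.ONLY_SECOND
    return Membership.NEITHER
```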
Here, the wake-up word confidence corresponds to the wake-up word confidence computed by the wake-up word model described above, the first azimuth corresponds to the false wake-up azimuth, and the second azimuth corresponds to the wake-up azimuth.
In a possible implementation, the method may further include:
if the sound source azimuth is in both the first azimuth set and the second azimuth set, or in neither of them, extracting a user voiceprint from the sound;
determining whether a first voiceprint set includes the user voiceprint, where the first voiceprint set includes first voiceprints that record the user voiceprints of sounds that woke up the electronic device;
and determining whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
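The text does not specify how a user voiceprint is matched against the first voiceprint set. A common approach, assumed here, is to represent each voiceprint as a fixed-length speaker embedding and compare embeddings by cosine similarity. The sketch below uses hypothetical values for the similarity threshold and for the wake decision threshold (called threshold c later in this embodiment), and it assumes that a voiceprint with no match in the set does not wake the device; the source leaves that case open.

```python
import math
from typing import List, Optional, Sequence, Tuple

SIMILARITY_THRESHOLD = 0.8  # hypothetical: minimum cosine similarity for a match
THRESHOLD_C = 0.5           # hypothetical value for threshold c

# Each entry: (voiceprint embedding, first voiceprint confidence)
VoiceprintSet = List[Tuple[Sequence[float], float]]


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def find_voiceprint(user_vp: Sequence[float], vp_set: VoiceprintSet) -> Optional[int]:
    """Index of the most similar stored voiceprint at or above the threshold, or None."""
    best_idx: Optional[int] = None
    best_sim = SIMILARITY_THRESHOLD
    for idx, (embedding, _) in enumerate(vp_set):
        sim = cosine_similarity(user_vp, embedding)
        if sim >= best_sim:
            best_idx, best_sim = idx, sim
    return best_idx


def wake_by_voiceprint(user_vp: Sequence[float], vp_set: VoiceprintSet) -> bool:
    idx = find_voiceprint(user_vp, vp_set)
    if idx is None:
        return False  # assumption: an unknown voiceprint does not wake the device
    return vp_set[idx][1] >= THRESHOLD_C
```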
In a possible implementation, before the sound source azimuth of the sound is calculated, the method may further include: determining that the wake-up word confidence is less than a second threshold, where the second threshold is greater than the first threshold.
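Together with step 802, this implementation divides the wake-up word confidence into three bands. In the sketch below, the threshold values are hypothetical. Two behaviours are assumptions rather than statements from the text: a confidence below the first threshold does not wake the device, and a confidence at or above the second threshold wakes it directly without the azimuth and voiceprint checks, since those checks are only described for confidences below the second threshold.

```python
from enum import Enum

FIRST_THRESHOLD = 0.6   # hypothetical
SECOND_THRESHOLD = 0.9  # hypothetical; must be greater than FIRST_THRESHOLD


class Gate(Enum):
    REJECT = "reject"  # below the first threshold: assumed not to wake
    CHECK = "check"    # between the thresholds: compute the azimuth and consult the sets
    WAKE = "wake"      # at or above the second threshold: assumed to wake directly


def gate(wake_word_confidence: float) -> Gate:
    if wake_word_confidence < FIRST_THRESHOLD:
        return Gate.REJECT
    if wake_word_confidence >= SECOND_THRESHOLD:
        return Gate.WAKE
    return Gate.CHECK
```

With this split, only sounds in the middle band incur the cost of sound source localization and voiceprint extraction.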
In a possible implementation, determining whether to wake up the electronic device according to the confidence of the first azimuth corresponding to the sound source azimuth may include:
determining whether the confidence of the first azimuth corresponding to the sound source azimuth is less than a threshold a;
if it is less than the threshold a, determining not to wake up the electronic device;
and if it is not less than the threshold a, determining to wake up the electronic device.
In a possible implementation, determining whether to wake up the electronic device according to the confidence of the second azimuth corresponding to the sound source azimuth may include:
determining whether the confidence of the second azimuth corresponding to the sound source azimuth is less than a threshold b;
if it is less than the threshold b, determining not to wake up the electronic device;
and if it is not less than the threshold b, determining to wake up the electronic device.
In a possible implementation, determining whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint may include:
determining whether the confidence of the first voiceprint corresponding to the user voiceprint is less than a threshold c;
if it is less than the threshold c, determining not to wake up the electronic device;
and if it is not less than the threshold c, determining to wake up the electronic device.
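Combining steps 804 and 805 with the three threshold comparisons, the decision for the Fig. 8 embodiment could look like the sketch below. The values of thresholds a, b, and c are hypothetical. The caller is assumed to have already looked up the matching confidences (None meaning the azimuth is not in that set). The voiceprint confidence is passed as a callable so that voiceprint extraction only happens in the fallback case, and an unmatched voiceprint (None) is assumed not to wake the device.

```python
from typing import Callable, Optional

THRESHOLD_A = 0.5  # hypothetical threshold a (first azimuth confidence)
THRESHOLD_B = 0.5  # hypothetical threshold b (second azimuth confidence)
THRESHOLD_C = 0.5  # hypothetical threshold c (first voiceprint confidence)


def decide_fig8(first_conf: Optional[float],
                second_conf: Optional[float],
                voiceprint_conf: Callable[[], Optional[float]]) -> bool:
    """Return True to wake the device.

    first_conf / second_conf: confidence of the matching first / second azimuth,
    or None if the sound source azimuth is not in that set.
    voiceprint_conf: called only when the azimuth alone cannot decide; returns the
    matching first voiceprint confidence, or None if the voiceprint is unknown.
    """
    if first_conf is not None and second_conf is None:   # only in the first set
        return first_conf >= THRESHOLD_A
    if second_conf is not None and first_conf is None:   # only in the second set
        return second_conf >= THRESHOLD_B
    # In both sets or in neither: fall back to the user voiceprint.
    vp_conf = voiceprint_conf()
    return vp_conf is not None and vp_conf >= THRESHOLD_C
```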
In a possible implementation, the method may further include:
if the determination result is to wake up the electronic device and the second azimuth set includes the sound source azimuth of the sound, increasing the confidence of the second azimuth corresponding to the sound source azimuth;
and if the determination result is to wake up the electronic device and the second azimuth set does not include the sound source azimuth of the sound, storing the sound source azimuth in the second azimuth set as a second azimuth, and setting an initial confidence for the second azimuth.
In a possible implementation, the method may further include:
if the determination result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increasing the confidence of the first voiceprint corresponding to the user voiceprint;
and if the determination result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, storing the user voiceprint in the first voiceprint set as a first voiceprint, and setting an initial confidence for the first voiceprint.
In a possible implementation, the method may further include:
if the determination result is not to wake up the electronic device and the first azimuth set includes the sound source azimuth of the sound, reducing the confidence of the first azimuth corresponding to the sound source azimuth;
and if the determination result is not to wake up the electronic device and the first azimuth set does not include the sound source azimuth of the sound, storing the sound source azimuth in the first azimuth set as a first azimuth, and setting an initial confidence for the first azimuth.
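The update rules above form a simple feedback loop: wake-ups reinforce the second azimuth set (and the first voiceprint set), while rejected sounds record a first azimuth or lower its confidence. The sketch below covers the two azimuth sets only; the voiceprint update follows the same create-or-increase pattern as the second azimuth set. The matching tolerance, initial confidence, step size, and clamping to [0, 1] are hypothetical choices not given in the text.

```python
from typing import Dict, Optional

TOLERANCE_DEG = 15.0        # hypothetical matching tolerance
INITIAL_CONFIDENCE = 0.5    # hypothetical
STEP = 0.1                  # hypothetical


def angular_distance(a: float, b: float) -> float:
    return abs((a - b + 180.0) % 360.0 - 180.0)


def find(azimuth_set: Dict[float, float], azimuth: float) -> Optional[float]:
    """Return a stored azimuth within the tolerance, or None."""
    for stored in azimuth_set:
        if angular_distance(stored, azimuth) <= TOLERANCE_DEG:
            return stored
    return None


def update_after_decision(woke: bool, azimuth: float,
                          first_set: Dict[float, float],
                          second_set: Dict[float, float]) -> None:
    if woke:
        key = find(second_set, azimuth)
        if key is None:
            second_set[azimuth] = INITIAL_CONFIDENCE            # record a new wake-up direction
        else:
            second_set[key] = min(1.0, second_set[key] + STEP)  # reinforce the wake-up direction
    else:
        key = find(first_set, azimuth)
        if key is None:
            first_set[azimuth] = INITIAL_CONFIDENCE             # record a new non-wake direction
        else:
            first_set[key] = max(0.0, first_set[key] - STEP)    # lower the chance of waking from here
```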
For the specific implementation of Fig. 8, refer to the embodiment shown in Fig. 5; details are not repeated here.
Fig. 9 is a flowchart of a wake-up method according to another embodiment of the present application. The method may be applied to an electronic device such as the smart speaker described above, and may include the following steps:
step 901: receiving a sound, and calculating the wake-up word confidence of the sound, where the wake-up word confidence describes the probability that the sound contains the wake-up word;
step 902: if the wake-up word confidence is greater than or equal to a first threshold, calculating the sound source azimuth of the sound;
step 903: determining whether the sound source azimuth is in a first azimuth set, where the first azimuth set includes first azimuths that record the sound source azimuths of sounds that did not wake up the electronic device;
step 904: if the sound source azimuth is in the first azimuth set, determining whether to wake up the electronic device according to the confidence of the first azimuth corresponding to the sound source azimuth, where the first azimuth confidence describes the probability that a voice for waking up the electronic device is uttered from the first azimuth.
In a possible implementation, the method may further include:
if the sound source azimuth is not in the first azimuth set, extracting a user voiceprint from the sound;
determining whether a first voiceprint set includes the user voiceprint, where the first voiceprint set includes first voiceprints that record the user voiceprints of sounds that woke up the electronic device;
and determining whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
In a possible implementation, before the sound source azimuth of the sound is calculated, the method may further include:
determining that the wake-up word confidence is less than a second threshold, where the second threshold is greater than the first threshold.
In a possible implementation, determining whether to wake up the electronic device according to the confidence of the first azimuth corresponding to the sound source azimuth may include:
determining whether the confidence of the first azimuth corresponding to the sound source azimuth is less than a threshold a;
if it is less than the threshold a, determining not to wake up the electronic device;
and if it is not less than the threshold a, determining to wake up the electronic device.
In a possible implementation, determining whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint may include:
determining whether the confidence of the first voiceprint corresponding to the user voiceprint is less than a threshold c;
if it is less than the threshold c, determining not to wake up the electronic device;
and if it is not less than the threshold c, determining to wake up the electronic device.
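In the Fig. 9 embodiment, only the first azimuth set is consulted before the voiceprint fallback. A minimal sketch, with hypothetical values for thresholds a and c, with the set lookups assumed to be done by the caller, and with an unmatched voiceprint (None) assumed not to wake the device:

```python
from typing import Optional

THRESHOLD_A = 0.5  # hypothetical threshold a
THRESHOLD_C = 0.5  # hypothetical threshold c


def decide_fig9(first_conf: Optional[float], voiceprint_conf: Optional[float]) -> bool:
    """first_conf: confidence of the matching first azimuth, or None if the sound
    source azimuth is not in the first azimuth set.
    voiceprint_conf: confidence of the matching first voiceprint, or None."""
    if first_conf is not None:
        # Step 904: the azimuth is a recorded non-wake direction.
        return first_conf >= THRESHOLD_A
    # Azimuth not in the first set: decide by the user voiceprint.
    return voiceprint_conf is not None and voiceprint_conf >= THRESHOLD_C
```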
In a possible implementation, the method may further include:
if the determination result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increasing the confidence of the first voiceprint corresponding to the user voiceprint;
and if the determination result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, storing the user voiceprint in the first voiceprint set as a first voiceprint, and setting an initial confidence for the first voiceprint.
In a possible implementation, the method may further include:
if the determination result is not to wake up the electronic device and the first azimuth set includes the sound source azimuth of the sound, reducing the confidence of the first azimuth corresponding to the sound source azimuth;
and if the determination result is not to wake up the electronic device and the first azimuth set does not include the sound source azimuth of the sound, storing the sound source azimuth in the first azimuth set as a first azimuth, and setting an initial confidence for the first azimuth.
For the specific implementation of Fig. 9, refer to the embodiment shown in Fig. 6; details are not repeated here.
Fig. 10 is a flowchart of a wake-up method according to another embodiment of the present application. The method may be applied to an electronic device such as the smart speaker described above, and may include the following steps:
step 1001: receiving a sound, and calculating the wake-up word confidence of the sound, where the wake-up word confidence describes the probability that the sound contains the wake-up word;
step 1002: if the wake-up word confidence is greater than or equal to a first threshold, calculating the sound source azimuth of the sound;
step 1003: determining whether the sound source azimuth is in a second azimuth set, where the second azimuth set includes second azimuths that record the sound source azimuths of sounds that woke up the electronic device;
step 1004: if the sound source azimuth is in the second azimuth set, determining whether to wake up the electronic device according to the confidence of the second azimuth corresponding to the sound source azimuth, where the second azimuth confidence describes the probability that a voice for waking up the electronic device is uttered from the second azimuth.
In a possible implementation, the method may further include:
if the sound source azimuth is not in the second azimuth set, extracting a user voiceprint from the sound;
determining whether a first voiceprint set includes the user voiceprint, where the first voiceprint set includes first voiceprints that record the user voiceprints of sounds that woke up the electronic device;
and determining whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
In a possible implementation, before the sound source azimuth of the sound is calculated, the method may further include:
determining that the wake-up word confidence is less than a second threshold, where the second threshold is greater than the first threshold.
In a possible implementation, determining whether to wake up the electronic device according to the confidence of the second azimuth corresponding to the sound source azimuth may include:
determining whether the confidence of the second azimuth corresponding to the sound source azimuth is less than a threshold b;
if it is less than the threshold b, determining not to wake up the electronic device;
and if it is not less than the threshold b, determining to wake up the electronic device.
In a possible implementation, determining whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint may include:
determining whether the confidence of the first voiceprint corresponding to the user voiceprint is less than a threshold c;
if it is less than the threshold c, determining not to wake up the electronic device;
and if it is not less than the threshold c, determining to wake up the electronic device.
In a possible implementation, the method may further include:
if the determination result is to wake up the electronic device and the second azimuth set includes the sound source azimuth of the sound, increasing the confidence of the second azimuth corresponding to the sound source azimuth;
and if the determination result is to wake up the electronic device and the second azimuth set does not include the sound source azimuth of the sound, storing the sound source azimuth in the second azimuth set as a second azimuth, and setting an initial confidence for the second azimuth.
In a possible implementation, the method may further include:
if the determination result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increasing the confidence of the first voiceprint corresponding to the user voiceprint;
and if the determination result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, storing the user voiceprint in the first voiceprint set as a first voiceprint, and setting an initial confidence for the first voiceprint.
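The Fig. 10 embodiment mirrors Fig. 9 but consults only the second (wake-up) azimuth set, and its update rules run only after a wake-up. The sketch below is a minimal illustration under hypothetical assumptions: azimuths are quantized into integer sectors so that set membership is an exact key lookup, thresholds b and c and the initial confidence are illustrative values, and an unmatched voiceprint does not wake the device. Only the azimuth update is shown; the voiceprint update follows the same pattern.

```python
from typing import Dict, Optional

THRESHOLD_B = 0.5          # hypothetical threshold b
THRESHOLD_C = 0.5          # hypothetical threshold c
INITIAL_CONFIDENCE = 0.5   # hypothetical; chosen equal to threshold b (see note below)
STEP = 0.1                 # hypothetical


def decide_fig10(sector: int, second_set: Dict[int, float],
                 voiceprint_conf: Optional[float]) -> bool:
    if sector in second_set:
        # Step 1004: the azimuth is a recorded wake-up direction.
        return second_set[sector] >= THRESHOLD_B
    # Azimuth not in the second set: decide by the user voiceprint.
    return voiceprint_conf is not None and voiceprint_conf >= THRESHOLD_C


def update_fig10(woke: bool, sector: int, second_set: Dict[int, float]) -> None:
    if not woke:
        return  # this embodiment only updates the sets after a wake-up
    if sector in second_set:
        second_set[sector] = min(1.0, second_set[sector] + STEP)
    else:
        second_set[sector] = INITIAL_CONFIDENCE
```

The initial confidence matters in this variant. If it were set below threshold b, a direction recorded after a voiceprint-approved wake-up would block later wake-ups from the same direction: step 1004 no longer consults the voiceprint once the azimuth is in the set, and the confidence is only raised after a wake-up.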
For the specific implementation of Fig. 10, refer to the embodiment shown in Fig. 7; details are not repeated here.
It should be understood that some or all of the steps or operations in the above embodiments are merely examples; the embodiments of the present application may also perform other operations or variations of these operations. In addition, the steps may be performed in an order different from that presented in the above embodiments, and not all of the operations in the above embodiments necessarily need to be performed.
Fig. 11 is a schematic structural diagram of an electronic device according to an embodiment of the present application. As shown in Fig. 11, the electronic device 1100 may include a calculation unit 1110 and a determining unit 1120.
In one embodiment:
a calculation unit 1110, configured to receive a sound and calculate the wake-up word confidence of the sound, where the wake-up word confidence describes the probability that the sound contains the wake-up word; and, if the wake-up word confidence is greater than or equal to a first threshold, calculate the sound source azimuth of the sound;
a determining unit 1120, configured to determine whether the sound source azimuth is in a first azimuth set or a second azimuth set, where the first azimuth set includes first azimuths that record the sound source azimuths of sounds that did not wake up the electronic device, and the second azimuth set includes second azimuths that record the sound source azimuths of sounds that woke up the electronic device; if the sound source azimuth is only in the first azimuth set, determine whether to wake up the electronic device according to the confidence of the first azimuth corresponding to the sound source azimuth, where the first azimuth confidence describes the probability that a voice for waking up the electronic device is uttered from the first azimuth; and if the sound source azimuth is only in the second azimuth set, determine whether to wake up the electronic device according to the confidence of the second azimuth corresponding to the sound source azimuth, where the second azimuth confidence describes the probability that a voice for waking up the electronic device is uttered from the second azimuth.
In a possible implementation, the determining unit 1120 may be further configured to: if the sound source azimuth is in both the first azimuth set and the second azimuth set, or in neither of them, extract a user voiceprint from the sound; determine whether a first voiceprint set includes the user voiceprint, where the first voiceprint set includes first voiceprints that record the user voiceprints of sounds that woke up the electronic device; and determine whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
In a possible implementation, the determining unit 1120 may be further configured to: before the sound source azimuth of the sound is calculated, determine that the wake-up word confidence is less than a second threshold, where the second threshold is greater than the first threshold.
In a possible implementation, the determining unit 1120 may be specifically configured to: determine whether the confidence of the first azimuth corresponding to the sound source azimuth is less than a threshold a; if it is less than the threshold a, determine not to wake up the electronic device; and if it is not less than the threshold a, determine to wake up the electronic device.
In a possible implementation, the determining unit 1120 may be specifically configured to: determine whether the confidence of the second azimuth corresponding to the sound source azimuth is less than a threshold b; if it is less than the threshold b, determine not to wake up the electronic device; and if it is not less than the threshold b, determine to wake up the electronic device.
In a possible implementation, the determining unit 1120 may be specifically configured to: determine whether the confidence of the first voiceprint corresponding to the user voiceprint is less than a threshold c; if it is less than the threshold c, determine not to wake up the electronic device; and if it is not less than the threshold c, determine to wake up the electronic device.
In a possible implementation, the electronic device 1100 may further include an updating unit, configured to: if the determination result is to wake up the electronic device and the second azimuth set includes the sound source azimuth of the sound, increase the confidence of the second azimuth corresponding to the sound source azimuth; and if the determination result is to wake up the electronic device and the second azimuth set does not include the sound source azimuth of the sound, store the sound source azimuth in the second azimuth set as a second azimuth and set an initial confidence for the second azimuth.
In a possible implementation, the updating unit may be further configured to: if the determination result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increase the confidence of the first voiceprint corresponding to the user voiceprint; and if the determination result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint in the first voiceprint set as a first voiceprint and set an initial confidence for the first voiceprint.
In a possible implementation, the updating unit may be further configured to: if the determination result is not to wake up the electronic device and the first azimuth set includes the sound source azimuth of the sound, reduce the confidence of the first azimuth corresponding to the sound source azimuth; and if the determination result is not to wake up the electronic device and the first azimuth set does not include the sound source azimuth of the sound, store the sound source azimuth in the first azimuth set as a first azimuth and set an initial confidence for the first azimuth.
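The unit split of Fig. 11 could be mirrored in software as separate components, as in the sketch below for the first embodiment. This is illustrative only. The wake-up word model and sound source localizer are injected as hypothetical callables, azimuths are assumed to be quantized to integer sectors so that set membership is an exact lookup, the threshold values are hypothetical, and a return value of None stands for "not decided by the azimuth; fall back to the voiceprint". The updating unit is omitted for brevity.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class CalculationUnit:
    wake_word_model: Callable[[bytes], float]  # hypothetical: sound -> wake-up word confidence
    localizer: Callable[[bytes], int]          # hypothetical: sound -> azimuth sector
    first_threshold: float = 0.6               # hypothetical

    def azimuth_if_candidate(self, sound: bytes) -> Optional[int]:
        """Return the sound source azimuth if the wake-up word confidence passes, else None."""
        if self.wake_word_model(sound) < self.first_threshold:
            return None
        return self.localizer(sound)


@dataclass
class DeterminationUnit:
    first_set: Dict[int, float]   # first azimuth -> confidence
    second_set: Dict[int, float]  # second azimuth -> confidence
    threshold_a: float = 0.5      # hypothetical
    threshold_b: float = 0.5      # hypothetical

    def decide(self, azimuth: int) -> Optional[bool]:
        """True/False if the azimuth decides; None means fall back to the voiceprint."""
        in_first = azimuth in self.first_set
        in_second = azimuth in self.second_set
        if in_first and not in_second:
            return self.first_set[azimuth] >= self.threshold_a
        if in_second and not in_first:
            return self.second_set[azimuth] >= self.threshold_b
        return None


@dataclass
class ElectronicDevice:
    calculation_unit: CalculationUnit
    determination_unit: DeterminationUnit

    def on_sound(self, sound: bytes) -> Optional[bool]:
        azimuth = self.calculation_unit.azimuth_if_candidate(sound)
        if azimuth is None:
            return False  # wake-up word confidence below the first threshold
        return self.determination_unit.decide(azimuth)
```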
In another embodiment:
a calculation unit 1110, configured to receive a sound; calculate the wake-up word confidence of the sound, where the wake-up word confidence describes the probability that the sound contains the wake-up word; and, if the wake-up word confidence is greater than or equal to a first threshold, calculate the sound source azimuth of the sound;
a determining unit 1120, configured to determine whether the sound source azimuth is in a first azimuth set, where the first azimuth set includes first azimuths that record the sound source azimuths of sounds that did not wake up the electronic device; and, if the sound source azimuth is in the first azimuth set, determine whether to wake up the electronic device according to the confidence of the first azimuth corresponding to the sound source azimuth, where the first azimuth confidence describes the probability that a voice for waking up the electronic device is uttered from the first azimuth.
In a possible implementation, the determining unit 1120 may be further configured to: if the sound source azimuth is not in the first azimuth set, extract a user voiceprint from the sound; determine whether a first voiceprint set includes the user voiceprint, where the first voiceprint set includes first voiceprints that record the user voiceprints of sounds that woke up the electronic device; and determine whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
In a possible implementation, the determining unit 1120 may be further configured to: before the sound source azimuth of the sound is calculated, determine that the wake-up word confidence is less than a second threshold, where the second threshold is greater than the first threshold.
In a possible implementation, the determining unit 1120 may be specifically configured to: determine whether the confidence of the first azimuth corresponding to the sound source azimuth is less than a threshold a; if it is less than the threshold a, determine not to wake up the electronic device; and if it is not less than the threshold a, determine to wake up the electronic device.
In a possible implementation, the determining unit 1120 may be specifically configured to: determine whether the confidence of the first voiceprint corresponding to the user voiceprint is less than a threshold c; if it is less than the threshold c, determine not to wake up the electronic device; and if it is not less than the threshold c, determine to wake up the electronic device.
In a possible implementation, the electronic device 1100 may further include:
an updating unit, configured to: if the determination result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increase the confidence of the first voiceprint corresponding to the user voiceprint; and if the determination result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint in the first voiceprint set as a first voiceprint and set an initial confidence for the first voiceprint.
In a possible implementation, the updating unit may be further configured to: if the determination result is not to wake up the electronic device and the first azimuth set includes the sound source azimuth of the sound, reduce the confidence of the first azimuth corresponding to the sound source azimuth; and if the determination result is not to wake up the electronic device and the first azimuth set does not include the sound source azimuth of the sound, store the sound source azimuth in the first azimuth set as a first azimuth and set an initial confidence for the first azimuth.
In yet another embodiment:
a calculation unit 1110, configured to receive a sound; calculate the wake-up word confidence of the sound, where the wake-up word confidence describes the probability that the sound contains the wake-up word; and, if the wake-up word confidence is greater than or equal to a first threshold, calculate the sound source azimuth of the sound;
a determining unit 1120, configured to determine whether the sound source azimuth is in a second azimuth set, where the second azimuth set includes second azimuths that record the sound source azimuths of sounds that woke up the electronic device; and, if the sound source azimuth is in the second azimuth set, determine whether to wake up the electronic device according to the confidence of the second azimuth corresponding to the sound source azimuth, where the second azimuth confidence describes the probability that a voice for waking up the electronic device is uttered from the second azimuth.
In a possible implementation, the determining unit 1120 may be further configured to: if the sound source azimuth is not in the second azimuth set, extract a user voiceprint from the sound; determine whether a first voiceprint set includes the user voiceprint, where the first voiceprint set includes first voiceprints that record the user voiceprints of sounds that woke up the electronic device; and determine whether to wake up the electronic device according to the confidence of the first voiceprint corresponding to the user voiceprint.
In a possible implementation, the determining unit 1120 may be further configured to: before the sound source azimuth of the sound is calculated, determine that the wake-up word confidence is less than a second threshold, where the second threshold is greater than the first threshold.
In a possible implementation, the determining unit 1120 may be specifically configured to: determine whether the confidence of the second azimuth corresponding to the sound source azimuth is less than a threshold b; if it is less than the threshold b, determine not to wake up the electronic device; and if it is not less than the threshold b, determine to wake up the electronic device.
In a possible implementation, the determining unit 1120 may be specifically configured to: determine whether the confidence of the first voiceprint corresponding to the user voiceprint is less than a threshold c; if it is less than the threshold c, determine not to wake up the electronic device; and if it is not less than the threshold c, determine to wake up the electronic device.
In a possible implementation, the electronic device 1100 may further include:
an updating unit, configured to: if the determination result is to wake up the electronic device and the second azimuth set includes the sound source azimuth of the sound, increase the confidence of the second azimuth corresponding to the sound source azimuth; and if the determination result is to wake up the electronic device and the second azimuth set does not include the sound source azimuth of the sound, store the sound source azimuth in the second azimuth set as a second azimuth and set an initial confidence for the second azimuth.
In a possible implementation, the updating unit may be further configured to: if the determination result is to wake up the electronic device and the first voiceprint set includes the user voiceprint of the sound, increase the confidence of the first voiceprint corresponding to the user voiceprint; and if the determination result is to wake up the electronic device and the first voiceprint set does not include the user voiceprint of the sound, store the user voiceprint in the first voiceprint set as a first voiceprint and set an initial confidence for the first voiceprint.
The electronic device provided in the embodiment shown in Fig. 11 may be used to execute the technical solutions of the method embodiments shown in Fig. 5 to Fig. 7 of the present application. For its implementation principles and technical effects, refer to the related descriptions in the method embodiments.
It should be understood that the division of the apparatus shown in Fig. 11 into units is merely a logical division. In an actual implementation, the units may be wholly or partially integrated into one physical entity, or may be physically separate. The units may all be implemented as software invoked by a processing element, or all be implemented as hardware; alternatively, some units may be implemented as software invoked by a processing element and the others as hardware. For example, the obtaining unit may be a separately disposed processing element, or may be integrated into a chip of the electronic device. The other units are implemented similarly. In addition, all or some of the units may be integrated together or implemented independently. In an implementation process, the steps of the above method or the above units may be completed by integrated logic circuits of hardware in a processor element, or by instructions in the form of software.
An embodiment of the present application further provides an electronic device, including: a processor; a memory; and a computer program, wherein the computer program is stored in the memory and comprises instructions which, when executed by the electronic device, cause the electronic device to perform the methods shown in fig. 5 to fig. 7.
Embodiments of the present application further provide a computer-readable storage medium, in which a computer program is stored, and when the computer program runs on a computer, the computer is enabled to execute the method provided by the embodiments shown in fig. 5 to 7 of the present application.
Embodiments of the present application further provide a computer program product, which includes a computer program; when the computer program runs on a computer, the computer is caused to execute the method provided by the embodiments shown in fig. 5 to 7 of the present application.
Those of ordinary skill in the art will appreciate that the various units and algorithm steps described in connection with the embodiments disclosed herein can be implemented by electronic hardware, or by a combination of computer software and electronic hardware. Whether such functions are performed by hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functions in different ways for each particular application, but such implementation decisions should not be interpreted as departing from the scope of the present application.
It is clear to those skilled in the art that, for convenience and brevity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, any function, if implemented in the form of a software functional unit and sold or used as an independent product, may be stored in a computer-readable storage medium. Based on such an understanding, the technical solution of the present application, or the part of it that contributes to the prior art, may be embodied in the form of a software product that is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or some of the steps of the methods according to the embodiments of the present application. The aforementioned storage medium includes any medium capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disc.
The above descriptions are merely specific embodiments of the present application; any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (21)

1. A wake-up method applied to an electronic device including a sound pickup and a speaker, the sound pickup including a plurality of microphones, the method comprising:
receiving a sound;
calculating a wake-word confidence of the sound, the wake-word confidence representing the probability that the sound includes a wake word;
when the wake-word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound;
when the sound source orientation matches one first orientation in a first orientation set, and
the first-orientation confidence corresponding to the matched first orientation is greater than or equal to a third threshold, waking up the electronic device; or,
the first-orientation confidence corresponding to the matched first orientation is less than the third threshold, not waking up the electronic device;
wherein the wake word is used to wake up the electronic device; the sound source orientation is the direction and position of the sound source relative to the electronic device; the first orientation set includes M first orientation elements, each including a first orientation and a first-orientation confidence; a first orientation is the direction and position, relative to the electronic device, of a sound source that woke up the electronic device, and indicates that the electronic device was woken up from that first orientation; the first-orientation confidence represents the probability that the electronic device is woken up from the first orientation; and M is a positive integer greater than or equal to 1.
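The decision flow of claim 1 can be sketched as below, assuming the wake-word confidence and the sound source orientation have already been computed (the claim does not say how). The function name `decide_wake_claim1`, the `(orientation, confidence)` tuple layout and the `matches` predicate are hypothetical. Returning `None` marks cases that claim 1 itself does not decide: a wake-word confidence below the first threshold, or no matching first orientation (claim 4 handles that case with a voiceprint).

```python
from typing import Any, Callable, Optional, Sequence, Tuple

# (first orientation, first-orientation confidence); the orientation format is an assumption.
FirstOrientationElement = Tuple[Any, float]


def decide_wake_claim1(
    wake_word_confidence: float,
    source_orientation: Any,
    first_orientation_set: Sequence[FirstOrientationElement],
    matches: Callable[[Any, Any], bool],
    first_threshold: float,
    third_threshold: float,
) -> Optional[bool]:
    """True: wake up; False: do not wake up; None: claim 1 leaves the outcome open."""
    if wake_word_confidence < first_threshold:
        return None  # the orientation is only evaluated once the first threshold is reached
    for orientation, confidence in first_orientation_set:
        if matches(source_orientation, orientation):
            # Wake only if the matched first orientation is trusted enough.
            return confidence >= third_threshold
    return None  # no first orientation matched
```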
2. The method of claim 1,
wherein calculating the sound source orientation of the sound when the wake-word confidence is greater than or equal to the first threshold comprises:
calculating the sound source orientation of the sound when the wake-word confidence is greater than or equal to the first threshold and less than a second threshold.
3. The method according to claim 1 or 2, wherein the sound source orientation matching one first orientation in the first orientation set comprises:
the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that first orientation relative to the electronic device is within a preset fourth threshold; and
the deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within a preset fifth threshold.
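One way to implement the matching test of claim 3 is sketched below. Purely for illustration, it assumes that an orientation is a horizontal azimuth in degrees plus a distance in metres, that the angular deviation wraps around at 360 degrees, and that "within a threshold" means less than or equal to it; the claim fixes none of these details, and the names `Orientation`, `angular_deviation` and `orientation_matches` are hypothetical.

```python
from typing import NamedTuple


class Orientation(NamedTuple):
    azimuth_deg: float  # direction of the source relative to the device (assumed azimuth)
    distance_m: float   # position of the source, reduced here to a distance (assumption)


def angular_deviation(a_deg: float, b_deg: float) -> float:
    """Smallest angle between two azimuths, accounting for wrap-around at 360 degrees."""
    d = abs(a_deg - b_deg) % 360.0
    return min(d, 360.0 - d)


def orientation_matches(source: Orientation, stored: Orientation,
                        fourth_threshold_deg: float, fifth_threshold_m: float) -> bool:
    """Both the direction deviation and the position deviation must be within their thresholds."""
    return (angular_deviation(source.azimuth_deg, stored.azimuth_deg) <= fourth_threshold_deg
            and abs(source.distance_m - stored.distance_m) <= fifth_threshold_m)
```

For example, `orientation_matches(Orientation(355.0, 2.0), Orientation(5.0, 2.3), 15.0, 0.5)` is true, because the two azimuths are 10 degrees apart across the 0/360 boundary and the distances differ by 0.3 m.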
4. The method according to any one of claims 1-3, further comprising:
when the sound source orientation does not match any first orientation in the first orientation set,
extracting a voiceprint from the sound;
when the voiceprint matches one first voiceprint in a first voiceprint set, and
the first-voiceprint confidence corresponding to that first voiceprint is greater than or equal to a preset sixth threshold, waking up the electronic device; or,
the first-voiceprint confidence corresponding to that first voiceprint is less than the preset sixth threshold, not waking up the electronic device;
wherein the first voiceprint set includes L voiceprint elements, each including a first voiceprint and a first-voiceprint confidence; a first voiceprint represents a voiceprint that woke up the electronic device, and the first-voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device; and L is a positive integer greater than or equal to 1.
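The voiceprint fallback of claim 4 could look like the sketch below. The claim does not say how voiceprints are represented or compared; here they are assumed to be fixed-length embedding vectors compared by cosine similarity against a hypothetical `similarity_threshold`, a common technique but not necessarily the one intended. `None` is returned when no first voiceprint matches, since the claim leaves that case open.

```python
import math
from typing import Optional, Sequence, Tuple

Voiceprint = Sequence[float]  # assumed: a speaker embedding vector


def cosine_similarity(a: Voiceprint, b: Voiceprint) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def decide_by_voiceprint(
    voiceprint: Voiceprint,
    first_voiceprint_set: Sequence[Tuple[Voiceprint, float]],
    similarity_threshold: float,
    sixth_threshold: float,
) -> Optional[bool]:
    """True: wake up; False: do not wake up; None: no first voiceprint matched."""
    for stored, confidence in first_voiceprint_set:
        if cosine_similarity(voiceprint, stored) >= similarity_threshold:
            # Matched a first voiceprint: wake only if its confidence is high enough.
            return confidence >= sixth_threshold
    return None
```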
5. The method of any of claims 1-4, wherein after waking the electronic device, the method further comprises: updating the first set of orientations and the first set of voiceprints.
6. The method according to any one of claims 2 to 5,
wherein when the wake-word confidence is greater than or equal to the second threshold, the electronic device is woken up and the first orientation set and the first voiceprint set are updated.
7. A wake-up method applied to an electronic device including a sound pickup and a speaker, the sound pickup including a plurality of microphones, the method comprising:
receiving a sound;
calculating a wake-word confidence of the sound, the wake-word confidence representing the probability that the sound includes a wake word;
when the wake-word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound;
when the sound source orientation matches one second orientation in a second orientation set, and
the second-orientation confidence corresponding to the matched second orientation is greater than or equal to a seventh threshold, waking up the electronic device; or,
the second-orientation confidence corresponding to the matched second orientation is less than the seventh threshold, not waking up the electronic device;
wherein the wake word is used to wake up the electronic device; the sound source orientation is the direction and position of the sound source relative to the electronic device; the second orientation set includes N second orientation elements, each including a second orientation and a second-orientation confidence; a second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up from that second orientation; the second-orientation confidence represents the probability that the electronic device is not woken up from the second orientation; and N is a positive integer greater than or equal to 1.
8. The method according to claim 7, wherein the sound source orientation matching one second orientation in the second orientation set comprises:
the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that second orientation relative to the electronic device is within a preset eighth threshold; and
the deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset ninth threshold.
9. The method according to claim 7 or 8, characterized in that the method further comprises:
when the sound source orientation does not match any second orientation in the second orientation set,
extracting a voiceprint from the sound; and
when the voiceprint does not match any first voiceprint in a first voiceprint set, updating the second orientation set;
wherein the first voiceprint set includes L voiceprint elements, each including a first voiceprint and a first-voiceprint confidence; a first voiceprint represents a voiceprint that woke up the electronic device, and the first-voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device; and L is a positive integer greater than or equal to 1.
10. The method according to claim 7 or 8, characterized in that the method further comprises:
when the sound source orientation does not match any second orientation in the second orientation set,
extracting a voiceprint from the sound;
when the voiceprint matches one first voiceprint in a first voiceprint set, and
the first-voiceprint confidence corresponding to that first voiceprint is greater than or equal to a preset tenth threshold, waking up the electronic device; or,
the first-voiceprint confidence corresponding to that first voiceprint is less than the preset tenth threshold, not waking up the electronic device and updating the second orientation set;
wherein the first voiceprint set includes L voiceprint elements, each including a first voiceprint and a first-voiceprint confidence; the first-voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device, and a first voiceprint represents a voiceprint that woke up the electronic device; and L is a positive integer greater than or equal to 1.
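Claims 7, 9, 10 and 11 together describe a flow built around the second orientation set; a minimal sketch follows. As the claims are worded, a matched second orientation leads to wake-up when its second-orientation confidence reaches the seventh threshold. The `Decision` record, the function name `decide_with_second_set` and the two `*_matches` predicates are hypothetical, and `wake=None` marks the claim 9 case, where the claims state only that the second orientation set is updated.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Sequence, Tuple

Matcher = Callable[[Any, Any], bool]


@dataclass
class Decision:
    wake: Optional[bool]              # None: the claims leave the outcome open
    update_first_voiceprints: bool    # claim 11: after waking up
    update_second_orientations: bool  # claims 9-11: after not waking up / no voiceprint match


def _after(wake: bool) -> Decision:
    # Claim 11: update the first voiceprint set after waking, the second orientation set otherwise.
    return Decision(wake, wake, not wake)


def decide_with_second_set(
    source_orientation: Any,
    second_orientation_set: Sequence[Tuple[Any, float]],
    orientation_matches: Matcher,
    seventh_threshold: float,
    voiceprint: Any,
    first_voiceprint_set: Sequence[Tuple[Any, float]],
    voiceprint_matches: Matcher,
    tenth_threshold: float,
) -> Decision:
    for orientation, confidence in second_orientation_set:
        if orientation_matches(source_orientation, orientation):
            return _after(confidence >= seventh_threshold)  # claim 7
    # No second orientation matched: fall back to the voiceprint (claims 9 and 10).
    for stored, confidence in first_voiceprint_set:
        if voiceprint_matches(voiceprint, stored):
            return _after(confidence >= tenth_threshold)  # claim 10
    return Decision(None, False, True)  # claim 9: no first voiceprint matched
```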
11. The method according to any one of claims 7 to 10,
after waking the electronic device, the method further comprises: updating the first set of voiceprints;
after not waking the electronic device, the method further comprises: updating the second set of orientations.
12. The method according to any one of claims 9 to 11,
wherein when the wake-word confidence is greater than or equal to a second threshold, the electronic device is woken up and the first voiceprint set is updated.
13. A wake-up method applied to an electronic device including a sound pickup and a speaker, the sound pickup including a plurality of microphones, the method comprising:
receiving a sound;
calculating a wake-word confidence of the sound, the wake-word confidence representing the probability that the sound includes a wake word;
when the wake-word confidence is greater than or equal to a first threshold, calculating a sound source orientation of the sound;
when the sound source orientation matches one second orientation in a second orientation set and does not match any first orientation in a first orientation set, and
the second-orientation confidence corresponding to the matched second orientation is greater than or equal to an eleventh threshold, waking up the electronic device; or,
the second-orientation confidence corresponding to the matched second orientation is less than the eleventh threshold, not waking up the electronic device;
wherein the wake word is used to wake up the electronic device; the sound source orientation is the direction and position of the sound source relative to the electronic device; the first orientation set includes M first orientation elements, each including a first orientation and a first-orientation confidence; a first orientation is the direction and position, relative to the electronic device, of a sound source that woke up the electronic device, and indicates that the electronic device was woken up from that first orientation; the first-orientation confidence represents the probability that the electronic device is woken up from the first orientation; the second orientation set includes N second orientation elements, each including a second orientation and a second-orientation confidence; a second orientation is the direction and position, relative to the electronic device, of a sound source that did not wake up the electronic device, and indicates that the electronic device was not woken up from that second orientation; the second-orientation confidence represents the confidence that the electronic device is not woken up from the second orientation; and M and N are positive integers greater than or equal to 1.
14. The method of claim 13,
wherein the sound source orientation matching one second orientation in the second orientation set comprises:
the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that second orientation relative to the electronic device is within a preset twelfth threshold; and
the deviation between the position of the sound source orientation relative to the electronic device and the position of that second orientation relative to the electronic device is within a preset thirteenth threshold;
and the sound source orientation not matching any first orientation in the first orientation set comprises:
the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of any first orientation in the first orientation set relative to the electronic device is not within a preset fourteenth threshold; and
the deviation between the position of the sound source orientation relative to the electronic device and the position of any first orientation in the first orientation set relative to the electronic device is not within a preset fifteenth threshold.
15. The method of claim 13, further comprising:
when the sound source orientation matches one first orientation in the first orientation set and does not match any second orientation in the second orientation set, and
the first-orientation confidence corresponding to the matched first orientation is greater than or equal to a sixteenth threshold, waking up the electronic device; or,
the first-orientation confidence corresponding to the matched first orientation is less than the sixteenth threshold, not waking up the electronic device.
16. The method of claim 15,
wherein the sound source orientation matching one first orientation in the first orientation set comprises:
the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of that first orientation relative to the electronic device is within a preset fourteenth threshold; and
the deviation between the position of the sound source orientation relative to the electronic device and the position of that first orientation relative to the electronic device is within a preset fifteenth threshold;
and the sound source orientation not matching any second orientation in the second orientation set comprises:
the angular deviation between the direction of the sound source orientation relative to the electronic device and the direction of any second orientation in the second orientation set relative to the electronic device is not within a preset twelfth threshold; and
the deviation between the position of the sound source orientation relative to the electronic device and the position of any second orientation in the second orientation set relative to the electronic device is not within a preset thirteenth threshold.
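Claims 13 to 16 combine both orientation sets. The sketch below (all names hypothetical) takes separate match predicates for the first set (fourteenth/fifteenth thresholds) and the second set (twelfth/thirteenth thresholds), and treats "does not match any" as the plain negation of the match test, which simplifies the wording of claims 14 and 16. It returns `None` when the source orientation matches neither set (claim 17 then uses the voiceprint) or both sets, a case the claims do not address. As claim 13 is worded, a matched second orientation leads to wake-up when its confidence reaches the eleventh threshold.

```python
from typing import Any, Callable, Optional, Sequence, Tuple

Matcher = Callable[[Any, Any], bool]


def _first_match(source: Any, elements: Sequence[Tuple[Any, float]],
                 matches: Matcher) -> Optional[float]:
    """Confidence of the first element whose orientation matches `source`, or None."""
    return next((conf for orientation, conf in elements if matches(source, orientation)), None)


def decide_with_both_sets(
    source_orientation: Any,
    first_orientation_set: Sequence[Tuple[Any, float]],
    second_orientation_set: Sequence[Tuple[Any, float]],
    matches_first: Matcher,
    matches_second: Matcher,
    eleventh_threshold: float,
    sixteenth_threshold: float,
) -> Optional[bool]:
    first_conf = _first_match(source_orientation, first_orientation_set, matches_first)
    second_conf = _first_match(source_orientation, second_orientation_set, matches_second)
    if second_conf is not None and first_conf is None:
        return second_conf >= eleventh_threshold  # claim 13, as worded
    if first_conf is not None and second_conf is None:
        return first_conf >= sixteenth_threshold  # claim 15
    return None  # matches neither set (see claim 17) or both sets
```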
17. The method of claim 13, further comprising:
when the sound source orientation does not match any second orientation in the second orientation set and does not match any first orientation in the first orientation set,
extracting a voiceprint from the sound;
when the voiceprint matches one first voiceprint in a first voiceprint set, and
the first-voiceprint confidence corresponding to that first voiceprint is greater than or equal to a preset sixteenth threshold, waking up the electronic device and updating the first orientation set and the first voiceprint set; or,
the first-voiceprint confidence corresponding to that first voiceprint is less than the preset sixteenth threshold, not waking up the electronic device and updating the second orientation set;
wherein the first voiceprint set includes L voiceprint elements, each including a first voiceprint and a first-voiceprint confidence; the first-voiceprint confidence represents the probability that the first voiceprint wakes up the electronic device, and a first voiceprint represents a voiceprint that woke up the electronic device; and L is a positive integer greater than or equal to 1.
18. The method of claim 17, further comprising:
updating the second orientation set after the voiceprint does not match any one of the first voiceprints in the first set of voiceprints.
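When the source orientation matches neither orientation set, claims 17 and 18 fall back to the voiceprint and also specify which sets to update. A minimal sketch, with the hypothetical name `voiceprint_fallback` and a caller-supplied `voiceprint_matches` predicate; the returned strings are only illustrative labels for the sets named in the claims, and `None` marks the claim 18 case, where only the set update is stated.

```python
from typing import Any, Callable, FrozenSet, Optional, Sequence, Tuple


def voiceprint_fallback(
    voiceprint: Any,
    first_voiceprint_set: Sequence[Tuple[Any, float]],
    voiceprint_matches: Callable[[Any, Any], bool],
    sixteenth_threshold: float,
) -> Tuple[Optional[bool], FrozenSet[str]]:
    """Return (wake decision, labels of the sets to update)."""
    for stored, confidence in first_voiceprint_set:
        if voiceprint_matches(voiceprint, stored):
            if confidence >= sixteenth_threshold:
                # Claim 17: wake up and update the first orientation and voiceprint sets.
                return True, frozenset({"first_orientation_set", "first_voiceprint_set"})
            # Claim 17: do not wake up; update the second orientation set.
            return False, frozenset({"second_orientation_set"})
    # Claim 18: no first voiceprint matched; update the second orientation set.
    return None, frozenset({"second_orientation_set"})
```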
19. The method according to any one of claims 13-16, further comprising:
updating the first set of orientations after waking up the electronic device;
updating the second set of orientations after not waking up the electronic device.
20. An electronic device comprising a sound pickup and a speaker, the sound pickup comprising a plurality of microphones, the electronic device further comprising:
a processor;
a memory;
and a computer program, wherein the computer program is stored in the memory, which when executed by the processor causes the electronic device to perform the method of any of claims 1-19.
21. A computer-readable storage medium comprising a computer program which, when run on an electronic device, causes the electronic device to perform the method according to any one of claims 1-19, wherein the electronic device comprises a sound pickup and a speaker, the sound pickup comprising a plurality of microphones.
CN202011063583.4A 2020-09-30 2020-09-30 Electronic equipment and awakening method thereof Pending CN114360546A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202011063583.4A CN114360546A (en) 2020-09-30 2020-09-30 Electronic equipment and awakening method thereof
PCT/CN2021/120305 WO2022068694A1 (en) 2020-09-30 2021-09-24 Electronic device and wake-up method thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011063583.4A CN114360546A (en) 2020-09-30 2020-09-30 Electronic equipment and awakening method thereof

Publications (1)

Publication Number Publication Date
CN114360546A (en) 2022-04-15

Family

ID=80951134

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011063583.4A Pending CN114360546A (en) 2020-09-30 2020-09-30 Electronic equipment and awakening method thereof

Country Status (2)

Country Link
CN (1) CN114360546A (en)
WO (1) WO2022068694A1 (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115376524A (en) * 2022-07-15 2022-11-22 荣耀终端有限公司 Voice awakening method, electronic equipment and chip system
US11762052B1 (en) * 2021-09-15 2023-09-19 Amazon Technologies, Inc. Sound source localization

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP6276503B2 (en) * 2012-12-28 2018-02-07 アルパイン株式会社 Audio equipment
CN108800473A (en) * 2018-07-20 2018-11-13 珠海格力电器股份有限公司 Control method and device, storage medium and the electronic device of equipment
CN110428810B (en) * 2019-08-30 2020-10-30 北京声智科技有限公司 Voice wake-up recognition method and device and electronic equipment
CN110727821A (en) * 2019-10-12 2020-01-24 深圳海翼智新科技有限公司 Method, apparatus, system and computer storage medium for preventing device from being awoken by mistake

Also Published As

Publication number Publication date
WO2022068694A1 (en) 2022-04-07

Similar Documents

Publication Publication Date Title
CN104902086B (en) Alarm clock ringing method and device
CN113873378B (en) Earphone noise processing method and device and earphone
CN107580113B (en) Reminding method, device, storage medium and terminal
CN112397062A (en) Voice interaction method, device, terminal and storage medium
WO2014008843A1 (en) Method for updating voiceprint feature model and terminal
CN109284080B (en) Sound effect adjusting method and device, electronic equipment and storage medium
CN108320751B (en) Voice interaction method, device, equipment and server
CN107371102B (en) Audio playing volume control method and device, storage medium and mobile terminal
WO2020006711A1 (en) Message playing method and terminal
WO2022068694A1 (en) Electronic device and wake-up method thereof
CN110556127A (en) method, device, equipment and medium for detecting voice recognition result
CN110830368B (en) Instant messaging message sending method and electronic equipment
CN114189790B (en) Audio information processing method, electronic device, system, product and medium
CN114299933A (en) Speech recognition model training method, device, equipment, storage medium and product
US20230345196A1 (en) Augmented reality interaction method and electronic device
CN111276122A (en) Audio generation method and device and storage medium
CN111081275B (en) Terminal processing method and device based on sound analysis, storage medium and terminal
WO2022267468A1 (en) Sound processing method and apparatus thereof
WO2022143258A1 (en) Voice interaction processing method and related apparatus
CN114333774A (en) Speech recognition method, speech recognition device, computer equipment and storage medium
CN111081102B (en) Dictation result detection method and learning equipment
CN114765026A (en) Voice control method, device and system
CN113965643A (en) Screen state control method of mobile terminal, mobile terminal and medium
CN107682579B (en) Incoming call reminding control method and device, storage medium and mobile terminal
CN116828102B (en) Recording method, recording device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination