CN111696562B - Voice wake-up method, device and storage medium

Info

Publication number
CN111696562B
Authority
CN
China
Prior art keywords
awakening
electronic device
wake
threshold
awakened
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010353897.1A
Other languages
Chinese (zh)
Other versions
CN111696562A (en
Inventor
李树为
孙渊
屈伸
蒋幼宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/22 Interactive procedures; Man-machine interfaces
    • G10L17/24 Interactive procedures; Man-machine interfaces the user being prompted to utter a password or a predefined phrase
    • G10L17/06 Decision making techniques; Pattern matching strategies
    • G10L17/08 Use of distortion metrics or a particular distance between probe pattern and reference templates
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Electric Clocks (AREA)

Abstract

Embodiments of the application provide a voice wake-up method, a device, and a storage medium that address the low wake-up rate of the master device in a device group. In the method, the master device brings the wake-up recognition results of the slave devices in the device group into its own wake-up decision: it determines whether it is woken up according to both its own wake-up recognition result and those of the slave devices, which improves the accuracy of waking up the master device in the device group. Furthermore, the master device can dynamically adjust its preset wake-up threshold in a continuous or discrete manner and determine whether it is woken up according to the adjusted threshold. The dynamically adjusted threshold better matches the actual wake-up state of the whole device group, so a wake-up decision based on it is more accurate.

Description

Voice wake-up method, device and storage medium
Technical Field
The present application relates to the field of terminal technologies, and in particular, to a voice wake-up method, device, and storage medium.
Background
With the rise of intelligent voice interaction, more and more devices support voice interaction. Voice wake-up is the starting point of voice interaction and is widely used in devices such as smart speakers and smart televisions. When several devices that support voice wake-up are in the same space as the user and the user utters the wake-up phrase, all of the woken devices may respond to the speaker's request and interact with the user at the same time. The user may then be unsure which device to interact with.
At present, for this scenario, the devices that the user can wake up at the same time may be combined into a device group. Only the master device in the group responds to the user's wake-up, and the slave devices cooperate with the master device to process the user's instruction, which prevents multiple devices from being woken up at the same time.
However, the device group contains multiple wakeable devices, and using only the wake-up result of the master device as the result for the whole group limits wake-up accuracy. For example, if the user is far from the master device, or there is external interference on the transmission path, the wake-up rate of the master device drops.
Disclosure of Invention
Embodiments of the application provide a voice wake-up method, a device, and a storage medium that improve the wake-up rate of the master device in a device group.
In a first aspect, an embodiment of the present application provides a voice wake-up method. The method is applied to a first electronic device that belongs to the same device group as at least one second electronic device, and includes: acquiring a first wake-up confidence of audio data, where the first wake-up confidence indicates the acoustic-feature similarity, as determined by the first electronic device, between a wake-up word in the audio data and a preset wake-up word; receiving a wake-up recognition result sent by the at least one second electronic device, where the wake-up recognition result indicates whether the at least one second electronic device is allowed or prohibited to be woken up; and determining, according to the first wake-up confidence and the wake-up recognition result, whether to allow or prohibit the first electronic device from being woken up.
The first electronic device is a master device in the device group, and the second electronic device is a slave device in the device group.
In this scheme, the master device brings the wake-up recognition results of the slave devices in the device group into its own wake-up decision, and determines whether it is woken up according to both its own wake-up recognition result and those of the slave devices. The scheme allows a quick decision on whether the master device should respond to the wake-up, makes full use of the wake-up recognition results of the other devices in the group, and improves the wake-up accuracy of the master device.
Optionally, the wake-up recognition result includes at least one of a wake-up identifier and a second wake-up confidence of the at least one second electronic device. The wake-up identifier indicates that the second electronic device is allowed or prohibited to be woken up, and the second wake-up confidence indicates the acoustic-feature similarity, as determined by the second electronic device, between the wake-up word in the audio data and the preset wake-up word.
If the wake-up recognition result includes only the wake-up identifier of the at least one second electronic device, the first electronic device may use the identifiers to count the second electronic devices that are allowed to be woken up and to compute the ratio of that count to the total number of devices in the device group.
If the wake-up recognition result includes only the second wake-up confidence of the at least one second electronic device, the first electronic device first determines, from each second wake-up confidence and the wake-up threshold preset for the corresponding second electronic device, whether that device is allowed to be woken up. It then counts the second electronic devices allowed to be woken up and computes the ratio of that count to the total number of devices in the device group.
If the wake-up recognition result includes both the wake-up identifier and the second wake-up confidence of the at least one second electronic device, the first electronic device may use either of the above counting methods to determine the number of second electronic devices allowed to be woken up and the ratio of that number to the total number of devices in the device group.
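The three counting cases above can be sketched as follows. This is a minimal illustration, not the patented implementation: the `WakeResult` layout, the default threshold of 0.5, the use of `>=` for the slave-side comparison, and treating an empty report as "not woken" are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class WakeResult:
    """Wake-up recognition result reported by one slave device (hypothetical layout)."""
    allowed: Optional[bool] = None      # wake-up identifier, if the slave reported one
    confidence: Optional[float] = None  # second wake-up confidence, if reported
    threshold: float = 0.5              # assumed preset wake-up threshold of that slave


def is_allowed(result: WakeResult) -> bool:
    """Decide whether one slave counts as allowed to be woken up."""
    if result.allowed is not None:
        # Identifier present: use it directly.
        return result.allowed
    if result.confidence is not None:
        # Only a confidence: compare it with the slave's preset threshold (assumed >=).
        return result.confidence >= result.threshold
    return False  # nothing reported: treated as not woken (assumption)


def count_allowed(results: Sequence[WakeResult], group_size: int) -> Tuple[int, float]:
    """Return (number of slaves allowed to wake, ratio to the total group size)."""
    n = sum(1 for r in results if is_allowed(r))
    return n, n / group_size
```

Here `group_size` is passed in explicitly because the text counts against the total number of devices in the group; whether that total includes the master device is left to the caller.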
In one possible design, determining whether to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the wake-up recognition result includes: if the first wake-up confidence is greater than or equal to a first threshold, determining that the first electronic device is allowed to be woken up; or, if the first wake-up confidence is less than the first threshold and greater than a second threshold, determining whether to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the wake-up recognition result; or, if the first wake-up confidence is less than or equal to the second threshold, determining that the first electronic device is prohibited from being woken up.
In this scheme, based on the preset wake-up condition of the master device, whether the master device is allowed or prohibited to be woken up is determined by comparing the first wake-up confidence computed by the master device with the first threshold and the second threshold. If the first wake-up confidence lies between the second threshold and the first threshold, the wake-up recognition results sent by the slave devices in the device group are also taken into account. The scheme allows a quick decision on whether the master device should respond to the wake-up, makes full use of the wake-up recognition results of the other devices in the group, and improves the wake-up accuracy of the master device.
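The two-threshold decision described above can be sketched as a three-way branch. The names `decide_master` and `group_vote` are hypothetical; `group_vote` stands in for whatever rule the master applies to the slave devices' wake-up recognition results in the middle band.

```python
from enum import Enum
from typing import Callable


class Decision(Enum):
    ALLOW = "allow"
    PROHIBIT = "prohibit"


def decide_master(confidence: float,
                  first_threshold: float,
                  second_threshold: float,
                  group_vote: Callable[[], bool]) -> Decision:
    """Three-way wake-up decision for the master device.

    group_vote is only consulted when the confidence lies strictly between
    the second threshold and the first threshold.
    """
    if second_threshold >= first_threshold:
        raise ValueError("second_threshold must be below first_threshold")
    if confidence >= first_threshold:
        return Decision.ALLOW        # confident enough on its own
    if confidence <= second_threshold:
        return Decision.PROHIBIT     # too low for the group to rescue
    # Middle band: defer to the slave devices' wake-up recognition results.
    return Decision.ALLOW if group_vote() else Decision.PROHIBIT
```

Passing the group rule as a callable keeps the slave results from being evaluated when the master's own confidence already settles the decision.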
In one possible design, determining whether to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the wake-up recognition result includes: collecting statistics on the wake-up status of the at least one second electronic device according to the wake-up recognition result; and, if the wake-up status meets the preset wake-up condition of the first electronic device, determining that the first electronic device is allowed to be woken up.
Optionally, the wake-up condition includes any one of the following: the second wake-up confidences of all the second electronic devices in the device group (that is, all devices other than the first electronic device) are greater than or equal to a third threshold; the ratio of the number of second electronic devices allowed to be woken up to the total number of devices in the device group is greater than or equal to a first ratio; or that ratio is less than the first ratio and greater than a second ratio, and the second wake-up confidences of the second electronic devices allowed to be woken up are all greater than or equal to the third threshold. The third threshold is a threshold, preset in the first electronic device, for allowing a second electronic device to be woken up.
It should be noted that, to improve the accuracy of the determination, the first electronic device may check whether any second electronic device allowed to be woken up has a second wake-up confidence below the third threshold and, if so, determine that the first electronic device is prohibited from being woken up. The third wake-up condition thus effectively prevents the master device from being woken up by mistake when some slave devices have preset wake-up thresholds that are too low, which improves the wake-up accuracy of the master device.
In this scheme, the first electronic device uses the wake-up recognition result sent by the at least one second electronic device to collect statistics on the wake-up status of the slave devices in the device group and checks whether that status meets the preset wake-up condition. If the condition is met, the first electronic device is allowed to be woken up; otherwise it is prohibited from being woken up. The scheme takes the wake-up recognition results of the slave devices in the device group fully into account, optimizes the wake-up condition of the master device, and improves the wake-up accuracy of the master device in the device group.
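As an illustration, the three alternative wake-up conditions can be written as separate predicates. This is a sketch under stated assumptions: each slave is represented as a hypothetical `(allowed, confidence)` pair, all function names are invented for the example, and an empty slave list is treated as not satisfying the first condition.

```python
from typing import Sequence, Tuple

# Each slave is given as (allowed_to_wake, second_wake_confidence). Hypothetical layout.
Slave = Tuple[bool, float]


def allowed_ratio(slaves: Sequence[Slave], group_size: int) -> float:
    """Share of the device group made up of slaves that are allowed to be woken up."""
    return sum(1 for allowed, _ in slaves if allowed) / group_size


def all_confident(slaves: Sequence[Slave], third_threshold: float) -> bool:
    """Condition 1: every slave's confidence reaches the third threshold."""
    return bool(slaves) and all(conf >= third_threshold for _, conf in slaves)


def ratio_high(slaves: Sequence[Slave], group_size: int, first_ratio: float) -> bool:
    """Condition 2: the allowed share reaches the first ratio."""
    return allowed_ratio(slaves, group_size) >= first_ratio


def ratio_mid_and_confident(slaves: Sequence[Slave], group_size: int,
                            first_ratio: float, second_ratio: float,
                            third_threshold: float) -> bool:
    """Condition 3: the allowed share lies strictly between the second and first
    ratios, and every allowed slave's confidence reaches the third threshold."""
    ratio = allowed_ratio(slaves, group_size)
    if not (second_ratio < ratio < first_ratio):
        return False
    return all(conf >= third_threshold for allowed, conf in slaves if allowed)
```

The confidence check in the third predicate is what filters out slaves whose own preset thresholds were set too low.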
Optionally, the wake-up recognition result further includes a device identifier of the at least one second electronic device. The device identifier indicates the device type of the second electronic device and is used to determine the weight value of the second electronic device.
In one possible design, determining whether to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the wake-up recognition result includes: computing, according to the wake-up recognition result, a first weight value of the second electronic devices in the device group that are allowed to be woken up; computing a second weight value of all the second electronic devices in the device group; adjusting a first threshold based on the first weight value and the second weight value; and determining whether to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the adjusted first threshold. The first threshold is the threshold for allowing the first electronic device to be woken up.
This scheme introduces a weight value for each electronic device in the device group. The weight value indicates the credibility of the device's wake-up recognition result, and may be related to the device type or to the software/hardware performance of the device. For example, the weight value of a smart television is 0.3, that of a smart speaker is 0.6, and that of a smart lamp is 0.1. As another example, the weight value of a speaker Pro is 0.5, that of a regular speaker is 0.3, and that of a speaker mini is 0.2.
Starting from the wake-up threshold, this scheme combines the actual wake-up status of the slave devices in the device group with a comprehensive analysis of the credibility of the wake-up recognition results of the slave devices allowed to be woken up, then dynamically adjusts the wake-up threshold of the master device in a continuous manner, and decides whether the master device should respond to the wake-up by comparing the adjusted wake-up threshold with the wake-up confidence computed by the master device. The adjusted wake-up threshold better matches the actual wake-up state of the whole device group, which improves the wake-up accuracy of the master device.
Optionally, the first weight value is determined according to the number of devices of each device type among the second electronic devices allowed to be woken up in the device group and the weight value corresponding to each of those device types; the second weight value is determined according to the number of devices of each device type among all the second electronic devices in the device group and the weight value corresponding to each device type.
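One plausible reading of this computation is a count-weighted sum over device types, sketched below. The function name `weight_sum` and the type labels are hypothetical; the per-type weights reuse the example values given in the description (smart television 0.3, smart speaker 0.6, smart lamp 0.1).

```python
from collections import Counter
from typing import Iterable, Mapping


def weight_sum(device_types: Iterable[str], type_weights: Mapping[str, float]) -> float:
    """Sum of (number of devices of a type) * (weight of that type)."""
    counts = Counter(device_types)
    return sum(n * type_weights[t] for t, n in counts.items())


# Example weights taken from the description; the labels are illustrative.
TYPE_WEIGHTS = {"tv": 0.3, "speaker": 0.6, "lamp": 0.1}

all_slaves = ["tv", "speaker", "speaker", "lamp"]  # every slave in the group
allowed_slaves = ["speaker", "speaker"]            # slaves allowed to be woken up

first_weight = weight_sum(allowed_slaves, TYPE_WEIGHTS)  # weight of allowed slaves
second_weight = weight_sum(all_slaves, TYPE_WEIGHTS)     # weight of all slaves
```

With these inputs the allowed slaves carry a weight of about 1.2 out of about 1.6 for the whole set of slaves.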
In one possible design, adjusting the first threshold based on the first weight value and the second weight value includes: multiplying the ratio of the first weight value to the second weight value by a maximum threshold adjustment parameter to obtain a threshold adjustment parameter, and adjusting the first threshold according to the threshold adjustment parameter.
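The continuous adjustment can be sketched as below. The product formula follows the text; the direction of the adjustment is an assumption. This passage does not say how the parameter is applied, so the sketch lowers the first threshold by that amount, on the reasoning that more credible slave wake-ups should make the master easier to wake.

```python
def threshold_adjustment(first_weight: float, second_weight: float,
                         max_adjustment: float) -> float:
    """Threshold adjustment parameter = (first weight / second weight) * maximum."""
    if second_weight <= 0:
        raise ValueError("second_weight must be positive")
    return (first_weight / second_weight) * max_adjustment


def adjusted_threshold(first_threshold: float, adjustment: float) -> float:
    # Assumption: the adjustment is subtracted, i.e. the threshold is lowered.
    return first_threshold - adjustment


def master_allowed(confidence: float, first_threshold: float,
                   first_weight: float, second_weight: float,
                   max_adjustment: float) -> bool:
    """Compare the master's confidence with the adjusted first threshold."""
    delta = threshold_adjustment(first_weight, second_weight, max_adjustment)
    return confidence >= adjusted_threshold(first_threshold, delta)
```

Because the ratio lies between 0 and 1 when the first weight is a subset of the second, the adjustment never exceeds `max_adjustment`.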
In one possible design, determining whether to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the wake-up recognition result includes: counting, according to the wake-up recognition result, the second electronic devices in the device group that are allowed to be woken up; determining the threshold adjustment parameter corresponding to that number from a wake-up threshold adjustment table, where the table maps the number of second electronic devices allowed to be woken up to a threshold adjustment parameter; adjusting the first threshold based on the threshold adjustment parameter; and determining whether to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the adjusted first threshold.
This scheme also starts from the wake-up threshold. It combines the actual wake-up status of the slave devices in the device group with the number or proportion of slave devices allowed to be woken up, dynamically adjusts the wake-up threshold of the master device in a discrete manner, and decides whether the master device should respond to the wake-up by comparing the adjusted wake-up threshold with the wake-up confidence computed by the master device. The adjusted wake-up threshold better matches the actual wake-up state of the whole device group, which improves the wake-up accuracy of the master device.
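The discrete mode can be sketched as a table lookup. The table contents below are invented for illustration, and two behaviours are assumptions rather than statements from the text: counts missing from the table fall back to the nearest smaller entry, and the adjustment is subtracted from the first threshold.

```python
from typing import Mapping

# Hypothetical wake-up threshold adjustment table:
# number of slaves allowed to be woken up -> threshold adjustment parameter.
WAKE_THRESHOLD_ADJUSTMENT_TABLE = {0: 0.0, 1: 0.05, 2: 0.10, 3: 0.15}


def lookup_adjustment(allowed_count: int, table: Mapping[int, float]) -> float:
    """Return the adjustment parameter for a given number of allowed slaves."""
    if allowed_count in table:
        return table[allowed_count]
    # Assumption: fall back to the entry for the largest count not exceeding ours.
    eligible = [k for k in table if k <= allowed_count]
    return table[max(eligible)] if eligible else 0.0


def master_allowed(confidence: float, first_threshold: float,
                   allowed_count: int, table: Mapping[int, float]) -> bool:
    # Assumption: the adjustment lowers the first threshold.
    adjusted = first_threshold - lookup_adjustment(allowed_count, table)
    return confidence >= adjusted
```

Compared with the weighted, continuous adjustment, the table trades granularity for a rule that is easy to tune per device count.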
In one possible design, determining whether to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the adjusted first threshold includes: if the first wake-up confidence is greater than or equal to the adjusted first threshold, determining that the first electronic device is allowed to be woken up; or, if the first wake-up confidence is less than the adjusted first threshold, determining that the first electronic device is prohibited from being woken up.
In a second aspect, an embodiment of the present application provides a voice wake-up method. The method is applied to a first electronic device that belongs to the same device group as at least one second electronic device, and includes: acquiring a first wake-up confidence of audio data, where the first wake-up confidence indicates the acoustic-feature similarity, as determined by the first electronic device, between a wake-up word in the audio data and a preset wake-up word; receiving a wake-up recognition result sent by the at least one second electronic device, where the wake-up recognition result indicates whether the second electronic device is allowed or prohibited to be woken up; adjusting a first threshold according to the wake-up recognition result, where the first threshold is the threshold for allowing the first electronic device to be woken up; and determining whether to allow or prohibit the first electronic device from being woken up according to the adjusted first threshold and the first wake-up confidence.
In one possible design, adjusting the first threshold according to the wake-up recognition result includes: computing, according to the wake-up recognition result, a first weight value of the second electronic devices in the device group that are allowed to be woken up; computing a second weight value of all the second electronic devices in the device group; and adjusting the first threshold based on the first weight value and the second weight value.
Optionally, the first weight value is determined according to the number of devices of each device type among the second electronic devices allowed to be woken up in the device group and the weight value corresponding to each of those device types; the second weight value is determined according to the number of devices of each device type among all the second electronic devices in the device group and the weight value corresponding to each device type.
In one possible design, adjusting the first threshold based on the first weight value and the second weight value includes: multiplying the ratio of the first weight value to the second weight value by a maximum threshold adjustment parameter to obtain a threshold adjustment parameter, and adjusting the first threshold according to the threshold adjustment parameter.
In one possible design, adjusting the first threshold according to the wake-up recognition result includes: counting, according to the wake-up recognition result, the second electronic devices in the device group that are allowed to be woken up; determining the threshold adjustment parameter corresponding to that number from a wake-up threshold adjustment table, where the table maps the number of second electronic devices allowed to be woken up to a threshold adjustment parameter; and adjusting the first threshold based on the threshold adjustment parameter.
Optionally, the wake-up recognition result includes at least one of a wake-up identifier and a second wake-up confidence of the at least one second electronic device. The wake-up identifier indicates that the second electronic device is allowed or prohibited to be woken up, and the second wake-up confidence indicates the acoustic-feature similarity, as determined by the second electronic device, between the wake-up word in the audio data and the preset wake-up word.
In one possible design, determining whether to allow or prohibit the first electronic device from being woken up according to the adjusted first threshold and the first wake-up confidence includes: if the first wake-up confidence is greater than or equal to the adjusted first threshold, determining that the first electronic device is allowed to be woken up; or, if the first wake-up confidence is less than the adjusted first threshold, determining that the first electronic device is prohibited from being woken up.
In a third aspect, an embodiment of the present application provides a voice wake-up device, where the voice wake-up device is a first electronic device, and the first electronic device and at least one second electronic device belong to a same device group, and the voice wake-up device includes: the acquiring module is used for acquiring a first awakening confidence coefficient of the audio data, wherein the first awakening confidence coefficient is used for indicating the acoustic feature similarity of an awakening word in the audio data determined by the first electronic device and a preset awakening word; the receiving module is used for receiving a wake-up identification result sent by at least one piece of second electronic equipment, and the wake-up identification result is used for indicating that the at least one piece of second electronic equipment is allowed or forbidden to be woken up; and the processing module is used for determining to allow or prohibit the first electronic equipment to be awakened according to the first awakening confidence coefficient and the awakening identification result.
Optionally, the wake-up recognition result includes at least one of a wake-up identifier of the at least one second electronic device and a second wake-up confidence level; the awakening identification comprises an identification for allowing or forbidding the second electronic equipment to be awakened, and the second awakening confidence coefficient is used for indicating the acoustic feature similarity between the awakening word in the audio data determined by the second electronic equipment and the preset awakening word.
Optionally, the processing module is specifically configured to: if the first awakening confidence coefficient is larger than or equal to the first threshold value, determining that the first electronic equipment is allowed to be awakened; or if the first awakening confidence coefficient is smaller than a first threshold value and the first awakening confidence coefficient is larger than a second threshold value, determining to allow or prohibit the first electronic device to be awakened according to the first awakening confidence coefficient and the awakening identification result; or if the first awakening confidence coefficient is smaller than or equal to the second threshold value, determining to prohibit the first electronic equipment from being awakened.
Optionally, the processing module is specifically configured to: counting the awakening condition of at least one second electronic device according to the awakening identification result; and if the awakening condition meets the preset awakening condition of the first electronic equipment, determining that the first electronic equipment is allowed to be awakened.
Optionally, the wake-up condition includes any one of the following: the second awakening confidence degrees of all the second electronic devices in the device group except the first electronic device are greater than or equal to a third threshold; the proportion of the number of the devices of the second electronic device which is allowed to be awakened in the device group to the total number of the devices of the device group is greater than or equal to a first proportion; the proportion of the number of the devices of the second electronic device which is allowed to be awakened in the device group to the total number of the devices of the device group is smaller than a first proportion and larger than a second proportion, and second awakening confidence coefficients of the second electronic device which is allowed to be awakened are all larger than or equal to a third threshold value; the third threshold is a preset threshold allowing the second electronic device to be awakened in the first electronic device.
Optionally, the wake-up recognition result further includes a device identifier of at least one second electronic device, where the device identifier is used to indicate a device type of the second electronic device, and determine a weight value of the second electronic device.
Optionally, the processing module is specifically configured to: counting a first weight value of a second electronic device allowed to be awakened in the device group according to the awakening identification result; counting second weighted values of all second electronic equipment in the equipment group; adjusting a first threshold based on the first weight value and the second weight value; determining to allow or prohibit the first electronic device to be awakened according to the first awakening confidence level and the adjusted first threshold; the first threshold is a threshold allowing the first electronic device to be woken up.
Optionally, the first weight value is determined according to the number of devices corresponding to the device type of the second electronic device allowed to be woken up in the device group, and the weight value corresponding to each device type of the second electronic device allowed to be woken up; the second weight value is determined according to the number of the devices corresponding to the device types of all the second electronic devices in the device group and the weight value corresponding to each device type.
Optionally, the processing module is specifically configured to: and taking the product of the ratio of the first weight value to the second weight value and the maximum threshold value adjusting parameter as a threshold value adjusting parameter, and adjusting the first threshold value according to the threshold value adjusting parameter.
Optionally, the processing module is specifically configured to: counting the number of the second electronic equipment allowed to be awakened in the equipment group according to the awakening identification result; determining a threshold adjustment parameter corresponding to the number of the devices according to an awakening threshold adjustment table, wherein the awakening threshold adjustment table comprises a corresponding relation between the number of the devices of the second electronic device allowed to be awakened and the threshold adjustment parameter; adjusting the first threshold based on the threshold adjustment parameter; and determining to allow or prohibit the first electronic equipment to be awakened according to the first awakening confidence level and the adjusted first threshold value.
Optionally, the processing module is specifically configured to: if the first awakening confidence coefficient is larger than or equal to the adjusted first threshold value, determining that the first electronic equipment is allowed to be awakened; or if the first awakening confidence coefficient is smaller than the adjusted first threshold, determining to prohibit the first electronic device from being awakened.
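The table-based (discrete) adjustment and the final comparison can be sketched as follows. The table contents, the function names, and the fallback for device counts not listed in the table are assumptions made for illustration; as the direction of the adjustment is not stated here, the sketch assumes the parameter is subtracted from the first threshold.

```python
# Hypothetical wake-up threshold adjustment table (assumed values):
# number of second devices allowed to be woken up -> threshold adjustment parameter.
ADJUSTMENT_TABLE = {0: 0.0, 1: 0.05, 2: 0.08}


def lookup_adjustment(allowed_count, table=ADJUSTMENT_TABLE):
    """Look up the adjustment for a device count.

    Assumption: a count missing from the table falls back to the largest
    listed count that does not exceed it.
    """
    eligible = [n for n in table if n <= allowed_count]
    return table[max(eligible)] if eligible else 0.0


def wake_decision(first_confidence, first_threshold, wake_flags):
    """Return True if the first electronic device is allowed to be woken up."""
    allowed_count = sum(1 for flag in wake_flags if flag)
    adjusted = first_threshold - lookup_adjustment(allowed_count)
    # Allowed when the confidence is greater than or equal to the adjusted threshold.
    return first_confidence >= adjusted
```

In this sketch, a confidence that falls just short of the unadjusted threshold can still lead to a wake-up once at least one second electronic device reports that it is allowed to be woken up.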
In a fourth aspect, an embodiment of the present application provides a voice wake-up device, where the voice wake-up device is a first electronic device, the first electronic device and at least one second electronic device belong to the same device group, and the voice wake-up device includes: an acquiring module, configured to acquire a first wake-up confidence of audio data, where the first wake-up confidence indicates the acoustic-feature similarity, as determined by the first electronic device, between a wake-up word in the audio data and a preset wake-up word; a receiving module, configured to receive a wake-up recognition result sent by the at least one second electronic device, where the wake-up recognition result indicates whether the second electronic device is allowed to be woken up or prohibited from being woken up; and a processing module, configured to adjust a first threshold according to the wake-up recognition result, where the first threshold is the threshold for allowing the first electronic device to be woken up, and to determine, according to the adjusted first threshold and the first wake-up confidence, whether to allow or prohibit the first electronic device from being woken up.
Optionally, the processing module is specifically configured to: determine, according to the wake-up recognition result, a first weight value of the second electronic devices in the device group that are allowed to be woken up; determine a second weight value of all second electronic devices in the device group; and adjust the first threshold based on the first weight value and the second weight value.
Optionally, the first weight value is determined according to the number of devices corresponding to the device type of the second electronic device allowed to be awakened in the device group, and the weight value corresponding to each device type of the second electronic device allowed to be awakened; the second weight value is determined according to the number of devices corresponding to the device types of all the second electronic devices in the device group and the weight value corresponding to each device type.
Optionally, the processing module is specifically configured to: take the product of (i) the ratio of the first weight value to the second weight value and (ii) the maximum threshold adjustment parameter as the threshold adjustment parameter, and adjust the first threshold according to the threshold adjustment parameter.
Optionally, the processing module is specifically configured to: count, according to the wake-up recognition result, the number of second electronic devices in the device group that are allowed to be woken up; determine the threshold adjustment parameter corresponding to that number according to a wake-up threshold adjustment table, where the table contains the correspondence between the number of second electronic devices allowed to be woken up and the threshold adjustment parameter; and adjust the first threshold based on the threshold adjustment parameter.
Optionally, the wake-up recognition result includes at least one of a wake-up identifier of the at least one second electronic device and a second wake-up confidence level; the awakening identifier comprises an identifier for allowing or forbidding the second electronic device to be awakened, and the second awakening confidence coefficient is used for indicating the acoustic feature similarity between the awakening word in the audio data determined by the second electronic device and the preset awakening word.
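One possible in-memory shape for such a wake-up recognition result is sketched below. The class name, field names, and the default threshold are hypothetical. The text does not say how the first electronic device interprets a result that carries only a second wake-up confidence, so the comparison against an assumed threshold in `is_allowed` is purely illustrative.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WakeRecognitionResult:
    """Result reported by one second electronic device (hypothetical layout)."""
    device_id: str
    wake_allowed: Optional[bool] = None        # wake-up identifier, if sent
    second_confidence: Optional[float] = None  # second wake-up confidence, if sent

    def __post_init__(self):
        # The result must carry at least one of the two items.
        if self.wake_allowed is None and self.second_confidence is None:
            raise ValueError("result needs a wake-up identifier, a confidence, or both")


def is_allowed(result, assumed_threshold=0.8):
    """Interpret a result as allowed or prohibited.

    Assumption: an explicit identifier takes precedence; otherwise the
    confidence is compared with an assumed threshold.
    """
    if result.wake_allowed is not None:
        return result.wake_allowed
    return result.second_confidence >= assumed_threshold
```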
Optionally, the processing module is specifically configured to: if the first awakening confidence coefficient is larger than or equal to the adjusted first threshold value, determining that the first electronic equipment is allowed to be awakened; or if the first awakening confidence coefficient is smaller than the adjusted first threshold, determining to prohibit the first electronic device from being awakened.
In a fifth aspect, an embodiment of the present application provides a voice wake-up device, including: a memory for storing a computer program and a processor for retrieving and executing the computer program from the memory, such that the processor executes the computer program to perform the method of any of the first aspect or the method of any of the second aspect.
In a sixth aspect, an embodiment of the present application provides a storage medium, which includes a computer program for implementing the method according to any one of the first aspect or the second aspect.
The embodiments of the present application provide a voice wake-up method, a device, and a storage medium. In the method, the master device introduces the wake-up recognition results of the slave devices in the device group into its own wake-up decision, and determines whether it is woken up according to both its own wake-up recognition result and those of the slave devices, which improves the accuracy of waking up the master device in the device group. Furthermore, the master device can dynamically adjust its preset wake-up threshold in a continuous or discrete manner and determine whether it is woken up according to the adjusted wake-up threshold. Because the dynamically adjusted wake-up threshold better fits the actual wake-up state of the whole device group, a wake-up decision based on it is more accurate.
Drawings
Fig. 1 is a schematic view of a scene of a voice wake-up method according to an embodiment of the present application;
Fig. 2 is a schematic hardware structure diagram of an electronic device according to an embodiment of the present application;
Fig. 3 is a schematic diagram of a software architecture of an electronic device according to an embodiment of the present application;
Fig. 4a to Fig. 4c are schematic diagrams of a scenario of a voice wake-up method according to an embodiment of the present application;
Fig. 5 is a schematic diagram of a user interface interaction provided by an embodiment of the present application;
Fig. 6 is an interaction diagram of a voice wake-up method according to an embodiment of the present application;
Fig. 7 is a schematic flowchart of a voice wake-up method according to an embodiment of the present application;
Fig. 8a is a flowchart illustrating a determination method of a voice wake-up method according to an embodiment of the present application;
Fig. 8b is a flowchart illustrating a determination method of a voice wake-up method according to an embodiment of the present application;
Fig. 9 is a flowchart illustrating a determination method of a voice wake-up method according to an embodiment of the present application;
Fig. 10 is a schematic flowchart of another voice wake-up method according to an embodiment of the present application;
Fig. 11 is a schematic flowchart of another voice wake-up method according to an embodiment of the present application;
Fig. 12 is a schematic structural diagram of a voice wake-up device according to an embodiment of the present application;
Fig. 13 is a schematic structural diagram of a voice wake-up device according to an embodiment of the present application;
Fig. 14 is a schematic diagram of a hardware structure of a voice wake-up device according to an embodiment of the present application.
Detailed Description
The electronic device provided by the embodiments of the present application is an electronic device with a voice wake-up function, that is, a user can wake up the electronic device by voice. Specifically, the user wakes up the electronic device by speaking a wake-up word. The wake-up word may be set by the user as needed, or may be preset in the electronic device before it leaves the factory.
The electronic device acquires audio data and detects whether the audio data contains the wake-up word. If it does, the electronic device is woken up; otherwise, the electronic device is not woken up. After the electronic device is woken up, the user can interact with it by voice. For example, if the preset wake-up word is "art", the electronic device is woken up when it detects that the audio data contains "art". Fig. 1 shows a schematic diagram of a voice wake-up scenario, which includes an electronic device 10, an electronic device 20, an electronic device 30, and an electronic device 40. The electronic device 10, the electronic device 30, and the electronic device 40 have the same preset wake-up word, for example, wake-up word 1, and the preset wake-up word in the electronic device 20 is wake-up word 2. When the user speaks wake-up word 1 and every electronic device in the scene can receive or collect it, the electronic devices 10, 30, and 40 can be woken up. Because the wake-up word spoken by the user differs from the wake-up word preset in the electronic device 20, the electronic device 20 is not woken up.
As the above example shows, when the space where the user is located contains multiple electronic devices that support voice wake-up, a wake-up word spoken by the user may be received or collected by several of them. If those electronic devices all have the same preset wake-up word, they may all be woken up and respond to the user's request at the same time, so that the user does not know which device to interact with by voice, which degrades the user experience.
In order to avoid confusion of user voice interaction, according to the voice wake-up method provided in the embodiment of the present application, electronic devices capable of being simultaneously woken up by a user form a device group, only one electronic device in the device group responds to the wake-up of the user, the electronic device responding to the wake-up is regarded as a master device of the device group, other electronic devices except the master device are regarded as slave devices of the device group, and the slave devices cooperate with the master device to process an instruction intention of the user. Taking fig. 1 as an example, the electronic device 10, the electronic device 30, and the electronic device 40 with the same preset wake-up word may be combined into a device group, the electronic device 10 is set as a master device of the device group, the electronic device 30 and the electronic device 40 are slave devices of the device group, and the slave devices cooperate with the master device to process instruction intentions of the user. It should be noted that, the master device of the device group may be set by a preset rule, for example, the electronic device with the strongest processing performance in the device group is set as the master device, or any one of the electronic devices in the device group may be set as the master device by user self-definition, which is not limited in this embodiment.
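The grouping and master selection described above can be sketched as follows, using the devices of Fig. 1. The dictionary layout, the `performance` score, and the function names are hypothetical. The sketch implements only the "strongest processing performance" rule given as an example, while the text also allows the user to designate the master device.

```python
from collections import defaultdict


def form_device_groups(devices):
    """Group devices that share the same preset wake-up word."""
    groups = defaultdict(list)
    for device in devices:
        groups[device["wake_word"]].append(device)
    return dict(groups)


def pick_master(group):
    """Example rule: the device with the strongest processing performance."""
    return max(group, key=lambda device: device["performance"])


# Devices of Fig. 1; the performance scores are assumed values.
devices = [
    {"id": 10, "wake_word": "wake-up word 1", "performance": 9},
    {"id": 20, "wake_word": "wake-up word 2", "performance": 7},
    {"id": 30, "wake_word": "wake-up word 1", "performance": 5},
    {"id": 40, "wake_word": "wake-up word 1", "performance": 3},
]

groups = form_device_groups(devices)
master = pick_master(groups["wake-up word 1"])
slaves = [d for d in groups["wake-up word 1"] if d is not master]
```

With these assumed scores, the electronic device 10 becomes the master device and the electronic devices 30 and 40 become the slave devices, matching the example in the text.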
Based on the voice wake-up scenario provided in fig. 1, after receiving or acquiring audio data, the main device of the device group pre-processes the audio data, extracts wake-up words in the audio data, calculates a similarity value of acoustic features between the wake-up words in the audio data and a preset wake-up word through a voice wake-up model, and if the similarity value is greater than or equal to a preset similarity threshold (also referred to as a preset wake-up threshold), it is considered that a wake-up request is detected, and the main device of the device group is woken up, otherwise the main device is not woken up. Therefore, whether the master device in the device group wakes up or not is determined by the wake-up recognition result of the master device, and the implementation scheme has the following defects:
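The fixed-threshold decision described above can be sketched as follows. The voice wake-up model itself is not specified in the text, so cosine similarity between two feature vectors is used here purely as a stand-in for it, and the threshold value and function names are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Stand-in for the voice wake-up model's similarity score (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def master_wakes(audio_features, preset_features, preset_threshold=0.8):
    """Wake only if the similarity reaches the fixed preset wake-up threshold."""
    similarity = cosine_similarity(audio_features, preset_features)
    return similarity >= preset_threshold
```

Because the decision depends only on the master device's own score and a fixed threshold, a slightly degraded input (for example, from a distant speaker) can fall below the threshold and leave the whole device group unresponsive.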
First, the device group includes a plurality of electronic devices that can be woken up, and taking the master device's own wake-up recognition result as the wake-up recognition result of the whole device group is not accurate. For example, when there is an interference source on the voice transmission path, or when the user is far away from the master device, the master device may, under the influence of such external factors, fail to detect the wake-up voice, or detect it with a similarity that does not reach the preset wake-up threshold. If only the master device's own wake-up recognition result is relied on, the whole device group then does not respond even though the master device should have been woken up.
Second, the preset wake-up threshold of the master device in the device group is a fixed value, but the wake-up threshold may be different in different voice wake-up scenarios. Therefore, the fixed preset wake-up threshold may affect the wake-up recognition result of the master device.
In order to increase the wake-up rate of the main device, other solutions may be used to assist the main device in determining the wake-up identification. For example, in a server model rechecking mode, the main device of the device group interacts with the server to obtain the wake-up recognition result from the server. However, this approach is costly to implement and, depending on the network environment, may have high latency issues. For another example, the accuracy of the wake-up recognition is improved by expanding the voice wake-up model on the device side, which has the disadvantage of occupying more computing and storage resources on the device side, resulting in higher overall cost of the device.
In summary, when the master device of the device group makes the wake-up decision based only on its own wake-up recognition result, the misjudgment rate is high, the whole device group may easily fail to respond after the user speaks the wake-up voice, and the user experience is poor. Although the wake-up rate of the master device can be improved by rechecking on a server or by expanding the device-side voice wake-up model, these approaches are costly to implement and may introduce high latency. In view of this, an embodiment of the present application provides a voice wake-up method in which the master device introduces the wake-up recognition results of the slave devices in the device group into its own wake-up decision. By making full use of the wake-up recognition results of the different electronic devices in the device group, the master device determines whether it is woken up according to both its own wake-up recognition result and those of the slave devices, which improves the accuracy of waking up the master device in the device group.
Under different voice awakening scenes, the main equipment of the equipment group adopts a fixed awakening threshold value to make awakening decision, and higher misjudgment rate exists. In contrast, in the voice wake-up method provided in the embodiments of the present application, the master device also introduces wake-up recognition results of other slave devices in the device group into a wake-up decision of the master device, and dynamically adjusts a preset wake-up threshold of the master device in the device group in a continuous or discrete manner according to the wake-up recognition result of the master device itself and the wake-up recognition results of the other slave devices in the device group, and determines whether the master device is woken up according to the adjusted wake-up threshold. The awakening threshold value after dynamic adjustment is more fit for the actual state of awakening of the whole equipment group, and the main equipment makes awakening decision based on the awakening threshold value after dynamic adjustment, so that the accuracy of awakening the main equipment in the equipment group is improved.
The following describes in detail electronic devices in the device group in the embodiment of the present application.
The electronic device in the embodiments of the present application may be a portable electronic device, such as a mobile phone, a tablet computer, an artificial intelligence (AI) voice terminal, a wearable device, or an augmented reality (AR)/virtual reality (VR) device. Exemplary embodiments of the portable electronic device include, but are not limited to, portable electronic devices running various operating systems. The portable electronic device may also be a vehicle-mounted terminal, a laptop computer, or the like. It should also be understood that the electronic device in the embodiments of the present application may be a desktop computer, a smart home device (e.g., a smart television or a smart speaker), or the like, which is not limited herein.
For example, Fig. 2 shows a hardware structure diagram of an electronic device according to an embodiment of the present application. Specifically, as shown in Fig. 2, the electronic device includes a processor 110, an internal memory 121, an external memory interface 122, a camera 131, a display screen 132, a sensor module 140, a subscriber identity module (SIM) card interface 151, keys 152, an audio module 160, a speaker 161, a receiver 162, a microphone 163, an earphone interface 164, a universal serial bus (USB) interface 170, a charging management module 180, a power management module 181, a battery 182, a mobile communication module 191, and a wireless communication module 192. In other embodiments, the electronic device may also include motors, indicators, and the like.
It should be understood that the hardware configuration shown in fig. 2 is only one example. The electronic devices of the embodiments of the application may have more or fewer components than the electronic devices shown in the figures, may combine two or more components, or may have different configurations of components. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The processor 110 may include one or more processing units. For example, the processor 110 may include an application processor (AP), a modem, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU). The different processing units may be separate devices or may be integrated into one or more processors.
In some embodiments, a buffer may also be provided in the processor 110 for storing instructions and/or data. As an example, the buffer in the processor 110 may be a cache memory. The buffer may hold instructions and/or data that the processor 110 has just used, generated, or uses repeatedly. If the processor 110 needs the instructions or data again, it can fetch them directly from the buffer, which reduces the time the processor 110 spends fetching instructions or data and thus helps improve system efficiency.
The internal memory 121 may be used to store programs and/or data. In some embodiments, the internal memory 121 includes a program storage area and a data storage area. The program storage area may be configured to store an operating system (e.g., Android or iOS), a computer program required by at least one function (e.g., a voice wake-up function or a sound playing function), and the like. The data storage area may be used to store data (e.g., audio data) created and/or collected during use of the electronic device. For example, the processor 110 may implement one or more functions by calling programs and/or data stored in the internal memory 121, so that the electronic device executes a corresponding method. For example, the processor 110 calls certain programs and/or data in the internal memory 121 so that the electronic device executes the voice wake-up method provided in the embodiments of the present application, thereby implementing the voice wake-up function. The internal memory 121 may be a high-speed random access memory, a non-volatile memory, or the like. For example, the non-volatile memory may include at least one of one or more magnetic disk storage devices, flash memory devices, and/or universal flash storage (UFS).
The external memory interface 122 may be used to connect an external memory card (e.g., a Micro SD card) to extend the storage capability of the electronic device. The external memory card communicates with the processor 110 through the external memory interface 122 to implement a data storage function. For example, the electronic device may save files such as images, music, videos, and the like in the external memory card through the external memory interface 122.
The camera 131 may be used to capture moving images, still images, and the like. Typically, the camera 131 includes a lens and an image sensor. The optical image of an object formed by the lens is projected onto the image sensor, where it is converted into an electrical signal for subsequent processing. For example, the image sensor may be a charge-coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The image sensor converts the optical signal into an electrical signal and then transmits the electrical signal to the ISP, which converts it into a digital image signal. It should be noted that the electronic device may include 1 or N cameras 131, where N is a positive integer greater than 1.
The display screen 132 may include a display panel for displaying a user interface. The display panel may be a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. It should be noted that the electronic device may include 1 or M display screens 132, where M is a positive integer greater than 1. For example, the electronic device may implement the display function through the GPU, the display screen 132, the application processor, and/or the like.
The sensor module 140 may include one or more sensors, for example, a touch sensor 140A, a gyroscope 140B, an acceleration sensor 140C, a fingerprint sensor 140D, and a pressure sensor 140E. In some embodiments, the sensor module 140 may also include an ambient light sensor, a distance sensor, a proximity light sensor, a bone conduction sensor, a temperature sensor, and the like.
The touch sensor 140A may also be referred to as a "touch panel". The touch sensor 140A may be disposed on the display screen 132, and the touch sensor 140A and the display screen 132 together form a touch screen. The touch sensor 140A is used to detect a touch operation acting on or near it. The touch sensor 140A may pass the detected touch operation to the application processor to determine the touch event type. The electronic device may provide visual output related to the touch operation through the display screen 132. In other embodiments, the touch sensor 140A may be disposed on a surface of the electronic device at a location different from that of the display screen 132.
The gyroscope 140B may be used to determine the motion posture of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (i.e., the x, y, and z axes) may be determined by the gyroscope 140B. The gyroscope 140B may be used for image stabilization during photographing. For example, when the shutter is pressed, the gyroscope 140B detects the shake angle of the electronic device and calculates, from the shake angle, the distance that the lens module needs to compensate, so that the lens counteracts the shake of the electronic device through a reverse movement, thereby achieving stabilization. The gyroscope 140B may also be used for navigation and motion-sensing game scenarios.
The acceleration sensor 140C can detect the magnitude of acceleration of the electronic device in various directions (typically three axes). When the electronic device is stationary, the magnitude and direction of gravity can be detected. The acceleration sensor 140C may also be used to identify the posture of the electronic device, and may be applied to horizontal and vertical screen switching, pedometer, and other applications.
The fingerprint sensor 140D is used to capture a fingerprint. The electronic equipment can utilize the collected fingerprint characteristics to realize fingerprint unlocking, application lock access, fingerprint photographing, incoming call answering and the like.
The pressure sensor 140E is used for sensing a pressure signal, and can convert the pressure signal into an electrical signal. For example, the pressure sensor 140E may be disposed on the display screen 132. The touch operations which act on the same touch position but have different touch operation intensities can correspond to different operation instructions.
The SIM card interface 151 is used to connect a SIM card. The SIM card can be attached to or detached from the electronic device by being inserted into or pulled out of the SIM card interface 151. The electronic device may support 1 or K SIM card interfaces 151, where K is a positive integer greater than 1. The SIM card interface 151 may support a Nano SIM card, a Micro SIM card, and/or a SIM card, among others. Multiple cards can be inserted into the same SIM card interface 151 at the same time, and the types of these cards may be the same or different. The SIM card interface 151 may also be compatible with different types of SIM cards and with external memory cards. The electronic device implements functions such as calls and data communication through the interaction between the SIM card and the network. In some embodiments, the electronic device may also use an eSIM, that is, an embedded SIM card. The eSIM card is embedded in the electronic device and cannot be separated from it.
The keys 152 may include a power on key, a volume key, and the like. The keys 152 may be mechanical keys or touch keys. The electronic device may receive a key input, and generate a key signal input related to user settings and function control of the electronic device.
The electronic device may implement audio functions, such as an audio playing function, a recording function, and a voice wake-up function, through the audio module 160, the speaker 161, the receiver 162, the microphone 163, the earphone interface 164, the application processor, and the like.
The audio module 160 may be used to perform digital-to-analog conversion, and/or analog-to-digital conversion on the audio data, and may also be used to encode and/or decode the audio data. For example, the audio module 160 may be disposed independently of the processor, may be disposed in the processor 110, or may dispose some functional modules of the audio module 160 in the processor 110.
The speaker 161, also called a "loudspeaker", converts audio data into sound and plays the sound. For example, the electronic device 100 may play music, conduct a hands-free call, or issue a voice prompt through the speaker 161.
A receiver 162, also called "earpiece", is used to convert audio data into sound and play the sound. For example, when the electronic device 100 answers a call, the answer can be made by placing the receiver 162 close to the ear of the person.
The microphone 163, also called a "mic", is used for collecting sound (e.g., ambient sound, including sound made by a person, sound made by a device, and the like) and converting the sound into audio data. When making a call or sending voice, the user can speak with the mouth close to the microphone 163, and the microphone 163 collects the sound made by the user. When the voice wake-up function of the electronic device is turned on, the microphone 163 may collect ambient sound in real time to obtain audio data. What the microphone 163 collects depends on the environment. For example, when the surrounding environment is noisy and the user speaks the wake-up word, the sound collected by the microphone 163 includes both the ambient noise and the wake-up word spoken by the user. For another example, when the surrounding environment is quiet and the user speaks the wake-up word, the sound collected by the microphone 163 is the wake-up word spoken by the user. For another example, when the surrounding environment is noisy and the voice wake-up function of the electronic device is turned on, but the user does not speak the wake-up word, the sound collected by the microphone 163 is only the ambient noise.
It should be noted that the electronic device may be provided with at least one microphone 163. For example, two microphones 163 are provided in the electronic device, and in addition to collecting sound, a noise reduction function can be realized. For another example, three, four or more microphones 163 may be further provided in the electronic device, so that the recognition of the sound source, the directional recording function, or the like may be further realized on the basis of realizing the sound collection and the noise reduction.
The earphone interface 164 is used to connect a wired earphone. The earphone interface 164 may be the USB interface 170, or may be a 3.5 mm open mobile terminal platform (OMTP) standard interface, a cellular telecommunications industry association of the USA (CTIA) standard interface, or the like.
The USB interface 170 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type-C interface, or the like. The USB interface 170 may be used to connect a charger to charge the electronic device, and may also be used to transmit data between the electronic device and a peripheral device. It may also be used to connect an earphone and play audio through the earphone. For example, in addition to serving as the earphone interface 164, the USB interface 170 may be used to connect other electronic devices, such as AR devices and computers.
The charging management module 180 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 180 may receive charging input from a wired charger via the USB interface 170. In some wireless charging embodiments, the charging management module 180 may receive a wireless charging input through a wireless charging coil of the electronic device. While the charging management module 180 charges the battery 182, the power management module 181 may also supply power to the electronic device.
The power management module 181 is used to connect the battery 182, the charging management module 180, and the processor 110. The power management module 181 receives input from the battery 182 and/or the charging management module 180 to power the processor 110, the internal memory 121, the display screen 132, the camera 131, and the like. The power management module 181 may also be used to monitor parameters such as battery capacity, battery cycle count, and battery state of health (leakage, impedance). In some other embodiments, the power management module 181 may also be disposed in the processor 110. In other embodiments, the power management module 181 and the charging management module 180 may be disposed in the same device.
The mobile communication module 191 may provide a solution including 2G/3G/4G/5G wireless communication, etc. applied to the electronic device. The mobile communication module 191 may include a filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like.
The wireless communication module 192 may provide solutions for wireless communication applied to electronic devices, including WLAN (e.g., Wi-Fi network), Bluetooth (BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 192 may be one or more devices that integrate at least one communication processing module.
In some embodiments, the antenna 1 of the electronic device is coupled to the mobile communication module 191 and the antenna 2 is coupled to the wireless communication module 192, so that the electronic device can communicate with other devices. Specifically, the mobile communication module 191 may communicate with other devices through the antenna 1, and the wireless communication module 192 may communicate with other devices through the antenna 2.
Fig. 3 shows a software architecture diagram of an electronic device according to an embodiment of the present application. As shown in fig. 3, the electronic device includes an audio collection module (audio collector)401, an audio processing module (audio processor)402, an audio recognition module (audio recognizer)403, and an interaction module (interaction) 404.
The audio collection module 401 is configured to store audio data converted from sounds collected by a sound collection device (such as the microphone 163 shown in fig. 2 or other sensors for collecting sounds), and forward the audio data to the audio processing module 402. For example, the audio collection module 401 may be configured to store the audio data obtained from the audio module 160 into a memory (e.g., the internal memory 121, or a memory in the processor 110), and forward the audio data stored in the memory to the audio processing module 402 for processing. It should be noted that, in the embodiment of the present application, the audio collection module 401 may actively acquire the audio data from the audio module 160 after receiving a notification that the audio module 160 has obtained the audio data, or the audio module 160 may send the audio data to the audio collection module 401 after obtaining it. The embodiment of the present application does not limit the manner in which the audio collection module 401 acquires the audio data from the audio module 160.
The audio processing module 402 is configured to perform preprocessing on the audio data, such as channel conversion, smoothing, and denoising, and send the preprocessed audio data to the audio recognition module 403, so that the subsequent audio recognition module 403 performs wakeup word detection.
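The preprocessing named above can be illustrated with a minimal Python sketch. It is only an illustration under stated assumptions: the text does not specify how channel conversion or smoothing is performed, so the stereo-to-mono averaging, the trailing moving average, and the function names `to_mono`, `smooth` and `preprocess` are all hypothetical. Denoising is omitted.

```python
from typing import List, Sequence, Tuple


def to_mono(frames: Sequence[Tuple[float, float]]) -> List[float]:
    """Channel conversion (assumed form): average left and right samples."""
    return [(left + right) / 2.0 for left, right in frames]


def smooth(samples: Sequence[float], window: int = 3) -> List[float]:
    """Smoothing (assumed form): trailing moving average over up to `window` samples."""
    smoothed = []
    for i in range(len(samples)):
        start = max(0, i - window + 1)
        chunk = samples[start:i + 1]
        smoothed.append(sum(chunk) / len(chunk))
    return smoothed


def preprocess(frames: Sequence[Tuple[float, float]]) -> List[float]:
    """Hypothetical stand-in for the audio processing module 402."""
    return smooth(to_mono(frames))
```

In this sketch the output of `preprocess` is what would be handed to the audio recognition module 403.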
The audio recognition module 403 is configured to perform wakeup word detection on the audio data and to determine, through a voice wakeup model, whether a preset wakeup word exists in the audio data. The voice wakeup model calculates a similarity value between the acoustic features of the wakeup word in the audio data and those of the preset wakeup word. If the similarity value is greater than or equal to a preset similarity threshold, a wake-up instruction is sent to the microphone 163, the receiver 162 and the earphone interphone; after wake-up, the speaker 161 acquires pre-stored response voice data from the audio module 160, converts the response voice data into sound, and gives a voice response. For example, the audio recognition module 403 may perform wakeup word detection according to the audio data at Q sampling moments in a first time period. The first time period, which may also be referred to as a wakeup word time window, is generally set to be not less than the time a user needs to utter the wakeup word. The interval between two adjacent sampling moments among the Q sampling moments is a first sampling interval; that is, the audio processing module 402 may send the preprocessed audio data to the audio recognition module 403 once every first sampling interval, and the audio recognition module 403 performs wakeup word detection once every first sampling interval according to the audio data at the most recently received Q sampling moments. When the audio recognition module 403 detects the wakeup word, it calculates the similarity value between the wakeup word and the preset wakeup word through the voice wakeup model; if the similarity value is greater than or equal to the preset similarity threshold, it determines to wake up the electronic device and sends a wake-up instruction to the speaker 161.
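The wakeup word time window described above behaves like a sliding buffer that always holds the audio data of the most recent Q sampling moments. The following Python sketch shows one way this could look. The class name `WakeWordWindow`, the callable `score_fn` standing in for the voice wakeup model, and the use of one float per sampling moment are assumptions made for illustration only.

```python
from collections import deque
from typing import Callable, Deque, Optional, Sequence


class WakeWordWindow:
    """Keeps the audio data of the most recent Q sampling moments."""

    def __init__(self, q: int,
                 score_fn: Callable[[Sequence[float]], float],
                 similarity_threshold: float) -> None:
        self.buffer: Deque[float] = deque(maxlen=q)  # oldest entry drops out
        self.score_fn = score_fn  # hypothetical stand-in for the wakeup model
        self.similarity_threshold = similarity_threshold

    def push(self, sample: float) -> Optional[bool]:
        """Called once per first sampling interval with new preprocessed data.

        Returns None until Q sampling moments are available, then True if the
        similarity value reaches the preset similarity threshold.
        """
        self.buffer.append(sample)
        if len(self.buffer) < self.buffer.maxlen:
            return None
        similarity = self.score_fn(list(self.buffer))
        return similarity >= self.similarity_threshold
```

Each call to `push` corresponds to one detection pass, so detection runs once per first sampling interval while the window itself spans Q sampling moments.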
It should be noted that, in the embodiment of the present application, a value of the first sampling interval may be 0.1ms, 0.2ms, or the like, and may be preset, or may be determined according to a preset algorithm, which is not limited in this embodiment. In other embodiments, the audio recognition module 403 can also perform voice data recognition on the audio data, recognize semantics in the voice data, and so on.
The interaction module 404 is configured to perform information interaction with other devices, for example, with other slave devices in the device group. The interaction module 404 is configured to, when receiving the wake-up recognition results sent by other slave devices in the device group, send the wake-up recognition results of the other slave devices to the audio recognition module 403, so that the audio recognition module 403 determines whether to wake up the electronic device in combination with the wake-up recognition results of the other slave devices. The interaction module 404 is further configured to send a control instruction to other slave devices in the device group, so that the other slave devices cooperate with the master device to process the instruction intention of the user.
Further, in some embodiments, the electronic device may also include an audio synthesis module (audio synthesizer) 405. The audio synthesis module 405 is configured to synthesize corresponding response voice data, and convert the response voice data into sound for playing. For example, the electronic device may play a sound of "ask what help is needed" through the speaker 161 in response to capturing a sound of "art and art" uttered by the user. In this case, the audio synthesis module 405 is configured to synthesize the corresponding response voice data in response to the collection of the sound of "art and art" uttered by the user, convert the synthesized response voice data into a sound of "ask what help is needed", and play it.
It should be understood that the software architecture shown in FIG. 3 is merely an example. The electronic device of the embodiments of the present application may have more or fewer modules than the electronic device shown in the drawings, may combine two or more modules, and the like. The various modules shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
It is noted that the audio collection module 401, the audio processing module 402, the audio recognition module 403, the interaction module 404, and the audio synthesis module 405 shown in fig. 3 may be integrated into one or more processing units in the processor 110 shown in fig. 2. For example, part or all of these modules may be integrated into one or more processors such as an application processor or a dedicated processor. It should be noted that the dedicated processor in the embodiment of the present application may be a DSP, an Application Specific Integrated Circuit (ASIC) chip, or the like.
The following embodiments may be implemented in an electronic device having the above-described hardware configuration and/or software configuration.
The following describes in detail a usage scenario of the voice wake-up method provided in the present application with reference to the accompanying drawings.
Fig. 4a to fig. 4c take a smart speaker as an example of the electronic device in the device group. As shown in fig. 4a, the scene includes 6 speakers, which form a device group and are distributed at different positions in area 1 (e.g., a living room). Speaker 11 is the master device of the device group, speakers 12 to 16 are slave devices of the device group, and the speakers in the device group are connected with each other (fig. 4a only shows the connections between the master device and the slave devices). All speakers in the device group are of the same device type and use the same wake-up word. Each speaker in the device group can receive or collect the wake-up voice uttered by the user at the wake-up moment, but only the master speaker 11 in the device group is woken up and responds. Speaker 11 combines the wake-up recognition results of the other speakers in the device group to make a wake-up decision, that is, to determine whether speaker 11 is woken up. After speaker 11 is woken up, slave speakers 12 to 16 can cooperate with speaker 11 to process the instruction intention of the user. Fig. 4b differs from fig. 4a in that the 6 speakers in the device group are distributed over different areas; for example, as shown in fig. 4b, master speaker 11 and slave speaker 12 are located in area 1, and slave speakers 13 to 16 are located in area 2 (e.g., a bedroom). In this scenario, the user may utter the wake-up voice in either of the two areas, so the distance between the user and master speaker 11 is not fixed and may be short or long, and there may also be a noise source on the transmission path of the wake-up voice.
Fig. 4c differs from fig. 4a in that the hardware specifications of the speakers in the device group are different, while the wake-up word is the same. According to hardware specification, the speakers can be subdivided into speaker Pro, conventional speaker, and speaker mini, ranked as follows: speaker Pro > conventional speaker > speaker mini. The hardware specification includes, but is not limited to, the processing performance, size, and the like of the chip built into the speaker. As shown in fig. 4c, speaker Pro 11, which has the best hardware specification, is used as the master device of the device group.
The above usage scenario is only an example, in the scenario, each electronic device in the same device group may be a device of the same type, for example, all devices in the device group are smart speakers, as shown in fig. 4a to 4c, or may be a device of different type, for example, a smart speaker, a smart phone, a smart television, and the like are in the device group, as shown in fig. 1. The embodiment of the present application does not set any limit to the device types of the electronic devices in the device group.
As an example, a user may manage or set up the electronic devices in the device group via any electronic device with display functionality, such as a smartphone, tablet, etc. Fig. 5 shows a user interface interaction diagram according to an embodiment of the application. As shown in FIG. 5, the user interface 501 includes a status bar, icon controls for multiple applications (e.g., "Smart Home" icons), time and weather gadgets, and the like. When the electronic equipment detects the touch operation of a finger (or a touch pen) of a user on a certain application icon, the electronic equipment starts the application program in response to the touch operation, and displays a user interface of the application program on a display screen. Illustratively, the electronic device detects a touch operation on the "smart home" icon, and in response to the touch operation, displays a device interface 502 of the user on the display screen, where the device interface 502 includes a list of smart devices (such as devices a, b, c, and d shown in fig. 5) that the user has added, a first control 503, a second control 504, and a third control 505. The user can add a new smart device to the smart device list by clicking the first control 503, and the user can add a plurality of smart devices in the smart device list to the same device group by clicking the second control 504 and the third control 505. On the device group interface 506, the user can set the master device in the device group (for example, set device a as the master device in fig. 5) by clicking the fourth control 507, and the user can enter any smart device interface 509 in the device group by clicking the device control 508 to query the device information (for example, the device model, the data record, the wake-up sensitivity, and the like) of a certain smart device. 
For example, for the wake sensitivity of the smart device, the user may set the wake-up enhancement function of the smart device to be turned on or off via the fifth control 510.
Based on the above descriptions of the electronic device and the scenario, the voice wake-up method provided by the present application is described in detail below with reference to several specific embodiments.
Fig. 6 shows an interaction diagram of a voice wake-up method provided in an embodiment of the present application. A device group composed of 3 electronic devices is taken as an example, and it is assumed that the first electronic device is the master device of the device group, and the second electronic device a and the second electronic device b are slave devices of the device group. As shown in fig. 6, when the user utters the wake-up voice, each electronic device in the device group can receive or acquire the wake-up voice, and each electronic device in the device group performs wake-up word detection on the wake-up voice to obtain a wake-up recognition result. Specifically, the first electronic device calculates a first wake-up confidence of the wake-up voice; the second electronic device a calculates a second wake-up confidence of the wake-up voice and determines to allow or prohibit the second electronic device a to be woken up; and the second electronic device b calculates a third wake-up confidence of the wake-up voice and determines to allow or prohibit the second electronic device b to be woken up. After determining their wake-up recognition results, the second electronic device a and the second electronic device b each send the determined wake-up recognition result to the first electronic device, so that the first electronic device determines to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence determined by the first electronic device itself and the wake-up recognition results sent by the slave devices in the device group.
It should be understood that the wake-up confidences calculated by the electronic devices may be the same or different, and the wake-up recognition results may therefore differ, because the distances between the electronic devices and the user, the interference factors on the transmission paths, the processing performance of the electronic devices, and the like may differ.
As an example, fig. 7 shows a schematic flow chart of a voice wake-up method, where the method uses a first electronic device as an execution subject, and as shown in fig. 7, the method specifically includes the following steps:
Step 101, acquiring audio data.
The first electronic device may capture the sound of the surrounding environment through a sound collection device, such as a microphone or another sensor for collecting sound. After the sound collection device collects the ambient sound, it converts the ambient sound into an electrical audio signal and outputs the signal to the audio module 160, and the audio module 160 performs coding and/or analog-to-digital conversion to obtain audio data in a corresponding format. After the audio module 160 obtains the audio data in the corresponding format, the audio data may be sent to the audio collection module 401 in the processor 110. The audio collection module 401 stores the audio data in a memory (e.g., the internal memory 121, or a memory in the processor 110) and transmits the audio data to the audio processing module 402. The audio processing module 402 preprocesses the audio data to obtain processed audio data. The preprocessing of the audio data includes channel conversion, smoothing, noise reduction, and the like. Optionally, in some embodiments, after obtaining the audio data in the corresponding format, the audio module 160 may instead send a notification to the audio collection module 401 in the processor 110. After receiving the notification, the audio collection module 401 obtains the audio data in the corresponding format from the audio module 160, stores it in the memory, and sends it to the audio processing module 402, and the audio processing module 402 preprocesses the audio data to obtain processed audio data.
Step 102, calculating a first awakening confidence coefficient of the audio data. The first awakening confidence coefficient is used for indicating the acoustic feature similarity between an awakening word in the audio data and a preset awakening word.
The first electronic device may perform wakeup word detection on the preprocessed audio data through the processor 110. Illustratively, the first electronic device performs wakeup word detection on the preprocessed audio data through the audio recognition module 403 in the processor 110. The audio recognition module 403 obtains the preprocessed audio data from the audio processing module 402.
Specifically, the audio recognition module 403 performs wakeup word detection on the preprocessed audio data, and can determine whether a preset wakeup word exists in the audio data through the voice wakeup model. The voice awakening model is obtained through pre-training of a large amount of audio data, and similarity values of acoustic features of awakening words in the audio data and preset awakening words are calculated through the voice awakening model. The input of the voice wakeup model is preprocessed audio data, and the output of the voice wakeup model may include a wakeup identifier, where the wakeup identifier includes an identifier that allows or prohibits the electronic device from being woken up. Illustratively, the wake-up flag is 0, which indicates that the electronic device is prohibited from being woken up; the wake up flag is 1 to indicate that the electronic device is allowed to wake up. In some embodiments, the output of the voice wake model may include a wake confidence level indicating the similarity of the acoustic features of the wake words in the audio data to the preset wake words. In some embodiments, the output of the voice wake model may include both the wake identification and the wake confidence described above. The audio recognition module 403 obtains the wake up flag and/or the wake up confidence level through the voice wake up model.
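The three output variants described above (wake-up identifier only, wake-up confidence only, or both) can be represented by a single record with optional fields. The Python sketch below is an assumption-laden illustration: the names `WakeRecognitionResult` and `make_result` are hypothetical, and deriving the identifier by comparing the confidence with a threshold is an assumed rule that the text does not spell out for the model itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class WakeRecognitionResult:
    # 1 = the device is allowed to be woken up, 0 = prohibited; None = not reported
    wake_flag: Optional[int] = None
    # acoustic-feature similarity to the preset wake-up word; None = not reported
    confidence: Optional[float] = None


def make_result(confidence: float, threshold: float,
                include_flag: bool = True,
                include_confidence: bool = True) -> WakeRecognitionResult:
    """Builds a result carrying the identifier and/or the confidence.

    Assumption: the identifier is 1 when confidence >= threshold, else 0.
    """
    flag = 1 if confidence >= threshold else 0
    return WakeRecognitionResult(
        wake_flag=flag if include_flag else None,
        confidence=confidence if include_confidence else None,
    )
```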
Step 103, receiving a wake-up recognition result sent by at least one second electronic device. The wake-up recognition result is used for indicating that the at least one second electronic device is allowed or prohibited to be woken up.
In this embodiment, each slave device in the device group has a wake-up identification function. Similar to the above steps 101 and 102, each second electronic device in the device group may also preprocess the received audio data, then perform wakeup word detection on the preprocessed audio data, determine whether a preset wakeup word exists in the audio data through a voice wakeup model, calculate similarity between acoustic features of the wakeup word in the audio data and the preset wakeup word through the voice wakeup model, and finally obtain a wakeup identifier and/or a wakeup confidence. Accordingly, the wake-up recognition result sent by each second electronic device may include a wake-up identifier and/or a wake-up confidence. The first electronic device may receive a wake-up recognition result sent by at least one second electronic device in the device group through the interaction module 404, and when the interaction module 404 receives the wake-up recognition result sent by the at least one second electronic device, the interaction module 404 sends the wake-up recognition result of the at least one second electronic device to the audio recognition module 403, so that the audio recognition module 403 determines whether to wake up the first electronic device according to the first wake-up confidence and the wake-up recognition result sent by the at least one second electronic device.
Step 104, determining to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and the wake-up recognition result sent by the at least one second electronic device.
In this step, the first electronic device may determine, through the audio recognition module 403 in the processor 110, to allow or prohibit the first electronic device from being woken up. Specifically, the audio identification module 403 first determines a size relationship between the first wake-up confidence and a preset wake-up threshold, and determines whether to allow or prohibit the first electronic device to be woken up according to a determination result. The preset wake-up threshold may include one threshold or two thresholds.
In a possible implementation manner, the preset wake-up threshold includes a threshold, which is a first threshold. As shown in fig. 8a, the audio recognition module 403 may determine to allow or prohibit the first electronic device from being woken up according to the magnitude relationship between the first wake-up confidence and the first threshold. Specifically, if the first wake-up confidence is greater than or equal to the first threshold, the audio recognition module 403 determines that the first electronic device is allowed to be woken up; if the first wake-up confidence is smaller than the first threshold, the audio recognition module 403 determines to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and a wake-up recognition result sent by the at least one second electronic device. In this implementation, if the first wake-up confidence calculated by the first electronic device is higher, the first electronic device may directly determine that the first electronic device is allowed to be woken up. If the first awakening confidence degree calculated by the first electronic device is not very high, the first electronic device needs to comprehensively judge whether the first electronic device is allowed to be awakened or not by combining awakening identification results sent by other slave devices in the device group, and the awakening accuracy of the main device in the device group is improved.
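The single-threshold decision above can be sketched in Python as follows. The function name `decide_single_threshold` and the `joint_decision` callback are hypothetical; the callback stands in for the comprehensive judgment that combines the wake-up recognition results of the slave devices.

```python
from typing import Callable


def decide_single_threshold(first_confidence: float,
                            first_threshold: float,
                            joint_decision: Callable[[], bool]) -> bool:
    """Returns True if the first electronic device is allowed to be woken up."""
    if first_confidence >= first_threshold:
        # High confidence: allow wake-up directly, without consulting slaves.
        return True
    # Otherwise defer to the judgment that uses the slave devices' results.
    return joint_decision()
```

Passing the joint judgment as a callback means the slave results are only evaluated when the master's own confidence is not sufficient, which mirrors the order of checks in the text.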
In another possible implementation manner, the preset wake-up threshold includes two thresholds, which are a first threshold and a second threshold, respectively, where the first threshold is greater than the second threshold. As shown in fig. 8b, the audio recognition module 403 may determine to allow or prohibit the first electronic device to be woken up according to the magnitude relationship between the first wake-up confidence and the first and second thresholds. Specifically, if the first wake-up confidence is greater than or equal to the first threshold, the audio recognition module 403 determines that the first electronic device is allowed to be woken up; if the first wake-up confidence is less than or equal to the second threshold, the audio recognition module 403 determines to prohibit the first electronic device from being woken up; if the first wake-up confidence is smaller than the first threshold and larger than the second threshold, the audio recognition module 403 determines to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and a wake-up recognition result sent by at least one second electronic device. Compared with the first implementation mode, the implementation mode is additionally provided with the second threshold which is a relatively low judgment threshold, if the first awakening confidence obtained by the first electronic device through calculation is smaller than or equal to the second threshold, the first awakening confidence can be considered to be really low, the first electronic device does not need to be comprehensively judged by combining awakening identification results sent by other slave devices in the device group, and can directly judge that the first electronic device is forbidden to be awakened.
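The two-threshold variant adds a direct rejection branch. A minimal Python sketch, again with hypothetical names (`decide_two_thresholds`, `joint_decision`), might look like this:

```python
from typing import Callable


def decide_two_thresholds(first_confidence: float,
                          first_threshold: float,
                          second_threshold: float,
                          joint_decision: Callable[[], bool]) -> bool:
    """Returns True if the first electronic device is allowed to be woken up."""
    if first_threshold <= second_threshold:
        raise ValueError("the first threshold must be greater than the second")
    if first_confidence >= first_threshold:
        return True   # clearly high: allow directly
    if first_confidence <= second_threshold:
        return False  # clearly low: prohibit directly, slaves are not consulted
    # In between: combine with the slave devices' wake-up recognition results.
    return joint_decision()
```

Only confidences strictly between the two thresholds reach the joint judgment, so the slave results matter exactly in the uncertain band.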
It should be noted that, in the embodiment of the present application, the preset wake-up threshold of each electronic device in the device group may be the same threshold, for example, the preset wake-up threshold in the first electronic device and any second electronic device includes a threshold, where the threshold is a first threshold, the first electronic device performs preliminary determination according to the first threshold and a first wake-up confidence obtained by current calculation, and similarly, the second electronic device performs wake-up recognition determination according to the first threshold and a second wake-up confidence obtained by current calculation, and sends a wake-up recognition result to the first electronic device. In some embodiments, the preset wake-up thresholds of the electronic devices in the device group may also be different thresholds, for example, the preset wake-up threshold in the first electronic device includes a threshold that is a first threshold, and the preset wake-up threshold in the second electronic device also includes a threshold that may be a value greater than or less than the first threshold, so that it can be seen that the electronic devices in the device group can respectively make a wake-up identification decision based on the respective preset wake-up thresholds.
In summary, any of the above implementations includes the following step: determining to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and the wake-up recognition result sent by the at least one second electronic device. This step has the following three possible implementations:
In a first possible implementation manner, the first electronic device counts the wake-up situation of the slave devices in the device group from the wake-up recognition results sent by the at least one second electronic device, and determines whether the wake-up situation of the slave devices meets a preset wake-up condition. If the preset wake-up condition is met, it is determined that the first electronic device is allowed to be woken up; if the preset wake-up condition is not met, it is determined that the first electronic device is prohibited from being woken up.
In a second possible implementation manner, the first electronic device dynamically adjusts a preset wake-up threshold of the first electronic device in the device group based on a continuous manner, and determines to allow or prohibit the first electronic device to be woken up according to comparison between the adjusted preset wake-up threshold and a first wake-up confidence calculated by the current first electronic device.
In a third possible implementation manner, the first electronic device dynamically adjusts a preset wake-up threshold of the first electronic device in the device group based on a discrete manner, and determines to allow or prohibit the first electronic device to be woken up according to comparison between the adjusted preset wake-up threshold and a first wake-up confidence coefficient calculated by the current first electronic device.
The first implementation mode is based on a preset rule, the preset rule fully considers the awakening identification results of other slave devices in the device group, the awakening condition of the main device in the device group is optimized, and the awakening accuracy of the main device in the device group is improved. The latter two implementation manners are both from the perspective of the wake-up threshold, and the preset wake-up threshold of the master device is optimized in combination with the wake-up recognition results of other slave devices in the device group, and the purpose is the same as that of the first implementation manner.
With reference to fig. 9 to fig. 11, how the first electronic device makes a wake decision in conjunction with the wake recognition results sent by other slave devices in the device group is described in detail below. It should be noted that the determination processes of the following embodiments can be executed by the audio recognition module 403 in the processor 110 of the first electronic device.
Fig. 9 shows a flowchart of the decision process of a voice wake-up method. As shown in fig. 9, if the first wake-up confidence calculated by the first electronic device is smaller than the first threshold, the method includes the following steps:
Step 201, counting the wake-up situation of the at least one second electronic device according to the wake-up recognition result sent by the at least one second electronic device.
Specifically, the wake up recognition result may include at least one of a wake up identifier of the at least one second electronic device and a second wake up confidence. The awakening identification comprises an identification for allowing or forbidding the second electronic equipment to be awakened, and the second awakening confidence coefficient is used for indicating the acoustic feature similarity between the awakening word in the audio data determined by the second electronic equipment and the preset awakening word.
In a possible case, if only the wake-up identifier is included in the wake-up recognition result sent by the at least one second electronic device, the first electronic device may count the number of devices of the second electronic device allowed to be woken up and the ratio of the number of devices to the total number of devices in the device group according to the wake-up identifier.
In another possible case, if the wake-up recognition result sent by the at least one second electronic device only includes the second wake-up confidence level, the first electronic device first needs to determine whether the second electronic device is allowed to be woken up according to the second wake-up confidence level and a wake-up threshold preset by each second electronic device. Then, the number of devices of the second electronic device allowed to be awakened is counted, and the number of the devices accounts for the proportion of the total number of the devices of the device group. The preset wake-up threshold of each second electronic device may be the same value as the preset wake-up threshold of the first electronic device, for example, the first threshold, or may be a different value from the preset wake-up threshold of the first electronic device. The preset wake-up thresholds of the second electronic devices may be the same or different, and this embodiment is not limited in any way.
In another possible case, if the wake-up identification result sent by at least one second electronic device includes both the wake-up identification and the second wake-up confidence, the first electronic device may determine, by statistics of any one of the above cases, the number of devices of the second electronic device that are allowed to be woken up, and the ratio of the number of devices to the total number of devices in the device group.
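The three cases of step 201 can be sketched as follows. This is a minimal illustration only: the names `WakeResult`, `is_allowed`, and `count_allowed` are hypothetical, the comparison "confidence greater than or equal to the preset threshold" is an assumption modelled on the rule used for the first electronic device, and preferring the wake-up identifier when both fields are present is one of the two options the text permits. The sample values are those of table 1 of this embodiment.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WakeResult:
    """Wake-up recognition result of one second electronic device (illustrative fields)."""
    device: str
    wake_flag: Optional[int] = None      # wake-up identifier: 1 = allowed, 0 = prohibited
    confidence: Optional[float] = None   # second wake-up confidence


def is_allowed(result: WakeResult, preset_thresholds: dict) -> bool:
    # Case 1 and case 3: use the wake-up identifier when it is present (assumed preference).
    if result.wake_flag is not None:
        return result.wake_flag == 1
    # Case 2: only the confidence is present, so compare it with the
    # preset wake-up threshold of that second electronic device.
    if result.confidence is None:
        raise ValueError("result carries neither identifier nor confidence")
    return result.confidence >= preset_thresholds[result.device]


def count_allowed(results, preset_thresholds, group_size):
    """Return (number of allowed second devices, proportion of the whole device group)."""
    allowed = sum(1 for r in results if is_allowed(r, preset_thresholds))
    return allowed, allowed / group_size


# Four slave devices plus the master device give a group of five devices.
thresholds = {"a": 0.7, "b": 0.7, "c": 0.5, "d": 0.8}
results = [
    WakeResult("a", confidence=0.8),
    WakeResult("b", confidence=0.5),
    WakeResult("c", confidence=0.5),
    WakeResult("d", confidence=0.8),
]
allowed, ratio = count_allowed(results, thresholds, group_size=5)
print(allowed, ratio)
```

With these inputs, devices a, c, and d are counted as allowed, which matches the 3/5 proportion used in the example of this embodiment.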
Step 202, judging whether the wake-up status of the at least one second electronic device meets the preset wake-up condition of the first electronic device. If it does, step 203 is executed; if it does not, step 204 is executed.
In this embodiment, the preset wake-up condition of the first electronic device includes any one of the following:
(1) The second wake-up confidences of all the second electronic devices in the device group (that is, all devices other than the first electronic device) are greater than or equal to a third threshold. The third threshold is a threshold preset in the first electronic device for allowing a second electronic device to be woken up, and it may be greater than the preset wake-up thresholds of some second electronic devices.
(2) The proportion of the number of second electronic devices allowed to be woken up to the total number of devices in the device group is greater than or equal to a first proportion. Illustratively, the first proportion may be set to 80%, in which case the condition means: the first electronic device (the master device) may be determined to be allowed to wake up if the second electronic devices allowed to be woken up account for at least 80% of the devices in the device group.
(3) The proportion of the number of second electronic devices allowed to be woken up to the total number of devices in the device group is smaller than the first proportion and larger than a second proportion, and the second wake-up confidences of the second electronic devices allowed to be woken up are all greater than or equal to the third threshold. The first proportion is greater than the second proportion.
It should be noted that the third condition is designed with the following consideration: the preset wake-up thresholds of the second electronic devices may differ, that is, the thresholds by which the second electronic devices decide that they are allowed to be woken up differ. For example, the preset wake-up threshold of second electronic device a is 0.7, that of second electronic device b is 0.8, and that of second electronic device c is 0.5. Even if the second electronic devices calculate the same second wake-up confidence for the audio data, their different preset wake-up thresholds can lead them to different wake-up recognition results. The number of second electronic devices allowed to be woken up, as counted by the first electronic device, is therefore of limited reference value on its own. To improve the accuracy of the decision, the third threshold is used to check whether any of the second electronic devices allowed to be woken up has a second wake-up confidence lower than the third threshold; if so, it is determined that the first electronic device is prohibited from being woken up. This condition prevents the master device from being woken up by mistake because some slave devices have preset wake-up thresholds that are too low, and thus improves the wake-up accuracy of the master device.
Illustratively, the first proportion is set to 80%, the second proportion is set to 50%, and the third threshold is set to 0.7. The slave devices are the second electronic devices a, b, c, and d, whose preset wake-up thresholds and transmitted wake-up recognition results are shown in table 1. As can be seen from table 1, the second electronic devices allowed to be woken up account for 60% (3/5) of the total number of devices in the device group, which is between the second proportion and the first proportion. However, one of the second electronic devices allowed to be woken up, namely second electronic device c, has a second wake-up confidence of 0.5, which is less than the third threshold of 0.7. The third condition is therefore not met, and it is determined that the first electronic device is not woken up.
TABLE 1
| Second electronic device | Preset wake-up threshold | Second wake-up confidence | Wake-up identifier |
| --- | --- | --- | --- |
| a | 0.7 | 0.8 | 1 |
| b | 0.7 | 0.5 | 0 |
| c | 0.5 | 0.5 | 1 |
| d | 0.8 | 0.8 | 1 |
Step 203, determining that the first electronic device is allowed to be woken up.
Step 204, determining to prohibit the first electronic device from being awakened.
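The preset wake-up conditions checked in step 202 can be sketched as three predicates. This is an illustration under stated assumptions rather than the claimed implementation: the function names and the dictionary layout are hypothetical, and which of the three conditions is configured on a given first electronic device is left to the caller. The sample data are the values of table 1.

```python
def condition_1(confidences, third_threshold):
    """(1) Every second electronic device reports a confidence >= the third threshold."""
    return all(c >= third_threshold for c in confidences.values())


def condition_2(flags, group_size, first_ratio):
    """(2) Allowed second devices make up at least first_ratio of the device group."""
    return sum(flags.values()) / group_size >= first_ratio


def condition_3(confidences, flags, group_size,
                first_ratio, second_ratio, third_threshold):
    """(3) Ratio lies strictly between the two proportions, and every allowed
    second device has a confidence >= the third threshold."""
    allowed = [dev for dev, flag in flags.items() if flag == 1]
    ratio = len(allowed) / group_size
    return (second_ratio < ratio < first_ratio
            and all(confidences[dev] >= third_threshold for dev in allowed))


# Table 1: second wake-up confidences and wake-up identifiers of devices a to d.
confidences = {"a": 0.8, "b": 0.5, "c": 0.5, "d": 0.8}
flags = {"a": 1, "b": 0, "c": 1, "d": 1}

# Group of five devices (four slaves plus the master), thresholds as in the example.
wake_master = condition_3(confidences, flags, group_size=5,
                          first_ratio=0.8, second_ratio=0.5, third_threshold=0.7)
print(wake_master)  # device c is allowed but has confidence 0.5 < 0.7
```

Under condition (3) the result is negative because device c is counted as allowed while its confidence is below the third threshold, so the flow proceeds to step 204.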
In the above scheme, based on the preset wake-up condition of the master device, when the first wake-up confidence calculated by the master device is smaller than the first threshold, the master device counts the wake-up status of the slave devices in the device group according to the wake-up recognition results they send, and judges whether that status meets the preset wake-up condition. If it does, the master device in the device group is allowed to be woken up. The scheme quickly determines whether the master device should respond to the wake-up, makes full use of the wake-up recognition results of the other devices in the device group, and improves the wake-up accuracy of the master device.
It should be understood that if most of the slave devices in the device group determine that wake-up is allowed while the preset wake-up threshold of the master device is too high, the master device may make a wrong decision. The methods of the following two embodiments can therefore be used to decide, in combination with the wake-up status of the slave devices in the device group, whether the wake-up threshold of the master device should be dynamically adjusted, so as to improve the wake-up accuracy of the master device.
Fig. 10 is a flowchart illustrating another determination method for voice wake-up. As shown in fig. 10, if the first wake-up confidence calculated by the first electronic device is smaller than the first threshold, the method includes the following steps:
Step 301, counting, according to the wake-up recognition result sent by at least one second electronic device, a first weight value of the second electronic devices allowed to be woken up in the device group.
This embodiment introduces a weight value for each electronic device in the device group, which indicates how much confidence is placed in the wake-up recognition result of that device. The weight value may depend on the device type of the electronic device or on its software/hardware performance. As an example, a smart home contains smart devices of multiple device types that each have a voice wake-up function, such as a smart television, a smart speaker, a smart lamp, a smart air conditioner, and a smart refrigerator, and the devices among them that share the same wake-up word may form a device group. Suppose the device group includes 1 smart television, 2 smart speakers, and 2 smart lamps. The weight values of the device types in the device group can be preset, for example, 0.3 for the smart television, 0.6 for the smart speaker, and 0.1 for the smart lamp. As another example, the electronic devices in the device group may all be of the same type, such as smart speakers. As shown in fig. 4c, smart speakers may be subdivided into the speaker Pro, the regular speaker, and the speaker mini, whose processing performance ranks as speaker Pro > regular speaker > speaker mini. The weight values can be preset according to processing performance, for example, 0.5 for the speaker Pro, 0.3 for the regular speaker, and 0.2 for the speaker mini.
It should be noted that the weight values of the electronic devices in the device group may be preset, or may be recommended by a third-party device (for example, a service platform provided by the manufacturer of the electronic device). As an example, when an electronic device newly joins a device group, the third-party device may issue the weight value of the electronic device to the electronic device itself or to the master device of the device group it joins. As another example, if the electronic device is a slave device in the device group, it may carry its weight value in the wake-up recognition result it sends to the master device, so that the master device learns the weight value of the newly added electronic device in time. In some embodiments, the weight values of the electronic devices may be maintained uniformly by the master device or by the third-party device.
In this embodiment, the wake-up recognition result sent by the at least one second electronic device may include at least one of a wake-up identifier of the at least one second electronic device, the second wake-up confidence, and a device identifier of the at least one second electronic device, where the device identifier indicates the device type of the second electronic device. Specifically, the first electronic device first determines which second electronic devices in the device group are allowed to be woken up according to the wake-up recognition result sent by the at least one second electronic device; for details, refer to step 201 of the above embodiment, which are not repeated here. It then uses the device identifiers to count the device types of the second electronic devices allowed to be woken up and the number of such devices of each type. Finally, based on the preset weight value of each electronic device, it calculates the first weight value of the second electronic devices allowed to be woken up in the device group according to the following formulas:
$\alpha = \gamma_1 \times n_1 + \gamma_2 \times n_2 + \dots + \gamma_x \times n_x$ (formula one)
$m = n_1 + n_2 + \dots + n_x$ (formula two)
Wherein, $\alpha$ represents the first weight value of the second electronic devices allowed to be woken up in the device group;
$\gamma_i$ represents the weight value of the i-th type of second electronic device allowed to be woken up;
$n_i$ represents the number of devices of the i-th type of second electronic device allowed to be woken up, where i = 1, 2, …, x, and x is a positive integer greater than or equal to 1;
x represents the number of device types of the second electronic devices allowed to be woken up in the device group;
m represents the total number of second electronic devices allowed to be woken up in the device group.
Illustratively, table 2 shows a parameter statistical table of all the second electronic devices except the first electronic device in the device group, and table 2 includes a weight value, a total number of devices, and a number of devices allowed to be woken up of each second electronic device.
TABLE 2
| Second electronic device | Weight value | Total number of devices | Number of devices allowed to be woken up |
| --- | --- | --- | --- |
| a | 0.5 | 2 | 1 |
| b | 0.3 | 1 | 0 |
| c | 0.2 | 3 | 2 |
As can be seen from table 2, 3 second electronic devices in the device group are allowed to be woken up: 1 second electronic device a and 2 second electronic devices c. The weight value of second electronic device a is 0.5 and that of second electronic device c is 0.2, so according to formula one, the first weight value of the second electronic devices allowed to be woken up is α = 0.5 × 1 + 0.2 × 2 = 0.9.
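The calculation of formula one and formula two on the data of table 2 can be reproduced with a few lines. The helper name `weighted_sum` and the dictionary layout are hypothetical and serve only to illustrate the arithmetic.

```python
def weighted_sum(weights, counts):
    """Sum of (weight of a device type) x (number of devices of that type)."""
    return sum(weights[dev_type] * n for dev_type, n in counts.items())


# Table 2: weight value per device type and number of devices allowed to be woken up.
weights = {"a": 0.5, "b": 0.3, "c": 0.2}
allowed_counts = {"a": 1, "b": 0, "c": 2}

alpha = weighted_sum(weights, allowed_counts)   # formula one
m = sum(allowed_counts.values())                # formula two
print(round(alpha, 3), m)
```

Device type b contributes nothing to the sum because none of its devices is allowed to be woken up.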
Step 302, counting second weight values of all second electronic devices in the device group.
As an example, the first electronic device may count device types of all the second electronic devices in the device group and a total number of the second electronic devices corresponding to each device type according to a device identifier in a wake-up recognition result sent by at least one second electronic device. As another example, the first electronic device may directly determine the device types of all the second electronic devices in the device group and the total number of the second electronic devices corresponding to each device type according to the interconnection situation in the current device group. It should be noted that, when the newly added second electronic device in the device group is interconnected with the first electronic device for the first time, the device information of the second electronic device may be sent to the first electronic device, so that the device information of all the second electronic devices in the device group is prestored in the first electronic device, where the device information includes a device identifier and/or a weight value.
Specifically, the second weight values of all the second electronic devices in the device group may be determined according to the following formula:
$\beta = \gamma_1 \times n_1 + \gamma_2 \times n_2 + \dots + \gamma_z \times n_z$ (formula three)
$o = n_1 + n_2 + \dots + n_z$ (formula four)
Wherein, $\beta$ represents the second weight value of all the second electronic devices in the device group;
$\gamma_j$ represents the weight value of the j-th type of second electronic device;
$n_j$ represents the total number of devices of the j-th type of second electronic device, where j = 1, 2, …, z, and z is a positive integer greater than or equal to 1;
z represents the number of device types of the second electronic devices in the device group;
o represents the total number of second electronic devices in the device group.
Taking table 2 as an example again: the device group contains 6 second electronic devices in total, namely 2 second electronic devices a, 1 second electronic device b, and 3 second electronic devices c, whose weight values are 0.5, 0.3, and 0.2 respectively. According to formula three, the second weight value of all the second electronic devices in the device group is β = 0.5 × 2 + 0.3 × 1 + 0.2 × 3 = 1.9.
Step 303, adjusting the first threshold value based on the first weight value and the second weight value.
The first threshold is a threshold that allows the first electronic device to be woken up (i.e., a preset wake-up threshold of the first electronic device).
In a specific implementation manner, the first electronic device determines an adjustment parameter of a wake-up threshold of the first electronic device according to the first weight value, the second weight value, and the maximum threshold adjustment parameter, and the adjustment parameter may be determined by the following formula:
$\theta = \alpha / \beta$ (formula five)
$\Delta' = \theta \times \Delta$ (formula six)
In the formulas, $\theta$ represents the proportion of the weight of the second electronic devices allowed to be woken up to the weight of all the second electronic devices in the device group;
$\Delta$ represents the maximum threshold adjustment parameter, which is a preset value typically set between 0 and 1, for example $\Delta = 0.1$;
$\Delta'$ represents the adjustment parameter of the wake-up threshold of the first electronic device.
The first electronic device adjusts a preset wake-up threshold of the first electronic device based on the adjustment parameter, and the adjusted wake-up threshold of the first electronic device can be determined through the following formula:
$\mathrm{Threshold}_{\mathrm{current}} = \mathrm{Threshold} - \Delta'$ (formula seven)
In the formula, $\mathrm{Threshold}_{\mathrm{current}}$ represents the adjusted wake-up threshold of the first electronic device (i.e., the adjusted first threshold);
$\mathrm{Threshold}$ represents the preset wake-up threshold of the first electronic device (i.e., the first threshold).
It can be seen that, in this embodiment, the adjustment parameter of the wake-up threshold is calculated from the weight values of the devices. The adjustment parameter obtained in this way is highly accurate, and its value usually changes continuously from one moment to the next.
Step 304, determining to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and the adjusted first threshold.
Specifically, the first electronic device is determined to be allowed or prohibited to be woken up by comparing the first waking confidence with the adjusted first threshold. If the first awakening confidence coefficient calculated by the first electronic equipment is greater than or equal to the adjusted first threshold, determining that the first electronic equipment is allowed to be awakened; and if the first awakening confidence coefficient is smaller than the adjusted first threshold, determining to prohibit the first electronic device from being awakened.
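Steps 303 and 304 can be sketched end to end as follows. The values α = 0.9 and β = 1.9 come from table 2 and Δ = 0.1 from the text; the preset threshold of 0.80 and the first wake-up confidence of 0.77 are hypothetical values chosen only to show a case in which the adjustment changes the outcome. The function names are likewise illustrative.

```python
def adjusted_threshold(threshold, alpha, beta, max_adjust):
    """Apply formulas five to seven to the preset wake-up threshold."""
    if beta <= 0:
        raise ValueError("second weight value must be positive")
    theta = alpha / beta            # formula five: weight proportion
    delta = theta * max_adjust      # formula six: adjustment parameter
    return threshold - delta        # formula seven: adjusted first threshold


def allow_wake(first_confidence, threshold):
    """Step 304: allowed if the confidence reaches the (adjusted) threshold."""
    return first_confidence >= threshold


preset = 0.80        # hypothetical first threshold
confidence = 0.77    # hypothetical first wake-up confidence, below the preset value

current = adjusted_threshold(preset, alpha=0.9, beta=1.9, max_adjust=0.1)
print(round(current, 4), allow_wake(confidence, preset), allow_wake(confidence, current))
```

Because θ cannot exceed 1 when the allowed devices are a subset of all second electronic devices, the threshold is lowered by at most Δ, which bounds how far the slave devices can influence the master device.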
The above scheme starts from the wake-up threshold. In combination with the actual wake-up status of the slave devices in the device group, it weighs the confidence placed in the wake-up recognition results of the slave devices allowed to be woken up, adjusts the wake-up threshold of the master device dynamically and in a continuous manner, and determines whether the master device should respond to the wake-up by comparing the adjusted wake-up threshold with the wake-up confidence calculated by the master device. Because the adjusted wake-up threshold better reflects the actual wake-up state of the whole device group, the wake-up accuracy of the master device is improved.
Fig. 11 shows a flowchart of another determination method for voice wake-up. As shown in fig. 11, if the first wake-up confidence calculated by the first electronic device is smaller than the first threshold, the method includes the following steps:
Step 401, counting the number of second electronic devices allowed to be woken up in the device group according to the wake-up recognition result sent by the at least one second electronic device.
In this embodiment, the wake-up recognition result sent by the at least one second electronic device includes a wake-up identifier and/or a second wake-up confidence of the at least one second electronic device. The first electronic device first determines which second electronic devices in the device group are allowed to be woken up according to the wake-up recognition result, and thereby obtains the number of second electronic devices allowed to be woken up in the device group; for details, refer to step 201 of the above embodiment, which are not repeated here.
Step 402, determining threshold adjusting parameters corresponding to the number of the devices according to the awakening threshold adjusting table.
In this embodiment, a wake-up threshold adjustment table is pre-stored in a memory (e.g., the internal memory 121 or a memory in the processor 110) of the first electronic device.
In one possible implementation, the wake threshold adjustment table may include a correspondence between the number of devices of the second electronic device allowed to be woken and the threshold adjustment parameter. As an example, the wake threshold adjustment table may include a correspondence of a range of values of the number of devices of the second electronic device allowed to be woken to the threshold adjustment parameter. For example, assuming that the total number of the second electronic devices in the device group except for the first electronic device is 7, the device number of the second electronic devices allowed to be woken up, which is counted by the first electronic device, may fall within any one of the value ranges shown in table 3, and the first electronic device may determine the threshold adjustment parameter this time according to the wake-up threshold adjustment table shown in table 3.
TABLE 3
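[Table 3 appears only as an image in the original publication; it maps value ranges of the number of second electronic devices allowed to be woken up to threshold adjustment parameters.]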
As can be seen from table 3, the larger the number of devices of the second electronic device allowed to be woken up, the larger the threshold adjustment parameter.
In another possible implementation manner, the wake-up threshold adjustment table may include a correspondence relationship between the ratio of the number of devices of the second electronic device allowed to be woken to the total number of devices of all the second electronic devices and the threshold adjustment parameter. It will be appreciated that the larger the ratio, the larger the threshold adjustment parameter.
As can be seen, in this embodiment, the threshold adjustment parameters are determined through a preset wake-up threshold adjustment table, and each threshold adjustment parameter is a discrete value.
Step 403, adjusting the first threshold value based on the threshold value adjusting parameter.
In the present embodiment, the preset wake-up threshold of the first electronic device is adjusted in the same manner as in step 303 of the above embodiment, and after the threshold adjustment parameter is determined, the formula seven may be adopted to reduce the preset wake-up threshold of the first electronic device.
Step 404, determining to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and the adjusted first threshold.
Step 404 of this embodiment is the same as step 304 of the above embodiment, and reference may be made to the above embodiment for details, which are not repeated herein.
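Steps 401 to 404 amount to a table lookup followed by the same comparison as in step 304, which can be sketched as follows. The contents of table 3 are not reproduced in the text, so the ranges and adjustment values in `WAKE_THRESHOLD_TABLE` below are purely hypothetical; they only respect the stated property that a larger number of allowed devices gives a larger threshold adjustment parameter, for a group with 7 second electronic devices.

```python
# Hypothetical wake-up threshold adjustment table:
# (minimum count, maximum count, threshold adjustment parameter).
WAKE_THRESHOLD_TABLE = [
    (0, 2, 0.00),
    (3, 4, 0.05),
    (5, 7, 0.10),
]


def lookup_adjustment(allowed_count, table=WAKE_THRESHOLD_TABLE):
    """Step 402: find the discrete adjustment for the counted number of devices."""
    for low, high, adjustment in table:
        if low <= allowed_count <= high:
            return adjustment
    raise ValueError(f"no table entry for count {allowed_count}")


def decide(first_confidence, preset_threshold, allowed_count):
    """Steps 403 and 404: lower the threshold (formula seven), then compare."""
    adjusted = preset_threshold - lookup_adjustment(allowed_count)
    return first_confidence >= adjusted


# Hypothetical master values: preset threshold 0.80, first wake-up confidence 0.72.
print(decide(0.72, 0.80, allowed_count=6))  # many slaves allowed, threshold lowered
print(decide(0.72, 0.80, allowed_count=1))  # few slaves allowed, threshold unchanged
```

A table keyed on the proportion of allowed devices, as in the second implementation described above, would follow the same pattern with fractional range bounds.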
The above scheme also starts from the wake-up threshold. In combination with the actual wake-up status of the slave devices in the device group, it counts the number or proportion of slave devices allowed to be woken up, adjusts the wake-up threshold of the master device dynamically and in a discrete manner, and determines whether the master device should respond to the wake-up by comparing the adjusted wake-up threshold with the wake-up confidence calculated by the master device. Because the adjusted wake-up threshold better reflects the actual wake-up state of the whole device group, the wake-up accuracy of the master device is improved.
The voice wake-up method provided by the embodiment of the present application is described above in detail, and the voice wake-up apparatus provided by the embodiment of the present application will be described below. Fig. 12 is a schematic structural diagram of a voice wake-up device according to an embodiment of the present application. As shown in fig. 12, a voice wake-up apparatus 1200 provided in the embodiment of the present application includes:
an obtaining module 1201, configured to obtain a first wake-up confidence of audio data, where the first wake-up confidence is used to indicate an acoustic feature similarity between a wake-up word in the audio data determined by the first electronic device and a preset wake-up word;
a receiving module 1202, configured to receive a wake-up recognition result sent by the at least one second electronic device, where the wake-up recognition result is used to indicate that the at least one second electronic device is allowed or prohibited from being woken up;
a processing module 1203, configured to determine to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and the wake-up recognition result.
Optionally, the wake-up recognition result includes at least one of a wake-up identifier and a second wake-up confidence of the at least one second electronic device; the wake-up identifier indicates whether the second electronic device is allowed or prohibited to be woken up, and the second wake-up confidence indicates the acoustic feature similarity between the wake-up word in the audio data, as determined by the second electronic device, and a preset wake-up word.
Optionally, the processing module 1203 is specifically configured to:
if the first wake-up confidence is greater than or equal to a first threshold, determining that the first electronic device is allowed to be woken up; or
if the first wake-up confidence is smaller than the first threshold and greater than a second threshold, determining to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and the wake-up recognition result; or
if the first wake-up confidence is smaller than or equal to the second threshold, determining that the first electronic device is prohibited from being woken up.
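The three branches above form a tiered decision in which the group-based determination is consulted only in the middle band. A minimal sketch follows; the name `tiered_decision` and the callable `group_decision`, which stands in for any of the three group-based determination methods, are hypothetical, as are the threshold values in the usage lines.

```python
from typing import Callable


def tiered_decision(first_confidence: float,
                    first_threshold: float,
                    second_threshold: float,
                    group_decision: Callable[[], bool]) -> bool:
    """Return True if the first electronic device is allowed to be woken up."""
    if first_confidence >= first_threshold:
        return True                 # confident enough on its own
    if first_confidence <= second_threshold:
        return False                # too low, the group is not consulted
    return group_decision()         # middle band: use the slave devices' results


# Hypothetical thresholds: first threshold 0.8, second threshold 0.4.
print(tiered_decision(0.9, 0.8, 0.4, lambda: False))  # above the first threshold
print(tiered_decision(0.3, 0.8, 0.4, lambda: True))   # at or below the second threshold
print(tiered_decision(0.6, 0.8, 0.4, lambda: True))   # decided by the group
```

Passing the group-based method as a callable keeps it from being evaluated when the first wake-up confidence alone already settles the decision.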
Optionally, the processing module 1203 is specifically configured to:
counting the wake-up status of the at least one second electronic device according to the wake-up recognition result;
and if the wake-up status meets the preset wake-up condition of the first electronic device, determining that the first electronic device is allowed to be woken up.
Optionally, the wake-up condition includes any one of the following:
the second wake-up confidences of all the second electronic devices in the device group other than the first electronic device are greater than or equal to a third threshold;
the proportion of the number of second electronic devices allowed to be woken up in the device group to the total number of devices in the device group is greater than or equal to a first proportion;
the proportion of the number of second electronic devices allowed to be woken up in the device group to the total number of devices in the device group is smaller than the first proportion and larger than a second proportion, and the second wake-up confidences of the second electronic devices allowed to be woken up are all greater than or equal to the third threshold;
the third threshold is a threshold preset in the first electronic device for allowing a second electronic device to be woken up.
Optionally, the wake-up recognition result further includes a device identifier of the at least one second electronic device, where the device identifier indicates the device type of the second electronic device and is used to determine the weight value of the second electronic device.
Optionally, the processing module 1203 is specifically configured to:
counting a first weight value of the second electronic devices allowed to be woken up in the device group according to the wake-up recognition result;
counting a second weight value of all the second electronic devices in the device group;
adjusting a first threshold based on the first weight value and the second weight value;
determining to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and the adjusted first threshold;
wherein the first threshold is a threshold that allows the first electronic device to be woken up.
Optionally, the first weight value is determined according to the number of devices corresponding to the device type of the second electronic device allowed to be woken up in the device group, and the weight value corresponding to each device type of the second electronic device allowed to be woken up; the second weight value is determined according to the number of the devices corresponding to the device types of all the second electronic devices in the device group and the weight value corresponding to each device type.
Optionally, the processing module 1203 is specifically configured to:
and taking the product of the ratio of the first weight value to the second weight value and a maximum threshold adjusting parameter as a threshold adjusting parameter, and adjusting the first threshold according to the threshold adjusting parameter.
Optionally, the processing module 1203 is specifically configured to:
counting the number of second electronic devices allowed to be woken up in the device group according to the wake-up recognition result;
determining a threshold adjustment parameter corresponding to the number of devices according to a wake-up threshold adjustment table, wherein the wake-up threshold adjustment table includes a correspondence between the number of second electronic devices allowed to be woken up and the threshold adjustment parameter;
adjusting the first threshold based on the threshold adjustment parameter;
and determining to allow or prohibit the first electronic device to be woken up according to the first wake-up confidence and the adjusted first threshold.
Optionally, the processing module 1203 is specifically configured to:
if the first wake-up confidence is greater than or equal to the adjusted first threshold, determining that the first electronic device is allowed to be woken up; or
if the first wake-up confidence is smaller than the adjusted first threshold, determining that the first electronic device is prohibited from being woken up.
The voice wake-up device provided in the embodiment of the present application is configured to execute the technical solution executed by the first electronic device in the method embodiments shown in fig. 7, fig. 8a, fig. 8b, and fig. 9, and the implementation principle and the technical effect are similar, which are not described herein again.
Fig. 13 is a schematic structural diagram of a voice wake-up device according to an embodiment of the present application. As shown in fig. 13, a voice wake-up apparatus 1300 provided in an embodiment of the present application includes:
an obtaining module 1301, configured to obtain a first wake-up confidence level of audio data, where the first wake-up confidence level is used to indicate an acoustic feature similarity between a wake-up word in the audio data determined by the first electronic device and a preset wake-up word;
a receiving module 1302, configured to receive a wake-up recognition result sent by the at least one second electronic device, where the wake-up recognition result is used to indicate that the second electronic device is allowed or prohibited to be woken up;
a processing module 1303, configured to adjust a first threshold according to the wake-up recognition result, where the first threshold is a threshold for allowing the first electronic device to be woken up; and to determine to allow or prohibit the first electronic device from being woken up according to the adjusted first threshold and the first wake-up confidence level.
Optionally, the processing module 1303 is specifically configured to:
calculating, according to the wake-up recognition result, a first weight value of the second electronic devices in the device group that are allowed to be woken up;
calculating a second weight value of all the second electronic devices in the device group;
adjusting the first threshold based on the first weight value and the second weight value.
Optionally, the first weight value is determined according to the number of devices corresponding to the device type of the second electronic device allowed to be woken up in the device group, and the weight value corresponding to each device type of the second electronic device allowed to be woken up; the second weight value is determined according to the number of the devices corresponding to the device types of all the second electronic devices in the device group and the weight value corresponding to each device type.
Optionally, the processing module 1303 is specifically configured to:
taking the product of the ratio of the first weight value to the second weight value and a maximum threshold adjustment parameter as the threshold adjustment parameter, and adjusting the first threshold according to the threshold adjustment parameter.
Optionally, the processing module 1303 is specifically configured to:
counting, according to the wake-up recognition result, the number of second electronic devices in the device group that are allowed to be woken up;
determining the threshold adjustment parameter corresponding to that number of devices according to a wake-up threshold adjustment table, wherein the wake-up threshold adjustment table comprises a correspondence between the number of second electronic devices allowed to be woken up and the threshold adjustment parameter;
adjusting the first threshold based on the threshold adjustment parameter.
Optionally, the wake-up recognition result includes at least one of a wake-up identifier and a second wake-up confidence of the at least one second electronic device; the wake-up identifier comprises an identifier indicating that the second electronic device is allowed or prohibited to be woken up, and the second wake-up confidence is used to indicate the acoustic feature similarity between the wake-up word in the audio data determined by the second electronic device and a preset wake-up word.
Optionally, the processing module 1303 is specifically configured to:
if the first wake-up confidence is greater than or equal to the adjusted first threshold, determining that the first electronic device is allowed to be woken up; or
if the first wake-up confidence is less than the adjusted first threshold, determining that the first electronic device is prohibited from being woken up.
The voice wake-up device provided in the embodiment of the present application is configured to execute the technical solution executed by the first electronic device in the method embodiments shown in fig. 10 and fig. 11, and the implementation principle and the technical effect are similar, which are not described herein again.
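As a rough illustration of how the receiving and processing modules of apparatus 1300 might cooperate, the following sketch models the wake-up recognition result as a record carrying at least one of a wake-up identifier and a second wake-up confidence. All class, field, and parameter names are hypothetical. Two further assumptions are made: when a result carries only a confidence, it is compared with a preset third threshold to decide whether that second device counts as allowed to wake, and the adjustment is subtracted from the first threshold.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class WakeRecognitionResult:
    """Result reported by one second electronic device (hypothetical layout)."""
    wake_allowed: Optional[bool] = None        # wake-up identifier, if sent
    second_confidence: Optional[float] = None  # second wake-up confidence, if sent


def is_allowed(result: WakeRecognitionResult, third_threshold: float) -> bool:
    """Prefer the explicit identifier; otherwise fall back to the confidence."""
    if result.wake_allowed is not None:
        return result.wake_allowed
    if result.second_confidence is not None:
        return result.second_confidence >= third_threshold
    return False


class VoiceWakeDevice:
    def __init__(self, first_threshold: float, third_threshold: float,
                 adjustment_for_count: Callable[[int], float]):
        self.first_threshold = first_threshold
        self.third_threshold = third_threshold
        self.adjustment_for_count = adjustment_for_count
        self.results: List[WakeRecognitionResult] = []

    def receive(self, result: WakeRecognitionResult) -> None:
        """Role of the receiving module: collect peer results."""
        self.results.append(result)

    def process(self, first_confidence: float) -> bool:
        """Role of the processing module: adjust the threshold, then decide."""
        allowed = sum(is_allowed(r, self.third_threshold) for r in self.results)
        adjusted = self.first_threshold - self.adjustment_for_count(allowed)
        return first_confidence >= adjusted
```

A caller would construct the device with its own thresholds and an adjustment rule, feed it each received result through `receive`, and then pass the locally obtained first wake-up confidence to `process`.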
It should be noted that the division of the voice wake-up device into the above modules is only a division of logical functions; in actual implementation, all or some of the modules may be integrated into one physical entity or may be physically separate. These modules may all be implemented as software invoked by a processing element, or all be implemented in hardware; alternatively, some modules may be implemented as software invoked by a processing element and others in hardware. For example, the processing module may be a separately disposed processing element, may be integrated into a chip of the apparatus, or may be stored in a memory of the apparatus in the form of program code that a processing element of the apparatus calls to execute the function of the processing module. The other modules are implemented similarly. In addition, all or some of the modules may be integrated together or implemented independently. The processing element described herein may be an integrated circuit having signal processing capability. In implementation, each step of the above method, or each of the above modules, may be completed by an integrated logic circuit of hardware in the processing element or by instructions in the form of software.
For example, the above modules may be one or more integrated circuits configured to implement the above methods, such as one or more application-specific integrated circuits (ASICs), one or more digital signal processors (DSPs), or one or more field-programmable gate arrays (FPGAs). For another example, when one of the above modules is implemented in the form of program code scheduled by a processing element, the processing element may be a general-purpose processor, such as a central processing unit (CPU) or another processor that can call program code. As another example, these modules may be integrated together and implemented in the form of a system-on-a-chip (SOC).
Fig. 14 is a schematic diagram of a hardware structure of a voice wake-up device according to an embodiment of the present application. As shown in fig. 14, the voice wakeup device 1400 provided in the embodiment of the present application may include:
a processor 1401, a memory 1402, and a communication interface 1403. The memory 1402 is configured to store a computer program; the processor 1401 is configured to execute the computer program stored in the memory 1402 to implement the method performed by the first electronic device in any of the above method embodiments; and the communication interface 1403 is configured for data communication or signal communication with at least one second electronic device or a server.
Optionally, the memory 1402 may be separate from or integrated with the processor 1401. When the memory 1402 is a device separate from the processor 1401, the voice wake-up apparatus 1400 may further include a bus 1404 for connecting the memory 1402 and the processor 1401.
In a possible embodiment, the processing module 1203 in fig. 12 may be implemented integrally in the processor 1401, and the receiving module 1202 may be implemented integrally in the communication interface 1403. The processing module 1303 in fig. 13 may be implemented by being integrated in the processor 1401, and the receiving module 1302 may be implemented by being integrated in the communication interface 1403.
In one possible implementation, the processor 1401 may be configured to implement the information processing operation of the first electronic device in the above method embodiment, and the communication interface 1403 may be configured to implement the signal transceiving operation of the first electronic device in the above method embodiment.
The voice wake-up device provided in this embodiment may be configured to execute the method executed by the first electronic device in any of the above method embodiments, and the implementation principle and the technical effect are similar, which are not described herein again.
The embodiment of the present application further provides a computer-readable storage medium, where the computer-readable storage medium stores computer-executable instructions, and when the computer-executable instructions are executed by a processor, the computer-executable instructions are used to implement a technical solution of the first electronic device in any one of the foregoing method embodiments.
The embodiment of the present application further provides a program, and when the program is executed by a processor, the program is configured to execute the technical solution of the first electronic device in any one of the foregoing method embodiments.
The embodiment of the present application further provides a computer program product, which includes program instructions, where the program instructions are used to implement the technical solution of the first electronic device in any one of the foregoing method embodiments.
An embodiment of the present application further provides a chip, including: a processing module and a communication interface, wherein the processing module is capable of executing the technical solution of the first electronic device in the foregoing method embodiment.
Further, the chip further includes a storage module (e.g., a memory), where the storage module is configured to store instructions, and the processing module is configured to execute the instructions stored by the storage module, and the execution of the instructions stored in the storage module causes the processing module to execute the technical solution of the first electronic device.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, it may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wired (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g., infrared, radio, microwave) means. The computer-readable storage medium may be any usable medium that can be accessed by a computer, or a data storage device, such as a server or a data center, that integrates one or more usable media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), among others.
In this application, "at least one" means one or more, and "a plurality" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate: A exists alone, both A and B exist, or B exists alone, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the preceding and following associated objects; in a formula, the character "/" indicates a "division" relationship between them. "At least one of the following" or a similar expression refers to any combination of the listed items, including any combination of single items or plural items. For example, at least one of a, b, or c may represent: a, b, c, a-b, a-c, b-c, or a-b-c, where each of a, b, and c may be single or multiple.
It is to be understood that the various numerical references referred to in the embodiments of the present application are merely for convenience of description and distinction and are not intended to limit the scope of the embodiments of the present application.
It should be understood that, in the embodiment of the present application, the sequence numbers of the above-mentioned processes do not imply an order of execution, and the order of execution of the processes should be determined by their functions and inherent logic, and should not limit the implementation process of the embodiment of the present application in any way.

Claims (38)

1. A voice wake-up method, applied to a first electronic device, wherein the first electronic device and at least one second electronic device belong to the same device group, the method comprising:
acquiring a first awakening confidence coefficient of audio data, wherein the first awakening confidence coefficient is used for indicating the acoustic feature similarity of an awakening word in the audio data determined by the first electronic device and a preset awakening word;
receiving a wake-up identification result sent by the at least one second electronic device, wherein the wake-up identification result is used for indicating that the at least one second electronic device is allowed or prohibited to be woken up;
and determining to allow or prohibit the first electronic equipment to be awakened according to the first awakening confidence level and the awakening identification result.
2. The method according to claim 1, wherein the wake-up recognition result comprises at least one of a wake-up identifier of the at least one second electronic device and a second wake-up confidence level;
the awakening identifier comprises an identifier which allows or prohibits the second electronic device to be awakened, and the second awakening confidence coefficient is used for indicating the acoustic feature similarity between the awakening word in the audio data determined by the second electronic device and a preset awakening word.
3. The method according to claim 1 or 2, wherein the determining to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence level and the wake-up recognition result comprises:
if the first wake-up confidence is greater than or equal to a first threshold, determining that the first electronic device is allowed to be woken up; or
if the first wake-up confidence is less than the first threshold and greater than a second threshold, determining to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the wake-up recognition result; or
if the first wake-up confidence is less than or equal to the second threshold, determining that the first electronic device is prohibited from being woken up.
4. The method of claim 3, wherein the determining to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence level and the wake-up recognition result comprises:
counting the awakening condition of the at least one second electronic device according to the awakening identification result;
and if the awakening condition meets the preset awakening condition of the first electronic equipment, determining that the first electronic equipment is allowed to be awakened.
5. The method of claim 4, wherein the wake-up condition comprises any one of:
the second wake-up confidences of all the second electronic devices in the device group other than the first electronic device are greater than or equal to a third threshold;
the ratio of the number of second electronic devices in the device group that are allowed to be woken up to the total number of devices in the device group is greater than or equal to a first proportion;
the ratio of the number of second electronic devices in the device group that are allowed to be woken up to the total number of devices in the device group is less than the first proportion and greater than a second proportion, and the second wake-up confidences of the second electronic devices allowed to be woken up are all greater than or equal to the third threshold;
wherein the third threshold is a threshold, preset in the first electronic device, for allowing the second electronic device to be woken up.
6. The method of claim 2, wherein the wake up recognition result further comprises a device identifier of the at least one second electronic device, wherein the device identifier is used for indicating a device type of the second electronic device, and determining a weight value of the second electronic device.
7. The method of claim 6, wherein the determining to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence level and the wake-up recognition result comprises:
counting a first weight value of a second electronic device which is allowed to be awakened in the device group according to the awakening identification result;
counting second weighted values of all second electronic equipment in the equipment group;
adjusting a first threshold based on the first weight value and the second weight value;
determining to allow or prohibit the first electronic device to be awakened according to the first awakening confidence level and the adjusted first threshold;
wherein the first threshold is a threshold that allows the first electronic device to be woken up.
8. The method according to claim 7, wherein the first weight value is determined according to the number of devices corresponding to the device types of the second electronic devices allowed to be woken in the device group and the weight values corresponding to the device types of the second electronic devices allowed to be woken; the second weight value is determined according to the number of devices corresponding to the device types of all the second electronic devices in the device group and the weight value corresponding to each device type.
9. The method of claim 7, wherein adjusting the first threshold based on the first and second weight values comprises:
and taking the product of the ratio of the first weight value to the second weight value and a maximum threshold adjusting parameter as a threshold adjusting parameter, and adjusting the first threshold according to the threshold adjusting parameter.
10. The method according to claim 1 or 2, wherein the determining to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence level and the wake-up recognition result comprises:
counting the number of the second electronic equipment allowed to be awakened in the equipment group according to the awakening identification result;
determining a threshold adjustment parameter corresponding to the equipment number according to a wake-up threshold adjustment table, wherein the wake-up threshold adjustment table comprises a corresponding relation between the equipment number of the second electronic equipment allowed to be woken up and the threshold adjustment parameter;
adjusting a first threshold based on the threshold adjustment parameter;
and determining to allow or prohibit the first electronic equipment to be awakened according to the first awakening confidence and the adjusted first threshold.
11. The method according to claim 7 or 10, wherein the determining to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the adjusted first threshold comprises:
if the first wake-up confidence is greater than or equal to the adjusted first threshold, determining that the first electronic device is allowed to be woken up; or
if the first wake-up confidence is less than the adjusted first threshold, determining that the first electronic device is prohibited from being woken up.
12. A voice wake-up method, applied to a first electronic device, wherein the first electronic device and at least one second electronic device belong to the same device group, the method comprising:
acquiring a first awakening confidence coefficient of audio data, wherein the first awakening confidence coefficient is used for indicating the acoustic feature similarity of an awakening word in the audio data determined by the first electronic device and a preset awakening word;
receiving a wake-up identification result sent by the at least one second electronic device, wherein the wake-up identification result is used for indicating that the second electronic device is allowed or prohibited to be woken up;
adjusting a first threshold value according to the awakening identification result, wherein the first threshold value is a threshold value for allowing the first electronic device to be awakened;
and determining to allow or prohibit the first electronic equipment to be awakened according to the adjusted first threshold and the first awakening confidence level.
13. The method of claim 12, wherein the adjusting the first threshold according to the wake-up recognition result comprises:
calculating, according to the wake-up recognition result, a first weight value of the second electronic devices in the device group that are allowed to be woken up;
calculating a second weight value of all the second electronic devices in the device group;
adjusting the first threshold based on the first weight value and the second weight value.
14. The method of claim 13, wherein the first weight value is determined according to the number of devices corresponding to the device types of the second electronic devices allowed to be woken in the device group, and the weight values corresponding to the device types of the second electronic devices allowed to be woken; the second weight value is determined according to the number of devices corresponding to the device types of all the second electronic devices in the device group and the weight value corresponding to each device type.
15. The method of claim 13, wherein adjusting the first threshold based on the first and second weight values comprises:
and taking the product of the ratio of the first weight value to the second weight value and a maximum threshold adjusting parameter as a threshold adjusting parameter, and adjusting the first threshold according to the threshold adjusting parameter.
16. The method of claim 12, wherein adjusting the first threshold according to the wake-up recognition result comprises:
counting the number of the second electronic equipment allowed to be awakened in the equipment group according to the awakening identification result;
determining a threshold adjustment parameter corresponding to the equipment number according to a wake-up threshold adjustment table, wherein the wake-up threshold adjustment table comprises a corresponding relation between the equipment number of the second electronic equipment allowed to be woken up and the threshold adjustment parameter;
adjusting the first threshold based on the threshold adjustment parameter.
17. The method according to any one of claims 12-16, wherein the wake-up recognition result comprises at least one of a wake-up identifier of the at least one second electronic device and a second wake-up confidence;
the awakening identifier comprises an identifier which allows or prohibits the second electronic device to be awakened, and the second awakening confidence coefficient is used for indicating the acoustic feature similarity between the awakening word in the audio data determined by the second electronic device and a preset awakening word.
18. The method according to any of claims 12-16, wherein the determining to allow or prohibit the first electronic device from being woken up according to the adjusted first threshold and the first wake-up confidence level comprises:
if the first wake-up confidence is greater than or equal to the adjusted first threshold, determining that the first electronic device is allowed to be woken up; or
if the first wake-up confidence is less than the adjusted first threshold, determining that the first electronic device is prohibited from being woken up.
19. A voice wake-up device, wherein the voice wake-up device is a first electronic device, the first electronic device and at least one second electronic device belong to the same device group, and the voice wake-up device comprises:
an obtaining module, configured to obtain a first wake-up confidence of audio data, where the first wake-up confidence is used to indicate the acoustic feature similarity between a wake-up word in the audio data determined by the first electronic device and a preset wake-up word;
a receiving module, configured to receive a wake-up recognition result sent by the at least one second electronic device, where the wake-up recognition result is used to indicate that the at least one second electronic device is allowed or prohibited to be woken up;
and a processing module, configured to determine to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the wake-up recognition result.
20. The device of claim 19, wherein the wake up recognition result comprises at least one of a wake up flag of the at least one second electronic device and a second wake up confidence level;
the awakening identifier comprises an identifier which allows or prohibits the second electronic device to be awakened, and the second awakening confidence coefficient is used for indicating the acoustic feature similarity between the awakening word in the audio data determined by the second electronic device and a preset awakening word.
21. The device according to claim 19 or 20, wherein the processing module is specifically configured to:
if the first wake-up confidence is greater than or equal to a first threshold, determining that the first electronic device is allowed to be woken up; or
if the first wake-up confidence is less than the first threshold and greater than a second threshold, determining to allow or prohibit the first electronic device from being woken up according to the first wake-up confidence and the wake-up recognition result; or
if the first wake-up confidence is less than or equal to the second threshold, determining that the first electronic device is prohibited from being woken up.
22. The device according to claim 21, wherein the processing module is specifically configured to:
counting the awakening condition of the at least one second electronic device according to the awakening identification result;
and if the awakening condition meets the preset awakening condition of the first electronic equipment, determining that the first electronic equipment is allowed to be awakened.
23. The device of claim 22, wherein the wake-up condition comprises any one of:
the second wake-up confidences of all the second electronic devices in the device group other than the first electronic device are greater than or equal to a third threshold;
the ratio of the number of second electronic devices in the device group that are allowed to be woken up to the total number of devices in the device group is greater than or equal to a first proportion;
the ratio of the number of second electronic devices in the device group that are allowed to be woken up to the total number of devices in the device group is less than the first proportion and greater than a second proportion, and the second wake-up confidences of the second electronic devices allowed to be woken up are all greater than or equal to the third threshold;
wherein the third threshold is a threshold, preset in the first electronic device, for allowing the second electronic device to be woken up.
24. The device of claim 20, wherein the wake up recognition result further comprises a device identifier of the at least one second electronic device, wherein the device identifier is used for indicating a device type of the second electronic device, and determining a weight value of the second electronic device.
25. The device according to claim 24, wherein the processing module is specifically configured to:
counting a first weight value of a second electronic device which is allowed to be awakened in the device group according to the awakening identification result;
counting second weighted values of all second electronic equipment in the equipment group;
adjusting a first threshold based on the first weight value and the second weight value;
determining to allow or prohibit the first electronic device to be awakened according to the first awakening confidence level and the adjusted first threshold;
wherein the first threshold is a threshold that allows the first electronic device to be woken up.
26. The device according to claim 25, wherein the first weight value is determined according to the number of devices corresponding to the device type of the second electronic device allowed to be woken in the device group, and the weight values corresponding to the respective device types of the second electronic device allowed to be woken; the second weight value is determined according to the number of devices corresponding to the device types of all the second electronic devices in the device group and the weight value corresponding to each device type.
27. The device according to claim 25, wherein the processing module is specifically configured to:
taking the product of the ratio of the first weight value to the second weight value and a maximum threshold adjustment parameter as the threshold adjustment parameter, and adjusting the first threshold according to the threshold adjustment parameter.
28. The device according to claim 19 or 20, wherein the processing module is specifically configured to:
counting the number of the second electronic equipment allowed to be awakened in the equipment group according to the awakening identification result;
determining a threshold adjustment parameter corresponding to the equipment number according to a wake-up threshold adjustment table, wherein the wake-up threshold adjustment table comprises a corresponding relation between the equipment number of the second electronic equipment allowed to be woken up and the threshold adjustment parameter;
adjusting a first threshold based on the threshold adjustment parameter;
and determining to allow or prohibit the first electronic equipment to be awakened according to the first awakening confidence and the adjusted first threshold.
29. The device according to claim 25 or 28, wherein the processing module is specifically configured to:
if the first wake-up confidence is greater than or equal to the adjusted first threshold, determining that the first electronic device is allowed to be woken up; or
if the first wake-up confidence is less than the adjusted first threshold, determining that the first electronic device is prohibited from being woken up.
30. A voice wake-up device, wherein the voice wake-up device is a first electronic device, the first electronic device and at least one second electronic device belong to the same device group, and the voice wake-up device comprises:
an obtaining module, configured to obtain a first wake-up confidence of audio data, where the first wake-up confidence is used to indicate the acoustic feature similarity between a wake-up word in the audio data determined by the first electronic device and a preset wake-up word;
a receiving module, configured to receive a wake-up recognition result sent by the at least one second electronic device, where the wake-up recognition result is used to indicate that the second electronic device is allowed or prohibited to be woken up;
and a processing module, configured to adjust a first threshold according to the wake-up recognition result, where the first threshold is a threshold for allowing the first electronic device to be woken up; and to determine to allow or prohibit the first electronic device from being woken up according to the adjusted first threshold and the first wake-up confidence.
31. The device according to claim 30, wherein the processing module is specifically configured to:
compute a first weight value of the second electronic devices in the device group that are allowed to be woken up according to the wake-up recognition result;
compute a second weight value of all the second electronic devices in the device group; and
adjust the first threshold according to the first weight value and the second weight value.
32. The device according to claim 31, wherein the first weight value is determined according to the number of devices of each device type among the second electronic devices in the device group that are allowed to be woken up, and the weight value corresponding to each of those device types; and the second weight value is determined according to the number of devices of each device type among all the second electronic devices in the device group, and the weight value corresponding to each device type.
33. The device according to claim 31, wherein the processing module is specifically configured to:
take the product of the ratio of the first weight value to the second weight value and a maximum threshold adjustment parameter as a threshold adjustment parameter, and adjust the first threshold according to the threshold adjustment parameter.
34. The device according to claim 30, wherein the processing module is specifically configured to:
count the number of second electronic devices in the device group that are allowed to be woken up according to the wake-up recognition result;
determine a threshold adjustment parameter corresponding to the number of devices according to a wake-up threshold adjustment table, wherein the wake-up threshold adjustment table comprises a correspondence between the number of second electronic devices allowed to be woken up and the threshold adjustment parameter; and
adjust the first threshold according to the threshold adjustment parameter.
35. The device according to any one of claims 30 to 34, wherein the wake-up recognition result comprises at least one of a wake-up identifier of the at least one second electronic device and a second wake-up confidence;
the wake-up identifier comprises an identifier indicating that the second electronic device is allowed or prohibited to be woken up, and the second wake-up confidence indicates the acoustic feature similarity, as determined by the second electronic device, between the wake-up word in the audio data and a preset wake-up word.
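One possible shape for the wake-up recognition result of claim 35 is sketched below. The claim requires at least one of the wake-up identifier and the second wake-up confidence. The field names, the `device_id` field, and the fallback rule that compares the second confidence with an assumed second threshold are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class WakeRecognitionResult:
    """Result a second electronic device sends to the first device."""
    device_id: str                              # hypothetical sender ID
    wake_allowed: Optional[bool] = None         # wake-up identifier
    second_confidence: Optional[float] = None   # second wake-up confidence

    def __post_init__(self):
        # The claim requires at least one of the two fields.
        if self.wake_allowed is None and self.second_confidence is None:
            raise ValueError("need a wake-up identifier or a confidence")


def is_woken(result, second_threshold=0.8):
    """Decide whether the sending device counts as woken up.

    Assumption: an explicit identifier takes precedence; otherwise the
    confidence is compared with a hypothetical second threshold.
    """
    if result.wake_allowed is not None:
        return result.wake_allowed
    return result.second_confidence >= second_threshold
```

Letting the explicit identifier take precedence is only one design choice; a receiver could equally re-evaluate the confidence itself.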
36. The device according to any one of claims 30 to 34, wherein the processing module is specifically configured to:
if the first wake-up confidence is greater than or equal to the adjusted first threshold, determine that the first electronic device is allowed to be woken up; or
if the first wake-up confidence is less than the adjusted first threshold, determine that the first electronic device is prohibited from being woken up.
37. A voice wake-up device, comprising: a memory for storing a computer program and a processor for calling and executing the computer program from the memory, such that the processor executes the computer program to perform the method of any of claims 1 to 11, or the method of any of claims 12 to 18.
38. A storage medium, characterized in that the storage medium comprises a computer program for implementing the method according to any one of claims 1 to 11 or the method according to any one of claims 12 to 18.
CN202010353897.1A 2020-04-29 2020-04-29 Voice wake-up method, device and storage medium Active CN111696562B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010353897.1A CN111696562B (en) 2020-04-29 2020-04-29 Voice wake-up method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010353897.1A CN111696562B (en) 2020-04-29 2020-04-29 Voice wake-up method, device and storage medium

Publications (2)

Publication Number Publication Date
CN111696562A CN111696562A (en) 2020-09-22
CN111696562B CN111696562B (en) 2022-08-19

Family

ID=72476807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010353897.1A Active CN111696562B (en) 2020-04-29 2020-04-29 Voice wake-up method, device and storage medium

Country Status (1)

Country Link
CN (1) CN111696562B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112420051A (en) * 2020-11-18 2021-02-26 青岛海尔科技有限公司 Equipment determination method, device and storage medium
CN112509596B (en) * 2020-11-19 2024-07-09 北京小米移动软件有限公司 Wakeup control method, wakeup control device, storage medium and terminal
CN115079810A (en) * 2021-03-10 2022-09-20 Oppo广东移动通信有限公司 Information processing method and device, main control equipment and controlled equipment
CN113889102A (en) * 2021-09-23 2022-01-04 达闼科技(北京)有限公司 Instruction receiving method, system, electronic device, cloud server and storage medium
CN115171703B (en) * 2022-05-30 2024-05-24 青岛海尔科技有限公司 Distributed voice awakening method and device, storage medium and electronic device

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107622770A (en) * 2017-09-30 2018-01-23 百度在线网络技术(北京)有限公司 voice awakening method and device
CN109273007A (en) * 2018-10-11 2019-01-25 科大讯飞股份有限公司 Voice awakening method and device
CN109346071A (en) * 2018-09-26 2019-02-15 出门问问信息科技有限公司 Wake up processing method, device and electronic equipment
CN110223684A (en) * 2019-05-16 2019-09-10 华为技术有限公司 A kind of voice awakening method and equipment
CN110364151A (en) * 2019-07-15 2019-10-22 华为技术有限公司 A kind of method and electronic equipment of voice wake-up
CN110570861A (en) * 2019-09-24 2019-12-13 Oppo广东移动通信有限公司 method and device for voice wake-up, terminal equipment and readable storage medium
CN111081217A (en) * 2019-12-03 2020-04-28 珠海格力电器股份有限公司 Voice wake-up method and device, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107134279B (en) * 2017-06-30 2020-06-19 百度在线网络技术(北京)有限公司 Voice awakening method, device, terminal and storage medium

Also Published As

Publication number Publication date
CN111696562A (en) 2020-09-22

Similar Documents

Publication Publication Date Title
CN111696562B (en) Voice wake-up method, device and storage medium
WO2021013137A1 (en) Voice wake-up method and electronic device
WO2020228815A1 (en) Voice-based wakeup method and device
CN112712803B (en) Voice awakening method and electronic equipment
CN112289313A (en) Voice control method, electronic equipment and system
WO2020207328A1 (en) Image recognition method and electronic device
CN111696570B (en) Voice signal processing method, device, equipment and storage medium
CN111369988A (en) Voice awakening method and electronic equipment
CN113393856B (en) Pickup method and device and electronic equipment
CN112446255A (en) Video image processing method and device
CN110070863A (en) A kind of sound control method and device
CN113448482B (en) Sliding response control method and device of touch screen and electronic equipment
WO2023016018A1 (en) Voice processing method and electronic device
CN114697812A (en) Sound collection method, electronic equipment and system
EP4199488A1 (en) Voice interaction method and electronic device
CN112150778A (en) Environmental sound processing method and related device
WO2022161077A1 (en) Speech control method, and electronic device
CN114067776A (en) Electronic device and audio noise reduction method and medium thereof
CN114520002A (en) Method for processing voice and electronic equipment
CN114120987B (en) Voice wake-up method, electronic equipment and chip system
CN114999535A (en) Voice data processing method and device in online translation process
CN113572798B (en) Device control method, system, device, and storage medium
CN115731923A (en) Command word response method, control equipment and device
CN114116610A (en) Method, device, electronic equipment and medium for acquiring storage information
CN115480250A (en) Voice recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant