CN112289313A - Voice control method, electronic equipment and system - Google Patents

Voice control method, electronic equipment and system

Info

Publication number
CN112289313A
Authority
CN
China
Prior art keywords
electronic device
voice data
electronic equipment
voice
electronic
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010990191.6A
Other languages
Chinese (zh)
Inventor
孙渊
伍晓晖
屈伸
Current Assignee
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd
Priority to CN202010990191.6A
Publication of CN112289313A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 15/28: Constructional details of speech recognition systems
    • G10L 15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L 2015/223: Execution procedure of a spoken command
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04N: PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N 21/00: Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N 21/40: Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N 21/41: Structure of client; Structure of client peripherals
    • H04N 21/422: Input-only peripherals, i.e. input devices connected to specially adapted client devices, e.g. global positioning system [GPS]
    • H04N 21/42203: Input-only peripherals with sound input device, e.g. microphone

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Selective Calling Equipment (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The application provides a voice control method, electronic equipment and a system, relating to the field of voice control technology. In a multi-device scenario, it solves the problem that only the voice assistant of the device closest to the user is woken up to respond to the user's voice command, which may cause the response to fail. The specific scheme is as follows: in a multi-device scenario, after the user speaks a wake-up word, one of the multiple devices is selected through multi-device wake-up arbitration to be woken up and respond, and this device collects the voice command subsequently spoken by the user. According to the collected voice command, multi-device capability arbitration selects, from the multiple devices, a device that has the function of executing the event corresponding to the voice command; that device executes the event, completing the response to the voice command.

Description

Voice control method, electronic equipment and system
Technical Field
The present application relates to the field of voice control technologies, and in particular, to a voice control method, an electronic device, and a system.
Background
The voice assistant is an important artificial intelligence application on the mobile phone. Through the voice assistant, the mobile phone can interact intelligently with the user in conversation and instant question-and-answer. The voice assistant can also recognize a voice command input by the user and trigger the mobile phone to automatically execute the event corresponding to that command. Typically, the voice assistant is in a sleep state, and the user must wake it up by voice before using it. Only after the voice assistant is woken up can it receive and recognize voice commands input by the user. The voice data used for waking up may be referred to as a wake-up word. For example, take the wake-up word "small E small E". If the user wants to use the voice assistant to trigger the mobile phone to play music, the user first says "small E small E" to wake up the voice assistant. After the voice assistant wakes up, the user says "play music". The mobile phone receives and recognizes the voice command through the voice assistant and is triggered to automatically play music.
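The sleep/wake gating described above can be sketched as a small state machine. This is an illustrative model only, not code from the application: the class name, method names, and return strings are invented, and "small E" is taken from the example in the text.

```python
class VoiceAssistant:
    """Toy model of a voice assistant that ignores commands until woken."""

    def __init__(self, wake_word: str):
        self.wake_word = wake_word
        self.awake = False

    def on_utterance(self, text: str) -> str:
        """Process one recognized utterance and report the action taken."""
        if not self.awake:
            if text.strip().lower() == self.wake_word.lower():
                self.awake = True
                return "woken"
            return "ignored"          # asleep: commands are not recognized
        return f"execute:{text}"      # awake: treat the utterance as a command
```

For instance, a command spoken before the wake-up word is ignored, while the same command after the wake-up word is executed.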
With the development of the technology, voice control is applied more and more widely. For example, many home devices now support a voice control function, typically implemented by installing a voice assistant in the device. There may therefore be a scenario in which the user's environment (e.g., the user's home) includes multiple devices that support voice control, i.e., a multi-device scenario. In such a scenario, if several of the devices share the same wake-up word, then after the user speaks that wake-up word, the voice assistants of all of those devices are woken up, and all of them recognize and respond to the voice command the user subsequently speaks. For example, as shown in fig. 1, the user's living room contains three devices: a sound box 101, a television 102 and a mobile phone 103. All three have a voice assistant installed, and the wake-up word of each is "small E". When the user speaks the wake-up word "small E", the voice assistants of the sound box 101, the television 102 and the mobile phone 103 are all woken up. When the user continues with "play music", all three devices receive and recognize the voice command and automatically play music.
In the prior art, multi-device wake-up arbitration may be performed by a server or by a local device (any one of the devices with the voice control function) based on voice energy. That is, from the multiple devices sharing the same wake-up word, one device is selected to wake up its voice assistant, so that only that device recognizes and responds to the user's voice command. The voice energy is used to indicate the distance between a device and the user. For example, taking the server performing the multi-device wake-up arbitration, and with reference to fig. 1, the server may select the device closest to the user from the sound box 101, the television 102 and the mobile phone 103 according to the voice energy. If that device is the sound box 101, the sound box 101 wakes up its voice assistant, and the other devices do not respond to the wake-up word, i.e., do not wake up their voice assistants. Thus, after the user continues to speak the voice command, only the sound box 101 recognizes and responds to it.
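The energy-based arbitration described above can be sketched as follows, assuming for illustration that each device which matched the wake-up word reports a single scalar energy value; the report format and device names are invented.

```python
def arbitrate_wakeup(energy_reports: dict) -> str:
    """Given {device_id: detected voice energy}, return the id of the one
    device that should wake its voice assistant: the device with the
    highest detected energy, i.e. the one assumed closest to the user."""
    return max(energy_reports, key=energy_reports.get)
```

With reports such as `{"sound_box": 0.82, "tv": 0.41, "phone": 0.35}`, only `"sound_box"` is selected; the other devices receive no wake-up indication and stay asleep.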
The prior art has at least the following problem: in the above multi-device wake-up arbitration scheme, after the user speaks the wake-up word, the device closest to the user wakes up its voice assistant and responds to the voice command the user subsequently speaks. However, if that device cannot complete the event corresponding to the voice command, the response fails. For example, if the voice command is "navigate to a certain place" but the device closest to the user, such as the sound box 101, has no navigation function, the response fails. In that case, unless the user moves to a device that does have a navigation function, such as the mobile phone 103, and speaks the wake-up word and the voice command again, the voice control for navigation cannot be completed.
Disclosure of Invention
The embodiments of the present application provide a voice control method, an electronic device and a system. In a multi-device scenario, they solve the problem that only the voice assistant of the device closest to the user is woken up to respond to the user's voice command, which may cause the response to fail.
In order to achieve the above purpose, the embodiment of the present application adopts the following technical solutions:
In a first aspect, an embodiment of the present application provides a voice control method. The method may be applied to a voice control system, and the voice control system may include a set of devices and a server, where the set of devices includes at least a first electronic device and a second electronic device that have the voice control function. The method may include the following steps. When the user wants to use the voice control function of a device, the user may speak the corresponding wake-up word, i.e., the first voice data. The first electronic device and the second electronic device each receive the first voice data of the user. When the first electronic device determines that the first voice data is the same as the wake-up word registered in the first electronic device, it sends energy information of the first voice data it detected to the server. Likewise, when the second electronic device determines that the first voice data is the same as the wake-up word registered in the second electronic device, it sends energy information of the first voice data it detected to the server. The server then performs multi-device wake-up arbitration according to the energy information of the first voice data detected by the first electronic device and by the second electronic device, i.e., determines which device performs the wake-up response.
If the energy of the first voice data detected by the first electronic device is greater than that detected by the second electronic device, the server may determine that the first electronic device performs the wake-up response and may send a first wake-up indication to the first electronic device. In response to the received first wake-up indication, the first electronic device wakes up its voice control function. Then, after the user speaks a voice command, i.e., the second voice data, the first electronic device, whose voice control function has been woken up, receives the second voice data of the user and sends it to the server. The server performs multi-device capability arbitration according to the second voice data, i.e., determines which device executes the event corresponding to the second voice data. For example, the server may determine a target electronic device from the set of devices, the target electronic device having the function of executing the event corresponding to the second voice data. The server sends a content indication to the target electronic device, where the content indication is an instruction corresponding to the second voice data, or data required for executing the event corresponding to the second voice data. The target electronic device then executes the event corresponding to the second voice data according to the content indication.
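The two forms of content indication described above (a parsed instruction, or the data needed to execute the event) can be sketched as a dispatcher on the target device. The message shape is an assumption made purely for illustration; the text does not specify a wire format.

```python
def handle_content_indication(indication: dict) -> str:
    """Execute the event described by a content indication (hypothetical
    format): either a parsed instruction, or raw data to render."""
    if indication["type"] == "instruction":
        # e.g. {"type": "instruction", "action": "play_music"}
        return f"run:{indication['action']}"
    if indication["type"] == "data":
        # e.g. {"type": "data", "payload": "song.mp3"}
        return f"render:{indication['payload']}"
    raise ValueError(f"unknown content indication: {indication['type']}")
```

The design point is that the target device need not re-parse the user's speech itself: the arbiter delivers either the resolved command or the material required to execute it.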
With this technical solution, in a multi-device scenario, after the user speaks the wake-up word and a voice command, the server can, through multi-device wake-up arbitration and multi-device capability arbitration, wake up only one device, such as the device closest to the user, to perform the wake-up response. Moreover, when the device that performs the wake-up response does not have the function of executing the event corresponding to the voice command, a device that does have that function executes the event, without the user having to move or to repeat the wake-up word and the voice command, so that the response to the voice command is completed. The electronic devices are thus more intelligent, efficient interaction between the electronic devices and the user is realized, and the user experience is improved.
In a possible implementation, the set of devices may further include a third electronic device, where the third electronic device does not have the voice control function; or the third electronic device has the voice control function, but the distance between the third electronic device and the user is greater than the sound pickup distance of the third electronic device. In this way, the coverage of voice control can exceed the sound pickup range of an electronic device. For example, the sound pickup distance of a television provided with 6 microphones is generally within 5 meters; with the method of the embodiments of the present application, even if the distance between the user and the television exceeds 5 meters, the television can still be triggered by voice control to automatically execute events such as video playing. In addition, the user does not need to explicitly say that the video is to be played on the television, that is, the user does not need to specify the television as the device to play the video, and only needs to say "play a certain video".
In another possible implementation, when the first voice data is received, the voice control functions of the first electronic device and the second electronic device have not yet been woken up.
In another possible implementation, the method may further include: the server sends a command response instruction to the first electronic device, where the command response instruction is used to instruct the first electronic device to prompt the user that the target electronic device will execute the event corresponding to the second voice data; and the first electronic device prompts the user accordingly. In this way, the device performing the wake-up response, i.e., the first electronic device, tells the user, through a prompt such as a voice prompt, on which device the voice command will be responded to, improving the user experience.
In another possible implementation, the server determining the target electronic device from the set of devices according to the second voice data may specifically include: the server selects, from the set of devices and according to the capability information of each device in the set and the second voice data, a device that has the function of executing the event corresponding to the second voice data. If only one device in the set has that function, the server determines that device as the target electronic device. If multiple devices in the set have that function, the server determines one of them as the target electronic device. In some embodiments, the target electronic device is any one of the multiple devices. In some other embodiments, the target electronic device satisfies at least one of the following conditions: the target electronic device is the device with the shortest distance to the user among the multiple devices; the target electronic device is in a powered-on state; the target electronic device has not been determined, within a preset time, to execute an event corresponding to other voice data; or the target electronic device is the device with the highest frequency of use among the multiple devices. In this way, not only can a device that has the function of executing the event corresponding to the voice command be selected to respond to it, but also the device that best matches the user's intention can be selected to execute the event, making voice control more intelligent and improving the user experience.
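The selection rules above can be sketched as a capability filter followed by a tie-break. The field names and the priority ordering of the tie-break criteria are illustrative assumptions; the text only states that the target device satisfies at least one of the listed conditions.

```python
def pick_target(devices: list, required_capability: str):
    """Return the id of the device to execute the event, or None if no
    device in the set has the required capability (response would fail)."""
    capable = [d for d in devices if required_capability in d["capabilities"]]
    if not capable:
        return None
    if len(capable) == 1:
        return capable[0]["id"]
    # Tie-break among multiple capable devices (assumed priority order):
    capable.sort(key=lambda d: (
        not d["powered_on"],      # prefer devices in a powered-on state
        d["recently_assigned"],   # prefer devices not recently assigned an event
        d["distance_m"],          # prefer the shortest distance to the user
        -d["use_frequency"],      # prefer the highest frequency of use
    ))
    return capable[0]["id"]
```

With a sound box that has only a music capability and a phone that also has navigation, a "navigate" command is routed to the phone even if the sound box is closer, matching the scenario in the background section.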
In another possible implementation, the method may further include: each device in the set of devices reports its capability information to the server, and the server stores the capability information of each device in the set. The server can then use the stored capability information to determine which devices have the function of executing the event corresponding to a voice command.
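The capability reporting and lookup step above might look like the following toy registry; all structures are illustrative sketches, not the actual reporting protocol.

```python
class CapabilityRegistry:
    """Stores the capability information each device reports, and answers
    which devices can execute the event behind a given voice command."""

    def __init__(self):
        self._caps = {}  # device_id -> set of capability names

    def report(self, device_id: str, capabilities: set) -> None:
        """Called when a device reports its capability information."""
        self._caps[device_id] = set(capabilities)

    def devices_for(self, capability: str) -> list:
        """Return ids of all devices that have the given capability."""
        return sorted(d for d, caps in self._caps.items() if capability in caps)
```

The same structure works whether the arbiter is the server (first aspect) or a master device (second aspect); only where the registry lives changes.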
In another possible implementation, the method may further include: the server sends a second wake-up indication to the second electronic device, and the second electronic device determines, according to the second wake-up indication, not to wake up its voice control function; or the second electronic device determines that it has not received the first wake-up indication within a preset time, and therefore determines not to wake up its voice control function. That is, after the second electronic device detects the wake-up word, it can determine that no wake-up response is required, either from the server's feedback or from receiving no feedback within the preset time.
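The fallback above, where a device stays asleep if it receives a suppressing indication or no feedback within a preset time, can be sketched with a blocking wait. The message strings and the timeout value are assumptions made for this sketch.

```python
import queue

def await_wake_decision(inbox: "queue.Queue", timeout_s: float) -> bool:
    """After detecting the wake-up word, wait for the arbiter's feedback.
    Only an explicit 'wake' indication wakes the voice control function;
    a 'suppress' indication, or no feedback before the preset deadline,
    means the device does not wake up."""
    try:
        return inbox.get(timeout=timeout_s) == "wake"
    except queue.Empty:
        return False  # no feedback within the preset time: stay asleep
```

The timeout path matters for robustness: if the arbiter or the network fails, every losing device still reaches a definite "do not wake" decision instead of waiting forever.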
In a second aspect, an embodiment of the present application provides a voice control method. The method may be applied to a set of devices including at least a first electronic device and a second electronic device that have the voice control function. The method may include the following steps. When the user wants to use the voice control function of a device, the user may speak the corresponding wake-up word, i.e., the first voice data. The first electronic device and the second electronic device each receive the first voice data of the user. When the first electronic device determines that the first voice data is the same as the wake-up word registered in the first electronic device, it obtains energy information of the first voice data it detected. When the second electronic device determines that the first voice data is the same as the wake-up word registered in the second electronic device, it sends energy information of the first voice data it detected to the first electronic device acting as the master device. The first electronic device, as the master device, then performs multi-device wake-up arbitration, i.e., determines which device performs the wake-up response.
Specifically, the first electronic device determines the device that performs the wake-up response from the first electronic device and the second electronic device according to the energy information of the first voice data detected by each of them. If the energy of the first voice data detected by the first electronic device is greater than that detected by the second electronic device, the first electronic device determines that it performs the wake-up response and wakes up its own voice control function; then, after the user speaks a voice command, i.e., the second voice data, the first electronic device, whose voice control function has been woken up, receives the second voice data of the user. If the energy of the first voice data detected by the second electronic device is greater than that detected by the first electronic device, the first electronic device determines that the second electronic device performs the wake-up response and sends a first wake-up indication to the second electronic device; in response to the first wake-up indication, the second electronic device wakes up its voice control function, so that after the user speaks the voice command, i.e., the second voice data, the second electronic device receives it and sends it to the first electronic device. The first electronic device then performs multi-device capability arbitration according to the second voice data, i.e., determines which device executes the event corresponding to the second voice data. For example, the first electronic device may determine a target electronic device from the set of devices, the target electronic device having the function of executing the event corresponding to the second voice data. If the target electronic device is the first electronic device itself, the first electronic device parses the second voice data to obtain the corresponding instruction and executes the event according to that instruction; or the first electronic device obtains from the server the data required for executing the event and executes it according to that data. If the target electronic device is not the first electronic device, the first electronic device sends a content indication to the target electronic device, where the content indication is the instruction corresponding to the second voice data, or the data required for executing the event corresponding to the second voice data; the target electronic device then executes the event according to the content indication.
With this technical solution, in a multi-device scenario, after the user speaks the wake-up word and a voice command, the electronic device acting as the master device can perform the multi-device wake-up arbitration and multi-device capability arbitration, so that only one device, e.g., the device closest to the user, is woken up to perform the wake-up response. Moreover, when the device that performs the wake-up response does not have the function of executing the event corresponding to the voice command, a device that does have that function executes the event, without the user having to move or to repeat the wake-up word and the voice command, so that the response to the voice command is completed. The electronic devices are thus more intelligent, efficient interaction between the electronic devices and the user is realized, and the user experience is improved.
In a possible implementation, the set of devices may further include a third electronic device, where the third electronic device does not have the voice control function; or the third electronic device has the voice control function, but the distance between the third electronic device and the user is greater than the sound pickup distance of the third electronic device. In this way, the coverage of voice control can exceed the sound pickup range of an electronic device; that is, even if the distance between the user and a certain electronic device exceeds its sound pickup range, that device can still be triggered by voice control to automatically execute the corresponding event. In addition, the user does not need to explicitly say that the event is to be executed on that electronic device, that is, the user does not need to specify which device is to execute the event, and only needs to say "execute something"; with the method of this embodiment, the electronic device can then be triggered to automatically execute the corresponding event.
In another possible implementation, when the first voice data is received, the voice control functions of the first electronic device and the second electronic device have not yet been woken up.
In another possible implementation, if the second electronic device is the device that performs the wake-up response, the method may further include: the first electronic device sends a command response instruction to the second electronic device, where the command response instruction is used to instruct the second electronic device to prompt the user that the target electronic device will execute the event corresponding to the second voice data; and the second electronic device prompts the user accordingly. Or, if the first electronic device is the device that performs the wake-up response, the method further includes: the first electronic device prompts the user that the target electronic device will execute the event corresponding to the second voice data. In this way, the device performing the wake-up response tells the user, through a prompt such as a voice prompt, on which device the voice command will be responded to, improving the user experience.
In another possible implementation, the first electronic device determining the target electronic device from the set of devices according to the second voice data may specifically include: the first electronic device selects, from the set of devices and according to the capability information of each device in the set and the second voice data, a device that has the function of executing the event corresponding to the second voice data. If only one device in the set has that function, the first electronic device determines that device as the target electronic device. If multiple devices in the set have that function, the first electronic device determines one of them as the target electronic device. In some embodiments, the target electronic device is any one of the multiple devices. In some other embodiments, the target electronic device satisfies at least one of the following conditions: the target electronic device is the device with the shortest distance to the user among the multiple devices; the target electronic device is in a powered-on state; the target electronic device has not been determined, within a preset time, to execute an event corresponding to other voice data; or the target electronic device is the device with the highest frequency of use among the multiple devices. In this way, not only can a device that has the function of executing the event corresponding to the voice command be selected to respond to it, but also the device that best matches the user's intention can be selected to execute the event, making voice control more intelligent and improving the user experience.
In another possible implementation, the method may further include: each device in the set of devices other than the first electronic device reports its capability information to the first electronic device, and the first electronic device stores the capability information of each device in the set. The electronic device acting as the master device can then use the stored capability information to determine which devices have the function of executing the event corresponding to a voice command.
In another possible implementation, if the first electronic device is the device that performs the wake-up response, the method may further include: the first electronic device sends a second wake-up indication to the second electronic device, and the second electronic device determines, according to the second wake-up indication, not to wake up its voice control function; or the second electronic device determines that it has not received the first wake-up indication within a preset time, and therefore determines not to wake up its voice control function. That is, after an electronic device acting as a slave device detects the wake-up word, it can determine that no wake-up response is required, either from the master device's feedback or from receiving no feedback within the preset time.
In a third aspect, an embodiment of the present application provides a voice control method, which may be applied to a first electronic device with a voice control function, where the first electronic device is included in a set of devices that further includes a second electronic device with a voice control function. The method may include: when the user wants to use the voice control function of a device, the user may speak a corresponding wake-up word, i.e., the first voice data. The first electronic device receives the first voice data of the user; when determining that the first voice data is the same as the wake-up word registered in the first electronic device, the first electronic device sends, to a server, energy information of the first voice data as detected by the first electronic device; the first electronic device receives a wake-up instruction sent by the server, where the wake-up instruction is sent after the server determines, according to the energy information of the first voice data detected by the first electronic device and the energy information of the first voice data detected by the second electronic device, that the first electronic device performs the wake-up response, the energy of the first voice data detected by the first electronic device being greater than that detected by the second electronic device; the first electronic device wakes up its voice control function in response to the wake-up instruction; then, after the user speaks a voice command, i.e., the second voice data, the first electronic device whose voice control function has been woken up receives the second voice data of the user; the first electronic device sends the second voice data to the server; the first electronic device receives a command response instruction sent by the server, where the command response instruction is used to instruct the first electronic device to prompt the user that an event corresponding to the second voice data is to be executed by a target electronic device, the target electronic device being a device that the server determines, from the set of devices according to the second voice data, to have the function of executing the event corresponding to the second voice data; and the first electronic device prompts the user, according to the command response instruction, that the target electronic device will execute the event corresponding to the second voice data.
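The device-side flow of the third aspect can be sketched in code. This is purely an illustrative sketch, not the patent's implementation: the class names, the in-memory `ToyServer`, the wake-up word, and the capability routing are all assumptions made so the example is self-contained and runnable.

```python
# Hypothetical names throughout; the patent does not specify any API.
REGISTERED_WAKE_WORD = "hello assistant"

class FirstDevice:
    """Device-side flow of the third aspect (server-arbitrated wake-up)."""

    def __init__(self, server):
        self.server = server
        self.voice_control_awake = False

    def on_first_voice_data(self, text, energy):
        # Step 1: only react if the utterance matches the registered wake-up word.
        if text != REGISTERED_WAKE_WORD:
            return
        # Step 2: report the detected energy; the server arbitrates wake-up.
        if self.server.report_energy("first_device", energy):
            # Step 3: a wake-up instruction came back -> wake the voice function.
            self.voice_control_awake = True

    def on_second_voice_data(self, command):
        # Step 4: forward the voice command; the server picks the target device.
        if not self.voice_control_awake:
            return None
        target = self.server.submit_command(command)
        # Step 5: prompt the user which device will execute the event.
        return f"'{command}' will be executed by {target}"

class ToyServer:
    """Stands in for the real server; wakes the highest-energy reporter."""

    def __init__(self, other_device_energy):
        self.other_device_energy = other_device_energy

    def report_energy(self, device_id, energy):
        # Multi-device wake-up arbitration: highest detected energy wins.
        return energy > self.other_device_energy

    def submit_command(self, command):
        # Multi-device capability arbitration (hard-wired for the sketch).
        return "smart_tv" if "play" in command else "speaker"

server = ToyServer(other_device_energy=0.4)
device = FirstDevice(server)
device.on_first_voice_data("hello assistant", energy=0.9)
print(device.voice_control_awake)               # True
print(device.on_second_voice_data("play a movie"))
```

In this sketch the wake-up instruction is modeled as the boolean return of `report_energy`; in the method above it is an asynchronous message from the server.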
With this technical solution, after the user speaks the wake-up word in a multi-device scenario, the multiple devices in the set of devices, including the first electronic device, each report the energy of the voice data they detected to the server, so that the server can perform multi-device wake-up arbitration. If the first electronic device is the device that performs the wake-up response, it sends the collected voice command spoken by the user to the server, so that the server can perform multi-device capability arbitration. In this way, not only can a single device, e.g. the device closest to the user, be woken up for the wake-up response, but also, when the device performing the wake-up response does not have the function of executing the event corresponding to the voice command, a device that does have this function can execute the event without the user moving or repeating the wake-up word and voice command, thereby completing the response to the voice command. The electronic devices are thus more intelligent, efficient interaction between the electronic devices and the user is achieved, and the user experience is improved.
In one possible implementation, the set of devices may further include a third electronic device; the third electronic device does not have a voice control function; or the third electronic device has a voice control function, but its distance from the user is greater than its sound pickup distance.
In another possible implementation manner, the voice control function of the first electronic device is not yet woken up when the first voice data is received.
In another possible implementation manner, if the target electronic device is the first electronic device, the method may further include: the first electronic device receives a content indication sent by the server, where the content indication is an instruction corresponding to the second voice data, or data required for executing the event corresponding to the second voice data; and the first electronic device executes the event corresponding to the second voice data according to the content indication.
In a fourth aspect, an embodiment of the present application provides a voice control method, which may be applied to a second electronic device, where the second electronic device is included in a set of devices that further includes a first electronic device having a voice control function, the first electronic device being configured to receive first voice data and second voice data of a user, the first voice data being a wake-up word and the second voice data being a voice command. The method may include: the second electronic device receives a content indication, where the content indication is an instruction corresponding to the second voice data, or data required for executing an event corresponding to the second voice data; and the second electronic device executes the event corresponding to the second voice data according to the content indication.
With this technical solution, in a multi-device scenario, even an electronic device that was not woken up can be selected through the server's multi-device capability arbitration. When the device performing the wake-up response does not have the function of executing the event corresponding to the voice command, a device that does have this function, such as the second electronic device, can execute the event without the user moving or repeating the wake-up word and voice command, thereby completing the response to the voice command. The electronic devices are thus more intelligent, efficient interaction between the electronic devices and the user is achieved, and the user experience is improved.
In a possible implementation manner, the second electronic device does not have a voice control function; or the second electronic device has a voice control function, but its distance from the user is greater than its sound pickup distance.
In another possible implementation manner, the second electronic device has a voice control function, and its distance from the user is less than or equal to its sound pickup distance. The method may further include: the second electronic device receives the first voice data; and, when determining that the first voice data is the same as the wake-up word registered in the second electronic device, the second electronic device sends the energy information of the first voice data it detected. When the first voice data is received, the voice control function of the second electronic device is not woken up.
In another possible implementation manner, the method may further include: the second electronic device receives a second wake-up instruction and determines, according to the second wake-up instruction, not to wake up its voice control function; or the second electronic device determines that no first wake-up instruction has been received within a preset time, and determines not to wake up its voice control function.
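The decision described above, that a losing device stays asleep either on an explicit second wake-up instruction or on a timeout, can be sketched as follows. The message names and the queue-based transport are assumptions of this example; the patent does not prescribe a mechanism.

```python
# Illustrative sketch: a device that reported its energy either receives a
# second wake-up instruction (an explicit "do not wake") or times out waiting
# for the first wake-up instruction; in both cases it stays asleep.
import queue

WAKE = "first_wake_instruction"      # hypothetical message names
SUPPRESS = "second_wake_instruction"

def decide_wake(instructions: "queue.Queue[str]", timeout_s: float) -> bool:
    """Return True only if a first wake-up instruction arrives in time."""
    try:
        msg = instructions.get(timeout=timeout_s)
    except queue.Empty:
        # No instruction within the preset time: do not wake.
        return False
    # An explicit second wake-up instruction also means: do not wake.
    return msg == WAKE

q = queue.Queue()
q.put(SUPPRESS)
print(decide_wake(q, timeout_s=0.1))   # False: explicitly suppressed
print(decide_wake(q, timeout_s=0.1))   # False: timed out waiting
```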
In a fifth aspect, an embodiment of the present application provides a voice control method, which may be applied to a first electronic device with a voice control function, where the first electronic device is included in a set of devices that further includes a second electronic device with a voice control function. The method may include: the first electronic device receives first voice data of a user; when determining that the first voice data is the same as the wake-up word registered in the first electronic device, the first electronic device obtains energy information of the first voice data it detected; the first electronic device receives, from the second electronic device, energy information of the first voice data detected by the second electronic device; the first electronic device determines, from the first electronic device and the second electronic device, the device that performs the wake-up response according to the two pieces of energy information. If the energy of the first voice data detected by the first electronic device is greater than that detected by the second electronic device, the first electronic device determines that it performs the wake-up response, wakes up its own voice control function, and, once the voice control function is woken up, receives second voice data of the user. If the energy of the first voice data detected by the second electronic device is greater than that detected by the first electronic device, the first electronic device determines that the second electronic device performs the wake-up response, sends a first wake-up instruction to the second electronic device, and receives second voice data sent by the second electronic device, where the second voice data is collected after the second electronic device wakes up its voice control function in response to the first wake-up instruction and the user speaks the second voice data. The first electronic device then determines, from the set of devices according to the second voice data, a target electronic device that has the function of executing the event corresponding to the second voice data. If the target electronic device is the first electronic device, the first electronic device parses the second voice data to obtain an instruction corresponding to it and executes the corresponding event according to the instruction; or the first electronic device obtains, from a server, the data required for executing the event corresponding to the second voice data and executes the event according to the data. If the target electronic device is not the first electronic device, the first electronic device sends a content indication to the target electronic device, where the content indication is an instruction corresponding to the second voice data, or the data required for executing the event corresponding to the second voice data, and is used by the target electronic device to execute the event.
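The wake-up arbitration performed by the master device in the fifth aspect reduces to picking the device that detected the wake-up word with the highest energy. A minimal sketch, with the dictionary shape and tie handling as assumptions of this example:

```python
# Sketch of master-device wake-up arbitration. The patent only states that
# the device with the higher detected energy performs the wake-up response;
# everything else here is illustrative.

def arbitrate_wakeup(energies: dict) -> str:
    """Pick the device that performs the wake-up response.

    energies maps device id -> energy of the first voice data as detected
    by that device's own microphone.
    """
    # The device that heard the wake-up word loudest is assumed to be
    # closest to the user, so it gets the wake-up response.
    return max(energies, key=energies.get)

detected = {"first_device": 0.72, "second_device": 0.31}
winner = arbitrate_wakeup(detected)
print(winner)  # first_device
```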
With this technical solution, in a multi-device scenario, after the user speaks the wake-up word and a voice command, the electronic device serving as the master device can perform both multi-device wake-up arbitration and multi-device capability arbitration, so that only one device, e.g. the device closest to the user, is woken up to respond. Moreover, when the device performing the wake-up response does not have the function of executing the event corresponding to the voice command, a device that does have this function can execute the event without the user moving or repeating the wake-up word and voice command, thereby completing the response to the voice command. The electronic devices are thus more intelligent, efficient interaction between the electronic devices and the user is achieved, and the user experience is improved.
In a possible implementation manner, the set of devices may further include a third electronic device; the third electronic device does not have a voice control function; or the third electronic device has a voice control function, but its distance from the user is greater than its sound pickup distance.
In another possible implementation, the voice control function of the first electronic device is not woken up when receiving the first voice data.
In another possible implementation manner, if the second electronic device is the device that performs the wake-up response, the method may further include: the first electronic device sends a command response instruction to the second electronic device, where the command response instruction is used to instruct the second electronic device to prompt the user that the event corresponding to the second voice data is to be executed by the target electronic device. Alternatively, if the first electronic device is the device that performs the wake-up response, the method may further include: the first electronic device prompts the user that the target electronic device will execute the event corresponding to the second voice data.
In another possible implementation manner, the determining, by the first electronic device, of the target electronic device from the set of devices according to the second voice data may specifically include: the first electronic device selects, from the set of devices according to the capability information of each device and the second voice data, a device that has the function of executing the event corresponding to the second voice data. If exactly one device in the set has this function, the first electronic device determines that device to be the target electronic device. If multiple devices in the set have this function, the first electronic device determines one of them as the target electronic device. In some embodiments, the target electronic device is any one of the multiple devices. In some other embodiments, the target electronic device satisfies at least one of the following conditions: the target electronic device is the device closest to the user among the multiple devices; the target electronic device is in a powered-on state; the target electronic device has not been determined, within a preset time, to execute an event corresponding to other voice data; or the target electronic device is the most frequently used device among the multiple devices.
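The capability arbitration with its tie-break conditions can be sketched as follows. The field names, the scoring, and the order in which the four conditions are combined are assumptions of this example; the patent lists the conditions without fixing a priority among them.

```python
# Illustrative sketch of target-device selection (capability arbitration).
from dataclasses import dataclass

@dataclass
class Device:
    name: str
    capabilities: set        # events this device can execute
    distance_m: float        # distance to the user
    powered_on: bool
    busy: bool               # recently assigned another voice event
    uses_per_day: int        # frequency of use

def pick_target(devices, event):
    """Select the target electronic device for the given event."""
    capable = [d for d in devices if event in d.capabilities]
    if not capable:
        return None
    if len(capable) == 1:
        return capable[0]
    # Several capable devices: prefer powered-on, idle, nearby, frequently
    # used ones (one possible combination of the listed conditions).
    return min(capable,
               key=lambda d: (not d.powered_on, d.busy,
                              d.distance_m, -d.uses_per_day))

fleet = [
    Device("phone",   {"play_music", "navigate"},   0.5, True, False, 40),
    Device("tv",      {"play_video", "play_music"}, 3.0, True, False, 5),
    Device("speaker", {"play_music"},               2.0, True, True,  20),
]
print(pick_target(fleet, "play_video").name)  # tv (only capable device)
print(pick_target(fleet, "play_music").name)  # phone (idle and closest)
```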
In another possible implementation manner, the method may further include: the first electronic device receives the capability information reported by each device in the set of devices other than the first electronic device; and the first electronic device stores the capability information of each device in the set of devices.
In another possible implementation manner, if the first electronic device is the device that performs the wake-up response, the method may further include: the first electronic device sends a second wake-up instruction to the second electronic device, where the second wake-up instruction is used to indicate that the second electronic device does not perform the wake-up response.
In a sixth aspect, an embodiment of the present application provides a voice control method applied to a server, where the server is included in a voice control system that further includes a set of devices, the set of devices including at least a first electronic device and a second electronic device with voice control functions. The method may include: the server receives energy information of first voice data detected by the first electronic device and sent by the first electronic device, and energy information of the first voice data detected by the second electronic device and sent by the second electronic device; the server determines, according to the two pieces of energy information, that the first electronic device performs the wake-up response, and sends a first wake-up instruction to the first electronic device, the energy of the first voice data detected by the first electronic device being greater than that detected by the second electronic device; the server receives second voice data sent by the first electronic device; the server determines, from the set of devices according to the second voice data, a target electronic device that has the function of executing the event corresponding to the second voice data; and the server sends a content indication to the target electronic device, where the content indication is an instruction corresponding to the second voice data, or the data required for executing the event corresponding to the second voice data, and is used to instruct the target electronic device to execute the event.
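The server-side flow of the sixth aspect, collecting energy reports, arbitrating the wake-up, then routing the content indication, can be sketched as below. The report format, the keyword-based command parsing stub, and the routing table are assumptions of this example; a real server would run speech recognition and manage device sessions.

```python
# Illustrative server-side sketch of the sixth aspect (not the patent's
# implementation; all names are hypothetical).

class VoiceControlServer:
    def __init__(self, capabilities):
        # Capability information previously reported by each device.
        self.capabilities = capabilities
        self.energy_reports = {}

    def on_energy_report(self, device_id, energy):
        # Each device reports the wake-word energy it detected.
        self.energy_reports[device_id] = energy

    def arbitrate_wakeup(self):
        """Multi-device wake-up arbitration: highest energy wins."""
        return max(self.energy_reports, key=self.energy_reports.get)

    def on_second_voice_data(self, command):
        """Capability arbitration + content indication for the command."""
        # Toy parsing stub standing in for real speech understanding.
        event = "play_video" if "video" in command else "play_music"
        for device_id, events in self.capabilities.items():
            if event in events:
                # Content indication: here, the parsed instruction itself.
                return device_id, {"instruction": event}
        return None, None

server = VoiceControlServer({"speaker": {"play_music"},
                             "tv": {"play_video"}})
server.on_energy_report("speaker", 0.8)
server.on_energy_report("tv", 0.2)
print(server.arbitrate_wakeup())                     # speaker
print(server.on_second_voice_data("play a video"))   # routed to the tv
```

Note how the woken device (the speaker, closest to the user) and the target device (the tv, which can execute the event) may differ, which is the core of the capability arbitration.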
With this technical solution, in a multi-device scenario, after the user speaks the wake-up word and a voice command, the server can perform both multi-device wake-up arbitration and multi-device capability arbitration, so that only one device, e.g. the device closest to the user, is woken up to respond. Moreover, when the device performing the wake-up response does not have the function of executing the event corresponding to the voice command, a device that does have this function can execute the event without the user moving or repeating the wake-up word and voice command, thereby completing the response to the voice command. The electronic devices are thus more intelligent, efficient interaction between the electronic devices and the user is achieved, and the user experience is improved.
In a possible implementation manner, the set of devices may further include a third electronic device; the third electronic device does not have a voice control function; or the third electronic device has a voice control function, but its distance from the user is greater than its sound pickup distance.
In another possible implementation manner, the method may further include: the server sends a command response instruction to the first electronic device, where the command response instruction is used to instruct the first electronic device to prompt the user that the target electronic device will execute the event corresponding to the second voice data.
In another possible implementation manner, the determining, by the server, of the target electronic device from the set of devices according to the second voice data may specifically include: the server selects, from the set of devices according to the capability information of each device and the second voice data, a device that has the function of executing the event corresponding to the second voice data. If exactly one device in the set has this function, the server determines that device to be the target electronic device. If multiple devices in the set have this function, the server determines one of them as the target electronic device. In some embodiments, the target electronic device is any one of the multiple devices. In some other embodiments, the target electronic device satisfies at least one of the following conditions: the target electronic device is the device closest to the user among the multiple devices; the target electronic device is in a powered-on state; the target electronic device has not been determined, within a preset time, to execute an event corresponding to other voice data; or the target electronic device is the most frequently used device among the multiple devices.
In another possible implementation manner, the method may further include: the server receives the capability information reported by each device in the set of devices; and the server stores the capability information of each device in the set of devices.
In another possible implementation manner, the method may further include: the server sends a second wake-up instruction to the second electronic device, where the second wake-up instruction is used to instruct the second electronic device not to perform the wake-up response.
In a seventh aspect, an embodiment of the present application provides an electronic device, including: one or more processors and a memory; the memory is coupled to the one or more processors and is configured to store computer program code, the computer program code including computer instructions which, when executed by the one or more processors, cause the electronic device to perform the voice control method according to the third aspect or any possible implementation thereof; or cause the electronic device to perform the voice control method according to the fourth aspect or any possible implementation thereof; or cause the electronic device to perform the voice control method according to the fifth aspect or any possible implementation thereof.
In an eighth aspect, an embodiment of the present application provides a server, including: one or more processors and a memory; the memory is coupled to the one or more processors and is configured to store computer program code, the computer program code including computer instructions which, when executed by the one or more processors, cause the server to perform the voice control method according to the sixth aspect or any possible implementation thereof.
In a ninth aspect, an embodiment of the present application provides a computer storage medium, which includes computer instructions that, when executed on an electronic device, cause the electronic device to perform the voice control method according to the third aspect or any one of the possible implementation manners of the third aspect; or causing the electronic device to perform the voice control method according to any one of the fourth aspect or possible implementation manners of the fourth aspect; or cause the electronic device to perform the voice control method according to any one of the fifth aspect or possible implementation manners of the fifth aspect.
In a tenth aspect, an embodiment of the present application provides a computer storage medium, which includes computer instructions that, when run on a server, cause the server to perform the voice control method according to the sixth aspect or any one of its possible implementation manners.
In an eleventh aspect, embodiments of the present application provide a computer program product, which when run on a computer, causes the computer to execute the voice control method according to any one of the third aspect or the possible implementation manners of the third aspect; or causing a computer to execute the voice control method according to the fourth aspect or any one of its possible implementation manners; or causing a computer to perform the voice control method according to the fifth aspect or any one of its possible implementation manners.
In a twelfth aspect, embodiments of the present application provide a computer program product, which, when run on a computer, causes the computer to execute the voice control method according to the sixth aspect or any one of the possible implementation manners of the sixth aspect.
In a thirteenth aspect, an embodiment of the present application provides an apparatus having the function of implementing the behavior of an electronic device, such as the first electronic device, the second electronic device, or the third electronic device, in the methods of the foregoing aspects. The function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above function, for example, a receiving unit or module, a sending unit or module, a wake-up unit or module, and the like.
In a fourteenth aspect, the present application provides an apparatus having a function of implementing the server behavior in the method of the above aspects. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, for example, a transmitting unit or module, a receiving unit or module, a determining unit or module, and the like.
In a fifteenth aspect, an embodiment of the present application provides a voice control system, which may include a set of devices and a server, where the set of devices includes at least a first electronic device and a second electronic device with voice control functions. The first electronic device and the second electronic device each receive first voice data of a user. The first electronic device determines that the first voice data is the same as the wake-up word registered in the first electronic device, and sends energy information of the first voice data it detected to the server; the second electronic device determines that the first voice data is the same as the wake-up word registered in the second electronic device, and sends energy information of the first voice data it detected to the server. The server determines, according to the two pieces of energy information, that the first electronic device performs the wake-up response, and sends a first wake-up instruction to the first electronic device, the energy of the first voice data detected by the first electronic device being greater than that detected by the second electronic device. The first electronic device wakes up its voice control function in response to the first wake-up instruction; the first electronic device whose voice control function has been woken up receives second voice data of the user and sends the second voice data to the server. The server determines, from the set of devices according to the second voice data, a target electronic device that has the function of executing the event corresponding to the second voice data, and sends a content indication to the target electronic device, where the content indication is an instruction corresponding to the second voice data, or the data required for executing the event corresponding to the second voice data. The target electronic device executes the event corresponding to the second voice data according to the content indication.
In a possible implementation manner, the set of devices may further include a third electronic device; the third electronic device does not have a voice control function; or the third electronic device has a voice control function, but its distance from the user is greater than its sound pickup distance.
In a sixteenth aspect, an embodiment of the present application provides a voice control system, which may include a set of devices, where the set of devices includes at least a first electronic device and a second electronic device with voice control functions. The first electronic device and the second electronic device each receive first voice data of a user. The first electronic device determines that the first voice data is the same as the wake-up word registered in the first electronic device, and obtains energy information of the first voice data it detected; the second electronic device determines that the first voice data is the same as the wake-up word registered in the second electronic device, and sends energy information of the first voice data it detected to the first electronic device. The first electronic device determines, from the first electronic device and the second electronic device, the device that performs the wake-up response according to the two pieces of energy information. If the energy of the first voice data detected by the first electronic device is greater than that detected by the second electronic device, the first electronic device determines that it performs the wake-up response, wakes up its voice control function, and, once the voice control function is woken up, receives second voice data of the user. If the energy of the first voice data detected by the second electronic device is greater than that detected by the first electronic device, the first electronic device determines that the second electronic device performs the wake-up response and sends a first wake-up instruction to the second electronic device; the second electronic device wakes up its voice control function in response to the first wake-up instruction, receives the second voice data of the user, and sends it to the first electronic device. The first electronic device determines, from the set of devices according to the second voice data, a target electronic device that has the function of executing the event corresponding to the second voice data. If the target electronic device is the first electronic device, the first electronic device parses the second voice data to obtain the corresponding instruction and executes the corresponding event according to the instruction; or the first electronic device obtains, from a server, the data required for executing the event and executes the event according to the data. If the target electronic device is not the first electronic device, the first electronic device sends a content indication to the target electronic device, where the content indication is an instruction corresponding to the second voice data, or the data required for executing the event corresponding to the second voice data. The target electronic device executes the event corresponding to the second voice data according to the content indication.
In a possible implementation manner, the set of devices may further include a third electronic device; the third electronic device does not have a voice control function; or the third electronic device has a voice control function, but its distance from the user is greater than its sound pickup distance.
It should be appreciated that the description of technical features, solutions, benefits, or similar language in this application does not imply that all of the features and advantages may be realized in any single embodiment. Rather, it is to be understood that the description of a feature or advantage is intended to include the specific features, aspects or advantages in at least one embodiment. Therefore, the descriptions of technical features, technical solutions or advantages in the present specification do not necessarily refer to the same embodiment. Furthermore, the technical features, technical solutions and advantages described in the present embodiments may also be combined in any suitable manner. One skilled in the relevant art will recognize that an embodiment may be practiced without one or more of the specific features, aspects, or advantages of a particular embodiment. In other instances, additional features and advantages may be recognized in certain embodiments that may not be present in all embodiments.
Drawings
Fig. 1 is a schematic view of a multi-device voice control scenario provided in an embodiment of the present application;
Fig. 2 is a schematic composition diagram of a voice control system according to an embodiment of the present application;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application;
Fig. 4 is a schematic flowchart of a voice control method according to an embodiment of the present application;
Fig. 5 is a schematic view of another multi-device voice control scenario provided in an embodiment of the present application;
Fig. 6 is a schematic view of yet another multi-device voice control scenario provided in an embodiment of the present application;
Fig. 7 is a schematic flowchart of another voice control method according to an embodiment of the present application.
Detailed Description
In the following, the terms "first", "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present embodiment, "a plurality" means two or more unless otherwise specified.
The voice control method provided by the embodiments of the present application can be applied to a group of devices. The group may include a plurality of devices, at least two of which have a voice control function and share the same wake-up word. In the embodiments of the present application, such an application scenario may be referred to as a multi-device scenario. In the multi-device scenario, after the user speaks the wake-up word and a voice command, by using the method of this embodiment, even if the device that has the function of executing the event corresponding to the voice command is not the one closest to the user, that device can still execute the event and complete the response to the voice command. This makes the electronic devices more intelligent, enables efficient interaction between the electronic devices and the user, and improves the user experience.
In some embodiments, a voice assistant may be installed in an electronic device to enable the electronic device to implement the voice control function. The voice assistant is typically in a dormant state, and the user may wake it up by voice before using the voice control function of the electronic device. The voice data that wakes up the voice assistant may be referred to as a wake-up word (or wake-up voice). The wake-up word may be pre-registered in the electronic device. In this embodiment, waking up the voice assistant may mean that the electronic device starts the voice assistant in response to the wake-up word spoken by the user. The voice control function may refer to the following: after the voice assistant of the electronic device is started, the user may trigger the electronic device to automatically execute the event corresponding to a voice command (e.g., a piece of voice data) by speaking that voice command.
In addition, the voice assistant may be an embedded application in the electronic device (i.e., a system application of the electronic device), or may be a downloadable application. An embedded application is an application program provided as part of an implementation of an electronic device, such as a cell phone. A downloadable application is an application that may provide its own Internet Protocol Multimedia Subsystem (IMS) connection. The downloadable application may be pre-installed in the electronic device or may be a third party application downloaded by a user and installed in the electronic device.
Embodiments of the present application will be described in detail below with reference to the accompanying drawings.
Fig. 2 is a schematic composition diagram of a voice control system according to an embodiment of the present application. The voice control system may be applied to a group of devices as described above. The group of devices includes a plurality of devices that satisfy one or more of the following conditions: they are connected to the same wireless access point (such as a Wi-Fi access point), they are logged in to the same account, or they have been set in the same group by the user.
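The three grouping conditions might be checked as in the following sketch; the field names `ap`, `account`, and `group` are hypothetical, not from the patent:

```python
# Illustrative membership test for the device group described above:
# two devices belong to the same group if they share a wireless access
# point, share a logged-in account, or were placed in the same user group.

def in_same_group(a, b):
    """Return True if devices a and b satisfy at least one grouping condition."""
    same_ap = a.get("ap") is not None and a.get("ap") == b.get("ap")
    same_account = a.get("account") is not None and a.get("account") == b.get("account")
    same_user_group = a.get("group") is not None and a.get("group") == b.get("group")
    return same_ap or same_account or same_user_group

phone = {"ap": "home-wifi", "account": "user@example.com", "group": "living-room"}
speaker = {"ap": "home-wifi", "account": None, "group": None}
```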
As an example, the group of devices may include at least two electronic devices, for example, a first electronic device 201 and a second electronic device 202. The first electronic device 201 and the second electronic device 202 each have a voice control function, for example, each has a voice assistant installed, and the wake-up word for waking up the voice assistant is the same on both devices, such as "Small E, Small E".
Typically, when the distance between the electronic device (such as the first electronic device 201 or the second electronic device 202) and the user is less than or equal to a predetermined distance, such as 5 meters, after the user speaks the wake-up word, the electronic device can detect the wake-up word and determine whether the voice assistant in the device needs to be woken up. In this embodiment, the distances between the user and both the first electronic device 201 and the second electronic device 202 are less than or equal to the predetermined distance. That is, after the user speaks the wake-up word "Small E, Small E", the wake-up word can be detected by both the first electronic device 201 and the second electronic device 202.
In this embodiment, multi-device wake-up arbitration may be performed; that is, only one of the first electronic device 201 and the second electronic device 202 responds to the wake-up word, i.e., only one device wakes up its voice assistant. After the user continues by speaking a voice command, the device that woke up recognizes that voice command.
In addition, multi-device capability arbitration may be performed; that is, it is judged whether the device that woke up its voice assistant has the function of executing the event corresponding to the voice command. If it does not, execution can be handed over to a device that does have that function.
For example, after the user speaks the wake-up word "Small E, Small E", the second electronic device 202 responds to the wake-up word; that is, the second electronic device 202 wakes up its voice assistant and then receives and recognizes the voice command "navigate to a place" uttered by the user. However, if the second electronic device 202 does not have the navigation function and the first electronic device 201 does, the first electronic device 201 can execute the event corresponding to the voice command "navigate to a place". Alternatively, the group of devices may also include other electronic devices, such as the third electronic device 204; if the third electronic device 204 has a navigation function, the third electronic device 204 can execute the event corresponding to the voice command "navigate to a place". The distance between the third electronic device 204 and the user may be less than or equal to the predetermined distance, or may be greater than it, and the third electronic device 204 may or may not have a voice control function.
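The capability-based handover in this example can be illustrated with a minimal sketch; the device names and capability sets are assumed for illustration only:

```python
# Hypothetical handover sketch: the awakened device (a speaker without
# navigation) cannot execute "navigate to a place", so the event is handed
# to a device whose capability list includes navigation.

DEVICE_CAPABILITIES = {
    "phone":   {"navigation", "call", "music"},
    "speaker": {"music"},
    "tv":      {"video"},
}

def hand_over(awake_device, needed_capability):
    """Return the device that should execute the event, or None if none can."""
    if needed_capability in DEVICE_CAPABILITIES[awake_device]:
        return awake_device  # the awakened device executes the event itself
    for device, caps in DEVICE_CAPABILITIES.items():
        if needed_capability in caps:
            return device    # hand over to a capable device
    return None

executor = hand_over("speaker", "navigation")
```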
In some embodiments, the device performing the multi-device wake arbitration and the multi-device capability arbitration may be any one of the first electronic device 201 and the second electronic device 202. In this embodiment, the device that performs the above-described multi-device wake arbitration and multi-device capability arbitration may be referred to as a master device. The master device stores capability information of a plurality of devices in advance. The plurality of devices include the first electronic device 201 and the second electronic device 202 described above, and may further include other electronic devices, such as the third electronic device 204 described above.
In other embodiments, the device performing the multi-device wake-up arbitration and the multi-device capability arbitration may also be a server. As shown in fig. 2, the system architecture may also include a server 203. The server 203 can provide an intelligent voice service and stores capability information of a plurality of devices in advance. For example, the first electronic device 201, the second electronic device 202, and other electronic devices (such as the third electronic device 204) may report their own capability information to the server 203 for storage when they are powered on or restarted. For another example, the electronic devices (such as the first electronic device 201, the second electronic device 202, and other electronic devices) may also periodically report their own capability information to the server 203 for storage. Of course, an electronic device may also upload changed capability information to the server when it determines that its capability information has changed, so that the server updates the stored capability information of that electronic device.
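A server-side capability store along these lines might look like the following sketch (the class and method names are assumptions, not from the patent):

```python
# Illustrative server-side capability registry: devices report their
# capability information at boot, periodically, or whenever it changes,
# and the server keeps only the latest copy per device.

class CapabilityRegistry:
    def __init__(self):
        self._capabilities = {}

    def report(self, device_id, capabilities):
        """Called on power-on/restart, on a timer, or when capabilities change."""
        self._capabilities[device_id] = set(capabilities)

    def lookup(self, capability):
        """Devices currently known to support the given capability."""
        return {d for d, caps in self._capabilities.items() if capability in caps}

registry = CapabilityRegistry()
registry.report("phone", ["navigation", "call"])
registry.report("tv", ["video"])
registry.report("tv", ["video", "navigation"])  # re-report after a change
```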
For example, the electronic devices described in the embodiments of the present application, such as the first electronic device 201, the second electronic device 202, and the third electronic device 204 described above, may be a mobile phone, a tablet computer, a desktop computer, a laptop computer, a handheld computer, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a cellular phone, a personal digital assistant (PDA), an augmented reality (AR)/virtual reality (VR) device, a media player, a television, a smart speaker, a smart watch, a smart headset, and the like. The embodiments of the present application do not particularly limit the specific form of the electronic device. For the specific structure of the electronic device, refer to the description of the embodiment corresponding to fig. 3.
In addition, in some embodiments, the first electronic device 201, the second electronic device 202, and the third electronic device 204 may be the same type of electronic device, for example, the first electronic device 201, the second electronic device 202, and the third electronic device 204 are all mobile phones. In other embodiments, the first electronic device 201, the second electronic device 202, and the third electronic device 204 may be different types of electronic devices, for example, the first electronic device 201 is a mobile phone, the second electronic device 202 is a smart speaker, and the third electronic device 204 is a television (as shown in fig. 2).
Please refer to fig. 3, which is a schematic structural diagram of an electronic device according to an embodiment of the present disclosure.
As shown in fig. 3, the electronic device may include a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a Subscriber Identification Module (SIM) card interface 195, and the like. The sensor module 180 may include a pressure sensor 180A, a gyroscope sensor 180B, an air pressure sensor 180C, a magnetic sensor 180D, an acceleration sensor 180E, a distance sensor 180F, a proximity light sensor 180G, a fingerprint sensor 180H, a temperature sensor 180J, a touch sensor 180K, an ambient light sensor 180L, a bone conduction sensor 180M, and the like.
It is to be understood that the illustrated structure of the present embodiment does not constitute a specific limitation to the electronic device. In other embodiments, an electronic device may include more or fewer components than shown, or some components may be combined, some components may be split, or a different arrangement of components. The illustrated components may be implemented in hardware, software, or a combination of software and hardware.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be a neural center and a command center of the electronic device. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
In the embodiments of the present application, a wake-up word (e.g., "Small E") may be set in the electronic device. The DSP may monitor voice data in real time through the microphone 170C of the electronic device. When the DSP detects voice data, it may check the detected voice data to preliminarily determine whether it matches the wake-up word set in the electronic device. If this check passes and the AP of the electronic device is in a dormant state, the DSP may wake up the AP and notify the AP to verify the received voice data again. If the second check also passes, the AP may determine that the voice data matches the wake-up word set in the electronic device.
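The two-stage verification can be sketched as below; the scores and thresholds are hypothetical stand-ins for the actual acoustic models run on the DSP and AP:

```python
# Sketch of the two-stage wake-up-word check described above: the low-power
# DSP does a cheap first-pass match, and only if it passes is the AP woken
# to run a stricter second check. Thresholds are illustrative assumptions.

def dsp_coarse_check(audio_score, threshold=0.5):
    """Cheap always-on check run by the DSP."""
    return audio_score >= threshold

def ap_fine_check(audio_score, threshold=0.8):
    """Stricter verification run by the AP after it is woken up."""
    return audio_score >= threshold

def matches_wake_word(audio_score):
    if not dsp_coarse_check(audio_score):
        return False                      # AP stays dormant; nothing woken
    return ap_fine_check(audio_score)     # AP re-verifies the voice data
```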
A memory may also be provided in processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The charging management module 140 is configured to receive charging input from a charger. The charger may be a wireless charger or a wired charger. In some wired charging embodiments, the charging management module 140 may receive charging input from a wired charger via the USB interface 130. In some wireless charging embodiments, the charging management module 140 may receive a wireless charging input through a wireless charging coil of the electronic device. The charging management module 140 may also supply power to the electronic device through the power management module 141 while charging the battery 142.
The power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110. The power management module 141 receives input from the battery 142 and/or the charge management module 140 and provides power to the processor 110, the internal memory 121, the external memory, the display 194, the camera 193, the wireless communication module 160, and the like. The power management module 141 may also be used to monitor parameters such as battery capacity, battery cycle count, battery state of health (leakage, impedance), etc. In some other embodiments, the power management module 141 may also be disposed in the processor 110. In other embodiments, the power management module 141 and the charging management module 140 may be disposed in the same device.
The wireless communication function of the electronic device may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor, the baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in an electronic device may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied to the electronic device. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110. For example, in some embodiments of the present application, the mobile communication module 150 may interact with the server, such as sending energy information of the detected voice data to the server after detecting voice data matching with the wakeup word, and receiving a wakeup indication returned by the server, so as to determine whether a wakeup response is required according to the wakeup indication. For another example, a content instruction sent by the server is received, and an event corresponding to the user voice command is executed according to the content instruction.
The wireless communication module 160 may provide solutions for wireless communication applied to electronic devices, including Wireless Local Area Networks (WLANs) (such as wireless fidelity (Wi-Fi) networks), Bluetooth (BT), Global Navigation Satellite Systems (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves. For example, in some embodiments of the present application, the wireless communication module 160 may interact with other electronic devices, such as sending energy information of the detected voice data to other electronic devices after detecting voice data matching with a wakeup word, and receiving a wakeup indication returned by the electronic device, so as to determine whether a wakeup response is required according to the wakeup indication. For another example, a content instruction sent by the electronic device is received, and an event corresponding to a user voice command is executed according to the content instruction.
In some embodiments, antenna 1 of the electronic device is coupled to the mobile communication module 150 and antenna 2 is coupled to the wireless communication module 160, so that the electronic device can communicate with the network and other devices through wireless communication technologies. The wireless communication technologies may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a BeiDou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
The electronic device implements the display function through the GPU, the display screen 194, and the application processor, etc. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, videos, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), and the like. In some embodiments, the electronic device may include 1 or N display screens 194, where N is a positive integer greater than 1.
The electronic device may implement a shooting function through the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the electronic device may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used to process digital signals, and can process digital image signals as well as other digital signals. For example, when the electronic device selects a frequency point, the digital signal processor is used to perform a Fourier transform or the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The electronic device may support one or more video codecs. In this way, the electronic device can play or record video in a variety of encoding formats, such as moving picture experts group (MPEG)-1, MPEG-2, MPEG-3, MPEG-4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can realize applications such as intelligent cognition of electronic equipment, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the memory capability of the electronic device. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the electronic device and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The data storage area can store data (such as audio data, phone book and the like) created in the using process of the electronic device. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
The electronic device may implement audio functions via the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playing, recording, etc.
The audio module 170 is used to convert digital audio information into an analog audio signal output and also to convert an analog audio input into a digital audio signal. The audio module 170 may also be used to encode and decode audio signals. In some embodiments, the audio module 170 may be disposed in the processor 110, or some functional modules of the audio module 170 may be disposed in the processor 110.
The speaker 170A, also called a "horn", is used to convert the audio electrical signal into an acoustic signal. The electronic apparatus can listen to music through the speaker 170A or listen to a handsfree call.
The receiver 170B, also called "earpiece", is used to convert the electrical audio signal into an acoustic signal. When the electronic device answers a call or voice information, it can answer the voice by placing the receiver 170B close to the ear of the person.
The microphone 170C, also referred to as a "mic", is used to convert sound signals into electrical signals. When making a call, sending a voice message, or triggering the electronic device to perform some event through the voice assistant, the user can speak with his/her mouth close to the microphone 170C to input a sound signal into the microphone 170C. The electronic device may be provided with at least one microphone 170C. In other embodiments, the electronic device may be provided with two microphones 170C to implement a noise reduction function in addition to collecting sound signals. In other embodiments, the electronic device may further include three, four, or more microphones 170C to collect sound signals, reduce noise, identify sound sources, implement directional recording, and the like.
The headphone interface 170D is used to connect a wired headphone. The headset interface 170D may be the USB interface 130, or may be a 3.5mm open mobile electronic device platform (OMTP) standard interface, a cellular telecommunications industry association (cellular telecommunications industry association of the USA, CTIA) standard interface.
The pressure sensor 180A is used to sense a pressure signal and convert the pressure signal into an electrical signal. In some embodiments, the pressure sensor 180A may be disposed on the display screen 194. There are many types of pressure sensors 180A, such as resistive pressure sensors, inductive pressure sensors, and capacitive pressure sensors. A capacitive pressure sensor may include at least two parallel plates made of conductive material. When a force acts on the pressure sensor 180A, the capacitance between the electrodes changes, and the electronic device determines the intensity of the pressure from the change in capacitance. When a touch operation is applied to the display screen 194, the electronic device detects the intensity of the touch operation according to the pressure sensor 180A. The electronic device may also calculate the position of the touch from the detection signal of the pressure sensor 180A. In some embodiments, touch operations that are applied to the same touch position but with different intensities may correspond to different operation instructions. For example, when a touch operation whose intensity is less than a first pressure threshold acts on the short message application icon, an instruction for viewing the short message is executed; when a touch operation whose intensity is greater than or equal to the first pressure threshold acts on the short message application icon, an instruction for creating a new short message is executed.
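The threshold behavior in the short-message example can be sketched as follows (the threshold value and action names are illustrative assumptions):

```python
# Hypothetical illustration of the pressure-threshold dispatch described
# above: the same touch position maps to different instructions depending
# on whether the touch intensity reaches the first pressure threshold.

FIRST_PRESSURE_THRESHOLD = 0.5  # assumed normalized intensity threshold

def sms_icon_action(touch_intensity):
    if touch_intensity < FIRST_PRESSURE_THRESHOLD:
        return "view_message"   # light press: view the short message
    return "new_message"        # firm press: create a new short message
```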
The gyro sensor 180B may be used to determine the motion posture of the electronic device. In some embodiments, the angular velocity of the electronic device about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180B. The gyro sensor 180B may be used for image stabilization during photographing. Illustratively, when the shutter is pressed, the gyro sensor 180B detects the shake angle of the electronic device, calculates the distance that the lens module needs to compensate according to the shake angle, and allows the lens to counteract the shake of the electronic device through reverse movement, thereby achieving anti-shake. The gyro sensor 180B may also be used in navigation and somatosensory gaming scenarios.
The air pressure sensor 180C is used to measure air pressure. In some embodiments, the electronic device calculates altitude, aiding in positioning and navigation, from barometric pressure values measured by barometric pressure sensor 180C.
The magnetic sensor 180D includes a Hall sensor. The electronic device may use the magnetic sensor 180D to detect the opening and closing of a flip holster. In some embodiments, when the electronic device is a flip phone, the electronic device may detect the opening and closing of its flip cover according to the magnetic sensor 180D, and then set features such as automatic unlocking upon opening according to the detected open or closed state of the holster or flip cover.
The acceleration sensor 180E can detect the magnitude of acceleration of the electronic device in various directions (typically three axes). When the electronic device is at rest, the magnitude and direction of gravity can be detected. The method can also be used for recognizing the posture of the electronic equipment, and is applied to horizontal and vertical screen switching, pedometers and other applications.
The distance sensor 180F is used to measure distance. The electronic device may measure distance by infrared or laser. In some embodiments, such as in a shooting scenario, the electronic device may use the distance sensor 180F to measure distance to achieve fast focusing.
The proximity light sensor 180G may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared LED. The electronic device emits infrared light outward through the light emitting diode and uses the photodiode to detect infrared light reflected from nearby objects. When sufficient reflected light is detected, the electronic device can determine that there is an object nearby; when insufficient reflected light is detected, it can determine that there is no object nearby. Using the proximity light sensor 180G, the electronic device can detect that it is being held by the user close to the ear for a call, and automatically turn off the screen to save power. The proximity light sensor 180G may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180L is used to sense the ambient light level. The electronic device may adaptively adjust the brightness of the display screen 194 based on the perceived ambient light level. The ambient light sensor 180L may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180L may also cooperate with the proximity light sensor 180G to detect whether the electronic device is in a pocket to prevent accidental touches.
The fingerprint sensor 180H is used to collect fingerprints. The electronic device can use the collected fingerprint characteristics to implement fingerprint unlocking, access to application locks, fingerprint-triggered photographing, fingerprint-based call answering, and the like.
The temperature sensor 180J is used to detect temperature. In some embodiments, the electronic device implements a temperature processing strategy using the temperature detected by the temperature sensor 180J. For example, when the temperature reported by the temperature sensor 180J exceeds a threshold, the electronic device reduces the performance of a processor located near the temperature sensor 180J, so as to reduce power consumption and implement thermal protection. In other embodiments, the electronic device heats the battery 142 when the temperature is below another threshold, to avoid an abnormal shutdown caused by low temperature. In still other embodiments, the electronic device boosts the output voltage of the battery 142 when the temperature is below a further threshold, likewise to avoid an abnormal shutdown caused by low temperature.
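The three temperature-processing strategies described above can be sketched as a simple policy function. The specific threshold values and action names below are illustrative assumptions; the patent does not specify them.

```python
# Hypothetical thermal-policy sketch for the strategies described above;
# all threshold values and action names are assumptions.
HIGH_TEMP_C = 45.0       # above this, throttle the nearby processor (assumed)
LOW_TEMP_C = 0.0         # below this, heat the battery (assumed)
VERY_LOW_TEMP_C = -10.0  # below this, also boost battery output voltage (assumed)

def thermal_actions(temp_c):
    """Return the list of protective actions for a reported temperature."""
    actions = []
    if temp_c > HIGH_TEMP_C:
        actions.append("throttle_cpu")       # reduce processor performance
    if temp_c < LOW_TEMP_C:
        actions.append("heat_battery")       # avoid low-temperature shutdown
    if temp_c < VERY_LOW_TEMP_C:
        actions.append("boost_battery_voltage")
    return actions
```

Note that the two low-temperature strategies can apply simultaneously, since a very low temperature is also below the battery-heating threshold.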
The touch sensor 180K is also referred to as a "touch panel". The touch sensor 180K may be disposed on the display screen 194, and together with the display screen 194 it forms what is commonly called a touchscreen. The touch sensor 180K is used to detect a touch operation applied on or near it, and may pass the detected touch operation to the application processor to determine the type of touch event. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180K may be disposed on a surface of the electronic device at a position different from that of the display screen 194.
The bone conduction sensor 180M may acquire a vibration signal. In some embodiments, the bone conduction sensor 180M may acquire the vibration signal of bone vibrating as a person speaks. The bone conduction sensor 180M may also contact the human pulse to receive the blood pressure pulsation signal. In some embodiments, the bone conduction sensor 180M may also be disposed in a headset, integrated into a bone conduction headset. The audio module 170 may parse a voice signal from the bone vibration signal acquired by the bone conduction sensor 180M, so as to implement a voice function. The application processor may parse heart rate information from the blood pressure pulsation signal acquired by the bone conduction sensor 180M, so as to implement a heart rate detection function.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The electronic device may receive a key input, and generate a key signal input related to user settings and function control of the electronic device.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also produce different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenarios (such as time reminders, received messages, alarm clocks, games, etc.) may also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. A SIM card can be attached to or detached from the electronic device by being inserted into or pulled out of the SIM card interface 195. The electronic device can support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a standard SIM card, etc. Multiple cards can be inserted into the same SIM card interface 195 at the same time; the types of these cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards, and with external memory cards. The electronic device implements functions such as calling and data communication through interaction between the SIM card and the network. In some embodiments, the electronic device employs an eSIM, i.e., an embedded SIM card, which can be embedded in the electronic device and cannot be separated from it.
The methods in the following embodiments may be implemented in an electronic device having the above hardware structure.
In the embodiments of this application, in a multi-device scenario, after the user speaks the wake-up word and a voice command, one of the multiple devices is selected through multi-device wake-up arbitration to perform the wake-up response. Through multi-device capability arbitration, when the device performing the wake-up response does not have the function of executing the event corresponding to the voice command, a device among the multiple devices that does have that function can execute the event, completing the response to the voice command.
Both the multi-device wake-up arbitration and the multi-device capability arbitration may be implemented by one of the devices, or by a server. The voice control method provided in the embodiments of this application is described in detail below according to the different devices that implement the multi-device wake-up arbitration and the multi-device capability arbitration. In addition, the following embodiments, with reference to fig. 1, take this multi-device scenario as an example: the user's living room contains three devices, a sound box 101, a television 102, and a mobile phone 103, all equipped with voice assistants, and the wake-up word is "small E, small E".
Fig. 4 is a flowchart illustrating a voice control method according to an embodiment of the present application. This embodiment takes as an example that the multi-device wake arbitration and the multi-device capability arbitration are implemented by a server. As shown in fig. 4, the method may include the following S401-S409.
S401, the sound box 101, the television 102 and the mobile phone 103 receive first voice data input by a user respectively.
For example, the first voice data may be the above-mentioned wake-up word "small E, small E".
For an electronic device equipped with a voice assistant, when no other software or hardware is using the microphone to collect voice data, the DSP of the electronic device can monitor in real time, through the microphone, whether the user has input voice data. Generally, when a user wants to use the voice control function of an electronic device, the user speaks within the sound pickup distance of the device, so that the sound is input to the microphone. At this time, if no other software or hardware of the electronic device is using the microphone to collect voice data, the DSP of the electronic device can detect the corresponding voice data, such as the first voice data, through the microphone, and cache it.
For example, in connection with fig. 5, a user sitting on the couch in the living room may speak the wake-up word "small E, small E" when wanting to use the voice control function. If the sound pickup distances of the sound box 101, the television 102, and the mobile phone 103 are all 4 meters, and no other software or hardware is using their microphones to collect voice data, the DSPs of the sound box 101, the television 102, and the mobile phone 103 can each detect, through their respective microphones, the first voice data corresponding to the wake-up word "small E, small E".
S402, the speaker 101, the television 102, and the mobile phone 103 check the received first voice data, and determine that the first voice data is a registered wakeup word.
After the electronic device receives the first voice data, the first voice data may be checked, that is, whether the received first voice data is a wakeup word registered in the electronic device is determined. If the check is passed, indicating that the received first voice data is a wakeup word, the following S403 may be performed. If the verification fails, it indicates that the received first voice data is not a wakeup word, and at this time, the electronic device may delete the cached first voice data.
For example, the electronic device's verification of the first voice data may specifically include: the DSP of the electronic device performs low-precision matching between the text of the first voice data and the text of the wake-up word registered in the electronic device. If the DSP's match passes and the AP of the electronic device is in a dormant state, the DSP may wake up the AP, and the AP then performs high-precision matching between the text of the first voice data and the text of the registered wake-up word. If the AP's match also passes, the electronic device can determine that the first voice data is a registered wake-up word. If either the DSP's match or the AP's match fails, the electronic device determines that the first voice data is not a registered wake-up word.
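The two-stage check described above can be sketched as follows. This is an illustrative sketch only: real devices match against acoustic models, not strings, and the two string-matching functions here are stand-ins for the coarse DSP model and the stricter AP model.

```python
# Minimal sketch of the two-stage wake-word verification: a coarse DSP
# match followed, only on success, by a precise AP match. String matching
# stands in for the acoustic models used on real devices.
def dsp_low_precision_match(voice_text, wake_word):
    # Coarse check: accept if the stripped, case-folded texts agree.
    return voice_text.strip().lower() == wake_word.strip().lower()

def ap_high_precision_match(voice_text, wake_word):
    # Precise check: exact match (stand-in for a stricter model).
    return voice_text == wake_word

def verify_wake_word(voice_text, wake_word):
    """Return True only if both the DSP and AP stages pass."""
    if not dsp_low_precision_match(voice_text, wake_word):
        return False  # DSP match failed: the cached voice data is discarded
    # DSP passed; the AP (woken if it was dormant) runs the precise match.
    return ap_high_precision_match(voice_text, wake_word)
```

Running the cheap DSP stage first lets the AP stay dormant for most ambient sound, which is the power-saving point of the two-stage design.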
For example, in combination with the example in S401, after the DSPs of the sound box 101, the television 102, and the mobile phone 103 detect the first voice data corresponding to the wake-up word "small E, small E", the respective DSPs and APs may check the first voice data. In this embodiment, the sound box 101, the television 102, and the mobile phone 103 all pass verification of the detected first voice data; that is, all three determine that the detected first voice data is a registered wake-up word.
S403, the speaker 101, the tv 102, and the mobile phone 103 report the detected energy information of the first voice data to the server, respectively.
Wherein the energy information is used to indicate a distance between the device and the user. In some embodiments, the energy information may be represented by one or more of a signal-to-noise ratio, a sound pressure, and the like. For example, the energy information is expressed by sound pressure. With reference to the example in S402, after the sound box 101, the television 102, and the mobile phone 103 determine that the detected first voice data is the registered wakeup word, the sound box 101, the television 102, and the mobile phone 103 may respectively measure the sound pressure of the detected first voice data, and report the measured sound pressure of the first voice data to the server. Wherein a higher sound pressure indicates a closer distance between the device and the user.
S404, the server determines that the sound box 101 wakes up according to the energy information of the first voice data reported by the sound box 101, the television 102 and the mobile phone 103.
After receiving the energy information of the first voice data reported by the multiple electronic devices, the server may perform multi-device wake-up arbitration, that is, the server may select one of the multiple electronic devices to perform a wake-up response.
For example, in connection with the example in S403, after receiving the sound pressures of the first voice data sent by the sound box 101, the television 102, and the mobile phone 103, the server may select the device with the largest sound pressure, that is, the device closest to the user, to perform the wake-up response. Referring to fig. 5, the distances between the sound box 101, the television 102, and the mobile phone 103 and the user are 2 meters, 3 meters, and 2.5 meters, respectively. Accordingly, the sound pressure of the first voice data measured by the sound box 101 is the largest, followed by that measured by the mobile phone 103, and the sound pressure measured by the television 102 is the smallest. Thus, the server may select the sound box 101 to perform the wake-up response. For example, the server may send a first wake-up indication to the sound box 101, where the first wake-up indication instructs it to perform a wake-up response. In addition, the server may send a second wake-up indication to the television 102 and the mobile phone 103, respectively, where the second wake-up indication instructs them not to perform a wake-up response. Alternatively, the server may send no indication at all to the television 102 and the mobile phone 103; in that case, if they receive no wake-up indication within a preset time, they determine not to perform a wake-up response.
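The server-side selection in S404 reduces to picking the device that reported the highest sound pressure. The sketch below illustrates this; the report structure and device names are assumptions for illustration.

```python
# Sketch of server-side multi-device wake-up arbitration: choose the device
# that reported the highest sound pressure (i.e., the one closest to the
# user) and treat the rest as devices that should not respond.
def arbitrate_wakeup(reports):
    """reports: dict mapping device id -> measured sound pressure.
    Returns (winner, losers): the device that receives the first wake-up
    indication, and the devices that receive the second indication."""
    winner = max(reports, key=reports.get)
    losers = [dev for dev in reports if dev != winner]
    return winner, losers
```

With the distances in fig. 5, the sound box's report is largest, so it wins the arbitration.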
S405, the sound box 101 wakes up the voice assistant and receives second voice data input by the user.
And S406, the sound box 101 reports the second voice data to the server.
For example, as shown in fig. 5, the sound box 101 may wake up its voice control function, such as its voice assistant, after receiving the first wake-up indication. The sound box 101 may also play a wake-up response, such as "I am here". The television 102 and the mobile phone 103, according to the received second wake-up indication, do not respond. The user may then continue to speak a voice command, so the AP of the sound box 101 can detect, through the microphone, the voice data corresponding to the voice command, such as the second voice data. At this time, the sound box 101 may report the second voice data to the server.
S407, the server determines the devices in the sound box 101, the television 102, and the mobile phone 103 that have the function of executing the event corresponding to the second voice data.
After receiving the second voice data reported by the sound box 101, the server may perform multi-device capability arbitration; that is, the server may determine, according to the second voice data, which of the electronic devices has the function of executing the event corresponding to the second voice data. In some embodiments, an electronic device may automatically report its capability information to the server when it is powered on or restarted, so that the server stores this capability information. In other embodiments, an electronic device may also report its capability information to the server periodically, or whenever it detects that its capability information has changed. In this way, after receiving the second voice data, the server may analyze it using automatic speech recognition (ASR) technology to determine what function a device must have to execute the corresponding event, and then, according to this result and the stored capability information of the plurality of electronic devices, determine which device has the function of executing the event corresponding to the second voice data.
For example, in connection with fig. 5 and the examples in S401-S406, assume that the sound box 101, the television 102, and the mobile phone 103 each report their own capability information when powered on. The capability information reported by the sound box 101 includes a music playing function and a weather reporting function; that reported by the television 102 includes a video playing function; and that reported by the mobile phone 103 includes a navigation function. The server may store the capability information reported by each electronic device in correspondence with an identifier of that device (e.g., its Media Access Control (MAC) address). For example, the correspondence between capability information and device identifiers stored by the server is shown in Table 1.
TABLE 1

Device identifier | Capability information
MAC address 1     | Music playing function, weather reporting function
MAC address 2     | Video playing function
MAC address 3     | Navigation function
In table 1, MAC address 1 is an identifier of sound box 101, MAC address 2 is an identifier of television 102, and MAC address 3 is an identifier of mobile phone 103. In addition, it should be noted that the sound box 101, the television 102, and the mobile phone 103 may report their own capability information to the server once when they are powered on, so that the server can update the capability information of the device in time.
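The lookup the server performs against Table 1 can be sketched as follows. The dictionary layout and capability strings are illustrative assumptions; only the correspondence itself comes from Table 1.

```python
# Sketch of the stored capability table (Table 1) and the server's lookup:
# given the function required by the parsed voice command, return the
# identifiers of devices whose reported capability information includes it.
CAPABILITY_TABLE = {
    "MAC address 1": {"music playing", "weather reporting"},  # sound box 101
    "MAC address 2": {"video playing"},                       # television 102
    "MAC address 3": {"navigation"},                          # mobile phone 103
}

def devices_with_capability(required):
    """Return identifiers of devices able to execute the required function."""
    return [dev for dev, caps in CAPABILITY_TABLE.items() if required in caps]
```

For a command requiring video playing, only the device identified by MAC address 2 (the television 102) is returned, matching the example below.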
For example, suppose the voice command spoken by the user, i.e., the second voice data, is "play the movie The Wandering Earth". After receiving this second voice data, the server can analyze it and determine that executing the corresponding event, i.e., playing the movie The Wandering Earth, requires a device with a video playing function. The server may determine, according to Table 1, that the device identified by MAC address 2, i.e., the television 102, has the video playing function. That is, the server determines that, among the sound box 101, the television 102, and the mobile phone 103, the television 102 is the device having the function of executing the event corresponding to the second voice data "play the movie The Wandering Earth".
For another example, suppose the voice command spoken by the user, i.e., the second voice data, is "navigate to a place". After receiving this second voice data, the server can analyze it and determine that executing the corresponding event, i.e., navigating to the place, requires a device with a navigation function. The server can determine from Table 1 that the device identified by MAC address 3, i.e., the mobile phone 103, has the navigation function. That is, the server determines that, among the sound box 101, the television 102, and the mobile phone 103, the mobile phone 103 is the device having the function of executing the event corresponding to the second voice data "navigate to a place".
S408, the server transmits a content instruction to the device having the function of executing the event corresponding to the second voice data.
S409, the device having the function of executing the event corresponding to the second voice data executes the event corresponding to the second voice data according to the content instruction.
The content indication may be the data required to execute the event corresponding to the second voice data. For example, as shown in fig. 6, if the voice command spoken by the user, i.e., the second voice data, is "play the movie The Wandering Earth", the content indication may be a play link for the movie "The Wandering Earth". Thus, in connection with the example in S407, the server may send the play link for the movie "The Wandering Earth" to the television 102. After receiving the play link, the television 102 can play the movie "The Wandering Earth" according to the play link, as shown in fig. 6. In fig. 4, S408 and S409 take the television 102 as the device having the function of executing the event corresponding to the second voice data.
The content indication may also be an instruction corresponding to the second voice data. For another example, if the voice command spoken by the user, i.e., the second voice data, is "navigate to a place", the content indication may be the instruction corresponding to "navigate to a place". Thus, in connection with the example in S407, the server may send the instruction corresponding to the second voice data "navigate to a place" to the mobile phone 103. The mobile phone 103 can start the navigation application according to the received instruction, display the route to the place, and perform a voice broadcast. Of course, the content indication may also be the second voice data itself; in that case, after receiving the second voice data, the mobile phone 103 analyzes it to obtain the corresponding instruction and executes that instruction.
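The three forms of content indication described above (a play link, a ready-made instruction, or the raw second voice data that the target device parses itself) suggest a simple dispatch on the receiving device. The sketch below is illustrative; the tuple encoding, function names, and the stand-in voice parser are all assumptions.

```python
# Hypothetical sketch of how a target device might act on the three
# content-indication forms: play link, instruction, or raw voice data.
def parse_voice(voice_text):
    # Stand-in for on-device speech understanding, which would normally
    # turn voice data into an executable instruction.
    return voice_text.replace(" ", "_")

def handle_content_indication(indication):
    """indication: (kind, payload) tuple sent by the server."""
    kind, payload = indication
    if kind == "play_link":
        return f"playing video from {payload}"
    if kind == "instruction":
        return f"executing {payload}"
    if kind == "voice_data":
        # The device parses the voice data itself, then runs the result.
        return f"executing {parse_voice(payload)}"
    raise ValueError(f"unknown content indication: {kind}")
```

Sending a parsed instruction shifts the analysis work to the server, while sending raw voice data leaves it to a device capable of its own speech understanding.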
In addition, the server may also send a command response instruction to the sound box 101, where the command response instruction is used to instruct the sound box 101 to perform a voice command response. In some embodiments, if the server determines that the other electronic device has the function of executing the event corresponding to the second voice data but loudspeaker 101 does not have the function, the server may send a command response indication to loudspeaker 101, where the command response indication instructs loudspeaker 101 to prompt the user that the event corresponding to the voice command is to be executed on the other electronic device.
For example, in connection with the example in S407, the server determines that the television 102 has the function of executing the event corresponding to the second voice data "play the movie The Wandering Earth", and the sound box 101 does not. The server may send a command response indication to the sound box 101, instructing it to prompt the user that the movie "The Wandering Earth" will be played on the television 102. As shown in fig. 6, the sound box 101 may broadcast, according to the command response indication, "the movie The Wandering Earth will be played on the television". For another example, also in connection with the example in S407, the server determines that the mobile phone 103 has the function of executing the event corresponding to the second voice data "navigate to a place", and the sound box 101 does not. The server may send a command response indication to the sound box 101, instructing it to prompt the user that navigation will be performed on the mobile phone 103. The sound box 101 may then perform a voice broadcast, "navigating on the mobile phone", according to the command response indication.
In other embodiments, if the server determines that the sound box 101 itself has the function of executing the event corresponding to the second voice data, the server may send a voice command response and a content indication to the sound box 101. The sound box 101 can then perform a voice broadcast according to the voice command response, for example broadcasting "a certain event will be executed", and execute the event corresponding to the second voice data according to the content indication.
It should be noted that, in the embodiments of this application, the user may speak the wake-up word (i.e., the first voice data) and the voice command (i.e., the second voice data) either continuously or separately. For example, the user may say in one breath "small E, small E, play the movie The Wandering Earth"; or first say the wake-up word "small E, small E" and, after hearing a device play the wake-up response sound, such as "I am here", say the voice command "play the movie The Wandering Earth". If the user speaks the wake-up word and the voice command continuously, then after the device performing the wake-up response is determined, that device does not play the wake-up response sound; instead, after receiving the command response indication sent by the server, it directly plays the prompt sound according to that indication, such as "the movie The Wandering Earth will be played on the television".
In S407-S409, the multi-device scenario includes only three devices, i.e., the sound box 101, the television 102, and the mobile phone 103, as an example. In other embodiments, the multi-device scenario may also include other electronic devices, with or without a voice control function. An electronic device with a voice control function may have a wake-up word different from "small E, small E"; or it may have the same wake-up word but be farther from the user than its sound pickup distance. In such a scenario, if the server stores the capability information of that electronic device and determines that it has the function of executing the event corresponding to the second voice data, the server may still send it a content indication, so that it executes the event according to the indication. In this way, the coverage of voice control can exceed the sound pickup range of an individual electronic device. For example, the sound pickup distance of a television equipped with 6 microphones is generally within 5 meters; with the method of the embodiments of this application, even if the user is more than 5 meters from the television, the user can still control it by voice to execute events such as video playing. Moreover, the user does not need to say explicitly that the video should be played on the television, i.e., does not need to specify the television as the playback device; saying "play a certain video" is enough.
In addition, with the popularization of smart homes, there are more and more electronic devices with voice control functions, and each device offers more and more functions. If the device that finally executes the event corresponding to the voice command is determined based on device capability information, as in the example in S407, it may happen that multiple electronic devices all have the function of executing that event. In some embodiments, the server may select one electronic device from these candidates to execute the event. In other embodiments, the server may select the electronic device closest to the user, according to the distance between the user and each candidate device. The server may also select a device according to the state of each candidate, such as whether it is powered on, or whether it was determined, within a preset time, to be executing an event corresponding to another voice command.
For example, when the server determines that two electronic devices (e.g., electronic device 1 and electronic device 2) both have the function of executing the event corresponding to the voice command, but electronic device 1 was assigned to execute an event corresponding to another voice command a few minutes earlier, the server may select electronic device 2 to execute the current event. The server can also record the usage habits of different users (distinguished, for example, by voiceprint) and, combining these habits, select from the candidates the electronic device the user most often uses. For example, if the server determines, after receiving a voice command from user 1 instructing video playback, that both television 1 and television 2 have the video playing function, and user 1 usually watches videos on television 1, the server may select television 1 to play the video. Of course, the server may also determine the executing device by combining one or more of the distance between each device and the user, the device state, and the user's usage habits; this is not limited in this embodiment. In this way, the device that best matches the user's intention can be selected to execute the event corresponding to the voice command, making voice control more intelligent and improving the user experience.
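The tie-breaking factors described above (recent busyness, usage habits, distance) can be combined into one ranking. The priority order and data shapes below are assumptions chosen for illustration; the patent leaves the exact combination open.

```python
# Hypothetical tie-breaking among several devices that can all execute the
# event: prefer devices not recently assigned another event, then the
# user's habitual device (by voiceprint), then the nearest device.
def select_device(candidates, recently_busy, habitual, distance_m):
    """candidates: list of device ids able to execute the event.
    recently_busy: set of ids assigned another event within the window.
    habitual: id of the device this user most often uses, or None.
    distance_m: dict id -> distance to the user in meters."""
    def score(dev):
        return (
            dev in recently_busy,               # idle devices sort first
            dev != habitual,                    # then the habitual device
            distance_m.get(dev, float("inf")),  # then the nearest device
        )
    return min(candidates, key=score)
```

Python compares the score tuples element by element, so each factor only matters when the earlier ones tie.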
Fig. 7 is a flowchart illustrating another speech control method according to an embodiment of the present application. This embodiment takes as an example that the multi-device wake arbitration and the multi-device capability arbitration are implemented by the master device. The master device may be any one of the sound box 101, the television 102, and the mobile phone 103, and in this embodiment, the master device is taken as the mobile phone 103 as an example. As shown in fig. 7, the method may include the following S701-S709.
S701, the sound box 101, the television 102 and the mobile phone 103 respectively receive first voice data input by a user.
S702, the speaker 101, the tv 102, and the mobile phone 103 check the received first voice data, and determine that the first voice data is a registered wakeup word.
The specific descriptions of S701 and S702 are the same as the descriptions of corresponding contents in S401 and S402 in the embodiment shown in fig. 4, and are not repeated here.
S703, the sound box 101 and the television 102 respectively report the detected energy information of the first voice data to the mobile phone 103.
S704, the mobile phone 103 determines that the sound box 101 performs the wake-up response according to the energy information of the first voice data reported by the sound box 101 and the television 102 and the energy information of the first voice data detected by the mobile phone 103 itself.
The specific descriptions of S703 and S704 are similar to the descriptions of the corresponding contents in S403 and S404 in the embodiment shown in fig. 4. The difference is that, in this embodiment, the multi-device wake-up arbitration is performed by the mobile phone 103 as the master device; therefore, the sound box 101 and the television 102 report the energy information of the first voice data to the mobile phone 103.
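The wake-up arbitration in S703-S704 can be sketched in a few lines: each device reports the energy at which it detected the wake-up word, and the master device selects the device that detected the highest energy, typically the device closest to the user. The identifiers, the report structure, and the energy values below are illustrative assumptions, not details from the patent.

```python
# Hypothetical sketch of the master device's multi-device wake-up arbitration
# (S703-S704): pick the device that detected the wake-up word at the highest energy.

def arbitrate_wakeup(energy_reports):
    """energy_reports maps a device identifier to the energy (e.g. in dB)
    of the first voice data as detected by that device."""
    if not energy_reports:
        return None
    # The device that heard the wake-up word loudest performs the wake-up response.
    return max(energy_reports, key=energy_reports.get)

reports = {
    "sound_box_101": -12.0,    # reported to the master over Wi-Fi / Bluetooth
    "television_102": -25.5,   # reported to the master over Wi-Fi / Bluetooth
    "mobile_phone_103": -18.3, # measured locally by the master itself
}
winner = arbitrate_wakeup(reports)  # → "sound_box_101"
```

In the fig. 4 embodiment the same comparison is performed by the server instead of the master device; only the arbiter changes, not the rule.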
S705, the sound box 101 wakes up its voice assistant and receives second voice data input by the user.
S706, the sound box 101 reports the second voice data to the mobile phone 103.
S707, the mobile phone 103 determines, from among the sound box 101, the television 102, and the mobile phone 103, the device having the function of executing the event corresponding to the second voice data.
The detailed descriptions of S705-S707 are similar to the descriptions of the corresponding contents of S405-S407 in the embodiment shown in fig. 4. The differences are as follows: 1. In this embodiment, the multi-device capability arbitration is performed by the mobile phone 103 as the master device; therefore, after receiving the second voice data, the sound box 101 reports the second voice data to the mobile phone 103. Of course, in this embodiment, the mobile phone 103, as the master device, may also itself collect the voice data input by the user. 2. The mobile phone 103 stores the capability information of itself and of the other electronic devices. For example, as shown in Table 1 in the embodiment shown in fig. 4, a correspondence between the capability information of each electronic device and the identifier of that electronic device may be stored in the mobile phone 103, so that the device having the function of executing the event corresponding to the second voice data can be determined according to the correspondence.
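The capability lookup described in point 2 amounts to filtering a stored correspondence table. The sketch below is a minimal illustration of that lookup; the capability names and table contents are invented for the example and do not reproduce Table 1 of the patent.

```python
# Illustrative capability table on the master device (cf. Table 1 of the fig. 4
# embodiment): device identifier -> capability information. Entries are assumed.
CAPABILITY_TABLE = {
    "sound_box_101": {"play_audio"},
    "television_102": {"play_audio", "play_video"},
    "mobile_phone_103": {"play_audio", "play_video", "make_call"},
}

def devices_with_capability(required):
    """Return identifiers of all devices whose stored capability information
    includes the capability required by the event in the voice command."""
    return [dev for dev, caps in CAPABILITY_TABLE.items() if required in caps]

capable = devices_with_capability("play_video")
# Several devices may qualify; a further selection step then picks one of them.
```

When the list contains more than one device, the selection criteria discussed for the fig. 4 embodiment (distance, device state, usage habit) decide among them.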
In this embodiment, the mobile phone 103 may determine that the device having the function of executing the event corresponding to the second voice data is the mobile phone 103 itself. At this time, if the content indication does not need to be obtained by interacting with the server, the mobile phone 103 may directly parse the second voice data to obtain a corresponding instruction and then execute the event corresponding to the second voice data according to the instruction; if the content indication needs to be obtained by interacting with the server, the mobile phone 103 may send a request message to the server to request the server to send the content indication to the mobile phone 103.
If the mobile phone 103 determines that the device having the function of executing the event corresponding to the second voice data is another device, such as the sound box 101 or the television 102, the following S708-S709 may be performed.
S708, the mobile phone 103 sends a content indication to the device having the function of executing the event corresponding to the second voice data.
S709, the device having the function of executing the event corresponding to the second voice data executes the event corresponding to the second voice data according to the content indication.
Specifically, the mobile phone 103 may send a request message to the server to obtain the content indication, and then send the content indication to the device having the function of executing the event corresponding to the second voice data, so that the device executes the event corresponding to the second voice data according to the content indication. In fig. 7, S708 and S709 are illustrated by taking the television 102 as the device having the function of executing the event corresponding to the second voice data.
Of course, in some other embodiments, if it is determined that the device having the function of executing the event corresponding to the second voice data is another device that is not the device performing the wake-up response, that is, the television 102 rather than the sound box 101, then, as an alternative to S708, the mobile phone 103 may send the second voice data to the television 102. The television 102 may then interact with the server according to the second voice data to obtain the content indication.
In other embodiments, if it is determined that the device having the function of executing the event corresponding to the second voice data is another device that is the device performing the wake-up response, that is, the sound box 101, then, as an alternative to S708, the mobile phone 103 may send instruction information to the sound box 101, where the instruction information is used to instruct the sound box 101 to respond to the voice command. In this case, the sound box 101 may interact with the server according to the received second voice data to obtain the content indication.
It should be noted that the above S708 and S709 are described by taking as an example the case where the content indication needs to be obtained by interacting with the server in order to respond to the voice command. If the content indication does not need to be obtained by interacting with the server, then, when it is determined that the device having the function of executing the event corresponding to the second voice data is not the sound box 101 but the television 102, the mobile phone 103 may send the second voice data to the television 102, and the television 102 may parse the second voice data to obtain a corresponding instruction and then execute the event corresponding to the second voice data according to the instruction. Alternatively, the mobile phone 103 may itself parse the second voice data to obtain the corresponding instruction and then send the instruction to the television 102, so that the television 102 executes the event corresponding to the second voice data according to the instruction. When it is determined that the device having the function of executing the event corresponding to the second voice data is the sound box 101, the mobile phone 103 may send instruction information to the sound box 101, and the sound box 101 may directly parse the second voice data according to the instruction information to obtain a corresponding instruction and then execute the event corresponding to the second voice data according to the instruction.
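The dispatch variants discussed in S708-S709 and the surrounding paragraphs reduce to a small decision table on the master device. The sketch below summarizes them under stated assumptions: the message names (`content_indication`, `instruction_information`, `second_voice_data`, and the local-handling labels) are illustrative tokens, not protocol messages defined by the patent.

```python
# Hypothetical summary of the master device's dispatch decision: who receives
# what, depending on whether the target device is the master itself, the
# wake-up device, or a third device, and whether server interaction is needed.

def dispatch(master_id, wake_device_id, target_id, needs_server):
    """Return (recipient, message) describing what the master device sends."""
    if target_id == master_id:
        # The master executes the event itself, locally or via the server.
        return (master_id, "fetch_from_server" if needs_server else "parse_locally")
    if needs_server:
        # S708: the master fetches the content indication from the server
        # and forwards it to the target device.
        return (target_id, "content_indication")
    if target_id == wake_device_id:
        # The wake-up device already holds the second voice data; it only
        # needs instruction information telling it to respond.
        return (target_id, "instruction_information")
    # Otherwise the master forwards the second voice data (or, as a variant,
    # a locally parsed instruction) to the target device.
    return (target_id, "second_voice_data")

recipient, msg = dispatch("mobile_phone_103", "sound_box_101",
                          "television_102", needs_server=False)
```

Each branch corresponds to one of the alternatives spelled out in the preceding paragraphs; the table makes it visible that no case requires the user to repeat the wake-up word.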
In addition, the mobile phone 103 may also send a command response instruction to the sound box 101, where the command response instruction is used to instruct the sound box 101 to perform a voice command response. For a detailed description of the voice command response, reference may be made to the description of the corresponding contents in the embodiment shown in fig. 4. Other descriptions of S707-S709 may likewise refer to the descriptions of the corresponding contents of S407-S409 in the embodiment shown in fig. 4, and are not described in detail herein.
It should be noted that, in the embodiments of the present application, the interaction between electronic devices (e.g., between the mobile phone 103 and the sound box 101, or between the mobile phone 103 and the television 102) may be implemented through a Bluetooth connection established between the two electronic devices using the Bluetooth protocol, or through a Wi-Fi connection established between the two electronic devices using the Wi-Fi protocol. Of course, the interaction may also be implemented through a connection established using another short-range communication protocol, which is not specifically limited in this embodiment.
By adopting the method shown in fig. 4 or fig. 7, in a multi-device scenario, after the user speaks the wake-up word and the voice command, the multi-device wake-up arbitration and the multi-device capability arbitration ensure that only one device, for example the device closest to the user, is woken up to perform the wake-up response. Moreover, when the device performing the wake-up response does not have the function of executing the event corresponding to the voice command, a device that does have this function can execute the event without requiring the user to move or to repeat the wake-up word and the voice command, so that the response to the voice command is completed. The electronic devices are thus more intelligent, efficient interaction between the electronic devices and the user is achieved, and the user experience is improved.
Still other embodiments of the present application provide a computer storage medium, which may include computer instructions, which, when executed on an electronic device (such as the above-mentioned sound box 101, television 102, or mobile phone 103), cause the electronic device to perform the steps as performed by the electronic device in the corresponding embodiment of fig. 7.
Further embodiments of the present application also provide a computer program product, which when run on a computer, causes the computer to perform the steps as performed by the electronic device (such as the sound box 101, the television 102, or the mobile phone 103 described above) in the corresponding embodiment of fig. 7.
Other embodiments of the present application further provide an apparatus having the function of implementing the behavior of the electronic device (such as the sound box 101, the television 102, or the mobile phone 103) in the embodiment corresponding to fig. 7. The functions may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions, for example, a receiving unit or module, a determining unit or module, a sending unit or module, and the like.
Through the above description of the embodiments, it is clear to those skilled in the art that, for convenience and simplicity of description, the foregoing division of the functional modules is merely used as an example, and in practical applications, the above function distribution may be completed by different functional modules according to needs, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the above described functions.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the above-described device embodiments are merely illustrative, and for example, the division of the modules or units is only one logical functional division, and there may be other divisions when actually implemented, for example, a plurality of units or components may be combined or may be integrated into another device, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may be one physical unit or a plurality of physical units, that is, may be located in one place, or may be distributed in a plurality of different places. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application, or the part thereof contributing to the prior art, or all or part of the technical solutions, may be embodied in the form of a software product. The software product is stored in a storage medium and includes several instructions for enabling a device (which may be a single-chip microcomputer, a chip, or the like) or a processor to execute all or part of the steps of the methods described in the embodiments of the present application. The foregoing storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
The above description is only an embodiment of the present application, but the scope of the present application is not limited thereto, and any changes or substitutions within the technical scope of the present disclosure should be covered by the scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (27)

1. A voice control method, applied to a voice control system, wherein the voice control system comprises: a set of devices and a server, the set of devices including at least a first electronic device having a voice control function, the method comprising:
the first electronic equipment receives first voice data of a user;
the first electronic equipment determines that the first voice data is the same as a wake-up word registered in the first electronic equipment, and sends energy information of the first voice data detected by the first electronic equipment to the server;
the server determines that the first electronic device wakes up according to the energy information of the first voice data detected by the first electronic device, and sends a first wake-up instruction to the first electronic device;
the first electronic equipment responds to the first awakening instruction and awakens a voice control function of the first electronic equipment;
the first electronic equipment which wakes up the voice control function receives second voice data of a user;
the first electronic equipment sends the second voice data to the server;
the server determines target electronic equipment from the group of equipment according to the second voice data, wherein the target electronic equipment has a function of executing the event corresponding to the second voice data;
the server sends a content indication to the target electronic equipment, wherein the content indication is an instruction corresponding to the second voice data, or the content indication is data required for executing an event corresponding to the second voice data;
and the target electronic equipment executes the event corresponding to the second voice data according to the content indication.
2. The method of claim 1, wherein the set of devices further comprises a second electronic device having voice control functionality, the method further comprising:
the second electronic equipment receives the first voice data of a user;
the second electronic equipment determines that the first voice data is the same as the wake-up word registered in the second electronic equipment, and sends energy information of the first voice data detected by the second electronic equipment to the server;
the server determines, according to the energy information of the first voice data detected by the first electronic device, that the first electronic device performs a wake-up response, and sends a first wake-up instruction to the first electronic device, where the method includes:
if the energy of the first voice data detected by the first electronic device is greater than the energy of the first voice data detected by the second electronic device, the server determines that the first electronic device wakes up according to the energy information of the first voice data detected by the first electronic device and the energy information of the first voice data detected by the second electronic device, and sends a first wake-up instruction to the first electronic device.
3. The method of claim 1 or 2, wherein the set of devices further comprises a third electronic device;
wherein the third electronic device does not have a voice control function; or
the third electronic device has a voice control function, but the distance between the third electronic device and the user is greater than the sound pickup distance of the third electronic device.
4. The method of any of claims 1-3, wherein voice control functionality of neither the first electronic device nor the second electronic device is woken up while receiving the first voice data.
5. The method according to any one of claims 1-4, further comprising:
the server sends a command response instruction to the first electronic device, wherein the command response instruction is used for instructing the first electronic device to prompt a user to execute an event corresponding to the second voice data by the target electronic device;
and the first electronic equipment prompts a user to execute an event corresponding to the second voice data by the target electronic equipment according to the command response instruction.
6. The method according to any of claims 1-5, wherein the server determines a target electronic device from the set of devices based on the second voice data, comprising:
the server selects equipment with a function of executing the event corresponding to the second voice data from the group of equipment according to the capability information of each equipment in the group of equipment and the second voice data;
if only one device in the group of devices has the function of executing the event corresponding to the second voice data, the server determines that the device is the target electronic device;
if a plurality of devices in the group of devices have the function of executing the event corresponding to the second voice data, the server determines one device from the plurality of devices as the target electronic device;
wherein the target electronic device is any one of the plurality of devices, or,
the target electronic device satisfies at least one of the following conditions:
the target electronic device is the device with the shortest distance to the user in the plurality of devices;
the target electronic device is in a powered-on state;
the target electronic device has not been determined, within a preset time, to execute an event corresponding to other voice data; or
the target electronic device is the device with the highest frequency of use among the plurality of devices.
7. The method of claim 6, further comprising:
each device in the group of devices reports respective capability information to the server;
the server stores capability information for each device in the set of devices.
8. The method according to any one of claims 1-7, further comprising:
the server sends a second wake-up instruction to the second electronic device, and the second electronic device determines, according to the second wake-up instruction, not to wake up the voice control function of the second electronic device; or
the second electronic device determines that the first wake-up instruction is not received within a preset time, and determines not to wake up the voice control function of the second electronic device.
9. A voice control method applied to a set of devices including at least a first electronic device and a second electronic device having a voice control function, the method comprising:
the first electronic equipment and the second electronic equipment respectively receive first voice data of a user;
the first electronic equipment determines that the first voice data is the same as a wake-up word registered in the first electronic equipment, and obtains energy information of the first voice data detected by the first electronic equipment;
the second electronic equipment determines that the first voice data is the same as a wake-up word registered in the second electronic equipment, and sends energy information of the first voice data detected by the second electronic equipment to the first electronic equipment;
the first electronic device determines a device for performing wake-up response from the first electronic device and the second electronic device according to the energy information of the first voice data detected by the first electronic device and the energy information of the first voice data detected by the second electronic device;
if the energy of the first voice data detected by the first electronic equipment is greater than the energy of the first voice data detected by the second electronic equipment, it is determined that the first electronic equipment performs the wake-up response; the first electronic equipment wakes up the voice control function of the first electronic equipment, and the first electronic equipment that has woken up the voice control function receives second voice data of the user;
the first electronic equipment determines target electronic equipment from the group of equipment according to the second voice data, and the target electronic equipment has a function of executing an event corresponding to the second voice data;
if the target electronic equipment is the first electronic equipment, the first electronic equipment analyzes the second voice data to obtain an instruction corresponding to the second voice data, and executes an event corresponding to the second voice data according to the instruction; or the first electronic device acquires data required for executing the event corresponding to the second voice data from a server, and executes the event corresponding to the second voice data according to the data;
if the target electronic device is not the first electronic device, the first electronic device sends a content indication to the target electronic device; the content indication is an instruction corresponding to the second voice data, or the content indication is data required for executing an event corresponding to the second voice data; and the target electronic equipment executes the event corresponding to the second voice data according to the content indication.
10. The method of claim 9, further comprising:
if the energy of the first voice data detected by the second electronic equipment is greater than the energy of the first voice data detected by the first electronic equipment, it is determined that the second electronic equipment performs the wake-up response; the first electronic equipment sends a first wake-up instruction to the second electronic equipment; the second electronic equipment, in response to the first wake-up instruction, wakes up the voice control function of the second electronic equipment; and the second electronic equipment that has woken up the voice control function receives the second voice data of the user and sends the second voice data to the first electronic equipment.
11. The method of claim 9 or 10, wherein the set of devices further comprises a third electronic device;
wherein the third electronic device does not have a voice control function; or
the third electronic device has a voice control function, but the distance between the third electronic device and the user is greater than the sound pickup distance of the third electronic device.
12. The method of any of claims 9-11, wherein voice control functionality of neither the first electronic device nor the second electronic device is woken up while receiving the first voice data.
13. The method according to any one of claims 9 to 12,
if the second electronic device is a device that responds to the wake up, the method further comprises: the first electronic device sends a command response instruction to the second electronic device, wherein the command response instruction is used for instructing the second electronic device to prompt a user to execute an event corresponding to the second voice data by the target electronic device; the second electronic equipment prompts a user to execute an event corresponding to the second voice data by the target electronic equipment according to the command response instruction; or
If the first electronic device is a device that responds to a wake up, the method further comprises: and the first electronic equipment prompts a user that the target electronic equipment executes the event corresponding to the second voice data.
14. The method according to any of claims 9-13, wherein the first electronic device determining a target electronic device from the set of devices based on the second speech data comprises:
the first electronic equipment selects equipment with a function of executing the event corresponding to the second voice data from the group of equipment according to the capability information of each equipment in the group of equipment and the second voice data;
if only one device in the group of devices has the function of executing the event corresponding to the second voice data, the first electronic device determines that the device is the target electronic device;
if a plurality of devices in the group of devices have a function of executing the event corresponding to the second voice data, the first electronic device determines one device from the plurality of devices as the target electronic device;
wherein the target electronic device is any one of the plurality of devices, or,
the target electronic device satisfies at least one of the following conditions:
the target electronic device is the device with the shortest distance to the user in the plurality of devices;
the target electronic device is in a powered-on state;
the target electronic device has not been determined, within a preset time, to execute an event corresponding to other voice data; or
the target electronic device is the device with the highest frequency of use among the plurality of devices.
15. The method of claim 14, further comprising:
each device except the first electronic device in the group of devices reports respective capability information to the first electronic device;
the first electronic device stores capability information for each device in the set of devices.
16. The method of any of claims 9-15, wherein if the first electronic device is a wake-up responding device, the method further comprises:
the first electronic equipment sends a second wake-up instruction to the second electronic equipment, and the second electronic equipment determines, according to the second wake-up instruction, not to wake up the voice control function of the second electronic equipment; or
the second electronic equipment determines that the first wake-up instruction is not received within a preset time, and determines not to wake up the voice control function of the second electronic equipment.
17. A voice control method is applied to a first electronic device with a voice control function, wherein the first electronic device is included in a set of devices, and the method comprises the following steps:
the first electronic equipment receives first voice data of a user;
the first electronic equipment determines that the first voice data is the same as a wake-up word registered in the first electronic equipment, and sends energy information of the first voice data detected by the first electronic equipment to a server;
the first electronic device receives a wake-up instruction sent by a server, wherein the wake-up instruction is sent after the server determines that the first electronic device performs wake-up response according to the energy information of the first voice data detected by the first electronic device;
the first electronic equipment responds to the awakening indication to awaken a voice control function of the first electronic equipment;
the first electronic equipment which wakes up the voice control function receives second voice data of a user;
the first electronic equipment sends the second voice data to the server;
the first electronic device receives a command response instruction sent by the server, wherein the command response instruction is used for instructing the first electronic device to prompt a user to execute an event corresponding to the second voice data by a target electronic device, and the target electronic device is a device which is determined by the server according to the second voice data and has a function of executing the event corresponding to the second voice data from the group of devices;
and the first electronic equipment prompts a user to execute an event corresponding to the second voice data by the target electronic equipment according to the command response instruction.
18. The method of claim 17, wherein the set of devices further comprises a second electronic device having voice control functionality,
the wake-up instruction is specifically sent by the server after the server determines, according to the energy information of the first voice data detected by the first electronic device and the energy information of the first voice data detected by the second electronic device, that the first electronic device performs the wake-up response, where the energy of the first voice data detected by the first electronic device is greater than the energy of the first voice data detected by the second electronic device.
19. The method of claim 17 or 18, wherein the set of devices further comprises a third electronic device;
wherein the third electronic device does not have a voice control function; or
the third electronic device has a voice control function, but the distance between the third electronic device and the user is greater than the sound pickup distance of the third electronic device.
20. The method of any of claims 17-19, wherein a voice control function of the first electronic device is not woken up while receiving the first voice data.
21. The method of any one of claims 17-20, wherein the target electronic device is the first electronic device, the method further comprising:
the first electronic equipment receives a content indication sent by the server, wherein the content indication is an instruction corresponding to the second voice data or data required for executing an event corresponding to the second voice data;
and the first electronic equipment executes the event corresponding to the second voice data according to the content indication.
22. An electronic device, comprising: one or more processors and memory;
the memory coupled with the one or more processors for storing computer program code comprising computer instructions which, when executed by the one or more processors, cause the electronic device to perform the voice control method of any of claims 17-21.
23. A computer storage medium comprising computer instructions that, when executed on an electronic device, cause the electronic device to perform the voice control method of any of claims 17-21.
24. A voice control system, comprising: a set of devices and a server, wherein the set of devices includes at least a first electronic device having a voice control function;
the first electronic device receives first voice data of a user;
the first electronic device determines that the first voice data is the same as a wake-up word registered in the first electronic device, and sends energy information of the first voice data detected by the first electronic device to the server;
the server determines, according to the energy information of the first voice data detected by the first electronic device, that the first electronic device is to be woken up, and sends a first wake-up instruction to the first electronic device;
the first electronic device wakes up its voice control function in response to the first wake-up instruction;
the first electronic device, after its voice control function is woken up, receives second voice data of the user;
the first electronic device sends the second voice data to the server;
the server determines a target electronic device from the set of devices according to the second voice data, wherein the target electronic device has a function of executing an event corresponding to the second voice data;
the server sends a content indication to the target electronic device, wherein the content indication is an instruction corresponding to the second voice data, or the content indication is data required for executing the event corresponding to the second voice data; and
the target electronic device executes the event corresponding to the second voice data according to the content indication.
25. The system of claim 24, wherein the set of devices further comprises a second electronic device having a voice control function;
the second electronic device receives the first voice data of the user;
the second electronic device determines that the first voice data is the same as a wake-up word registered in the second electronic device, and sends energy information of the first voice data detected by the second electronic device to the server; and
that the server determines, according to the energy information of the first voice data detected by the first electronic device, that the first electronic device performs the wake-up response and sends the first wake-up instruction to the first electronic device comprises:
if the energy of the first voice data detected by the first electronic device is greater than the energy of the first voice data detected by the second electronic device, the server determines, according to the energy information of the first voice data detected by the first electronic device and the energy information of the first voice data detected by the second electronic device, that the first electronic device is to be woken up, and sends the first wake-up instruction to the first electronic device.
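The server-side arbitration in claim 25 reduces to selecting the device that reported the higher energy for the same first voice data. A minimal sketch, assuming energy reports are collected as a mapping from device id to reported energy (the tie-breaking rule here is an assumption; the claims only address the strictly-greater case):

```python
def choose_wake_device(energy_reports):
    """Return the id of the device that should perform the wake-up
    response: the one whose detected energy of the first voice data is
    greatest, per claim 25. `energy_reports` maps device id -> energy.
    On a tie, max() keeps the first key encountered (an assumption)."""
    return max(energy_reports, key=energy_reports.get)
```

The device chosen here is the one the server would send the first wake-up instruction to.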
26. A voice control system, comprising a set of devices, the set of devices comprising at least a first electronic device and a second electronic device each having a voice control function;
the first electronic device and the second electronic device each receive first voice data of a user;
the first electronic device determines that the first voice data is the same as a wake-up word registered in the first electronic device, and obtains energy information of the first voice data detected by the first electronic device;
the second electronic device determines that the first voice data is the same as a wake-up word registered in the second electronic device, and sends energy information of the first voice data detected by the second electronic device to the first electronic device;
the first electronic device determines, according to the energy information of the first voice data detected by the first electronic device and the energy information of the first voice data detected by the second electronic device, which of the first electronic device and the second electronic device is to perform the wake-up response;
if the energy of the first voice data detected by the first electronic device is greater than the energy of the first voice data detected by the second electronic device, the first electronic device determines that it is to perform the wake-up response, wakes up its voice control function, and, after its voice control function is woken up, receives second voice data of the user;
the first electronic device determines a target electronic device from the set of devices according to the second voice data, wherein the target electronic device has a function of executing an event corresponding to the second voice data;
if the target electronic device is the first electronic device, the first electronic device parses the second voice data to obtain an instruction corresponding to the second voice data and executes the event corresponding to the second voice data according to the instruction, or the first electronic device acquires, from a server, data required for executing the event corresponding to the second voice data and executes the event according to the data; and
if the target electronic device is not the first electronic device, the first electronic device sends a content indication to the target electronic device, wherein the content indication is an instruction corresponding to the second voice data, or the content indication is data required for executing the event corresponding to the second voice data, and the target electronic device executes the event corresponding to the second voice data according to the content indication.
27. The system of claim 26, wherein:
if the energy of the first voice data detected by the second electronic device is greater than the energy of the first voice data detected by the first electronic device, the first electronic device determines that the second electronic device is to perform the wake-up response and sends a first wake-up instruction to the second electronic device; and the second electronic device wakes up its voice control function in response to the first wake-up instruction, and, after its voice control function is woken up, receives the second voice data of the user and sends the second voice data to the first electronic device.
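Claims 26-27 move the same energy comparison onto the first electronic device itself: it compares its own detection against the peer's report, wakes its own voice control function if its energy is greater, and otherwise sends the wake-up instruction to the peer. A hypothetical sketch of that local decision (function and return values are illustrative, not from the claims):

```python
def arbitrate_locally(own_energy, peer_energy):
    """Local wake-up arbitration on the first electronic device
    (claims 26-27): wake itself if its detected energy of the first
    voice data is the greater one; otherwise instruct the peer
    device to wake up and relay the second voice data."""
    if own_energy > peer_energy:
        return "wake_self"           # claim 26: first device wakes up
    return "send_wake_instruction"   # claim 27: peer performs the response
```

The equal-energy case is left to fall through to the peer here; the claims recite only the strictly-greater comparisons, so that choice is an assumption.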
CN202010990191.6A 2019-07-01 2019-07-01 Voice control method, electronic equipment and system Pending CN112289313A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010990191.6A CN112289313A (en) 2019-07-01 2019-07-01 Voice control method, electronic equipment and system

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010990191.6A CN112289313A (en) 2019-07-01 2019-07-01 Voice control method, electronic equipment and system
CN201910586437.0A CN110322878A (en) 2019-07-01 2019-07-01 A kind of sound control method, electronic equipment and system

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201910586437.0A Division CN110322878A (en) 2019-07-01 2019-07-01 A kind of sound control method, electronic equipment and system

Publications (1)

Publication Number Publication Date
CN112289313A true CN112289313A (en) 2021-01-29

Family

ID=68122308

Family Applications (2)

Application Number Title Priority Date Filing Date
CN201910586437.0A Pending CN110322878A (en) 2019-07-01 2019-07-01 A kind of sound control method, electronic equipment and system
CN202010990191.6A Pending CN112289313A (en) 2019-07-01 2019-07-01 Voice control method, electronic equipment and system

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN201910586437.0A Pending CN110322878A (en) 2019-07-01 2019-07-01 A kind of sound control method, electronic equipment and system

Country Status (2)

Country Link
CN (2) CN110322878A (en)
WO (1) WO2021000876A1 (en)


Families Citing this family (35)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110322878A (en) * 2019-07-01 2019-10-11 华为技术有限公司 A kind of sound control method, electronic equipment and system
CN110808042A (en) * 2019-10-12 2020-02-18 云知声智能科技股份有限公司 Voice interaction networking system and method
CN110718227A (en) * 2019-10-17 2020-01-21 深圳市华创技术有限公司 Multi-mode interaction based distributed Internet of things equipment cooperation method and system
CN110687815B (en) * 2019-10-29 2023-07-14 北京小米智能科技有限公司 Equipment control method, device, terminal equipment and storage medium
CN110890092B (en) * 2019-11-07 2022-08-05 北京小米移动软件有限公司 Wake-up control method and device and computer storage medium
CN111128150A (en) * 2019-11-27 2020-05-08 云知声智能科技股份有限公司 Method and device for awakening intelligent voice equipment
CN110910880B (en) * 2019-11-29 2022-05-10 广东美的厨房电器制造有限公司 Voice control method, system, device and storage medium
CN111105796A (en) * 2019-12-18 2020-05-05 杭州智芯科微电子科技有限公司 Wireless earphone control device and control method, and voice control setting method and system
CN111161714B (en) * 2019-12-25 2023-07-21 联想(北京)有限公司 Voice information processing method, electronic equipment and storage medium
CN111276139B (en) * 2020-01-07 2023-09-19 百度在线网络技术(北京)有限公司 Voice wake-up method and device
CN111367488B (en) * 2020-01-07 2023-08-22 百度在线网络技术(北京)有限公司 Voice equipment and interaction method, equipment and storage medium thereof
CN113098739B (en) * 2020-01-09 2023-05-23 博泰车联网科技(上海)股份有限公司 Method, apparatus and computer storage medium for information processing
CN111091829B (en) * 2020-02-21 2023-03-14 珠海荣邦电子科技有限公司 Voice control method and device and electronic equipment
CN113496701A (en) * 2020-04-02 2021-10-12 阿里巴巴集团控股有限公司 Voice interaction system, method, equipment and conference system
CN111667825A (en) * 2020-05-21 2020-09-15 四川虹美智能科技有限公司 Voice control method, cloud platform and voice equipment
CN111613221A (en) * 2020-05-22 2020-09-01 云知声智能科技股份有限公司 Nearby awakening method, device and system
CN111722824B (en) * 2020-05-29 2024-04-30 北京小米松果电子有限公司 Voice control method, device and computer storage medium
CN111640433A (en) * 2020-06-01 2020-09-08 珠海格力电器股份有限公司 Voice interaction method, storage medium, electronic equipment and intelligent home system
CN111640434A (en) * 2020-06-05 2020-09-08 三星电子(中国)研发中心 Method and apparatus for controlling voice device
EP4162698A1 (en) * 2020-06-08 2023-04-12 Sonos Inc. Control with distributed command processing
CN111724784A (en) 2020-06-28 2020-09-29 北京小米松果电子有限公司 Equipment control method and device
CN111883146A (en) * 2020-07-29 2020-11-03 上海茂声智能科技有限公司 Cross-platform distributed nearby wake-up method and device
CN112037789A (en) * 2020-08-07 2020-12-04 海尔优家智能科技(北京)有限公司 Equipment awakening method and device, storage medium and electronic device
CN111968641B (en) * 2020-08-20 2023-01-06 Oppo(重庆)智能科技有限公司 Voice assistant awakening control method and device, storage medium and electronic equipment
CN112781248B (en) * 2020-10-28 2022-11-15 青岛经济技术开发区海尔热水器有限公司 Voice control method and device for intelligent water heater, electronic equipment and storage medium
CN112164399A (en) * 2020-11-05 2021-01-01 佛山市顺德区美的电子科技有限公司 Voice equipment and interaction control method and device thereof and storage medium
CN114582337A (en) * 2020-12-01 2022-06-03 华为技术有限公司 Equipment response method and device
CN112929724B (en) * 2020-12-31 2022-09-30 海信视像科技股份有限公司 Display device, set top box and far-field pickup awakening control method
CN112837694B (en) * 2021-01-29 2022-12-06 青岛海尔科技有限公司 Equipment awakening method and device, storage medium and electronic device
US11449149B2 (en) 2021-02-03 2022-09-20 Google Llc Assistant device arbitration using wearable device data
CN115083400A (en) * 2021-03-10 2022-09-20 Oppo广东移动通信有限公司 Voice assistant awakening method and device
CN113380257A (en) * 2021-06-08 2021-09-10 深圳市同行者科技有限公司 Multi-terminal smart home response method, device, equipment and storage medium
CN114110912A (en) * 2021-11-08 2022-03-01 珠海格力电器股份有限公司 Voice distributed recognition method combined with PLC
CN116805488A (en) * 2022-03-18 2023-09-26 华为技术有限公司 Multi-equipment voice control system and method
CN114639384B (en) * 2022-05-16 2022-08-23 腾讯科技(深圳)有限公司 Voice control method and device, computer equipment and computer storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170148307A1 (en) * 2015-11-24 2017-05-25 Samsung Electronics Co., Ltd Electronic device and method for controlling the electronic device
CN107452386A (en) * 2017-08-16 2017-12-08 联想(北京)有限公司 A kind of voice data processing method and system
CN107665710A (en) * 2016-07-27 2018-02-06 上海博泰悦臻网络技术服务有限公司 Mobile terminal sound data processing method and device
CN107919119A (en) * 2017-11-16 2018-04-17 百度在线网络技术(北京)有限公司 Method, apparatus, equipment and the computer-readable medium of more equipment interaction collaborations
CN108520746A (en) * 2018-03-22 2018-09-11 北京小米移动软件有限公司 The method, apparatus and storage medium of voice control smart machine
CN109377987A (en) * 2018-08-31 2019-02-22 百度在线网络技术(北京)有限公司 Exchange method, device, equipment and the storage medium of intelligent sound equipment room

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9812126B2 (en) * 2014-11-28 2017-11-07 Microsoft Technology Licensing, Llc Device arbitration for listening devices
US9996316B2 (en) * 2015-09-28 2018-06-12 Amazon Technologies, Inc. Mediation of wakeword response for multiple devices
EP3455853A2 (en) * 2016-05-13 2019-03-20 Bose Corporation Processing speech from distributed microphones
CN107622767B (en) * 2016-07-15 2020-10-02 青岛海尔智能技术研发有限公司 Voice control method of household appliance system and household appliance control system
US10546583B2 (en) * 2017-08-30 2020-01-28 Amazon Technologies, Inc. Context-based device arbitration
US20190172452A1 (en) * 2017-12-06 2019-06-06 GM Global Technology Operations LLC External information rendering
CN108259280B (en) * 2018-02-06 2020-07-14 北京语智科技有限公司 Method and system for realizing indoor intelligent control
CN109391528A (en) * 2018-08-31 2019-02-26 百度在线网络技术(北京)有限公司 Awakening method, device, equipment and the storage medium of speech-sound intelligent equipment
CN110322878A (en) * 2019-07-01 2019-10-11 华为技术有限公司 A kind of sound control method, electronic equipment and system


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115079810A (en) * 2021-03-10 2022-09-20 Oppo广东移动通信有限公司 Information processing method and device, main control equipment and controlled equipment
CN113096656A (en) * 2021-03-30 2021-07-09 深圳创维-Rgb电子有限公司 Terminal device awakening method and device and computer device
CN113096658A (en) * 2021-03-31 2021-07-09 歌尔股份有限公司 Terminal equipment, awakening method and device thereof and computer readable storage medium
CN113421559A (en) * 2021-06-01 2021-09-21 荣耀终端有限公司 Control method based on voice awakening, electronic equipment and controller
WO2023020076A1 (en) * 2021-08-18 2023-02-23 青岛海尔科技有限公司 Device wake-up method
CN113689857A (en) * 2021-08-20 2021-11-23 北京小米移动软件有限公司 Voice collaborative awakening method and device, electronic equipment and storage medium
CN113689857B (en) * 2021-08-20 2024-04-26 北京小米移动软件有限公司 Voice collaborative wake-up method and device, electronic equipment and storage medium
US12008993B2 (en) 2021-08-20 2024-06-11 Beijing Xiaomi Mobile Software Co., Ltd. Voice collaborative awakening method and apparatus, electronic device and storage medium
WO2023075160A1 (en) * 2021-10-27 2023-05-04 삼성전자 주식회사 Method of identifying target device on basis of utterance reception and electronic device therefor
WO2024088046A1 (en) * 2022-10-28 2024-05-02 华为技术有限公司 Device control method and electronic device

Also Published As

Publication number Publication date
CN110322878A (en) 2019-10-11
WO2021000876A1 (en) 2021-01-07

Similar Documents

Publication Publication Date Title
WO2021000876A1 (en) Voice control method, electronic equipment and system
CN111369988A (en) Voice awakening method and electronic equipment
CN111742361B (en) Method for updating wake-up voice of voice assistant by terminal and terminal
CN111262975B (en) Bright screen control method, electronic device, computer-readable storage medium, and program product
CN110730114B (en) Method and equipment for configuring network configuration information
CN112312366B (en) Method, electronic equipment and system for realizing functions through NFC (near field communication) tag
CN110742580A (en) Sleep state identification method and device
CN111835907A (en) Method, equipment and system for switching service across electronic equipment
WO2022161077A1 (en) Speech control method, and electronic device
CN114422340A (en) Log reporting method, electronic device and storage medium
CN112334977B (en) Voice recognition method, wearable device and system
CN111865646A (en) Terminal upgrading method and related device
CN113676339B (en) Multicast method, device, terminal equipment and computer readable storage medium
CN114490174A (en) File system detection method, electronic device and computer readable storage medium
CN114221402A (en) Charging method and device of terminal equipment and terminal equipment
CN113467735A (en) Image adjusting method, electronic device and storage medium
CN109285563B (en) Voice data processing method and device in online translation process
CN115119336B (en) Earphone connection system, earphone connection method, earphone, electronic device and readable storage medium
CN113467747B (en) Volume adjusting method, electronic device and storage medium
CN114120987B (en) Voice wake-up method, electronic equipment and chip system
CN115731923A (en) Command word response method, control equipment and device
CN114116610A (en) Method, device, electronic equipment and medium for acquiring storage information
CN115714890A (en) Power supply circuit and electronic device
CN115525366A (en) Screen projection method and related device
CN115480250A (en) Voice recognition method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination