CN111833863A - Voice control system, method and apparatus, and computing device and storage medium - Google Patents


Info

Publication number
CN111833863A
Authority
CN
China
Prior art keywords
voice
intelligent
decision
devices
feature data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910325459.1A
Other languages
Chinese (zh)
Other versions
CN111833863B (en)
Inventor
韩翀蛟
罗奎
章伟明
陈宣雍
刁宏锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201910325459.1A priority Critical patent/CN111833863B/en
Publication of CN111833863A publication Critical patent/CN111833863A/en
Application granted granted Critical
Publication of CN111833863B publication Critical patent/CN111833863B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 Execution procedure of a spoken command

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

A voice control system, method and apparatus, and a computing device and storage medium are disclosed. The voice control method includes: acquiring feature data from at least two smart devices, the feature data being used to determine the distance between each smart device and a voice speaker; determining, based on the feature data, the distance, or a parameter capable of characterizing the distance, between each of the at least two smart devices and the voice speaker; determining voice control decisions corresponding to the at least two smart devices based on the distance or parameter; and sending control instructions corresponding to the voice control decisions to the at least two smart devices. In this way, a unique response to a voice within the pickup range of at least two smart devices is achieved through control, which solves the user-experience problem caused by multiple smart devices being woken up simultaneously.

Description

Voice control system, method and apparatus, and computing device and storage medium
Technical Field
The present disclosure relates to the field of internet technologies, and in particular, to a voice control system, method, and apparatus, a computing device, and a storage medium.
Background
With the rapid development of information technology, more and more intelligent voice devices are being put into use; a user can wake a device with a simple spoken wake-up, which brings great convenience to daily life. However, when the user is within the pickup range of multiple intelligent voice devices, all of them may capture the user's voice and be woken up to interact with the user, so that their response audio overlaps and the user experience suffers.
Therefore, there is a need for an improved voice control strategy to address the above-mentioned problems.
Disclosure of Invention
The present disclosure aims to provide a voice control system, method and apparatus that solve the user-experience problem caused by multiple intelligent voice devices being woken up at the same time.
According to a first aspect of the present disclosure, there is provided a voice control method comprising: acquiring feature data from at least two smart devices, the feature data being used to determine the distance between each smart device and a voice speaker; determining, based on the feature data, the distance, or a parameter capable of characterizing the distance, between each of the at least two smart devices and the voice speaker; determining voice control decisions corresponding to the at least two smart devices based on the distance or parameter; and sending control instructions corresponding to the voice control decisions to the at least two smart devices.
Optionally, the voice control decision may include: allowing the smart device closest to the voice speaker to be woken up; or prohibiting the smart devices other than the one closest to the voice speaker from being woken up.
Optionally, the method may further include: taking a smart device whose feature data has been received and to which no wake-up-prohibiting control instruction has been sent as the first smart device; acquiring feature data of a second smart device, different from the first, that responds to the same voice; sending a wake-up-prohibiting control instruction to whichever of the first and second smart devices is farther from the voice speaker; and taking whichever of the two is closer to the voice speaker as the new first smart device.
Optionally, when a predetermined condition is satisfied, a wake-up control instruction is sent to the first smart device.
Optionally, the method further comprises: for the same voice, after the wake-up control instruction has been sent to the first smart device, sending a wake-up-prohibiting control instruction to any other smart device whose feature data is subsequently acquired.
Optionally, the predetermined condition comprises at least one of: for the same voice, a preset decision time having elapsed since the moment the feature data sent by the first smart device was received; for the same voice, no feature data having been received from any smart device within a predetermined period.
Optionally, the method is performed by a server and/or a decision terminal, the decision terminal being one of a plurality of smart devices and able to communicate with the other smart devices; the server communicates with the plurality of smart devices over a wireless communication network, and/or the plurality of smart devices communicate over a local area network.
Optionally, the server and/or the decision terminal each obtain feature data from at least two smart devices and send control instructions corresponding to the voice control decisions to the at least two smart devices, and each smart device executes whichever control instruction it receives first.
Optionally, the method may further include: sending a device information table to the plurality of smart devices, the device information table including device information of the smart device designated as the decision terminal.
Optionally, the feature data may comprise at least one of: a wake-up energy value calculated based on the voice; a confidence of the wake-up word in the collected voice; an image of the user captured while the voice was collected; distance data between the user and the smart device detected while the voice was collected; and WiFi channel state information of the smart device.
According to a second aspect of the present disclosure, there is also provided a voice control method including: acquiring feature data from at least two smart devices; determining an association relationship between each of the at least two smart devices and a voice speaker based on the feature data; determining voice control decisions corresponding to the at least two smart devices based on the association relationship; and sending a control instruction corresponding to the voice control decision to at least one smart device.
Optionally, the voice control decision may include: allowing the smart device corresponding to the strongest association relationship to be woken up; or prohibiting smart devices other than the one corresponding to the strongest association relationship from being woken up.
Optionally, the method is executed by a decision end; where the decision end includes a decision terminal, the decision terminal is one of a plurality of smart devices and can communicate with the other smart devices; where the decision end includes a server, the server communicates with the plurality of smart devices over a wireless communication network; and/or the plurality of smart devices communicate over a local area network.
Optionally, the decision end may include a server and/or a decision terminal, in which case the server and/or the decision terminal obtain feature data from at least two smart devices and send a control instruction corresponding to the voice control decision to at least one smart device, and each smart device executes whichever control instruction it receives first.
Optionally, the association relationship comprises the distance of the smart device relative to the voice speaker, or a parameter capable of characterizing that distance.
Optionally, the feature data comprises at least one of: a wake-up energy value calculated based on the voice; a confidence of the wake-up word in the collected voice; an image of the voice speaker captured while the voice was collected; distance data between the voice speaker and the smart device detected while the voice was collected; and WiFi channel state information of the smart device.
According to a third aspect of the present disclosure, there is also provided a voice control method applied to a smart device, the method including: collecting a voice uttered by a user; sending feature data corresponding to the voice to a server and, where the smart device is not the designated decision terminal, also sending the feature data to the decision terminal, the feature data being used to determine the distance between the smart device and the voice speaker; receiving control instructions from the server and/or the decision terminal; and executing the control instruction received first.
Optionally, where the smart device is the designated decision terminal, the method further includes: obtaining feature data from at least two smart devices, the feature data being used to determine the distance between each smart device and the user; determining, based on the feature data, the distance, or a parameter capable of characterizing the distance, between each of the at least two smart devices and the voice speaker; determining voice control decisions corresponding to the at least two smart devices based on the distance or parameter; and sending control instructions corresponding to the voice control decisions to the at least two smart devices.
Optionally, the method may further include: waking up in response to receiving a wake-up control instruction; and/or refraining from waking up in response to receiving a wake-up-prohibiting control instruction.
Optionally, the method may further include: waking up autonomously if no control instruction is received within a predetermined wait period after the feature data is sent.
Optionally, the method may further include: updating the predetermined wait period based on the waiting time elapsed from sending the feature data to receiving the control instruction.
Optionally, the decision terminal is one of a plurality of smart devices and can communicate with the other smart devices; the smart devices communicate with the server over a wireless communication network, and/or the plurality of smart devices communicate over a local area network.
Optionally, the feature data comprises at least one of: a wake-up energy value calculated based on the voice; a confidence of the wake-up word in the collected voice; an image of the voice speaker captured while the voice was collected; distance data between the voice speaker and the smart device detected while the voice was collected; and WiFi channel state information of the smart device.
Optionally, the method may further include: filtering the collected voice and calculating a wake-up energy value based on the filtered voice as the feature data.
Optionally, the method may further include: receiving a device information table that includes device information of the smart device designated as the decision terminal.
According to a fourth aspect of the present disclosure, there is also provided a voice control method applied to a smart device, the method including: collecting a voice uttered by a user; sending feature data corresponding to the voice to a decision end, the feature data being used to determine an association relationship between the smart device and the user; receiving a control instruction from the decision end, the control instruction corresponding to a voice control decision determined by the decision end based on the association relationship; and executing the received control instruction.
Optionally, the decision end includes a server and/or a decision terminal; the smart device sends the feature data corresponding to the voice to the server and/or the decision terminal, and/or the smart device receives control instructions from the server and/or the decision terminal and executes the control instruction received first.
Optionally, the smart device communicates with the server over a wireless communication network, and/or the smart device communicates with the decision terminal over a local area network.
Optionally, the decision terminal is one of a plurality of smart devices and can communicate with the other smart devices; where the smart device is itself the decision terminal, the method further includes: acquiring feature data from at least two smart devices; determining an association relationship between each of the at least two smart devices and the user based on the feature data; determining voice control decisions corresponding to the at least two smart devices based on the association relationship; and sending a control instruction corresponding to the voice control decision to at least one smart device.
Optionally, the plurality of smart devices belong to the same device group, and the method further includes: receiving a device information table that includes device information of the smart device designated as the decision terminal.
Optionally, the voice control decision comprises: allowing the smart device corresponding to the strongest association relationship to be woken up; and/or prohibiting smart devices other than the one corresponding to the strongest association relationship from being woken up.
Optionally, the association relationship comprises the distance of the smart device relative to the voice speaker, or a parameter capable of characterizing that distance.
Optionally, the feature data comprises at least one of: a wake-up energy value calculated based on the voice; a confidence of the wake-up word in the collected voice; an image of the user captured while the voice was collected; distance data between the user and the smart device detected while the voice was collected; and WiFi channel state information of the smart device.
According to a fifth aspect of the present disclosure, there is also provided a voice control system including a decision end and a plurality of smart devices, the decision end being able to communicate with the plurality of smart devices. A smart device collects a voice uttered by a user and sends feature data corresponding to the voice to the decision end. The decision end acquires feature data from at least two smart devices, determines an association relationship between each of the at least two smart devices and the user based on the feature data, determines voice control decisions corresponding to the at least two smart devices based on the association relationship, and sends a control instruction corresponding to the voice control decision to at least one smart device. The smart device receives the control instruction from the decision end and executes it.
According to a sixth aspect of the present disclosure, there is also provided a voice control system including a server and a plurality of smart devices, the server being able to communicate with the plurality of smart devices, one of which is designated as a decision terminal able to communicate with the other smart devices. A smart device collects a voice uttered by a user, sends feature data corresponding to the voice to the server and, where the smart device is not the decision terminal, also to the decision terminal; the feature data can be used to determine the distance between the smart device and the user. The server and the decision terminal acquire feature data from at least two smart devices, determine based on the feature data the distance, or a parameter capable of characterizing the distance, between each of the at least two smart devices and the user, determine voice control decisions corresponding to the at least two smart devices based on the distance or parameter, and send control instructions corresponding to the voice control decisions to the at least two smart devices. Each smart device receives control instructions from the server and/or the decision terminal and executes the one received first.
According to a seventh aspect of the present disclosure, there is also provided a voice control apparatus comprising: first acquisition means for acquiring feature data from at least two smart devices, the feature data being used to determine the distance between each smart device and the user; first distance means for determining, based on the feature data, the distance, or a parameter capable of characterizing the distance, between each of the at least two smart devices and the voice speaker; first decision means for determining voice control decisions corresponding to the at least two smart devices based on the distance or parameter; and first communication means for sending control instructions corresponding to the voice control decisions to the at least two smart devices.
According to an eighth aspect of the present disclosure, there is also provided a voice control apparatus comprising: first acquisition means for acquiring feature data from at least two smart devices; first distance means for determining an association relationship between each of the at least two smart devices and the voice speaker based on the feature data; first decision means for determining voice control decisions corresponding to the at least two smart devices based on the association relationship; and first communication means for sending a control instruction corresponding to the voice control decision to at least one smart device.
According to a ninth aspect of the present disclosure, there is also provided a voice control apparatus comprising: voice collection means for collecting a voice uttered by a user; second communication means for sending feature data corresponding to the voice to a server and, where the smart device is not the designated decision terminal, also to the decision terminal, the feature data being used to determine the distance between the smart device and the voice speaker; third communication means for receiving control instructions from the server and/or the decision terminal; and control sub-means for executing the control instruction received first.
According to a tenth aspect of the present disclosure, there is also provided a voice control apparatus comprising: voice collection means for collecting a voice uttered by a user; second communication means for sending feature data corresponding to the voice to a decision end, the feature data being used to determine an association relationship between the smart device and the user; third communication means for receiving a control instruction from the decision end, the control instruction corresponding to a voice control decision determined by the decision end based on the association relationship; and control sub-means for executing the received control instruction.
According to an eleventh aspect of the present disclosure, there is also provided a computing device, comprising: a processor; and a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method as described above.
According to a twelfth aspect of the present disclosure, there is also presented a non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method as described above.
In this way, with the voice control scheme of the present disclosure, only one of at least two smart devices that receive the same voice is woken up to interact with the user, which solves the problem of poor user experience caused by multiple smart devices responding and broadcasting simultaneously.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent by describing in greater detail exemplary embodiments thereof with reference to the attached drawings, in which like reference numerals generally represent like parts throughout.
FIG. 1 shows a schematic diagram of a voice control system according to one embodiment of the present disclosure.
FIG. 2 illustrates a voice-controlled communication diagram according to one embodiment of the present disclosure.
FIG. 3 shows a voice control flow diagram according to one embodiment of the present disclosure.
FIG. 4 shows a flow diagram of a voice control method according to one embodiment of the present disclosure.
FIG. 5 shows a flow diagram of a voice control method according to one embodiment of the present disclosure.
FIG. 6 shows a schematic diagram of a voice-controlled device according to one embodiment of the present disclosure.
FIG. 7 shows a schematic diagram of a voice-controlled device according to one embodiment of the present disclosure.
FIG. 8 shows a schematic structural diagram of a computing device according to one embodiment of the invention.
Detailed Description
Preferred embodiments of the present disclosure will be described in more detail below with reference to the accompanying drawings. While the preferred embodiments of the present disclosure are shown in the drawings, it should be understood that the present disclosure may be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the disclosure to those skilled in the art.
As mentioned above, when a user is within the pickup range of multiple intelligent voice devices, all of them may capture the user's voice and instruction and be woken up, and each then plays a voice response to the user, so that the devices' response audio overlaps and the user experience suffers.
In view of this, the present disclosure provides a voice control scheme under which only one of at least two smart devices that receive the same voice is woken up to interact with the user, solving the problem of poor user experience caused by multiple smart devices responding and broadcasting simultaneously.
Before describing the voice control scheme of the present disclosure, a voice control system and a voice control mechanism for implementing the scheme will be described with reference to the drawings and the embodiments.
FIG. 1 shows a schematic diagram of a voice control system according to one embodiment of the present disclosure. FIG. 2 illustrates a voice-controlled communication diagram according to one embodiment of the present disclosure.
As shown in FIG. 1, the voice control system 10 of the present disclosure may include a decision end 11 and a plurality of smart devices (12-1, 12-2, 12-3, ..., 12-n).
The decision end 11 can communicate with the plurality of smart devices 12, for example via a router 13.
A smart device 12 collects the voice uttered by the user and sends feature data corresponding to the voice to the decision end 11. The decision end 11 may obtain feature data from at least two smart devices, determine an association relationship between each of the at least two smart devices and the voice speaker based on the feature data, determine voice control decisions corresponding to the at least two smart devices based on the association relationship, and send a control instruction corresponding to the voice control decision to at least one smart device. The smart device 12 then receives the control instruction from the decision end and executes it.
The voice speaker who utters the voice may be a user or a terminal device, which is not limited in this disclosure.
The feature data may be any data that can be used to determine the association relationship between the smart device that collected the voice and the voice speaker.
The feature data can take various forms and can be obtained by different techniques (described in detail below). For example, the feature data may comprise a wake-up energy value calculated from the voice; a confidence of the wake-up word in the collected voice; an image of the voice speaker captured while the voice was collected; distance data between the voice speaker and the smart device detected while the voice was collected; or WiFi channel state information of the smart device, among others. In a practical application scenario, a suitable technique may be selected as needed to obtain the required feature data, which this disclosure does not limit.
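As a purely illustrative sketch, a feature report of this kind could be serialized as a small JSON message; the message format and every field name below are assumptions for illustration, not part of the disclosure.

import json
import time

# Hypothetical feature-data report a smart device might send to the decision
# end after picking up a wake-up utterance; all field names are illustrative.
def build_feature_report(device_id, wake_energy, wake_word_confidence):
    report = {
        "device_id": device_id,                       # reporting device
        "utterance_ts": time.time(),                  # when the voice was heard
        "wake_energy": wake_energy,                   # energy computed from the voice
        "wake_word_confidence": wake_word_confidence  # wake-word confidence
        # devices with extra hardware could add e.g. "ir_distance_m",
        # "wifi_csi", or a reference to a captured user image
    }
    return json.dumps(report)

print(build_feature_report("speaker-kitchen", 0.42, 0.91))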
The association relationship is the relationship between the smart device that collected the voice and the voice speaker. It may be characterized by a predetermined parameter or value, whose magnitude corresponds to the strength of the association.
For example, in embodiments of the present disclosure, the association relationship may include the distance of the smart device relative to the voice speaker, or a parameter capable of characterizing that distance. In one embodiment, the smallest distance value, or smallest distance-characterizing parameter value, may be taken to correspond to the strongest association, and the largest to the weakest.
The voice control decision may be determined according to the association relationship. For example, when the association relationship corresponds to distance, the voice control decision may include allowing the smart device corresponding to the strongest association to be woken up, or prohibiting the smart devices other than that one from being woken up.
It should be understood that these association relationships and the corresponding voice control decisions are illustrative examples rather than limitations. In other embodiments, the association relationship may also be, for example, an affiliation, control, or instruction relationship between the smart device and the voice speaker, and the corresponding voice control decision may include, for example, forwarding a control instruction to a child device. Details are not repeated here.
The decision end 11 may be a server, a decision terminal (also referred to as a master device), or both, which this disclosure does not limit.
The server can communicate with the plurality of smart devices, for example over a wireless network, and so can the decision terminal. In one embodiment, the decision terminal and the plurality of smart devices may also communicate over a local area network, which avoids delays, or data and instructions failing to arrive in time, when wireless network conditions are poor. As shown in FIG. 1, a server and/or decision terminal acting as the decision end may communicate with the plurality of smart devices via the router 13. It should be understood that in embodiments of this disclosure the smart devices may also communicate with one another; details are not repeated here.
In one embodiment, the decision terminal may be one of the plurality of smart devices. The server may designate one of them as the decision terminal and send a device information table to the plurality of smart devices, the table including device information of the smart device designated as the decision terminal. In another embodiment, the decision terminal may be a device with stronger processing capability that is associated with the plurality of smart devices (e.g., within the same predetermined range, or belonging to the same network). In different application scenarios the decision device may be set as needed, which this disclosure does not limit.
In one embodiment, the server and/or the decision terminal may each obtain feature data from at least two smart devices and send a control instruction corresponding to the voice control decision to at least one smart device, with each smart device executing whichever control instruction it receives first. This helps guarantee that the voice control decision reaches the smart devices, reducing delay and avoiding situations such as no device responding because of poor communication quality.
For a better understanding of the voice control scheme of this disclosure, the following description takes as an example the case where both the server and the decision terminal act as decision makers and the association relationship between a smart device and the voice speaker corresponds to the distance between them. It should be understood that this embodiment is illustrative rather than restrictive, and the details below also apply when the server or the decision terminal makes decisions alone.
Under this example, and referring to the communication diagram shown in FIG. 2, the voice control system of this disclosure may include a server 11-1 and a plurality of smart devices (12-1, 12-2, 12-3, ..., 12-n), with which the server 11-1 can communicate via a router 13.
In one embodiment, one of the plurality of smart devices, such as smart device 12-1, may be designated as the decision terminal, and the decision terminal 12-1 can communicate with the other smart devices via the router 13. For example, the server may designate the decision terminal and send a device information table to the plurality of smart devices; the table includes at least the device information of the smart device designated as the decision terminal, so that each smart device knows to which device it should send its feature data.
Referring to FIG. 2, the smart devices (including smart device 12-1 designated as the decision terminal) collect the voice uttered by the user and, in step 2 of FIG. 2, send feature data corresponding to the voice to the server 11-1. The feature data can be used to determine an association relationship (e.g., a distance, or a parameter that can characterize distance) between each smart device and the user.
Where a smart device is the decision terminal 12-1, i.e., the master device, it can make the voice control decision itself based on the feature data. Where a smart device is not the decision terminal, e.g., a slave device such as smart device 12-2, 12-3, ..., or 12-n, it sends its feature data to the decision terminal so that the decision terminal can make the voice control decision.
The server 11-1 and the smart device 12-1 can both serve as decision ends performing voice control according to embodiments of this disclosure, making voice control decisions based on the feature data received from the smart devices.
Specifically, the server 11-1 and the smart device 12-1 each acquire feature data from at least two smart devices, determine based on the feature data the distance, or a parameter capable of characterizing the distance, between each of the at least two smart devices and the voice speaker, determine voice control decisions corresponding to the at least two smart devices based on the distance or parameter, and send a control instruction corresponding to the voice control decision to at least one smart device.
For example, in step 3 of FIG. 2 the server sends to each smart device the control instruction corresponding to its voice control decision: a wake-up instruction ("true") or a wake-up-prohibiting instruction ("false"). In step 4 of FIG. 2 the decision terminal 12-1 sends the control instruction corresponding to its voice control decision, likewise "true" or "false", to the other smart devices through the router 13. The decision terminal can also issue a "true" or "false" control instruction to itself.
The smart devices 12-2, 12-3, ..., 12-n receive control instructions from the server and/or the decision terminal and execute whichever instruction arrives first. Thus, if either communication path (the wireless network or the local area network) has a problem, the voice control decision carried over the other path can still reach the smart devices, reducing delay and avoiding situations such as no device responding.
In one embodiment, the plurality of smart devices may be devices on which the nearby wake-up function has been enabled. After the user enables this function, the server may establish a nearby wake-up device group, designating one smart device as the decision terminal and the others as slave devices. The server may designate the decision terminal at random, or by comparing the IP or MAC information of the smart devices, and so on, which this disclosure does not limit.
The device information of the smart devices can be recorded in the device information table corresponding to the device group. Referring to step 1 in FIG. 2, the server may send the device group's device information (including but not limited to the device group ID and each smart device's IP, device identifier, MAC address, device role, etc.) to every smart device in the group.
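For illustration only, such a device information table might be represented as follows; the exact wire format and field values are assumptions, while the fields themselves (group ID, IP, device identifier, MAC address, device role) are those named above.

# Hypothetical device information table for a nearby wake-up device group.
device_info_table = {
    "group_id": "home-group-01",
    "devices": [
        {"device_id": "speaker-a", "ip": "192.168.1.10",
         "mac": "AA:BB:CC:00:00:01", "role": "decision_terminal"},
        {"device_id": "speaker-b", "ip": "192.168.1.11",
         "mac": "AA:BB:CC:00:00:02", "role": "slave"},
    ],
}

# On receipt, each device parses the table to learn its own role and where
# to send feature data (the device whose role is decision_terminal).
def find_decision_terminal(table):
    return next(d for d in table["devices"] if d["role"] == "decision_terminal")

print(find_decision_terminal(device_info_table)["ip"])  # -> 192.168.1.10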
In another embodiment, the device information table may instead contain only the device information of the smart device designated as the decision terminal; referring to step 1 in FIG. 2, the server may send this table to the plurality of smart devices in advance.
After receiving the device information table sent by the server, each smart device (e.g., smart devices 12-1, 12-2, 12-3, ..., 12-n) parses it to determine its own device role (decision terminal or slave device), or at least to learn which smart device is designated as the decision terminal, so that it knows to which smart device the feature data used for voice control decisions should be sent.
In one embodiment, local communication among the smart devices may use a LAN-based mechanism in which information is exchanged between devices through a router; such exchange is faster than uplink and downlink transmission to the server.
In a preferred embodiment, the LAN communication may adopt a dual TCP/UDP server/client mechanism, with the decision terminal acting as the local TCP/UDP server and the other, slave, devices acting as clients.
When the local area network is normal, the decision terminal and the slave devices exchange information over TCP connections, and the decision terminal (acting as server) periodically checks via TCP heartbeat whether the connections are healthy. When a TCP connection is found to be abnormal, the decision terminal and the slave devices fall back to exchanging UDP single packets plus multicast. This dual mechanism keeps inter-device communication working as far as possible under different network states, so that the decision terminal's control decisions can be delivered to the other smart devices in time.
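A minimal sketch of such a dual send path, assuming Python sockets and a hypothetical multicast address (the disclosure does not specify a wire protocol):

import socket
from typing import Optional

MULTICAST_GROUP = ("239.255.0.1", 5007)  # hypothetical LAN multicast address

def send_decision(payload: bytes, tcp_conn: Optional[socket.socket]) -> None:
    # Normal path: the persistent TCP connection between the decision
    # terminal (server role) and a slave device (client role).
    if tcp_conn is not None:
        try:
            tcp_conn.sendall(payload)
            return
        except OSError:
            pass  # connection found abnormal (e.g. heartbeat failure)
    # Fallback path: UDP single packet + multicast on the local network.
    udp = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    udp.setsockopt(socket.IPPROTO_IP, socket.IP_MULTICAST_TTL, 1)
    udp.sendto(payload, MULTICAST_GROUP)
    udp.close()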
In this way, communication between the decision terminal and the other smart devices involves only the local area network, so information exchange is fast and the smart devices receive the voice control decision from the decision terminal sooner, shortening the user's wait. Moreover, with the dual server-plus-decision-terminal decision mechanism, when the local area network suffers from congestion, packet loss, or other local communication problems, the voice control decision made on the server side can still reach the smart devices, guaranteeing that a device is woken up.
In one embodiment, the voice control decision of this disclosure may be made based on the association relationship between the voice speaker and the smart devices that collected the user's voice, with the server and the decision terminal adopting a "decide immediately, issue immediately" policy during the decision process for a given voice.
For example, the server and the decision terminal can make a decision as soon as they have received feature data from more than one smart device: they compare the distances between each of these smart devices and the voice speaker based on the received feature data and immediately send a wake-up-prohibiting control instruction to the smart device farther from the voice speaker.
Further, the server and the decision terminal treat as the first smart device the device whose feature data has been received and to which no wake-up-prohibiting instruction has been sent. Thereafter, each time feature data for the same voice is acquired from another smart device (a second smart device different from the first), they send a wake-up-prohibiting instruction to whichever of the first and second smart devices is farther from the user and take the closer of the two as the new first smart device, and so on. In the end, for the same voice, only the single smart device closest to the user is allowed to wake up, which guarantees the uniqueness of the wake-up device and solves the poor experience caused by multiple smart devices waking simultaneously.
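A minimal sketch of this pairwise decision loop follows, assuming for illustration that a larger wake-up energy value indicates a smaller distance to the voice speaker (any of the distance-characterizing parameters described below could be substituted):

from typing import Callable, Optional, Tuple

class ImmediateDecision:
    # Sketch of the decide-immediately, issue-immediately policy for one
    # voice: keep the reporter judged closest so far as the first device and
    # immediately forbid whichever device of each compared pair is farther.
    def __init__(self, send: Callable[[str, str], None]):
        self.send = send                        # send(device_id, instruction)
        self.first: Optional[Tuple[str, float]] = None

    def on_feature_data(self, device_id: str, wake_energy: float) -> None:
        # Assumption for illustration: larger wake-up energy means closer.
        if self.first is None or wake_energy > self.first[1]:
            if self.first is not None:
                self.send(self.first[0], "false")  # forbid the farther one
            self.first = (device_id, wake_energy)  # new first smart device
        else:
            self.send(device_id, "false")

    def finalize(self) -> None:
        # Invoked once the predetermined condition described below is met
        # (preset decision time reached, or no new feature data for a
        # predetermined period); only the first device is allowed to wake.
        if self.first is not None:
            self.send(self.first[0], "true")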
Thus, for the multiple smart devices that collected the same voice, the corresponding voice control decisions are issued at different times, which reduces the network congestion that issuing all decisions at once could cause. A smart device that will not be allowed to wake receives its prohibition instruction as early as possible and can handle it promptly, without a long wait.
So that a smart device can be woken in time to interact with the user after the voice is uttered, in one embodiment the deciding server and decision terminal send the wake-up control instruction to the first smart device once a predetermined condition is met during the decision process for the same voice. In this way, for one decision on the same voice, only one smart device is eventually woken up.
The predetermined condition may include at least one of: for the same voice, a preset decision time having elapsed since the moment the feature data sent by the first smart device was received; for the same voice, no feature data having been received from any smart device within a predetermined period.
The preset decision time, predetermined period, and so on may be delay durations preset for a specific application scenario, for example 200 ms, so that a smart device can be woken to interact with the user without the user perceiving a wake-up delay after speaking. This is especially suitable for smart devices with high real-time requirements on wake-up feedback, such as smart speakers.
It should be understood that the preset decision time and/or predetermined period may differ across application scenarios, and in practice may also be updated based on, for example, upgrades and/or iteration of the network or devices, which this disclosure does not limit.
After receiving a control instruction from the server and/or the decision terminal, a smart device responds accordingly: it wakes up in response to a wake-up control instruction, or refrains from waking in response to a wake-up-prohibiting instruction. The awakened smart device can then interact with the user, for example broadcasting "I'm here, go ahead", and can go on to collect the user's further speech and respond to it.
As mentioned above, embodiments of this disclosure may use a dual decision mechanism based on a server and a decision terminal, so the smart device side may receive control instructions from the server and/or the decision terminal. In one embodiment, the smart device executes whichever control instruction it receives first, which guarantees a unique response to the control instructions. Moreover, when one kind of network communication (e.g., the wireless network or the LAN) has problems such as a poor network state or congestion, the decision instruction carried over the other kind (the LAN or the wireless network) can still reach the smart device in time, avoiding situations where no smart device responds.
To avoid the smart device failing to respond because, owing to the network or other causes, no control instruction ever arrives, in one embodiment the smart device wakes up autonomously if it receives no control instruction within a predetermined wait period (e.g., 500 ms) after sending its feature data.
The smart device side may be provided with a timing module that is started when the device that collected the voice sends its feature data to the server and/or the decision terminal. If the timer expires without a control decision arriving from the local decision terminal or from the server, the smart device wakes up autonomously and enters its response flow, so that the case of no smart device responding is avoided.
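A device-side sketch of this timer, assuming control instructions arrive on an in-process queue and using the 500 ms example above as the default wait period:

import queue

def wait_for_instruction(inbox: "queue.Queue[str]", wait_s: float = 0.5) -> str:
    # After sending feature data, the device acts on whichever control
    # instruction (from the server or the decision terminal) arrives first;
    # if the timer expires with nothing received, it wakes up autonomously.
    try:
        return inbox.get(timeout=wait_s)  # first instruction wins
    except queue.Empty:
        return "true"                     # timeout: wake up and respond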
The predetermined wait period may be preset according to experience or to network conditions, and may also be made to adapt to network conditions so that the smart device that collected the voice responds faster; this disclosure does not limit it.
In one embodiment, the smart device may monitor current network conditions: when a network status parameter (e.g., network transmission speed) is above a first threshold, conditions are good and the predetermined wait period may be reduced appropriately; when the parameter is at or below a second threshold (smaller than the first), conditions are poor and the wait period may be increased appropriately. The rules for decreasing or increasing the wait period can be set as needed and are not detailed here.
In one embodiment, the smart device may update the predetermined wait period based on a wait elapsed time from the transmission of the feature data (or from the receipt of the voice) to the receipt of the control instruction.
Specifically, when the waiting time Tc for receiving the server's voice control decision is greater than the current predetermined wait period Tw, indicating that network conditions are tending to degrade, Tw may be updated using the following formula (1):
Tw(n)=K*Tw(n-1)+(1.0-K)*Tc(n) (1)
where n denotes the current wake-up and n-1 the previous one; Tw(n-1) is the predetermined wait period as updated after the previous voice control decision was received; Tc(n) is the waiting time consumed by the current wake-up; Tw(n) is the predetermined wait period as updated after the current voice control decision is received; and K is the update smoothing parameter used when decision latency increases, with value range (0.0, 1.0).
When the waiting time Tc for receiving the server's voice control decision is smaller than the current predetermined wait period Tw, indicating that network conditions are tending to improve, Tw may be updated using the following formula (2):
Tw(n)=L*Tw(n-1)+(1.0-L)*Tc(n) (2)
where L is the update smoothing parameter used when decision latency decreases, with value range (0.0, 1.0); the other parameters have the meanings given above.
In a preferred embodiment, K may take the value 0.92 and L the value 0.75. The larger K lengthens the wait period Tw cautiously, in small steps, when the network shows a trend of deterioration, while the smaller L shortens Tw in larger steps so as to react faster when the network tends to improve.
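The update rule can be written directly from formulas (1) and (2); the sketch below uses the preferred K and L values, with times in milliseconds:

def update_wait_period(tw_prev: float, tc: float,
                       k: float = 0.92, l: float = 0.75) -> float:
    # Formula (1): latency grew, lengthen Tw cautiously (large K, small step).
    if tc > tw_prev:
        return k * tw_prev + (1.0 - k) * tc
    # Formula (2): latency shrank, shorten Tw faster (small L, larger step).
    return l * tw_prev + (1.0 - l) * tc

tw = 500.0                          # current predetermined wait period, ms
tw = update_wait_period(tw, 300.0)  # latest decision arrived after 300 ms
print(tw)                           # 450.0: Tw moves part way toward Tc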
As previously mentioned, the voice control decisions of the present disclosure may be made based on the distance between the user and the smart device that collects the user's voice.
Specifically, the server and the decision terminal may each obtain feature data from at least two smart devices, determine based on the feature data the distance, or a parameter capable of characterizing the distance, between each of the at least two smart devices and the user, determine voice control decisions corresponding to the at least two smart devices based on the distance or parameter, and send the corresponding control instruction to at least one smart device. The voice control decision may include, but is not limited to, allowing the smart device closest to the user to be woken up, or prohibiting the devices farther from the user from being woken up.
In this way, through the control of the server and the decision terminal, the one smart device closest to the user among the at least two that collected the same voice can be woken up while the other devices are prohibited from waking, which guarantees a unique wake-up among the at least two smart devices and solves the poor user experience caused by multiple smart devices waking and broadcasting to the user at the same time.
As noted above, in embodiments of this disclosure the feature data may be acquired in different ways; that is, different techniques may be used to obtain feature data from which the distance between the user and a smart device is determined.
In one embodiment, the feature data may be a wake-up energy value calculated from the collected voice, computed by methods such as root mean square (RMS) or a windowed square value.
As an example, using the root mean square to compute the wake-up energy value, the calculation can be based on the following formula (3):

RMS = sqrt( (1/N) * sum_{n=0}^{N-1} x(n)^2 )    (3)

where N is the frame length of the voice data and x(n) is a voice data point.
As another example, computing the wake-up energy value using a windowed square value may be based on the following formula (4):

E = (1/N) * sum_{n=0}^{N-1} ( w(n) * x(n) )^2    (4)

where N is the frame length of the voice data, x(n) is a voice data point, and w(n) is a commonly used window function for voice data, such as a Hanning or Hamming window.
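A sketch of both energy calculations with NumPy; the Hanning window in the windowed variant is one of the window functions named above:

import numpy as np

def rms_energy(frame: np.ndarray) -> float:
    # Wake-up energy as the root mean square of one frame, formula (3).
    return float(np.sqrt(np.mean(frame ** 2)))

def windowed_energy(frame: np.ndarray) -> float:
    # Windowed-square variant of formula (4), here with a Hanning window.
    w = np.hanning(len(frame))
    return float(np.mean((w * frame) ** 2))

frame = np.sin(2 * np.pi * 440 * np.arange(512) / 16000)  # toy 440 Hz frame
print(rms_energy(frame), windowed_energy(frame))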
In addition, the feature data may be the confidence of the wake-up word in the collected voice. The wake-up word confidence can, to a certain degree, characterize the distance between the user and the smart device: a higher confidence indicates that the smart device is closer to the user, and a lower confidence that it is farther away; details are not repeated here.
Alternatively, the feature data may be distance data between the user and the smart device detected while the voice was collected. For example, the smart device may be fitted with an infrared sensing unit; when voice is collected, infrared detection is used to obtain the distance between the user and the smart device, and this distance serves as the feature data.
Alternatively, the feature data may be the WiFi channel state information (CSI) of the smart device that collected the voice. Specifically, CSI data may, for example, be gathered via the smart device's WiFi antenna, then analyzed, identified and classified to determine the distance between the smart device and the user.
Alternatively, the feature data may be an image of the user captured while the voice was collected. This applies to smart devices fitted with a camera: when such a device collects voice, it can activate the camera to capture a real-time image of the user, and the image can be analyzed with image processing techniques to determine the distance between the user and the smart device.
Alternatively, high-frequency acoustic reflection detection may be used to measure the distance between the user and a smart speaker. Specifically, the device's loudspeaker can play high-frequency sound waves imperceptible to the human ear, the device's microphone collects the amount of reflected high-frequency sound, and the distance between the user and the speaker is judged from that amount: a large reflection indicates the device is close to the user, and a small one that it is far away. The smart device closest to the user can thus be selected, from among the devices that collected the same voice, to respond to the wake-up.
It should be understood that the above-described feature data, and/or the technique by which the feature data is obtained, is an example and not a limitation of the present disclosure for determining the distance between the user and the smart device or a parameter of determining the distance, and the present disclosure may determine the distance between the user and the smart device by, but not limited to, the above-described manner.
In addition, because different intelligent devices have different device types, the configured modules may be different, and the heights, layouts, shelters, and the like of different devices are different, which may cause differences in the acquired feature data, thereby affecting the accuracy of the distance determined based on the feature data or the related parameters characterizing the distance. In one embodiment, the present disclosure may also reduce data collection differences as much as possible by making corrections to the collected speech or other relevant data.
Taking the correction of the voice collected by the smart device as an example, the smart device may filter the collected voice and calculate the wake-up energy value from the filtered voice as the feature data.
Specifically, for example, a large corpus of wake-word material picked up by the microphones (mics for short) of smart devices of different models, and by mics from different production batches, may be comparatively analyzed to determine the frequency bands in which acquisition differences concentrate. A filter can then be designed to remove the bands in which the wake-word material differs strongly and keep the bands with small differences for the energy calculation; the filter coefficients may be obtained, for example, with the MATLAB fdatool design utility.
After filtering, acquisition differences caused by the mics are attenuated to some extent, so the wake-up energy value computed from the filtered voice can better distinguish wake words uttered at different distances from the sound source, improving the distance resolution of the wake-word energy (which can be understood as the minimum inter-device distance at which the distance comparison can still be made successfully).
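For illustration, the following Python sketch approximates the correction just described, with scipy standing in for the MATLAB fdatool design step; the passband below is an assumed low-difference band, not the band the inventors actually measured.

import numpy as np
from scipy.signal import butter, lfilter

SAMPLE_RATE = 16000
LOW_HZ, HIGH_HZ = 300.0, 3000.0  # assumed band in which mic-to-mic variance is small

def wake_energy(voice: np.ndarray) -> float:
    """Band-pass the captured wake word, then compute its energy as the feature data."""
    b, a = butter(4, [LOW_HZ, HIGH_HZ], btype="bandpass", fs=SAMPLE_RATE)
    filtered = lfilter(b, a, voice)
    return float(np.sum(filtered ** 2))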
In other embodiments, the feature data obtained by the other techniques may be corrected analogously, with corresponding processing to reduce acquisition differences, which is not described again here.
In this way, the distance between each smart device and the user who uttered the voice, or a parameter characterizing that distance, can be determined from the feature data, and the server and the decision terminal can make the voice control decision based on that distance or parameter.
So far, the voice wake-up system and the voice wake-up mechanism of the present disclosure have been described in detail with reference to fig. 1-2.
FIG. 3 shows a voice control flow diagram according to one embodiment of the present disclosure. The voice control scheme of the present disclosure is described below with the user as the voice speaker, the smart device 12-1 as the local decision terminal, and the smart devices 12-2 and 12-n as local slave devices.
As shown in fig. 3, in step S301, a user within the pickup range of the plurality of smart devices utters a voice, expecting to wake up one nearby smart device. The voice includes a wake-up word capable of waking the smart devices, for example "tianmaoling".
In step S302, the plurality of smart devices can each acquire the voice.
In step S303, the smart device that collects the same voice can perform feature extraction on the collected voice or other information to obtain related feature data corresponding to the voice.
In steps S304 and S305, each smart device sends the extracted feature data to the decision makers capable of making voice control decisions: the server and the decision terminal. When the smart device that collected the voice is the decision terminal itself, step S305 may be omitted.
In steps S306 and S307, the server and the decision terminal each make a voice control decision. Because both receive the feature data corresponding to the same voice, the voice control decisions they make are consistent. Details of the decision are given in the related description above and are not repeated here.
In steps S308 and S309, the server and the decision terminal send voice control instructions: for example, a control instruction prohibiting wake-up is sent to the smart devices farther from the user, and a wake-up control instruction is sent to the smart device closest to the user.
In step S310, each smart device receives the control instructions from the server and the decision terminal and executes whichever instruction it receives first. For example, if a wake-up control instruction arrives first, the device wakes up in response to it; if an instruction prohibiting wake-up arrives first, the device refrains from waking up in response to it.
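For illustration, the device-side rule of step S310 might be sketched as follows; the instruction format and the per-channel callback wiring are assumptions of this sketch.

import threading

class CommandArbiter:
    """Executes whichever control instruction arrives first; later ones are ignored."""

    def __init__(self):
        self._lock = threading.Lock()
        self._decided = False

    def on_instruction(self, instruction: str) -> None:
        # Called from both the server channel and the decision-terminal channel.
        with self._lock:
            if self._decided:
                return  # an earlier instruction already won the race
            self._decided = True
        if instruction == "wake":
            print("waking up and responding to the user")
        else:
            print("staying silent (wake-up prohibited)")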
Therefore, under the voice control scheme of the present disclosure, exactly one of the at least two smart devices that received the same voice is woken up to interact with the user, which solves the poor user experience caused by multiple smart devices responding and broadcasting simultaneously. Moreover, the dual-decision mechanism based on a server and a decision terminal means that if one communication path fails, the decision maker on the other path can still make the voice control decision, avoiding the situation where no smart device responds because of network problems and the like. Quick response of the smart devices is further ensured by mechanisms such as the devices' automatic wake-up and the adaptive adjustment of the preset waiting period based on network conditions.
In addition, the strategy of deciding immediately and sending prohibition instructions at once reduces the network congestion that concentrated issuing of decisions would cause, and delivers the prohibition instruction to the devices that will not be woken up as early as possible, reducing their waiting time and avoiding unnecessary waste of resources.
FIG. 4 shows a flow diagram of a voice control method according to one embodiment of the present disclosure. FIG. 6 shows a schematic diagram of a voice-controlled device according to one embodiment of the present disclosure. The method can be executed by the server shown in fig. 1 or an intelligent device designated as a decision terminal, and can also be executed by the voice control apparatus 600 shown in fig. 6.
As shown in fig. 4, in step S410, feature data from at least two smart devices may be acquired, for example, by the first acquiring means 610 shown in fig. 6. Wherein the feature data is usable to determine a distance between the smart device and a voice speaker.
In step S420, a distance or a parameter capable of characterizing the distance between each of the at least two smart devices and the voice utterer is determined based on the feature data, for example, by the first distance device 620 shown in fig. 6.
In step S430, the voice control decisions corresponding to the at least two smart devices may be determined based on the distances or parameters, for example by the first decision means 630 shown in fig. 6. The voice control decision may include allowing the smart device closest to the voice speaker to be woken up and prohibiting the smart devices other than that closest device from being woken up.
In step S440, control instructions corresponding to their voice control decisions may be sent to the at least two smart devices, for example by the first communication means 640 shown in fig. 6.
Due to network delay and the like, the server and the decision terminal do not receive the feature data from the at least two smart devices at the same time. In an embodiment of the disclosure, the server and the decision terminal can therefore decide and issue instructions immediately as feature data arrives. Specifically, the decision maker takes as the "first smart device" a smart device whose feature data has been received and to which no instruction prohibiting wake-up has been sent. Whenever it obtains feature data that a second, different smart device produced in response to the same voice, it compares the distances (or distance-characterizing parameters) of the first and second smart devices to the voice speaker, sends a control instruction prohibiting wake-up to whichever of the two is farther from the user, and takes the closer one as the new first smart device. Finally, when a predetermined condition is met, it sends a wake-up control instruction to the first smart device. The predetermined condition includes at least one of: for the same voice, a preset decision time has elapsed since the moment the feature data sent by the first smart device was received; or, for the same voice, no feature data from any smart device has been received for a predetermined period.
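A minimal sketch of this rolling decision follows; send_prohibit and send_wake stand for the real control-instruction channels and are assumptions of this sketch.

class IncrementalDecider:
    """Keeps the closest device seen so far and prohibits every loser immediately."""

    def __init__(self, send_prohibit, send_wake):
        self.best = None  # (device_id, distance) of the current first smart device
        self.send_prohibit = send_prohibit
        self.send_wake = send_wake

    def on_feature_data(self, device_id: str, distance: float) -> None:
        if self.best is None:
            self.best = (device_id, distance)
        elif distance < self.best[1]:
            self.send_prohibit(self.best[0])  # old winner is farther: prohibit it
            self.best = (device_id, distance)
        else:
            self.send_prohibit(device_id)     # newcomer is farther: prohibit it

    def on_window_end(self) -> None:
        # Predetermined condition met: decision time reached, or no new feature data.
        if self.best is not None:
            self.send_wake(self.best[0])

decider = IncrementalDecider(
    send_prohibit=lambda dev: print("prohibit ->", dev),
    send_wake=lambda dev: print("wake ->", dev))
decider.on_feature_data("12-2", 2.4)
decider.on_feature_data("12-1", 1.1)  # "12-2" is prohibited immediately
decider.on_window_end()               # "12-1" receives the wake-up instruction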
In addition, for the same voice, if feature data from other smart devices is acquired after the wake-up control instruction has been sent to the first smart device, a control instruction prohibiting wake-up may be sent to those other devices. In this way, once a smart device has been woken up, no further devices are woken up, avoiding repeated wake-ups.
In one embodiment, the decision terminal is one of the plurality of smart devices and can communicate with the other smart devices. The server and the smart devices may communicate over a wireless communication network, and the smart devices may communicate with one another over a local area network.
In one embodiment, the server may send a device information table to the plurality of smart devices, the table including the device information of the smart device designated as the decision terminal, so that each device knows which of them holds that role. In other embodiments, the plurality of smart devices may belong to the same device group, and the table may include the device information of all of them, such as device ID, device IP, device MAC and device role. After receiving the device information table, a smart device can parse it to obtain its own device role.
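For illustration, the device information table might be serialized as JSON as sketched below; the disclosure names the fields (device ID, IP, MAC, role) but not the wire format, so the schema and values here are assumptions.

import json

DEVICE_TABLE_JSON = """
{
  "group": "living-room",
  "devices": [
    {"id": "12-1", "ip": "192.168.1.21", "mac": "AA:BB:CC:00:00:01", "role": "decision_terminal"},
    {"id": "12-2", "ip": "192.168.1.22", "mac": "AA:BB:CC:00:00:02", "role": "member"}
  ]
}
"""

def find_decision_terminal(table_json: str) -> str:
    """Parse the table and return the device ID designated as the decision terminal."""
    table = json.loads(table_json)
    for device in table["devices"]:
        if device["role"] == "decision_terminal":
            return device["id"]
    raise LookupError("no decision terminal designated in the table")

print(find_decision_terminal(DEVICE_TABLE_JSON))  # -> 12-1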
In other embodiments, in step S410 above, feature data from at least two smart devices may be acquired; in step S420, an association relationship between each of the at least two smart devices and the voice speaker is determined based on the feature data; in step S430, the voice control decisions corresponding to the at least two smart devices are determined based on the association relationships; and in step S440, a control instruction corresponding to its voice control decision may be sent to at least one smart device.
The voice control decision may include enabling the smart device corresponding to the strongest association relationship to be woken up, or prohibiting the smart devices other than that device from being woken up.
In an embodiment of the present disclosure, the method may be executed by a decision terminal that is one of the plurality of smart devices and can communicate with the other smart devices; the server communicates with the plurality of smart devices over a wireless communication network, and/or the plurality of smart devices communicate over a local area network.
In an embodiment of the present disclosure, the decision end includes a server and/or a decision terminal; the server and/or the decision terminal acquires feature data from at least two smart devices and sends a control instruction corresponding to the voice control decision to at least one smart device, and the smart device executes whichever control instruction it receives first.
In an embodiment of the present disclosure, the association relationship includes the distance of the smart device relative to the voice speaker, or a parameter capable of characterizing that distance.
In an embodiment of the present disclosure, the feature data includes at least one of: a wake-up energy value calculated from the voice; the confidence of the wake-up word in the collected voice; an image of the voice speaker captured when the voice was collected; distance data between the voice speaker and the smart device detected when the voice was collected; and WiFi channel state information of the smart device.
FIG. 5 shows a flow diagram of a voice control method according to one embodiment of the present disclosure. FIG. 7 shows a schematic diagram of a voice-controlled device according to one embodiment of the present disclosure. The method shown in fig. 5 may be executed by the smart device shown in fig. 1, or may be executed by the voice control apparatus 700 shown in fig. 7.
As shown in fig. 5, in step S510, for example, the voice collecting device 710 shown in fig. 7 may collect the voice uttered by the voice utterer.
In step S520, feature data corresponding to the voice may be sent to a server, for example by the second communication device 720 shown in fig. 7, and also sent to the decision terminal when the smart device is not itself the designated decision terminal; the feature data is used to determine the distance between the smart device and the user.
In step S530, a control instruction from the server and/or the decision terminal may be received, for example, by the third communication device 730 shown in fig. 7.
In step S540, the control sub-apparatus 740 shown in fig. 7, for example, may execute whichever control instruction is received first.
In one embodiment, when the smart device is the designated decision terminal, it may obtain feature data from at least two smart devices (the feature data being usable to determine the distance between a smart device and the user), determine from the feature data the distance, or a distance-characterizing parameter, between each of those devices and the user, determine the voice control decisions corresponding to those devices based on that distance or parameter, and send a control instruction corresponding to the voice control decision to at least one smart device.
In one embodiment, the smart device wakes up in response to receiving a wake-up control instruction, and refrains from waking up in response to receiving a control instruction prohibiting wake-up.
In one embodiment, the smart device can wake up automatically if it receives no control instruction within a preset waiting period after sending the feature data.
In one embodiment, the smart device may update the preset waiting period based on the elapsed waiting time from sending the feature data to receiving the control instruction.
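For illustration, one plausible update rule is sketched below; the exponential moving average and the 1.5x safety margin are assumptions of this sketch, since the disclosure specifies only that the waiting period is updated from observed send-to-receive times.

class WaitTimer:
    """Adapts the preset waiting period from observed send-to-receive times."""

    def __init__(self, initial_wait_s: float = 0.5):
        self.wait_s = initial_wait_s  # auto-wake after this much silence
        self._avg_rtt = initial_wait_s

    def on_instruction_received(self, elapsed_s: float) -> None:
        # elapsed_s: time from sending the feature data to receiving the command.
        self._avg_rtt = 0.8 * self._avg_rtt + 0.2 * elapsed_s
        self.wait_s = 1.5 * self._avg_rtt  # margin so normal replies arrive in time

    def should_auto_wake(self, waited_s: float) -> bool:
        return waited_s >= self.wait_s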
In one embodiment, the intelligent device and the server can communicate based on a wireless communication network, and a plurality of intelligent devices can communicate based on a local area network.
In one embodiment, the feature data may include at least one of: a wake-up energy value calculated from the voice; the confidence of the wake-up word in the collected voice; an image of the user captured when the voice was collected; distance data between the user and the smart device detected when the voice was collected; and WiFi channel state information of the smart device.
In one embodiment, the smart device may filter the collected voice and use the wake-up energy value calculated from the filtered voice as the feature data.
In one embodiment, the smart device can receive a device information table that includes the device information of the smart device designated as the decision terminal, and can parse the table to obtain its own device role or the device information of the designated decision terminal.
In other embodiments, in step S510, the voice uttered by the user is collected; feature data corresponding to the voice is sent to a decision end, the feature data being used to determine an association relationship between the smart device and the user; a control instruction is received from the decision end, the instruction corresponding to the voice control decision determined by the decision end based on that association relationship; and the received control instruction is executed.
The association relationship may include the distance of the smart device relative to the voice speaker, or a parameter capable of characterizing that distance. The voice control decision may include enabling the smart device corresponding to the strongest association relationship to be woken up, and/or prohibiting the smart devices other than that device from being woken up.
The decision end may include a server and/or a decision terminal: the smart device sends the feature data corresponding to the voice to the server and/or the decision terminal, and/or receives control instructions from the server and/or the decision terminal and executes whichever it receives first. The smart device and the server communicate over a wireless communication network, and/or the smart device and the decision terminal communicate over a local area network.
In an embodiment of the present disclosure, the decision terminal may be one of the plurality of smart devices and may communicate with the other smart devices. When the smart device is the decision terminal, the method further includes: acquiring feature data from at least two smart devices; determining an association relationship between each of the at least two smart devices and the user based on the feature data; determining the voice control decisions corresponding to the at least two smart devices based on the association relationships; and sending a control instruction corresponding to its voice control decision to at least one smart device.
Optionally, the plurality of smart devices may belong to the same device group, and the method further includes: a device information table is received that includes device information for a smart device designated as a decision terminal.
Optionally, the feature data includes at least one of: a wake-up energy value calculated from the voice; the confidence of the wake-up word in the collected voice; an image of the user captured when the voice was collected; distance data between the user and the smart device detected when the voice was collected; and WiFi channel state information of the smart device.
Details of the implementation of the voice control method and/or the operations performed by the voice control apparatus shown in fig. 4-7 can be referred to the above description, and are not repeated herein.
FIG. 8 shows a schematic structural diagram of a computing device according to one embodiment of the invention.
Referring to fig. 8, computing device 800 includes memory 810 and processor 820.
The processor 820 may be a multi-core processor or may include multiple processors. In some embodiments, processor 820 may include a general-purpose host processor and one or more special coprocessors such as a Graphics Processor (GPU), a Digital Signal Processor (DSP), or the like. In some embodiments, processor 820 may be implemented using custom circuitry, such as an Application Specific Integrated Circuit (ASIC) or a Field Programmable Gate Array (FPGA).
The memory 810 may include various types of storage units, such as system memory, read-only memory (ROM), and permanent storage. The ROM may store static data or instructions needed by the processor 820 or other modules of the computer. The permanent storage device may be a readable and writable storage device, and may be non-volatile so that stored instructions and data are not lost even after the computer is powered off. In some embodiments, a mass storage device (e.g., a magnetic or optical disk, or flash memory) is employed as the permanent storage; in other embodiments, the permanent storage may be a removable storage device (e.g., a floppy disk or optical drive). The system memory may be a readable and writable volatile memory, such as dynamic random access memory, and may store the instructions and data that some or all of the processors require at runtime. In addition, the memory 810 may include any combination of computer-readable storage media, including various types of semiconductor memory chips (DRAM, SRAM, SDRAM, flash memory, programmable read-only memory) and magnetic and/or optical disks. In some embodiments, the memory 810 may include a readable and/or writable removable storage device, such as a compact disc (CD), a read-only digital versatile disc (e.g., DVD-ROM or dual-layer DVD-ROM), a read-only Blu-ray disc, an ultra-density optical disc, a flash memory card (e.g., SD card, mini SD card, Micro-SD card), a magnetic floppy disk, or the like. Computer-readable storage media do not contain carrier waves or transitory electronic signals transmitted wirelessly or by wire.
The memory 810 stores executable code which, when executed by the processor 820, causes the processor 820 to perform the voice control methods described above.
The voice control method, apparatus and system according to the present invention have been described in detail above with reference to the accompanying drawings.
Furthermore, the method according to the invention may also be implemented as a computer program or computer program product comprising computer program code instructions for carrying out the above-mentioned steps defined in the above-mentioned method of the invention.
Alternatively, the invention may also be embodied as a non-transitory machine-readable storage medium (or computer-readable storage medium, or machine-readable storage medium) having stored thereon executable code (or a computer program, or computer instruction code) which, when executed by a processor of an electronic device (or computing device, server, etc.), causes the processor to perform the steps of the above-described method according to the invention.
Those of skill in the art would further appreciate that the various illustrative logical blocks, modules, circuits, and algorithm steps described in connection with the disclosure herein may be implemented as electronic hardware, computer software, or combinations of both.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems and methods according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein.

Claims (41)

1. A method for voice control, the method comprising:
acquiring feature data from at least two intelligent devices, wherein the feature data is used for determining the distance between the intelligent devices and a voice speaker;
determining, based on the feature data, a distance or a parameter capable of characterizing the distance between each of the at least two smart devices and the voice utterer;
determining voice control decisions corresponding to the at least two smart devices based on the distance or parameter; and
sending control instructions corresponding to the voice control decisions of the at least two smart devices.
2. The method of claim 1, wherein the voice control decision comprises:
enabling the smart device closest to the voice speaker to be woken up; or
causing the smart devices other than the smart device closest to the voice speaker to be prohibited from being woken up.
3. The method of claim 2, further comprising:
taking, as a first smart device, a smart device whose feature data has been received and to which a control instruction prohibiting wake-up has not been sent;
acquiring feature data of a second smart device, different from the first smart device, responding to the same voice; and
sending a control instruction prohibiting wake-up to whichever of the first smart device and the second smart device is farther from the voice speaker, and taking whichever of the two is closer to the voice speaker as a new first smart device.
4. The method of claim 3, further comprising:
sending a wake-up control instruction to the first smart device when a predetermined condition is met.
5. The method of claim 4, further comprising:
for the same voice, after the wake-up control instruction has been sent to the first smart device, sending a control instruction prohibiting wake-up to any other smart device whose feature data is acquired.
6. The method of claim 3, wherein the predetermined condition comprises at least one of:
for the same voice, a preset decision time has elapsed since the moment the feature data sent by the first smart device was received;
for the same voice, no feature data from any smart device has been received within a predetermined period.
7. The method according to claim 1, characterized in that the method is performed by a server and/or a decision terminal, the decision terminal being one of a plurality of intelligent devices, the decision terminal being capable of communicating with other intelligent devices,
the server and the plurality of intelligent devices communicate based on a wireless communication network; and/or
The plurality of intelligent devices communicate based on a local area network.
8. The method of claim 7,
the server and/or the decision terminal acquires feature data from at least two smart devices and sends control instructions corresponding to the voice control decisions of the at least two smart devices, and each smart device executes whichever control instruction it receives first.
9. The method of claim 7, further comprising:
sending a device information table to the plurality of smart devices, the device information table including device information for smart devices designated as decision terminals.
10. The method of claim 1, wherein the feature data comprises at least one of:
a wake-up energy value calculated from the voice;
a confidence of the wake-up word in the collected voice;
an image of the voice speaker captured when the voice is collected;
distance data between the voice speaker and the smart device detected when the voice is collected; and
WiFi channel state information of the smart device.
11. A method for voice control, the method comprising:
acquiring feature data from at least two intelligent devices;
determining an association relationship between each of the at least two smart devices and a voice speaker based on the feature data;
determining voice control decisions corresponding to the at least two smart devices based on the association; and
sending, to at least one smart device, a control instruction corresponding to its voice control decision.
12. The method of claim 11, wherein the voice control decision comprises:
enabling the smart device corresponding to the strongest association relationship to be woken up; or
prohibiting the smart devices other than the smart device corresponding to the strongest association relationship from being woken up.
13. The method of claim 11, wherein the method is performed by a decision terminal, wherein the decision terminal is one of a plurality of smart devices, wherein the decision terminal is capable of communicating with other smart devices,
the server and the plurality of intelligent devices communicate based on a wireless communication network; and/or
The plurality of intelligent devices communicate based on a local area network.
14. The method according to claim 13, wherein the decision end comprises a server and/or a decision terminal, and wherein
the server and/or the decision terminal acquires feature data from at least two smart devices and sends a control instruction corresponding to its voice control decision to at least one smart device, and the smart device executes whichever control instruction it receives first.
15. The method of claim 11, wherein
the association relationship comprises a distance of the smart device relative to the voice speaker; or
the association relationship comprises a parameter capable of characterizing the distance of the smart device relative to the voice speaker.
16. The method of claim 11, wherein the feature data comprises at least one of:
a wake-up energy value calculated from the voice;
a confidence of the wake-up word in the collected voice;
an image of the voice speaker captured when the voice is collected;
distance data between the voice speaker and the smart device detected when the voice is collected; and
WiFi channel state information of the smart device.
17. A voice control method is applied to intelligent equipment and comprises the following steps:
collecting the voice sent by a voice sender;
sending feature data corresponding to the voice to a server and, when the smart device is not a designated decision terminal, sending the feature data to the decision terminal, wherein the feature data is used to determine the distance between the smart device and the voice speaker;
receiving a control instruction from the server and/or the decision terminal;
executing whichever control instruction is received first.
18. The method of claim 17, wherein in the case where the smart device is a designated decision terminal, the method further comprises:
acquiring feature data from at least two smart devices, the feature data being used to determine the distance between the smart devices and the voice speaker;
determining, based on the feature data, a distance or a parameter capable of characterizing the distance between each of the at least two smart devices and the voice utterer;
determining voice control decisions corresponding to the at least two smart devices based on the distance or parameter; and
sending control instructions corresponding to the voice control decisions of the at least two smart devices.
19. The method of claim 18, further comprising:
waking up in response to receiving a wake-up control instruction; and/or
refraining from waking up in response to receiving a control instruction prohibiting wake-up.
20. The method of claim 17, further comprising:
waking up automatically when no control instruction is received within a preset waiting period after the feature data is sent.
21. The method of claim 20, further comprising:
updating the preset waiting period based on the elapsed waiting time from sending the feature data to receiving the control instruction.
22. The method of claim 17, wherein the decision terminal is one of a plurality of smart devices, the decision terminal capable of communicating with other smart devices,
the intelligent device and the server communicate based on a wireless communication network; and/or
The plurality of intelligent devices communicate based on a local area network.
23. The method of claim 17, wherein the feature data comprises at least one of:
a wake-up energy value calculated from the voice;
a confidence of the wake-up word in the collected voice;
an image of the voice speaker captured when the voice is collected;
distance data between the voice speaker and the smart device detected when the voice is collected; and
WiFi channel state information of the smart device.
24. The method of claim 23, further comprising:
filtering the collected voice, and calculating the wake-up energy value from the filtered voice as the feature data.
25. The method of claim 22, further comprising:
a device information table is received that includes device information for a smart device designated as a decision terminal.
26. A voice control method is applied to intelligent equipment and comprises the following steps:
collecting the voice sent by a voice sender;
sending feature data corresponding to the voice to a decision end, wherein the feature data is used to determine an association relationship between the smart device and the voice speaker;
receiving a control instruction from the decision end, wherein the control instruction corresponds to a voice control decision determined by the decision end based on the association relationship;
executing the received control instruction.
27. The method according to claim 26, wherein the decision end comprises a server and/or a decision terminal, and wherein
the smart device sends the feature data corresponding to the voice to the server and/or the decision terminal; and/or
the smart device receives control instructions from the server and/or the decision terminal and executes whichever control instruction it receives first.
28. The method of claim 26,
the intelligent device and the server communicate based on a wireless communication network; and/or
And the intelligent equipment and the decision terminal are communicated based on a local area network.
29. The method of claim 26, wherein the decision terminal is one of a plurality of smart devices, the decision terminal being capable of communicating with other smart devices, and wherein if the smart device is a decision terminal, the method further comprises:
acquiring feature data from at least two intelligent devices;
determining an association relationship between each of the at least two smart devices and the voice speaker based on the feature data;
determining voice control decisions corresponding to the at least two smart devices based on the association; and
sending, to at least one smart device, a control instruction corresponding to its voice control decision.
30. The method of claim 29, wherein the plurality of smart devices belong to a same device group, the method further comprising:
a device information table is received that includes device information for a smart device designated as a decision terminal.
31. The method of claim 26, wherein the voice control decision comprises:
enabling the smart device corresponding to the strongest association relationship to be woken up; and/or
prohibiting the smart devices other than the smart device corresponding to the strongest association relationship from being woken up.
32. The method of claim 26, wherein
the association relationship comprises a distance of the smart device relative to the voice speaker; or
the association relationship comprises a parameter capable of characterizing the distance of the smart device relative to the voice speaker.
33. The method of claim 26, wherein the feature data comprises at least one of:
a wake-up energy value calculated from the voice;
a confidence of the wake-up word in the collected voice;
an image of the voice speaker captured when the voice is collected;
distance data between the voice speaker and the smart device detected when the voice is collected; and
WiFi channel state information of the smart device.
34. A voice control system, comprising a decision end and a plurality of smart devices, the decision end being capable of communicating with the plurality of smart devices, wherein
the intelligent equipment collects voice sent by a user and sends feature data corresponding to the voice to the decision end;
the decision end acquires feature data from at least two intelligent devices, determines an incidence relation between each of the at least two intelligent devices and the user based on the feature data, determines a voice control decision corresponding to the at least two intelligent devices based on the incidence relation, and sends a control instruction corresponding to the voice control decision to the at least one intelligent device;
and the intelligent equipment receives the control instruction from the decision end and executes the control instruction.
35. A voice control system comprising a server and a plurality of intelligent devices, the server being capable of communicating with the plurality of intelligent devices, wherein,
one intelligent device is designated as a decision terminal, which is capable of communicating with other intelligent devices,
the intelligent equipment collects voice sent by a user, sends feature data corresponding to the voice to the server, and sends the feature data to the decision terminal under the condition that the intelligent equipment is not the decision terminal, wherein the feature data can be used for determining the distance between the intelligent equipment and the user;
the server and the decision terminal acquire feature data from at least two intelligent devices, determine the distance between each of the at least two intelligent devices and the user or a parameter capable of representing the distance based on the feature data, determine voice control decisions corresponding to the at least two intelligent devices based on the distance or the parameter, and send control instructions corresponding to the voice control decisions to the at least two intelligent devices;
and the intelligent equipment receives a control instruction from the server and/or the decision terminal and executes the control instruction received firstly.
36. A voice control apparatus, comprising:
the device comprises a first acquisition device, a second acquisition device and a voice processing device, wherein the first acquisition device is used for acquiring feature data from at least two intelligent devices, and the feature data is used for determining the distance between the intelligent devices and a voice speaker;
first distance means for determining, based on the feature data, a distance or a parameter capable of characterizing the distance between each of the at least two smart devices and the voice utterer;
a first decision-making means for determining a voice control decision corresponding to the at least two smart devices based on the distance or parameter; and
a first communication device for sending control instructions corresponding to the voice control decisions of the at least two smart devices.
37. A voice control apparatus, comprising:
the first acquisition device is used for acquiring feature data from at least two intelligent devices;
the first distance device is used for determining the incidence relation between each of the at least two intelligent devices and the voice speaker based on the characteristic data;
the first decision-making device is used for determining voice control decisions corresponding to the at least two intelligent devices based on the incidence relation; and
the first communication device is used for sending a control instruction corresponding to the voice control decision of the intelligent device to at least one intelligent device.
38. A voice control apparatus, comprising:
the voice acquisition device is used for acquiring voice sent by a voice sender;
the second communication device is used for sending the feature data corresponding to the voice to a server and sending the feature data to a decision terminal under the condition that the intelligent equipment is not a designated decision terminal, wherein the feature data is used for determining the distance between the intelligent equipment and a voice sender;
third communication means for receiving control instructions from the server and/or the decision terminal;
and the control sub-device is used for executing the control instruction received firstly.
39. A voice control apparatus, comprising:
the voice acquisition device is used for acquiring voice sent by a voice sender;
the second communication device is used for sending feature data corresponding to the voice to a decision-making terminal, and the feature data are used for determining the incidence relation between the intelligent equipment and the voice sender;
a third communication device, configured to receive a control instruction from the decision end, where the control instruction corresponds to a voice control decision determined by the decision end based on the association relationship;
and the control sub-device is used for executing the received control instruction.
40. A computing device, comprising:
a processor; and
a memory having executable code stored thereon, which when executed by the processor, causes the processor to perform the method of any of claims 1-33.
41. A non-transitory machine-readable storage medium having stored thereon executable code, which when executed by a processor of an electronic device, causes the processor to perform the method of any of claims 1-33.
CN201910325459.1A 2019-04-22 2019-04-22 Voice control system, method and apparatus, and computing device and storage medium Active CN111833863B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910325459.1A CN111833863B (en) 2019-04-22 2019-04-22 Voice control system, method and apparatus, and computing device and storage medium


Publications (2)

Publication Number Publication Date
CN111833863A true CN111833863A (en) 2020-10-27
CN111833863B CN111833863B (en) 2023-04-07

Family

ID=72912200

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910325459.1A Active CN111833863B (en) 2019-04-22 2019-04-22 Voice control system, method and apparatus, and computing device and storage medium

Country Status (1)

Country Link
CN (1) CN111833863B (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112837686A (en) * 2021-01-29 2021-05-25 青岛海尔科技有限公司 Wake-up response operation execution method and device, storage medium and electronic device
CN113132193A (en) * 2021-04-13 2021-07-16 Oppo广东移动通信有限公司 Control method and device of intelligent device, electronic device and storage medium
WO2022188560A1 (en) * 2021-03-10 2022-09-15 Oppo广东移动通信有限公司 Methods for distance relationship determination, device control and model training, and related apparatuses
CN115617169A (en) * 2022-10-11 2023-01-17 深圳琪乐科技有限公司 Voice control robot and robot control method based on role relationship
WO2023221062A1 (en) * 2022-05-19 2023-11-23 北京小米移动软件有限公司 Voice wake-up method and apparatus of electronic device, storage medium and chip

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109378000A (en) * 2018-12-19 2019-02-22 科大讯飞股份有限公司 Voice awakening method, device, system, equipment, server and storage medium
CN109391528A (en) * 2018-08-31 2019-02-26 百度在线网络技术(北京)有限公司 Awakening method, device, equipment and the storage medium of speech-sound intelligent equipment



Also Published As

Publication number Publication date
CN111833863B (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN111833863B (en) Voice control system, method and apparatus, and computing device and storage medium
CN106910500B (en) Method and device for voice control of device with microphone array
CN108667697B (en) Voice control conflict resolution method and device and voice control system
CN109473092B (en) Voice endpoint detection method and device
CN107483254B (en) Production test method and device for intelligent module of Internet of things
CN110060685B (en) Voice wake-up method and device
WO2017185342A1 (en) Method and apparatus for determining voice input anomaly, terminal, and storage medium
CN106100676A (en) Control method, user terminal and the interphone terminal of audio frequency output
CN109473095B (en) Intelligent household control system and control method
WO2015131783A1 (en) Method for setting in-vehicle usage scenario, on-board device and network device
CN112489648B (en) Awakening processing threshold adjusting method, voice household appliance and storage medium
CN108335700B (en) Voice adjusting method and device, voice interaction equipment and storage medium
US10312874B2 (en) Volume control methods and devices, and multimedia playback control methods and devices
CN110709931B (en) System and method for audio pattern recognition
CN107464565A (en) A kind of far field voice awakening method and equipment
CN110767225B (en) Voice interaction method, device and system
EP3979676A1 (en) Method and apparatus for identifying dual-mode wireless device, ios device, and medium
US11089411B2 (en) Systems and methods for coordinating rendering of a remote audio stream by binaural hearing devices
CN112311635B (en) Voice interruption awakening method and device and computer readable storage medium
CN113096658A (en) Terminal equipment, awakening method and device thereof and computer readable storage medium
CN113010139B (en) Screen projection method and device and electronic equipment
CN111640431A (en) Equipment response processing method and device
CN105679350A (en) Audio playing method and device
CN109147783B (en) Voice recognition method, medium and system based on Karaoke system
CN110958348B (en) Voice processing method and device, user equipment and intelligent sound box

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant