CN113990312A - Equipment control method and device, electronic equipment and storage medium


Info

Publication number
CN113990312A
Authority
CN
China
Legal status
Pending
Application number
CN202111212291.7A
Other languages
Chinese (zh)
Inventor
戴林
蒋朵拉
魏德平
秦子宁
谢俊杰
Current Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Original Assignee
Gree Electric Appliances Inc of Zhuhai
Zhuhai Lianyun Technology Co Ltd
Application filed by Gree Electric Appliances Inc of Zhuhai and Zhuhai Lianyun Technology Co Ltd
Priority to CN202111212291.7A
Publication of CN113990312A

Classifications

    • G PHYSICS
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L15/00 Speech recognition
                    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
                    • G10L15/08 Speech classification or search
                        • G10L15/18 Speech classification or search using natural language modelling
                            • G10L15/1822 Parsing for meaning understanding
                    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L2015/223 Execution procedure of a spoken command
                    • G10L15/26 Speech to text systems
                    • G10L15/28 Constructional details of speech recognition systems
                        • G10L15/30 Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
                • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
                    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
                        • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
                            • G10L25/60 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for measuring the quality of voice signals
    • H ELECTRICITY
        • H04 ELECTRIC COMMUNICATION TECHNIQUE
            • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
                • H04L12/00 Data switching networks
                    • H04L12/28 Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
                        • H04L12/2803 Home automation networks
                            • H04L12/2816 Controlling appliance services of a home automation network by calling their functionalities
                                • H04L12/282 Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home

Abstract

The application relates to a device control method and apparatus, an electronic device, and a storage medium. The method comprises the following steps: acquiring, in a target time period, target audio information from at least two candidate devices; determining, according to all the target audio information, a target device among the at least two candidate devices that needs to respond to the voice; generating, according to semantic information corresponding to the target audio information, a control instruction for controlling the target device to respond; and issuing the control instruction to the target device. With the method of this embodiment, when a target environment contains a plurality of candidate devices that each capture their own target audio information, the device that the target object actually wants to control can be determined from that audio information. Voice control therefore becomes accurate, which effectively solves the problem that a specified voice interaction device cannot be accurately woken up.

Description

Equipment control method and device, electronic equipment and storage medium
Technical Field
The present application relates to the field of smart home devices, and in particular, to a device control method and apparatus, an electronic device, and a storage medium.
Background
Many homes now have devices that support voice interaction, and users can wake them with a specific wake-up phrase in order to chat with them or control them. However, in scenarios with several voice interaction devices, a single utterance may wake several of them at once. For example, when a user says the wake-up word aloud in the living room, every device in the living room that hears it replies "I'm here." Voices then come from all directions, which makes for a very poor user experience.
For the technical problem in the related art that a specified voice interaction device cannot be accurately woken up, no effective solution has yet been proposed.
Disclosure of Invention
To solve the technical problem that a specified voice interaction device cannot be accurately woken up, the present application provides a device control method and apparatus, an electronic device, and a storage medium.
In a first aspect, an embodiment of the present application provides a device control method, including:
acquiring, in a target time period, target audio information from at least two candidate devices, wherein the at least two candidate devices are located in the same target environment, and each piece of target audio information is obtained by the corresponding candidate device capturing the same voice uttered by the same target object;
determining, according to all the target audio information, a target device among the at least two candidate devices that needs to respond to the voice;
generating, according to semantic information corresponding to the target audio information, a control instruction for controlling the target device to respond;
and issuing the control instruction to the target device.
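The four steps of the first aspect can be sketched as a single server-side function. This is a minimal sketch under stated assumptions, not the applicant's implementation: the `Capture` record, the precomputed `quality` score, the "best capture wins" selection rule, and the instruction dictionary are all hypothetical, and speech-to-text is assumed to have already run.

```python
from dataclasses import dataclass


@dataclass
class Capture:
    """One piece of target audio information, already transcribed (assumption)."""
    device_id: str  # hypothetical identifier of the candidate device
    text: str       # speech-to-text result for this device's recording
    quality: float  # audio quality score; higher means a clearer capture


def control_devices(captures: list[Capture], send) -> list[str]:
    """Hypothetical sketch of steps S101-S104 on the server side."""
    # S101: the method requires target audio from at least two candidate devices.
    if len(captures) < 2:
        raise ValueError("need target audio from at least two candidate devices")
    # S102: choose the device that should respond; here simply the best capture.
    target = max(captures, key=lambda c: c.quality)
    # S103: derive a control instruction from the semantics of the utterance
    # (a toy keyword rule standing in for real semantic understanding).
    if "play music" in target.text:
        instruction = {"action": "play_music"}
    else:
        instruction = {"action": "wake"}
    # S104: issue the instruction to the chosen device via the caller-supplied sender.
    send(target.device_id, instruction)
    return [target.device_id]
```

Passing `send` in as a callable keeps the sketch independent of any particular transport between the server and the devices.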
Optionally, in the foregoing method, determining, according to all the target audio information, the target device among the at least two candidate devices that needs to respond to the voice includes:
performing semantic analysis on the target audio information to determine intention information of the target audio information, wherein the intention information indicates the device(s) that the target object wants to trigger to respond;
and determining the target device among the at least two candidate devices according to the intention information.
Optionally, in the foregoing method, after performing semantic analysis on the target audio information to determine its intention information, the method further includes:
analyzing the target audio information to determine its volume and/or clarity;
and determining the audio quality of the target audio information based on the volume and/or the clarity.
Optionally, in the foregoing method, determining the target device among the at least two candidate devices according to the intention information includes:
determining a first number, namely the number of candidate devices;
determining a second number, namely the number of target devices, according to the intention information and the first number, wherein the second number is less than or equal to the first number;
and determining the second number of target devices among all the candidate devices according to the audio quality of the target audio information acquired by each candidate device, wherein the target audio information acquired by the target devices has higher audio quality than that acquired by the remaining candidate devices.
Optionally, in the foregoing method, before acquiring the target audio information from the at least two candidate devices in the target time period, the method further includes:
acquiring, in the target time period, candidate audio information from each specified device in the target environment;
and extracting the timbre information from each piece of candidate audio information.
Optionally, in the foregoing method, acquiring the target audio information from the at least two candidate devices in the target time period includes:
determining, among all the candidate audio information, the target audio information of the at least two candidate devices whose timbre information is the same.
Optionally, in the foregoing method, before generating, according to the semantic information corresponding to the target audio information, the control instruction for controlling the target device to respond, the method further includes:
when only one piece of candidate audio information is acquired, taking the specified device that acquired it as the target device.
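The single-capture fallback can be sketched as a small guard that runs before the multi-device selection. This is a hypothetical helper, not part of the application: `candidate_audio` is assumed to map a device ID to whatever audio payload that specified device captured in the target time period.

```python
from typing import Optional


def resolve_single_candidate(candidate_audio: dict) -> Optional[str]:
    """Return the responding device when only one specified device heard the voice.

    candidate_audio maps a hypothetical device ID to the audio captured in the
    target time period. With exactly one entry, that device is the target device;
    otherwise None is returned and the multi-device selection is needed.
    """
    if len(candidate_audio) == 1:
        (device_id,) = candidate_audio  # unpack the single key
        return device_id
    return None
```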
In a second aspect, an embodiment of the present application provides a device control apparatus, including:
an acquisition module, configured to acquire target audio information from at least two candidate devices in a target time period, wherein the at least two candidate devices are located in the same target environment, and each piece of target audio information is obtained by the corresponding candidate device capturing the same voice uttered by the same target object;
a determining module, configured to determine, according to all the target audio information, a target device among the at least two candidate devices that needs to respond to the voice;
a generating module, configured to generate, according to the semantic information corresponding to the target audio information, a control instruction for controlling the target device to respond;
and an issuing module, configured to issue the control instruction to the target device.
In a third aspect, an embodiment of the present application provides an electronic device, including a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
and the processor is configured to implement the method according to any one of the preceding claims when executing the computer program.
In a fourth aspect, the present application provides a computer-readable storage medium including a stored program, wherein the program, when run, performs the method according to any one of the preceding claims.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
Embodiments of the present application provide a device control method and apparatus, an electronic device, and a storage medium. The device control method includes: acquiring, in a target time period, target audio information from at least two candidate devices, wherein the at least two candidate devices are located in the same target environment, and each piece of target audio information is obtained by the corresponding candidate device capturing the same voice uttered by the same target object; determining, according to all the target audio information, a target device among the at least two candidate devices that needs to respond to the voice; generating, according to semantic information corresponding to the target audio information, a control instruction for controlling the target device to respond; and issuing the control instruction to the target device. With the method of this embodiment, when the target environment contains a plurality of candidate devices that each capture their own target audio information, the device that the target object actually wants to control can be determined from that audio information. Voice control therefore becomes accurate, which effectively solves the problem that a specified voice interaction device cannot be accurately woken up.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious for those skilled in the art that other drawings can be obtained according to the drawings without inventive exercise.
Fig. 1 is a schematic flowchart of a device control method according to an embodiment of the present application;
Fig. 2 is a schematic flowchart of a device control method according to another embodiment of the present application;
Fig. 3 is a schematic flowchart of a device control method according to an application example of the present application;
Fig. 4 is a block diagram of a device control apparatus according to an embodiment of the present application;
Fig. 5 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present application clearer, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
According to one aspect of the embodiments of the present application, a device control method is provided. Optionally, in this embodiment, the device control method may be applied to a hardware environment consisting of a terminal and a server. The server is connected to the terminal through a network and may provide services (such as device control services and device response services) for the terminal or for a client installed on the terminal; a database may be provided on the server, or independently of it, to provide data storage services for the server.
The network may include, but is not limited to, at least one of a wired network and a wireless network. The wired network may include, but is not limited to, at least one of a wide area network, a metropolitan area network, and a local area network; the wireless network may include, but is not limited to, at least one of WIFI (Wireless Fidelity) and Bluetooth. The terminal is not limited to a PC, a mobile phone, a tablet computer, or the like.
The device control method in the embodiments of the present application may be executed by the server, by the terminal, or jointly by the server and the terminal. When the terminal executes the method, it may do so through a client installed on it.
Taking execution by a server as an example, Fig. 1 shows the device control method provided in this embodiment, which includes the following steps:
Step S101: acquiring, in a target time period, target audio information from at least two candidate devices, wherein the at least two candidate devices are located in the same target environment, and each piece of target audio information is obtained by the corresponding candidate device capturing the same voice uttered by the same target object.
The device control method in this embodiment applies to scenarios where, among several devices capable of voice interaction, the device that a user wants to control must be determined from the user's voice. Examples include determining which smart speaker (one of the candidate devices) in a home environment the voice is meant to control, or which smart camera (one of the candidate devices) in a public environment it is meant to control; the method can also identify the device to be controlled in other environments. In the embodiments of the present application, the method is described using the smart-speaker-in-a-home scenario as an example; where there is no contradiction, it applies equally to other types of scenarios.
In the smart-speaker-in-a-home example, the target smart speaker (i.e., the target device) to be controlled (e.g., woken up) is determined among all the smart speakers by performing exception recognition on all of them.
When the target object wants to control a device in the target environment, the target object speaks in the target environment; each candidate device in the target environment then captures the voice within the target time period and obtains the corresponding target audio information.
The target time period may be a period of preset length, for example 5 seconds. This ensures that when the same candidate device captures multiple pieces of target audio information, all of them were uttered by the target object to complete one control instruction, so that they are semantically consistent.
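One way to apply a preset-length target time period is to group incoming captures into windows anchored at the first capture of each group. This is a minimal sketch, assuming each capture arrives as a hypothetical `(device_id, timestamp_seconds)` pair; the 5-second default mirrors the example length in the text, and the anchoring rule is an assumption.

```python
def group_by_window(captures, window_s=5.0):
    """Group (device_id, timestamp_seconds) captures into target time periods.

    Each group starts at its earliest capture and admits later captures that
    arrive less than window_s seconds after that start.
    """
    groups = []
    for device_id, t in sorted(captures, key=lambda c: c[1]):
        # Compare against the timestamp of the current group's first capture.
        if groups and t < groups[-1][0][1] + window_s:
            groups[-1].append((device_id, t))
        else:
            groups.append([(device_id, t)])
    return groups
```

For example, captures at 10.0 s, 11.2 s, and 16.0 s produce two groups: the first two captures fall in one period, and the third starts a new one.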
The target audio information is the audio captured by a candidate device, for example audio in M4A or MP3 format.
Further, when a candidate device in the target environment is able to recognize the speech, it generates target audio information corresponding to that speech.
For example, suppose there are 4 smart speakers in the living room and a user says "any one speaker, play music" there. Every smart speaker in the living room that can receive the voice generates corresponding target audio information; if only N (N ≤ 4) smart speakers can receive it, those N smart speakers each generate target audio information from the voice they received.
Step S102: determining, according to all the target audio information, the target device among the at least two candidate devices that needs to respond to the voice.
After all the target audio information is acquired, it can be analyzed to determine the device that the target object actually wants to control.
For example, the target device may be determined according to the volume and clarity of the target audio information, or according to identification information of the target device specified in the target audio information (for example, the location of the device or the device ID).
Step S103: generating, according to the semantic information corresponding to the target audio information, a control instruction for controlling the target device to respond.
After the target audio information is obtained, it can be analyzed to determine what kind of control the target object wants to exert on the target device.
Further, taking target audio information for waking up a target device as an example: after the candidate devices capture the user's target audio information, the server obtains it, converts it to text, and performs semantic understanding on the text to determine whether the wake-up word appears and, if so, whether it reflects a genuine wake-up intention of the target object. Once a genuine wake-up intention is confirmed and the target device has been determined, a control instruction for controlling the target device to respond to the voice can be generated.
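The two checks described above (is the wake-up word present, and is it a genuine wake-up) can be illustrated on the transcribed text. This is a crude sketch, not the application's semantic understanding step: the wake-up word `"hey speaker"` is hypothetical, and treating "the utterance begins with the wake-up word" as a genuine intention is an assumed heuristic.

```python
import re

WAKE_WORD = "hey speaker"  # hypothetical wake-up word, for illustration only


def wake_intent(text: str) -> tuple:
    """Return (keyword_present, is_real_wake_intent) for a transcribed utterance.

    A crude stand-in for semantic understanding: the wake-up word counts as a
    genuine wake-up only if the utterance begins with it, so a sentence that
    merely mentions it (e.g. "I told him to say hey speaker") does not.
    """
    # Lower-case and drop punctuation so "Hey speaker," matches the wake word.
    normalized = re.sub(r"[^\w\s]", "", text.lower()).strip()
    present = WAKE_WORD in normalized
    return present, normalized.startswith(WAKE_WORD)
```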
The control instruction controls the target device to act on the request expressed in the voice; for example, when the voice is "play music", the control instruction controls the target device to play music.
Step S104: issuing the control instruction to the target device.
After the control instruction is generated, the communication address of each target device can be obtained by lookup, and the control instruction is issued to the target device at that address, so that the target device operates according to the instruction and thereby responds to the voice of the target object.
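The address lookup and dispatch in step S104 can be sketched as follows. The registry contents, the JSON payload format, and the pluggable `transport` callable are all assumptions made for illustration; the application does not specify a wire format.

```python
import json

# Hypothetical registry mapping device IDs to (host, port) communication addresses.
DEVICE_ADDRESSES = {"speaker_2": ("192.168.1.12", 9000)}


def issue_instruction(device_ids, instruction, transport, registry=DEVICE_ADDRESSES):
    """Look up each target device's address and send it the serialized instruction."""
    payload = json.dumps(instruction).encode("utf-8")
    for device_id in device_ids:
        address = registry[device_id]  # a KeyError signals an unregistered device
        transport(address, payload)
```

In a real deployment, `transport` could wrap a UDP socket, for example `lambda addr, data: sock.sendto(data, addr)` with `sock` created by `socket.socket(socket.AF_INET, socket.SOCK_DGRAM)`; injecting it keeps the lookup logic testable without a network.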
With the method of this embodiment, when a target environment contains a plurality of candidate devices that each capture their own target audio information, the device that the target object actually wants to control can be determined from that audio information. Voice control therefore becomes accurate, which effectively solves the problem that a specified voice interaction device cannot be accurately woken up.
As an optional implementation, in the foregoing method, step S102 of determining, according to all the target audio information, the target device among the at least two candidate devices that needs to respond to the voice includes the following steps:
Step S201: performing semantic analysis on the target audio information to determine its intention information, where the intention information indicates the device(s) that the target object wants to trigger to respond.
After the target audio information is obtained, semantic analysis can be performed on it.
The target audio information can be converted into text, semantic analysis performed on the text, and the corresponding intention information extracted.
For example, the text of the target object's speech may be semantically parsed to extract key information, which is then matched against a template (the template may include words indicating a number of devices, such as "one", "two", "multiple", "half", and "all"). If the keywords include "half", the user's intention may be taken as "turn on part of the devices"; if they include "all", the user's intention may be taken as "turn on all the devices".
In addition, the intention information may include the unique identifier of a particular device; for example, when the keywords include "device No. two", the user's intention may be taken as "turn on the device identified as device No. two". The intention information may further include the state of a device (for example, a device in the wake-up standby state).
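The template matching described above can be sketched with a keyword table and one pattern for explicit device numbers. This is a hypothetical sketch: the `QUANTITY_TEMPLATE` table, the intent labels, and the `device no.` pattern are assumptions modeled on the examples in the text, not the application's actual template.

```python
import re

# Hypothetical keyword template: quantity words mapped to an intent label.
QUANTITY_TEMPLATE = {
    "all": "all",
    "half": "partial",
    "multiple": "partial",
    "one": 1,
    "two": 2,
}


def parse_intent(text: str) -> dict:
    """Match a transcript against the quantity template and a device-number pattern."""
    lowered = text.lower()
    # An explicit device number ("device no. two") identifies one specific device,
    # so it takes precedence over any quantity word.
    match = re.search(r"device no\.?\s*(\w+)", lowered)
    if match:
        return {"device_id": match.group(1)}
    # Otherwise the first whole word found in the template decides the quantity.
    for word in re.findall(r"[a-z]+", lowered):
        if word in QUANTITY_TEMPLATE:
            return {"quantity": QUANTITY_TEMPLATE[word]}
    return {}
```

Matching on whole words avoids false hits such as "all" inside "call".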
Step S202: determining the target device among the at least two candidate devices according to the intention information.
After the intention information is determined, the target devices that need to respond can be determined among all the candidate devices. When fewer target devices are needed than there are candidate devices, the target devices may be chosen according to the volume and clarity of the target audio information, or according to identification information of the target device specified in the target audio information (for example, the location of the device or the device ID).
With the method of this embodiment, the target devices can be determined among all the candidate devices, so that the devices that need to respond are identified and device control follows the intention of the target object.
As shown in Fig. 2, as an optional implementation, in the foregoing method, after the semantic analysis of the target audio information in step S201 determines its intention information, the method further includes the following steps:
Step S301: analyzing the target audio information to determine its volume and/or clarity;
Step S302: determining the audio quality of the target audio information based on the volume and/or the clarity.
After the target audio information is obtained, its signal can be analyzed to determine the volume, the clarity, or both. In general, when several devices capture the voice of the same target object, the closer a device is to the target object, the louder and clearer its captured target audio information is.
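The text does not say how volume and clarity are measured. A minimal sketch, assuming float samples in [-1.0, 1.0]: volume as RMS level in dBFS, and clarity approximated by a signal-to-noise ratio against a separately captured noise segment. Using SNR as the clarity measure is an assumption, not the application's definition.

```python
import math


def rms(samples):
    """Root-mean-square amplitude of float samples in [-1.0, 1.0]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def volume_dbfs(samples):
    """Volume in dB relative to full scale; a full-scale square wave reads 0 dBFS."""
    level = rms(samples)
    return 20 * math.log10(level) if level > 0 else float("-inf")


def snr_db(speech, noise):
    """Signal-to-noise ratio in dB, used here as an assumed proxy for clarity.

    noise must contain at least one non-zero sample, otherwise the ratio is undefined.
    """
    return 20 * math.log10(rms(speech) / rms(noise))
```

A capture with half the full-scale amplitude reads about -6.02 dBFS, and speech ten times louder (in RMS) than the noise floor yields an SNR of 20 dB.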
After the volume and/or clarity of the target audio information is obtained, its audio quality can be calculated. For example, a first evaluation value is preset for each volume range and a second evaluation value for each clarity range. Matching the measured volume and clarity against these ranges yields a volume evaluation value X and a clarity evaluation value Y, which are then weighted to obtain the audio quality Q of the target audio. With a volume weight n (0 ≤ n ≤ 1), the audio quality Q is:
Q = nX + (1 - n)Y
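The range lookup and weighting can be sketched as follows. The range boundaries and evaluation values are hypothetical placeholders (the text does not give any); only the weighted sum Q = nX + (1 - n)Y comes from the description.

```python
import bisect

# Hypothetical preset tables: lower bounds of each range and the evaluation value
# assigned to it. Real thresholds would have to be tuned for the devices in use.
VOLUME_BOUNDS = [-60.0, -40.0, -25.0]  # volume in dBFS
VOLUME_SCORES = [0.0, 40.0, 70.0, 100.0]
CLARITY_BOUNDS = [5.0, 15.0, 25.0]  # clarity, here as SNR in dB
CLARITY_SCORES = [0.0, 40.0, 70.0, 100.0]


def evaluate(value, bounds, scores):
    """Map a measurement to the evaluation value of the range it falls in."""
    # bisect_right counts how many lower bounds are <= value, i.e. the range index.
    return scores[bisect.bisect_right(bounds, value)]


def audio_quality(volume, clarity, n=0.5):
    """Q = nX + (1 - n)Y, with X, Y the volume and clarity evaluation values."""
    if not 0.0 <= n <= 1.0:
        raise ValueError("volume weight n must lie in [0, 1]")
    x = evaluate(volume, VOLUME_BOUNDS, VOLUME_SCORES)
    y = evaluate(clarity, CLARITY_BOUNDS, CLARITY_SCORES)
    return n * x + (1 - n) * y
```

With these placeholder tables, a capture at -20 dBFS and 18 dB SNR scores X = 100 and Y = 70, so with n = 0.6 its audio quality is Q = 60 + 28 = 88.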
With the method of this embodiment, the audio quality of the target audio can be derived from an analysis of the target audio information, which makes it convenient to select the target device according to audio quality.
As an optional implementation, in the foregoing method, step S202 of determining the target device among the at least two candidate devices according to the intention information includes the following steps:
Step S401: determining the first number of candidate devices;
Step S402: determining the second number of target devices according to the intention information and the first number, wherein the second number is less than or equal to the first number;
Step S403: determining the second number of target devices among all the candidate devices according to the audio quality of the target audio information acquired by each candidate device, wherein the target audio information acquired by the target devices has higher audio quality than that acquired by the remaining candidate devices.
After the target audio information is obtained, the first number of candidate devices can be determined from the sources of the target audio information.
Because the intention information indicates how many devices the target object wants to trigger, the second number of target devices can be determined once the intention information and the first number are known.
For example, when the intention information contains "one device" or "two devices" together with the wake-up word, that number of target devices is woken directly. When it contains "half of the devices" together with the wake-up word, the user's intention may be taken as "turn on part of the devices", and the second number is computed from the first number (i.e., half of the first number). If it contains "all", the user's intention may be taken as "turn on all the devices", and the second number equals the first number.
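Step S402 can be sketched as a small function from the intent label and the first number to the second number. The labels (`"all"`, `"half"`, or an explicit integer count) are hypothetical, and rounding half of an odd count up is an assumption, since the text only says "half of the first number".

```python
import math


def second_number(intent, first_number):
    """Number of target devices to wake, never more than the candidates available.

    intent is a hypothetical quantity label: "all", "half", or an explicit count.
    """
    if intent == "all":
        count = first_number
    elif intent == "half":
        count = math.ceil(first_number / 2)  # assumed rounding for odd counts
    else:
        count = int(intent)
    # Enforce the constraint that the second number never exceeds the first number.
    return max(0, min(count, first_number))
```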
Once the second number is determined, that many target devices can be woken among all the candidate devices, where the target audio information captured by the target devices has higher audio quality than that captured by the remaining candidate devices.
For example, the candidate devices may be ranked by the audio quality of their acquired target audio information (in descending or ascending order), and the second number of candidate devices with the highest audio quality selected as the target devices.
Alternatively, the second number of target devices may be selected at random from all the candidate devices.
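Both selection rules, ranking by audio quality and random choice, can be sketched in a few lines. The `{device_id: audio_quality}` input and the tie-break on device ID are assumptions made so the ranked result is deterministic.

```python
import random


def pick_targets(qualities, k, randomize=False, rng=None):
    """Choose k target devices from a {device_id: audio_quality} mapping.

    By default the k devices with the highest audio quality are chosen (sorted high
    to low, ties broken by device ID); with randomize=True, k devices are sampled
    uniformly at random instead.
    """
    if randomize:
        rng = rng or random.Random()
        return rng.sample(sorted(qualities), k)
    ranked = sorted(qualities, key=lambda d: (-qualities[d], d))
    return ranked[:k]
```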
With the method of this embodiment, as many target devices as the target audio information indicates can be selected from all the candidate devices, while ensuring the quality of the audio information they captured, which improves the accuracy of voice interaction.
As an alternative implementation, as the foregoing method, before the step S101 acquires the target audio information from at least two candidate devices in the target time period, the method further includes the following steps:
step S501, in a target time period, acquiring candidate audio information of each specified device in a target environment;
in step S502, the tone information in each candidate audio information is extracted.
To ensure that the multiple pieces of target audio information acquired by the candidate devices were all uttered by the target object to complete the same control instruction, and are therefore semantically consistent, the target time period should be a window of preset duration, for example 5 seconds.
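A hedged sketch of how a server might bucket incoming captures into target time periods of preset duration (5 seconds by default, matching the example). The `Capture` record and its fields are hypothetical, and opening each window at the earliest capture not already covered is just one reasonable windowing choice.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Capture:
    device_id: str   # hypothetical: id of the device that heard the speech
    timestamp: float # hypothetical: capture time in seconds


def split_into_windows(captures: List[Capture], window_s: float = 5.0) -> List[List[Capture]]:
    """Group captures into consecutive windows of window_s seconds.

    A window opens at the first capture not covered by the previous window;
    captures less than window_s after that opener join the same window.
    """
    windows: List[List[Capture]] = []
    start: Optional[float] = None
    for cap in sorted(captures, key=lambda c: c.timestamp):
        if start is None or cap.timestamp - start >= window_s:
            windows.append([])
            start = cap.timestamp
        windows[-1].append(cap)
    return windows
```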
In addition, the target environment contains multiple designated devices, which outnumber the candidate devices; each designated device can also capture speech in the target environment to obtain candidate audio information.
After the candidate audio information is obtained, timbre analysis can be performed on each piece of candidate audio information to obtain its timbre information, which is then used to determine whether the captured speech came from the same speaker.
As an optional implementation of the foregoing method, step S101 of acquiring target audio information from at least two candidate devices within the target time period includes:
determining, among all the candidate audio information, the target audio information of at least two candidate devices that share the same timbre information.
That is, the candidate audio information sharing the same timbre information is identified and used as the target audio information, which is then analyzed to determine the response strategy for the candidate devices that captured it.
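The text compares "the same timbre information" without saying how. One common substitute, swapped in here purely as an assumption, is to represent each capture by a speaker (voiceprint) embedding vector and treat two captures as the same speaker when their cosine similarity reaches a threshold. The function names and the 0.8 threshold are illustrative only.

```python
import math
from typing import Dict, List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def group_by_timbre(embeddings: Dict[str, Sequence[float]],
                    threshold: float = 0.8) -> List[List[str]]:
    """Greedily group device ids whose voiceprint embeddings look alike.

    Each capture joins the first group whose first member is similar enough;
    otherwise it starts a new group.
    """
    groups: List[List[str]] = []
    for dev, emb in embeddings.items():
        for group in groups:
            if cosine(embeddings[group[0]], emb) >= threshold:
                group.append(dev)
                break
        else:
            groups.append([dev])
    return groups
```

Each returned group with two or more devices corresponds to one set of target audio information; a group with a single device is a capture that no other device shares.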
Further, when a piece of candidate audio information I has no other candidate audio information with the same timbre information, there is no need to analyze whether the designated device I that captured it should respond; a control instruction I is generated directly from candidate audio information I and issued to designated device I.
As an optional implementation of the foregoing method, before the control instruction for controlling the target device to respond is generated according to the semantic information corresponding to the target audio information, the method further includes:
when only one piece of candidate audio information is acquired, taking the designated device that acquired it as the target device.
Specifically, when only one piece of candidate audio information is acquired, the target object most likely intends to control only that one device, so the designated device that acquired the candidate audio information is used directly as the target device.
With the method of this embodiment, timbre information is used to determine whether pieces of candidate audio information come from the same speaker, and hence which candidate devices captured each utterance and whether their responses need to be arbitrated. When an utterance is captured by only one candidate device, no analysis to determine the target device is needed; that analysis runs only when an utterance is captured by multiple candidate devices, which greatly improves analysis efficiency.
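The efficiency argument above amounts to a simple routing rule, sketched here under the assumption that captures have already been grouped per utterance (for example by timbre) into lists of device ids. `route_groups` and the `arbitrate` callback are hypothetical names; `arbitrate` stands for whatever target-device analysis is used for multi-device groups.

```python
from typing import Callable, Dict, List


def route_groups(groups: List[List[str]],
                 arbitrate: Callable[[List[str]], List[str]]) -> Dict[str, List[str]]:
    """Map each utterance group (keyed by its first device id) to responders.

    Single-device groups skip arbitration: that device responds directly.
    Only groups heard by several devices pay for the arbitration step.
    """
    decisions: Dict[str, List[str]] = {}
    for group in groups:
        if len(group) == 1:
            decisions[group[0]] = list(group)
        else:
            decisions[group[0]] = arbitrate(group)
    return decisions
```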
As shown in Fig. 3, an application example applying any of the foregoing embodiments is also provided:
In a home environment, after the candidate devices capture target audio information from the user's speech, they send the audio data to the server. The server extracts sound features (e.g., timbre and volume) from the user's target audio information and determines whether multiple candidate devices received the same wake-up command at the same time, that is, a wake-up command (speech intended to wake a candidate device) with the same voiceprint (i.e., timbre).
If so, the server converts the target audio information to text, performs semantic understanding on the text, and extracts and analyzes the text intention. It then decides which target devices to wake based on the extracted intention and the sound features: it checks whether the target audio information contains a wake-up keyword and, if it does, determines from the user's actual wake-up intention whether the user wants to turn on a single device, some of the devices, or all of them.
If only a single device is to be turned on, the candidate device whose received sound has the highest loudness (i.e., volume) and clarity is woken up and responds to the user's instruction. If some of the candidate devices are to be turned on, the server wakes the corresponding number of candidate devices according to the user's specific instruction. For example, if the user intends to wake half of all the candidate devices in the living room, the server, having obtained this intention, selects the target devices to wake by sound feature strength (i.e., loudness and clarity), at random, or by similar means.
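The application example ranks devices by loudness and clarity. Loudness can be measured as an RMS level in dBFS; clarity has no formula in the text, so this sketch takes it as an externally supplied score in [0, 1]. The -60 dBFS floor, the equal 0.5 weighting, and all function names are assumptions for illustration.

```python
import math
from typing import Dict, Sequence, Tuple


def rms_dbfs(samples: Sequence[float]) -> float:
    """RMS level in dB relative to full scale, for samples in [-1.0, 1.0]."""
    if not samples:
        return float("-inf")
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20 * math.log10(rms) if rms > 0 else float("-inf")


def quality_score(samples: Sequence[float], clarity: float,
                  floor_db: float = -60.0, weight: float = 0.5) -> float:
    """Blend normalized loudness (floor_db..0 dBFS -> 0..1) with clarity."""
    db = rms_dbfs(samples)
    loud = 0.0 if db <= floor_db else min(1.0, (db - floor_db) / -floor_db)
    return weight * loud + (1 - weight) * clarity


def loudest_clearest(captures: Dict[str, Tuple[Sequence[float], float]]) -> str:
    """Return the device id whose (samples, clarity) pair scores highest."""
    return max(captures, key=lambda dev: quality_score(*captures[dev]))
```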
If only one candidate device receives the wake-up command, whether to wake that candidate device is determined according to the intention.
For example:
First, when only some of the candidate devices are to be woken, the number of devices to wake (one type of intention information) is determined in combination with the specific device information and states in the home. If the user simply wants some devices woken at random, without regard to their device states (device states are another type of intention information; they include physical location information reported manually, such as the room set when the device is networked through the app, and basic information reported automatically by the devices), the devices can be sorted in descending order by the sound feature strength (e.g., loudness) each one received, and the top N devices are woken, N being the number to wake. If the sound feature strengths (the magnitudes of comparable sound parameters such as timbre and loudness) are all similar (e.g., differ by less than 2 dB), the candidate devices to wake are chosen at random. If the user specifies devices in a particular state, the state of each candidate device in the home is checked against the state the user described; matching candidate devices are taken directly as target devices and woken with the corresponding wake-up control instruction, while the non-matching candidate devices ignore the wake-up command.
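The three cases in this first example (rank by sound strength, fall back to a random choice when strengths are within 2 dB, or filter by a user-specified device state) can be combined into one selection routine. This is a sketch under stated assumptions: "all similar" is read as max minus min loudness below 2 dB, device states are simple strings such as a room name, and `choose_partial` with its parameters is hypothetical.

```python
import random
from typing import Dict, List, Optional


def choose_partial(loudness_db: Dict[str, float], n: int,
                   states: Optional[Dict[str, str]] = None,
                   wanted_state: Optional[str] = None,
                   similar_db: float = 2.0,
                   rng: Optional[random.Random] = None) -> List[str]:
    """Pick which candidate devices to wake for a "some devices" request."""
    if wanted_state is not None:
        # State-specified request: every matching device wakes; the rest ignore it.
        states = states or {}
        return sorted(dev for dev in loudness_db if states.get(dev) == wanted_state)
    n = max(0, min(n, len(loudness_db)))
    levels = list(loudness_db.values())
    if levels and max(levels) - min(levels) < similar_db:
        # Strengths too close to rank meaningfully: choose at random.
        rng = rng or random.Random()
        return rng.sample(sorted(loudness_db), n)
    # Otherwise wake the n loudest devices (ties broken by id).
    ranked = sorted(loudness_db, key=lambda dev: (-loudness_db[dev], dev))
    return ranked[:n]
```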
Second, the devices to be turned on can be specified, as follows: (1) the candidate devices report their device state information; (2) the user asks to wake the candidate devices in a specified device state; (3) the candidate devices in that state are woken. If all candidate devices are to be turned on, every candidate device that received the audio is woken.
The method builds on voiceprint (i.e., timbre) recognition: user feature information is extracted on top of voiceprint recognition and upper-layer services are built on it, so that the target device is identified from among all the candidate devices and controlled to respond.
As shown in Fig. 4, according to an embodiment of another aspect of the present application, a device control apparatus is also provided, including:
an acquisition module 1, configured to acquire target audio information from at least two candidate devices within a target time period, where the at least two candidate devices are located in the same target environment and each piece of target audio information is obtained by the corresponding candidate device capturing the same speech uttered by the same target object;
a determining module 2, configured to determine, according to all the target audio information, a target device among the at least two candidate devices that needs to respond to the speech;
a generating module 3, configured to generate, according to the semantic information corresponding to the target audio information, a control instruction for controlling the target device to respond;
and an issuing module 4, configured to issue the control instruction to the target device.
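The four modules can be pictured as a thin pipeline object. The sketch below is an illustrative wiring only: `DeviceControlApparatus`, its field names, and the use of injected callables are hypothetical choices, not a reproduction of Fig. 4.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class DeviceControlApparatus:
    # Module 1: return {device_id: captured audio} for one target time period.
    acquire: Callable[[], Dict[str, bytes]]
    # Module 2: choose which candidate devices must respond.
    determine: Callable[[Dict[str, bytes]], List[str]]
    # Module 3: turn the audio's semantics into a control instruction.
    generate: Callable[[Dict[str, bytes]], str]
    # Module 4: deliver an instruction to one device.
    issue: Callable[[str, str], None]

    def run_once(self) -> List[str]:
        """Run modules 1 to 4 in order and return the devices instructed."""
        audio = self.acquire()
        targets = self.determine(audio)
        instruction = self.generate(audio)
        for device_id in targets:
            self.issue(device_id, instruction)
        return targets
```

Injecting each module as a callable keeps the pipeline easy to exercise with fakes, for example a `determine` that always returns one fixed device id.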
For the specific process by which each module of the apparatus in this embodiment implements its functions, refer to the related description in the method embodiments; details are not repeated here.
According to another embodiment of the present application, an electronic device is also provided. As shown in Fig. 5, the electronic device may include a processor 1501, a communication interface 1502, a memory 1503, and a communication bus 1504, where the processor 1501, the communication interface 1502, and the memory 1503 communicate with one another through the communication bus 1504;
the memory 1503 is configured to store a computer program;
the processor 1501 is configured to implement the steps of the above method embodiments when executing the program stored in the memory 1503.
The bus of the electronic device may be a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, the bus is drawn as a single thick line in the figure, which does not mean that there is only one bus or one type of bus.
The communication interface is used for communication between the electronic device and other devices.
The memory may include random access memory (RAM) or non-volatile memory (NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the processor.
The processor may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
An embodiment of the present application further provides a computer-readable storage medium, where the storage medium includes a stored program which, when run, performs the method steps of the above method embodiments.
It is noted that, in this document, relational terms such as "first" and "second," and the like, may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The foregoing are merely exemplary embodiments of the present invention, which enable those skilled in the art to understand or practice the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A device control method, characterized by comprising:
within a target time period, acquiring target audio information from at least two candidate devices, wherein the at least two candidate devices are located in the same target environment, and each piece of target audio information is obtained by the corresponding candidate device capturing the same speech uttered by the same target object;
determining, according to all the target audio information, a target device among the at least two candidate devices that needs to respond to the speech;
generating, according to semantic information corresponding to the target audio information, a control instruction for controlling the target device to respond;
and issuing the control instruction to the target device.
2. The method of claim 1, wherein the determining, according to all the target audio information, of a target device among the at least two candidate devices that needs to respond to the speech comprises:
performing semantic analysis on the target audio information to determine intention information of the target audio information, wherein the intention information indicates the devices that the target object wants to trigger to respond;
and determining the target device among the at least two candidate devices according to the intention information.
3. The method of claim 2, wherein after the performing semantic analysis on the target audio information to determine the intention information of the target audio information, the method further comprises:
analyzing the target audio information to determine the volume and/or clarity of the target audio information;
and determining the audio quality of the target audio information based on the volume and/or the clarity.
4. The method of claim 3, wherein the determining the target device among the at least two candidate devices according to the intention information comprises:
determining a first number of the candidate devices;
determining a second number of the target devices according to the intention information and the first number, wherein the second number is less than or equal to the first number;
and determining the second number of target devices among all the candidate devices according to the audio quality of the target audio information acquired by each candidate device, wherein the audio quality of the target audio information acquired by the target devices is higher than that of the target audio information acquired by the candidate devices other than the target devices.
5. The method of any one of claims 1 to 4, wherein before the acquiring target audio information from at least two candidate devices within the target time period, the method further comprises:
within the target time period, acquiring candidate audio information from each designated device in the target environment;
and extracting the timbre information from each piece of candidate audio information.
6. The method of claim 5, wherein the acquiring target audio information from at least two candidate devices within a target time period comprises:
determining, among all the candidate audio information, the target audio information of the at least two candidate devices that share the same timbre information.
7. The method of claim 5, wherein before the generating, according to the semantic information corresponding to the target audio information, of the control instruction for controlling the target device to respond, the method further comprises:
when only one piece of candidate audio information is acquired, taking the designated device that acquired the candidate audio information as the target device.
8. A device control apparatus, characterized by comprising:
an acquisition module, configured to acquire target audio information from at least two candidate devices within a target time period, wherein the at least two candidate devices are located in the same target environment, and each piece of target audio information is obtained by the corresponding candidate device capturing the same speech uttered by the same target object;
a determining module, configured to determine, according to all the target audio information, a target device among the at least two candidate devices that needs to respond to the speech;
a generating module, configured to generate, according to the semantic information corresponding to the target audio information, a control instruction for controlling the target device to respond;
and an issuing module, configured to issue the control instruction to the target device.
9. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;
the memory is configured to store a computer program;
the processor is configured to implement the method of any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, characterized in that the storage medium comprises a stored program, wherein the program, when executed, performs the method of any one of claims 1 to 7.
CN202111212291.7A 2021-10-18 2021-10-18 Equipment control method and device, electronic equipment and storage medium Pending CN113990312A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111212291.7A CN113990312A (en) 2021-10-18 2021-10-18 Equipment control method and device, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111212291.7A CN113990312A (en) 2021-10-18 2021-10-18 Equipment control method and device, electronic equipment and storage medium

Publications (1)

Publication Number Publication Date
CN113990312A true CN113990312A (en) 2022-01-28

Family

ID=79739236

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111212291.7A Pending CN113990312A (en) 2021-10-18 2021-10-18 Equipment control method and device, electronic equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113990312A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116074150A (en) * 2023-03-02 2023-05-05 广东浩博特科技股份有限公司 Switch control method and device for intelligent home and intelligent home
CN116074150B (en) * 2023-03-02 2023-06-09 广东浩博特科技股份有限公司 Switch control method and device for intelligent home and intelligent home

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination