WO2023206686A1

WO2023206686A1 - Control method for smart device, and storage medium and electronic apparatus

Info

Publication number: WO2023206686A1
Application number: PCT/CN2022/095335
Authority: WO
Inventors: 郝斌
Original assignee: 青岛海尔科技有限公司; 海尔智家股份有限公司
Priority date: 2022-04-29
Filing date: 2022-05-26
Publication date: 2023-11-02
Also published as: CN117014246A

Abstract

A control method for a smart device, and a storage medium and an electronic apparatus, which relate to the field of home automation/smart homes. The method comprises: acquiring a first sound signal received by a first device and a second sound signal received by a second device, wherein the first sound signal and the second sound signal are sound signals corresponding to an operation execution signal which is sent by a target object (S202); determining a target cross-power spectrum of the first sound signal and the second sound signal according to the first sound signal and the second sound signal (S204); according to the target cross-power spectrum, determining a target delay between the first device and the second device, wherein the target delay is the difference between the time of the operation execution signal reaching the first device and the time of the operation execution signal reaching the second device (S206); and according to the target delay, determining a target device from the first device and the second device, and controlling the target device to execute a device operation indicated by the operation execution signal, wherein the target device is the device which is closest to the target object (S208).

Description

Control method for smart device, storage medium and electronic apparatus

This disclosure claims priority to the Chinese patent application filed with the China Patent Office on April 29, 2022, with application number 202210469005.3 and the invention title "Control method, storage medium and electronic device for intelligent equipment", the entire content of which is incorporated into this disclosure by reference.

Technical field

The present disclosure relates to the field of home automation/smart homes, and specifically to a control method for a smart device, a storage medium and an electronic apparatus.

Background

Currently, smart devices can be controlled by voice to perform device operations. For example, a voice wake-up signal wakes up the voice interaction function of a smart device, and voice control instructions then trigger operations through that function. In many cases, the same scene contains multiple devices configured with voice interaction functions, while the user usually needs only one device to respond to a voice control instruction at any one time. For example, only the device close to the user needs to execute the voice control instruction.

Taking the voice wake-up instruction in a smart home environment as an example, the same scene may contain multiple devices equipped with voice interaction functions, including speakers, TVs, washing machines, etc., and the user needs to wake up only one of them at a time. In this case, a nearest-device wake-up algorithm can be used. Nearest-device wake-up means that the device closest to the person speaking is the one that responds.

Most methods of determining the device closest to the user rely on energy information, with normalized energy used as the criterion. Generally, this type of algorithm only considers the relationship between the microphones of the microphone array on each device. After each device removes its own noise from its signal (for example, speakers and TVs perform echo cancellation, washing machines remove their local noise, etc.), the signal energy is normalized and combined with evaluation scores to make the judgment.

This control method works well in scenarios with a high signal-to-noise ratio. However, in scenarios with a low signal-to-noise ratio and high reverberation, the energy received by the microphone includes not only the sound source but also noise and reverberation, such as earlier sound-source signals reflected from various objects in the environment, and other noise. In such scenarios the above control method performs poorly.

It can be seen that the control methods for smart devices in the related technologies suffer from poor device control accuracy in scenarios with a low signal-to-noise ratio and high reverberation, owing to the presence of noise and reverberation.

Summary

Embodiments of the present disclosure provide a control method for a smart device, a storage medium and an electronic apparatus, to at least solve the problem that the control methods for smart devices in the related technologies have poor device control accuracy in scenarios with a low signal-to-noise ratio and high reverberation, owing to the presence of noise and reverberation.

According to one aspect of the embodiments of the present disclosure, a control method for a smart device is provided, including: acquiring a first sound signal received by a first device and a second sound signal received by a second device, wherein the first sound signal and the second sound signal are sound signals corresponding to an operation execution signal issued by a target object; determining a target cross power spectrum of the first sound signal and the second sound signal according to the first sound signal and the second sound signal; determining a target delay between the first device and the second device according to the target cross power spectrum, wherein the target delay is the difference between the time at which the operation execution signal reaches the first device and the time at which it reaches the second device; and determining a target device from the first device and the second device according to the target delay, and controlling the target device to perform the device operation indicated by the operation execution signal, wherein the target device is the device closest to the target object.

According to another aspect of the embodiments of the present disclosure, a control apparatus for a smart device is also provided, including: an acquisition unit configured to acquire a first sound signal received by a first device and a second sound signal received by a second device, wherein the first sound signal and the second sound signal are sound signals corresponding to an operation execution signal issued by a target object; a first determination unit configured to determine a target cross power spectrum of the first sound signal and the second sound signal according to the first sound signal and the second sound signal; a second determination unit configured to determine a target delay between the first device and the second device according to the target cross power spectrum, wherein the target delay is the difference between the time at which the operation execution signal reaches the first device and the time at which it reaches the second device; and an execution unit configured to determine a target device from the first device and the second device according to the target delay, and to control the target device to perform the device operation indicated by the operation execution signal, wherein the target device is the device closest to the target object.

According to yet another aspect of the embodiments of the present disclosure, a computer-readable storage medium is also provided. The computer-readable storage medium stores a computer program, wherein the computer program is configured to execute the above control method for a smart device when run.

According to yet another aspect of the embodiments of the present disclosure, an electronic apparatus is also provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor executes the above control method for a smart device through the computer program.

In the embodiments of the present disclosure, the smart device to be controlled is determined based on the cross-correlation between the signals received by two devices. A first sound signal received by a first device and a second sound signal received by a second device are acquired, wherein the first sound signal and the second sound signal are sound signals corresponding to an operation execution signal issued by a target object; a target cross power spectrum of the first sound signal and the second sound signal is determined according to the two signals; a target delay between the first device and the second device is determined according to the target cross power spectrum, wherein the target delay is the difference between the time at which the operation execution signal reaches the first device and the time at which it reaches the second device; and a target device is determined from the first device and the second device according to the target delay and is controlled to perform the device operation indicated by the operation execution signal, wherein the target device is the device closest to the target object. Because the time difference between the two devices receiving the sound signal is determined from the cross power spectrum of the sound signals they receive, the device closest to the user can be selected as the smart device to be controlled based on that time difference. Determining the device closest to the user from the arrival time difference of the operation execution signal reduces device miscontrol caused by the effect of noise, reverberation, etc. on signal energy and improves the accuracy of device control, thereby solving the problem that the control methods for smart devices in the related technologies have poor device control accuracy in scenarios with a low signal-to-noise ratio and high reverberation, owing to the presence of noise and reverberation.

Description of the drawings

The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure.

In order to illustrate the embodiments of the present disclosure or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

Figure 1 is a schematic diagram of the hardware environment of an optional smart device control method according to an embodiment of the present disclosure;

Figure 2 is a schematic flowchart of an optional smart device control method according to an embodiment of the present disclosure;

Figure 3 is a schematic flowchart of another optional smart device control method according to an embodiment of the present disclosure;

Figure 4 is a structural block diagram of an optional smart device control apparatus according to an embodiment of the present disclosure;

Figure 5 is a structural block diagram of an optional electronic apparatus according to an embodiment of the present disclosure.

Detailed description

In order to enable those skilled in the art to better understand the present disclosure, the technical solutions in the embodiments of the present disclosure are described clearly and completely below in conjunction with the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present disclosure, not all of them. Based on the embodiments in this disclosure, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the scope of protection of this disclosure.

It should be noted that the terms "first", "second", etc. in the description and claims of the present disclosure and the above-mentioned drawings are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It is to be understood that the data so used are interchangeable under appropriate circumstances so that the embodiments of the disclosure described herein can be practiced in sequences other than those illustrated or described herein. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or apparatus that comprises a series of steps or units is not necessarily limited to the steps or units explicitly listed, but may include other steps or units not expressly listed or inherent to the process, method, product or apparatus.

According to one aspect of the embodiments of the present disclosure, a control method for a smart device is provided. This control method is widely applicable to whole-house intelligent digital control scenarios such as smart homes (Smart Home), smart households, smart home device ecosystems, and smart residence (Intelligence House) ecosystems. Optionally, in this embodiment, the above control method can be applied to a hardware environment composed of a terminal device 102 and a server 104 as shown in Figure 1. As shown in Figure 1, the server 104 is connected to the terminal device 102 through a network and can be used to provide services (such as application services) for the terminal or for a client installed on the terminal. A database can be set up on the server or independently of the server to provide data storage services for the server 104, and cloud computing and/or edge computing services can be configured on the server or independently of the server to provide data computing services for the server 104.

The above network may include, but is not limited to, at least one of the following: a wired network and a wireless network. The wired network may include, but is not limited to, at least one of the following: a wide area network, a metropolitan area network, and a local area network. The wireless network may include at least one of the following: WIFI (Wireless Fidelity) and Bluetooth. The terminal device 102 may be, but is not limited to, a PC, a mobile phone, a tablet, a smart air conditioner, a smart range hood, a smart refrigerator, a smart oven, a smart stove, a smart washing machine, a smart water heater, smart washing equipment, a smart dishwasher, a smart projection device, a smart TV, a smart clothes drying rack, smart curtains, smart audio-visual equipment, a smart socket, a smart audio system, a smart speaker, smart fresh air equipment, smart kitchen and bathroom equipment, smart bathroom equipment, a smart sweeping robot, a smart window cleaning robot, a smart mopping robot, smart air purification equipment, a smart steamer, a smart microwave oven, a smart kitchen appliance, a smart purifier, a smart water dispenser, a smart door lock, etc.

The control method of the smart device in the embodiment of the present disclosure may be executed by the server 104, may be executed by the terminal device 102, or may be executed jointly by the server 104 and the terminal device 102. Wherein, the terminal device 102 may also execute the control method of the smart device according to the embodiment of the present disclosure by a client installed thereon.

Taking the case where the server 104 executes the control method for a smart device in this embodiment as an example, Figure 2 is a schematic flowchart of an optional smart device control method according to an embodiment of the present disclosure. As shown in Figure 2, the process of the method can include the following steps:

Step S202: Obtain the first sound signal received by the first device and the second sound signal received by the second device, where the first sound signal and the second sound signal are sound signals corresponding to the operation execution signal sent by the target object.

The control method in this embodiment can be applied to scenarios in which multiple smart devices can be controlled by the same operation execution signal to perform the corresponding device operation. The operation execution signal can be a device wake-up signal or a signal that controls the device to perform another device operation. Taking the device wake-up signal as an example, it can contain the wake-up word of the smart device, and the smart device can wake up its voice interaction function in response to the received wake-up signal. The smart devices may be smart home devices, which may include but are not limited to smart home appliances, such as the above-mentioned smart air conditioners, smart refrigerators, smart ovens, etc. In this embodiment, the type of smart device is not limited.

Under normal circumstances, the placement of the various devices is unknown. The positions of some devices, such as speakers, are not even fixed, whereas the positions of refrigerators, air conditioners, etc. generally do not change frequently. The difficulties in this situation are: 1) after each device estimates the angle of the sound source, the specific location of the sound source still cannot be determined because the position information between devices is missing; 2) even if the multiple devices are processed as a distributed array, position information between devices is still needed for sound source localization.

When the target object (corresponding to the user) needs to use a certain smart device, it can issue an operation execution signal. The operation execution signal can be used to instruct the execution of the corresponding device operation, and can be a voice control signal to which both the first device and the second device are able to respond. For example, the operation execution signal may be a target wake-up signal whose wake-up word can wake up the first device and the second device at the same time. The first device and the second device can each collect sound through their own sound collecting components to obtain the first sound signal and the second sound signal. Here, the sound collecting component can be a microphone or a microphone array. Correspondingly, the first sound signal is the sound signal collected by a first microphone in the first microphone array on the first device, and the second sound signal is the sound signal collected by a second microphone in the second microphone array on the second device.

For the collected sound signals, the first device and the second device can respectively send the collected sound signals to the server. The server may receive the sent sound signal from the first device and the second device respectively, thereby acquiring the first sound signal and the second sound signal.

Optionally, what the first device and the second device send to the server may be the sound signals collected by their microphone arrays. From the sound signals collected by the first microphone array, the server can select the sound signal collected by the first microphone to obtain the first sound signal (which may be the sound signal of the first channel); from the sound signals collected by the second microphone array, it can select the sound signal collected by the second microphone to obtain the second sound signal (which may be the sound signal of the second channel). The first microphone and the second microphone may be selected at random or according to the order of the microphones, which is not limited in this embodiment.

Optionally, the correspondence between the sound signals collected by the first device and those collected by the second device can be obtained by matching on the collection time of the sound signals, the signal characteristics of the sound signals, etc. That is, the collection times and signal characteristics of the sound signals of the first device and the second device are matched to determine which sound signal of the second device corresponds to the sound signal collected by the first device.

Step S204: Determine the target cross power spectrum of the first sound signal and the second sound signal based on the first sound signal and the second sound signal.

Users typically only want to control one smart device at a time. If multiple devices collect sound signals at the same time, the server can select the smart device to be controlled from the multiple devices. The method of selecting the smart device to be controlled can be: using energy to select the nearest device. However, using energy to select nearby devices requires ensuring that the selected audio signal (sound signal) must be similar to the sound source signal to ensure the accuracy of device selection. Under low signal-to-noise ratio and high reverberation conditions, the accuracy of selecting the nearest device among multiple devices is poor.

In order to solve at least part of the above technical problems, the time difference between the sound signals received by the two devices can be determined based on the cross power spectrum of those sound signals, so that the device closest to the user can be selected as the smart device to be controlled based on the time difference. Determining the device closest to the user from the arrival time difference of the operation execution signal reduces device miscontrol caused by the effect of noise, reverberation, etc. on signal energy, thereby improving the accuracy of device control.

For the operation execution signal, after acquiring the first sound signal and the second sound signal, the server may first calculate the cross power spectrum of the first sound signal and the second sound signal to obtain the target cross power spectrum. There may be one or more ways to calculate this cross power spectrum. For example, the first sound signal and the second sound signal may be converted into the frequency domain, and the cross power spectrum of the two sound signals may be determined from the converted frequency domain signals. The cross power spectrum can also be determined by other methods, which is not limited in this embodiment.

Step S206: Determine the target delay between the first device and the second device according to the target cross power spectrum, where the target delay is the time difference between the operation execution signal arriving at the first device and the second device.

According to the target cross power spectrum, the server can determine the target delay between the first device and the second device. The target delay here is the arrival time difference of the operation execution signal between the two devices, that is, the difference between the time at which the operation execution signal reaches the first device and the time at which it reaches the second device. There may be one or more ways to determine the target delay based on the target cross power spectrum. For example, it may be determined based on the generalized cross-correlation function, or based on other functions related to the cross power spectrum. In this embodiment, the method of determining the target delay from the target cross power spectrum is not specifically limited.
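A minimal sketch of delay estimation by generalized cross-correlation follows, written in plain Python so that it is self-contained. The PHAT weighting, the circular (not zero-padded) correlation, the naive DFT, the function names, and the assumption that both recordings are sample-aligned to a common clock are all choices of this sketch and are not taken from the disclosure.

```python
import cmath
import math


def dft(x):
    """Naive O(N^2) discrete Fourier transform of a real-valued sequence."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def estimate_delay_samples(signal_1, signal_2, phat=True):
    """Return the lag (in samples) at which the cross-correlation peaks.

    The result approximates (arrival time at device 1) minus
    (arrival time at device 2); a negative value means the sound
    reached device 1 first.
    """
    n = len(signal_1)
    if n == 0 or n != len(signal_2):
        raise ValueError("signals must be non-empty and of equal length")
    # Cross power spectrum: X1(f) * conj(X2(f)) for every bin.
    cross = [a * b.conjugate() for a, b in zip(dft(signal_1), dft(signal_2))]
    if phat:
        # PHAT weighting (an assumption of this sketch): keep only the phase.
        cross = [c / abs(c) if abs(c) > 1e-12 else 0j for c in cross]
    # The inverse DFT of the cross power spectrum is the circular correlation.
    corr = [
        sum(cross[k] * cmath.exp(2j * math.pi * k * lag / n) for k in range(n)).real / n
        for lag in range(n)
    ]
    peak = max(range(n), key=lambda lag: corr[lag])
    # Lags above n/2 wrap around and represent negative delays.
    return peak - n if peak > n // 2 else peak
```

Dividing the returned lag by the sampling rate gives the delay in seconds. A practical implementation would likely zero-pad the frames and use an FFT instead of the quadratic-time transform shown here.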

Step S208: Determine the target device from the first device and the second device according to the target delay, and control the target device to perform the device operation indicated by the operation execution signal, where the target device is the device closest to the target object.

According to the target delay, the server may determine the device closest to the target object among the first device and the second device, and determine the determined device as the target device. The determined target device may be the smart device to be controlled. For example, for a scenario where the operation execution signal is a target wake-up signal, the target device to be woken up among the first device and the second device may be determined.
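As an illustration of this decision, the sketch below picks the device that the sound reached first from the sign of the delay, and converts the delay into an approximate difference in path length. The sign convention (delay equals arrival time at the first device minus arrival time at the second), the tie-breaking rule, the function names, and the speed-of-sound constant are assumptions of this sketch.

```python
SPEED_OF_SOUND_M_PER_S = 343.0  # approximate value in air at about 20 degrees C


def closer_device(delay_s, first="first device", second="second device"):
    """Return the device the sound reached first.

    delay_s is (arrival time at first) minus (arrival time at second).
    A tie (delay_s == 0) is resolved in favour of the first device,
    which is an arbitrary choice made for this sketch.
    """
    return first if delay_s <= 0 else second


def path_difference_m(delay_s, speed=SPEED_OF_SOUND_M_PER_S):
    """Approximate difference in source-to-device distance, in metres."""
    return abs(delay_s) * speed
```

For instance, a delay of 2 ms in favour of the second device corresponds to the second device being roughly 0.69 m closer to the user than the first.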

After determining the target device, the server may control the target device to perform a device operation indicated by the operation execution signal. Here, the operation execution signal may carry an operation execution instruction, and the target device may perform the device operation indicated by the operation execution instruction carried in the operation execution signal. For example, if the operation execution instruction is an instruction to control increasing the temperature, then the device closest to the user can be selected from two devices that are allowed to execute the instruction to increase the temperature, such as a water heater and a bathroom heater, to perform the operation of increasing the temperature.

Optionally, for a scenario where the operation execution signal is a target wake-up signal, if both the first device and the second device have been woken up, the device other than the target device among the first device and the second device can be controlled to switch from the awake state to the sleep state. The target device can be controlled to remain in the awake state, so that it can collect subsequent voice control signals issued by the target object through its sound collection component and, in response to the collected voice control signals, perform the control operations they indicate. This is not limited in this embodiment.

It should be noted that although this embodiment uses two devices as an example to illustrate selecting the nearest device for control, it is not limited to this. When there are more than two controllable devices in the same scene, two of the devices can first be selected as the first device and the second device, and the device closer to the target object (corresponding to the above-mentioned target device) is determined in the above manner. The determined device is then used as the first device, one of the remaining devices is selected as the second device, and the device closer to the target object is determined again. This is repeated until no devices remain. The device finally determined to be closest to the target object is the target device.
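The pairwise elimination described above can be sketched as follows. The function name `select_nearest` and the comparator `is_first_closer` are hypothetical; the comparator stands in for the two-device decision based on the target delay and is supplied by the caller.

```python
def select_nearest(devices, is_first_closer):
    """Reduce a list of devices to the one closest to the target object.

    is_first_closer(a, b) must return True when device a is at least as
    close to the target object as device b, for example by checking the
    sign of the estimated delay between a and b.
    """
    if not devices:
        raise ValueError("at least one device is required")
    best = devices[0]
    for candidate in devices[1:]:
        # Keep the winner of each pairwise comparison as the new "first device".
        if not is_first_closer(best, candidate):
            best = candidate
    return best
```

With N devices this needs N - 1 pairwise delay estimates, since each comparison removes one device from consideration.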

Through the above steps, the first sound signal received by the first device and the second sound signal received by the second device are acquired, wherein the first sound signal and the second sound signal are sound signals corresponding to the operation execution signal issued by the target object; the target cross power spectrum of the first sound signal and the second sound signal is determined according to the two signals; the target delay between the first device and the second device is determined according to the target cross power spectrum, wherein the target delay is the difference between the time at which the operation execution signal reaches the first device and the time at which it reaches the second device; and the target device is determined from the first device and the second device according to the target delay and is controlled to perform the device operation indicated by the operation execution signal, wherein the target device is the device closest to the target object. This solves the problem that the control methods for smart devices in the related technologies have poor device control accuracy in scenarios with a low signal-to-noise ratio and high reverberation, owing to the presence of noise and reverberation, and improves the accuracy of device control.

In an exemplary embodiment, determining the target cross power spectrum of the first sound signal and the second sound signal according to the first sound signal and the second sound signal includes:

S11, obtain the initial cross power spectrum of the first sound signal and the second sound signal;

S12, obtain the target reverberation gain value corresponding to the first device and the second device, where the target reverberation gain value is a gain value used to suppress reverberation noise;

S13, use the target reverberation gain value to update the initial cross power spectrum to obtain the target cross power spectrum.

In this embodiment, when determining the cross power spectrum of the first sound signal and the second sound signal, the server may obtain the initial cross power spectrum of the first sound signal and the second sound signal. The initial cross power spectrum may be obtained as follows: obtain the spectrum signals of the first sound signal and the second sound signal respectively to obtain a first frequency domain signal and a second frequency domain signal, and then calculate the cross power spectrum of the first frequency domain signal and the second frequency domain signal to obtain the initial cross power spectrum.

For example, the two devices are device A and device B. One signal is selected from the voice signals collected by the microphone array of each of device A and device B. The two signals obtained are x₁ and x₂, belonging to device A and device B respectively, as shown in formula (1):

$$x_i(n) = s_i(n) + d_i(n) \tag{1}$$

Here, x is the signal received by the microphone, i is the microphone index, taking the values 1 and 2, n is the time-domain sample index, s is the sound source signal, and d is the noise.

A short-time Fourier transform (STFT) is applied to each of the two signals. The resulting frequency domain signals are given by formula (2):

$$X_i(l,f) = S_i(l,f) + D_i(l,f) \tag{2}$$

where X_i, S_i and D_i are the frequency-domain counterparts of x_i, s_i and d_i, l is the frame index, and f is the frequency bin.

After determining the frequency domain signals corresponding to the two signals, the cross power spectrum of the two signals can be determined based on the two frequency domain signals.
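As a rough illustration, the sketch below computes the instantaneous cross power spectrum of one pair of frames as the per-bin product of the first spectrum and the complex conjugate of the second. It uses a naive DFT on a single unwindowed frame to stay self-contained; a real STFT would window and overlap the frames. The function names are hypothetical.

```python
import cmath
import math


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def cross_power_spectrum(frame_1, frame_2):
    """Per-bin product X1(f) * conj(X2(f)) for two equal-length frames."""
    if len(frame_1) != len(frame_2):
        raise ValueError("frames must have the same length")
    spec_1 = dft(frame_1)
    spec_2 = dft(frame_2)
    return [a * b.conjugate() for a, b in zip(spec_1, spec_2)]
```

The phase of each bin encodes the time offset between the two frames at that frequency, which is what makes the cross power spectrum usable for delay estimation.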

For how to determine the cross power spectrum of the two signals, reference may be made to related technologies; details are not repeated in this embodiment. In addition, the cross power spectrum can also be smoothed. The smoothed cross power spectrum is given by formula (3), which is expressed in terms of the cross power spectrum of the two signals, a smoothing factor σ whose value can be 0.9, and the complex conjugate, denoted *.
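The body of formula (3) is not reproduced in the text. A common form of such smoothing, assumed here rather than taken from the source, is first-order recursive averaging across frames: the smoothed value at frame l equals σ times the smoothed value at frame l - 1, plus (1 - σ) times the instantaneous product X_1(l,f) X_2*(l,f). The sketch below applies one such step to every frequency bin; the function name is hypothetical.

```python
def smooth_cross_spectrum(previous, current, sigma=0.9):
    """One recursive-averaging step over all frequency bins.

    previous: smoothed cross power spectrum of the preceding frame
    current:  instantaneous X1 * conj(X2) of the present frame
    sigma:    smoothing factor; 0.9 is the value mentioned in the text
    """
    if not 0.0 <= sigma < 1.0:
        raise ValueError("sigma must lie in [0, 1)")
    if len(previous) != len(current):
        raise ValueError("spectra must have the same number of bins")
    return [sigma * p + (1.0 - sigma) * c for p, c in zip(previous, current)]
```

Under this assumed form, a larger σ gives a steadier estimate that reacts more slowly to changes, while a smaller σ tracks the current frame more closely.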

The server may also obtain a target reverberation gain value corresponding to the first device and the second device, where the target reverberation gain value is a gain value used to suppress reverberation noise. The target reverberation gain value may be determined based on the sound signal received by the first device that matches the operation execution signal and the sound signal received by the second device that matches the operation execution signal, for example, based on the first sound signal and the second sound signal, or based on signals other than the first sound signal and the second sound signal. The reverberation gain value can be determined from the sound signals by any method capable of doing so, which is not limited in this embodiment.

After the target reverberation gain value is obtained, it can be used to update the initial cross power spectrum to obtain the target cross power spectrum. The initial cross power spectrum may be updated by smoothing it with the target reverberation gain value, or in other ways; the way of updating the initial cross power spectrum is not limited in this embodiment.

It should be noted that, in addition to the reverberation gain, there are many other ways to improve the signal-to-noise ratio of the cross power spectrum. For example, gain values can be used to suppress other noise, such as stationary noise and non-stationary noise, which is not described in detail in this embodiment.

Through this embodiment, the cross power spectrum is updated using the reverberation gain value. With the reverberation factor taken into account, the cross power spectrum better characterizes the relationship between the sound source signals, which improves the accuracy of device control.

In an exemplary embodiment, obtaining target reverberation gain values corresponding to the first device and the second device includes:

S21, obtain multiple sound signals received by each of the first device and the second device, where the multiple sound signals are sound signals corresponding to the operation execution signal;

S22, determine the coherence functions of multiple sound signals and obtain the target coherence function;

S23, estimate the reverberation suppression coefficient of each device according to the target coherence function, and obtain the target reverberation suppression coefficient;

S24, determine the larger of the minimum reverberation gain value and the reverberation gain value corresponding to the target reverberation suppression coefficient as the reverberation gain value of each device.

In this embodiment, for each of the first device and the second device, multiple sound signals received by that device can be obtained. The multiple sound signals are all sound signals corresponding to the operation execution signal, and can be sound signals of different channels in the same microphone array of the same device, that is, sound signals collected by different microphones. The multiple sound signals received by each device may be obtained by selecting multiple microphones according to the microphone serial numbers in the microphone array of the device and taking the sound signals received by those microphones as the multiple sound signals. Other methods may also be used to obtain the multiple sound signals corresponding to each device, which is not limited in this embodiment.

For the multiple sound signals, the server can determine their coherence functions to obtain the target coherence function. The coherence functions may be determined by determining the coherence function of any two of the multiple sound signals, which yields multiple coherence functions. The target coherence function may include the multiple coherence functions, or may be one coherence function selected from them. When determining the coherence function of any two sound signals, the cross power spectrum of the two sound signals can be determined first, and the coherence function of the two sound signals is then determined based on that cross power spectrum. The cross power spectrum of the two sound signals is obtained in a manner similar to that in the previous embodiment, which is not described again here.

Optionally, for any two sound signals, for example, a third sound signal and a fourth sound signal, the coherence function of the third sound signal and the fourth sound signal can be determined based on their cross power spectrum, as shown in formula (4).

The quantities in formula (4) are the coherence function of the two sound signals, the cross power spectrum of the two sound signals x₃ and x₄, the auto power spectrum of the sound signal x₃, and the auto power spectrum of the sound signal x₄, where x₃ may be the third sound signal and x₄ may be the fourth sound signal.
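The relationship among these four quantities can be sketched as follows, assuming formula (4) is the usual complex coherence, that is, the cross power spectrum divided by the square root of the product of the two auto power spectra. That form, the function name `coherence`, and the small `floor` guard against division by zero are assumptions for illustration.

```python
import math


def coherence(cross, auto3, auto4, floor=1e-12):
    """Per-bin coherence: cross spectrum over the geometric mean of the auto spectra."""
    return [c / math.sqrt(max(a * b, floor))
            for c, a, b in zip(cross, auto3, auto4)]


# Toy single-frame spectra for two channels (hypothetical values).
X3 = [3 + 4j, 1 + 0j]
X4 = [3 + 4j, 0 + 1j]
cross = [a * b.conjugate() for a, b in zip(X3, X4)]
auto3 = [abs(a) ** 2 for a in X3]
auto4 = [abs(b) ** 2 for b in X4]
print(coherence(cross, auto3, auto4))
```

Spectra taken from a single frame always give a coherence of magnitude 1, so in practice the cross and auto power spectra would be smoothed over several frames before the ratio is taken.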

Since the sound source angle (i.e., the angle between the direction of the target wake-up signal and the microphone) is unknown, after the target coherence function is determined, the reverberation suppression coefficient of each device can be estimated from the target coherence function. The reverberation suppression coefficient can be estimated as shown in formula (5).

Formula (5) yields the reverberation suppression coefficient, where Γ_n(f) = sinc(2πfd'/c) is the coherence of the noise field, which is modeled as a diffuse (scattering) noise field, d' is the distance between the microphones that receive the two sound signals (the spacing is known), and c is the speed of sound.
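The noise-field term Γ_n(f) = sinc(2πfd'/c) is simple to evaluate. The sketch below uses the unnormalized sinc, sin(x)/x, as written in the text; the function name `diffuse_coherence`, the 5 cm spacing, and the speed of sound of 343 m/s are assumptions for illustration.

```python
import math


def diffuse_coherence(f_hz, spacing_m, c=343.0):
    """Gamma_n(f) = sin(x) / x with x = 2 * pi * f * d' / c (equal to 1 at x = 0)."""
    x = 2.0 * math.pi * f_hz * spacing_m / c
    return 1.0 if x == 0.0 else math.sin(x) / x


# Coherence of a diffuse field for two microphones 5 cm apart.
for f in (0.0, 500.0, 1000.0, 2000.0, 3430.0):
    print(f, round(diffuse_coherence(f, 0.05), 4))
```

The value falls from 1 at low frequencies to about 0 at f = c/(2d'), so for closely spaced microphones diffuse noise stays strongly coherent at low frequencies.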

Based on the target reverberation suppression coefficient, the reverberation gain value of each device may be further determined, that is, the target reverberation gain value. The way to determine the reverberation gain value of each device can be as shown in formula (6):

Here, G(l,f) is the reverberation gain value, G_min is the minimum reverberation gain value, and ε is a preset value, which can be 0.9.
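Step S24 floors the gain at G_min. The sketch below shows only that flooring step; how the candidate gain is derived from the reverberation suppression coefficient, including the role of ε, is defined by formula (6) and is not reproduced here. The function name `floor_gain` and the default G_min of 0.1 are assumptions for illustration.

```python
def floor_gain(candidate_gains, g_min=0.1):
    """Return max(G_min, candidate) for every time-frequency bin."""
    return [max(g_min, g) for g in candidate_gains]


# Candidate gains for three bins (hypothetical values).
print(floor_gain([0.05, 0.5, 0.9]))
```

Flooring keeps heavily reverberant bins from being attenuated to zero, so every bin still contributes to the later cross power spectrum update.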

Optionally, the reverberation gain value may be determined based on sound signals selected from different devices. In this case, the reverberation gain value of the first device and the reverberation gain value of the second device may be the same. That is, at least one sound signal received by each of the first device and the second device is obtained, giving multiple sound signals; the coherence functions of the multiple sound signals are determined to obtain the target coherence function; the reverberation suppression coefficient between the first device and the second device is estimated according to the target coherence function to obtain the target reverberation suppression coefficient; and the larger of the minimum reverberation gain value and the reverberation gain value corresponding to the target reverberation suppression coefficient is determined as the target reverberation gain value. The third sound signal and the first sound signal may be the same sound signal or different sound signals, which is not limited in this embodiment.

Through this embodiment, the reverberation suppression coefficient is estimated based on the coherence function, and the reverberation gain value is determined based on the reverberation suppression coefficient, which can improve the accuracy of the reverberation gain value determination.

In an exemplary embodiment, the obtaining a plurality of sound signals received by each of the first device and the second device includes:

S31. Obtain two sound signals received by each device, where the two sound signals are sound signals corresponding to the operation execution signal that are received by different microphones of the same microphone array of each device.

In this embodiment, in order to improve the timeliness of device control, the number of sound signals obtained from each device may be two. That is, the server can obtain two sound signals (or two channels of sound signals) received by each device. The two sound signals are received by different microphones of the same microphone array of the device and correspond to the operation execution signal.

After acquiring the two sound signals received by each device, the server can, in a manner similar to the previous embodiment, determine the coherence function of the two sound signals to obtain the target coherence function; estimate the reverberation suppression coefficient of each device based on the target coherence function to obtain the target reverberation suppression coefficient; and determine the larger of the minimum reverberation gain value and the reverberation gain value corresponding to the target reverberation suppression coefficient as the reverberation gain value of each device. The process of determining the reverberation gain value of each device is similar to the previous embodiment and is not described again.

In this embodiment, the reverberation gain value of each device is determined by separately obtaining two sound signals received by each device, which can improve the timeliness of determining the reverberation gain of the device.

In an exemplary embodiment, the target reverberation gain value includes a first reverberation gain value corresponding to the first device and a second reverberation gain value corresponding to the second device, for example, reverberation gain values determined by the calculation method based on formula (5). The first reverberation gain value and the second reverberation gain value may be the same or different, which is not limited in this embodiment.

Correspondingly, the initial cross power spectrum is updated using the target reverberation gain value to obtain the target cross power spectrum, including:

S41, multiply the product of the first reverberation gain value and the first frequency domain signal by the product of the second reverberation gain value and the conjugate of the second frequency domain signal to obtain the reverberation reference information, where the first frequency domain signal is the frequency domain signal corresponding to the first sound signal, and the second frequency domain signal is the frequency domain signal corresponding to the second sound signal;

S42: Perform a weighted summation operation on the initial cross power spectrum and the reverberation reference information to obtain the target cross power spectrum.

Based on the first reverberation gain value and the second reverberation gain value, the server may multiply the product of the first reverberation gain value and the first frequency domain signal by the product of the second reverberation gain value and the conjugate of the second frequency domain signal to obtain the reverberation reference information. A weighted summation can then be performed on the initial cross power spectrum and the reverberation reference information to obtain the target cross power spectrum.

For example, for devices A and B, two channels are selected from each device (the spacing is known), and the corresponding reverberation gain values are calculated. One channel is then selected from each of A and B (i.e., x₁ and x₂), and after data calibration their cross power spectrum is calculated. The cross power spectrum can be calculated as shown in formula (7):

In formula (7), the quantity being computed is the cross power spectrum of x₁ and x₂, G_A(l,f) is the reverberation gain value of device A, G_B(l,f) is the reverberation gain value of device B, X₁(l,f) is the frequency domain signal of x₁, X₂^*(l,f) is the conjugate of the frequency domain signal of x₂, and θ is a preset value, which can be any value in [0,1], for example, 0.9.
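Steps S41 and S42 can be sketched as below. The text specifies a weighted sum of the initial cross power spectrum and the reverberation reference information but not which term receives which weight; the sketch assumes θ weights the initial spectrum and 1 - θ weights the reference information. That weighting and the function name `update_cross_spectrum` are assumptions for illustration.

```python
def update_cross_spectrum(initial, X1, X2, G_A, G_B, theta=0.9):
    """Per-bin update: theta * initial + (1 - theta) * (G_A * X1) * (G_B * conj(X2))."""
    target = []
    for phi, x1, x2, g_a, g_b in zip(initial, X1, X2, G_A, G_B):
        # S41: reverberation reference information.
        reference = (g_a * x1) * (g_b * x2.conjugate())
        # S42: weighted sum with the initial cross power spectrum (assumed weights).
        target.append(theta * phi + (1.0 - theta) * reference)
    return target


# One frequency bin with hypothetical values.
print(update_cross_spectrum([2 + 0j], [1j], [1j], [1.0], [0.5], theta=0.5))
```

Bins with small gains, which are those dominated by reverberation, add little to the updated spectrum, so their phase has less influence on the later delay estimate.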

Through this embodiment, the impact of reverberation on the cross-correlation function is taken into account, and the cross power spectrum is updated based on the reverberation gain values and the corresponding frequency domain signals. This improves the ability of the cross power spectrum to represent the relationship between the sound source signals, and thereby improves the accuracy of device control.

In an exemplary embodiment, determining the target delay between the first device and the second device according to the target cross power spectrum includes:

S51. According to the target cross power spectrum, determine the generalized cross-correlation phase transformation function corresponding to the first device and the second device, where the generalized cross-correlation phase transformation function is a function with the time delay between the first device and the second device as its variable;

S52. Find, within the first delay range, the delay that maximizes the function value of the generalized cross-correlation phase transformation function to obtain the target delay, where the first delay range is the interval whose endpoints are the inverse (that is, the negative) of the delay threshold and the delay threshold, and the delay threshold is the value obtained by dividing the distance between the first device and the second device by the speed of sound.

Optionally, the time delay between the first device and the second device may be determined based on a GCC-PHAT (Generalized Cross Correlation PHAse Transformation) function corresponding to the first device and the second device. The server may determine the GCC-PHAT function corresponding to the first device and the second device according to the target cross power spectrum, where the GCC-PHAT function is a function with the time delay between the first device and the second device as its variable. For example, the GCC-PHAT function can be determined based on formula (8).

In formula (8), the cross power spectrum is the cross power spectrum between the two devices, τ is the time delay between the two devices, and l, f and h have meanings similar to those in the previous embodiments, which are not described again here.
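The following sketch evaluates a GCC-PHAT function at an arbitrary delay τ. It assumes the widely used form in which each bin of the cross power spectrum is normalized to unit magnitude and summed with the phase term e^{j2πfτ}; formula (8) may differ in detail. The function name `gcc_phat`, the `floor` guard, and the toy spectrum are assumptions for illustration.

```python
import cmath
import math


def gcc_phat(cross, freqs_hz, tau, floor=1e-12):
    """R(tau) = Re( sum over f of cross(f) / |cross(f)| * exp(j * 2 * pi * f * tau) )."""
    total = 0j
    for phi, f in zip(cross, freqs_hz):
        total += phi / max(abs(phi), floor) * cmath.exp(2j * math.pi * f * tau)
    return total.real


# Toy cross power spectrum X1 * conj(X2) for a first signal that lags the second by 2 ms.
freqs = [100.0 * k for k in range(1, 11)]
true_delay = 0.002
cross = [cmath.exp(-2j * math.pi * f * true_delay) for f in freqs]
print(gcc_phat(cross, freqs, true_delay), gcc_phat(cross, freqs, 0.0))
```

Under this sign convention the function peaks at a positive τ when the first signal arrives later than the second, which matches the definition of the target delay as the arrival time at the first device minus the arrival time at the second device.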

Empirically, if the microphone distance d is known, the value range of the delay is τ∈[-d/c, d/c], where c is the propagation speed of sound, that is, the speed of sound. For the first device and the second device, the value range of the delay is the first delay range, that is, the interval whose endpoints are the inverse of the delay threshold and the delay threshold, where the delay threshold is d/c and d is the distance between the two devices.

When determining the delay between the first device and the second device, the optimal delay can be searched for within the first delay range, that is, the delay τ within the first delay range that maximizes the GCC-PHAT function. This is a problem of finding a global optimum. Interpolation (finding the optimum through a large number of interpolations) or other methods can be used to find the delay that maximizes the function value of the GCC-PHAT function, thereby obtaining the target delay.

Through this embodiment, the delay between devices is determined through the GCC-PHAT function, which can improve the accuracy of delay determination.

In an exemplary embodiment, the delay that maximizes the function value of the generalized cross-correlation phase transformation function is found within the first delay range to obtain the target delay, which includes:

S61: Randomly select a delay within the second delay range to obtain a random delay, where the second delay range is an interval with zero and the delay threshold as endpoints;

S62: Determine the first reference function value, which is the value of the generalized cross-correlation phase transformation function at the random delay, and the second reference function value, which is the value of the generalized cross-correlation phase transformation function at the inverse of the random delay;

S63. Determine the delay corresponding to the maximum function value among the first reference function value and the second reference function value as the initial delay;

S64, based on the initial delay and the inverse of the initial delay, perform an interpolation operation within the first delay range to obtain the target delay, where the target delay is the delay, among the initial delay and the delays inserted by the interpolation operation, that maximizes the function value of the generalized cross-correlation phase transformation function.

Optionally, the optimal delay can be found within the first delay range by interpolation. For example, the optimal τ can be obtained by performing a large number of interpolation calculations: starting from the inverse of the delay threshold and using a preset time interval as the step size, delays are inserted sequentially within the first delay range to obtain a set of inserted delays; the function value of the GCC-PHAT function at each inserted delay is determined; and the delay with the largest function value is selected from the inverse of the delay threshold, the set of inserted delays, and the delay threshold to obtain the target delay.
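The exhaustive variant described above can be sketched as a plain grid search. Here `R` stands for any callable that returns the GCC-PHAT value at a delay; the function name `grid_search_delay`, the toy objective, the 6 m distance, the speed of sound of 343 m/s, and the 0.1 ms step are assumptions for illustration.

```python
def grid_search_delay(R, threshold, step):
    """Evaluate R from -threshold to threshold in increments of step; return the best delay."""
    n = int(2 * threshold / step)
    candidates = [min(threshold, -threshold + k * step) for k in range(n + 1)]
    candidates.append(threshold)  # make sure the right endpoint is included
    return max(candidates, key=R)


# Delay threshold d / c for devices 6 m apart.
threshold = 6.0 / 343.0


def toy_R(tau):
    """Toy stand-in for the GCC-PHAT function, peaking at 3 ms."""
    return -(tau - 0.003) ** 2


print(grid_search_delay(toy_R, threshold, 1e-4))
```

With these numbers the search evaluates R about 350 times per decision, which is the cost that the reference-delay scheme in the following paragraphs tries to avoid.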

In actual interaction scenarios, the response time and the computing power of the device are limited, so a small amount of calculation is often required, and finding the optimal delay through a large number of interpolations has poor applicability. In this embodiment, the convenience of interpolation can be improved by setting a reference delay and selecting the interpolation positions based on the reference delay and the function value of the GCC-PHAT function at the reference delay. The server can randomly select a delay within the second delay range to obtain a random delay. Optionally, the random delay can be a positive value; correspondingly, the second delay range can be the interval with zero and the delay threshold as endpoints. For example, assuming that the distance d between the two microphones is at most 6 m, a delay is randomly selected in (0, d/c] to obtain τ₀ (an example of the random delay).

The server may also determine the first reference function value, which is the value of the GCC-PHAT function at the random delay, and the second reference function value, which is the value of the GCC-PHAT function at the inverse of the random delay. The delay corresponding to the larger of the first reference function value and the second reference function value is determined as the initial delay, that is, the above-mentioned reference delay. Based on the initial delay and the inverse of the initial delay, an interpolation operation is performed within the first delay range to obtain the target delay. Here, the target delay is the delay, among the initial delay and the delays inserted by the interpolation operation, that maximizes the function value of the GCC-PHAT function.

For example, R(τ₀) and R(-τ₀) are calculated and compared. The delay with the larger R is taken as τ₁ (an example of the above-mentioned initial delay), and the other is recorded as τ₁′ (an example of the inverse of the above-mentioned initial delay). Interpolation is then performed within [-d/c, d/c] based on τ₁ and τ₁′ to find the delay that maximizes the function value of the GCC-PHAT function.
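Steps S61 to S63 can be sketched as follows. `R` again stands for a callable returning the GCC-PHAT value at a delay; the function name `pick_initial_delay`, the toy objective, and the handling of a tie (the positive delay is kept) are assumptions for illustration.

```python
import random


def pick_initial_delay(R, threshold, rng=random):
    """Draw tau0 in (0, threshold] and return (tau1, tau1_prime) with R(tau1) >= R(tau1_prime)."""
    # rng.random() lies in [0, 1), so 1 - rng.random() lies in (0, 1].
    tau0 = threshold * (1.0 - rng.random())
    if R(tau0) >= R(-tau0):
        return tau0, -tau0
    return -tau0, tau0


def toy_R(tau):
    """Toy stand-in for the GCC-PHAT function, peaking at 3 ms."""
    return -(tau - 0.003) ** 2


tau1, tau1_prime = pick_initial_delay(toy_R, 6.0 / 343.0, random.Random(0))
print(tau1, tau1_prime)
```

Comparing R at ±τ₀ costs two evaluations and already indicates on which side of zero the peak is more likely to lie.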

Through this embodiment, by randomly selecting the time delay as the initial time delay for interpolation processing, the convenience of the interpolation processing can be improved.

In an exemplary embodiment, based on the initial delay and the inverse of the initial delay, an interpolation operation is performed within the first delay range to obtain the target delay, including:

S71, perform the following interpolation steps in a loop until a loop stop condition is met, where the loop stop condition includes at least one of the following: the number of times the interpolation steps have been performed reaches a preset number (for example, 10), or the initial delay falls within a preset delay range (which can be a delay range determined a priori, that is, a delay range set based on the allowed activity range of the target object). The initial delay when the loop ends is the target delay:

Step 1: Determine a first delay, where the first delay is a delay inserted between the initial delay and the delay threshold.

In the first interpolation of each round, the first delay can be inserted between the initial delay and the delay threshold (for example, in (τ₁, d/c]). For example, the inserted delay τ₂ (i.e., the first delay) is τ₂ = τ₁ + α(τ₁ - τ₁′), where α is a value less than 1, for example, α = 0.6; α can also take other values.

Step 2: When the first function value, which is the value of the generalized cross-correlation phase transformation function at the first delay, is greater than the first reference function value, determine the second delay, where the second delay is a delay inserted between the initial delay and the first delay.

After determining the first delay, a first function value corresponding to the GCC-PHAT function and the first delay may be determined. If the first function value is greater than the first reference function value, a second delay may continue to be inserted between the initial delay and the first delay. For example, calculate R(τ ₂ ), if R(τ ₂ )>R(τ ₁ ), insert the time delay τ ₂₁ (that is, the second time delay), which is: τ ₂₁ =τ ₁ +β(τ ₂ - τ ₁ ), where β is a value less than 1, for example, β = 0.8. At this time, the inserted value is between (τ ₁ , τ ₂ ), and β can also be other values.

Step 3: When the second function value corresponding to the generalized cross-correlation phase transformation function and the second delay is greater than the first function value, use the second delay to update the initial delay to obtain an updated initial delay.

After determining the second delay, a second function value corresponding to the GCC-PHAT function and the second delay may be determined. If the second function value is greater than the first function value, the second delay can be used to update the initial delay to obtain an updated initial delay.

Step 4: When the second function value is smaller than the first function value, use the first delay to update the initial delay to obtain the updated initial delay.

If the second function value is smaller than the first function value, the first delay can be used to update the initial delay to obtain an updated initial delay. For example, if R(τ ₂₁ )>R(τ ₂ ), then τ _opt =τ ₂₁ , otherwise, τ _opt =τ ₂ , where τ _opt is the updated initial delay.

Step 5: When the first function value is smaller than the first reference function value and larger than the second reference function value, determine the third delay, where the third delay is a delay inserted between the first delay and the delay threshold.

If the first function value is smaller than the first reference function value and larger than the second reference function value, a third delay may be inserted between the first delay and the delay threshold. For example, if R(τ₁′)<R(τ₂)<R(τ₁), the delay τ₃₁ (that is, the third delay) is inserted: τ₃₁ = τ₁ + γ(τ₂ - τ₁), where γ is a value greater than 1, for example, γ = 1.2; γ can also take other values. The inserted value then lies in (τ₂, d/c].

Step 6: When the third function value corresponding to the generalized cross-correlation phase transformation function and the third delay is greater than the first function value, use the third delay to update the initial delay to obtain an updated initial delay.

After determining the third delay, a third function value corresponding to the GCC-PHAT function and the third delay may be determined. If the third function value is greater than the first function value, the third delay can be used to update the initial delay to obtain the updated initial delay. For example, calculate R(τ ₃₁ ), if R(τ ₃₁ )>R(τ ₂ ), then τ _opt =τ ₃₁ .

Step 7: When the third function value is smaller than the first function value, determine the fourth delay, and use the fourth delay to update the initial delay to obtain the updated initial delay, where the fourth delay is the delay inserted between zero and the inverse of the initial delay.

If the third function value is smaller than the first function value, a fourth delay can be inserted between zero and the inverse of the initial delay, and the fourth delay is used to update the initial delay to obtain the updated initial delay. For example, if R(τ₃₁)<R(τ₂), the delay τ₃₂ (that is, the fourth delay) is inserted: τ₃₂ = τ₁′ + μ(τ₁ - τ₁′), where μ is a value less than 1, for example, μ = 0.125. The inserted value then lies in (τ₁′, 0), and τ_opt = τ₃₂.

Step 8: When the first function value is less than the second reference function value, determine the fifth delay, where the fifth delay is a delay inserted between the inverse of the delay threshold and the inverse of the initial delay.

If the first function value is smaller than the second reference function value, a fifth delay may be inserted between the inverse of the delay threshold and the inverse of the initial delay. For example, if R(τ₂)<R(τ₁′), the delay τ₄₁ (i.e., the fifth delay) is inserted: τ₄₁ = τ₁ + γ(τ₁′ - τ₁). The inserted τ₄₁ lies in [-d/c, τ₁′).

Step 9: When the fourth function value, which is the value of the generalized cross-correlation phase transformation function at the fifth delay, is greater than the first reference function value, use the fifth delay to update the initial delay to obtain the updated initial delay.

After the fifth delay is determined, a fourth function value corresponding to the GCC-PHAT function and the fifth delay can be determined. If the fourth function value is greater than the first reference function value, the fifth delay can be used to update the initial delay to obtain an updated initial delay. For example, calculate R(τ ₄₁ ), if R(τ ₄₁ )>R(τ ₁ ), then τ _opt =τ ₄₁ .

Step 10: When the fourth function value is less than the first reference function value, determine the sixth delay, and use the sixth delay to update the initial delay to obtain the updated initial delay, where the sixth delay is a delay inserted between zero and the inverse of the initial delay.

If the fourth function value is smaller than the first reference function value, a sixth delay can be inserted between zero and the inverse of the initial delay, and the sixth delay is used to update the initial delay to obtain the updated initial delay. For example, if R(τ₄₁)<R(τ₁), the delay τ₄₂ (i.e., the sixth delay) is inserted: τ₄₂ = τ₁′ + μ(τ₁ - τ₁′). In this case, τ_opt = τ₄₂.

After the initial delay is updated, the current round ends. If the loop stop condition is not met, the interpolation steps are re-executed using the updated initial delay, for example, by setting τ₁ = τ_opt and τ₁′ = -τ_opt and repeating the above operations. If the loop stop condition is met, the updated initial delay is determined as the target delay. Re-executing the interpolation steps with the updated initial delay is similar to steps 1 to 10 above and is not described again here.
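Steps 1 to 10 and the surrounding loop can be sketched as below. `R` stands for a callable returning the GCC-PHAT value at a delay. The text does not say what happens when two function values are equal or when an inserted delay would fall outside [-d/c, d/c]; the sketch sends ties to the "smaller" branches and clamps inserted delays to the first delay range. Those two choices and the names `interpolation_round` and `search_delay` are assumptions for illustration.

```python
def interpolation_round(R, tau1, tau1p, thr, alpha=0.6, beta=0.8, gamma=1.2, mu=0.125):
    """One pass through steps 1 to 10; returns the updated initial delay tau_opt."""
    def clamp(t):
        return max(-thr, min(thr, t))      # assumption: stay inside [-d/c, d/c]

    r1, r1p = R(tau1), R(tau1p)            # first and second reference function values
    tau2 = clamp(tau1 + alpha * (tau1 - tau1p))           # step 1
    f1 = R(tau2)                           # first function value
    if f1 > r1:
        tau21 = clamp(tau1 + beta * (tau2 - tau1))        # step 2
        return tau21 if R(tau21) > f1 else tau2           # steps 3 and 4
    if f1 > r1p:
        tau31 = clamp(tau1 + gamma * (tau2 - tau1))       # step 5
        if R(tau31) > f1:
            return tau31                                  # step 6
        return clamp(tau1p + mu * (tau1 - tau1p))         # step 7
    tau41 = clamp(tau1 + gamma * (tau1p - tau1))          # step 8
    if R(tau41) > r1:
        return tau41                                      # step 9
    return clamp(tau1p + mu * (tau1 - tau1p))             # step 10


def search_delay(R, tau1, thr, max_rounds=10, preset_range=None):
    """Repeat the round with tau1 = tau_opt and tau1' = -tau_opt until a stop condition holds."""
    for _ in range(max_rounds):
        if preset_range and preset_range[0] <= tau1 <= preset_range[1]:
            break
        tau1 = interpolation_round(R, tau1, -tau1, thr)
    return tau1


def toy_R(tau):
    """Toy stand-in for the GCC-PHAT function, peaking at 3 ms."""
    return -(tau - 0.003) ** 2


print(interpolation_round(toy_R, 0.001, -0.001, 6.0 / 343.0))
```

Each round evaluates R at most five times, so ten rounds cost at most fifty evaluations. The result is the initial delay held when the loop stops, as the text describes, which is not guaranteed to be the best delay visited along the way.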

Through this embodiment, delays are inserted in different intervals, and whether to continue interpolating or to update the initial delay is decided by comparing the function value at the inserted delay with the function values already determined. This improves the rationality of the interpolation and helps ensure the accuracy of the final optimal delay.

In an exemplary embodiment, the target delay may be determined in a manner similar to that in the foregoing embodiment; for example, the first to sixth delays may be determined in a similar manner. In order to improve the accuracy of target delay determination, the following parameters can be selected when determining the first to sixth delays: α=0.6, β=0.8, γ=1.2, μ=0.125. Correspondingly, performing an interpolation operation within the first delay range based on the initial delay and the inverse of the initial delay to obtain the target delay includes:

S81, perform the following interpolation steps in a loop until a loop stop condition is met, where the loop stop condition includes at least one of the following: the number of times the interpolation steps have been performed reaches a preset number, or the initial delay falls within a preset delay range. The initial delay when the loop ends is the target delay:

Determine the first delay, where the first delay is the initial delay plus 0.6 times the difference between the initial delay and the inverse of the initial delay (for example, τ₂ = τ₁ + 0.6×(τ₁ - τ₁′));

When the first function value, which is the value of the generalized cross-correlation phase transformation function at the first delay, is greater than the first reference function value, determine the second delay, where the second delay is the initial delay plus 0.8 times the difference between the first delay and the initial delay (for example, τ₂₁ = τ₁ + 0.8×(τ₂ - τ₁));

When the second function value corresponding to the generalized cross-correlation phase transformation function and the second delay is greater than the first function value, use the second delay to update the initial delay to obtain the updated initial delay;

When the second function value is smaller than the first function value, the first delay is used to update the initial delay to obtain the updated initial delay;

When the first function value is smaller than the first reference function value and larger than the second reference function value, determine the third delay, where the third delay is the initial delay plus 1.2 times the difference between the first delay and the initial delay (for example, τ₃₁ = τ₁ + 1.2×(τ₂ - τ₁));

When the third function value corresponding to the generalized cross-correlation phase transformation function and the third delay is greater than the first function value, use the third delay to update the initial delay to obtain the updated initial delay;

When the third function value is smaller than the first function value, determine the fourth delay, and use the fourth delay to update the initial delay to obtain the updated initial delay, where the fourth delay is the inverse of the initial delay plus 0.125 times the difference between the initial delay and the inverse of the initial delay (for example, τ₃₂ = τ₁′ + 0.125×(τ₁ - τ₁′));

When the first function value is less than the second reference function value, determine the fifth delay, where the fifth delay is the initial delay plus 1.2 times the difference between the inverse of the initial delay and the initial delay (for example, τ₄₁ = τ₁ + 1.2×(τ₁′ - τ₁));

When the fourth function value corresponding to the generalized cross-correlation phase transformation function and the fifth delay is greater than the first reference function value, use the fifth delay to update the initial delay to obtain the updated initial delay;

When the fourth function value is less than the first reference function value, determine the sixth delay, and use the sixth delay to update the initial delay to obtain the updated initial delay, where the sixth delay is the inverse of the initial delay plus 0.125 times the difference between the initial delay and the inverse of the initial delay (for example, τ₄₂ = τ₁′ + 0.125×(τ₁ - τ₁′)).

Through this embodiment, by setting reasonable delay insertion related parameters, the accuracy of target delay determination can be improved, and the accuracy of device control can be improved.

In an exemplary embodiment, determining the target device from the first device and the second device according to the target delay includes:

S71, when the target delay is positive, determine the second device as the target device;

S72: When the target delay is negative, determine the first device as the target device.

In this embodiment, the smart device to be controlled may be selected from the first device and the second device based on the sign of the target delay. If the target delay is positive, the operation execution signal reaches the first device later than the second device, that is, the operation execution signal reaches the second device earlier, and the second device is closer to the target object. Therefore, the second device can be determined as the target device.

Similarly, if the target delay is negative, the operation execution signal reaches the first device earlier than the second device, and the first device is closer to the target object. Therefore, the first device can be determined as the target device.

Through this embodiment, the smart device to be controlled is selected based on the sign of the arrival delay between the two devices, which can improve the rationality of device control.
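The sign rule of S71 and S72 can be sketched as follows. The function name and the returned labels are hypothetical, and the handling of a delay of exactly zero is an assumption, since the embodiment only describes the positive and negative cases.

```python
def select_target_device(target_delay: float) -> str:
    """Pick the device to control from the sign of the target delay.

    target_delay = (arrival time at the first device)
                 - (arrival time at the second device)
    """
    if target_delay > 0:
        # The signal reached the second device earlier, so it is closer.
        return "second"
    if target_delay < 0:
        # The signal reached the first device earlier, so it is closer.
        return "first"
    # Assumption: the source does not define the tie case; default to the first device.
    return "first"


print(select_target_device(0.0015))   # second
print(select_target_device(-0.0015))  # first
```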

The control method of the smart device in the embodiment of the present disclosure is explained below in combination with optional examples. In this example, the first device is a first home appliance, the second device is a second home appliance, and the operation execution signal is a wake-up signal.

This optional example provides a solution for nearby wake-up among multiple devices. In nearby wake-up, the signals of the devices are used to estimate the TDOA (Time Difference of Arrival), that is, the cross-correlation function between the signals of two devices is solved and used for discrimination. The reverberation factor is taken into account in the cross-correlation function, which suppresses reverberation and improves the robustness of the TDOA estimation method to reverberation. In addition, the method of distinguishing two devices can be extended to multiple devices: two devices are judged first and the better one is selected, then the selected device is judged against the next device, and so on.
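The extension from two devices to multiple devices described above can be sketched as a sequential pairwise comparison. This is only an illustration: `nearest_device` and `pair_delay` are hypothetical names, and the pairwise delay is simulated here from made-up arrival times rather than estimated from sound signals.

```python
from typing import Callable, Sequence


def nearest_device(devices: Sequence[str],
                   pair_delay: Callable[[str, str], float]) -> str:
    """Keep the better device of each pair and compare it with the next one.

    pair_delay(a, b) is the arrival time at a minus the arrival time at b.
    """
    best = devices[0]
    for candidate in devices[1:]:
        # A positive delay means the signal reached `candidate` earlier.
        if pair_delay(best, candidate) > 0:
            best = candidate
    return best


# Made-up arrival times in seconds, used only to simulate the pairwise delay.
arrival = {"fridge": 0.010, "tv": 0.004, "air_conditioner": 0.007}
closest = nearest_device(list(arrival), lambda a, b: arrival[a] - arrival[b])
print(closest)  # tv
```

Each comparison reuses the two-device decision, so N devices need N − 1 pairwise delay estimates.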

As shown in Figure 3, the process of the control method of the smart device in this optional example may include the following steps:

Step S302: Select two channels from the microphone array of each of the first home appliance and the second home appliance, obtain two channels of sound signals, and calculate the corresponding reverberation gain value. Here, the two sound signals obtained are the same wake-up signal as collected by the microphone of each channel.

Step S304: Select one channel from the microphone array of each of the first home appliance and the second home appliance to obtain two sound signals. After data calibration, calculate their cross power spectrum. The calculated cross power spectrum takes the reverberation gain values into account.

Step S306: Calculate the GCC_PHAT function of the first home appliance and the second home appliance based on the cross power spectrum, and determine the optimal delay through interpolation calculation.

Finding the optimal TDOA is a nonlinear single-variable optimization problem, which is a statistical optimization problem. The search is not limited to the interpolation algorithm; particle swarm optimization, maximum likelihood estimation, Markov chain Monte Carlo methods and the like can also be used.

Step S308: By judging the sign of the optimal delay, determine which device the sound source is closer to, and wake up the device closest to the sound source. The sound source here is the sound source of the wake-up signal.
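Steps S304 to S308 can be illustrated with a minimal sketch, assuming a plain cross power spectrum (without the reverberation gain of step S302), a synthetic signal with a circular shift, and a coarse grid of candidate delays in place of the interpolation search. The helper names `dft` and `gcc_phat`, the sampling rate and the signal length are hypothetical choices made for the example.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform (O(N^2)); adequate for a short demo."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def gcc_phat(cross_spectrum, fs, tau):
    """Evaluate the phase-transform-weighted cross-correlation at delay tau (s)."""
    n = len(cross_spectrum)
    total = 0.0
    for k, g in enumerate(cross_spectrum):
        if abs(g) < 1e-12:
            continue  # skip empty bins to avoid dividing by zero
        freq = (k if k < n // 2 else k - n) * fs / n  # signed bin frequency
        total += (g / abs(g) * cmath.exp(2j * math.pi * freq * tau)).real
    return total


random.seed(0)
fs = 16000.0   # assumed sampling rate in Hz
n = 64
shift = 3      # the first device hears the signal 3 samples later
source = [random.uniform(-1.0, 1.0) for _ in range(n)]
second = source
first = [source[(t - shift) % n] for t in range(n)]

x1, x2 = dft(first), dft(second)
cross = [a * b.conjugate() for a, b in zip(x1, x2)]   # cross power spectrum

candidates = [m / fs for m in range(-8, 9)]
best = max(candidates, key=lambda tau: gcc_phat(cross, fs, tau))
print(round(best * fs), "samples")
print("wake second device" if best > 0 else "wake first device")
```

In this toy setup the first device receives the signal three samples later than the second, so the estimated delay is positive and, following step S308, the second device would be woken.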

Through this optional example, the cross-correlation function between devices is used to find the optimal time delay (TDOA), and the device closer to the sound source is obtained by judging the sign of the TDOA. Compared with energy-based discrimination methods, this has better noise immunity. At the same time, the reverberation factor is taken into account when calculating the cross-correlation function, which gives good robustness in highly reverberant environments. In addition, a simplified interpolation calculation is used to find the globally optimal TDOA, which greatly reduces the amount of calculation and improves accuracy.

It should be noted that, for the sake of simple description, the foregoing method embodiments are expressed as a series of action combinations. However, those skilled in the art should know that the present disclosure is not limited by the described action sequence, because in accordance with the present disclosure certain steps may be performed in other orders or simultaneously. Secondly, those skilled in the art should also know that the embodiments described in the specification are preferred embodiments, and the actions and modules involved are not necessarily required by the present disclosure.

Through the description of the above embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform. It can also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the existing technology, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM (Read-Only Memory)/RAM (Random Access Memory), a magnetic disk, or an optical disk) and includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, etc.) to execute the methods described in the various embodiments of the present disclosure.

According to another aspect of an embodiment of the present disclosure, a control device for an intelligent device for implementing the above method for controlling an intelligent device is also provided. Figure 4 is a structural block diagram of an optional intelligent device control device according to an embodiment of the present disclosure. As shown in Figure 4, the device may include:

The acquisition unit 402 is configured to acquire the first sound signal received by the first device and the second sound signal received by the second device, wherein the first sound signal and the second sound signal are sound signals corresponding to the operation execution signal issued by the target object;

The first determination unit 404 is connected to the acquisition unit 402 and is configured to determine the target cross power spectrum of the first sound signal and the second sound signal according to the first sound signal and the second sound signal;

The second determination unit 406 is connected to the first determination unit 404 and is configured to determine the target delay between the first device and the second device according to the target cross power spectrum, wherein the target delay is the time difference between the time when the operation execution signal reaches the first device and the time when it reaches the second device;

The execution unit 408 is connected to the second determination unit 406 and is configured to determine a target device from the first device and the second device according to the target delay, and control the target device to perform the device operation indicated by the operation execution signal, wherein the target device is the device closest to the target object.

It should be noted that the acquisition unit 402 in this embodiment can be configured to perform the above step S202, the first determination unit 404 in this embodiment can be configured to perform the above step S204, the second determination unit 406 in this embodiment can be configured to perform the above step S206, and the execution unit 408 in this embodiment can be configured to perform the above step S208.

Through the above modules, the first sound signal received by the first device and the second sound signal received by the second device are acquired, wherein the first sound signal and the second sound signal are sound signals corresponding to the operation execution signal issued by the target object; the target cross power spectrum of the first sound signal and the second sound signal is determined according to the first sound signal and the second sound signal; the target delay between the first device and the second device is determined according to the target cross power spectrum, wherein the target delay is the time difference between the operation execution signal reaching the first device and reaching the second device; and the target device is determined from the first device and the second device according to the target delay, and the target device is controlled to perform the device operation indicated by the operation execution signal, wherein the target device is the device closest to the target object. This solves the problem in the related art that the accuracy of device control is poor in scenarios with a low signal-to-noise ratio and high reverberation due to the presence of noise and reverberation, and improves the accuracy of device control.

In an exemplary embodiment, the first determining unit includes:

The first acquisition module is configured to acquire the initial cross power spectrum of the first sound signal and the second sound signal;

The second acquisition module is configured to acquire the target reverberation gain value corresponding to the first device and the second device, where the target reverberation gain value is a gain value used to suppress reverberation noise;

The update module is set to use the target reverberation gain value to update the initial cross power spectrum to obtain the target cross power spectrum.

In an exemplary embodiment, the second acquisition module includes:

The acquisition submodule is configured to acquire multiple sound signals received by each of the first device and the second device, where the multiple sound signals are sound signals corresponding to the operation execution signal;

The first determination sub-module is configured to determine the coherence functions of multiple sound signals and obtain the target coherence function;

The estimation submodule is configured to estimate the reverberation suppression coefficient of each device based on the target coherence function to obtain the target reverberation suppression coefficient;

The second determination sub-module is configured to determine the larger of the minimum reverberation gain value and the reverberation gain value corresponding to the target reverberation suppression coefficient as the reverberation gain value of each device.

In an exemplary embodiment, the acquisition sub-module includes:

The acquisition subunit is configured to acquire two sound signals received by each device, where the two sound signals are sound signals received by different microphones of the same microphone array of each device and corresponding to the operation execution signal.

In an exemplary embodiment, the target reverberation gain value includes a first reverberation gain value corresponding to the first device and a second reverberation gain value corresponding to the second device; the update module includes:

The first execution sub-module is configured to multiply the product of the first reverberation gain value and the first frequency domain signal by the product of the second reverberation gain value and the conjugate of the second frequency domain signal to obtain reverberation reference information, wherein the first frequency domain signal is a frequency domain signal corresponding to the first sound signal, and the second frequency domain signal is a frequency domain signal corresponding to the second sound signal;

The second execution submodule is configured to perform a weighted sum operation on the initial cross power spectrum and the reverberation reference information to obtain the target cross power spectrum.
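The two operations of the update module can be sketched as below. This is a sketch under stated assumptions: the reverberation gain values are treated as one scalar per device, and the weighting `alpha` of the weighted sum is a hypothetical parameter, since the weights are not specified in this passage. The function name is also hypothetical.

```python
def update_cross_spectrum(initial, x1, x2, gain1, gain2, alpha=0.5):
    """Blend the initial cross power spectrum with the reverberation reference.

    initial : initial cross power spectrum, one complex value per frequency bin
    x1, x2  : frequency domain signals of the first and second sound signals
    gain1/2 : reverberation gain values of the first and second device
    alpha   : assumed weight of the initial spectrum in the weighted sum
    """
    # Reverberation reference: (G1 * X1) multiplied by (G2 * conj(X2)).
    reference = [(gain1 * a) * (gain2 * b.conjugate()) for a, b in zip(x1, x2)]
    # Weighted sum of the initial spectrum and the reference.
    return [alpha * g0 + (1.0 - alpha) * r for g0, r in zip(initial, reference)]


target = update_cross_spectrum([0j], [1 + 1j], [2 - 1j], gain1=0.5, gain2=0.5)
print(target)
```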

In an exemplary embodiment, the second determining unit includes:

The first determination module is configured to determine the generalized cross-correlation phase transformation function corresponding to the first device and the second device according to the target cross power spectrum, wherein the generalized cross-correlation phase transformation function is a function whose variable is the delay between the first device and the second device;

The search module is configured to search, within the first delay range, for the delay that maximizes the function value of the generalized cross-correlation phase transformation function to obtain the target delay, wherein the first delay range is the interval whose endpoints are the opposite number of the delay threshold and the delay threshold, and the delay threshold is the value obtained by dividing the distance between the first device and the second device by the speed of sound.
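The delay threshold and the bounded search can be illustrated as follows. The speed of sound value, the uniform grid search and the toy objective are assumptions made for this sketch; the embodiments themselves use an interpolation search, and all names here are hypothetical.

```python
SPEED_OF_SOUND = 343.0  # m/s, assumed value for air at roughly 20 degrees Celsius


def delay_threshold(distance_m, speed=SPEED_OF_SOUND):
    """Largest physically possible delay: device spacing over the speed of sound."""
    return distance_m / speed


def grid_search(gcc, tau_max, steps=201):
    """Return the delay in [-tau_max, tau_max] with the largest function value."""
    step = 2.0 * tau_max / (steps - 1)
    grid = [-tau_max + i * step for i in range(steps)]
    return max(grid, key=gcc)


def toy_gcc(tau):
    """Stand-in for the real function, with a single peak at 2 ms."""
    return -(tau - 0.002) ** 2


tau_max = delay_threshold(3.43)       # devices 3.43 m apart, about 10 ms
best = grid_search(toy_gcc, tau_max)
print(tau_max, best)
```

Restricting the search to this interval discards peaks that no real source position could produce, since the path difference between the two devices cannot exceed their spacing.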

In an exemplary embodiment, the search module includes:

The selection sub-module is configured to randomly select a delay within the second delay range to obtain the random delay, where the second delay range is the interval with zero and the delay threshold as endpoints;

The third determination sub-module is configured to determine the first reference function value corresponding to the generalized cross-correlation phase transformation function and the random delay, and the second reference function value corresponding to the generalized cross-correlation phase transformation function and the opposite number of the random delay;

The fourth determination sub-module is configured to determine the delay corresponding to the maximum function value among the first reference function value and the second reference function value as the initial delay;

The third execution sub-module is configured to perform an interpolation operation within the first delay range based on the initial delay and the opposite number of the initial delay to obtain the target delay, where the target delay is the delay, among the initial delay and the delays inserted by the interpolation operation, that maximizes the function value of the generalized cross-correlation phase transformation function.
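The choice of the initial delay described above can be sketched as follows. The function name, the injectable random generator and the tie-breaking rule are assumptions for illustration, and the objective is a toy stand-in for the generalized cross-correlation phase transformation function.

```python
import random


def initial_delay(gcc, tau_max, rng=random):
    """Draw a random delay in [0, tau_max] and keep whichever sign scores higher."""
    tau = rng.uniform(0.0, tau_max)
    ref1 = gcc(tau)    # first reference function value
    ref2 = gcc(-tau)   # second reference function value
    # Assumption: on a tie the non-negative delay is kept.
    return tau if ref1 >= ref2 else -tau


def toy_gcc(tau):
    """Stand-in objective with its peak at -3 ms."""
    return -(tau + 0.003) ** 2


start = initial_delay(toy_gcc, 0.01, random.Random(1))
print(start)  # negative, because the peak lies on the negative side
```

Evaluating both the random delay and its opposite number gives a first indication of which half of the first delay range contains the peak before any interpolation step is run.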

In an exemplary embodiment, the third execution sub-module includes:

The first execution subunit is configured to perform the following interpolation steps cyclically until a loop stop condition is met, where the loop stop condition includes at least one of the following: the number of times the interpolation steps have been executed reaches a preset number, or the initial delay falls within a preset delay range; the initial delay at the end of the loop is the target delay:

Determine the first delay, where the first delay is the delay inserted between the initial delay and the delay threshold;

When the first function value corresponding to the generalized cross-correlation phase transformation function and the first delay is greater than the first reference function value, determine the second delay, where the second delay is a delay inserted between the initial delay and the first delay;

When the first function value is smaller than the first reference function value and larger than the second reference function value, determine the third delay, where the third delay is a delay inserted between the first delay and the delay threshold;

When the third function value is smaller than the first function value, determine the fourth delay, and use the fourth delay to update the initial delay to obtain the updated initial delay, where the fourth delay is a delay inserted between zero and the opposite number of the initial delay;

When the first function value is less than the second reference function value, determine the fifth delay, where the fifth delay is a delay inserted between the opposite number of the delay threshold and the opposite number of the initial delay;

When the fourth function value is less than the first reference function value, determine the sixth delay, and use the sixth delay to update the initial delay to obtain the updated initial delay, where the sixth delay is a delay inserted between zero and the opposite number of the initial delay.

In an exemplary embodiment, the third execution sub-module includes:

The second execution subunit is configured to perform the following interpolation steps cyclically until a loop stop condition is met, where the loop stop condition includes at least one of the following: the number of times the interpolation steps have been executed reaches a preset number, or the initial delay falls within a preset delay range; the initial delay at the end of the loop is the target delay:

Determine the first delay, where the first delay is the sum of the initial delay and the value obtained by multiplying the difference between the initial delay and the opposite number of the initial delay by 0.6;

When the first function value corresponding to the generalized cross-correlation phase transformation function and the first delay is greater than the first reference function value, determine the second delay, where the second delay is the sum of the initial delay and the value obtained by multiplying the difference between the first delay and the initial delay by 0.8;

When the first function value is smaller than the first reference function value and larger than the second reference function value, determine the third delay, where the third delay is the sum of the initial delay and the value obtained by multiplying the difference between the first delay and the initial delay by 1.2;

When the third function value is smaller than the first function value, determine the fourth delay, and use the fourth delay to update the initial delay to obtain the updated initial delay, where the fourth delay is the sum of the opposite number of the initial delay and the value obtained by multiplying the difference between the initial delay and the opposite number of the initial delay by 0.125;

When the first function value is less than the second reference function value, determine the fifth delay, where the fifth delay is the sum of the initial delay and the value obtained by multiplying the difference between the opposite number of the initial delay and the initial delay by 1.2;

When the fourth function value is less than the first reference function value, determine the sixth delay, and use the sixth delay to update the initial delay to obtain the updated initial delay, where the sixth delay is the sum of the opposite number of the initial delay and the value obtained by multiplying the difference between the initial delay and the opposite number of the initial delay by 0.125.
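As a numeric check of the first three coefficients, the sketch below computes the first, second and third delays for one initial delay, reading the formulas as τ₁ + 0.6 × (τ₁ − τ₁′), τ₁ + 0.8 × (first delay − τ₁) and τ₁ + 1.2 × (first delay − τ₁). The function name and the example value are hypothetical, and only the candidate delays are computed, not the full loop.

```python
def forward_candidates(tau1):
    """Return the first, second and third candidate delays for initial delay tau1."""
    tau1_neg = -tau1                         # opposite number of the initial delay
    first = tau1 + 0.6 * (tau1 - tau1_neg)   # step away from the initial delay
    second = tau1 + 0.8 * (first - tau1)     # between the initial and first delay
    third = tau1 + 1.2 * (first - tau1)      # beyond the first delay
    return first, second, third


first, second, third = forward_candidates(0.002)   # example: 2 ms initial delay
print(first, second, third)  # about 0.0044, 0.00392, 0.00488
```

For a positive initial delay the ordering is initial delay < second delay < first delay < third delay, which matches the description of the second delay as inserted between the initial delay and the first delay and the third delay as inserted between the first delay and the delay threshold, provided the threshold exceeds the third delay.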

In an exemplary embodiment, the execution unit includes:

The second determination module is configured to determine the second device as the target device when the target delay is positive;

The third determination module is configured to determine the first device as the target device when the target delay is negative.

It should be noted here that the examples and application scenarios implemented by the above modules and corresponding steps are the same, but are not limited to the contents disclosed in the above embodiments. It should be noted that the above module, as part of the device, can run in the hardware environment as shown in Figure 1, and can be implemented by software or hardware, where the hardware environment includes a network environment.

According to yet another aspect of the embodiments of the present disclosure, a storage medium is also provided. Optionally, in this embodiment, the above-mentioned storage medium can be used to store program code for executing any of the above-mentioned control methods of the smart device in the embodiments of the present disclosure.

Optionally, in this embodiment, the above storage medium may be located on at least one network device among multiple network devices in the network shown in the above embodiment.

Optionally, in this embodiment, the storage medium is configured to store program codes for performing the following steps:

S1, obtain the first sound signal received by the first device and the second sound signal received by the second device, where the first sound signal and the second sound signal are sound signals corresponding to the operation execution signal sent by the target object;

S2, determine the target cross power spectrum of the first sound signal and the second sound signal according to the first sound signal and the second sound signal;

S3. Determine the target delay between the first device and the second device according to the target cross power spectrum, where the target delay is the time difference between the operation execution signal arriving at the first device and the second device;

S4. According to the target delay, determine the target device from the first device and the second device, and control the target device to perform the device operation indicated by the operation execution signal, where the target device is the device closest to the target object.

Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments, which will not be described again in this embodiment.

Optionally, in this embodiment, the above-mentioned storage medium may include but is not limited to: a USB flash drive, a ROM, a RAM, a removable hard disk, a magnetic disk, an optical disk, and other media that can store program code.

According to yet another aspect of the embodiments of the present disclosure, an electronic device for implementing the above control method of an intelligent device is also provided. The electronic device may be a server, a terminal, or a combination thereof.

Figure 5 is a structural block diagram of an optional electronic device according to an embodiment of the present disclosure. As shown in Figure 5, it includes a processor 502, a communication interface 504, a memory 506 and a communication bus 508. The processor 502, the communication interface 504 and memory 506 complete communication with each other through communication bus 508, where,

Memory 506 configured to store computer programs;

The processor 502 is configured to implement the steps of the above control method of the smart device when executing the computer program stored in the memory 506.

Optionally, the communication bus may be a PCI (Peripheral Component Interconnect, Peripheral Component Interconnect Standard) bus, or an EISA (Extended Industry Standard Architecture, Extended Industrial Standard Architecture) bus, etc. The communication bus can be divided into address bus, data bus, control bus, etc. For ease of presentation, only one thick line is used in Figure 5, but it does not mean that there is only one bus or one type of bus. The communication interface is used for communication between the above-mentioned electronic device and other equipment.

The memory may include RAM or non-volatile memory, such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

As an example, the memory 506 may include, but is not limited to, the acquisition unit 402, the first determination unit 404, the second determination unit 406 and the execution unit 408 in the control device of the smart device. In addition, it may also include but is not limited to other modular units in the control device of the above-mentioned smart device, which will not be described again in this example.

The above-mentioned processor may be a general-purpose processor, which may include but is not limited to a CPU (Central Processing Unit), an NP (Network Processor), etc.; it may also be a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field-Programmable Gate Array) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

Those of ordinary skill in the art can understand that the structure shown in Figure 5 is only illustrative. The device that implements the above control method for smart devices can be a terminal device, and the terminal device can be a smartphone (such as an Android phone or an iOS phone), a tablet computer, a handheld computer, a mobile Internet device (MID), a PAD, or other terminal equipment. Figure 5 does not limit the structure of the above-mentioned electronic device. For example, the electronic device may also include more or fewer components (such as network interfaces, display devices, etc.) than shown in Figure 5, or have a different configuration from that shown in Figure 5.

Those of ordinary skill in the art can understand that all or part of the steps in the various methods of the above embodiments can be completed by a program instructing the hardware related to the terminal device. The program can be stored in a computer-readable storage medium, and the storage medium can include: a flash disk, a ROM, a RAM, a magnetic disk, an optical disk, etc.

The above serial numbers of the embodiments of the present disclosure are only for description and do not represent the advantages and disadvantages of the embodiments.

If the integrated units in the above embodiments are implemented in the form of software functional units and sold or used as independent products, they can be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence or in the part that contributes to the existing technology, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions to cause one or more computer devices (which can be personal computers, servers, network devices, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.

In the above-mentioned embodiments of the present disclosure, each embodiment is described with its own emphasis. For parts that are not described in detail in a certain embodiment, please refer to the relevant descriptions of other embodiments.

In the several embodiments provided by this disclosure, it should be understood that the disclosed client can be implemented in other ways. The device embodiments described above are only illustrative. For example, the division of the units is only a logical function division, and there may be other division methods in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be through some interfaces, and the indirect coupling or communication connection of the units or modules may be in electrical or other forms.

The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution provided in this embodiment.

In addition, each functional unit in various embodiments of the present disclosure may be integrated into one processing unit, or each unit may exist physically alone, or two or more units may be integrated into one unit. The above integrated units can be implemented in the form of hardware or software functional units.

The above are only preferred embodiments of the present disclosure. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principles of the present disclosure, and these improvements and modifications should also be regarded as falling within the scope of protection of the present disclosure.

Claims

A control method for smart devices, including:

Obtain the first sound signal received by the first device and the second sound signal received by the second device, wherein the first sound signal and the second sound signal are sound signals corresponding to the operation execution signal issued by the target object;

Determine target cross power spectra of the first sound signal and the second sound signal according to the first sound signal and the second sound signal;

According to the target cross power spectrum, a target delay between the first device and the second device is determined, wherein the target delay is the time difference between the time when the operation execution signal reaches the first device and the time when it reaches the second device;

According to the target delay, a target device is determined from the first device and the second device, and the target device is controlled to perform the device operation indicated by the operation execution signal, wherein the target device is the device closest to the target object.
The method according to claim 1, wherein determining the target cross power spectrum of the first sound signal and the second sound signal according to the first sound signal and the second sound signal includes:

Obtain the initial cross power spectrum of the first sound signal and the second sound signal;

Obtain a target reverberation gain value corresponding to the first device and the second device, wherein the target reverberation gain value is a gain value used to suppress reverberation noise;

The initial cross power spectrum is updated using the target reverberation gain value to obtain the target cross power spectrum.
The method according to claim 2, wherein said obtaining the target reverberation gain value corresponding to the first device and the second device includes:

Obtain a plurality of sound signals received by each of the first device and the second device, wherein the plurality of sound signals are sound signals corresponding to the operation execution signal;

Determine coherence functions of the plurality of sound signals to obtain a target coherence function;

Estimate the reverberation suppression coefficient of each device according to the target coherence function to obtain the target reverberation suppression coefficient;

The larger of the minimum reverberation gain value and the reverberation gain value corresponding to the target reverberation suppression coefficient is determined as the reverberation gain value of each device.
The method of claim 3, wherein the obtaining a plurality of sound signals received by each of the first device and the second device includes:

Two sound signals received by each device are obtained, wherein the two sound signals are sound signals received by different microphones of the same microphone array of each device and corresponding to the operation execution signal.
The method of claim 2, wherein the target reverberation gain value includes a first reverberation gain value corresponding to the first device and a second reverberation gain value corresponding to the second device; the step of using the target reverberation gain value to update the initial cross power spectrum to obtain the target cross power spectrum includes:

Multiply the product of the first reverberation gain value and the first frequency domain signal by the product of the second reverberation gain value and the conjugate of the second frequency domain signal to obtain reverberation reference information, wherein the first frequency domain signal is a frequency domain signal corresponding to the first sound signal, and the second frequency domain signal is a frequency domain signal corresponding to the second sound signal;

A weighted sum operation is performed on the initial cross power spectrum and the reverberation reference information to obtain the target cross power spectrum.
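The update described in this claim can be sketched as below. The function name, the weight `alpha`, and the form of the initial cross power spectrum (the first spectrum times the conjugate of the second) are assumptions for illustration; the claim only states that a weighted sum is taken.

```python
import numpy as np

def update_cross_power_spectrum(x1_f, x2_f, g1, g2, alpha=0.5):
    """Blend the initial cross power spectrum with the reverberation reference.

    x1_f, x2_f: frequency domain signals of the first and second sound signals.
    g1, g2: first and second reverberation gain values (scalar or per bin).
    alpha: ASSUMED weight of the initial spectrum in the weighted sum.
    """
    # ASSUMPTION: initial cross power spectrum is X1 * conj(X2).
    initial = x1_f * np.conj(x2_f)
    # Reverberation reference information: (G1 * X1) * (G2 * conj(X2)).
    reverb_ref = (g1 * x1_f) * (g2 * np.conj(x2_f))
    # Weighted sum of the two terms gives the target cross power spectrum.
    return alpha * initial + (1.0 - alpha) * reverb_ref
```

With both gains equal to 1 the result reduces to the initial cross power spectrum, which is a quick sanity check on the weighting.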
The method according to claim 1, wherein determining the target delay between the first device and the second device according to the target cross power spectrum includes:

According to the target cross power spectrum, determine a generalized cross-correlation phase transformation function corresponding to the first device and the second device, wherein the generalized cross-correlation phase transformation function is a function whose variable is the delay between the first device and the second device;

Find the delay that maximizes the function value of the generalized cross-correlation phase transformation function within a first delay range to obtain the target delay, wherein the first delay range is an interval whose endpoints are the additive inverse of a delay threshold and the delay threshold, and the delay threshold is the value obtained by dividing the distance between the first device and the second device by the speed of sound.
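A minimal sketch of this search, using the standard GCC-PHAT construction on a sampled lag grid, is shown below. The function name, the parameters `n_fft`, `fs`, `device_distance_m`, and the speed of sound of 343 m/s are assumptions; the cross spectrum is assumed to be the first spectrum times the conjugate of the second, so a positive result means the signal reached the first device later.

```python
import numpy as np

def gcc_phat_delay(cross_spectrum, n_fft, fs, device_distance_m,
                   speed_of_sound=343.0):
    """Return the delay (seconds) that maximizes GCC-PHAT within +/- d/c."""
    # PHAT weighting: discard magnitude, keep only the phase.
    weighted = cross_spectrum / (np.abs(cross_spectrum) + 1e-12)
    # Inverse transform: index k holds lag +k, index n_fft - k holds lag -k.
    cc = np.fft.irfft(weighted, n=n_fft)
    # Delay threshold = device distance / speed of sound, in samples.
    threshold_samples = int(np.floor(device_distance_m / speed_of_sound * fs))
    max_lag = min(threshold_samples, n_fft // 2 - 1)
    # Restrict the search to the first delay range [-threshold, +threshold].
    lags = np.arange(-max_lag, max_lag + 1)
    values = cc[lags % n_fft]
    return lags[np.argmax(values)] / fs
```

Limiting the search to the physically possible range rejects spurious correlation peaks that would imply a path difference longer than the spacing between the two devices.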
The method according to claim 6, wherein said finding the delay within the first delay range that maximizes the function value of the generalized cross-correlation phase transformation function to obtain the target delay includes:

Randomly select a delay within a second delay range to obtain a random delay, wherein the second delay range is an interval with zero and the delay threshold as endpoints;

Determine a first reference function value corresponding to the generalized cross-correlation phase transformation function and the random delay, and a second reference function value corresponding to the generalized cross-correlation phase transformation function and the additive inverse of the random delay;

Determine, as the initial delay, the delay corresponding to the larger of the first reference function value and the second reference function value;

Based on the initial delay and the additive inverse of the initial delay, perform an interpolation operation within the first delay range to obtain the target delay, wherein the target delay is the delay, among the initial delay and the delays inserted by the interpolation operation, that maximizes the function value of the generalized cross-correlation phase transformation function.
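The selection of the initial delay in this claim can be sketched as follows. The function name `initial_delay` and the callable `gcc` (any function mapping a delay to a generalized cross-correlation value) are hypothetical; the interpolation step that follows is not shown, and a tie between the two reference values is resolved toward the positive delay as an assumption.

```python
import random

def initial_delay(gcc, delay_threshold, rng=random):
    """Pick the initial delay from a random probe and its additive inverse."""
    # Random delay in the second delay range [0, delay_threshold].
    tau = rng.uniform(0.0, delay_threshold)
    # First and second reference function values.
    first_ref = gcc(tau)
    second_ref = gcc(-tau)
    # Keep whichever probe gives the larger function value.
    return tau if first_ref >= second_ref else -tau
```

Probing a delay and its additive inverse decides which half of the first delay range, and therefore which sign of the delay, the later interpolation should start from.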
The method according to claim 7, wherein performing the interpolation operation within the first delay range based on the initial delay and the additive inverse of the initial delay to obtain the target delay includes:

Execute the following interpolation steps cyclically until a loop stop condition is met, wherein the loop stop condition includes at least one of the following: the number of interpolation steps performed reaches a preset number; the initial delay is within a preset delay range; and wherein the initial delay at the end of the loop is the target delay:

Determine a first delay, wherein the first delay is a delay inserted between the initial delay and the delay threshold;

When the first function value corresponding to the generalized cross-correlation phase transformation function and the first delay is greater than the first reference function value, determine a second delay, wherein the second delay is a delay inserted between the initial delay and the first delay;

When the second function value corresponding to the generalized cross-correlation phase transformation function and the second delay is greater than the first function value, use the second delay to update the initial delay to obtain the updated initial delay;

When the second function value is less than the first function value, use the first delay to update the initial delay to obtain the updated initial delay;

When the first function value is less than the first reference function value and greater than the second reference function value, determine a third delay, wherein the third delay is a delay inserted between the first delay and the delay threshold;

When the third function value corresponding to the generalized cross-correlation phase transformation function and the third delay is greater than the first function value, use the third delay to update the initial delay to obtain the updated initial delay;

When the third function value is less than the first function value, determine a fourth delay and use the fourth delay to update the initial delay to obtain the updated initial delay, wherein the fourth delay is a delay inserted between zero and the additive inverse of the initial delay;

When the first function value is less than the second reference function value, determine a fifth delay, wherein the fifth delay is a delay inserted between the additive inverse of the delay threshold and the additive inverse of the initial delay;

When the fourth function value corresponding to the generalized cross-correlation phase transformation function and the fifth delay is greater than the first reference function value, use the fifth delay to update the initial delay to obtain the updated initial delay;

When the fourth function value is less than the first reference function value, determine a sixth delay and use the sixth delay to update the initial delay to obtain the updated initial delay, wherein the sixth delay is a delay inserted between zero and the additive inverse of the initial delay.
The method according to claim 7, wherein performing the interpolation operation within the first delay range based on the initial delay and the additive inverse of the initial delay to obtain the target delay includes:

Execute the following interpolation steps cyclically until a loop stop condition is met, wherein the loop stop condition includes at least one of the following: the number of interpolation steps performed reaches a preset number; the initial delay is within a preset delay range; and wherein the initial delay at the end of the loop is the target delay:

Determine a first delay, wherein the first delay is the sum of the initial delay and the value obtained by multiplying the difference between the initial delay and the additive inverse of the initial delay by 0.6;

When the first function value corresponding to the generalized cross-correlation phase transformation function and the first delay is greater than the first reference function value, determine a second delay, wherein the second delay is the sum of the initial delay and the value obtained by multiplying the difference between the first delay and the initial delay by 0.8;

When the second function value corresponding to the generalized cross-correlation phase transformation function and the second delay is greater than the first function value, use the second delay to update the initial delay to obtain the updated initial delay;

When the second function value is less than the first function value, use the first delay to update the initial delay to obtain the updated initial delay;

When the first function value is less than the first reference function value and greater than the second reference function value, determine a third delay, wherein the third delay is the sum of the initial delay and the value obtained by multiplying the difference between the first delay and the initial delay by 1.2;

When the third function value corresponding to the generalized cross-correlation phase transformation function and the third delay is greater than the first function value, use the third delay to update the initial delay to obtain the updated initial delay;

When the third function value is less than the first function value, determine a fourth delay and use the fourth delay to update the initial delay to obtain the updated initial delay, wherein the fourth delay is the sum of the additive inverse of the initial delay and the value obtained by multiplying the difference between the initial delay and the additive inverse of the initial delay by 0.125;

When the first function value is less than the second reference function value, determine a fifth delay, wherein the fifth delay is the sum of the initial delay and the value obtained by multiplying the difference between the additive inverse of the initial delay and the initial delay by 1.2;

When the fourth function value corresponding to the generalized cross-correlation phase transformation function and the fifth delay is greater than the first reference function value, use the fifth delay to update the initial delay to obtain the updated initial delay;

When the fourth function value is less than the first reference function value, determine a sixth delay and use the sixth delay to update the initial delay to obtain the updated initial delay, wherein the sixth delay is the sum of the additive inverse of the initial delay and the value obtained by multiplying the difference between the initial delay and the additive inverse of the initial delay by 0.125.
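The arithmetic behind the six candidate delays can be worked through as below. This is one literal reading of the claim wording, whose grouping of terms is ambiguous, so the formulas should be treated as an assumption; the function name `candidate_delays` is hypothetical, and no clamping to the first delay range is applied.

```python
def candidate_delays(tau0):
    """Candidate delays for one interpolation pass, given the initial delay."""
    neg = -tau0                    # additive inverse of the initial delay
    span = tau0 - neg              # initial delay minus its additive inverse
    first = tau0 + 0.6 * span
    second = tau0 + 0.8 * (first - tau0)
    third = tau0 + 1.2 * (first - tau0)
    fourth = neg + 0.125 * span
    fifth = tau0 + 1.2 * (neg - tau0)
    sixth = neg + 0.125 * span     # same expression as the fourth delay
    return {
        "first": first, "second": second, "third": third,
        "fourth": fourth, "fifth": fifth, "sixth": sixth,
    }
```

Under this reading, an initial delay of 1.0 (in any time unit) gives a first delay of 2.2, a second of 1.96, a third of 2.44, a fourth and sixth of -0.75, and a fifth of -1.4.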
The method according to any one of claims 1 to 9, wherein determining the target device from the first device and the second device according to the target delay includes:

When the target delay is positive, determine the second device as the target device;

When the target delay is negative, the first device is determined as the target device.
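The decision rule in this claim amounts to a sign test, sketched below. The function name is hypothetical, and returning `None` for a delay of exactly zero is an assumption, since the claim does not cover that case.

```python
def select_target_device(target_delay, first_device, second_device):
    """Choose the device closest to the target object from the delay sign.

    target_delay is the arrival time at the first device minus the arrival
    time at the second device.
    """
    if target_delay > 0:
        # The signal reached the second device earlier, so it is closer.
        return second_device
    if target_delay < 0:
        # The signal reached the first device earlier, so it is closer.
        return first_device
    # ASSUMPTION: a zero delay is left undecided.
    return None
```

For example, `select_target_device(0.0003, "speaker_a", "speaker_b")` selects the second device, because a positive delay means the first device heard the command later.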
A computer-readable storage medium includes a stored program, wherein the method of any one of claims 1 to 10 is executed when the program is run.
An electronic device includes a memory and a processor, a computer program is stored in the memory, and the processor is configured to execute the method according to any one of claims 1 to 10 through the computer program.