WO2018188287A1 - Voice control method, device and home appliance - Google Patents

Voice control method, device and home appliance

Info

Publication number
WO2018188287A1
WO2018188287A1 PCT/CN2017/104905 CN2017104905W WO2018188287A1 WO 2018188287 A1 WO2018188287 A1 WO 2018188287A1 CN 2017104905 W CN2017104905 W CN 2017104905W WO 2018188287 A1 WO2018188287 A1 WO 2018188287A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
information
voice information
target
control
Prior art date
Application number
PCT/CN2017/104905
Other languages
English (en)
French (fr)
Inventor
张新健
Original Assignee
广东美的制冷设备有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Priority claimed from CN201710233667.XA (CN107123421A)
Priority claimed from CN201710482779.9A (CN107202385B)
Priority claimed from CN201710493300.1A (CN107271963A)
Application filed by 广东美的制冷设备有限公司
Publication of WO2018188287A1

Classifications

    • A HUMAN NECESSITIES
    • A01 AGRICULTURE; FORESTRY; ANIMAL HUSBANDRY; HUNTING; TRAPPING; FISHING
    • A01M CATCHING, TRAPPING OR SCARING OF ANIMALS; APPARATUS FOR THE DESTRUCTION OF NOXIOUS ANIMALS OR NOXIOUS PLANTS
    • A01M29/00 Scaring or repelling devices, e.g. bird-scaring apparatus
    • A01M29/16 Scaring or repelling devices, e.g. bird-scaring apparatus using sound waves
    • A01M29/18 Scaring or repelling devices, e.g. bird-scaring apparatus using sound waves using ultrasonic signals
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/20 Position of source determined by a plurality of spaced direction-finders
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue

Definitions

  • the present invention relates to the field of smart homes, and in particular, to a voice control method, device and home appliance.
  • Household appliances such as air conditioners and electric fans, as living appliances, are increasingly appearing in people's daily lives.
  • Most of the existing control of home appliances is through a matching remote controller or by installing an application program in the mobile terminal. Users can't get rid of the dependence on external control devices, and the experience is poor.
  • The embodiments of the present invention provide a voice control method, device, and home appliance to implement intelligent control of a home appliance, so that the user can control the home appliance merely by issuing voice information, thereby relieving the user's dependence on the remote controller.
  • a first aspect of the present invention provides a voice control method, including:
  • the home appliance is controlled to operate in accordance with the target control command and the position information.
  • The voice control method of the embodiment of the invention collects the voice information sent by the sound source, identifies the voice information to determine the corresponding target control command, obtains the position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control command and the position information. Intelligent control of the home appliance can thereby be realized, and the user can control the home appliance merely by issuing voice information, which relieves the user's dependence on the remote controller and improves the user experience.
  • the second aspect of the present invention provides a voice control apparatus, including:
  • a microphone array for collecting voice information from a sound source
  • a voice recognition module configured to identify the voice information, and determine a target control instruction corresponding to the voice information
  • a positioning module configured to acquire location information of the sound source relative to the microphone array
  • control module configured to control the home appliance to operate according to the target control instruction and the location information.
  • The voice control device of the embodiment of the invention collects the voice information sent by the sound source, identifies the voice information to determine the corresponding target control command, obtains the position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control command and the position information. Intelligent control of the home appliance can thereby be realized, and the user can control the home appliance merely by issuing voice information, which relieves the user's dependence on the remote controller and improves the user experience.
  • a third aspect of the present invention provides a home appliance device comprising: the voice control device as described above.
  • The household electrical appliance of the embodiment of the present invention collects the voice information sent by the sound source, identifies the voice information to determine a corresponding target control command, acquires position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control command and the position information.
  • Intelligent control of the home appliance can thereby be realized, and the user can control the home appliance merely by issuing voice information, which relieves the user's dependence on the remote controller and improves the user experience.
  • a fourth aspect of the present invention provides a computer program product for performing a voice control method as described in the first aspect when instructions in the computer program product are executed by a processor.
  • the fifth aspect of the present invention provides a computer readable storage medium having stored thereon a computer program capable of implementing the voice control method as described in the first aspect when the computer program is executed by the processor.
  • FIG. 1 is a schematic flowchart of a voice control method according to an embodiment of the present invention
  • FIG. 2 is a schematic flowchart of a voice control method according to another embodiment of the present invention.
  • FIG. 3 is a schematic flowchart of a voice control method according to another embodiment of the present invention.
  • FIG. 4 is a schematic flowchart of a voice control method according to another embodiment of the present invention.
  • FIG. 5 is a schematic diagram of a signal model and an array model.
  • FIG. 6 is a schematic flowchart diagram of a voice control method according to another embodiment of the present invention.
  • FIG. 7 is a schematic diagram of a scene for acquiring sound source position information according to an example of the present invention.
  • FIG. 8 is a schematic diagram of coordinates of acquiring sound source position information according to an example of the present invention.
  • FIG. 9 is a flow chart showing the acquisition of sound source position information according to an example of the present invention.
  • FIG. 10 is a schematic flowchart diagram of a voice control method according to another embodiment of the present invention.
  • FIG. 11 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present invention.
  • FIG. 12 is a schematic structural diagram of a household electrical appliance according to an embodiment of the present invention.
  • FIG. 13 is a system architecture diagram of a household electrical appliance according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart diagram of a voice control method according to an embodiment of the present invention.
  • the voice control method includes the following steps:
  • the household electrical appliance may include, but is not limited to, an air conditioner, a fan, and the like.
  • the home appliance can collect voice information sent by the sound source (user) based on the built-in microphone array.
  • The microphone array may have any topology, such as a linear array, a circular array, a spherical array, or the like.
  • The microphone array may be a linear microphone array composed of a plurality of mutually independent microphones with the same characteristics. All the microphones lie on the same straight line, each microphone has the same orientation, and the interval between any two adjacent microphones is the same preset distance.
  • S12 Identify the voice information, and determine a target control instruction corresponding to the voice information.
  • the voice information sent by the sound source is collected through the microphone array, the voice information is identified, and the target control instruction corresponding to the voice information is determined.
  • the target control instruction can be determined by querying the voice template library.
  • the speech recognition model may be utilized to determine the target control command. The specific process of determining the target control command in two ways will be given in the following content. To avoid redundancy, it will not be described in detail here.
  • the location information may be angle information and/or distance information of the sound source relative to the microphone array.
  • the position information of the sound source relative to the microphone array can also be obtained.
  • a beamforming algorithm may be used to obtain location information of the sound source.
  • Steps S12 and S13 may be executed in either order or at the same time. The order described here is only an example and is not intended to limit the invention.
  • The control system of the home appliance can control the home appliance to operate according to the target control command and the position information, that is, the home appliance is controlled to execute the target control command toward the position where the sound source is located.
  • For example, the microphone array built into the air conditioner collects the voice information, and the target control command corresponding to the voice information is determined to be "turn down the wind". If the position information of the sound source obtained by the processing device built into the air conditioner is an angle of 30° to the left side of the air conditioner, the control system built into the air conditioner controls the air conditioner to blow toward 30° on the left side and reduces the wind power of the air conditioner.
  • The voice control method of the embodiment collects voice information sent by the sound source, identifies the voice information to determine a corresponding target control command, obtains position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control command and the position information.
  • intelligent control of the home appliance can be realized, and the user can control the home appliance only by issuing voice information, thereby relieving the user's dependence on the remote controller and improving the user experience.
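The air conditioner example above can be sketched as a small state update. This is only an illustrative sketch: the `ApplianceState` fields, the command names `"reduce_wind"` and `"increase_wind"`, the fan level range and the sign convention for the angle (negative means left of the appliance) are assumptions, not part of the patent text.

```python
from dataclasses import dataclass


@dataclass
class ApplianceState:
    """Hypothetical air conditioner state; field names are illustrative."""
    fan_level: int = 3               # assumed range: 1 (weakest) to 5 (strongest)
    louver_angle_deg: float = 0.0    # assumed: negative = left of the appliance


def apply_command(state, command, source_angle_deg):
    """Aim the airflow at the sound source, then execute the target control command."""
    new_level = state.fan_level
    if command == "reduce_wind":
        new_level = max(1, state.fan_level - 1)
    elif command == "increase_wind":
        new_level = min(5, state.fan_level + 1)
    return ApplianceState(fan_level=new_level, louver_angle_deg=source_angle_deg)


# Sound source located 30 degrees to the left, command "turn down the wind".
state = apply_command(ApplianceState(), "reduce_wind", -30.0)
print(state)
```

Keeping the command and the position as two separate inputs mirrors the text: recognition and localization are independent steps whose results are only combined at the control stage.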
  • the embodiment of the present invention determines the target control instruction by identifying the voice information.
  • One of the methods is to determine the target control command by querying the voice template library.
  • The implementation process of the method includes: obtaining the target content of the target voice information, and determining whether the preset voice template library contains a quasi-control command corresponding to the target content; if the voice template library contains such a quasi-control command, the quasi-control command is used as the target control command.
  • another voice control method provided by an embodiment of the present invention may include the following steps:
  • The home appliance may first perform analog-to-digital conversion on the voice information, converting the analog voice information into digital voice information for use in subsequent speech processing.
  • a high-precision analog-to-digital conversion processing chip can perform high-speed analog-to-digital conversion on the multi-channel analog voice information collected by the microphone array to obtain digital voice information.
  • the interference information includes but is not limited to noise information and echo information.
  • The obtained multi-channel digital voice information may be further processed to eliminate the interference information in it; for example, the multi-channel digital voice information is subjected to noise cancellation processing and echo cancellation processing to obtain the target voice information.
  • the digital audio processing chip may be used to process the obtained multi-channel digital voice information to eliminate interference information and obtain target voice information.
  • the target content in the target voice information may be further acquired.
  • the target content can be acquired by using a related speech recognition technology.
  • The target content consists of commonly used control words, including but not limited to small, high, closed, open, reduced, and the like.
  • The quasi-control command is a control command whose degree of matching with the target content reaches a preset threshold.
  • Identifiable control commands may be stored in the home appliance in advance. After a control command issued by the user is received, it is matched against the pre-stored control commands, and the matched control command is executed to control the home appliance.
  • The voice template library may be preset. It stores a plurality of control commands that can be recognized and executed by the home appliance, and the control effect achieved by executing each control command is different. After the target content in the target voice information is acquired, the target content is matched with all control commands in the voice template library to determine whether there is a quasi-control command corresponding to the target content in the voice template library.
  • When the degree of matching between the target content and a control command in the voice template library reaches the preset threshold, that control command is the quasi-control command.
  • voice template library and the preset threshold in this embodiment may be set by the manufacturer before the home appliance is shipped from the factory.
  • If it is determined that there is a quasi-control command corresponding to the target content in the voice template library, step S26 is performed; if it is determined that there is no quasi-control command corresponding to the target content in the voice template library, step S29 is performed.
  • the quasi-control command is used as the target control command.
  • The quasi-control command is used as the target control command for controlling the home appliance.
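The template lookup described above can be sketched as follows. The text does not say how the degree of matching is computed, so this sketch assumes a string similarity ratio from Python's standard `difflib`. The template entries, the instruction identifiers and the 0.8 threshold are all hypothetical.

```python
from difflib import SequenceMatcher

# Hypothetical voice template library: recognized text -> control command identifier.
VOICE_TEMPLATES = {
    "turn on": "POWER_ON",
    "turn off": "POWER_OFF",
    "reduce the wind": "FAN_DOWN",
    "increase the wind": "FAN_UP",
}


def find_quasi_control_command(target_content, threshold=0.8):
    """Return the best-matching command if its similarity reaches the threshold, else None."""
    best_command, best_score = None, 0.0
    for template, command in VOICE_TEMPLATES.items():
        score = SequenceMatcher(None, target_content.lower(), template).ratio()
        if score > best_score:
            best_command, best_score = command, score
    return best_command if best_score >= threshold else None


print(find_quasi_control_command("reduce the wind"))
print(find_quasi_control_command("what time is it"))
```

Returning `None` when nothing reaches the threshold corresponds to the branch in which the appliance performs no operation.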
  • the target voice information is processed by using a preset beamforming algorithm to obtain location information.
  • the target voice information may be processed by using a preset beamforming algorithm to obtain location information of the sound source.
  • The beamforming algorithm is a steerable beamforming algorithm based on the maximum output power. The basic idea is that the signals collected by each array element are weighted and summed to form a beam. The beam is steered by searching over the possible positions of the sound source, and the weights are modified so that the output signal power of the microphone array is the largest.
  • the beamforming algorithm can be used in both the time domain and the frequency domain, and has strong applicability.
  • Step S27 and steps S24-S26 are in no particular order: the step of acquiring the position information and the step of acquiring the target control command may be performed simultaneously or sequentially. This embodiment places step S27 after step S26 only for the purpose of explanation, which is not intended to limit the present invention.
  • After the position information of the sound source and the target control command corresponding to the voice information sent by the sound source are acquired, the home appliance can be controlled to operate according to the target control command and the position information.
  • The acquired position information is further output to the home appliance to control the home appliance to operate according to the target control command and the position information. That is to say, whether a quasi-control command is recognized can be used as the condition for deciding whether to output the position information to the home appliance.
  • When the quasi-control command is recognized, the position information is output to the home appliance; otherwise, no operation is performed.
  • Outputting the position information to the home appliance only when the quasi-control command is recognized, and performing no operation otherwise, improves the accuracy of voice control and prevents the home appliance from degrading the user experience by executing unnecessary control commands.
  • the home appliance when it is determined that there is no quasi-control instruction corresponding to the target content in the voice template library, the home appliance does not perform any operation, that is, does not respond to the voice information sent by the sound source.
  • The voice control method in this embodiment obtains digital voice information by performing analog-to-digital conversion on the voice information collected by the microphone array, and eliminates the interference information in the digital voice information to obtain the target voice information, thereby removing the influence of the interference information on the target voice information and improving the accuracy of subsequent processing.
  • The target voice information is processed by using a preset beamforming algorithm to obtain the position information, the target content of the target voice information is obtained, and it is determined whether the preset voice template library contains a quasi-control command corresponding to the target content.
  • If there is a quasi-control command, it is used as the target control command, the acquired position information is output to the home appliance, and the home appliance is controlled to operate according to the target control command and the position information.
  • If there is no quasi-control command, no operation is performed. This improves the accuracy of voice control and prevents the home appliance from degrading the user experience by executing unnecessary control commands.
  • steps S24-S26 can be replaced with the following steps:
  • the speech recognition model can be obtained through training, and the speech recognition model can be used to obtain a control command that can be recognized and executed by the home appliance.
  • Pre-defined control commands spoken in Mandarin by different people can be recorded to obtain recording samples as training samples, and the training samples are then used to train a neural network model to obtain the speech recognition model.
  • Pre-defined control commands spoken in the dialects of different regions can also be recorded to obtain recording samples with different accents as training samples, and the training samples are used to train a neural network model to obtain the speech recognition model.
  • The target voice information is input into the pre-trained speech recognition model, and the speech recognition model analyzes the target voice information to determine whether it is a quasi-control command. If inputting the target voice information into the speech recognition model yields a control command that can be recognized and executed by the home appliance, the target voice information is regarded as a quasi-control command; otherwise, the target voice information is not a quasi-control command, and no operation is performed.
  • the quasi-control command is a control command corresponding to the target voice information in the control command that can be recognized and executed by the home appliance.
  • the quasi-control command is used as the target control command.
  • the voice recognition model recognizes that the target voice information is a quasi-control command
  • the quasi-control command is used as the target control command.
  • The voice control method of the embodiment inputs the target voice information into the speech recognition model and, when the speech recognition model recognizes that the target voice information is a quasi-control command, uses the quasi-control command as the target control command, which can improve the accuracy of voice control.
  • FIG. 4 is a schematic flowchart diagram of a voice control algorithm according to another embodiment of the present invention.
  • step S27 may include the following steps:
  • the reference microphone is a microphone in the microphone array.
  • The signal received by each microphone is different because the sound paths differ.
  • The received signals therefore need to be compensated.
  • The relative time difference of the signal received by each microphone can be calculated and used for compensation, so that each channel is aligned with the signal of the reference microphone.
  • The microphone array is composed of M microphones, denoted $m_1, m_2, \ldots, m_M$, and the intervals between adjacent microphones are equal, namely d.
  • There is at least one sound source in the space which is a narrow-band sound source or a single-frequency signal sound source.
  • The sound source is located far away from the microphones, so that the sound wave arriving at the microphones can be approximated as a plane wave. The direction angle of the plane wave received by each microphone is therefore the same, namely the direction of arrival, which is recorded as $\theta$.
  • The time at which the other microphones receive the voice information is delayed or advanced by a certain amount with respect to $m_1$, so that a certain phase difference is generated.
  • The time delay $t_i$ at which the i-th microphone receives the speech information with respect to $m_1$ can be calculated according to formula (1), which is expressed as follows:
  • Here $\lambda$ is the wavelength and $d \le \lambda/2$, that is, d satisfies the "half-wavelength" principle.
  • the relative time difference of the target voice information in each microphone in the microphone array relative to the reference microphone can also be calculated by the formula (1).
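Formula (1) itself did not survive in this text. As a hedged illustration of the far-field model described above, the sketch below assumes the common convention in which the direction of arrival is measured from the array axis, so that the delay of the i-th microphone relative to $m_1$ is $(i-1)\,d\cos\theta/c$. The convention, the function names and the example numbers are assumptions, and the patent's own formula may use a different angle reference.

```python
import math

SPEED_OF_SOUND = 340.0  # m/s in air, the value quoted later in the text


def ula_delays(num_mics, spacing_m, doa_deg, c=SPEED_OF_SOUND):
    """Delay of each microphone relative to m_1 for a far-field plane wave.

    Assumed convention: t_i = (i - 1) * d * cos(theta) / c, theta measured from the array axis.
    """
    theta = math.radians(doa_deg)
    return [i * spacing_m * math.cos(theta) / c for i in range(num_mics)]


def satisfies_half_wavelength(spacing_m, freq_hz, c=SPEED_OF_SOUND):
    """True if the spacing d is at most half the wavelength at the given frequency."""
    return spacing_m <= (c / freq_hz) / 2.0


delays = ula_delays(num_mics=4, spacing_m=0.04, doa_deg=60.0)
print(delays)
print(satisfies_half_wavelength(0.04, 4000.0))
```

The half-wavelength check matters because a larger spacing makes two different arrival directions produce the same phase differences, so the direction can no longer be resolved unambiguously.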
  • S272. Process target voice information of each microphone according to a relative time difference and a preset weighted correlation function.
  • The target voice information of each microphone is processed according to a preset weighted correlation function, and each processed channel of target voice information can be expressed as $w_i x_i(t - t_i)$.
  • S273 Sum the processed channels of target voice information to form beam information, and obtain the output power of the beam information.
  • the beam information can be expressed as a vector form as shown in the formula (3).
  • $x(t)$ is a matrix composed of the speech information $x_i(t)$ received by the microphone array
  • $W$ is a matrix including the weighted correlation function $w_i$ and the delay value $t_i$
  • $W^H$ represents the conjugate transpose of the matrix $W$
  • When calculating the output power of the beam information, it is necessary to introduce a snapshot model. Assuming that each microphone receives N snapshots of data and $N \to \infty$, the output power of the beam information can be calculated by Equation (4).
  • x(n) is a matrix composed of voice information received by the microphone array.
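Formulas (3) and (4) are not legible in this text. Based only on the definitions given above (the per-channel term $w_i x_i(t - t_i)$, the matrices $x(t)$ and $W$, the conjugate transpose $W^H$, and N snapshots $x(n)$), a weighted delay-and-sum beam and its output power are commonly written as below. This is a reconstruction under those assumptions, not the patent's verified wording, and $\hat{R}$ is a symbol introduced here for the sample covariance matrix.

```latex
% Beam information: weighted, delay-compensated sum of the M channels
y(t) = \sum_{i=1}^{M} w_i \, x_i(t - t_i) = W^{H} x(t)

% Output power estimated from N snapshots
P = \frac{1}{N} \sum_{n=1}^{N} \left| W^{H} x(n) \right|^{2}
  = W^{H} \hat{R} \, W ,
\qquad
\hat{R} = \frac{1}{N} \sum_{n=1}^{N} x(n) \, x^{H}(n)
```

Writing the power as $W^{H}\hat{R}\,W$ shows why the covariance only needs to be estimated once, while the weights are varied during the search for the strongest beam.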
  • the beam with the largest output power is obtained, which is the target beam signal.
  • $W = a(k')$
  • S275 Search for a spatial point corresponding to the target beam signal, and use the location information of the spatial point as the location information of the sound source.
  • the spatial point corresponding to the target beam signal is searched, and the position information of the searched spatial point is the position information of the sound source.
  • The relative time difference of the target voice information of each microphone in the microphone array relative to the reference microphone is obtained, the target voice information is processed according to the relative time difference and the preset weighted correlation function, and the processed channels are then summed to form the beam information and obtain its output power.
  • The weights of the weighted correlation function are adjusted to obtain the target beam signal with the largest output power, and the spatial point corresponding to the target beam signal is searched for and used as the position information of the sound source, which can improve the accuracy of position information recognition.
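The search for the beam with the largest output power can be sketched with NumPy for a single-frequency source. This is a minimal sketch, assuming a uniform linear array, a far-field source, and the angle convention in which the direction of arrival is measured from the array axis. The function names, the candidate grid and the synthetic test signal are all hypothetical.

```python
import numpy as np


def steering_vector(num_mics, spacing_m, freq_hz, doa_deg, c=340.0):
    """Narrowband steering vector of a uniform linear array (assumed cos convention)."""
    delays = np.arange(num_mics) * spacing_m * np.cos(np.radians(doa_deg)) / c
    return np.exp(-2j * np.pi * freq_hz * delays)


def estimate_doa(snapshots, spacing_m, freq_hz, candidates_deg):
    """Return the candidate angle whose delay-and-sum beam has the largest output power."""
    num_mics, num_snapshots = snapshots.shape
    cov = snapshots @ snapshots.conj().T / num_snapshots   # sample covariance matrix
    powers = []
    for angle in candidates_deg:
        w = steering_vector(num_mics, spacing_m, freq_hz, angle) / num_mics
        powers.append(np.real(w.conj() @ cov @ w))          # P = W^H R W
    return candidates_deg[int(np.argmax(powers))]


# Synthetic check: 2 kHz source at 60 degrees, 4 microphones 4 cm apart, light noise.
rng = np.random.default_rng(0)
true_doa, freq, d, M, N = 60.0, 2000.0, 0.04, 4, 200
source = np.exp(2j * np.pi * rng.random(N))                 # unit-amplitude, random phase
noise = 0.05 * (rng.standard_normal((M, N)) + 1j * rng.standard_normal((M, N)))
x = np.outer(steering_vector(M, d, freq, true_doa), source) + noise
estimated = estimate_doa(x, d, freq, np.arange(0.0, 181.0, 1.0))
print(estimated)
```

A one-degree grid is used only to keep the example short. A real search would trade grid resolution against computation, and a linear array alone resolves direction rather than a full spatial point.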
  • step S27 can be replaced with the following steps:
  • the preset parameter is set according to a signal to noise ratio of the voice information.
  • the microphone array can include M microphones, as shown in FIG.
  • A microphone array including four microphones is taken as an example. Each microphone can receive corresponding voice information, the relative positions of the microphones are predetermined, and the microphones are independent of each other.
  • the voice information received by the microphone array includes a voice signal and a noise signal.
  • The voice information received by the microphone array may include a voice signal input by the user, an environmental noise signal, a reverberation noise signal, and the like.
  • the preset parameter is introduced when calculating the relative time difference of the voice information reaching any two microphones.
  • The preset parameter can be set according to the signal-to-noise ratio of the voice information. When the signal-to-noise ratio is within a certain range, the preset parameter can be positively correlated with the signal-to-noise ratio, that is, the larger the signal-to-noise ratio, the larger the preset parameter value.
  • The first voice information and the second voice information of a first microphone and a second microphone among the M microphones are acquired, Fourier transform is performed on the first voice information and the second voice information to generate a first Fourier transform value and a second Fourier transform value, and a relative time difference is generated according to the first Fourier transform value, the second Fourier transform value, and the preset parameter.
  • the first microphone and the second microphone are any two microphones among the M microphones.
  • the first microphone is the microphone 1
  • the second microphone is the microphone 2.
  • The first voice information $x_1(t)$ received by microphone 1 and the second voice information $x_2(t)$ received by microphone 2 are acquired, the first voice information $x_1(t)$ and the second voice information $x_2(t)$ are Fourier transformed to generate a first Fourier transform value $X_1(\omega)$ and a second Fourier transform value $X_2(\omega)$, and the relative time difference can be generated by the following formula (5):
  • the peak position of $R_{12}(\tau)$ is the relative time difference
  • $\psi_{12}(\omega)$ is the generalized cross-correlation weighting function
  • $G_{12}(\omega)$ is the cross-power spectrum between the first Fourier transform value and the second Fourier transform value
  • $\Phi_{12}(\omega)$ is the generalized cross-correlation spectrum
  • $G_{12}(\omega) = X_1(\omega) X_2^{*}(\omega)$.
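The relationship between the cross-power spectrum, the weighting function and the peak of the generalized cross-correlation can be sketched with NumPy. Formula (5) is not legible here, so this follows the textbook generalized cross-correlation procedure rather than the patent's exact expression. The `weight_exponent` parameter and the weighting $1/|G_{12}|^{\text{exponent}}$ are illustrative assumptions (1.0 gives phase-transform style weighting, 0.0 gives plain cross-correlation).

```python
import numpy as np


def gcc_delay(x1, x2, fs, weight_exponent=1.0, eps=1e-12):
    """Estimate how much x1 lags x2 (seconds) from the peak of the weighted cross-correlation."""
    n = len(x1) + len(x2)                       # zero-pad to avoid circular wrap-around
    X1 = np.fft.rfft(x1, n=n)
    X2 = np.fft.rfft(x2, n=n)
    G12 = X1 * np.conj(X2)                      # cross-power spectrum
    weighted = G12 / (np.abs(G12) ** weight_exponent + eps)
    r12 = np.fft.irfft(weighted, n=n)           # generalized cross-correlation over lags
    max_shift = n // 2
    # Reorder so that index 0 corresponds to lag -max_shift and the last index to +max_shift.
    r12 = np.concatenate((r12[-max_shift:], r12[:max_shift + 1]))
    return (int(np.argmax(r12)) - max_shift) / fs


# Synthetic check: white noise delayed by 5 samples at 16 kHz.
fs = 16000
rng = np.random.default_rng(1)
s = rng.standard_normal(1024)
delayed = np.concatenate((np.zeros(5), s[:-5]))
tau = gcc_delay(delayed, s, fs)
print(tau * fs)
```

The estimate is limited to whole samples. At 16 kHz one sample corresponds to roughly 2 cm of sound path, which is why practical systems often interpolate around the peak.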
  • The ideal model of the speech information $x_1(t)$, $x_2(t)$ received by microphone 1 and microphone 2, which are a distance d apart (without considering the reverberation noise), is as shown in formula (6):
  • $n_{ir}(t)$ is the reverberation noise signal received by microphone i.
  • $N_{ie}(\omega)$ is the windowed Fourier transform of the ambient noise signal $n_{ie}(t)$
  • $S_i(\omega)$ is the Fourier transform value of the sound source signal received by microphone i.
  • Equation (12) Equation (12)
  • a preset parameter ⁇ 2 is introduced in the generalized cross-correlation weighting function used by the conventional CSP algorithm.
  • the generalized cross-correlation weighting function is expressed by the formula (13):
  • 0.707 ⁇ ⁇ ⁇ 1, ⁇ 2 varies by the signal-to-noise ratio, and ⁇ 2 satisfies the following equation (14):
  • represents the signal-to-noise ratio
  • ⁇ 0 , ⁇ 1 , ⁇ 0 , and ⁇ 1 are constants determined according to actual conditions, and ⁇ 1 > ⁇ 0 .
  • Introducing a preset parameter that changes with the signal-to-noise ratio gives strong resistance to reverberation composed of multi-path reflections of the voice, the sound generated by the operation of the device itself, the noise of other devices in the indoor environment, and the like. It copes better with noise and improves the calculation accuracy of the relative time difference (sound path difference) with which the voice information reaches the two microphones, which improves the accuracy of sound source localization and helps voice recognition control of air conditioners and fan-type home appliances.
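Equation (14) is not legible in this text. The text only states that the parameter is bounded, depends on constants chosen for the actual conditions, and grows with the signal-to-noise ratio within a certain range. One simple mapping with those properties is a clamped linear interpolation, sketched below. The function name, the decibel breakpoints and the linear shape are assumptions; only the bounds 0.707 and 1 are taken from the text.

```python
def preset_parameter(snr_db, low_value=0.707, high_value=1.0,
                     snr_low_db=0.0, snr_high_db=20.0):
    """Clamp-and-interpolate mapping from SNR to the preset parameter (assumed form)."""
    if snr_high_db <= snr_low_db:
        raise ValueError("snr_high_db must exceed snr_low_db")
    if snr_db <= snr_low_db:
        return low_value
    if snr_db >= snr_high_db:
        return high_value
    fraction = (snr_db - snr_low_db) / (snr_high_db - snr_low_db)
    return low_value + fraction * (high_value - low_value)


for snr in (-5.0, 10.0, 30.0):
    print(snr, preset_parameter(snr))
```

Clamping outside the interpolation range keeps the parameter within its stated bounds even when the estimated signal-to-noise ratio is extreme.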
  • the voice information can be located by the following formula (15):
  • $\tau_i$ is the relative time difference with which the voice information arrives at any two microphones of the M microphones, that is, the peak position of $R_{12}(\tau)$ in equation (5)
  • $m_{i1}$ and $m_{i2}$ are respectively the position vectors of the two microphones, and s represents the sound source position vector
  • c is the speed of sound in the current medium. For example, at 1 standard atmosphere and 15 ° C, the sound propagates in the air at a speed of 340 m/s.
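Formula (15) is not legible here, but the quantities it relates (two microphone position vectors, the source position vector and the speed of sound) determine the time difference geometrically. The sketch below computes that difference as the path-length difference divided by c. The sign convention (travel time to the first microphone minus travel time to the second) and the function name are assumptions.

```python
import math


def expected_tdoa(source, mic_a, mic_b, c=340.0):
    """Time difference of arrival in seconds: travel time to mic_a minus travel time to mic_b."""
    return (math.dist(source, mic_a) - math.dist(source, mic_b)) / c


mic_1 = (-0.05, 0.0, 0.0)
mic_2 = (0.05, 0.0, 0.0)
print(expected_tdoa((0.0, 3.0, 0.0), mic_1, mic_2))   # source on the perpendicular bisector
print(expected_tdoa((1.0, 0.0, 0.0), mic_1, mic_2))   # source on the microphone axis
```

A localization routine can compare such predicted differences with the measured ones for every microphone pair and keep the candidate position with the smallest mismatch.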
  • the three-dimensional spatial geometry of any two microphones and sound source locations in the microphone array is as shown in FIG. 8.
  • the microphone 1 and the microphone 2 are on the x-axis, and the midpoint of the line is the origin.
  • The time difference (i.e., the difference in sound path) from the source to the two microphones is $\tau_i$.
  • The spherical coordinate of the sound source S is $(r, \theta, \varphi)$, and the sound source, microphone 1 and microphone 2 are converted into a Cartesian coordinate system:
  • equation (16) can be approximated as:
  • the angle ⁇ can be approximated.
  • a cone defined by this angle can be used to indicate the possible positions of the sound source. Therefore, as long as the sound path difference τ_i is obtained, the direction angle of the sound source relative to the midpoint of the line connecting the two microphones can be approximated. That is, two microphones yield one surface of possible sound source positions. With an array of M microphones, several such surfaces are obtained, and their intersection is the position of the sound source.
  • in practice, the surfaces often do not all intersect at a single point, so the point with the smallest distance to all of the surfaces is taken as the estimated sound source position.
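The idea of picking the point that best agrees with all pairwise time differences can be sketched as a brute-force search. This is a minimal illustration, not the patent's formula (15), which is not reproduced in this text: it assumes the standard relation that the time difference for a microphone pair equals the difference of the two source-to-microphone distances divided by the speed of sound, and the function names and the candidate grid are hypothetical.

```python
import numpy as np

SPEED_OF_SOUND = 340.0  # m/s in air at about 15 °C, as quoted in the text


def predicted_time_difference(source, mic_a, mic_b, c=SPEED_OF_SOUND):
    """Arrival-time difference (seconds) of one source at two microphones."""
    return (np.linalg.norm(source - mic_a) - np.linalg.norm(source - mic_b)) / c


def locate_source(mics, measured, candidates, c=SPEED_OF_SOUND):
    """Return the candidate whose predicted time differences best match.

    mics:       (M, D) array of microphone positions in metres
    measured:   dict mapping a microphone pair (i, j) to its measured time difference
    candidates: iterable of candidate source positions, each of shape (D,)
    """
    best_point, best_error = None, np.inf
    for point in candidates:
        # Sum of squared mismatches over all pairs: zero where all surfaces meet.
        error = sum(
            (predicted_time_difference(point, mics[i], mics[j], c) - tau) ** 2
            for (i, j), tau in measured.items()
        )
        if error < best_error:
            best_point, best_error = point, error
    return best_point
```

A grid search is chosen here only because it is easy to follow; with noisy measurements a least-squares solver over the same error function would be a natural alternative.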
  • the voice information sent by the sound source may also be subjected to a short-time Fourier transform to generate a plurality of audio frequency domain values. The maximum value and/or the minimum value of these values is then compared with a threshold value to determine whether the voice information is a valid voice signal; if it is, the noise amplitude spectrum is subtracted from the amplitude spectrum of the voice information. The threshold value may include a first threshold value and/or a second threshold value, and the first threshold value is less than the second threshold value.
  • if the maximum of the plurality of audio frequency domain values, taken over 1 ≤ k ≤ fLen, is less than or equal to the first threshold value threshold1, or the minimum of the plurality of audio frequency domain values, taken over 1 ≤ k ≤ fLen, is greater than or equal to the second threshold value threshold2, the voice information is determined to be a noise signal.
  • if the maximum of the plurality of audio frequency domain values, taken over 1 ≤ k ≤ fLen, is greater than the first threshold value threshold1, and the minimum, taken over 1 ≤ k ≤ fLen, is smaller than the second threshold value threshold2, the voice information is determined to be a valid voice signal.
  • that is, voice information whose audio frequency domain values fall outside the thresholds is not a valid voice signal.
  • the threshold value may be set in advance according to experience, or may be determined by a specific environment. For example, when the user performs voice control on a home appliance such as an air conditioner or a fan, the sound frequency is generally 200 to 1000 Hz. In this case, the first threshold value can be set to 200 Hz, and the second threshold value is 1000 Hz.
  • if the voice information is determined to be a noise signal, the noise amplitude spectrum is updated to the amplitude spectrum of that signal, so that the noise amplitude spectrum always holds the most recent noise; if the voice information is determined to be a valid voice signal, the noise amplitude spectrum is subtracted from the amplitude spectrum of the received voice information in the frequency domain, that is, the most recent noise is used to approximate the current noise.
  • for example, if the first frame of voice information is a noise signal, the updated noise amplitude spectrum is the amplitude spectrum of the first frame; if the second frame is a valid voice signal, the noise amplitude spectrum is still that of the first frame, and the amplitude spectrum of the first frame is subtracted from the amplitude spectrum of the second frame; if the third frame is a noise signal, the noise amplitude spectrum is updated to the amplitude spectrum of the third frame; if the fourth frame is a noise signal, the noise amplitude spectrum is updated to the amplitude spectrum of the fourth frame; if the fifth frame is a valid voice signal, the noise amplitude spectrum is that of the fourth frame, and it is subtracted from the amplitude spectrum of the fifth frame, and so on. In this way the method adapts to the environment, removes background noise well under different noise conditions, and yields the amplitude spectrum of the voice information after noise reduction.
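The frame-by-frame procedure above can be sketched as follows. This is only an illustration under stated assumptions: each frame is transformed with a plain FFT, the thresholds are treated as hypothetical magnitude values, and clamping the result at zero is an added safeguard that the text does not mention. The function names are made up for this sketch.

```python
import numpy as np


def is_valid_speech(magnitudes, threshold1, threshold2):
    """Valid speech: maximum above threshold1 and minimum below threshold2."""
    return magnitudes.max() > threshold1 and magnitudes.min() < threshold2


def denoise_frames(frames, threshold1, threshold2):
    """Return one entry per frame: a cleaned amplitude spectrum, or None for noise frames."""
    noise_spectrum = None
    results = []
    for frame in frames:
        spectrum = np.abs(np.fft.rfft(frame))
        if is_valid_speech(spectrum, threshold1, threshold2):
            if noise_spectrum is None:
                cleaned = spectrum  # no noise estimate available yet
            else:
                # Subtract the most recent noise spectrum; the clamp at zero
                # is an assumption of this sketch, not part of the text.
                cleaned = np.maximum(spectrum - noise_spectrum, 0.0)
            results.append(cleaned)
        else:
            noise_spectrum = spectrum  # keep the most recent noise frame
            results.append(None)
    return results
```

With a quiet frame followed by a loud one, the first frame becomes the noise estimate and is subtracted from the second, which mirrors the first-frame and second-frame case in the example.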
  • a preset parameter is introduced: the relative time difference of the voice information reaching any two of the plurality of microphones is obtained from the voice information collected by the microphone array and the preset parameter, and the voice information is then located according to that relative time difference and the positions of the two microphones. This effectively reduces environmental noise and adapts well to reverberation and sound diffraction noise in a far-field environment, achieving a double noise reduction effect. It greatly improves the accuracy of far-field sound source identification based on the array microphone and makes far-field sound source recognition far more practical.
  • the location information of the sound source may be obtained by using various methods such as GPS positioning and sound source localization based on the subspace, and is not limited herein.
  • the home appliance can be controlled to operate according to the control command and the position information.
  • the control instruction may be, for example, a mosquito repellent instruction. The voice control method provided by the embodiment of the present invention is further described below with reference to FIG. 10.
  • FIG. 10 is a schematic flowchart diagram of a voice control method according to another embodiment of the present invention.
  • the voice control method includes:
  • Step 501 Acquire voice information sent by the sound source based on the microphone array.
  • the microphone array comprises M microphones, wherein M is a positive integer greater than one.
  • Step 502 Identify the voice information, and determine that the target control instruction corresponding to the voice information is a mosquito repellent instruction.
  • Step 503 Return a response message to the user.
  • Step 504 Acquire location information of the sound source relative to the microphone array.
  • Step 505 Transmit mosquito repellent sound waves according to the mosquito repellent instruction and the position information.
  • vibration generated by ultrasonic waves in the air is picked up by the antennae on the mosquito's head and irritates the mosquito's auditory nerve, so that the mosquito tries to avoid the sound wave region.
  • in addition, mosquitoes fly by beating their wings, and the wing beats make the air vibrate; the vibration caused by ultrasonic waves intensifies the vibration of the air, which increases the air resistance the mosquito experiences and the load on its muscles until it can no longer tolerate the region and flees.
  • therefore, using the principle of ultrasonic mosquito repelling, mosquito repellent sound waves can be transmitted toward the location of the user to drive away mosquitoes there.
  • the mosquito repellent sound wave refers to a sound wave of a certain frequency or frequency range that can drive mosquitoes out of the sound wave region by stimulating their nervous and muscular systems.
  • the frequency of the mosquito repellent sound wave can be set outside the range of human hearing.
  • for example, the frequency range of the mosquito repellent sound wave can be set to greater than 24 kHz so that the user is not affected by it.
  • the frequency of the mosquito repellent sound wave may be set to change continuously within a preset range. That is, step 505 may specifically include: adjusting the frequency of the mosquito repellent sound wave at a preset adjustment frequency so that it changes continuously.
  • a control steering module and an ultrasonic transmitting module, the latter composed of a clock pulse generator, a charging adjustment circuit, a multivibrator, and a speaker or a buzzer, may be disposed in the household electrical appliance.
  • the control steering module can drive a motor to turn the ultrasonic transmitting module so that it faces the approximate direction of the user and transmits mosquito repellent sound waves toward the user's location.
  • by adjusting the frequency of the clock pulse, the frequency of the mosquito repellent sound wave can be adjusted to prevent mosquitoes from adapting to, and becoming immune to, a fixed-frequency mosquito repellent sound wave.
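One simple way to keep the emitted frequency changing within a preset range is a back-and-forth sweep. The sketch below is an assumption about how such a schedule might be generated in software; the text itself only describes adjusting a clock pulse frequency, and the function name and all numeric values in the usage note are hypothetical.

```python
def sweep_frequencies(low_hz, high_hz, step_hz, count):
    """Return `count` frequencies that move back and forth between low_hz and high_hz."""
    if not (low_hz < high_hz and step_hz > 0):
        raise ValueError("expected low_hz < high_hz and step_hz > 0")
    frequencies = []
    current, direction = low_hz, 1
    for _ in range(count):
        frequencies.append(current)
        nxt = current + direction * step_hz
        if nxt > high_hz or nxt < low_hz:
            # Reverse at the edge of the preset range.
            direction = -direction
            nxt = current + direction * step_hz
        # Clamp in case the step is larger than the whole range.
        current = min(max(nxt, low_hz), high_hz)
    return frequencies
```

For instance, `sweep_frequencies(25000, 27000, 1000, 6)` walks up from 25 kHz to 27 kHz and back down again, staying above the 24 kHz figure mentioned in the text.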
  • step 502 and step 504 may also be performed simultaneously. That is, after the voice information sent by the sound source is collected based on the microphone array, the voice information is used, on the one hand, to determine whether the corresponding target control instruction is a mosquito repellent instruction and, on the other hand, to determine the location information of the user. Only when the target control instruction corresponding to the voice information is the mosquito repellent instruction is the determined position information sent to the control steering module, so that the ultrasonic transmitting module transmits the mosquito repellent sound wave toward the user's position. Determining the target control instruction and the position information of the sound source from the collected voice information at the same time improves the efficiency of sound wave mosquito repelling.
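Running recognition and localization side by side, and forwarding the position only when the instruction matches, might look like the sketch below. It uses Python's standard `concurrent.futures` thread pool; the `recognize`, `localize`, and `emit` callables and the instruction label are hypothetical stand-ins for the modules described in the text.

```python
from concurrent.futures import ThreadPoolExecutor

REPELLENT_INSTRUCTION = "mosquito_repellent"  # hypothetical label for the instruction


def handle_voice(audio, recognize, localize, emit):
    """Recognize and localize in parallel; emit only for a repellent instruction.

    recognize(audio) -> instruction label
    localize(audio)  -> position information
    emit(position)   -> forwards the position to the steering/transmitting side
    """
    with ThreadPoolExecutor(max_workers=2) as pool:
        instruction_future = pool.submit(recognize, audio)
        position_future = pool.submit(localize, audio)
        instruction = instruction_future.result()
        position = position_future.result()
    if instruction == REPELLENT_INSTRUCTION:
        emit(position)
        return True
    return False  # any other instruction: the position is discarded
```

The position is computed even when it ends up unused; that is the trade-off of doing both steps at once rather than localizing only after recognition succeeds.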
  • after the target control instruction corresponding to the voice information is determined to be the mosquito repellent instruction, a response message can also be returned to the user to indicate that the mosquito repellent operation succeeded.
  • the sound wave mosquito repellent function of the device can also be turned off by voice to reduce energy consumption.
  • the present application also proposes a voice control device.
  • FIG. 11 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present invention.
  • the voice control device 60 includes: the microphone array 610, the voice recognition module 620, the positioning module 630, and the control module 640, wherein:
  • the microphone array 610 is configured to collect voice information sent by the sound source.
  • the voice recognition module 620 is configured to identify the voice information, and determine a target control instruction corresponding to the voice information.
  • the voice recognition module 620 may further include:
  • the model identifying unit is configured to input the target voice information into the voice recognition model to determine whether the target voice information is a quasi-control command.
  • the second setting unit is configured to use the quasi-control command as the target control instruction if the target voice information is recognized as the quasi-control command.
  • the positioning module 630 is configured to obtain location information of the sound source relative to the microphone array.
  • the positioning module 630 is configured to process the target voice information by using a preset beamforming algorithm to obtain location information.
  • the positioning module 630 may include:
  • the first time delay acquisition unit is configured to obtain a delay value of the target voice information in each microphone in the microphone array relative to the reference microphone, wherein the reference microphone is a microphone in the microphone array.
  • the processing unit is configured to process the target voice information of each microphone according to the delay value and the preset weighted correlation function.
  • the beam forming unit is configured to sum the processed target voice information of all channels to form one path of beam information and to obtain the output power of the beam information.
  • the adjusting unit is configured to adjust the weighting values of the weighted correlation function to obtain the target beam signal with the largest output power.
  • the search unit is configured to search for a spatial point corresponding to the target beam signal, and use the position information of the spatial point as the position information of the sound source.
  • the positioning module 630 may include:
  • a second time delay acquisition unit configured to acquire, according to the voice information and the preset parameter, the relative time difference of the voice information reaching any two of the M microphones, where the preset parameter is set according to the signal-to-noise ratio of the voice information;
  • a positioning unit configured to locate the voice information according to a relative time difference of the voice information reaching the two microphones and positions of the two microphones.
  • the second time delay obtaining unit is specifically configured to:
  • the relative time difference of the voice information reaching any two of the M microphones is generated by the following formula:
  • the peak position of R_12(τ) is the relative time difference
  • ψ_12(ω) is a generalized cross-correlation weighting function
  • G_12(ω) is the cross-power spectrum between the first Fourier transform value and the second Fourier transform value, G_12(ω) = X_1(ω)X_2(ω)
  • X_1(ω) and X_2(ω) are respectively the first Fourier transform value and the second Fourier transform value generated by performing a Fourier transform on the first voice information x_1(t) and the second voice information x_2(t).
  • the second time delay acquisition unit is further configured to determine a generalized cross-correlation weighting function ⁇ 12 ( ⁇ ) by the following formula:
  • ⁇ 2 is the preset parameter
  • represents the signal-to-noise ratio
  • ⁇ 0 , ⁇ 1 , ⁇ 0 , and ⁇ 1 are preset constants, and ⁇ 1 > ⁇ 0 .
  • the positioning module is further configured to obtain location information of the sound source relative to the microphone array by using the following formula:
  • τ_i is the relative time difference between the voice information reaching any two of the M microphones
  • m_i1 and m_i2 are respectively the position vectors of the two microphones
  • s represents a sound source position vector
  • c is the speed of sound under the current medium.
  • the control module 640 is configured to control the home appliance to operate according to the target control instruction and the location information.
  • control module 640 is configured to: after identifying that the target control command is a quasi-control command, output location information to the home appliance to control the home appliance to operate according to the target control command and the location information.
  • the voice control device 60 may further include:
  • the conversion module is configured to perform analog-to-digital conversion on the voice information to obtain digital voice information before the voice recognition module 620 identifies the voice information and determines a control instruction corresponding to the voice information.
  • the interference cancellation module is configured to eliminate interference information in the digital voice information and obtain target voice information.
  • a transform module configured to perform short-time Fourier transform on the voice information to generate a plurality of audio frequency domain values
  • a comparison module configured to compare a maximum value and a minimum value of the plurality of audio frequency domain values with a threshold value to determine whether the voice information is a valid voice signal, where the threshold value includes a first threshold value and a second threshold value, and the first threshold value is less than the second threshold value;
  • a determining module configured to: when the maximum value of the plurality of audio frequency domain values is greater than the first threshold value and the minimum value is less than the second threshold value, determine that the voice information is a valid voice signal and subtract the noise amplitude spectrum from the amplitude spectrum of the voice information; and
  • when the maximum value of the plurality of audio frequency domain values is less than or equal to the first threshold value or the minimum value is greater than or equal to the second threshold value, determine that the voice information is a noise signal and update the noise amplitude spectrum so that it holds the most recent noise amplitude spectrum.
  • the target control instruction may be a mosquito repellent instruction
  • the control module is specifically configured to:
  • the mosquito repellent sound wave is emitted according to the mosquito repellent instruction and the position information.
  • the voice control device 60 may further include: a sending module, configured to return a response message to the user.
  • control module is further configured to:
  • the frequency of the mosquito repellent sound wave is adjusted at a preset adjustment frequency.
  • the voice control device of the embodiment collects the voice information sent by the sound source, identifies the voice information to determine the corresponding target control instruction, obtains the position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control instruction and the position information. Thereby, intelligent control of the home appliance can be realized, and the user can control the home appliance only by issuing voice information, which relieves the user's dependence on the remote controller and improves the user experience.
  • the present application also proposes a home appliance.
  • FIG. 12 is a schematic structural diagram of a household electrical appliance according to an embodiment of the present invention.
  • the household electrical appliance 120 includes the voice control device 60 as described in the foregoing embodiment.
  • the home appliance 120 may be any device such as an air conditioner or a fan.
  • when a household appliance such as an air conditioner or a fan is used to repel mosquitoes, a better mosquito repellent effect can be achieved than in an outdoor environment, because such devices are usually used in a bedroom, where the room space is small.
  • the air conditioner itself is a device that blows out air. As the air volume and airflow direction change during use, the indoor air oscillates; when the ultrasonic transmitting module emits mosquito repellent sound waves at the same time, the mosquitoes' muscular system is stimulated more strongly, which improves the mosquito repellent effect.
  • the foregoing description of the voice control method embodiments also applies to the home appliance of this embodiment; the implementation principle is similar, and details are not described herein again.
  • the home appliance may adopt the system architecture diagram shown in FIG.
  • the home appliance may include a voice broadcast subsystem 131, a voice recognition subsystem 132, a microphone array subsystem 133, a sound source localization subsystem 134, and a control subsystem 135.
  • the microphone array subsystem 133 can collect voice information and send it both to the voice recognition subsystem 132 for voice recognition and to the sound source localization subsystem 134 for sound source localization.
  • after the voice recognition subsystem 132 recognizes a control command, it may, on the one hand, send a control signal to the sound source localization subsystem 134 so that the sound source localization subsystem 134 sends the positioning result to the control subsystem 135 and, on the other hand, send a prompt instruction to the voice broadcast subsystem 131 so that the voice broadcast subsystem 131 informs the user that the operation succeeded.
  • the sound source localization subsystem 134 processes the voice information collected by the microphone array subsystem 133 to determine the sound source information and, if it receives the control command output by the voice recognition subsystem 132, sends the positioning result to the control subsystem 135.
  • the control subsystem 135 can control the home appliance to operate according to the target control instruction and the position information.
  • the control subsystem 135 can include an ultrasonic transmitting module, a control steering module, and the like. After the control steering module receives the positioning result sent by the sound source localization subsystem 134, the ultrasonic transmitting module can be controlled to start or stop emitting mosquito repellent sound waves, and the motor is driven to rotate according to the positioning result so that the ultrasonic transmitting module turns and transmits mosquito repellent sound waves toward the user's position.
  • the home appliance of the embodiment collects the voice information sent by the sound source, identifies the voice information to determine the corresponding target control instruction, obtains the position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control instruction and the position information. Thereby, intelligent control of the home appliance can be realized, and the user can control the home appliance only by issuing voice information, which relieves the user's dependence on the remote controller and improves the user experience.
  • the present invention also provides a computer program product that, when executed by a processor, executes a voice control method as described in the foregoing embodiments.
  • the present invention also provides a computer readable storage medium having stored thereon a computer program capable of implementing the voice control method as described in the foregoing embodiments when the computer program is executed by the processor.
  • the program can be stored in a computer-readable storage medium and, when executed, can include the flows of the method embodiments described above.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Abstract

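A voice control method, a voice control apparatus, and a home appliance. The method includes: collecting, based on a microphone array, voice information emitted by a sound source (11); recognizing the voice information and determining a target control instruction corresponding to the voice information (12); acquiring position information of the sound source relative to the microphone array (13); and controlling the home appliance to operate according to the target control instruction and the position information (14). The method enables intelligent control of home appliances: the user can control a home appliance simply by issuing voice information, which removes the user's dependence on a remote controller and improves the user experience.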

Description

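Voice control method, apparatus, and home appliance
Cross-reference to related applications
This application claims priority to Chinese Patent Application No. "201710233667.X", entitled "Voice control method, apparatus, and home appliance", filed by 广东美的制冷设备有限公司 on April 11, 2017; Chinese Patent Application No. "201710482779.9", entitled "Sound wave mosquito repelling method, apparatus, and air conditioner", filed by 广东美的制冷设备有限公司 on June 22, 2017; and Chinese Patent Application No. "201710493300.1", entitled "Sound source localization method and apparatus, and air conditioner", filed by 广东美的制冷设备有限公司 and 美的集团股份有限公司 on June 22, 2017.
Technical field
The present invention relates to the field of smart homes, and in particular to a voice control method, apparatus, and home appliance.
Background
Home appliances such as air conditioners and electric fans appear more and more in daily life. At present, home appliances are mostly controlled through a matching remote controller or through an application installed on a mobile terminal, so users cannot get rid of their dependence on an external control device, and the experience is poor.
Summary
In view of this, embodiments of the present invention provide a voice control method, apparatus, and home appliance to control home appliances intelligently: the user can control a home appliance simply by issuing voice information, which removes the user's dependence on a remote controller.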
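An embodiment of the first aspect of the present invention provides a voice control method, including:
collecting, based on a microphone array, voice information emitted by a sound source;
recognizing the voice information and determining a target control instruction corresponding to the voice information;
acquiring position information of the sound source relative to the microphone array; and
controlling a home appliance to operate according to the target control instruction and the position information.
The voice control method of the embodiments of the present invention collects the voice information emitted by a sound source, recognizes the voice information to determine the corresponding target control instruction, acquires the position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control instruction and the position information. This enables intelligent control of the home appliance: the user can control it simply by issuing voice information, which removes the user's dependence on a remote controller and improves the user experience.
An embodiment of the second aspect of the present invention provides a voice control apparatus, including:
a microphone array, configured to collect voice information emitted by a sound source;
a voice recognition module, configured to recognize the voice information and determine a target control instruction corresponding to the voice information;
a positioning module, configured to acquire position information of the sound source relative to the microphone array; and
a control module, configured to control a home appliance to operate according to the target control instruction and the position information.
The voice control apparatus of the embodiments of the present invention collects the voice information emitted by a sound source, recognizes the voice information to determine the corresponding target control instruction, acquires the position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control instruction and the position information. This enables intelligent control of the home appliance: the user can control it simply by issuing voice information, which removes the user's dependence on a remote controller and improves the user experience.
An embodiment of the third aspect of the present invention provides a home appliance, including the voice control apparatus described above.
The home appliance of the embodiments of the present invention collects the voice information emitted by a sound source, recognizes the voice information to determine the corresponding target control instruction, acquires the position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control instruction and the position information. This enables intelligent control of the home appliance: the user can control it simply by issuing voice information, which removes the user's dependence on a remote controller and improves the user experience.
An embodiment of the fourth aspect of the present invention provides a computer program product; when the instructions in the computer program product are executed by a processor, the voice control method according to the embodiment of the first aspect is performed.
An embodiment of the fifth aspect of the present invention provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the voice control method according to the embodiment of the first aspect can be implemented.
Additional aspects and advantages of the present invention will be set forth in part in the following description, will in part become apparent from the following description, or will be learned through practice of the present invention.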
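Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flowchart of a voice control method according to an embodiment of the present invention;
FIG. 2 is a schematic flowchart of a voice control method according to another embodiment of the present invention;
FIG. 3 is a schematic flowchart of a voice control method according to yet another embodiment of the present invention;
FIG. 4 is a schematic flowchart of a voice control method according to yet another embodiment of the present invention;
FIG. 5 is a schematic diagram of the signal model and the array model;
FIG. 6 is a schematic flowchart of a voice control method according to yet another embodiment of the present invention;
FIG. 7 is a schematic diagram of a scenario for acquiring sound source position information according to an example of the present invention;
FIG. 8 is a schematic diagram of the coordinates for acquiring sound source position information according to an example of the present invention;
FIG. 9 is a schematic flowchart of acquiring sound source position information according to an example of the present invention;
FIG. 10 is a schematic flowchart of a voice control method according to yet another embodiment of the present invention;
FIG. 11 is a schematic structural diagram of a voice control apparatus according to an embodiment of the present invention;
FIG. 12 is a schematic structural diagram of a home appliance according to an embodiment of the present invention;
FIG. 13 is a system architecture diagram of a home appliance according to an embodiment of the present invention.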
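Detailed description
The embodiments of the present invention are described in detail below with reference to the accompanying drawings.
It should be clear that the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
The voice control method, apparatus, and home appliance of the embodiments of the present invention are described below with reference to the accompanying drawings.
FIG. 1 is a schematic flowchart of a voice control method according to an embodiment of the present invention.
As shown in FIG. 1, the voice control method includes the following steps:
S11: collect, based on a microphone array, voice information emitted by a sound source.
When using a home appliance, a user who needs to control it, for example to turn it on or off or to change its operating mode such as its speed setting or airflow direction, can issue voice information to that appliance. In this embodiment, home appliances may include, but are not limited to, air conditioners and fans.
In this embodiment, the home appliance can collect the voice information emitted by the sound source (the user) through a built-in microphone array. The microphone array may have any topology, such as a linear array, a circular array, or a spherical array.
In the embodiments of the present invention, the microphone array may be a linear microphone array composed of several mutually independent microphones with identical characteristics, where all microphones lie on the same straight line, face the same direction, and are separated from their neighbors by the same preset distance.
S12: recognize the voice information and determine the target control instruction corresponding to the voice information.
In this embodiment, after the voice information emitted by the sound source is collected through the microphone array, the voice information is recognized to determine the corresponding target control instruction.
In the embodiments of the present invention, there are two ways to recognize the voice information to determine the corresponding target control instruction. In one possible implementation, the target control instruction is determined by querying a voice template library. In another possible implementation, the target control instruction is determined with a voice recognition model. The specific process for each is given later and, to avoid repetition, is not detailed here.
S13: acquire the position information of the sound source relative to the microphone array.
The position information may be angle information and/or distance information of the sound source relative to the microphone array.
In this embodiment, after the voice information emitted by the sound source is collected through the microphone array, the position information of the sound source relative to the microphone array can also be acquired. In one possible implementation, a beamforming algorithm is used to acquire the position information of the sound source.
It should be noted that, in this embodiment, steps S12 and S13 may be performed in either order or simultaneously. This embodiment describes step S13 as being performed after step S12 only as an example, which does not limit the present invention.
S14: control the home appliance to operate according to the target control instruction and the position information.
In this embodiment, after the target control instruction corresponding to the voice information is determined and the position information of the sound source relative to the microphone array is acquired, the control system of the home appliance can control the appliance to operate according to the target control instruction and the position information, that is, to execute the target control instruction toward the position of the sound source.
Take an air conditioner as the home appliance. When the user says "turn the airflow down a little", the microphone array built into the air conditioner collects the voice information, and the target control instruction corresponding to it is determined to be "reduce airflow". In addition, the processing device built into the air conditioner determines that the sound source is located at an angle of 30° to the left of the air conditioner. The control system built into the air conditioner then directs the airflow 30° to the left and at the same time reduces the airflow strength.
The voice control method of this embodiment collects the voice information emitted by a sound source, recognizes the voice information to determine the corresponding target control instruction, acquires the position information of the sound source relative to the microphone array, and controls the home appliance to operate according to the target control instruction and the position information. This enables intelligent control of the home appliance: the user can control it simply by issuing voice information, which removes the user's dependence on a remote controller and improves the user experience.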
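To determine the target control instruction corresponding to the voice information, the voice information needs to be recognized. As mentioned above, the embodiments of the present invention determine the target control instruction by recognizing the voice information in one of two ways. The first is to query a voice template library, which specifically includes: acquiring the target content of the target voice information, determining whether a preset voice template library contains a quasi-control instruction corresponding to the target content, and, if it does, using the quasi-control instruction as the target control instruction. Accordingly, as shown in FIG. 2, another voice control method provided by an embodiment of the present invention may include the following steps:
S21: collect, based on a microphone array, voice information emitted by a sound source.
S22: perform analog-to-digital conversion on the voice information to obtain digital voice information.
In this embodiment, after the home appliance collects the voice information emitted by the sound source through the microphone array, it can first perform analog-to-digital conversion to turn the analog voice information into digital voice information for subsequent voice processing. In one possible implementation, a high-precision analog-to-digital conversion chip performs high-speed conversion of the multiple channels of analog voice information collected by the microphone array to obtain the digital voice information.
S23: eliminate the interference information in the digital voice information to obtain the target voice information.
The interference information includes, but is not limited to, noise information and echo information.
In this embodiment, after the multiple channels of voice information are converted into multiple channels of digital voice information, the digital voice information can be further processed to eliminate the interference information in it, for example by noise cancellation and echo cancellation, to obtain the target voice information. In one possible implementation, a digital audio processing chip processes the multiple channels of digital voice information to eliminate the interference information and obtain the target voice information.
S24: acquire the target content of the target voice information.
In this embodiment, after the voice information collected by the microphone array has been processed into the target voice information, the target content in the target voice information can be acquired. Specifically, the target content can be acquired with related voice recognition technology.
The target content consists of commonly used control commands, including but not limited to turn down, turn up, turn off, turn on, reduce, and lower.
S25: determine whether the preset voice template library contains a quasi-control instruction corresponding to the target content.
A quasi-control instruction is a control instruction whose degree of matching with the target content exceeds a preset threshold.
To be able to execute the control commands issued by the user, recognizable control instructions can be stored in the home appliance in advance. After a control command issued by the user is received, it is matched against the pre-stored control instructions, and the matched control instruction is executed to control the home appliance.
In this embodiment, a voice template library can be set up in advance. It stores several control instructions that the home appliance can recognize and execute, and each control instruction produces a different control effect. After the target content in the target voice information is acquired, it is matched against all the control instructions in the voice template library to determine whether the library contains a quasi-control instruction corresponding to the target content.
Specifically, whether a quasi-control instruction exists can be determined by calculating the degree of matching between the target content and the control instructions in the voice template library. When the degree of matching between the target content and a control instruction exceeds the preset threshold (for example, 90%), that control instruction is the quasi-control instruction.
It should be noted that the voice template library and the preset threshold in this embodiment can be set by the manufacturer before the home appliance leaves the factory.
In this embodiment, if the voice template library is found to contain a quasi-control instruction corresponding to the target content, step S26 is performed; if it is found not to contain one, step S29 is performed.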
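The threshold-based lookup in steps S25, S26, and S29 can be illustrated with a short sketch. The text does not say how the degree of matching is computed, so the string similarity from Python's standard `difflib.SequenceMatcher` is used here purely as a stand-in; the library contents and instruction names are hypothetical.

```python
from difflib import SequenceMatcher


def find_quasi_instruction(content, template_library, threshold=0.9):
    """Return the best-matching instruction if its score exceeds the threshold, else None.

    template_library maps a recognizable phrase to the instruction it triggers.
    """
    best_instruction, best_score = None, 0.0
    for phrase, instruction in template_library.items():
        # Similarity in [0, 1]; an assumed stand-in for the "degree of matching".
        score = SequenceMatcher(None, content, phrase).ratio()
        if score > best_score:
            best_instruction, best_score = instruction, score
    return best_instruction if best_score > threshold else None
```

A return value of `None` corresponds to step S29 (no operation), and any other value corresponds to step S26, where the quasi-control instruction becomes the target control instruction.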
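S26: use the quasi-control instruction as the target control instruction.
In this embodiment, when the preset voice template library is found to contain a quasi-control instruction corresponding to the target content, the quasi-control instruction is used as the target control instruction for controlling the home appliance.
S27: process the target voice information with a preset beamforming algorithm to obtain the position information.
In this embodiment, after the voice information collected by the microphone array has been processed into the target voice information, the target voice information can also be processed with a preset beamforming algorithm to obtain the position information of the sound source.
The beamforming algorithm is a steerable algorithm based on maximum output power. Its basic idea is to form a beam by weighting and summing the signals collected by the array elements, to steer the beam by searching over the possible positions of the sound source, and to modify the weights so that the output signal power of the microphone array is maximized. The beamforming algorithm can be used in both the time domain and the frequency domain, so it is widely applicable.
It should be noted that step S27 and steps S24 to S26 may be performed in either order: the step of acquiring the position information and the steps of acquiring the target control instruction may be performed simultaneously or one after the other. This embodiment describes step S27 as being performed after step S26 only as an example, which does not limit the present invention.
S28: control the home appliance to operate according to the target control instruction and the position information.
In this embodiment, after the position information of the sound source and the target control instruction corresponding to the voice information emitted by the sound source are acquired, the home appliance can be controlled to operate according to the target control instruction and the position information.
Specifically, after the target control instruction is recognized as a quasi-control instruction, the acquired position information is output to the home appliance to control it to operate according to the target control instruction and the position information. In other words, whether the target control instruction is recognized as a quasi-control instruction can be used as the condition for deciding whether to output the position information to the home appliance. The position information is output to the home appliance only when the target control instruction is recognized as a quasi-control instruction; otherwise, no operation is performed.
Outputting the position information to the home appliance only when the target control instruction is recognized as a quasi-control instruction, and performing no operation otherwise, improves the accuracy of voice control and prevents the home appliance from executing unnecessary control instructions that would degrade the user experience.
S29: perform no operation.
In this embodiment, when the voice template library is found not to contain a quasi-control instruction corresponding to the target content, the home appliance performs no operation, that is, it does not respond to the voice information emitted by the sound source.
The voice control method of this embodiment performs analog-to-digital conversion on the voice information collected by the microphone array to obtain digital voice information and eliminates the interference information in it to obtain the target voice information, which removes the effect of interference on the target voice information and improves the accuracy of subsequent processing. The method processes the target voice information with a preset beamforming algorithm to obtain the position information, acquires the target content of the target voice information, and determines whether the preset voice template library contains a quasi-control instruction corresponding to the target content. When a quasi-control instruction exists, it is used as the target control instruction, the acquired position information is output to the home appliance, and the home appliance is controlled to operate according to the target control instruction and the position information; when none exists, no operation is performed. This improves the accuracy of voice control and prevents the home appliance from executing unnecessary control instructions that would degrade the user experience.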
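The above describes one of the two ways in which the embodiments of the present invention determine the target control instruction by recognizing the voice information. The other way, determining the target control instruction with a voice recognition model, is described in detail below. Accordingly, as shown in FIG. 3, on the basis of the embodiment shown in FIG. 2, steps S24 to S26 may be replaced by the following steps:
S31: input the target voice information into the voice recognition model to determine whether the target voice information is a quasi-control instruction.
The voice recognition model can be obtained through training, and it can be used to obtain control instructions that the home appliance can recognize and execute. In one possible implementation, when training the voice recognition model, predefined control instructions spoken in Mandarin by different people are recorded to obtain recording samples as training samples, and the voice recognition model is then trained on these samples based on a neural network model. In another possible implementation, predefined control instructions spoken in dialect by people from different regions are recorded to obtain recording samples with different accents as training samples, and the voice recognition model is trained on these samples based on a neural network model.
In this embodiment, after the voice information collected by the microphone array has been processed into the target voice information, the target voice information is input into the pre-trained voice recognition model, which parses it to determine whether it is a quasi-control instruction. If parsing the target voice information with the voice recognition model yields a control instruction that the home appliance can recognize and execute, the target voice information is considered a quasi-control instruction; otherwise, it is not, and no further operation is performed.
A quasi-control instruction here is the control instruction, among those the home appliance can recognize and execute, that corresponds to the target voice information.
S32: if the target voice information is recognized as a quasi-control instruction, use the quasi-control instruction as the target control instruction.
In this embodiment, when the voice recognition model recognizes the target voice information as a quasi-control instruction, the quasi-control instruction is used as the target control instruction.
The voice control method of this embodiment inputs the target voice information into the voice recognition model and, when the model recognizes the target voice information as a quasi-control instruction, uses the quasi-control instruction as the target control instruction, which improves the accuracy of voice control.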
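To explain more clearly how the target voice information is processed with a preset beamforming algorithm to obtain the position information, an embodiment of the present invention provides another voice control method. FIG. 4 is a schematic flowchart of a voice control method according to yet another embodiment of the present invention.
As shown in FIG. 4, on the basis of the embodiment shown in FIG. 2, step S27 may include the following steps:
S271: acquire the relative time difference of the target voice information in each microphone of the microphone array with respect to a reference microphone.
The reference microphone is one of the microphones in the microphone array.
When the voice information emitted by the sound source reaches the microphone array, the signals received by the microphones differ because of the sound path difference. To make the signals obtained by the microphones the same, the received information must be compensated artificially. To do this, the relative time difference of the signal received by each microphone can be calculated and used to compensate the signal so that it matches that of the reference microphone.
Assume the microphone array consists of M microphones, denoted m1, m2, …, mM, with equal spacing d between adjacent microphones. There is at least one sound source in the space, each a narrowband or single-frequency source, and the sources are far enough from the microphones that the sound waves reaching the microphones can be approximated as plane waves. The direction angle of the plane wave received by each microphone, that is, the direction of arrival, is therefore the same, and is denoted as (Figure PCTCN2017104905-appb-000001).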
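With m1 as the reference microphone, the other microphones receive the voice information somewhat later or earlier than m1, which produces a phase difference. The delay t_i of the voice information received by the i-th microphone relative to m1 can be calculated from formula (1), which is expressed as follows: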
Figure PCTCN2017104905-appb-000002
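where c is the speed of sound, M is the number of microphones, and d is the spacing between two adjacent microphones. To avoid direction ambiguity caused by a phase difference greater than π, it is usual to take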
Figure PCTCN2017104905-appb-000003
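where λ' is the wavelength; that is, d satisfies the "half-wavelength" rule.
Therefore, in this embodiment, the relative time difference of the target voice information in each microphone of the microphone array with respect to the reference microphone can also be calculated with formula (1).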
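The half-wavelength rule and the per-microphone delays can be tried out numerically with the sketch below. Formula (1) is an image that is not reproduced in this text, so the delay expression used here, t_i = (i − 1)·d·cos(θ)/c for a uniform linear array with the angle measured from the array axis, is an assumed common convention rather than the patent's exact formula. The function names are hypothetical.

```python
import math


def max_spacing(frequency_hz, c=340.0):
    """Largest spacing that satisfies the half-wavelength rule d <= wavelength / 2."""
    return c / frequency_hz / 2.0


def arrival_delays(num_mics, spacing_m, angle_deg, c=340.0):
    """Delay of each microphone relative to the first one for a far-field plane wave.

    Assumed convention: uniform linear array, angle measured from the array axis.
    """
    step = spacing_m * math.cos(math.radians(angle_deg)) / c
    return [i * step for i in range(num_mics)]
```

For a 4 kHz tone the rule gives a maximum spacing of 4.25 cm, and a wave arriving broadside (90°) produces essentially zero delay at every microphone under this convention.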
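S272: process the target voice information of each microphone according to the relative time difference and a preset weighted correlation function.
In this embodiment, after the relative time difference of the target voice information in each microphone with respect to the reference microphone is obtained, the target voice information of each microphone is processed according to the preset weighted correlation function. The processed target voice information of each channel can be expressed as w_i x_i(t − t_i).
Here w_i, i = 1, 2, ..., M, is the weighted correlation function of the target voice information of each microphone, and x_i(t − t_i), i = 1, 2, ..., M, is the artificially compensated target voice information.
S273: sum the processed target voice information of all channels to form one path of beam information and obtain the output power of the beam information.
In this embodiment, provided that the spacing between two adjacent microphones satisfies the half-wavelength rule, the processed target voice information of all channels is weighted and summed to obtain one path of beam information, as calculated by formula (2).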
Figure PCTCN2017104905-appb-000004
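Based on the signal model and array model shown in FIG. 5, the beam information can be expressed in the vector form of formula (3).
y(t) = W^H x(t)                             (3)
where x(t) is the matrix composed of the voice information x_i(t) received by the microphone array, W is the matrix containing the weighted correlation functions w_i and the delay values t_i, W^H denotes the conjugate transpose of W, i = 1, 2, ..., M, and i and M are positive integers. For a far-field plane wave, W equals its array manifold vector, that is, W = a(k'), where k' is the beam corresponding to the current direction of arrival (Figure PCTCN2017104905-appb-000005).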
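In this embodiment, a snapshot model needs to be introduced to calculate the output power of the beam information. Assuming that each microphone has received N snapshots of data and N→∞, the output power of the beam information can be calculated with formula (4).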
Figure PCTCN2017104905-appb-000006
其中,x(n)为麦克风阵列所接收的语音信息组成的矩阵。
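S274: adjust the weights of the weighted correlation function to obtain the target beam signal with the maximum output power.
Because $W = a(k')$, changing the beam k' changes the matrix W and thereby adjusts the weights of the weighted correlation function. Different beams are formed in this way, so that the whole space containing the sound source is scanned.
In this embodiment, the weights of the weighted correlation function are adjusted by changing the beam k', and the beam with the maximum output power is the target beam signal.
$W = a(k')$ in effect defines a spatial filter. The whole space containing the sound source, i.e. the space in front of the microphone array, can be divided into a grid. For each grid point, the relative time difference to each microphone is calculated in turn, which gives the beam signal of that grid point and its output power. The whole space is scanned in this way, and the beam signal with the maximum output power can finally be determined.
S275: search for the spatial point corresponding to the target beam signal, and take the position information of that spatial point as the position information of the sound source.
In this embodiment, after the target beam signal with the maximum output power has been obtained, the spatial point corresponding to it is searched for, and the position information of the spatial point found is the position information of the sound source.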
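Steps S272 to S275 amount to a steered delay-and-sum scan, which the following minimal sketch illustrates for one noise-free, single-frequency far-field source. It is an illustration under stated assumptions and not the patent's implementation: the array is a uniform linear array, the angle is measured from the array axis, the weights are uniform (1/M), the search grid is one degree wide, and all names and numeric values are hypothetical.

```python
import cmath
import math

C = 340.0    # speed of sound in m/s
F = 2000.0   # single-frequency source in Hz (narrowband assumption)
D = 0.04     # spacing in m, below the half wavelength C / F / 2 = 0.085 m
M = 4        # number of microphones


def steering(theta_deg):
    """Array manifold vector a(theta); theta is measured from the array axis."""
    cos_t = math.cos(math.radians(theta_deg))
    return [cmath.exp(-2j * math.pi * F * i * D * cos_t / C) for i in range(M)]


def snapshots(theta_deg, n_snap=32, fs=16000.0):
    """Noise-free complex snapshots x(n) received from a far-field source."""
    a = steering(theta_deg)
    return [[a_i * cmath.exp(2j * math.pi * F * n / fs) for a_i in a]
            for n in range(n_snap)]


def output_power(theta_deg, data):
    """Average power of y(n) = W^H x(n) with W = a(theta) / M."""
    w = [a_i / M for a_i in steering(theta_deg)]
    total = 0.0
    for x in data:
        y = sum(w_i.conjugate() * x_i for w_i, x_i in zip(w, x))
        total += abs(y) ** 2
    return total / len(data)


def locate(data, grid=range(0, 181)):
    """Scan the grid and return the angle whose beam has maximum output power."""
    return max(grid, key=lambda th: output_power(th, data))


estimate = locate(snapshots(theta_deg=60.0))
```

With a single linear array only the angle relative to the array axis can be resolved; a scan over spatial grid points, as the text describes, would need an array geometry that removes this ambiguity.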
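In the voice control method of this embodiment, the relative time difference of the target voice information in each microphone channel with respect to the reference microphone is obtained; the target voice information is processed according to the relative time difference and the preset weighted correlation function and then summed to form one beam and its output power; the weights of the weighted correlation function are adjusted to obtain the target beam signal with the maximum output power; and the spatial point corresponding to the target beam signal is taken as the position of the sound source. This can improve the accuracy of position recognition.
The above describes one way in which the embodiments of the present invention obtain the position of the sound source relative to the microphone array. Another way, which obtains the position of the sound source by estimating relative time differences, is described in detail below. On the basis of the embodiment shown in FIG. 2, and as shown in FIG. 6, step S27 may be replaced by the following steps: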
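S401: according to the voice information and a preset parameter, obtain the relative time difference with which the voice information arrives at any two of the M microphones.
The preset parameter is set according to the signal-to-noise ratio (SNR) of the voice information.
S402: locate the voice information according to the relative time difference of arrival at the two microphones and the positions of the two microphones.
It can be understood that the microphone array may include M microphones. As shown in FIG. 7, the embodiment of the present invention takes an array of four microphones as an example. Each microphone can receive the corresponding voice information, the relative positions of the microphones are fixed, and the microphones are uncorrelated with one another.
It can be understood that the voice information received by the microphone array includes a voice signal and a noise signal.
When the microphone array is mounted on a household appliance such as an air conditioner or a fan and the user wants to control the appliance by voice, the voice information received by the array contains the voice signal input by the user as well as ambient noise and reverberation noise. When a user voice-controls an air conditioner or fan indoors, the sound the user utters is reflected, which produces reflection noise. The running air conditioner or fan and other devices (such as loudspeakers) also produce sound, and this sound together with the reflection noise forms the reverberation noise.
In the embodiments of the present invention, because the voice information contains ambient noise and reverberation noise, a preset parameter is introduced when calculating the relative time difference with which the voice information arrives at any two microphones.
Optionally, the preset parameter may be set according to the SNR of the voice information. When the SNR lies within a certain range, the preset parameter may be positively correlated with the SNR, i.e. the larger the SNR, the larger the preset parameter.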
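Specifically, first voice information and second voice information are obtained from a first microphone and a second microphone among the M microphones, a Fourier transform is applied to each to generate a first Fourier transform value and a second Fourier transform value, and the relative time difference is generated from the first Fourier transform value, the second Fourier transform value and the preset parameter.
The first microphone and the second microphone are any two of the M microphones. For example, referring to FIG. 7, the first microphone is microphone 1 and the second microphone is microphone 2.
In one example of the present invention, the first voice information $x_1(t)$ received by microphone 1 and the second voice information $x_2(t)$ received by microphone 2 are obtained and Fourier-transformed to generate the first Fourier transform value $X_1(\omega)$ and the second Fourier transform value $X_2(\omega)$. The relative time difference can then be generated by formula (5):
Figure PCTCN2017104905-appb-000007
where the position of the peak of $R_{12}(\tau)$ is the relative time difference, $\psi_{12}(\omega)$ is the generalized cross-correlation weighting function, $G_{12}(\omega)$ is the cross-power spectrum of the first and second Fourier transform values, $\varphi_{12}(\omega)$ is the generalized cross-correlation spectrum, and $G_{12}(\omega) = X_1(\omega) X_2^*(\omega)$.
It should be noted that different weighting functions $\psi_{12}(\omega)$ can be chosen when obtaining the relative time difference, for example the basic cross-correlation function $\psi_{12}(\omega) = 1$; the SCOT (Smoothed Coherence Transform) weighting function
$\psi_{12}(\omega) = 1 / \sqrt{G_{11}(\omega) G_{22}(\omega)}$
the CSP (Cross-power Spectrum Phase) weighting function
$\psi_{12}(\omega) = 1 / |G_{12}(\omega)|$
and so on. Different weighting functions give different algorithms for estimating the relative time difference. The traditional CSP algorithm uses the CSP weighting function, i.e.
$\psi_{12}(\omega) = 1 / |G_{12}(\omega)|$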
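The generalized cross-correlation described above can be sketched in a few lines. The example below is a minimal illustration, not the patent's implementation: it works on sampled signals with a naive DFT, treats the delay as circular, and exposes the weighting through a hypothetical `exponent` argument, where 0 gives the basic cross-correlation (ψ = 1) and 1 gives the CSP weighting (ψ = 1/|G12|). Normalization constants are omitted because they do not move the peak.

```python
import cmath
import math
import random


def dft(x):
    """Naive discrete Fourier transform, adequate for short frames."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def gcc_delay(x1, x2, exponent=1.0):
    """Return how many samples x2 lags behind x1.

    exponent = 0 gives the basic cross-correlation (psi = 1);
    exponent = 1 gives the CSP weighting psi = 1 / |G12|.
    """
    n = len(x1)
    big_x1, big_x2 = dft(x1), dft(x2)
    # Cross-power spectrum G12 = X1 * conj(X2).
    g12 = [a * b.conjugate() for a, b in zip(big_x1, big_x2)]
    # Weighted spectrum psi * G12; the floor avoids division by zero.
    phi = [g / max(abs(g), 1e-12) ** exponent for g in g12]
    # Inverse transform gives R12(tau) for tau = 0 .. n - 1.
    r12 = [sum(phi[k] * cmath.exp(2j * math.pi * k * tau / n)
               for k in range(n)).real
           for tau in range(n)]
    peak = max(range(n), key=lambda tau: r12[tau])
    if peak > n // 2:      # map the circular index to a signed lag
        peak -= n
    return -peak           # with G12 = X1 * conj(X2), a lag D peaks at -D


random.seed(0)
s = [random.gauss(0.0, 1.0) for _ in range(64)]
delayed = s[-5:] + s[:-5]  # circular delay of 5 samples
lag = gcc_delay(s, delayed)
```

Dividing the returned lag by the sampling rate gives the relative time difference in seconds.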
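Further, the ideal model (ignoring reverberation noise) of the voice information $x_1(t)$ and $x_2(t)$ received by microphone 1 and microphone 2, which are a distance d apart, is given by formula (6):
$x_i(t) = a_i S(t - \tau_i) + n_{ie}(t)$                                                (6)
The actual model (including reverberation noise) is given by formula (7):
$x_i(t) = a_i S(t - \tau_i) + n_{ie}(t) + n_{ir}(t)$                                        (7)
where $i = 1, 2$, $S(t)$ is the sound source signal, $a_i$ is the attenuation factor of the sound during propagation, $\tau_i$ is the time the sound needs to travel from the source to microphone i, $n_{ie}(t)$ is the ambient noise signal received by microphone i, and $n_{ir}(t)$ is the reverberation noise signal received by microphone i.
It should be noted that when a user voice-controls an appliance such as an air conditioner or fan indoors, multipath sound reflections inevitably occur, and the voice information also contains noise produced by the appliance itself and by other devices. The actual model therefore has to account not only for ambient noise but also for the multipath reflection noise of the room and the noise produced by the running appliance, i.e. the reverberation noise.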
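From the ideal model of formula (6), the cross-power spectrum $G_{12}(\omega)$ of the voice information $x_1(t)$ and $x_2(t)$ received by the two microphones can be calculated by formula (8):
Figure PCTCN2017104905-appb-000011
where $N_{ie}(\omega)$ is the windowed Fourier transform of the ambient noise signal $n_{ie}(t)$, and $S_i(\omega)$ is the Fourier transform of the sound source signal received by microphone i.
Because $S(t)$, $n_{1e}(t)$ and $n_{2e}(t)$ are mutually uncorrelated, formula (8) can be simplified at high SNR to formula (9):
Figure PCTCN2017104905-appb-000012
When the multipath reflection noise of the room is taken into account, the cross-power spectrum $G_{12}(\omega)$ of the voice information received by the two microphones can be calculated by formula (10):
Figure PCTCN2017104905-appb-000013
Because $S(t)$, $n_{1e}(t)$ and $n_{2e}(t)$ are mutually uncorrelated, formula (10) can be simplified to formula (11):
Figure PCTCN2017104905-appb-000014
Further, at high SNR, $N_{ir}(\omega)$ is small relative to $S(\omega)$ and the correlation between the two is correspondingly small, so formula (11) can be approximated by formula (12):
Figure PCTCN2017104905-appb-000015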
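In a real home environment, however, a running air conditioner or electric fan produces various kinds of noise as its own state changes, for example when the air volume or the air direction changes. In addition, when the user voice-controls such an appliance, the command is usually accompanied by multipath reflections of the voice, television sound, audio equipment, other people's voices, cooking noise and so on. In other words, the appliance operates in the presence of considerable ambient noise and strong reverberation. This makes the last three terms of formula (11) fairly large and no longer negligible. Approximating $|S(\omega)|^2$ by $|G_{12}(\omega)|$ then introduces a large error, so that the peak produced by the weighting function is no longer distinct and neither is the peak of $R_{12}(\tau)$ in formula (5). The performance of the traditional CSP algorithm, which ignores reverberation noise, is therefore strongly degraded.
That is, in the traditional CSP algorithm at high SNR, the difference between $|G_{12}(\omega)|$ and $|S(\omega)|^2$ is small, one can be substituted for the other, and the estimated delay is accurate. At low SNR the difference between the two is large and the substitution is no longer valid; moreover, as the SNR falls, the share of $|S(\omega)|^2$ in $|G_{12}(\omega)|$ falls as well.
In the embodiments of the present invention, starting from the traditional CSP algorithm, a preset parameter that varies with the SNR, denoted $\lambda^2$, is introduced to keep the share of $|S(\omega)|^2$ in $|G_{12}(\omega)|$ reasonably stable. This preset parameter adjusts the magnitude of the generalized cross-correlation weighting function, which reduces the influence of noise and improves the noise robustness of the algorithm.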
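Specifically, the preset parameter $\lambda^2$ is introduced into the generalized cross-correlation weighting function used by the traditional CSP algorithm. In the embodiments of the present invention, the generalized cross-correlation weighting function is given by formula (13):
Figure PCTCN2017104905-appb-000016
In one embodiment of the present invention, $0.707 \le \lambda \le 1$, and $\lambda^2$ is a quantity that varies with the SNR and satisfies formula (14):
Figure PCTCN2017104905-appb-000017
where σ denotes the SNR, and $\sigma_0$, $\sigma_1$, $\lambda_0$ and $\lambda_1$ are constants chosen according to the actual situation, with $\lambda_1 > \lambda_0$.
It can be understood that taking $\lambda^2 = 1$ gives the traditional CSP algorithm.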
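Formula (14) is only available as an image, so its exact form is not known here. The sketch below shows one hypothetical mapping that is consistent with what the text does state: the preset parameter rises with the SNR inside a range, is bounded by two constants with the upper one larger, and reaches 1 (the traditional CSP case) at high SNR. The clamped linear interpolation, the function name and all default values are assumptions.

```python
def preset_parameter(snr_db, snr_low=0.0, snr_high=20.0,
                     lam_low=0.5, lam_high=1.0):
    """Map an SNR to the preset parameter (hypothetical clamped linear rule).

    Below snr_low the parameter stays at lam_low, above snr_high it stays
    at lam_high, and in between it grows linearly with the SNR.
    """
    if snr_db <= snr_low:
        return lam_low
    if snr_db >= snr_high:
        return lam_high
    frac = (snr_db - snr_low) / (snr_high - snr_low)
    return lam_low + frac * (lam_high - lam_low)
```

The default bounds 0.5 and 1.0 correspond to the squared limits of the stated range 0.707 ≤ λ ≤ 1; the SNR break points would have to be tuned for the actual appliance and room.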
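Thus, by introducing the SNR-dependent preset parameter $\lambda^2$ into the traditional CSP algorithm, the method becomes more resistant to indoor reverberation made up of multipath reflections of the voice, the sound of the running appliance and noise from other devices. It copes better with noise and calculates the relative time difference (acoustic path difference) of arrival at the two microphones more accurately, which improves the accuracy of sound source localization and benefits the voice recognition control of appliances such as air conditioners and fans.
Specifically, the voice information can be located by formula (15):
$\|m_{i1} - s\| - \|m_{i2} - s\| = \Delta\tau_i \, c$                   (15)
where $\Delta\tau_i$ is the relative time difference with which the voice information arrives at any two of the M microphones, i.e. the position of the peak of $R_{12}(\tau)$ in formula (5); $m_{i1}$ and $m_{i2}$ are the position vectors of the two microphones; s is the position vector of the sound source; and c is the speed of sound in the current medium. For example, at 1 standard atmosphere and 15 °C, sound travels through air at 340 m/s.
In one example of the present invention, the three-dimensional geometry of any two microphones of the array and the sound source is shown in FIG. 8. Microphone 1 and microphone 2 lie on the x axis, the midpoint of the line connecting them is the origin, and the time difference of arrival from the sound source to the two microphones (which corresponds to the acoustic path difference) is $\Delta\tau_i$.
It can be seen from formula (15) that the sound source lies on a hyperboloid.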
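Referring to FIG. 8, the spherical coordinates of the sound source S are (r, θ, φ). Converted to rectangular coordinates, the sound source, microphone 1 and microphone 2 are:
Figure PCTCN2017104905-appb-000018
Substituting s, $m_{i1}$ and $m_{i2}$ into formula (15) and squaring both sides gives formula (16):
Figure PCTCN2017104905-appb-000019
When the sound field is a far field, i.e. the distance r is large,
Figure PCTCN2017104905-appb-000020
approaches zero, and formula (16) can be approximated by formula (17):
Figure PCTCN2017104905-appb-000021
It follows that once the relative time difference of arrival at any two microphones and the distance between those two microphones are known, the angle θ can be obtained approximately. For a far-field sound source, the possible positions of the source can be represented by the cone of angle θ. So as long as the acoustic path difference $\Delta\tau_i$ can be obtained, the direction angle of the sound source relative to the midpoint of the line connecting the two microphones can be obtained approximately. In other words, two microphones yield one surface of possible source positions. An array of M microphones yields several such surfaces, and the intersection of these surfaces is the position of the sound source.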
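The far-field relation between the time difference and the angle θ can be checked numerically. Formula (17) is only available as an image, so the sketch below assumes the usual far-field approximation c·Δτ ≈ d·cos θ, with θ measured from the line through the two microphones; the function names and the test geometry are hypothetical.

```python
import math


def path_difference(source, mic_a, mic_b):
    """Return ||mic_a - s|| - ||mic_b - s|| in metres."""
    return math.dist(mic_a, source) - math.dist(mic_b, source)


def far_field_angle(delta_tau, spacing, c=340.0):
    """Angle between the microphone axis and the source direction, in degrees.

    ASSUMPTION: far field, so c * delta_tau is approximately
    spacing * cos(theta).
    """
    ratio = max(-1.0, min(1.0, c * delta_tau / spacing))  # guard against noise
    return math.degrees(math.acos(ratio))


d, r, theta = 0.1, 50.0, math.radians(40.0)
mic1, mic2 = (d / 2, 0.0, 0.0), (-d / 2, 0.0, 0.0)
src = (r * math.cos(theta), r * math.sin(theta), 0.0)
# Microphone 2 is farther from the source, so the difference is positive.
delta_tau = path_difference(src, mic2, mic1) / 340.0
angle = far_field_angle(delta_tau, d)
```

The recovered angle defines a cone around the microphone axis; combining the cones from several microphone pairs narrows the estimate down to a position, as the text describes.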
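It should be noted that in practice, because of errors, the surfaces obtained often do not all intersect at a single point. The position closest to all of the surfaces is then taken as the estimated position of the sound source.
In one embodiment of the present invention, after the voice information emitted by the sound source has been collected by the microphone array, a short-time Fourier transform may be applied to the voice information to generate multiple audio frequency-domain values. The maximum and/or minimum of these values is then compared with a threshold to determine whether the voice information is a valid voice signal. If it is, the noise magnitude spectrum is subtracted from the magnitude spectrum of the voice information. The threshold may include a first threshold and/or a second threshold, the first threshold being smaller than the second.
Specifically, as shown in FIG. 9, after each microphone in the array has collected a frame of voice information $x[n]$, a short-time Fourier transform is applied to each frame to obtain the audio frequency-domain values $X[k, \tau]$, where $n = 1, 2, 3, \dots, fLen$, $k = 1, 2, 3, \dots, fLen$, fLen is the frame length of the voice information, and τ is the time parameter of the short-time Fourier transform.
A decision is then made on the basis of $X[k, \tau]$.
In one example of the present invention, referring to FIG. 9, if the maximum of the audio frequency-domain values is less than or equal to the first threshold, i.e. $\max_{1 \le k \le fLen} |X[k, \tau]| \le threshold_1$, the received voice information is judged to be a noise signal; otherwise it is judged to be a valid voice signal.
In another example of the present invention, if the minimum of the audio frequency-domain values is greater than or equal to the second threshold, i.e. $\min_{1 \le k \le fLen} |X[k, \tau]| \ge threshold_2$, the received voice information is judged to be a noise signal; otherwise it is judged to be a valid voice signal.
In yet another example of the present invention, if the maximum of the audio frequency-domain values is greater than the first threshold, i.e. $\max_{1 \le k \le fLen} |X[k, \tau]| > threshold_1$, and the minimum is less than the second threshold, i.e. $\min_{1 \le k \le fLen} |X[k, \tau]| < threshold_2$, the received voice information is judged to be a valid voice signal; otherwise it is judged to be a noise signal.
In other words, voice information whose audio frequency-domain values fall outside the thresholds is not a valid voice signal. The thresholds may be set in advance from experience or determined by the specific environment. For example, when a user voice-controls an appliance such as an air conditioner or fan, the frequency of the voice is generally 200 to 1000 Hz, in which case the first threshold may be set to 200 Hz and the second threshold to 1000 Hz.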
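Furthermore, referring to FIG. 9, if the voice information is judged to be a noise signal, the stored noise magnitude spectrum is updated so that it always holds the most recent noise estimate. If the voice information is judged to be a valid voice signal, the noise magnitude spectrum is subtracted in the frequency domain from the magnitude spectrum of the received voice information; that is, the most recent noise is used to model the current noise.
For example, if the first frame is a noise signal, the noise magnitude spectrum is updated to the magnitude spectrum of the first frame. If the second frame is a valid voice signal, the noise magnitude spectrum is still that of the first frame, and the magnitude spectrum of the first frame is subtracted from that of the second frame. If the third frame is a noise signal, the noise magnitude spectrum is updated to that of the third frame. If the fourth frame is a noise signal, the noise magnitude spectrum is updated to that of the fourth frame. If the fifth frame is a valid voice signal, the noise magnitude spectrum is that of the fourth frame, and the magnitude spectrum of the fourth frame is subtracted from that of the fifth frame, and so on. The method thereby adapts to the environment: background noise can be removed effectively in different noise environments, yielding the denoised magnitude spectrum of the voice information.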
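The five-frame example can be reproduced with a short sketch of the update-or-subtract loop. It is an illustration only: the spectra are tiny made-up lists, the classification uses the rule of the third example above (maximum above the first threshold and minimum below the second), and two details are assumptions the text does not specify, namely clamping negative results to zero and passing a speech frame through unchanged when no noise estimate exists yet.

```python
def denoise_frames(frames, threshold1, threshold2):
    """Classify each magnitude spectrum and subtract the latest noise spectrum.

    A frame counts as valid speech when max > threshold1 and min < threshold2;
    otherwise it is noise and replaces the stored noise spectrum.
    """
    noise = None
    cleaned = []
    for spectrum in frames:
        is_speech = max(spectrum) > threshold1 and min(spectrum) < threshold2
        if not is_speech:
            noise = list(spectrum)          # keep the most recent noise estimate
            continue
        if noise is None:
            cleaned.append(list(spectrum))  # no noise estimate yet (assumption)
        else:
            # Clamping at zero is an assumption; the text only says "subtract".
            cleaned.append([max(0.0, s - n) for s, n in zip(spectrum, noise)])
    return cleaned


frames = [
    [0.2, 0.1, 0.3],   # frame 1: noise
    [2.0, 0.5, 1.0],   # frame 2: speech, cleaned with frame 1
    [0.4, 0.2, 0.1],   # frame 3: noise
    [0.1, 0.3, 0.2],   # frame 4: noise
    [3.0, 0.3, 1.2],   # frame 5: speech, cleaned with frame 4
]
result = denoise_frames(frames, threshold1=1.0, threshold2=5.0)
```

Only the two speech frames appear in `result`, each reduced by the noise spectrum that was most recent when the frame arrived.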
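In summary, the method based on relative time difference estimation introduces a preset parameter when obtaining the relative time difference with which the voice information arrives at any two microphones. The relative time difference of arrival at any two of the microphones is obtained from the voice information collected by the microphone array and the preset parameter, and the voice information is then located according to that relative time difference and the positions of the two microphones. The method adaptively reduces ambient noise and is robust against reverberation and sound diffraction noise in the far field, so it achieves a twofold noise reduction. It greatly improves the accuracy of far-field sound source recognition based on microphone arrays and makes far-field sound source recognition considerably more practical.
It should be noted that, in the embodiments of the present invention, the position information of the sound source may also be obtained by other methods such as GPS positioning or subspace-based sound source localization; no limitation is imposed here.
The above analysis shows that once the control instruction corresponding to the voice information has been determined and the position of the sound source relative to the microphone array has been obtained, the household appliance can be controlled to operate according to the control instruction and the position information. In one possible implementation of the present invention, the control instruction may be a mosquito-repelling instruction. The voice control method provided by the embodiments of the present invention is further described below with reference to FIG. 10.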
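FIG. 10 is a schematic flowchart of a voice control method provided by yet another embodiment of the present invention.
As shown in FIG. 10, the voice control method includes:
Step 501: collect the voice information emitted by the sound source with the microphone array.
The microphone array includes M microphones, where M is a positive integer greater than 1.
Step 502: recognize the voice information and determine that the target control instruction corresponding to the voice information is a mosquito-repelling instruction.
Step 503: return a response message to the user.
Step 504: obtain the position information of the sound source relative to the microphone array.
Step 505: emit a mosquito-repelling sound wave according to the mosquito-repelling instruction and the position information.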
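It can be understood that the oscillation produced by ultrasound in the air acts on the antennae on a mosquito's head and irritates its auditory nerves, so that the mosquito tries to avoid the region covered by the sound wave. In addition, mosquitoes fly by beating their wings, which makes the air vibrate. The ultrasonic oscillation in the air intensifies this vibration, which increases the air resistance the mosquito meets in flight and the load on its muscles until it can no longer bear it and has to escape.
Therefore, in the embodiments of the present invention, once the position of the sound source, i.e. the user, relative to the microphone array has been determined and the target control instruction has been determined to be a mosquito-repelling instruction, the principle of ultrasonic mosquito repelling can be used to emit a mosquito-repelling sound wave towards the user's position and drive away the mosquitoes there.
A mosquito-repelling sound wave is a sound wave with one or more specific frequencies that drives mosquitoes out of the region covered by the wave by stimulating their nervous system, muscular system and so on.
Specifically, because the human ear hears frequencies from 20 Hz to 20,000 Hz (20 Hz to 20 kHz), the frequency of the mosquito-repelling sound wave can, in the embodiments of the present invention, be set outside the audible range so that it does not disturb the user's rest. For example, the frequency range of the mosquito-repelling sound wave may be set above 24 kHz so that the user is not affected by it.
In addition, to prevent mosquitoes from adapting to or becoming immune to the mosquito-repelling sound wave, the frequency of the wave may, in the embodiments of the present invention, be made to change continuously within a preset range. That is, step 505 may specifically include: adjusting the frequency of the mosquito-repelling sound wave at a preset adjustment rate so that the frequency changes continuously.
Specifically, in the embodiments of the present invention, the household appliance may be provided with a steering control module and with an ultrasonic emission module made up of a clock pulse generator, a charging regulation circuit, a multivibrator, and a loudspeaker or buzzer. After the user's position information has been determined, the steering control module can drive a motor to turn the ultrasonic emission module so that the emission direction of the sound wave points roughly towards the user, and the ultrasonic emission module can then emit the mosquito-repelling sound wave towards the user's position. The frequency of the mosquito-repelling sound wave can be adjusted, for example by adjusting the frequency of the clock pulses, so that mosquitoes do not adapt to or become immune to a wave of fixed frequency.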
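The continuously changing emission frequency can be sketched as an endless sweep. This is a hypothetical illustration in software of what the text describes in hardware terms (adjusting the clock pulse frequency); the function name, the band limits and the step size are assumptions, and the only constraint taken from the text is that the band stays above the audible range.

```python
import itertools


def sweep_frequencies(low_hz, high_hz, step_hz):
    """Return an endless iterator of emission frequencies.

    The frequency rises from low_hz to high_hz in steps of step_hz and then
    starts again, so the emitted wave never stays at one fixed frequency.
    """
    if low_hz <= 20000:
        raise ValueError("keep the band above the audible range (20 Hz to 20 kHz)")
    count = int((high_hz - low_hz) // step_hz) + 1
    return itertools.cycle(low_hz + i * step_hz for i in range(count))


first_five = list(itertools.islice(sweep_frequencies(25000, 27000, 1000), 5))
```

Each value taken from the iterator would be applied for one adjustment period before moving on to the next.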
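It should be noted that, in the embodiments of the present application, step 502 and step 504 may also be carried out at the same time. That is, after the voice information emitted by the sound source has been collected by the microphone array, the voice information is used both to determine whether the corresponding target control instruction is a mosquito-repelling instruction and to determine the user's position information. Only when the target control instruction is a mosquito-repelling instruction is the determined position information sent to the steering control module, so that the ultrasonic emission module emits the mosquito-repelling sound wave towards the user's position. Determining the target control instruction and the position of the sound source from the collected voice information at the same time makes sound-wave mosquito repelling more efficient.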
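Running steps 502 and 504 side by side and forwarding the position only for a mosquito-repelling instruction can be sketched as follows. Everything in the example is a stand-in: the recognizer is a hypothetical keyword match, the localizer returns a precomputed direction, and the dictionary layout and command identifier are assumptions made only to keep the sketch runnable.

```python
from concurrent.futures import ThreadPoolExecutor

REPEL_COMMAND = "repel_mosquitoes"   # hypothetical command identifier


def recognize(voice):
    """Stand-in recognizer: a hypothetical keyword match, not the patent's model."""
    return REPEL_COMMAND if "mosquito" in voice["text"] else None


def locate(voice):
    """Stand-in localizer returning a precomputed direction in degrees."""
    return voice["direction_deg"]


def handle(voice):
    """Run recognition and localization concurrently; act only on a repel command."""
    with ThreadPoolExecutor(max_workers=2) as pool:
        command = pool.submit(recognize, voice)
        position = pool.submit(locate, voice)
        if command.result() != REPEL_COMMAND:
            return None                      # the position is discarded
        return {"steer_to_deg": position.result(), "reply": "repelling started"}


action = handle({"text": "repel the mosquito", "direction_deg": 35.0})
```

The returned dictionary stands for the two outputs the text mentions: the position passed to the steering control module and the response message returned to the user.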
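In addition, a user who utters voice information to control a household appliance usually wants to know whether the operation succeeded. In the embodiments of the present application, after the target control instruction corresponding to the voice information has been determined to be a mosquito-repelling instruction, a response message may therefore be returned to the user to indicate that the mosquito-repelling operation succeeded.
In addition, when the user does not need the appliance to repel mosquitoes, the sound-wave mosquito-repelling function can also be switched off by voice to reduce energy consumption.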
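To implement the above embodiments, the present application further provides a voice control apparatus.
FIG. 11 is a schematic structural diagram of a voice control apparatus provided by an embodiment of the present invention.
As shown in FIG. 11, the voice control apparatus 60 includes:
a microphone array 610, a speech recognition module 620, a positioning module 630 and a control module 640, wherein:
The microphone array 610 is configured to collect the voice information emitted by the sound source.
The speech recognition module 620 is configured to recognize the voice information and determine the target control instruction corresponding to the voice information.
Optionally, in one possible implementation of the embodiments of the present invention, the speech recognition module 620 may further include:
a model recognition unit, configured to input the target voice information into a speech recognition model to determine whether the target voice information is a quasi-control instruction;
a second setting unit, configured to take the quasi-control instruction as the target control instruction if the target voice information is recognized as a quasi-control instruction.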
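The positioning module 630 is configured to obtain the position information of the sound source relative to the microphone array.
Specifically, the positioning module 630 is configured to process the target voice information with a preset beamforming algorithm to obtain the position information.
Further, in one possible implementation of the embodiments of the present invention, the positioning module 630 may include:
a first delay obtaining unit, configured to obtain the delay value of the target voice information in each microphone channel of the microphone array with respect to a reference microphone, the reference microphone being one of the microphones in the microphone array;
a processing unit, configured to process the target voice information of each microphone channel according to the delay value and a preset weighted correlation function;
a beamforming unit, configured to sum the processed target voice information of all channels to form one beam and the output power of the beam;
an adjusting unit, configured to adjust the weights of the weighted correlation function to obtain the target beam signal with the maximum output power;
a search unit, configured to search for the spatial point corresponding to the target beam signal and take the position information of the spatial point as the position information of the sound source.
In another possible implementation of the embodiments of the present invention, the positioning module 630 may include:
a second delay obtaining unit, configured to obtain, according to the voice information and a preset parameter, the relative time difference with which the voice information arrives at any two of the M microphones, the preset parameter being set according to the SNR of the voice information;
a positioning unit, configured to locate the voice information according to the relative time difference of arrival at the two microphones and the positions of the two microphones.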
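The second delay obtaining unit is specifically configured to:
generate, by the following formula, the relative time difference with which the voice information arrives at any two of the M microphones:
Figure PCTCN2017104905-appb-000022
where the position of the peak of $R_{12}(\tau)$ is the relative time difference, $\psi_{12}(\omega)$ is the generalized cross-correlation weighting function, $G_{12}(\omega)$ is the cross-power spectrum of the first and second Fourier transform values, $G_{12}(\omega) = X_1(\omega) X_2^*(\omega)$, and $X_1(\omega)$ and $X_2(\omega)$ are the first and second Fourier transform values generated by Fourier-transforming the first voice information $x_1(t)$ and the second voice information $x_2(t)$ respectively.
The second delay obtaining unit is further configured to determine the generalized cross-correlation weighting function $\psi_{12}(\omega)$ by the following formula:
Figure PCTCN2017104905-appb-000023
where $\lambda^2$ is the preset parameter,
Figure PCTCN2017104905-appb-000024
σ denotes the SNR, and $\sigma_0$, $\sigma_1$, $\lambda_0$ and $\lambda_1$ are preset constants with $\lambda_1 > \lambda_0$.
In another possible implementation of the embodiments of the present invention, the positioning module is further configured to obtain the position information of the sound source relative to the microphone array by the following formula:
$\|m_{i1} - s\| - \|m_{i2} - s\| = \Delta\tau_i \, c$,
where $\Delta\tau_i$ is the relative time difference with which the voice information arrives at any two of the M microphones, $m_{i1}$ and $m_{i2}$ are the position vectors of the two microphones, s is the position vector of the sound source, and c is the speed of sound in the current medium.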
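The control module 640 is configured to control the household appliance to operate according to the target control instruction and the position information.
Specifically, the control module 640 is configured to output the position information to the household appliance after the target control instruction has been recognized as the quasi-control instruction, so as to control the household appliance to operate according to the target control instruction and the position information.
Optionally, in one possible implementation of the embodiments of the present invention, the voice control apparatus 60 may further include:
a conversion module, configured to perform analog-to-digital conversion on the voice information to obtain digital voice information before the speech recognition module 620 recognizes the voice information and determines the corresponding control instruction;
an interference cancellation module, configured to remove interference from the digital voice information to obtain the target voice information;
a transform module, configured to apply a short-time Fourier transform to the voice information to generate multiple audio frequency-domain values;
a comparison module, configured to compare the maximum and the minimum of the audio frequency-domain values with a threshold to determine whether the voice information is a valid voice signal, the threshold including a first threshold and a second threshold, the first threshold being smaller than the second threshold;
a judging module, configured to judge the voice information to be a valid voice signal when the maximum of the audio frequency-domain values is greater than the first threshold and the minimum is less than the second threshold, and to subtract the noise magnitude spectrum from the magnitude spectrum of the voice information;
and to judge the voice information to be a noise signal when the maximum of the audio frequency-domain values is less than or equal to the first threshold or the minimum is greater than or equal to the second threshold, and to update the noise magnitude spectrum so that it is the most recent noise magnitude spectrum.
Optionally, in one possible implementation of the embodiments of the present invention, the target control instruction may be a mosquito-repelling instruction, and correspondingly the control module is specifically configured to:
emit a mosquito-repelling sound wave according to the mosquito-repelling instruction and the position information.
The voice control apparatus 60 may further include a sending module, configured to return a response message to the user.
Optionally, in one possible implementation of the embodiments of the present invention, the control module is further configured to:
adjust the frequency of the mosquito-repelling sound wave at a preset adjustment rate.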
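It should be noted that the foregoing explanation of the voice control method embodiments also applies to the voice control apparatus of this embodiment. The implementation principle is similar and is not repeated here.
The voice control apparatus of this embodiment collects the voice information emitted by the sound source, recognizes the voice information to determine the corresponding target control instruction, obtains the position of the sound source relative to the microphone array, and controls the household appliance to operate according to the target control instruction and the position information. The household appliance can thus be controlled intelligently: the user controls it simply by speaking, no longer depends on a remote control, and has a better experience.
To implement the above embodiments, the present application further provides a household appliance.
FIG. 12 is a schematic structural diagram of a household appliance provided by an embodiment of the present invention.
As shown in FIG. 12, the household appliance 120 includes the voice control apparatus 60 described in the foregoing embodiments.
The household appliance 120 may be any device such as an air conditioner or a fan.
It can be understood that an air conditioner or fan is usually used in a bedroom, where the room is small, so using such an appliance to repel mosquitoes is more effective there than in an outdoor environment. Moreover, an air conditioner is itself a device that blows air. The airflow it produces in operation makes the indoor air oscillate as the wind force and wind direction change; combined with the mosquito-repelling sound wave emitted by the ultrasonic emission module, this stimulates the mosquitoes' muscular system even more strongly and improves the repelling effect.
It should be noted that the foregoing explanation of the voice control method embodiments also applies to the household appliance of this embodiment. The implementation principle is similar and is not repeated here.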
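In one possible implementation of the present invention, the household appliance may adopt the system architecture shown in FIG. 13.
As shown in FIG. 13, the household appliance may include a voice broadcast subsystem 131, a speech recognition subsystem 132, a microphone array subsystem 133, a sound source localization subsystem 134 and a control subsystem 135.
In a specific implementation, the microphone array subsystem 133 collects the voice information and sends it both to the speech recognition subsystem 132 for speech recognition and to the sound source localization subsystem 134 for sound source localization.
After the speech recognition subsystem 132 has recognized the voice information and determined the corresponding target control instruction, it can send a control signal to the sound source localization subsystem 134 so that the latter sends the localization result to the control subsystem 135; send the recognized target control instruction to the control subsystem 135; and output a prompt instruction to the voice broadcast subsystem 131 so that the latter informs the user that the operation succeeded.
After the sound source localization subsystem 134 has determined the position of the sound source by processing the voice information collected by the microphone array subsystem 133, it sends the localization result to the control subsystem 135 if it receives the control signal output by the speech recognition subsystem 132.
The control subsystem 135 can control the household appliance to operate according to the target control instruction and the position information. For example, the control subsystem 135 may include an ultrasonic emission module and a steering control module. After receiving the localization result sent by the sound source localization subsystem 134, the steering control module can make the ultrasonic emission module start or stop emitting the mosquito-repelling sound wave and, according to the localization result, drive a motor to turn the ultrasonic emission module so that the mosquito-repelling sound wave is emitted towards the user's position.
The household appliance of this embodiment collects the voice information emitted by the sound source, recognizes the voice information to determine the corresponding target control instruction, obtains the position of the sound source relative to the microphone array, and controls the household appliance to operate according to the target control instruction and the position information. The household appliance can thus be controlled intelligently: the user controls it simply by speaking, no longer depends on a remote control, and has a better experience.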
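To implement the above embodiments, the present invention further provides a computer program product; when the instructions in the computer program product are executed by a processor, the voice control method described in the foregoing embodiments is performed.
To implement the above embodiments, the present invention further provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the voice control method described in the foregoing embodiments is implemented.
A person of ordinary skill in the art will understand that all or part of the procedures of the methods in the above embodiments can be carried out by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the procedures of the embodiments of the above methods. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM) or the like.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.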

Claims (31)

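  1. A voice control method, characterized by comprising:
    collecting, by a microphone array, voice information emitted by a sound source;
    recognizing the voice information and determining a target control instruction corresponding to the voice information;
    obtaining position information of the sound source relative to the microphone array;
    controlling a household appliance to operate according to the target control instruction and the position information.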
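  2. The voice control method according to claim 1, characterized in that, before the recognizing the voice information and determining a target control instruction corresponding to the voice information, the method further comprises:
    applying a short-time Fourier transform to the voice information to generate multiple audio frequency-domain values;
    comparing the maximum and the minimum of the audio frequency-domain values with a threshold to determine whether the voice information is a valid voice signal, wherein the threshold comprises a first threshold and a second threshold, and the first threshold is smaller than the second threshold;
    if the maximum of the audio frequency-domain values is greater than the first threshold and the minimum is less than the second threshold, judging the voice information to be a valid voice signal and subtracting a noise magnitude spectrum from the magnitude spectrum of the voice information;
    if the maximum of the audio frequency-domain values is less than or equal to the first threshold or the minimum is greater than or equal to the second threshold, judging the voice information to be a noise signal and updating the noise magnitude spectrum so that it is the most recent noise magnitude spectrum.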
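  3. The voice control method according to claim 1, characterized in that, before the recognizing the voice information and determining a target control instruction corresponding to the voice information, the method further comprises:
    performing analog-to-digital conversion on the voice information to obtain digital voice information;
    removing interference from the digital voice information to obtain the target voice information.
  4. The voice control method according to claim 3, characterized in that the recognizing the voice information and determining a target control instruction corresponding to the voice information comprises:
    obtaining target content of the target voice information;
    determining whether a preset voice template library contains a quasi-control instruction corresponding to the target content, wherein the quasi-control instruction is a control instruction whose degree of matching with the target content exceeds a preset threshold;
    if the voice template library contains the quasi-control instruction, taking the quasi-control instruction as the target control instruction.
  5. The voice control method according to claim 3, characterized in that the recognizing the voice information and determining a target control instruction corresponding to the voice information comprises:
    inputting the target voice information into a speech recognition model to determine whether the target voice information is a quasi-control instruction;
    if the target voice information is recognized as the quasi-control instruction, taking the quasi-control instruction as the target control instruction.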
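  6. The voice control method according to claim 4 or 5, characterized in that the controlling a household appliance to operate according to the target control instruction and the position information comprises:
    after the target control instruction is recognized as the quasi-control instruction, outputting the position information to the household appliance to control the household appliance to operate according to the target control instruction and the position information.
  7. The voice control method according to claim 3, characterized in that the obtaining position information of the sound source relative to the microphone array comprises:
    processing the target voice information with a preset beamforming algorithm to obtain the position information.
  8. The voice control method according to claim 7, characterized in that the processing the target voice information with a preset beamforming algorithm to obtain the position information comprises:
    obtaining a delay value of the target voice information in each microphone channel of the microphone array with respect to a reference microphone, wherein the reference microphone is one of the microphones in the microphone array;
    processing the target voice information of each microphone channel according to the delay value and a preset weighted correlation function;
    summing the processed target voice information of all channels to form one beam and the output power of the beam;
    adjusting the weights of the weighted correlation function to obtain a target beam signal with the maximum output power;
    searching for the spatial point corresponding to the target beam signal, and taking the position information of the spatial point as the position information of the sound source.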
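  9. The voice control method according to claim 1, characterized in that the microphone array comprises M microphones, M being an integer greater than 1;
    the obtaining position information of the sound source relative to the microphone array comprises:
    obtaining, according to the voice information and a preset parameter, the relative time difference with which the voice information arrives at any two of the M microphones, wherein the preset parameter is set according to the SNR of the voice information;
    locating the voice information according to the relative time difference of arrival at the two microphones and the positions of the two microphones.
  10. The voice control method according to claim 9, characterized in that the obtaining, according to the voice information and a preset parameter, the relative time difference with which the voice information arrives at any two of the M microphones comprises:
    generating, by the following formula, the relative time difference with which the voice information arrives at any two of the M microphones:
    Figure PCTCN2017104905-appb-100001
    where the position of the peak of $R_{12}(\tau)$ is the relative time difference, $\psi_{12}(\omega)$ is the generalized cross-correlation weighting function, $G_{12}(\omega)$ is the cross-power spectrum of the first and second Fourier transform values, $G_{12}(\omega) = X_1(\omega) X_2^*(\omega)$, and $X_1(\omega)$ and $X_2(\omega)$ are the first and second Fourier transform values generated by Fourier-transforming the first voice information $x_1(t)$ and the second voice information $x_2(t)$ respectively.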
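  11. The voice control method according to claim 10, characterized in that the generalized cross-correlation weighting function $\psi_{12}(\omega)$ is determined by the following formula:
    Figure PCTCN2017104905-appb-100002
    where $\lambda^2$ is the preset parameter,
    Figure PCTCN2017104905-appb-100003
    σ denotes the SNR, and $\sigma_0$, $\sigma_1$, $\lambda_0$ and $\lambda_1$ are preset constants with $\lambda_1 > \lambda_0$.
  12. The sound source localization method according to claim 7, characterized in that the position information of the sound source relative to the microphone array is obtained by the following formula:
    $\|m_{i1} - s\| - \|m_{i2} - s\| = \Delta\tau_i \, c$,
    where $\Delta\tau_i$ is the relative time difference with which the voice information arrives at any two of the M microphones, $m_{i1}$ and $m_{i2}$ are the position vectors of the two microphones, s is the position vector of the sound source, and c is the speed of sound in the current medium.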
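  13. The voice control method according to claim 1, characterized in that the target control instruction is a mosquito-repelling instruction;
    the controlling a household appliance to operate according to the target control instruction and the position information comprises:
    emitting a mosquito-repelling sound wave according to the mosquito-repelling instruction and the position information;
    after the determining a target control instruction corresponding to the voice information, the method further comprises:
    returning a response message to the user.
  14. The voice control method according to claim 13, characterized in that the emitting a mosquito-repelling sound wave according to the mosquito-repelling instruction and the position information comprises:
    adjusting the frequency of the mosquito-repelling sound wave at a preset adjustment rate.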
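  15. A voice control apparatus, characterized by comprising:
    a microphone array, configured to collect voice information emitted by a sound source;
    a speech recognition module, configured to recognize the voice information and determine a target control instruction corresponding to the voice information;
    a positioning module, configured to obtain position information of the sound source relative to the microphone array;
    a control module, configured to control a household appliance to operate according to the target control instruction and the position information.
  16. The voice control apparatus according to claim 15, characterized by further comprising:
    a transform module, configured to apply a short-time Fourier transform to the voice information to generate multiple audio frequency-domain values;
    a comparison module, configured to compare the maximum and the minimum of the audio frequency-domain values with a threshold to determine whether the voice information is a valid voice signal, wherein the threshold comprises a first threshold and a second threshold, and the first threshold is smaller than the second threshold;
    a judging module, configured to judge the voice information to be a valid voice signal when the maximum of the audio frequency-domain values is greater than the first threshold and the minimum is less than the second threshold, and to subtract a noise magnitude spectrum from the magnitude spectrum of the voice information;
    and to judge the voice information to be a noise signal when the maximum of the audio frequency-domain values is less than or equal to the first threshold or the minimum is greater than or equal to the second threshold, and to update the noise magnitude spectrum so that it is the most recent noise magnitude spectrum.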
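  17. The voice control apparatus according to claim 15, characterized by further comprising:
    a conversion module, configured to perform analog-to-digital conversion on the voice information to obtain digital voice information before the speech recognition module recognizes the voice information and determines the control instruction corresponding to the voice information;
    an interference cancellation module, configured to remove interference from the digital voice information to obtain the target voice information.
  18. The voice control apparatus according to claim 17, characterized in that the speech recognition module comprises:
    a content obtaining unit, configured to obtain target content of the target voice information;
    a judging unit, configured to determine whether a preset voice template library contains a quasi-control instruction corresponding to the target content, wherein the quasi-control instruction is a control instruction whose degree of matching with the target content exceeds a preset threshold;
    a first setting unit, configured to take the quasi-control instruction as the target control instruction if the voice template library contains the quasi-control instruction.
  19. The voice control apparatus according to claim 17, characterized in that the speech recognition module comprises:
    a model recognition unit, configured to input the target voice information into a speech recognition model to determine whether the target voice information is a quasi-control instruction;
    a second setting unit, configured to take the quasi-control instruction as the target control instruction if the target voice information is recognized as the quasi-control instruction.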
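  20. The voice control apparatus according to claim 18 or 19, characterized in that the control module is specifically configured to output the position information to the household appliance after the target control instruction is recognized as the quasi-control instruction, so as to control the household appliance to operate according to the target control instruction and the position information.
  21. The voice control apparatus according to claim 17, characterized in that the positioning module is specifically configured to process the target voice information with a preset beamforming algorithm to obtain the position information.
  22. The voice control apparatus according to claim 21, characterized in that the positioning module comprises:
    a first delay obtaining unit, configured to obtain a delay value of the target voice information in each microphone channel of the microphone array with respect to a reference microphone, wherein the reference microphone is one of the microphones in the microphone array;
    a processing unit, configured to process the target voice information of each microphone channel according to the delay value and a preset weighted correlation function;
    a beamforming unit, configured to sum the processed target voice information of all channels to form one beam and the output power of the beam;
    an adjusting unit, configured to adjust the weights of the weighted correlation function to obtain a target beam signal with the maximum output power;
    a search unit, configured to search for the spatial point corresponding to the target beam signal and take the position information of the spatial point as the position information of the sound source.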
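  23. The voice control apparatus according to claim 15, characterized in that the microphone array comprises M microphones, M being an integer greater than 1;
    the positioning module comprises:
    a second delay obtaining unit, configured to obtain, according to the voice information and a preset parameter, the relative time difference with which the voice information arrives at any two of the M microphones, wherein the preset parameter is set according to the SNR of the voice information;
    a positioning unit, configured to locate the voice information according to the relative time difference of arrival at the two microphones and the positions of the two microphones.
  24. The voice control apparatus according to claim 23, characterized in that the second delay obtaining unit is specifically configured to:
    generate, by the following formula, the relative time difference with which the voice information arrives at any two of the M microphones:
    Figure PCTCN2017104905-appb-100004
    where the position of the peak of $R_{12}(\tau)$ is the relative time difference, $\psi_{12}(\omega)$ is the generalized cross-correlation weighting function, $G_{12}(\omega)$ is the cross-power spectrum of the first and second Fourier transform values, $G_{12}(\omega) = X_1(\omega) X_2^*(\omega)$, and $X_1(\omega)$ and $X_2(\omega)$ are the first and second Fourier transform values generated by Fourier-transforming the first target voice information $x_1(t)$ and the second target voice information $x_2(t)$ respectively.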
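  25. The voice control apparatus according to claim 24, characterized in that the second delay obtaining unit is further configured to determine the generalized cross-correlation weighting function $\psi_{12}(\omega)$ by the following formula:
    Figure PCTCN2017104905-appb-100005
    where $\lambda^2$ is the preset parameter,
    Figure PCTCN2017104905-appb-100006
    σ denotes the SNR, and $\sigma_0$, $\sigma_1$, $\lambda_0$ and $\lambda_1$ are preset constants with $\lambda_1 > \lambda_0$.
  26. The sound source localization apparatus according to claim 21, characterized in that the positioning module is further configured to obtain the position information of the sound source relative to the microphone array by the following formula:
    $\|m_{i1} - s\| - \|m_{i2} - s\| = \Delta\tau_i \, c$,
    where $\Delta\tau_i$ is the relative time difference with which the voice information arrives at any two of the M microphones, $m_{i1}$ and $m_{i2}$ are the position vectors of the two microphones, s is the position vector of the sound source, and c is the speed of sound in the current medium.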
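  27. The voice control apparatus according to claim 15, characterized in that the target control instruction is a mosquito-repelling instruction;
    the control module is specifically configured to:
    emit a mosquito-repelling sound wave according to the mosquito-repelling instruction and the position information;
    the apparatus further comprises:
    a sending module, configured to return a response message to the user.
  28. The voice control apparatus according to claim 27, characterized in that the control module is further configured to:
    adjust the frequency of the mosquito-repelling sound wave at a preset adjustment rate.
  29. A household appliance, comprising the voice control apparatus according to any one of claims 15 to 28.
  30. A computer program product, wherein, when the instructions in the computer program product are executed by a processor, the voice control method according to any one of claims 1 to 14 is performed.
  31. A computer-readable storage medium on which a computer program is stored, characterized in that, when the computer program is executed by a processor, the voice control method according to any one of claims 1 to 14 is implemented.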
PCT/CN2017/104905 2017-04-11 2017-09-30 Voice control method and apparatus, and household appliance WO2018188287A1 (zh)

Applications Claiming Priority (6)

Application Number Priority Date Filing Date Title
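CN201710233667.X 2017-04-11
CN201710233667.XA CN107123421A (zh) 2017-04-11 2017-04-11 Voice control method and apparatus, and household appliance
CN201710482779.9 2017-06-22
CN201710482779.9A CN107202385B (zh) 2017-06-22 2017-06-22 Sound-wave mosquito-repelling method and apparatus, and air conditioner
CN201710493300.1 2017-06-22
CN201710493300.1A CN107271963A (zh) 2017-06-22 2017-06-22 Sound source localization method and apparatus, and air conditioner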

Publications (1)

Publication Number Publication Date
WO2018188287A1 true WO2018188287A1 (zh) 2018-10-18

Family

ID=63792235

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/104905 WO2018188287A1 (zh) Voice control method and apparatus, and household appliance 2017-04-11 2017-09-30

Country Status (1)

Country Link
WO (1) WO2018188287A1 (zh)


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17905655

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 04/03/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17905655

Country of ref document: EP

Kind code of ref document: A1