WO2020192721A1 - Voice wake-up method, apparatus, device, and medium - Google Patents

Voice wake-up method, apparatus, device, and medium

Info

Publication number
WO2020192721A1
Authority
WO
WIPO (PCT)
Prior art keywords
noise reduction
voice
signals
wake
beamforming
Application number
PCT/CN2020/081341
Inventor
陈礼文
张科
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Application filed by 华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority to US17/598,702 (US20230031491A1)
Priority to EP20777736.8A (EP3926624B1)
Publication of WO2020192721A1 (Chinese)


Classifications

    • G PHYSICS
      • G10 MUSICAL INSTRUMENTS; ACOUSTICS
        • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
          • G10L15/00 Speech recognition
            • G10L15/08 Speech classification or search
              • G10L2015/088 Word spotting
            • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
              • G10L2015/223 Execution procedure of a spoken command
          • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
            • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
              • G10L21/0208 Noise filtering
                • G10L21/0216 Noise filtering characterised by the method used for estimating noise
                  • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
                    • G10L2021/02166 Microphone arrays; Beamforming
          • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
            • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
              • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination

Definitions

  • This application relates to the field of voice interaction, and in particular to a voice wake-up method, apparatus, device, and computer-readable storage medium.
  • Voice interaction has gradually become an increasingly popular control method.
  • To enable voice interaction, a device must first be switched from the sleep state to the running state by voice, that is, woken up by voice.
  • The quality of voice wake-up therefore directly affects the voice interaction experience.
  • Voice wake-up refers to detecting a target keyword in a continuous voice stream in order to wake up the device or an application.
  • At present, the industry mainly uses beamforming-based microphone array speech enhancement to achieve voice wake-up.
  • These methods use the spatial phase information contained in the voice signals received by multiple microphones to spatially filter the input voice, forming a directional spatial beam that enhances the voice signal from a specified direction; this achieves better enhancement than a single microphone.
  • Beamforming-based microphone array speech enhancement methods include fixed beamforming and adaptive beamforming.
  • The fixed beamforming method is generally used in scenarios where the location of the sound source is unknown.
  • In this method, noise is reduced using several beams pointing in selected azimuths, and the noise-reduced voice signals are passed to the wake-up engine, which screens them to determine the sound source location. The beam direction is then locked according to the wake-up engine's screening result for further noise reduction, and keyword detection is performed on the noise-reduced voice signal to achieve voice wake-up.
  • However, this method cannot strongly suppress voice signals outside the beam, which limits noise reduction performance before wake-up.
  • In addition, the location information screened by the wake-up engine carries a risk of positioning errors. When the sound source is located incorrectly, the noise reduction algorithm can fail and the device cannot be woken up.
  • The adaptive beamforming method is generally applied in scenarios where the sound source position is known.
  • A camera, infrared sensor, or similar device is used for auxiliary positioning to obtain the sound source position in advance, and this position is sent to the front-end noise reduction algorithm to lock the beam, which enhances noise reduction performance and improves the voice wake-up effect.
  • However, this method requires additional hardware for assisted positioning and demands high positioning accuracy. When the positioning accuracy does not meet requirements, noise reduction performance may also degrade.
  • Moreover, the solution is complicated to implement, has difficulty fully covering the many possible scenarios, and has poor reliability.
  • The present application provides a voice wake-up method, apparatus, device, medium, and computer program product, which improve noise reduction performance and achieve a better voice wake-up effect without additional cost.
  • A first aspect of the embodiments of this application provides a voice wake-up method applied to an electronic device. The electronic device collects voice signals through a microphone array and uses N beamforming noise reduction algorithms to perform noise reduction on the voice signals, obtaining N noise-reduced signals.
  • Each beamforming noise reduction algorithm corresponds to one of N regions, and different beamforming noise reduction algorithms correspond to different regions. The electronic device can therefore perform noise reduction on the voice signal using the beam of the region corresponding to each algorithm, so its noise reduction performance is not limited.
  • The union of the N regions covers the signal collection area of the microphone array, where N is a positive integer greater than 1. The sound source is therefore necessarily located in at least one of the N regions, so at least one of the N beamforming noise reduction algorithms is accurate with respect to sound source localization.
  • Consequently, at least one of the N noise-reduced signals produced by the N beamforming noise reduction algorithms has a good signal-to-noise ratio.
  • The electronic device uses the wake-up engine to perform voice wake-up according to at least one of the N noise-reduced signals, for example the signal with the better signal-to-noise ratio, which gives a higher wake-up success rate and improves the voice wake-up effect.
  • Because the method performs noise reduction on the voice signal with N beamforming noise reduction algorithms corresponding to different regions, voice wake-up requires no additional hardware for assisted positioning. This saves cost on the one hand, and is simple to implement and easy to popularize on the other.
  • The higher the hardware processing capability of the electronic device (for example, the computing capability of its processor), the more voice signals it can process.
  • In that case, the signal collection area of the microphone array can be divided at a finer granularity, narrowing the beam of each beamforming noise reduction algorithm to obtain better noise reduction performance.
  • The electronic device can determine the value of N according to its own hardware processing capability, and then divide the signal collection area according to N to obtain the N regions.
  • Alternatively, the electronic device may set the value of N based on the type of its current business scenario, and divide the signal collection area according to N to obtain the N regions.
  • For example, N can be set to a small value to reduce the amount of computation and thereby reduce power consumption.
  • Specifically, the electronic device looks up a parameter configuration file according to the current business scenario type and determines the value of N that matches that type. In this way, when the business scenario of the electronic device changes, N is switched automatically, so the device maintains good wake-up performance in any business scenario.
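The choice of N described above can be sketched as a lookup keyed by business scenario type, capped by how many beams the hardware can process. This is a minimal sketch under stated assumptions: the scenario names, the dictionary standing in for the parameter configuration file, the fallback value, and the `max_beams_for_hardware` cap are all hypothetical, and combining the two criteria (scenario type and hardware capability) is a design choice made here rather than something the source specifies.

```python
# Hypothetical stand-in for the parameter configuration file: scenario type -> N.
SCENARIO_CONFIG = {
    "low_power_standby": 2,   # small N: less computation, lower power consumption
    "living_room_tv": 4,
    "far_field_speaker": 6,
}

DEFAULT_N = 3  # hypothetical fallback when the scenario type is not listed


def select_num_regions(scenario: str, max_beams_for_hardware: int) -> int:
    """Return N for the current scenario, never exceeding the hardware limit.

    N must be a positive integer greater than 1 (one beam per region).
    """
    n = SCENARIO_CONFIG.get(scenario, DEFAULT_N)
    n = min(n, max_beams_for_hardware)  # hypothetical hardware cap
    if n < 2:
        raise ValueError("N must be a positive integer greater than 1")
    return n


print(select_num_regions("far_field_speaker", 8))  # 6
print(select_num_regions("far_field_speaker", 4))  # 4 (capped by hardware)
```

When the scenario changes, calling the function again with the new scenario type yields the new N, which mirrors the automatic switching described above.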
  • The N regions may be obtained by dividing the signal collection area evenly by angle, so that every region corresponds to the same beam angle.
  • As a result, the N beamforming noise reduction algorithms perform noise reduction on the voice signal in a more balanced way.
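Dividing the signal collection area evenly by angle can be sketched as follows. Using each region's midpoint as its beam steering direction is an assumption made for illustration; the source only states that each algorithm locks the beam of its own region.

```python
from typing import List, Tuple


def divide_evenly(total_deg: float, n: int) -> List[Tuple[float, float]]:
    """Split [0, total_deg] into n equal, non-overlapping angular regions."""
    if n < 2:
        raise ValueError("N must be a positive integer greater than 1")
    width = total_deg / n
    return [(i * width, (i + 1) * width) for i in range(n)]


def region_centers(regions: List[Tuple[float, float]]) -> List[float]:
    """Assumed steering direction for each region's beam: the midpoint of its range."""
    return [(lo + hi) / 2 for lo, hi in regions]


regions = divide_evenly(180.0, 3)
print(regions)                  # [(0.0, 60.0), (60.0, 120.0), (120.0, 180.0)]
print(region_centers(regions))  # [30.0, 90.0, 150.0]
```

Every region has the same width (here 60°), which is what gives each beamforming noise reduction algorithm the same beam angle.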
  • The electronic device may use only a single microphone array to collect the voice signal.
  • In that case, when the electronic device uses N beamforming noise reduction algorithms to perform noise reduction on the voice signal to obtain N noise-reduced signals, it can first copy the voice signal to obtain N channels of the voice signal, and then use the N beamforming noise reduction algorithms to perform noise reduction on the N channels, obtaining N noise-reduced signals.
  • Because the electronic device obtains the N channels by copying the voice signal, the number of microphone arrays required can be greatly reduced, which lowers hardware cost.
  • Alternatively, the electronic device can use multiple microphone arrays to collect voice signals, where each microphone array collects one channel of voice signal, so multiple arrays collect multiple channels.
  • In that case, the electronic device can directly use the N beamforming noise reduction algorithms to perform noise reduction on the N channels of voice signals to obtain N noise-reduced signals, where each beamforming noise reduction algorithm processes one of the N channels and different beamforming noise reduction algorithms process different channels. That is, the N beamforming noise reduction algorithms correspond one-to-one to the N channels of voice signals.
  • Because the electronic device collects the N channels of voice signals through its own microphone arrays, no additional copying is needed. This lowers the demand on the device's hardware processing capability and saves the time needed to copy the voice signals, which increases wake-up speed.
  • Optionally, the electronic device can use adaptive beamforming noise reduction algorithms to perform noise reduction on the voice signal. Because each beamforming noise reduction algorithm corresponds to one of the N regions, this is equivalent to each algorithm having a determined sound source location.
  • The adaptive beamforming noise reduction algorithm can therefore enhance the speech signal from the specified direction, giving better noise reduction and improving wake-up performance.
  • The electronic device may also set a wake-up strategy for voice wake-up in order to reduce the false wake-up rate.
  • Specifically, the electronic device can use the wake-up engine to determine, according to a wake-up algorithm, the similarity between each of the N noise-reduced signals and the wake-up word, and then perform voice wake-up according to these similarities.
  • The electronic device can set a corresponding wake-up strategy according to actual needs.
  • One wake-up strategy is: if the average similarity between the N noise-reduced signals and the wake-up word is greater than a preset threshold, voice wake-up is performed. Another wake-up strategy is: if a preset number of the N noise-reduced signals each have a similarity to the wake-up word greater than the preset threshold, voice wake-up is performed.
  • The above wake-up strategies can also be used in combination, in which case voice wake-up is performed when the combined conditions are satisfied.
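The two wake-up strategies can be sketched as simple decision functions over the N similarity scores. The scores below are hypothetical; the source does not say how the wake-up algorithm computes similarity. It also does not specify how the strategies are combined, so requiring both to agree (logical AND) is an assumption.

```python
from typing import Sequence


def wake_by_average(similarities: Sequence[float], threshold: float) -> bool:
    """Strategy 1: wake if the average similarity over the N signals exceeds the threshold."""
    return sum(similarities) / len(similarities) > threshold


def wake_by_count(similarities: Sequence[float], threshold: float, required: int) -> bool:
    """Strategy 2: wake if at least `required` of the N signals exceed the threshold."""
    return sum(1 for s in similarities if s > threshold) >= required


def wake_combined(similarities: Sequence[float], avg_threshold: float,
                  threshold: float, required: int) -> bool:
    """Assumed combination: both strategies must agree before waking."""
    return (wake_by_average(similarities, avg_threshold)
            and wake_by_count(similarities, threshold, required))


# Hypothetical similarities for N = 4 noise-reduced signals.
scores = [0.92, 0.35, 0.88, 0.40]
print(wake_by_average(scores, 0.6))        # True  (mean is about 0.64)
print(wake_by_count(scores, 0.8, 2))       # True  (two signals exceed 0.8)
print(wake_combined(scores, 0.6, 0.8, 2))  # True
```

Raising the preset number or the threshold makes false wake-ups less likely at the cost of missing some genuine wake-up words.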
  • A second aspect of the embodiments of this application provides a voice wake-up apparatus, which includes:
  • a noise reduction module, configured to perform noise reduction on the voice signal using N beamforming noise reduction algorithms to obtain N noise-reduced signals, where each beamforming noise reduction algorithm corresponds to one of N regions, different beamforming noise reduction algorithms correspond to different regions, the union of the N regions covers the signal collection area of the microphone array, and N is a positive integer greater than 1; and
  • a wake-up module, configured to use a wake-up engine to perform voice wake-up according to at least one of the N noise-reduced signals.
  • A third aspect of the embodiments of this application provides an electronic device, which includes a microphone array, a processor, and a memory, where:
  • the microphone array is configured to collect voice signals;
  • the memory is configured to store program code; and
  • the processor is configured to perform the following steps according to instructions in the program code:
  • performing noise reduction on the voice signals using N beamforming noise reduction algorithms to obtain N noise-reduced signals, where each beamforming noise reduction algorithm corresponds to one of N regions, different beamforming noise reduction algorithms correspond to different regions, the union of the N regions covers the signal collection area of the microphone array, and N is a positive integer greater than 1; and
  • using a wake-up engine to perform voice wake-up according to at least one of the N noise-reduced signals.
  • The processor is further configured to look up the parameter configuration file according to the current business scenario type and determine the value of N that matches that type.
  • The N regions are obtained by dividing the signal collection area evenly by angle.
  • When there is a single microphone array, the processor, when performing noise reduction to obtain the N noise-reduced signals, is specifically configured to:
  • copy the voice signal to obtain N channels of the voice signal, and use the N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain N noise-reduced signals, where each beamforming noise reduction algorithm processes one of the N channels and different beamforming noise reduction algorithms process different channels.
  • When there are multiple microphone arrays, each microphone array collects one channel of voice signal, so the arrays collect multiple channels of voice signals; in this case the processor is specifically configured to:
  • use the N beamforming noise reduction algorithms to perform noise reduction on the N channels of voice signals to obtain N noise-reduced signals, where each beamforming noise reduction algorithm processes one of the N channels and different beamforming noise reduction algorithms process different channels.
  • The beamforming noise reduction algorithm includes an adaptive beamforming noise reduction algorithm.
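  • When using the wake-up engine to perform voice wake-up according to at least one of the N noise-reduced signals, the processor is specifically configured to determine, according to the wake-up algorithm, the similarity (score) between each of the N noise-reduced signals and the wake-up word, and to perform voice wake-up according to the score using a wake-up strategy such as those described in the first aspect.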
  • A fourth aspect of the embodiments of this application provides a computer-readable storage medium configured to store program code, where the program code is used to execute the voice wake-up method described in the first aspect of the embodiments of this application.
  • A fifth aspect of the embodiments of this application provides a computer program product containing computer-readable instructions which, when run on a computer, cause the computer to execute the voice wake-up method described in the first aspect of the embodiments of this application.
  • FIG. 1 is a scene architecture diagram of a voice wake-up method in an embodiment of this application.
  • FIG. 2 is a flowchart of a voice wake-up method in an embodiment of this application.
  • FIG. 3A is an example diagram of region distribution (non-overlapping regions) in an embodiment of this application.
  • FIG. 3B is an example diagram of region distribution (overlapping regions) in an embodiment of this application.
  • FIG. 5 is a flowchart of a voice wake-up method in an embodiment of this application.
  • FIG. 6 is a schematic diagram of a scene of a voice wake-up method in an embodiment of this application.
  • FIG. 7 is a schematic structural diagram of a voice wake-up device in an embodiment of this application.
  • FIG. 8 is a schematic structural diagram of a terminal in an embodiment of the application.
  • The fixed beamforming method currently used in the industry cannot strongly suppress voice signals outside the beam, which limits noise reduction performance before wake-up; moreover, at a low signal-to-noise ratio, the location information screened by the wake-up engine risks positioning errors, which can cause the noise reduction algorithm to fail so that the system cannot be woken up. The adaptive beamforming method, for its part, requires additional hardware-assisted positioning with high positioning accuracy, is complex to implement, and has difficulty fully covering many scenarios.
  • To solve these technical problems, this application proposes a voice wake-up method that uses multiple beamforming noise reduction algorithms to perform noise reduction on the voice signal separately, thereby improving wake-up performance.
  • Specifically, the signal collection area of the microphone array can be divided into N sections forming N regions, whose union covers the signal collection area of the microphone array.
  • The sound source is therefore necessarily located within the N regions.
  • N beamforming noise reduction algorithms are set up in one-to-one correspondence with the regions, so that when the voice signal collected by the microphone array is received, the N beamforming noise reduction algorithms each perform noise reduction on it, obtaining N noise-reduced signals.
  • At least one of the N noise-reduced signals obtained by the N beamforming noise reduction algorithms has a good signal-to-noise ratio, so the wake-up engine can perform voice wake-up according to at least one of the N noise-reduced signals, which achieves a higher wake-up success rate and a better voice wake-up effect.
  • The voice wake-up method provided in this application can be applied to electronic devices.
  • The electronic device can be any device with a voice interaction function, including but not limited to smart speakers, smart home appliances, smartphones, tablet computers, in-vehicle devices, wearable devices, and augmented reality (AR)/virtual reality (VR) devices.
  • The voice wake-up method can be stored in the electronic device in the form of an application or software, and the electronic device implements the method by executing that application or software.
  • As shown in FIG. 1, the scene includes a smart speaker 10 and a user who utters speech.
  • The smart speaker 10 collects voice signals through its microphone array, uses N beamforming noise reduction algorithms to perform noise reduction on the voice signals separately to obtain N noise-reduced signals, and then uses the wake-up engine to perform voice wake-up according to at least one of the N noise-reduced signals, so that the smart speaker 10 switches from the sleep state to the running state.
  • FIG. 1 is only an illustrative example using a smart speaker. In other possible implementations, smart home appliances such as smart refrigerators or smart TVs, or electronic devices such as smartphones, can also implement the voice wake-up method provided in this application.
  • Referring to FIG. 2, the voice wake-up method includes the following steps:
  • S201: Collect voice signals through the microphone array.
  • A microphone array is an arrangement of multiple microphones used to sample and process the spatial characteristics of the sound field.
  • The microphone array can be linear, circular, or spherical.
  • The number of array elements, that is, the number of microphones, can be set according to actual needs.
  • For example, the microphone array can be a 2-mic, 4-mic, 6-mic, or 8-mic array.
  • The microphone array has a signal collection area, which is the area in which the microphone array can collect voice signals.
  • For example, for a smart TV, the signal collection area of the microphone array can be the 180° spatial area in front of the TV.
  • In other cases, the signal collection area of the microphone array includes the 360° spatial area surrounding the microphone array.
  • The microphone array can collect voice signals from a sound source located in the signal collection area.
  • The sound source can be a user.
  • The sound source can also be another electronic device, for example in a smart home system.
  • For instance, a smart TV can issue a voice command to turn on the speaker; the microphone array of the smart speaker collects the voice signal from the smart TV and performs speech recognition on it to determine whether to execute the command.
  • There may be one or more sound sources, and the microphone array can collect voice signals from one or more sound sources simultaneously.
  • The electronic device can use a single microphone array to collect voice signals, and perform noise reduction on them to achieve voice wake-up.
  • The electronic device can also use multiple microphone arrays to collect voice signals, in which case it performs noise reduction on the multiple voice signals collected by those arrays to achieve voice wake-up.
  • S202: Perform noise reduction on the voice signal using N beamforming noise reduction algorithms to obtain N noise-reduced signals.
  • Each beamforming noise reduction algorithm corresponds to one of N regions, and different beamforming noise reduction algorithms correspond to different regions; the union of the N regions covers the signal collection area of the microphone array, where N is a positive integer greater than 1.
  • The union of the N regions covering the signal collection area includes two cases: in one, the regions do not intersect, that is, no regions overlap; in the other, some regions intersect, that is, some regions overlap. Note that for the same number of regions, non-overlapping regions divide the signal collection area at a finer granularity, giving narrower beams and relatively more accurate sound source localization.
  • FIG. 3A shows an example of non-overlapping regions.
  • The signal collection area of the microphone array is a 180° spatial area.
  • The N regions are those corresponding to three angular ranges of the signal collection area, 0° to 60°, 60° to 120°, and 120° to 180°, shown as 1, 2, and 3 in FIG. 3A.
  • FIG. 3B shows an example of overlapping regions.
  • The signal collection area is again a 180° spatial area.
  • The N regions correspond to three angular ranges of the signal collection area, 0° to 70°, 60° to 130°, and 110° to 180°, shown as 1, 2, and 3 in FIG. 3B.
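The coverage requirement and the two cases (non-overlapping, overlapping) can be checked with a small interval sweep. This is an illustrative helper, not part of the described method: regions are treated as closed angular intervals in degrees, and wrap-around at 360° is not handled.

```python
from typing import List, Tuple

Region = Tuple[float, float]


def covers(regions: List[Region], start: float, end: float) -> bool:
    """Return True if the union of the [lo, hi] regions covers [start, end]."""
    reach = start
    for lo, hi in sorted(regions):
        if lo > reach:          # gap between the covered part and this region
            return False
        reach = max(reach, hi)
        if reach >= end:
            return True
    return reach >= end


def has_overlap(regions: List[Region]) -> bool:
    """True if any two regions share more than a single boundary point."""
    ordered = sorted(regions)
    return any(nxt[0] < cur[1] for cur, nxt in zip(ordered, ordered[1:]))


fig_3a = [(0, 60), (60, 120), (120, 180)]   # non-overlapping case
fig_3b = [(0, 70), (60, 130), (110, 180)]   # overlapping case
gap = [(0, 50), (60, 120), (120, 180)]      # hypothetical invalid division

print(covers(fig_3a, 0, 180), has_overlap(fig_3a))  # True False
print(covers(fig_3b, 0, 180), has_overlap(fig_3b))  # True True
print(covers(gap, 0, 180))                          # False
```

A division like `gap` would leave 50° to 60° uncovered, so a source there might not fall in any region, which is exactly what the coverage condition rules out.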
  • A beamforming noise reduction algorithm achieves noise reduction based on the beamforming principle.
  • Beamforming selects an appropriate weighting vector to compensate for the propagation delay at each element of the microphone array, so that the array outputs add coherently in a desired direction.
  • The array thus produces a main-lobe beam in that direction and can suppress interference from other directions.
  • For the beamforming noise reduction algorithm corresponding to region i, where i is a positive integer greater than or equal to 1 and less than or equal to N, the array outputs add coherently in the direction corresponding to region i, producing a main-lobe beam in that direction while interfering noise from other directions is suppressed.
  • The N beamforming noise reduction algorithms correspond to N different regions of the signal collection area. When the electronic device uses them to perform noise reduction on the voice signal, this is therefore equivalent to locking the beam on each of the N regions separately and performing noise reduction on the voice signal. In other words, the electronic device separately assumes that the sound source is in region 1, region 2, ..., region N, and performs noise reduction under each assumption to obtain N noise-reduced signals. Because the union of the N regions covers the signal collection area, the sound source must be located in at least one of the N regions, that is, at least one of these assumptions is correct. In that case, at least one noise reduction module reduces noise with its beam pointed at the sound source position, so the sound emitted by the source is not suppressed as noise, and noise reduction performance is not limited by uncertainty about the sound source position.
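The weighting-and-delay-compensation principle above can be illustrated with its simplest instance, a fixed delay-and-sum beamformer (uniform weights 1/M, integer-sample delays). It is swapped in here only to show how steering toward one region makes sound from that direction add coherently; the source does not specify the beamformer used by each region. Assumptions: a uniform linear array, far-field plane waves, a speed of sound of 343 m/s, and delays rounded to whole samples.

```python
import math
from typing import List, Sequence

SPEED_OF_SOUND = 343.0  # m/s, approximate value in air at room temperature


def steering_delays(num_mics: int, spacing_m: float, angle_deg: float, fs: int) -> List[int]:
    """Integer sample delays for a uniform linear array.

    Assumed convention: a plane wave from `angle_deg` (measured from the array
    axis) reaches mic m later than mic 0 by m * spacing * cos(angle) / c seconds.
    """
    tau = spacing_m * math.cos(math.radians(angle_deg)) / SPEED_OF_SOUND
    return [round(m * tau * fs) for m in range(num_mics)]


def delay_and_sum(channels: Sequence[Sequence[float]], delays: Sequence[int]) -> List[float]:
    """Advance each channel by its delay and average, so sound from the steered
    direction adds coherently while sound from elsewhere does not line up."""
    length = len(channels[0])
    out = []
    for n in range(length):
        acc = 0.0
        for ch, d in zip(channels, delays):
            k = n + d
            if 0 <= k < length:  # treat samples outside the buffer as zero
                acc += ch[k]
        out.append(acc / len(channels))
    return out


fs, spacing = 16000, 0.05
true_delays = steering_delays(4, spacing, 30.0, fs)   # [0, 2, 4, 6]
# Synthetic impulse from 30 degrees, reaching mic 0 at sample 10.
mics = [[0.0] * 40 for _ in range(4)]
for m, d in enumerate(true_delays):
    mics[m][10 + d] = 1.0

on_target = delay_and_sum(mics, steering_delays(4, spacing, 30.0, fs))
off_target = delay_and_sum(mics, steering_delays(4, spacing, 90.0, fs))
print(max(on_target), max(off_target))  # 1.0 0.25
```

Running one such beamformer per region (for example, steered to the region centers) yields the N outputs; the one whose region contains the source keeps the speech at full amplitude.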
  • Optionally, the electronic device can use adaptive beamforming noise reduction algorithms, each performing noise reduction on the voice signal based on the beam of its corresponding region; enhancing the voice signal from the corresponding direction yields better noise reduction and improves wake-up performance.
  • An adaptive beamforming noise reduction algorithm uses an adaptive algorithm to optimize the weight vector under a given optimality criterion, so it can adapt to changes in the environment and adjust the weights in real time to near-optimal values.
  • The optimality criterion can be the minimum mean square error (MSE) criterion, the maximum signal-to-noise ratio (SNR) criterion, the maximum likelihood ratio (LH) criterion, the minimum noise variance (NV) criterion, or the like.
  • Adaptive algorithms are mainly divided into closed-loop algorithms and open-loop algorithms.
  • Closed-loop algorithms include but are not limited to the least mean square (LMS) algorithm, the differential steepest descent (DSD) algorithm, the accelerated gradient (AG) algorithm, and modified versions of these algorithms.
  • Open-loop algorithms include direct inversion methods, such as sample matrix inversion (SMI) and direct matrix inversion (DMI).
  • Closed-loop algorithms are simple to implement, reliable, and require no data storage; open-loop algorithms offer better convergence speed and cancellation performance. A suitable adaptive algorithm can be selected according to actual needs to optimize the weights and thereby achieve speech noise reduction.
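The closed-loop LMS algorithm named above adapts the weights by stochastic gradient descent on the MSE criterion, J(w) = E[(d - wᵀx)²], using the update w ← w + μ·e·x with e = d - wᵀx. The sketch below is a deliberately simplified toy: it uses real-valued samples and assumes a known reference signal d, whereas practical adaptive beamformers usually work on complex frequency-domain data and often use constrained structures instead of a reference. The data and target weights are hypothetical.

```python
import random
from typing import List, Sequence


def lms_step(w: List[float], x: Sequence[float], desired: float, mu: float) -> float:
    """One closed-loop LMS update: y = w.x, e = d - y, w <- w + mu * e * x.

    `x` is one snapshot of the array element samples; returns the error
    measured before the update.
    """
    y = sum(wi * xi for wi, xi in zip(w, x))
    e = desired - y
    for i, xi in enumerate(x):
        w[i] += mu * e * xi
    return e


# Toy demo (hypothetical data): learn the weights that map a 3-element
# snapshot to the reference signal.
rng = random.Random(0)
target = [0.5, -0.25, 0.8]
w = [0.0, 0.0, 0.0]
for _ in range(5000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(3)]
    d = sum(t * xi for t, xi in zip(target, x))
    lms_step(w, x, d, mu=0.05)
print([round(v, 3) for v in w])  # [0.5, -0.25, 0.8]
```

The step size μ trades convergence speed against stability; it must stay small relative to the input power (here μ·|x|² ≤ 0.15), which is one reason closed-loop algorithms converge more slowly than open-loop direct inversion.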
  • When a single microphone array is used, the electronic device first copies the voice signal to obtain N channels of the voice signal, and then uses the N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain N noise-reduced signals, where each beamforming noise reduction algorithm processes one of the N channels and different beamforming noise reduction algorithms process different channels.
  • When multiple microphone arrays collect N channels of voice signals, the electronic device uses the N beamforming noise reduction algorithms to perform noise reduction on the N channels directly to obtain N noise-reduced signals, where each beamforming noise reduction algorithm processes one of the N channels and different beamforming noise reduction algorithms process different channels.
  • S203 Use the wake-up engine to perform voice wake-up according to at least one of the N noise reduction signals.
  • the wake-up engine specifically refers to the voice wake-up engine, that is, the core component that wakes up the device or wakes up the application through voice, and is generally expressed in the form of software, such as the main program of voice interaction software.
  • the wake-up engine specifically determines the similarity between the signal input to the wake-up engine and the wake-up word through the wake-up algorithm, and realizes voice wake-up according to the similarity.
  • the electronic device can use the wake-up engine to perform wake-up based on at least one of the N noise-reduction signals, such as the above-mentioned signal with good signal-to-noise ratio.
  • the word detection operation improves the detection probability of the wake-up word.
  • a wake-up event can be generated to achieve the purpose of waking up the electronic device or waking up the application in the electronic device. In this way, the method improves the wake-up success rate.
  • When the electronic device uses the wake-up engine to perform voice wake-up based on at least one noise-reduced signal, it can use N wake-up engines to process the N noise-reduced signals, with each engine processing a different signal. The wake-up word can then be detected in parallel, which shortens detection time and increases the wake-up rate.
  • Alternatively, the electronic device may use only one wake-up engine to process the N noise-reduced signals.
  • In that case, the single wake-up engine processes the N noise-reduced signals serially, checking each one for the wake-up word in turn, and generates a corresponding wake-up event when the wake-up word is detected, thereby achieving voice wake-up.
  • FIG. 4 shows a flowchart of a method that uses one microphone array and one wake-up engine to achieve voice wake-up.
  • Microphone array 1 collects the voice signal, and the electronic device copies it to obtain N channels of the voice signal. The N channels are fed one-to-one into N noise reduction algorithm modules, each of which stores one beamforming noise reduction algorithm: beamforming noise reduction algorithm 1, beamforming noise reduction algorithm 2, ..., beamforming noise reduction algorithm n.
  • Beamforming noise reduction algorithm 1 locks the beam to 0 to m1 degrees, algorithm 2 locks the beam to m1 to m2 degrees, and so on.
  • Beamforming noise reduction algorithm n locks the beam to m(n-1) to mn degrees. The N noise reduction algorithm modules perform noise reduction on the voice signal to obtain N noise-reduced signals and send them to wake-up engine 1.
  • Wake-up engine 1 performs voice wake-up based on the N noise-reduced signals.
  • FIG. 5 shows a flowchart of a method that uses N microphone arrays and N wake-up engines to achieve voice wake-up.
  • Each microphone array collects one voice signal, so the N microphone arrays collect N voice signals: voice signal 1, voice signal 2, ..., voice signal n.
  • Voice signals 1 to n are input one-to-one into N noise reduction algorithm modules.
  • The parameter configuration of the N noise reduction algorithm modules is the same as in FIG. 4.
  • Each noise reduction algorithm module performs noise reduction with its beamforming noise reduction algorithm, producing N noise-reduced signals, which are sent one-to-one to wake-up engine 1, wake-up engine 2, ..., wake-up engine n. Each wake-up engine checks its own noise-reduced signal for the wake-up word, and voice wake-up is performed as soon as any one engine detects it.
  • In summary, the embodiments of this application provide a voice wake-up method.
  • A voice signal is collected by a microphone array, and N beamforming noise reduction algorithms each perform noise reduction on it to obtain N noise-reduced signals.
  • Each beamforming noise reduction algorithm corresponds to one of N regions, and different algorithms correspond to different regions, so each algorithm performs noise reduction using the beam of its own region.
  • As a result, noise reduction performance is not constrained by an unknown source position, and the noise reduction algorithm can perform at its best.
  • The electronic device can also cancel echoes from outside the beam, improving echo cancellation performance.
  • The union of the N regions covers the signal collection region of the microphone array, so the sound source must lie in at least one of the N regions.
  • The sound source may lie in one of the N regions or in several of them.
  • For example, it may span two adjacent regions. When there are multiple sound sources, the regions containing them may be adjacent or non-adjacent.
  • Therefore, at least one of the N beamforming noise reduction algorithms is aimed correctly at the sound source.
  • Consequently, at least one of the N noise-reduced signals produced by these algorithms has a good signal-to-noise ratio, and the electronic device can use the wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals, which gives a higher wake-up success rate and improves voice wake-up.
  • The size of each region directly determines the beam width, which in turn affects the noise reduction performance of the beamforming noise reduction algorithm.
  • N can therefore be chosen so that the region size corresponding to each beamforming noise reduction algorithm meets the noise reduction performance requirement.
  • The electronic device may determine the value of N according to its own hardware processing capability, and then divide the signal collection region of the microphone array according to N to obtain the N regions.
  • Within a certain range, N can be set proportional to the hardware processing capability: it increases as the capability increases and decreases as the capability decreases.
  • However, if the beam angle becomes smaller than a preset angle, such as 15°, the electronic device may mistake noise for the voice signal from the sound source, degrading noise reduction performance. For this reason, N can be set to no more than a preset value.
  • For example, smart speakers have relatively limited hardware processing capability, so N can be set to a small value such as 3. The hardware processing capability of smartphones keeps increasing with each chip generation, so N can be set to a larger value such as 10.
  • Note that smartphones with different hardware configurations differ in processing capability: N can be set to a small value such as 5 for low-end phones and to a larger value such as 10 for high-end phones.
  • The electronic device may also determine the value of N based on the current service scenario type, and divide the signal collection region according to N to determine the N regions.
  • The electronic device obtains the current service scenario type, which represents its ability to provide services in the current scenario.
  • The current service scenario type may be a low-power type or a standard-power type. A low-power type indicates that the remaining battery is low and only low-power or basic services are provided. A standard-power type indicates that the remaining battery is sufficient and services are unrestricted: both low-power and high-power services, and value-added services in addition to basic ones, can be provided. The electronic device then looks up the parameter configuration file according to the current service scenario type, determines the value of N that matches it, and divides the signal collection region according to N to obtain the N regions.
  • The parameter configuration file stores the correspondence between service scenario types and N.
  • This correspondence can be written into the parameter configuration file in advance based on empirical values.
  • After obtaining the current service scenario type, the electronic device determines the value of N from the correspondence stored in the parameter configuration file.
  • For example, suppose the current service scenario type is the standard-power type.
  • After obtaining the current service scenario type, the electronic device looks up the parameter configuration file to obtain N; in this example, N is 5. The electronic device divides the signal collection region into 5 parts to form 5 regions, performs noise reduction on the voice signal with the 5 beamforming noise reduction algorithms corresponding to those regions, and the wake-up engine decides, based on the resulting noise-reduced signals, whether to generate a wake-up event.
  • As the remaining battery continues to decrease, the current service scenario type becomes the low-power type, and the electronic device provides only low-power services.
  • The electronic device looks up the parameter configuration file again to obtain N, which in this example is 3.
  • The electronic device re-divides the signal collection region into 3 parts to form 3 regions, performs noise reduction on the voice signal with the 3 corresponding beamforming noise reduction algorithms, and uses the wake-up engine to decide, based on the resulting noise-reduced signals, whether to generate a wake-up event, thereby achieving voice wake-up.
  • The N regions can be obtained by dividing the signal collection region evenly by angle. Each region then has the same beam angle, so the N beamforming noise reduction algorithms are well balanced when performing noise reduction on the voice signal.
  • The N regions may also be of unequal size; this embodiment imposes no limitation on this.
  • The electronic device may also set wake-up strategies so that it wakes up the device or an application only when the conditions specified by a strategy are met, thereby reducing the false wake-up rate.
  • After performing noise reduction on the voice signal to obtain N noise-reduced signals, the electronic device can use the wake-up engine to determine the similarity between each of the N noise-reduced signals and the wake-up word, and then perform voice wake-up based on these similarities.
  • The embodiments of this application provide several implementations of this.
  • In one implementation, voice wake-up is performed if the average similarity between the N noise-reduced signals and the wake-up word is greater than a preset threshold. In another implementation, voice wake-up is performed if a preset number of the N noise-reduced signals have a similarity to the wake-up word greater than a preset threshold.
  • The similarity between a noise-reduced signal and the wake-up word represents the probability that the voice signal contains the wake-up word.
  • If a larger number of the noise-reduced signals have a high similarity to the wake-up word, or the average similarity of the N noise-reduced signals is high, the voice signal is more likely to contain the wake-up word.
  • In that case voice wake-up can be performed; otherwise, it is not. Performing voice wake-up based on similarity in this way reduces the number of false wake-ups and lowers the false wake-up rate.
  • This application also provides a scenario embodiment to illustrate the voice wake-up method.
  • As shown in FIG. 6, the smartphone 600 includes a microphone array 610.
  • The microphone array 610 is a linear microphone array formed by 4 microphones.
  • The microphone array 610 collects the voice signal uttered by the user and transmits it to the processor 640 of the smartphone 600, which performs noise reduction on the voice signal.
  • The processor 640 first copies the voice signal to obtain 4 channels of the voice signal, and then uses 4 beamforming noise reduction algorithms corresponding to different regions to perform noise reduction on the 4 channels. Each algorithm processes one channel, and different algorithms process different channels.
  • Using the wake-up algorithm, the wake-up engine 650 of the smartphone 600 determines that three of the resulting noise-reduced signals reach 90% similarity to the wake-up word "Little E, hello", so it generates a wake-up event and reports it to the application layer 660 of the smartphone 600 to wake up the smartphone 600.
  • Upon waking, the smartphone 600 can indicate this by flashing the Home button indicator 620. After wake-up, the microphone array 610 continues to collect voice signals. For example, when the user says "please call xxx", the microphone array 610 collects the corresponding voice signal and transmits it to the processor 640, which recognizes the voice signal and calls the corresponding application according to the recognition result, for example a phone application to make the call, as shown in interface 630. In this way, voice interaction is achieved.
  • The voice wake-up method of this application has been described above; the following describes an apparatus for executing it.
  • The voice wake-up apparatus provided by the embodiments of this application may be any electronic device with a voice interaction function, including smart home appliances, smart terminals, vehicle-mounted terminals, wearable devices, and AR/VR devices, and it has the functions of the voice wake-up method provided in any embodiment corresponding to FIG. 1 to FIG. 6.
  • These functions can be implemented by hardware, or by hardware executing corresponding software.
  • The hardware or software includes one or more modules corresponding to the above functions, and the modules may be software and/or hardware.
  • As shown in FIG. 7, the voice wake-up apparatus 700 includes:
  • a collection module 710, configured to collect a voice signal through a microphone array;
  • a noise reduction module 720, configured to perform noise reduction on the voice signal using N beamforming noise reduction algorithms to obtain N noise-reduced signals, where each beamforming noise reduction algorithm corresponds to one of N regions, different algorithms correspond to different regions, the union of the N regions covers the signal collection region of the microphone array, and N is a positive integer greater than 1; and
  • a wake-up module 730, configured to use a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.
  • The collection module 710 may be used to execute the method in S201.
  • The noise reduction module 720 may be used to execute the method in S202.
  • The wake-up module 730 may be used to execute the method in S203.
  • The apparatus 700 may further include:
  • a first determining module, configured to determine the value of N according to the hardware processing capability and divide the signal collection region according to N to obtain the N regions.
  • For the first determining module, refer to the description of determining the N regions in the embodiment shown in FIG. 2.
  • The apparatus 700 may further include:
  • a second determining module, configured to determine the value of N according to the current service scenario type and divide the signal collection region according to N to obtain the N regions.
  • When determining the value of N, the second determining module is specifically configured to:
  • look up the parameter configuration file according to the current service scenario type and determine the value of N that matches the current service scenario type.
  • For the second determining module, refer to the description of determining the N regions in the embodiment shown in FIG. 2.
  • The N regions are obtained by dividing the signal collection region evenly by angle.
  • In one implementation, the collection module 710 is specifically configured to collect the voice signal through a single microphone array.
  • In that case, the noise reduction module 720 is specifically configured to copy the voice signal to obtain N channels of the voice signal, and to use N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels.
  • For the noise reduction module 720, refer to the description of S202 in the embodiment shown in FIG. 2 and the description of the embodiment shown in FIG. 4.
  • In another implementation, the collection module 710 is specifically configured to collect voice signals through multiple microphone arrays, where each microphone array collects one voice signal.
  • In that case, when the multiple voice signals are N voice signals, the noise reduction module 720 is specifically configured to use N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels.
  • For the noise reduction module 720, refer to the description of S202 in the embodiment shown in FIG. 2 and the description of the embodiment shown in FIG. 5.
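  • The beamforming noise reduction algorithm includes an adaptive beamforming noise reduction algorithm.
  • The wake-up module 730 is specifically configured to use the wake-up engine to determine the similarity between each of the N noise-reduced signals and the wake-up word, and to perform voice wake-up based on the similarity.
  • For the wake-up module 730, refer to the description following S203 in the embodiment shown in FIG. 2.
  • When performing voice wake-up based on the similarity score, the wake-up module 730 performs voice wake-up if the average similarity between the N noise-reduced signals and the wake-up word is greater than a preset threshold, and/or if a preset number of the N noise-reduced signals have a similarity to the wake-up word greater than a preset threshold.
  • For this behavior of the wake-up module 730, likewise refer to the description following S203 in the embodiment shown in FIG. 2.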
  • FIG. 8 shows a schematic structural diagram of a terminal according to an embodiment of this application.
  • The terminal 800 includes a microphone array 801, a processor 802, and a memory 803.
  • The microphone array 801 is configured to collect a voice signal.
  • The memory 803 is configured to store program code.
  • The processor 802 is configured to call the program code in the memory to execute the following steps, implementing the voice wake-up method shown in FIG. 2:
  • performing noise reduction on the voice signal using N beamforming noise reduction algorithms to obtain N noise-reduced signals, where each beamforming noise reduction algorithm corresponds to one of N regions, different algorithms correspond to different regions, the union of the N regions covers the signal collection region of the microphone array, and N is a positive integer greater than 1; and
  • using the wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.
  • The processor 802 may be further configured to determine the value of N according to the hardware processing capability, and to divide the signal collection region according to N to obtain the N regions.
  • The processor 802 may be further configured to determine the value of N according to the current service scenario type, and to divide the signal collection region according to N to obtain the N regions.
  • The processor 802 may be further configured to look up the parameter configuration file according to the current service scenario type and determine the value of N that matches the current service scenario type.
  • The processor 802 may be further configured to divide the signal collection region evenly by angle to obtain the N regions.
  • When there is a single microphone array, the processor 802, when performing noise reduction to obtain N noise-reduced signals, is specifically configured to copy the voice signal to obtain N channels of the voice signal and to use N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels.
  • When there are multiple microphone arrays, each microphone array collects one voice signal, so the multiple arrays collect multiple voice signals.
  • In that case, when the multiple voice signals are N voice signals, the processor 802 is specifically configured to use N beamforming noise reduction algorithms to perform noise reduction on the N voice signals to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels.
  • The beamforming noise reduction algorithm includes an adaptive beamforming noise reduction algorithm.
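  • When using the wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals, the processor 802 is specifically configured to use the wake-up engine to determine the similarity between each of the N noise-reduced signals and the wake-up word, and to perform voice wake-up based on the similarity.
  • Performing voice wake-up based on the similarity score by the processor 802 specifically includes: performing voice wake-up if the average similarity between the N noise-reduced signals and the wake-up word is greater than a preset threshold, and/or if a preset number of the N noise-reduced signals have a similarity to the wake-up word greater than a preset threshold.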
  • An embodiment of this application further provides a computer-readable storage medium for storing program code, where the program code is used to execute the voice wake-up method described in this application.
  • An embodiment of this application further provides a computer program product containing computer-readable instructions that, when run on a computer, cause the computer to execute the voice wake-up method described in the foregoing aspects.
  • The disclosed apparatus and method may be implemented in other ways.
  • For example, the apparatus embodiments described above are merely illustrative.
  • The division into units is merely a logical functional division; other divisions are possible in actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented.
  • In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
  • Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • "At least one (item)" means one or more, and "multiple" means two or more.
  • "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean: only A, only B, or both A and B, where A and B may be singular or plural.
  • The character "/" generally indicates an "or" relationship between the associated objects.
  • "At least one of the following items" or a similar expression refers to any combination of these items, including any combination of a single item or multiple items.
  • For example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a, b, and c", where a, b, and c may be single or multiple.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

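A voice wake-up method, apparatus, device, and medium. The voice wake-up method includes: collecting a voice signal through a microphone array (S201); performing noise reduction on the voice signal using N beamforming noise reduction algorithms respectively to obtain N noise-reduced signals, where each beamforming noise reduction algorithm corresponds to one of N regions, different beamforming noise reduction algorithms correspond to different regions, the union of the N regions covers the signal collection region of the microphone array, and N is a positive integer greater than 1 (S202); and using a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals (S203). The method makes the most of the noise reduction performance of the noise reduction algorithm, can also cancel echoes from outside the beam to improve echo cancellation performance, achieves a high wake-up rate and recognition rate, and is simple to implement and easy to deploy.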

Description

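Voice wake-up method, apparatus, device, and medium
This application claims priority to Chinese Patent Application No. 201910243897.3, filed with the China National Intellectual Property Administration on March 28, 2019 and entitled "Voice wake-up method, apparatus, device, and medium", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of voice interaction, and in particular, to a voice wake-up method, apparatus, device, and computer-readable storage medium.
Background
With the development of speech recognition technology, voice interaction has gradually become a popular control method. To enable voice interaction, a device must first be activated from a sleep state to a running state by voice, that is, voice wake-up must be performed. As the entry point of voice interaction, the quality of voice wake-up directly affects the voice interaction experience.
Voice wake-up specifically refers to detecting a target keyword in a continuous speech stream to wake up a device or an application. Currently, the industry mainly uses beamforming-based microphone array speech enhancement to achieve voice wake-up. Specifically, the spatial phase information contained in the voice signals received by multiple microphones is used to spatially filter the input speech, forming a directional spatial beam, and the voice signal in a specified direction is then enhanced, which achieves better enhancement than a single microphone. Beamforming-based microphone array speech enhancement methods include fixed beamforming and adaptive beamforming.
Fixed beamforming is generally applied where the sound source position is unknown. Beams in several directions are selected for noise reduction, and the noise-reduced voice signals are transmitted to the wake-up engine, which screens them to determine the sound source position. The beam direction is then locked according to the wake-up engine's screening result for noise reduction, and keyword detection is performed on the noise-reduced voice signal to achieve voice wake-up. However, this method cannot strongly suppress voice signals outside the beam, which limits noise reduction performance before wake-up. Moreover, at low signal-to-noise ratios, the direction information screened out by the wake-up engine may be wrong; when the sound source is mislocalized, the noise reduction algorithm can fail and the device cannot be woken up.
Adaptive beamforming is generally applied where the sound source position is known. Auxiliary localization with a camera, an infrared sensor, or the like obtains the sound source position in advance, and this position is sent to the front-end noise reduction algorithm to lock the beam, enhancing noise reduction performance and improving voice wake-up. However, this method requires additional hardware for auxiliary localization and demands high localization accuracy; when the accuracy falls short, noise reduction performance may even degrade. In addition, the solution is complex to implement, many scenarios are hard to cover completely, and reliability is poor.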
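Summary
In view of this, this application provides a voice wake-up method, apparatus, device, medium, and computer program product, to improve noise reduction performance without extra cost and achieve better voice wake-up.
A first aspect of the embodiments of this application provides a voice wake-up method applied to an electronic device. The electronic device collects a voice signal through a microphone array and performs noise reduction on the voice signal using N beamforming noise reduction algorithms respectively to obtain N noise-reduced signals. Each beamforming noise reduction algorithm corresponds to one of N regions, and different beamforming noise reduction algorithms correspond to different regions. The electronic device can therefore perform noise reduction on the voice signal based on the beam of the region corresponding to each algorithm, and its noise reduction performance is not limited.
In addition, the union of the N regions covers the signal collection region of the microphone array, where N is a positive integer greater than 1. The sound source must therefore lie in at least one of the N regions, at least one of the N beamforming noise reduction algorithms localizes the sound source correctly, and at least one of the N noise-reduced signals they produce has a good signal-to-noise ratio. The electronic device uses the wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals, such as the one with the good signal-to-noise ratio, which achieves a high wake-up success rate and improves voice wake-up. Because the method achieves voice wake-up by performing noise reduction with N beamforming noise reduction algorithms corresponding to different regions, no additional hardware is needed for auxiliary localization; this saves cost and is simple to implement and easy to deploy.
In a first implementation of the first aspect, the higher the hardware processing capability of the electronic device (for example, the computing capability of its processor), the more voice signals it can process. In that case, the signal collection region of the microphone array can be divided at a finer granularity, narrowing the beam of each beamforming noise reduction algorithm for better noise reduction performance. On this basis, the electronic device can determine the value of N according to its own hardware processing capability, and then divide the signal collection region according to N to obtain the N regions.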
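As a minimal sketch of this rule, assuming a hypothetical table of device tiers, the following Python picks N from the capability tier and caps it so that evenly divided sectors never fall below a minimum beam angle. The tier names and `choose_n` are illustrative, not part of the application; the N values 3, 5 and 10 and the 15° minimum are the examples given earlier in the text.

```python
import math

# Hypothetical capability tiers. The N values (3, 5, 10) are the examples the
# text gives for smart speakers, low-end phones and high-end phones.
TIER_TO_N = {"smart_speaker": 3, "phone_low": 5, "phone_high": 10}


def choose_n(tier: str, coverage_deg: float, min_beam_deg: float = 15.0) -> int:
    """Pick N from a capability tier, capped so that an even split of the
    signal collection region never yields a sector narrower than min_beam_deg."""
    n = TIER_TO_N[tier]
    # Largest N for which coverage_deg / N >= min_beam_deg.
    n_max = math.floor(coverage_deg / min_beam_deg)
    if n_max < 2:
        raise ValueError("region too small for N > 1 at this minimum beam angle")
    return min(n, n_max)


print(choose_n("phone_high", 180.0))  # 10 (180 / 15 = 12, so no cap applies)
print(choose_n("phone_high", 90.0))   # 6 (capped at 90 / 15)
```

The cap implements the "N not greater than a preset value" rule from the beam-angle argument; a real device would derive the tier from its own hardware description rather than a fixed string.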
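In a second implementation of the first aspect, the electronic device can set the value of N based on the type of service scenario it is currently in, and divide the signal collection region according to N to obtain the N regions. This achieves good noise reduction and voice wake-up performance while meeting the needs of the current service scenario. For example, in a low-power service scenario, N is set to a small value to reduce computation and therefore power consumption.
Further, in a third implementation of the first aspect, the electronic device looks up a parameter configuration file according to the current service scenario type and determines the value of N that matches it. In this way, when the service scenario of the electronic device changes, the value of N switches automatically, so the electronic device has good wake-up performance in any service scenario.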
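The lookup in the parameter configuration file can be illustrated with a small sketch. The JSON layout, the key names, the battery-based `scenario_type` rule and its 20% threshold are all hypothetical; only the example values (N = 5 for the standard-power scenario, N = 3 for the low-power scenario) come from the text.

```python
import json

# Hypothetical parameter configuration file content. The key names are made up;
# the values 5 (standard power) and 3 (low power) are the examples from the text.
CONFIG_TEXT = '{"standard_power": 5, "low_power": 3}'


def scenario_type(battery_pct: float, low_battery_pct: float = 20.0) -> str:
    """Hypothetical rule for classifying the current service scenario."""
    return "low_power" if battery_pct < low_battery_pct else "standard_power"


def n_for_scenario(scenario: str, config_text: str = CONFIG_TEXT) -> int:
    """Look up the N that matches the scenario type in the configuration file."""
    table = json.loads(config_text)
    if scenario not in table:
        raise ValueError(f"no N configured for scenario {scenario!r}")
    n = int(table[scenario])
    if n < 2:
        raise ValueError("N must be a positive integer greater than 1")
    return n


print(n_for_scenario(scenario_type(80.0)))  # 5
print(n_for_scenario(scenario_type(10.0)))  # 3
```

Re-running the lookup whenever the scenario type changes gives the automatic switching of N described above; the regions would then be re-divided with the new N.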
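In a fourth implementation of the first aspect, the N regions may be obtained by dividing the signal collection region evenly by angle. Each region then corresponds to the same beam angle, which gives good balance when the N beamforming noise reduction algorithms perform noise reduction on the voice signal.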
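Dividing the signal collection region evenly by angle amounts to computing N equal angular sectors. A minimal sketch, assuming the region is described by a start and end angle in degrees (`divide_evenly` is a hypothetical helper, not from the application):

```python
from typing import List, Tuple


def divide_evenly(n: int, start_deg: float, end_deg: float) -> List[Tuple[float, float]]:
    """Split [start_deg, end_deg] into n equal, non-overlapping angular sectors."""
    if n < 2:
        raise ValueError("N must be greater than 1")
    width = (end_deg - start_deg) / n
    # Pin the last boundary to end_deg so rounding never leaves a sliver uncovered.
    bounds = [start_deg + i * width for i in range(n)] + [end_deg]
    return list(zip(bounds[:-1], bounds[1:]))


print(divide_evenly(3, 0.0, 180.0))
# [(0.0, 60.0), (60.0, 120.0), (120.0, 180.0)]
```

With N = 3 over a 180° region this reproduces the non-overlapping layout of FIG. 3A; for a 360° speaker array, `divide_evenly(4, 0.0, 360.0)` gives four 90° sectors.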
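In a fifth implementation of the first aspect, the electronic device may use only one microphone array to collect the voice signal. In that case, when performing noise reduction on the voice signal using the N beamforming noise reduction algorithms to obtain N noise-reduced signals, the electronic device may first copy the voice signal to obtain N channels of the voice signal, and then use the N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels.
In this case, the electronic device obtains N channels by copying the voice signal, which greatly reduces the number of microphone arrays needed and therefore the hardware cost.
In a sixth implementation of the first aspect, the electronic device may use multiple microphone arrays to collect voice signals, where each microphone array collects one voice signal and the multiple arrays together collect multiple voice signals. When the multiple voice signals are N voice signals, the electronic device can directly use the N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels. That is, the N beamforming noise reduction algorithms correspond one-to-one to the N voice signals.
In this case, the electronic device obtains N voice signals through its own microphone arrays without any additional copying. This lowers the hardware processing requirements on the electronic device and saves the time needed to copy the voice signal, speeding up wake-up.
In a seventh implementation of the first aspect, the electronic device may specifically use an adaptive beamforming noise reduction algorithm to perform noise reduction on the voice signal. Because each beamforming noise reduction algorithm corresponds to one of the N regions, each algorithm effectively works with a known sound source position. An adaptive beamforming noise reduction algorithm can therefore enhance the voice signal in the specified direction, achieving better noise reduction and improving wake-up performance.
In an eighth implementation of the first aspect, the electronic device may set a wake-up strategy for voice wake-up to reduce the false wake-up rate. Specifically, the electronic device may use the wake-up engine to determine, with the wake-up algorithm, the similarity between each of the N noise-reduced signals and the wake-up word, and then perform voice wake-up based on the similarity.
Further, in a ninth implementation of the first aspect, the electronic device may set a wake-up strategy according to actual needs. One strategy is to perform voice wake-up if the average similarity between the N noise-reduced signals and the wake-up word is greater than a preset threshold. Another is to perform voice wake-up if a preset number of the N noise-reduced signals have a similarity to the wake-up word greater than a preset threshold. In practice, the two strategies can also be combined: voice wake-up is performed if the average similarity is greater than the preset threshold and a preset number of the N noise-reduced signals have a similarity to the wake-up word greater than the preset threshold.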
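The two wake-up strategies, and their combination, can be expressed as a small decision function. This is a sketch under stated assumptions: the function name, the strategy labels, reading "a preset number" as "at least `min_count`", and using a single threshold for both checks are illustrative choices, not taken from the application.

```python
from typing import Sequence


def should_wake(similarities: Sequence[float], threshold: float,
                min_count: int, strategy: str = "either") -> bool:
    """Decide on voice wake-up from per-signal similarities to the wake-up word.

    strategy: "average" -> mean similarity > threshold
              "count"   -> at least min_count signals with similarity > threshold
              "both"    -> both conditions hold ("and")
              "either"  -> at least one condition holds ("or")
    """
    if not similarities:
        return False
    average_ok = sum(similarities) / len(similarities) > threshold
    count_ok = sum(1 for s in similarities if s > threshold) >= min_count
    rules = {
        "average": average_ok,
        "count": count_ok,
        "both": average_ok and count_ok,
        "either": average_ok or count_ok,
    }
    return rules[strategy]  # KeyError for an unknown strategy name


# Hypothetical scores for N = 4 noise-reduced signals.
sims = [0.90, 0.90, 0.90, 0.40]
print(should_wake(sims, threshold=0.85, min_count=3, strategy="count"))    # True
print(should_wake(sims, threshold=0.85, min_count=3, strategy="average"))  # False
```

With these hypothetical scores, three signals clear the 0.85 threshold but the mean (0.775) does not, which shows why the "and" combination is the stricter of the two and the more effective against false wake-ups.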
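A second aspect of the embodiments of this application provides a voice wake-up apparatus, including:
a collection module, configured to collect a voice signal through a microphone array;
a noise reduction module, configured to perform noise reduction on the voice signal using N beamforming noise reduction algorithms respectively to obtain N noise-reduced signals, where each beamforming noise reduction algorithm corresponds to one of N regions, different beamforming noise reduction algorithms correspond to different regions, the union of the N regions covers the signal collection region of the microphone array, and N is a positive integer greater than 1; and
a wake-up module, configured to use a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.
A third aspect of the embodiments of this application provides an electronic device, including a microphone array, a processor, and a memory, where:
the microphone array is configured to collect a voice signal;
the memory is configured to store program code; and
the processor is configured to execute the following steps according to instructions in the program code:
performing noise reduction on the voice signal using N beamforming noise reduction algorithms respectively to obtain N noise-reduced signals, where each beamforming noise reduction algorithm corresponds to one of N regions, different beamforming noise reduction algorithms correspond to different regions, the union of the N regions covers the signal collection region of the microphone array, and N is a positive integer greater than 1; and
using a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.
In a first implementation of the third aspect, the processor is further configured to execute the following steps:
determining the value of N according to the hardware processing capability; and
dividing the signal collection region according to N to obtain the N regions.
In a second implementation of the third aspect, the processor is further configured to execute the following steps:
determining the value of N according to the current service scenario type; and
dividing the signal collection region according to N to obtain the N regions.
In a third implementation of the third aspect, the processor is further configured to execute the following step:
looking up a parameter configuration file according to the current service scenario type, and determining the value of N that matches the current service scenario type.
In a fourth implementation of the third aspect, the processor is further configured to:
divide the signal collection region evenly by angle to obtain the N regions.
In a fifth implementation of the third aspect, there is a single microphone array;
when performing noise reduction to obtain N noise-reduced signals, the processor is specifically configured to:
copy the voice signal to obtain N channels of the voice signal; and
use N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain N noise-reduced signals, where each beamforming noise reduction algorithm processes one of the N channels, and different beamforming noise reduction algorithms process different channels.
In a sixth implementation of the third aspect, there are multiple microphone arrays; each microphone array collects one voice signal, and the multiple microphone arrays collect multiple voice signals;
the processor is specifically configured to:
when the multiple voice signals are N voice signals, use N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain N noise-reduced signals, where each beamforming noise reduction algorithm processes one of the N channels, and different beamforming noise reduction algorithms process different channels.
In a seventh implementation of the third aspect, the beamforming noise reduction algorithm includes an adaptive beamforming noise reduction algorithm.
In an eighth implementation of the third aspect, when using the wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals, the processor is specifically configured to:
use the wake-up engine to determine the similarity between each of the N noise-reduced signals and the wake-up word; and
perform voice wake-up based on the similarity.
In a ninth implementation of the third aspect, performing voice wake-up by the processor based on the similarity score specifically includes:
performing voice wake-up if the average similarity between the N noise-reduced signals and the wake-up word is greater than a preset threshold, and/or if a preset number of the N noise-reduced signals have a similarity to the wake-up word greater than a preset threshold.
A fourth aspect of the embodiments of this application provides a computer-readable storage medium configured to store program code, where the program code is used to execute the voice wake-up method according to the first aspect.
A fifth aspect of the embodiments of this application provides a computer program product containing computer-readable instructions that, when run on a computer, cause the computer to execute the voice wake-up method according to the first aspect.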
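Brief Description of Drawings
FIG. 1 is a scenario architecture diagram of a voice wake-up method according to an embodiment of this application;
FIG. 2 is a flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 3A is an example diagram of region distribution according to an embodiment of this application;
FIG. 3B is an example diagram of region distribution according to an embodiment of this application;
FIG. 4 is a flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 5 is a flowchart of a voice wake-up method according to an embodiment of this application;
FIG. 6 is a schematic diagram of a scenario of a voice wake-up method according to an embodiment of this application;
FIG. 7 is a schematic structural diagram of a voice wake-up apparatus according to an embodiment of this application;
FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of this application.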
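Detailed Description of Embodiments
To help a person skilled in the art better understand the solutions of this application, the technical solutions in the embodiments of this application are described below with reference to the accompanying drawings.
The terms "first", "second", "third", "fourth", and the like (if any) in the specification, claims, and accompanying drawings of this application are used to distinguish between similar objects, and are not necessarily used to describe a specific order or sequence. It should be understood that data used in this way are interchangeable where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described herein. In addition, the terms "include" and "have" and any variants thereof are intended to cover a non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
The fixed beamforming method currently used in the industry cannot strongly suppress voice signals outside the beam, which limits noise reduction performance before wake-up; moreover, at low signal-to-noise ratios, the direction information screened out by the wake-up engine may be wrong, causing the noise reduction algorithm to fail and the system to be unable to wake up. The adaptive beamforming method requires additional hardware for auxiliary localization with high accuracy requirements, and the solution is complex to implement and hard to extend to many scenarios, resulting in poor reliability. To address these technical problems, this application proposes a voice wake-up method that improves wake-up performance by performing noise reduction on the voice signal with multiple beamforming noise reduction algorithms respectively.
Specifically, the signal collection region of the microphone array can be divided into N parts to form N regions whose union covers the signal collection region, so the sound source must lie in at least one of the N regions. N beamforming noise reduction algorithms in one-to-one correspondence with the regions are then provided. When a voice signal collected by the microphone array is received, the N beamforming noise reduction algorithms each perform noise reduction on it to obtain N noise-reduced signals. Because at least one of the N algorithms localizes the sound source correctly, at least one of the N noise-reduced signals has a good signal-to-noise ratio, so the wake-up engine can perform voice wake-up based on at least one of the N noise-reduced signals, achieving a high wake-up success rate and improving voice wake-up.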
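It can be understood that the voice wake-up method provided in this application can be applied to an electronic device. The electronic device can be any device with a voice interaction function, including but not limited to smart speakers, smart home appliances, smartphones, tablet computers, vehicle-mounted devices, wearable devices, and augmented reality (AR)/virtual reality (VR) devices with a voice interaction function. The voice wake-up method may be stored in the electronic device as an application or software, and the electronic device implements the method by executing that application or software.
To make the technical solutions of this application clearer and easier to understand, the system framework of the voice wake-up method is first introduced with a specific scenario. Referring to the scenario architecture diagram shown in FIG. 1, the scenario includes a smart speaker 10. A user speaks; the smart speaker 10 collects the voice signal through a microphone array, performs noise reduction on it using N beamforming noise reduction algorithms respectively to obtain N noise-reduced signals, and then uses a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals, so that the smart speaker 10 is activated from the sleep state to the running state.
It should be noted that FIG. 1 uses a smart speaker as an example. In other possible implementations of the embodiments of this application, smart home appliances such as a smart refrigerator or smart TV, or electronic devices such as smartphones, may also implement the voice wake-up method provided in this application.
Next, the voice wake-up method provided in this application is described in detail from the perspective of the electronic device. Referring to the flowchart of the voice wake-up method shown in FIG. 2, the method includes the following steps.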
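S201: Collect a voice signal through a microphone array.
A microphone array is an arrangement of multiple microphones that can be used to sample and process the spatial characteristics of a sound field. In the embodiments of this application, the microphone array may be linear, circular, or spherical, and the number of array elements (that is, microphones) can be set according to actual needs. As specific examples, the microphone array may be a 2-mic, 4-mic, 6-mic, or 8-mic array.
The microphone array has a signal collection region, which is the region in which the microphone array can collect voice signals. For example, for an electronic device such as a TV, the signal collection region of its microphone array may be the 180° spatial region in front of the TV; for a device such as a speaker, the signal collection region includes the 360° spatial region around the microphone array.
The microphone array can collect voice signals emitted by sound sources located in the signal collection region. The sound source may be a user or, in some possible implementations, another electronic device. For example, in a smart home system, a smart TV may issue a voice command to turn on a speaker; the microphone array of the smart speaker can collect the voice signal emitted by the smart TV and perform speech recognition on it to determine whether to execute the voice command. In practice, there may be one or more sound sources, and the microphone array can simultaneously collect voice signals from one or more of them.
Considering hardware cost, the electronic device may use only one microphone array to collect the voice signal and perform noise reduction on it to achieve voice wake-up. Alternatively, the electronic device may use multiple microphone arrays, collect multiple voice signals through them, and perform noise reduction on these signals to achieve voice wake-up.
S202: Perform noise reduction on the voice signal using N beamforming noise reduction algorithms respectively to obtain N noise-reduced signals.
Each beamforming noise reduction algorithm corresponds to one of N regions, and different beamforming noise reduction algorithms correspond to different regions; the union of the N regions covers the signal collection region of the microphone array, and N is a positive integer greater than 1.
The union of the N regions can cover the signal collection region in two ways: either the regions do not intersect, that is, they do not overlap, or some regions intersect, that is, they overlap. It should be noted that, for the same number of regions, non-overlapping regions divide the signal collection region at a finer granularity, yielding narrower beams and relatively more accurate sound source localization.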
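For ease of understanding, the following examples show how the union of the N regions covers the signal acquisition area of the microphone array. FIG. 3A shows an example of non-overlapping regions. The signal acquisition area is a 180° spatial region, and the N regions correspond to three angular ranges: 0° to 60°, 60° to 120°, and 120° to 180°. They are labeled 1, 2, and 3 in FIG. 3A. FIG. 3B shows an example of overlapping regions over the same 180° spatial region. Here the N regions correspond to the angular ranges 0° to 70°, 60° to 130°, and 110° to 180°, labeled 1, 2, and 3 in FIG. 3B.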
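As an illustration of the two cases in FIG. 3A and FIG. 3B, the following minimal sketch builds N uniform angular regions and checks whether a set of regions covers the acquisition area. It rests on several assumptions. Regions are modeled as closed degree intervals `(start, end)` on a non-wrapping range, so a 360° area is treated as `[0, 360]`. The helper names `partition_uniform` and `covers` are hypothetical and do not come from the application.

```python
from typing import List, Tuple

Region = Tuple[float, float]  # (start_deg, end_deg), a closed angular interval


def partition_uniform(span_deg: float, n: int) -> List[Region]:
    """Split [0, span_deg] into n equal, non-overlapping angular regions."""
    if n < 2:
        raise ValueError("N must be a positive integer greater than 1")
    width = span_deg / n
    return [(i * width, (i + 1) * width) for i in range(n)]


def covers(regions: List[Region], start: float, end: float) -> bool:
    """Return True if the union of the regions covers [start, end] with no gaps."""
    reached = start
    for lo, hi in sorted(regions):
        if lo > reached:  # gap between the part covered so far and this region
            return False
        reached = max(reached, hi)
    return reached >= end


# FIG. 3A: non-overlapping regions over a 180-degree acquisition area
print(partition_uniform(180, 3))  # [(0.0, 60.0), (60.0, 120.0), (120.0, 180.0)]
# FIG. 3B: the overlapping regions also cover the whole area
print(covers([(0, 70), (60, 130), (110, 180)], 0, 180))  # True
```

The coverage check sorts the intervals and sweeps once. A gap anywhere in the union means some source direction would have no matching beamforming algorithm.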
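A beamforming noise reduction algorithm performs noise reduction based on the beamforming principle. An appropriate weight vector is chosen to compensate for the propagation delay at each element of the microphone array. The array outputs then add coherently (in phase) in a desired direction, so the array forms a main-lobe beam in that direction and can suppress interference to a certain extent. Suppose a beamforming noise reduction algorithm corresponds to region i, where i is a positive integer with 1 ≤ i ≤ N. The array outputs then add coherently in the direction corresponding to region i, a main-lobe beam forms in that direction, and interfering noise is suppressed.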
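The beamforming principle described above can be sketched with a narrowband delay-and-sum beamformer for a uniform linear array. The weights are chosen to cancel each element's propagation delay, so that signals from the look direction add in phase. This is an illustrative sketch, not the algorithm used in the application. The following values are all assumptions:

- a 4-element array with 3.5 cm spacing;
- a 2 kHz analysis frequency;
- a speed of sound of 343 m/s;
- angles measured from the array axis, from 0° to 180°.

```python
import cmath
import math
from typing import List

SPEED_OF_SOUND = 343.0  # m/s (assumed)


def steering_vector(theta_deg: float, n_mics: int, spacing: float, freq: float) -> List[complex]:
    """Phase of a plane wave arriving from theta at each element of a uniform linear array."""
    cos_t = math.cos(math.radians(theta_deg))
    return [cmath.exp(-2j * math.pi * freq * m * spacing * cos_t / SPEED_OF_SOUND)
            for m in range(n_mics)]


def delay_and_sum_weights(look_deg: float, n_mics: int, spacing: float, freq: float) -> List[complex]:
    """Weights that compensate the propagation delays for the look direction (unit gain there)."""
    return [a / n_mics for a in steering_vector(look_deg, n_mics, spacing, freq)]


def beam_response(weights: List[complex], theta_deg: float, spacing: float, freq: float) -> float:
    """Magnitude of the array output w^H a(theta) for a unit plane wave from theta."""
    a = steering_vector(theta_deg, len(weights), spacing, freq)
    return abs(sum(w.conjugate() * x for w, x in zip(weights, a)))


# Steer a hypothetical 4-mic array toward 90 degrees (broadside).
w = delay_and_sum_weights(look_deg=90.0, n_mics=4, spacing=0.035, freq=2000.0)
print(round(beam_response(w, 90.0, 0.035, 2000.0), 3))  # 1.0: in-phase sum on the main lobe
print(round(beam_response(w, 30.0, 0.035, 2000.0), 3))  # attenuated off the main lobe
```

Steering one such beam toward the center of each region i would give one fixed beam per region. An adaptive algorithm, as discussed next, would instead adjust the weights over time.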
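The N beamforming noise reduction algorithms correspond to N different regions of the signal acquisition area. Denoising the speech signal with all N algorithms is therefore equivalent to locking a beam onto each of the N regions. In other words, the electronic device separately assumes that the sound source is in region 1, region 2, ..., region N, and denoises the speech signal under each assumption, obtaining N noise-reduced signals. Because the union of the N regions covers the signal acquisition area, the sound source must lie in at least one of the N regions, so at least one assumption is correct. At least one noise reduction module therefore denoises toward the actual direction of the sound source. That module does not suppress the source's sound as noise, and its noise-reduction performance is not limited by an unknown source position.
Because each beamforming noise reduction algorithm locks onto the beam of one region, the electronic device can use adaptive beamforming noise reduction algorithms. Each of them denoises the speech signal based on the beam of its own region. Enhancing the speech signal in the corresponding direction yields a better noise-reduction effect and improves wake-up performance.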
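An adaptive beamforming noise reduction algorithm uses an adaptive algorithm to find the optimal weight set under a chosen optimality criterion. It can adapt to changes in the environment, keeping the weight set near its optimum in real time. The optimality criterion may be, for example, the minimum mean square error (MSE) criterion, the maximum signal-to-noise ratio (SNR) criterion, the maximum likelihood ratio (LH) criterion, or the minimum noise variance (NV) criterion.
Adaptive algorithms fall mainly into closed-loop and open-loop algorithms. Closed-loop algorithms include, but are not limited to:
- the least mean square (LMS) algorithm;
- the difference steepest descent (DSD) algorithm;
- the acceleration gradient (AG) algorithm;
- variants of the above.

Open-loop algorithms include direct inversion methods such as sample matrix inversion (SMI) and direct matrix inversion (DMI).
Closed-loop algorithms are simple to implement, reliable in performance, and require no data storage. Open-loop algorithms offer faster convergence and better cancellation performance. An adaptive algorithm can be selected according to actual needs to optimize the weight set and thereby achieve speech noise reduction.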
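To make the closed-loop case concrete, the following sketch implements the basic LMS weight update under the MSE criterion. At each sample, the error between the array output and a reference signal drives a stochastic-gradient step on the weights. It is a simplified real-valued illustration, not the application's noise reduction algorithm. The following are all assumptions made only for the demonstration:

- the availability of a reference signal;
- the step size `mu`;
- the synthetic "true" weights.

```python
import random
from typing import List, Sequence


def lms_adapt(inputs: Sequence[Sequence[float]], desired: Sequence[float],
              mu: float) -> List[float]:
    """Least-mean-square adaptation: w <- w + mu * e * x, with e = d - w.x."""
    n_taps = len(inputs[0])
    w = [0.0] * n_taps
    for x, d in zip(inputs, desired):
        y = sum(wi * xi for wi, xi in zip(w, x))        # array output with current weights
        e = d - y                                       # error against the reference signal
        w = [wi + mu * e * xi for wi, xi in zip(w, x)]  # gradient step on the squared error
    return w


# Synthetic check: the reference is a fixed weighting of 4 channels plus small noise.
rng = random.Random(0)
true_w = [0.5, -0.25, 0.125, 0.75]
xs = [[rng.gauss(0.0, 1.0) for _ in range(4)] for _ in range(5000)]
ds = [sum(t * xi for t, xi in zip(true_w, x)) + rng.gauss(0.0, 0.01) for x in xs]
w_est = lms_adapt(xs, ds, mu=0.01)
print([round(v, 2) for v in w_est])  # close to true_w
```

The loop keeps only the current weight vector, which illustrates the "no data storage" property of closed-loop algorithms. An open-loop method such as SMI would instead accumulate a sample covariance matrix and invert it.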
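It should be noted that when a single microphone array is used, the electronic device first replicates the speech signal to obtain N channels of the speech signal. It then uses the N beamforming noise reduction algorithms to denoise the N channels, obtaining N noise-reduced signals. Each algorithm processes one of the N channels, and different algorithms process different channels.
When there are N microphone arrays, the speech signals they collect form N channels. The electronic device uses the N beamforming noise reduction algorithms to denoise the N channels, obtaining N noise-reduced signals. Each algorithm processes one of the N channels, and different algorithms process different channels.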
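The routing in the two cases above can be sketched as follows. In the first case, one array's signal is replicated N times; in the second, N channels are matched one-to-one with the N algorithms. The sketch is hypothetical. Each beamforming noise reduction algorithm is modeled as a plain callable from a signal to a signal, and the function name `denoise_all` is illustrative only.

```python
from typing import Callable, List, Sequence

Signal = List[float]
Denoiser = Callable[[Signal], Signal]


def denoise_all(channels: Sequence[Signal], algorithms: Sequence[Denoiser]) -> List[Signal]:
    """Apply N beamforming denoisers, one per channel.

    A single channel (one microphone array) is first replicated N times;
    otherwise the number of channels must equal N and they are paired one-to-one.
    """
    n = len(algorithms)
    if len(channels) == 1:
        channels = [list(channels[0]) for _ in range(n)]  # copy the signal into N channels
    elif len(channels) != n:
        raise ValueError(f"expected 1 or {n} channels, got {len(channels)}")
    return [algo(ch) for algo, ch in zip(algorithms, channels)]


# Usage: three hypothetical "algorithms" that simply scale the signal.
gains = [lambda s, g=g: [g * v for v in s] for g in (1.0, 2.0, 3.0)]
print(denoise_all([[1.0, 2.0]], gains))  # [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0]]
```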
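S203: Use a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.
In this embodiment, the wake-up engine is a voice wake-up engine, that is, the core component that wakes a device or an application by voice. It usually takes the form of software, such as the main program of voice interaction software. The wake-up engine uses a wake-up algorithm to determine how similar its input signal is to the wake word, and performs voice wake-up according to that similarity.
At least one of the N noise-reduced signals has a relatively good signal-to-noise ratio. The electronic device can therefore use the wake-up engine to run wake word detection on at least one of the N noise-reduced signals, such as the signal with the good signal-to-noise ratio, which increases the probability of detecting the wake word. When the wake word is detected, a wake-up event can be generated to wake the electronic device or an application on it. The method thus improves the wake-up success rate.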
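It should be noted that the electronic device may use N wake-up engines to process the N noise-reduced signals, with each engine processing a different signal. The wake word can then be detected in parallel, which shortens detection time and speeds up wake-up.
When resources are limited, the electronic device may instead use a single wake-up engine to process the N noise-reduced signals. That engine processes them serially, checking each signal for the wake word in turn. When the wake word is detected, the engine generates the corresponding wake-up event to achieve voice wake-up.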
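The two engine configurations above can be sketched with the Python standard library: N engines check their signals in parallel, or one engine checks the N signals in turn. This is an assumption-laden illustration. `detect_wake_word` is a hypothetical stand-in for a real wake-up engine, and each noise-reduced signal is represented here only by a precomputed similarity score that is compared against an assumed threshold.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Sequence

Detector = Callable[[float], bool]


def detect_wake_word(similarity: float, threshold: float = 0.8) -> bool:
    """Hypothetical engine: reports the wake word when the similarity exceeds the threshold."""
    return similarity > threshold


def serial_wake(signals: Sequence[float], engine: Detector) -> bool:
    """One engine checks the N noise-reduced signals in turn and stops at the first hit."""
    return any(engine(s) for s in signals)


def parallel_wake(signals: Sequence[float], engine: Detector) -> bool:
    """N engines (modeled as N worker threads) check one signal each; wake if any detects."""
    if not signals:
        return False
    with ThreadPoolExecutor(max_workers=len(signals)) as pool:
        return any(pool.map(engine, signals))


scores = [0.42, 0.91, 0.35]  # hypothetical per-signal similarity scores
print(serial_wake(scores, detect_wake_word), parallel_wake(scores, detect_wake_word))  # True True
```

Both paths reach the same decision. They differ only in latency and resource use, which matches the trade-off described in the text.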
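For ease of understanding, specific examples are given below.
FIG. 4 shows a flowchart of a method that achieves voice wake-up with one microphone array and one wake-up engine.
- Microphone array 1 collects a speech signal, and the electronic device replicates it to obtain N channels of the speech signal.
- The N channels are fed one-to-one into N noise reduction algorithm modules. Each module stores one beamforming noise reduction algorithm, namely algorithm 1, algorithm 2, ..., algorithm n.
- Algorithm 1 locks onto the beam from 0 to m1 degrees, algorithm 2 locks onto the beam from m1 to m2 degrees, and so on, up to algorithm n, which locks onto the beam from m(n-1) to m(n) degrees.
- After the N modules denoise the speech signal, the resulting N noise-reduced signals are sent to wake-up engine 1, which performs voice wake-up based on them.

FIG. 5 shows a flowchart of a method that achieves voice wake-up with N microphone arrays and N wake-up engines.
- Each microphone array collects one channel of speech signal, so the N arrays collect N channels, namely speech signal 1, speech signal 2, ..., speech signal n.
- The channels are fed one-to-one into N noise reduction algorithm modules, configured with the same parameters as in FIG. 4.
- After the modules denoise the signals with their beamforming noise reduction algorithms, the N noise-reduced signals are sent one-to-one to wake-up engine 1, wake-up engine 2, ..., wake-up engine n.
- Each engine checks its own noise-reduced signal for the wake word. If any engine detects the wake word, voice wake-up is performed.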
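In summary, the embodiments of this application provide a voice wake-up method. A microphone array collects a speech signal, and N beamforming noise reduction algorithms denoise it separately to obtain N noise-reduced signals. Each algorithm corresponds to one of N regions, and different algorithms correspond to different regions. The speech signal is therefore denoised using the beam of each algorithm's region, so noise-reduction performance is not limited and the noise reduction algorithms can deliver their full performance. In addition, the electronic device can also cancel echoes from outside the beam, improving echo cancellation performance.
The union of the N regions covers the signal acquisition area of the microphone array, so the sound source must lie in at least one of the N regions. The source may lie in a single region, or it may span several regions, such as two adjacent ones. When there are multiple sources, they may lie in multiple adjacent or non-adjacent regions. In every case, at least one of the N beamforming noise reduction algorithms is aimed correctly at the sound source. At least one of the N noise-reduced signals therefore has a good signal-to-noise ratio. The electronic device can use the wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals, which gives a higher wake-up success rate and a better wake-up effect.
In addition, no sound source localization is needed before noise reduction. There is thus no risk of localization errors and no need for auxiliary localization equipment. This saves cost and keeps the implementation simple and easy to deploy.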
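It can be understood that the size of a region directly determines the beam width, which in turn affects the noise-reduction effect of the beamforming noise reduction algorithm. To obtain good noise reduction, N can be chosen so that the size of the region assigned to each algorithm meets the noise-reduction performance requirement.
In some possible implementations, the electronic device determines N according to its own hardware processing capability. It then divides the signal acquisition area of the microphone array according to N to obtain the N regions. In a specific implementation, N may be made proportional to hardware processing capability within a certain range, rising as processing capability increases and falling as it decreases. However, when the beam angle is smaller than a preset angle, such as 15°, the electronic device may also identify noise as speech from the sound source, which degrades noise-reduction performance. N may therefore be capped at a preset value.
For example, a smart speaker has relatively limited hardware processing capability, such as computing power, so N can be set to a small value, for example 3. The processing capability of smartphones keeps increasing with each chip generation, so N can be set to a larger value, for example 10. Different smartphones may also differ in processing capability depending on their hardware configuration. For a low-end phone N can be set to a smaller value such as 5, and for a high-end phone to a larger value such as 10.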
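One way to express the rule above is sketched below. N grows with processing capability but is capped so that each uniform region is no narrower than a preset angle, such as 15°. The N values per tier mirror the examples in the text: smart speaker 3, low-end phone 5, high-end phone 10. The following are assumptions for illustration:

- the tier names;
- the function name `choose_n`;
- modeling capability as a discrete tier rather than a continuous measure.

```python
import math

# Hypothetical capability tiers; the N values follow the examples in the text.
N_BY_TIER = {"speaker": 3, "phone_low": 5, "phone_high": 10}


def choose_n(tier: str, area_deg: float = 180.0, min_beam_deg: float = 15.0) -> int:
    """Pick N for a hardware tier, capped so each uniform region is at least min_beam_deg wide."""
    n_max = math.floor(area_deg / min_beam_deg)  # e.g. 180 / 15 = 12 regions at most
    return max(2, min(N_BY_TIER[tier], n_max))   # N must be greater than 1


print(choose_n("speaker"), choose_n("phone_high"), choose_n("phone_high", area_deg=120.0))  # 3 10 8
```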
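It should be noted that a larger N means more regions and more noise-reduction computation, and thus higher power consumption. Accordingly, in some possible implementations, the electronic device determines N based on the current service scenario type. It then divides the signal acquisition area according to N to determine the N regions.
Specifically, the electronic device first obtains the current service scenario type, which characterizes the device's ability to provide services in the current scenario. As an example, the type may be either a low-power service scenario type or a standard-power service scenario type.
- A low-power type indicates that the remaining battery is low and the device provides only low-power or basic services.
- A standard-power type indicates that the remaining battery is sufficient and the services are unrestricted. The device can provide both low-power and high-power services, and value-added services in addition to basic services.

The electronic device then looks up a parameter configuration file according to the current service scenario type to determine the matching value of N. Finally, it divides the signal acquisition area according to N to obtain the N regions.
The parameter configuration file stores the correspondence between service scenario types and N. This correspondence can be written into the file in advance based on empirical values. After obtaining the current service scenario type, the electronic device determines N from the stored correspondence.
Take the smart speaker as an example again. When its remaining battery is 90%, the battery is sufficient and the current service scenario type is the standard-power service scenario. The electronic device looks up the parameter configuration file to obtain N, which in this example may be 5. It divides the signal acquisition area into 5 parts, forming 5 regions, and denoises the speech signal with the beamforming noise reduction algorithms for those 5 regions. The wake-up engine then determines from the resulting noise-reduced signals whether to generate a wake-up event.
As the smart speaker continues to be used, its remaining battery decreases. When it reaches 10%, the current service scenario type becomes the low-power service scenario and the device provides only low-power services. The device looks up the parameter configuration file again to obtain N, which in this example may be 3. It re-divides the signal acquisition area into 3 parts, forming 3 regions, and denoises the speech signal with the beamforming noise reduction algorithms for those 3 regions. The wake-up engine then determines from the resulting noise-reduced signals whether to generate a wake-up event, achieving voice wake-up.
In this way, N switches automatically when the service scenario of the electronic device changes, so the device keeps good wake-up performance in any service scenario.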
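A parameter configuration file that maps service scenario types to N, together with a lookup driven by remaining battery, might look like the sketch below. The following are hypothetical:

- the JSON layout and key names;
- the 20% battery threshold that separates the two scenario types;
- the 360° default acquisition area.

The text gives only the example values N = 5 at 90% battery (standard power) and N = 3 at 10% battery (low power).

```python
import json
from typing import List, Tuple

# Hypothetical parameter configuration file contents: scenario type -> N.
CONFIG_JSON = '{"standard_power": 5, "low_power": 3}'


def scenario_type(battery_pct: float, low_threshold: float = 20.0) -> str:
    """Classify the current service scenario; the threshold value is an assumption."""
    return "low_power" if battery_pct < low_threshold else "standard_power"


def n_for_battery(battery_pct: float, config_text: str = CONFIG_JSON) -> int:
    """Look up N for the current scenario type in the configuration file."""
    table = json.loads(config_text)
    return int(table[scenario_type(battery_pct)])


def regions_for(n: int, area_deg: float = 360.0) -> List[Tuple[float, float]]:
    """Re-divide the acquisition area uniformly into n angular regions."""
    width = area_deg / n
    return [(i * width, (i + 1) * width) for i in range(n)]


print(n_for_battery(90), n_for_battery(10))  # 5 3
print(regions_for(n_for_battery(10)))       # [(0.0, 120.0), (120.0, 240.0), (240.0, 360.0)]
```

Keeping the mapping in a data file rather than in code lets the empirical values be tuned without changing the wake-up logic.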
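In the above embodiments, the N regions may be obtained by dividing the signal acquisition area uniformly by angle. Each region then has the same beam angle, which keeps the N beamforming noise reduction algorithms well balanced when they denoise the speech signal. In other possible implementations of the embodiments, the N regions may also be non-uniform; this embodiment does not limit this.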
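False wake-ups may occur in practice. To reduce the false wake-up rate, the electronic device may also set wake-up policies and wake the device or application only when a policy's conditions are met. Specifically, after denoising the speech signal to obtain N noise-reduced signals, the electronic device uses the wake-up engine to determine how similar each noise-reduced signal is to the wake word. It then performs voice wake-up according to these similarities.
The embodiments provide several ways to perform voice wake-up according to the similarities:
- Wake if the average similarity between the N noise-reduced signals and the wake word is greater than a preset threshold.
- Wake if a preset number of the N noise-reduced signals each have a similarity to the wake word greater than the preset threshold.
- In practice, wake only when both conditions hold: the average similarity is greater than the preset threshold, and a preset number of the signals each have a similarity greater than the preset threshold.

The similarity between a noise-reduced signal and the wake word indicates how likely the speech signal is to contain the wake word. Suppose several of the N noise-reduced signals have high similarity to the wake word, or the average similarity of the N signals is high. The speech signal then probably contains the wake word, and voice wake-up can be performed; otherwise it is not performed. Deciding wake-up from the similarities therefore reduces the number of false wake-ups and lowers the false wake-up rate.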
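The wake-up policies above can be sketched as a small decision function. The policies are: average similarity over a threshold, a preset number of signals over the threshold, or both. The parameter names and the `mode` switch are illustrative assumptions. The text does not specify threshold values, and "a preset number" is interpreted here as "at least that many". In the usage line, the similarity list is hypothetical, loosely modeled on the smartphone example in which 3 of 4 signals reach 90%. The fourth value, 0.3, is invented for illustration.

```python
from statistics import mean
from typing import Sequence


def should_wake(similarities: Sequence[float], threshold: float,
                min_count: int, mode: str = "both") -> bool:
    """Decide voice wake-up from per-signal similarities to the wake word.

    mode: "average" - mean similarity > threshold
          "count"   - at least min_count signals have similarity > threshold
          "both"    - both conditions must hold
    """
    avg_ok = mean(similarities) > threshold
    count_ok = sum(1 for s in similarities if s > threshold) >= min_count
    if mode == "average":
        return avg_ok
    if mode == "count":
        return count_ok
    if mode == "both":
        return avg_ok and count_ok
    raise ValueError(f"unknown mode: {mode}")


scores = [0.9, 0.9, 0.9, 0.3]  # hypothetical similarities for 4 noise-reduced signals
print(should_wake(scores, 0.8, 3, "count"), should_wake(scores, 0.8, 3, "average"))  # True False
```

With these numbers, the count rule wakes the device but the average rule does not. The stricter "both" policy trades some wake-up sensitivity for fewer false wake-ups.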
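To help understand the technical solution, this application also provides a scenario embodiment of the voice wake-up method.
Refer to the scenario diagram of the voice wake-up method shown in FIG. 6, in which the method is used to wake a smartphone. The smartphone 600 includes a microphone array 610, a linear array of 4 microphones. The sequence is as follows:
- While the screen is off, the user says "Xiao E, hello" (小E,您好). The microphone array 610 collects the speech signal and transmits it to the processor 640 of the smartphone 600 for noise reduction.
- The processor 640 replicates the speech signal to obtain 4 channels. It denoises them with 4 beamforming noise reduction algorithms corresponding to different regions, one channel per algorithm, which yields 4 noise-reduced signals.
- The 4 noise-reduced signals are input to the wake-up engine 650 of the smartphone 600.
- Using its wake-up algorithm, the wake-up engine 650 finds that 3 of the noise-reduced signals reach a 90% similarity to the wake word "Xiao E, hello".
- The engine therefore generates a wake-up event and reports it to the application layer 660, which wakes the smartphone 600.

The smartphone 600 may flash the Home button indicator 620 to show that it has been woken. After waking, its microphone array 610 keeps collecting speech signals. For example, when the user says "Please call xxx", the microphone array 610 collects the corresponding speech signal and transmits it to the processor 640. The processor 640 recognizes the speech and invokes the corresponding application according to the recognition result, for example the call application to place the call, as shown in interface 630. Voice interaction is thus achieved.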
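The voice wake-up method of this application has been described above; an apparatus that performs the method is described below. The voice wake-up apparatus may be any electronic device with a voice interaction function, including smart home appliances, smart terminals, in-vehicle terminals, wearable devices, and AR/VR devices. It implements the voice wake-up method provided in any of the embodiments corresponding to FIG. 1 to FIG. 6. This function may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the function, and each module may be software, hardware, or both. In some implementations, as shown in FIG. 7, the voice wake-up apparatus 700 includes:
a collection module 710, configured to collect a speech signal through a microphone array;
a noise reduction module 720, configured to use N beamforming noise reduction algorithms to denoise the speech signal separately to obtain N noise-reduced signals, where each beamforming noise reduction algorithm corresponds to one of N regions, different algorithms correspond to different regions, the union of the N regions covers the signal acquisition area of the microphone array, and N is a positive integer greater than 1; and
a wake-up module 730, configured to use a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.
In a specific implementation, the modules map to the method steps as follows. For each, refer to the description of that step in the method embodiment shown in FIG. 2. Details are not repeated here.
- The collection module 710 may be configured to perform the method in S201.
- The noise reduction module 720 may be configured to perform the method in S202.
- The wake-up module 730 may be configured to perform the method in S203.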
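Optionally, the apparatus 700 further includes:
a first determining module, configured to determine the value of N according to hardware processing capability, and divide the signal acquisition area according to N to obtain the N regions.
In a specific implementation, for the first determining module, refer to the description of determining the N regions in the embodiment shown in FIG. 2.
Optionally, the apparatus 700 further includes:
a second determining module, configured to determine the value of N according to the current service scenario type, and divide the signal acquisition area according to N to obtain the N regions.
Further, when determining the value of N, the second determining module is specifically configured to:
look up a parameter configuration file according to the current service scenario type to determine the value of N that matches the current service scenario type.
In a specific implementation, for the second determining module, refer to the description of determining the N regions in the embodiment shown in FIG. 2.
Optionally, the N regions are obtained by dividing the signal acquisition area uniformly by angle.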
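Optionally, the collection module 710 is specifically configured to:
collect a speech signal through a single microphone array;
and the noise reduction module 720 is specifically configured to:
replicate the speech signal to obtain N channels of the speech signal; and
use the N beamforming noise reduction algorithms to denoise the N channels to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels.
In a specific implementation, for the noise reduction module 720, refer to the description of S202 in the embodiment shown in FIG. 2 and the related description of the embodiment shown in FIG. 4.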
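Optionally, the collection module 710 is specifically configured to:
collect multiple channels of speech signals through multiple microphone arrays, where each microphone array collects one channel;
and the noise reduction module 720 is specifically configured to:
when the multiple channels are N channels, use the N beamforming noise reduction algorithms to denoise the N channels to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels.
In a specific implementation, for the noise reduction module 720, refer to the description of S202 in the embodiment shown in FIG. 2 and the related description of the embodiment shown in FIG. 5.
Optionally, the beamforming noise reduction algorithms include an adaptive beamforming noise reduction algorithm.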
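Optionally, the wake-up module 730 is specifically configured to:
use the wake-up engine to determine the similarity between each of the N noise-reduced signals and the wake word; and
perform voice wake-up according to the similarities.
In a specific implementation, for the wake-up module 730, refer to the description following S203 in the embodiment shown in FIG. 2.
Optionally, when performing voice wake-up according to the similarities, the wake-up module 730 is specifically configured to:
perform voice wake-up if the average of the similarities between the N noise-reduced signals and the wake word is greater than a preset threshold, and/or a preset number of the N noise-reduced signals each have a similarity to the wake word greater than the preset threshold.
In a specific implementation, for the wake-up module 730, refer to the description following S203 in the embodiment shown in FIG. 2.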
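In addition, an embodiment of this application further provides an electronic device, which may be a terminal. FIG. 8 is a schematic structural diagram of a terminal according to an embodiment of this application. The terminal 800 includes a microphone array 801, a processor 802, and a memory 803. The microphone array 801 is configured to collect a speech signal, and the memory 803 is configured to store program code. The processor 802 is configured to invoke the program code in the memory to perform the following steps, implementing the voice wake-up method provided in FIG. 2:
using N beamforming noise reduction algorithms to denoise the speech signal separately to obtain N noise-reduced signals, where each beamforming noise reduction algorithm corresponds to one of N regions, different algorithms correspond to different regions, the union of the N regions covers the signal acquisition area of the microphone array, and N is a positive integer greater than 1; and
using a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.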
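Optionally, the processor 802 is further configured to perform the following steps:
determining the value of N according to hardware processing capability; and
dividing the signal acquisition area according to N to obtain the N regions.
Optionally, the processor 802 is further configured to perform the following steps:
determining the value of N according to the current service scenario type; and
dividing the signal acquisition area according to N to obtain the N regions.
Optionally, the processor 802 is further configured to perform the following step:
looking up a parameter configuration file according to the current service scenario type to determine the value of N that matches the current service scenario type.
Optionally, the processor 802 is further configured to:
divide the signal acquisition area uniformly by angle to obtain the N regions.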
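Optionally, there is a single microphone array;
and when performing noise reduction to obtain the N noise-reduced signals, the processor 802 is specifically configured to:
replicate the speech signal to obtain N channels of the speech signal; and
use the N beamforming noise reduction algorithms to denoise the N channels to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels.
Optionally, there are multiple microphone arrays; each collects one channel of speech signal, so together they collect multiple channels;
and the processor 802 is specifically configured to:
when the multiple channels are N channels, use the N beamforming noise reduction algorithms to denoise the N channels to obtain N noise-reduced signals, where each algorithm processes one of the N channels and different algorithms process different channels.
Optionally, the beamforming noise reduction algorithms include an adaptive beamforming noise reduction algorithm.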
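Optionally, when using the wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals, the processor 802 is specifically configured to:
use the wake-up engine to determine the similarity between each of the N noise-reduced signals and the wake word; and
perform voice wake-up according to the similarities.
Optionally, when performing voice wake-up according to the similarities, the processor 802 is specifically configured to:
perform voice wake-up if the average of the similarities between the N noise-reduced signals and the wake word is greater than a preset threshold, and/or a preset number of the N noise-reduced signals each have a similarity to the wake word greater than the preset threshold.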
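An embodiment of this application further provides a computer-readable storage medium configured to store program code, where the program code is used to perform the voice wake-up method described in this application.
An embodiment of this application further provides a computer program product containing computer-readable instructions that, when run on a computer, cause the computer to perform the voice wake-up method described in the foregoing aspects.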
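A person skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the systems, apparatuses, and units described above, refer to the corresponding processes in the foregoing method embodiments. Details are not repeated here.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative. The division into units is merely a logical functional division, and other divisions are possible in an actual implementation. For example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the displayed or discussed mutual couplings, direct couplings, or communication connections may be indirect couplings or communication connections through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.
Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units. That is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the objectives of the solutions of the embodiments.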
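It should be understood that in this application, "at least one (item)" means one or more, and "multiple" means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist. For example, "A and/or B" may mean only A, only B, or both A and B, where A and B may be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one of the following (items)" or a similar expression means any combination of those items, including a single item or any combination of multiple items. For example, at least one of a, b, or c may mean: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", where each of a, b, and c may be single or multiple.
The above embodiments are merely intended to describe the technical solutions of this application, not to limit them. Although this application is described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand the following. The technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of this application.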

Claims (22)

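  1. A voice wake-up method, wherein the method comprises:
    collecting a speech signal through a microphone array;
    using N beamforming noise reduction algorithms to separately perform noise reduction on the speech signal to obtain N noise-reduced signals, wherein each beamforming noise reduction algorithm corresponds to one of N regions, different beamforming noise reduction algorithms correspond to different regions, a union of the N regions covers a signal acquisition area of the microphone array, and N is a positive integer greater than 1; and
    using a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.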
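  2. The method according to claim 1, wherein the N regions are determined as follows:
    determining a value of N according to hardware processing capability; and
    dividing the signal acquisition area according to the value of N to obtain the N regions.
  3. The method according to claim 1, wherein the N regions are determined as follows:
    determining a value of N according to a current service scenario type; and
    dividing the signal acquisition area according to the value of N to obtain the N regions.
  4. The method according to claim 3, wherein the determining a value of N according to a current service scenario type comprises:
    looking up a parameter configuration file according to the current service scenario type to determine the value of N that matches the current service scenario type.
  5. The method according to any one of claims 1 to 4, wherein the N regions are obtained by dividing the signal acquisition area uniformly by angle.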
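  6. The method according to any one of claims 1 to 4, wherein the collecting a speech signal through a microphone array comprises:
    collecting a speech signal through a single microphone array;
    and the using N beamforming noise reduction algorithms to separately perform noise reduction on the speech signal to obtain N noise-reduced signals comprises:
    replicating the speech signal to obtain N channels of the speech signal; and
    using the N beamforming noise reduction algorithms to perform noise reduction on the N channels of the speech signal to obtain the N noise-reduced signals, wherein each beamforming noise reduction algorithm processes one of the N channels, and different beamforming noise reduction algorithms process different channels.
  7. The method according to any one of claims 1 to 4, wherein the collecting a speech signal through a microphone array comprises:
    collecting multiple channels of speech signals through multiple microphone arrays, wherein each microphone array collects one channel of speech signal;
    and if the multiple channels of speech signals are N channels of speech signals, the using N beamforming noise reduction algorithms to separately perform noise reduction on the speech signal to obtain N noise-reduced signals comprises:
    using the N beamforming noise reduction algorithms to perform noise reduction on the N channels of speech signals to obtain the N noise-reduced signals, wherein each beamforming noise reduction algorithm processes one of the N channels, and different beamforming noise reduction algorithms process different channels.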
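  8. The method according to any one of claims 1 to 4, wherein the beamforming noise reduction algorithms comprise an adaptive beamforming noise reduction algorithm.
  9. The method according to any one of claims 1 to 4, wherein the using a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals comprises:
    using the wake-up engine to determine a similarity between each of the N noise-reduced signals and a wake word; and
    performing voice wake-up according to the similarities.
  10. The method according to claim 9, wherein the performing voice wake-up according to the similarities comprises:
    performing voice wake-up if an average of the similarities between the N noise-reduced signals and the wake word is greater than a preset threshold, and/or a preset number of the N noise-reduced signals have a similarity to the wake word greater than the preset threshold.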
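  11. A voice wake-up apparatus, wherein the apparatus comprises:
    a collection module, configured to collect a speech signal through a microphone array;
    a noise reduction module, configured to use N beamforming noise reduction algorithms to separately perform noise reduction on the speech signal to obtain N noise-reduced signals, wherein each beamforming noise reduction algorithm corresponds to one of N regions, different beamforming noise reduction algorithms correspond to different regions, a union of the N regions covers a signal acquisition area of the microphone array, and N is a positive integer greater than 1; and
    a wake-up module, configured to use a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.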
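  12. An electronic device, wherein the electronic device comprises a microphone array, a processor, and a memory, wherein:
    the microphone array is configured to collect a speech signal;
    the memory is configured to store program code; and
    the processor is configured to perform the following steps according to instructions in the program code:
    using N beamforming noise reduction algorithms to separately perform noise reduction on the speech signal to obtain N noise-reduced signals, wherein each beamforming noise reduction algorithm corresponds to one of N regions, different beamforming noise reduction algorithms correspond to different regions, a union of the N regions covers a signal acquisition area of the microphone array, and N is a positive integer greater than 1; and
    using a wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals.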
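  13. The electronic device according to claim 12, wherein the processor is further configured to perform the following steps:
    determining a value of N according to hardware processing capability; and
    dividing the signal acquisition area according to the value of N to obtain the N regions.
  14. The electronic device according to claim 12, wherein the processor is further configured to perform the following steps:
    determining a value of N according to a current service scenario type; and
    dividing the signal acquisition area according to the value of N to obtain the N regions.
  15. The electronic device according to claim 14, wherein the processor is further configured to perform the following step:
    looking up a parameter configuration file according to the current service scenario type to determine the value of N that matches the current service scenario type.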
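  16. The electronic device according to any one of claims 12 to 15, wherein the processor is further configured to:
    divide the signal acquisition area uniformly by angle to obtain the N regions.
  17. The electronic device according to any one of claims 12 to 15, wherein there is a single microphone array;
    and when performing noise reduction to obtain the N noise-reduced signals, the processor is specifically configured to:
    replicate the speech signal to obtain N channels of the speech signal; and
    use the N beamforming noise reduction algorithms to perform noise reduction on the N channels of the speech signal to obtain the N noise-reduced signals, wherein each beamforming noise reduction algorithm processes one of the N channels, and different beamforming noise reduction algorithms process different channels.
  18. The electronic device according to any one of claims 12 to 15, wherein there are multiple microphone arrays, each microphone array collects one channel of speech signal, and the multiple microphone arrays collect multiple channels of speech signals;
    and the processor is specifically configured to:
    when the multiple channels of speech signals are N channels of speech signals, use the N beamforming noise reduction algorithms to perform noise reduction on the N channels to obtain the N noise-reduced signals, wherein each beamforming noise reduction algorithm processes one of the N channels, and different beamforming noise reduction algorithms process different channels.
  19. The electronic device according to any one of claims 12 to 15, wherein the beamforming noise reduction algorithms comprise an adaptive beamforming noise reduction algorithm.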
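  20. The electronic device according to any one of claims 12 to 15, wherein, when using the wake-up engine to perform voice wake-up based on at least one of the N noise-reduced signals, the processor is specifically configured to:
    use the wake-up engine to determine a similarity between each of the N noise-reduced signals and a wake word; and
    perform voice wake-up according to the similarities.
  21. The electronic device according to claim 20, wherein, when performing voice wake-up according to the similarities, the processor is specifically configured to:
    perform voice wake-up if an average of the similarities between the N noise-reduced signals and the wake word is greater than a preset threshold, and/or a preset number of the N noise-reduced signals have a similarity to the wake word greater than the preset threshold.
  22. A computer-readable storage medium, wherein the computer-readable storage medium is configured to store program code, and the program code is used to perform the voice wake-up method according to any one of claims 1 to 10.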
PCT/CN2020/081341 2019-03-28 2020-03-26 Voice wake-up method, apparatus, device, and medium WO2020192721A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US17/598,702 US20230031491A1 (en) 2019-03-28 2020-03-26 Voice Awakening Method and Apparatus, Device, and Medium
EP20777736.8A EP3926624B1 (en) 2019-03-28 2020-03-26 Voice awakening method and apparatus, and device and medium

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910243897.3A CN109949810B (zh) 2019-03-28 2019-03-28 Voice wake-up method, apparatus, device, and medium
CN201910243897.3 2019-03-28

Publications (1)

Publication Number Publication Date
WO2020192721A1 true WO2020192721A1 (zh) 2020-10-01

Family

ID=67012231

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/081341 WO2020192721A1 (zh) 2019-03-28 2020-03-26 Voice wake-up method, apparatus, device, and medium

Country Status (4)

Country Link
US (1) US20230031491A1 (zh)
EP (1) EP3926624B1 (zh)
CN (1) CN109949810B (zh)
WO (1) WO2020192721A1 (zh)

Also Published As

Publication number Publication date
US20230031491A1 (en) 2023-02-02
CN109949810B (zh) 2021-09-07
EP3926624A4 (en) 2022-04-20
EP3926624A1 (en) 2021-12-22
CN109949810A (zh) 2019-06-28
EP3926624B1 (en) 2024-05-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20777736

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2020777736

Country of ref document: EP

Effective date: 20210917