WO2021008000A1 - Voice wake-up method and apparatus, electronic device, and storage medium - Google Patents

Info

Publication number
WO2021008000A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
noise ratio
wake
beam signal
interference
Prior art date
Application number
PCT/CN2019/114378
Other languages
English (en)
Chinese (zh)
Inventor
段相
张珍斌
Original Assignee
大象声科(深圳)科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 大象声科(深圳)科技有限公司
Publication of WO2021008000A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/78 Detection of presence or absence of voice signals
    • G10L25/84 Detection of presence or absence of voice signals for discriminating voice from noise
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04B TRANSMISSION
    • H04B7/00 Radio transmission systems, i.e. using radiation field
    • H04B7/02 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas
    • H04B7/04 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas
    • H04B7/06 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station
    • H04B7/0613 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission
    • H04B7/0615 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal
    • H04B7/0617 Diversity systems; Multi-antenna system, i.e. transmission or reception using multiple antennas using two or more spaced independent antennas at the transmitting station using simultaneous transmission of weighted versions of same signal for beam forming
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223 Execution procedure of a spoken command
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/225 Feedback of the input speech

Definitions

  • The present disclosure relates to the technical field of intelligent voice interaction, and in particular to a voice wake-up method, device, electronic equipment, and storage medium.
  • Voice, as the most natural means of human interaction, has also become one of the main ways people hope to replace the mouse, keyboard, and touch screen when communicating with computers.
  • Voice wake-up technology has become a very important function in human-computer interaction and has received increasing attention.
  • Wake-up rate, false wake-up rate, response time, and power consumption are four general indicators for evaluating voice wake-up technology.
  • Users of voice wake-up technology increasingly expect a better experience.
  • the combination of traditional front-end voice enhancement technology and wake-up models has become an important way to improve the wake-up rate.
  • multi-microphone enhancement technology is widely used in front-end voice enhancement. With multi-mic technology, the signal-to-noise ratio of the input voice will be significantly enhanced, so that a better recognition effect can be obtained.
  • the voice wake-up rate is low due to interference and reverberation.
  • microphone technology can also be used to preprocess the sound signal.
  • the use of multi-microphone technology can make full use of spatial information to enhance speech.
  • the microphone array can solve the acoustic problems of the room, such as sound source location, tracking, noise cancellation, speech enhancement, signal source separation, and reverberation cancellation.
  • the present invention provides a voice wake-up method, device, electronic equipment, and storage medium.
  • a voice wake-up method including:
  • the step of calculating the signal-to-noise ratio of each beam signal includes:
  • the beam signal includes a target source signal, an interference source signal, and background noise
  • the point source signal energy includes the target source signal energy and the interference source signal energy
  • the method further includes:
  • the point source signal energy and background noise energy of each frequency point in the beam signal are smoothed by a smoothing factor.
  • the step of determining the wake-up direction through the signal-to-noise ratio includes:
  • according to the signal-to-noise ratio of each candidate beam signal, determine the beam signal direction with the largest signal-to-noise ratio
  • the step of determining the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames includes:
  • When the maximum signal-to-noise ratio over all beam signals is less than the preset signal-to-noise ratio threshold, record, in the direction of the beam signal with the maximum signal-to-noise ratio, the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio; the difference recorded in the other directions is zero. If the maximum signal-to-noise ratio over all beam signals is greater than the preset signal-to-noise ratio threshold, the difference recorded in all beam signal directions is zero.
  • the step of determining the optimal beam signal direction of each frame according to the signal-to-noise ratio of each candidate beam signal includes:
  • the signal source direction detection is performed according to the signal-to-noise ratio of each candidate beam signal, and the beam signal direction that satisfies the condition is set as the optimal beam signal direction.
  • the steps include:
  • the beam signal direction where the maximum signal-to-noise ratio is located is set as the optimal beam signal direction within a certain preset number of frames.
  • the step of determining the wake-up direction through the signal-to-noise ratio further includes:
  • the interference direction is determined as the wake-up direction.
  • a voice wake-up device including:
  • the sound signal receiving module is used to receive the sound signal collected by the microphone
  • a fixed beamforming module configured to perform fixed beamforming on the sound signal to generate multiple beam signals in different directions
  • the signal-to-noise ratio calculation module is used to calculate the signal-to-noise ratio of each beam signal
  • a wake-up direction determination module configured to determine the wake-up direction based on the signal-to-noise ratio
  • the voice wake-up operation module is used to perform voice wake-up operations according to the sound signal in the wake-up direction.
  • an electronic device including:
  • at least one processor; and
  • a memory communicatively connected with the at least one processor; wherein,
  • the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor, so that the at least one processor can execute the method according to the first aspect.
  • a computer-readable storage medium for storing a program that, when executed, causes an electronic device to perform the method described in the first aspect.
  • After the sound signal collected by the microphone is received, fixed beamforming is performed on the sound signal, the signal-to-noise ratio of each beam signal is calculated, and the wake-up direction is determined from the signal-to-noise ratios to perform the voice wake-up operation. The system can thus accurately determine the wake-up direction even in a low signal-to-noise ratio environment, which effectively improves the accuracy of voice wake-up.
  • Fig. 1 is a flowchart showing a voice wake-up method according to an exemplary embodiment.
  • Fig. 2 is a beam signal direction diagram according to the embodiment corresponding to Fig. 1.
  • FIG. 3 is a specific implementation flowchart of step S130 in the voice wake-up method in the embodiment corresponding to FIG. 1.
  • Fig. 4 is a schematic diagram of a microphone array according to an exemplary embodiment.
  • Fig. 5 is a specific implementation flow chart of step S140 in the voice wake-up method in the embodiment corresponding to Fig. 1.
  • FIG. 6 is a specific implementation flowchart of step S141 shown in the embodiment corresponding to FIG. 5.
  • FIG. 7 is a specific implementation flowchart of step S143 shown in the embodiment corresponding to FIG. 5.
  • FIG. 8 is another specific implementation flowchart of step S140 in the voice wake-up method in the embodiment corresponding to FIG. 5.
  • Fig. 9 is a block diagram showing a voice wake-up device according to an exemplary embodiment.
  • FIG. 10 is a block diagram of the signal-to-noise ratio calculation module 130 in the voice wake-up device according to the embodiment corresponding to FIG. 9.
  • FIG. 11 is a block diagram of the wake-up direction determining module 140 in the voice wake-up device according to the embodiment corresponding to FIG. 9.
  • Fig. 1 is a flowchart showing a voice wake-up method according to an exemplary embodiment.
  • the voice wake-up method can be used in electronic devices such as smart phones and computers.
  • the voice wake-up method may include step S110, step S120, step S130, and step S140.
  • Step S110 Receive the sound signal collected by the microphone.
  • When the electronic device is to be awakened by voice, it collects the sound signal through the microphone.
  • the sound signal collected by the microphone not only includes the voice signal used for voice wake-up, but also contains interference noise.
  • the voice awakening rate is improved through voice front-end enhancement technology.
  • The voice signal can be collected through a microphone array. With M microphones, the collected microphone signal is:
  • $\mathbf{x}(n) = [x_1(n), x_2(n), \ldots, x_M(n)]^T$, where $n$ denotes time and $T$ denotes transpose.
  • Step S120 Perform fixed beamforming on the voice signal to generate multiple beam signals.
  • A delay-and-sum beamforming method or a filter-and-sum beamforming method can be used.
  • the number of beams is BN (BN ⁇ M), and the direction of the beams is fixed and uniformly distributed (linear array 0°-180°, circular array 0°-360°).
  • The coefficients of the beamformer can be obtained with delay-and-sum or filter-and-sum techniques, and different beamforming methods can also be used for different frequency bands. Delay-and-sum beamforming achieves a higher white noise gain; among filter-and-sum methods, the differential array is widely used because of its smaller size and better frequency-invariance characteristics.
  • this embodiment designs a beamformer with broadband and gain independent of frequency band.
  • Figure 2 is a beam signal pattern shown in this embodiment. The array is a circular array, the number of microphones is 4, and the radius is 0.035m, Figure 2(a) is a three-dimensional schematic diagram of frequency angle gain, and Figure 2(b) is a beam polar coordinate diagram. It can be seen from Fig. 2 that the beam sidelobe attenuation of this embodiment is about 25dB.
  • BN beam output can be obtained:
  • $Y(w) = [y_1(w), y_2(w), \ldots, y_{BN}(w)]^T$, where $w$ represents the frequency point.
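As an illustration of Step S120, the following is a minimal sketch of how the BN beam outputs at one frequency point could be formed with plain delay-and-sum weights. It is not the broadband, frequency-invariant beamformer of the embodiment, whose coefficients are not given in the text. The far-field plane-wave model, the speed of sound of 343 m/s, the evenly spaced circular layout, and all function names are assumptions made for this example.

```python
import cmath
import math

SPEED_OF_SOUND = 343.0  # m/s, assumed value


def mic_positions(num_mics, radius):
    """Microphones evenly spaced on a circle (assumed layout)."""
    return [(radius * math.cos(2 * math.pi * m / num_mics),
             radius * math.sin(2 * math.pi * m / num_mics))
            for m in range(num_mics)]


def steering_vector(positions, angle_deg, freq_hz):
    """Relative phase at each microphone for a far-field plane wave."""
    a = math.radians(angle_deg)
    ux, uy = math.cos(a), math.sin(a)
    k = 2 * math.pi * freq_hz / SPEED_OF_SOUND  # wavenumber
    return [cmath.exp(1j * k * (x * ux + y * uy)) for x, y in positions]


def fixed_beams(mic_spectrum, positions, num_beams, freq_hz):
    """Return [y_1(w), ..., y_BN(w)] for one frequency point w."""
    outputs = []
    for b in range(num_beams):
        angle = 360.0 * b / num_beams  # fixed, uniformly distributed directions
        d = steering_vector(positions, angle, freq_hz)
        # Delay-and-sum: weights are d / M, output is the weighted sum w^H x.
        y = sum(dm.conjugate() * xm for dm, xm in zip(d, mic_spectrum)) / len(d)
        outputs.append(y)
    return outputs
```

With a 4-microphone array of radius 0.035 m, a unit plane wave arriving from 90° passes through the 90° beam with unit gain and is attenuated in the other three beams. The attenuation of plain delay-and-sum on such a small array is modest, which is one reason a purpose-designed beamformer may be preferred.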
  • Step S130 Calculate the signal-to-noise ratio of each beam signal.
  • the signal-to-noise ratio is the ratio between the energy in the sound signal and the noise spectrum.
  • step S130 may include step S131, step S132, and step S133.
  • Step S131 For each beam signal, calculate the energy and noise spectrum of each frequency point in the beam signal.
  • Step S132 Calculate the signal-to-noise ratio of each frequency point in the beam signal based on the energy and noise spectrum of each frequency point in the beam signal.
  • the corresponding beam signals also include signals of different frequency points. Therefore, in order to improve the voice awakening rate, the energy and noise spectrum of each frequency point in the beam signal must be calculated.
  • a corresponding smoothing factor is selected to smooth the energy of each frequency point in the beam signal through the position relationship between each beam signal and the interference direction.
  • The magnitude of the smoothing factor $\mathrm{weight}_i(w)$ is related to the interference direction: the closer a beam direction is to the interference direction, the larger the smoothing factor set for that beam signal.
  • The smoothing factor $\mathrm{weight}_2(w)$ of beam signal 2 and the smoothing factor $\mathrm{weight}_4(w)$ of beam signal 4 are larger than the smoothing factor $\mathrm{weight}_1(w)$ of beam signal 1.
  • weight 2 (w) weight 4 (w) ⁇ 0.8
  • weight 1 (w) 0.6.
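The smoothing formula itself did not survive in this text, so the sketch below assumes a common first-order recursive form, smoothed = weight × previous + (1 − weight) × current. The values 0.8 and 0.6 follow the example above for beams 2 and 4 versus beam 1 with interference in direction 3. The function names and the rule of treating beams within one step of the interference direction as "near" are hypothetical.

```python
def smoothing_weight(beam_index, interference_index, num_beams,
                     near_weight=0.8, far_weight=0.6):
    """Larger factor for beams at or next to the interference direction."""
    gap = abs(beam_index - interference_index) % num_beams
    gap = min(gap, num_beams - gap)  # circular distance in beam steps
    return near_weight if gap <= 1 else far_weight


def smooth_energy(previous, current, weight):
    """Assumed first-order recursive smoothing of per-frequency energies."""
    return [weight * p + (1.0 - weight) * c
            for p, c in zip(previous, current)]
```

Under this assumed form, a larger factor makes the smoothed energy follow the current frame more slowly.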
  • the estimation method can be: MCRA, IMCRA, MARTIN, DOBLINGER, HIRSCH and other single-channel noise spectrum estimation methods, and other noise spectrum estimation methods can also be used.
  • The specific noise spectrum estimation method is not limited here, and the estimation methods are not described one by one.
  • Step S133 Represent the signal-to-noise ratio of the beam signal by the average of its per-frequency-point signal-to-noise ratios within the preset frequency band.
  • the preset frequency band range is 0-2 kHz.
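Steps S131 to S133 can be sketched as follows, assuming the per-frequency energy and noise spectrum have already been estimated (for example by one of the single-channel estimators named above). Expressing the ratio in dB, averaging in the dB domain, and the function and parameter names are assumptions made for this example; the text only states that the beam SNR is the average over the preset band.

```python
import math


def bin_snr_db(energy, noise, floor=1e-12):
    """Step S132: SNR of each frequency point, energy over noise, in dB."""
    return [10.0 * math.log10(max(e, floor) / max(n, floor))
            for e, n in zip(energy, noise)]


def beam_snr_db(energy, noise, sample_rate, fft_size, band_hz=(0.0, 2000.0)):
    """Step S133: average the per-bin SNR over the preset band (0-2 kHz)."""
    snr = bin_snr_db(energy, noise)
    hz_per_bin = sample_rate / fft_size
    selected = [s for k, s in enumerate(snr)
                if band_hz[0] <= k * hz_per_bin <= band_hz[1]]
    return sum(selected) / len(selected)
```

Here `energy` and `noise` are lists indexed by FFT bin, and bin k is taken to sit at k × sample_rate / fft_size Hz.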
  • Step S140 Determine the wake-up direction according to the signal-to-noise ratio.
  • the wake-up direction is the voice wake-up direction confirmed by the present invention.
  • the present invention calculates the signal-to-noise ratio of each beam signal, and then determines the wake-up direction from multiple beam signals according to the signal-to-noise ratio of each beam signal, and then uses the beam signal in the wake-up direction to perform voice wake-up operations on the electronic device.
  • When selecting the optimal beam signal direction from all beam signals, the beam signal direction with the largest average signal-to-noise ratio over a certain period can be chosen, or the beam signal direction that has the largest signal-to-noise ratio in the greatest number of frames over a certain period can be chosen. The optimal beam signal direction may also be determined by other methods, which are not described here.
  • step S140 may include steps S141, S142, S143, and S144.
  • Step S141 Determine the interference direction according to the signal-to-noise ratio of each beam signal within the preset number of frames.
  • the interference direction is the direction of the noise source that interferes with the voice signal relative to the electronic device.
  • 3 is the interference source
  • the electronic device is located at the center of the circle
  • the direction of the interference source 3 relative to the center of the circle is the interference direction.
  • Since the interference source has a considerable impact on the voice signal during voice wake-up, the sound it generates affects every beam signal. Therefore, the interference direction is determined in advance, and each beam signal is then smoothed according to the positional relationship between its direction and the interference source, which effectively reduces the influence of the interference source on voice wake-up and improves the accuracy of voice wake-up.
  • When determining the interference direction, the direction of the beam signal whose signal-to-noise ratio reaches the preset signal-to-noise ratio threshold in the greatest number of frames within the preset number of frames can be determined as the interference direction, or the direction of the beam signal with the largest average signal-to-noise ratio within the preset number of frames can be determined as the interference direction. The interference direction may also be determined by other methods, which are not described here.
  • step S141 may include steps S1411, S1412, and S1413.
  • Step S1411 Calculate the maximum value of the signal-to-noise ratio of all beam signals in the current frame, and compare the maximum value with a preset signal-to-noise ratio threshold.
  • Step S1412 When the maximum signal-to-noise ratio over all beam signals is less than the preset signal-to-noise ratio threshold, record, in the direction of the beam signal with the maximum signal-to-noise ratio, the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio; the difference recorded in the other directions is zero. If the maximum signal-to-noise ratio over all beam signals is greater than the preset signal-to-noise ratio threshold, the difference recorded in all beam signal directions is zero.
  • Step S1413 Count the sum of the difference values recorded in each beam signal direction within the preset number of frames. If the sum is greater than zero, the beam signal direction with the largest sum is determined as the interference direction.
  • the preset number of frames is T1, and preferably, T1 ⁇ 2000 frames.
  • Let $SNR_i$ denote the signal-to-noise ratio of beam signal direction $i$, and MAXSNR the maximum signal-to-noise ratio over all beam signal directions.
  • A threshold is set. If MAXSNR is below the threshold, the frame is considered a silent segment: the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio is recorded in the direction of the beam signal with the maximum signal-to-noise ratio, and zero is recorded in the other directions.
  • If MAXSNR is above the threshold, zero is recorded in all beam signal directions. Finally, the sum of the differences recorded in each beam signal direction within the preset number of frames T1 is counted.
  • The beam signal direction with the largest sum is determined as the interference direction.
  • The values of T1 and the threshold are selected according to the specific scenario, so as to improve the accuracy of the interference direction judgment.
  • T1 > 2000 frames, with a threshold of 10 dB.
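Steps S1411 to S1413 can be sketched as a small accumulator over the per-frame beam SNRs. The function name, the list-of-lists input layout, and the 10 dB default are assumptions made for this example; the caller is assumed to pass the most recent T1 frames, each with at least two beams.

```python
def detect_interference_direction(snr_frames, threshold_db=10.0):
    """snr_frames: one list per frame, holding one SNR (dB) per beam.

    Returns the index of the interference direction, or None if no
    direction accumulated a positive sum.
    """
    num_beams = len(snr_frames[0])
    sums = [0.0] * num_beams
    for snr in snr_frames:
        # Step S1411: find the largest and second-largest SNR of the frame.
        order = sorted(range(num_beams), key=lambda i: snr[i], reverse=True)
        max_snr, second_snr = snr[order[0]], snr[order[1]]
        if max_snr < threshold_db:
            # Step S1412: record the margin in the strongest direction only.
            sums[order[0]] += max_snr - second_snr
        # Otherwise every direction records zero for this frame.
    # Step S1413: the direction with the largest positive sum wins.
    best = max(range(num_beams), key=lambda i: sums[i])
    return best if sums[best] > 0 else None
```

Frames whose maximum SNR is above the threshold contribute nothing, so only the low-SNR frames that the text treats as silent segments shape the estimate.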
  • Step S142 Remove the beam signal corresponding to the interference direction from all beam signals to obtain candidate beam signals.
  • the candidate beam signal is a beam signal set after removing the beam signal corresponding to the interference direction from all beam signals.
  • The interference direction determined by the technical solution of the present invention is not the optimal beam signal direction. Therefore, when the optimal beam signal direction is determined, the beam signal corresponding to the interference direction is first removed from all beam signals, and the optimal beam signal direction is then determined from the signal-to-noise ratios of the candidate beam signals, which improves the accuracy of the determination.
  • Step S143 Determine the candidate beam signal with the largest signal-to-noise ratio in each frame according to the signal-to-noise ratio of each candidate beam signal.
  • the signal source direction detection is performed according to the signal-to-noise ratio of each candidate beam signal.
  • step S143 may further include step S1431, step S1432, step S1433, and step S1434:
  • Step S1431 Sort the signal-to-noise ratios of the candidate beam signals.
  • Step S1432 If, within the preset number of consecutive frames, the maximum signal-to-noise ratio of the candidate beams exceeds the threshold and the difference between the maximum signal-to-noise ratio and the second-largest signal-to-noise ratio reaches the preset difference threshold, step S1433 is executed; otherwise, no processing is done.
  • Step S1433 Judge whether the beam signal direction of the maximum signal-to-noise ratio remains the same within the preset number of consecutive frames. If yes (Y), step S1434 is executed; if no, no processing is performed.
  • Step S1434 Set the beam signal direction where the maximum SNR is located as the optimal beam signal direction within a certain preset number of frames.
  • the signal-to-noise ratios of candidate beam signals other than the interference direction are sorted, and the beam signal in the direction of the maximum signal-to-noise ratio is selected.
  • If a beam signal direction has the maximum signal-to-noise ratio in N consecutive frames, its signal-to-noise ratio MAXSNR exceeds a preset threshold throughout those N frames, and MAXSNR exceeds the second-largest signal-to-noise ratio SECSNR by a preset difference threshold, the direction of the MAXSNR beam signal is taken as the optimal beam signal direction.
  • the optimal beam signal direction is set to the MAXSNR direction within a certain time range T3.
  • the size of T3 depends on different wake words.
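Steps S1431 to S1434 can be sketched as a check over the last N frames of candidate-beam SNRs. The function name and parameters are hypothetical, the input is assumed to contain candidate beams only (the interference beam already removed), and holding the result for the period T3 is left to the caller.

```python
def detect_source_direction(snr_frames, snr_threshold, diff_threshold, n_frames):
    """Return the optimal beam index if the last n_frames all agree, else None."""
    if len(snr_frames) < n_frames:
        return None
    directions = set()
    for snr in snr_frames[-n_frames:]:
        # Step S1431: sort the candidate beams by SNR, largest first.
        order = sorted(range(len(snr)), key=lambda i: snr[i], reverse=True)
        max_snr, sec_snr = snr[order[0]], snr[order[1]]
        # Step S1432: both the level and the margin conditions must hold.
        if max_snr <= snr_threshold or max_snr - sec_snr < diff_threshold:
            return None
        directions.add(order[0])
    # Step S1433: the strongest direction must be the same in every frame.
    # Step S1434: if so, it becomes the optimal beam signal direction.
    return directions.pop() if len(directions) == 1 else None
```

A return value of None corresponds to the "no processing" branches of steps S1432 and S1433.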
  • Step S144 Count the candidate beam signals with the largest SNR in the preset number of frames, determine the direction where the candidate beam signals are located as the optimal beam signal direction, and use the optimal beam signal direction as the wake-up direction.
  • In each frame, according to the signal-to-noise ratio of each candidate beam signal, the candidate beam signals with the largest signal-to-noise ratio within the preset number of frames are counted, and the direction of the candidate beam signal that occurs most often is determined as the optimal beam signal direction.
  • Let the preset number of frames be T2.
  • Among the BN beams, excluding the beam signal in the detected interference direction, if the maximum SNR is less than the threshold th, the optimal beam signal direction of the current frame stays the same as that of the previous frame. If the maximum SNR is greater than the threshold th, the beam signal with the maximum SNR is recorded as 1, and the beam signal in the interference direction and the other, lower-SNR beam signals are recorded as 0; the beam signal direction recorded as having the maximum SNR most often within the T2 frames is determined as the optimal beam signal direction of the current frame.
  • 20 ⁇ T2 ⁇ 100 and th 10.
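The per-frame vote of Step S144 can be sketched with a sliding window of the last T2 frames. The class name, the default window of 50 frames, and the choice to store an empty vote for frames below the threshold are assumptions made for this example; the threshold default of 10 follows the value of th given above.

```python
from collections import Counter, deque


class DirectionVoter:
    """Keeps the winners of the last T2 frames and returns the most frequent."""

    def __init__(self, window=50, threshold=10.0):
        self.votes = deque(maxlen=window)  # one entry per frame, None = no vote
        self.threshold = threshold
        self.current = None  # optimal beam direction of the latest frame

    def update(self, snr, interference_index=None):
        # Exclude the beam in the detected interference direction.
        candidates = [i for i in range(len(snr)) if i != interference_index]
        winner = max(candidates, key=lambda i: snr[i])
        if snr[winner] > self.threshold:
            self.votes.append(winner)  # this beam is recorded as 1
            counts = Counter(v for v in self.votes if v is not None)
            self.current = counts.most_common(1)[0][0]
        else:
            self.votes.append(None)  # keep the previous frame's direction
        return self.current
```

Calling `update` once per frame yields the optimal beam signal direction for that frame; until some frame exceeds the threshold, it returns None.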
  • the wake-up direction will be identified as the interference direction.
  • The interference direction obtained from the statistics and the optimal beam signal direction determined by the aforementioned method are both used as wake-up directions, that is, wake-up detection is performed on two beams. If either of them exceeds the threshold, the device is determined to be in the wake-up state.
  • step S140 may further include step S146 and step S147:
  • Step S146 Judge whether the signal source energy exceeds the interference source energy by a certain threshold. If yes (Y), proceed to step S147; if no, proceed to step S142.
  • Step S147 Determine the interference direction as the wake-up direction.
  • The interference direction can be determined as a wake-up direction while steps S142, S143, and S144 are also performed to determine the optimal beam signal direction, so that wake-up is performed on two beam signal directions: the interference direction and the optimal beam signal direction.
  • Alternatively, when the signal source energy is judged to exceed the interference source energy by a certain threshold, the interference direction is determined as the wake-up direction without performing steps S142, S143, and S144, and the voice wake-up operation is performed directly on the voice signal in the interference direction, which improves the efficiency of the voice wake-up operation.
  • Step S150 Perform a voice wake-up operation according to the voice signal in the wake-up direction.
  • Fixed beamforming is performed on the sound signal, the signal-to-noise ratio of each beam signal is calculated, and the wake-up direction is determined from the signal-to-noise ratios so that the voice wake-up operation is performed on the sound signal in the wake-up direction. This enables the system to accurately determine the wake-up direction even in a low signal-to-noise ratio environment, which effectively improves the accuracy of voice wake-up.
  • The method is illustrated below with experimental results.
  • The experiment is carried out in a room of 6 × 3 × 3.5 m.
  • The microphone array is a 4-microphone circular array with a radius of 0.035 m, located at 3 × 1.5 × 1.5 m.
  • The interference source is at 2 × 1.5 × 1.5 m, and the wake-up positions are distributed on a circle 1.2 m away from the microphone array. There are two wake-ups every 30 degrees, a total of 24 wake-ups.
  • The test signal-to-noise ratios are -5 dB, 0 dB, and 5 dB, and the test results are shown in Table 1.
  • The third row of the table shows the probability that the optimal beam signal direction is correct. The results show that this probability is lower under babble interference at -5 dB and above 80% in the other cases.
  • Rows 4 and 5 of Table 1 are the wake-up results of a single microphone and of the present invention, respectively. The table shows that the present invention significantly improves the wake-up rate.
  • Fig. 9 is a block diagram showing a voice wake-up device according to an exemplary embodiment.
  • The device includes but is not limited to: a sound signal receiving module 110, a fixed beam forming module 120, a signal-to-noise ratio calculation module 130, a wake-up direction determination module 140, and a voice wake-up operation module 150.
  • the sound signal receiving module 110 is used to receive the sound signal collected by the microphone
  • the fixed beam forming module 120 is configured to perform fixed beam forming on the sound signal to generate multiple beam signals;
  • the signal-to-noise ratio calculation module 130 is used to calculate the signal-to-noise ratio of each beam signal
  • the wake-up direction determination module 140 is configured to determine the wake-up direction according to the signal-to-noise ratio
  • the voice wake-up operation module 150 is configured to perform voice wake-up operations according to the voice signal in the wake-up direction.
  • the SNR calculation module 130 described in FIG. 9 includes, but is not limited to: a signal energy and noise spectrum calculation unit 131, a frequency point SNR calculation unit 132, and a beam signal SNR Calculating unit 133.
  • the signal energy and noise spectrum calculation unit 131 is configured to calculate the energy and noise spectrum of each frequency point in the beam signal for each beam signal;
  • the frequency point signal-to-noise ratio calculation unit 132 is configured to calculate the signal-to-noise ratio of each frequency point in the beam signal from the energy and noise spectrum of each frequency point in the beam signal;
  • the beam signal signal-to-noise ratio calculation unit 133 is configured to indicate the signal-to-noise ratio of the beam signal according to the average value of the signal-to-noise ratio of the beam signal within a preset frequency band.
  • the signal-to-noise ratio calculation module 130 described in FIG. 10 further includes, but is not limited to, a smoothing processing unit.
  • the smoothing processing unit is used for smoothing the energy of each frequency point in the beam signal through a smoothing factor.
  • the wake-up direction determination module 140 described in FIG. 9 includes but is not limited to: an interference direction determining unit 141, a removing unit 142, a candidate beam signal determining unit 143, and a wake-up direction determining unit 144.
  • the interference direction determining unit 141 is configured to determine the interference direction according to the signal-to-noise ratio of each beam signal within a preset number of frames;
  • the removing unit 142 is configured to remove the beam signal in the interference direction from all the beam signals to obtain candidate beam signals;
  • the candidate beam signal determining unit 143 is configured to determine the candidate beam signal with the largest signal-to-noise ratio in each frame according to the signal-to-noise ratio of each candidate beam signal;
  • the wake-up direction determining unit 144 is configured to count the candidate beam signals with the largest SNR in the preset number of frames, determine the direction in which the candidate beam signals are located as the optimal beam signal direction, and use the optimal beam signal direction as the wake-up direction.
  • the present invention also provides an electronic device that can perform all or part of the steps of the voice wake-up method shown in any of the foregoing exemplary embodiments.
  • the electronic device includes:
  • a processor; and
  • a memory communicatively connected to the processor; wherein,
  • the memory stores readable instructions which, when executed by the processor, implement the method according to any of the foregoing exemplary embodiments.
  • a storage medium is also provided.
  • the storage medium is a computer-readable storage medium; for example, it may be a transitory or non-transitory computer-readable storage medium including instructions.
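
The signal-to-noise ratio calculation module 130 and the wake-up direction determination module 140 described above can be illustrated with a minimal Python sketch. It is not the patented implementation: all function names are hypothetical, the smoothing is assumed to be a first-order recursive average, the noise energy estimate is assumed to be supplied from elsewhere, and the interference rule (a beam whose SNR stays above a threshold in every frame of the window) is an assumption made only so the example runs, since the criterion is not spelled out in this passage.

```python
import math
from collections import Counter


def smooth_energy(prev, cur, alpha=0.9):
    """Smooth per-frequency-point energy with smoothing factor alpha.

    Assumed form: first-order recursive average.
    """
    return [alpha * p + (1.0 - alpha) * c for p, c in zip(prev, cur)]


def frequency_point_snr_db(energy, noise, eps=1e-12):
    """SNR of each frequency point in dB (noise estimate assumed given)."""
    return [10.0 * math.log10(max(e, eps) / max(n, eps))
            for e, n in zip(energy, noise)]


def beam_snr(point_snrs, lo, hi):
    """Beam SNR: mean of the per-point SNRs over the preset band [lo, hi)."""
    band = point_snrs[lo:hi]
    return sum(band) / len(band)


def wake_direction(snr_frames, interference_db):
    """Return the index of the wake-up beam, or None if all beams are removed.

    snr_frames[t][b] is the SNR of beam b in frame t.
    """
    n_beams = len(snr_frames[0])
    # Hypothetical interference rule: a beam whose SNR exceeds the threshold
    # in every frame of the window is treated as a persistent interferer.
    interference = {b for b in range(n_beams)
                    if all(frame[b] > interference_db for frame in snr_frames)}
    # Remove interference beams to obtain the candidate beams.
    candidates = [b for b in range(n_beams) if b not in interference]
    if not candidates:
        return None
    # Per frame, pick the candidate with the largest SNR, then count winners.
    winners = Counter(max(candidates, key=lambda b: frame[b])
                      for frame in snr_frames)
    return winners.most_common(1)[0][0]


# Three beams over four frames; beam 0 is a steady loud source.
frames = [[20, 5, 8], [21, 9, 3], [22, 7, 10], [20, 6, 9]]
print(wake_direction(frames, interference_db=15))  # prints 2
```

In the usage lines, beam 0 is removed as interference, beam 2 has the largest SNR among the remaining candidates in three of the four frames, and so its direction is returned as the wake-up direction.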

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The present invention relates to a voice wake-up method and apparatus, an electronic device, and a storage medium. The method comprises: receiving sound signals collected by a microphone (S110); performing fixed beamforming on the sound signals to generate a plurality of beam signals (S120); calculating the signal-to-noise ratio of each beam signal (S130); determining a wake-up direction from the signal-to-noise ratios (S140); and performing a voice wake-up operation according to the sound signal in the wake-up direction (S150). With this wake-up method, a system can accurately determine the wake-up direction in a low signal-to-noise ratio environment, which effectively improves the accuracy of voice wake-up.
PCT/CN2019/114378 2019-07-12 2019-10-30 Voice wake-up method and apparatus, electronic device, and storage medium WO2021008000A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910627574.4 2019-07-12
CN201910627574.4A CN110265020B (zh) 2019-07-12 2019-07-12 语音唤醒方法、装置及电子设备、存储介质

Publications (1)

Publication Number Publication Date
WO2021008000A1 (fr) 2021-01-21

Family

ID=67925774

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2019/114378 WO2021008000A1 (fr) 2019-07-12 2019-10-30 Voice wake-up method and apparatus, electronic device, and storage medium

Country Status (2)

Country Link
CN (1) CN110265020B (fr)
WO (1) WO2021008000A1 (fr)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113053406A (zh) * 2021-05-08 2021-06-29 北京小米移动软件有限公司 声音信号识别方法及装置
CN113724704A (zh) * 2021-08-30 2021-11-30 深圳创维-Rgb电子有限公司 一种语音获取方法、装置、终端及存储介质

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110265020B (zh) * 2019-07-12 2021-07-06 大象声科(深圳)科技有限公司 语音唤醒方法、装置及电子设备、存储介质
CN111223497B (zh) * 2020-01-06 2022-04-19 思必驰科技股份有限公司 一种终端的就近唤醒方法、装置、计算设备及存储介质
CN111192589A (zh) * 2020-01-16 2020-05-22 云知声智能科技股份有限公司 语音唤醒方法及装置
CN111341297B (zh) * 2020-03-04 2023-04-07 开放智能机器(上海)有限公司 一种语音唤醒率测试系统及方法
CN111402883B (zh) * 2020-03-31 2023-05-26 云知声智能科技股份有限公司 一种复杂环境下分布式语音交互系统中就近响应系统和方法
CN111863012A (zh) * 2020-07-31 2020-10-30 北京小米松果电子有限公司 一种音频信号处理方法、装置、终端及存储介质
CN113066488B (zh) * 2021-03-26 2023-10-27 深圳市欧瑞博科技股份有限公司 语音唤醒智能控制方法、装置、电子设备及存储介质

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9734845B1 (en) * 2015-06-26 2017-08-15 Amazon Technologies, Inc. Mitigating effects of electronic audio sources in expression detection
CN108831498A (zh) * 2018-05-22 2018-11-16 出门问问信息科技有限公司 多波束波束成形的方法、装置及电子设备
CN108877827A (zh) * 2017-05-15 2018-11-23 福州瑞芯微电子股份有限公司 一种语音增强交互方法及系统、存储介质及电子设备
CN109272989A (zh) * 2018-08-29 2019-01-25 北京京东尚科信息技术有限公司 语音唤醒方法、装置和计算机可读存储介质
CN109473118A (zh) * 2018-12-24 2019-03-15 苏州思必驰信息科技有限公司 双通道语音增强方法及装置
US20190098400A1 (en) * 2017-09-28 2019-03-28 Sonos, Inc. Three-Dimensional Beam Forming with a Microphone Array
CN109920433A (zh) * 2019-03-19 2019-06-21 上海华镇电子科技有限公司 嘈杂环境下电子设备的语音唤醒方法
CN109949810A (zh) * 2019-03-28 2019-06-28 华为技术有限公司 一种语音唤醒方法、装置、设备及介质
CN110265020A (zh) * 2019-07-12 2019-09-20 大象声科(深圳)科技有限公司 语音唤醒方法、装置及电子设备、存储介质

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1473964A3 (fr) * 2003-05-02 2006-08-09 Samsung Electronics Co., Ltd. Réseau de microphones, méthode de traitement des signaux de ce réseau de microphones et méthode et système de reconnaissance de la parole en faisant usage
US9640179B1 (en) * 2013-06-27 2017-05-02 Amazon Technologies, Inc. Tailoring beamforming techniques to environments
CN104810021B (zh) * 2015-05-11 2017-08-18 百度在线网络技术(北京)有限公司 应用于远场识别的前处理方法和装置
CN106683685B (zh) * 2016-12-23 2020-05-22 云知声(上海)智能科技有限公司 基于最小二乘法的目标方向语音检测方法
CN108831495B (zh) * 2018-06-04 2022-11-29 桂林电子科技大学 一种应用于噪声环境下语音识别的语音增强方法
CN109102822B (zh) * 2018-07-25 2020-07-28 出门问问信息科技有限公司 一种基于固定波束形成的滤波方法及装置
CN109597022B (zh) * 2018-11-30 2023-02-17 腾讯科技(深圳)有限公司 声源方位角运算、定位目标音频的方法、装置和设备

Also Published As

Publication number Publication date
CN110265020A (zh) 2019-09-20
CN110265020B (zh) 2021-07-06

Similar Documents

Publication Publication Date Title
WO2021008000A1 (fr) Voice wake-up method and apparatus, electronic device, and storage medium
US10602267B2 (en) Sound signal processing apparatus and method for enhancing a sound signal
CN109272989B (zh) 语音唤醒方法、装置和计算机可读存储介质
CN110503969B (zh) 一种音频数据处理方法、装置及存储介质
CN107577449B (zh) 唤醒语音的拾取方法、装置、设备及存储介质
CN110556103B (zh) 音频信号处理方法、装置、系统、设备和存储介质
US9008329B1 (en) Noise reduction using multi-feature cluster tracker
CN107910011B (zh) 一种语音降噪方法、装置、服务器及存储介质
RU2642353C2 (ru) Устройство и способ для обеспечения информированной оценки вероятности и присутствия многоканальной речи
CN107464565B (zh) 一种远场语音唤醒方法及设备
JP2021500634A (ja) マイク・アレイに基づく対象音声取得方法及び装置
US9959886B2 (en) Spectral comb voice activity detection
CN110211599B (zh) 应用唤醒方法、装置、存储介质及电子设备
US10242677B2 (en) Speaker dependent voiced sound pattern detection thresholds
WO2019080551A1 (fr) Procédé et appareil de détection de voix cible
US9378754B1 (en) Adaptive spatial classifier for multi-microphone systems
WO2020048431A1 (fr) Procédé de traitement vocal, dispositif électronique et dispositif d'affichage
CN110830870B (zh) 一种基于传声器技术的耳机佩戴者语音活动检测系统
CN110610718A (zh) 一种提取期望声源语音信号的方法及装置
WO2016119388A1 (fr) Procédé et dispositif de construction de matrice de covariance de focalisation sur la base d'un signal vocal
CN112394324A (zh) 一种基于麦克风阵列的远距离声源定位的方法及系统
CN114464184B (zh) 语音识别的方法、设备和存储介质
CN113223552B (zh) 语音增强方法、装置、设备、存储介质及程序
CN111462757B (zh) 基于语音信号的数据处理方法、装置、终端及存储介质
Niu et al. An Adaptive Speech Noise Reduction Method Based on Noise Classification

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 19937552

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 27.06.2022)

122 Ep: pct application non-entry in european phase

Ref document number: 19937552

Country of ref document: EP

Kind code of ref document: A1