WO2019169616A1 - Voice signal processing method and apparatus - Google Patents

Voice signal processing method and apparatus

Info

Publication number
WO2019169616A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice signal
tracking
angular position
microphone array
signal
Prior art date
Application number
PCT/CN2018/078505
Other languages
English (en)
French (fr)
Inventor
朱虎
王鑫山
李国梁
杨柯
郭红敬
Original Assignee
深圳市汇顶科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市汇顶科技股份有限公司
Priority to PCT/CN2018/078505 (WO2019169616A1, zh)
Priority to CN201880000268.1A (CN110495185B, zh)
Publication of WO2019169616A1 (zh)

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00: Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04R: LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00: Circuits for transducers, loudspeakers or microphones

Definitions

  • the embodiments of the present invention relate to the field of computer technologies, and in particular, to a voice signal processing method and apparatus.
  • the physical position of the sound source changes during the movement, causing the beam of the microphone array to deviate from the sound source, resulting in reduced noise reduction performance.
  • the microphone array is required to always aim at the target sound source during the process of receiving the speech, thereby weakening the influence of the non-target sound source, such as weakening the speech and background noise of the non-target speaker.
  • some schemes such as motion image tracking or high-resolution spectral estimation based on time-delay estimation and particle filter tracking algorithms, are applied to speech signal processing.
  • the embodiment of the invention provides a method and a device for processing a speech signal, so as to solve the problem that prior-art speech signal processing schemes perform poorly when a speech signal in a moving scene must be processed quickly.
  • a speech signal processing method comprising: obtaining an angular position of a speech signal relative to a microphone array, wherein the angular position includes an azimuth angle and a pitch angle of the speech signal relative to the microphone array; determining a direction vector of a sound source direction of the voice signal according to the angular position; performing Kalman filter processing on the voice signal according to the direction vector; and performing voice signal tracking according to the processing result of the Kalman filter processing.
  • a voice signal processing apparatus including: an angle acquiring module, configured to acquire an angular position of a voice signal relative to a microphone array, wherein the angular position includes an azimuth angle and a pitch angle of the voice signal relative to the microphone array; a direction determining module, configured to determine a direction vector of a sound source direction of the voice signal according to the angular position; a filtering module, configured to perform Kalman filter processing on the speech signal according to the direction vector; and a tracking module, configured to perform speech signal tracking according to the processing result of the Kalman filter processing.
  • the Kalman filter processing is performed on the voice signal according to the angular position of the voice signal relative to the microphone array, and then the voice signal tracking is performed according to the processing result of the Kalman filter processing.
  • Kalman filtering performs the current estimation only by the previous filtering results and deviations each time the filtering process is performed, and does not need to process other data, so it has a faster running speed.
  • Kalman filtering is a linear filtering. It is necessary to generate a state vector according to the position information and velocity information of the filtering object. However, the position information and the velocity information of the speech signal received by the microphone array cannot meet the linear filtering requirement of the Kalman filter.
  • the angular position of the voice signal is converted into a direction vector of the sound source direction that satisfies the linear filtering requirement, and Kalman filtering is then performed to obtain an estimated position of the voice signal at the next moment in the moving scene for voice tracking.
  • FIG. 1 is a flow chart showing the steps of a method for processing a voice signal according to Embodiment 1 of the present invention
  • FIG. 2 is a schematic diagram showing an angular position of a voice signal relative to a microphone array in the embodiment shown in FIG. 1;
  • FIG. 3 is a flow chart showing the steps of a method for processing a voice signal according to Embodiment 2 of the present invention.
  • FIG. 4 is a flow chart showing the steps of a method for processing a voice signal according to Embodiment 3 of the present invention.
  • FIG. 5 is a structural block diagram of a voice signal processing apparatus according to Embodiment 4 of the present invention.
  • FIG. 6 is a structural block diagram of a voice signal processing apparatus according to Embodiment 5 of the present invention.
  • FIG. 7 is a schematic structural diagram of a voice signal processing system according to Embodiment 6 of the present invention.
  • referring to FIG. 1, there is shown a flow chart of the steps of a speech signal processing method according to a first embodiment of the present invention.
  • Step S102 Acquire an angular position of the voice signal relative to the microphone array.
  • the angular position comprises an azimuth and elevation angle of the speech signal relative to the microphone array.
  • a microphone array is an array structure consisting of a number of acoustic sensors, usually microphones, used to sample and process received speech signals from different directions in space.
  • in voice communication, the characteristics of the voice signal are mainly reflected in the time domain and the frequency domain; the microphone array adds a spatial domain to these and performs space-time processing on the voice signals received from different directions in space.
  • the microphone array receives the original analog speech signal and performs processing such as weighting, delay, and summation to form a spatially directional beam, i.e., a beam of the microphone array.
  • the angular position of the voice signal relative to the microphone array can be understood as the pointing direction of the beam of the microphone array.
  • the microphone array has a plurality of array topologies, such as a uniform line array, a uniform area array, a uniform circular array, and an arbitrary discrete array.
  • the microphone array may adopt a uniform area array or a uniform circular array topology.
  • the speech signals in different directions have an azimuth and elevation angle with respect to the microphone array.
  • as shown in FIG. 2, the Z-axis direction is set to the normal direction of the microphone array, and the XOY plane is the plane of the microphone array.
  • the angle between the incoming direction of the voice signal (that is, the sound source direction) and the normal direction of the microphone array is the pitch angle of the speech signal relative to the microphone array.
  • the angle between the X-axis and the projection of the incoming direction of the speech signal onto the plane of the microphone array is the azimuth of the speech signal relative to the microphone array.
  • Step S104 Determine a direction vector of a sound source direction of the voice signal according to the angular position.
  • according to the angular position, the sound source direction of the voice signal relative to the microphone array can be determined.
  • the sound source direction of the voice signal is indicated by the direction vector.
  • the direction vector of the sound source direction may take any suitable form including, but not limited to, a direction cosine vector.
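A minimal sketch of the conversion in step S104, assuming the geometry described for FIG. 2: the pitch angle is measured from the array normal (Z-axis) and the azimuth from the X-axis within the XOY plane. Under that assumption the direction cosine vector is the unit vector toward the source. The function names are illustrative and do not come from the patent.

```python
import math


def direction_cosines(azimuth, pitch):
    """Return the unit vector (u, v, w) toward the source; angles in radians.

    Assumed geometry: Z is the array normal, pitch is the angle from Z,
    azimuth is the angle from X measured in the array (XOY) plane.
    """
    u = math.sin(pitch) * math.cos(azimuth)  # cosine of the angle to the X-axis
    v = math.sin(pitch) * math.sin(azimuth)  # cosine of the angle to the Y-axis
    w = math.cos(pitch)                      # cosine of the angle to the Z-axis
    return (u, v, w)


def angles_from_cosines(u, v, w):
    """Inverse mapping: recover (azimuth, pitch) from a direction cosine vector."""
    pitch = math.acos(max(-1.0, min(1.0, w)))  # clamp against rounding error
    azimuth = math.atan2(v, u)
    return (azimuth, pitch)
```

The direction cosines vary smoothly with source motion and can be fed to a linear filter, which is the property the text relies on when it replaces the angles by a direction vector.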
  • Step S106 Perform Kalman filtering processing on the speech signal according to the direction vector.
  • Kalman filtering is a linear filtering method that requires a state vector generated from the position information and velocity information of the filtered object. However, the angular position and velocity of the speech signal received by the microphone array cannot meet the linear filtering requirement of the Kalman filter. Therefore, the angular position of the speech signal needs to be converted into a direction vector of the sound source direction that satisfies the linear filtering requirement, after which Kalman filtering is performed.
  • Step S108 Perform voice signal tracking according to the processing result of the Kalman filter processing.
  • the estimated position of the speech signal at the next moment in the moving scene can be obtained for voice signal tracking.
  • the Kalman filter is used to estimate the position of the voice signal at the next moment. The specific time of the next moment is determined by the tracking period of the voice signal, which can be set by a person skilled in the art according to the actual situation, provided that the voice signal remains short-time stationary within it; for example, it can be set to 10 ms (milliseconds).
  • the Kalman filtering process is performed on the speech signal according to the angular position of the speech signal relative to the microphone array, and then the speech signal tracking is performed according to the processing result of the Kalman filtering process.
  • Kalman filtering performs the current estimation only by the previous filtering results and deviations each time the filtering process is performed, and does not need to process other data, so it has a faster running speed.
  • Kalman filtering is a kind of linear filtering. It is necessary to generate a state vector according to the position information and velocity information of the filtering object. However, the angular position and velocity information of the speech signal received by the microphone array cannot meet the linear filtering requirement of the Kalman filter.
  • the angular position of the voice signal is converted into a direction vector of the sound source direction that satisfies the linear filtering requirement, and Kalman filtering is then performed to obtain an estimated position of the voice signal at the next moment in the moving scene for voice tracking.
  • referring to FIG. 3, a flow chart of the steps of a method for processing a voice signal according to a second embodiment of the present invention is shown.
  • Step S202 Perform a voice signal search on the audio signal received by the microphone array.
  • a microphone array is generally composed of a plurality of sub-arrays.
  • a microphone array composed of four sub-arrays is taken as an example to describe a voice signal processing method provided by an embodiment of the present invention.
  • the microphone array in this embodiment may adopt a uniform area array or a uniform circular array topology.
  • the audio signal received by the microphone array may or may not contain a speech signal (for example, it may consist entirely of background noise), so a speech signal must first be found before it can be processed or tracked.
  • the center wave phase of the search region corresponding to the audio signal received by the microphone array may be determined according to the guidance information, and the voice signal search is started from the center wave phase.
  • the guidance information is the information used by the device in which the microphone array is located to determine the initial beam pointing toward the sound source; it typically contains the approximate spatial location of the sound source.
  • the search area corresponding to the received audio signal appears as an area of the microphone array beam of different signals. That is, the center wave phase is initially given by the guidance information.
  • a voice search is first performed on the center beam corresponding to the center wave phase; if a voice signal is found, the search ends. If no voice signal is found, the next beam is determined and searched, where the next beam is obtained by applying a wave position displacement to the center beam.
  • the wave position displacement of the center beam can be shifted from multiple directions such as up, down, left, and right.
  • the determination of the next beam can be random, that is, randomly determine whether to shift left or right, up or down.
  • the voice signal search can be implemented by means of beam energy detection: starting from the center wave phase, beam energy detection is performed on the center beam corresponding to the center wave phase; if the detection result indicates that a voice signal has been found, the search ends; if the detection result indicates that no voice signal has been found, a wave position displacement is applied to the center beam and beam energy detection continues on the shifted beam.
  • the beam energy detection includes: obtaining, for each sub-array of the microphone array, the correlation between the voice signal of the current sub-array and the voice signals of all sub-arrays under the current beam; obtaining the beam energy corresponding to the current beam according to the correlation; and determining whether a voice signal is found in the current beam according to the relationship between the beam energy and a set noise threshold.
  • the noise threshold can be appropriately set by a person skilled in the art according to actual needs, for example, can be set according to the beam energy when there is no speech and only background noise.
  • performing cross-correlation between the sub-arrays improves the signal-to-noise ratio of the received speech signal, so that speech signals are detected more effectively.
  • Step S204 Perform speech capture on the searched speech signal, and determine an initial angular position of the speech signal relative to the microphone array according to the result of the speech capture.
  • the voice capture of the searched voice signal can be implemented by any suitable sound source localization algorithm.
  • a sound source localization algorithm based on steerable beamforming is adopted.
  • the algorithm calculates the angular cosine of the sound source based on the angular relationship between the angle cosine of the sound source position and the beam amplitude of the microphone array within a certain range, so as to obtain the position information of the sound source.
  • other sound source localization algorithms are also applicable, such as a sound source localization algorithm based on time delay estimation, a localization algorithm based on high-resolution spectrum estimation, and the SRP-PHAT (steered response power with phase transform) sound source localization algorithm.
  • the false alarm detection may be performed on the searched voice signal first, and if the false alarm does not occur, the searched voice signal is voice captured. Since the noise always exists objectively, when the amplitude of the noise signal exceeds the detection threshold, the detection system will mistakenly find the target. This error is called “false alarm”. Through false alarm detection, the validity of the searched speech signal can be further determined, and the effectiveness and accuracy of subsequent speech capture can be improved.
  • the initial angular position of the captured speech signal relative to the microphone array that is, the azimuth and elevation angle of the captured speech signal relative to the microphone array, can be obtained.
  • Step S206 Determine whether to perform voice signal tracking according to the initial angular position; if yes, execute step S208; if not, return to step S204.
  • the initial angular position can be directly used, converted into a direction vector of the sound source direction, and then Kalman filter is used for voice signal tracking.
  • alternatively, whether to perform voice signal tracking may be decided according to the initial angular position, and the voice signal tracking process is entered once tracking is decided. For example: the direction cosine vector of the captured voice signal is determined according to the initial angular position; the direction cosine vector is used as the observation vector of the Kalman filter, and Kalman filter processing is performed on the captured speech signal; the innovation process of the Kalman filter processing is obtained; and if the innovation process is less than or equal to the set threshold value, voice signal tracking is determined.
  • the innovation process is the difference between the predicted value obtained by the Kalman filter and the actual measured value, and the threshold value can be appropriately set by a person skilled in the art according to actual needs, such as setting according to the beam width of the microphone array.
  • the angle cosine residual of the azimuth angle and the angle cosine residual of the pitch angle obtained after the captured speech signal is processed by the Kalman filter may be compared with a set threshold value, where the set threshold value may be 1/6 of the beam width of the microphone array.
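The gate described above can be sketched as a simple comparison. This is an illustration only: it assumes that both residuals must pass the gate, that the beam width is expressed in the same direction-cosine units as the residuals, and the function name and parameters are hypothetical.

```python
def should_track(residual_azimuth, residual_pitch, beam_width, fraction=1.0 / 6.0):
    """Decide whether to enter tracking from the Kalman innovation residuals.

    residual_azimuth / residual_pitch: angle-cosine residuals (innovations).
    beam_width: array beam width, assumed to be in the same units as the residuals.
    fraction: gate as a fraction of the beam width (1/6 in the text).
    """
    gate = beam_width * fraction
    # Assumption: tracking is accepted only when both residuals are within the gate.
    return abs(residual_azimuth) <= gate and abs(residual_pitch) <= gate
```

When the function returns False, the flow in step S206 falls back to step S204 and the signal is captured again.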
  • Step S208 Determine a beam direction of the microphone array according to the initial angular position, perform tracking and positioning of the voice signal according to the beam direction, and obtain the angular position of the voice signal relative to the microphone array at the next moment according to the tracking and positioning result.
  • at the start of tracking, the angular position of the speech signal relative to the microphone array is the initial angular position, and each subsequent angular position is determined according to the result of the Kalman filtering process (tracking positioning). That is, the initial beam pointing of the microphone array is determined by the initial angular position obtained by the acquisition process, and the subsequent beam pointing is determined based on the angular position predicted by the Kalman filtering process.
  • the next moment is determined by the tracking period, which can be set by a person skilled in the art according to actual conditions, provided that the short-time stationarity of the voice signal is ensured; for example, it can be set to 10 ms (milliseconds).
  • for example, if the current time is 0 minutes 0 seconds 0 milliseconds and tracking of the voice signal starts, the current beam direction is determined using the initial angular position, and the Kalman filter processing of this embodiment is performed based on the initial angular position to obtain the angular position at 0 minutes 0 seconds 10 milliseconds.
  • Step S210 Determine a direction vector of a sound source direction of the voice signal according to the angular position of the next moment.
  • the direction cosine vector of the speech signal is determined according to the angular position, acquired in step S208, of the speech signal relative to the microphone array at the next moment, and the direction cosine vector is used as the direction vector of the sound source direction of the speech signal.
  • other direction vectors, such as a direction sine vector or a similar direction vector based on another coordinate system, are also applicable.
  • the direction cosine vector of the speech signal can be expressed as:
  • Step S212 Perform Kalman filtering processing on the speech signal according to the direction vector.
  • the direction cosine vector of the speech signal can be used as the observation vector of the Kalman filter; according to the observation vector, the Kalman filter processing is performed on the speech signal.
  • the Kalman filter estimates the process state by means of feedback control, which first estimates the state of the process at a certain moment, and then obtains the feedback in the form of noise-containing measured variables.
  • the process of Kalman filtering is divided into two parts: the state model part and the observation model part.
  • the state model is a model that reflects the state change law.
  • the state equation is used to describe the state transition law of the adjacent time; the observation model reflects the relationship between the actual observation and the state variable.
  • the Kalman filter obtains a state-optimal estimate of the filtered object through the above two parts.
  • the processing includes: establishing a state model (state equation) and an observation model (observation equation); setting parameters for the state model and the observation model; using the state model, predicting the state at time n from the state at time n-1; using the observation model, estimating the system prediction error at time n from the system prediction error at time n-1; calculating the innovation process of the Kalman filter; calculating the optimal estimate of the system from the predicted state at time n and the innovation process; and calculating the prediction error of the system at the current time.
  • the state vector, the state equation, and the observation equation of the Kalman filter can be determined according to the direction cosine vector and the rate of change of the direction cosines; in the state prediction process, the state vector of the speech signal at the next moment can be estimated according to the state equation.
  • the innovation process of the direction cosines of the speech signal can be obtained from the observation equation; according to the estimated state vector and the innovation process, the optimal position estimate of the speech signal at the next moment, i.e., the optimal direction cosine vector, can be obtained; then, based on the optimal direction cosine vector, the angular position of the speech signal relative to the microphone array at the next moment can be determined.
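The predict, innovation, and update cycle described above can be sketched for a single direction-cosine component. This is a minimal illustration, not the patent's filter: it assumes a constant-velocity state [cosine, rate], a scalar observation of the cosine, and hypothetical noise parameters `q` and `r`; the class name and the default 10 ms period are choices made for the example.

```python
class CosineKalman:
    """Constant-velocity Kalman filter for one direction-cosine component."""

    def __init__(self, u0, period=0.010, q=1e-3, r=1e-4):
        self.T = period                     # tracking period in seconds
        self.q = q                          # process noise (assumed value)
        self.r = r                          # measurement noise (assumed value)
        self.x = [u0, 0.0]                  # state: [cosine, rate of change]
        self.P = [[1.0, 0.0], [0.0, 1.0]]   # state covariance

    def predict(self):
        """State model: predict the state at time n from time n-1."""
        T = self.T
        u, du = self.x
        (p00, p01), (p10, p11) = self.P
        self.x = [u + T * du, du]
        # P = F P F^T + Q with F = [[1, T], [0, 1]] and Q = q * I
        self.P = [
            [p00 + T * (p10 + p01) + T * T * p11 + self.q, p01 + T * p11],
            [p10 + T * p11, p11 + self.q],
        ]
        return self.x[0]

    def update(self, z):
        """Observation model: correct the prediction with the measured cosine z."""
        (p00, p01), (p10, p11) = self.P
        innovation = z - self.x[0]          # measured minus predicted value
        s = p00 + self.r                    # innovation variance
        k0, k1 = p00 / s, p10 / s           # Kalman gain
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        self.P = [
            [(1.0 - k0) * p00, (1.0 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
        return innovation
```

One such filter per component of the direction cosine vector gives the predicted vector at the next moment, from which the predicted azimuth and pitch angle follow. Each call uses only the previous estimate and covariance, which is the reason the text gives for the method's running speed.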
  • Step S214 Perform voice signal tracking according to the processing result of the Kalman filter processing.
  • the direction cosine prediction vector of the voice signal at the next moment is obtained from the Kalman filter processing, the angular position prediction value of the voice signal relative to the microphone array is obtained from it, and voice signal tracking is performed based on the angular position prediction value. That is, the beam pointing of the microphone array is adjusted automatically according to the angular position prediction value to obtain the predicted beam pointing of the microphone array at the next moment, thereby completing the voice signal tracking.
  • in one mode, the angular position prediction value may be used as the beam direction in which the microphone array receives the voice signal at the next moment, and the process returns to step S208. That is, the speech signal is re-located around its predicted angular position at the next moment in order to correct the angular position obtained by the Kalman filter, and voice tracking then proceeds with the more accurate corrected angular position. In this way, voice tracking is made more accurate and efficient.
  • the angular position prediction value may be directly used as the angular position of the speech signal at the next moment with respect to the microphone array, and the process returns to step S210. That is to say, in this mode, the voice signal tracking is directly performed using the predicted angular position of the next-time speech signal, and the voice signal tracking in this manner is relatively fast.
  • a tracking failure may be handled in the following manner: during voice signal tracking, if tracking the voice signal according to its angular position relative to the microphone array at the next moment fails (for example, the deviation between the angular position predicted by the Kalman filter processing and the actual angular position is greater than a set value, which is set by a person skilled in the art according to actual needs), Kalman filter processing is performed using the Kalman filter coefficient of the previous speech signal, and the speech signal tracking is performed again according to the processing result of the Kalman filter processing.
  • the Kalman filter coefficient of the previous speech signal is kept unchanged, and the tracking is performed again based on the prediction result of the previous speech signal. If the voice signal is still not tracked after the process is performed N times, it is determined that the voice signal is lost. Otherwise, the voice signal is considered to be flickering.
  • the N may be appropriately set by a person skilled in the art according to actual needs, and the embodiment of the present invention does not limit this.
  • the failure to track the voice signal may be determined according to the innovation process of the Kalman filtering. For example, when the acquired innovation process exceeds the set threshold, the current voice signal position is determined to be an outlier; when outliers appear N times in a row, the voice signal is determined to be lost, voice tracking is interrupted, and the voice signal search is re-executed starting from step S202.
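The loss logic above amounts to counting consecutive outliers. The following sketch assumes a scalar innovation and uses hypothetical names (`TrackMonitor`, `gate`, `max_misses`); the state labels are illustrative, with "coasting" standing for the case the text calls flickering, where the previous filter coefficients are reused.

```python
class TrackMonitor:
    """Count consecutive innovation outliers and report the track status."""

    def __init__(self, gate, max_misses):
        self.gate = gate              # innovation threshold
        self.max_misses = max_misses  # N in the text
        self.misses = 0

    def update(self, innovation):
        """Return 'tracking', 'coasting', or 'lost' for the latest innovation."""
        if abs(innovation) <= self.gate:
            self.misses = 0           # a good measurement resets the count
            return "tracking"
        self.misses += 1
        if self.misses >= self.max_misses:
            return "lost"             # caller restarts the search (step S202)
        return "coasting"             # keep previous coefficients and retry
```

A caller would restart the search at step S202 on "lost" and otherwise keep predicting from the last accepted state.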
  • the Kalman filtering process is performed on the speech signal according to the angular position of the speech signal relative to the microphone array, and then the speech signal tracking is performed according to the processing result of the Kalman filtering process.
  • Kalman filtering performs the current estimation only by the previous filtering results and deviations each time the filtering process is performed, and does not need to process other data, so it has a faster running speed.
  • Kalman filtering is a linear filtering. It is necessary to generate a state vector according to the position information and velocity information of the filtering object. However, the position information and the velocity information of the speech signal received by the microphone array cannot meet the linear filtering requirement of the Kalman filter.
  • the angular position of the voice signal is converted into a direction vector of the sound source direction that satisfies the linear filtering requirement, and Kalman filtering is then performed to obtain an estimated position of the voice signal at the next moment in the moving scene for voice tracking.
  • referring to FIG. 4, a flow chart of the steps of a method for processing a voice signal according to a third embodiment of the present invention is shown.
  • This embodiment describes the voice signal processing scheme provided by the present invention in the form of a specific example.
  • Step S302 Pre-processing the audio signal received by the microphone array.
  • the microphone array is divided into four sub-arrays, which receive the original sound from the noisy environment and convert it into four analog audio signals. Because the four sub-arrays are at different positions (for example, the regular sub-array positions of a uniform area array, or the top, bottom, left, and right positions of a uniform circular array), the sound reaches each sub-array at a different time, so the four analog audio signals differ in phase.
  • the four analog audio signals are passed through, for example, a preamplifier, a band-pass filter, and an analog-to-digital conversion device, and are thereby converted into four digital audio signals containing phase information. Data buffering, signal pre-emphasis, and windowing are then performed on the four digital audio signals.
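A minimal sketch of the pre-emphasis and windowing stage for one digital channel. The text does not specify the filter coefficient, the window type, or the frame sizes, so the first-order pre-emphasis with `alpha=0.97`, the Hamming window, and the function names below are assumptions chosen because they are common defaults in speech front ends.

```python
import math


def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass: y[n] = x[n] - alpha * x[n-1] (alpha is assumed)."""
    if not samples:
        return []
    return [samples[0]] + [
        samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))
    ]


def hamming(n_samples):
    """Hamming window of the given length (window type is an assumption)."""
    if n_samples == 1:
        return [1.0]
    return [
        0.54 - 0.46 * math.cos(2.0 * math.pi * n / (n_samples - 1))
        for n in range(n_samples)
    ]


def frame_and_window(samples, frame_len, hop):
    """Split a buffered channel into overlapping frames and window each one."""
    win = hamming(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frames.append([samples[start + n] * win[n] for n in range(frame_len)])
    return frames
```

The same processing would be applied to each of the four channels so that their relative phase is preserved.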
  • Step S304 Search for a voice signal.
  • when the device where the microphone array is located is first started, it is initialized according to the guidance information.
  • the center wave phase of the search is given by the guidance information.
  • five beams are set based on the center beam corresponding to the center wave phase: the center beam itself, and the beams obtained by shifting the center beam left, right, up, and down by half the beam width.
  • the wave position displacement is performed in units of half beam width, but is not limited thereto. In practical applications, those skilled in the art may also perform wave position displacement based on the center beam in other appropriate units to obtain different beams.
  • the voice search starts from the center beam corresponding to the center wave phase. If no voice signal is found in that beam, another beam is selected from the other four beams (for example, by selecting one at random or by selecting one in clockwise order; the embodiment of the present invention does not limit the order of selection) and a voice search is performed on it. If a voice signal is found in any of the beams, the voice search ends and the process proceeds to step S306 to capture the voice signal. If no voice signal is found in any of the five beams, the guidance information is re-acquired, and the next center wave phase to be searched is adjusted and determined according to the re-acquired guidance information.
  • the center wave phase can be automatically adjusted to expand the search range.
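The five-beam search can be sketched as follows. The representation of a beam as a pair of coordinates, the fixed visiting order, and the detector callback `has_voice` are assumptions made for illustration; the text leaves the selection order open.

```python
def candidate_beams(center, beam_width):
    """Center beam plus four beams shifted by half a beam width."""
    a, b = center
    h = beam_width / 2.0
    # Order after the center beam is arbitrary here: left, right, up, down.
    return [(a, b), (a - h, b), (a + h, b), (a, b + h), (a, b - h)]


def search_voice(center, beam_width, has_voice):
    """Return the first beam where has_voice(beam) is true, else None.

    has_voice is a hypothetical detector, e.g. a beam energy test.
    None means the caller should re-acquire guidance information.
    """
    for beam in candidate_beams(center, beam_width):
        if has_voice(beam):
            return beam
    return None
```

For example, `search_voice((0.5, 0.2), 0.1, detector)` visits the center beam first and returns as soon as the detector reports speech.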
  • the speech search for each beam can be achieved by beam energy detection of the beam.
  • the beam energy detection for each beam includes the following processes:
  • the correlation between the speech signals of the four sub-arrays of the microphone array and the speech signals of the sub-array 1 is:
  • i denotes the sub-array number, from 1 to 4 in this embodiment;
  • N represents the number of samples of the speech signal in the current speech frame;
  • y_i(n) represents the noisy speech signal received by the i-th sub-array;
  • y_1*(n) represents the complex conjugate of y_1(n).
  • n_i(n) represents the pure noise signal received by the i-th sub-array when no speech signal is present;
  • n_1*(n) represents the complex conjugate of n_1(n).
  • k_1 is an amplification factor with a value between 1 and 2.5; optionally, k_1 is 2.
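A hedged sketch of beam energy detection using the symbols defined above. The exact formulas are not reproduced in the text, so the following is an assumption: each sub-array signal is correlated at zero lag with the conjugate of sub-array 1 over the N samples of the frame, the magnitudes are summed to give the beam energy, and the noise threshold is k_1 times the same quantity measured on noise only. The function names are illustrative.

```python
def correlation(x, ref):
    """Zero-lag correlation of x with the conjugate of ref, averaged over N samples."""
    n = len(ref)
    return sum(a * complex(b).conjugate() for a, b in zip(x, ref)) / n


def beam_energy(subarrays):
    """Sum of correlation magnitudes of every sub-array with sub-array 1."""
    ref = subarrays[0]
    return sum(abs(correlation(s, ref)) for s in subarrays)


def voice_detected(subarrays, noise_only, k1=2.0):
    """Compare beam energy with k1 times the noise-only beam energy."""
    return beam_energy(subarrays) > k1 * beam_energy(noise_only)
```

Speech that is coherent across sub-arrays adds up in the correlation sum, whereas the noise-only reference stays small. This matches the stated purpose of the cross-correlation step, which is to raise the signal-to-noise ratio before thresholding.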
  • Step S306 Capture the searched voice signal.
  • the beam direction in which the voice signal was found is first checked once more to determine whether a false alarm occurred during the search. If a false alarm occurred, the process returns to step S304 to search for the voice signal again; if no false alarm occurred and the voice signal is still detected, the found voice signal is considered valid, and the angle of the voice signal relative to the microphone array is calculated.
  • in this angle, θ is the pitch angle and the remaining component is the azimuth.
  • a steerable beamforming algorithm is used for voice capture. The algorithm calculates the angle cosines of the sound source from the approximately linear relationship, within a certain range, between the angle cosine of the sound source position and the ratio of beam amplitude differences of the microphone array, thereby obtaining the position information of the sound source.
  • the direction cosines of the voice signal relative to the microphone array are expressed as:
  • $(\alpha_0, \beta_0)$ is the pointing of the beam in which the voice signal was found in step S304. With beam 5 as the center, offsetting by half the beamwidth to the left and right along the $\alpha$ coordinate forms beam 1 and beam 2, whose pointings are respectively
  • received-signal synthesis is performed for the five beam directions, giving the sum-beam complex amplitudes $F_{\Sigma 1}$ to $F_{\Sigma 5}$ of the five directions.
  • the error voltages in the $\alpha$ and $\beta$ directions are obtained by the following equation, giving the amplitude differences between the corresponding beams:
  • within a certain range, the angular error signal $u_\alpha$ is approximately linear in $\alpha_t$, and $u_\beta$ in $\beta_t$, that is:
  • voice signal tracking can then proceed with this angular position as the initial angular position.
  • the angle-cosine residuals of the voice signal may be compared against an innovation-process threshold, which may be taken as 1/6 of the beamwidth. When the angle-cosine residuals are below the threshold, the voice signal is tracked; otherwise, the found voice signal is captured again. Deciding on the residuals ensures the accuracy of the acquired angular position of the voice signal.
  • once the system has entered the stable tracking process, the capture process is no longer required, and the voice signal is localized instead.
  • the SRP-PHAT (steered response power with phase transform) sound source localization algorithm is used to obtain the angle of the voice signal relative to the microphone array.
  • the SRP-PHAT sound source localization algorithm combines the inherent robustness and short-time analysis characteristics of the steered response power method with the insensitivity of the phase transform method, as used in time-delay estimation, to the signal's surroundings, and is therefore resistant to noise and reverberation and robust.
  • Step S308 Perform voice signal tracking.
  • the angular position of the first tracked voice signal during voice signal tracking is given by the angular position obtained in step S306.
  • after one round of tracking (Kalman filtering), the angular position of the voice signal required at the next moment is given by the tracking result (the Kalman filter result). That is, the beam pointing of the microphone array at the next moment is determined according to the angular position obtained in step S306, the direction cosine vector of the voice signal is then obtained again by the sound source localization algorithm, and the process is executed cyclically.
  • the system enters a stable tracking process, in which the angular position of the voice signal is converted to the angle cosines $[X_c\ Y_c\ Z_c]^T$; Kalman filtering is performed with $[X_c\ Y_c\ Z_c]^T$ as the observation vector to obtain the predicted direction cosines $[X'_c\ Y'_c\ Z'_c]^T$ of the voice signal at the next moment, which are then converted back to the angular position of the voice signal; localization and tracking of the voice signal at the next moment follow from this angular position.
  • the embodiment of the present invention uses the angle cosines $[X_c\ Y_c\ Z_c]^T$ of the voice signal as the observation vector in the Kalman filter, with the value:
  • the Kalman filtering process for the speech signal is as follows:
  • $x(n) = F(n, n-1)\,x(n-1) + \Gamma(n, n-1)\,v_1(n-1)$
  • T can be set appropriately by a person skilled in the art according to actual conditions, as long as it preserves the short-time stationarity of the voice signal. For example, T can be set to 10 ms.
  • the observation equation of the Kalman filter is determined as:
  • Z(n) is the angular cosine vector of the speech signal at time n;
  • C(n) is the observation matrix at time n;
  • $v_2(n)$ represents the zero-mean observation noise, independent of $v_1(n)$;
  • $x(n)$ is the state at time n.
  • based on the measured angle cosine vector Z(n) of the voice signal at time n, and the angle cosine vector of the voice signal at time n predicted from the angle cosine vector at time n-1,
  • the innovation process of the Kalman filter can be obtained, namely:
  • a loop memory function can be used to extrapolate for several cycles (that is, the Kalman filter coefficients are kept unchanged and tracking filtering is performed several more times), continuing to receive and track the voice signal in the previously predicted direction. The number of cycles may be set appropriately by a person skilled in the art according to the actual situation, for example three or six, and is not limited in the embodiment of the present invention.
  • flicker or tracking loss may be determined from the innovation process; that is, the innovation process of the Kalman filter is the decision quantity for tracking loss. When the innovation process exceeds the set threshold,
  • the angular position obtained by the current voice signal localization is judged to be an outlier.
  • when outliers occur consecutively (as many times as the number of cycles mentioned above),
  • the current voice tracking should be interrupted, and the process returns to step S304 to perform a voice search.
  • the set threshold may be chosen by a person skilled in the art according to actual conditions and experience, for example 1/4 of the beamwidth, but is not limited thereto.
  • while the device to which the microphone array belongs is moving, the beam of the microphone array already points to the next position before the physical movement takes place, which reduces the delay caused by signal processing time and beam pointing adjustment.
  • the beam of the microphone array can adaptively align with the direction of arrival of the sound source according to the physical movement of the device and the characteristics of the environment, suppressing interference and noise from other directions, and adapts well to the movement characteristics of the carrier of the microphone array.
  • Referring to FIG. 5, there is shown a structural block diagram of a voice signal processing apparatus according to a fourth embodiment of the present invention.
  • the voice signal processing apparatus of this embodiment includes: an angle obtaining module 402, configured to acquire an angular position of the voice signal relative to the microphone array, wherein the angular position includes an azimuth and a pitch angle of the voice signal relative to the microphone array; a direction determining module 404, configured to determine a direction vector of the sound source direction of the voice signal according to the angular position; a filtering module 406, configured to perform Kalman filtering on the voice signal according to the direction vector; and a tracking module 408, configured to perform voice signal tracking according to the processing result of the Kalman filtering.
  • the Kalman filtering process is performed on the speech signal according to the angular position of the speech signal relative to the microphone array, and then the speech signal tracking is performed according to the processing result of the Kalman filtering process.
  • Kalman filtering performs the current estimation only by the previous filtering results and deviations each time the filtering process is performed, and does not need to process other data, so it has a faster running speed.
  • Kalman filtering is a linear filtering. It is necessary to generate a state vector according to the position information and velocity information of the filtering object. However, the position information and the velocity information of the speech signal received by the microphone array cannot meet the linear filtering requirement of the Kalman filter.
  • the angular position of the voice signal is converted into a direction vector of the sound source direction that can satisfy the linear filtering requirement, and Kalman filtering is performed to obtain an estimated position of the next time voice signal in the moving scene for voice tracking.
  • Referring to FIG. 6, there is shown a structural block diagram of a voice signal processing apparatus according to a fifth embodiment of the present invention.
  • the voice signal processing apparatus of this embodiment includes: an angle obtaining module 502, configured to acquire an angular position of the voice signal relative to the microphone array, wherein the angular position includes an azimuth and a pitch angle of the voice signal relative to the microphone array; a direction determining module 504, configured to determine a direction vector of the sound source direction of the voice signal according to the angular position; a filtering module 506, configured to perform Kalman filtering on the voice signal according to the direction vector; and a tracking module 508, configured to perform voice signal tracking according to the processing result of the Kalman filtering.
  • the direction determining module 504 is configured to determine a direction cosine vector of the voice signal according to the angular position, and determine the direction cosine vector as a direction vector of a sound source direction of the voice signal.
  • the filtering module 506 is configured to use a direction cosine vector of the voice signal as an observation vector of the Kalman filter; and perform Kalman filtering processing on the voice signal according to the observation vector.
  • the tracking module 508 includes: a prediction module 5082, configured to obtain an angular position prediction value of the voice signal relative to the microphone array according to the direction cosine prediction vector of the voice signal obtained after the Kalman filtering; and a prediction tracking module 5084, configured to perform voice signal tracking according to the angular position prediction value.
  • the prediction tracking module 5084 is configured to use the angular position prediction value as the beam pointing for the voice signal received by the microphone array at the next moment, perform tracking and localization of the voice signal according to that beam pointing, acquire from the tracking and localization result the angular position of the voice signal at the next moment relative to the microphone array, and return to the direction determining module 504 for execution; alternatively, it uses the angular position prediction value directly as the angular position of the voice signal at the next moment relative to the microphone array, and returns to the direction determining module 504 for execution.
  • the voice signal processing apparatus of this embodiment further includes: a loss processing module 510, configured to, if tracking the voice signal fails during voice signal tracking, reuse the Kalman filter coefficients from the Kalman filtering of the previous voice signal and perform voice signal tracking again according to the processing result of the Kalman filtering.
  • the loss processing module 510 is configured to, during voice signal tracking, if it is determined from the innovation process of the Kalman filtering that tracking the voice signal has failed, reuse the Kalman filter coefficients from the Kalman filtering of the previous voice signal and perform voice signal tracking again according to the processing result of the Kalman filtering.
  • the voice signal processing apparatus of this embodiment further includes: a search module 512, configured to perform a voice signal search on the audio signal received by the microphone array before the angle obtaining module 502 acquires the angular position of the voice signal relative to the microphone array;
  • a capture module 514, configured to perform voice capture on the found voice signal and determine an initial angular position of the voice signal relative to the microphone array according to the result of the voice capture; and an initial tracking module 516, configured to determine, according to the initial angular position, whether to perform voice signal tracking.
  • the initial tracking module 516 is configured to determine a direction cosine vector of the captured voice signal according to the initial angular position; use the direction cosine vector as the observation vector of the Kalman filter and perform Kalman filtering on the captured voice signal; and obtain the innovation process of the Kalman filtering and, if the innovation process is less than or equal to the set threshold, determine to perform voice signal tracking.
  • the capturing module 514 is configured to perform false alarm detection on the searched voice signal, and if no false alarm occurs, perform voice capture on the searched voice signal, and determine a voice signal relative to the microphone according to the result of the voice capture. The initial angular position of the array.
  • the search module 512 includes: a guiding module 5122, configured to determine a center wave phase of a search area corresponding to the audio signal received by the microphone array according to the guiding information; and a processing module 5124, configured to perform a voice signal search from the center wave phase .
  • the processing module 5124 is configured to, starting from the center wave phase, perform beam energy detection on the center beam corresponding to the center wave phase; if it is determined from the detection result that a voice signal has been found, end the voice signal search; if it is determined from the detection result that no voice signal has been found, shift the beam position of the center beam and continue beam energy detection on the shifted beam.
  • the processing module 5124 performs beam energy detection by: obtaining, for each sub-array of the microphone array, a correlation between a voice signal of the current sub-array and a voice signal of all sub-arrays under the current beam; according to the correlation Obtaining a beam energy corresponding to the current beam; determining whether to search for a voice signal in the current beam according to the relationship between the beam energy and the set noise threshold.
  • the voice signal processing apparatus of the present embodiment is used to implement the corresponding voice signal processing method in the foregoing multiple method embodiments, and has the beneficial effects of the corresponding method embodiments, and details are not described herein again.
  • Referring to FIG. 7, there is shown a structural diagram of a voice signal processing system according to a sixth embodiment of the present invention.
  • the voice signal processing system of this embodiment includes: a microphone array 602, a preamplifier 604, a band pass filter 606, an analog to digital conversion module 608, an audio signal processing module 610, a noise cancellation module 612, a voice output module 614, and a beam control module 616.
  • the microphone array 602 is divided into four sub-arrays for receiving original sound from the environment and converting into four analog sound signals. Since the positions of the four sub-arrays are different, there is a difference in the time between the sound signals reaching each sub-array, so there is a phase difference in the four-way sound signals.
  • the preamplifier 604, the bandpass filter 606, and the analog to digital conversion module 608 are pre-processing processes of the sound signal, and convert the obtained four analog sound signals into four digital sound signals containing phase information.
  • the preamplifier 604 is used to amplify the analog sound signal
  • the bandpass filter 606 is used to filter the amplified analog sound signal
  • the analog to digital conversion module 608 is configured to convert the filtered analog sound signal into a digital sound signal.
  • the audio signal processing module 610 includes: a signal pre-processing module 6102, a voice search/capture module 6104, a voice positioning module 6106, and a tracking filtering module 6108.
  • the signal pre-processing module 6102 is configured to receive four digital audio signals from the analog-to-digital conversion module 608, and perform data buffering, signal pre-emphasis, and windowing processing.
  • the voice search/capture module 6104 is configured to implement the functions of the search module 512 in the fifth embodiment; the voice location module 6106 is configured to implement the function of the capture module 514 in the fifth embodiment; and the tracking filter module 6108 is used to implement the initial in the fifth embodiment.
  • for the specific implementation of the functions of the voice search/capture module 6104, the voice location module 6106, and the tracking filter module 6108, reference may be made to the description of the related parts in the fifth embodiment and the foregoing method embodiments; details are not repeated here.
  • the tracking filter module 6108 is connected to the beam control module 616, and outputs the obtained angular position prediction value of the voice signal to the beam control module 616.
  • the beam control module 616 steers the beam of the microphone array 602 so that it automatically aligns with the voice signal at the next moment.
  • the noise cancellation module 612 enhances the voice signal processed by the signal preprocessing module 6102 using a single-channel speech enhancement method, and transmits the enhanced voice signal to the voice output module 614 for output.
  • the noise cancellation module 612 adopts a single-channel speech enhancement method. For each frame of the voice signal, the time-domain signal is first transformed into the frequency domain and the noise is roughly estimated by quantile noise estimation. The a priori signal-to-noise ratio, the a posteriori signal-to-noise ratio, and the noise presence probability are then calculated, and the noise estimate is updated according to the noise probability. Finally, the Wiener filter coefficients are calculated from the a priori SNR of each frame, and the voice signal is Wiener filtered with those coefficients and output.
  • the voice signal processing system of this embodiment can search for the position of the voice signal, then capture the voice signal to reduce the position error, and use sound source localization and tracking filtering to predict the beam pointing for the voice signal at the next moment, so that the microphone array beam tracks the sound source in real time, enhances sound from the direction of the sound source, and suppresses interference and noise from other directions.
  • the amount of computation is small, which facilitates real-time tracking, gives a fast tracking speed, and suits moving scenarios.
  • the voice signal processing solution provided by the embodiment of the present invention has a fast tracking speed and good adaptability, and can be widely applied to various scenarios such as a hearing aid, a mobile terminal, a smart speaker, a video conference, and a mobile robot.
  • the Kalman filter in the embodiment of the present invention takes the standard Kalman filter as an example, but those skilled in the art should understand that other processes similar to the standard Kalman filter are also applicable.
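The last stage of the noise cancellation module described in the list above, computing Wiener filter coefficients from the a priori SNR, can be illustrated with a minimal sketch. It uses the textbook Wiener gain ξ/(1+ξ) per frequency bin; the function names and the idea of passing one a priori SNR per bin are illustrative assumptions, and the quantile noise estimation and SNR estimation steps are not reproduced here.

```python
def wiener_gain(prior_snr):
    """Textbook Wiener gain for one frequency bin.

    prior_snr is the a priori SNR on a linear (not dB) scale.
    """
    if prior_snr < 0:
        raise ValueError("a priori SNR must be non-negative")
    return prior_snr / (1.0 + prior_snr)


def apply_wiener(spectrum, prior_snrs):
    """Scale each complex frequency bin by its Wiener gain (hypothetical helper)."""
    return [wiener_gain(xi) * s for s, xi in zip(spectrum, prior_snrs)]
```

A bin with an a priori SNR of 1 is attenuated by half, while bins dominated by speech (large SNR) pass almost unchanged.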

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Signal Processing (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

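Embodiments of the present invention provide a voice signal processing method and apparatus. The voice signal processing method includes: acquiring an angular position of a voice signal relative to a microphone array, where the angular position includes an azimuth and a pitch angle of the voice signal relative to the microphone array; determining, according to the angular position, a direction vector of the sound source direction of the voice signal; performing Kalman filtering on the voice signal according to the direction vector; and performing voice signal tracking according to the result of the Kalman filtering. When applied to the fast processing of voice signals in moving scenarios, the voice signal processing solution provided by the embodiments of the present invention achieves a better processing effect.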

Description

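Voice Signal Processing Method and Apparatus
Technical Field
Embodiments of the present invention relate to the field of computer technologies, and in particular, to a voice signal processing method and apparatus.
Background
With the rapid development of artificial intelligence technology, voice signal processing, as an important part of human-computer interaction research, has become a research focus for technology giants at home and abroad.
In various voice interaction devices (such as digital hearing aids, multimedia systems, and mobile robots), the physical position of the sound source changes as it moves, so the beam of the microphone array deviates from the sound source and noise reduction performance drops. To achieve the best voice quality in real time, the beam of the microphone array must stay aimed at the target sound source while voice is being received, weakening the influence of non-target sound sources such as the speech of non-target speakers and background noise. To this end, some solutions have been applied to voice signal processing, such as moving-image tracking, or combining a localization algorithm based on high-resolution spectral estimation or time-delay estimation with a particle-filter tracking algorithm.
However, because these algorithms converge slowly or have high computational complexity, such solutions cannot keep up with the fast processing of voice signals in moving scenarios, such as fast localization and tracking, so voice signal processing in moving scenarios performs poorly.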
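Summary
Embodiments of the present invention provide a voice signal processing method and apparatus, to solve the problem that prior-art voice signal processing solutions perform poorly when applied to the fast processing of voice signals in moving scenarios.
According to one aspect of the embodiments of the present invention, a voice signal processing method is provided, including: acquiring an angular position of a voice signal relative to a microphone array, where the angular position includes an azimuth and a pitch angle of the voice signal relative to the microphone array; determining, according to the angular position, a direction vector of the sound source direction of the voice signal; performing Kalman filtering on the voice signal according to the direction vector; and performing voice signal tracking according to the result of the Kalman filtering.
According to another aspect of the embodiments of the present invention, a voice signal processing apparatus is provided, including: an angle obtaining module, configured to acquire an angular position of a voice signal relative to a microphone array, where the angular position includes an azimuth and a pitch angle of the voice signal relative to the microphone array; a direction determining module, configured to determine, according to the angular position, a direction vector of the sound source direction of the voice signal; a filtering module, configured to perform Kalman filtering on the voice signal according to the direction vector; and a tracking module, configured to perform voice signal tracking according to the result of the Kalman filtering.
With the solution provided by the embodiments of the present invention, Kalman filtering is performed on the voice signal according to the angular position of the voice signal relative to the microphone array, and voice signal tracking is then performed according to the result of the Kalman filtering. Each time it filters, the Kalman filter computes the current estimate only from the previous filtering result and the deviation, without processing any other data, so it runs fast. Kalman filtering is a linear filter that requires a state vector generated from the position and velocity information of the filtered object, but the position and velocity information of the voice signal received by the microphone array cannot satisfy the linearity requirement of Kalman filtering. Therefore, in the embodiments of the present invention, the angular position of the voice signal is converted into a direction vector of the sound source direction that does satisfy the linearity requirement, and Kalman filtering is then performed to obtain the estimated position of the voice signal at the next moment in a moving scenario for voice tracking.
It can be seen that when the voice signal processing solution provided by the embodiments of the present invention is applied to the fast processing of voice signals in moving scenarios, a better processing effect can be obtained.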
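Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present invention, and a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the steps of a voice signal processing method according to Embodiment 1 of the present invention;
FIG. 2 is a schematic diagram of the angular position of a voice signal relative to a microphone array in the embodiment shown in FIG. 1;
FIG. 3 is a flowchart of the steps of a voice signal processing method according to Embodiment 2 of the present invention;
FIG. 4 is a flowchart of the steps of a voice signal processing method according to Embodiment 3 of the present invention;
FIG. 5 is a structural block diagram of a voice signal processing apparatus according to Embodiment 4 of the present invention;
FIG. 6 is a structural block diagram of a voice signal processing apparatus according to Embodiment 5 of the present invention;
FIG. 7 is a structural diagram of a voice signal processing system according to Embodiment 6 of the present invention.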
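Detailed Description
To make the objectives, features, and advantages of the embodiments of the present invention clearer and easier to understand, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the embodiments of the present invention.
Embodiment 1
Referring to FIG. 1, a flowchart of the steps of a voice signal processing method according to Embodiment 1 of the present invention is shown.
Step S102: Acquire the angular position of the voice signal relative to the microphone array.
The angular position includes the azimuth and the pitch angle of the voice signal relative to the microphone array.
A microphone array is an array of a certain number of acoustic sensors, usually microphones, used to sample and process voice signals arriving from different directions in space. In voice communication, the characteristics of a voice signal appear mainly in the time domain and the frequency domain; a microphone array adds a spatial domain to these two and performs space-time processing on the voice signals received from different spatial directions. The microphone array receives the original analog voice signals and, after processing such as weighting, delaying, and summing, forms a spatially directional beam, that is, the beam of the microphone array. In the embodiments of the present invention, the angular position of the voice signal relative to the microphone array can be understood as the pointing direction of the beam of the microphone array.
Microphone arrays come in several array topologies, such as uniform linear arrays, uniform planar arrays, uniform circular arrays, and arbitrary discrete arrays. In the embodiments of the present invention, the microphone array may use a uniform planar array or uniform circular array topology.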
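Based on such a structure, voice signals from different directions have an azimuth and a pitch angle relative to the microphone array. As shown in FIG. 2, in the three-dimensional coordinate system XYZ, the Z axis is set as the normal direction of the microphone array and the XOY plane as the plane in which the microphone array lies. The angle θ between the direction of arrival of the voice signal (that is, the sound source direction) and the normal direction of the microphone array is the pitch angle of the voice signal relative to the microphone array, and the angle between the projection of the direction of arrival onto the plane of the microphone array and the X axis,
Figure PCTCN2018078505-appb-000001
is the azimuth of the voice signal relative to the microphone array.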
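Step S104: Determine, according to the angular position, the direction vector of the sound source direction of the voice signal.
Once the azimuth and pitch angle of the voice signal relative to the microphone array have been acquired, the sound source direction of the voice signal relative to the microphone array is determined. To facilitate the subsequent Kalman filtering, this step represents the sound source direction by a direction vector. The direction vector may take any suitable form, including but not limited to a direction cosine vector.
Step S106: Perform Kalman filtering on the voice signal according to the direction vector.
Kalman filtering is a linear filter that requires a state vector generated from the position and velocity information of the filtered object, but the angular position and velocity of the voice signal received by the microphone array cannot satisfy the linearity requirement of Kalman filtering. The angular position of the voice signal therefore has to be converted into a direction vector of the sound source direction that does satisfy the linearity requirement before Kalman filtering is performed.
Step S108: Perform voice signal tracking according to the result of the Kalman filtering.
Kalman filtering yields the estimated position of the voice signal at the next moment in a moving scenario, which is used for voice signal tracking. The Kalman filter predicts the position of the voice signal at the next moment, where the next moment is determined by the tracking period of the voice signal. The tracking period can be set appropriately by a person skilled in the art according to actual conditions, as long as it preserves the short-time stationarity of the voice signal; for example, it can be set to 10 ms (milliseconds).
With this embodiment, Kalman filtering is performed on the voice signal according to the angular position of the voice signal relative to the microphone array, and voice signal tracking is then performed according to the result of the Kalman filtering. Each time it filters, the Kalman filter computes the current estimate only from the previous filtering result and the deviation, without processing any other data, so it runs fast. Kalman filtering is a linear filter that requires a state vector generated from the position and velocity information of the filtered object, but the angular position and velocity information of the voice signal received by the microphone array cannot satisfy the linearity requirement of Kalman filtering. Therefore, in the embodiments of the present invention, the angular position of the voice signal is converted into a direction vector of the sound source direction that does satisfy the linearity requirement, and Kalman filtering is then performed to obtain the estimated position of the voice signal at the next moment in a moving scenario for voice tracking.
It can be seen that when the voice signal processing solution provided by this embodiment is applied to the fast processing of voice signals in moving scenarios, a better processing effect can be obtained.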
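Embodiment 2
Referring to FIG. 3, a flowchart of the steps of a voice signal processing method according to Embodiment 2 of the present invention is shown.
The voice signal processing method of this embodiment includes the following steps:
Step S202: Perform a voice signal search on the audio signal received by the microphone array.
A microphone array usually consists of multiple sub-arrays. This embodiment takes a microphone array composed of four sub-arrays as an example to describe the voice signal processing method provided by the embodiments of the present invention; microphone arrays with other numbers of sub-arrays can be implemented with reference to this embodiment. As described in Embodiment 1, the microphone array in this embodiment may use a uniform planar array or uniform circular array topology.
The audio signal received by the microphone array may or may not contain a voice signal (for example, it may be pure background noise), and a voice signal must be found before it can be processed or tracked. Initially, the center wave phase of the search area corresponding to the audio signal received by the microphone array can be determined from guidance information, and the voice signal search starts from the center wave phase. The guidance information is the information that the device hosting the microphone array uses to determine the initial beam pointing toward the sound source; it usually contains the approximate spatial position of the sound source. The search area corresponding to the received audio signal is the region formed by the microphone array beams for different signals. In other words, the center wave phase is initially given by the guidance information. During the voice search, the center beam corresponding to the center wave phase is searched first. If a voice signal is found, the search ends; if not, the next beam to search is determined, which can be obtained by shifting the beam position of the center beam. The center beam can be shifted in several directions, such as up, down, left, and right, and the next beam can be chosen at random, that is, by randomly deciding whether to shift left or right, up or down.
In one feasible approach, the voice signal search is implemented by beam energy detection: starting from the center wave phase, beam energy detection is performed on the center beam corresponding to the center wave phase; if the detection result shows that a voice signal has been found, the voice signal search ends; if the detection result shows that no voice signal has been found, the beam position of the center beam is shifted and beam energy detection continues on the shifted beam.
Beam energy detection includes: for each sub-array of the microphone array, obtaining, under the current beam, the correlation between the voice signal of the current sub-array and the voice signals of all sub-arrays; obtaining the beam energy of the current beam from the correlations; and determining whether a voice signal has been found in the current beam from the relationship between the beam energy and a set noise threshold. The noise threshold can be set appropriately by a person skilled in the art according to actual requirements, for example from the beam energy when there is no speech and only background noise. Because the noise received by the individual sub-arrays is mutually uncorrelated, cross-correlating the sub-arrays with one another raises the signal-to-noise ratio of the received voice signal and makes voice signal detection more effective.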
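Step S204: Perform voice capture on the found voice signal, and determine the initial angular position of the voice signal relative to the microphone array from the result of the voice capture.
Voice capture on the found voice signal can be implemented with any suitable sound source localization algorithm; this embodiment uses a sound source localization algorithm based on steerable beamforming. The algorithm computes the angle cosine of the sound source from the fact that, within a certain range, the angle cosine of the sound source position is linearly related to the ratio of beam amplitude differences of the microphone array, and thereby obtains the position of the sound source. Other sound source localization algorithms are equally applicable, such as those based on time-delay estimation, those based on high-resolution spectral estimation, and SRP-PHAT (steered response power with phase transform). Voice capture yields the initial angular position of the voice signal relative to the microphone array.
To ensure that voice capture is valid, false alarm detection may optionally be performed on the found voice signal first, with voice capture carried out only if no false alarm has occurred. Noise is always present, and when the amplitude of a noise signal exceeds the detection threshold, the detection system mistakenly concludes that a target has been found; this error is called a "false alarm". False alarm detection further confirms the validity of the found voice signal and improves the validity and accuracy of the subsequent voice capture.
After voice capture, the initial angular position of the captured voice signal relative to the microphone array is available, that is, the azimuth and pitch angle of the captured voice signal relative to the microphone array.
Step S206: Determine, according to the initial angular position, whether to perform voice signal tracking; if so, perform step S208; if not, return to step S204.
After the voice signal has been captured and its initial angular position determined, one feasible approach is to use the initial angular position directly: convert it into a direction vector of the sound source direction and track the voice signal with Kalman filtering. Optionally, the initial angular position can instead be used to decide whether to enter the voice signal tracking flow, which is entered only once tracking is determined to be possible. For example: determine the direction cosine vector of the captured voice signal from the initial angular position; use that direction cosine vector as the observation vector of the Kalman filter and perform Kalman filtering on the captured voice signal; obtain the innovation process of the Kalman filtering and, if it is less than or equal to a set threshold, determine to perform voice signal tracking. The innovation process is the difference between the value predicted by the Kalman filter and the actual measurement, and the threshold can be set appropriately by a person skilled in the art according to actual requirements, for example from the beamwidth of the microphone array. In this embodiment, the angle-cosine residual of the azimuth and the angle-cosine residual of the pitch angle of the captured voice signal after Kalman filtering can be compared with the set threshold, which may be 1/6 of the beamwidth of the microphone array. When both residuals are below the threshold, voice signal tracking is performed; otherwise the flow can return to the voice capture process above. This further ensures the validity and accuracy of voice signal tracking.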
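The decision of step S206 can be sketched as a small gate on the two angle-cosine residuals. This is only an illustration: the function name and arguments are hypothetical, the residuals and the beamwidth are assumed to be expressed in the same unit, and the 1/6 fraction is the example value given in the text.

```python
def should_start_tracking(residual_azimuth, residual_pitch, beamwidth,
                          fraction=1.0 / 6.0):
    """Return True when both residuals are below the innovation threshold.

    The threshold is a fraction of the array beamwidth (1/6 in the text's
    example); all three quantities are assumed to share one unit.
    """
    threshold = fraction * beamwidth
    return abs(residual_azimuth) < threshold and abs(residual_pitch) < threshold
```

If the gate returns False, the flow would return to voice capture (step S204) rather than entering the tracking loop.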
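Step S208: Determine the beam pointing of the microphone array from the initial angular position, perform tracking and localization of the voice signal according to the beam pointing, and obtain from the tracking and localization result the angular position of the voice signal relative to the microphone array at the next moment.
During voice signal tracking, the angular position of the voice signal relative to the microphone array is initially the initial angular position, and later angular positions are determined from the result of the Kalman filtering (tracking and localization). That is, the initial beam pointing of the microphone array is determined by the initial angular position obtained in the capture process, and later beam pointings are determined by the angular positions predicted by the Kalman filtering. The next moment is determined by the tracking period, which can be set appropriately by a person skilled in the art according to actual conditions, as long as it preserves the short-time stationarity of the voice signal; for example, it can be set to 10 ms (milliseconds). For instance, if tracking of the voice signal starts at 0 min 0 s 0 ms, the initial angular position determines the current beam pointing at that instant, and the Kalman filtering of the embodiments of the present invention is applied to the initial angular position to obtain the angular position at 0 min 0 s 10 ms.
Step S210: Determine, according to the angular position at the next moment, the direction vector of the sound source direction of the voice signal.
In this embodiment, the direction cosine vector of the voice signal is determined from the angular position of the voice signal relative to the microphone array at the next moment obtained in step S208, and that direction cosine vector is taken as the direction vector of the sound source direction of the voice signal. Other direction vectors are equally applicable, such as a direction sine vector or a similar direction vector based on another coordinate system.
When the pitch angle of the voice signal relative to the microphone array is θ and the azimuth is
Figure PCTCN2018078505-appb-000002
the direction cosine vector of the voice signal can be expressed as:
Figure PCTCN2018078505-appb-000003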
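As a minimal sketch of the conversion between the angular position and the direction cosine vector, assume the geometry of FIG. 2: the pitch angle θ is measured from the Z axis (the array normal) and the azimuth, written `phi` here, is measured from the X axis in the array plane. The Z component cos θ matches the description; the X and Y components follow the usual spherical-coordinate convention and are an assumption, since the patent's own expression is only available as an image. Function names are illustrative.

```python
import math


def angles_to_direction_cosines(theta, phi):
    """Pitch angle theta (from the Z axis) and azimuth phi (from the X axis
    in the XOY plane), both in radians, to a unit direction cosine vector."""
    s = math.sin(theta)
    return (s * math.cos(phi), s * math.sin(phi), math.cos(theta))


def direction_cosines_to_angles(x, y, z):
    """Inverse conversion; the input is normalized first so that a slightly
    non-unit vector (for example a filter prediction) is still accepted."""
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("zero vector has no direction")
    theta = math.acos(max(-1.0, min(1.0, z / norm)))
    phi = math.atan2(y, x)
    return theta, phi
```

The inverse function is what a tracker would use to turn a predicted direction cosine vector back into a beam pointing.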
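Step S212: Perform Kalman filtering on the voice signal according to the direction vector.
After the direction cosine vector of the voice signal is obtained, it can be used as the observation vector of the Kalman filter, and Kalman filtering is performed on the voice signal according to that observation vector.
Kalman filtering estimates the state of a process by feedback control: it first estimates the state of the process at some moment, and then obtains feedback in the form of noisy measurements. The Kalman filter has two parts: a state model and an observation model. The state model describes how the state evolves, using a state equation to express the state transition between adjacent moments; the observation model describes the relationship between the actual observations and the state variables. From these two parts, the Kalman filter obtains the optimal estimate of the state of the filtered object. The embodiments of the present invention use the standard Kalman filter, whose processing includes: establishing the state model (state equation) and the observation model (observation equation); setting the parameters of the state model and the observation model; using the state model to predict the state at time n from the state at time n-1; using the observation model to estimate the system prediction error at time n from the system prediction error at time n-1; computing the innovation process of the Kalman filter; computing the optimal estimate of the system from the predicted state at time n and the innovation process; and computing the system prediction error at the current moment.
In this step specifically, after the direction cosine vector of the voice signal is obtained, the state vector, state equation, and observation equation of the Kalman filter can be determined from the direction cosine vector and the rate of change of the direction cosines. In the state prediction, the state vector of the voice signal at the next moment is predicted from the state equation; in the observation, the innovation process of the direction cosines of the voice signal is obtained through the observation equation. From the predicted state vector and the innovation process, the optimal position estimate of the next voice signal, that is, the optimal direction cosine vector, is obtained. From this optimal direction cosine vector, the angular position of the voice signal relative to the microphone array at the next moment can be determined.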
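For reference, the processing steps listed above correspond to the standard Kalman recursion, written here with the state-equation and observation-equation symbols used in Embodiment 3 ($F$, $\Gamma$, $C$, $v_1$, $v_2$). The symbols $P$ (error covariance), $K$ (filter gain), $\nu$ (innovation), and $Q_1$, $Q_2$ (covariances of $v_1$ and $v_2$) are notation assumed for this sketch and do not appear in the text.

```latex
\begin{aligned}
\hat{x}(n\mid n-1) &= F(n,n-1)\,\hat{x}(n-1\mid n-1) \\
P(n\mid n-1) &= F(n,n-1)\,P(n-1\mid n-1)\,F^{T}(n,n-1) + \Gamma(n,n-1)\,Q_1\,\Gamma^{T}(n,n-1) \\
\nu(n) &= z(n) - C(n)\,\hat{x}(n\mid n-1) \\
K(n) &= P(n\mid n-1)\,C^{T}(n)\,\bigl[C(n)\,P(n\mid n-1)\,C^{T}(n) + Q_2\bigr]^{-1} \\
\hat{x}(n\mid n) &= \hat{x}(n\mid n-1) + K(n)\,\nu(n) \\
P(n\mid n) &= \bigl[I - K(n)\,C(n)\bigr]\,P(n\mid n-1)
\end{aligned}
```

The first two lines are the prediction from time n-1 to time n, the third is the innovation process, and the last three give the gain, the corrected (optimal) estimate, and the updated error covariance.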
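Step S214: Perform voice signal tracking according to the result of the Kalman filtering.
In one feasible approach, based on the direction cosine vector of the voice signal, the angular position prediction value of the voice signal relative to the microphone array is obtained from the direction cosine prediction vector of the voice signal at the next moment produced by the Kalman filtering, and voice signal tracking is performed according to the angular position prediction value. That is, the beam pointing of the microphone array can be adjusted automatically according to the angular position prediction value to obtain the predicted beam pointing of the microphone array at the next moment, which completes the voice signal tracking.
When tracking according to the angular position prediction value, one feasible approach is to use the angular position prediction value as the beam pointing for the voice signal received by the microphone array at the next moment and return to step S208. In this approach, the voice signal is localized again around the predicted angular position at the next moment, which corrects the angular position obtained by the Kalman filtering, and the corrected, more precise angular position is used for subsequent voice tracking. This makes voice tracking more precise and efficient.
In another feasible approach, the angular position prediction value is used directly as the angular position of the voice signal relative to the microphone array at the next moment, and the flow returns to step S210. In this approach, the predicted angular position at the next moment is used directly for voice signal tracking, which makes tracking faster.
In addition, the voice signal may flicker or be lost during tracking. This can be handled as follows: during voice signal tracking, if tracking the voice signal according to its angular position relative to the microphone array at the next moment fails (for example, the deviation between the angular position predicted by the Kalman filtering and the actual angular position exceeds a set value, where the set value is chosen by a person skilled in the art according to actual requirements), the Kalman filter coefficients from the Kalman filtering of the previous voice signal are reused, and voice signal tracking is performed again according to the result of the Kalman filtering. That is, when the voice signal flickers or is lost, the Kalman filter coefficients of the previous voice signal are kept unchanged and tracking is repeated based on the prediction for the previous voice signal. If the voice signal still cannot be tracked after this has been done N times, the voice signal is determined to be lost; otherwise, the voice signal is considered to have flickered. N can be set appropriately by a person skilled in the art according to actual needs, and the embodiments of the present invention do not limit it.
Optionally, tracking failure can be determined from the innovation process of the Kalman filtering. For example, when the obtained innovation process exceeds a set threshold, the current position of the voice signal is judged to be an outlier; when outliers occur N times in a row, the voice signal is determined to be lost, voice tracking is interrupted, and the flow switches to the voice signal search, restarting from step S202.
With this embodiment, Kalman filtering is performed on the voice signal according to the angular position of the voice signal relative to the microphone array, and voice signal tracking is then performed according to the result of the Kalman filtering. Each time it filters, the Kalman filter computes the current estimate only from the previous filtering result and the deviation, without processing any other data, so it runs fast. Kalman filtering is a linear filter that requires a state vector generated from the position and velocity information of the filtered object, but the position and velocity information of the voice signal received by the microphone array cannot satisfy the linearity requirement of Kalman filtering. Therefore, in the embodiments of the present invention, the angular position of the voice signal is converted into a direction vector of the sound source direction that does satisfy the linearity requirement, and Kalman filtering is then performed to obtain the estimated position of the voice signal at the next moment in a moving scenario for voice tracking.
It can be seen that when the voice signal processing solution provided by this embodiment is applied to the fast processing of voice signals in moving scenarios, a better processing effect can be obtained.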
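Embodiment 3
Referring to FIG. 4, a flowchart of the steps of a voice signal processing method according to Embodiment 3 of the present invention is shown.
This embodiment describes the voice signal processing solution provided by the present invention through a concrete example.
The voice signal processing method of this embodiment includes the following steps:
Step S302: Preprocess the audio signal received by the microphone array.
In this embodiment, the microphone array is divided into four sub-arrays, which receive the original sound from a noisy environment and convert it into four analog audio signals. Because the four sub-arrays are at different positions (for example, the conventional sub-array positions of a uniform planar array, or the top, bottom, left, and right positions of a uniform circular array), the sound arrives at each sub-array at a different time, so the four analog audio signals have phase differences.
The four analog audio signals are converted into digital audio signals; for example, a preamplifier, a band-pass filter, and an analog-to-digital converter turn the four analog audio signals into four digital audio signals that carry phase information. Data buffering, signal pre-emphasis, and windowing are then applied to the four digital audio signals.
Preprocessing the audio signal improves the efficiency of the subsequent voice signal search, localization, and tracking.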
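The pre-emphasis and windowing mentioned in step S302 can be sketched for a single frame of one digital channel as follows. The text names the operations but not their parameters, so the first-order pre-emphasis coefficient 0.97 and the Hamming window are common choices assumed here for illustration, and the function names are hypothetical.

```python
import math


def pre_emphasis(samples, coeff=0.97):
    """First-order high-pass: out[n] = x[n] - coeff * x[n-1] (coeff assumed)."""
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - coeff * samples[n - 1])
    return out


def hamming_window(length):
    """Hamming window of the given length (window type is an assumption)."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]


def preprocess_frame(samples, coeff=0.97):
    """Pre-emphasize one buffered frame, then apply the window."""
    emphasized = pre_emphasis(samples, coeff)
    window = hamming_window(len(emphasized))
    return [s * w for s, w in zip(emphasized, window)]
```

In a four-channel setup, the same function would be applied to each sub-array's frame so that the relative phase between channels is preserved.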
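Step S304: Search for the voice signal.
To search for a voice signal, the center position of the current search, that is, the center wave phase, must be determined first; the voice search then proceeds over different beams around that center in a given order of wave phases.
Specifically, when the device hosting the microphone array starts for the first time, it initializes according to the guidance information, and the center wave phase of the search is given by the guidance information. In this embodiment, five beams are set based on the center beam corresponding to the center wave phase: the center beam itself, and the center beam shifted left, right, up, and down by half a beamwidth. This embodiment shifts the beam position in units of half a beamwidth, but is not limited thereto; in practice, a person skilled in the art may shift the center beam by other suitable units to obtain different beams.
With the five beams set, the voice search starts from the center beam corresponding to the center wave phase. If no voice signal is found in that beam, another beam is selected from the other four (for example, chosen at random or in clockwise order; the embodiments of the present invention do not limit the selection order) for the voice search. If a voice signal is found in any beam, the voice search ends and the flow proceeds to step S306 to capture the voice signal. If no voice signal is found in any of the five beams, the guidance information is re-acquired, and the next center wave phase to be searched is adjusted and determined from the re-acquired guidance information. If the re-acquired guidance information has been updated, the next center wave phase to be searched is determined from the new guidance information; if it has not been updated, the center wave phase can be adjusted automatically to widen the search range.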
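The five-beam search order described above can be sketched as follows. The beam pointing is represented as a pair of coordinates, and `detect` stands in for the per-beam test (beam energy detection in this embodiment); both the representation and the names are illustrative assumptions. The remaining four beams are visited in random order, which is one of the orders the text allows.

```python
import random


def five_beam_positions(center, beamwidth):
    """Center beam plus four beams shifted by half a beamwidth
    (left, right, up, down)."""
    a0, b0 = center
    half = beamwidth / 2.0
    return [(a0, b0), (a0 - half, b0), (a0 + half, b0),
            (a0, b0 + half), (a0, b0 - half)]


def search_speech(center, beamwidth, detect, rng=None):
    """Search the center beam first, then the other four in random order.

    detect(beam) is a hypothetical callback returning True when a voice
    signal is found in that beam. Returns the first beam with a detection,
    or None when all five beams are empty (the caller would then re-acquire
    guidance information or widen the search range).
    """
    beams = five_beam_positions(center, beamwidth)
    others = beams[1:]
    (rng or random).shuffle(others)
    for beam in [beams[0]] + others:
        if detect(beam):
            return beam
    return None
```

Passing a seeded `random.Random` instance as `rng` makes the visiting order reproducible for testing.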
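In the voice search above, the search in each beam can be implemented by beam energy detection on that beam. In this embodiment, beam energy detection for each beam includes the following process:
(1) Calculate, under the current beam pointing, the correlation between the voice signal of each sub-array and the voice signals of all sub-arrays of the microphone array.
Taking sub-array 1 (which may be any one of the four sub-arrays of the microphone array) as an example, the correlation between the voice signals of the four sub-arrays and the voice signal of sub-array 1 is:
Figure PCTCN2018078505-appb-000004
where $i$ is the sub-array index, from 1 to 4 in this embodiment; $N$ is the number of samples of the voice signal in the current voice frame; $y_i(n)$ is the noisy voice signal received by the $i$-th sub-array; and $y_1^*(n)$ is the conjugate of $y_1(n)$. By analogy, when another sub-array is used as the reference, its correlation with the voice signals of the four sub-arrays can be computed with a formula similar to the one above.
The formula above thus yields the correlations $R_{11}$, $R_{12}$, $R_{13}$, and $R_{14}$ between sub-array 1 and the four sub-arrays of the microphone array.
(2) Treat the four correlations as the complex amplitudes of the voice signals of the four sub-arrays, and combine them to obtain the complex amplitude $F_\Sigma$ received by the current beam of the microphone array, that is, the beam energy:
$F_\Sigma = R_{11} + R_{12} + R_{13} + R_{14}$
(3) Take the complex amplitude when there is no speech and only background noise as the noise threshold, that is:
Figure PCTCN2018078505-appb-000005
where $n_i(n)$ is the pure noise signal, containing no voice signal, received by the $i$-th sub-array, and $n_1^*(n)$ is the conjugate of $n_1(n)$. The condition for deciding whether a voice signal has been found in the current beam is then:
Figure PCTCN2018078505-appb-000006
where $k_1$ is an amplification factor with $1 < k_1 \le 2.5$; optionally, $k_1 = 2$.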
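Steps (1) to (3) can be sketched as follows. Because the correlation and decision formulas are only available as images, two details are assumptions: the correlation is normalized by the frame length N, and the decision compares the magnitude of $F_\Sigma$ with $k_1$ times the magnitude of the noise-only value. The function names are illustrative.

```python
def correlation(y_i, y_ref):
    """Correlation of one sub-array frame with the reference frame:
    sum over n of y_i(n) * conj(y_ref(n)), divided by N (assumed scaling)."""
    n = len(y_ref)
    return sum(a * b.conjugate() for a, b in zip(y_i, y_ref)) / n


def beam_energy(subarrays, ref=0):
    """F_sigma: sum of the correlations of every sub-array with the
    reference sub-array (sub-array 1 in the text, index 0 here)."""
    return sum(correlation(y, subarrays[ref]) for y in subarrays)


def speech_detected(subarrays, noise_only, k1=2.0):
    """Declare a detection when |F_sigma| exceeds k1 times the magnitude of
    the noise-only beam energy (comparison on magnitudes is an assumption)."""
    return abs(beam_energy(subarrays)) > k1 * abs(beam_energy(noise_only))
```

Each argument is a list of four frames (one per sub-array) already steered to the current beam; real-valued or complex samples both work.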
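Step S306: Capture the voice signal found by the search.
In this embodiment, the beam direction in which the voice signal was found is first checked once more to determine whether a false alarm occurred during the search. If a false alarm occurred, the flow returns to step S304 to search for the voice signal again. If no false alarm occurred and the voice signal is still detected, the found voice signal is considered valid, and the angle of the voice signal relative to the microphone array,
Figure PCTCN2018078505-appb-000007
is calculated to capture the voice signal, where θ is the pitch angle and
Figure PCTCN2018078505-appb-000008
is the azimuth. In this embodiment, a steerable beamforming algorithm is used for voice capture. The algorithm computes the angle cosine of the sound source from the fact that, within a certain range, the angle cosine of the sound source position is linearly related to the ratio of beam amplitude differences of the microphone array, and thereby obtains the position of the sound source.
Specifically, the direction cosines of the voice signal relative to the microphone array are expressed as:
Figure PCTCN2018078505-appb-000009
Figure PCTCN2018078505-appb-000010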
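In the angle-cosine coordinate system, suppose the center beam in which the voice signal was detected is beam 5 in this example, pointing at $(\alpha_5, \beta_5) = (\alpha_0, \beta_0)$, where $(\alpha_0, \beta_0)$ is the pointing of the beam in which the voice signal was found in step S304. With beam 5 as the center, offsetting by half the beamwidth of that direction to the left and to the right along the $\alpha$ coordinate forms beam 1 and beam 2, whose pointings are
$(\alpha_{1,2}, \beta_{1,2}) = (\alpha_0 \pm \theta_{3\mathrm{dB}}/2,\ \beta_0)$
Similarly, the pointings of beam 3 and beam 4 are:
$(\alpha_{3,4}, \beta_{3,4}) = (\alpha_0,\ \beta_0 \pm \theta_{3\mathrm{dB}}/2)$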
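Synthesizing the received signals for each of the five beam directions gives the complex sum-beam amplitudes $F_{\Sigma 1}$ to $F_{\Sigma 5}$ for the five directions. The error voltages in the $\alpha$ and $\beta$ directions, which give the amplitude differences between the corresponding beams, are obtained from:
Figure PCTCN2018078505-appb-000011
Let $\alpha_t = \alpha - \alpha_0$ and $\beta_t = \beta - \beta_0$ denote the angle-cosine offsets of the voice signal from the center beam pointing. Within a certain range, the angular error signal $u_\alpha$ is approximately linear in $\alpha_t$, and $u_\beta$ in $\beta_t$, that is:
Figure PCTCN2018078505-appb-000012
where the slopes $k_\alpha$ and $k_\beta$ can be obtained by fitting, which then gives the angle cosines of the voice signal direction:
Figure PCTCN2018078505-appb-000013
From the relation
Figure PCTCN2018078505-appb-000014
the angular position of the voice signal can be solved:
Figure PCTCN2018078505-appb-000015
After the angular position of the voice signal
Figure PCTCN2018078505-appb-000016
is obtained, voice signal tracking can be performed with this angular position as the initial angular position.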
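The capture computation can be sketched under explicit assumptions, because the error-voltage and angle-recovery formulas are only available as images. Here the error signal is taken to be the normalized magnitude difference of the two offset sum beams (a common monopulse-style choice, not confirmed by the text), the linear relation is $u \approx k \cdot \text{offset}$ with fitted slopes, and the angle cosines are assumed to be $\alpha = \sin\theta\cos\varphi$ and $\beta = \sin\theta\sin\varphi$. All names are illustrative.

```python
import math


def error_signal(f_plus, f_minus):
    """Normalized magnitude difference between the beam offset in the
    positive direction and the beam offset in the negative direction.
    The exact form of the error voltage is an assumption."""
    total = abs(f_plus) + abs(f_minus)
    if total == 0:
        raise ValueError("both beam amplitudes are zero")
    return (abs(f_plus) - abs(f_minus)) / total


def estimate_angle_cosines(center, sums, k_alpha, k_beta):
    """Invert the linear relation u = k * offset around the center beam.

    center: (alpha_0, beta_0); sums: sum-beam amplitudes in the order
    (alpha plus, alpha minus, beta plus, beta minus); k_alpha and k_beta
    are the fitted slopes.
    """
    a0, b0 = center
    fa_plus, fa_minus, fb_plus, fb_minus = sums
    alpha = a0 + error_signal(fa_plus, fa_minus) / k_alpha
    beta = b0 + error_signal(fb_plus, fb_minus) / k_beta
    return alpha, beta


def angle_cosines_to_angles(alpha, beta):
    """Recover (pitch angle, azimuth) in radians, assuming
    alpha = sin(theta)cos(phi) and beta = sin(theta)sin(phi)."""
    r = math.hypot(alpha, beta)
    if r > 1.0:
        raise ValueError("angle cosines outside the unit circle")
    return math.asin(r), math.atan2(beta, alpha)
```

The function arguments are named by offset direction rather than by beam number, since which numbered beam lies on which side cannot be recovered from the text.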
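Optionally, when deciding whether to track based on the acquired voice signal, the angle-cosine residuals $|\alpha_t|$ and $|\beta_t|$ of the voice signal can be compared against an innovation-process threshold, which may be taken as 1/6 of the beamwidth. When the angle-cosine residuals are below the threshold, the voice signal is tracked; otherwise, the found voice signal is captured again. Deciding on the residuals ensures the accuracy of the acquired angular position of the voice signal.
Once the system has entered the stable tracking process, the capture process is no longer needed, and the voice signal is localized instead. This embodiment uses the SRP-PHAT (steered response power with phase transform) sound source localization algorithm to obtain the angle of the voice signal relative to the microphone array. SRP-PHAT combines the inherent robustness and short-time analysis characteristics of the steered response power method with the insensitivity of the phase transform method, as used in time-delay estimation, to the signal's surroundings, and is therefore resistant to noise and reverberation and robust.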
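Step S308: track the voice signal.
During voice signal tracking, the angular position of the first tracked voice signal is given by the angular position obtained in step S306. After one round of tracking (Kalman filtering) is completed, the angular position of the voice signal needed at the next time instant is given by the tracking result (the Kalman filtering result). That is, the beam pointing of the microphone array at the next time instant is determined from the angular position obtained in step S306, the direction cosine vector of the voice signal is then obtained again by the sound source localization algorithm, and this process is repeated.
Specifically, after voice capture in step S306 is completed, the system enters stable tracking. In this process, the angular position of the voice signal
Figure PCTCN2018078505-appb-000017
is converted into the direction cosines $[X_c\ Y_c\ Z_c]^T$, and Kalman filtering is performed with $[X_c\ Y_c\ Z_c]^T$ as the observation vector, giving the predicted direction cosines $[X'_c\ Y'_c\ Z'_c]^T$ of the voice signal at the next time instant, which are then converted into the angular position of the voice signal
Figure PCTCN2018078505-appb-000018
and the voice signal is localized and tracked at the next time instant according to this angular position.
Conventional Kalman filtering uses a state vector composed of the position information and velocity information of the voice signal:
Figure PCTCN2018078505-appb-000019
However, the relation between the angular position and velocity of the voice signal and the observations clearly cannot satisfy the linearity requirement of Kalman filtering. For this reason, the embodiments of the present invention use the direction cosines $[X_c\ Y_c\ Z_c]^T$ of the voice signal as the observation vector in the Kalman filter, with value: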
Figure PCTCN2018078505-appb-000020
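The conversion between the angular position and the direction cosine vector can be sketched as follows. The text gives $z_c = \cos\theta$; the forms $x_c = \sin\theta\cos\varphi$ and $y_c = \sin\theta\sin\varphi$ are the usual spherical convention and are assumed here, with $\varphi$ standing for the azimuth angle. The function names are illustrative.

```python
import math


def direction_cosines(theta, phi):
    """Unit vector [X_c, Y_c, Z_c] for pitch angle theta and azimuth phi (radians)."""
    return (
        math.sin(theta) * math.cos(phi),
        math.sin(theta) * math.sin(phi),
        math.cos(theta),
    )


def angles_from_cosines(x, y, z):
    """Recover (theta, phi) from a predicted vector that may not be exactly unit length."""
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("a zero vector has no direction")
    theta = math.acos(max(-1.0, min(1.0, z / norm)))
    phi = math.atan2(y, x)
    return theta, phi
```

Normalizing before taking the arccosine is a design choice of this sketch: a Kalman prediction of the three components is not constrained to have unit length.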
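Based on this observation vector, the Kalman filtering of the voice signal proceeds as follows:
(1) Set up the equations and parameters of the Kalman filter.
This includes: denoting the direction cosines of the voice signal along the three coordinate axes at time $n$ as
Figure PCTCN2018078505-appb-000021
and $z_c(n) = \cos\theta$, with rates of change
Figure PCTCN2018078505-appb-000022
Figure PCTCN2018078505-appb-000023
Figure PCTCN2018078505-appb-000024
so that the state variable of the Kalman filter is written as
Figure PCTCN2018078505-appb-000025
Let $T$ be the tracking period. When $T$ is small, the state equation of the Kalman filter is:
$x(n) = F(n, n-1)\,x(n-1) + \Gamma(n, n-1)\,v_1(n-1)$
where $x(n)$ is the state at time $n$; $F(n, n-1)$ is the state transition matrix from time $n-1$ to time $n$; $\Gamma(n, n-1)$ is the system input matrix (the system state-noise input matrix) from time $n-1$ to time $n$; $v_1(n-1)$ is the noise at time $n-1$; and $x(n-1)$ is the state at time $n-1$. $T$ can be set appropriately by a person skilled in the art according to the actual situation, provided the voice signal remains short-time stationary; for example, $T$ may be set to 10 ms.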
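One plausible concrete form of this state equation is the constant-velocity model below. The ordering of the state components and the form of $\Gamma$ are assumptions of this sketch (the state vector itself appears only in a figure not reproduced here); $v_1$ is then a three-component acceleration-like noise, and $\otimes$ denotes the Kronecker product.

```latex
x(n) = \begin{bmatrix} x_c & \dot{x}_c & y_c & \dot{y}_c & z_c & \dot{z}_c \end{bmatrix}^{T},
\qquad
F(n, n-1) = I_3 \otimes \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix},
\qquad
\Gamma(n, n-1) = I_3 \otimes \begin{bmatrix} T^{2}/2 \\ T \end{bmatrix}
```

Under this model each direction cosine advances by $T$ times its rate of change per tracking period, which is a reasonable approximation only while $T$ is small, as the text requires.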
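Further, from the measured direction-cosine information of the voice signal, the observation equation of the Kalman filter is:
$z(n) = C(n)\,x(n) + v_2(n)$
where $z(n)$ is the direction cosine vector of the voice signal at time $n$; $C(n)$ is the observation matrix at time $n$; $v_2(n)$ is zero-mean observation noise independent of $v_1(n)$; and $x(n)$ is the state at time $n$.
From the direction cosine vector $z(n)$ of the voice signal measured at time $n$, and the direction cosine vector for time $n$ predicted from the direction cosine vector at time $n-1$,
Figure PCTCN2018078505-appb-000026
the innovation process of the Kalman filter is obtained, i.e.:
Figure PCTCN2018078505-appb-000027
(2) The state of the next voice signal is computed from the state equation above and corrected using the innovation process; the state of the next voice signal is determined from the corrected result.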
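The predict-and-correct cycle in steps (1) and (2) can be sketched with a standard Kalman filter for a single direction-cosine component; three such filters, one per axis, would cover the full vector if the axes are treated as independent. This is an illustrative sketch under assumptions, not the patent's filter: the constant-velocity model, the initial covariance and the noise variances `q` and `r` are all chosen here for demonstration.

```python
class AxisKalman:
    """Constant-velocity Kalman filter for one direction-cosine component."""

    def __init__(self, x0, T=0.01, q=1.0, r=1e-4):
        self.x = [x0, 0.0]                   # state: [cosine, rate of change]
        self.P = [[1.0, 0.0], [0.0, 1.0]]    # state covariance (assumed initial value)
        self.T, self.q, self.r = T, q, r     # period, process and observation noise variances

    def predict(self):
        """Propagate state and covariance one period; return the predicted cosine."""
        T, q = self.T, self.q
        pos, vel = self.x
        self.x = [pos + T * vel, vel]
        (p00, p01), (p10, p11) = self.P
        g0, g1 = T * T / 2, T                # noise input vector Gamma
        # P <- F P F^T + q * Gamma Gamma^T, with F = [[1, T], [0, 1]]
        self.P = [
            [p00 + T * (p01 + p10) + T * T * p11 + q * g0 * g0, p01 + T * p11 + q * g0 * g1],
            [p10 + T * p11 + q * g1 * g0, p11 + q * g1 * g1],
        ]
        return self.x[0]

    def update(self, z):
        """Correct the prediction with measurement z; return the innovation."""
        innovation = z - self.x[0]           # observation matrix C = [1, 0]
        s = self.P[0][0] + self.r            # innovation variance
        k0, k1 = self.P[0][0] / s, self.P[1][0] / s
        self.x = [self.x[0] + k0 * innovation, self.x[1] + k1 * innovation]
        (p00, p01), (p10, p11) = self.P
        # P <- (I - K C) P
        self.P = [
            [(1 - k0) * p00, (1 - k0) * p01],
            [p10 - k1 * p00, p11 - k1 * p01],
        ]
        return innovation
```

In a tracking loop, `predict()` supplies the direction cosine used to point the beam at the next instant, and `update()` is called once the localization result for that instant is available.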
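Note that voice tracking must also account for the voice signal flickering, or the track being lost altogether, because of environmental factors. If the received voice signal flickers or is lost, the loop memory function can be used to extrapolate for several periods (that is, the filter coefficients of the Kalman filter are held unchanged and tracking filtering is performed several more times), continuing to receive and track the voice signal in the previously predicted direction. The number of periods can be set appropriately by a person skilled in the art according to the actual situation, for example 3 or 6; the embodiments of the present invention place no limit on it.
Flicker or loss of track can be judged from the innovation process: the innovation process of the Kalman filter serves as the decision quantity, and when it exceeds a set threshold, the angular position obtained by the current voice signal localization is judged to be an outlier. When outliers occur consecutively (as many times in a row as the number of extrapolation periods above), the current voice tracking should be interrupted and the process should return to step S304 to search for voice again. The threshold can be set appropriately by a person skilled in the art according to the actual situation, for example to 1/4 of the beamwidth, but it is not limited to this value.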
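The outlier and track-loss logic described above can be sketched as a small state machine. The class name, the returned labels and the default of three extrapolation periods are illustrative choices, not taken from the source.

```python
class TrackMonitor:
    """Decide per period whether to keep tracking, extrapolate, or restart the search."""

    def __init__(self, threshold, max_coast=3):
        self.threshold = threshold    # e.g. 1/4 of the beamwidth
        self.max_coast = max_coast    # number of extrapolation periods allowed
        self.outliers = 0             # consecutive outliers seen so far

    def step(self, innovation):
        """Return 'track', 'coast' (hold filter coefficients, extrapolate) or 'search'."""
        if abs(innovation) <= self.threshold:
            self.outliers = 0
            return "track"
        self.outliers += 1
        if self.outliers >= self.max_coast:
            self.outliers = 0
            return "search"           # track lost: go back to step S304
        return "coast"
```

A single in-threshold innovation resets the counter, so only an unbroken run of outliers ends the track.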
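With this embodiment, while the device carrying the microphone array is moving, the beam of the microphone array already points at the next position before the physical movement takes place, which reduces the delay caused by signal processing time and by adjusting the beam pointing. The beam of the microphone array can adaptively align with the direction of arrival of the sound source according to the physical movement of the device and the characteristics of the environment, suppressing interference and noise from other directions, and it adapts well to the movement of the platform carrying the microphone array.
Embodiment 4
Referring to FIG. 5, a structural block diagram of a voice signal processing apparatus according to Embodiment 4 of the present invention is shown.
The voice signal processing apparatus of this embodiment includes: an angle acquisition module 402, configured to acquire the angular position of a voice signal relative to a microphone array, where the angular position includes the azimuth angle and pitch angle of the voice signal relative to the microphone array; a direction determination module 404, configured to determine, according to the angular position, the direction vector of the sound source direction of the voice signal; a filtering module 406, configured to perform Kalman filtering on the voice signal according to the direction vector; and a tracking module 408, configured to track the voice signal according to the result of the Kalman filtering.
With this embodiment, Kalman filtering is performed on the voice signal according to its angular position relative to the microphone array, and the voice signal is then tracked according to the result of the Kalman filtering. Each time it filters, the Kalman filter makes its current estimate from the previous filtering result and the deviation only, with no other data to process, so it runs fast. Kalman filtering is a linear filter that needs a state vector generated from the position and velocity information of the filtered object, but the position and velocity information of the voice signal received by the microphone array cannot satisfy this linearity requirement. Therefore, in the embodiments of the present invention, the angular position of the voice signal is converted into a direction vector of the sound source direction that does satisfy the linearity requirement, Kalman filtering is performed on it, and the estimated position of the voice signal at the next time instant in a moving scene is obtained for voice tracking.
It follows that the voice signal processing scheme of this embodiment gives good results when applied to fast processing of voice signals in moving scenes.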
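Embodiment 5
Referring to FIG. 6, a structural block diagram of a voice signal processing apparatus according to Embodiment 5 of the present invention is shown.
The voice signal processing apparatus of this embodiment includes: an angle acquisition module 502, configured to acquire the angular position of a voice signal relative to a microphone array, where the angular position includes the azimuth angle and pitch angle of the voice signal relative to the microphone array; a direction determination module 504, configured to determine, according to the angular position, the direction vector of the sound source direction of the voice signal; a filtering module 506, configured to perform Kalman filtering on the voice signal according to the direction vector; and a tracking module 508, configured to track the voice signal according to the result of the Kalman filtering.
Optionally, the direction determination module 504 is configured to determine the direction cosine vector of the voice signal according to the angular position, and to take the direction cosine vector as the direction vector of the sound source direction of the voice signal.
Optionally, the filtering module 506 is configured to take the direction cosine vector of the voice signal as the observation vector of the Kalman filter, and to perform Kalman filtering on the voice signal according to the observation vector.
Optionally, the tracking module 508 includes: a prediction module 5082, configured to obtain the predicted angular position of the voice signal relative to the microphone array from the predicted direction cosine vector of the voice signal obtained by the Kalman filtering; and a prediction tracking module 5084, configured to track the voice signal according to the predicted angular position.
Optionally, the prediction tracking module 5084 is configured to take the predicted angular position as the beam pointing of the microphone array for receiving the voice signal at the next time instant, perform tracking localization of the voice signal according to the beam pointing, obtain the angular position of the voice signal relative to the microphone array at the next time instant from the tracking localization result, and return to the direction determination module 504 for execution; or to take the predicted angular position directly as the angular position of the voice signal relative to the microphone array at the next time instant, and return to the direction determination module 504 for execution.
Optionally, the voice signal processing apparatus of this embodiment further includes: a loss handling module 510, configured to, during voice signal tracking, if tracking of the voice signal fails, use the Kalman filter coefficients from the Kalman filtering of the previous voice signal and track the voice signal again according to the result of the Kalman filtering.
Optionally, the loss handling module 510 is configured to, during voice signal tracking, if it is determined from the innovation process in the Kalman filtering that tracking of the voice signal has failed, use the Kalman filter coefficients from the Kalman filtering of the previous voice signal and track the voice signal again according to the result of the Kalman filtering.
Optionally, the voice signal processing apparatus of this embodiment further includes: a search module 512, configured to perform a voice signal search on the audio signal received by the microphone array before the angle acquisition module 502 acquires the angular position of the voice signal relative to the microphone array; a capture module 514, configured to perform voice capture on the voice signal found by the search and determine the initial angular position of the voice signal relative to the microphone array from the result of the voice capture; and an initial tracking module 516, configured to determine, according to the initial angular position, to perform voice signal tracking.
Optionally, the initial tracking module 516 is configured to determine the direction cosine vector of the captured voice signal according to the initial angular position; perform Kalman filtering on the captured voice signal with the direction cosine vector as the observation vector of the Kalman filter; and obtain the innovation process of the Kalman filtering and, if the innovation process is less than or equal to a set threshold, determine to perform voice signal tracking.
Optionally, the capture module 514 is configured to perform false alarm detection on the voice signal found by the search and, if no false alarm occurred, perform voice capture on the voice signal found and determine the initial angular position of the voice signal relative to the microphone array from the result of the voice capture.
Optionally, the search module 512 includes: a guidance module 5122, configured to determine, according to guidance information, the center wave phase of the search region corresponding to the audio signal received by the microphone array; and a processing module 5124, configured to perform the voice signal search starting from the center wave phase.
Optionally, the processing module 5124 is configured to, starting from the center wave phase, perform beam energy detection on the center beam corresponding to the center wave phase; if the detection result shows that a voice signal has been found, end the voice signal search; and if the detection result shows that no voice signal has been found, shift the beam position of the center beam and continue beam energy detection on the shifted beam.
Optionally, the processing module 5124 performs beam energy detection as follows: for each sub-array of the microphone array, obtain the correlation, under the current beam, between the voice signal of the current sub-array and the voice signals of all sub-arrays; obtain the beam energy corresponding to the current beam from the correlations; and determine whether a voice signal has been found in the current beam from the relation between the beam energy and the set noise threshold.
The voice signal processing apparatus of this embodiment is used to implement the corresponding voice signal processing methods of the foregoing method embodiments and has the beneficial effects of the corresponding method embodiments, which are not repeated here.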
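Embodiment 6
Referring to FIG. 7, a schematic structural diagram of a voice signal processing system according to Embodiment 6 of the present invention is shown.
The voice signal processing system of this embodiment includes: a microphone array 602, a preamplifier 604, a band-pass filter 606, an analog-to-digital conversion module 608, an audio signal processing module 610, a noise cancellation module 612, a voice output module 614 and a beam control module 616.
The microphone array 602 is divided into four sub-arrays; it receives the original sound from the environment and converts it into four analog sound signals. Because the four sub-arrays are at different positions, the sound arrives at each sub-array at a different time, so the four sound signals differ in phase.
The preamplifier 604, the band-pass filter 606 and the analog-to-digital conversion module 608 form the front-end processing of the sound signals, converting the four analog sound signals into four digital sound signals that carry phase information. The preamplifier 604 amplifies the analog sound signals, the band-pass filter 606 filters the amplified analog sound signals, and the analog-to-digital conversion module 608 converts the filtered analog sound signals into digital sound signals.
In this embodiment, the audio signal processing module 610 includes: a signal preprocessing module 6102, a voice search/capture module 6104, a voice localization module 6106 and a tracking filter module 6108.
The signal preprocessing module 6102 receives the four digital sound signals from the analog-to-digital conversion module 608 and performs data buffering, signal pre-emphasis and windowing.
The voice search/capture module 6104 implements the functions of the search module 512 of Embodiment 5; the voice localization module 6106 implements the functions of the capture module 514 of Embodiment 5; and the tracking filter module 6108 implements the functions of the initial tracking module 516, the angle acquisition module 502, the direction determination module 504, the filtering module 506, the tracking module 508 and the loss handling module 510 of Embodiment 5.
For the specific implementation of the functions of the voice search/capture module 6104, the voice localization module 6106 and the tracking filter module 6108, refer to the descriptions of the relevant parts of Embodiment 5 and the foregoing method embodiments; they are not detailed again here.
In addition, the tracking filter module 6108 is connected to the beam control module 616 and outputs the predicted angular position of the voice signal to it; the beam control module 616 steers the beam of the microphone array 602 so that it automatically aligns with the direction of the voice signal at the next time instant.
The noise cancellation module 612 applies a single-channel speech enhancement method to the voice signal processed by the signal preprocessing module 6102 and passes the enhanced voice signal to the voice output module 614 for output.
For example, the noise cancellation module 612 uses a single-channel speech enhancement method in which, for each frame of the voice signal, the time-domain signal is first transformed to the frequency domain and the noise in the voice signal is roughly estimated by quantile noise estimation; the a priori signal-to-noise ratio, the a posteriori signal-to-noise ratio and the noise presence probability are then calculated, and the noise estimate is updated according to the noise probability; finally, the Wiener filter coefficients are calculated per frequency band from the a priori signal-to-noise ratio of each frame, and the voice signal is Wiener-filtered with these coefficients and output.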
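The last step of this enhancement chain can be illustrated with the textbook Wiener gain $G = \xi / (1 + \xi)$, where $\xi$ is the a priori signal-to-noise ratio of a frequency band. The source does not state which gain formula it uses, so this form, and the function names, are assumptions of the sketch; the noise estimation and SNR calculation that precede it are not shown.

```python
def wiener_gain(prior_snr):
    """Textbook Wiener gain for one band, given its a priori SNR (linear scale, >= 0)."""
    if prior_snr < 0:
        raise ValueError("a priori SNR must be non-negative")
    return prior_snr / (1.0 + prior_snr)


def apply_gains(spectrum, prior_snrs):
    """Scale each complex frequency bin of one frame by the gain of its band."""
    return [wiener_gain(snr) * x for x, snr in zip(spectrum, prior_snrs)]
```

Bands dominated by noise (small $\xi$) are attenuated strongly, while bands dominated by voice (large $\xi$) pass almost unchanged.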
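The voice signal processing system of this embodiment can search for the position of the voice signal, then capture the voice signal to reduce the position error, and use sound source localization and tracking filtering to predict the beam pointing for the voice signal at the next time instant. The microphone array beam thereby tracks the sound source in real time, enhancing the voice from the sound source direction and suppressing interfering noise from other directions. In addition, the computational load is small, which makes real-time tracking practical; tracking is fast; and the system is suited to moving scenes.
In summary, the voice signal processing scheme provided by the embodiments of the present invention tracks quickly and adapts well, and can be widely used in scenarios such as hearing aids, mobile terminals, smart speakers, video conferencing and mobile robots. In addition, the Kalman filtering in the embodiments of the present invention takes standard Kalman filtering as an example, but a person skilled in the art will understand that other processes similar to standard Kalman filtering are equally applicable.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the embodiments of the present invention, not to limit them. Although the embodiments of the present invention have been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention.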

Claims (22)

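  1. A voice signal processing method, comprising:
    acquiring an angular position of a voice signal relative to a microphone array, wherein the angular position comprises an azimuth angle and a pitch angle of the voice signal relative to the microphone array;
    determining, according to the angular position, a direction vector of a sound source direction of the voice signal;
    performing Kalman filtering on the voice signal according to the direction vector;
    tracking the voice signal according to a result of the Kalman filtering.
  2. The method according to claim 1, wherein
    the determining, according to the angular position, a direction vector of a sound source direction of the voice signal comprises: determining a direction cosine vector of the voice signal according to the angular position, and taking the direction cosine vector as the direction vector of the sound source direction of the voice signal;
    the performing Kalman filtering on the voice signal according to the direction vector comprises: taking the direction cosine vector of the voice signal as an observation vector of the Kalman filter; and performing Kalman filtering on the voice signal according to the observation vector;
    the tracking the voice signal according to a result of the Kalman filtering comprises: obtaining a predicted angular position of the voice signal relative to the microphone array from a predicted direction cosine vector of the voice signal obtained by the Kalman filtering; and tracking the voice signal according to the predicted angular position.
  3. The method according to claim 2, wherein the tracking the voice signal according to the predicted angular position comprises:
    taking the predicted angular position as the beam pointing of the microphone array for receiving the voice signal at the next time instant, performing tracking localization of the voice signal according to the beam pointing, obtaining the angular position of the voice signal relative to the microphone array at the next time instant from the tracking localization result, and returning to the step of determining, according to the angular position, a direction vector of a sound source direction of the voice signal;
    or,
    taking the predicted angular position directly as the angular position of the voice signal relative to the microphone array at the next time instant, and returning to the step of determining, according to the angular position, a direction vector of a sound source direction of the voice signal.
  4. The method according to any one of claims 1-3, wherein the method further comprises:
    during the voice signal tracking, if tracking of the voice signal fails, using the Kalman filter coefficients from the Kalman filtering of the previous voice signal, and tracking the voice signal again according to the result of the Kalman filtering.
  5. The method according to claim 4, wherein the tracking of the voice signal failing comprises:
    determining, according to the innovation process in the Kalman filtering, that tracking of the voice signal has failed.
  6. The method according to any one of claims 1-5, wherein, before the acquiring an angular position of a voice signal relative to a microphone array, the method further comprises:
    performing a voice signal search on the audio signal received by the microphone array;
    performing voice capture on the voice signal found by the search, and determining an initial angular position of the voice signal relative to the microphone array from the result of the voice capture;
    determining, according to the initial angular position, to perform voice signal tracking.
  7. The method according to claim 6, wherein the determining, according to the initial angular position, to perform voice signal tracking comprises:
    determining a direction cosine vector of the captured voice signal according to the initial angular position;
    performing Kalman filtering on the captured voice signal with the direction cosine vector as the observation vector of the Kalman filter;
    obtaining the innovation process of the Kalman filtering and, if the innovation process is less than or equal to a set threshold, determining to perform voice signal tracking.
  8. The method according to claim 6, wherein the performing voice capture on the voice signal found by the search comprises:
    performing false alarm detection on the voice signal found by the search and, if no false alarm occurred, performing voice capture on the voice signal found by the search.
  9. The method according to claim 6, wherein the performing a voice signal search on the audio signal received by the microphone array comprises:
    determining, according to guidance information, a center wave phase of the search region corresponding to the audio signal received by the microphone array;
    performing the voice signal search starting from the center wave phase.
  10. The method according to claim 9, wherein the performing the voice signal search starting from the center wave phase comprises:
    starting from the center wave phase, performing beam energy detection on the center beam corresponding to the center wave phase;
    if the detection result shows that a voice signal has been found, ending the voice signal search;
    if the detection result shows that no voice signal has been found, shifting the beam position of the center beam and continuing beam energy detection on the shifted beam.
  11. The method according to claim 10, wherein the beam energy detection comprises:
    for each sub-array of the microphone array, obtaining the correlation, under the current beam, between the voice signal of the current sub-array and the voice signals of all sub-arrays;
    obtaining the beam energy corresponding to the current beam from the correlations;
    determining whether a voice signal has been found in the current beam from the relation between the beam energy and a set noise threshold.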
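  12. A voice signal processing apparatus, comprising:
    an angle acquisition module, configured to acquire an angular position of a voice signal relative to a microphone array, wherein the angular position comprises an azimuth angle and a pitch angle of the voice signal relative to the microphone array;
    a direction determination module, configured to determine, according to the angular position, a direction vector of a sound source direction of the voice signal;
    a filtering module, configured to perform Kalman filtering on the voice signal according to the direction vector;
    a tracking module, configured to track the voice signal according to a result of the Kalman filtering.
  13. The apparatus according to claim 12, wherein
    the direction determination module is configured to determine a direction cosine vector of the voice signal according to the angular position, and to take the direction cosine vector as the direction vector of the sound source direction of the voice signal;
    the filtering module is configured to take the direction cosine vector of the voice signal as an observation vector of the Kalman filter, and to perform Kalman filtering on the voice signal according to the observation vector;
    the tracking module comprises: a prediction module, configured to obtain a predicted angular position of the voice signal relative to the microphone array from a predicted direction cosine vector of the voice signal obtained by the Kalman filtering; and a prediction tracking module, configured to track the voice signal according to the predicted angular position.
  14. The apparatus according to claim 13, wherein the prediction tracking module is configured to take the predicted angular position as the beam pointing of the microphone array for receiving the voice signal at the next time instant, perform tracking localization of the voice signal according to the beam pointing, obtain the angular position of the voice signal relative to the microphone array at the next time instant from the tracking localization result, and return to the direction determination module for execution; or to take the predicted angular position directly as the angular position of the voice signal relative to the microphone array at the next time instant, and return to the direction determination module for execution.
  15. The apparatus according to any one of claims 12-14, wherein the apparatus further comprises:
    a loss handling module, configured to, during the voice signal tracking, if tracking of the voice signal fails, use the Kalman filter coefficients from the Kalman filtering of the previous voice signal and track the voice signal again according to the result of the Kalman filtering.
  16. The apparatus according to claim 15, wherein the loss handling module is configured to, during the voice signal tracking, if it is determined from the innovation process in the Kalman filtering that tracking of the voice signal has failed, use the Kalman filter coefficients from the Kalman filtering of the previous voice signal and track the voice signal again according to the result of the Kalman filtering.
  17. The apparatus according to any one of claims 12-16, wherein the apparatus further comprises:
    a search module, configured to perform a voice signal search on the audio signal received by the microphone array before the angle acquisition module acquires the angular position of the voice signal relative to the microphone array;
    a capture module, configured to perform voice capture on the voice signal found by the search and determine an initial angular position of the voice signal relative to the microphone array from the result of the voice capture;
    an initial tracking module, configured to determine, according to the initial angular position, to perform voice signal tracking.
  18. The apparatus according to claim 17, wherein the initial tracking module is configured to determine a direction cosine vector of the captured voice signal according to the initial angular position; perform Kalman filtering on the captured voice signal with the direction cosine vector as the observation vector of the Kalman filter; and obtain the innovation process of the Kalman filtering and, if the innovation process is less than or equal to a set threshold, determine to perform voice signal tracking.
  19. The apparatus according to claim 17, wherein the capture module is configured to perform false alarm detection on the voice signal found by the search and, if no false alarm occurred, perform voice capture on the voice signal found by the search and determine the initial angular position of the voice signal relative to the microphone array from the result of the voice capture.
  20. The apparatus according to claim 17, wherein the search module comprises:
    a guidance module, configured to determine, according to guidance information, a center wave phase of the search region corresponding to the audio signal received by the microphone array;
    a processing module, configured to perform the voice signal search starting from the center wave phase.
  21. The apparatus according to claim 20, wherein the processing module is configured to, starting from the center wave phase, perform beam energy detection on the center beam corresponding to the center wave phase; if the detection result shows that a voice signal has been found, end the voice signal search; and if the detection result shows that no voice signal has been found, shift the beam position of the center beam and continue beam energy detection on the shifted beam.
  22. The apparatus according to claim 21, wherein the processing module performs beam energy detection as follows:
    for each sub-array of the microphone array, obtaining the correlation, under the current beam, between the voice signal of the current sub-array and the voice signals of all sub-arrays;
    obtaining the beam energy corresponding to the current beam from the correlations;
    determining whether a voice signal has been found in the current beam from the relation between the beam energy and a set noise threshold.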
PCT/CN2018/078505 2018-03-09 2018-03-09 Voice signal processing method and apparatus WO2019169616A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2018/078505 WO2019169616A1 (zh) 2018-03-09 2018-03-09 Voice signal processing method and apparatus
CN201880000268.1A CN110495185B (zh) 2018-03-09 2018-03-09 Voice signal processing method and apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/078505 WO2019169616A1 (zh) 2018-03-09 2018-03-09 Voice signal processing method and apparatus

Publications (1)

Publication Number Publication Date
WO2019169616A1 (zh) 2019-09-12

Family

ID=67845832

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/078505 WO2019169616A1 (zh) Voice signal processing method and apparatus 2018-03-09 2018-03-09

Country Status (2)

Country Link
CN (1) CN110495185B (zh)
WO (1) WO2019169616A1 (zh)

Also Published As

Publication number Publication date
CN110495185B (zh) 2022-07-01
CN110495185A (zh) 2019-11-22

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18908858

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18908858

Country of ref document: EP

Kind code of ref document: A1