WO2015106401A1 - Voice processing method and voice processing device - Google Patents

Voice processing method and voice processing device

Info

Publication number
WO2015106401A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
collection unit
arrival
sound collection
position data
Prior art date
Application number
PCT/CN2014/070641
Other languages
English (en)
French (fr)
Inventor
李长宁
Original Assignee
宇龙计算机通信科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 宇龙计算机通信科技(深圳)有限公司
Priority to PCT/CN2014/070641 priority Critical patent/WO2015106401A1/zh
Priority to CN201480072103.7A priority patent/CN105874535B/zh
Priority to EP14878656.9A priority patent/EP3096319A4/en
Publication of WO2015106401A1 publication Critical patent/WO2015106401A1/zh
Priority to US15/206,410 priority patent/US20160322062A1/en

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers: microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/04 Circuits for transducers, loudspeakers or microphones for correcting frequency response
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20 Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic

Definitions

  • The present invention relates to the field of communications technologies, and in particular to a voice processing method and a voice processing device.
  • Background art
  • Existing multi-microphone terminals are mainly two-microphone terminals and three-microphone terminals (the latter not shown); a two-microphone terminal is shown in FIG. 1. In either kind of terminal, the voice signal is collected mainly through one microphone (microphone 1 in FIG. 1), while the other microphones mainly collect noise signals (microphone 2 in FIG. 1). A suitable adaptive algorithm is then chosen to remove the noise signal picked up by microphone 2 from the signal of microphone 1, so that clear speech is transmitted.
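  • The two-microphone scheme described above can be illustrated with a normalised LMS (NLMS) adaptive noise canceller. This is only a minimal sketch: the source does not say which adaptive algorithm is used, and the function names, the filter length, the step size and the simulated acoustic path are all assumptions made for illustration.

```python
import math
import random


def nlms_cancel(primary, reference, taps=4, mu=0.05, eps=1e-8):
    """Subtract an adaptively filtered copy of `reference` from `primary`.

    primary   -- samples from the voice microphone (speech plus noise)
    reference -- samples from the noise microphone (assumed noise only)
    Returns the error signal, which serves as the cleaned speech estimate.
    """
    weights = [0.0] * taps
    history = [0.0] * taps              # recent reference samples, newest first
    cleaned = []
    for d, x in zip(primary, reference):
        history = [x] + history[:-1]
        noise_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - noise_estimate      # what is left after removing the estimate
        energy = sum(h * h for h in history) + eps
        step = mu * error / energy      # normalised step keeps adaptation stable
        weights = [w + step * h for w, h in zip(weights, history)]
        cleaned.append(error)
    return cleaned


def demo(n=4000, seed=0):
    """Return (noise power before, noise power after) over the last 1000 samples."""
    rng = random.Random(seed)
    speech = [0.5 * math.sin(2 * math.pi * 0.01 * k) for k in range(n)]
    noise = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Hypothetical acoustic path from the noise source to the voice microphone.
    leaked = [0.8 * noise[k] - 0.3 * (noise[k - 1] if k else 0.0) for k in range(n)]
    primary = [s + v for s, v in zip(speech, leaked)]
    cleaned = nlms_cancel(primary, noise)
    tail = slice(n - 1000, n)
    before = sum((p - s) ** 2 for p, s in zip(primary[tail], speech[tail])) / 1000
    after = sum((c - s) ** 2 for c, s in zip(cleaned[tail], speech[tail])) / 1000
    return before, after
```

  • The sketch relies on the noise microphone picking up little or no speech; when speech leaks into the reference signal, the canceller also removes part of the speech, which is one reason such schemes adapt poorly to changes in how the phone is held.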
  • Some mobile phone manufacturers have recently considered adopting speech noise reduction based on a multi-microphone array, which processes the noisy speech signal collected during a call to obtain a clean speech signal.
  • In a mobile phone, this is implemented by building several microphones into the handset. Generally, two to four microphones are placed side by side at the bottom of the phone (as shown in FIG. 2), with a certain distance kept between adjacent microphones, to form a microphone array. The signals received by the microphones are then filtered using array signal processing to achieve noise reduction.
  • This technology is a more advanced and more adaptable mobile phone noise reduction scheme than adaptive noise cancellation.
  • Multi-microphone array signal processing is a modern, space-time domain signal processing technique. The algorithm considers not only how the signal changes over time but also how it changes in space, so the computation is very complex. Because a mobile phone call is a real-time process, a noise reduction algorithm based on multi-microphone array signal processing should denoise the received speech signal quickly and keep delay to a minimum. However, users often change posture during a call, which changes the distance and direction between the phone and the user's sound source, so the spatial characteristics of the received signal also change, and this change is random and unpredictable.
  • In view of the above problems, the invention proposes a new voice processing method that acquires the terminal's orientation change information during a call and uses it to correct, in a timely manner, certain parameters of the multi-microphone-array speech noise reduction algorithm. The noise reduction algorithm thus becomes adaptive: it adjusts some of its parameters according to the random posture changes during the user's call so as to achieve the best noise reduction effect.
  • A voice processing method is provided, including: acquiring a position data change amount of a sound collection unit array on a terminal relative to a user sound source; correcting the direction of arrival of the sound collection unit array according to the position data change amount; and filtering the sound signal acquired by the sound collection unit.
  • Sound collection unit array signal processing is a space-time signal processing method. Because the voice signal and the various noise signals received by the sound collection units come from different directions in space, taking spatial direction information into account greatly improves signal processing capability. A noise reduction scheme based on an array of several sound collection units aims to extract the sound signal arriving from the direction of the user's sound source and to ignore noise arriving from other directions, thereby reducing noise.
  • To this end, the sound collection unit array forms a beam in space that points in the direction of the user's sound source and filters out sound from other directions.
  • The formation of the beam depends on the position of the sound collection unit array relative to the user's sound source.
  • Because the direction of arrival of the sound collection unit array is corrected according to the change in the array's position relative to the user sound source, the sound signal from the direction of the user's sound source can always be extracted wherever the terminal is relative to that source. In other words, the parameters of the noise reduction algorithm can be adjusted at any time according to the random posture changes during the user's call, which gives a good noise reduction effect.
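  • Why the beam must follow the user can be seen from the response of a simple delay-and-sum beam. The following is a minimal narrowband sketch, assuming a uniform linear array with half-wavelength spacing and a far-field plane wave; the function names and parameters are illustrative and do not come from the source.

```python
import cmath
import math


def steering_vector(theta_deg, num_mics, spacing_over_wavelength=0.5):
    """Narrowband phase factors of a uniform linear array for a plane wave
    arriving at theta_deg, measured from the normal of the array line."""
    phase_step = (2 * math.pi * spacing_over_wavelength
                  * math.sin(math.radians(theta_deg)))
    return [cmath.exp(-1j * m * phase_step) for m in range(num_mics)]


def beam_gain(look_deg, arrival_deg, num_mics=4):
    """Magnitude response of a delay-and-sum beam steered to look_deg for a
    wave that actually arrives from arrival_deg (1.0 means no attenuation)."""
    weights = steering_vector(look_deg, num_mics)
    wave = steering_vector(arrival_deg, num_mics)
    # Delay-and-sum: undo the expected phase at each microphone, then average.
    summed = sum(w.conjugate() * x for w, x in zip(weights, wave))
    return abs(summed) / num_mics
```

  • With these assumptions, `beam_gain(20, 20)` is 1, whereas `beam_gain(0, 30)` is essentially zero: a beam left pointing at the old direction after the phone has turned by 30 degrees would suppress the user's own voice, which is why the direction of arrival has to be corrected as the posture changes.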
  • The position data change amount of the sound collection unit array is acquired using a gyroscope in the terminal, where the position data change amount includes the displacement change amount of the reference sound collection unit and the angular change amount of the sound collection unit array line.
  • The positions of the sound source and the sound collection units change randomly. Many mobile phones are now equipped with a gyroscope, which can provide accurate acceleration and angle change information. The present invention therefore uses the gyroscope to obtain the position data change amount of the sound collection unit array. This yields accurate change amounts and makes full use of hardware already present in the terminal, so no additional hardware is needed and hardware cost is kept down while the noise reduction effect improves.
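  • One way the change amounts might be accumulated from raw sensor samples is numerical integration. The sketch below is an assumption, not something the source specifies: it supposes that the sensor driver delivers equally spaced angular-rate samples (rad/s) and linear-acceleration samples (m/s^2) per axis, and it ignores bias and drift correction. All names are hypothetical.

```python
def integrate_trapezoid(samples, dt):
    """Running trapezoidal integral of equally spaced samples."""
    total = 0.0
    running = [0.0]
    for previous, current in zip(samples, samples[1:]):
        total += 0.5 * (previous + current) * dt
        running.append(total)
    return running


def angle_change(angular_rates, dt):
    """Angle change (rad) about one axis from angular-rate samples (rad/s)."""
    return integrate_trapezoid(angular_rates, dt)[-1]


def displacement_change(accelerations, dt, initial_velocity=0.0):
    """Displacement (m) along one axis from acceleration samples (m/s^2)."""
    velocity = [initial_velocity + v
                for v in integrate_trapezoid(accelerations, dt)]
    return integrate_trapezoid(velocity, dt)[-1]
```

  • Applying `angle_change` to each rotation axis and `displacement_change` to each translation axis gives one candidate for the angular and displacement change amounts between two updates. Double integration of acceleration drifts quickly in practice, so a real implementation would need to bound that error.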
  • The step of correcting the direction of arrival of the sound collection unit array according to the position data change amount includes: acquiring initial position data, relative to the user sound source, of a reference sound collection unit in the sound collection unit array and of the sound collection unit array line, where the initial position data includes coordinate initial data of the reference sound collection unit and angle initial data of the sound collection unit array line; and calculating, from the initial position data and the position data change amount, the angle of arrival (also referred to as the direction of arrival) between the sound wave direction of the current user sound source and the preset normal of the sound collection unit array line.
  • When the position of the array relative to the sound source has changed, the new angle of arrival between the sound source and the preset normal of the sound collection unit array line can be calculated. This determines the changed direction of arrival, and a new beam is formed so that the microphone array again points at the user's sound source and the acquired sound signal is mainly the voice signal of that source.
  • A coordinate system is established with the user sound source as the coordinate origin, and the angle of arrival is calculated according to the following formula: [formula not reproduced in this text], where
  • $\theta_{i+1}$ is the angle of arrival;
  • $(x_{ci}, y_{ci}, z_{ci})$ is the coordinate initial data of the reference sound collection unit in the coordinate system;
  • $(\alpha_i, \beta_i, \gamma_i)$ is the angle initial data of the sound collection unit array line in the coordinate system;
  • $(\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci})$ is the displacement change amount of the reference sound collection unit in the coordinate system;
  • $(\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i)$ is the angular change amount of the sound collection unit array line in the coordinate system.
  • With this formula, the angle of arrival of the microphone array can be calculated as the array's position relative to the user's sound source changes in real time. Because the angle is obtained by evaluating a formula directly, computational complexity is greatly reduced, which lowers the cost of estimating the direction of arrival.
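  • The formula itself is not reproduced in this text, so the following is not the patent's formula. It is a geometric sketch of how such a closed-form update could look, under explicit assumptions: the three angles are taken to be the angles between the array line and the x, y and z axes (so their cosines give the line direction), the wave travels from the origin to the reference unit, and the angle of arrival is measured from the normal of the array line. The function name is hypothetical.

```python
import math


def updated_arrival_angle(position, angles, d_position, d_angles):
    """Angle (rad) between the incoming wave and the normal of the array line
    after applying the position and angle changes.

    position, d_position -- (x, y, z) of the reference unit and its change;
                            the sound source is the coordinate origin
    angles, d_angles     -- assumed angles between the array line and the
                            x, y, z axes, and their changes (rad)
    Returns (angle_of_arrival, new_position, new_angles).
    """
    new_position = [p + dp for p, dp in zip(position, d_position)]
    new_angles = [a + da for a, da in zip(angles, d_angles)]
    line = [math.cos(a) for a in new_angles]      # direction of the array line
    dot = sum(p * v for p, v in zip(new_position, line))
    norms = math.hypot(*new_position) * math.hypot(*line)
    # Angle to the normal is 90 degrees minus the angle to the line itself.
    ratio = max(-1.0, min(1.0, dot / norms))
    return math.asin(ratio), new_position, new_angles
```

  • For a reference unit at (0, 1, 0) with the array line along the x axis, a shift of (1, 0, 0) moves the angle of arrival from 0 to 45 degrees. The update costs a handful of additions, three cosines and one arcsine, which illustrates why a direct calculation is far cheaper than re-running a search over candidate directions. The multi-argument form of `math.hypot` requires Python 3.8 or later.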
  • The method further includes: acquiring the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source by automatically searching for the direction of arrival.
  • That is, the initial direction of arrival is determined by an automatic search, which yields the initial position data $c_0$ $(x_{ci}, y_{ci}, z_{ci})$ of the reference sound collection unit and $v_0$ $(\alpha_i, \beta_i, \gamma_i)$ of the sound collection unit array line relative to the user's sound source.
  • The automatic search begins determining the direction of arrival automatically once the call is connected.
  • Methods for estimating the direction of arrival from the signals received by the microphone array include traditional methods (spectral estimation, linear prediction, and so on), subspace methods (multiple signal classification, the rotational invariance subspace method) and the maximum likelihood method. These are the basic DOA estimation methods and are introduced in the general literature on array signal processing.
  • Each of these methods has its own advantages and disadvantages.
  • The traditional methods are computationally simple, but they need a large number of microphone array elements to achieve high resolution, and their direction-of-arrival estimates are less accurate than those of the other two kinds of method. They are clearly unsuitable for the small array installed in a mobile phone. The subspace methods and the maximum likelihood method estimate the direction of arrival better, but their computational load is very high.
  • A subspace method or the maximum likelihood method can be used to estimate the direction of arrival when the call is connected.
  • The maximum likelihood method is a good choice because it is the optimal method. Although its computational load is the largest, a calculation performed only in the initial stage does not add much delay to the speech, and starting from the accurate direction of arrival it provides, the direction information from the gyroscope can be used to correct the direction of arrival as it changes in real time.
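  • To give a feel for what a search over the direction of arrival involves, the sketch below scans candidate angles and keeps the one whose steered beam collects the most power. This is the conventional beamforming scan, one of the traditional spectral-estimation methods, and not the maximum likelihood method the text prefers; it is shown because it is the shortest to write down. It assumes a uniform linear array with half-wavelength spacing and narrowband complex snapshots, and all names are illustrative.

```python
import cmath
import math


def steering_vector(theta_deg, num_mics, spacing_over_wavelength=0.5):
    """Narrowband phase factors of a uniform linear array for a plane wave
    arriving at theta_deg, measured from the normal of the array line."""
    phase_step = (2 * math.pi * spacing_over_wavelength
                  * math.sin(math.radians(theta_deg)))
    return [cmath.exp(-1j * m * phase_step) for m in range(num_mics)]


def search_direction(snapshots, step_deg=1):
    """Return the candidate angle (degrees) with the largest steered power.

    snapshots -- list of per-microphone complex samples, one list per instant
    """
    num_mics = len(snapshots[0])
    best_angle, best_power = None, -1.0
    for theta in range(-90, 91, step_deg):
        weights = steering_vector(theta, num_mics)
        power = 0.0
        for snapshot in snapshots:
            beam = sum(w.conjugate() * x for w, x in zip(weights, snapshot))
            power += abs(beam) ** 2
        if power > best_power:
            best_angle, best_power = theta, power
    return best_angle
```

  • Even this simple scan costs on the order of (number of candidate angles) × (number of microphones) × (number of snapshots) complex multiplications per estimate, and the subspace and maximum likelihood methods are heavier still. That cost is the motivation for running a search once at the start of the call and updating the direction from the position change amounts afterwards.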
  • The present invention searches for the direction of arrival automatically only when acquiring the initial position data. During the subsequent adaptive estimation, the direction of arrival is obtained solely from the position data change amount provided by the gyroscope.
  • If the automatic search were used throughout the whole process, its computational complexity would harm real-time performance. Because the present invention uses the automatic search only to acquire the initial position data, real-time performance is better and the processing rate is greatly improved.
  • A voice processing device is also provided, including: an acquiring unit, configured to acquire a position data change amount of a sound collection unit array on a terminal relative to a user sound source; a correcting unit, configured to correct the direction of arrival of the sound collection unit array according to the position data change amount; and a processing unit, configured to filter the sound signal acquired by the sound collection unit.
  • Sound collection unit array signal processing is a space-time signal processing method. Because the voice signal and the various noise signals received by the sound collection units come from different directions in space, taking spatial direction information into account greatly improves signal processing capability. A noise reduction scheme based on an array of several sound collection units aims to extract the sound signal arriving from the direction of the user's sound source and to ignore noise arriving from other directions, thereby reducing noise.
  • To this end, the sound collection unit array forms a beam in space that points in the direction of the user's sound source and filters out sound from other directions.
  • The formation of the beam depends on the position of the sound collection unit array relative to the user's sound source.
  • The acquiring unit is a gyroscope, which acquires the position data change amount of the sound collection unit array, where the position data change amount includes the displacement change amount of the reference sound collection unit and the angular change amount of the sound collection unit array line.
  • The positions of the sound source and the sound collection units change randomly. Many mobile phones are now equipped with a gyroscope, which can provide accurate acceleration and angle change information. The present invention therefore uses the gyroscope to obtain the position data change amount of the sound collection unit array. This yields accurate change amounts and makes full use of hardware already present in the terminal, so no additional hardware is needed and hardware cost is kept down while the noise reduction effect improves.
  • The correcting unit includes: an initial position detecting unit, which acquires initial position data, relative to the user sound source, of the reference sound collection unit in the sound collection unit array and of the sound collection unit array line, where the initial position data includes coordinate initial data of the reference sound collection unit and angle initial data of the sound collection unit array line; and an arrival angle calculation unit, which calculates, from the initial position data and the position data change amount, the angle of arrival between the sound wave direction of the current user sound source and the preset normal of the sound collection unit array line, so that the direction of arrival of the sound collection unit array is determined from the angle of arrival.
  • When the position of the array relative to the sound source has changed, the new angle of arrival between the sound source and the preset normal of the sound collection unit array line can be calculated. This determines the changed direction of arrival, and a new beam is formed so that the microphone array again points at the user's sound source and the acquired sound signal is mainly the voice signal of that source.
  • The arrival angle calculation unit establishes a coordinate system with the user sound source as the coordinate origin and calculates the angle of arrival according to the following formula: [formula not reproduced in this text], where
  • $\theta_{i+1}$ is the angle of arrival;
  • $(x_{ci}, y_{ci}, z_{ci})$ is the coordinate initial data of the reference sound collection unit in the coordinate system;
  • $(\alpha_i, \beta_i, \gamma_i)$ is the angle initial data of the sound collection unit array line in the coordinate system;
  • $(\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci})$ is the displacement change amount of the reference sound collection unit in the coordinate system;
  • $(\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i)$ is the angular change amount of the sound collection unit array line in the coordinate system.
  • With this formula, the angle of arrival of the microphone array can be calculated as the array's position relative to the user's sound source changes in real time. Because the angle is obtained by evaluating a formula directly, computational complexity is greatly reduced, which lowers the cost of estimating the direction of arrival.
  • The initial position detecting unit acquires the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source by automatically searching for the direction of arrival.
  • That is, the initial direction of arrival is determined by an automatic search, which yields the initial position data $c_0$ $(x_{ci}, y_{ci}, z_{ci})$ of the reference sound collection unit and $v_0$ $(\alpha_i, \beta_i, \gamma_i)$ of the sound collection unit array line relative to the user's sound source.
  • The present invention searches for the direction of arrival automatically only when acquiring the initial position data. During the subsequent adaptive estimation, the direction of arrival is obtained solely from the position data change amount provided by the gyroscope.
  • If the automatic search were used throughout the whole process, its computational complexity would harm real-time performance. Because the present invention uses the automatic search only to acquire the initial position data, real-time performance is better and the processing rate is greatly improved.
  • A program product for voice processing, stored on a non-transitory machine-readable medium, is also provided. The program product includes machine-executable instructions that cause a computer system to perform the following steps: acquiring a position data change amount of the sound collection unit array on the terminal relative to the user sound source; and correcting the direction of arrival of the sound collection unit array according to the position data change amount.
  • A non-transitory machine-readable medium storing a program product for voice processing is also provided. The program product includes machine-executable instructions that cause a computer system to perform the following steps: acquiring a position data change amount of the sound collection unit array on the terminal relative to the user sound source; and correcting the direction of arrival of the sound collection unit array according to the position data change amount.
  • A machine-readable program is also provided, the program causing a machine to perform the voice processing method according to any one of the above aspects.
  • A storage medium storing a machine-readable program is also provided, where the machine-readable program causes the machine to perform the voice processing method according to any one of the above aspects.
  • By using a gyroscope to provide information on the position and movement changes caused by changes in the phone's posture during a call, the invention gives a mobile phone with a multi-microphone array a better noise reduction effect.
  • A noise reduction module based on a multi-microphone array places high demands on mobile phone hardware because it needs considerable computing power; in particular, estimating the direction of arrival before beamforming is very complex. With the orientation change information provided by the gyroscope, the present invention calculates the direction of arrival accurately and quickly using only a direct mathematical calculation, without complex iteration or estimation algorithms. The microphone array can therefore be adaptively aimed at the sound source (the mouth) at any time, which is expected to improve the noise reduction of the microphone array.
  • FIG. 1 is a schematic diagram of the microphone arrangement of a two-microphone terminal.
  • FIG. 2 is a schematic diagram of the microphone arrangement of a three-microphone terminal.
  • FIG. 3 is a schematic diagram of a voice processing method according to an embodiment of the present invention.
  • FIG. 4 is a flowchart of a hardware and software implementation of multi-microphone array noise reduction using gyroscope information according to an embodiment of the present invention.
  • FIG. 5 is a block diagram of a terminal with a voice processing device according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of beamforming in a mobile phone with a three-microphone array.
  • FIG. 7 is a schematic diagram of the sound receiving model of a microphone array.
  • FIG. 8 is a schematic diagram of the implementation principle of a delay-and-sum beamformer.
  • FIG. 9 is a schematic diagram of the implementation principle of a delay-and-sum beamformer based on Wiener filtering.
  • FIG. 10 is a geometric diagram of the spatial position and orientation of the microphone array line in a handset.
  • FIG. 3 shows a schematic diagram of a voice processing method according to one embodiment of the present invention.
  • The voice processing method may include the following steps. Step 302: acquire a position data change amount of the sound collection unit array on the terminal relative to the user sound source. Step 304: correct the direction of arrival of the sound collection unit array according to the position data change amount. Step 306: filter the sound signal acquired by the sound collection unit.
  • Sound collection unit array signal processing is a space-time signal processing method. Because the voice signal and the various noise signals received by the sound collection units come from different directions in space, taking spatial direction information into account greatly improves signal processing capability. A noise reduction scheme based on an array of several sound collection units aims to extract the sound signal arriving from the direction of the user's sound source and to filter it, thereby reducing noise.
  • To this end, the sound collection unit array forms a beam in space (as shown in FIG. 6) that points in the direction of the user's sound source and filters out sound from other directions.
  • The formation of the beam depends on the position of the sound collection unit array relative to the user's sound source.
  • Because the direction of arrival of the sound collection unit array is corrected according to the position data change amount, the sound signal from the direction of the user's sound source can always be extracted wherever the terminal is relative to that source. In other words, some parameters of the noise reduction algorithm are adjusted adaptively according to the random posture changes during the user's call, and the sound signal acquired by the sound collection unit is filtered, to achieve the best noise reduction effect.
  • The position data change amount of the sound collection unit array is acquired using a gyroscope in the terminal, where the position data change amount includes the displacement change amount of the reference sound collection unit and the angular change amount of the sound collection unit array line.
  • The step of correcting the direction of arrival of the sound collection unit array according to the position data change amount includes: acquiring initial position data, relative to the user sound source, of a reference sound collection unit in the sound collection unit array and of the sound collection unit array line, where the initial position data includes coordinate initial data of the reference sound collection unit and angle initial data of the sound collection unit array line; and calculating, from the initial position data and the position data change amount, the angle of arrival between the sound wave direction of the current user sound source and the preset normal of the sound collection unit array line (that is, determining the direction of arrival).
  • A coordinate system is established with the user sound source as the coordinate origin, and the angle of arrival is calculated according to the following formula: [formula not reproduced in this text], where
  • $\theta_{i+1}$ is the angle of arrival;
  • $(x_{ci}, y_{ci}, z_{ci})$ is the coordinate initial data of the reference sound collection unit in the coordinate system;
  • $(\alpha_i, \beta_i, \gamma_i)$ is the angle initial data of the sound collection unit array line in the coordinate system;
  • $(\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci})$ is the displacement change amount of the reference sound collection unit in the coordinate system;
  • $(\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i)$ is the angular change amount of the sound collection unit array line in the coordinate system.
  • With this formula, the angle of arrival of the microphone array can be calculated as the array's position relative to the user's sound source changes in real time. Because the angle is obtained by evaluating a formula directly, computational complexity is greatly reduced, which lowers the cost of estimating the direction of arrival.
  • The method further includes: acquiring, by automatically searching for the direction of arrival, the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source.
  • That is, the initial direction of arrival is determined by an automatic search, which yields the initial position data $c_0$ $(x_{ci}, y_{ci}, z_{ci})$ of the reference sound collection unit and $v_0$ of the sound collection unit array line relative to the user's sound source.
  • The automatic search means that the direction of arrival is calculated automatically from the moment the user starts to speak after the call is connected.
  • Methods for estimating the direction of arrival from the signals received by the microphone array generally include traditional methods (spectral estimation, linear prediction, and so on), subspace methods (multiple signal classification, the rotational invariance subspace method) and the maximum likelihood method. These are the basic DOA estimation methods and are introduced in the general literature on array signal processing.
  • The traditional methods are computationally simple, but they need a large number of microphone array elements to achieve high resolution, and their direction-of-arrival estimates are less accurate than those of the other two kinds of method. They are clearly unsuitable for the small array installed in a mobile phone. The subspace methods and the maximum likelihood method estimate the direction of arrival better, but their computational load is very large, while the real-time requirement of a mobile phone call is very high.
  • The present invention therefore searches for the direction of arrival automatically only when the initial position data is acquired. In the subsequent adaptive estimation, the direction of arrival is estimated from the position data change amount provided by the gyroscope alone.
  • In the related art the automatic search is used throughout, and because that search is computationally complex, the whole process has poor real-time performance. Because the present invention uses the automatic search only to acquire the initial position data, its real-time performance is better and its processing rate is greatly improved.
  • FIG. 4 is a flow chart showing the hardware and software implementation of multi-microphone array noise reduction by means of gyroscope information, in accordance with one embodiment of the present invention.
  • Step 402 Automatically search for an initial position to form a beam.
  • The automatic direction-of-arrival search is used to find the initial position of the microphone array relative to the speaker and to form a beam.
  • The automatic search starts determining the direction of arrival automatically after the call is connected.
  • Methods for estimating the direction of arrival from the signals received by the microphone array include conventional methods (spectral estimation, linear prediction, and so on), subspace methods (multiple signal classification, the rotational invariance subspace method), and the maximum likelihood method. These are the basic DOA estimation methods and are covered in the general literature on array signal processing.
  • Each of these methods has its own advantages and disadvantages. Conventional methods may be simple to compute, but they need a large number of microphone array elements to obtain a high-resolution speech effect, and their estimates of the direction of arrival are less accurate than those of the latter two classes.
  • The subspace and maximum likelihood methods estimate the direction of arrival well, but their computational cost is very high. For an application with strict real-time requirements such as a phone call, none of these methods can meet the requirement of real-time estimation on the phone.
  • To determine the direction of arrival at the start of the call, however, the subspace method or the maximum likelihood method can be used to estimate the direction of arrival once when the call is connected. The maximum likelihood method is a good choice because it is the optimal method: although its computational cost is the highest, a single computation in the initial stage does not add significant delay to the speech, and, starting from the accurate direction of arrival it provides, the direction information from the gyroscope can then be used to correct the direction of arrival as it changes in real time.
  • That is, the initial position data $c_0 = (x_{ci}, y_{ci}, z_{ci})$ and $v_0 = (\alpha_i, \beta_i, \gamma_i)$ of the sound collection unit and of the sound collection unit array line relative to the user sound source can be acquired by automatically searching for the direction of arrival.
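  • As a rough illustration of a one-time initial search, the sketch below scans candidate angles and keeps the one that maximizes the output power of a beam steered to that angle. For a single source in spatially white noise this is a standard way to obtain the deterministic maximum likelihood estimate, but the patent does not specify its search procedure, so the narrowband model, the uniform linear array, the one-degree grid, and the names `steering_vector` and `search_initial_doa` are all assumptions made for the example.

```python
import cmath
import math


def steering_vector(theta, num_mics, spacing, wavelength):
    """Narrowband steering vector of a uniform linear array (assumed model)."""
    phase = 2.0 * math.pi * spacing * math.sin(theta) / wavelength
    return [cmath.exp(-1j * m * phase) for m in range(num_mics)]


def search_initial_doa(snapshots, spacing, wavelength, grid_deg=range(-90, 91)):
    """Return the grid angle (degrees) with the largest steered output power.

    snapshots: list of complex array snapshots, one value per microphone.
    """
    num_mics = len(snapshots[0])
    best_deg, best_power = None, -1.0
    for deg in grid_deg:
        a = steering_vector(math.radians(deg), num_mics, spacing, wavelength)
        # Output power of the beam steered to this candidate angle,
        # accumulated over all snapshots.
        power = sum(
            abs(sum(ai.conjugate() * xi for ai, xi in zip(a, x))) ** 2
            for x in snapshots
        )
        if power > best_power:
            best_deg, best_power = deg, power
    return best_deg
```

  • The scan touches every grid angle for every snapshot, which illustrates why the text runs a search of this kind once at the start of the call and not continuously.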
  • Step 404 The mobile phone gyroscope obtains a change parameter of the mobile phone position.
  • the position change data is acquired by the gyroscope.
  • Step 406: Calculate the direction of arrival. The changed direction of arrival is calculated from the initial position information and the orientation change.
  • Step 408: Input the calculated direction-of-arrival data into the beamforming algorithm, and form a beam with the microphone array.
  • Step 410: Speech noise reduction processing.
  • The sound signal obtained by the sound collection units is filtered; that is, the speech signal collected through the beam undergoes noise reduction.
  • Step 412: Audio codec processing module.
  • The noise-reduced speech signal is encoded for transmission.
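  • The control flow of steps 402 to 410 can be summarized in a short sketch. It shows only the ordering: one expensive search at the start, then cheap gyroscope-driven updates for every later frame. The function `process_call` and the callables passed to it (`search_initial_pose`, `angle_from_pose`, `beamform`) are hypothetical placeholders, and the encoding of step 412 is left out.

```python
def process_call(frames, pose_changes, search_initial_pose, angle_from_pose, beamform):
    """Run one call: a single initial search, then gyroscope-driven updates.

    frames: audio frames in time order.
    pose_changes: per-frame pose change tuples from the gyroscope
        (the entry for the first frame is ignored).
    search_initial_pose, angle_from_pose, beamform: hypothetical callables
        standing in for steps 402, 406 and 408-410.
    """
    outputs = []
    pose = None
    for frame, change in zip(frames, pose_changes):
        if pose is None:
            # Step 402: automatic search, performed once at call start.
            pose = search_initial_pose(frame)
        else:
            # Step 404: apply the change reported by the gyroscope.
            pose = tuple(p + d for p, d in zip(pose, change))
        # Step 406: direction of arrival from the current pose.
        theta = angle_from_pose(pose)
        # Steps 408-410: steer the beam and filter the frame.
        outputs.append(beamform(frame, theta))
    return outputs
```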
  • FIG. 5 shows a terminal block diagram of a speech processing apparatus in accordance with still another embodiment of the present invention.
  • The speech processing apparatus 500 includes: an acquiring unit 502, configured to acquire a position data change amount of a sound collection unit array on a terminal relative to a user sound source; a correction unit 504, configured to correct the direction of arrival of the sound collection unit array according to the position data change amount; and a processing unit 506, configured to filter the sound signal acquired by the sound collection units.
  • The sound collection unit array signal processing method is a space-time signal processing method. Because the speech signal and the various noise signals received by the sound collection units come from different directions in space, taking spatial direction information into account greatly improves signal processing capability. A noise reduction scheme based on a multi-unit sound collection array has the array extract the sound signal arriving from the direction of the user sound source and ignore noise signals from other directions, thereby reducing noise.
  • More specifically, the sound collection unit array forms a beam in space (as shown in Figure 6), points it in the direction of the user sound source, and filters out sound from other directions.
  • The formation of the beam depends on the position of the sound collection unit array relative to the user sound source.
  • With this technical solution, the direction of arrival of the sound collection unit array is corrected according to the acquired change in the position of the array on the terminal relative to the user sound source. However the position of the terminal changes relative to the user sound source, the sound signal from the direction of the user sound source can always be extracted, which reduces noise. That is, certain parameters of the noise reduction algorithm are adjusted adaptively, at any time, according to random changes in the user's posture during the call, to achieve the best noise reduction effect.
  • The acquiring unit is a gyroscope for acquiring the position data change amount of the sound collection unit array, where the position data change amount includes the displacement change amount of the reference sound collection unit and the angle change amount of the sound collection unit array line.
  • While a terminal such as a mobile phone is in use, the relative position of the sound source and the sound collection units changes randomly. Many mobile phones are now equipped with a gyroscope, which can provide accurate acceleration and angle change information. The present invention therefore uses the gyroscope to acquire the position data change amount of the sound collection unit array, which yields accurate values and makes full use of hardware already present in the terminal. No additional hardware is needed, so the noise reduction effect improves while hardware cost is reduced.
  • The correction unit 504 includes: an initial position detecting unit 5042, which acquires initial position data, relative to the user sound source, of a reference sound collection unit in the sound collection unit array and of the sound collection unit array line, where the initial position data includes initial coordinate data of the reference sound collection unit and initial angle data of the array line; and an angle-of-arrival calculation unit 5044, which calculates, from the initial position data and the position data change amount, the angle of arrival between the sound wave direction of the current user sound source and the preset normal of the array line, so that the direction of arrival of the sound collection unit array is determined from the angle of arrival.
  • When the relative position of the sound source and the sound collection units changes, the new angle of arrival between the changed sound source and the preset normal of the array line can be calculated from the position change data provided by the gyroscope. This determines the changed direction of arrival and forms a new beam, so that the direction of arrival of the microphone array points at the user sound source and the acquired sound signal is mainly the speech signal of the sound source.
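  • The angle-of-arrival calculation unit establishes a coordinate system with the user sound source as the coordinate origin and calculates the angle of arrival according to a formula (an image in the source that is not reproduced in this text), in which $\theta_{i+1}$ is the angle of arrival, $(x_{ci}, y_{ci}, z_{ci})$ is the initial coordinate data of the reference sound collection unit in the coordinate system, $(\alpha_i, \beta_i, \gamma_i)$ is the initial angle data of the sound collection unit array line in the coordinate system, $(\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci})$ is the displacement change amount of the reference sound collection unit in the coordinate system, and
  • $(\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i)$ is the angle change amount of the sound collection unit array line in the coordinate system.
  • With this simple formula, the angle of arrival of the microphone array, which changes in real time relative to the user sound source, can be calculated. Computational complexity is greatly reduced, which shortens the time needed to estimate the direction of arrival.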
  • The initial position detecting unit 5042 acquires the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source by automatically searching for the direction of arrival.
  • The initial position data $c_0$ and $v_0$ of the sound collection unit relative to the user sound source is obtained by the automatic search, which determines the initial direction of arrival.
  • When the position of the reference sound collection unit changes relative to the user sound source, the direction of arrival is corrected so that it always points at the sound source, which reduces noise.
  • The multi-microphone array signal processing method takes the spatial information of signals into account and is a space-time signal processing method. Because the speech signal and the various noise signals received by the microphones come from different directions in space, taking spatial direction information into account greatly improves signal processing capability, especially in applications that extract a signal from a particular direction in space.
  • The noise reduction scheme based on a multi-microphone array is exactly such an application: the microphone array extracts the sound signal arriving from the direction of the sound source (the mouth) and ignores noise signals from other directions, thereby reducing noise.
  • FIG. 6 is a schematic diagram of beamforming of a mobile phone having a three-microphone array.
  • In the figure, three microphones (shown as black dots) are placed on the mobile phone to form an array.
  • The beam formed by the array signal processing method for noise reduction is shown by the ripples in the figure. The rippled region is the ideal reception range for the speech signal,
  • meaning that the microphone array receives sound only from the direction of the user's mouth and automatically filters out noise interference from other directions.
  • the two main directions studied in the field of array signal processing are beamforming and direction of arrival estimation, and the array signal processing method for speech noise reduction is actually a problem of beamforming.
  • Speech noise reduction on a mobile phone depends mainly on the spatial difference between the desired speech signal and the noise and interference signals, so noise reduction applications for phones with multi-unit sound collection arrays mostly use beamforming algorithms based on the spatial reference method.
  • Their basic ideas are similar.
  • The most basic beamforming principle based on the spatial reference method is introduced first, and its shortcomings for mobile phone noise reduction are then explained.
  • Finally, the improvement of the present invention, based on the orientation information from the phone's gyroscope, is proposed.
  • the sound collection unit in the following introduction uses a microphone as an example.
  • A multi-microphone array signal processing algorithm first involves the array configuration, that is, how the microphones are positioned. Common configurations include linear arrays with uniform or non-uniform spacing, circular planar arrays, and three-dimensional arrays. Because of the structure and size limits of a mobile phone, however, the array built into a phone is a uniform linear array.
  • The array generally has two or three microphones, and at most four, equally spaced at the bottom of the phone to pick up the various sound signals, as shown in Figure 7.
  • The first microphone is taken as the reference; the delay of each other microphone relative to this reference microphone is given by a formula that is not reproduced in this text.
  • The direction vector of the microphone array follows from these delays (formula not reproduced in this text).
  • The direction vector depends only on the spatial angle, so the direction vector of the array can be written as $a(\theta)$, which is independent of the position of the reference point.
  • The output of the microphones can be written as a vector (formula not reproduced in this text).
  • That formula is the generation model of the microphone array signal $x(t)$, in which the spatial angle $\theta$ is a known quantity.
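  • For orientation, the textbook far-field, narrowband model of a uniform linear array has the form below. It is offered as an assumption about what the formulas for the delay, the direction vector, and the array output express, not as a transcription of them. The symbols $\delta$ (element spacing), $v_s$ (speed of sound), $\omega$ (angular frequency), $M$ (number of microphones), $s(t)$ (source signal) and $n(t)$ (noise vector) are introduced here for illustration only.

```latex
\begin{align}
  \tau_m &= \frac{(m-1)\,\delta \sin\theta}{v_s}, \qquad m = 1, \dots, M, \\
  a(\theta) &= \bigl[\, 1,\; e^{-j\omega\tau_2},\; \dots,\; e^{-j\omega\tau_M} \,\bigr]^{T}, \\
  x(t) &= a(\theta)\, s(t) + n(t).
\end{align}
```

  • In this form the first entry of $a(\theta)$ is 1 because the first microphone is the reference, and the remaining entries depend only on $\theta$ through the delays $\tau_m$.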
  • Beamforming can be used to extract the desired sound source signal from the microphone signals.
  • It works by spatially filtering the signals of the array elements so as to enhance the desired signal and suppress interference,
  • and the weighting factor of each element's signal can change adaptively with the signal environment.
  • The microphones used here are all omnidirectional, but after the signals of the array elements are weighted and summed, the reception of the array can be concentrated in one direction; that is, a beam is formed.
  • The basic idea of beamforming is to steer the array beam in one direction by weighting and summing the microphone signals, namely the direction that gives the maximum output power for the desired signal.
  • Concretely, a suitable delay compensation is applied to each picked-up signal so that all output signals are synchronized for the direction $\theta$, which gives the array its maximum gain for a signal incident from $\theta$.
  • The microphone signals are also weighted by weight coefficients, which tapers the beam formed by the array so that signals from different directions receive different gains. This spatial filtering separates the source signals arriving from different directions in space,
  • extracting the desired speech signal and reducing noise.
  • The most basic approaches use a delay-and-sum beamformer or a delay-and-sum beamformer with Wiener filtering.
  • The implementation flowcharts of the two beamformers are shown in Figures 8 and 9, respectively.
  • In Fig. 8 the parameters are already determined, with values that depend on the spatial reference angle $\theta$. The parameters in Fig. 9 must be obtained by optimization; their values also depend on $\theta$ and are properly written as functions of $\theta$.
  • The goal is to maximize the output power of the beamformer, whose output is given by equation (3) (not reproduced in this text).
  • An objective function can be established on this basis and optimized to maximize the beamformer output power. The weight coefficients $w$ obtained in the solution process are the optimal parameters, which establishes the beamformer shown in Fig. 8. The beamformer of Fig. 9 is solved similarly, except that the parameter estimation method 904 of the Wiener filter is needed to establish the final Wiener filter 902.
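  • A delay-and-sum beamformer of the kind described here can be sketched in a few lines. The sketch assumes a narrowband plane wave and a uniform linear array, uses phase shifts in place of time delays, and omits the Wiener filter of Fig. 9. The names `delay_and_sum_weights`, `beamformer_output` and `array_gain` are hypothetical.

```python
import cmath
import math


def delay_and_sum_weights(theta, num_mics, spacing, wavelength):
    """Weights that phase-align a plane wave arriving from angle theta."""
    phase = 2.0 * math.pi * spacing * math.sin(theta) / wavelength
    return [cmath.exp(-1j * m * phase) / num_mics for m in range(num_mics)]


def beamformer_output(weights, snapshot):
    """Weighted sum of one array snapshot (conjugated weights times inputs)."""
    return sum(w.conjugate() * x for w, x in zip(weights, snapshot))


def array_gain(look_theta, source_theta, num_mics, spacing, wavelength):
    """Magnitude response of a beam steered to look_theta for a unit
    plane wave arriving from source_theta."""
    w = delay_and_sum_weights(look_theta, num_mics, spacing, wavelength)
    phase = 2.0 * math.pi * spacing * math.sin(source_theta) / wavelength
    x = [cmath.exp(-1j * m * phase) for m in range(num_mics)]
    return abs(beamformer_output(w, x))
```

  • With three microphones spaced 0.08 m apart and a wavelength of 0.17 m, the gain is 1 when the look direction equals the source direction and falls well below 1 for a source 60 degrees off the look direction, which is the spatial filtering effect the text describes. It also shows why an inaccurate reference angle reduces the gain for the desired speech.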
  • A fixed beamforming algorithm (such as the one described above) can be derived from the distance and direction parameters set by this hardware
  • and used for speech noise reduction, so that the best noise reduction effect is achieved at all times. But this is a highly idealized situation.
  • The location of the sound source is fixed (because the main source picked up during a phone call is the caller's voice, not external voices or interfering noise).
  • DOA estimation methods are based on parameter estimation, such as maximum likelihood estimation and maximum entropy estimation, so the estimated direction of arrival $\theta$ may not be very accurate.
  • A good beamformer, as described earlier, relies on an accurate reference angle $\theta$, so an inaccurate estimate of $\theta$ degrades the beamformer and, in turn, the speech noise reduction effect.
  • the present invention proposes to utilize the information provided by the gyroscope to help beamforming achieve the purpose of noise reduction, and can well solve the technical problems mentioned above.
  • Many mobile phones are now equipped with a gyroscope, which provides very accurate information on motion direction, acceleration, and angle change. The gyroscope can therefore be used to obtain the position data change amount of the sound collection unit array and so determine
  • the direction of arrival, where the position data change amount includes the displacement change amount and the angle change amount.
  • Because the gyroscope computes position information quickly and accurately without occupying the phone's system resources, it solves the above problem well: instead of running a DOA estimation algorithm, the hardware is used directly to calculate the direction-of-arrival angle $\theta$, and a beamformer is then established to achieve good noise reduction.
  • On mobile phones equipped with a multi-microphone array, the microphones are generally at the bottom of the phone in a uniform linear arrangement, usually 2 to 4 of them.
  • In an array of three microphones, the three microphones at the bottom form a straight line that lies in the plane of the phone screen, so the distance and rotation angle of the line change as the whole phone moves or rotates. The displacement and angle changes of the phone are recorded by the gyroscope, so the gyroscope's measurements
  • are the direction change data of the microphone array and can be used to determine the change in the direction of the sound source.
  • The microphone at the far right of the array is taken as the reference, shown as dots 1002 and 1004 in Figure 10. Figure 10 shows a spatial coordinate system in which the position of the microphone array, represented by the two thick black lines, moves and rotates with the phone.
  • The coordinate system abstracts the direction and distance relationship between the sound source 1006 and the microphone array during a call, which simplifies analysis of the algorithm.
  • Changes in that relationship are expressed as changes in the relationship between the thick black line and the origin of the coordinate system.
  • Each thick black line represents the straight line connecting the microphones of the array, and its length is d.
  • The two black lines in the figure show the microphone array line before and after the user changes the orientation of the phone during the call.
  • The upper line is the position before the change, and the lower line is the position after the change.
  • Before the change, the direction of arrival (that is, the reference direction angle described above) is $\theta_i$ and the reference microphone is at position $c_i$.
  • After the change, the direction of arrival is $\theta_{i+1}$ and the reference microphone is at position $c_{i+1}$.
  • The spatial coordinates of these positions, the position $b_{i+1}$ of the microphone at the other end of the array, and the orientation coordinates of the microphone array line are defined by formulas that are not reproduced in this text.
  • The position of the reference microphone changes from $c_i$ to $c_{i+1}$, and its displacement vector is recorded by a formula that is not reproduced in this text.
  • The geometric relationship in Fig. 10 is in effect determined by the displacement and orientation change variables.
  • That is, from the information available before the phone's position changes during the call and from the displacement and direction changes of the microphone array provided by the gyroscope, the displacement and orientation of the phone after the change are obtained, and from these the direction of arrival $\theta_{i+1}$ of the sound source at that moment.
  • The angle value $\theta_{i+1}$ of the direction of arrival is derived from these spatial parameters. As Fig. 10 shows, the origin, $b_i$, $c_i$ and the origin, $b_{i+1}$, $c_{i+1}$ form two triangles in three-dimensional space, and the relationship between the angles and sides of a triangle gives the angle (formula not reproduced in this text).
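  • The triangle relationship can be illustrated with the law of cosines. The sketch below is one possible reading of the geometry, not the formula of the patent: it assumes the source is at the origin, measures the angle at the reference microphone, and converts the interior angle of the triangle into an angle from the normal of the array line. The function name `angle_of_arrival_from_triangle` is hypothetical.

```python
import math


def angle_of_arrival_from_triangle(b, c):
    """Angle of arrival (radians) at the reference microphone c.

    b: position of the microphone at the other end of the array line.
    c: position of the reference microphone.
    The source is assumed to be at the origin. The result is the angle
    between the direction to the source and the normal of the array line,
    positive when the source lies toward b.
    """
    side_oc = math.sqrt(sum(x * x for x in c))  # origin to c
    side_ob = math.sqrt(sum(x * x for x in b))  # origin to b
    side_cb = math.sqrt(sum((p - q) ** 2 for p, q in zip(b, c)))  # line length d
    if side_oc == 0.0 or side_cb == 0.0:
        raise ValueError("degenerate triangle")

    # Law of cosines for the interior angle at vertex c.
    cos_phi = (side_oc ** 2 + side_cb ** 2 - side_ob ** 2) / (2.0 * side_oc * side_cb)
    phi = math.acos(max(-1.0, min(1.0, cos_phi)))

    # Convert the interior angle to an angle measured from the normal.
    return math.pi / 2.0 - phi
```

  • Applying the same function to $(b_i, c_i)$ and to $(b_{i+1}, c_{i+1})$ gives the angle before and after the phone moves, one triangle per pose.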
  • The information provided by the gyroscope thus determines how the direction of arrival changes. As long as the position and direction information $c_0$ and $v_0$ of the microphone array at the start of the call is known, the initial direction of arrival and the direction of arrival $\theta_i$ after every subsequent change in the phone's attitude can be determined from the orientation changes provided by the gyroscope. Without the gyroscope information, a more complicated beamforming method and direction-of-arrival estimation algorithm would be needed.
  • The initial position data can be obtained with an automatic direction-of-arrival estimation algorithm. Although that algorithm is used for the initial acquisition, the direction of arrival is estimated by means of the gyroscope while the phone's position subsequently changes.
  • Compared with using the automatic direction-of-arrival estimation algorithm throughout, the speech processing method of the invention is much faster, has good real-time performance, reduces the load on the terminal processor, and, more importantly, achieves better noise reduction.
  • A program product for speech processing, stored on a non-volatile machine-readable medium, the program product comprising machine-executable instructions for causing a computer system to perform the following steps: acquiring a position data change amount of the sound collection unit array on the terminal relative to the user sound source; and correcting the direction of arrival of the sound collection unit array according to the position data change amount.
  • A non-volatile machine-readable medium storing a program product for speech processing, the program product comprising machine-executable instructions for causing a computer system to perform the following steps: acquiring a position data change amount of the sound collection unit array on the terminal relative to the user sound source; and correcting the direction of arrival of the sound collection unit array according to the position data change amount.
  • A machine-readable program that causes a machine to perform the speech processing method of any one of the technical solutions described above.
  • A storage medium storing a machine-readable program, wherein the machine-readable program causes a machine to perform the speech processing method of any one of the technical solutions described above.
  • the terminal uses the gyroscope to obtain the terminal orientation change information during the call, and uses the information to timely correct certain parameters in the voice denoising algorithm based on the multi-microphone array.
  • the noise reduction algorithm is adaptive, and can adaptively adjust the noise reduction algorithm according to the random change of the posture during the user's conversation to achieve the best noise reduction effect.
  • Because the terminal orientation change information comes directly from the gyroscope, dependence on the terminal processor is greatly reduced, which further reduces power consumption.

Abstract

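A speech processing method and apparatus, wherein the speech processing method comprises: acquiring a position data change amount of a sound collection unit array on a terminal relative to a user sound source (302); correcting the direction of arrival of the sound collection unit array according to the position data change amount (304); and filtering the sound signal acquired by the sound collection units (306). With this method, a gyroscope is used to obtain information on changes in the terminal's orientation during a call, and this information is used to correct, in a timely manner, certain parameters of a speech noise reduction algorithm based on a multi-microphone array. The noise reduction algorithm thereby becomes adaptive: it adjusts certain of its parameters at any time according to random changes in the user's posture during the call, achieving the best noise reduction effect while greatly reducing the use of terminal resources.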

Description

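Speech processing method and speech processing apparatus
Technical Field
The present invention relates to the field of communication technology, and in particular to a speech processing method and a speech processing apparatus.
Background
To improve the voice call quality of mobile phones, many handset manufacturers increase the number of microphones. Existing multi-microphone terminals are mainly two-microphone terminals and three-microphone terminals (not shown); a two-microphone terminal is shown in Fig. 1. In either kind, one microphone (microphone 1 in Fig. 1) mainly collects the voice signal and the other microphones (microphone 2 in Fig. 1) mainly collect the noise signal. A suitable adaptive algorithm then removes the noise signal of microphone 2 from the signal of microphone 1, so that clear speech is transmitted.
Unlike the noise reduction scheme above, some handset manufacturers have recently begun to consider speech noise reduction based on a multi-microphone array to denoise the noisy speech signal collected during a call and obtain a clean speech signal. It is implemented by building several microphones into the phone, generally two to four microphones placed side by side at the bottom of the phone (as shown in Fig. 2) with a certain distance between them, forming a microphone array. The signals received by the microphones are then filtered using array signal processing to reduce noise. Because it filters the array signals received by multiple microphones, this technique is a more advanced and more adaptable noise reduction scheme for mobile phones than adaptive noise cancellation.
Multi-microphone array signal processing is a modern signal processing method and a space-time technique: the algorithm must consider how the signal changes not only over time but also in space, so the computation is very complex. Because a phone call is a real-time process, noise reduction with a multi-microphone array algorithm should process the received speech quickly and keep delay to a minimum. However, users often change posture during a call, which changes the distance and direction between the phone and the user's sound source, so the spatial characteristics of the received signal also change, randomly and unpredictably. Under such conditions, if a noise reduction algorithm based on array signal processing does not continually correct its parameters related to signal direction, its noise reduction performance degrades; that is, it cannot achieve good noise reduction in the changed direction. Making the algorithm adapt quickly to environmental changes requires an enormous amount of computation, which severely challenges the computing capability of the phone hardware and greatly increases energy consumption. Applying such a scheme on a mobile phone would then be impractical and might not give users a good experience: either noise reduction is poor or the phone's resources are heavily consumed.
Summary of the Invention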
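In view of the above problems, the present invention proposes a new speech processing method that obtains information on changes in the terminal's orientation during a call and uses it to correct, in a timely manner, certain parameters of a speech noise reduction algorithm based on a multi-microphone array. The noise reduction algorithm thereby becomes adaptive and adjusts certain of its parameters at any time according to random changes in the user's posture during the call, achieving the best noise reduction effect.
Accordingly, one aspect of the present invention provides a speech processing method, comprising: acquiring a position data change amount of a sound collection unit array on a terminal relative to a user sound source; correcting the direction of arrival of the sound collection unit array according to the position data change amount; and filtering the sound signal acquired by the sound collection units.
The sound collection unit array signal processing method is a space-time signal processing method. Because the speech signal and the various noise signals received by the sound collection units come from different directions in space, taking spatial direction information into account greatly improves signal processing capability. A noise reduction scheme based on a multi-unit sound collection array has the array extract the sound signal arriving from the direction of the user sound source and ignore noise signals from other directions, thereby reducing noise.
More specifically, the sound collection unit array forms a beam in space, points it in the direction of the user sound source, and filters out sound from other directions. The formation of the beam depends on the position of the sound collection unit array relative to the user sound source. With this technical solution, the direction of arrival of the sound collection unit array is corrected according to the acquired change in the position of the array on the terminal relative to the user sound source. However the position of the terminal changes relative to the user sound source, the sound signal from the direction of the user sound source can always be extracted, which reduces noise. That is, certain parameters of the noise reduction algorithm are adjusted adaptively, at any time, according to random changes in the user's posture during the call, to achieve the best noise reduction effect.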
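In the above technical solution, preferably, a gyroscope in the terminal is used to acquire the position data change amount of the sound collection unit array, where the position data change amount includes the displacement change amount of the reference sound collection unit and the angle change amount of the sound collection unit array line.
With this technical solution, while a terminal such as a mobile phone is in use, the relative position of the sound source and the sound collection units changes randomly. Many mobile phones are now equipped with a gyroscope, which can provide accurate acceleration and angle change information. The present invention therefore uses the gyroscope to acquire the position data change amount of the sound collection unit array, which yields accurate values and makes full use of hardware already present in the terminal. No additional hardware is needed, so the noise reduction effect improves while hardware cost is reduced.
In the above technical solution, preferably, the step of correcting the direction of arrival of the sound collection unit array according to the position data change amount comprises: acquiring initial position data, relative to the user sound source, of a reference sound collection unit in the sound collection unit array and of the sound collection unit array line, where the initial position data includes initial coordinate data of the reference sound collection unit and initial angle data of the array line; and calculating, from the initial position data and the position data change amount, the angle of arrival (which may also be called the direction of arrival) between the sound wave direction of the current user sound source and the preset normal of the array line.
When the relative position of the sound source and the sound collection units changes, the new angle of arrival between the changed sound source and the preset normal of the array line can be calculated from the position change data provided by the gyroscope. This determines the changed direction of arrival and forms a new beam, so that the direction of arrival of the microphone array points at the user sound source and the acquired sound signal is mainly the speech signal of the sound source.
In the above technical solution, preferably, a coordinate system is established with the user sound source as the coordinate origin, and the angle of arrival is calculated according to the following formula: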
Figure imgf000006_0001
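where $\theta_{i+1}$ is the angle of arrival, $(x_{ci}, y_{ci}, z_{ci})$ is the initial coordinate data of the reference sound collection unit in the coordinate system, $(\alpha_i, \beta_i, \gamma_i)$ is the initial angle data of the sound collection unit array line in the coordinate system, $(\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci})$ is the displacement change amount of the reference sound collection unit in the coordinate system, and $(\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i)$ is the angle change amount of the sound collection unit array line in the coordinate system.
With the simple formula above, the angle of arrival of the microphone array, which changes in real time relative to the user sound source, can be calculated. Because the formula is simple, computational complexity is greatly reduced, which shortens the time needed to estimate the direction of arrival.
In the above technical solution, preferably, the method further includes: acquiring the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source by automatically searching for the direction of arrival.
With this technical solution, the initial position data $c_0$ and $v_0$ of the sound collection unit and of the array line relative to the user sound source are obtained by automatically searching for the direction of arrival, which determines the initial direction of arrival. In other words, the automatic search yields the initial position data $c_0 = (x_{ci}, y_{ci}, z_{ci})$ and $v_0 = (\alpha_i, \beta_i, \gamma_i)$. The automatic search starts computing the direction of arrival from the moment the user begins to speak after the call is connected. Methods for estimating the direction of arrival from the signals received by the microphone array include conventional methods (spectral estimation, linear prediction, and so on), subspace methods (multiple signal classification, the rotational invariance subspace method), and the maximum likelihood method. These are basic direction-of-arrival estimation methods and are covered in the general literature on array signal processing. Each has strengths and weaknesses. Conventional methods may be simple to compute, but they need a large number of microphone array elements to obtain a high-resolution speech effect, and their estimates of the direction of arrival are less accurate than those of the latter two classes, so they are clearly unsuitable for a small array such as the one installed in a phone. The subspace and maximum likelihood methods estimate the direction of arrival well, but their computational cost is very high; for an application with strict real-time requirements such as a phone call, none of these methods can meet the requirement of real-time estimation on the phone. To determine the direction of arrival of the microphone array at the start of the call, however, the subspace method or the maximum likelihood method can be used to estimate the direction of arrival once when the call is connected. The maximum likelihood method is a good choice because it is the optimal method: although its computational cost is the highest, a single computation in the initial stage does not add significant delay to the speech, and, starting from the accurate direction of arrival it provides, the direction information from the gyroscope can then be used to correct the direction of arrival as it changes in real time.
When the position of the reference sound collection unit changes relative to the user sound source, the direction of arrival is corrected according to the change amount provided by the gyroscope, so that it always points at the sound source and noise is reduced. The present invention therefore searches for the direction of arrival automatically only when the initial position data is acquired; in the subsequent adaptive estimation, the direction of arrival is estimated from the position data change amount provided by the gyroscope alone. In the related art the automatic search is used throughout, and because that search is computationally complex, the whole process has poor real-time performance. Because the present invention uses the automatic search only to acquire the initial position data, its real-time performance is better and its processing rate is greatly improved.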
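According to yet another aspect of the present invention, a speech processing apparatus is also provided, comprising: an acquiring unit, configured to acquire a position data change amount of a sound collection unit array on a terminal relative to a user sound source; a correction unit, configured to correct the direction of arrival of the sound collection unit array according to the position data change amount; and a processing unit, configured to filter the sound signal acquired by the sound collection units.
The sound collection unit array signal processing method is a space-time signal processing method. Because the speech signal and the various noise signals received by the sound collection units come from different directions in space, taking spatial direction information into account greatly improves signal processing capability. A noise reduction scheme based on a multi-unit sound collection array has the array extract the sound signal arriving from the direction of the user sound source and ignore noise signals from other directions, thereby reducing noise.
More specifically, the sound collection unit array forms a beam in space, points it in the direction of the user sound source, and filters out sound from other directions. The formation of the beam depends on the position of the sound collection unit array relative to the user sound source. With this technical solution, the direction of arrival of the sound collection unit array is corrected according to the acquired change in the position of the array on the terminal relative to the user sound source. However the position of the terminal changes relative to the user sound source, the sound signal from the direction of the user sound source can always be extracted, which reduces noise. That is, certain parameters of the noise reduction algorithm are adjusted adaptively, at any time, according to random changes in the user's posture during the call, to achieve the best noise reduction effect.
In the above technical solution, preferably, the acquiring unit is a gyroscope for acquiring the position data change amount of the sound collection unit array, where the position data change amount includes the displacement change amount of the reference sound collection unit and the angle change amount of the sound collection unit array line.
With this technical solution, while a terminal such as a mobile phone is in use, the relative position of the sound source and the sound collection units changes randomly. Many mobile phones are now equipped with a gyroscope, which can provide accurate acceleration and angle change information. The present invention therefore uses the gyroscope to acquire the position data change amount of the sound collection unit array, which yields accurate values and makes full use of hardware already present in the terminal. No additional hardware is needed, so the noise reduction effect improves while hardware cost is reduced.
In the above technical solution, preferably, the correction unit includes: an initial position detecting unit, which acquires initial position data, relative to the user sound source, of a reference sound collection unit in the sound collection unit array and of the sound collection unit array line, where the initial position data includes initial coordinate data of the reference sound collection unit and initial angle data of the array line; and an angle-of-arrival calculation unit, which calculates, from the initial position data and the position data change amount, the angle of arrival between the sound wave direction of the current user sound source and the preset normal of the array line, so that the direction of arrival of the sound collection unit array is determined from the angle of arrival.
When the relative position of the sound source and the sound collection units changes, the new angle of arrival between the changed sound source and the preset normal of the array line can be calculated from the position change data provided by the gyroscope. This determines the changed direction of arrival and forms a new beam, so that the direction of arrival of the microphone array points at the user sound source and the acquired sound signal is mainly the speech signal of the sound source.
In the above technical solution, preferably, the angle-of-arrival calculation unit establishes a coordinate system with the user sound source as the coordinate origin and calculates the angle of arrival according to the following formula: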
Figure imgf000009_0001
其中, 为所述波达角度, (¾, yri, zri)为所述参考声音采集单元在 所述坐标系中的坐标初始数据, (αζ·, Α·, )是所述声音采集单元阵列线在 所述坐标系中的角度初始数据, (Δ^., Δ^., Δ^.)是所述参考声音采集单元在 所述坐标系中的位移变化量, (Δ%, Δ , Δ^)是所述声音采集单元阵列线在 所述坐标系中的角度变化量。
通过上面筒单的计算公式就能够计算出麦克风阵列相对于用户发声源 实时变化的波达角度, 由于计算公式筒单, 因此大大降低了计算复杂度, 从而减少了波达方向估计时间。
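In the above technical solution, preferably, the initial position detection unit uses automatic DOA search to obtain the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source.
Automatic DOA search is used to obtain the initial position data $c_0$ and $v_0$ of the sound collection unit and of the sound collection unit array line relative to the user sound source, and thereby to determine the initial direction of arrival. That is, automatic DOA search yields the initial position data $c_0$ (coordinates of the form $(x_{ci}, y_{ci}, z_{ci})$) and $v_0$ (angles of the form $(\alpha_i, \beta_i, \gamma_i)$). When the position of the reference sound collection unit relative to the user sound source changes, the direction of arrival is corrected according to the variation provided by the gyroscope, so that it always points toward the sound source and noise is reduced. The present invention therefore uses automatic DOA search only to obtain the initial position data; afterwards, the adaptive direction of arrival is estimated solely from the position data variation provided by the gyroscope. The related art uses automatic DOA search throughout, and because that search is computationally complex, the whole process has poor real-time performance. Because the present invention uses automatic DOA search only when obtaining the initial position data, its real-time performance is better and its processing speed is greatly improved.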
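According to another aspect of the present invention, a program product stored on a non-volatile machine-readable medium is provided for speech processing. The program product includes machine-executable instructions that cause a computer system to perform the following steps: obtaining the position data variation of a sound collection unit array on a terminal relative to a user sound source; and correcting the direction of arrival of the sound collection unit array according to the position data variation.
According to another aspect of the present invention, a non-volatile machine-readable medium is provided that stores a program product for speech processing. The program product includes machine-executable instructions that cause a computer system to perform the following steps: obtaining the position data variation of a sound collection unit array on a terminal relative to a user sound source; and correcting the direction of arrival of the sound collection unit array according to the position data variation.
According to a further aspect of the present invention, a machine-readable program is provided that causes a machine to perform the speech processing method of any of the technical solutions described above.
According to a further aspect of the present invention, a storage medium storing a machine-readable program is provided, wherein the machine-readable program causes a machine to perform the speech processing method of any of the technical solutions described above.
The present invention uses the displacement and orientation change information that the gyroscope provides as the phone's attitude changes during a call to give phones with a multi-microphone array a better noise reduction effect. Noise reduction modules for multi-microphone arrays generally place high demands on phone hardware because they need considerable computing power; estimating the direction of arrival before beamforming is particularly complex. By using the phone orientation changes supplied by the gyroscope, the present invention calculates the direction of arrival accurately and quickly. Only one mathematical expression has to be evaluated, with no complex iteration or estimation algorithm, so the microphone array can adaptively stay aimed at the desired sound source, the mouth, and the noise reduction effect of the array is improved.
Brief Description of the Drawings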
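FIG. 1 is a schematic diagram of the microphone layout of a dual-microphone terminal;
FIG. 2 is a schematic diagram of the microphone layout of a three-microphone terminal;
FIG. 3 is a schematic diagram of a speech processing method according to an embodiment of the present invention;
FIG. 4 is a flowchart of a software and hardware implementation of multi-microphone array noise reduction using gyroscope information according to an embodiment of the present invention;
FIG. 5 is a terminal block diagram of a speech processing apparatus according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of beamforming in a phone with a three-microphone array;
FIG. 7 is a schematic diagram of the sound reception model of a microphone array;
FIG. 8 is a schematic diagram of the operating principle of a delay-and-sum beamformer;
FIG. 9 is a schematic diagram of the operating principle of a delay-and-sum beamformer based on Wiener filtering;
FIG. 10 is a geometric diagram of the change in spatial position and direction of the microphone array line in a phone.
Detailed Description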
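To make the above objects, features and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and specific embodiments. Provided they do not conflict, the embodiments of this application and the features in those embodiments may be combined with one another.
Many specific details are set out in the following description to give a full understanding of the present invention. The invention can, however, also be implemented in ways other than those described here, so its scope of protection is not limited by the specific embodiments disclosed below.
FIG. 3 is a schematic diagram of a speech processing method according to an embodiment of the present invention.
As shown in FIG. 3, the speech processing method according to an embodiment of the present invention may include the following steps. Step 302: obtain the position data variation of a sound collection unit array on a terminal relative to a user sound source. Step 304: correct the direction of arrival of the sound collection unit array according to the position data variation. Step 306: filter the sound signals obtained by the sound collection units.
Signal processing with a sound collection unit array is a space-time signal processing method. Because the speech signal and the various noise signals received by the sound collection units come from different directions in space, taking spatial direction into account greatly improves the signal processing capability. A noise reduction scheme based on an array of multiple sound collection units aims to extract from space the sound signal coming from the direction of the user sound source and to filter that signal, thereby reducing noise.
More specifically, the sound collection unit array forms a beam in space (as shown in FIG. 6) that points toward the user sound source and filters out sound from other directions. The formation of the beam depends on the position of the sound collection unit array relative to the user sound source. With this technical solution, the direction of arrival of the sound collection unit array is corrected according to the obtained variation in the position of the array relative to the user sound source. However the terminal moves relative to the user sound source, the sound signal from the direction of the user sound source can always be extracted, which reduces noise. In other words, certain parameters of the noise reduction algorithm are adjusted adaptively and at any time to follow the random changes in the user's posture during a call, and the sound signals obtained by the sound collection units are filtered, so as to achieve the best noise reduction effect.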
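In the above technical solution, preferably, the gyroscope in the terminal is used to obtain the position data variation of the sound collection unit array, wherein the position data variation includes the displacement variation of the reference sound collection unit and the angle variation of the sound collection unit array line.
In the above technical solution, preferably, the step of correcting the direction of arrival of the sound collection unit array according to the position data variation includes: obtaining the initial position data of the reference sound collection unit in the sound collection unit array and of the sound collection unit array line relative to the user sound source, wherein the initial position data includes the initial coordinate data of the reference sound collection unit and the initial angle data of the sound collection unit array line; and calculating, from the initial position data and the position data variation, the current arrival angle between the sound wave direction of the user sound source and the preset normal of the sound collection unit array line (which determines the direction of arrival).
In the above technical solution, preferably, a coordinate system is established with the user sound source as the origin, and the arrival angle is calculated according to the following formula:
$$\cos\theta_{i+1} = \frac{(x_{ci}+\Delta x_{ci})\cos(\alpha_i+\Delta\alpha_i) + (y_{ci}+\Delta y_{ci})\cos(\beta_i+\Delta\beta_i) + (z_{ci}+\Delta z_{ci})\cos(\gamma_i+\Delta\gamma_i)}{\sqrt{(x_{ci}+\Delta x_{ci})^2 + (y_{ci}+\Delta y_{ci})^2 + (z_{ci}+\Delta z_{ci})^2}}$$
where $\theta_{i+1}$ is the arrival angle, $(x_{ci}, y_{ci}, z_{ci})$ is the initial coordinate data of the reference sound collection unit in the coordinate system, $(\alpha_i, \beta_i, \gamma_i)$ is the initial angle data of the sound collection unit array line in the coordinate system, $(\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci})$ is the displacement variation of the reference sound collection unit in the coordinate system, and $(\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i)$ is the angle variation of the sound collection unit array line in the coordinate system.
This simple formula gives the arrival angle of the microphone array relative to the user sound source as it changes in real time. Because the formula is simple, the computational complexity is greatly reduced and the time needed to estimate the direction of arrival is shortened.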
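The arrival angle formula can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the patent's implementation: the function name `updated_arrival_angle`, the tuple-based inputs, the use of radians, and the clamping of the cosine against rounding error are all choices made for this sketch.

```python
import math

def updated_arrival_angle(c, v, dc, dv):
    """Return the arrival angle theta_{i+1} in radians.

    c  -- (x, y, z) of the reference microphone, sound source at the origin
    v  -- (alpha, beta, gamma): angles between the array line and the axes
    dc -- displacement variation of the reference microphone
    dv -- angle variation of the array line
    All names and conventions here are assumptions of this sketch.
    """
    x, y, z = (ci + dci for ci, dci in zip(c, dc))
    a, b, g = (vi + dvi for vi, dvi in zip(v, dv))
    norm = math.sqrt(x * x + y * y + z * z)
    if norm == 0.0:
        raise ValueError("reference microphone coincides with the sound source")
    cos_theta = (x * math.cos(a) + y * math.cos(b) + z * math.cos(g)) / norm
    # Clamp to [-1, 1] so rounding error cannot push acos out of its domain.
    return math.acos(max(-1.0, min(1.0, cos_theta)))
```

Each update costs three cosines, one square root and one arccosine, which is the point the text makes about avoiding an iterative estimator during the call.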
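In the above technical solution, preferably, the method further includes: using automatic DOA search to obtain the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source.
Automatic DOA search is used to obtain the initial position data $c_0$ and $v_0$ of the sound collection unit relative to the user sound source, and thereby to determine the initial direction of arrival. That is, automatic DOA search yields the initial position data $c_0$ (coordinates of the form $(x_{ci}, y_{ci}, z_{ci})$) and $v_0$ (angles of the form $(\alpha_i, \beta_i, \gamma_i)$) of the sound collection unit and of the sound collection unit array line relative to the user sound source. Automatic DOA search means that, once the call is connected, the computation that determines the direction of arrival starts at the moment the user begins to speak. Methods for estimating the direction of arrival from the signals received by a microphone array generally fall into conventional methods (including spectral estimation and linear prediction), subspace methods (including multiple signal classification and rotational invariance subspace methods), and maximum likelihood methods. These are basic DOA estimation methods and are described in the general literature on array signal processing. Each has its strengths and weaknesses. Conventional methods may be simple to compute, but they need a large number of microphone elements to achieve high resolution, and their DOA estimates are less accurate than those of the other two classes, so they are clearly unsuitable for the small arrays installed in mobile phones. Subspace and maximum likelihood methods estimate the direction of arrival well, but their computational cost is very high, so none of these methods can meet the requirement of real-time estimation on a phone during a call. To determine the direction of arrival of the microphone array at the start of the call, however, a subspace or maximum likelihood method can be run once when the call is connected. Maximum likelihood is a good choice because it is the optimal method: although its cost is the highest, a single computation at the initial stage does not add significant delay to the speech, and once it has supplied an accurate direction of arrival, the direction information from the gyroscope can be used to correct the direction of arrival as it changes in real time.
When the position of the reference sound collection unit relative to the user sound source changes, the direction of arrival is corrected according to the variation provided by the gyroscope, so that it always points toward the sound source and noise is reduced. The present invention therefore uses automatic DOA search only to obtain the initial position data; afterwards, the adaptive direction of arrival is estimated solely from the position data variation provided by the gyroscope. The related art uses automatic DOA search throughout, and because that search is computationally complex, the whole process has poor real-time performance. Because the present invention uses automatic DOA search only when obtaining the initial position data, its real-time performance is better and its processing speed is greatly improved.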
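The one-time initial search can be illustrated with a simple angle scan. The sketch below does not implement a general maximum likelihood estimator: it scans candidate angles and keeps the one that maximizes delay-and-sum output power, which is generally understood to match the maximum likelihood estimate only for a single narrowband source in spatially white noise. The narrowband uniform-linear-array model, the function names and the 1-degree grid are assumptions of this sketch.

```python
import cmath
import math

def steering(theta, num_mics, spacing, wavelength):
    """Narrowband steering vector of a uniform linear array (assumed model)."""
    k = 2.0 * math.pi * spacing * math.sin(theta) / wavelength
    return [cmath.exp(-1j * k * n) for n in range(num_mics)]

def scan_initial_doa(snapshots, spacing, wavelength, step_deg=1):
    """Return the candidate angle (radians) with the largest mean output power.

    snapshots -- list of complex vectors, one value per microphone
    """
    num_mics = len(snapshots[0])
    best_theta, best_power = None, -1.0
    for deg in range(-90, 91, step_deg):
        theta = math.radians(deg)
        a = steering(theta, num_mics, spacing, wavelength)
        power = 0.0
        for x in snapshots:
            # Phase-align the microphones toward theta, then sum.
            y = sum(ai.conjugate() * xi for ai, xi in zip(a, x))
            power += abs(y) ** 2
        power /= len(snapshots)
        if power > best_power:
            best_theta, best_power = theta, power
    return best_theta
```

Running the scan once at call setup and then switching to gyroscope-based updates mirrors the split described above: the expensive search is paid for a single time.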
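FIG. 4 is a flowchart of a software and hardware implementation of multi-microphone array noise reduction using gyroscope information according to an embodiment of the present invention.
As shown in FIG. 4, multi-microphone array noise reduction using gyroscope information proceeds as follows:
Step 402: automatically search for the initial position and form a beam. Automatic DOA search is used to find the initial position of the microphone array relative to the sound source, and a beam is formed.
Automatic DOA search means that, once the call is connected, the computation that determines the direction of arrival starts at the moment the user begins to speak. Methods for estimating the direction of arrival from the signals received by a microphone array generally fall into conventional methods (including spectral estimation and linear prediction), subspace methods (including multiple signal classification and rotational invariance subspace methods), and maximum likelihood methods. These are basic DOA estimation methods and are described in the general literature on array signal processing. Each has its strengths and weaknesses. Conventional methods may be simple to compute, but they need a large number of microphone elements to achieve high resolution, and their DOA estimates are less accurate than those of the other two classes, so they are clearly unsuitable for the small arrays installed in mobile phones. Subspace and maximum likelihood methods estimate the direction of arrival well, but their computational cost is very high, so none of these methods can meet the requirement of real-time estimation on a phone during a call. To determine the direction of arrival of the microphone array at the start of the call, however, a subspace or maximum likelihood method can be run once when the call is connected. Maximum likelihood is a good choice because it is the optimal method: although its cost is the highest, a single computation at the initial stage does not add significant delay to the speech, and once it has supplied an accurate direction of arrival, the direction information from the gyroscope can be used to correct the direction of arrival as it changes in real time. That is, automatic DOA search yields the initial position data $c_0$ (coordinates of the form $(x_{ci}, y_{ci}, z_{ci})$) and $v_0$ (angles of the form $(\alpha_i, \beta_i, \gamma_i)$) of the sound collection unit and of the sound collection unit array line relative to the user sound source.
Step 404: the phone's gyroscope obtains the orientation change parameters. When the orientation of the phone changes, the gyroscope obtains the position change data.
Step 406: calculate the direction of arrival. The new direction of arrival is calculated from the initial position information and the orientation variation.
Step 408: feed the calculated direction of arrival into the beamforming algorithm, and the microphone array forms a beam.
Step 410: speech noise reduction. The sound signals obtained by the sound collection units are filtered, that is, the speech signal captured by the beam is denoised.
Step 412: audio processing modules such as the codec. The denoised speech signal is encoded and decoded and then transmitted.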
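Steps 402 to 406 can be sketched as a small tracking loop. The sketch assumes that an initial pose is already available from the one-time search and that each sensor report arrives as a pair of increments (displacement, angle change). `ArrayPose`, `arrival_angle` and `track_arrival_angles` are hypothetical names introduced for illustration only.

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class ArrayPose:
    c: tuple  # reference microphone coordinates, sound source at the origin
    v: tuple  # angles between the array line and the x, y, z axes (radians)

def arrival_angle(pose):
    """Arrival angle for a pose, using the closed-form expression."""
    norm = math.sqrt(sum(x * x for x in pose.c))
    cos_theta = sum(x * math.cos(a) for x, a in zip(pose.c, pose.v)) / norm
    return math.acos(max(-1.0, min(1.0, cos_theta)))

def track_arrival_angles(initial_pose, gyro_deltas):
    """Yield the arrival angle after each reported (dc, dv) increment."""
    pose = initial_pose                      # step 402: result of the initial search
    for dc, dv in gyro_deltas:               # step 404: increments from the gyroscope
        pose = ArrayPose(
            tuple(x + dx for x, dx in zip(pose.c, dc)),
            tuple(a + da for a, da in zip(pose.v, dv)),
        )
        yield arrival_angle(pose)            # step 406: new direction of arrival
```

Each yielded angle would then be handed to the beamformer (step 408) before denoising and encoding (steps 410 and 412), which this sketch leaves out.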
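FIG. 5 is a terminal block diagram of a speech processing apparatus according to yet another embodiment of the present invention.
As shown in FIG. 5, a speech processing apparatus 500 according to an embodiment of the present invention includes: an acquisition unit 502 configured to obtain the position data variation of a sound collection unit array on a terminal relative to a user sound source; a correction unit 504 configured to correct the direction of arrival of the sound collection unit array according to the position data variation; and a processing unit 506 configured to filter the sound signals obtained by the sound collection units.
Signal processing with a sound collection unit array is a space-time signal processing method. Because the speech signal and the various noise signals received by the sound collection units come from different directions in space, taking spatial direction into account greatly improves the signal processing capability. A noise reduction scheme based on an array of multiple sound collection units aims to extract from space the sound signal coming from the direction of the user sound source and to ignore noise signals from other directions, thereby reducing noise.
More specifically, the sound collection unit array forms a beam in space (as shown in FIG. 6) that points toward the user sound source and filters out sound from other directions. The formation of the beam depends on the position of the sound collection unit array relative to the user sound source. With this technical solution, the direction of arrival of the sound collection unit array is corrected according to the obtained variation in the position of the array relative to the user sound source. However the terminal moves relative to the user sound source, the sound signal from the direction of the user sound source can always be extracted, which reduces noise. In other words, certain parameters of the noise reduction algorithm are adjusted adaptively and at any time to follow the random changes in the user's posture during a call, so as to achieve the best noise reduction effect.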
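In the above technical solution, preferably, the acquisition unit is a gyroscope configured to obtain the position data variation of the sound collection unit array, wherein the position data variation includes the displacement variation of the reference sound collection unit and the angle variation of the sound collection unit array line.
While a terminal such as a mobile phone is in use, the relative position of the sound source and the sound collection units changes randomly. Many mobile phones are now equipped with a gyroscope, which can provide accurate acceleration and angle change information. The present invention therefore uses the gyroscope to obtain the position data variation of the sound collection unit array, which yields an accurate variation and makes full use of hardware already present in the terminal. No additional hardware is needed, so the noise reduction effect is improved while hardware cost is reduced.
In the above technical solution, preferably, the correction unit 504 includes: an initial position detection unit 5042 configured to obtain the initial position data of the reference sound collection unit in the sound collection unit array and of the sound collection unit array line relative to the user sound source, wherein the initial position data includes the initial coordinate data of the reference sound collection unit and the initial angle data of the sound collection unit array line; and an arrival angle calculation unit 5044 configured to calculate, from the initial position data and the position data variation, the current arrival angle between the sound wave direction of the user sound source and the preset normal of the sound collection unit array line, so that the direction of arrival of the sound collection unit array is determined from the arrival angle.
When the relative position of the sound source and the sound collection units changes, the new arrival angle between the sound source and the preset normal of the sound collection unit array line can be calculated from the position change data provided by the gyroscope. This determines the new direction of arrival and forms a new beam, so that the direction of arrival of the microphone array points toward the user sound source and the captured sound signal consists mainly of the speech from that source.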
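In the above technical solution, preferably, the arrival angle calculation unit establishes a coordinate system with the user sound source as the origin and calculates the arrival angle according to the following formula:
$$\cos\theta_{i+1} = \frac{(x_{ci}+\Delta x_{ci})\cos(\alpha_i+\Delta\alpha_i) + (y_{ci}+\Delta y_{ci})\cos(\beta_i+\Delta\beta_i) + (z_{ci}+\Delta z_{ci})\cos(\gamma_i+\Delta\gamma_i)}{\sqrt{(x_{ci}+\Delta x_{ci})^2 + (y_{ci}+\Delta y_{ci})^2 + (z_{ci}+\Delta z_{ci})^2}}$$
where $\theta_{i+1}$ is the arrival angle, $(x_{ci}, y_{ci}, z_{ci})$ is the initial coordinate data of the reference sound collection unit in the coordinate system, $(\alpha_i, \beta_i, \gamma_i)$ is the initial angle data of the sound collection unit array line in the coordinate system, $(\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci})$ is the displacement variation of the reference sound collection unit in the coordinate system, and $(\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i)$ is the angle variation of the sound collection unit array line in the coordinate system. This simple formula gives the arrival angle of the microphone array relative to the user sound source as it changes in real time. Because the formula is simple, the computational complexity is greatly reduced and the time needed to estimate the direction of arrival is shortened.
In the above technical solution, preferably, the initial position detection unit 5042 uses automatic DOA search to obtain the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source.
With this technical solution, automatic DOA search is used to obtain the initial position data $c_0$ and $v_0$ of the sound collection unit relative to the user sound source and thus to determine the initial direction of arrival. When the position of the reference sound collection unit relative to the user sound source changes, the direction of arrival is corrected according to the variation provided by the gyroscope, so that the signal from the direction of the sound source is always extracted and noise is reduced.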
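A further embodiment of the present invention is described below with reference to FIG. 6 to FIG. 10.
Unlike earlier speech noise reduction schemes based on time-domain signal analysis (such as adaptive noise cancellation with two microphones or filtering-based noise cancellation with a single microphone), multi-microphone array signal processing takes the spatial information of the signal into account and is a space-time signal processing method. Because the speech signal and the various noise signals received by the microphones come from different directions in space, taking spatial direction into account greatly improves the signal processing capability, especially for applications that need to extract the signal from one particular direction. A noise reduction scheme based on a multi-microphone array aims precisely at extracting the sound signal coming from the direction of the sound source, the mouth, and ignoring noise signals from other directions, thereby reducing noise.
More specifically, the microphone array forms a beam in space that points toward the mouth and filters out sound from other directions. FIG. 6 is a schematic diagram of beamforming in a phone with a three-microphone array. Three microphones (shown as black dots) are placed at the bottom of the phone and form an array. The beam formed when array signal processing is used for noise reduction is shown by the ripples in the figure. The rippled region is an ideal speech reception range: the microphone array receives only sound from the direction of the user's mouth and automatically filters out noise interference from other directions.
In general, the two main research directions in array signal processing are beamforming and DOA estimation, and array signal processing for speech noise reduction is essentially a beamforming problem. Because noise reduction on a phone relies mostly on the spatial difference between the desired speech signal and the interfering noise, noise reduction on phones with multiple sound collection units mostly uses beamforming algorithms based on a spatial reference. Many variants of this approach exist, but their basic ideas are similar. The following first introduces the most basic principle of spatial-reference beamforming, then explains its shortcomings for phone noise reduction, and finally presents the improvement of the present invention based on the orientation information from the phone's gyroscope. In what follows, microphones are used as the example of sound collection units.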
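Multi-microphone array signal processing first involves the geometry of the array, that is, where the microphones are placed. Common geometries include uniformly or non-uniformly spaced linear arrays, circular planar arrays and three-dimensional arrays. Because of the structure and size of a phone, the arrays built on phones are all uniform linear arrays. Such an array generally has two or three, at most four, microphones arranged at equal spacing along the bottom of the phone to pick up sound, as shown in FIG. 7. At the bottom of FIG. 7 is a microphone array 714 of $M$ microphones, indexed $i = 1, 2, \dots, M$, with adjacent microphones a distance $d$ apart. The desired sound source 702 emits the signal $s(t)$, and several noise sources (704, 706, 708, 710, 712), indexed $j = 1, \dots, J$, are located near the array. $\theta$ is the arrival angle between the source direction and the normal of the reference microphone array. Taking the first microphone as the reference, the delay of each other microphone relative to it is:
Figure imgf000018_0001
The steering vector of the microphone array is therefore:
Figure imgf000018_0002
(1)
where $\lambda$ is the wavelength. Once the wavelength and the array geometry are fixed, the steering vector depends only on the spatial angle $\theta$, so it can be written $a(\theta)$; it is independent of the position of the reference point. The outputs of the $M$ microphones can then be written as a vector:
Figure imgf000018_0003
Figure imgf000018_0004
Figure imgf000018_0005
(2)
The expression above is the generation model of the microphone array signal $X(t)$, in which the spatial angle $\theta$ is a known reference. Once the array model has been established, beamforming can be used to extract the desired source signal $s(t)$ from the signals picked up by the microphones. This is done by weighting the signals of the array to perform spatial filtering, which enhances the desired signal and suppresses interference, and the weighting factors can be changed adaptively as the signal environment changes. The microphones used here are all omnidirectional, but after the array signals are weighted and summed, the reception of the array can be concentrated in one direction, which forms a beam. In short, the basic idea of beamforming is to steer the array beam in one direction by a weighted sum of the microphone signals, namely the steering direction that gives the maximum output power for the desired signal.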
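For reference, a common far-field narrowband model for a uniform linear array that fits the description above can be written as follows. This is a sketch under assumptions: the sign convention, the symbol $c$ for the speed of sound and the noise vector $N(t)$ are introduced here, and the exact forms of equations (1) and (2) in the original figures may differ.

```latex
\tau_i = \frac{(i-1)\, d \sin\theta}{c}, \qquad i = 1, \dots, M

a(\theta) = \left[\, 1,\; e^{-j 2\pi d \sin\theta / \lambda},\; \dots,\; e^{-j 2\pi (M-1) d \sin\theta / \lambda} \,\right]^{T}

X(t) = a(\theta)\, s(t) + N(t)
```

With $\theta$ measured from the array normal, a source straight ahead ($\theta = 0$) gives zero delay at every microphone and a steering vector of all ones.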
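To form a directional beam, some assumptions about the signals are needed first: for example, that each signal picked up by the array is uncorrelated with the noise source signals, and that the signals received by the microphones have the same statistical properties. Under these assumptions, beamforming adds a suitable delay compensation $\tau_i$ to each picked-up signal so that all outputs are synchronized in the direction $\theta$, which gives the array maximum gain for signals arriving from $\theta$. Each microphone signal is also multiplied by a weight $\omega_i$, which tapers the beam formed by the array. Signals from different directions thus receive different gains, which produces spatial filtering, separates the signals of sources in different directions, and so extracts the desired speech signal and reduces noise. There are several ways to determine the parameters $\omega_i$. The most basic are the delay-and-sum beamformer and the delay-and-sum beamformer based on Wiener filtering. The two beamformers are illustrated in FIG. 8 and FIG. 9 respectively.
As shown in FIG. 8 and FIG. 9, the parameter $\tau_i$ is already determined, and its value depends on the spatial reference angle $\theta$. The parameter $\omega_i$ in FIG. 9 has to be obtained by optimization; its value also depends on $\theta$, so it should really be written $\omega_i(\theta)$. To obtain the optimized $\omega_i$ that forms the required beam, the $\omega_i$ must maximize the output power of the beamformer, where the output $y(t)$ is given by equation (3) [the body of equation (3) is missing in this text] and the output power of the beamformer is:
Figure imgf000019_0001
(4)
An objective function based on this output power can then be set up and optimized so that the output power of the beamformer is maximized. The weights $w(\theta)$ found in the process are the optimal parameters, which establishes the beamformer shown in FIG. 8. The beamformer of FIG. 9 is obtained in a similar way, except that the parameter estimation method 904 of the Wiener filter is used to build the final Wiener filter 902.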
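The dependence of output power on the steering angle can be illustrated with a small narrowband sketch. It assumes a uniform linear array, a unit-amplitude plane wave and uniform weights $1/M$; the function names and the default spacing and wavelength are illustrative values, not taken from the source.

```python
import cmath
import math

def steering(theta, num_mics, spacing, wavelength):
    """Narrowband steering vector of a uniform linear array (assumed model)."""
    k = 2.0 * math.pi * spacing * math.sin(theta) / wavelength
    return [cmath.exp(-1j * k * n) for n in range(num_mics)]

def das_power(look, source, num_mics=3, spacing=0.02, wavelength=0.2):
    """Output power of a delay-and-sum beamformer steered to `look`
    when a unit-amplitude plane wave arrives from `source` (radians)."""
    weights = [a / num_mics for a in steering(look, num_mics, spacing, wavelength)]
    x = steering(source, num_mics, spacing, wavelength)
    # y = w^H x: phase-align toward the look direction, then average.
    y = sum(w.conjugate() * xi for w, xi in zip(weights, x))
    return abs(y) ** 2
```

The power is largest when the look direction equals the source direction and falls off as the two diverge, which is why an accurate arrival angle matters for the beamformer.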
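The description above covers the basic theory of beamforming. It shows that building a beamformer depends on the spatial reference angle $\theta$, that is, the direction of arrival, so this parameter is very important for the beamformer and for the noise reduction result. A very accurate estimate is generally needed. If the value is slightly off, the final noise reduction degrades, because the beam does not point exactly at the sound source but in some other direction, and it picks up interfering noise. This is especially true for near-field beamforming, where the sound source and the noise sources may both be very close to the microphone array, so that even a small error in the reference angle $\theta$ can make noise reduction fail. In general, if the positions of the microphone array and of the desired sound source were both fixed, then once the exact direction of arrival had been measured, a fixed beamforming algorithm (such as the one described above) could be derived from the distance and bearing parameters of the hardware and used for speech noise reduction, and the best noise reduction would be achieved at all times. This is a highly idealized situation. In a real phone call the position of the sound source is fixed (the main source picked up in a call is the speaker's voice, not other voices or interfering noise), but the speaker changes posture at any moment during the call, and this cannot be predicted or tracked: the posture changes are random. The position and orientation of the phone therefore change all the time, as do its distance and direction from the sound source, and for the microphone array on the phone the direction of arrival changes accordingly. If the beamformer parameters still depend on the initial reference angle $\theta$, the beam no longer points at the sound source but at sound from other directions. The desired speech may then be treated as noise and the noise as the desired speech, so that noise reduction fails and call quality can become very poor.
To solve this technical problem, the beam formed by the phone's microphone array has to move at any time and point adaptively at the sound source, which calls for a DOA estimation algorithm. DOA estimation in effect localizes the sound source so that the beam formed afterwards points in the right direction. DOA estimation methods are all very complex and computationally heavy, and the change in the direction of arrival has to be monitored continuously. On a phone this places a heavy computational load on the chip and consumes a lot of energy. The complex computation, added to that of the subsequent beamforming algorithm, also delays the processed speech, and large delays must be avoided in real-time calls. In addition, all DOA estimation methods are based on parameter estimation, such as maximum likelihood estimation or maximum entropy estimation, so the estimated direction of arrival $\theta$ may not be very precise. As noted above, a good beamformer depends on an accurate reference angle $\theta$, so an imprecise estimate of $\theta$ affects the construction of the beamformer and, in turn, the noise reduction result.
This analysis shows that array signal processing software alone, including beamforming and DOA estimation, may not be adequate for speech noise reduction on phones, or may not achieve good noise reduction, so other solutions need to be considered.
The present invention proposes using the information provided by the gyroscope to help beamforming achieve noise reduction, which solves the technical problems mentioned above well. Many phones are now equipped with a gyroscope, which can provide very accurate information on direction of motion, acceleration and angle change. The gyroscope can therefore be used to obtain the position data variation of the sound collection unit array and so determine the direction of arrival, where the position data variation includes the displacement variation and the angle variation. Because the gyroscope computes orientation information quickly and accurately without using the phone's system resources, it solves the problems raised above: it replaces the DOA estimation algorithm, and its hardware advantage is used directly to calculate the arrival angle $\theta$, after which the beamformer is built and good noise reduction is achieved.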
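The arrival angle formula in this document takes the change in the array line's direction angles $(\Delta\alpha, \Delta\beta, \Delta\gamma)$ as given. One way such a change could be derived from a rotation reported by the motion sensors is sketched below. The axis-angle input, the function names and the use of Rodrigues' rotation formula are assumptions of this sketch and are not described in the source.

```python
import math

def rotate(u, axis, angle):
    """Rotate vector u about the unit vector `axis` by `angle` radians
    (Rodrigues' rotation formula)."""
    ax, ay, az = axis
    ux, uy, uz = u
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    dot = ax * ux + ay * uy + az * uz
    cross = (ay * uz - az * uy, az * ux - ax * uz, ax * uy - ay * ux)
    return tuple(
        ui * cos_a + cr * sin_a + ai * dot * (1.0 - cos_a)
        for ui, cr, ai in zip(u, cross, axis)
    )

def direction_angle_change(v, axis, angle):
    """Return (d_alpha, d_beta, d_gamma) for an array line with direction
    angles v = (alpha, beta, gamma) after the given rotation."""
    u = tuple(math.cos(a) for a in v)            # direction cosines of the line
    u_new = rotate(u, axis, angle)
    v_new = tuple(math.acos(max(-1.0, min(1.0, x))) for x in u_new)
    return tuple(new - old for new, old in zip(v_new, v))
```

For example, an array line along the x axis that is rotated by 90 degrees about the z axis ends up along the y axis, so its angle to the x axis grows by 90 degrees and its angle to the y axis shrinks by 90 degrees.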
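The following explains, with reference to FIG. 10, how the gyroscope is used to determine the direction of arrival of the sound collection unit array. In a phone with a multi-microphone array, the microphones are usually located at the bottom of the phone in a uniform linear arrangement of 2 to 4 microphones. FIG. 2 shows an array of three microphones: the three bottom microphones form a straight line lying in the plane of the phone screen, so the distance this line moves and the angle through which it rotates follow the movement and rotation of the whole phone. Because the displacement and angle changes of the phone are recorded by the gyroscope, the gyroscope measurements are the changes in position and direction of the microphone array and can be used to determine the change in the direction of arrival of the sound source. As described for FIG. 7, beamforming first requires choosing a reference microphone in the array, and the line connecting the sound source and that microphone is taken as the direction of arrival. In the derivation below, the rightmost microphone of the array is always the reference, shown as dots 1002 and 1004 in FIG. 10. FIG. 10 shows a spatial coordinate system in which two thick black lines represent the position of the microphone array as it moves and rotates with the phone. The coordinate system is abstracted from the distance and bearing relationship between the sound source 1006 and the microphone array during a call, to make the algorithm easier to analyze. In the figure, the sound source 1006 is the origin of the three-dimensional space, meaning that the source position is fixed at the origin. The microphone array then changes randomly in this space, and the change in distance and bearing between the microphones and the sound source 1006 is represented by the change in the relationship between the thick black line and the origin. Each thick black line represents the straight line formed by connecting the microphones of the array, with length $d$. The two lines show the array line before and after the user changes the phone's position during the call; the upper line is assumed to be the position before the change and the lower line the position after it.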
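For the microphone array before the change, the direction of arrival (the reference direction angle described earlier) is $\theta_i$, and the reference microphone is at $c_i$, with spatial coordinates $c_i = [x_{ci}, y_{ci}, z_{ci}]$. The microphone at the other end of the array is at $b_i$, with spatial coordinates $b_i = [x_{bi}, y_{bi}, z_{bi}]$. Let the orientation coordinates of the array line (its angles to the three coordinate axes) be $v_i = [\alpha_i, \beta_i, \gamma_i]$. Then $b_i$ can be expressed in terms of $c_i$ as:
$$b_i = [x_{bi}, y_{bi}, z_{bi}] = [x_{ci} - d\cos\alpha_i,\; y_{ci} - d\cos\beta_i,\; z_{ci} - d\cos\gamma_i] \quad (5)$$
Similarly, for the microphone array after the change, the direction of arrival is $\theta_{i+1}$, and the reference microphone is at $c_{i+1}$, with spatial coordinates $c_{i+1} = [x_{c(i+1)}, y_{c(i+1)}, z_{c(i+1)}]$. The microphone at the other end of the array is at $b_{i+1}$, with spatial coordinates $b_{i+1} = [x_{b(i+1)}, y_{b(i+1)}, z_{b(i+1)}]$. Let the orientation coordinates of the array line (its angles to the three coordinate axes) be $v_{i+1} = [\alpha_{i+1}, \beta_{i+1}, \gamma_{i+1}]$. Then $b_{i+1}$ can be expressed in terms of $c_{i+1}$ as:
$$b_{i+1} = [x_{b(i+1)}, y_{b(i+1)}, z_{b(i+1)}] = [x_{c(i+1)} - d\cos\alpha_{i+1},\; y_{c(i+1)} - d\cos\beta_{i+1},\; z_{c(i+1)} - d\cos\gamma_{i+1}] \quad (6)$$
Now consider the angle and displacement changes caused by the change in position and direction of the array line. The orientation changes from $v_i$ to $v_{i+1}$, and the change vector is written:
$$\Delta v_i = [\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i] = [\alpha_{i+1} - \alpha_i,\; \beta_{i+1} - \beta_i,\; \gamma_{i+1} - \gamma_i] \quad (7)$$
The reference microphone moves from $c_i$ to $c_{i+1}$, and its displacement vector is written:
$$\Delta c_i = [\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci}] = [x_{c(i+1)} - x_{ci},\; y_{c(i+1)} - y_{ci},\; z_{c(i+1)} - z_{ci}] \quad (8)$$
The two vectors $\Delta v_i$ and $\Delta c_i$ can be obtained from the phone's gyroscope, which supplies updated values as the position and orientation of the phone change from moment to moment. With these known quantities describing the change of the array line, $\theta_{i+1}$ is found from the geometry in FIG. 10. In effect, $\theta_{i+1}$ is obtained from the variables $c_i$, $v_i$, $\Delta v_i$ and $\Delta c_i$: the position and orientation of the phone before the change, together with the displacement and direction changes of the microphone array supplied by the gyroscope, give the position and orientation after the change and hence the direction of arrival $\theta_{i+1}$ of the sound source at that moment.
The angle $\theta_{i+1}$ is now derived from these spatial parameters. As FIG. 10 shows, the origin, $b_i$ and $c_i$ form one triangle in three-dimensional space, and the origin, $b_{i+1}$ and $c_{i+1}$ form another. Applying the relationship between the angles and sides of a triangle gives:
$$\cos\theta_i = \frac{d^2 + (x_{ci}^2 + y_{ci}^2 + z_{ci}^2) - \left((x_{ci} - d\cos\alpha_i)^2 + (y_{ci} - d\cos\beta_i)^2 + (z_{ci} - d\cos\gamma_i)^2\right)}{2d\sqrt{x_{ci}^2 + y_{ci}^2 + z_{ci}^2}} = \frac{x_{ci}\cos\alpha_i + y_{ci}\cos\beta_i + z_{ci}\cos\gamma_i}{\sqrt{x_{ci}^2 + y_{ci}^2 + z_{ci}^2}} \quad (9)$$
$$\cos\theta_{i+1} = \frac{d^2 + (x_{c(i+1)}^2 + y_{c(i+1)}^2 + z_{c(i+1)}^2) - \left((x_{c(i+1)} - d\cos\alpha_{i+1})^2 + (y_{c(i+1)} - d\cos\beta_{i+1})^2 + (z_{c(i+1)} - d\cos\gamma_{i+1})^2\right)}{2d\sqrt{x_{c(i+1)}^2 + y_{c(i+1)}^2 + z_{c(i+1)}^2}} = \frac{x_{c(i+1)}\cos\alpha_{i+1} + y_{c(i+1)}\cos\beta_{i+1} + z_{c(i+1)}\cos\gamma_{i+1}}{\sqrt{x_{c(i+1)}^2 + y_{c(i+1)}^2 + z_{c(i+1)}^2}} \quad (10)$$
Substituting relations (7) and (8) into the expression above and expanding gives:
$$\cos\theta_{i+1} = \frac{(x_{ci}+\Delta x_{ci})\cos(\alpha_i+\Delta\alpha_i) + (y_{ci}+\Delta y_{ci})\cos(\beta_i+\Delta\beta_i) + (z_{ci}+\Delta z_{ci})\cos(\gamma_i+\Delta\gamma_i)}{\sqrt{(x_{ci}+\Delta x_{ci})^2 + (y_{ci}+\Delta y_{ci})^2 + (z_{ci}+\Delta z_{ci})^2}} \quad (11)$$
Equations (9), (10) and (11) show that when the orientation of the phone changes, the microphone array changes with it. Before the change the reference angle of the direction of arrival is $\theta_i$, which is known, and the corresponding position and direction of the microphone array are also known and uniquely determined by $c_i$ and $v_i$. After the change the reference angle becomes $\theta_{i+1}$, which is unknown but can be determined from $c_i$ and $v_i$ together with the orientation changes $\Delta v_i$ and $\Delta c_i$ supplied by the gyroscope, as expressed by equation (11). In short, once the state of the phone before a change in position and direction is known, the information from the gyroscope determines the arrival angle after the change. So as long as the position and direction of the microphone array at the start of the call, $c_0$ and $v_0$, are known, the initial direction of arrival $\theta_0$ and the direction of arrival $\theta_i$ for every later attitude of the phone can be found using only the orientation changes supplied by the gyroscope. Without the gyroscope information, more complex beamforming methods and DOA estimation algorithms would be needed. Compared with the simple expression in equation (11), a DOA estimation algorithm is far more complex and time-consuming, and it is less accurate than the gyroscope information combined with the calculation in equation (11).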
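The step from the law-of-cosines form to the closed form in equation (9) can be checked numerically. The sketch below assumes that the array line's direction cosines form a unit vector (so the distance from $c_i$ to $b_i$ is exactly $d$); the function names and the sample values are illustrative.

```python
import math

def cos_theta_law_of_cosines(c, v, d):
    """cos(theta) from the triangle (origin, b, c) via the law of cosines."""
    b = tuple(ci - d * math.cos(a) for ci, a in zip(c, v))  # equation (5)
    oc2 = sum(x * x for x in c)   # squared distance origin -> c
    ob2 = sum(x * x for x in b)   # squared distance origin -> b
    return (d * d + oc2 - ob2) / (2.0 * d * math.sqrt(oc2))

def cos_theta_closed_form(c, v):
    """cos(theta) from the simplified right-hand side of equation (9)."""
    numerator = sum(ci * math.cos(a) for ci, a in zip(c, v))
    return numerator / math.sqrt(sum(x * x for x in c))
```

The array length $d$ cancels in the simplification, which is why the final formula needs only the reference microphone position and the direction angles of the array line.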
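It should be noted that an automatic DOA estimation algorithm can be used to determine the position and direction of the microphone array at the start of the call ($c_0$ and $v_0$). Although the automatic DOA estimation algorithm is used to obtain the initial position data, the direction of arrival is estimated with the help of the gyroscope while the phone's position changes dynamically afterwards. Compared with using automatic DOA estimation throughout, the speech processing of the present invention is much faster, has good real-time performance and reduces the load on the terminal's processor. More importantly, the noise reduction effect is better.
According to an embodiment of the present invention, a program product stored on a non-volatile machine-readable medium is provided for speech processing. The program product includes machine-executable instructions that cause a computer system to perform the following steps: obtaining the position data variation of a sound collection unit array on a terminal relative to a user sound source; and correcting the direction of arrival of the sound collection unit array according to the position data variation.
According to an embodiment of the present invention, a non-volatile machine-readable medium is provided that stores a program product for speech processing. The program product includes machine-executable instructions that cause a computer system to perform the following steps: obtaining the position data variation of a sound collection unit array on a terminal relative to a user sound source; and correcting the direction of arrival of the sound collection unit array according to the position data variation.
According to an embodiment of the present invention, a machine-readable program is provided that causes a machine to perform the speech processing method of any of the technical solutions described above.
According to an embodiment of the present invention, a storage medium storing a machine-readable program is provided, wherein the machine-readable program causes a machine to perform the speech processing method of any of the technical solutions described above.
The technical solution of the present invention has been described in detail above with reference to the drawings. The terminal uses the gyroscope to obtain its orientation changes during a call and uses this information to correct, in time, certain parameters of the noise reduction algorithm based on the multi-microphone array. This makes the noise reduction algorithm adaptive: it adjusts at any time to the random changes in the user's posture during the call so as to achieve the best noise reduction effect. Because the orientation change information comes directly from the gyroscope, dependence on the terminal's processor is greatly reduced, which further lowers power consumption.
The above are only preferred embodiments of the present invention and are not intended to limit it. Those skilled in the art may make various modifications and changes to the invention. Any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.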

Claims
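1. A speech processing method, comprising:
obtaining the position data variation of a sound collection unit array on a terminal relative to a user sound source;
correcting the direction of arrival of the sound collection unit array according to the position data variation; and
filtering the sound signals obtained by the sound collection units.
2. The speech processing method according to claim 1, wherein a gyroscope in the terminal is used to obtain the position data variation of the sound collection unit array, and the position data variation includes the displacement variation of the reference sound collection unit and the angle variation of the sound collection unit array line.
3. The speech processing method according to claim 1, wherein the step of correcting the direction of arrival of the sound collection unit array according to the position data variation comprises:
obtaining the initial position data of the reference sound collection unit in the sound collection unit array and of the sound collection unit array line relative to the user sound source, wherein the initial position data includes the initial coordinate data of the reference sound collection unit and the initial angle data of the sound collection unit array line; and
calculating, from the initial position data and the position data variation, the current arrival angle between the sound wave direction of the user sound source and the preset normal of the sound collection unit array line.
4. The speech processing method according to claim 3, wherein a coordinate system is established with the user sound source as the origin, and the arrival angle is calculated according to the following formula:
$$\cos\theta_{i+1} = \frac{(x_{ci}+\Delta x_{ci})\cos(\alpha_i+\Delta\alpha_i) + (y_{ci}+\Delta y_{ci})\cos(\beta_i+\Delta\beta_i) + (z_{ci}+\Delta z_{ci})\cos(\gamma_i+\Delta\gamma_i)}{\sqrt{(x_{ci}+\Delta x_{ci})^2 + (y_{ci}+\Delta y_{ci})^2 + (z_{ci}+\Delta z_{ci})^2}}$$
where $\theta_{i+1}$ is the arrival angle, $(x_{ci}, y_{ci}, z_{ci})$ is the initial coordinate data of the reference sound collection unit in the coordinate system, $(\alpha_i, \beta_i, \gamma_i)$ is the initial angle data of the sound collection unit array line in the coordinate system, $(\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci})$ is the displacement variation of the reference sound collection unit in the coordinate system, and $(\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i)$ is the angle variation of the sound collection unit array line in the coordinate system.
5. The speech processing method according to claim 3 or 4, further comprising: using automatic DOA search to obtain the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source.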
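6. A speech processing apparatus, comprising:
an acquisition unit configured to obtain the position data variation of a sound collection unit array on a terminal relative to a user sound source;
a correction unit configured to correct the direction of arrival of the sound collection unit array according to the position data variation; and
a processing unit configured to filter the sound signals obtained by the sound collection units.
7. The speech processing apparatus according to claim 6, wherein the acquisition unit is a gyroscope configured to obtain the position data variation of the sound collection unit array, and the position data variation includes the displacement variation of the reference sound collection unit and the angle variation of the sound collection unit array line.
8. The speech processing apparatus according to claim 6, wherein the correction unit comprises:
an initial position detection unit configured to obtain the initial position data of the reference sound collection unit in the sound collection unit array and of the sound collection unit array line relative to the user sound source, wherein the initial position data includes the initial coordinate data of the reference sound collection unit and the initial angle data of the sound collection unit array line; and
an arrival angle calculation unit configured to calculate, from the initial position data and the position data variation, the current arrival angle between the sound wave direction of the user sound source and the preset normal of the sound collection unit array line.
9. The speech processing apparatus according to claim 8, wherein the arrival angle calculation unit establishes a coordinate system with the user sound source as the origin and calculates the arrival angle according to the following formula:
$$\cos\theta_{i+1} = \frac{(x_{ci}+\Delta x_{ci})\cos(\alpha_i+\Delta\alpha_i) + (y_{ci}+\Delta y_{ci})\cos(\beta_i+\Delta\beta_i) + (z_{ci}+\Delta z_{ci})\cos(\gamma_i+\Delta\gamma_i)}{\sqrt{(x_{ci}+\Delta x_{ci})^2 + (y_{ci}+\Delta y_{ci})^2 + (z_{ci}+\Delta z_{ci})^2}}$$
where $\theta_{i+1}$ is the arrival angle, $(x_{ci}, y_{ci}, z_{ci})$ is the initial coordinate data of the reference sound collection unit in the coordinate system, $(\alpha_i, \beta_i, \gamma_i)$ is the initial angle data of the sound collection unit array line in the coordinate system, $(\Delta x_{ci}, \Delta y_{ci}, \Delta z_{ci})$ is the displacement variation of the reference sound collection unit in the coordinate system, and $(\Delta\alpha_i, \Delta\beta_i, \Delta\gamma_i)$ is the angle variation of the sound collection unit array line in the coordinate system.
10. The speech processing apparatus according to claim 8 or 9, wherein the initial position detection unit uses automatic DOA search to obtain the initial position data of the reference sound collection unit and of the sound collection unit array line relative to the user sound source.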
PCT/CN2014/070641 2014-01-15 2014-01-15 语音处理方法和语音处理装置 WO2015106401A1 (zh)

Priority Applications (4)

Application Number Priority Date Filing Date Title
PCT/CN2014/070641 WO2015106401A1 (zh) 2014-01-15 2014-01-15 语音处理方法和语音处理装置
CN201480072103.7A CN105874535B (zh) 2014-01-15 2014-01-15 语音处理方法和语音处理装置
EP14878656.9A EP3096319A4 (en) 2014-01-15 2014-01-15 Speech processing method and speech processing apparatus
US15/206,410 US20160322062A1 (en) 2014-01-15 2016-07-11 Speech processing method and speech processing apparatus

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2014/070641 WO2015106401A1 (zh) 2014-01-15 2014-01-15 语音处理方法和语音处理装置

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US15/206,410 Continuation-In-Part US20160322062A1 (en) 2014-01-15 2016-07-11 Speech processing method and speech processing apparatus

Publications (1)

Publication Number Publication Date
WO2015106401A1 true WO2015106401A1 (zh) 2015-07-23

Family

ID=53542275

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2014/070641 WO2015106401A1 (zh) 2014-01-15 2014-01-15 语音处理方法和语音处理装置

Country Status (4)

Country Link
US (1) US20160322062A1 (zh)
EP (1) EP3096319A4 (zh)
CN (1) CN105874535B (zh)
WO (1) WO2015106401A1 (zh)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437420A (zh) * 2016-05-27 2017-12-05 富泰华工业(深圳)有限公司 语音信息的接收方法、系统及装置
CN109410983A (zh) * 2018-11-23 2019-03-01 广东小天才科技有限公司 一种语音搜题方法及系统
CN113380266A (zh) * 2021-05-28 2021-09-10 中国电子科技集团公司第三研究所 一种微型双麦克风语音增强方法及微型双麦克风

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107343094A (zh) * 2017-06-30 2017-11-10 联想(北京)有限公司 一种处理方法及电子设备
CN107749296A (zh) * 2017-10-12 2018-03-02 深圳市沃特沃德股份有限公司 语音翻译方法和装置
CN108122563B (zh) * 2017-12-19 2021-03-30 北京声智科技有限公司 提高语音唤醒率及修正doa的方法
CN111986692A (zh) * 2019-05-24 2020-11-24 腾讯科技(深圳)有限公司 基于麦克风阵列的声源跟踪与拾音的方法和装置
CN114208209B (zh) 2019-07-30 2023-10-31 杜比实验室特许公司 音频处理系统、方法和介质
CN111131646A (zh) * 2019-12-30 2020-05-08 Oppo广东移动通信有限公司 通话降噪方法、装置、存储介质及电子装置
CN112770208B (zh) * 2021-01-18 2022-05-31 塔里木大学 一种基于自控分级的智能语音降噪采集装置
CN113075614A (zh) * 2021-03-17 2021-07-06 武汉创现科技有限公司 一种用于巡航器的声源测向装置、巡航器及智能垃圾桶
CN113949764A (zh) * 2021-10-20 2022-01-18 福州大学 一种用于手机内置陀螺仪的反监听主动降噪结构
CN114495959A (zh) * 2021-12-14 2022-05-13 科大讯飞股份有限公司 语音增强方法、装置和系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102164328A (zh) * 2010-12-29 2011-08-24 中国科学院声学研究所 一种用于家庭环境的基于传声器阵列的音频输入系统
CN102800325A (zh) * 2012-08-31 2012-11-28 厦门大学 一种超声波辅助麦克风阵列语音增强装置
JP2013201525A (ja) * 2012-03-23 2013-10-03 Mitsubishi Electric Corp ビームフォーミング処理装置
CN103366756A (zh) * 2012-03-28 2013-10-23 联想(北京)有限公司 一种声音信号的接收方法及装置
US20130332156A1 (en) * 2012-06-11 2013-12-12 Apple Inc. Sensor Fusion to Improve Speech/Audio Processing in a Mobile Device

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8150063B2 (en) * 2008-11-25 2012-04-03 Apple Inc. Stabilizing directional audio input from a moving microphone array
GB2495131A (en) * 2011-09-30 2013-04-03 Skype A mobile device includes a received-signal beamformer that adapts to motion of the mobile device
CN102945674A (zh) * 2012-12-03 2013-02-27 上海理工大学 利用数字降噪算法来实现对语音信号的降噪处理方法
US9525938B2 (en) * 2013-02-06 2016-12-20 Apple Inc. User voice location estimation for adjusting portable device beamforming settings
US9240176B2 (en) * 2013-02-08 2016-01-19 GM Global Technology Operations LLC Active noise control system and method

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP3096319A4 *

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107437420A (zh) * 2016-05-27 2017-12-05 富泰华工业(深圳)有限公司 语音信息的接收方法、系统及装置
CN109410983A (zh) * 2018-11-23 2019-03-01 广东小天才科技有限公司 一种语音搜题方法及系统
CN113380266A (zh) * 2021-05-28 2021-09-10 中国电子科技集团公司第三研究所 一种微型双麦克风语音增强方法及微型双麦克风
CN113380266B (zh) * 2021-05-28 2022-06-28 中国电子科技集团公司第三研究所 一种微型双麦克风语音增强方法及微型双麦克风

Also Published As

Publication number Publication date
EP3096319A1 (en) 2016-11-23
CN105874535A (zh) 2016-08-17
EP3096319A4 (en) 2017-07-12
CN105874535B (zh) 2020-03-17
US20160322062A1 (en) 2016-11-03

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 14878656

Country of ref document: EP

Kind code of ref document: A1

REEP Request for entry into the european phase

Ref document number: 2014878656

Country of ref document: EP

WWE Wipo information: entry into national phase

Ref document number: 2014878656

Country of ref document: EP

NENP Non-entry into the national phase

Ref country code: DE