WO2020147642A1 - Speech signal processing method and apparatus, computer-readable medium, and electronic device - Google Patents

Speech signal processing method and apparatus, computer-readable medium, and electronic device

Info

Publication number
WO2020147642A1
Authority
WO
WIPO (PCT)
Prior art keywords
signal
sound source
sound
reference signal
microphone array
Prior art date
Application number
PCT/CN2020/071205
Other languages
English (en)
French (fr)
Inventor
胡玉祥
Original Assignee
北京地平线机器人技术研发有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京地平线机器人技术研发有限公司
Publication of WO2020147642A1
Priority to US17/352,748 (granted as US11817112B2)


Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00 Image analysis
    • G06T7/70 Determining position or orientation of objects or cameras
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/326 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only for microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00 Circuits for transducers, loudspeakers or microphones
    • H04R3/005 Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 Voice signal separating
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R1/00 Details of transducers, loudspeakers or microphones
    • H04R1/20 Arrangements for obtaining desired frequency or directional characteristics
    • H04R1/32 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only
    • H04R1/40 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers
    • H04R1/406 Arrangements for obtaining desired frequency or directional characteristics for obtaining desired directional characteristic only by combining a number of identical transducers microphones
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04R LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00 Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20 Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
    • H04R2430/21 Direction finding using differential microphone array [DMA]

Definitions

  • The present disclosure relates to the technical field of speech enhancement, and in particular to a speech signal processing method and apparatus, a computer-readable medium, and an electronic device.
  • With the spread of in-vehicle smart devices, voice technology has seen unprecedented development in modern life. Speech is an important carrier of information, and whether its quality can be guaranteed affects both how it sounds to the human ear and how well a speech processing system can handle it. In a real environment (for example, an in-vehicle system), factors such as environmental noise, reverberation, and interference significantly degrade the quality of the speech signal picked up by a microphone array.
  • Speech separation technology takes the improvement of speech quality as its starting point: it suppresses noise effectively, thereby enhancing noisy speech signals in a closed environment and recovering the original clean speech signal as far as possible.
  • According to one aspect of the present disclosure, a speech signal processing method includes: acquiring sound source position information and at least two sound signals from a microphone array;
  • suppressing, according to the sound source position information, the sound signal coming from the sound source direction among the at least two sound signals, to obtain a noise reference signal of the microphone array;
  • acquiring, according to the sound source position information, the sound signal coming from the sound source direction among the at least two sound signals, to obtain a speech reference signal; and
  • removing, based on the noise reference signal, the residual noise signal in the speech reference signal, to obtain a desired speech signal.
  • According to another aspect, a speech signal processing device includes:
  • a first acquisition module, used to acquire sound source position information and at least two sound signals from the microphone array;
  • a sound source suppression module, used to suppress, according to the sound source position information, the sound signal coming from the sound source direction among the at least two sound signals, to obtain a noise reference signal of the microphone array;
  • a sound source enhancement module, used to acquire, according to the sound source position information, the sound signal coming from the sound source direction among the at least two sound signals, to obtain a speech reference signal; and
  • a noise reduction module, used to remove, based on the noise reference signal, the residual noise signal in the speech reference signal, to obtain the desired speech signal.
  • According to a further aspect, a computer-readable storage medium stores a computer program used to execute any of the above methods.
  • According to yet another aspect, an electronic device includes: a processor; and a memory for storing instructions executable by the processor; the processor is configured to execute any of the above methods.
  • The speech signal processing method, device, computer-readable medium, and electronic device provided by the embodiments of the present disclosure combine sound source position information to, on the one hand, suppress the sound signal in the sound source direction to obtain a noise reference signal and, on the other hand, acquire the sound signal in the sound source direction to obtain a speech reference signal; the noise signal is then removed from the speech reference signal, achieving the purpose of reducing noise interference and improving the speech enhancement effect.
  • Fig. 1 is a schematic flowchart of a voice signal processing method provided by a first exemplary embodiment of the present disclosure.
  • Fig. 2 is a schematic flowchart of a voice signal processing method provided by a second exemplary embodiment of the present disclosure.
  • Fig. 3 is a schematic flowchart of a voice signal processing method provided by a third exemplary embodiment of the present disclosure.
  • Fig. 4 is a system structure diagram provided by an exemplary embodiment of the present disclosure.
  • Fig. 5 is a schematic flowchart of a voice signal processing method provided by a fourth exemplary embodiment of the present disclosure.
  • Fig. 6 is a system structure diagram provided by another exemplary embodiment of the present disclosure.
  • Fig. 7 is a schematic structural diagram of a voice signal processing device provided by the first exemplary embodiment of the present disclosure.
  • Fig. 8 is a schematic structural diagram of a voice signal processing device provided by a second exemplary embodiment of the present disclosure.
  • Fig. 9 is a schematic structural diagram of a voice signal processing device provided by a third exemplary embodiment of the present disclosure.
  • Fig. 10 is a schematic structural diagram of a voice signal processing apparatus provided by a fourth exemplary embodiment of the present disclosure.
  • Fig. 11 is a schematic structural diagram of a voice signal processing device provided by a fifth exemplary embodiment of the present disclosure.
  • Fig. 12 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
  • BSS: Blind Source Separation
  • GSC: Generalized Sidelobe Canceller
  • In the embodiments of the present disclosure, the noise signal is separated from the sound signal coming from the sound source direction by combining the sound source position information, and the residual noise in that sound signal is removed based on the separated noise signal, so as to reduce noise interference.
  • Fig. 1 is a schematic flowchart of a speech signal processing method provided by a first exemplary embodiment of the present disclosure. This embodiment can be applied to an electronic device. As shown in Fig. 1, the method can include the following steps:
  • Step 101 Acquire sound source location information and at least two sound signals from the microphone array.
  • the sound source location information can be obtained through image recognition.
  • Images may be captured by an image acquisition device and then recognized to determine the position of each sound source, thereby forming the sound source position information.
  • The sound source position information may include distance information of the sound source relative to the microphone array, angle information, or a combination of the two.
  • A microphone array is composed of several microphones arranged in space according to a certain geometry.
  • The microphone array can collect spatial and time/frequency information of sound sources in the environment and use this information as sound signals for subsequent localization and tracking of the sound sources.
  • at least two sound signals can be obtained from the microphone array, and these sound signals come from multiple sound sources.
  • the sound signal may include a sound from a music player, a human speaking (voice signal), and other sounds in the environment.
  • Step 102 According to the sound source position information, suppress the sound signal from the sound source direction from the at least two sound signals to obtain a noise reference signal of the microphone array.
  • The sound source position information can be used to determine the sound source direction, so that the sound signal coming from that direction can be identified among the at least two sound signals of the microphone array and suppressed.
  • After the sound signal from the sound source direction is suppressed, what remains is the noise signal of the microphone array, which is taken as the noise reference signal in this disclosure.
  • Step 103 Acquire a sound signal from the direction of the sound source from at least two sound signals according to the sound source location information to obtain a speech reference signal.
  • The signal coming from the sound source direction is determined according to the sound source position information, so as to obtain the sound signal from that direction as the speech reference signal in the present disclosure.
  • Step 102 and step 103 are not restricted to a particular order; after step 101 is completed, either step 102 or step 103 may be performed first.
  • Step 104 Based on the noise reference signal, remove the residual noise signal in the voice reference signal to obtain a desired voice signal.
  • In the speech reference signal obtained in step 103, there will be some residual noise interference, which affects the quality of the speech signal.
  • Based on the noise reference signal, the residual noise signal in the speech reference signal is removed, reducing the noise interference in the speech reference signal.
  • Exemplarily, a speech noise reduction method such as an adaptive filtering algorithm, a subspace noise reduction algorithm, or a linear filtering method may be used.
  • For example, the noise reference signal can be filtered by an adaptive noise reduction filter, and the filtered noise reference signal subtracted from the speech reference signal to obtain a speech enhancement signal; the filter coefficients of the adaptive noise reduction filter are then adjusted until the strength of the speech enhancement signal is greater than a preset strength, so as to obtain the desired speech signal.
  • The embodiments of the present disclosure adopt an adaptive noise reduction filter for the noise reduction processing, which can effectively improve the speech enhancement effect in specific scenes (for example, when there are more sound sources than microphones) and obtain a desired speech signal with better speech quality.
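The adaptive noise cancellation described above can be sketched with a standard normalized LMS (NLMS) filter. This is a hedged illustration, not the patent's exact procedure: the filter length, step size, synthetic signals, and the name `adaptive_noise_cancel` are all illustrative assumptions; in particular, the patent adapts until the enhancement signal exceeds a preset strength, whereas this sketch simply adapts continuously.

```python
import numpy as np

def adaptive_noise_cancel(speech_ref, noise_ref, taps=16, mu=0.1, eps=1e-8):
    """NLMS adaptive noise cancellation: filter the noise reference,
    subtract the filtered noise from the speech reference, and adapt the
    filter coefficients from the residual; the residual e[n] is the
    speech enhancement output."""
    w = np.zeros(taps)
    out = np.zeros_like(speech_ref)
    for n in range(taps - 1, len(speech_ref)):
        x = noise_ref[n - taps + 1:n + 1][::-1]   # latest noise samples
        e = speech_ref[n] - w @ x                 # speech estimate
        w += mu * e * x / (x @ x + eps)           # NLMS coefficient update
        out[n] = e
    return out

rng = np.random.default_rng(0)
clean = np.sin(2 * np.pi * 5 * np.arange(4000) / 1000)   # stand-in "speech"
noise = rng.standard_normal(4000)
speech_ref = clean + 0.8 * noise        # speech reference with leaked noise
enhanced = adaptive_noise_cancel(speech_ref, noise)
err_before = np.mean((speech_ref[2000:] - clean[2000:]) ** 2)
err_after = np.mean((enhanced[2000:] - clean[2000:]) ** 2)
print(err_before, err_after)
```

After the filter converges, the residual error against the clean signal is far smaller than before cancellation.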
  • The speech signal processing method provided by the present disclosure uses the sound source position information to accurately obtain the sound signal coming from the sound source direction and to separate the noise signal from it: on the one hand, the sound signal from the sound source direction is suppressed to obtain the noise reference signal; on the other hand, the sound signal from the sound source direction is acquired to obtain the speech reference signal. Then, based on the separated noise signal, the residual noise in the sound signal from the sound source direction is removed, so as to reduce noise interference, improve the speech enhancement effect, and extract a desired speech signal of better signal quality.
  • Fig. 2 is a schematic flowchart of a voice signal processing method provided by a second exemplary embodiment of the present disclosure. As shown in Fig. 2, the voice signal processing method provided by the present disclosure may include the following steps:
  • Step 201 Acquire sound source location information and at least two sound signals from the microphone array.
  • For the implementation principle and process of step 201, reference may be made to the related description of step 101 in the first exemplary embodiment; for brevity, details are omitted here.
  • Step 202 Perform fast Fourier transform on at least two sound signals to obtain a speech signal matrix.
  • The sound signal can be represented by the spatial-domain and time/frequency information of the sound source; it is this representation (the spatial-domain and time/frequency information) that the fast Fourier transform (FFT) is applied to.
  • FFT: Fast Fourier Transform
  • Using the fast Fourier transform to compute on the sound signal reduces the number of multiplications of the Fourier transform, thereby reducing the amount of computation, accelerating the calculation, and improving calculation efficiency.
  • Step 203 Based on the speech signal matrix, determine the noise reference signal of the microphone array through a preset blind source separation algorithm with direction constraints.
  • Exemplarily, the preset direction-constrained blind source separation algorithm of the present disclosure may be a blind source separation (Blind Source Separation, BSS) algorithm with direction-of-arrival (Direction of Arrival, DOA) constraints.
  • BSS: Blind Source Separation
  • DOA: Direction of Arrival
  • The DOA constraint of the algorithm can be determined based on the sound source position information.
  • The cost function of the BSS algorithm with DOA constraints of the present disclosure can be expressed as:

    J(W(k)) = -log|det W(k)| - Σ_i E[G(y_i)] + η ‖w_1(k) - g_θ^H(k)‖²

  • where W(k) is the separation filter corresponding to the k-th frequency point;
  • log represents the logarithm, and det represents the determinant of the matrix;
  • G(y_i) is the contrast function, which can be expressed as log q(y_i), where q(y_i) is the probability density distribution of the i-th sound source;
  • w_1(k) is the first row of the separation matrix W(k);
  • g_θ(k) is the filter that forms the spatial zero in the sound source direction θ; and
  • η is the penalty factor, used to control the strength of the constraint condition.
  • Exemplarily, step 203 can be implemented as follows: determine the steering vector of the sound source direction according to the sound source position information and the speech signal matrix; then determine the first filter according to the steering vector, where the first filter is used to suppress the sound signal in the sound source direction; further, use the first filter as the initial value of the first group of separation filters of the direction-constrained blind source separation algorithm of the present disclosure.
  • Under this constraint, w_1(k) will converge around g_θ(k).
  • At this point, the penalty factor term introduced into the formula is 0.
  • In this way, a spatial zero is formed in the sound source direction, thereby suppressing the signal from the sound source direction and outputting the noise reference signal of the microphone array.
  • Exemplarily, the steering vector can be determined as follows. Assuming the number of microphones is 2, under free-field conditions, for the sound signal in the speech signal matrix coming from the θ direction, the steering vector h_θ(k) can be expressed as:

    h_θ(k) = [1, e^(-jkr·cos θ)]^T

  • where r is the distance between the microphones in the microphone array, k is the wave number, and θ is the direction of the sound signal (that is, the position of the sound source).
  • To suppress the sound signal in the θ direction, the first filter is determined according to this steering vector.
  • Denoting the first filter by g_θ(k), the spatial response of g_θ(k) in the θ direction can be expressed as:

    g_θ^H(k) h_θ(k) = 0

  • so that the first filter g_θ(k) can be expressed as:

    g_θ(k) = [1, -e^(-jkr·cos θ)]^T
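Assuming the two-microphone free-field model above, the steering vector h_θ(k) and the null-forming first filter g_θ(k) can be checked numerically. The spacing, frequency, and angle below are arbitrary illustrative values:

```python
import numpy as np

def steering_vector(k, r, theta):
    """Free-field steering vector h_theta(k) for a 2-microphone array:
    unit response at the reference microphone, far-field phase delay
    k*r*cos(theta) at the second microphone."""
    return np.array([1.0, np.exp(-1j * k * r * np.cos(theta))])

def first_filter(k, r, theta):
    """First filter g_theta(k): chosen so its spatial response toward
    theta vanishes, i.e. g^H h = 0, forming a spatial zero there."""
    return np.array([1.0, -np.exp(-1j * k * r * np.cos(theta))])

c, freq, r, theta = 343.0, 1000.0, 0.05, np.pi / 3   # assumed values
k = 2 * np.pi * freq / c                             # wave number
h = steering_vector(k, r, theta)
g = first_filter(k, r, theta)
null_response = np.vdot(g, h)        # g^H h: zero in the source direction
other = np.vdot(g, steering_vector(k, r, theta + 0.5))  # off-direction response
print(abs(null_response), abs(other))
```

The response vanishes only toward θ, which is exactly the spatial zero that turns the first separation output into a noise reference.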
  • Step 204 Obtain a sound signal from the direction of the sound source from at least two sound signals according to the sound source location information to obtain a speech reference signal.
  • Exemplarily, the direction-constrained blind source separation algorithm separates the sound signals of the microphone array into two outputs: one output of the separation is the noise reference signal (refer to steps 202 and 203 above), and the other, obtained by the separation described in this step, is the speech reference signal.
  • Step 205 Based on the noise reference signal, remove the residual noise signal in the voice reference signal to obtain a desired voice signal.
  • For the implementation principle and process of step 205, reference may be made to the related description of step 104 in the first exemplary embodiment; for brevity, details are omitted here.
  • It should be noted that the first filter g_θ(k) is designed based on the free-field model; in a real acoustic environment, the spatial zero formed by g_θ(k) is not ideal, that is, the suppression of the sound signal in the sound source direction is not ideal.
  • The BSS algorithm can form a more ideal spatial zero, but the BSS algorithm is sensitive to the initial selection of the separation matrix, and when the number of sound sources exceeds the number of microphones, the BSS algorithm cannot guarantee that a spatial zero is formed in the sound source direction.
  • For this reason, the present disclosure adds DOA constraints based on the provided sound source position information.
  • The penalty factor in the direction-constrained blind source separation algorithm of the present disclosure makes w_1(k) converge around g_θ(k).
  • At this point, the penalty factor term introduced into the formula is 0.
  • In this way, a spatial zero is formed in the sound source direction, thereby suppressing the signal from the sound source direction, and a more ideal noise reference signal of the microphone array is output.
  • Furthermore, the effect of removing the residual noise in the speech reference signal based on this noise reference signal is particularly good, so that a desired speech signal of better quality is output.
  • Fig. 3 is a schematic flowchart of a voice signal processing method provided by a third exemplary embodiment of the present disclosure.
  • a voice signal processing method provided by the present disclosure may further include the following steps:
  • Step 301 Acquire sound source location information and at least two sound signals from the microphone array.
  • For the implementation principle and process of step 301, reference may be made to the related description of step 101 in the first exemplary embodiment; for brevity, details are omitted here.
  • Step 302 Perform fast Fourier transform on at least two sound signals to obtain a speech signal matrix.
  • For the implementation principle and process of step 302, reference may be made to the related description of step 202 in the second exemplary embodiment; for brevity, details are omitted here.
  • Step 303 Based on the speech signal matrix, determine the noise reference signal of the microphone array through a preset blind source separation algorithm with direction constraints.
  • For the implementation principle and process of step 303, reference may be made to the related description of step 203 in the second exemplary embodiment; for brevity, details are omitted here.
  • Step 304 Determine the steering vector of the sound source direction according to the sound source position information and the speech signal matrix.
  • Step 305 Determine a second filter according to the steering vector, where the second filter is used to enhance the voice signal in the direction of the sound source.
  • The steering vector can characterize the characteristics of the sound signal.
  • The second filter is determined based on these characteristics of the sound signal of the microphone array, and can be operated together with the steering vector to achieve the purpose of enhancing the speech signal in the sound source direction.
  • Step 306 Use the second filter as the initial value of the second set of separation filters of the blind source separation algorithm with direction constraints to output a speech reference signal.
  • Step 307 Based on the noise reference signal, remove the residual noise signal in the voice reference signal to obtain a desired voice signal.
  • For the implementation principle and process of step 307, reference may be made to the related description of step 104 in the first exemplary embodiment; for brevity, details are omitted here.
  • In this embodiment, adding DOA constraints can provide a relatively ideal initial value for the BSS algorithm, thereby improving the separation effect of the BSS algorithm and enhancing the speech signal in the sound source direction.
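As one hedged sketch of a second filter that enhances the sound source direction (the patent does not fix a concrete formula for it here), a matched delay-and-sum beamformer built from the steering vector gives unit gain toward the source and lower gain elsewhere; the geometry, frequency, and the name `enhance_filter` are illustrative assumptions:

```python
import numpy as np

def enhance_filter(h):
    """Second filter: a matched (delay-and-sum) beamformer built from the
    steering vector h, normalized for unit gain toward the source direction.
    One possible construction, not the patent's specified formula."""
    return h / np.vdot(h, h).real

c, freq, r = 343.0, 1000.0, 0.05       # assumed geometry and frequency
k = 2 * np.pi * freq / c               # wave number
theta = np.pi / 4
# Two-microphone free-field steering vector for direction theta (assumption)
h = np.array([1.0, np.exp(-1j * k * r * np.cos(theta))])
w2 = enhance_filter(h)
gain_on = abs(np.vdot(w2, h))          # response toward the source
h_off = np.array([1.0, np.exp(-1j * k * r * np.cos(theta + 1.2))])
gain_off = abs(np.vdot(w2, h_off))     # response away from the source
print(gain_on, gain_off)
```

Used as the initial value of the second group of separation filters, such a filter biases the BSS output toward the source direction from the first iteration.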
  • Fig. 4 is a system structure diagram provided by an exemplary embodiment of the present disclosure.
  • As shown in Fig. 4, the present disclosure provides a speech signal processing method that acquires sound source position information and at least two sound signals from a microphone array and obtains a speech reference signal Y_ch1 based on the BSS algorithm with DOA constraints of the present disclosure.
  • The speech reference signal Y_ch1 is subjected to noise reduction processing through, for example, an adaptive noise reduction filter, and finally the desired speech signal Y is output.
  • Note that the figure shows only two channels of sound signals from the microphone array; in practice there may be two channels or more than two channels.
  • Fig. 5 is a schematic flowchart of a voice signal processing method provided by a fourth exemplary embodiment of the present disclosure. As shown in FIG. 5, a voice signal processing method provided by the present disclosure further includes the following steps:
  • Step 501 Acquire sound source location information and at least two sound signals from the microphone array.
  • For the implementation principle and process of step 501, reference may be made to the related description of step 101 in the first exemplary embodiment; for brevity, details are omitted here.
  • Step 502 According to the sound source location information, suppress the sound signal from the sound source direction from the at least two sound signals to obtain a noise reference signal of the microphone array.
  • For the implementation of step 502, reference may be made to the related descriptions of step 102 in the first exemplary embodiment and of steps 202 and 203 in the second exemplary embodiment; for brevity, details are omitted here.
  • Step 503 Obtain position information of the microphone array.
  • Exemplarily, the distance information between two adjacent microphones in the microphone array can be obtained and used as the position information of the microphone array in this step.
  • The position information of the microphone array can be obtained from input information of an input device, or from the configuration information of the microphone array itself; this is not limited in the embodiments of the present disclosure.
  • Step 504 Based on the sound source position information and the microphone array position information, a third filter is determined through a beamforming algorithm.
  • the third filter may be determined by a fixed beamforming algorithm.
  • Exemplarily, denote the distance between two adjacent microphones in the microphone array as d; assuming the sound source is located in the far field, let the incident angle of the beam reaching the array be θ and the sound velocity be c. The time delay difference between the sound source signal reaching the reference microphone and the n-th microphone is then estimated.
  • The distance difference is calculated from the time delay, and the filter is then determined from the distance difference and the spatial geometric position of the microphone array.
  • For the n-th microphone, the delay relative to the reference microphone can be expressed as τ_n = nd·cos θ / c. Thus, the third filter is determined.
  • Using a fixed beamforming algorithm to determine the third filter is only an exemplary embodiment of the present disclosure; the implementation is not limited to a fixed beamforming algorithm and can, for example, also use an adaptive beamforming algorithm.
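The fixed delay-and-sum construction of the third filter can be sketched as follows; the weight sign convention is one common choice, and the array geometry, frequency, and angles are illustrative assumptions rather than values taken from the patent:

```python
import numpy as np

C = 343.0  # assumed speed of sound in m/s

def delay_and_sum_weights(n_mics, d, theta, freq, c=C):
    """Third filter via fixed (delay-and-sum) beamforming: relative to the
    reference microphone, the n-th microphone sees the far-field delay
    tau_n = n * d * cos(theta) / c; the weights compensate that delay at
    the given frequency and average across microphones."""
    tau = np.arange(n_mics) * d * np.cos(theta) / c
    return np.exp(2j * np.pi * freq * tau) / n_mics

# A plane wave from direction theta produces per-mic phases exp(-2j*pi*f*tau_n);
# the beamformer realigns and sums them, giving unit gain toward theta.
n_mics, d, theta, freq = 4, 0.05, np.pi / 3, 1000.0
w = delay_and_sum_weights(n_mics, d, theta, freq)
tau = np.arange(n_mics) * d * np.cos(theta) / C
X_on = np.exp(-2j * np.pi * freq * tau)      # source in the look direction
X_off = np.exp(-2j * np.pi * freq * np.arange(n_mics) * d * np.cos(theta + 0.8) / C)
print(abs(w @ X_on), abs(w @ X_off))
```

Applying these per-bin weights to the multichannel spectrum passes the sound source direction while attenuating other directions, yielding the speech reference signal of step 505.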
  • Step 505 Process the sound signal from the microphone array through the third filter to obtain a speech reference signal.
  • In this step, the sound signal from the microphone array is input to the third filter determined in step 504.
  • The sound signal directly or indirectly carries information such as the sound velocity, from which the incident angle of the beam reaching the array can be determined.
  • In this way, the sound signal in the sound source direction can be determined, and the speech reference signal is output.
  • Step 506 Based on the noise reference signal, remove the residual noise signal in the speech reference signal to obtain the desired speech signal.
  • For the implementation principle and process of step 506, reference may be made to the related description of step 104 in the first exemplary embodiment; for brevity, details are omitted here.
  • the noise reference signal of the microphone array is determined by combining the sound source location information
  • the voice reference signal is determined by combining the sound source location information and using a beamforming algorithm, which can reduce the signal leakage of the voice reference signal, and based on the noise reference
  • the signal further performs noise reduction processing on the speech reference signal, which can further suppress the interference component in the desired speech signal.
  • Fig. 6 is a system structure diagram provided by another exemplary embodiment of the present disclosure. As shown in Fig. 6, after the sound source position information and at least two sound signals from the microphone array are obtained, they can be processed by the beamforming algorithm and by the BSS algorithm with DOA constraints, respectively. Illustratively, once the distance between adjacent microphones is known, the sound source direction can be determined to obtain the speech reference signal Y_ch1, and the noise reference signal Y_ch2 is determined after processing by the BSS algorithm with DOA constraints.
  • Obtaining the speech reference signal through the beamforming algorithm can reduce the impact of a large number of microphones in the microphone array (and is therefore especially suitable for scenarios where the microphone array contains many microphones), while the BSS algorithm with DOA constraints can effectively suppress the sound signal in the source direction, yielding a particularly desirable noise reference signal Y_ch2 (with few sound-signal components from the source direction). Therefore, the desired speech signal Y finally determined based on the speech reference signal and the noise reference signal has reduced noise interference components and an improved speech enhancement effect.
  • The foregoing embodiments describe the speech signal processing method of the present disclosure in detail; the present disclosure also provides an apparatus implementing speech signal processing.
  • The speech signal processing apparatus will be described below with reference to the accompanying drawings.
  • The functions of the modules, units, and subunits involved in the apparatus correspond to the aforementioned speech signal processing method, and for their technical effects reference may be made to the related embodiments of that method.
  • Fig. 7 is a schematic structural diagram of a voice signal processing device provided by the first exemplary embodiment of the present disclosure.
  • the speech signal processing device 700 in the present disclosure may include: a first acquisition module 710, a sound source suppression module 720, a sound source enhancement module 730, and a noise reduction module 740.
  • The first acquiring module 710 may be used to acquire sound source location information and at least two sound signals from the microphone array.
  • The sound source suppression module 720 may be configured to suppress, based on the sound source location information acquired by the first acquiring module 710, the sound signal from the sound source direction among the at least two sound signals, to obtain the noise reference signal of the microphone array.
  • The sound source enhancement module 730 may be used to acquire, according to the sound source position information from the first acquiring module 710, the sound signal from the sound source direction among the at least two sound signals, to obtain a speech reference signal.
  • The noise reduction module 740 is used to remove, based on the noise reference signal obtained by the sound source suppression module 720, the residual noise signal in the speech reference signal, to obtain a desired speech signal.
  • Fig. 8 is a schematic structural diagram of a voice signal processing device provided by a second exemplary embodiment of the present disclosure.
  • the speech signal processing device 700 in the present disclosure may include: a first acquisition module 710, a sound source suppression module 720, a sound source enhancement module 730, and a noise reduction module 740.
  • the sound source suppression module 720 may include a matrix determining unit 721 and a noise determining unit 722.
  • The matrix determining unit 721 may be used to perform a fast Fourier transform on the at least two sound signals to obtain a speech signal matrix.
  • The noise determining unit 722 is used to determine the noise reference signal of the microphone array based on the speech signal matrix through a preset blind source separation algorithm with direction constraints, wherein the preset blind source separation algorithm with direction constraints is determined according to the sound source position information.
  • the noise determination unit 722 may further include: a vector determination subunit (not shown in the figure), a signal suppression subunit (not shown in the figure), and a noise determination subunit (not shown in the figure).
  • The vector determination subunit can be used to determine the steering vector of the sound source direction according to the sound source position information and the speech signal matrix.
  • The signal suppression subunit can be used to determine the first filter according to the steering vector, the first filter being used to suppress the speech signal in the sound source direction.
  • The noise determination subunit can be used to use the first filter as the initial value of the first set of separation filters of the blind source separation algorithm with direction constraints, to output the noise reference signal of the microphone array.
  • Fig. 9 is a schematic structural diagram of a voice signal processing device provided by a third exemplary embodiment of the present disclosure.
  • the speech signal processing device 700 provided by the present disclosure includes a first acquisition module 710, a sound source suppression module 720, a sound source enhancement module 730, and a noise reduction module 740.
  • the sound source enhancement module 730 may include a vector determination unit 731, a speech enhancement unit 732, and a signal output unit 733.
  • The vector determining unit 731 may be used to determine the steering vector of the sound source direction according to the sound source position information.
  • The speech enhancement unit 732 may be used to determine a second filter according to the steering vector, wherein the second filter is used to enhance the speech signal in the sound source direction.
  • The signal output unit 733 may be used to use the second filter as the initial value of the second set of separation filters of the blind source separation algorithm with direction constraints, to output the speech reference signal.
  • Fig. 10 is a schematic structural diagram of a voice signal processing apparatus provided by a fourth exemplary embodiment of the present disclosure.
  • the speech signal processing apparatus 700 provided by the present disclosure may further include a second acquisition module 750, and the sound source enhancement module 730 may include a filter determination unit 734 and a signal processing unit 735.
  • The second acquisition module 750 is used to acquire the position information of the microphone array.
  • The filter determination unit 734 may be used to determine the third filter through the beamforming algorithm based on the sound source position information and the position information of the microphone array.
  • The signal processing unit 735 is configured to process the sound signal from the microphone array through the third filter to obtain the speech reference signal.
  • Fig. 11 is a schematic structural diagram of a voice signal processing device provided by a fifth exemplary embodiment of the present disclosure.
  • the noise reduction module 740 may include a filtering unit 741, an arithmetic unit 742, and a coefficient adjustment unit 743.
  • The filtering unit 741 can be used to filter the noise reference signal through an adaptive noise reduction filter; the arithmetic unit 742 can be used to subtract the filtered noise reference signal from the speech reference signal to obtain the speech enhancement signal; and the coefficient adjustment unit 743 may be used to adjust the filter coefficients of the adaptive noise reduction filter based on the speech enhancement signal until the strength of the speech enhancement signal is greater than a preset strength, to obtain the desired speech signal.
  • The speech signal processing apparatus provided by the embodiments of the present disclosure combines the sound source position information to, on the one hand, suppress the sound signal in the source direction to obtain a noise reference signal and, on the other hand, acquire the sound signal in the source direction to obtain a speech reference signal; the noise signal is then removed from the speech reference signal, reducing noise interference and improving the speech enhancement effect.
  • FIG. 12 illustrates a block diagram of an electronic device according to an embodiment of the present disclosure.
  • the electronic device 11 includes one or more processors 111 and a memory 112.
  • the processor 111 may be a central processing unit (CPU) or other form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 11 to perform desired functions.
  • the memory 112 may include one or more computer program products, and the computer program products may include various forms of computer-readable storage media, such as volatile memory and/or nonvolatile memory.
  • Volatile memory may include random access memory (RAM) and/or cache memory (cache), for example.
  • the non-volatile memory may include read-only memory (ROM), hard disk, flash memory, etc., for example.
  • One or more computer program instructions may be stored on a computer-readable storage medium, and the processor 111 may run the program instructions to implement the speech signal processing methods of the various embodiments of the present disclosure described above and/or other desired functions.
  • Various contents such as input signal, signal component, noise component, etc. can also be stored in the computer-readable storage medium.
  • the electronic device 11 may further include an input device 113 and an output device 114, and these components are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
  • the input device 113 may be the aforementioned camera, microphone, microphone array, etc., for capturing an image or input signal of a sound source.
  • When the electronic device is a stand-alone device, the input device 113 may be a communication network connector for receiving the collected input signal from a neural network processor.
  • the input device 113 may also include, for example, a keyboard, a mouse, and so on.
  • the output device 114 can output various information to the outside, including determined output voltage and output current information.
  • the output device 114 may include, for example, a display, a speaker, a printer, and a communication network and a remote output device connected thereto.
  • the electronic device 11 may also include any other appropriate components according to specific application conditions.
  • The embodiments of the present disclosure may also be computer program products, which include computer program instructions that, when run by a processor, cause the processor to perform the steps in the speech signal processing method according to the various embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
  • Program code for carrying out the operations of the embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages.
  • The program code can be executed entirely on the user's computing device, partly on the user's device, as an independent software package, partly on the user's computing device and partly on a remote computing device, or entirely on a remote computing device or server.
  • The embodiments of the present disclosure may also be a computer-readable storage medium on which computer program instructions are stored; when run by a processor, the computer program instructions cause the processor to perform the steps in the speech signal processing method according to the various embodiments of the present disclosure described in the "Exemplary Method" section of this specification.
  • the computer-readable storage medium may adopt any combination of one or more readable media.
  • the readable medium may be a readable signal medium or a readable storage medium.
  • The readable storage medium may include, but is not limited to, electrical, magnetic, optical, electromagnetic, infrared, or semiconductor systems, apparatuses, or devices, or any combination of the above. More specific examples (a non-exhaustive list) of readable storage media include: an electrical connection with one or more wires, a portable disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • In the apparatus, device, and method of the present disclosure, each component or each step can be decomposed and/or recombined. These decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.


Abstract

A speech signal processing method and apparatus, a computer-readable medium, and an electronic device. The method includes: acquiring sound source position information and at least two sound signals from a microphone array (step 101); suppressing, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a noise reference signal of the microphone array (step 102); acquiring, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a speech reference signal (step 103); and removing, based on the noise reference signal, the residual noise signal in the speech reference signal, to obtain a desired speech signal (step 104). By combining the sound source position information, the sound signal in the sound source direction is suppressed on the one hand to obtain the noise reference signal, and on the other hand the sound signal in the sound source direction is acquired to obtain the speech reference signal; the noise signal is then removed from the speech reference signal, thereby reducing noise interference and improving the speech enhancement effect.

Description

Speech signal processing method and apparatus, computer-readable medium, and electronic device
The present disclosure claims priority to Chinese Patent Application No. 201910035553.3, filed on January 15, 2019, the entire contents of which are incorporated herein by reference.
Technical Field
The present disclosure relates to the technical field of speech enhancement, and in particular to a speech signal processing method and apparatus, a computer-readable medium, and an electronic device.
Background
The popularity of in-vehicle smart devices has brought unprecedented development to speech technology in modern life. As an important carrier of information, whether the quality of speech can be guaranteed affects the listening experience of the human ear or the performance of a speech processing system. In a real environment (for example, an in-vehicle system), the quality of the speech signal picked up by a microphone array degrades significantly due to environmental noise, reverberation, interference, and other factors. Speech separation technology takes improving speech quality as its starting point and effectively suppresses noise, thereby enhancing the quality of noisy speech signals in enclosed environments and recovering the original clean speech signal as far as possible.
Summary
To solve the above technical problems, the present disclosure is proposed.
According to one aspect of the present disclosure, a speech signal processing method is provided, including: acquiring sound source position information and at least two sound signals from a microphone array;
suppressing, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a noise reference signal of the microphone array;
acquiring, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a speech reference signal; and
removing, based on the noise reference signal, the residual noise signal in the speech reference signal, to obtain a desired speech signal.
According to another aspect of the present disclosure, a speech signal processing apparatus is provided, including:
a first acquisition module, configured to acquire sound source position information and at least two sound signals from a microphone array;
a sound source suppression module, configured to suppress, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a noise reference signal of the microphone array;
a sound source enhancement module, configured to acquire, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a speech reference signal; and
a noise reduction module, configured to remove, based on the noise reference signal, the residual noise signal in the speech reference signal, to obtain a desired speech signal.
According to another aspect of the present disclosure, a computer-readable storage medium is provided, which stores a computer program for executing any of the methods described above.
According to another aspect of the present disclosure, an electronic device is provided, including: a processor; and a memory for storing processor-executable instructions; the processor is configured to execute any of the methods described above.
According to the speech signal processing method and apparatus, computer-readable medium, and electronic device provided by the embodiments of the present disclosure, by combining the sound source position information, the sound signal in the sound source direction is suppressed on the one hand to obtain a noise reference signal, and on the other hand the sound signal in the sound source direction is acquired to obtain a speech reference signal; the noise signal is then removed from the speech reference signal, thereby reducing noise interference and improving the speech enhancement effect.
Brief Description of the Drawings
The above and other objects, features, and advantages of the present disclosure will become more apparent through a more detailed description of the embodiments of the present disclosure in conjunction with the accompanying drawings. The drawings are provided for a further understanding of the embodiments of the present disclosure, constitute a part of the specification, serve together with the embodiments to explain the present disclosure, and do not limit the present disclosure. In the drawings, the same reference numerals generally denote the same components or steps.
Fig. 1 is a schematic flowchart of a speech signal processing method provided by a first exemplary embodiment of the present disclosure.
Fig. 2 is a schematic flowchart of a speech signal processing method provided by a second exemplary embodiment of the present disclosure.
Fig. 3 is a schematic flowchart of a speech signal processing method provided by a third exemplary embodiment of the present disclosure.
Fig. 4 is a system structure diagram provided by an exemplary embodiment of the present disclosure.
Fig. 5 is a schematic flowchart of a speech signal processing method provided by a fourth exemplary embodiment of the present disclosure.
Fig. 6 is a system structure diagram provided by another exemplary embodiment of the present disclosure.
Fig. 7 is a schematic structural diagram of a speech signal processing apparatus provided by a first exemplary embodiment of the present disclosure.
Fig. 8 is a schematic structural diagram of a speech signal processing apparatus provided by a second exemplary embodiment of the present disclosure.
Fig. 9 is a schematic structural diagram of a speech signal processing apparatus provided by a third exemplary embodiment of the present disclosure.
Fig. 10 is a schematic structural diagram of a speech signal processing apparatus provided by a fourth exemplary embodiment of the present disclosure.
Fig. 11 is a schematic structural diagram of a speech signal processing apparatus provided by a fifth exemplary embodiment of the present disclosure.
Fig. 12 is a structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.
Modes for Carrying Out the Present Disclosure
Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure, and it should be understood that the present disclosure is not limited by the example embodiments described herein.
In the field of speech enhancement technology, separating and extracting the speech signal of a desired speaker from a noisy environment (for example, an environment containing music, vehicle driving noise, and the voices of multiple speakers) is one of the most challenging problems for a speech enhancement system. In some techniques, a Blind Source Separation (BSS) system or a Generalized Sidelobe Canceller (GSC) is used for speech separation, identifying and enhancing the speech signal from a specific source direction to obtain the desired speaker's speech signal. However, due to the time delay and spatial filtering effects of speech signals, separating the desired speech signal with BSS or GSC may suffer from problems such as heavy noise interference or leakage of the desired speech signal, so that the quality of the separated speech is not ideal. When the number of sound sources exceeds the number of microphones, the separation performance of a blind source separation system is unstable and the resulting desired speech signal suffers excessive noise interference, while the blocking-matrix design of a generalized sidelobe canceller is relatively complex, and blocking matrices currently designed with a free-field sound propagation model cause signal leakage in the desired direction.
In the present disclosure, by combining the sound source position information, the noise signal is separated from the sound signal coming from the sound source direction, and the residual noise in the sound signal from the source direction is removed based on the separated noise signal, so as to reduce noise interference and improve the speech enhancement effect, thereby extracting a desired speech signal of better quality.
Fig. 1 is a schematic flowchart of a speech signal processing method provided by the first exemplary embodiment of the present disclosure. This embodiment can be applied to an electronic device and, as shown in Fig. 1, may include the following steps:
Step 101: Acquire sound source position information and at least two sound signals from a microphone array.
The sound source position information may be obtained by means of image recognition. Illustratively, images may be captured by an image acquisition device and then subjected to image recognition to determine the position of each sound source, thereby forming the sound source position information. The sound source position information may include distance information of the sound source relative to the microphone array, angle information, or a combination of distance and angle information.
A microphone array consists of a number of microphones arranged in space according to certain geometric dimensions. The microphone array can collect the spatial and time/frequency information of sound sources from the environment, and this information serves as the sound signals used for subsequent localization and tracking of the sound sources. In the present disclosure, at least two sound signals can be acquired from the microphone array, and these sound signals come from multiple sound sources. Illustratively, in an in-vehicle environment, the sound signals may include sound from a music player, human speech (the speech signal), and other sounds in the environment.
Step 102: Suppress, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a noise reference signal of the microphone array.
In the present disclosure, the sound source position information can be used to determine the direction of the sound source, so that the sound signal from the source direction can be identified among the at least two sound signals of the microphone array and then suppressed, yielding the noise signal of the microphone array, which serves as the noise reference signal in the present disclosure.
Step 103: Acquire, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a speech reference signal.
In this step, the signal from the sound source direction is determined according to the sound source position information, and the sound signal from that direction is acquired as the speech reference signal in the present disclosure.
It should be noted that steps 102 and 103 are not limited to a particular order; after step 101 is completed, either step 102 or step 103 may be performed first.
Step 104: Remove, based on the noise reference signal, the residual noise signal in the speech reference signal, to obtain a desired speech signal.
The speech reference signal obtained in step 103 contains some residual noise interference, which affects the quality of the speech signal. To improve the signal quality of the desired speech signal, in the present disclosure the residual noise signal in the speech reference signal is removed based on the noise reference signal obtained in step 102, reducing the noise interference of the speech reference signal. In an exemplary embodiment, speech noise reduction methods such as adaptive filtering algorithms, subspace noise reduction algorithms, or linear filtering may be used to remove the residual noise signal in the speech reference signal.
Illustratively, the noise reference signal may be filtered by an adaptive noise reduction filter, the filtered noise reference signal may be subtracted from the speech reference signal to obtain a speech enhancement signal, and the filter coefficients of the adaptive noise reduction filter may then be adjusted based on the speech enhancement signal until the strength of the speech enhancement signal is greater than a preset strength, to obtain the desired speech signal. By using an adaptive noise reduction filter for noise reduction, the embodiments of the present disclosure can effectively improve the speech enhancement effect in specific scenarios (for example, when the number of sound sources exceeds the number of microphones), yielding a desired speech signal of more ideal quality.
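The filter/subtract/adjust loop of step 104 can be sketched with a normalized-LMS adaptive noise canceller. This is an illustrative sketch of one possible adaptive filtering algorithm, not the patent's specific implementation; function name, tap count, and step size are assumptions for demonstration.

```python
import numpy as np

def lms_noise_canceller(speech_ref, noise_ref, n_taps=8, mu=0.1, eps=1e-8):
    """Normalized-LMS adaptive noise canceller: filter the noise reference,
    subtract the filtered noise estimate from the speech reference, and use
    the resulting speech enhancement signal to adjust the filter coefficients."""
    w = np.zeros(n_taps)
    out = np.zeros(len(speech_ref))
    for i in range(n_taps - 1, len(speech_ref)):
        x = noise_ref[i - n_taps + 1: i + 1][::-1]  # recent noise-ref samples
        y = w @ x                                   # filtered noise estimate
        e = speech_ref[i] - y                       # speech enhancement signal
        w += mu * e * x / (x @ x + eps)             # coefficient adjustment
        out[i] = e
    return out, w
```

Because the speech component is uncorrelated with the noise reference, the filter converges toward the leakage path of the noise while leaving the speech in the output.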
According to the speech signal processing method provided by the present disclosure, the sound signal from the sound source direction is accurately obtained using the sound source position information, and the noise signal is separated from the sound signal in the source direction; that is, the sound signal in the source direction is suppressed on the one hand to obtain the noise reference signal, and on the other hand the sound signal in the source direction is acquired to obtain the speech reference signal. The residual noise in the sound signal from the source direction is then removed based on the separated noise signal, reducing noise interference and improving the speech enhancement effect, thereby extracting a desired speech signal of better quality.
Fig. 2 is a schematic flowchart of a speech signal processing method provided by the second exemplary embodiment of the present disclosure. As shown in Fig. 2, the speech signal processing method provided by the present disclosure may include the following steps:
Step 201: Acquire sound source position information and at least two sound signals from a microphone array.
In this exemplary embodiment, the implementation principle and process of step 201 may refer to the related description of step 101 in the first exemplary embodiment; for brevity, a detailed description is omitted here.
Step 202: Perform a fast Fourier transform on the at least two sound signals to obtain a speech signal matrix.
In the present disclosure, a sound signal can be represented by the spatial and time/frequency information of the sound source. In this step, a Fast Fourier Transform (FFT) may be performed on the sound signals (for example, on the spatial and time/frequency information) to determine the speech signal matrix. To guarantee the speech quality of the sound signals, a relatively large number of sample points is retained. In the present disclosure, using the fast Fourier transform to compute on the sound signals reduces the number of multiplications of the Fourier transform, thereby reducing the amount of computation and increasing the computation speed and efficiency.
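Step 202 can be sketched as building a frequency-domain matrix from overlapping, windowed frames of each channel. This is a minimal illustrative sketch; the frame length, hop size, window choice, and function name are assumptions, not values specified by the patent.

```python
import numpy as np

def speech_signal_matrix(signals, frame_len=256, hop=128):
    """Build a frequency-domain speech signal matrix: split each channel
    into overlapping Hann-windowed frames and apply an FFT per frame.
    Returns an array of shape (n_channels, n_frames, n_bins)."""
    signals = np.atleast_2d(signals)
    window = np.hanning(frame_len)
    n_frames = 1 + (signals.shape[1] - frame_len) // hop
    frames = np.stack([
        signals[:, i * hop: i * hop + frame_len] * window
        for i in range(n_frames)
    ], axis=1)
    return np.fft.rfft(frames, axis=-1)  # FFT over each windowed frame
```

The per-frequency-bin slices of this matrix are what a frequency-domain separation algorithm would operate on.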
Step 203: Determine the noise reference signal of the microphone array based on the speech signal matrix, through the preset direction-constrained blind source separation algorithm.
According to the embodiments of the present disclosure, the preset direction-constrained blind source separation algorithm may illustratively be a Blind Source Separation (BSS) algorithm with a Direction of Arrival (DOA) constraint, and the algorithm can be determined according to the sound source position information. Illustratively, the cost function of the DOA-constrained BSS algorithm of the present disclosure can be expressed as follows:
Figure PCTCN2020071205-appb-000001
where W(k) is the separation filter corresponding to the k-th frequency bin, log denotes taking the logarithm, and det denotes the determinant of a matrix;
G(y_i) is the contrast function, which can be expressed as log q(y_i), where q(y_i) is the probability density distribution of the i-th sound source;
Figure PCTCN2020071205-appb-000002
is the penalty factor, where w_1(k) is the first row of the separation matrix W(k), g_θ(k) is the filter that forms a spatial null in the sound source direction θ, and λ controls the strength of the constraint.
Illustratively, step 203 can be implemented as follows: determine the steering vector of the sound source direction according to the sound source position information and the speech signal matrix, and then determine the first filter according to that steering vector, where the first filter is used to suppress the speech signal in the source direction; further, use the first filter as the initial value of the first set of separation filters of the direction-constrained blind source separation algorithm of the present disclosure. In some embodiments, based on the penalty factor in the direction-constrained blind source separation algorithm of the present disclosure (see the related description in step 203), w_1(k) converges in the vicinity of g_θ(k). When w_1(k) = g_θ(k), the penalty factor term introduced by the formula is 0; a spatial null is then formed in the source direction, suppressing the signal from the source direction and outputting the noise reference signal of the microphone array.
In the embodiments of the present disclosure, for any sound signal in the speech signal matrix, its steering vector can be determined as follows. Illustratively, assuming the number of microphones is 2, under free-field conditions, for the sound signal from direction θ in the speech signal matrix, its steering vector h_θ(k) can be expressed as:
h_θ(k) = [1  e^{-jkr cos θ}]^T
where r is the spacing between microphones in the microphone array, k is the wavenumber, and θ is the direction of the sound signal (or the sound source position).
Further, in the embodiments of the present disclosure, in order to suppress the sound signal in direction θ, the first filter is determined according to this steering vector. Illustratively, denoting the first filter by g_θ(k), its spatial response in direction θ satisfies:
g_θ(k) h_θ(k) = 0
Illustratively, the first filter g_θ(k) can be expressed as:
g_θ(k) = [1  -e^{jkr cos θ}]
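Under the free-field, two-microphone assumptions above, the steering vector h_θ(k) and the null-steering first filter g_θ(k) can be sketched directly; this is an illustrative check of the identity g_θ(k)·h_θ(k) = 0, with function names assumed for demonstration.

```python
import numpy as np

def steering_vector(k, r, theta):
    """Free-field steering vector for a two-microphone array:
    h_theta(k) = [1, e^{-j k r cos(theta)}]^T."""
    return np.array([1.0, np.exp(-1j * k * r * np.cos(theta))])

def null_filter(k, r, theta):
    """First filter g_theta(k) = [1, -e^{+j k r cos(theta)}], which
    places a spatial null on the source direction, so g . h = 0."""
    return np.array([1.0, -np.exp(1j * k * r * np.cos(theta))])
```

The response is exactly zero toward θ while remaining non-zero toward other directions, which is the blocking behavior used to produce the noise reference channel.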
Step 204: Acquire, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a speech reference signal.
In the embodiments of the present disclosure, after the direction-constrained blind source separation algorithm separates the sound signals of the microphone array, one separated channel yields the noise reference signal (see steps 202 and 203 above), and the other channel, as described in this step, yields the speech reference signal.
Step 205: Remove, based on the noise reference signal, the residual noise signal in the speech reference signal, to obtain a desired speech signal.
In this exemplary embodiment, the implementation principle and process of step 205 may refer to the related description of step 104 in the first exemplary embodiment; for brevity, a detailed description is omitted here.
In connection with the descriptions of the preceding steps: the first filter g_θ(k) is designed based on a free-field model; in a real environment, due to reverberation and other factors, the spatial null formed by g_θ(k) is not ideal, i.e., the suppression of the sound signal in the source direction is not ideal. Moreover, although the BSS algorithm can form a fairly ideal spatial null, it is sensitive to the choice of initial values of the separation matrix, and when the number of sound sources exceeds the number of microphones, the BSS algorithm cannot guarantee a spatial null in the source direction. In the embodiments of the present disclosure, adding the DOA constraint (the provided sound source position information) can supply a fairly ideal initial value for the BSS algorithm. In addition, the penalty factor in the direction-constrained blind source separation algorithm of the present disclosure (see the related description in step 203) makes w_1(k) converge near g_θ(k). When w_1(k) = g_θ(k), the penalty factor term introduced by the formula is 0, a spatial null is formed in the source direction, the signal from the source direction is suppressed, and a fairly ideal noise reference signal of the microphone array is output. Further, removing the residual noise in the speech reference signal based on this noise reference signal works particularly well, so that a desired speech signal of good quality is output.
Fig. 3 is a schematic flowchart of a speech signal processing method provided by the third exemplary embodiment of the present disclosure.
On the basis of the second exemplary embodiment shown in Fig. 2, as shown in Fig. 3, the speech signal processing method provided by the present disclosure may further include the following steps:
Step 301: Acquire sound source position information and at least two sound signals from a microphone array.
In this exemplary embodiment, the implementation principle and process of step 301 may refer to the related description of step 101 in the first exemplary embodiment; for brevity, a detailed description is omitted here.
Step 302: Perform a fast Fourier transform on the at least two sound signals to obtain a speech signal matrix.
In this exemplary embodiment, the implementation principle and process of step 302 may refer to the related description of step 202 in the second exemplary embodiment; for brevity, a detailed description is omitted here.
Step 303: Determine the noise reference signal of the microphone array based on the speech signal matrix, through the preset direction-constrained blind source separation algorithm.
In this exemplary embodiment, the implementation principle and process of step 303 may refer to the related description of step 203 in the second exemplary embodiment; for brevity, a detailed description is omitted here.
Step 304: Determine the steering vector of the sound source direction according to the sound source position information and the speech signal matrix.
The implementation principle and process of this step may refer to the description of the steering vector in step 203 of the second exemplary embodiment.
Step 305: Determine a second filter according to the steering vector, the second filter being used to enhance the speech signal in the sound source direction.
In the embodiments of the present disclosure, the steering vector can characterize the properties of the sound signal. Further, the second filter is determined based on the properties of the sound signals of the microphone array. Illustratively, it suffices that the second filter, after operating with the steering vector, achieves the purpose of enhancing the speech signal in the source direction.
Step 306: Use the second filter as the initial value of the second set of separation filters of the direction-constrained blind source separation algorithm, to output the speech reference signal.
Step 307: Remove, based on the noise reference signal, the residual noise signal in the speech reference signal, to obtain a desired speech signal.
In this exemplary embodiment, the implementation principle and process of step 307 may refer to the related description of step 104 in the first exemplary embodiment; for brevity, a detailed description is omitted here.
In the embodiments of the present disclosure, adding the DOA constraint (the provided sound source position information) can supply a fairly ideal initial value for the BSS algorithm, thereby improving the separation performance of the BSS algorithm and enhancing the speech signal in the sound source direction.
So that those skilled in the art can understand the technical solutions of the present disclosure clearly and accurately, the above embodiments are further described below with reference to a system structure diagram; the first, second, and third exemplary embodiments can all be implemented based on the system structure shown in Fig. 4. Fig. 4 is a system structure diagram provided by an exemplary embodiment of the present disclosure. As shown in Fig. 4, the present disclosure provides a speech signal processing method that can acquire sound source position information and at least two sound signals from a microphone array, separate them through the DOA-constrained BSS algorithm of the present disclosure to obtain the speech reference signal Y_ch1 and the noise reference signal Y_ch2, and perform noise reduction on the speech reference signal Y_ch1 based on the noise reference signal Y_ch2, for example through an adaptive noise reduction filter, finally outputting the desired speech signal Y. It should be noted that the figure illustrates only two sound signals from the microphone array; in practice there may be two or more.
Fig. 5 is a schematic flowchart of a speech signal processing method provided by the fourth exemplary embodiment of the present disclosure. As shown in Fig. 5, the speech signal processing method provided by the present disclosure further includes the following steps:
Step 501: Acquire sound source position information and at least two sound signals from a microphone array.
In this exemplary embodiment, the implementation principle and process of step 501 may refer to the related description of step 101 in the first exemplary embodiment; for brevity, a detailed description is omitted here.
Step 502: Suppress, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a noise reference signal of the microphone array.
In this exemplary embodiment, the implementation principle, process, and technical effects of step 502 may refer to the related descriptions of step 102 in the first exemplary embodiment and steps 202 and 203 in the second exemplary embodiment; for brevity, a detailed description is omitted here.
Step 503: Acquire the position information of the microphone array.
In the embodiments of the present disclosure, the distance information between two adjacent microphones in the microphone array can be acquired as the position information of the microphone array in this step. The position information of the microphone array can be acquired from the input information of an input device, or from the configuration information of the microphone array itself; the embodiments of the present disclosure place no limitation on this.
Step 504: Determine a third filter through a beamforming algorithm based on the sound source position information and the position information of the microphone array.
In the embodiments of the present disclosure, the third filter can be determined by a fixed beamforming algorithm. Illustratively, denote the distance between two adjacent microphones in the microphone array by d, assume the sound source is located in the far field, and let the incident angle of the beam arriving at the array be θ and the sound speed be c; then the delay between the n-th microphone and the reference microphone can be expressed as F_n(τ) = (n-1)τ = (n-1)d cos(θ)/c. Further, to determine the incident angle θ, the generalized cross-correlation function method may first be used to estimate the time-delay difference between the sound source's arrival at the reference microphone and at another microphone; the distance difference is computed from the time delay, and θ is then determined using the distance difference and the spatial geometry of the microphone array. Illustratively, it can be expressed as
Figure PCTCN2020071205-appb-000003
thereby determining the third filter.
Using a fixed beamforming algorithm to determine the third filter is only an exemplary embodiment of the present disclosure; the implementation is not limited to a fixed beamforming algorithm, and an adaptive beamforming algorithm, for example, may also be used.
Step 505: Process the sound signal from the microphone array through the third filter to obtain a speech reference signal.
In this step, the sound signal from the microphone array is input into the third filter of step 504. Illustratively, the sound signal at least directly or indirectly contains information such as the sound speed, from which the incident angle θ of the beam arriving at the array can be determined; the sound signal in the source direction can thus be determined, and the speech reference signal is output.
Step 506: Remove, based on the noise reference signal, the residual noise signal in the speech reference signal, to obtain a desired speech signal.
In this exemplary embodiment, the implementation principle, process, and technical effects of step 506 may refer to the related description of step 104 in the first exemplary embodiment, or step 506 may be realized through the implementations of step 104 described in that embodiment; for brevity, a detailed description is omitted here.
According to the embodiments of the present disclosure, determining the noise reference signal of the microphone array in combination with the sound source position information, and determining the speech reference signal in combination with the sound source position information using a beamforming algorithm, can reduce signal leakage in the speech reference signal; further noise reduction of the speech reference signal based on the noise reference signal can further suppress the interference components in the desired speech signal.
So that those skilled in the art can understand the technical solutions of the present disclosure clearly and accurately, the above embodiment is further described below with reference to a system structure diagram; the fourth exemplary embodiment can be implemented based on the system structure shown in Fig. 6.
Fig. 6 is a system structure diagram provided by another exemplary embodiment of the present disclosure. As shown in Fig. 6, after the sound source position information and at least two sound signals from the microphone array are obtained, they can be processed by the beamforming algorithm and by the DOA-constrained BSS algorithm, respectively. Illustratively, once the distance between adjacent microphones is known, the sound source direction can be determined, yielding the speech reference signal Y_ch1, and the noise reference signal Y_ch2 is determined after processing by the DOA-constrained BSS algorithm. Therefore, obtaining the speech reference signal through beamforming can reduce the impact of a large number of microphones in the array (and is thus especially suitable for scenarios where the microphone array contains many microphones), while the DOA-constrained BSS algorithm can effectively suppress the sound signal in the source direction, yielding a particularly desirable noise reference signal Y_ch2 (with few sound-signal components from the source direction). Consequently, the desired speech signal Y finally determined based on the speech reference signal and the noise reference signal has reduced noise interference components and an improved speech enhancement effect.
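The structure of this pipeline can be sketched for a single frequency bin of a two-microphone array: a delay-and-sum branch toward θ stands in for the beamforming branch (Y_ch1), a null-steering branch stands in for the DOA-constrained BSS branch (Y_ch2), and a normalized-LMS stage cancels the residual noise. This is a narrowband illustrative sketch only; it substitutes a simple null filter for the actual BSS algorithm, and all names and parameters are assumptions for demonstration.

```python
import numpy as np

def fig6_pipeline(x, d, theta, freq, c=343.0, mu=0.05, n_taps=4):
    """Narrowband sketch of the Fig. 6 structure for a 2-mic array.
    x: complex STFT frames of one frequency bin, shape (2, n_frames).
    Returns the enhanced bin after adaptive noise cancellation."""
    k = 2 * np.pi * freq / c
    phase = np.exp(-1j * k * d * np.cos(theta))
    # Beamforming branch: align mic 1 to mic 0 and sum -> speech reference.
    y1 = 0.5 * (x[0] + x[1] / phase)
    # Null-steering branch (stand-in for DOA-constrained BSS) -> noise reference.
    y2 = x[0] - x[1] / phase
    # NLMS stage: filter the noise reference and subtract it from y1.
    w = np.zeros(n_taps, dtype=complex)
    out = np.empty_like(y1)
    for i in range(len(y1)):
        xv = y2[max(0, i - n_taps + 1): i + 1][::-1]
        xv = np.pad(xv, (0, n_taps - len(xv)))
        e = y1[i] - np.vdot(w, xv)               # enhancement signal
        w += mu * np.conj(e) * xv / (np.vdot(xv, xv).real + 1e-8)
        out[i] = e
    return out
```

In this sketch the null branch contains no target component by construction, so the adaptive stage can remove the interference that leaks into the beamformed branch without cancelling the target.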
The foregoing embodiments describe the speech signal processing method of the present disclosure in detail; the present disclosure also provides an apparatus implementing speech signal processing. The speech signal processing apparatus is described below with reference to the accompanying drawings; the functions of the modules, units, and subunits involved in the apparatus correspond to the aforementioned speech signal processing method, and for their technical effects reference may be made to the related embodiments of that method.
Fig. 7 is a schematic structural diagram of a speech signal processing apparatus provided by the first exemplary embodiment of the present disclosure. As shown in Fig. 7, the speech signal processing apparatus 700 of the present disclosure may include: a first acquisition module 710, a sound source suppression module 720, a sound source enhancement module 730, and a noise reduction module 740.
The first acquisition module 710 may be used to acquire sound source position information and at least two sound signals from the microphone array; the sound source suppression module 720 may be used to suppress, according to the sound source position information acquired by the first acquisition module 710, the sound signal from the sound source direction among the at least two sound signals, to obtain the noise reference signal of the microphone array; the sound source enhancement module 730 may be used to acquire, according to the sound source position information from the first acquisition module 710, the sound signal from the sound source direction among the at least two sound signals, to obtain a speech reference signal; and the noise reduction module 740 is used to remove, based on the noise reference signal obtained by the sound source suppression module 720, the residual noise signal in the speech reference signal, to obtain a desired speech signal.
Fig. 8 is a schematic structural diagram of a speech signal processing apparatus provided by the second exemplary embodiment of the present disclosure. As shown in Fig. 8, the speech signal processing apparatus 700 of the present disclosure may include: a first acquisition module 710, a sound source suppression module 720, a sound source enhancement module 730, and a noise reduction module 740.
The sound source suppression module 720 may include a matrix determining unit 721 and a noise determining unit 722. The matrix determining unit 721 may be used to perform a fast Fourier transform on the at least two sound signals to obtain a speech signal matrix, and the noise determining unit 722 is used to determine the noise reference signal of the microphone array based on the speech signal matrix through the preset direction-constrained blind source separation algorithm, wherein the preset direction-constrained blind source separation algorithm is determined according to the sound source position information.
Further, the noise determining unit 722 may include: a vector determination subunit (not shown in the figure), a signal suppression subunit (not shown in the figure), and a noise determination subunit (not shown in the figure). The vector determination subunit may be used to determine the steering vector of the sound source direction according to the sound source position information and the speech signal matrix; the signal suppression subunit may be used to determine the first filter according to the steering vector, the first filter being used to suppress the speech signal in the sound source direction; and the noise determination subunit may be used to use the first filter as the initial value of the first set of separation filters of the direction-constrained blind source separation algorithm, to output the noise reference signal of the microphone array.
Fig. 9 is a schematic structural diagram of a speech signal processing apparatus provided by the third exemplary embodiment of the present disclosure. As shown in Fig. 9, the speech signal processing apparatus 700 provided by the present disclosure includes a first acquisition module 710, a sound source suppression module 720, a sound source enhancement module 730, and a noise reduction module 740. The sound source enhancement module 730 may include a vector determining unit 731, a speech enhancement unit 732, and a signal output unit 733.
The vector determining unit 731 may be used to determine the steering vector of the sound source direction according to the sound source position information; the speech enhancement unit 732 may be used to determine a second filter according to the steering vector, wherein the second filter is used to enhance the speech signal in the sound source direction; and the signal output unit 733 may be used to use the second filter as the initial value of the second set of separation filters of the direction-constrained blind source separation algorithm, to output the speech reference signal.
Fig. 10 is a schematic structural diagram of a speech signal processing apparatus provided by the fourth exemplary embodiment of the present disclosure. As shown in Fig. 10, the speech signal processing apparatus 700 provided by the present disclosure may further include a second acquisition module 750, and the sound source enhancement module 730 may include a filter determination unit 734 and a signal processing unit 735.
The second acquisition module 750 is used to acquire the position information of the microphone array; the filter determination unit 734 may be used to determine the third filter through the beamforming algorithm based on the sound source position information and the position information of the microphone array; and the signal processing unit 735 is used to process the sound signal from the microphone array through the third filter to obtain the speech reference signal.
Fig. 11 is a schematic structural diagram of a speech signal processing apparatus provided by the fifth exemplary embodiment of the present disclosure. As shown in Fig. 11, on the basis of the foregoing embodiments shown in Figs. 7 to 10, the noise reduction module 740 may include a filtering unit 741, an arithmetic unit 742, and a coefficient adjustment unit 743.
The filtering unit 741 may be used to filter the noise reference signal through an adaptive noise reduction filter; the arithmetic unit 742 may be used to subtract the filtered noise reference signal from the speech reference signal to obtain the speech enhancement signal; and the coefficient adjustment unit 743 may be used to adjust the filter coefficients of the adaptive noise reduction filter based on the speech enhancement signal until the strength of the speech enhancement signal is greater than a preset strength, to obtain the desired speech signal.
The speech signal processing apparatus provided by the embodiments of the present disclosure combines the sound source position information to, on the one hand, suppress the sound signal in the source direction to obtain a noise reference signal and, on the other hand, acquire the sound signal in the source direction to obtain a speech reference signal; the noise signal is then removed from the speech reference signal, reducing noise interference and improving the speech enhancement effect.
Fig. 12 illustrates a block diagram of an electronic device according to an embodiment of the present disclosure.
As shown in Fig. 12, the electronic device 11 includes one or more processors 111 and a memory 112.
The processor 111 may be a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability, and may control other components in the electronic device 11 to perform desired functions.
The memory 112 may include one or more computer program products, and a computer program product may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. Volatile memory may include, for example, random access memory (RAM) and/or cache memory. Non-volatile memory may include, for example, read-only memory (ROM), a hard disk, flash memory, and the like. One or more computer program instructions may be stored on a computer-readable storage medium, and the processor 111 may run the program instructions to implement the speech signal processing methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as input signals, signal components, and noise components may also be stored in the computer-readable storage medium.
In one example, the electronic device 11 may further include an input device 113 and an output device 114, interconnected by a bus system and/or other forms of connection mechanisms (not shown).
For example, the input device 113 may be the aforementioned camera, a microphone, a microphone array, or the like, used to capture an image or the input signal of a sound source. When the electronic device is a stand-alone device, the input device 113 may be a communication network connector for receiving the collected input signal from a neural network processor.
In addition, the input device 113 may also include, for example, a keyboard, a mouse, and so on.
The output device 114 can output various information to the outside, including the determined output voltage and output current information. The output device 114 may include, for example, a display, a speaker, a printer, a communication network, and remote output devices connected to it.
Of course, for simplicity, Fig. 12 shows only some of the components of the electronic device 11 that are relevant to the present disclosure, and components such as buses and input/output interfaces are omitted. In addition, the electronic device 11 may also include any other appropriate components according to the specific application.
The basic principles of the present disclosure have been described above in conjunction with specific embodiments. However, it should be pointed out that the advantages, strengths, effects, and the like mentioned in the present disclosure are merely examples and not limitations, and these advantages, strengths, and effects cannot be considered necessary for every embodiment of the present disclosure. In addition, the specific details disclosed above serve only the purposes of illustration and ease of understanding, not of limitation; the above details do not restrict the present disclosure to being implemented with those specific details.
The block diagrams of components, apparatuses, devices, and systems involved in the present disclosure are only illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured as shown in the block diagrams. As those skilled in the art will recognize, these components, apparatuses, devices, and systems can be connected, arranged, and configured in any manner. Words such as "including", "comprising", and "having" are open-ended terms that mean "including but not limited to" and can be used interchangeably with it. The words "or" and "and" as used herein refer to "and/or" and can be used interchangeably with it, unless the context clearly indicates otherwise. The word "such as" as used herein refers to the phrase "such as but not limited to" and can be used interchangeably with it.
It should also be pointed out that in the apparatus, device, and method of the present disclosure, each component or each step can be decomposed and/or recombined. These decompositions and/or recombinations should be regarded as equivalent solutions of the present disclosure.
The above description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the general principles defined herein may be applied to other aspects without departing from the scope of the present disclosure. Therefore, the present disclosure is not intended to be limited to the aspects shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been given for the purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the present disclosure to the forms disclosed herein. Although a number of example aspects and embodiments have been discussed above, those skilled in the art will recognize certain variations, modifications, changes, additions, and sub-combinations thereof.

Claims (20)

  1. A speech signal processing method, comprising:
    acquiring sound source position information and at least two sound signals from a microphone array;
    suppressing, according to the sound source position information, a sound signal from a sound source direction among the at least two sound signals, to obtain a noise reference signal of the microphone array;
    acquiring, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a speech reference signal; and
    removing, based on the noise reference signal, a residual noise signal in the speech reference signal, to obtain a desired speech signal.
  2. The method according to claim 1, wherein the suppressing, according to the sound source position information, the speech signal from the sound source direction among the at least two sound signals to obtain the noise reference signal of the microphone array comprises:
    performing a fast Fourier transform on the at least two sound signals to obtain a speech signal matrix; and
    determining the noise reference signal of the microphone array based on the speech signal matrix through a preset direction-constrained blind source separation algorithm, wherein the preset direction-constrained blind source separation algorithm is determined according to the sound source position information.
  3. The method according to claim 2, wherein the determining the noise reference signal of the microphone array based on the speech signal matrix through the preset direction-constrained blind source separation algorithm comprises:
    determining a steering vector of the sound source direction according to the sound source position information and the speech signal matrix;
    determining a first filter according to the steering vector, the first filter being used to suppress the speech signal in the sound source direction; and
    using the first filter as an initial value of a first set of separation filters of the direction-constrained blind source separation algorithm, to output the noise reference signal of the microphone array.
  4. The method according to any one of claims 1 to 3, wherein the acquiring, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals to obtain the speech reference signal comprises:
    determining a steering vector of the sound source direction according to the sound source position information and a speech signal matrix;
    determining a second filter according to the steering vector, the second filter being used to enhance the speech signal in the sound source direction; and
    using the second filter as an initial value of a second set of separation filters of the direction-constrained blind source separation algorithm, to output the speech reference signal.
  5. The method according to claim 4, wherein the determining the second filter according to the steering vector comprises:
    determining the second filter according to the characteristics of the sound signals of the microphone array characterized by the steering vector.
  6. The method according to any one of claims 1 to 5, further comprising:
    acquiring position information of the microphone array;
    wherein the acquiring, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals to obtain the speech reference signal further comprises:
    determining a third filter through a beamforming algorithm based on the sound source position information and the position information of the microphone array; and
    processing the sound signal from the microphone array through the third filter to obtain the speech reference signal.
  7. The method according to claim 6, wherein the acquiring the position information of the microphone array comprises:
    acquiring distance information between two adjacent microphones in the microphone array as the position information of the microphone array.
  8. The method according to claim 6 or 7, wherein the acquiring the position information of the microphone array comprises:
    acquiring the position information of the microphone array through input information of an input device.
  9. The method according to claim 6 or 7, wherein the acquiring the position information of the microphone array comprises:
    acquiring the position information of the microphone array from configuration information of the microphone array itself.
  10. The method according to any one of claims 6 to 9, wherein the determining the third filter through the beamforming algorithm comprises:
    determining the third filter through a fixed beamforming algorithm or an adaptive beamforming algorithm.
  11. The method according to any one of claims 1 to 10, wherein the removing, based on the noise reference signal, the residual noise signal in the speech reference signal to obtain the desired speech signal comprises:
    filtering the noise reference signal through an adaptive noise reduction filter;
    subtracting the filtered noise reference signal from the speech reference signal to obtain a speech enhancement signal; and
    adjusting filter coefficients of the adaptive noise reduction filter based on the speech enhancement signal until the strength of the speech enhancement signal is greater than a preset strength, to obtain the desired speech signal.
  12. The method according to any one of claims 1 to 11, wherein the acquiring the sound source position information comprises:
    capturing images through an image acquisition device and performing image recognition on the captured images to determine the position of each sound source, thereby forming the sound source position information.
  13. A speech signal processing apparatus, comprising:
    a first acquisition module, configured to acquire sound source position information and at least two sound signals from a microphone array;
    a sound source suppression module, configured to suppress, according to the sound source position information, a sound signal from a sound source direction among the at least two sound signals, to obtain a noise reference signal of the microphone array;
    a sound source enhancement module, configured to acquire, according to the sound source position information, the sound signal from the sound source direction among the at least two sound signals, to obtain a speech reference signal; and
    a noise reduction module, configured to remove, based on the noise reference signal, a residual noise signal in the speech reference signal, to obtain a desired speech signal.
  14. The apparatus according to claim 13, wherein the sound source suppression module comprises:
    a matrix determining unit, configured to perform a fast Fourier transform on the at least two sound signals to determine a speech signal matrix; and
    a noise determining unit, configured to determine the noise reference signal of the microphone array based on the speech signal matrix through a preset direction-constrained blind source separation algorithm, wherein the preset direction-constrained blind source separation algorithm is determined according to the sound source position information.
  15. The apparatus according to claim 14, wherein the noise determining unit comprises:
    a vector determination subunit, configured to determine a steering vector of the sound source direction according to the sound source position information and the speech signal matrix;
    a signal suppression subunit, configured to determine a first filter according to the steering vector, the first filter being used to suppress the speech signal in the sound source direction; and
    a noise determination subunit, configured to use the first filter as an initial value of a first set of separation filters of the direction-constrained blind source separation algorithm, to output the noise reference signal of the microphone array.
  16. The apparatus according to any one of claims 13 to 15, wherein the sound source enhancement module comprises:
    a vector determining unit, configured to determine a steering vector of the sound source direction according to the sound source position information;
    a speech enhancement unit, configured to determine a second filter according to the steering vector, wherein the second filter is used to enhance the speech signal in the sound source direction; and
    a signal output unit, configured to use the second filter as an initial value of a second set of separation filters of the direction-constrained blind source separation algorithm, to output the speech reference signal.
  17. The apparatus according to any one of claims 13 to 16, further comprising:
    a second acquisition module, configured to acquire position information of the microphone array;
    wherein the sound source enhancement module comprises:
    a filter determination unit, configured to determine a third filter through a beamforming algorithm based on the sound source position information and the position information of the microphone array; and
    a signal processing unit, configured to process the sound signal from the microphone array through the third filter to obtain the speech reference signal.
  18. The apparatus according to any one of claims 13 to 17, wherein the noise reduction module comprises:
    a filtering unit, configured to filter the noise reference signal through an adaptive noise reduction filter;
    an arithmetic unit, configured to subtract the filtered noise reference signal from the speech reference signal to obtain a speech enhancement signal; and
    a coefficient adjustment unit, configured to adjust filter coefficients of the adaptive noise reduction filter based on the speech enhancement signal until the strength of the speech enhancement signal is greater than a preset strength, to obtain the desired speech signal.
  19. A computer-readable storage medium, storing a computer program for executing the speech signal processing method according to any one of claims 1 to 12.
  20. An electronic device, comprising:
    a processor; and
    a memory for storing instructions executable by the processor;
    wherein the processor is configured to execute the speech signal processing method according to any one of claims 1 to 12.
PCT/CN2020/071205 2019-01-15 2020-01-09 语音信号处理方法、装置、计算机可读介质及电子设备 WO2020147642A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/352,748 US11817112B2 (en) 2019-01-15 2021-06-21 Method, device, computer readable storage medium and electronic apparatus for speech signal processing

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201910035553.3 2019-01-15
CN201910035553.3A CN111435598B (zh) 2019-01-15 2019-01-15 语音信号处理方法、装置、计算机可读介质及电子设备

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/352,748 Continuation US11817112B2 (en) 2019-01-15 2021-06-21 Method, device, computer readable storage medium and electronic apparatus for speech signal processing

Publications (1)

Publication Number Publication Date
WO2020147642A1 true WO2020147642A1 (zh) 2020-07-23

Family

ID=71580676

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/071205 WO2020147642A1 (zh) 2019-01-15 2020-01-09 语音信号处理方法、装置、计算机可读介质及电子设备

Country Status (3)

Country Link
US (1) US11817112B2 (zh)
CN (1) CN111435598B (zh)
WO (1) WO2020147642A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114255749A (zh) * 2021-04-06 2022-03-29 北京安声科技有限公司 Sweeping robot
CN113194372B (zh) * 2021-04-27 2022-11-15 歌尔股份有限公司 Earphone control method and apparatus, and related components
CN113362847A (zh) * 2021-05-26 2021-09-07 北京小米移动软件有限公司 Audio signal processing method and apparatus, and storage medium
CN114363770B (zh) * 2021-12-17 2024-03-26 北京小米移动软件有限公司 Filtering method and apparatus in transparency mode, earphone, and readable storage medium
CN115881151B (zh) * 2023-01-04 2023-05-12 广州市森锐科技股份有限公司 Bidirectional sound pickup and denoising method, apparatus, device and medium based on a document camera

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2000047699A (ja) * 1998-07-31 2000-02-18 Toshiba Corp Noise suppression processing device and noise suppression processing method
CN1333994A (zh) * 1998-11-16 2002-01-30 伊利诺伊大学评议会 Binaural signal processing techniques
CN102164328A (zh) * 2010-12-29 2011-08-24 中国科学院声学研究所 Microphone-array-based audio input system for home environments
CN104041073A (zh) * 2011-12-06 2014-09-10 苹果公司 Near-field nulling and beamforming
CN106782589A (zh) * 2016-12-12 2017-05-31 奇酷互联网络科技(深圳)有限公司 Mobile terminal and voice input method and apparatus thereof

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2007215163A (ja) * 2006-01-12 2007-08-23 Kobe Steel Ltd Sound source separation device, program for sound source separation device, and sound source separation method
US8898056B2 (en) * 2006-03-01 2014-11-25 Qualcomm Incorporated System and method for generating a separated signal by reordering frequency components
JP2008236077A (ja) * 2007-03-16 2008-10-02 Kobe Steel Ltd Target sound extraction device and target sound extraction program
JP4444345B2 (ja) * 2007-06-08 2010-03-31 本田技研工業株式会社 Sound source separation system
DK2211563T3 (da) * 2009-01-21 2011-12-19 Siemens Medical Instr Pte Ltd Method and apparatus for blind source separation improving interference estimation in binaural Wiener filtering
US9025782B2 (en) * 2010-07-26 2015-05-05 Qualcomm Incorporated Systems, methods, apparatus, and computer-readable media for multi-microphone location-selective processing
CN105845148A (zh) * 2016-03-16 2016-08-10 重庆邮电大学 Convolutive blind source separation method based on frequency-bin correction
CN107993671A (zh) * 2017-12-04 2018-05-04 南京地平线机器人技术有限公司 Sound processing method and apparatus, and electronic device
CN108735227B (zh) * 2018-06-22 2020-05-19 北京三听科技有限公司 Method and system for sound source separation of speech signals picked up by a microphone array

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113132519A (zh) * 2021-04-14 2021-07-16 Oppo广东移动通信有限公司 Electronic device, speech recognition method for electronic device, and storage medium
CN113132519B (zh) * 2021-04-14 2023-06-02 Oppo广东移动通信有限公司 Electronic device, speech recognition method for electronic device, and storage medium

Also Published As

Publication number Publication date
CN111435598B (zh) 2023-08-18
US11817112B2 (en) 2023-11-14
US20210312936A1 (en) 2021-10-07
CN111435598A (zh) 2020-07-21

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20740991; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20740991; Country of ref document: EP; Kind code of ref document: A1)