CN113707171B - Airspace filtering voice enhancement system and method - Google Patents
- Publication number
- CN113707171B (application CN202111004913.7A)
- Authority
- CN
- China
- Prior art keywords
- time
- sound pressure
- target voice
- frequency
- vibration velocity
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L21/0232—Processing in the frequency domain
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Abstract
The invention discloses a spatial-domain filtering speech enhancement system and method. While the sound pressure sensing unit measures the sound pressure component of the sound field, two orthogonally placed particle vibration velocity sensing units synchronously and independently measure the spatial particle vibration velocity components. Practice shows that this structure localizes sound sources with high accuracy, its directivity is independent of the array aperture, and the array is compact, with an aperture that can be at the millimeter scale. The structure adapts to a variety of environments and noise types and provides some suppression of room reverberation, thereby achieving speech enhancement and effectively overcoming the design and application-scenario limitations of existing microphone arrays.
Description
Technical Field
The invention relates to the technical field of signal processing, and in particular to a spatial-domain filtering speech enhancement system and method.
Background
When acquiring a speech signal in the absence of environmental noise and room reverberation, and at a very short distance from the sound source, a single microphone can capture the signal with high quality. In practice, however, factors such as an unfixed sound source position and the acoustic environment degrade the quality of the picked-up signal and thus reduce speech intelligibility. To overcome the limitations of single-microphone pickup, microphone-array speech processing was introduced; it improves pickup quality and has driven the development of speech enhancement technology. Unlike a single microphone, an array is spatially selective: it captures target signals from a specific direction through electronic steering while suppressing surrounding noise. However, microphone arrays are constrained by the half-wavelength theory: the more microphones, the larger the aperture, which introduces spatial aliasing and high computational complexity and thus greatly limits the design freedom and application scenarios of microphone arrays.
Disclosure of Invention
The invention provides a spatial-domain filtering speech enhancement system and method based on a particle vibration velocity sensor microarray, which solve the problems of limited microphone-array design and application scenarios in the prior art.
In a first aspect, the present invention provides a spatially filtered speech enhancement system based on a particle velocity sensor microarray, comprising a sound pressure sensing unit, two-dimensional particle vibration velocity sensing units, and a processor, wherein the particle vibration velocity sensing units are orthogonally arranged on two sides of the sound pressure sensing unit;
the sound pressure sensitive unit is used for measuring the sound pressure component in the sound field;
the particle vibration velocity sensing unit is used for measuring the spatial particle vibration velocity component;
the processor is used for estimating the target speech direction according to the sound pressure component and the spatial particle vibration velocity components, setting to zero the energy of time-frequency points outside the target speech azimuth region, obtaining the time-frequency point data corresponding to the target speech azimuth, and obtaining the target speech signal from that time-frequency point data.
Optionally, the sound pressure sensing unit is further configured to collect a one-channel sound pressure signal p(t);
The particle vibration velocity sensing units are further configured to collect the two-channel particle vibration velocity signals v_x(t) and v_y(t);
where x_1(t) and x_2(t) are the sound pressure signals of the target speech and the interfering speech, θ_1 and θ_2 are their horizontal azimuth angles (the positive x-axis direction is 0°), and n_p(t), n_x(t) and n_y(t) are the noise signals received by the sound pressure and particle vibration velocity sensing units.
Optionally, the processor is further configured to pre-process the sound pressure signal and the time-frequency data to obtain corresponding channel time-frequency spectrum data.
Optionally, the processor is further configured to frame and window the sound pressure signal and the particle vibration velocity signals to obtain the corresponding single-frame time-domain signals p_win(l), v_xwin(l), v_ywin(l), where l = 1, 2, …, L and L is the single-frame length, and then to apply a short-time Fourier transform to convert each single-frame time-domain signal into a frequency-domain signal:
P_win(l,k) = fft(p_win(l)), V_xwin(l,k) = fft(v_xwin(l)), V_ywin(l,k) = fft(v_ywin(l)).
Optionally, the processor is further configured to estimate the angular interval of the directional sound sources in a single frame of the speech signal, obtain the angular distribution of energy, and compute a sound-source arrival-angle estimate for any time-frequency point based on the trigonometric relation between the channels' time-frequency spectrum data.
Optionally, the processor is further configured to apply a preset window function, set to zero the energy of time-frequency points outside the target speech azimuth region, and obtain the time-frequency point data corresponding to the target speech azimuth.
Optionally, the processor is further configured to convolve a rectangular window function, Gaussian window, Hanning window or Hamming window with the obtained full-angle-space energy distribution, so as to zero the energy of time-frequency points outside the target speech azimuth region and obtain the time-frequency point data corresponding to the target speech azimuth;
where θ is the target azimuth and Δθ is the width of the rectangular window.
Optionally, the processor is further configured to apply an IFFT to the time-frequency point data corresponding to the target speech azimuth to obtain the corresponding time-domain signal, and to splice the spatially filtered target speech signal using the overlap-add method.
In a second aspect, the present invention provides a method of spatial-domain filtering speech enhancement using any of the systems described above, the method comprising:
Based on the measured azimuth information of the target speech and the interfering speech, applying spatial filtering to the time-frequency spectrum data of each channel of the particle vibration velocity sensor microarray to obtain the time-frequency point data corresponding to the target speech azimuth; applying an IFFT to that data to obtain the corresponding time-domain signal; and splicing the spatially filtered target speech signal using the overlap-add method.
In a third aspect, the present invention provides a computer readable storage medium storing a computer program of signal mapping, which when executed by at least one processor, implements a method of spatial filtering speech enhancement as described in any of the above.
The invention has the following beneficial effects:
The invention consists of a sound pressure sensing unit and two-dimensional particle vibration velocity sensing units. While the sound pressure sensing unit measures the sound pressure component of the sound field, the two orthogonally placed particle vibration velocity sensing units synchronously and independently measure the spatial particle vibration velocity components. Practice shows that this structure localizes sound sources with high accuracy, its directivity is independent of the array aperture, and the array is compact, with an aperture that can be at the millimeter scale. It adapts to a variety of environments and noise types and provides some suppression of room reverberation, thereby achieving speech enhancement and effectively overcoming the design and application-scenario limitations of existing microphone arrays.
The foregoing is only an overview of the technical solution of the present invention. To make the technical means of the invention clearer so that it can be implemented according to the description, and to make the above and other objects, features and advantages of the invention more readily apparent, specific embodiments are set forth below.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to designate like parts throughout the figures. In the drawings:
FIG. 1 is a schematic diagram of a spatial filtering speech enhancement system based on a particle velocity sensor microarray according to an embodiment of the present invention;
FIG. 2 is a flow chart of a method for spatially filtered speech enhancement based on a particle velocity sensor microarray according to an embodiment of the present invention.
Detailed Description
To address the problems that a larger number of microphones implies a larger aperture, spatial aliasing, and high computational complexity, the embodiments of the invention design a spatial-domain filtering speech enhancement system based on a particle vibration velocity sensor microarray. The system consists of a sound pressure sensing unit, which measures the sound pressure component of the sound field, and two orthogonally placed two-dimensional particle vibration velocity sensing units, which synchronously and independently measure the spatial particle vibration velocity components. Practice shows that this structure localizes sound sources with high accuracy, its directivity is independent of the array aperture, the array is compact with an aperture that can be at the millimeter scale, and it adapts to a variety of environments and noise types while providing some suppression of room reverberation. The invention is described in further detail below with reference to the accompanying drawings and examples. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
A first embodiment of the present invention provides a spatially filtered speech enhancement system based on a particle velocity sensor microarray, see fig. 1, comprising a sound pressure sensing unit, two-dimensional particle vibration velocity sensing units, and a processor, wherein the particle vibration velocity sensing units are orthogonally arranged on two sides of the sound pressure sensing unit;
the sound pressure sensitive unit is used for measuring the sound pressure component in the sound field;
the particle vibration velocity sensing unit is used for measuring the spatial particle vibration velocity component;
the processor is used for estimating the target speech direction according to the sound pressure component and the spatial particle vibration velocity components, setting to zero the energy of time-frequency points outside the target speech azimuth region, obtaining the time-frequency point data corresponding to the target speech azimuth, and obtaining the target speech signal from that time-frequency point data.
That is, to address the problem that existing microphone arrays are constrained by the half-wavelength theory (more microphones mean a larger aperture, spatial aliasing, and high computational complexity), the embodiments of the invention design a spatial-domain filtering speech enhancement system based on a particle vibration velocity sensor microarray. The system consists of a sound pressure sensing unit and two-dimensional particle vibration velocity sensing units; while the sound pressure sensing unit measures the sound pressure component of the sound field, the orthogonally placed particle vibration velocity sensing units synchronously and independently measure the spatial particle vibration velocity components. This preserves high sound-source localization accuracy, makes the directivity independent of the array aperture, keeps the array structure compact, adapts to a variety of environments and noise types, and provides some suppression of room reverberation.
In a specific implementation, the sound pressure sensing unit in the embodiment of the invention is further configured to collect a one-channel sound pressure signal p(t); the particle vibration velocity sensing units are further configured to collect the two-channel particle vibration velocity signals v_x(t) and v_y(t);
where x_1(t) and x_2(t) are the sound pressure signals of the target speech and the interfering speech, θ_1 and θ_2 are their horizontal azimuth angles (the positive x-axis direction is 0°), and n_p(t), n_x(t) and n_y(t) are the noise signals received by the sound pressure and particle vibration velocity sensing units.
The processor is used for preprocessing the sound pressure signal and the time-frequency data to obtain corresponding channel time-frequency spectrum data.
Specifically, the processor in the embodiment of the invention frames and windows the sound pressure signal and the particle vibration velocity signals to obtain the corresponding single-frame time-domain signals p_win(l), v_xwin(l), v_ywin(l), where l = 1, 2, …, L and L is the single-frame length, and then applies a short-time Fourier transform to convert each single-frame time-domain signal into a frequency-domain signal: P_win(l,k) = fft(p_win(l)), V_xwin(l,k) = fft(v_xwin(l)), V_ywin(l,k) = fft(v_ywin(l)).
Further, the processor in the embodiment of the invention is also configured to estimate the angular interval of the directional sound sources in a single frame of the speech signal, obtain the angular distribution of energy, and compute a sound-source arrival-angle estimate for any time-frequency point from the trigonometric relation between the channels' time-frequency spectrum data. Through a preset window function, the energy of time-frequency points outside the target speech azimuth region is set to zero, yielding the time-frequency point data corresponding to the target speech azimuth.
The processor convolves a rectangular window function, Gaussian window, Hanning window or Hamming window with the obtained full-angle-space energy distribution, so as to zero the energy of time-frequency points outside the target speech azimuth region and obtain the time-frequency point data corresponding to the target speech azimuth;
where θ is the target azimuth and Δθ is the width of the rectangular window.
Finally, the processor applies an IFFT to the time-frequency point data corresponding to the target speech azimuth to obtain the corresponding time-domain signal, and splices the spatially filtered target speech signal using the overlap-add method.
Correspondingly, the embodiment of the invention also provides a method for performing spatial filtering voice enhancement by applying the system, which comprises the following steps:
Based on the measured azimuth information of the target speech and the interfering speech, spatial filtering is applied to the time-frequency spectrum data of each channel of the particle vibration velocity sensor microarray to obtain the time-frequency point data corresponding to the target speech azimuth; an IFFT is applied to that data to obtain the corresponding time-domain signal, and the spatially filtered target speech signal is spliced using the overlap-add method.
In order to better explain the method of the present invention, the following will explain the method of the present invention in detail by way of a specific example:
The embodiment of the invention develops a speech enhancement algorithm based on spatial filtering, using the trigonometric relation among the received components of the particle vibration velocity sensor microarray and the time-frequency sparsity of speech signals. As shown in fig. 2, the workflow of the method comprises the following steps:
Step one, arranging a particle vibration velocity sensor microarray and collecting voice signals;
In specific implementation, this step proceeds as follows: the particle vibration velocity sensor microarray is arranged as shown in fig. 1, with a sampling rate of 16 kHz, yielding a one-channel sound pressure signal p and two-channel particle vibration velocity signals v_x, v_y.
where x_1(t) and x_2(t) are the sound pressure signals of the target speech and the interfering speech respectively, θ_1 and θ_2 are their horizontal azimuth angles respectively (the positive x-axis direction is 0°), and n_p(t), n_x(t) and n_y(t) are the noise signals received by the sound pressure and two-dimensional particle vibration velocity sensing units respectively.
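The measurement equations themselves are rendered as images in the source and are omitted from this text. As a minimal sketch, assuming the standard two-dimensional acoustic vector sensor model consistent with the symbol definitions above (the pressure channel sums the sources; each velocity channel weights them by the cosine or sine of the source azimuth), the three channels can be simulated as follows; the function name and noise handling are illustrative:

```python
import numpy as np

def avs_channels(x1, x2, theta1, theta2, noise_std=0.0, rng=None):
    """Simulate p(t), v_x(t), v_y(t) for two directional sources.

    Assumed standard 2-D acoustic vector sensor model (the patent's own
    equations are not reproduced in this text): the pressure channel sums
    the source signals, and each velocity channel weights them by cos/sin
    of the horizontal azimuth, with the positive x-axis at 0 degrees.
    """
    rng = rng if rng is not None else np.random.default_rng(0)
    noise = lambda: noise_std * rng.standard_normal(np.shape(x1))
    p = x1 + x2 + noise()                                     # + n_p(t)
    vx = np.cos(theta1) * x1 + np.cos(theta2) * x2 + noise()  # + n_x(t)
    vy = np.sin(theta1) * x1 + np.sin(theta2) * x2 + noise()  # + n_y(t)
    return p, vx, vy
```

With the interferer silenced and the noise set to zero, the velocity channels reduce to the pressure channel scaled by cos θ_1 and sin θ_1, which is the trigonometric relation the later steps exploit.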
Step two, framing and windowing the particle vibration velocity sensor microarray output data, and obtaining the time-frequency data of the one-channel sound pressure signal and the two-channel particle vibration velocity signals through short-time Fourier transform;
To obtain approximately stationary speech segments, the p, v_x, v_y signals from the previous step are framed and windowed to obtain the corresponding single-frame time-domain signals p_win(l), v_xwin(l), v_ywin(l) (l = 1, 2, …, L, where L is the single-frame length). Each single-frame time-domain signal is then converted into a frequency-domain signal by short-time Fourier transform. The specific parameters are: window function: Hanning window, window length K = 1024 samples; frame shift 50%; K-point Fourier transform. This yields the spectrum data of the corresponding channels:
P_win(l,k) = fft(p_win(l))
V_xwin(l,k) = fft(v_xwin(l))
V_ywin(l,k) = fft(v_ywin(l))
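Step two can be sketched in a few lines. The function below uses the stated parameters (Hanning window, K = 1024, 50% frame shift); using numpy's real FFT is an implementation choice for real-valued inputs, not something prescribed by the text:

```python
import numpy as np

def stft_frames(sig, K=1024):
    """Frame-window a signal with a length-K Hanning window at 50% frame
    shift, then take a K-point FFT of each frame (step two of the method)."""
    hop = K // 2
    win = np.hanning(K)
    n_frames = 1 + (len(sig) - K) // hop
    frames = np.stack([sig[l * hop : l * hop + K] * win
                       for l in range(n_frames)])
    # rfft keeps the K//2 + 1 non-redundant bins of a real signal
    return np.fft.rfft(frames, n=K, axis=1)
```

Applying it to p, v_x and v_y yields the three spectra P_win, V_xwin, V_ywin used in the following steps.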
Step three, obtaining a high-SNR angle estimate for any time-frequency point using the trigonometric relation between the microarray's one-channel sound pressure signal and two-channel particle vibration velocity signals, and obtaining the full-angle-space energy distribution;
The purpose of this step is to estimate the angular interval of the directional sound sources in the single-frame speech signal and to obtain the angular distribution of energy, starting from the single-frame frequency-domain noisy speech signals P_win(l,k), V_xwin(l,k), V_ywin(l,k) obtained in step two.
The sound-source arrival-angle estimate for any time-frequency point is obtained from the trigonometric relation among the spectrum data of the microarray channels.
Speech signals are known to be sparse in the time-frequency domain: when several speech sources in different directions are present, their signals occupy largely disjoint sets of time-frequency points. The full-angle-space energy distribution of the speech signals is then obtained from the per-time-frequency-point arrival-angle estimates.
In other words, in step three, when directional environmental interfering speech is present, the trigonometric relation among the channels of the particle vibration velocity sensor microarray, combined with the time-frequency sparsity of speech signals, yields a high-SNR angle estimate for any time-frequency point, from which the full-angle-space energy distribution is obtained.
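A minimal sketch of step three, assuming the common acoustic-vector-sensor estimator in which V_x ≈ P·cos θ and V_y ≈ P·sin θ for the source dominating a time-frequency point; multiplying by conj(P) before taking the angle is an implementation choice that cancels the common source phase:

```python
import numpy as np

def tf_doa(P, Vx, Vy):
    """Arrival-angle estimate at every time-frequency point from the
    trigonometric relation between the pressure and velocity spectra."""
    return np.arctan2(np.real(Vy * np.conj(P)),
                      np.real(Vx * np.conj(P)))

def angular_energy(P, theta_tf, n_bins=360):
    """Full-angle-space energy distribution: histogram of |P|^2 over the
    per-point angle estimates, one bin per degree by default."""
    edges = np.linspace(-np.pi, np.pi, n_bins + 1)
    hist, _ = np.histogram(theta_tf.ravel(), bins=edges,
                           weights=(np.abs(P) ** 2).ravel())
    return hist, edges
```

Peaks of the returned histogram mark the angular intervals occupied by directional sources in the current frame.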
Step four, convolving a rectangular window function (a Gaussian window, Hanning window, Hamming window or other window function may also be selected) with the full-angle-space energy distribution of the current frame of the microarray signal, to obtain the time-frequency point data corresponding to the target speech azimuth;
The preceding steps give the energy distribution of the current frame over the full angle space. Convolving the chosen window function with this distribution zeroes the energy of time-frequency points outside the target speech azimuth region and yields the time-frequency point data corresponding to the target speech azimuth.
where θ is the target azimuth and Δθ is the width of the rectangular window, typically Δθ = 5°.
That is, step four zeroes the energy of time-frequency points outside the target speech azimuth region by rectangular-window spatial filtering (Gaussian, Hanning, Hamming or other window functions may also be selected), obtaining the time-frequency point data corresponding to the target speech azimuth.
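The rectangular-window spatial filter of step four reduces to a binary mask on the time-frequency plane: keep a point if its angle estimate falls within the window around the target azimuth, zero it otherwise. The half-width convention below (keep if the wrapped angular distance is at most Δθ) is an assumption; the text only states that Δθ = 5° is a typical window width:

```python
import numpy as np

def spatial_mask(S, theta_tf, theta_target, dtheta=np.deg2rad(5)):
    """Zero every time-frequency point of the spectrum S whose estimated
    arrival angle lies outside the window around the target azimuth."""
    # wrap the angular difference into (-pi, pi] before comparing
    diff = np.angle(np.exp(1j * (theta_tf - theta_target)))
    return np.where(np.abs(diff) <= dtheta, S, 0.0)
```

The same mask is applied to each channel's spectrum; swapping in a Gaussian or Hanning taper over angle instead of the hard threshold gives the soft-window variants mentioned in the text.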
Step five, taking the time-frequency point data corresponding to the target speech azimuth, applying the inverse Fourier transform, and splicing the spatially filtered target speech signal by the overlap-add method.
The preceding steps give the final filtered data; applying an IFFT to it yields the corresponding time-domain signals, and the spatially filtered target speech signal is spliced by the overlap-add method.
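Step five can be sketched as inverting the frames and splicing by overlap-add; with a Hanning analysis window at a 50% frame shift the overlapping window tails sum to an approximately constant value, so this illustrative version applies no extra synthesis scaling:

```python
import numpy as np

def overlap_add(frames_fd, K=1024):
    """IFFT each filtered frame (step five) and splice the frames back into
    one time-domain signal with 50% overlap-add."""
    hop = K // 2
    frames = np.fft.irfft(frames_fd, n=K, axis=1)
    out = np.zeros((len(frames) - 1) * hop + K)
    for l in range(len(frames)):
        out[l * hop : l * hop + K] += frames[l]
    return out
```

This pairs with the rfft-based analysis sketched under step two, closing the analysis-filter-synthesis loop of the method.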
In general, the spatial-domain filtering speech enhancement algorithm of the embodiments of the invention is built on a special particle vibration velocity sensor microarray consisting of a sound pressure sensing unit and two-dimensional particle vibration velocity sensing units, the latter orthogonally arranged on two sides of the former, as shown in fig. 1. This structure localizes sound sources with high accuracy, its directivity is independent of the array aperture, and the array is compact, with an aperture that can be at the millimeter scale.
In addition, the embodiment of the invention localizes the target speech and the interfering speech from the trigonometric relation between the one-channel sound pressure signal and the two-channel particle vibration velocity signals. The spatial filtering speech enhancement algorithm based on rectangular-window filtering then suppresses interference and noise from all directions even when the number and directions of the spatial sound sources are unknown, achieving target speech enhancement. Practice shows that the method is robust, reliable and practical.
A second embodiment of the present invention provides a computer-readable storage medium storing a computer program of signal mapping, which when executed by at least one processor, implements the method of spatial domain filtered speech enhancement according to any of the first embodiments of the present invention.
The relevant content of the embodiments of the present invention can be understood with reference to the first embodiment of the present invention, and will not be discussed in detail herein.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible; accordingly, the scope of the invention is not limited to the embodiments described above.
Claims (4)
1. A particle velocity sensor microarray-based spatial filtering speech enhancement system, comprising a sound pressure sensing unit, two-dimensional particle vibration velocity sensing units, and a processor, wherein the particle vibration velocity sensing units are orthogonally arranged on two sides of the sound pressure sensing unit;
the sound pressure sensing unit is used for measuring the sound pressure component in the sound field and collecting a one-channel sound pressure signal p(t);
the particle vibration velocity sensing units are used for measuring the spatial particle vibration velocity components and collecting the two-channel particle vibration velocity signals v_x(t) and v_y(t);
wherein x_1(t) and x_2(t) are the sound pressure signals of the target speech and the interfering speech respectively, θ_1 and θ_2 are their horizontal azimuth angles respectively, the positive x-axis direction being 0°, and n_p(t), n_x(t) and n_y(t) are the noise signals received by the sound pressure and particle vibration velocity sensing units respectively;
the processor is used for preprocessing the sound pressure signal and the time-frequency data to obtain corresponding channel time-frequency spectrum data; the method specifically comprises the following steps:
framing and windowing the sound pressure signal and the particle vibration velocity signals to obtain the corresponding single-frame time-domain signals p_win(l), v_xwin(l), v_ywin(l), where l = 1, 2, …, L and L is the single-frame length, and then performing a short-time Fourier transform to convert each single-frame time-domain signal into a frequency-domain signal:
P_win(l,k) = fft(p_win(l)), V_xwin(l,k) = fft(v_xwin(l)), V_ywin(l,k) = fft(v_ywin(l));
Estimating an angle interval of a directional sound source in a single frame of voice signal, obtaining energy angle distribution, and obtaining sound source arrival angle estimation of any time-frequency point based on a trigonometric function relation between time-frequency spectrum data;
Setting the energy of a frequency point outside the target voice frequency domain to be zero through the processing of a preset window function, and obtaining time-frequency point data corresponding to the target voice azimuth;
Performing convolution operation by using a rectangular window function, a Gaussian window, a Hanning window or a Hamming window and the obtained energy distribution of the full-angle space so as to zero the energy of frequency points outside the target voice frequency region and obtain time-frequency point data corresponding to the target voice azimuth;
wherein θ is the target azimuth and Δθ is the width of the rectangular window; the target voice direction is estimated from the sound pressure component and the spatial particle vibration velocity components, the energy of frequency points outside the target voice angular region is set to zero, the time-frequency point data corresponding to the target voice azimuth is obtained, and the target voice signal is obtained from that time-frequency point data.
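As an illustrative sketch only (the Hanning analysis window, the active-intensity arrival-angle estimate, and all function and parameter names are assumptions, not text from the claims), the framing/FFT step and the rectangular spatial window described in claim 1 might look like:

```python
import numpy as np

def frame_window_fft(x, L, hop):
    """Framing + windowing + short-time FFT, i.e. P_win(l,k) = fft(p_win(l)).
    Applied identically to p(t), v_x(t) and v_y(t); the Hanning window
    and hop size are assumptions, not specified in the claims."""
    win = np.hanning(L)
    n_frames = 1 + (len(x) - L) // hop
    return np.array([np.fft.fft(x[m*hop : m*hop + L] * win)
                     for m in range(n_frames)])

def spatial_mask(P, Vx, Vy, theta, dtheta):
    """Per-bin arrival-angle estimate from the pressure/velocity spectra,
    then a rectangular spatial window of width dtheta centred on the
    target azimuth theta (x-axis positive direction = 0)."""
    # Active-intensity components: Re{P conj(V)} points toward the source.
    Ix = np.real(P * np.conj(Vx))
    Iy = np.real(P * np.conj(Vy))
    doa = np.arctan2(Iy, Ix)                     # angle estimate per T-F point
    # Zero every bin whose angle lies outside [theta - dtheta/2, theta + dtheta/2].
    diff = np.angle(np.exp(1j * (doa - theta)))  # wrapped angular distance
    return P * (np.abs(diff) <= dtheta / 2)
```

For example, a bin whose intensity points along the positive x axis survives a window centred on θ = 0, while a bin pointing the opposite way is zeroed.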
2. The system of claim 1, wherein
The processor is further configured to perform an IFFT on the time-frequency point data corresponding to the target voice azimuth to obtain the corresponding time-domain signals, and to splice the spatial-domain-filtered target voice signal by the overlap-add method.
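The IFFT-and-splice step of claim 2 can be sketched as the following overlap-add routine (a minimal illustration; the hop size and all names are assumptions, not from the patent text):

```python
import numpy as np

def overlap_add(frames, hop):
    """IFFT each masked single-frame spectrum back to the time domain and
    splice the spatial-domain-filtered signal by overlap-add.
    frames : complex array of shape (n_frames, L); hop : frame shift in samples."""
    n_frames, L = frames.shape
    out = np.zeros((n_frames - 1) * hop + L)
    for m in range(n_frames):
        out[m*hop : m*hop + L] += np.real(np.fft.ifft(frames[m]))
    return out
```

With hop equal to the frame length this simply concatenates the frames; with overlapping frames the analysis/synthesis windows must be chosen so the overlapped sums reconstruct the signal.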
3. A method of spatial filtered speech enhancement using the system of any of claims 1-2, the method comprising:
Based on the measured azimuth information of the target voice and the interference voice, spatial filtering is carried out on the time-frequency spectrum data of each channel of the particle vibration velocity sensor micro-array to obtain the time-frequency point data corresponding to the target voice azimuth; an IFFT is performed on this time-frequency point data to obtain the corresponding time-domain signals, and the spatial-domain-filtered target voice signal is spliced by the overlap-add method.
4. A computer readable storage medium storing a computer program of signal mapping, which when executed by at least one processor, implements the method of spatial filtered speech enhancement of claim 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111004913.7A CN113707171B (en) | 2021-08-30 | 2021-08-30 | Airspace filtering voice enhancement system and method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113707171A CN113707171A (en) | 2021-11-26 |
CN113707171B true CN113707171B (en) | 2024-05-14 |
Family
ID=78656897
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111004913.7A Active CN113707171B (en) | 2021-08-30 | 2021-08-30 | Airspace filtering voice enhancement system and method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113707171B (en) |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107369460A (en) * | 2017-07-31 | 2017-11-21 | 深圳海岸语音技术有限公司 | Speech sound enhancement device and method based on acoustics vector sensor space sharpening technique |
CN113160843A (en) * | 2021-03-23 | 2021-07-23 | 中国电子科技集团公司第三研究所 | Particle vibration velocity sensor microarray-based interference voice suppression method and device |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107221336B (en) | Device and method for enhancing target voice | |
US9984702B2 (en) | Extraction of reverberant sound using microphone arrays | |
CN104076331B (en) | A kind of sound localization method of seven yuan of microphone arrays | |
WO2020108614A1 (en) | Audio recognition method, and target audio positioning method, apparatus and device | |
JP6129316B2 (en) | Apparatus and method for providing information-based multi-channel speech presence probability estimation | |
WO2015196729A1 (en) | Microphone array speech enhancement method and device | |
JP4812302B2 (en) | Sound source direction estimation system, sound source direction estimation method, and sound source direction estimation program | |
CN107369460B (en) | Voice enhancement device and method based on acoustic vector sensor space sharpening technology | |
CN110706719B (en) | Voice extraction method and device, electronic equipment and storage medium | |
WO2015065682A1 (en) | Selective audio source enhancement | |
MX2014006499A (en) | Apparatus and method for microphone positioning based on a spatial power density. | |
Cobos et al. | Two-microphone multi-speaker localization based on a Laplacian mixture model | |
JP6225245B2 (en) | Signal processing apparatus, method and program | |
Huleihel et al. | Spherical array processing for acoustic analysis using room impulse responses and time-domain smoothing | |
WO2016065011A1 (en) | Reverberation estimator | |
CN111798869B (en) | Sound source positioning method based on double microphone arrays | |
CN113687305A (en) | Method, device and equipment for positioning sound source azimuth and computer readable storage medium | |
Pertilä | Online blind speech separation using multiple acoustic speaker tracking and time–frequency masking | |
CN113707171B (en) | Airspace filtering voice enhancement system and method | |
Cobos et al. | Two-microphone separation of speech mixtures based on interclass variance maximization | |
Guo et al. | A two-microphone based voice activity detection for distant-talking speech in wide range of direction of arrival | |
Raikar et al. | Effect of Microphone Position Measurement Error on RIR and its Impact on Speech Intelligibility and Quality. | |
Firoozabadi et al. | Combination of nested microphone array and subband processing for multiple simultaneous speaker localization | |
Astapov et al. | Far field speech enhancement at low SNR in presence of nonstationary noise based on spectral masking and MVDR beamforming | |
Firoozabadi et al. | Speakers counting by proposed nested microphone array in combination with limited space SRP |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||