CN113707171A - Spatial domain filtering speech enhancement system and method - Google Patents

Spatial domain filtering speech enhancement system and method

Info

Publication number
CN113707171A
CN113707171A (application CN202111004913.7A; granted publication CN113707171B)
Authority
CN
China
Prior art keywords: time, target voice, sound pressure, frequency, signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111004913.7A
Other languages
Chinese (zh)
Other versions
CN113707171B (en)
Inventor
王笑楠
李光
刘云飞
周瑜
冯杰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Third Research Institute Of China Electronics Technology Group Corp
Original Assignee
Third Research Institute Of China Electronics Technology Group Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Third Research Institute Of China Electronics Technology Group Corp filed Critical Third Research Institute Of China Electronics Technology Group Corp
Priority to CN202111004913.7A priority Critical patent/CN113707171B/en
Publication of CN113707171A publication Critical patent/CN113707171A/en
Application granted granted Critical
Publication of CN113707171B publication Critical patent/CN113707171B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G — PHYSICS
    • G10 — MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 — Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 — Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 — Noise filtering
    • G10L21/0216 — Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 — Processing in the frequency domain
    • G10L2021/02161 — Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 — Microphone arrays; Beamforming

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Measurement Of Velocity Or Position Using Acoustic Or Ultrasonic Waves (AREA)

Abstract

The invention discloses a spatial domain filtering speech enhancement system and method. The system is composed of a sound pressure sensitive unit and two-dimensional particle vibration velocity sensitive units. The sound pressure sensitive unit measures the sound pressure component of the sound field, while the orthogonally arranged two-dimensional particle vibration velocity sensitive units synchronously and independently measure the spatial particle vibration velocity components. Practice shows that this structure locates sound sources with high precision, its directivity is not affected by the array aperture, and the array is compact, with an aperture that can be on the millimetre scale; it adapts to a variety of environments and noise types and suppresses room reverberation to a certain degree, achieving speech enhancement and effectively overcoming the limited design freedom and application scenarios of existing microphone arrays.

Description

Spatial domain filtering speech enhancement system and method
Technical Field
The invention relates to the technical field of signal processing, in particular to a spatial filtering speech enhancement system and method.
Background
When a speech signal is acquired with no environmental noise, no room reverberation, and a short distance to the sound source, a single microphone can pick up a high-quality speech signal. In practice, however, the quality of the speech picked up by a microphone is affected by variable factors such as the sound-source position and the acoustic environment, which reduce speech intelligibility. To overcome the limitations of single-microphone pickup, microphone-array speech processing was introduced; it improves pickup quality and has driven the development of speech enhancement technology. Unlike a single microphone, a microphone array is spatially selective: a target signal from a specific direction can be captured by electronic steering while surrounding noise is suppressed. However, a microphone array is constrained by the half-wavelength rule, so a larger number of microphones implies a larger aperture, and it suffers from spatial aliasing and high computational complexity, which greatly restrict its design freedom and application scenarios.
Disclosure of Invention
The invention provides a spatial domain filtering speech enhancement system and method based on a particle vibration velocity sensor micro-array, aiming to solve the problem that the design freedom and application scenarios of microphone arrays are limited in the prior art.
In a first aspect, the present invention provides a spatial domain filtering speech enhancement system based on a particle vibration velocity sensor microarray, comprising a sound pressure sensitive unit, two-dimensional particle vibration velocity sensitive units arranged orthogonally on the two sides of the sound pressure sensitive unit, and a processor;
the sound pressure sensitive unit is used for measuring sound pressure components in a sound field;
the particle vibration velocity sensing unit is used for measuring the spatial particle vibration velocity component;
and the processor is used for estimating the target speech direction from the sound pressure component and the spatial particle vibration velocity components, setting to zero the energy of frequency points outside the target-speech frequency-point region to obtain the time-frequency point data corresponding to the target speech azimuth, and recovering the target speech signal from that time-frequency point data.
Optionally, the sound pressure sensitive unit is further configured to acquire the one-channel sound pressure signal p(t);
the particle vibration velocity sensitive unit is further configured to acquire the two-channel particle vibration velocity signals v_x(t) and v_y(t);

p(t) = x_1(t) + x_2(t) + n_p(t)
v_x(t) = x_1(t)cos θ_1 + x_2(t)cos θ_2 + n_x(t)
v_y(t) = x_1(t)sin θ_1 + x_2(t)sin θ_2 + n_y(t)

where x_1(t) and x_2(t) are the sound pressure signals of the target and interfering speech respectively, θ_1 and θ_2 are the horizontal azimuth angles of the target and interfering speech (the positive x-axis direction is 0°), and n_p(t), n_x(t) and n_y(t) are the noise signals received by the sound pressure sensitive unit and the particle vibration velocity sensitive units respectively.
Optionally, the processor is further configured to pre-process the sound pressure signal and the particle vibration velocity signals to obtain the time-frequency spectrum data of the corresponding channels.
Optionally, the processor is further configured to frame and window the sound pressure signal and the particle vibration velocity signals to obtain the corresponding single-frame time-domain signals p_win(l), v_xwin(l), v_ywin(l), l = 1, 2, …, L, where L is the single-frame length, and then to apply a short-time Fourier transform to convert each single-frame time-domain signal into a frequency-domain signal;
P_win(l,k) = fft(p_win(l)), V_xwin(l,k) = fft(v_xwin(l)), V_ywin(l,k) = fft(v_ywin(l)).
Optionally, the processor is further configured to obtain a sound-source arrival-angle estimate at every time-frequency point from the trigonometric relationship between the channels' time-frequency spectrum data, and from these estimates to determine the angle interval of each directional sound source in the single-frame speech signal and the angular distribution of energy.
Optionally, the processor is further configured to set to zero, through a preset window-function processing, the energy of frequency points outside the target-speech region, obtaining the time-frequency point data corresponding to the target speech azimuth.
Optionally, the processor is further configured to convolve the obtained full-angle spatial energy distribution with a rectangular window function, a Gaussian window, a Hanning window or a Hamming window, so as to set to zero the energy of frequency points outside the target-speech region and obtain the time-frequency point data corresponding to the target speech azimuth;
w(θ′) = 1 when |θ′ − θ| ≤ Δθ/2, and w(θ′) = 0 otherwise,

where θ is the target azimuth and Δθ is the width of the rectangular window.
Optionally, the processor is further configured to perform an IFFT on the time-frequency point data corresponding to the target speech azimuth to obtain the corresponding time-domain signal, and to reconstruct the spatially filtered target speech signal by the overlap-add (splice-and-add) method.
In a second aspect, the present invention provides a method for spatial filtering speech enhancement using the system of any one of the above, the method comprising:
and performing space-domain filtering on the time-frequency spectrum data of each channel of the prime point vibration velocity sensor microarray based on the measured target voice and interference voice azimuth information to obtain time-frequency point data corresponding to the target voice azimuth, performing IFFT on the time-frequency point data corresponding to the target voice azimuth to obtain corresponding time-domain signals, and splicing the target voice signals subjected to space-domain filtering by adopting a splicing and adding method.
In a third aspect, the present invention provides a computer-readable storage medium storing a signal-mapped computer program which, when executed by at least one processor, implements any of the above-described methods of spatial filtering speech enhancement.
The invention has the following beneficial effects:
the invention is composed of a sound pressure sensitive unit and two-dimensional particle vibration velocity sensitive units. The sound pressure sensitive unit measures sound pressure components in a sound field, and the two-dimensional particle vibration velocity sensitive units which are orthogonally arranged can synchronously and independently measure spatial particle vibration velocity components. Practice proves that the sound source with the structure has high directional precision, directivity is not influenced by the aperture of the array, the array structure is compact, the aperture can be in millimeter level, the microphone can adapt to various environments and various noises, room reverberation is restrained to a certain degree, a voice enhancement effect is achieved, and the problem that the existing microphone array is limited in design and application scenes is effectively solved.
The foregoing description is only an overview of the technical solutions of the present invention. The embodiments of the invention are described below so that its technical means can be understood more clearly and the above and other objects, features, and advantages of the invention become more readily apparent.
Drawings
Various other advantages and benefits will become apparent to those of ordinary skill in the art upon reading the following detailed description of the preferred embodiments. The drawings are only for purposes of illustrating the preferred embodiments and are not to be construed as limiting the invention. Also, like reference numerals are used to refer to like parts throughout the drawings. In the drawings:
FIG. 1 is a schematic structural diagram of a spatial filtering speech enhancement system based on a particle velocity sensor microarray according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart of a spatial filtering speech enhancement method based on a particle velocity sensor microarray according to an embodiment of the present invention.
Detailed Description
To address the problems that the existing microphone array is constrained by the half-wavelength rule, that a larger number of microphones implies a larger aperture, and that spatial aliasing and high computational complexity result, the embodiment of the invention designs a spatial domain filtering speech enhancement system based on a particle vibration velocity sensor microarray. Practice shows that this structure locates sound sources with high precision, its directivity is not affected by the array aperture, the array is compact with an aperture that can be on the millimetre scale, it adapts to a variety of environments and noise types, and it suppresses room reverberation to a certain degree. The present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and do not limit it.
The first embodiment of the present invention provides a spatial domain filtering speech enhancement system based on a particle vibration velocity sensor microarray. Referring to fig. 1, the system comprises a sound pressure sensitive unit, two-dimensional particle vibration velocity sensitive units arranged orthogonally on the two sides of the sound pressure sensitive unit, and a processor;
the sound pressure sensitive unit is used for measuring sound pressure components in a sound field;
the particle vibration velocity sensing unit is used for measuring the spatial particle vibration velocity component;
and the processor is used for estimating the target speech direction from the sound pressure component and the spatial particle vibration velocity components, setting to zero the energy of frequency points outside the target-speech frequency-point region to obtain the time-frequency point data corresponding to the target speech azimuth, and recovering the target speech signal from that time-frequency point data.
That is to say, to address the problems that the existing microphone array is constrained by the half-wavelength rule, that a larger number of microphones implies a larger aperture, and that spatial aliasing and high computational complexity result, the embodiment of the present invention designs a spatial domain filtering speech enhancement system based on a particle vibration velocity sensor microarray. The system is composed of a sound pressure sensitive unit and two-dimensional particle vibration velocity sensitive units; while the sound pressure sensitive unit measures the sound pressure component of the sound field, the orthogonally placed two-dimensional particle vibration velocity sensitive units synchronously and independently measure the spatial particle vibration velocity components. This ensures high-precision sound-source direction finding, directivity that is unaffected by the array aperture, and a compact array structure that adapts to a variety of environments and noise types and provides a certain suppression of room reverberation.
In a specific implementation, the sound pressure sensitive unit in the embodiment of the present invention is further configured to acquire the one-channel sound pressure signal p(t), and the particle vibration velocity sensitive unit is further configured to acquire the two-channel particle vibration velocity signals v_x(t) and v_y(t);

p(t) = x_1(t) + x_2(t) + n_p(t)
v_x(t) = x_1(t)cos θ_1 + x_2(t)cos θ_2 + n_x(t)
v_y(t) = x_1(t)sin θ_1 + x_2(t)sin θ_2 + n_y(t)

where x_1(t) and x_2(t) are the sound pressure signals of the target and interfering speech respectively, θ_1 and θ_2 are the horizontal azimuth angles of the target and interfering speech (the positive x-axis direction is 0°), and n_p(t), n_x(t) and n_y(t) are the noise signals received by the sound pressure sensitive unit and the particle vibration velocity sensitive units respectively.
The processor is used for pre-processing the sound pressure signal and the particle vibration velocity signals to obtain the time-frequency spectrum data of the corresponding channels.
Specifically, in the embodiment of the present invention the processor frames and windows the sound pressure signal and the particle vibration velocity signals to obtain the corresponding single-frame time-domain signals p_win(l), v_xwin(l), v_ywin(l), l = 1, 2, …, L, where L is the single-frame length, and then applies a short-time Fourier transform to convert each single-frame time-domain signal into a frequency-domain signal: P_win(l,k) = fft(p_win(l)), V_xwin(l,k) = fft(v_xwin(l)), V_ywin(l,k) = fft(v_ywin(l)).
Further, the processor in the embodiment of the present invention is also configured to obtain a sound-source arrival-angle estimate at every time-frequency point from the trigonometric relationship between the channels' time-frequency spectrum data, to estimate the angle interval of each directional sound source in the single-frame speech signal and the angular distribution of energy, and then, through a preset window-function processing, to set to zero the energy of frequency points outside the target-speech region and obtain the time-frequency point data corresponding to the target speech azimuth.
Specifically, the processor convolves the obtained full-angle spatial energy distribution with a rectangular window function (a Gaussian, Hanning or Hamming window may also be chosen) so as to set to zero the energy of frequency points outside the target-speech region and obtain the time-frequency point data corresponding to the target speech azimuth;

w(θ′) = 1 when |θ′ − θ| ≤ Δθ/2, and w(θ′) = 0 otherwise,

where θ is the target azimuth and Δθ is the width of the rectangular window.
Finally, the processor performs an IFFT on the time-frequency point data corresponding to the target speech azimuth to obtain the corresponding time-domain signal and reconstructs the spatially filtered target speech signal by the overlap-add (splice-and-add) method.
Correspondingly, an embodiment of the present invention further provides a method for performing spatial filtering speech enhancement by using any of the above systems, where the method includes:
and performing space-domain filtering on the time-frequency spectrum data of each channel of the prime point vibration velocity sensor microarray based on the measured target voice and interference voice azimuth information to obtain time-frequency point data corresponding to the target voice azimuth, performing IFFT on the time-frequency point data corresponding to the target voice azimuth to obtain corresponding time-domain signals, and splicing the target voice signals subjected to space-domain filtering by adopting a splicing and adding method.
In order to better illustrate the process of the invention, the process of the invention will be described in detail below by means of a specific example:
In the embodiment of the present invention, a speech enhancement algorithm based on spatial-domain filtering is developed using the trigonometric relationship between the receiving components of the particle vibration velocity sensor microarray and the time-frequency sparsity of speech signals. As shown in fig. 2, the specific workflow of the method comprises:
Step one: arrange the particle vibration velocity sensor microarray and acquire the speech signals;
In a specific implementation, the particle vibration velocity sensor microarray is arranged as shown in figure 1 with a sampling rate of 16 kHz, which yields the one-channel sound pressure signal p(t) and the two-channel particle vibration velocity signals v_x(t) and v_y(t):

p(t) = x_1(t) + x_2(t) + n_p(t)
v_x(t) = x_1(t)cos θ_1 + x_2(t)cos θ_2 + n_x(t)
v_y(t) = x_1(t)sin θ_1 + x_2(t)sin θ_2 + n_y(t)

where x_1(t) and x_2(t) are the sound pressure signals of the target and interfering speech respectively, θ_1 and θ_2 are the horizontal azimuth angles of the target and interfering speech (the positive x-axis direction is 0°), and n_p(t), n_x(t) and n_y(t) are the noise signals received by the sound pressure sensitive unit and the two-dimensional particle vibration velocity sensitive units respectively.
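For illustration only (this is not part of the patent text), the following Python sketch simulates the measurement model above; the stand-in source waveforms, the assumed azimuths of 30° and 120°, and the noise levels are arbitrary choices made solely to show how the p, v_x and v_y channels are formed at the stated 16 kHz sampling rate.

```python
import numpy as np

fs = 16000                                   # sampling rate used in the embodiment (16 kHz)
rng = np.random.default_rng(0)

# Stand-in source waveforms (assumptions): x1 = "target speech", x2 = "interference".
x1 = rng.standard_normal(fs) * np.hanning(fs)
x2 = 0.5 * rng.standard_normal(fs)
theta1, theta2 = np.deg2rad(30.0), np.deg2rad(120.0)   # assumed azimuths, 0 deg = +x axis

# Assumed low-level sensor self-noise on each channel.
n_p = 0.01 * rng.standard_normal(fs)
n_x = 0.01 * rng.standard_normal(fs)
n_y = 0.01 * rng.standard_normal(fs)

# Measurement model: one pressure channel and two orthogonal particle-velocity channels.
p  = x1 + x2 + n_p
vx = x1 * np.cos(theta1) + x2 * np.cos(theta2) + n_x
vy = x1 * np.sin(theta1) + x2 * np.sin(theta2) + n_y
```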
Step two: frame and window the output data of the particle vibration velocity sensor microarray and obtain, via the short-time Fourier transform, the time-frequency data of the one-channel sound pressure signal and the two-channel particle vibration velocity signals;
To obtain approximately stationary speech signals, the signals p, v_x, v_y of the previous step are framed and windowed, giving the corresponding single-frame time-domain signals p_win(l), v_xwin(l), v_ywin(l) (l = 1, 2, …, L, where L is the single-frame length). A short-time Fourier transform then converts each single-frame time-domain signal into a frequency-domain signal. The specific parameters are chosen as follows: window function — Hanning window with window length K = 1024 samples; frame shift — 50%; Fourier transform length — K points. This yields the time-frequency spectrum data of the corresponding channels.
P_win(l,k) = fft(p_win(l))
V_xwin(l,k) = fft(v_xwin(l))
V_ywin(l,k) = fft(v_ywin(l))
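For concreteness, the framing, windowing and short-time Fourier transform of step two could be sketched as below. The helper function stft_frames and its defaults mirror the parameters stated above (Hanning window, K = 1024 samples, 50 % frame shift), but the function itself is an illustrative assumption rather than code from the patent.

```python
import numpy as np

def stft_frames(sig, win_len=1024, hop=512):
    """Split a 1-D signal into 50%-overlapping Hanning-windowed frames
    and return their FFTs, shape (num_frames, win_len)."""
    win = np.hanning(win_len)
    n_frames = 1 + (len(sig) - win_len) // hop
    frames = np.stack([sig[i * hop:i * hop + win_len] * win
                       for i in range(n_frames)])
    return np.fft.fft(frames, axis=1)

# Applied to the three channels of the micro-array:
# P = stft_frames(p); Vx = stft_frames(vx); Vy = stft_frames(vy)
```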
Step three: using the trigonometric relationship between the one-channel sound pressure signal and the two-channel particle vibration velocity signals of the particle vibration velocity sensor microarray, obtain a high signal-to-noise-ratio angle estimate for every time-frequency point and thereby the full-angle spatial energy distribution;
The purpose of this step is to estimate the angle interval of each directional sound source in the single-frame speech signal and to obtain the angular distribution of energy. Steps one and two provide the single-frame frequency-domain noisy speech signals P_win(l,k), V_xwin(l,k), V_ywin(l,k).
The sound-source arrival angle at any time-frequency point is estimated from the trigonometric relationship between the time-frequency spectrum data of the channels of the particle vibration velocity sensor microarray.
Speech signals are known to be sparse in the time-frequency domain: when several speech sources in different directions are present, the different speech signals occupy largely disjoint sets of time-frequency points, so the full-angle spatial energy distribution of the speech signals is obtained from the arrival-angle estimates at the individual time-frequency points.
That is, in step three, when directional environmental interfering speech is present, the time-frequency sparsity of speech signals and the trigonometric relationship between the channels of the particle vibration velocity sensor microarray are used to obtain a high signal-to-noise-ratio angle estimate at every time-frequency point, and from these estimates the full-angle spatial energy distribution.
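The patent does not spell out the estimator hidden behind the "trigonometric function relation". The sketch below therefore uses one common choice for acoustic vector sensors — the arctangent of the real parts of the velocity–pressure cross-spectra — and accumulates per-bin energy into an angular histogram; both the estimator and the 360-bin histogram are assumptions, not the patent's own formulas.

```python
import numpy as np

def per_bin_doa_and_energy(P, Vx, Vy, n_bins=360):
    """Estimate an arrival angle for every time-frequency point and
    accumulate bin energy into a full-angle (0..360 deg) histogram.

    The atan2 cross-spectrum estimator is a common acoustic-vector-sensor
    choice used here as an assumption, not taken from the patent.
    """
    theta = np.arctan2(np.real(Vy * np.conj(P)),
                       np.real(Vx * np.conj(P)))          # radians, per (l, k)
    theta_deg = np.mod(np.degrees(theta), 360.0)
    energy = np.abs(P) ** 2
    hist, _ = np.histogram(theta_deg, bins=n_bins,
                           range=(0.0, 360.0), weights=energy)
    return theta_deg, hist                                # angle map + angular energy
```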
Next, the full-angle spatial energy distribution of the current frame of the particle vibration velocity sensor microarray is convolved with a rectangular window function (a Gaussian, Hanning or Hamming window may also be chosen) to obtain the time-frequency point data corresponding to the target speech azimuth;
The preceding steps yield the energy distribution of the current frame signal over the full-angle space. Convolving this full-angle energy distribution with a rectangular window function (or a Gaussian, Hanning or Hamming window) sets to zero the energy of frequency points outside the target-speech region and yields the time-frequency point data corresponding to the target speech azimuth.
w(θ′) = 1 when |θ′ − θ| ≤ Δθ/2, and w(θ′) = 0 otherwise,

where θ is the target azimuth and Δθ is the width of the rectangular window; Δθ is typically taken to be 5°.
Step four: based on the rectangular-window spatial filtering (a Gaussian, Hanning or Hamming window may also be chosen), set to zero the energy of frequency points outside the target-speech region and acquire the time-frequency point data corresponding to the target speech azimuth.
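Steps three and four might be tied together as in the sketch below, which assumes that the window of width Δθ keeps bins within ±Δθ/2 of the target azimuth and that zeroing (masking) the pressure-channel spectrum is an acceptable stand-in for the convolution described above; the function name and its defaults (target at 30°, Δθ = 5°) are illustrative only.

```python
import numpy as np

def spatial_mask(P, theta_deg, theta_target=30.0, delta_theta=5.0):
    """Zero every time-frequency bin whose estimated arrival angle falls
    outside the rectangular window centred on the target azimuth.

    Interpreting delta_theta as the full window width (bins kept inside
    theta_target +/- delta_theta/2) is an assumption; an angular Gaussian,
    Hanning or Hamming window could be substituted as the patent notes.
    """
    diff = np.abs((theta_deg - theta_target + 180.0) % 360.0 - 180.0)
    mask = diff <= delta_theta / 2.0
    return P * mask          # masked spectrum of the pressure channel
```

With the outputs of the earlier sketches, spatial_mask(P, theta_deg, 30.0, 5.0) would retain only the time-frequency points attributed to the assumed target direction before reconstruction.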
Step five: after an inverse Fourier transform of the time-frequency point data corresponding to the target speech azimuth, reconstruct the spatially filtered target speech signal by the overlap-add (splice-and-add) method.
The preceding steps yield the final filtered data; an IFFT converts it into the corresponding time-domain signals, and the spatially filtered target speech signal is reconstructed by overlap-add.
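Reading "splicing addition" as standard overlap-add and reusing the STFT parameters assumed earlier (1024-sample Hanning frames, 50 % frame shift), the reconstruction of step five might look as follows; it is an illustrative sketch, not the patent's own implementation.

```python
import numpy as np

def overlap_add(frames_spec, win_len=1024, hop=512):
    """Inverse-FFT each filtered frame and overlap-add the time-domain
    frames back into one signal (weighted overlap-add with a Hanning
    synthesis window matching the assumed analysis window)."""
    frames = np.real(np.fft.ifft(frames_spec, axis=1))
    win = np.hanning(win_len)
    out = np.zeros(hop * (len(frames) - 1) + win_len)
    norm = np.zeros_like(out)
    for i, frame in enumerate(frames):
        out[i * hop:i * hop + win_len] += frame * win
        norm[i * hop:i * hop + win_len] += win ** 2
    return out / np.maximum(norm, 1e-12)
```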
Generally speaking, the spatial domain filtering speech enhancement algorithm of the embodiment of the present invention is built on a special particle vibration velocity sensor microarray consisting of a sound pressure sensitive unit and two-dimensional particle vibration velocity sensitive units. The particle vibration velocity sensitive units are arranged orthogonally on the two sides of the sound pressure sensitive unit, as shown in figure 1. This structure locates sound sources with high precision, its directivity is not affected by the array aperture, the array structure is compact, and the aperture can be on the millimetre scale.
In addition, the embodiment of the present invention locates the target speech and the interfering speech based on the trigonometric relationship between the one-channel sound pressure signal and the two-channel particle vibration velocity signals. The spatial domain filtering speech enhancement algorithm based on rectangular-window filtering then suppresses interference and noise from all directions even when the number and directions of the spatial sound sources are unknown, achieving target speech enhancement. Practice shows that the method has high robustness, good reliability and strong practicality.
A second embodiment of the present invention provides a computer-readable storage medium storing a signal-mapped computer program which, when executed by at least one processor, implements the method of spatial filtering speech enhancement according to any one of the first embodiments of the present invention.
The relevant content of the embodiments of the present invention can be understood by referring to the first embodiment of the present invention, and will not be discussed in detail herein.
Although the preferred embodiments of the present invention have been disclosed for illustrative purposes, those skilled in the art will appreciate that various modifications, additions and substitutions are possible, and the scope of the invention should not be limited to the embodiments described above.

Claims (10)

1. A spatial filtering speech enhancement system based on particle velocity sensor microarray, comprising: the device comprises a sound pressure sensitive unit, two-dimensional particle vibration velocity sensitive units and a processor, wherein the particle vibration velocity sensitive units are orthogonally arranged on two sides of the sound pressure sensitive unit;
the sound pressure sensitive unit is used for measuring sound pressure components in a sound field;
the particle vibration velocity sensing unit is used for measuring the spatial particle vibration velocity component;
and the processor is used for estimating the target speech direction according to the sound pressure component and the spatial particle vibration velocity components, setting to zero the energy of frequency points outside the target-speech frequency-point region to obtain the time-frequency point data corresponding to the target speech azimuth, and obtaining the target speech signal from the time-frequency point data.
2. The system of claim 1,
the sound pressure sensitive unit is also used for acquiring a sound pressure signal p (t) of a channel;
the particle vibration velocity sensitive unit is further configured to acquire the two-channel particle vibration velocity signals v_x(t) and v_y(t);

p(t) = x_1(t) + x_2(t) + n_p(t)
v_x(t) = x_1(t)cos θ_1 + x_2(t)cos θ_2 + n_x(t)
v_y(t) = x_1(t)sin θ_1 + x_2(t)sin θ_2 + n_y(t)

wherein x_1(t) and x_2(t) are the sound pressure signals of the target and interfering speech respectively, θ_1 and θ_2 are the horizontal azimuth angles of the target and interfering speech (the positive x-axis direction is 0°), and n_p(t), n_x(t) and n_y(t) are the noise signals received by the sound pressure sensitive unit and the particle vibration velocity sensitive units respectively.
3. The system of claim 2,
the processor is further configured to preprocess the sound pressure signal and the time-frequency data to obtain time-frequency spectrum data of a corresponding channel.
4. The system of claim 3,
the processor is further configured to frame-window the sound pressure signal and the time-frequency data to obtain a corresponding single-frame time-domain signal pwin(l)、vxwin(l)、vywin(l) L is 1, 2 …, L is the length of the single-frame time domain signal, and then short-time fourier transform is performed to transform the single-frame time domain signal into a frequency domain signal;
Pwin(l,k)=fft(pwin(l))、Vxwin(l,k)=fft(vxwin(l))、Vywin(l,k)=fft(vywin(l))。
5. the system of claim 4,
the processor is further used for estimating an angle interval of a directional sound source in the single-frame voice signal, obtaining the angle distribution of energy, and obtaining the sound source arrival angle estimation of any time frequency point based on the trigonometric function relation among the time frequency spectrum data.
6. The system of claim 4,
and the processor is also used for setting the frequency point energy outside the target voice frequency domain to zero through the preset window function processing to obtain the time-frequency point data corresponding to the target voice direction.
7. The system of claim 6,
the processor is further configured to convolve the obtained full-angle spatial energy distribution with a rectangular window function, a Gaussian window, a Hanning window or a Hamming window, so as to set to zero the energy of frequency points outside the target-speech region and obtain the time-frequency point data corresponding to the target speech azimuth;

w(θ′) = 1 when |θ′ − θ| ≤ Δθ/2, and w(θ′) = 0 otherwise,

where θ is the target azimuth and Δθ is the width of the rectangular window.
8. The system of claim 1,
the processor is further configured to perform IFFT on the time-frequency point data corresponding to the target voice azimuth to obtain a corresponding time-domain signal, and splice the space-domain filtered target voice signal by using a splice-add method.
9. A method for spatial filtering speech enhancement using the system of any of claims 1-8, the method comprising:
and performing space-domain filtering on the time-frequency spectrum data of each channel of the prime point vibration velocity sensor microarray based on the measured target voice and interference voice azimuth information to obtain time-frequency point data corresponding to the target voice azimuth, performing IFFT on the time-frequency point data corresponding to the target voice azimuth to obtain corresponding time-domain signals, and splicing the target voice signals subjected to space-domain filtering by adopting a splicing and adding method.
10. A computer-readable storage medium, storing a signal-mapped computer program which, when executed by at least one processor, performs the method of spatial filtered speech enhancement of claim 9.
CN202111004913.7A 2021-08-30 2021-08-30 Spatial domain filtering speech enhancement system and method Active CN113707171B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111004913.7A CN113707171B (en) Spatial domain filtering speech enhancement system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111004913.7A CN113707171B (en) Spatial domain filtering speech enhancement system and method

Publications (2)

Publication Number Publication Date
CN113707171A (en) 2021-11-26
CN113707171B CN113707171B (en) 2024-05-14

Family

ID=78656897

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111004913.7A Active CN113707171B (en) Spatial domain filtering speech enhancement system and method

Country Status (1)

Country Link
CN (1) CN113707171B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107369460A (en) * 2017-07-31 2017-11-21 深圳海岸语音技术有限公司 Speech sound enhancement device and method based on acoustics vector sensor space sharpening technique
CN113160843A (en) * 2021-03-23 2021-07-23 中国电子科技集团公司第三研究所 Particle vibration velocity sensor microarray-based interference voice suppression method and device

Also Published As

Publication number Publication date
CN113707171B (en) 2024-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant