CN110838303A - Voice sound source positioning method using microphone array - Google Patents
Voice sound source positioning method using microphone array
- Publication number
- CN110838303A (application CN201911069273.0A)
- Authority
- CN
- China
- Prior art keywords
- time
- voice
- signal
- frequency
- sound
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G01—MEASURING; TESTING
- G01S—RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
- G01S5/00—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
- G01S5/18—Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
Abstract
The invention discloses a method for localizing a voice sound source with a microphone array, comprising the following steps: (1) generating training samples, obtaining their time-frequency domain signals and power envelopes; (2) judging, for each time-frequency point of the time-frequency domain signal, whether it belongs to the direct speech signal; (3) training a neural network of UNET structure with the samples generated in step (1); (4) using the trained UNET network to predict the time-frequency points corresponding to the direct speech sound of a noisy signal under test; (5) applying a localization method to the time-frequency points judged to be direct speech sound to obtain the localization result. The method effectively removes the influence of interference and reverberation in high-reverberation, high-interference environments and obtains results of high accuracy and robustness.
Description
Technical Field
The invention relates to a voice sound source localization method using a microphone array in high-interference, high-reverberation environments, based on a UNET structure, and belongs to the technical field of speech signal processing.
Background
The purpose of speech sound source localization (SSL) is to estimate the direction of arrival (DOA) of a speech signal at a microphone array. DOA estimation of speech signals using a microphone array is an important and active topic in acoustic signal processing. It plays a central role in sound capture in many application scenarios, such as human-machine voice interaction with smart devices, camera tracking and intelligent monitoring. The difficulty is that the speech signal is a broadband, non-stationary random process, observed together with a noise floor, reverberation and other interfering sound sources.
Classical sound source localization methods can be divided into TDOA (Time Delay of Arrival), SRP (Steered Response Power) and spatial-spectrum methods; data-driven methods mainly use convolutional neural networks to obtain the DOA directly. Many practical application scenes contain both reverberation and noise interference, and most existing methods cannot maintain high accuracy and robustness in such complex environments.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a voice sound source positioning method using a microphone array, which can still obtain results with higher accuracy and robustness in the environment with high reverberation and high interference.
In order to achieve the purpose, the invention adopts the technical scheme that:
a method for locating a voice sound source using a microphone array, comprising the steps of:
step 1, collecting voice signals and interference signals with a microphone array, obtaining the time-frequency domain signals of the noisy speech signal and the clean speech signal, and calculating their power-spectrum amplitude logarithm values, the clean speech signal being a signal composed only of direct speech sound;
step 2, for every time-frequency point in the time-frequency domain of the noisy speech signal and the clean speech signal, calculating the respective spatial power response spectra and further estimating the corresponding time delays, recorded as τ̂_X(n, k) and τ̂_S(n, k), the time-frequency window delay estimates of the noisy speech signal and the clean speech signal at time n and frequency band k; and obtaining the time-frequency point distribution map corresponding to the direct speech sound;
step 3, training a neural network of the UNET structure by using the power spectrum amplitude logarithmic value of the noise-containing voice signal and the clean voice signal in the step 1 and the time-frequency point distribution diagram corresponding to the direct voice sound in the step 2; estimating a time-frequency point distribution diagram corresponding to the voice direct sound of the signal to be detected by using the power spectrum amplitude logarithmic value of the signal to be detected and the trained neural network;
step 4, obtaining the voice sound source localization result by using the direct-speech distribution estimated in step 3 as weights in combination with a weighted localization algorithm.
Further, in step 2, the time-frequency distribution points corresponding to the direct sound must simultaneously satisfy the following conditions:
1) in the noisy speech signal, the delay estimate τ̂_X(n, k) differs from the true delay τ = (d·sinθ)/c by less than the threshold TH₁, where d, c and θ are the microphone spacing, the speed of sound and the angle of the speech source relative to the array;
2) in the clean speech signal, the delay estimate τ̂_S(n, k) differs from the true delay τ by less than the threshold TH₁;
3) the correlation of the spatial power responses of the noisy and clean speech signals at the same position is greater than the threshold TH₂.
Further, in step 3, the input of the neural network is the logarithmized power spectrogram of the noisy speech signal, and the outputs are the logarithmized power spectrogram of the clean speech signal and the direct-speech time-frequency point distribution map; the clean-speech spectrogram assists training, and the values of the distribution map serve as the time-frequency-point weights of step 4.
In the voice sound source localization method, localization is applied to the direct speech component, so that interference and reverberation components contribute as little as possible to the localization; results of high accuracy and robustness are therefore still obtained in high-reverberation, high-interference environments, and the influence of interference noise on the localization effect is effectively avoided. The neural network with UNET structure first down-samples, learning deep features of the input data through successive convolutions, then up-samples by deconvolution to fit the features of the original and output data. This network structure has the strong feature-learning capability of deep neural networks and is well suited to learning speech features and judging direct sound. The UNET model adopted in the invention works with different array shapes: because the network predicts from single-channel signals, it needs no retraining for differently shaped arrays in actual use.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
fig. 2 is a basic block diagram of the UNET network in an embodiment of the present invention;
FIG. 3 is a logarithmic power spectrum of a noisy speech signal;
FIG. 4 is a log power spectrum of a clean speech signal;
FIG. 5 is a graph of theoretical direct speech sound time-frequency distribution;
FIG. 6 is a distribution diagram of UNET predicted direct speech sound time-frequency points;
fig. 7 shows the spatial power responses of the noisy signal unweighted, weighted by the theoretical direct-sound map, and weighted by the predicted direct-sound map (the left peak corresponds to the speech signal, the right peak to the interference signal; curves are normalized to their maxima).
Detailed Description
The present invention is further illustrated by the following description in conjunction with the accompanying drawings and the specific embodiments, it is to be understood that these examples are given solely for the purpose of illustration and are not intended as a definition of the limits of the invention, since various equivalent modifications will occur to those skilled in the art upon reading the present invention and fall within the limits of the appended claims.
The embodiment is performed in simulation, and provides a method for positioning a voice sound source by using a microphone array based on a UNET structure, which is suitable for high-interference and high-reverberation environments and is also suitable for arrays with different shapes, and the method comprises the following steps:
1. generating training samples to obtain time-frequency domain signals and obtaining power envelopes.
Speech and interference sound sources are placed in a simulated room and signals are collected with I microphones. Speech signals and interference signals at different positions are collected separately and superposed in the time domain to form noisy speech signals; the signal amplitudes are normalized to their maximum value, and the time-frequency domain signals are obtained by the short-time Fourier transform (STFT), recorded as x_i(n, k), the noisy speech signal of the i-th microphone at frame n and frequency band k. The speech time-frequency signal received by the microphone array before superposition is recorded as s_i(n, k). The mean power-spectrum magnitudes of the noisy speech signal X(n, k) and the clean speech signal S(n, k) are respectively

X(n, k) = (1/I) Σ_{i=1..I} |x_i(n, k)|  (1)
S(n, k) = (1/I) Σ_{i=1..I} |s_i(n, k)|  (2)

where x_i(n, k) and s_i(n, k) are the single-channel signals within the noisy and clean speech received by the microphone array. Their logarithmized forms are

X_L(n, k) = log10(X(n, k) + ξ)  (3)
S_L(n, k) = log10(S(n, k) + ξ)  (4)

where X_L(n, k) and S_L(n, k) are the power-spectrum amplitude logarithms of the noisy and clean speech signals respectively, and ξ is a background-noise power estimate used to reduce the influence of the noise floor on the robustness of the invention.
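The envelope computation of eqs. (1)-(4) can be sketched in a few lines of NumPy. This is a minimal illustration, not the patent's implementation; it assumes the channel mean is taken over STFT magnitudes (the extracted text does not show eqs. (1)-(2) explicitly), and the function name is ours:

```python
import numpy as np

def log_power_envelope(stft_channels, xi=1e-4):
    """Eqs. (1)-(4): channel-mean STFT magnitude, then log10 with floor xi."""
    mag = np.mean(np.abs(stft_channels), axis=0)  # average over the I microphones
    return np.log10(mag + xi)                     # xi guards against log of ~0

# toy input: I = 2 microphones, N = 3 frames, K = 4 bands
rng = np.random.default_rng(0)
stft = rng.standard_normal((2, 3, 4)) + 1j * rng.standard_normal((2, 3, 4))
XL = log_power_envelope(stft)                     # shape (N, K) log envelope
```

The same call with the pre-superposition clean channels yields S_L(n, k).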
2. And judging the direct voice signal.
In a real scenario there are always environmental disturbances and room reverberation, which degrade speech localization. Judging the direct speech sound improves the accuracy and robustness of source localization by effectively removing the influence of interference and reverberation.
For every time-frequency point in the time-frequency domain of the noisy and clean speech signals, the respective spatial power response spectra P_X(τ|n, k) and P_S(τ|n, k) are calculated with the steered-response power (SRP) algorithm:

P_X(τ|n, k) = |g(k, τ)^H x(n, k)|²  (5)
P_S(τ|n, k) = |g(k, τ)^H s(n, k)|²  (6)

where x(n, k) and s(n, k) are the multi-channel frequency-domain noisy and clean speech signals at time n and frequency band k, x(n, k) = [x₁(n, k), x₂(n, k), ..., x_I(n, k)]^T and s(n, k) = [s₁(n, k), s₂(n, k), ..., s_I(n, k)]^T, superscript "H" denotes the conjugate transpose, "T" the transpose, and g(k, τ) is the steering vector for frequency band k and delay τ. From the spatial power response, the time delay (TDOA) corresponding to the point is estimated as

τ̂_X(n, k) = argmax_τ P_X(τ|n, k)  (7)
τ̂_S(n, k) = argmax_τ P_S(τ|n, k)  (8)

where τ̂_X(n, k) and τ̂_S(n, k) are the time-frequency window delay estimates of the noisy and clean speech signals at time n and band k, and argmax returns the value of the argument that maximizes the expression.
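The SRP scan and arg-max of eqs. (5)-(8) for one time-frequency bin can be sketched as follows. This is an illustrative sketch for a uniform line array (per-microphone phase m·ω·τ); the function name and the candidate-delay grid are ours, not the patent's:

```python
import numpy as np

def srp_delay(x_nk, freq_hz, taus):
    """Steered-response power over candidate inter-element delays tau for one
    time-frequency point, and its arg-max delay estimate (eqs. (5)-(8))."""
    omega = 2.0 * np.pi * freq_hz
    m = np.arange(len(x_nk))                # microphone indices 0..I-1
    # P(tau) = |g(k,tau)^H x(n,k)|^2 with g = exp(-j*omega*m*tau)
    P = np.array([np.abs(np.exp(-1j * omega * m * tau).conj() @ x_nk) ** 2
                  for tau in taus])
    return taus[int(np.argmax(P))], P

# 4-microphone bin whose true inter-element delay is 50 microseconds
tau0 = 5e-5
x = np.exp(-1j * 2 * np.pi * 1000.0 * np.arange(4) * tau0)
taus = np.linspace(-1e-4, 1e-4, 201)        # 1 microsecond grid
tau_hat, P = srp_delay(x, 1000.0, taus)     # tau_hat recovers tau0
```

Running the scan once on the noisy bin and once on the clean bin gives τ̂_X(n, k) and τ̂_S(n, k).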
Extracting time-frequency distribution points corresponding to the direct sound as follows:
1) time-frequency windows in which the delay estimate τ̂_X(n, k) of the noisy speech signal differs from the true delay τ = (d·sinθ)/c by less than the threshold TH₁, where d, c and θ are the microphone spacing, the speed of sound and the angle of the speech source relative to the array;
2) time-frequency windows in which the delay estimate τ̂_S(n, k) of the clean speech signal differs from the true delay τ by less than the threshold TH₁;
3) time-frequency windows in which the correlation of the spatial power responses P_X(τ|n, k) and P_S(τ|n, k) of the two signals at the same position is greater than the threshold TH₂.
The center point of a time-frequency window satisfying all three conditions is marked 1, otherwise 0, yielding the time-frequency point distribution map of the direct speech sound.
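The three-condition labelling above can be sketched as a vectorized mask. The array-per-window inputs and the function name are illustrative assumptions, not the patent's code:

```python
import numpy as np

def direct_sound_mask(tau_x, tau_s, corr, tau_true, th1, th2=0.98):
    """Mark a time-frequency window 1 only when all three conditions hold."""
    c1 = np.abs(tau_x - tau_true) < th1   # condition 1: noisy-signal delay near true delay
    c2 = np.abs(tau_s - tau_true) < th1   # condition 2: clean-signal delay near true delay
    c3 = corr > th2                       # condition 3: SRP spectra strongly correlated
    return (c1 & c2 & c3).astype(np.uint8)

tau_true = 5e-5
tau_x = np.array([5.0e-5, 5.0e-5, 9.0e-5])   # third window fails condition 1
tau_s = np.array([5.1e-5, 5.0e-5, 5.0e-5])
corr  = np.array([0.99,   0.50,   0.99])     # second window fails condition 3
mask = direct_sound_mask(tau_x, tau_s, corr, tau_true, th1=5e-6)
```

Only the first window survives all three tests, so the mask is [1, 0, 0].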
3. And training the UNET structure.
The basic block diagram of the UNET used in this embodiment is shown in fig. 2. CNN(K) and DeCNN(K) denote a convolutional neural network layer and a deconvolutional (transposed-convolution) layer with K channels, respectively; all activation functions are Leaky ReLU (LReLU). DeCNN expands the height and width of the feature maps in a layer and corresponds to the UNET decoding process; Max Pooling, the maximum pooling layer, reduces them and corresponds to the encoding process; in the invention each expansion or reduction is by a factor of two. The Input of the UNET structure is the logarithmized power spectrogram X_L(n, k) of the noisy speech signal; the two outputs Speech (S) and DPD (D) are, respectively, the logarithmic power spectrogram S_L(n, k) of the direct speech sound and the direct-speech time-frequency point distribution map obtained in step 2. Both the Input and Speech spectrograms derive from data acquired by a single microphone, independent of the array structure, so the model can be applied to different types of arrays.
The UNET neural network cost function is

min (1 − λ)·||S* − S||² + λ·||D* − D||²  (9)

where S* and D* are the predicted values of S and D output by the neural network, ||·||² denotes the squared 2-norm, and λ is 0 at the beginning of training and gradually increases to 1 as training progresses.
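The combined cost (9) is a simple convex blend of the two branch errors; a minimal NumPy sketch (the framework-specific training loop is omitted, and the function name is ours):

```python
import numpy as np

def unet_cost(S_star, S, D_star, D, lam):
    """Eq. (9): (1 - lam)*||S* - S||^2 + lam*||D* - D||^2.
    lam ramps from 0 to 1 during training, shifting emphasis from the
    spectrogram branch to the direct-sound branch."""
    return ((1.0 - lam) * np.sum((S_star - S) ** 2)
            + lam * np.sum((D_star - D) ** 2))

S = np.zeros((2, 2)); S_star = np.ones((2, 2))        # ||S* - S||^2 = 4
D = np.zeros((2, 2)); D_star = 0.5 * np.ones((2, 2))  # ||D* - D||^2 = 1
```

At λ = 0 only the Speech branch is penalized; at λ = 1 only the DPD branch.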
4. And predicting the direct voice of the to-be-detected noise-containing signal by using the UNET structure.
When the trained UNET network is used, only the logarithmized power spectrogram X_L(n, k) of the noisy speech signal is fed to Input; the direct-speech time-frequency point distribution map is then obtained at Output, and its values serve as the weights of the time-frequency points used below, denoted W(n, k).
5. A localization result is obtained using the weighted steered-response power (WSRP) algorithm.
Any common localization method, such as SRP, can be applied to the selected time-frequency points. Because the time-frequency points must be weighted, this embodiment adopts the WSRP method, whose final localization result is

θ̂ = argmax_θ Σ_{n,k} W(n, k) · |g(k, θ)^H x(n, k)|²  (10)

where g(k, θ) is the steering vector of frequency band k for direction θ, θ runs over the candidate directions of arrival (the independent variable), and θ̂ is the estimated direction of arrival of the sound wave. The microphone array may be any suitable array; typically a line array or a ring array is used. If a uniform line array is used, g(k, θ) is expressed as

g(k, θ) = exp(−j·ω_k·d·sinθ/c)  (11)

where exp denotes the exponential with base e, j the imaginary unit, c the speed of sound, d the distance vector of the microphone array, and ω_k the angular frequency corresponding to band k.
At this point, a voice sound source localization result is obtained.
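The weighted scan of eqs. (10)-(11) can be sketched end to end for a uniform line array. This is an illustrative sketch with our own function name and a single-bin toy input; a real run sums over all selected time-frequency points:

```python
import numpy as np

def wsrp_doa(X, W, freqs, d, thetas, c=344.0):
    """Eq. (10) for a uniform line array with element spacing d:
    score(theta) = sum_{n,k} W(n,k) * |g(k,theta)^H x(n,k)|^2.
    X: (N, K, I) multi-channel STFT; W: (N, K) direct-sound weights."""
    N, K, I = X.shape
    m = np.arange(I)
    scores = np.zeros(len(thetas))
    for t, theta in enumerate(thetas):
        tau = d * np.sin(theta) / c                           # inter-element delay
        for k in range(K):
            g = np.exp(-1j * 2 * np.pi * freqs[k] * m * tau)  # steering vector, eq. (11)
            scores[t] += np.sum(W[:, k] * np.abs(X[:, k, :] @ g.conj()) ** 2)
    return thetas[int(np.argmax(scores))]

# single 2 kHz bin, 4 microphones at 3.5 cm spacing, source at +30 degrees
d, f, theta0 = 0.035, 2000.0, np.deg2rad(30.0)
tau0 = d * np.sin(theta0) / 344.0
X = np.exp(-1j * 2 * np.pi * f * np.arange(4) * tau0).reshape(1, 1, 4)
W = np.ones((1, 1))
thetas = np.deg2rad(np.arange(-90.0, 91.0))
theta_hat = wsrp_doa(X, W, np.array([f]), d, thetas)
```

Bins with W(n, k) near 0 — those dominated by interference or reverberation — contribute almost nothing to the angular scan, which is the point of the weighting.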
An example of a simulation is given below.
1. Simulated hybrid speech generation
The present implementation takes localization of simulated signals as an example. In simulation, room impulse responses are generated with the image model; convolving them with clean speech produces speech in a reverberant environment, and impulse responses generated by the image model at different source positions with the same room parameters are convolved with clean speech and superposed to obtain the mixed signal. The microphone array is a 4-channel line array, with element spacing 2 cm during network training and 3.5 cm at prediction; the room size is drawn randomly near 7 × 5 × 3 m³. The target sound source lies at 60°, 45° and 30° on the left side of the array, 2 m from the array center, and the interference source at 45° on the right side. The room reverberation time is drawn randomly between 0.2 s and 0.9 s, and the signal-to-interference ratio between −5 dB and 10 dB. Each speech sample is 1.2 s long, and the sampling frequency of the signal is 16 kHz. Speech and interference signals at different positions are collected separately and superposed in the time domain to form the noisy speech signals. When collecting the direct speech signal, all room-wall reflectivities are set to 0. Since a single-channel signal is used in training, the choice of array shape and source location has a negligible effect on network training.
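The superposition at a target signal-to-interference ratio followed by maximum normalization can be sketched as below. This is a simplified sketch of the sample-generation step under our own naming; the patent first convolves image-model room impulse responses with the dry signals, which is omitted here:

```python
import numpy as np

def mix_at_sir(speech, interference, sir_db):
    """Scale the interference for a given speech-to-interference power ratio,
    superpose in the time domain, and normalise the mixture amplitude to its
    maximum value."""
    g = np.sqrt(np.mean(speech ** 2) /
                (np.mean(interference ** 2) * 10.0 ** (sir_db / 10.0)))
    mix = speech + g * interference
    return mix / np.max(np.abs(mix))

rng = np.random.default_rng(1)
s = rng.standard_normal(16000)   # 1 s of surrogate "speech" at 16 kHz
i = rng.standard_normal(16000)   # surrogate interference
y = mix_at_sir(s, i, sir_db=3.0)
```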
2. Method process flow
a) Parameter setting
The parameters of the process of the invention are first given in table 1. It should be noted that the method of the present invention does not require adjustment of parameters in different environments, and the given parameters can be applied in various environments.
TABLE 1 Parameters of the method

Parameter | Value
---|---
Window length | 512 samples (32 ms) |
Window shift | 256 samples (16 ms) |
ξ | 1×10⁻⁴ |
c | 344 m/s |
TH₁ | d/(15c) |
TH₂ | 0.98 |
Frequency band range | [2000 Hz, 8000 Hz] |
b) Short time Fourier transform
A discrete short-time Fourier transform is applied to the time-domain signals acquired by the microphones to obtain the time-frequency domain signals; the window function is a Hanning window with 32 ms length and 16 ms shift.
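With these parameters and a 16 kHz sampling rate, the frame length is 512 samples and the hop 256. A minimal NumPy framing sketch (a library STFT such as scipy.signal.stft would serve equally; the function name is ours):

```python
import numpy as np

def stft_hann(x, fs=16000, win_ms=32, hop_ms=16):
    """Discrete STFT with a Hanning window: 32 ms window and 16 ms shift,
    i.e. 512-sample frames with a 256-sample hop at fs = 16 kHz."""
    nwin = int(fs * win_ms / 1000)               # 512 samples
    hop = int(fs * hop_ms / 1000)                # 256 samples
    w = np.hanning(nwin)
    n_frames = 1 + (len(x) - nwin) // hop
    frames = np.stack([w * x[i * hop:i * hop + nwin] for i in range(n_frames)])
    return np.fft.rfft(frames, axis=1)           # shape (n_frames, nwin//2 + 1)

S = stft_hann(np.ones(16000))                    # 1 s of a constant test signal
```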
c) Computing an "energy" envelope
For each time-frequency point of the time-frequency domain signal, the logarithmized power-spectrum amplitude is calculated using equations (1)-(4).
d) Selecting time-frequency points corresponding to direct voice sound
For each time-frequency point of the time-frequency domain signal, the spatial power response and the time delay are calculated using equations (5)-(8), and whether the point is direct sound is judged by the three conditions of step 2.
e) Training designed UNET structures with generated samples
For the designed UNET structure:
1. input is a logarithmized power spectrum X of a noisy speech signalL(n, k), two outputs Speech (S) and DPD (D), which are respectively a logarithmic power spectrogram S of direct voice soundL(n, k) and the time-frequency point distribution map of the direct voice sound obtained in the step 1.2;
see (9) for the UNET neural network cost function.
f) Predicting direct sound
For the trained UNET structure: and inputting a logarithmic power spectrogram of the noise-containing voice signal at Input, and obtaining a time-frequency point distribution diagram of the direct voice sound in Output.
g) The method of weighting controllable response power is applied to the selected time-frequency point to obtain the positioning result
For each selected time-frequency point of the time-frequency domain signal, the final localization result is estimated using equation (10).
In order to illustrate the advantages of the method, the method is compared and verified with the common traditional algorithm SRP-PHAT by using simulation and experiment.
Under simulation conditions, 60 sets of data were tested in each direction using a 4-channel line array. The experimental conditions were the same as those in the simulation example. Fig. 3-7 show that after the noisy speech signal is processed by the method of the present invention, the energy in the interference signal (right side) is greatly reduced in the spatial power response, and the influence of the interference on the positioning is greatly reduced.
A localization result that differs from the true angle by less than 5° is defined here as a valid localization. Table 2 shows the valid localization rates of the proposed method and the traditional SRP-PHAT algorithm on the test set; the improvement in localization is evident.
TABLE 2 Comparison of valid localization rates

Angle (°) | Method of the invention | SRP-PHAT |
---|---|---|
−30 | 68.33% | 23.33% |
−45 | 75% | 18.33% |
−60 | 55% | 15% |
In the experiment, tests were run in two rooms: Room 1, a small room with high reverberation, volume 5.2 × 3.5 × 3 m³, T60 = 1.10 s; Room 2, an audio-visual room, volume 7.3 × 5.3 × 3 m³, T60 = 0.36 s. Fifty voice samples were recorded with a 4-channel line array of 3.5 cm spacing; interference samples containing 20 different common noises were played in a loop in the recording environment; both the speech source and the interference were 2 m from the microphone array, at the same height. The sampling rate is 16 kHz. The speech source is at −30°, −45° and −60° respectively, and the interfering source at 45°. The signal-to-interference ratio is held at about 3 dB, close to practical conditions.
TABLE 3 comparison of RMSE (. degree.) for different methods in the experiment
Simulation and experiment show that the proposed method outperforms the SRP-PHAT method in accuracy and robustness and is more stable under high reverberation; the maximum RMSE in the experiment is 3.69°, far below that of the traditional SRP-PHAT algorithm.
Claims (3)
1. A method for locating a voice sound source using a microphone array, comprising the steps of:
step 1, collecting voice signals and interference signals by using a microphone array, obtaining time-frequency domain signals of noise-containing voice signals and clean voice signals, and calculating power spectrum amplitude logarithm values of the noise-containing voice signals and the clean voice signals; the clean voice signal is a signal only composed of direct voice sound;
step 2, for every time-frequency point in the time-frequency domain of the noisy speech signal and the clean speech signal, calculating the respective spatial power response spectra and further estimating the corresponding time delays, recorded as τ̂_X(n, k) and τ̂_S(n, k), the time-frequency window delay estimates of the noisy speech signal and the clean speech signal at time n and frequency band k; and obtaining the time-frequency point distribution map corresponding to the direct speech sound;
step 3, training a neural network of the UNET structure by using the power spectrum amplitude logarithmic value of the noise-containing voice signal and the clean voice signal in the step 1 and the time-frequency point distribution diagram corresponding to the direct voice sound in the step 2; estimating a time-frequency point distribution diagram corresponding to the voice direct sound of the signal to be detected by using the power spectrum amplitude logarithmic value of the signal to be detected and the trained neural network;
step 4, obtaining the voice sound source localization result by using the direct-speech distribution estimated in step 3 as weights in combination with a weighted localization algorithm.
2. The method as claimed in claim 1, wherein the time-frequency distribution points corresponding to the direct sound selected in step 2 simultaneously satisfy the following conditions:
1) in the noisy speech signal, the delay estimate τ̂_X(n, k) differs from the true delay τ = (d·sinθ)/c by less than the threshold TH₁, where d, c and θ are the microphone spacing, the speed of sound and the angle of the speech source relative to the array;
2) in the clean speech signal, the delay estimate τ̂_S(n, k) differs from the true delay τ by less than the threshold TH₁;
3) the correlation of the spatial power responses of the noisy and clean speech signals at the same position is greater than the threshold TH₂.
3. The method as claimed in claim 1, wherein in step 3 the input of the neural network is the logarithmized power spectrogram of the noisy speech signal, and the outputs are the logarithmized power spectrogram of the clean speech signal and the direct-speech time-frequency point distribution map; the clean-speech spectrogram assists training, and the values of the distribution map serve as the time-frequency-point weights of step 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911069273.0A CN110838303B (en) | 2019-11-05 | 2019-11-05 | Voice sound source positioning method using microphone array |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110838303A true CN110838303A (en) | 2020-02-25 |
CN110838303B CN110838303B (en) | 2022-02-08 |
Family
ID=69576300
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911069273.0A Active CN110838303B (en) | 2019-11-05 | 2019-11-05 | Voice sound source positioning method using microphone array |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110838303B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312273A (en) * | 2020-05-11 | 2020-06-19 | 腾讯科技(深圳)有限公司 | Reverberation elimination method, apparatus, computer device and storage medium |
CN112269158A (en) * | 2020-10-14 | 2021-01-26 | 南京南大电子智慧型服务机器人研究院有限公司 | Method for positioning voice source by utilizing microphone array based on UNET structure |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184730A (en) * | 2011-02-17 | 2011-09-14 | 南京大学 | Feed-forward active noise barrier |
US20180018970A1 (en) * | 2016-07-15 | 2018-01-18 | Google Inc. | Neural network for recognition of signals in multiple sensory domains |
CN107703486A (en) * | 2017-08-23 | 2018-02-16 | 南京邮电大学 | A kind of auditory localization algorithm based on convolutional neural networks CNN |
RU2659100C1 (en) * | 2017-06-05 | 2018-06-28 | Федеральное Государственное Казенное Военное Образовательное Учреждение Высшего Образования "Тихоокеанское Высшее Военно-Морское Училище Имени С.О. Макарова" Министерства Обороны Российской Федерации (Г. Владивосток) | Large-scale radio-hydro acoustic system formation and application method for monitoring, recognizing and classifying the fields generated by the sources in marine environment |
US20180341838A1 (en) * | 2017-05-23 | 2018-11-29 | Viktor Prokopenya | Increasing network transmission capacity and data resolution quality and computer systems and computer-implemented methods for implementing thereof |
CN109410273A (en) * | 2017-08-15 | 2019-03-01 | 西门子保健有限责任公司 | According to the locating plate prediction of surface data in medical imaging |
US20190104357A1 (en) * | 2017-09-29 | 2019-04-04 | Apple Inc. | Machine learning based sound field analysis |
CN109754812A (en) * | 2019-01-30 | 2019-05-14 | 华南理工大学 | A kind of voiceprint authentication method of the anti-recording attack detecting based on convolutional neural networks |
CN109839612A (en) * | 2018-08-31 | 2019-06-04 | 大象声科(深圳)科技有限公司 | Sounnd source direction estimation method based on time-frequency masking and deep neural network |
CN110068795A (en) * | 2019-03-31 | 2019-07-30 | 天津大学 | A kind of indoor microphone array sound localization method based on convolutional neural networks |
CN110333494A (en) * | 2019-04-10 | 2019-10-15 | 马培峰 | A kind of InSAR timing deformation prediction method, system and relevant apparatus |
Patent Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102184730A (en) * | 2011-02-17 | 2011-09-14 | 南京大学 | Feed-forward active noise barrier |
US20180018970A1 (en) * | 2016-07-15 | 2018-01-18 | Google Inc. | Neural network for recognition of signals in multiple sensory domains |
US20180341838A1 (en) * | 2017-05-23 | 2018-11-29 | Viktor Prokopenya | Increasing network transmission capacity and data resolution quality and computer systems and computer-implemented methods for implementing thereof |
RU2659100C1 (en) * | 2017-06-05 | 2018-06-28 | Pacific Higher Naval School named after S.O. Makarov, Ministry of Defense of the Russian Federation (Vladivostok) | Large-scale radio-hydroacoustic system formation and application method for monitoring, recognizing and classifying the fields generated by sources in the marine environment |
CN109410273A (en) * | 2017-08-15 | 2019-03-01 | Siemens Healthcare GmbH | Topogram prediction from surface data in medical imaging |
CN107703486A (en) * | 2017-08-23 | 2018-02-16 | Nanjing University of Posts and Telecommunications | Sound source localization algorithm based on a convolutional neural network (CNN) |
US20190104357A1 (en) * | 2017-09-29 | 2019-04-04 | Apple Inc. | Machine learning based sound field analysis |
CN109839612A (en) * | 2018-08-31 | 2019-06-04 | Elevoc Technology Co., Ltd. (Shenzhen) | Sound source direction estimation method based on time-frequency masking and deep neural networks |
CN109754812A (en) * | 2019-01-30 | 2019-05-14 | South China University of Technology | Voiceprint authentication method with anti-recording attack detection based on convolutional neural networks |
CN110068795A (en) * | 2019-03-31 | 2019-07-30 | Tianjin University | Indoor microphone-array sound source localization method based on convolutional neural networks |
CN110333494A (en) * | 2019-04-10 | 2019-10-15 | Ma Peifeng | InSAR time-series deformation prediction method, system and related apparatus |
Non-Patent Citations (4)
Title |
---|
Yongliang Sun et al.: "Human Localization Using Multi-Source Heterogeneous Data in Indoor Environments", IEEE Access * |
Song Jianguo et al.: "Improved neural network cascade-correlation algorithm and its application in first-arrival picking", Oil Geophysical Prospecting * |
Wang Hao et al.: "Robust speech source localization based on UNET direct-sound decision", Proceedings of the 2019 National Conference on Acoustics * |
Xie Qing et al.: "Recognition of ultrasonic direct waves of partial discharge in oil based on multiple features", Proceedings of the CSEE * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312273A (en) * | 2020-05-11 | 2020-06-19 | 腾讯科技(深圳)有限公司 | Reverberation elimination method, apparatus, computer device and storage medium |
CN112269158A (en) * | 2020-10-14 | 2021-01-26 | 南京南大电子智慧型服务机器人研究院有限公司 | Method for positioning voice source by utilizing microphone array based on UNET structure |
Also Published As
Publication number | Publication date |
---|---|
CN110838303B (en) | 2022-02-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109839612B (en) | Sound source direction estimation method and device based on time-frequency masking and deep neural network | |
Kim et al. | Generation of Large-Scale Simulated Utterances in Virtual Rooms to Train Deep-Neural Networks for Far-Field Speech Recognition in Google Home. | |
CN107452389B (en) | Universal single-channel real-time noise reduction method | |
CN106782590A (en) | Microphone-array beamforming method in a reverberant environment | |
CN110726972B (en) | Voice sound source positioning method using microphone array under interference and high reverberation environment | |
CN101667425A (en) | Blind source separation method for convolutively mixed speech signals | |
Raykar et al. | Speaker localization using excitation source information in speech | |
Niwa et al. | Post-filter design for speech enhancement in various noisy environments | |
CN110838303B (en) | Voice sound source positioning method using microphone array | |
CN112904279A (en) | Sound source localization method based on a convolutional neural network and subband SRP-PHAT spatial spectrum | |
CN113129918A (en) | Voice dereverberation method combining beam forming and deep complex U-Net network | |
Pertilä et al. | Microphone array post-filtering using supervised machine learning for speech enhancement. | |
CN114171041A (en) | Voice noise reduction method, device and equipment based on environment detection and storage medium | |
CN110111802A (en) | Adaptive dereverberation method based on Kalman filtering | |
CN112269158B (en) | Method for positioning voice source by utilizing microphone array based on UNET structure | |
CN115424627A (en) | Hybrid speech enhancement method based on a convolutional recurrent network and the WPE algorithm | |
CN111123202B (en) | Indoor early reflected sound positioning method and system | |
Pirhosseinloo et al. | A new feature set for masking-based monaural speech separation | |
Guo et al. | Underwater target detection and localization with feature map and CNN-based classification | |
Firoozabadi et al. | Combination of nested microphone array and subband processing for multiple simultaneous speaker localization | |
CN115426055A (en) | Noise-containing underwater acoustic signal blind source separation method based on decoupling convolutional neural network | |
CN101645701B (en) | Time delay estimation method based on filter bank and system thereof | |
CN112712818A (en) | Voice enhancement method, device and equipment | |
Sarabia et al. | Spatial LibriSpeech: An Augmented Dataset for Spatial Audio Learning | |
JP2005258215A (en) | Signal processing method and signal processing device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |