CN112034424A - Neural network sound source direction finding method and system based on double microphones

Neural network sound source direction finding method and system based on double microphones

Info

Publication number
CN112034424A
CN112034424A
Authority
CN
China
Prior art keywords
sound source
neural network
output
hidden layer
layer
Prior art date: 2020-08-26
Legal status
Pending
Application number
CN202010871213.7A
Other languages
Chinese (zh)
Inventor
刘明
周彦兵
孙冲武
赵学华
高波
Current Assignee
Shenzhen Institute of Information Technology
Original Assignee
Shenzhen Institute of Information Technology
Priority date: 2020-08-26
Filing date: 2020-08-26
Publication date: 2020-12-04
Application filed by Shenzhen Institute of Information Technology
Priority to CN202010871213.7A
Publication of CN112034424A
Legal status: Pending

Classifications

    • G01S5/22: Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements, using ultrasonic, sonic, or infrasonic waves (under G01S5/18)
    • G06F18/241: Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415: Classification techniques based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G06N3/045: Combinations of networks
    • G06N3/047: Probabilistic or stochastic networks
    • G06N3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
    • G06N3/084: Backpropagation, e.g. using gradient descent


Abstract

The invention provides a neural network sound source direction finding method and system based on double microphones, wherein the neural network sound source direction finding method comprises the following steps: a two-path sampling signal acquisition step: two microphones are adopted to collect time domain data of two paths of sound source signals, and the collected time domain data are simultaneously transmitted to a neural network analysis module and a correlation characteristic extraction module; an output characteristic obtaining step: the neural network analysis module, which comprises an input layer, a hidden layer 1, a hidden layer 2, a hidden layer 3 and an output layer, receives the collected time domain data, analyzes the time sequence relation between the two paths of sound source signals through a recurrent neural layer, and finally obtains the processed output characteristics at the hidden layer 2; a correlation characteristic extraction step: the correlation characteristic extraction module calculates correlation coefficients of multiple angles of the two paths of sound source signals and cascades them with the output characteristics of the hidden layer 2; a subsequent processing step: the cascaded output characteristics are sent to the hidden layer 3 and an angle classification result is given at the output layer. The invention has the beneficial effects that sound source detection is realized with only two microphones, which reduces the cost and power consumption of a voice product.

Description

Neural network sound source direction finding method and system based on double microphones
Technical Field
The invention relates to the field of data processing, in particular to a neural network sound source direction finding method and system based on double microphones.
Background
Currently, most voice products on the market adopt a single-microphone system to pick up and process speech, and such products generally use a high-performance, highly directional microphone to acquire a high-quality sound source signal. However, a highly directional single-microphone system can pick up only one sound source signal and cannot adjust the direction of the microphone as the sound source moves, which greatly limits flexibility of use. Furthermore, in products where the direction and location of the sound source are required, a single-microphone system has no capability of automatically detecting the azimuth and tracking the sound source. Although some methods enhance the sensitivity of a voice product to the sound source position by forming an array from a plurality of microphones, such products usually need more than 4 microphones, which considerably increases the cost and power consumption of the voice product.
Disclosure of Invention
The invention provides a neural network sound source direction finding method based on double microphones, which comprises the following steps:
acquiring a two-path sampling signal: two microphones are adopted to collect time domain data of two paths of sound source signals, and the collected time domain data are simultaneously transmitted to a neural network analysis module and a correlation characteristic extraction module;
an output characteristic obtaining step: receiving the time domain data collected in the acquisition step of the two-path sampling signals by adopting a neural network analysis module, wherein the neural network analysis module comprises an input layer, a hidden layer 1, a hidden layer 2, a hidden layer 3 and an output layer, analyzing the time sequence relation between the two paths of sound source signals through a recurrent neural layer on the received time domain data, and finally obtaining the processed output characteristics at the hidden layer 2;
a correlation characteristic extraction step: adopting a correlation characteristic extraction module to receive the time domain data collected in the acquisition step of the two-path sampling signals, then calculating, by the correlation characteristic extraction module, correlation coefficients of multiple angles of the two paths of sound source signals in the received time domain data, and cascading the correlation coefficients with the output characteristics of the hidden layer 2 of the neural network analysis module;
a subsequent processing step: sending the cascaded output characteristics to the hidden layer 3 of the neural network analysis module for subsequent processing, and giving an angle classification result at the output layer of the neural network analysis module.
As a further improvement of the present invention, after the subsequent processing is performed, the method further comprises the following step: a statistical judgment step: counting the classification results of the angles of all frames within a set period by adopting a counting and judging module, and finally outputting the angle value with the highest count as the direction finding result for each second.
As a further improvement of the present invention, in the output characteristic acquiring step, the following steps are further performed:
step 1: sampling data is processed in frames by adopting a sampling frequency of 44kHz, the frame length of each frame is 5ms, and each frame has 220 sampling points;
step 2: the input layer inputs two paths of sampling data each time; the hidden layer 1 adopts an LSTM neural network structure with a memory effect on a time sequence; the hidden layer 2, the hidden layer 3 and the output layer adopt a full-connection layer structure.
As a further improvement of the present invention, in step 2, the input layer inputs 440-dimensional data together, the hidden layer 1 uses 256 LSTM neurons together, the hidden layer 2 uses 128 neurons, the hidden layer 3 uses 64 neurons, and the output layer uses 7 neurons.
As a further improvement of the present invention, in the step 2, the operation principle of the LSTM neural network structure is as follows:
the LSTM unit takes the feature x_n of the current frame, the previously retained output result h_{n-1} and the retained state C_{n-1} of the last frame as joint inputs, processes them to generate the output h_n of the current frame and the output state C_n of the current frame, and repeats this recursive operation to capture the timing relationship between the signals; the current-frame output h_n generated by each operation is passed on to the following hidden layers for subsequent operations.
As a further improvement of the invention, the output layer adopts a multi-classification Softmax function as the objective function of the model and takes the cross entropy as the loss function of the training model, calculated as follows:

$$L = -\sum_{i=1}^{m} T_i \log(P_i) \qquad (8)$$

where T_i represents the real classification label of the training data and P_i denotes the Softmax output probability of the i-th class; since 7 direction finding angles are output, the value of m in the calculation formula (8) is 7.
As a further improvement of the present invention, in the step of extracting the correlation characteristics, the correlation characteristics extracting module calculates correlation coefficients of 7 angles of two sound source signals in the received time domain data, where the 7 angles are 0 °, 30 °, 60 °, 90 °, 120 °, 150 °, and 180 °, respectively.
As a further improvement of the invention, the correlation coefficients of the 7 angles are calculated as follows:
Step S1: calculating the time difference of the two paths of sound source signals reaching the two microphones; the method comprises the following specific steps:
the distance d between the two microphones is 15cm, and the time difference of the two sound source signals reaching the two microphones is calculated according to the following formula:
$$\tau = \frac{d\cos\theta}{v_{sound}}$$

where θ is the angle of the sound source, ranging from 0 to 180 degrees, and v_sound represents the propagation speed of sound, which is taken as 340 m/s;
step S2: supposing that the signal received by one microphone is X1(t) and the signal received by the other microphone is X2(t), when the correlation characteristic extraction module performs processing, two frames of data are buffered and respectively denoted the t-1 frame and the t frame; only the correlation coefficients of the t-1 frame are calculated each time, and after the calculation is finished, the t frame data are moved forward by 220 points to become the new t-1 frame, while positions 221-440 are filled with newly sampled data as the new t frame data;
step S3: the correlation characteristic extraction module calculates correlation coefficients from delays corresponding to different angles to obtain 7-dimensional correlation coefficients, specifically:
taking X1(t) as a reference, shifting X2(t) to the right to realize alignment, and calculating the correlation coefficient as follows:
$$R_n = \frac{\mathrm{Cov}\left(X_1(t),\,X_2(t+\tau_n)\right)}{\sqrt{\mathrm{Var}\left(X_1(t)\right)\mathrm{Var}\left(X_2(t+\tau_n)\right)}},\qquad n = 1, 2, 3, 4$$
taking X2(t) as a reference, shifting X1(t) to the right to realize alignment, and calculating the correlation coefficient as follows:
$$R_n = \frac{\mathrm{Cov}\left(X_1(t+\tau_n),\,X_2(t)\right)}{\sqrt{\mathrm{Var}\left(X_1(t+\tau_n)\right)\mathrm{Var}\left(X_2(t)\right)}},\qquad n = 5, 6, 7$$
where Cov(·) represents covariance calculation of two frames of data, Var(·) represents variance calculation, τ_n is the delay corresponding to the n-th angle, n = 1, 2, 3 and 4 represent the sound source incidence conditions at angles of 0 degrees, 30 degrees, 60 degrees and 90 degrees, respectively, and n = 5, 6 and 7 represent the sound source incidence conditions at angles of 120 degrees, 150 degrees and 180 degrees, respectively.
In the statistical judgment step, the statistical judgment module performs count statistics on the classification results of the angles of the 200 frames within 1 s.
The invention also discloses a neural network sound source direction finding system based on the double microphones, which comprises the following components: a memory, a processor and a computer program stored on the memory, the computer program being configured to implement the steps of the neural network sound source direction finding method of the present invention when invoked by the processor.
The invention has the beneficial effects that: 1. the neural network sound source direction finding method adopts only two microphones to realize sound source detection, thereby reducing the cost and power consumption of a voice product; 2. the neural network sound source direction finding method detects the sound source direction with a neural network, which gives higher accuracy and stronger anti-interference capability and realizes real-time tracking of the sound source; 3. the neural network sound source direction finding method combines the correlation characteristics of the two paths of signals, which simplifies the design of the neural network model and reduces the operation complexity of the algorithm.
Drawings
FIG. 1 is a diagram of the direction-finding angle of a dual-microphone sound source of the neural network sound source direction-finding method of the present invention;
FIG. 2 is a block diagram of a neural network sound source direction finding algorithm of a dual microphone of the neural network sound source direction finding method of the present invention;
FIG. 3 is a schematic diagram of the operation of the LSTM unit in the neural network sound source direction-finding method of the present invention;
fig. 4 is an alignment schematic diagram of two paths of microphone sampling data of the neural network sound source direction finding method of the present invention.
Detailed Description
As shown in fig. 2, the invention discloses a neural network sound source direction finding method based on double microphones, which uses two microphones to detect a sound source within the 180-degree range in front of the microphone pair, and the method comprises the following steps:
acquiring a two-path sampling signal: two microphones are adopted to collect time domain data of two paths of sound source signals, and the collected time domain data are simultaneously transmitted to a neural network analysis module 4 and a correlation characteristic extraction module 5;
an output characteristic obtaining step: the method comprises the steps that a neural network analysis module 4 is adopted to receive time domain data collected in the acquisition step of two paths of sampling signals, the neural network analysis module 4 comprises an input layer, a hidden layer 1, a hidden layer 2, a hidden layer 3 and an output layer, the neural network analysis module 4 analyzes the time sequence relation between two paths of sound source signals through a recurrent neural layer on the received time domain data, and finally processed output characteristics are obtained on the hidden layer 2;
a correlation characteristic extraction step: a correlation characteristic extraction module 5 is adopted to receive the time domain data acquired in the acquisition step of the two paths of sampling signals, then the correlation characteristic extraction module 5 calculates correlation coefficients of multiple angles of the two paths of sound source signals in the received time domain data, and the correlation coefficients are cascaded with the output characteristics of the hidden layer 2 of the neural network analysis module 4;
and (3) subsequent processing steps: the cascaded output characteristics are sent to a hidden layer 3 of a neural network analysis module 4 for subsequent processing, and an angle classification result is given at an output layer of the neural network analysis module 4.
In order to further improve the stability of the direction finding output angle value, the method further comprises the following steps after the subsequent processing is executed:
a statistical judgment step: counting the classification results of the angles of all frames within a set period by adopting a counting and judging module 6, and finally outputting the angle value with the highest count as the direction finding result for each second.
In the output characteristic obtaining step, the method further comprises the following steps:
step 1: sampling frequency of 44kHz is adopted, sampling data are processed in frames, the frame length of each frame is 5ms, namely 220 sampling points of each frame;
step 2: the input layer inputs two paths of sampling data each time; the hidden layer 1 adopts a Long Short-Term Memory (LSTM) unit with a Memory effect on a time sequence to ensure the sensitivity of the model to the time sequence; the hidden layer 2, the hidden layer 3 and the output layer adopt a full-connection layer structure.
In step 2, the input layer inputs 440-dimensional data together, the hidden layer 1 adopts 256 LSTM neurons together, the hidden layer 2 adopts 128 neurons, the hidden layer 3 adopts 64 neurons, and the output layer adopts 7 neurons.
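For reference, the architecture and layer sizes described above can be expressed as the following minimal Keras sketch; the sequence handling (feeding a whole sequence of frames together with per-frame correlation features) and the training configuration are assumptions for illustration, not part of the patent text.

```python
from tensorflow.keras import layers, Model

# Two inputs per the text: 440-dim two-path frame samples and 7 correlation coefficients.
frames_in = layers.Input(shape=(None, 440))  # sequence of 440-dim frames
corr_in = layers.Input(shape=(None, 7))      # per-frame 7-angle correlation features

h1 = layers.LSTM(256, return_sequences=True)(frames_in)                # hidden layer 1: 256 LSTM neurons
h2 = layers.TimeDistributed(layers.Dense(128, activation="relu"))(h1)  # hidden layer 2: 128 neurons
cascaded = layers.Concatenate()([h2, corr_in])                         # cascade correlation features
h3 = layers.TimeDistributed(layers.Dense(64, activation="relu"))(cascaded)  # hidden layer 3: 64 neurons
out = layers.TimeDistributed(layers.Dense(7, activation="softmax"))(h3)     # 7 angle classes

model = Model([frames_in, corr_in], out)
model.compile(optimizer="adam", loss="categorical_crossentropy")
```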
As shown in fig. 3, in step 2, the operation principle of the LSTM neural network structure is as follows:
the LSTM unit takes the feature x_n of the current frame, the previously retained output result h_{n-1} and the retained state C_{n-1} of the last frame as joint inputs, processes them to generate the output h_n of the current frame and the output state C_n of the current frame, and repeats this recursive operation to capture the timing relationship between the signals; the current-frame output h_n generated by each operation is passed on to the following hidden layers for subsequent operations.
The calculation of each gate and its output is as follows, where σ(·) and tanh(·) represent the sigmoid and hyperbolic tangent activation functions, respectively:

$$\tilde{C}_n = \tanh(W_c[h_{n-1}, x_n] + b_c) \qquad (1)$$

$$f_n = \sigma(W_f[h_{n-1}, x_n] + b_f) \qquad (2)$$

$$u_n = \sigma(W_u[h_{n-1}, x_n] + b_u) \qquad (3)$$

$$O_n = \sigma(W_o[h_{n-1}, x_n] + b_o) \qquad (4)$$

$$C_n = f_n * C_{n-1} + u_n * \tilde{C}_n \qquad (5)$$

$$h_n = O_n * \tanh(C_n) \qquad (6)$$
where f_n represents the output of the forget gate of the current frame, u_n represents the output of the update gate of the current frame, and O_n represents the output of the output gate of the current frame.
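As an illustration, equations (1) to (6) can be reproduced in a short NumPy sketch; the dictionary layout of the gate weights is an assumption made for readability.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_n, h_prev, c_prev, W, b):
    """One recursion of equations (1)-(6); W and b hold the four gate parameters."""
    z = np.concatenate([h_prev, x_n])        # [h_{n-1}, x_n]
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate state, eq. (1)
    f_n = sigmoid(W["f"] @ z + b["f"])       # forget gate, eq. (2)
    u_n = sigmoid(W["u"] @ z + b["u"])       # update gate, eq. (3)
    o_n = sigmoid(W["o"] @ z + b["o"])       # output gate, eq. (4)
    c_n = f_n * c_prev + u_n * c_tilde       # new cell state, eq. (5)
    h_n = o_n * np.tanh(c_n)                 # current frame output, eq. (6)
    return h_n, c_n
```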
The hidden layers 2 and 3 in the neural network model are fully connected layers, and after each neuron performs weighted summation, nonlinear activation operation is performed, as shown in the following formula (7):
$$h^{(i)} = g(W \cdot h^{(i-1)} + b) \qquad (7)$$

where W and b are the weight and bias of the neuron, respectively, h^{(i)} represents the output of the i-th hidden layer, and g(·) represents the nonlinear activation operation; here the ReLU activation function is used.
In addition, the output layer of the neural network analysis module 4 adopts a full-connection structure, but only performs linear operation. The output layer adopts a multi-classification Softmax function as an objective function of the model, takes the cross entropy as a loss function of the training model, and has a calculation formula shown as the following formula:
$$L = -\sum_{i=1}^{m} T_i \log(P_i) \qquad (8)$$

where T_i represents the real classification label of the training data and P_i denotes the Softmax output probability of the i-th class; since 7 direction finding angles are output, the value of m in the calculation formula (8) is 7. That is, the output layer of the neural network analysis module 4 gives the output probabilities of the 7 neurons, these probabilities sum to 1, and the angle value corresponding to the neuron with the highest probability is taken as the sound source direction measured by the neural network analysis module 4.
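For illustration, the Softmax output and the cross-entropy loss of formula (8) can be sketched as follows; the example logits are arbitrary.

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max())          # subtract max for numerical stability
    return e / e.sum()

def cross_entropy(t, p):
    """Formula (8): L = -sum_i T_i * log(P_i), with m = 7 angle classes."""
    return -np.sum(t * np.log(p + 1e-12))

ANGLES = [0, 30, 60, 90, 120, 150, 180]
logits = np.array([0.1, 2.3, 0.4, -1.0, 0.0, 0.2, -0.5])  # arbitrary example
p = softmax(logits)                  # 7 probabilities summing to 1
print(ANGLES[int(np.argmax(p))])     # angle of the most probable neuron -> 30
```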
In the step of extracting the correlation characteristics, the correlation characteristic extraction module 5 calculates the correlation coefficients of 7 angles of the two sound source signals in the received time domain data, where the 7 angles are 0°, 30°, 60°, 90°, 120°, 150° and 180°, increasing counterclockwise from 0° across the front of the two microphones, as shown in fig. 1.
In addition to the 440-dimensional two-path sampling data input to the neural network analysis module 4, the correlation coefficients of the 7 angles extracted by the correlation characteristic extraction module 5 are also input to the neural network as a group of important features to assist the model in classifying the sound source angle. The correlation coefficients of the 7 angles are calculated as follows:
step S1: calculating the time difference of the two paths of sound source signals reaching the two microphones; the method comprises the following specific steps:
the distance d between the two microphones is 15cm, and the time difference of the two sound source signals reaching the two microphones is calculated according to the following formula:
$$\tau = \frac{d\cos\theta}{v_{sound}}$$

where θ is the angle of the sound source, ranging from 0 to 180 degrees, and v_sound represents the propagation speed of sound, which is taken as 340 m/s. When the sound source is located at 0 degrees or 180 degrees of the dual-microphone system, the time difference between the two signals is the largest, about 0.44 ms; when the sound source is located at 30 degrees or 150 degrees, the time difference is about 0.38 ms; when the sound source is located at 60 degrees or 120 degrees, the time difference is about 0.22 ms; and when the sound source is located directly in front of the dual-microphone system (at the 90-degree position), the two signals arrive at the same time and there is no time difference. From the difference in the arrival times of the two signals, together with the order in which the two microphones receive them, 7 different sound source positions can be effectively distinguished. In order to ensure sufficient time resolution, the designed algorithm adopts a sampling rate of 44 kHz; converted into sampling points, the time difference between the two signals is 19 sampling points when the sound source is located at 0 degrees or 180 degrees, 16 sampling points when the sound source is located at 30 degrees or 150 degrees, 9 sampling points when the sound source is located at 60 degrees or 120 degrees, and 0 when the sound source is located at the 90-degree position. Therefore, the two paths of sound source data can be correspondingly delayed in the time domain, and the correlation coefficients of the 7 different incidence angles calculated to distinguish the sound source positions.
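The angle-to-delay mapping just described can be checked with a short sketch; truncating toward zero reproduces the 19, 16, 9 and 0 sample offsets stated above, with the sign encoding which microphone receives the signal first.

```python
import numpy as np

D = 0.15    # microphone spacing in metres
V = 340.0   # propagation speed of sound in m/s
FS = 44000  # sampling rate in Hz

angles = np.deg2rad([0, 30, 60, 90, 120, 150, 180])
tau = D * np.cos(angles) / V       # time differences per the formula above
delays = (tau * FS).astype(int)    # truncate toward zero
print(delays)                      # -> [19 16 9 0 -9 -16 -19]
```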
Step S2: assuming that a signal received by the left microphone in fig. 4 is X1(t), a signal received by the right microphone is X2(t), when the correlation feature extraction module 5 performs processing, two frames of data are cached and respectively marked as a t-1 frame and a t frame, a correlation coefficient of the t-1 frame is only calculated each time, after the calculation is completed, the t frame data is moved forward by 220 points as a new t-1 frame, and the positions of 221-440 points are filled with the newly sampled data as the t frame data;
step S3: the correlation characteristic extraction module 5 calculates the correlation coefficients at the delays corresponding to the different angles to obtain a 7-dimensional correlation coefficient vector, specifically:
taking X1(t) as a reference, shifting X2(t) to the right to realize alignment, and calculating the correlation coefficient as follows:
$$R_n = \frac{\mathrm{Cov}\left(X_1(t),\,X_2(t+\tau_n)\right)}{\sqrt{\mathrm{Var}\left(X_1(t)\right)\mathrm{Var}\left(X_2(t+\tau_n)\right)}},\qquad n = 1, 2, 3, 4$$
taking X2(t) as a reference, shifting X1(t) to the right to realize alignment, and calculating the correlation coefficient as follows:
$$R_n = \frac{\mathrm{Cov}\left(X_1(t+\tau_n),\,X_2(t)\right)}{\sqrt{\mathrm{Var}\left(X_1(t+\tau_n)\right)\mathrm{Var}\left(X_2(t)\right)}},\qquad n = 5, 6, 7$$
where Cov(·) represents covariance calculation of two frames of data, Var(·) represents variance calculation, τ_n is the delay corresponding to the n-th angle, n = 1, 2, 3 and 4 represent the sound source incidence conditions at angles of 0 degrees, 30 degrees, 60 degrees and 90 degrees, respectively, and n = 5, 6 and 7 represent the sound source incidence conditions at angles of 120 degrees, 150 degrees and 180 degrees, respectively.
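Steps S1 to S3 can be summarized in the following sketch of the 7-dimensional correlation feature; the two-frame buffer indexing follows step S2, while the exact alignment convention for negative delays is an assumption.

```python
import numpy as np

def corr_coeff(a, b):
    """Pearson correlation coefficient: Cov(a, b) / sqrt(Var(a) * Var(b))."""
    a = a - a.mean()
    b = b - b.mean()
    return float(a @ b / np.sqrt((a @ a) * (b @ b) + 1e-12))

def seven_angle_corr(x1, x2, delays, frame=220):
    """x1, x2: 440-point buffers (frames t-1 and t); returns the 7-dim feature."""
    feats = []
    for d in delays:                # delays as computed above, e.g. 19, 16, ...
        if d >= 0:                  # X1(t) as reference: shift X2 right by d
            feats.append(corr_coeff(x1[:frame], x2[d:d + frame]))
        else:                       # X2(t) as reference: shift X1 right by |d|
            feats.append(corr_coeff(x1[-d:-d + frame], x2[:frame]))
    return np.array(feats)
```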
In the statistical judgment step, the statistical judgment module 6 performs count statistics and decision on the classification results of the angles of the 200 frames within 1 s. For example, if in the neural network classification results within a certain second the 0-degree angle occurs 10 times, the 30-degree angle occurs 170 times, the 60-degree angle occurs 20 times, and the 90-degree, 120-degree, 150-degree and 180-degree angles occur 0 times, then the angle with the largest number of occurrences is extracted, i.e. 30 degrees is output as the direction finding angle for that second. The count values are then set to zero and a new round of counting starts. The statistical judgment module 6 in effect takes a statistic of the neural network output within each second and updates the direction finding angle once per second, which ensures real-time tracking of the sound source by the system and effectively improves the output stability and anti-interference capability of the whole sound source direction finding system.
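The per-second decision amounts to a majority vote over the roughly 200 per-frame classifications, as in this minimal sketch:

```python
from collections import Counter

def per_second_decision(frame_angles):
    """Return the angle that occurs most often among one second of frame results."""
    return Counter(frame_angles).most_common(1)[0][0]

# Example from the text: 10 frames at 0, 170 at 30, 20 at 60 -> 30 is output.
votes = [0] * 10 + [30] * 170 + [60] * 20
print(per_second_decision(votes))  # 30
```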
In the neural network sound source direction finding method based on the double microphones, dual-microphone hardware with a spacing of 15 cm is first used to record 10 hours of audio for each sound source angle, giving 70 hours of audio data in total; to ensure good generalization capability of the model, the distance between the sound source at each angle and the microphones is varied. Meanwhile, in order to improve the robustness of the model to noise interference, white noise at a 10 dB signal-to-noise ratio is randomly added to the recorded audio to construct the training data set. Then the time domain sampling data and correlation coefficient features of the two signals are extracted, 10% of all training data are set aside as a validation set, model parameters are optimized with the back propagation algorithm, and the model is saved when the loss on the training set and validation set is lowest, yielding a neural network model with sound source direction finding capability.
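The 10 dB white-noise augmentation used to build the training set can be sketched as follows; the helper name and the use of Gaussian white noise are assumptions.

```python
import numpy as np

def add_white_noise(clean, snr_db=10.0):
    """Mix white noise into a clean recording at the given SNR in dB (hypothetical helper)."""
    noise = np.random.randn(*clean.shape)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (10 ** (snr_db / 10.0) * p_noise))  # noise gain for target SNR
    return clean + scale * noise
```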
The invention also discloses a neural network sound source direction finding system based on the double microphones, which comprises the following components: a memory, a processor and a computer program stored on the memory, the computer program being configured to implement the steps of the neural network sound source direction finding method of the present invention when invoked by the processor.
The invention has the beneficial effects that: 1. the neural network sound source direction finding method adopts only two microphones to realize sound source detection, thereby reducing the cost and power consumption of a voice product; 2. the neural network sound source direction finding method detects the sound source direction with a neural network, which gives higher accuracy and stronger anti-interference capability and realizes real-time tracking of the sound source; 3. the neural network sound source direction finding method combines the correlation characteristics of the two paths of signals, which simplifies the design of the neural network model and reduces the operation complexity of the algorithm.
The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several simple deductions or substitutions can be made without departing from the spirit of the invention, and all shall be considered as belonging to the protection scope of the invention.

Claims (10)

1. A neural network sound source direction finding method based on two microphones is characterized by comprising the following steps:
acquiring a two-path sampling signal: two microphones are adopted to collect time domain data of two paths of sound source signals, and the collected time domain data are simultaneously transmitted to a neural network analysis module and a correlation characteristic extraction module;
an output characteristic obtaining step: receiving time domain data collected in the acquisition step of the two-way sampling signal by adopting a neural network analysis module, wherein the neural network analysis module comprises an input layer, a hidden layer 1, a hidden layer 2, a hidden layer 3 and an output layer, analyzing the time sequence relation between the two-way sound source signals by the neural network analysis module through a recurrent neural layer by using the received time domain data, and finally obtaining the processed output characteristics at the hidden layer 2;
a correlation characteristic extraction step: a correlation characteristic extraction module is adopted to receive time domain data collected in the acquisition step of the two paths of sampling signals, then the correlation characteristic extraction module calculates correlation coefficients of multiple angles of the two paths of sound source signals in the received time domain data, and the correlation coefficients are cascaded with output characteristics of a hidden layer 2 of a neural network analysis module;
and (3) subsequent processing steps: the cascaded output characteristics are sent to a hidden layer 3 of a neural network analysis module for subsequent processing, and an angle classification result is given out on an output layer of the neural network analysis module.
2. The neural network sound source direction finding method according to claim 1, further comprising performing the following steps after performing the subsequent processing:
a statistical judgment step: counting the classification results of the angles of all frames within a set period by adopting a counting and judging module, and finally outputting the angle value with the highest count as the direction finding result for each second.
3. The neural network sound source direction finding method according to claim 1, further comprising, in the output feature acquiring step, performing the steps of:
step 1: sampling data is processed in frames by adopting a sampling frequency of 44kHz, the frame length of each frame is 5ms, and each frame has 220 sampling points;
step 2: the input layer inputs two paths of sampling data each time; the hidden layer 1 adopts an LSTM neural network structure with a memory effect on a time sequence; the hidden layer 2, the hidden layer 3 and the output layer adopt a full-connection layer structure.
4. The method as claimed in claim 3, wherein in step 2, the input layer inputs 440-dimensional data together, the hidden layer 1 uses 256 LSTM neurons together, the hidden layer 2 uses 128 neurons, the hidden layer 3 uses 64 neurons, and the output layer uses 7 neurons.
5. The neural network sound source direction finding method according to claim 3, wherein in the step 2, the operation principle of the LSTM neural network structure is as follows:
the LSTM unit takes the feature x_n of the current frame, the previously retained output result h_{n-1} and the retained state C_{n-1} of the last frame as joint inputs, processes them to generate the output h_n of the current frame and the output state C_n of the current frame, and repeats this recursive operation to capture the timing relationship between the signals; the current-frame output h_n generated by each operation is passed on to the following hidden layers to perform the subsequent operations.
6. The neural network sound source direction finding method according to claim 3, wherein the output layer adopts a multi-classification Softmax function as an objective function of the model and takes cross entropy as a loss function of the training model, and the calculation formula is as follows:
$$L = -\sum_{i=1}^{m} T_i \log(P_i) \qquad (8)$$

wherein T_i represents the real classification label of the training data and P_i denotes the Softmax output probability of the i-th class, and since 7 direction finding angles are output, the value of m in the calculation formula (8) is 7.
7. The method according to claim 1, wherein in the step of extracting the correlation characteristics, the correlation characteristics extraction module calculates correlation coefficients of 7 angles of two sound source signals in the received time domain data, and the 7 angles are 0 °, 30 °, 60 °, 90 °, 120 °, 150 °, and 180 °, respectively.
8. The neural network sound source direction finding method according to claim 7, wherein the correlation coefficient of the 7 angles is calculated as follows:
step S1: calculating the time difference of the two paths of sound source signals reaching the two microphones; the method comprises the following specific steps:
the distance d between the two microphones is 15cm, and the time difference of the two sound source signals reaching the two microphones is calculated according to the following formula:
$$\tau = \frac{d\cos\theta}{v_{sound}}$$

where θ is the angle of the sound source, ranging from 0 to 180 degrees, and v_sound represents the propagation speed of sound, which is taken as 340 m/s;
step S2: supposing that the signal received by one microphone is X1(t) and the signal received by the other microphone is X2(t), when the correlation characteristic extraction module performs processing, two frames of data are buffered and respectively denoted the t-1 frame and the t frame; only the correlation coefficients of the t-1 frame are calculated each time, and after the calculation is finished, the t frame data are moved forward by 220 points to become the new t-1 frame, while positions 221-440 are filled with newly sampled data as the new t frame data;
step S3: the correlation characteristic extraction module calculates correlation coefficients from delays corresponding to different angles to obtain 7-dimensional correlation coefficients, specifically:
taking X1(t) as a reference, shifting X2(t) to the right to realize alignment, and calculating the correlation coefficient as follows:
$$R_n = \frac{\mathrm{Cov}\left(X_1(t),\,X_2(t+\tau_n)\right)}{\sqrt{\mathrm{Var}\left(X_1(t)\right)\mathrm{Var}\left(X_2(t+\tau_n)\right)}},\qquad n = 1, 2, 3, 4$$
taking X2(t) as a reference, shifting X1(t) to the right to realize alignment, and calculating the correlation coefficient as follows:
$$R_n = \frac{\mathrm{Cov}\left(X_1(t+\tau_n),\,X_2(t)\right)}{\sqrt{\mathrm{Var}\left(X_1(t+\tau_n)\right)\mathrm{Var}\left(X_2(t)\right)}},\qquad n = 5, 6, 7$$
where Cov(·) represents covariance calculation of two frames of data, Var(·) represents variance calculation, τ_n is the delay corresponding to the n-th angle, n = 1, 2, 3 and 4 represent the sound source incidence conditions at angles of 0 degrees, 30 degrees, 60 degrees and 90 degrees, respectively, and n = 5, 6 and 7 represent the sound source incidence conditions at angles of 120 degrees, 150 degrees and 180 degrees, respectively.
9. The neural network sound source direction finding method according to claim 2, wherein in the statistical determination step, the statistical determination module performs count statistics on the classification results of the angles of 200 frames within 1 s.
10. A neural network sound source direction finding system based on two microphones is characterized by comprising: a memory, a processor and a computer program stored on the memory, the computer program being configured to carry out the steps of the neural network sound source direction finding method of any one of claims 1-9 when invoked by the processor.
CN202010871213.7A 2020-08-26 2020-08-26 Neural network sound source direction finding method and system based on double microphones Pending CN112034424A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010871213.7A CN112034424A (en) 2020-08-26 2020-08-26 Neural network sound source direction finding method and system based on double microphones

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010871213.7A CN112034424A (en) 2020-08-26 2020-08-26 Neural network sound source direction finding method and system based on double microphones

Publications (1)

Publication Number Publication Date
CN112034424A 2020-12-04

Family

ID=73581848

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010871213.7A Pending CN112034424A (en) 2020-08-26 2020-08-26 Neural network sound source direction finding method and system based on double microphones

Country Status (1)

Country Link
CN (1) CN112034424A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113311391A (en) * 2021-04-25 2021-08-27 普联国际有限公司 Sound source positioning method, device and equipment based on microphone array and storage medium



Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination