CN114724571B - Robust distributed speaker noise elimination system - Google Patents
- Publication number
- CN114724571B, CN202210329198.2A
- Authority
- CN
- China
- Prior art keywords
- module
- node
- noise ratio
- root node
- input signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D30/00—Reducing energy consumption in communication networks
- Y02D30/70—Reducing energy consumption in communication networks in wireless communication networks
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Noise Elimination (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention discloses a robust distributed speaker noise elimination system comprising a discrete Fourier transform module, a voice activity detection module, a signal-to-noise ratio calculation module, a tree topology pruning module, a data-driven comparison module, a data compression module, a root node operation module, a covariance matrix estimation module, a filter updating module, a result transmission module, a root node updating module and an inverse discrete Fourier transform module. The technique can be applied to any network topology: it prunes an arbitrary network topology into a tree topology and compares the input signal-to-noise ratios over that tree. This gives the system a degree of robustness to a moving speaker, because wherever the speaker is located, the node with the maximum input signal-to-noise ratio can always be found, and distributed speaker noise elimination is then carried out.
Description
Technical Field
The invention relates to the technical field of distributed noise elimination, in particular to a robust distributed speaker noise elimination system.
Background
Speech signal quality is often severely degraded by background noise, which significantly reduces the performance of equipment that relies on it. To limit this effect, clean speech must be extracted from the noisy speech signal. Conventional single-microphone and multi-microphone noise cancellation methods improve voice quality to some extent but have limitations: a single microphone cannot capture spatial information, and multi-microphone arrays are constrained to a regular structure. Wireless acoustic sensor networks (WASNs) remedy these limitations. A WASN consists of multiple independent network nodes, each carrying one or more microphones and its own computing unit, and the nodes form a network topology through wireless communication. Unlike conventional single- and multi-microphone setups, the nodes of a WASN can be placed arbitrarily, so there is always one node closest to the source. The voice signal collected by that node has the highest signal-to-noise ratio (SNR), which helps improve distributed speaker noise cancellation performance.
Noise cancellation techniques for WASNs fall into two types: centralized and distributed. Centralized noise cancellation depends on an additional data processing center: every node in the WASN sends its collected voice signals to the center, which performs all noise cancellation operations. This places a heavy computational load and energy cost on the data processing center, and because the WASN depends entirely on that center, the system stops working when it is damaged. Distributed noise cancellation, by contrast, is carried out cooperatively, with each node performing part of the computation, so no data processing center is needed. Even if some nodes in the WASN are damaged, distributed noise cancellation can still achieve good performance.
In the prior art, a distributed adaptive node-specific noise cancellation technique has been proposed. It extends the distributed adaptive node-specific signal estimation algorithm to tree topologies: with the nodes connected as a tree, each node exchanges data with its neighboring nodes so that the output of each node is approximately the same as that of a data processing center. Although this extends the existing distributed algorithm to tree topologies and the final output approximates the centralized result, its output performance is poor.
Topology-independent distributed adaptive node-specific noise cancellation has also been studied in the prior art. It reduces the amount of exchanged data by linearly compressing the signals received by each node, can be applied to any network topology, and achieves performance close to the centralized case. However, because it is a distributed implementation of the centralized multi-channel Wiener filtering algorithm, the residual noise after noise cancellation is still severe.
In addition, the prior art has considered the influence of the number of bits exchanged between nodes on noise cancellation performance and proposed adaptive quantization, which adjusts the required energy and communication bandwidth to the current environment. Although this keeps power consumption low while cancelling noise, the noise cancellation performance is still poor and considerable residual noise remains.
Among existing distributed voice noise cancellation techniques, some focus on noise cancellation while ignoring the communication bandwidth consumed between nodes and the power consumed by the nodes; some reduce communication load and computational complexity as far as possible but deliver unsatisfactory noise cancellation; and some strike a balance between the two but do not consider the motion of the speaker. When the speaker's position changes, the voice characteristics collected by each node change, which causes large variations in the performance of existing distributed noise cancellation techniques. To further improve distributed noise elimination when the speaker moves, the invention provides a robust distributed speaker noise elimination scheme that is not constrained by the WASN topology and that combines the characteristics of a moving speaker with those of the WASN to complete noise elimination.
Disclosure of Invention
According to the problems of the prior art, the invention discloses a robust distributed speaker noise cancellation system, comprising:
a discrete Fourier transform module, which frames and windows the signals of the J nodes in the wireless acoustic sensor network, applies a discrete Fourier transform to each frame to obtain discrete spectrum signals, and defines the discrete spectrum signals as node local signals;
a voice activity detection module, which receives the discrete spectrum signals transmitted by the discrete Fourier transform module, performs voice activity detection on them, and judges whether each frame contains voice, so as to obtain a voice activity detection result;
a signal-to-noise ratio calculation module, which calculates the input signal-to-noise ratio of each frame according to the voice activity detection result obtained by the voice activity detection module;
a tree topology pruning module, which prunes the topology formed by the nodes in the wireless acoustic sensor network into a tree topology;
a data-driven comparison module, which performs a data-driven comparison of the input signal-to-noise ratios calculated by each node in the tree topology so as to obtain the maximum input signal-to-noise ratio;
a data compression module, which receives the discrete spectrum signals transmitted by the discrete Fourier transform module and compresses the data of the neighbor nodes of the root node with a compression vector to obtain compressed data;
a root node operation module, which receives the maximum input signal-to-noise ratio transmitted by the data-driven comparison module and sums the compressed data transmitted by the data compression module with the compressed data of the root node itself to obtain the voice signal after noise elimination;
a covariance matrix estimation module, which, after the root node has built its local signal, receives the detection result transmitted by the voice activity detection module and calculates a noise covariance matrix and a voice covariance matrix respectively;
a filter updating module, which receives the covariance matrices transmitted by the covariance matrix estimation module and updates the filter of the root node;
a result transmission module, which receives the maximum input signal-to-noise ratio and the voice signal after noise elimination from the root node operation module and transmits both to each node in the direction away from the root node;
a root node updating module, which compares the maximum input signal-to-noise ratio transmitted to each node by the result transmission module with the input signal-to-noise ratio of that node, and makes the node whose input signal-to-noise ratio equals the maximum the root node of the next iteration;
an inverse discrete Fourier transform module, which receives the voice signal after noise elimination transmitted by the result transmission module, applies an inverse discrete Fourier transform to it to obtain the time-domain output voice signal of the current frame, and overlap-adds the time-domain frames to obtain the final output signal.
Further, the data-driven comparison module compares the input signal-to-noise ratios in the following way:
Let the node with the maximum input signal-to-noise ratio be the root node $r^i$.
Every non-root node $o^i$ with only one neighbor node sends its input signal-to-noise ratio to that neighbor node. A non-root node $p^i$ with more than one neighbor node compares the input signal-to-noise ratios sent to it by its neighbor nodes, finds the maximum, and sends it to the neighbor node $f_t$. The maximum input signal-to-noise ratio sent is expressed as follows:
where $o_r^i$ is an element of the neighbor set of node $p^i$, $T$ is the number of elements in the set, and $r$ is an element of the set $\{1,2,\ldots,T\}$. This step is repeated until the data reaches the root node;
the root node compares its own input signal-to-noise ratio with the input signal-to-noise ratios sent by its neighbor nodes:
where $B$ is the total number of neighbors of the root node, and the maximum input signal-to-noise ratio $\mathrm{iSNR}^i$ is finally obtained.
Further, the filter updating module updates the filter using the following expression:
where $\beta \ge 0$ is an adjustment factor and $u_n$ is an $(E_j+B)$-dimensional selection vector in which exactly one element is 1 and all others are 0, the 1 lying at any position among the first $E_j$ dimensions. The parts of the filter obtained from the above formula that correspond to the neighbors of the root node are sent by the root node to those neighbor nodes, and the neighbor nodes of the root node update their own filters:
With the above technical scheme, the invention provides a robust distributed speaker noise elimination technique that can be applied to any network topology. It prunes an arbitrary network topology into a tree topology and compares the input signal-to-noise ratios over the tree, which gives the system a degree of robustness to a moving speaker: wherever the speaker is located, the node with the maximum input signal-to-noise ratio can always be found and distributed speaker noise elimination carried out. The invention takes the node with the maximum input signal-to-noise ratio as the root node and updates the filter only at the root node, which reduces the amount of data exchanged between nodes and allows a large amount of noise to be eliminated through the adjustment factor.
Drawings
To illustrate the embodiments of the present application or the technical solutions of the prior art more clearly, the drawings needed for their description are briefly introduced below. The drawings described below are only some embodiments of the present application; a person skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic diagram of a wireless acoustic sensor network in accordance with the present invention;
FIG. 3 is a schematic diagram of a network topology according to the present invention;
FIG. 4 shows the PESQ values (white noise) after speech noise cancellation for each method for different input signal-to-noise ratios in an embodiment of the present invention;
FIG. 5 shows the PESQ values (babble noise) after speech noise cancellation for each method at different input signal-to-noise ratios in an embodiment of the present invention;
FIG. 6 shows the PESQ values (in-vehicle noise) after speech noise cancellation for each method under different input signal-to-noise ratios in an embodiment of the present invention;
Detailed Description
To make the technical scheme and advantages of the present invention clearer, the technical scheme in the embodiments of the present invention is described clearly and completely below with reference to the accompanying drawings:
The robust distributed speaker noise elimination system shown in FIG. 1 comprises a discrete Fourier transform module 1, a voice activity detection module 2, a signal-to-noise ratio calculation module 3, a tree topology pruning module 4, a data-driven comparison module 5, a data compression module 6, a root node operation module 7, a covariance matrix estimation module 8, a filter updating module 9, a result transmission module 10, a root node updating module 11 and an inverse discrete Fourier transform module 12.
The discrete Fourier transform module 1 frames and windows the signals of the J nodes in the wireless acoustic sensor network, applies a discrete Fourier transform to each frame to obtain discrete spectrum signals, and defines the discrete spectrum signals as node local signals.
As a preferred mode, the discrete Fourier transform module 1 operates as follows. The WASN contains J nodes in total, and node j has $E_j$ microphones. First, each channel signal $y_{j,e}(n)$ of node j (i.e., the e-th channel of the j-th node) is framed and windowed, and a discrete Fourier transform (DFT) is then applied to each frame. During verification, the sampling frequency $f_s$ of the voice signal is 16 kHz, a Hanning window is used, the frame shift is 50%, and each frame contains M = 320 samples. The Hanning window is given by:
$$\omega(m)=0.5-0.5\cos(2\pi m/M),\quad m=0,1,\ldots,M-1 \tag{1}$$
The windowed signal is then:
$$y'_{j,e}(m)=y_{j,e}(n)\,\omega(m) \tag{2}$$
A DFT is then applied to each windowed frame of each channel, giving the discrete spectrum:
$$Y_{j,e}(k,l)=\sum_{m=0}^{M-1} y'_{j,e}(m)\,e^{-\mathrm{j}2\pi km/M} \tag{3}$$
where $k$ is the frequency bin index, $l$ is the frame index, and $\mathrm{j}$ in the exponent denotes the imaginary unit rather than the node index.
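As a minimal sketch of the framing, windowing and DFT steps, the following pure-Python code applies the Hanning window of equation (1) to frames of M = 320 samples with a 50% frame shift and transforms each frame. The function names (`hann_window`, `split_frames`, `dft`, `stft`) are illustrative and do not come from the patent, and the direct O(M²) DFT is used only for clarity.

```python
import cmath
import math


def hann_window(M):
    # Equation (1): w(m) = 0.5 - 0.5*cos(2*pi*m/M), m = 0..M-1
    return [0.5 - 0.5 * math.cos(2 * math.pi * m / M) for m in range(M)]


def split_frames(signal, M=320, hop=160):
    # hop = M/2 corresponds to the 50% frame shift used during verification
    return [signal[s:s + M] for s in range(0, len(signal) - M + 1, hop)]


def dft(frame):
    # Direct DFT of one frame (illustrative, not optimized)
    M = len(frame)
    return [
        sum(frame[m] * cmath.exp(-2j * math.pi * k * m / M) for m in range(M))
        for k in range(M)
    ]


def stft(signal, M=320, hop=160):
    # Window every frame, then transform it; result is indexed [frame l][bin k]
    w = hann_window(M)
    return [dft([x * wm for x, wm in zip(f, w)]) for f in split_frames(signal, M, hop)]
```

For a node with several microphones, `stft` would be applied to each channel separately and the per-bin values stacked into the node's local signal vector.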
The signals $Y_{j,e}(k,l)$ of each node are stacked into a vector:
$$\mathbf{y}_j=[Y_{j,1},Y_{j,2},\ldots,Y_{j,E_j}]^T \tag{4}$$
where the indices $k$ and $l$ are omitted for convenience. In addition, $\mathbf{y}_j=\mathbf{x}_j+\mathbf{v}_j$, where $\mathbf{x}_j$ is the speech part and $\mathbf{v}_j$ is the noise part.
Further, the voice activity detection module 2 receives the discrete spectrum signals transmitted by the discrete Fourier transform module 1, performs voice activity detection on them, and determines whether each frame contains voice, so as to obtain a voice activity detection result.
As a preferred mode, the voice activity detection module 2 performs voice activity detection separately on the discrete spectrum of each channel obtained by the discrete Fourier transform module 1. It exploits the fact that the first second of a recording is mostly non-speech: given the framing and windowing described above, the number of leading non-speech frames is NIS = $f_s$/(50% × M) − 1 = 99. The average noise spectrum estimated from these NIS frames is:
Equation (5) sums the corresponding frequency bins over the frames and then averages them. The log-spectrum estimate of the noise frames is then expressed as follows:
where |·| denotes the modulus. The log spectrum of each frame is then calculated:
From equations (6) and (7), the log-spectral distance between each frame and the noise is obtained as follows:
In summary, voice activity is decided as follows. A no-speech-segment counter is initialized, for example to 100, and a log-spectral distance threshold of 3 is set. For each frame, the log-spectral distance $d_{spec}$ between the frame and the noise is computed and compared with the threshold. If $d_{spec}$ is smaller than the threshold, the frame is a no-speech frame and the counter is incremented by 1; otherwise the frame is a speech frame and the counter is reset to zero. Finally, if the value of the counter just before a reset is smaller than the minimum no-speech length, the no-speech frames between the previous reset and the current reset are relabeled as speech frames. Here the minimum no-speech length is set to 10.
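The decision rule can be sketched as follows, assuming the log-spectral distance of every frame has already been computed (the distance expressions themselves are not reproduced here). The names `leading_noise_frames` and `vad_decisions` are illustrative and not taken from the patent; the default values mirror the ones stated in the text.

```python
def leading_noise_frames(fs=16000, M=320, shift=0.5):
    # NIS = fs / (50% * M) - 1, the number of leading frames assumed to be noise
    return int(fs / (shift * M)) - 1


def vad_decisions(distances, threshold=3.0, init_counter=100, min_silence=10):
    """Return one boolean per frame: True = speech, False = no speech."""
    speech = []
    counter = init_counter   # no-speech-segment counter
    run_start = 0            # first frame after the most recent counter reset
    for i, d in enumerate(distances):
        if d < threshold:
            speech.append(False)
            counter += 1
        else:
            # A no-speech run shorter than min_silence is relabeled as speech
            if counter < min_silence:
                for t in range(run_start, i):
                    speech[t] = True
            speech.append(True)
            counter = 0
            run_start = i + 1
    return speech
```

Because the counter starts at 100, the leading noise segment is never relabeled; only short pauses between speech frames are merged back into speech.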
To reduce voice distortion during verification, a frame is treated as a noise frame only when the voice activity detection result of every channel is a noise frame; otherwise it is treated as a voice frame.
The signal-to-noise ratio calculation module 3 calculates an input signal-to-noise ratio for each frame based on the voice activity detection result obtained by the voice activity detection module 2.
Preferably, the signal-to-noise ratio calculation module 3 works as follows: while no voice frame is present, the noise power is estimated by statistical averaging; when a voice frame appears, the input signal-to-noise ratio of that frame is calculated:
where ‖·‖ is the two-norm and E[·] denotes the statistical average up to the current l-th frame.
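The expression for the input signal-to-noise ratio is not reproduced in this text. As a rough sketch only, the code below assumes one common form: the noise power is the running mean of the squared two-norm over noise frames, and the input SNR of a speech frame is 10·log10((‖y‖² − noise power) / noise power). Both this form and the class name `InputSnrTracker` are assumptions, not the patent's definition.

```python
import math


class InputSnrTracker:
    """Tracks average noise power and returns an input SNR for speech frames."""

    def __init__(self):
        self.noise_sum = 0.0
        self.noise_frames = 0

    def update(self, frame_spectrum, is_speech):
        # Squared two-norm of the frame's spectral vector
        power = sum(abs(v) ** 2 for v in frame_spectrum)
        if not is_speech:
            # Noise frame: update the statistical average, no SNR is produced
            self.noise_sum += power
            self.noise_frames += 1
            return None
        if self.noise_frames == 0:
            return None  # no noise estimate yet, SNR cannot be computed
        noise_power = self.noise_sum / self.noise_frames
        if noise_power <= 0.0:
            return None
        # Assumed estimator: subtract the noise power, floor to keep the log finite
        speech_power = max(power - noise_power, 1e-12)
        return 10.0 * math.log10(speech_power / noise_power)
```

Returning `None` for noise frames reflects the statement that no input SNR is available while only non-speech frames have been observed.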
The tree topology pruning module 4 prunes the topology formed by the nodes in the wireless acoustic sensor network into a tree topology.
As a preferred mode, the tree topology pruning module 4 prunes the topology formed by the nodes as follows. Let the coordinates of two nodes be $(x_1,y_1)$ and $(x_2,y_2)$; the Euclidean distance $d$ between them is:
$$d=\sqrt{(x_1-x_2)^2+(y_1-y_2)^2} \tag{10}$$
In this way the Euclidean distance between every pair of nodes is obtained. Next, a node is arbitrarily selected from the network and placed in the node set S, which is initially empty. Among the not-yet-selected nodes that are connected to a node in S, the one with the smallest Euclidean distance to a node in S is added to S. This step is repeated until all nodes have been selected.
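The selection procedure grows a spanning tree in the manner of Prim's algorithm, restricted to links that exist in the original topology. A minimal sketch follows; the function name `prune_to_tree` and the input layout (a coordinate dictionary plus a list of links) are assumptions made for illustration.

```python
import math


def prune_to_tree(coords, links, start=None):
    """coords: {node: (x, y)}; links: iterable of (a, b) pairs of connected nodes.

    Returns the list of (parent, child) edges kept in the tree.
    """
    adjacency = {n: set() for n in coords}
    for a, b in links:
        adjacency[a].add(b)
        adjacency[b].add(a)

    if start is None:
        start = next(iter(coords))  # any node may seed the set S
    selected = {start}
    tree_edges = []
    while len(selected) < len(coords):
        best = None
        for a in selected:
            for b in adjacency[a] - selected:
                d = math.dist(coords[a], coords[b])  # Euclidean distance
                if best is None or d < best[0]:
                    best = (d, a, b)
        if best is None:
            raise ValueError("network is not connected")
        selected.add(best[2])
        tree_edges.append((best[1], best[2]))
    return tree_edges
```

A tree over N nodes always has N − 1 edges, so the returned list can be checked against the node count as a quick sanity test.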
The data-driven comparison module 5 performs a data-driven comparison of the input signal-to-noise ratios calculated by each node in the tree topology formed by the tree topology pruning module 4, and finally obtains the maximum input signal-to-noise ratio.
Preferably, because the tree topology can be regenerated at each iteration, an iteration index $i$ is added to the node notation. The data-driven comparison over the tree topology proceeds in the following steps:
First, the node with the maximum input signal-to-noise ratio is designated the root node $r^i$. Because the beginning of the voice signal is generally non-speech and no input signal-to-noise ratio can be calculated, the initial root node is chosen arbitrarily;
Second, every non-root node $o^i$ with only one neighbor node sends its input signal-to-noise ratio (an output of the signal-to-noise ratio calculation module 3) to that neighbor node, i.e., toward the root node. A non-root node $p^i$ with more than one neighbor node then sends to its neighbor node $f_t$, the neighbor on the path toward the root node, the maximum of the input signal-to-noise ratios received from all its other neighbor nodes. The maximum input signal-to-noise ratio sent is expressed as follows:
where $o_r^i$ is an element of the neighbor set of node $p^i$ (excluding its neighbor node $f_t$), $T$ is the number of elements in the set, and $r$ is an element of the set $\{1,2,\ldots,T\}$;
Third, the second step is repeated until the data reaches the root node;
Fourth, the root node compares its own input signal-to-noise ratio with the input signal-to-noise ratios sent by its neighbor nodes:
where $B$ is the total number of neighbors of the root node, and the maximum input signal-to-noise ratio $\mathrm{iSNR}^i$ is finally obtained.
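The leaf-to-root comparison can be sketched as a recursive maximum over the tree. The sketch assumes that every intermediate node also includes its own input SNR in the comparison before forwarding, which the text states explicitly only for the root node. The name `max_input_snr` and the adjacency-dictionary layout are illustrative.

```python
def max_input_snr(tree_adj, snr, root):
    """tree_adj: {node: [neighbors]} for a tree; snr: {node: input SNR}."""

    def forward(node, toward_root):
        # Value this node sends toward the root: the maximum of its own SNR
        # (assumption) and the values received from its other neighbors.
        best = snr[node]
        for neighbor in tree_adj[node]:
            if neighbor != toward_root:
                best = max(best, forward(neighbor, node))
        return best

    # At the root there is no neighbor toward the root, so all B neighbors count
    return forward(root, None)
```

Each node transmits a single scalar per iteration, so the comparison costs one value per tree edge regardless of which node is currently the root.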
The data compression module 6 receives the data transmitted by the discrete Fourier transform module 1 and compresses the data of the neighbor nodes of the root node with a compression vector to obtain compressed data.
Preferably, the data compression module 6 compresses only the data $\mathbf{y}_j$ (an output of the discrete Fourier transform module 1) of the neighbor nodes of the root node:
$$z_j^i=\mathbf{w}_j^{iH}\,\mathbf{y}_j \tag{13}$$
where $z_j^i$ is the scalar data after compression, $\mathbf{w}_j^{iH}$ is the compression vector, which is also the part of the centralized filter that corresponds to the data of this node, and $(\cdot)^H$ denotes the conjugate transpose of a vector or matrix. The compression vector must be initialized; during verification its elements are initialized to random numbers uniformly distributed on the unit interval.
It should be emphasized that every superscript $i$ is an iteration index, and the data of the i-th iteration can be regarded as the data of the i-th frame. During verification the iteration index starts at 1, i.e., processing starts from the data of the first frame.
The root node operation module 7 receives the maximum input signal-to-noise ratio obtained by the data-driven comparison module 5 and sums the compressed data from the data compression module 6 with the compressed data of the root node itself to obtain the voice signal after noise elimination.
As a preferred mode, the root node operation module 7 receives the maximum input signal-to-noise ratio $\mathrm{iSNR}^i$ obtained by the data-driven comparison module 5 and sums the compressed data from the data compression module 6 with the compressed data of the root node itself. The result of this sum is the voice signal after noise elimination, $d_{RD}^i$.
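A minimal sketch of the compression and the root-node sum, for a single frequency bin, is given below. The function names are illustrative, and initializing the compression vector with real-valued uniform random numbers is an assumption (the text does not say whether the initial values are real or complex).

```python
import random


def init_compression_vector(num_channels, rng=random):
    # Elements drawn uniformly from the unit interval [0, 1) (assumed real-valued)
    return [complex(rng.random(), 0.0) for _ in range(num_channels)]


def compress(w, y):
    # z = w^H y: conjugate the compression vector, then take the inner product
    return sum(wi.conjugate() * yi for wi, yi in zip(w, y))


def root_output(w_root, y_root, neighbor_scalars):
    # Noise-cancelled output at the root: the root's own compressed data
    # plus the compressed scalars received from its neighbors
    return compress(w_root, y_root) + sum(neighbor_scalars)
```

Each neighbor sends one complex scalar per bin instead of its full multichannel vector, which is where the reduction in exchanged data comes from.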
The covariance matrix estimation module 8 calculates a noise covariance matrix and a voice covariance matrix according to the detection result transmitted by the voice activity detection module 2, after the root node has constructed its local signal.
As a preferred mode, the covariance matrix estimation module 8 works as follows. The root node constructs its local signal by stacking its own signals with the compressed data of its neighbors output by the data compression module 6:
where the signal is a column vector of dimension $(E_j+B)$. The signal may also be expressed as:
The other, non-root nodes perform the same operation to construct their own local signals. Unlike the root node, a non-root node does not need to receive the compressed data of its neighbor nodes; it only places as many zero elements as it has neighbors into the local signal.
The noise covariance matrix and the voice covariance matrix are then estimated at each node j. The noise covariance matrix is updated only in frames without voice, and the noisy-voice covariance matrix is updated in the remaining frames. The noise covariance matrix update for the current l-th frame of each node is as follows:
where α = 0.997 and the quantity indexed by $l-1$ is the noise covariance matrix estimate of the previous frame at the j-th node. The noise covariance matrix has an estimate for each frequency bin, and it is updated as above whenever the current frame is a noise frame. When a non-noise frame occurs, the noisy-voice covariance matrix is updated as follows:
The voice covariance matrix is obtained from equations (17) and (18) as the difference between the noisy-voice covariance matrix $\mathbf{R}_{yy}$ of equation (18) and the noise covariance matrix $\mathbf{R}_{vv}$ of equation (17):
$$\mathbf{R}_{xx}=\mathbf{R}_{yy}-\mathbf{R}_{vv} \tag{19}$$
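The update formulas themselves are not reproduced in this text. The sketch below assumes first-order recursive smoothing, R(l) = α·R(l−1) + (1−α)·y·yᴴ, which is a common choice for a forgetting factor such as α = 0.997; this form and the function names are assumptions. Matrices are plain nested lists for one frequency bin.

```python
def outer(y):
    # y y^H for a column vector y
    return [[a * b.conjugate() for b in y] for a in y]


def smooth(R_prev, y, alpha):
    # Assumed recursive update: alpha * previous estimate + (1 - alpha) * y y^H
    yy = outer(y)
    n = len(y)
    return [[alpha * R_prev[i][j] + (1 - alpha) * yy[i][j] for j in range(n)]
            for i in range(n)]


def update_covariances(R_vv, R_yy, y, is_speech, alpha=0.997):
    """Update one of the two estimates, then return (R_vv, R_yy, R_xx)."""
    if is_speech:
        R_yy = smooth(R_yy, y, alpha)   # noisy-voice covariance on voice frames
    else:
        R_vv = smooth(R_vv, y, alpha)   # noise covariance on noise frames
    n = len(y)
    # Voice covariance as the difference of the two estimates
    R_xx = [[R_yy[i][j] - R_vv[i][j] for j in range(n)] for i in range(n)]
    return R_vv, R_yy, R_xx
```

In practice the difference of two estimates is not guaranteed to be positive semidefinite, so an implementation may need to regularize `R_xx` before using it in the filter.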
The filter updating module 9 is used for updating the filter of the root node according to the covariance matrix estimated by the covariance matrix estimation module 8.
Preferably, the filter updating module 9 works on the principle that the filter of the root node can be updated according to the covariance matrix estimated by the covariance matrix estimation module 8, and the expression of the filter is as follows:
where β ≥ 0 is an adjustment factor: the larger β is, the stronger the noise cancellation, but the greater the resulting speech distortion. u_n is an (E_j + B)-dimensional selection vector in which exactly one element is 1 and all other elements are 0; the 1 may be at any position within the first E_j dimensions. The part of the filter obtained from equation (20) that corresponds to the neighbors of the root node is then sent by the root node to its neighbor nodes, which update their own filters:
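As an illustration of how β and the selection vector enter the filter, the sketch below uses the standard parametric multi-channel Wiener filter form from the literature, w = R_nn⁻¹ R_xx u / (β + tr(R_nn⁻¹ R_xx)). This form is an assumption and may differ in detail from the patent's own expression; the function name `pmwf_filter` and its parameters are hypothetical.

```python
import numpy as np

def pmwf_filter(R_xx, R_nn, beta, ref_index):
    """Parametric multi-channel Wiener filter for one frequency bin.

    R_xx      : speech covariance matrix
    R_nn      : noise covariance matrix (must be invertible)
    beta      : trade-off factor, beta >= 0
    ref_index : position of the single 1 in the selection vector u
    """
    dim = R_nn.shape[0]
    u = np.zeros(dim)
    u[ref_index] = 1.0
    # Solve R_nn * A = R_xx instead of forming the inverse explicitly.
    A = np.linalg.solve(R_nn, R_xx)
    return (A @ u) / (beta + np.trace(A).real)
```

In this form a larger β shrinks the filter coefficients, which matches the trade-off described above: more noise suppression at the cost of more speech distortion.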
The effect of the result transfer module 10 is to receive the maximum input signal-to-noise ratio and the noise-cancelled speech signal transmitted by the root node operation module 7, and to propagate both to each node in a direction away from the root node.
Preferably, the result transfer module 10 operates as follows: the maximum input signal-to-noise ratio iSNR^i obtained by the root node operation module 7 and the noise-cancelled speech signal d_RD^i are broadcast to every node in the direction away from the root node.
The root node update module 11 is configured to compare the maximum input signal-to-noise ratio propagated to each node by the result transfer module 10 with that node's own input signal-to-noise ratio; the node whose input signal-to-noise ratio equals the maximum becomes the root node of the next iteration.
Preferably, the root node update module 11 operates as follows: the maximum input signal-to-noise ratio iSNR^i propagated to each node by the result transfer module 10 is compared with the input signal-to-noise ratio of each node, and the node whose input signal-to-noise ratio equals iSNR^i becomes the root node of the next iteration.
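The root update rule amounts to an equality test at every node, as the short sketch below shows. Exact equality is safe here because the broadcast maximum is a copy of one node's own value. How ties are broken is not stated in the text; picking the smallest node identifier is an assumption, and the function name `next_root` is hypothetical.

```python
def next_root(input_snrs, broadcast_max_snr):
    """Return the node that becomes the root of the next iteration.

    input_snrs        : dict mapping node id -> that node's input SNR
    broadcast_max_snr : maximum input SNR broadcast from the current root
    """
    matches = [node for node, snr in input_snrs.items()
               if snr == broadcast_max_snr]
    # Tie-breaking by smallest node id is an assumption of this sketch.
    return min(matches) if matches else None
```

For example, with input SNRs `{0: 3.1, 1: 7.4, 2: 5.0}` and a broadcast maximum of 7.4, node 1 is selected.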
The inverse discrete Fourier transform module 12 receives the noise-cancelled speech signal propagated by the result transfer module 10, applies an inverse discrete Fourier transform to it to obtain the time-domain output speech signal of the current frame, and overlap-adds the time-domain output speech signals to obtain the final output signal.
Preferably, the inverse discrete Fourier transform module 12 operates as follows: the noise-cancelled speech signal d_RD^i propagated to each node by the result transfer module 10 is passed through an inverse discrete Fourier transform (IDFT) to obtain the noise-cancelled speech signal of the current frame in the time domain. The IDFT formula is as follows:
where i and l have the same meaning, i.e. the i-th iteration coincides with the l-th frame; the iteration index i is omitted below whenever both appear together.
Because the discrete Fourier transform module 1 frames and windows each signal with a frame shift of 50%, once the output speech signal of the l-th frame is obtained it is overlap-added with the output speech signal of the adjacent frame, the overlapping part being 50% of the frame. The specific formula is as follows:
where ⌊·⌋ is the floor operation: ⌊a⌋ denotes the largest integer not exceeding a.
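The IDFT and overlap-add step can be sketched with NumPy as below. The window is not specified in this passage, so a periodic Hann analysis window is assumed; at a 50% frame shift its shifted copies sum to one, so a plain overlap-add reconstructs the signal. The helper `frame_dft` only stands in for the analysis stage, and both function names are hypothetical.

```python
import numpy as np

def frame_dft(x, frame_len):
    """Frame with 50% shift, apply a periodic Hann window, take the DFT."""
    hop = frame_len // 2
    n = np.arange(frame_len)
    win = 0.5 - 0.5 * np.cos(2 * np.pi * n / frame_len)  # periodic Hann
    num_frames = (len(x) - frame_len) // hop + 1
    frames = np.stack([x[l * hop:l * hop + frame_len] * win
                       for l in range(num_frames)])
    return np.fft.rfft(frames, axis=1)

def idft_overlap_add(spectra, frame_len):
    """Inverse DFT of every frame, then overlap-add with 50% overlap."""
    hop = frame_len // 2
    frames = np.fft.irfft(spectra, n=frame_len, axis=1)
    out = np.zeros(hop * (len(frames) - 1) + frame_len)
    for l, frame in enumerate(frames):
        # The first half of frame l overlaps the second half of frame l-1.
        out[l * hop:l * hop + frame_len] += frame
    return out
```

Only the first and last half-frames lack an overlapping partner; everywhere else the unprocessed signal is recovered exactly, which is a convenient sanity check before inserting the noise-cancelling filter between the two stages.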
To verify the effectiveness of the method, a closed room of 7 m × 3 m with a reverberation time of 300 ms is simulated using the Image model. Ten nodes are randomly distributed in the room; each node is a linear array of 3 microphones (spacing 8 cm) placed at a height of 1 m. A female speaker moves from the start point (3 m, 1.4 m) to the end point (0.7 m, 5.8 m) at a height of 1.7 m, along a trajectory made up of a curve and a straight line. The simulated two-dimensional WASN is shown in Fig. 2: each dashed ellipse represents one node, the 3 microphones in each node are shown as solid dots, and the coordinates of the two inflection points of the speaker's trajectory are marked.
The speech is one clean speech signal, 6 seconds long, randomly selected from the TIMIT database [https://download.csdn.net/download/sdhyfxh/4086482]. The noise is white noise, babble noise, or in-car noise, and both speech and noise are sampled at 16 kHz. Fig. 3 shows the network topology of the WASN; the bold black solid lines form the tree topology obtained by pruning the original topology (drawn with bold and non-bold black lines).
The robust distributed parametric multi-channel Wiener filter (RD-PMWF) proposed in this patent is applied to reduce the noise in the signals received by each node, and the methods of documents [1] and [2] are applied to the same signals for comparison. Figs. 4, 5 and 6 compare the performance of the different methods under three kinds of background noise. Fig. 4 compares the Perceptual Evaluation of Speech Quality (PESQ) scores at different input signal-to-noise ratios with white noise as the background noise; the experiment was run with the adjustment factor β of the distributed denoising technique of this patent set to 15 and to 20. Figs. 5 and 6 differ from Fig. 4 only in the background noise; the remaining experimental conditions are identical. The experimental results show that the method of document [1] does not improve on the unprocessed signal under any background noise, and that the method of document [2] has some denoising capability, but the method of this patent denoises more strongly when the speaker is moving.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto. Any equivalent substitution or modification made by a person skilled in the art, within the technical scope disclosed by the present invention and according to its technical scheme and inventive concept, shall be covered by the scope of protection of the present invention.
References:
[1] A. Bertrand and M. Moonen, "Distributed Adaptive Estimation of Node-Specific Signals in Wireless Sensor Networks With a Tree Topology," in IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2196-2210, May 2011.
[2] J. Szurley, A. Bertrand and M. Moonen, "Topology-Independent Distributed Adaptive Node-Specific Signal Estimation in Wireless Sensor Networks," in IEEE Transactions on Signal and Information Processing over Networks, vol. 3, no. 1, pp. 130-144, March 2017.
[3] F. de la Hucha Arce, M. Moonen, M. Verhelst, A. Bertrand, "Adaptive Quantization for Multichannel Wiener Filter-Based Speech Enhancement in Wireless Acoustic Sensor Networks," Wireless Communications and Mobile Computing, vol. 2017, Article ID 3173196, 15 pages, 2017.
Claims (3)
1. A robust distributed speaker noise cancellation system, comprising:
the discrete Fourier transform module is used for respectively carrying out framing and windowing processing on signals of J nodes in the wireless acoustic sensor network, carrying out discrete Fourier transform on each frame of signals to obtain discrete spectrum signals, and defining the discrete spectrum signals as node local signals;
The voice activity detection module is used for receiving the discrete spectrum signals transmitted by the discrete Fourier transform module, detecting the voice activity of the discrete spectrum signals and judging whether each frame of signal has voice or not so as to obtain a voice activity detection result;
The signal-to-noise ratio calculation module is used for calculating the input signal-to-noise ratio of each frame of signal according to the voice activity detection result obtained by the voice activity detection module;
The topology pruning module is used for pruning the topology formed by the nodes in the wireless acoustic sensor network into a tree topology: the Euclidean distance between every two nodes is obtained; one node is randomly selected from the network and placed in a node set S; among the nodes that are connected to nodes in the set S and have not yet been selected, the node with the smallest Euclidean distance to a node in the set S is placed in the set S; and this step is repeated until all nodes have been selected;
The data-driven comparison module is used for performing a data-driven comparison of the input signal-to-noise ratios of the nodes in the tree topology so as to obtain the maximum input signal-to-noise ratio, wherein the tree topology is generated again after each iteration and an iteration index is therefore added to the notation of each node;
The data compression module is used for receiving the discrete spectrum signals transmitted by the discrete Fourier transform module and compressing the data of the neighbor nodes of the root node in a compression vector mode to obtain compressed data;
the root node operation module receives the maximum input signal-to-noise ratio transmitted by the data driving comparison module, and sums the compressed data transmitted by the data compression module and the compressed data of the root node to obtain a voice signal after noise elimination;
the covariance matrix estimation module is used for receiving the detection result transmitted by the voice activity detection module after the root node builds a local signal, and respectively calculating a noise covariance matrix and a voice covariance matrix;
The filter updating module is used for receiving the covariance matrix transmitted by the covariance matrix estimation module and updating the filter of the root node;
the result transmission module receives the maximum input signal-to-noise ratio transmitted by the root node operation module and the voice signal after noise elimination transmitted by the root node operation module, and transmits the maximum input signal-to-noise ratio and the voice signal to each node along the direction away from the root node;
The root node updating module is used for comparing the maximum input signal-to-noise ratio propagated to each node by the result transmission module with the input signal-to-noise ratio of each node, and making the node whose input signal-to-noise ratio equals the maximum input signal-to-noise ratio the root node of the next iteration;
the discrete Fourier inverse transformation module receives the voice signal after noise elimination transmitted by the result transmission module, performs discrete Fourier inverse transformation on the voice signal after noise elimination to obtain a time domain output voice signal of the current frame, and performs overlap addition on the time domain output voice signal to obtain a final output signal.
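The pruning step recited in claim 1 grows a tree one node at a time by always adding the closest not-yet-selected node that is linked to the selected set, which is the procedure commonly known as Prim's algorithm. A minimal sketch follows; the data layout (a dict of 2-D positions and a list of undirected links) and the function name `prune_to_tree` are assumptions made for illustration.

```python
import math
import random

def prune_to_tree(positions, links, start=None):
    """Prune a connected topology to a tree, following the steps in claim 1.

    positions : dict mapping node id -> (x, y) coordinates
    links     : iterable of undirected (a, b) links in the original topology
    start     : first node placed in S; chosen at random when None
    """
    if start is None:
        start = random.choice(list(positions))
    # Consider every link in both directions.
    directed = [(a, b) for a, b in links] + [(b, a) for a, b in links]
    selected = {start}
    tree = []
    while len(selected) < len(positions):
        candidates = [(math.dist(positions[a], positions[b]), a, b)
                      for a, b in directed
                      if a in selected and b not in selected]
        if not candidates:
            raise ValueError("the original topology is not connected")
        _, a, b = min(candidates, key=lambda c: c[0])
        selected.add(b)
        tree.append((a, b))
    return tree
```

The returned list always has one link fewer than there are nodes, so the result contains no cycles.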
2. The system according to claim 1, wherein: the data driving comparison module obtains the input signal to noise ratio to perform data driving comparison in the following manner:
Let the node with the maximum input signal-to-noise ratio be the root node r^i,
Any non-root node o^i that has only one neighbor node sends its input signal-to-noise ratio to that neighbor node; a non-root node p^i with more than one neighbor node compares the input signal-to-noise ratios sent to it by its neighboring nodes o^i, finds the maximum value and sends it to its neighbor node f^i, and the maximum input signal-to-noise ratio sent is expressed as follows:
where o_r^i is an element of the neighbor set of node p^i, T is the number of elements in the set, and r is an element of the set {1, 2, ..., T}; this step is repeated until the data reaches the root node;
the root node compares its own input signal-to-noise ratio with the input signal-to-noise ratios sent by its neighbor nodes:
where B is the total number of neighbors of the root node, and the maximum input signal-to-noise ratio iSNR^i is finally obtained.
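The leaf-to-root comparison of claim 2 is a maximum reduction over the tree, which can be sketched recursively. The sketch assumes that every intermediate node also includes its own input signal-to-noise ratio in the comparison before forwarding; the adjacency-dict layout and the function name `max_input_snr` are hypothetical.

```python
def max_input_snr(tree, snr, root):
    """Propagate input SNRs from the leaves to the root, keeping the maximum.

    tree : dict mapping node id -> list of neighbor ids in the tree topology
    snr  : dict mapping node id -> that node's input SNR
    root : id of the current root node
    """
    def upstream(node, parent):
        # Value this node forwards toward the root: the maximum of its own
        # SNR (assumed to take part) and everything received from below.
        best = snr[node]
        for neighbor in tree[node]:
            if neighbor != parent:
                best = max(best, upstream(neighbor, node))
        return best

    return upstream(root, None)
```

Each link carries a single scalar per iteration, so the communication cost of finding the maximum grows only linearly with the number of nodes.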
3. The system according to claim 1, wherein: the filter updating module updates the filter by adopting the following expression:
where β ≥ 0 is an adjustment factor and u_n is an (E_j + B)-dimensional selection vector in which exactly one element is 1 and all other elements are 0, the 1 being at any position within the first E_j dimensions; the part of the filter obtained from the above formula that corresponds to the neighbors of the root node is sent by the root node to its neighbor nodes, and each neighbor node of the root node updates its own filter:
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210329198.2A CN114724571B (en) | 2022-03-29 | 2022-03-29 | Robust distributed speaker noise elimination system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114724571A (en) | 2022-07-08 |
CN114724571B (en) | 2024-05-03 |
Family
ID=82239147
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210329198.2A Active CN114724571B (en) | 2022-03-29 | 2022-03-29 | Robust distributed speaker noise elimination system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114724571B (en) |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1141548A (en) * | 1995-02-17 | 1997-01-29 | 索尼公司 | Method and apparatus for reducing noise in speech signal |
EP1585112A1 (en) * | 2004-03-30 | 2005-10-12 | Dialog Semiconductor GmbH | Delay free noise suppression |
WO2015189261A1 (en) * | 2014-06-13 | 2015-12-17 | Retune DSP ApS | Multi-band noise reduction system and methodology for digital audio signals |
CN105389491A (en) * | 2014-08-28 | 2016-03-09 | 凯文·艾伦·杜西 | Facial recognition authentication system including path parameters |
CN106973412A (en) * | 2017-01-18 | 2017-07-21 | 南京航空航天大学 | The distributed compression repeater system and design method of many junction networks under Gaussian source |
WO2018086444A1 (en) * | 2016-11-10 | 2018-05-17 | 电信科学技术研究院 | Method for estimating signal-to-noise ratio for noise suppression, and user terminal |
CN110739004A (en) * | 2019-10-25 | 2020-01-31 | 大连理工大学 | distributed voice noise elimination system for WASN |
EP3739356A1 (en) * | 2019-05-12 | 2020-11-18 | Origin Wireless, Inc. | Method, apparatus, and system for wireless tracking, scanning and monitoring |
CN113763984A (en) * | 2021-09-23 | 2021-12-07 | 大连理工大学 | Parameterized noise elimination system for distributed multiple speakers |
Also Published As
Publication number | Publication date |
---|---|
CN114724571A (en) | 2022-07-08 |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN110867181B (en) | Multi-target speech enhancement method based on SCNN and TCNN joint estimation | |
CN110600050B (en) | Microphone array voice enhancement method and system based on deep neural network | |
Buchner et al. | TRINICON: A versatile framework for multichannel blind signal processing | |
CN109727604A (en) | Frequency domain echo cancel method and computer storage media for speech recognition front-ends | |
US8848933B2 (en) | Signal enhancement device, method thereof, program, and recording medium | |
CN112581973B (en) | Voice enhancement method and system | |
CN112735456B (en) | Speech enhancement method based on DNN-CLSTM network | |
CN107845389A (en) | A kind of sound enhancement method based on multiresolution sense of hearing cepstrum coefficient and depth convolutional neural networks | |
US20080208538A1 (en) | Systems, methods, and apparatus for signal separation | |
Zhao et al. | Late reverberation suppression using recurrent neural networks with long short-term memory | |
CN104835503A (en) | Improved GSC self-adaptive speech enhancement method | |
CN105280193A (en) | Prior signal-to-noise ratio estimating method based on MMSE error criterion | |
Geng et al. | End-to-end speech enhancement based on discrete cosine transform | |
CN110739004B (en) | Distributed voice noise elimination system for WASN | |
CN115424627A (en) | Voice enhancement hybrid processing method based on convolution cycle network and WPE algorithm | |
Fu et al. | Boosting objective scores of a speech enhancement model by metricgan post-processing | |
CN112530451A (en) | Speech enhancement method based on denoising autoencoder | |
Selvi et al. | Hybridization of spectral filtering with particle swarm optimization for speech signal enhancement | |
CN114724571B (en) | Robust distributed speaker noise elimination system | |
CN113763984B (en) | Parameterized noise elimination system for distributed multi-speaker | |
Yamashita et al. | Improved spectral subtraction utilizing iterative processing | |
CN114724574A (en) | Double-microphone noise reduction method with adjustable expected sound source direction | |
Schwartz et al. | RNN-based step-size estimation for the RLS algorithm with application to acoustic echo cancellation | |
Srinivasarao | An efficient recurrent Rats function network (Rrfn) based speech enhancement through noise reduction | |
Boyko et al. | Using recurrent neural network to noise absorption from audio files. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||