CN114724571B - Robust distributed speaker noise elimination system - Google Patents


Info

Publication number
CN114724571B
Authority
CN
China
Prior art keywords
module
node
noise ratio
root node
input signal
Legal status
Active
Application number
CN202210329198.2A
Other languages
Chinese (zh)
Other versions
CN114724571A (en)
Inventor
畅瑞江 (Chang Ruijiang)
陈喆 (Chen Zhe)
殷福亮 (Yin Fuliang)
Current Assignee
Dalian University of Technology
Original Assignee
Dalian University of Technology
Application filed by Dalian University of Technology
Priority to CN202210329198.2A
Publication of CN114724571A
Application granted
Publication of CN114724571B
Status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 Reducing energy consumption in communication networks
    • Y02D30/70 Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Noise Elimination (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a robust distributed speaker noise elimination system comprising a discrete Fourier transform module, a voice activity detection module, a signal-to-noise ratio calculation module, a tree topology pruning module, a data-driven comparison module, a data compression module, a root node operation module, a covariance matrix estimation module, a filter updating module, a result transmission module, a root node updating module and an inverse discrete Fourier transform module. The system can be applied to a network of any topology: it prunes the network topology into a tree and compares the nodes' input signal-to-noise ratios over that tree. This makes it robust to a moving speaker, because wherever the speaker is, the node with the maximum input signal-to-noise ratio can always be found and used for distributed speaker noise elimination.

Description

Robust distributed speaker noise elimination system
Technical Field
The invention relates to the technical field of distributed noise elimination, in particular to a robust distributed speaker noise elimination system.
Background
Speech quality is often severely degraded by background noise, which significantly impairs the performance of devices that process speech. To reduce this adverse effect, a clean speech signal must be extracted from the noisy one. Conventional single-microphone and multi-microphone noise cancellation methods improve speech quality to some extent but have clear limitations: a single microphone cannot capture spatial information, and a multi-microphone array is constrained by its fixed geometry. Wireless acoustic sensor networks (WASNs) largely remove these limitations. A WASN consists of multiple independent nodes, each carrying one or more microphones and its own processing unit, connected by wireless links into a network topology. Unlike conventional single-microphone and array setups, WASN nodes can be placed arbitrarily, so some node is always close to the source; the speech it captures has the highest signal-to-noise ratio (SNR), and exploiting that node can improve distributed speaker noise cancellation more effectively.
As noise cancellation for WASNs has matured, two approaches have emerged: centralized and distributed noise cancellation. Centralized noise cancellation relies on an additional data processing center: every node sends its recorded speech signals to that center, which performs all noise cancellation operations. This places a heavy computational and energy burden on the center, and because the system depends entirely on it, processing stops if the center fails. Distributed noise cancellation, by contrast, is carried out cooperatively, with each node performing its share of the computation, so no data processing center is needed. Even if some nodes in the WASN fail, distributed noise cancellation can still achieve good performance.
One prior-art technique is distributed adaptive node-specific noise cancellation, which extends the distributed adaptive node-specific signal estimation algorithm to tree topologies. In a tree-connected network, each node exchanges data with its neighbors so that its output approximates the result a data processing center would produce. Although this technique extends an existing distributed algorithm to tree topologies and approaches centralized performance, its noise cancellation performance remains poor.
Topology-independent distributed adaptive node-specific noise cancellation has also been studied. It reduces the amount of exchanged data by linearly compressing the signals received at each node, works with any network topology, and achieves performance similar to the centralized solution. However, because it is a distributed implementation of the centralized multi-channel Wiener filter, considerable residual noise remains after noise cancellation.
Other prior art considers how the number of bits used in distributed data exchange affects noise cancellation performance and proposes adaptive quantization, which adjusts the required energy and communication bandwidth to the current environment. Although this keeps power consumption low, the noise cancellation performance is still poor and much residual noise remains.
In summary, some existing distributed speech noise cancellation techniques focus on noise cancellation while ignoring inter-node communication bandwidth and node power consumption; others minimize communication load and computational complexity at the cost of unsatisfactory noise cancellation; and those that balance the two ignore the motion of the speaker. When the speaker moves, the speech characteristics captured by each node change, which causes large performance variations in existing distributed noise cancellation techniques. To improve distributed noise cancellation for a moving speaker without being constrained by the WASN topology, the invention provides a robust distributed speaker noise elimination scheme that combines the characteristics of the moving speaker and of the WASN.
Disclosure of Invention
In view of the problems of the prior art, the invention discloses a robust distributed speaker noise cancellation system, comprising:
a discrete Fourier transform module, which frames and windows the signals of the J nodes in the wireless acoustic sensor network, applies a discrete Fourier transform to each frame to obtain discrete spectrum signals, and defines these as the node local signals;
a voice activity detection module, which receives the discrete spectrum signals from the discrete Fourier transform module, performs voice activity detection on them, and determines whether each frame contains speech, yielding a voice activity detection result;
a signal-to-noise ratio calculation module, which calculates the input signal-to-noise ratio of each frame according to the voice activity detection result;
a tree topology pruning module, which prunes the topology formed by the nodes of the wireless acoustic sensor network into a tree topology;
a data-driven comparison module, which performs a data-driven comparison of the input signal-to-noise ratios computed by the nodes of the tree topology to obtain the maximum input signal-to-noise ratio;
a data compression module, which receives the discrete spectrum signals from the discrete Fourier transform module and compresses the data of the root node's neighbor nodes with compression vectors to obtain compressed data;
a root node operation module, which receives the maximum input signal-to-noise ratio from the data-driven comparison module and sums the compressed data from the data compression module with the root node's own compressed data to obtain the noise-cancelled speech signal;
a covariance matrix estimation module, which, after the root node has constructed its local signal, receives the detection result from the voice activity detection module and calculates the noise covariance matrix and the speech covariance matrix;
a filter updating module, which receives the covariance matrices from the covariance matrix estimation module and updates the filter of the root node;
a result transmission module, which receives the maximum input signal-to-noise ratio and the noise-cancelled speech signal from the root node operation module and propagates both to every node in the direction away from the root node;
a root node updating module, which compares the maximum input signal-to-noise ratio propagated to each node by the result transmission module with that node's own input signal-to-noise ratio, and makes the node whose input signal-to-noise ratio equals the maximum the root node of the next iteration; and
an inverse discrete Fourier transform module, which receives the noise-cancelled speech signal from the result transmission module, applies an inverse discrete Fourier transform to obtain the time-domain output speech signal of the current frame, and overlap-adds the time-domain output frames to obtain the final output signal.
Further, the data-driven comparison module compares the input signal-to-noise ratios as follows:
The node with the maximum input signal-to-noise ratio is designated the root node $r_i$.
Any non-root node $o_i$ with only one neighbor sends its input signal-to-noise ratio to that neighbor. A non-root node $p_i$ with more than one neighbor compares its own input signal-to-noise ratio with those received from its neighbors, finds the maximum, and sends it to its neighbor $f_t$ on the path to the root node. The maximum input signal-to-noise ratio sent is:
$$\mathrm{iSNR}_{p_i \to f_t} = \max\left\{\mathrm{iSNR}_{p_i}, \mathrm{iSNR}_{o_1^i \to p_i}, \dots, \mathrm{iSNR}_{o_T^i \to p_i}\right\}$$
where $o_r^i$ is an element of the neighbor set of node $p_i$, T is the number of elements in the set, and $r \in \{1, 2, \dots, T\}$. This step is repeated until the data reach the root node.
The root node compares its own input signal-to-noise ratio with those sent by its neighbors:
$$\mathrm{iSNR}^i = \max\left\{\mathrm{iSNR}_{r_i}, \mathrm{iSNR}_{1 \to r_i}, \dots, \mathrm{iSNR}_{B \to r_i}\right\}$$
where B is the total number of neighbors of the root node; this yields the maximum input signal-to-noise ratio $\mathrm{iSNR}^i$.
Further, the filter updating module updates the filter using the following expression:
where $\beta \geq 0$ is an adjustment factor and $\mathbf{u}_n$ is an $(E_j + B)$-dimensional selection vector with a single element equal to 1 and all other elements 0, the 1 lying anywhere within the first $E_j$ entries. The root node sends the part of the filter corresponding to each neighbor, obtained from the above formula, to that neighbor, and the neighbor node updates its own filter:
With this technical scheme, the robust distributed speaker noise elimination system can be applied to any network topology: it prunes the topology into a tree and compares the input signal-to-noise ratios over the tree, which makes it robust to a moving speaker, since wherever the speaker is, the node with the maximum input signal-to-noise ratio can always be found and used for distributed speaker noise elimination. Because the node with the maximum input signal-to-noise ratio serves as the root node and only the root node's filter is updated, the amount of data exchanged between nodes is reduced, and the adjustment factor allows a large amount of noise to be removed.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments described in the present application, and other drawings may be obtained according to the drawings without inventive effort to those skilled in the art.
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is a schematic diagram of a wireless acoustic sensor network in accordance with the present invention;
FIG. 3 is a schematic diagram of a network topology according to the present invention;
FIG. 4 shows the PESQ values (white noise) after speech noise cancellation for each method for different input signal-to-noise ratios in an embodiment of the present invention;
FIG. 5 shows the PESQ values (babble noise) after speech noise cancellation for each method at different input signal-to-noise ratios in an embodiment of the present invention;
FIG. 6 shows the PESQ values (in-vehicle noise) after speech noise cancellation for each method under different input signal-to-noise ratios in an embodiment of the present invention.
Detailed Description
In order to make the technical scheme and advantages of the present invention more clear, the technical scheme in the embodiment of the present invention is clearly and completely described below with reference to the accompanying drawings in the embodiment of the present invention:
The robust distributed speaker noise cancellation system shown in FIG. 1 comprises a discrete Fourier transform module 1, a voice activity detection module 2, a signal-to-noise ratio calculation module 3, a tree topology pruning module 4, a data-driven comparison module 5, a data compression module 6, a root node operation module 7, a covariance matrix estimation module 8, a filter updating module 9, a result transmission module 10, a root node updating module 11 and an inverse discrete Fourier transform module 12.
The discrete Fourier transform module 1 frames and windows the signals of the J nodes in the wireless acoustic sensor network, applies a discrete Fourier transform to each frame to obtain discrete spectrum signals, and defines these as the node local signals.
Preferably, the discrete Fourier transform module 1 works as follows. The WASN has J nodes in total, and node j has $E_j$ microphones. Each channel signal $y_{j,e}(n)$ (the e-th channel of node j) is first framed and windowed, and each frame is then transformed with the discrete Fourier transform (DFT). In the verification, the sampling frequency is $f_s$ = 16 kHz, a Hanning window with 50% frame shift is used, and each frame has M = 320 samples (20 ms). The Hanning window is:
$$\omega(m) = 0.5 - 0.5\cos\left(\frac{2\pi m}{M}\right), \quad m = 0, 1, \dots, M-1 \qquad (1)$$
With this window, the windowed signal of frame l (l = 1, 2, ...) is:
$$y'_{j,e}(m, l) = y_{j,e}\left((l-1)\frac{M}{2} + m\right)\omega(m), \quad m = 0, 1, \dots, M-1 \qquad (2)$$
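Each windowed frame of each channel is then transformed with the DFT, giving the discrete spectrum:
$$Y_{j,e}(k, l) = \sum_{m=0}^{M-1} y'_{j,e}(m, l)\, e^{-\mathrm{j} 2\pi k m / M}, \quad k = 0, 1, \dots, M-1 \qquad (3)$$
where k is the frequency-bin index, l is the frame index, and $\mathrm{j} = \sqrt{-1}$.
The spectra $Y_{j,e}(k, l)$ of each node are stacked into the vector:
$$\mathbf{y}_j = \left[Y_{j,1}(k, l), Y_{j,2}(k, l), \dots, Y_{j,E_j}(k, l)\right]^T \qquad (4)$$
where the indices k and l are omitted on the left-hand side for convenience. In addition, $\mathbf{y}_j = \mathbf{x}_j + \mathbf{v}_j$, where $\mathbf{x}_j$ is the speech component and $\mathbf{v}_j$ is the noise component.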
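The framing, windowing and DFT steps above can be sketched with NumPy as follows. This is a minimal sketch, assuming each channel is a real NumPy array sampled at 16 kHz; the helper names `hann`, `stft` and `stack_node` are hypothetical. The window is written out from equation (1) instead of calling `np.hanning`, because `np.hanning` uses the symmetric M - 1 denominator rather than M.

```python
import numpy as np

FS = 16_000      # sampling rate used in the verification (Hz)
M = 320          # frame length in samples (20 ms)
HOP = M // 2     # 50 % frame shift


def hann(M: int) -> np.ndarray:
    """Window of Eq. (1): w(m) = 0.5 - 0.5*cos(2*pi*m/M), m = 0..M-1."""
    m = np.arange(M)
    return 0.5 - 0.5 * np.cos(2 * np.pi * m / M)


def stft(y: np.ndarray, M: int = M, hop: int = HOP) -> np.ndarray:
    """Frame, window and DFT one channel; returns Y[k, l] (M bins x L frames)."""
    n_frames = 1 + (len(y) - M) // hop
    w = hann(M)
    frames = np.stack([y[l * hop: l * hop + M] * w for l in range(n_frames)], axis=1)
    return np.fft.fft(frames, axis=0)  # DFT along the sample axis


def stack_node(channels: list) -> np.ndarray:
    """Stack the E_j channel spectra of node j as result[e, k, l];
    result[:, k, l] is the vector y_j for bin k and frame l."""
    return np.stack([stft(c) for c in channels], axis=0)
```

For a 1 s recording at a 3-microphone node this yields an array of shape (3, 320, 99): 99 complete frames fit in the first second, which is also where the NIS = 99 used by the voice activity detector comes from.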
Further, the voice activity detection module 2 receives the discrete spectrum signals from the discrete Fourier transform module 1, performs voice activity detection on them, and determines whether each frame contains speech, yielding a voice activity detection result.
Preferably, the voice activity detection module 2 works as follows. Voice activity detection is applied separately to the discrete spectrum of each channel obtained by the discrete Fourier transform module 1. It exploits the fact that the first second of a recording is mostly non-speech: with the framing above, the initial non-speech segment is taken to be NIS frames, where NIS = $f_s/(0.5M) - 1$ = 16000/160 - 1 = 99, the number of complete frames within the first second. The average noise spectrum estimated from these NIS frames is:
Equation (5) sums the values of each frequency bin over these frames and then averages them. The log spectrum of the noise estimate is then:
where |·| denotes the modulus. Next, the log spectrum of each frame is calculated:
From equations (6) and (7), the log-spectral distance between each frame and the noise is obtained as:
In summary, the voice activity decision proceeds as follows. A no-speech counter is initialized to 100, and the log-spectral distance threshold is set to 3. For each frame, the log-spectral distance $d_{spec}$ between the frame and the noise is computed and compared with the threshold. If $d_{spec}$ is below the threshold, the frame is a non-speech frame and the counter is incremented by 1; otherwise the frame is a speech frame and the counter is reset to zero. Moreover, if the counter value just before a reset is smaller than the minimum non-speech length, which is set to 10, the frames classified as non-speech since the previous reset are reclassified as speech frames.
To reduce speech distortion in the verification, a frame is treated as a noise (non-speech) frame only if the voice activity detection result of every channel is a noise frame; otherwise it is treated as a speech frame.
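A sketch of the log-spectral-distance detector, its counter logic, and the all-channels rule described above. Equations (5) to (8) are not reproduced in this text, so the sketch assumes a common form: the noise spectrum is the mean magnitude over the first NIS frames, log spectra are 20·log10 of magnitudes, and d_spec is the mean over bins of the positive part of the frame-minus-noise log spectrum. The function names are hypothetical; only the thresholds (3, 10, initial counter 100, NIS = 99) come from the text.

```python
import numpy as np


def vad_log_spectral(Y, nis=99, threshold=3.0, min_noise_len=10, counter_init=100):
    """Per-frame speech decision for one channel spectrum Y[k, l].

    Assumed forms of Eqs. (5)-(8); the counter logic follows the text.
    Returns a boolean array that is True where the frame is judged to be speech.
    """
    mag = np.abs(Y) + 1e-12                                 # guard against log10(0)
    noise_log = 20 * np.log10(mag[:, :nis].mean(axis=1))    # noise from first NIS frames
    speech = np.zeros(Y.shape[1], dtype=bool)
    counter = counter_init                                  # no-speech counter
    for l in range(Y.shape[1]):
        frame_log = 20 * np.log10(mag[:, l])
        d_spec = np.maximum(frame_log - noise_log, 0.0).mean()
        if d_spec < threshold:          # non-speech frame
            counter += 1
        else:                           # speech frame
            speech[l] = True
            if 0 < counter < min_noise_len:
                # pause shorter than the minimum non-speech length: relabel as speech
                speech[max(0, l - counter):l] = True
            counter = 0
    return speech


def frame_is_speech(per_channel):
    """Combine channels: a frame is noise only if every channel says noise."""
    return np.any(np.stack(per_channel), axis=0)
```

Because the counter starts at 100, the initial noise-only segment can never be relabeled; only short pauses between speech frames are absorbed into speech.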
The signal-to-noise ratio calculation module 3 calculates the input signal-to-noise ratio of each frame based on the voice activity detection result obtained by the voice activity detection module 2.
Preferably, the signal-to-noise ratio calculation module 3 works as follows: on non-speech frames the noise power is estimated by statistical averaging, and when a speech frame occurs, the input signal-to-noise ratio of that frame is calculated as:
where ‖·‖ denotes the two-norm and E[·] denotes the statistical average up to the current (l-th) frame.
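The SNR formula itself is not reproduced in this text, so the following is only a sketch under stated assumptions: the frame energy is ‖y_j‖² summed over all bins, the noise power is the running mean of that energy over the non-speech frames seen so far, and the input SNR is (frame energy - noise power) / noise power in dB. The function name `input_snr_track` is hypothetical.

```python
import numpy as np


def input_snr_track(Yj: np.ndarray, speech: np.ndarray) -> np.ndarray:
    """Per-frame input SNR (dB) of node j from its stacked spectrum Yj[e, k, l].

    Assumption: noise power = running mean of frame energy over noise frames
    so far; SNR = (frame energy - noise power) / noise power.
    Noise frames, and speech frames seen before any noise frame, get NaN.
    """
    energy = np.sum(np.abs(Yj) ** 2, axis=(0, 1))   # ||y_j||^2 summed over bins
    snr = np.full(energy.shape, np.nan)
    noise_sum, noise_count = 0.0, 0
    for l, e in enumerate(energy):
        if not speech[l]:
            noise_sum += e                          # statistical average of noise power
            noise_count += 1
        elif noise_count > 0:
            noise_pow = noise_sum / noise_count
            snr[l] = 10 * np.log10(max(e - noise_pow, 1e-12) / noise_pow)
    return snr
```

Leaving noise frames as NaN mirrors the text's remark that the initial (non-speech) part of the recording yields no input SNR.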
The tree topology pruning module 4 prunes the topology formed by the nodes of the wireless acoustic sensor network into a tree topology.
Preferably, the tree topology pruning module 4 works as follows. For two nodes with coordinates $(x_1, y_1)$ and $(x_2, y_2)$, their Euclidean distance d is:
$$d = \sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2} \qquad (10)$$
In this way, the Euclidean distance between every pair of nodes is obtained. Next, an arbitrary node is placed in the node set S, which is initially empty. Among the unselected nodes that are connected to a node in S, the one with the smallest Euclidean distance to a node in S is added to S, and this step is repeated until all nodes have been selected. This procedure is Prim's algorithm, and the links used to add the nodes form a minimum spanning tree of the network.
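The pruning step can be sketched as Prim's algorithm with a priority queue. This is a minimal sketch, assuming node positions are given as 2-D coordinates and the existing wireless links as unordered node pairs; the function name `prune_to_tree` is hypothetical.

```python
import heapq
import math


def prune_to_tree(coords: dict, links: set, start: int) -> list:
    """Grow set S from `start`, each time adding the unselected node joined to S
    by an existing link of smallest Euclidean length; returns the tree edges."""
    def dist(a: int, b: int) -> float:
        (x1, y1), (x2, y2) = coords[a], coords[b]
        return math.hypot(x1 - x2, y1 - y2)

    neighbours = {n: [] for n in coords}
    for link in links:                  # links are frozenset({a, b})
        a, b = tuple(link)
        neighbours[a].append(b)
        neighbours[b].append(a)

    in_s = {start}
    heap = [(dist(start, n), start, n) for n in neighbours[start]]
    heapq.heapify(heap)
    tree = []
    while heap and len(in_s) < len(coords):
        _, a, b = heapq.heappop(heap)
        if b in in_s:
            continue                    # stale entry: b was already selected
        in_s.add(b)
        tree.append((a, b))
        for n in neighbours[b]:
            if n not in in_s:
                heapq.heappush(heap, (dist(b, n), b, n))
    return tree
```

On a unit square whose corners are also joined by one diagonal, the sketch keeps three unit-length sides and drops the diagonal. Only existing links are considered, matching the text's requirement that a new node be connected to a node already in S.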
The data-driven comparison module 5 performs a data-driven comparison of the input signal-to-noise ratios calculated by the nodes of the tree topology formed by the tree topology pruning module 4, and finally obtains the maximum input signal-to-noise ratio.
Preferably, since the tree topology can be regenerated at every iteration, an iteration index i is added to the node notation. The data-driven comparison over the tree topology proceeds in the following steps:
First, the node with the maximum input signal-to-noise ratio is designated the root node $r_i$. Because the beginning of the recording is generally non-speech and no input signal-to-noise ratio can be calculated there, the initial root node is an arbitrary node.
Second, any non-root node $o_i$ with only one neighbor sends its input signal-to-noise ratio (an output of the signal-to-noise ratio calculation module 3) to that neighbor, i.e. towards the root node. A non-root node $p_i$ with more than one neighbor then takes the maximum of its own input signal-to-noise ratio and those received from all of its neighbors other than the neighbor $f_t$ that lies towards the root node, and sends that maximum to $f_t$:
$$\mathrm{iSNR}_{p_i \to f_t} = \max\left\{\mathrm{iSNR}_{p_i}, \mathrm{iSNR}_{o_1^i \to p_i}, \dots, \mathrm{iSNR}_{o_T^i \to p_i}\right\} \qquad (11)$$
where $o_r^i$ is an element of the neighbor set of node $p_i$ (excluding $f_t$), T is the number of elements in this set, and $r \in \{1, 2, \dots, T\}$.
Third, the second step is repeated until the data reach the root node.
Fourth, the root node compares its own input signal-to-noise ratio with those sent by its neighbors:
$$\mathrm{iSNR}^i = \max\left\{\mathrm{iSNR}_{r_i}, \mathrm{iSNR}_{1 \to r_i}, \dots, \mathrm{iSNR}_{B \to r_i}\right\} \qquad (12)$$
where B is the total number of neighbors of the root node and $\mathrm{iSNR}_{b \to r_i}$ is the value sent by its b-th neighbor; this yields the maximum input signal-to-noise ratio $\mathrm{iSNR}^i$.
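The four steps amount to a max-convergecast over the tree. The recursive sketch below computes, for every node, the value it would forward to its neighbour towards the root, assuming the tree is given as an edge list and each node's input SNR as a dict; the function name is hypothetical and message passing is simulated by recursion rather than real transmissions.

```python
def max_snr_to_root(tree_edges, snr, root):
    """Data-driven comparison: each node forwards max(own SNR, values from the
    neighbours farther from the root) towards the root; the root does the last max."""
    adj = {n: [] for n in snr}
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)

    def upward(node, parent):
        """Value that `node` sends to `parent` (its neighbour f_t towards the root)."""
        received = [upward(child, node) for child in adj[node] if child != parent]
        return max([snr[node]] + received)   # leaf nodes just send their own SNR

    return upward(root, None)                # at the root: the final comparison
```

A leaf has no other neighbours, so it simply sends its own value, which is the single-neighbour case in the second step. Each tree edge carries exactly one scalar per frame.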
The data compression module 6 receives the data from the discrete Fourier transform module 1 and compresses the data of the root node's neighbor nodes with compression vectors to obtain compressed data.
Preferably, the data compression module 6 works as follows. Only the data $\mathbf{y}_j$ (an output of the discrete Fourier transform module 1) of the neighbor nodes of the root node are compressed:
$$z_j^i = (\mathbf{w}_j^i)^H \mathbf{y}_j \qquad (13)$$
where $z_j^i$ is the compressed scalar and $\mathbf{w}_j^i$ is the compression vector, which is also the part of the centralized filter corresponding to this node's data; $(\cdot)^H$ denotes the conjugate transpose of a vector or matrix. The compression vector must be initialized; in the verification, its elements are initialized to random numbers uniformly distributed on the unit interval.
Note that every index i above is an iteration index, and the data of the i-th iteration can be regarded as the data of the i-th frame. In the verification, the iteration index starts at 1, i.e. processing starts from the 1st frame.
The root node operation module 7 receives the maximum input signal-to-noise ratio obtained by the data-driven comparison module 5 and sums the compressed data from the data compression module 6 with the root node's own compressed data to obtain the noise-cancelled speech signal.
Preferably, the root node operation module 7 works as follows: it receives the maximum input signal-to-noise ratio $\mathrm{iSNR}^i$ from the data-driven comparison module 5 and sums the compressed data from the data compression module 6 with the root node's own compressed data to obtain the noise-cancelled speech signal:
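Per frequency bin, compression and the root-node sum can be sketched as below. This is a minimal sketch that follows the prose literally (the root's own compressed value plus its neighbours' compressed values); the exact summation formula is not reproduced in this text, and the function names are hypothetical. `np.vdot` conjugates its first argument, so it computes w^H y directly.

```python
import numpy as np


def compress(w_j, y_j):
    """z_j = w_j^H y_j for one frequency bin: E_j microphone values -> one scalar."""
    return np.vdot(w_j, y_j)            # np.vdot conjugates its first argument


def root_output(z_root, z_neighbours):
    """Literal reading of the text: root's own compressed value plus the neighbours'."""
    return z_root + sum(z_neighbours)


# Compression vectors initialised with uniform random numbers on [0, 1), as in the text
rng = np.random.default_rng(1)
w_init = rng.uniform(0.0, 1.0, size=3)
```

Each neighbour therefore transmits one complex number per bin and frame instead of its E_j microphone values, which is where the reduction in exchanged data comes from.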
After the root node has constructed its local signal, the covariance matrix estimation module 8 calculates the noise covariance matrix and the speech covariance matrix according to the detection result from the voice activity detection module 2.
Preferably, the covariance matrix estimation module 8 works as follows. The root node constructs its local signal from its own data and the compressed data of its neighbors output by the data compression module 6:
$$\tilde{\mathbf{y}}_{r_i} = \left[\mathbf{y}_{r_i}^T, z_1^i, \dots, z_B^i\right]^T \qquad (15)$$
which is a column vector of dimension $(E_j + B)$. This signal can also be written as:
$$\tilde{\mathbf{y}}_{r_i} = \tilde{\mathbf{x}}_{r_i} + \tilde{\mathbf{v}}_{r_i} \qquad (16)$$
The other, non-root nodes perform the same operation to construct their own local signals. Unlike the root node, however, a non-root node does not receive compressed data from its neighbors; it simply fills the entries corresponding to its neighbors with zeros.
Next, the noise covariance matrix and the speech covariance matrix are estimated at each node j: the noise covariance matrix is updated only in non-speech frames, and the speech covariance matrix in the remaining frames. The noise covariance matrix of node j at the current (l-th) frame is updated as:
$$\mathbf{R}_{vv,j}(k, l) = \alpha\,\mathbf{R}_{vv,j}(k, l-1) + (1-\alpha)\,\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^H \qquad (17)$$
where α = 0.997 and $\mathbf{R}_{vv,j}(k, l-1)$ is the noise covariance estimate of the previous frame at node j. The noise covariance matrix has one estimate per frequency bin, which is updated as above whenever the current frame is a noise frame. When a non-noise frame occurs, the noisy-speech covariance matrix is updated instead:
$$\mathbf{R}_{yy,j}(k, l) = \alpha\,\mathbf{R}_{yy,j}(k, l-1) + (1-\alpha)\,\tilde{\mathbf{y}}_j\tilde{\mathbf{y}}_j^H \qquad (18)$$
The speech covariance matrix then follows from equations (17) and (18):
$$\mathbf{R}_{xx,j}(k, l) = \mathbf{R}_{yy,j}(k, l) - \mathbf{R}_{vv,j}(k, l) \qquad (19)$$
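The per-bin recursive covariance update can be sketched as follows. This is a sketch under one assumption: the same forgetting factor α = 0.997 is used for both the noise and the noisy-speech matrices, while the text states the value only once. The function name is hypothetical.

```python
import numpy as np

ALPHA = 0.997   # forgetting factor from the text


def update_covariances(R_vv, R_yy, y_tilde, is_speech, alpha=ALPHA):
    """One frame, one frequency bin: update R_vv on noise frames or R_yy on
    speech frames by recursive averaging, then return R_xx = R_yy - R_vv."""
    outer = np.outer(y_tilde, y_tilde.conj())       # y y^H
    if is_speech:
        R_yy = alpha * R_yy + (1 - alpha) * outer
    else:
        R_vv = alpha * R_vv + (1 - alpha) * outer
    return R_vv, R_yy, R_yy - R_vv
```

With α close to 1, each estimate effectively averages over the last few hundred frames of its own type, which keeps the matrices smooth while still tracking a moving speaker.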
The filter updating module 9 updates the filter of the root node according to the covariance matrices estimated by the covariance matrix estimation module 8.
Preferably, the filter updating module 9 works as follows: the filter of the root node is updated from the covariance matrices estimated by the covariance matrix estimation module 8 according to:
where $\beta \geq 0$ is an adjustment factor: the larger it is, the stronger the noise cancellation, but the larger the resulting speech distortion. $\mathbf{u}_n$ is an $(E_j + B)$-dimensional selection vector with a single element equal to 1 and all other elements 0, the 1 lying anywhere within the first $E_j$ entries. The root node then sends the part of the filter from equation (20) that corresponds to each neighbor to that neighbor, which updates its own filter:
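Equation (20) itself is not reproduced in this text. The description of β (larger values remove more noise at the cost of more speech distortion) and of the selection vector u_n matches the speech-distortion-weighted multichannel Wiener filter, w = (R_xx + β·R_vv)⁻¹ R_xx u_n, so the sketch below assumes that form; it is an illustration, not the patent's confirmed expression. How a neighbour folds its received part into its own filter is not sketched. Function names are hypothetical.

```python
import numpy as np


def sdw_mwf(R_xx, R_vv, beta, ref):
    """Assumed form of Eq. (20): w = (R_xx + beta * R_vv)^{-1} R_xx u_ref."""
    u = np.zeros(R_xx.shape[0], dtype=R_xx.dtype)
    u[ref] = 1.0                                   # selection vector u_n
    return np.linalg.solve(R_xx + beta * R_vv, R_xx @ u)


def split_filter(w, E_root):
    """First E_j taps act on the root's own microphones; the remaining B taps
    weight the neighbours' compressed signals and are sent back to them."""
    return w[:E_root], w[E_root:]
```

Under this assumed form, β = 0 returns the selection vector itself (no noise reduction), β = 1 gives the standard multichannel Wiener filter, and larger β shrinks the filter further towards suppressing noise.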
The result transmission module 10 receives the maximum input signal-to-noise ratio and the noise-cancelled speech signal from the root node operation module 7 and propagates both to every node in the direction away from the root node.
Preferably, the result transmission module 10 works as follows: the maximum input signal-to-noise ratio $\mathrm{iSNR}^i$ obtained by the root node operation module 7 and the noise-cancelled speech signal $d_{RD}^i$ are broadcast to every node in the direction away from the root node.
The root node updating module 11 compares the maximum input signal-to-noise ratio propagated to each node by the result transmission module 10 with that node's own input signal-to-noise ratio, and the node whose input signal-to-noise ratio equals the maximum becomes the root node of the next iteration.
Preferably, the root node updating module 11 works as follows: each node compares the maximum input signal-to-noise ratio $\mathrm{iSNR}^i$ propagated to it by the result transmission module 10 with its own input signal-to-noise ratio, and the node whose input signal-to-noise ratio equals $\mathrm{iSNR}^i$ becomes the root node of the next iteration.
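The broadcast and root update can be sketched as a breadth-first flood from the root followed by a local equality test at every node. Two assumptions are not in the text and are labeled in the code: ties are broken by the smallest node index, and the current root is kept if no node matches. The function name is hypothetical.

```python
from collections import deque


def broadcast_and_pick_root(tree_edges, snr, root, max_snr):
    """Flood max_snr away from the root; each node compares it with its own
    input SNR, and the matching node becomes the next root."""
    adj = {n: [] for n in snr}
    for a, b in tree_edges:
        adj[a].append(b)
        adj[b].append(a)
    received, queue = {root: max_snr}, deque([root])
    while queue:                                   # breadth-first, away from the root
        node = queue.popleft()
        for nxt in adj[node]:
            if nxt not in received:
                received[nxt] = received[node]     # forward the value unchanged
                queue.append(nxt)
    winners = [n for n in snr if snr[n] == received[n]]
    # Assumptions: smallest index wins a tie; keep the current root if none match
    return min(winners) if winners else root
```

Exact equality is safe here because the broadcast value is a copy of one node's own SNR, not a recomputed number.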
The inverse discrete Fourier transform module 12 receives the noise-cancelled speech signal from the result transfer module 10, applies the inverse discrete Fourier transform to it to obtain the time-domain output speech signal of the current frame, and overlap-adds the time-domain output speech signals to obtain the final output signal.
Preferably, the inverse discrete Fourier transform module 12 applies the inverse discrete Fourier transform (IDFT) to the noise-cancelled speech signal d_RD^i propagated to each node by the result transfer module 10, obtaining the noise-cancelled time-domain speech signal output for the current frame. The IDFT formula is as follows:
where i and l have the same meaning, i.e. the i-th iteration corresponds to the l-th frame; when both would appear together, the iteration index i is omitted below.
Because the discrete Fourier transform module 1 frames and windows each signal channel with a 50% frame shift, once the output speech signal of the l-th frame is obtained, it is overlap-added with the output speech signal of the previous ((l−1)-th) frame, the overlapping part being 50% of a frame. The specific formula is as follows:
where ⌊·⌋ is the rounding-down (floor) operation, i.e. ⌊a⌋ denotes the largest integer not exceeding a.
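A minimal NumPy sketch of IDFT synthesis with a 50% frame shift follows. It assumes full-length N-point DFT frames and an analysis window whose half-shifted copies sum to one (for example a periodic Hann window), with no synthesis window; this is an illustration of the overlap-add step, not necessarily the exact formula above, and `overlap_add_50` is a hypothetical name.

```python
import numpy as np


def overlap_add_50(spectra):
    """Rebuild a time signal from per-frame DFTs with a 50% frame shift.

    spectra: array (L, N) holding the N-point DFT of each windowed frame.
    Assumes analysis windows that sum to one at 50% overlap (e.g. a
    periodic Hann window); no synthesis window is applied.
    """
    spectra = np.asarray(spectra)
    num_frames, n = spectra.shape
    hop = n // 2                                  # floor(N / 2)
    out = np.zeros(hop * (num_frames - 1) + n)
    for l in range(num_frames):
        frame = np.real(np.fft.ifft(spectra[l]))  # time-domain frame l
        out[l * hop:l * hop + n] += frame         # overlap-add with frame l-1
    return out
```

Accumulating frame l at offset l·⌊N/2⌋ is equivalent to adding its first half to the second half of frame l−1; a streaming implementation would instead emit ⌊N/2⌋ finished samples per frame.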
To verify the effectiveness of the method, a 7 m × 3 m closed room with a reverberation time of 300 ms was simulated using the image model. Ten nodes are randomly distributed in the room; each node is a linear array of 3 microphones with 8 cm spacing, placed at a height of 1 m. A female speaker at a height of 1.7 m moves from (3 m, 1.4 m) to (0.7 m, 5.8 m) along a trajectory consisting of a curved segment and a straight segment. The simulated two-dimensional WASN is shown in Fig. 2: each dashed ellipse represents one node, the 3 microphones of each node are shown as solid dots, and the coordinates of the two turning points of the speaker trajectory are marked.
The speech is a 6-second clean speech signal randomly selected from the TIMIT database [https://download.csdn.net/download/sdhyfxh/4086482]. The noise is white noise, babble noise or in-car noise, and both speech and noise are sampled at 16 kHz. Fig. 3 shows the network topology of the WASN: the original topology consists of all black lines (bold and non-bold), and the bold solid black lines show the tree topology obtained by pruning it.
The robust distributed parametric multichannel Wiener filter (RD-PMWF) proposed in this patent is then used to denoise the signals received by each node, and the methods of references [1] and [2] are applied in the same experiment for comparison. Figs. 4, 5 and 6 compare the performance of the methods under the three background noises. Fig. 4 compares the Perceptual Evaluation of Speech Quality (PESQ) scores at different input signal-to-noise ratios with white noise as background noise, with the adjustment factor β of the proposed distributed denoising method set to 15 and 20, respectively. Figs. 5 and 6 differ from Fig. 4 only in the background noise; all other experimental conditions are identical. The results show that the method of reference [1] does not improve on the unprocessed signal under any background noise, and the method of reference [2] has some denoising capability, but the proposed method denoises more effectively when the speaker is moving.
The foregoing is only a preferred embodiment of the present invention, but the scope of protection of the present invention is not limited thereto; any equivalent substitution or modification made by a person skilled in the art within the technical scope disclosed by the present invention, according to the technical solution and inventive concept of the present invention, shall be covered by the scope of protection of the present invention.
References:
[1] A. Bertrand and M. Moonen, "Distributed Adaptive Estimation of Node-Specific Signals in Wireless Sensor Networks With a Tree Topology," in IEEE Transactions on Signal Processing, vol. 59, no. 5, pp. 2196-2210, May 2011.
[2] J. Szurley, A. Bertrand and M. Moonen, "Topology-Independent Distributed Adaptive Node-Specific Signal Estimation in Wireless Sensor Networks," in IEEE Transactions on Signal and Information Processing over Networks, vol. 3, no. 1, pp. 130-144, March 2017.
[3] F. de la Hucha Arce, M. Moonen, M. Verhelst and A. Bertrand, "Adaptive Quantization for Multichannel Wiener Filter-Based Speech Enhancement in Wireless Acoustic Sensor Networks," in Wireless Communications and Mobile Computing, vol. 2017, Article ID 3173196, 15 pages, 2017.

Claims (3)

1. A robust distributed speaker noise cancellation system, comprising:
The discrete Fourier transform module is used for framing and windowing the signals of the J nodes in the wireless acoustic sensor network, applying the discrete Fourier transform to each frame to obtain discrete spectrum signals, and defining these discrete spectrum signals as the node local signals;
The voice activity detection module is used for receiving the discrete spectrum signals from the discrete Fourier transform module and performing voice activity detection on them, judging whether each frame contains speech, to obtain a voice activity detection result;
The signal-to-noise ratio calculation module is used for calculating the input signal-to-noise ratio of each frame according to the voice activity detection result obtained by the voice activity detection module;
The topology formed by the nodes in the wireless sensor network is pruned into a tree: the Euclidean distance between every two nodes is obtained, one node is randomly selected and placed in a node set S, then, among the unselected nodes connected to nodes in S, the node with the smallest Euclidean distance to a node in S is placed in S, and this step is repeated until all nodes have been selected;
The data-driven comparison module is used for performing a data-driven comparison of the input signal-to-noise ratios of the nodes in the tree topology to obtain the maximum input signal-to-noise ratio, wherein the tree topology is regenerated after each iteration and an iteration index is added to the notation of each node;
The data compression module is used for receiving the discrete spectrum signals from the discrete Fourier transform module and compressing the data of the neighbor nodes of the root node by means of compression vectors to obtain compressed data;
The root node operation module receives the maximum input signal-to-noise ratio from the data-driven comparison module, and sums the compressed data from the data compression module with the compressed data of the root node to obtain the noise-cancelled speech signal;
The covariance matrix estimation module is used for receiving the detection result from the voice activity detection module after the root node has built its local signal, and calculating the noise covariance matrix and the speech covariance matrix respectively;
The filter updating module is used for receiving the covariance matrices from the covariance matrix estimation module and updating the filter of the root node;
The result transmission module receives the maximum input signal-to-noise ratio and the noise-cancelled speech signal from the root node operation module, and transmits both to each node in the direction away from the root node;
The root node updating module is used for comparing the maximum input signal-to-noise ratio transmitted to each node by the result transmission module with that node's input signal-to-noise ratio, and making the node whose input signal-to-noise ratio equals the maximum the root node of the next iteration;
The inverse discrete Fourier transform module receives the noise-cancelled speech signal from the result transmission module, applies the inverse discrete Fourier transform to it to obtain the time-domain output speech signal of the current frame, and overlap-adds the time-domain output speech signals to obtain the final output signal.
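The topology-pruning step of claim 1 (start from one node in S, then repeatedly add the unselected connected node that is closest to S) is, in effect, Prim's minimum-spanning-tree algorithm restricted to the existing links with Euclidean distances as edge weights. A minimal sketch under that reading follows; the input format and the name `prune_to_tree` are hypothetical.

```python
import heapq
import math


def prune_to_tree(positions, links, start):
    """Grow a spanning tree over the network's existing links (Prim-style sketch).

    positions: dict node -> (x, y) coordinates (hypothetical format)
    links:     iterable of (a, b) node pairs that can communicate directly
    start:     the initially selected node (chosen randomly in the claim)
    Returns the tree as a list of (parent, child) edges in selection order.
    """
    neighbors = {n: [] for n in positions}
    for a, b in links:
        neighbors[a].append(b)
        neighbors[b].append(a)
    selected = {start}
    heap = [(math.dist(positions[start], positions[nb]), start, nb)
            for nb in neighbors[start]]
    heapq.heapify(heap)
    tree = []
    while heap and len(selected) < len(positions):
        _, parent, node = heapq.heappop(heap)  # closest candidate link to S
        if node in selected:
            continue
        selected.add(node)
        tree.append((parent, node))
        for nb in neighbors[node]:
            if nb not in selected:
                heapq.heappush(heap, (math.dist(positions[node], positions[nb]), node, nb))
    return tree
```

If the original network is not connected, the returned tree covers only the component containing `start`.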
2. The system according to claim 1, wherein the data-driven comparison module performs the data-driven comparison of the input signal-to-noise ratios as follows:
Let the node with the maximum input signal-to-noise ratio be the root node r^i;
Any non-root node o^i that has only one neighbor node sends its input signal-to-noise ratio to that neighbor node; a non-root node p^i that has more than one neighbor node compares the input signal-to-noise ratios sent to it by its neighbor nodes o^i, finds the maximum and sends it to its neighbor node f^i; the transmitted maximum input signal-to-noise ratio is expressed as follows:
where o_r^i is an element of the neighbor set of node p^i, T is the number of elements in the set, and r ∈ {1, 2, ..., T}; this step is repeated until the data reach the root node;
The root node compares its own input signal-to-noise ratio with the input signal-to-noise ratios sent by its neighbor nodes:
where B is the total number of neighbors of the root node; this finally yields the maximum input signal-to-noise ratio iSNR^i.
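The message passing of claim 2 can be emulated in a single process with a recursive traversal in which each call computes the value a node sends toward the root. This is a sketch: the adjacency-dictionary representation and the name `convergecast_max` are assumptions, and it assumes that an inner node includes its own input SNR when taking the maximum, which is needed for the root to see every node's value.

```python
def convergecast_max(adjacency, isnr, root):
    """Pass the largest input SNR toward the root over a tree (sketch).

    adjacency: dict node -> list of tree neighbours (hypothetical format)
    isnr:      dict node -> input SNR of that node
    """
    def upstream(node, parent):
        # Value that `node` sends to `parent`: a leaf sends its own iSNR,
        # an inner node sends max(own iSNR, values received from the other side).
        values = [isnr[node]]
        for nb in adjacency[node]:
            if nb != parent:
                values.append(upstream(nb, node))
        return max(values)

    received = [upstream(nb, root) for nb in adjacency[root]]  # the B messages
    return max([isnr[root]] + received)
```

In the network the messages flow leaves-first and asynchronously; the recursion only reproduces the same data dependencies.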
3. The system according to claim 1, wherein the filter updating module updates the filter using the following expression:
where β ≥ 0 is an adjustment factor and u_n is an (E_j + B)-dimensional selection vector with exactly one element equal to 1 and all other elements equal to 0, the 1 being at any of the first E_j positions; the root node sends the part of the filter obtained from the above formula that corresponds to its neighbors to those neighbor nodes, and the neighbor nodes of the root node update their own filters:

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210329198.2A CN114724571B (en) 2022-03-29 2022-03-29 Robust distributed speaker noise elimination system

Publications (2)

Publication Number Publication Date
CN114724571A (en) 2022-07-08
CN114724571B (en) 2024-05-03

Family

ID=82239147

Country Status (1)

Country Link
CN (1) CN114724571B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1141548A (en) * 1995-02-17 1997-01-29 索尼公司 Method and apparatus for reducing noise in speech signal
EP1585112A1 (en) * 2004-03-30 2005-10-12 Dialog Semiconductor GmbH Delay free noise suppression
WO2015189261A1 (en) * 2014-06-13 2015-12-17 Retune DSP ApS Multi-band noise reduction system and methodology for digital audio signals
CN105389491A (en) * 2014-08-28 2016-03-09 凯文·艾伦·杜西 Facial recognition authentication system including path parameters
CN106973412A (en) * 2017-01-18 2017-07-21 南京航空航天大学 The distributed compression repeater system and design method of many junction networks under Gaussian source
WO2018086444A1 (en) * 2016-11-10 2018-05-17 电信科学技术研究院 Method for estimating signal-to-noise ratio for noise suppression, and user terminal
CN110739004A (en) * 2019-10-25 2020-01-31 大连理工大学 distributed voice noise elimination system for WASN
EP3739356A1 (en) * 2019-05-12 2020-11-18 Origin Wireless, Inc. Method, apparatus, and system for wireless tracking, scanning and monitoring
CN113763984A (en) * 2021-09-23 2021-12-07 大连理工大学 Parameterized noise elimination system for distributed multiple speakers

Similar Documents

Publication Publication Date Title
CN110867181B (en) Multi-target speech enhancement method based on SCNN and TCNN joint estimation
CN110600050B (en) Microphone array voice enhancement method and system based on deep neural network
Buchner et al. TRINICON: A versatile framework for multichannel blind signal processing
CN109727604A (en) Frequency domain echo cancel method and computer storage media for speech recognition front-ends
US8848933B2 (en) Signal enhancement device, method thereof, program, and recording medium
CN112581973B (en) Voice enhancement method and system
CN112735456B (en) Speech enhancement method based on DNN-CLSTM network
CN107845389A (en) A kind of sound enhancement method based on multiresolution sense of hearing cepstrum coefficient and depth convolutional neural networks
US20080208538A1 (en) Systems, methods, and apparatus for signal separation
Zhao et al. Late reverberation suppression using recurrent neural networks with long short-term memory
CN104835503A (en) Improved GSC self-adaptive speech enhancement method
CN105280193A (en) Prior signal-to-noise ratio estimating method based on MMSE error criterion
Geng et al. End-to-end speech enhancement based on discrete cosine transform
CN110739004B (en) Distributed voice noise elimination system for WASN
CN115424627A (en) Voice enhancement hybrid processing method based on convolution cycle network and WPE algorithm
Fu et al. Boosting objective scores of a speech enhancement model by metricgan post-processing
CN112530451A (en) Speech enhancement method based on denoising autoencoder
Selvi et al. Hybridization of spectral filtering with particle swarm optimization for speech signal enhancement
CN114724571B (en) Robust distributed speaker noise elimination system
CN113763984B (en) Parameterized noise elimination system for distributed multi-speaker
Yamashita et al. Improved spectral subtraction utilizing iterative processing
CN114724574A (en) Double-microphone noise reduction method with adjustable expected sound source direction
Schwartz et al. RNN-based step-size estimation for the RLS algorithm with application to acoustic echo cancellation
Srinivasarao An efficient recurrent Rats function network (Rrfn) based speech enhancement through noise reduction
Boyko et al. Using recurrent neural network to noise absorption from audio files.

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant