CN114898766A - Distributed optical fiber voice enhancement method based on GAN network and tunnel rescue system - Google Patents


Info

Publication number
CN114898766A
Authority
CN
China
Prior art keywords
DAS, voice, GAN network, voice signals, optical fiber
Prior art date
Legal status
Pending
Application number
CN202210812753.7A
Other languages
Chinese (zh)
Inventor
盛鹏
罗煜
何子牛
王茂宁
钟羽中
张晨思
Current Assignee
Sichuan Expressway Construction And Development Group Co ltd
Original Assignee
Sichuan Expressway Construction And Development Group Co ltd
Application filed by Sichuan Expressway Construction And Development Group Co ltd
Priority to CN202210812753.7A
Publication of CN114898766A

Classifications

    • G10L21/0232 Speech enhancement, e.g. noise reduction or echo cancellation; noise filtering characterised by the method used for estimating noise; processing in the frequency domain
    • G06N20/00 Machine learning
    • G10L19/022 Speech or audio analysis-synthesis using spectral analysis; blocking, i.e. grouping of samples in time; choice of analysis windows; overlap factoring
    • G10L19/26 Speech or audio coding using predictive techniques; pre-filtering or post-filtering
    • G10L25/18 Speech or voice analysis; extracted parameters being spectral information of each sub-band
    • G10L25/27 Speech or voice analysis characterised by the analysis technique
    • H04B10/071 Arrangements for monitoring or testing optical transmission systems using a reflected signal, e.g. optical time domain reflectometers [OTDR]


Abstract

The invention discloses a distributed optical fiber voice enhancement method based on a GAN network, and a tunnel rescue system. The distributed optical fiber voice enhancement method based on the GAN network comprises the following steps: collecting DAS voice signals; obtaining a pure voice signal; preprocessing the acquired DAS voice signals; converting the preprocessed DAS voice signals into a Mel frequency spectrum characteristic diagram; and inputting the Mel frequency spectrum characteristic diagram and the pure voice signal into a pre-constructed GAN network. By restructuring the existing GAN network, the distributed optical fiber voice enhancement method of the invention introduces the concept of multiple frequency bands and enlarges the receptive field; voice processing with the improved GAN network then effectively enhances the optical fiber voice signal and improves its high-frequency components.

Description

Distributed optical fiber voice enhancement method based on GAN network and tunnel rescue system
Technical Field
The invention belongs to the technical field of voice enhancement, and particularly relates to a distributed optical fiber voice enhancement method based on a GAN network and a tunnel rescue system.
Background
Existing monitoring technologies for highway tunnel abnormal events (dangerous events such as traffic accidents, fires and collapses inside tunnels, and emergencies such as rescue calls) include strain sensors, inclination sensors, temperature sensors, laser, video and other means. However, these are all point-type monitoring methods that cannot realize linear long-distance monitoring; in particular, when a tunnel accident causes power failure and network disconnection, accident information and rescue calls cannot be transmitted to the outside.
In a tunnel-collapse scenario, it is extremely difficult for a rescue team to find trapped persons manually. Capturing and recognizing the calls for help of trapped persons, and then locating them from those sounds, would greatly help rescue work and save a great deal of rescue time. Voice enhancement is a common technique for capturing and recognizing the distress sounds of trapped persons.
Classical speech enhancement methods include spectral subtraction, Wiener filtering, statistical-model-based methods and subspace algorithms. Since the 1980s, neural networks have also been applied to speech enhancement; in recent years, the denoising auto-encoder structure has been widely adopted, and recurrent neural networks (RNNs) are also widely used. For example, recursive denoising auto-encoders exhibit significant performance in exploiting the temporal context information embedded in signals. Most current systems are based on a short-time Fourier analysis/synthesis framework and modify only the spectral magnitude, since it is often claimed that the short-time phase is unimportant for speech enhancement. However, further studies have shown that significant improvements in speech quality are possible, particularly when a clean phase spectrum is known. In 1988, Tamura et al. proposed a deep network that works directly on the raw audio waveform, but they used feed-forward layers operating frame by frame (60 samples) on a speaker-dependent, isolated-word database.
With the rise of neural networks in recent years, they have been widely applied to speech enhancement tasks thanks to their excellent feature extraction and data fitting capabilities, and neural-network-based methods have improved greatly over conventional methods. Among these, DNN-based methods mainly process speech in the frequency domain: a short-time spectrum is obtained by the short-time Fourier transform and then processed; the fitting capacity of the DNN is used to learn a mapping function from the noisy signal to the clean signal; the first 6 frames of each audio clip are added to training as the noise estimate; and finally the enhanced speech signal is reconstructed in post-processing using the phase of the noisy speech. Although this method improves the enhancement effect considerably compared with earlier techniques, its robustness in non-stationary noise environments is noticeably poor, because the noise estimate uses only the information of the first 6 frames.
Disclosure of Invention
The invention aims to overcome one or more defects in the prior art and provides a distributed optical fiber voice enhancement method based on a GAN network and a tunnel rescue system.
The purpose of the invention is realized by the following technical scheme:
according to a first aspect of the present invention, a distributed optical fiber voice enhancement method based on a GAN network is provided, comprising:
collecting DAS voice signals;
obtaining a pure voice signal;
preprocessing the acquired DAS voice signals;
converting the preprocessed DAS voice signals into a Mel frequency spectrum characteristic diagram;
and inputting the Mel frequency spectrum characteristic diagram and the pure voice signal into a pre-constructed GAN network.
Preferably, the preprocessing of the acquired DAS voice signals includes:
pre-emphasis the DAS voice signals;
framing the pre-emphasized DAS voice signals;
and windowing the DAS voice signals after the framing.
Preferably, the pre-emphasis conversion formula is:
y(t) = x(t) - α·x(t-1)
where α is the pre-emphasis coefficient, with a value between 0.9 and 1.0; x(t) is the audio amplitude at time t; and y(t) is the audio amplitude at time t after pre-emphasis.
Preferably, when the pre-emphasized DAS voice signal is framed, an overlapping region exists between two adjacent frames.
Preferably, the windowing processing is performed on the framed DAS voice signal, and includes:
each frame signal is multiplied by a hamming window, respectively.
Preferably, the converting the preprocessed DAS voice signals into mel-frequency spectrum feature maps includes:
performing fast Fourier transform on the preprocessed DAS voice signals to obtain energy spectrums of the DAS voice signals;
and filtering the energy spectrum of the DAS voice signal by using a Mel-scale triangular filter bank to obtain a Mel frequency spectrum characteristic diagram.
Preferably, the method for constructing the GAN network includes:
constructing a generator for realizing frequency domain to time domain up-sampling;
constructing a discriminator;
and combining the generator and the discriminator to form the complete GAN network.
Preferably, the GAN network includes a generator and at least one discriminator; the generator includes two transition layers and four upsampling layers, the upsampling layers lying between the two transition layers, with a residual dilated convolution block connected after each upsampling layer; and the discriminator includes three convolutional layers and four downsampling layers.
Preferably, the residual dilated convolution block is composed of four layers of dilated convolutions, whose dilation factors are 1, 3, 9 and 27, respectively.
According to a second aspect of the present invention, a GAN network-based tunnel rescue system is provided, which includes:
the first container is arranged at the signal acquisition point;
the second end of the optical cable is wound on the outer wall of the first container and is used for collecting DAS voice signals;
the input end of the optical fiber sensing equipment is connected with the first end of the optical cable; the optical fiber sensing equipment is used for acquiring pure voice signals, preprocessing the acquired DAS voice signals, converting the preprocessed DAS voice signals into Mel frequency spectrum characteristic diagrams, inputting the Mel frequency spectrum characteristic diagrams and the pure voice signals into a pre-constructed GAN network, and positioning signal acquisition points according to the DAS voice signals processed by the GAN network.
The invention has the beneficial effects that:
(1) the distributed optical fiber voice enhancement method of the invention introduces the concept of multi-band by reforming the existing GAN network, changes the receptive field, and then utilizes the improved GAN network to carry out voice processing, thereby effectively enhancing optical fiber voice signals and improving high-frequency signals;
(2) the tunnel rescue system provided by the invention is used for acquiring and transmitting emergency call signals based on optical fibers, has strong environment adaptability, and can work normally under the extreme conditions of network interruption, power failure and the like in a tunnel.
Drawings
Fig. 1 is a flowchart of an embodiment of a distributed optical fiber voice enhancement method based on a GAN network according to the present invention;
FIG. 2 is a block diagram of the components of one embodiment of a GAN network;
FIG. 3 is a block diagram of the components of one embodiment of a generator;
FIG. 4 is a block diagram of the components of one embodiment of an arbiter;
FIG. 5 is a time-space two-dimensional signal diagram of rockfall detection in a rockfall experiment from near to far;
FIG. 6 is a time-space two-dimensional signal diagram of rockfall detection in a rockfall experiment from far to near.
Detailed Description
The technical solutions of the present invention will be described clearly and completely with reference to the following embodiments, and it should be understood that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without inventive effort based on the embodiments of the present invention, are within the scope of the present invention.
Referring to fig. 1 to fig. 6, the present embodiment provides a distributed optical fiber voice enhancement method and a tunnel rescue system based on a GAN network:
one embodiment of the distributed optical fiber voice enhancement method based on the GAN network provided by the invention comprises the following steps: as shown in fig. 1, a distributed optical fiber voice enhancement method based on GAN network includes:
s100, DAS voice signals are collected, and pure voice signals are obtained.
Generally, the DAS voice signal is collected by a DAS (distributed optical fiber sensing system) detection device based on a Φ -OTDR. The clean speech signal is a speech signal without background noise.
In yet another embodiment, the DAS voice signal acquisition apparatus includes a fiber optic sensing device and an optical cable, wherein a first end of the optical cable is connected to an input end of the fiber optic sensing device, and a second end of the optical cable is wound around an outer wall of a first container, and the first container is disposed at a signal acquisition point. In the embodiment, the first container and the optical fiber jointly form a resonant cavity to generate a resonance effect, so that the action of the sound pressure of external sound on the optical fiber is larger, the signal modulation in the optical fiber is enhanced, and the voice is collected; the sensing probe composed of the first container and the optical fiber in the embodiment not only enlarges the voice detection range, but also improves the sensitivity. For example, a plastic container with a cavity inside is selected as the first container, the plastic container has only one opening communicating the cavity with the outside of the first container, the optical fibers are wound tightly against the outer wall of the plastic container, and each adjacent turn of the optical fibers is bonded together.
After the voice signal is extracted by the optical fiber sensing equipment, the original signal is in WAV format. Reading the WAV file with a Python tool returns a one-dimensional data matrix; the original signal has 1 channel, a sampling rate of 48000 Hz and a bit depth of 16 bits.
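As a hedged illustration (the embodiment only says "a Python tool" is used, without naming it), the WAV reading described above can be sketched with Python's standard-library `wave` module; the function name `read_wav` and its return layout are assumptions of this sketch:

```python
import wave
import array

def read_wav(path):
    """Read a PCM WAV file; return (samples, channels, sample_rate, bit_depth).

    For the mono, 16-bit, 48000 Hz signals described above, the samples come
    back as a one-dimensional list of signed 16-bit integers.
    """
    with wave.open(path, "rb") as wf:
        raw = wf.readframes(wf.getnframes())
        samples = array.array("h", raw)  # "h" = signed 16-bit integers
        return (list(samples), wf.getnchannels(),
                wf.getframerate(), 8 * wf.getsampwidth())
```

Note that `getsampwidth()` returns bytes per sample, so multiplying by 8 yields the 16-bit depth stated in the embodiment.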
S200, preprocessing the acquired DAS voice signals.
In one embodiment, preprocessing the acquired DAS voice signals includes:
s210, pre-emphasis is carried out on the DAS voice signals.
The signal transmission line has low-pass filtering characteristics, so during transmission the high-frequency components of a signal are attenuated strongly while the low-frequency components are attenuated little. In this embodiment, a pre-emphasis operation boosts the high-frequency components of the acquired DAS voice signals and compensates for the excessive attenuation of those components during transmission. Pre-emphasis enhances the high-frequency content of the signal by enhancing the amplitude at the signal's rising and falling edges, balancing the spectrum, avoiding numerical problems in the Fourier transform operation, and improving the signal-to-noise ratio.
The transformation formula of the pre-emphasis is as follows:
y(t) = x(t) - α·x(t-1)
where α is the pre-emphasis coefficient, with a value between 0.9 and 1.0 (0.97 in this embodiment); x(t) is the audio amplitude at time t; and y(t) is the audio amplitude at time t after pre-emphasis.
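A minimal sketch of this pre-emphasis step in pure Python (the handling of the first sample is an assumption, since the formula leaves y(0) unspecified):

```python
def pre_emphasis(x, alpha=0.97):
    """Apply y(t) = x(t) - alpha * x(t - 1) for t >= 1.

    alpha is the pre-emphasis coefficient (between 0.9 and 1.0;
    0.97 in this embodiment). The first sample passes through unchanged.
    """
    return [x[0]] + [x[t] - alpha * x[t - 1] for t in range(1, len(x))]
```

A constant signal illustrates the high-pass effect: every sample after the first is attenuated to (1 - α) of its value, while rising and falling edges are preserved.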
And S220, framing the pre-emphasized DAS voice signals.
In one embodiment, when the pre-emphasized DAS voice signal is framed, an overlapping region exists between two adjacent frames, which prevents the signal from changing too much between adjacent frames. For example, each frame is 25 ms long with a step of 10 ms, so there is a 15 ms overlap between two adjacent frames.
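The framing above (25 ms frames, 10 ms step, hence 15 ms overlap at a 48 kHz sampling rate) can be sketched as follows; dropping the trailing partial frame is an assumption of this sketch:

```python
def frame_signal(x, sample_rate=48000, frame_ms=25, hop_ms=10):
    """Split x into overlapping frames; adjacent frames share
    (frame_ms - hop_ms) milliseconds of samples."""
    frame_len = sample_rate * frame_ms // 1000  # 1200 samples at 48 kHz
    hop_len = sample_rate * hop_ms // 1000      # 480 samples at 48 kHz
    return [x[i:i + frame_len]
            for i in range(0, len(x) - frame_len + 1, hop_len)]
```

With these parameters each frame holds 1200 samples and consecutive frames share their last/first 720 samples (15 ms).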
And S230, windowing the DAS voice signals after framing.
Because framing truncates the signal at the frame boundaries, in this embodiment each frame of the framed DAS voice signal is multiplied by a data sequence of the same length, ensuring the continuity of the signal for the subsequent processing steps; this sequence is the data over one full period of the window function, rising from a minimum to a maximum and back to a minimum. Specifically, each frame signal is multiplied by a Hamming window, which increases the continuity at the left and right ends of the frame.
If the framed signal is s(n), n = 0, 1, …, N-1, where N is the frame size, then the windowed signal is
s'(n) = s(n) × W(n)
where W(n) is of the form:
W(n) = (1 - a) - a·cos(2πn / (N - 1)), 0 ≤ n ≤ N - 1
with a = 0.46164; this value is set to produce a zero-crossing at a frequency of 5π/(N - 1), so that the side lobes are largely eliminated.
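The windowing step can be sketched in Python using the a = 0.46164 coefficient given above (the code follows the generalized-Hamming form of W(n) directly):

```python
import math

def hamming_window(N, a=0.46164):
    """W(n) = (1 - a) - a * cos(2*pi*n / (N - 1)), n = 0 .. N-1.

    Peaks at 1 in the middle of the frame and tapers to 1 - 2a at the edges."""
    return [(1 - a) - a * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def apply_window(frame, window):
    """Multiply each frame sample by the window value (element-wise)."""
    return [s * w for s, w in zip(frame, window)]
```

The window is symmetric, so after overlap-add the tapered frame edges blend smoothly with the neighbouring frames.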
S300, converting the preprocessed DAS voice signals into a Mel frequency spectrum characteristic diagram.
In one embodiment, converting the preprocessed DAS voice signals into mel-frequency spectrum feature maps includes:
and S310, carrying out fast Fourier transform on the preprocessed DAS voice signals to obtain an energy spectrum of the DAS voice signals.
Signal characteristics are usually difficult to observe from the signal's variation in the time domain; after the signal is transformed into an energy spectrum in the frequency domain, different energy distributions represent the characteristics of different voices, making the signal characteristics convenient to observe.
And S320, filtering the energy spectrum of the DAS voice signal by using a Mel-scale triangular filter bank to obtain a Mel frequency spectrum characteristic diagram.
In this embodiment, the triangular filter bank includes 80 filters with center frequencies f(m), m = 1, 2, …, 80; the interval between adjacent values of f(m) decreases as m decreases and increases as m increases.
In the embodiment, the triangular filter bank is adopted to filter the energy spectrum, so that the frequency spectrum can be smoothed, the effect of harmonic waves is eliminated, the formants of the original voice are highlighted, and meanwhile, the calculation amount can be reduced.
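A sketch of how the 80 Mel-scale center frequencies f(m) could be placed. The standard 2595·log10(1 + f/700) Mel formula and the 0 Hz to 24 kHz range (the Nyquist band at 48 kHz) are assumptions of this sketch; the embodiment states only the filter count and the spacing behaviour:

```python
import math

def mel_center_frequencies(n_filters=80, f_min=0.0, f_max=24000.0):
    """Center frequencies f(m), m = 1..n_filters, equally spaced on the Mel
    scale; the spacing in Hz therefore grows with m (and with frequency)."""
    def hz_to_mel(f):
        return 2595.0 * math.log10(1.0 + f / 700.0)
    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    lo, hi = hz_to_mel(f_min), hz_to_mel(f_max)
    # n_filters + 2 equally spaced Mel points; the inner ones are the centers
    return [mel_to_hz(lo + (hi - lo) * i / (n_filters + 1))
            for i in range(1, n_filters + 1)]
```

Because equal Mel steps map to geometrically growing Hz steps, this reproduces the stated behaviour: the centers crowd together at low frequencies and spread apart at high frequencies.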
And S400, inputting the Mel frequency spectrum characteristic diagram and a preset pure signal serving as a label into a pre-constructed GAN network.
In one embodiment, the method for constructing the GAN network includes: constructing a generator for realizing frequency-domain-to-time-domain up-sampling; constructing a discriminator; and combining the generator and the discriminator to form the complete GAN network.
In one embodiment, as shown in fig. 2, the GAN network includes a Generator and a Discriminator. The generator divides the speech into different channels according to different frequency bands, and a PQMF (pseudo-quadrature mirror filter) bank then recombines the channels to obtain the original waveform, which is input to the discriminator.
The generator comprises two transition layers and four upsampling layers, the upsampling layers lying between the two transition layers, with a residual dilated convolution block connected after each upsampling layer. Specifically, as shown in fig. 3, the generator includes two convolutional layers (Conv layer) acting as transition layers and four upsampling layers (2× in fig. 3), each followed by one residual stack, and each residual stack is composed of 4 (4× in fig. 3) dilated convolution blocks. The upsampling layers provide the gradual transition from the frequency domain to the time domain; in a convolutional network this is usually deconvolution (transposed convolution), which expands a low-dimensional matrix or vector into a higher-dimensional space. In this GAN network, the generator must upsample the one-dimensional input to the number of points in one frame shift (hop length) through the four upsampling layers; in this embodiment the hop length is 256, i.e. the input must be expanded by a factor of 256. Therefore, although the upsampling layers carry many parameters, they actually play an auxiliary role in the model; the main frequency-domain-to-time-domain conversion relies on the residual dilated convolution blocks.
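The 256× total expansion (the hop length) can be checked arithmetically. The particular per-layer factors (8, 8, 2, 2) are an assumption of this sketch; the embodiment only fixes the product at 256 across four upsampling layers:

```python
def total_upsampling(n_mel_frames, factors=(8, 8, 2, 2)):
    """Output waveform length after the four upsampling layers: each layer
    multiplies the time length by its factor, and the product of the factors
    equals the hop length (256 in this embodiment).

    Returns (output_samples, total_factor)."""
    product = 1
    for f in factors:
        product *= f
    return n_mel_frames * product, product
```

For example, 100 Mel frames expand to 100 × 256 = 25600 waveform samples.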
In this embodiment, the clean signal is first divided into several sub-bands by the PQMF filter to serve as labels. The generator finally predicts the audio of the 4 bands through a one-dimensional convolutional layer with 4 output channels and a tanh non-linear activation function, thereby implementing the multi-band function. With multi-band processing introduced, the generator divides the speech into different channels according to frequency bands, and the PQMF filter then recombines the several sub-bands. Once multiple frequency bands are introduced, the sub-bands are conditionally independent and each band is sampled by its corresponding factor; this effectively compresses the model structure, reduces the amount of computation, speeds up speech generation, and lets the model pay more attention to the high-frequency signals in the different frequency bands.
Because the voice signal obtained from the optical fiber has a low SNR and concentrated voice features, this embodiment deepens the residual block layers to better model long-distance dependencies in the time domain. The residual dilated convolution block consists of four layers of dilated convolutions with dilation factors 1, 3, 9 and 27, expanding the model's receptive field to 81; the model can thus not only learn from a larger span of the original audio, but also handle the large-span time dependencies required for raw-audio generation.
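The receptive field of 81 quoted above is consistent with a kernel size of 3 (an assumption; the embodiment gives only the dilation factors). For a stack of dilated convolutions, the receptive field grows by (kernel_size - 1)·dilation per layer:

```python
def dilated_stack_receptive_field(kernel_size=3, dilations=(1, 3, 9, 27)):
    """Receptive field of stacked dilated convolutions:
    rf = 1 + (kernel_size - 1) * sum(dilations)."""
    return 1 + (kernel_size - 1) * sum(dilations)
```

With kernel size 3 and dilations (1, 3, 9, 27) this gives 1 + 2·40 = 81 samples, matching the figure in the text.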
In this embodiment, the discriminator is a multi-scale discriminator whose final score is computed through multiple layers of downsampling. Each discriminator comprises three separate convolutional layers and four downsampling layers, where the downsampling is performed by convolution, i.e. a convolutional layer with a stride of 4. "Multi-scale" means that three identical discriminators score the signal three times, with the three inputs corresponding to different frequency scales. Specifically, as shown in fig. 4, the inputs of the three discriminators (Discriminator Block) are the voice at the normal sampling rate, the voice downsampled once, and the voice downsampled twice, where this downsampling is implemented directly by an average pooling layer (Average Pool); by separating the different frequency bands, the discriminators compute the differences of the different bands, thereby optimizing the training effect.
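The average-pooling downsampling that feeds the three discriminator inputs (full rate, downsampled once, downsampled twice) can be sketched as follows; the pooling factor of 2 per stage is an assumption of this sketch:

```python
def avg_pool_1d(x, factor=2):
    """Downsample a waveform by averaging non-overlapping windows of
    `factor` samples (a strided average-pooling layer)."""
    return [sum(x[i:i + factor]) / factor
            for i in range(0, len(x) - factor + 1, factor)]

def multiscale_inputs(x):
    """Inputs for the three discriminators: original waveform,
    pooled once, and pooled twice."""
    once = avg_pool_1d(x)
    return x, once, avg_pool_1d(once)
```

Each pooling stage halves the effective sampling rate, so the three discriminators see the same signal at three frequency scales.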
As described above, the generator first performs feature analysis on the input Mel frequency spectrum characteristic diagram and generates a simulated enhanced voice signal; the discriminator then takes the input pure signal as a reference, scores the generator's simulated signal, and feeds the score back to the generator; and the generator optimizes its output according to the discriminator's feedback. This is repeated until the generator finally outputs a high-quality enhanced voice signal, so the GAN network realizes the enhancement of the voice signal. The goals of the generator and the discriminator are completely opposed: the generator tries to fool the discriminator, i.e. to maximize the discriminator's classification error, while the discriminator tries to correctly distinguish real data from generated data, i.e. to minimize the classification error. Therefore, in each training iteration the weights of the generator network change in the direction that increases the classification error (the direction of the error gradient), while the weights of the discriminator network change in the direction that decreases the classification error. After each round of training, the generator updates its network parameters once according to the loss computed by the discriminator; through this adversarial learning, the audio produced by the generator eventually reaches the point where the discriminator cannot judge whether it is real or fake (the loss falls within a preset interval, for example close to 0.5), and at that point the generator outputs a high-quality enhanced voice signal.
One embodiment of the GAN network-based tunnel rescue system provided by the invention comprises: the tunnel rescue system based on the GAN network comprises optical fiber sensing equipment, an optical cable and a first container, wherein the first container is arranged at a signal acquisition point; the second end of the optical cable is wound on the outer wall of the first container and used for collecting DAS voice signals; the input end of the optical fiber sensing equipment is connected with the first end of the optical cable, the optical fiber sensing equipment is used for acquiring pure voice signals, preprocessing the acquired DAS voice signals, converting the preprocessed DAS voice signals into Mel frequency spectrum characteristic diagrams, inputting the Mel frequency spectrum characteristic diagrams and the pure voice signals into a pre-constructed GAN network, and positioning signal acquisition points according to the DAS voice signals processed by the GAN network.
The tunnel rescue system in this embodiment collects and transmits emergency call signals over optical fiber and has the following advantages: high sensitivity, able to monitor various vibrations (including sound waves); a maximum measurement distance of up to 45 km, suitable for vibration monitoring over a wide area; a spatial resolution of up to 1 m and a positioning accuracy of 2-5 m, so abnormal positions can be located accurately; and strong environmental adaptability, working normally even under extreme conditions such as network interruption and power failure in the tunnel.
Case study: on 26 November 2021, experiments were conducted at the Longchi smart expressway demonstration base, where rockfall tests were carried out in the tunnel using spare optical fibers in the cables laid along the demonstration roads. The experimental data show a very good rockfall detection effect. Specifically, 1500 m of optical fiber (naturally placed) was laid at the demonstration base, the first 800 m of which was coiled fiber; slope rockfall was simulated by repeated artificial rockfall, with a spatial resolution of 5 m and a time sampling rate of 1 kHz. Three segments of data were collected and analyzed from both the time and space perspectives. Figs. 5 and 6 show the test results: Fig. 5 is the time-space two-dimensional signal diagram of a rockfall experiment moving from near to far, and Fig. 6 is the corresponding diagram of a rockfall experiment moving from far to near. As can be seen from Figs. 5 and 6, the tunnel rescue system accurately detects vibration in the tunnel; since voice signals are vibration signals, the system can therefore effectively acquire DAS voice signals.
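The kind of time-space analysis shown in Figs. 5 and 6 can be illustrated with a toy example: given a matrix of DAS samples (time samples by spatial channels), the disturbed channel can be picked out from per-channel energy. The synthetic data and the simple argmax detector below are assumptions for illustration only, not the system's actual detection algorithm.

```python
import numpy as np

rng = np.random.default_rng(0)
fs, dx = 1000, 5.0                              # 1 kHz sampling, 5 m spatial resolution
das = 0.01 * rng.standard_normal((2000, 160))   # 2 s x 800 m of background noise
das[500:520, 96] += 1.0                         # simulated rockfall burst at channel 96

energy = (das ** 2).sum(axis=0)                 # energy per spatial channel
channel = int(np.argmax(energy))                # channel with the strongest disturbance
position_m = channel * dx                       # distance along the fiber
```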
The foregoing describes preferred embodiments of the invention. It should be understood that the invention is not limited to the precise forms disclosed herein, and that various other combinations, modifications and environments falling within the scope of the inventive concept, whether described above or apparent to those skilled in the relevant art, may be resorted to. Modifications and variations may be effected by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A distributed optical fiber voice enhancement method based on a GAN network is characterized by comprising the following steps:
collecting DAS voice signals;
obtaining a pure voice signal;
preprocessing the acquired DAS voice signals;
converting the preprocessed DAS voice signals into a Mel frequency spectrum characteristic diagram;
and inputting the Mel frequency spectrum characteristic diagram and the pure voice signal into a pre-constructed GAN network.
2. The GAN network-based distributed optical fiber voice enhancement method of claim 1, wherein preprocessing the collected DAS voice signals comprises:
pre-emphasis the DAS voice signals;
framing the pre-emphasized DAS voice signals;
and windowing the DAS voice signals after the framing.
3. The GAN network-based distributed optical fiber voice enhancement method of claim 2, wherein the pre-emphasis transform formula is:
y(t) = x(t) - α·x(t-1)
where α is the pre-emphasis coefficient, with a value between 0.9 and 1.0; x(t) is the audio amplitude at time t; and y(t) is the audio amplitude at time t after pre-emphasis.
4. The GAN network-based distributed optical fiber voice enhancement method of claim 2, wherein, when the pre-emphasized DAS voice signals are framed, an overlapping area exists between two adjacent frames.
5. The GAN network-based distributed optical fiber voice enhancement method of claim 2, wherein windowing the framed DAS voice signals comprises:
each frame signal is multiplied by a hamming window, respectively.
6. The GAN network-based distributed optical fiber voice enhancement method of claim 1, wherein converting the preprocessed DAS voice signals into mel-frequency spectrum feature maps comprises:
performing fast Fourier transform on the preprocessed DAS voice signals to obtain energy spectrums of the DAS voice signals;
and filtering the energy spectrum of the DAS voice signal by using a Mel-scale triangular filter bank to obtain a Mel frequency spectrum characteristic diagram.
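A minimal sketch of the preprocessing chain of claims 2-6 (pre-emphasis, overlapping frames, Hamming window, FFT energy spectrum, Mel-scale triangular filter bank) is given below; the sampling rate, frame length, hop size and filter count are illustrative assumptions, not values specified by the claims:

```python
import numpy as np

fs, n_fft, hop, n_mels, alpha = 8000, 256, 128, 40, 0.97  # assumed parameters

x = np.sin(2 * np.pi * 300 * np.arange(fs) / fs)  # 1 s test tone as a stand-in signal

# claim 3: pre-emphasis y(t) = x(t) - alpha * x(t-1)
y = np.append(x[0], x[1:] - alpha * x[:-1])

# claims 4-5: overlapping frames, each multiplied by a Hamming window
n_frames = 1 + (len(y) - n_fft) // hop
frames = np.stack([y[i * hop : i * hop + n_fft] for i in range(n_frames)])
frames = frames * np.hamming(n_fft)

# claim 6: FFT energy spectrum, then Mel-scale triangular filter bank
power = np.abs(np.fft.rfft(frames, n_fft)) ** 2

def hz_to_mel(f): return 2595 * np.log10(1 + f / 700)
def mel_to_hz(m): return 700 * (10 ** (m / 2595) - 1)

mel_pts = mel_to_hz(np.linspace(0, hz_to_mel(fs / 2), n_mels + 2))
bins = np.floor((n_fft + 1) * mel_pts / fs).astype(int)
fbank = np.zeros((n_mels, n_fft // 2 + 1))
for m in range(1, n_mels + 1):
    l, c, r = bins[m - 1], bins[m], bins[m + 1]       # triangle corners
    fbank[m - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
    fbank[m - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge

mel_spec = power @ fbank.T   # the Mel frequency spectrum feature map (frames x mel bins)
```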
7. The GAN network-based distributed optical fiber voice enhancement method of claim 1, wherein the GAN network construction method comprises:
constructing a generator for realizing frequency domain to time domain up-sampling;
constructing a discriminator;
and combining the generator and the discriminator to form the complete GAN network.
8. The GAN network-based distributed optical fiber voice enhancement method of claim 1, wherein the GAN network comprises a generator and at least one discriminator; the generator comprises two transition layers and four upsampling layers, the upsampling layers being located between the two transition layers, with each upsampling layer followed by a residual dilated (hole) convolution block; and the discriminator comprises three convolutional layers and four downsampling layers.
9. The GAN network-based distributed optical fiber voice enhancement method of claim 8, wherein the residual dilated (hole) convolution block is formed by four dilated convolution layers with dilation factors of 1, 3, 9 and 27.
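A residual block built from dilated ("hole") convolutions with the dilation factors 1, 3, 9 and 27 named above can be sketched as follows; the channel count, kernel size and MelGAN-style layer ordering are assumptions, as they are not specified here:

```python
import torch
import torch.nn as nn

class ResidualDilatedBlock(nn.Module):
    """Four stacked dilated 1-D convolutions with a residual (skip) connection."""
    def __init__(self, channels=32, kernel=3):
        super().__init__()
        layers = []
        for d in (1, 3, 9, 27):  # dilation factors from claim 9
            layers += [nn.LeakyReLU(0.2),
                       nn.Conv1d(channels, channels, kernel,
                                 dilation=d,
                                 padding=d * (kernel - 1) // 2)]  # keeps length
        self.body = nn.Sequential(*layers)

    def forward(self, x):
        return x + self.body(x)  # residual connection around the dilated stack

block = ResidualDilatedBlock()
out = block(torch.randn(1, 32, 200))  # sequence length is preserved by the padding
```

Stacking dilations 1, 3, 9, 27 grows the receptive field geometrically while keeping the parameter count small, which is why this pattern is common in waveform generators.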
10. A GAN network-based tunnel rescue system, characterized by comprising:
a first container arranged at a signal acquisition point;
an optical cable, a second end of which is wound on the outer wall of the first container and used for collecting DAS voice signals; and
optical fiber sensing equipment, an input end of which is connected with a first end of the optical cable, the optical fiber sensing equipment being used for acquiring pure voice signals, preprocessing the collected DAS voice signals, converting the preprocessed DAS voice signals into Mel frequency spectrum characteristic diagrams, inputting the Mel frequency spectrum characteristic diagrams and the pure voice signals into a pre-constructed GAN network, and positioning the signal acquisition point according to the DAS voice signals processed by the GAN network.
CN202210812753.7A 2022-07-12 2022-07-12 Distributed optical fiber voice enhancement method based on GAN network and tunnel rescue system Pending CN114898766A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210812753.7A CN114898766A (en) 2022-07-12 2022-07-12 Distributed optical fiber voice enhancement method based on GAN network and tunnel rescue system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210812753.7A CN114898766A (en) 2022-07-12 2022-07-12 Distributed optical fiber voice enhancement method based on GAN network and tunnel rescue system

Publications (1)

Publication Number Publication Date
CN114898766A true CN114898766A (en) 2022-08-12

Family

ID=82729796

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210812753.7A Pending CN114898766A (en) 2022-07-12 2022-07-12 Distributed optical fiber voice enhancement method based on GAN network and tunnel rescue system

Country Status (1)

Country Link
CN (1) CN114898766A (en)


Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN202325687U (en) * 2011-11-15 2012-07-11 北京航天易联科技发展有限公司 Mine safety monitoring, early-warning and positioning device based on optical fibre sensing
CN207184740U (en) * 2017-05-24 2018-04-03 安徽师范大学 A kind of high sensitivity optical fiber speech detection device
CN108346433A (en) * 2017-12-28 2018-07-31 北京搜狗科技发展有限公司 A kind of audio-frequency processing method, device, equipment and readable storage medium storing program for executing
CN110619885A (en) * 2019-08-15 2019-12-27 西北工业大学 Method for generating confrontation network voice enhancement based on deep complete convolution neural network
CN112033522A (en) * 2020-08-10 2020-12-04 太原理工大学 Sound signal detection system and method of distributed optical fiber sensor
CN112509593A (en) * 2020-11-17 2021-03-16 北京清微智能科技有限公司 Voice enhancement network model, single-channel voice enhancement method and system
CN113314109A (en) * 2021-07-29 2021-08-27 南京烽火星空通信发展有限公司 Voice generation method based on cycle generation network
CN113409759A (en) * 2021-07-07 2021-09-17 浙江工业大学 End-to-end real-time speech synthesis method
CN114446314A (en) * 2021-12-31 2022-05-06 中国人民解放军陆军工程大学 Voice enhancement method for deeply generating confrontation network
CN114740423A (en) * 2022-03-18 2022-07-12 广东技术师范大学 Ocean target positioning method, device, equipment and medium based on DAS
CN114783455A (en) * 2022-05-07 2022-07-22 北京快鱼电子股份公司 Method, apparatus, electronic device and computer readable medium for voice noise reduction


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
CHKHETIANI L.: "SE-MelGAN -- Speaker Agnostic Rapid Speech Enhancement", arXiv preprint arXiv:2006.07637 *
KUMAR K.: "MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis", Neural Information Processing Systems *
CHEN Feiyang: "Research on a Multi-Discriminator Singing-Voice Synthesis Vocoder Based on Generative Adversarial Networks", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116092484A (en) * 2023-04-07 2023-05-09 四川高速公路建设开发集团有限公司 Signal detection method and system based on distributed optical fiber sensing in high-interference environment
CN116092484B (en) * 2023-04-07 2023-06-09 四川高速公路建设开发集团有限公司 Signal detection method and system based on distributed optical fiber sensing in high-interference environment

Similar Documents

Publication Publication Date Title
CN110827837B (en) Whale activity audio classification method based on deep learning
EP0907258B1 (en) Audio signal compression, speech signal compression and speech recognition
CN103026407B (en) Bandwidth extender
CN102870156B (en) Audio communication device, method for outputting an audio signal, and communication system
DE69332994T2 (en) Highly efficient coding process
US5185848A (en) Noise reduction system using neural network
US6513004B1 (en) Optimized local feature extraction for automatic speech recognition
CN108447495B (en) Deep learning voice enhancement method based on comprehensive feature set
CN113405825B (en) Belt conveyor fault diagnosis method based on sound signals
CN108182949A (en) A kind of highway anomalous audio event category method based on depth conversion feature
CN110459241B (en) Method and system for extracting voice features
JPH05346797A (en) Voiced sound discriminating method
CN111785285A (en) Voiceprint recognition method for home multi-feature parameter fusion
CN112992121B (en) Voice enhancement method based on attention residual error learning
CN114898766A (en) Distributed optical fiber voice enhancement method based on GAN network and tunnel rescue system
JPH08123484A (en) Method and device for signal synthesis
CN108806725A (en) Speech differentiation method, apparatus, computer equipment and storage medium
CN113160852A (en) Voice emotion recognition method, device, equipment and storage medium
CN112183582A (en) Multi-feature fusion underwater target identification method
CN116778956A (en) Transformer acoustic feature extraction and fault identification method
CN115910097A (en) Audible signal identification method and system for latent fault of high-voltage circuit breaker
Wang et al. Low pass filtering and bandwidth extension for robust anti-spoofing countermeasure against codec variabilities
Mazumder et al. Feature extraction techniques for speech processing: A review
CN116524273A (en) Method, device, equipment and storage medium for detecting draft tube of power station
CN111145726A (en) Deep learning-based sound scene classification method, system, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20220812