CN111489759A

CN111489759A - Noise evaluation method based on optical fiber voice time domain signal waveform alignment

Info

Publication number: CN111489759A
Application number: CN202010210101.7A
Authority: CN
Inventors: 吕辰刚; 马敬敬; 霍紫强
Original assignee: Tianjin University
Current assignee: Tianjin University
Priority date: 2020-03-23
Filing date: 2020-03-23
Publication date: 2020-08-04

Abstract

The invention relates to a noise evaluation method based on optical fiber voice time domain signal waveform alignment, which comprises the following steps: processing an original voice time domain signal; building a laser microphone based on an optical fiber ring cavity; arranging the position of the optical fiber coil; inputting the time-domain-processed voice signal into the laser microphone through the optical fiber coil, observing the output signal with an oscilloscope, and obtaining a stable output voice signal by adjusting the output power of the optical fiber laser source; and aligning the input voice signal and the output voice signal of the laser microphone in the time domain according to the inserted square wave signal, denoising the voice signal that has passed through the ring-cavity laser microphone, namely the noise-containing voice signal, and evaluating the voice quality.

Description

Noise evaluation method based on optical fiber voice time domain signal waveform alignment

Technical Field

The invention relates to an optical fiber sensing technology and an optical fiber annular cavity laser microphone, and belongs to the field of voice signal noise evaluation.

Background

As a mainstream mode of communication, voice is becoming an important means of human-computer interaction. However, noise is inevitable and varied, both in communication between people and in human-computer interaction. For example, environmental noise, mechanical noise, and traffic noise (such as passing cars when a person makes a call on the road) all degrade voice quality in voice communication. With the rapid development of the information society, the requirements on overall voice quality keep rising, and speech denoising technology has developed rapidly. As research deepens and new ideas are introduced, a series of speech denoising methods has appeared in succession, such as methods based on human auditory masking, artificial neural networks, and the wavelet transform.

To judge the denoising effect on a denoised voice signal, speech denoising evaluation indexes are needed. The performance of a speech enhancement algorithm is evaluated by two kinds of methods. The first is subjective evaluation, whose indexes relate to the intelligibility and the perceptual quality of the speech. Perceptual quality generally refers to the degree of speech recognition, speech quality, timbre, pitch, and so on. Subjective evaluation mainly considers whether the speech is intelligible and whether the information carried by the speech signal is complete. The second is objective evaluation, which examines the quality of speech coding and speech communication. Objective evaluation compares denoising performance on the basis of specific data, and no factors other than the data are needed to measure the denoising quality. The main objective evaluation indexes are the signal-to-noise ratio, the segmented signal-to-noise ratio, and the log-spectral distortion measure. Objective evaluation indexes reflect the denoising performance to a certain extent and are very important for assessing the denoising effect.

The objective evaluation method is illustrated here using the signal-to-noise ratio as the index.

The signal-to-noise ratio is defined as follows:

$$\mathrm{SNR} = 10 \log_{10} \frac{\sum_{n=1}^{L} s^{2}(n)}{\sum_{n=1}^{L} \left[ s(n) - \hat{s}(n) \right]^{2}}$$

where $s(n)$ represents the original voice, which may be a voice file recorded in a quiet laboratory environment, a reasonably clear voice file on a mobile phone, a computer or other equipment, or a recording file in mp3 format;

$\hat{s}(n)$ represents the voice after the denoising process; and $L$ represents the number of sample points of the voice signal, a parameter set by the experimenter.
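As an illustration only, the signal-to-noise ratio comparison can be sketched in Python (used here in place of the Matlab software named in the description). The sketch assumes the common definition, 10·log10 of the original-voice energy divided by the energy of the difference between the original and the denoised voice. The function name `snr_db` and the sample values are hypothetical.

```python
import math

def snr_db(original, denoised):
    """Global SNR in dB; both sequences must have the same number of samples L."""
    if len(original) != len(denoised):
        raise ValueError("signals must have the same number of samples")
    signal_energy = sum(s * s for s in original)
    error_energy = sum((s - d) ** 2 for s, d in zip(original, denoised))
    if error_energy == 0:
        return math.inf       # identical signals
    if signal_energy == 0:
        return -math.inf      # silent reference
    return 10.0 * math.log10(signal_energy / error_energy)

# Hypothetical data: a short periodic sequence and a copy with a constant offset.
original = [0.0, 1.0, 0.0, -1.0] * 4
denoised = [s + 0.1 for s in original]
print(round(snr_db(original, denoised), 2))
```

The length check reflects the requirement that the two signals being compared contain the same number of sample points.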

The formula also shows that the basic idea of speech quality assessment is to compare two voice signals. The original voice and the denoised voice must therefore cover the same time period and have the same duration during assessment, that is, the voice signals must be aligned in the time domain. Voice alignment is commonly done with the SPPAS tool, an audio alignment algorithm, or manual alignment, but these common approaches leave obvious errors, so that differences can be clearly heard subjectively. Time domain alignment of speech is the basis of speech signal processing.
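The effect of misalignment on an objective index can be shown with a small worked example. The sketch below is an assumption-laden illustration, not part of the disclosed method: a synthetic sine wave stands in for speech, the "processed" signal is the same wave scaled by 0.99 and starting 5 samples late, and the signal-to-noise ratio is computed once without and once with compensation of the shift. All names and values are hypothetical.

```python
import math

def snr_db(reference, test):
    """Global SNR in dB between two equal-length sequences."""
    if len(reference) != len(test):
        raise ValueError("signals must have the same number of samples")
    signal = sum(r * r for r in reference)
    error = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.inf if error == 0 else 10.0 * math.log10(signal / error)

# Synthetic stand-in for speech: a sine wave with a period of 40 samples.
original = [math.sin(2 * math.pi * n / 40) for n in range(400)]

shift = 5
# The processed signal has nearly the same content but starts 5 samples late.
processed = [0.99 * x for x in original[shift:]]

# Naive comparison: start both at sample 0 and truncate to the common length.
n = len(processed)
snr_misaligned = snr_db(original[:n], processed)

# Aligned comparison: drop the first `shift` samples of the original.
snr_aligned = snr_db(original[shift:], processed)

print(round(snr_misaligned, 1), round(snr_aligned, 1))
```

With these values the aligned comparison gives about 40 dB, while the misaligned comparison drops to a few dB even though the processed signal is almost a copy of the original.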

Technical scheme

The invention aims to provide a new method for time domain waveform alignment of optical fiber signals. The method is applied to denoising the voice signal of a laser microphone based on an optical fiber ring cavity, and it achieves time synchronization between the input original voice and the noisy voice when the denoising effect is assessed for voice quality. The technical scheme is as follows:

a noise evaluation method based on optical fiber voice time domain signal waveform alignment comprises the following steps:

firstly, processing an original voice time domain signal. Reading the original voice signal by Matlab software to obtain the sequence information of the original voice signal, adding square wave sequence information at the proper position of the original voice sequence, and converting and storing the synthesized voice sequence as an audio file for inputting into a laser microphone.

secondly, building a laser microphone of the optical fiber ring cavity. The laser microphone comprises an optical fiber laser source, an erbium-doped optical fiber, an FFP filter, a 2 × 2 coupler, an optical fiber coil and a data acquisition part, wherein the 2 × 2 coupler is located outside the optical fiber ring cavity and is connected with the FFP filter; continuous laser is generated by the optical fiber laser source, and an optical signal which can only be transmitted in one direction is formed after passing through an isolator; the optical signal is then amplified by the erbium-doped optical fiber and transmitted to the optical fiber coil, which is formed by winding optical fiber with a length of more than 1 kilometer and serves as the voice signal sensing device; the optical signal is transmitted back to the optical fiber laser source through the coupler and the FFP filter; the coupler is used for conversion between the optical signal and the electric signal, and after the signal is converted through the coupler, an output voice signal is obtained through the data acquisition part;

thirdly, arranging the position of the optical fiber coil;

fourthly, inputting the voice signal after time domain processing into a laser microphone through an optical fiber coil, observing the condition of the output signal by using an oscilloscope, and obtaining a stable output voice signal by adjusting the output power of an optical fiber laser source;

fifthly, aligning the input voice signal and the output voice signal of the laser microphone in the time domain according to the inserted square wave signal, denoising the voice signal that has passed through the ring-cavity laser microphone, namely the noise-containing voice signal, and evaluating the voice quality.
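The alignment in the fifth step can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the implementation of the invention: the square wave is assumed to be a run of consecutive samples whose amplitude clearly exceeds that of the speech, so that it can be found with a simple threshold, and the names `find_marker_start` and `align_by_marker` are hypothetical.

```python
def find_marker_start(signal, marker_len, threshold):
    """Index of the first run of marker_len consecutive samples >= threshold, or -1."""
    run = 0
    for i, x in enumerate(signal):
        run = run + 1 if x >= threshold else 0
        if run == marker_len:
            return i - marker_len + 1
    return -1

def align_by_marker(inp, out, marker_len, threshold):
    """Shift and crop the two signals so that their square-wave markers coincide."""
    a = find_marker_start(inp, marker_len, threshold)
    b = find_marker_start(out, marker_len, threshold)
    if a < 0 or b < 0:
        raise ValueError("marker not found in one of the signals")
    lag = b - a                      # positive: the output is delayed
    if lag >= 0:
        out = out[lag:]
    else:
        inp = inp[-lag:]
    n = min(len(inp), len(out))      # same time period, same length
    return inp[:n], out[:n], lag

# Hypothetical data: speech with a 4-sample marker, then a delayed, attenuated copy.
speech = [0.1, -0.2, 0.3, -0.1, 0.2, 0.0]
mic_input = speech[:3] + [1.0] * 4 + speech[3:]
mic_output = [0.0] * 7 + [0.9 * x for x in mic_input]

aligned_in, aligned_out, lag = align_by_marker(mic_input, mic_output, 4, 0.8)
print(lag, len(aligned_in), len(aligned_out))
```

After cropping, the two sequences start at the same instant and have the same number of samples, which is the precondition for the voice quality evaluation.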

Drawings

Fig. 1 is a schematic structural diagram of a laser microphone of a fiber ring cavity.

Fig. 2 is a flow chart of the noise quality assessment based on optical fiber voice time domain signal waveform alignment according to the present invention.

Fig. 3 (a) is a schematic diagram of time domain waveforms of an original speech and a square wave signal, and (b) is a schematic diagram of a speech signal after a square wave sequence is added in the present invention.

Detailed Description

The invention is further illustrated and described below with reference to the accompanying drawings and specific examples.

The method is applied to evaluating the voice signal collected by the laser microphone in the optical fiber annular cavity, namely the voice signal containing noise.

First, the new microphone is described. Referring to Fig. 1, the system can be divided into three parts.

The first part is the hardware part of the system. It comprises a 980 nm laser source, which generates continuous laser light to supply optical energy to the system; an erbium-doped fiber (EDF), an optical fiber doped with a small amount of the rare earth element erbium that can amplify light in the 1550 nm range; and a filter, for which a tunable fiber Fabry-Perot (FFP) filter is adopted because it is highly compatible with optical fiber systems. A 2 × 2 coupler outside the optical fiber ring cavity is directly connected with the filter. The ring cavity is formed via the FFP filter, and the remaining light is fed back into the ring cavity. Owing to the isolator, laser light at the transmission wavelength of the FFP filter can be generated in the ring cavity in one direction only.

The second part is the optical fiber coil, which is the sensing part for the voice signal and also the input position of the voice signal. Its role is equivalent to that of a loudspeaker, and it is wound from several kilometers (between 1 kilometer and 10 kilometers) of ordinary optical fiber.

The third part is the voice acquisition part, which consists of a photodiode (PD) and a data acquisition card (DAQ).

The voice signal collected by the laser microphone of the fiber ring cavity is a voice signal containing noise. Noise originates from interference in the environment, and may be caused by a person speaking, walking, eating, opening or closing a door, knocking, traffic outside a window, or natural wind, rain, etc. during the recording process. These noises degrade the speech quality, and in order to obtain a clearer speech signal, speech denoising is required. In the speech denoising quality evaluation, a time domain alignment process for the speech signal is required, which is also the object of the present invention.

Referring to fig. 2, the flow chart of the present invention, the corresponding steps are briefly described as follows:

(1) Time domain processing of the original voice signal. The original voice signal can be a voice file recorded in a quiet laboratory environment, or a reasonably clear mp3 voice file downloaded to a device such as a mobile phone or a computer. The original voice signal is read into the computer with Matlab to obtain the corresponding sequence information, and square wave sequence information is added at a suitable position in the voice sequence, generally a position where the sequence value changes sharply. The synthesized sequence is then converted back into a voice signal, which is the result of the voice time domain processing.

(2) Building the laser microphone system. The laser microphone system of the optical fiber ring cavity comprises components such as a fiber laser source, an FFP filter, an isolator, an erbium-doped fiber and an optical fiber coil, together with a voice acquisition part, consisting of the PD and the DAQ, outside the ring cavity.

(3) Arranging the position of the optical fiber coil. The optical fiber coil senses the voice signal and is also the input position of the voice signal; it is wound from 2 kilometers of ordinary optical fiber.

(4) Setting the system parameters. After the microphone system is built and the optical fiber coil is arranged, the system is connected to an oscilloscope, and the fiber laser source is adjusted while a voice signal is being input, so that a sensitive and stable output signal is obtained at a suitable power. Generally, the higher the output power of the fiber laser source, the faster and more sensitive the output signal; in the experiment, the output power of the fiber laser source is set to about 350 w.
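Step (1) above can be sketched as follows. The description performs this step in Matlab; the sketch swaps in Python's standard `wave` module purely for illustration. It assumes that "adding" the square wave means inserting a block of constant-amplitude samples into the sequence, and that the result is stored as 16-bit mono PCM. The function names, the sample rate, and the sample values are hypothetical.

```python
import io
import struct
import wave

def insert_square_marker(samples, position, marker_len, amplitude):
    """Return a new sequence with a constant-amplitude block inserted at position."""
    marker = [amplitude] * marker_len
    return samples[:position] + marker + samples[position:]

def write_wav(target, samples, sample_rate):
    """Store samples in the range [-1, 1] as 16-bit mono PCM (path or file object)."""
    frames = b"".join(
        struct.pack("<h", int(max(-1.0, min(1.0, s)) * 32767)) for s in samples
    )
    with wave.open(target, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)          # 2 bytes per sample
        wav.setframerate(sample_rate)
        wav.writeframes(frames)

# Hypothetical voice sequence; a real one would be read from an audio file.
speech = [0.0, 0.2, -0.3, 0.5, -0.1, 0.05]
marked = insert_square_marker(speech, position=3, marker_len=4, amplitude=0.9)

buffer = io.BytesIO()                # stands in for a path such as "marked.wav"
write_wav(buffer, marked, sample_rate=8000)
print(len(marked), len(buffer.getvalue()))
```

The stored file is what would be played back at the optical fiber coil as the input of the laser microphone.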

The idea of the invention is to add a square wave sequence to the original voice sequence so that the voice signal carries a time mark point. The synthesized voice signal is then input into the laser microphone, and the signal that has passed through the laser microphone is a noise-containing voice signal. The time mark makes it possible to align the input and output voice signals in the time domain. The output noise-containing voice signal is then denoised and its voice quality is evaluated.
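The overall idea can be illustrated with a small simulation. Everything in the sketch below is an assumption made for illustration: the laser microphone is modelled as a pure delay plus bounded additive noise, the speech is a synthetic sine wave, and the mark is located with a sliding dot product against the known square wave, which is a technique chosen here and not one stated in the description. All names and values are hypothetical.

```python
import math
import random

def locate_marker(recorded, marker):
    """Start index at which the marker best matches the recording (sliding dot product)."""
    m = len(marker)
    best_start, best_score = -1, -math.inf
    for start in range(len(recorded) - m + 1):
        score = sum(marker[k] * recorded[start + k] for k in range(m))
        if score > best_score:
            best_start, best_score = start, score
    return best_start

def snr_db(reference, test):
    """Global SNR in dB between two equal-length sequences."""
    signal = sum(r * r for r in reference)
    error = sum((r - t) ** 2 for r, t in zip(reference, test))
    return math.inf if error == 0 else 10.0 * math.log10(signal / error)

rng = random.Random(0)

# Synthetic speech with a square-wave time mark inserted at sample 50.
speech = [0.3 * math.sin(2 * math.pi * n / 25) for n in range(200)]
marker = [1.0] * 20
marker_pos = 50
mic_input = speech[:marker_pos] + marker + speech[marker_pos:]

# Simulated microphone: 37 samples of delay plus bounded additive noise.
delay = 37
delayed = [0.0] * delay + mic_input
mic_output = [x + rng.uniform(-0.05, 0.05) for x in delayed]

# Recover the delay from the mark, then crop the output to the input's time span.
# The simulated delay is non-negative, so plain slicing is sufficient here.
lag = locate_marker(mic_output, marker) - marker_pos
aligned_output = mic_output[lag:lag + len(mic_input)]

print(lag, round(snr_db(mic_input, aligned_output), 1))
```

Because the mark is much larger than both the speech and the noise in this simulation, the recovered lag equals the simulated delay, and the noisy output can then be compared sample by sample with the input.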

Referring to Fig. 3, diagram (a) shows that a peak position in the original speech signal is selected as the time node at which the square wave signal is added. After the addition, the speech signal of diagram (b) is obtained, and this synthesized speech signal can be used as the input speech of the microphone system.
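Choosing the time node can be sketched as follows. The sketch assumes that "peak position" means the sample with the largest absolute value, and it also shows the alternative mentioned in the description, a position where the sequence value changes sharply, taken here as the largest difference between neighbouring samples. Both interpretations, the function names, and the sample values are hypothetical.

```python
def peak_index(samples):
    """Index of the sample with the largest absolute value."""
    if not samples:
        raise ValueError("empty sequence")
    return max(range(len(samples)), key=lambda i: abs(samples[i]))

def largest_change_index(samples):
    """Index i (>= 1) at which |samples[i] - samples[i - 1]| is largest."""
    if len(samples) < 2:
        raise ValueError("need at least two samples")
    return max(range(1, len(samples)),
               key=lambda i: abs(samples[i] - samples[i - 1]))

# Hypothetical voice sequence, as in diagram (a).
speech = [0.1, -0.4, 0.7, -0.2, 0.3]
node = peak_index(speech)

# Insert a 3-sample square wave at the chosen node, as in diagram (b).
marker = [1.0] * 3
synthesized = speech[:node] + marker + speech[node:]
print(node, largest_change_index(speech), synthesized)
```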

The invention has the following beneficial effects:

(1) The invention realizes time domain waveform alignment of optical fiber acoustic signals with high accuracy and practicability.

(2) Portability is good: for optical fiber acoustic signal waveforms of different forms, it is only necessary to add a standard square wave, sawtooth wave, sine wave or the like at a suitable position, and the experimental programs can be used across various operating systems.
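The interchangeable mark waveforms can be sketched as follows. This is only an illustration of the idea that the mark may be a square wave, a sawtooth wave or a sine wave; the generator functions, their parameters, and the dictionary `MARKER_SHAPES` are hypothetical and are not taken from the description.

```python
import math

def square_marker(length, amplitude=1.0):
    """Constant-amplitude block."""
    return [amplitude] * length

def sawtooth_marker(length, amplitude=1.0):
    """Linear ramp from -amplitude to +amplitude."""
    if length == 1:
        return [amplitude]
    return [amplitude * (2.0 * n / (length - 1) - 1.0) for n in range(length)]

def sine_marker(length, amplitude=1.0):
    """One full sine cycle spread over length samples."""
    return [amplitude * math.sin(2 * math.pi * n / length) for n in range(length)]

MARKER_SHAPES = {
    "square": square_marker,
    "sawtooth": sawtooth_marker,
    "sine": sine_marker,
}

def insert_marker(samples, position, shape, length, amplitude=1.0):
    """Insert a mark of the chosen shape into the sequence at position."""
    marker = MARKER_SHAPES[shape](length, amplitude)
    return samples[:position] + marker + samples[position:]

# Hypothetical voice sequence marked with each of the three shapes.
speech = [0.1, -0.2, 0.3, -0.1]
for shape in MARKER_SHAPES:
    print(shape, insert_marker(speech, 2, shape, 4))
```

Only the generator changes between shapes; the insertion position and the rest of the processing stay the same.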

Claims

1. A noise evaluation method based on optical fiber voice time domain signal waveform alignment comprises the following steps:

firstly, processing an original voice time domain signal: reading the original voice signal by Matlab software to obtain the sequence information of the original voice signal, adding square wave sequence information at the proper position of the original voice sequence, and converting and storing the synthesized voice sequence as an audio file for inputting into a laser microphone;

secondly, building a laser microphone of the fiber ring cavity, which comprises a fiber laser source, an erbium-doped fiber, an FFP filter, a 2 × 2 coupler outside the fiber ring cavity connected with the FFP filter, a fiber coil and a data acquisition part, wherein the fiber laser source generates continuous laser, optical signals which can only be transmitted in one direction are formed after passing through an isolator, then the optical signals are amplified by the erbium-doped fiber and then transmitted to the fiber coil, the fiber coil is formed by winding fibers with the length of more than 1 kilometer and serves as a voice signal induction device, the optical signals are transmitted back to the fiber laser source through the coupler and the FFP filter, the coupler is used for converting the optical signals and the electric signals, the signals are converted through the coupler, and output voice signals are obtained through the data acquisition part;

thirdly, arranging the position of the optical fiber coil;