CN110970051A - Voice data acquisition method, terminal and readable storage medium - Google Patents

Info

Publication number
CN110970051A
Authority
CN
China
Prior art keywords
voice
signal
noise
speech
ratio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911248621.0A
Other languages
Chinese (zh)
Inventor
黄族良
龙洪锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Speakin Intelligent Technology Co ltd
Original Assignee
Guangzhou Speakin Intelligent Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Speakin Intelligent Technology Co ltd
Priority to CN201911248621.0A
Publication of CN110970051A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L21/0232 Processing in the frequency domain
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02165 Two microphones, one receiving mainly the noise signal and the other one mainly the speech signal
    • G10L21/0264 Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/21 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being power information
    • G10L25/27 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
    • G10L25/45 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of analysis window

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Quality & Reliability (AREA)
  • Circuit For Audible Band Transducer (AREA)

Abstract

The invention discloses a voice data acquisition method comprising: acquiring a noisy voice signal; estimating a voice ratio estimation value of the voice signal within the noisy voice signal; judging whether the voice ratio estimation value is smaller than a preset voice ratio value; and, if it is, performing noise reduction processing on the noisy voice signal to obtain a noise-reduced voice signal. The invention also discloses a terminal and a readable storage medium. The method improves the quality of acquired voice data while preserving the authenticity of the voice signal.

Description

Voice data acquisition method, terminal and readable storage medium
Technical Field
The present invention relates to the field of voice signal processing, and in particular, to a voice data acquisition method, a terminal, and a readable storage medium.
Background
With the development of voiceprint recognition technology, public security departments use it increasingly often when investigating and solving cases. In general, the voiceprint library of a public security department retains original voice data. Because the surrounding environment cannot be controlled during acquisition, the original voice data contains noise. Voiceprint recognition must extract voiceprint features from this data, so the quality of the voice data used for feature extraction must be guaranteed; that is, the acquired original voice data needs to be denoised before voiceprint features are extracted. Denoising, however, inevitably distorts the original voice signal.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a voice data acquisition method, a terminal and a readable storage medium, and aims to solve the problems of low voice data acquisition quality and low voice signal authenticity.
In order to achieve the above object, the present invention provides a voice data collecting method, including:
acquiring a voice signal with noise;
estimating a voice ratio estimation value of a voice signal in the voice signal with noise;
judging whether the voice ratio estimation value is smaller than a preset voice ratio value or not;
and if the voice ratio estimation value is smaller than the preset voice ratio value, carrying out noise reduction processing on the voice signal with noise to obtain a noise reduction voice signal.
Optionally, the step of estimating the voice ratio estimation value of the voice signal in the voice signal with noise comprises:
framing and windowing the voice signal with noise to obtain a voice frame signal with noise;
and respectively carrying out discrete Fourier transform on each voice frame signal with noise according to a time sequence to obtain a discrete magnitude spectrum group.
Optionally, the voice ratio estimation value is a posterior signal-to-noise ratio, and the step of estimating the voice ratio estimation value of the voice signal in the voice signal with noise includes:
based on the discrete magnitude spectrum group, acquiring a minimum magnitude spectrum by a minimum statistical method;
acquiring a noise power spectrum estimation value based on the minimum amplitude spectrum;
and obtaining the posterior signal-to-noise ratio based on the minimum amplitude spectrum and the noise power spectrum estimation value.
Optionally, the step of performing noise reduction processing on the noisy speech signal includes:
acquiring a voice amplitude spectrum of the previous frame at the same frequency as the noise power spectrum, and acquiring a voice power spectrum estimation value based on the voice amplitude spectrum;
and obtaining a prior signal-to-noise ratio based on the voice power spectrum estimated value, the noise power spectrum estimated value and the posterior signal-to-noise ratio.
Optionally, the step of obtaining a prior signal-to-noise ratio based on the speech power spectrum estimation value, the noise power spectrum estimation value, and the posterior signal-to-noise ratio includes:
obtaining a first prior signal-to-noise ratio based on the voice power spectrum estimation value and the noise power spectrum estimation value;
and obtaining a second prior signal-to-noise ratio based on MMSE (minimum mean square error) according to the posterior signal-to-noise ratio and the first prior signal-to-noise ratio, wherein the first prior signal-to-noise ratio is the prior signal-to-noise ratio of the previous frame at the same frequency as the second prior signal-to-noise ratio.
Optionally, the step of performing noise reduction processing on the noisy speech signal includes:
obtaining a noise reduction attenuation gain based on the second prior signal-to-noise ratio;
obtaining a noise reduction voice frame signal based on the noise reduction attenuation gain and the noisy voice frame signal;
and performing windowing and overlap-add processing on the noise-reduction voice frame signal to obtain a noise-reduction voice signal.
Optionally, the step of acquiring a noisy speech signal comprises:
acquiring a negative original voice signal of a microphone of a reverse access circuit;
acquiring a positive original voice signal of a microphone of a forward access circuit;
and overlapping the negative original voice signal and the positive original voice signal to preliminarily remove noise to obtain a voice signal with noise.
Optionally, after the step of determining whether the speech ratio estimation value is smaller than a preset speech ratio value, the method further includes:
and if the posterior signal-to-noise ratio is greater than or equal to the preset value, not performing noise reduction processing on the voice signal with noise.
In order to achieve the above object, the present invention further provides a terminal, including: the voice data acquisition system comprises a memory, a processor and a voice data acquisition program which is stored on the memory and can run on the processor, wherein the voice data acquisition program realizes the steps of the voice data acquisition method when being executed by the processor.
In addition, in order to achieve the above object, the present invention further provides a computer storage medium, in which a voice data collecting program is stored, and the voice data collecting program realizes the steps of the voice data collecting method when being executed by a processor.
According to the voice data acquisition method, terminal and readable storage medium provided by the embodiments of the invention, a noisy voice signal is acquired, a voice ratio estimation value of the voice signal within the noisy voice signal is estimated, and whether the voice ratio estimation value is smaller than a preset voice ratio value is judged; if it is, noise reduction processing is performed on the noisy voice signal to obtain a noise-reduced voice signal. When voice data is collected, two microphones are provided, one connected to the circuit in the forward direction and one in the reverse direction, to perform preliminary noise reduction. When the proportion of voice in the noisy voice signal is too low, noise reduction processing is performed to ensure the quality of the voice data; when the proportion of voice is high, no noise reduction processing is performed, which preserves the authenticity of the voice data. Whether the noisy voice signal needs noise reduction is thus determined by how much voice it contains. The quality of acquired voice data is improved while the authenticity of the voice signal is preserved.
Drawings
FIG. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a schematic flow chart illustrating a voice data collection method according to a first embodiment of the present invention;
FIG. 3 is a flowchart illustrating a voice data collection method according to a second embodiment of the present invention;
FIG. 4 is a flowchart illustrating a voice data collection method according to a third embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The main solution of the embodiments of the invention is as follows: acquire a noisy voice signal, estimate a voice ratio estimation value of the voice signal within the noisy voice signal, judge whether the voice ratio estimation value is smaller than a preset voice ratio value, and perform noise reduction processing on the noisy voice signal if it is. This improves voice data acquisition quality while preserving the authenticity of the voice signal.
In the prior art, when collected voice data is processed, all voice signals are subjected to noise reduction processing, including voice signals that do not need it, which causes serious voice signal distortion.
The invention provides a solution, which ensures the authenticity of the voice signal while improving the voice data acquisition quality.
As shown in fig. 1, fig. 1 is a schematic terminal structure diagram of a hardware operating environment according to an embodiment of the present invention.
The terminal of the embodiment of the invention may be a PC, or a mobile terminal device with a display function, such as a smart phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a portable computer, and the like.
As shown in fig. 1, the terminal may include: a processor 1001 (such as a CPU), a network interface 1004, a user interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 enables communication among these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a Radio Frequency (RF) circuit, sensors, an audio circuit, a WiFi module, and the like. The sensors may include light sensors, motion sensors, and other sensors. Specifically, the light sensors may include an ambient light sensor, which adjusts the brightness of the display screen according to the ambient light, and a proximity sensor, which turns off the display screen and/or the backlight when the mobile terminal is moved to the ear. As one kind of motion sensor, a gravity acceleration sensor can detect the magnitude of acceleration in each direction (generally three axes) and the magnitude and direction of gravity when the mobile terminal is stationary; it can be used for applications that recognize the attitude of the mobile terminal (such as switching between landscape and portrait, related games, and magnetometer attitude calibration) and for vibration-recognition functions (such as a pedometer and tap detection). The mobile terminal may also be configured with other sensors such as a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which are not described here.
Those skilled in the art will appreciate that the terminal structure shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a kind of computer storage medium, may include therein an operating system, a network communication module, a user interface module, and a voice data collecting program.
In the terminal shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and performing data communication with the backend server; the user interface 1003 is mainly used for connecting a client (user side) and performing data communication with the client; and the processor 1001 may be configured to call the voice data collecting program stored in the memory 1005 and perform the following operations:
acquiring a voice signal with noise;
estimating a voice ratio estimation value of a voice signal in the voice signal with noise;
judging whether the voice ratio estimation value is smaller than a preset voice ratio value or not;
and if the voice ratio estimation value is smaller than the preset voice ratio value, carrying out noise reduction processing on the voice signal with noise to obtain a noise reduction voice signal.
Further, the processor 1001 may be configured to call the voice data collecting program stored in the memory 1005, and further perform the following operations:
the step of estimating the voice ratio estimation value of the voice signal in the voice signal with noise comprises:
framing and windowing the voice signal with noise to obtain a voice frame signal with noise;
and respectively carrying out discrete Fourier transform on each voice frame signal with noise according to a time sequence to obtain a discrete magnitude spectrum group.
Further, the voice ratio estimation value is a posterior signal-to-noise ratio, and the step of estimating the voice ratio estimation value of the voice signal in the voice signal with noise includes:
based on the discrete magnitude spectrum group, acquiring a minimum magnitude spectrum by a minimum statistical method;
acquiring a noise power spectrum estimation value based on the minimum amplitude spectrum;
and obtaining the posterior signal-to-noise ratio based on the minimum amplitude spectrum and the noise power spectrum estimation value.
Further, the processor 1001 may be configured to call the voice data collecting program stored in the memory 1005, and further perform the following operations:
the step of performing noise reduction processing on the noisy speech signal comprises:
acquiring a voice amplitude spectrum of the previous frame at the same frequency as the noise power spectrum, and acquiring a voice power spectrum estimation value based on the voice amplitude spectrum;
and obtaining a prior signal-to-noise ratio based on the voice power spectrum estimated value, the noise power spectrum estimated value and the posterior signal-to-noise ratio.
Further, the step of obtaining a priori signal-to-noise ratio based on the speech power spectrum estimation value, the noise power spectrum estimation value, and the a posteriori signal-to-noise ratio comprises:
obtaining a first prior signal-to-noise ratio based on the voice power spectrum estimation value and the noise power spectrum estimation value;
and obtaining a second prior signal-to-noise ratio based on MMSE according to the posterior signal-to-noise ratio and the first prior signal-to-noise ratio, wherein the first prior signal-to-noise ratio is the prior signal-to-noise ratio of the previous frame at the same frequency as the second prior signal-to-noise ratio.
Further, the step of performing noise reduction processing on the noisy speech signal includes:
obtaining a noise reduction attenuation gain based on the second prior signal-to-noise ratio;
obtaining a noise reduction voice frame signal based on the noise reduction attenuation gain and the noisy voice frame signal;
and performing windowing and overlap-add processing on the noise-reduction voice frame signal to obtain a noise-reduction voice signal.
Further, the processor 1001 may be configured to call the voice data collecting program stored in the memory 1005, and further perform the following operations:
the step of obtaining a noisy speech signal comprises:
acquiring a negative original voice signal of a microphone of a reverse access circuit;
acquiring a positive original voice signal of a microphone of a forward access circuit;
and overlapping the negative original voice signal and the positive original voice signal to preliminarily remove noise to obtain a voice signal with noise.
Further, the processor 1001 may be configured to call the voice data collecting program stored in the memory 1005, and further perform the following operations:
after the step of judging whether the voice ratio estimation value is smaller than a preset voice ratio value, the method further comprises the following steps:
and if the posterior signal-to-noise ratio is greater than or equal to the preset value, not performing noise reduction processing on the voice signal with noise.
Referring to fig. 2, in a first embodiment of the voice data collecting method of the present invention, the voice data collecting method includes:
step S10, acquiring a voice signal with noise;
Noisy voice data is collected during the public security department's voiceprint information acquisition process, and the noisy voice signal is obtained from that noisy voice data.
Step S20, estimating the voice ratio estimation value of the voice signal in the voice signal with noise;
The voice ratio estimation value may be a posterior signal-to-noise ratio, obtained as follows. The noisy speech signal is first framed. A speech signal is non-stationary, while stationary random processes are the main tool for studying speech; because a speech signal is approximately stationary over short intervals, it is generally divided into a series of 10-30 ms segments called analysis frames. To allow a smooth transition between frames, a frame shift of about 0 to 1/2 of the frame length is generally used, the frame shift being the overlapping area between two adjacent frames. Each frame is then windowed: after framing, the beginning and end of each frame are discontinuous, and a Hamming window is generally applied to solve this problem. This yields the noisy speech frame signals. A discrete Fourier transform is performed on each noisy speech frame signal in time order to obtain a discrete amplitude spectrum group. The discrete amplitude spectrum group is processed by a minimum statistical method to obtain a minimum amplitude spectrum, a noise power spectrum estimation value is calculated from the minimum amplitude spectrum, and the posterior signal-to-noise ratio is calculated from the minimum amplitude spectrum together with the noise power spectrum estimation value.
Alternatively, the noisy speech signal is processed by a time-domain processing technique to obtain a time-domain spectrum, and a preset number of larger-amplitude time-domain points that reflect the speech and noise energy are selected from it. The points may be taken directly in descending order of amplitude, or taken at intervals in descending order of amplitude. A VAD (voice activity detection) algorithm then judges whether each selected point belongs to speech or noise, and the points belonging to each are counted. Dividing the number of speech points by the total number of speech and noise points gives the proportion of the selected time-domain points that belong to speech, and this proportion is taken as the voice ratio estimation value of the noisy voice signal.
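The framing, windowing and transform stage described above can be sketched as follows. This is a minimal illustration in plain Python, not the patent's implementation: the function names, the naive O(N^2) DFT, and the choice to drop a trailing partial frame are all assumptions made for the example.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (symmetric form)."""
    if n == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def frame_signal(samples, frame_len, hop):
    """Split samples into overlapping frames; a trailing partial frame is dropped."""
    return [samples[s:s + frame_len]
            for s in range(0, len(samples) - frame_len + 1, hop)]


def dft_magnitude(frame):
    """Magnitudes of DFT bins 0..N/2 of one real-valued frame (naive DFT)."""
    n = len(frame)
    mags = []
    for k in range(n // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * i / n)
                  for i, x in enumerate(frame))
        mags.append(abs(acc))
    return mags


def magnitude_spectra(samples, frame_len, hop):
    """Frame, apply a Hamming window, and return one magnitude spectrum per frame."""
    win = hamming(frame_len)
    return [dft_magnitude([x * w for x, w in zip(frame, win)])
            for frame in frame_signal(samples, frame_len, hop)]
```

As a usage example under the same assumptions, a signal sampled at 8 kHz with 20 ms frames and half-frame overlap would use `magnitude_spectra(samples, 160, 80)`; a production system would normally use an FFT rather than this naive transform.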
Step S30, judging whether the voice ratio estimation value is smaller than a preset voice ratio value;
Whether the noisy voice needs noise reduction is judged from how much voice it contains: if the voice ratio estimation value is smaller than the preset voice ratio value, noise reduction processing is performed on the noisy voice; if the voice ratio estimation value is greater than or equal to the preset voice ratio value, no noise reduction processing is performed, in order to limit distortion of the voice.
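The time-domain estimate of step S20 and the decision of step S30 can be sketched together. The sketch below is illustrative only: `is_speech` is a hypothetical stand-in for a real VAD decision, `denoise` is a placeholder for whatever noise reduction routine is used, and selecting points strictly in descending order of amplitude is just one of the two selection options the text mentions.

```python
def voice_ratio_time_domain(samples, top_n, is_speech):
    """Share of the top_n largest-amplitude points that the VAD labels as speech.

    is_speech(index, value) -> bool is a hypothetical stand-in for a VAD algorithm.
    """
    ranked = sorted(range(len(samples)), key=lambda i: abs(samples[i]), reverse=True)
    picked = ranked[:top_n]
    if not picked:
        return 0.0
    speech_points = sum(1 for i in picked if is_speech(i, samples[i]))
    # Every picked point is labelled either speech or noise, so the
    # denominator (speech points + noise points) equals len(picked).
    return speech_points / len(picked)


def acquire(samples, estimate_ratio, denoise, preset_ratio):
    """Denoise only when the estimated voice ratio is below the preset value.

    Returns (signal, was_denoised).
    """
    if estimate_ratio(samples) < preset_ratio:
        return denoise(samples), True
    return samples, False
```

Passing the estimator and the denoiser in as callables keeps the gate independent of how the ratio is computed, so the same gate could be driven by a posterior signal-to-noise ratio instead.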
Step S40, if the voice ratio estimation value is smaller than a preset voice ratio value, carrying out noise reduction processing on the voice signal with noise to obtain a noise reduction voice signal.
The posterior signal-to-noise ratio is the ratio of the power of the noisy speech signal to the power of the noise signal, so it indicates how much noise the noisy speech signal contains. If the noise content is not high, that is, the posterior signal-to-noise ratio is greater than or equal to a preset signal-to-noise ratio, the voiceprint features in the noisy speech signal can still be collected accurately, and no noise reduction processing is needed. The preset signal-to-noise ratio may be set by the user or preset by the system; specifically, it reflects the user's trade-off between the authenticity of the speech signal and the quality of the speech data, and if the user values authenticity over quality, the preset signal-to-noise ratio is set to a small value. If the noise content is high, that is, the posterior signal-to-noise ratio is smaller than the preset signal-to-noise ratio, the noise has affected the collection of voiceprint features, and noise reduction processing must be performed on the noisy speech signal for the voiceprint features to be collected accurately.
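A posterior signal-to-noise ratio check of this kind might look like the sketch below. The text does not give formulas, so several things here are assumptions: the noise power per frequency bin is taken as the plain minimum of the squared magnitude over the tracked frames (a strong simplification of minimum statistics, which normally adds smoothing and bias compensation), the posterior signal-to-noise ratio uses the conventional definition of noisy power divided by noise power, and averaging over all frames and bins to get one number for the gate is an illustrative choice.

```python
def noise_power_min_stats(mag_spectra, floor=1e-12):
    """Per-bin noise power: minimum of |Y|^2 over the tracked frames (simplified)."""
    n_bins = len(mag_spectra[0])
    return [max(min(frame[k] ** 2 for frame in mag_spectra), floor)
            for k in range(n_bins)]


def posterior_snr(mag_frame, noise_power):
    """Per-bin posterior SNR of one frame: noisy power divided by noise power."""
    return [m ** 2 / n for m, n in zip(mag_frame, noise_power)]


def mean_posterior_snr(mag_spectra):
    """Average posterior SNR over all frames and bins (assumed summary statistic)."""
    noise = noise_power_min_stats(mag_spectra)
    values = [g for frame in mag_spectra for g in posterior_snr(frame, noise)]
    return sum(values) / len(values)


def needs_denoising(mag_spectra, preset_snr):
    """True when the posterior SNR falls below the preset SNR."""
    return mean_posterior_snr(mag_spectra) < preset_snr
```

Lowering `preset_snr` makes the gate trigger less often, which matches the trade-off described above: fewer recordings are altered, at the cost of keeping more noise.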
In this embodiment, a noisy speech signal is acquired, a voice ratio estimation value of the speech signal within the noisy speech signal is estimated, and whether the voice ratio estimation value is smaller than a preset voice ratio value is judged; if it is, noise reduction processing is performed on the noisy speech signal to obtain a noise-reduced speech signal. When speech data is collected, two microphones are provided, one connected to the circuit in the forward direction and one in the reverse direction, to perform preliminary noise reduction. When the proportion of speech in the noisy speech signal is too low, noise reduction processing is performed to ensure the quality of the speech data; when the proportion of speech is high, no noise reduction processing is performed, which preserves the authenticity of the speech data. Whether the noisy speech signal needs noise reduction is thus determined by how much speech it contains. The quality of acquired voice data is improved while the authenticity of the voice signal is preserved.
Referring to fig. 3, in a second embodiment of the voice data collecting method of the present invention, based on the first embodiment, the step of performing noise reduction processing on the noisy voice signal includes:
step S50, obtaining noise reduction attenuation gain based on the second prior signal-to-noise ratio;
The noisy speech signal is framed and windowed to obtain noisy speech frame signals. Framing is needed because a speech signal as a whole is not stationary, whereas stationary random processes are the main tool for studying speech signals; over a short time a speech signal is approximately stationary, so it is generally divided into a series of speech segments of 10-30 ms, called analysis frames. To give a smooth transition between frames, a frame shift of about 0 to 1/2 of the frame duration is generally used, the frame shift being taken here as the overlapping region between two adjacent frames. Windowing is needed because, after framing, each frame is discontinuous at its beginning and end; a Hamming window is generally applied to reduce this problem.
A discrete Fourier transform is performed on each noisy speech frame signal in time order to obtain a group of discrete magnitude spectra. The group is processed by the minimum statistics method to obtain a minimum amplitude spectrum, and a noise power spectrum estimate is calculated from the minimum amplitude spectrum. The posterior signal-to-noise ratio is calculated from the minimum amplitude spectrum and the noise power spectrum estimate.
Through a one-frame delay, an estimate of the speech amplitude spectrum at the same frequency in the frame preceding the noise power spectrum is obtained, and a speech power spectrum estimate is calculated from it. A first prior signal-to-noise ratio is obtained from the speech power spectrum estimate and the noise power spectrum estimate. The first prior signal-to-noise ratio and the posterior signal-to-noise ratio are then substituted into the MMSE (minimum mean square error) based estimation formula to obtain a second prior signal-to-noise ratio, and the noise reduction attenuation gain is calculated from the second prior signal-to-noise ratio.
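The framing, windowing, transform and posterior signal-to-noise steps described above can be sketched as follows. This is a minimal illustration and not the patented implementation: the function names are hypothetical, the DFT is computed directly from its definition (a real system would use an FFT library), the noise power is taken as the squared per-bin minimum over all frames as a crude stand-in for minimum statistics, and the posterior signal-to-noise ratio is assumed to follow the common definition of noisy power divided by noise power.

```python
import cmath
import math


def frame_signal(samples, frame_len, hop):
    """Split samples into frames of frame_len, advancing by hop samples."""
    last_start = len(samples) - frame_len
    return [samples[i:i + frame_len] for i in range(0, last_start + 1, hop)]


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]


def magnitude_spectrum(frame):
    """Magnitudes of DFT bins 0..N/2, computed directly from the definition."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def posterior_snr(mags, eps=1e-12):
    """Return (per-frame posterior SNR, per-bin noise power).

    Assumption: the noise power in each bin is the squared minimum magnitude
    seen in that bin over all frames, a crude stand-in for minimum statistics.
    """
    n_bins = len(mags[0])
    noise_power = [min(frame[k] for frame in mags) ** 2 for k in range(n_bins)]
    snr = [[frame[k] ** 2 / (noise_power[k] + eps) for k in range(n_bins)]
           for frame in mags]
    return snr, noise_power


# Example: 8-sample frames with a 4-sample shift over a short test signal.
signal = [math.sin(0.3 * t) for t in range(32)]
window = hamming(8)
frames = [[w * x for w, x in zip(window, f)] for f in frame_signal(signal, 8, 4)]
spectra = [magnitude_spectrum(f) for f in frames]
snr, noise_power = posterior_snr(spectra)
```

At a realistic sampling rate the frame length would be chosen to cover 10-30 ms, as the description states; the 8-sample frames here only keep the example small.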
Step S60, obtaining a noise reduction speech frame signal based on the noise reduction attenuation gain and the noisy speech frame signal;
The noise reduction attenuation gain represents the degree of noise reduction required by the noisy speech: the larger the noise content of the noisy speech, the smaller the gain. In theory, the gain is close to 1 for a speech signal and close to 0 for a noise signal. The noisy speech frame signal is processed according to the calculated noise reduction attenuation gain to obtain the noise reduction speech frame signal.
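As a rough illustration of how a gain with this behaviour can be computed and applied per frequency bin, the sketch below uses the widely known decision-directed blend and a Wiener-type gain. This section does not spell out the MMSE-based formula, so the blend, the smoothing factor `alpha` and all function names are assumptions made for the example.

```python
def first_prior_snr(prev_speech_power, noise_power, eps=1e-12):
    """First prior SNR: previous-frame speech power over noise power, per bin."""
    return prev_speech_power / (noise_power + eps)


def second_prior_snr(first_prior, post_snr, alpha=0.98):
    """Decision-directed blend (assumed stand-in for the MMSE-based formula)."""
    return alpha * first_prior + (1.0 - alpha) * max(post_snr - 1.0, 0.0)


def attenuation_gain(prior_snr):
    """Wiener-type gain: near 1 for speech-dominated bins, near 0 for noise."""
    return prior_snr / (1.0 + prior_snr)


def denoise_frame(noisy_mags, gains):
    """Scale each noisy magnitude bin by its gain."""
    return [g * m for g, m in zip(gains, noisy_mags)]


# Example: one speech-dominated bin and one noise-dominated bin.
xi = [second_prior_snr(first_prior_snr(9.0, 1.0), 10.0),
      second_prior_snr(first_prior_snr(0.0, 1.0), 1.0)]
gains = [attenuation_gain(x) for x in xi]
clean = denoise_frame([3.2, 1.0], gains)
```

In the example the speech-dominated bin keeps about 90% of its magnitude while the noise-only bin is suppressed to zero, matching the behaviour described above.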
Step S70, performing windowing and overlap-add processing on the noise reduction speech frame signal to obtain a noise reduction speech signal.
The noise reduction speech frame signals are windowed and then overlap-added, so that the overlapping regions of adjacent frames are summed, to obtain the noise reduction speech signal.
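The overlap-add step can be sketched as below, assuming the frames are already back in the time domain. The helper names are hypothetical, and the periodic Hann window is chosen only because its shifted copies sum to exactly 1 at a half-frame shift; the description itself names a Hamming window for analysis.

```python
import math


def periodic_hann(n):
    """Periodic Hann window; copies shifted by n/2 sum to 1."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * k / n) for k in range(n)]


def overlap_add(frames, hop):
    """Sum frames into one signal, placing frame i at sample offset i * hop."""
    frame_len = len(frames[0])
    out = [0.0] * (hop * (len(frames) - 1) + frame_len)
    for i, frame in enumerate(frames):
        for k, value in enumerate(frame):
            out[i * hop + k] += value
    return out


# Example: three windowed frames of a constant signal, half-frame shift.
window = periodic_hann(4)
frames = [list(window) for _ in range(3)]
rebuilt = overlap_add(frames, hop=2)
```

Away from the first and last half frame, where only one window contributes, the rebuilt samples equal the original constant value.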
In this embodiment, the input of the Wiener filtering algorithm is the noisy speech signal. The difference between the expected output and the actual output is the error, and the mean of the squared error is the mean square error. The smaller the mean square error, the closer the actual output is to the expected output, that is, the better the noise removal effect. To minimize the mean square error, MMSE is used to estimate the prior signal-to-noise ratio, so that the noise reduction processing of the noisy speech is effective.
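The link between minimizing the mean square error and a gain that depends on the prior signal-to-noise ratio can be made explicit with the standard Wiener derivation for a single frequency bin. The symbols below (Y, S, D, G, P_s, P_d, ξ) are assumptions introduced for this illustration and do not appear in the description.

```latex
% Assumed model: Y = S + D, with speech S and noise D zero-mean and uncorrelated,
% and a real gain G applied to the noisy coefficient Y.
J(G) = E\left[\,|S - G\,Y|^2\,\right]
     = (1 - G)^2\,P_s + G^2\,P_d ,
\qquad P_s = E\left[|S|^2\right],\; P_d = E\left[|D|^2\right]

\frac{\partial J}{\partial G} = -2\,(1 - G)\,P_s + 2\,G\,P_d = 0
\;\Longrightarrow\;
G = \frac{P_s}{P_s + P_d} = \frac{\xi}{1 + \xi},
\qquad \xi = \frac{P_s}{P_d}
```

Under these assumptions the gain tends to 1 when the prior signal-to-noise ratio ξ is large and to 0 when it is small, which agrees with the behaviour of the noise reduction attenuation gain described for step S60.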
Referring to fig. 4, in a third embodiment of the voice data collecting method according to the present invention, based on the first embodiment, the step of acquiring a noisy voice signal includes:
step S80, acquiring a negative original voice signal of a microphone of the reverse access circuit;
The negative original voice signal of the microphone of the reverse access circuit is collected. Dual-microphone noise reduction works by cancellation between signals of opposite polarity. Of the two microphones, the microphone of the forward access circuit is closer to the sound source of the voice to be collected and the microphone of the reverse access circuit is farther from it, so the voice signal collected by the microphone of the forward access circuit is stronger than that collected by the microphone of the reverse access circuit. The microphone of the reverse access circuit is responsible for collecting the negative original voice signal. Generally, the two microphones are assembled in one device; in practice, however, the larger the distance between them the better, provided that both can still receive the voice signal to be collected. Because the distance from a typical noise source to the microphones is much larger than the distance from the voice source to the microphones, the noise signals collected by the two microphones are of nearly equal strength and can cancel each other, whereas the voice signals collected by the two microphones differ in strength, so a voice signal remains after cancellation.
Of course, dual microphones cannot remove one hundred percent of the noise; they can only reduce it. In some scenes dual-microphone noise reduction performs very poorly, for example when the noise intensity is too high (dual-microphone noise reduction can generally cancel only noise whose intensity lies within a certain range) or when the noise source is particularly close to the voice source.
Step S90, acquiring a positive original voice signal of a microphone of a positive access circuit;
The positive original voice signal of the microphone of the forward access circuit is collected.
Step S100, superimposing the negative original voice signal and the positive original voice signal on each other to preliminarily remove noise, obtaining the noisy voice signal.
Because the negative original voice signal collected in step S80 and the positive original voice signal collected in step S90 are in opposite phase, superimposing them removes noise; the superposition is equivalent to subtracting the signals picked up by the two microphones. In theory the noise components cancel each other, while the voice components cancel only partially because the two microphones collect the voice at different strengths, so a voice signal is finally obtained. However, the noise in the original voice signals may be too strong, or the noise source may be close to one of the two microphones during collection, in which case the noise is not completely removed and the signal obtained after dual-microphone noise reduction still contains noise.
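The superposition in step S100 can be illustrated with a toy numerical example. The 0.3 attenuation of the voice at the far microphone and the idealized assumption that both microphones receive identical noise are made up for this sketch; real recordings also involve delays and unequal noise levels that it ignores.

```python
def superimpose(positive, negative):
    """Add the positive and negative original signals sample by sample."""
    return [p + n for p, n in zip(positive, negative)]


# Toy signals: the same noise reaches both microphones, while the voice
# is weaker at the far (reverse-connected) microphone.
voice = [0.0, 1.0, -1.0, 0.5]
noise = [0.2, -0.1, 0.3, 0.05]
far_voice_level = 0.3  # hypothetical attenuation of the voice at the far microphone

positive = [v + d for v, d in zip(voice, noise)]
negative = [-(far_voice_level * v + d) for v, d in zip(voice, noise)]

noisy_voice = superimpose(positive, negative)
```

In this idealized case the noise cancels completely and 70% of the voice amplitude remains, which is why the result is a weaker but real voice signal rather than an estimate.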
In this embodiment, when the original voice signal is collected, a dual-microphone noise reduction system performs preliminary noise reduction on it to obtain the noisy voice signal required by the subsequent steps. Because the output of dual-microphone noise reduction is a real voice signal of relatively weak intensity rather than an estimate, the authenticity of the voice signal is preserved. Moreover, because preliminary noise reduction has already been applied, relatively fewer noisy voice signals subsequently need noise reduction by Wiener filtering, which further improves the authenticity of the voice signal and avoids excessive distortion of the voice signal.
The present invention also provides a terminal, including: a memory, a processor, and a voice data acquisition program stored on the memory and executable on the processor, wherein the voice data acquisition program, when executed by the processor, implements the steps of the voice data acquisition method in each of the above embodiments.
The invention also provides a computer-readable storage medium, on which a voice data acquisition program is stored, which, when executed by a processor, implements the steps of the embodiments of the voice data acquisition method described above.
The embodiments of the terminal and the computer-readable storage medium of the present invention include all technical features of the embodiments of the voice data acquisition method; their description and explanation are substantially the same as for the method embodiments and are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A method for voice data acquisition, comprising:
acquiring a voice signal with noise;
estimating a voice ratio estimation value of a voice signal in the voice signal with noise;
judging whether the voice ratio estimation value is smaller than a preset voice ratio value or not;
and if the voice ratio estimation value is smaller than the preset voice ratio value, carrying out noise reduction processing on the voice signal with noise to obtain a noise reduction voice signal.
2. The voice data acquisition method of claim 1, wherein the step of estimating a voice ratio estimation value of a voice signal in the voice signal with noise is preceded by the steps of:
framing and windowing the voice signal with noise to obtain a voice frame signal with noise;
and respectively carrying out discrete Fourier transform on each voice frame signal with noise according to a time sequence to obtain a discrete magnitude spectrum group.
3. The voice data acquisition method of claim 2, wherein the voice ratio estimation value is a posterior signal-to-noise ratio, and wherein the step of estimating the voice ratio estimation value of the voice signal in the voice signal with noise comprises:
based on the discrete magnitude spectrum group, acquiring a minimum magnitude spectrum by a minimum statistical method;
acquiring a noise power spectrum estimation value based on the minimum amplitude spectrum;
and obtaining the posterior signal-to-noise ratio based on the minimum amplitude spectrum and the noise power spectrum estimation value.
4. The speech data acquisition method of claim 3 wherein said step of denoising said noisy speech signal comprises:
acquiring a voice amplitude spectrum of the same frequency of a previous frame of the noise power spectrum, and acquiring a voice power spectrum estimation value based on the voice amplitude spectrum;
and obtaining a prior signal-to-noise ratio based on the voice power spectrum estimated value, the noise power spectrum estimated value and the posterior signal-to-noise ratio.
5. The speech data acquisition method of claim 4 wherein said step of deriving a priori signal-to-noise ratio based on said estimate of the speech power spectrum, said estimate of the noise power spectrum, and said a posteriori signal-to-noise ratio comprises:
obtaining a first prior signal-to-noise ratio based on the voice power spectrum estimation value and the noise power spectrum estimation value;
and obtaining a second prior signal-to-noise ratio based on MMSE according to the posterior signal-to-noise ratio and the first prior signal-to-noise ratio, wherein the first prior signal-to-noise ratio is the prior signal-to-noise ratio of the previous frame of the same-frequency voice of the second prior signal-to-noise ratio.
6. The speech data acquisition method of claim 5 wherein said step of denoising said noisy speech signal comprises:
obtaining a noise reduction attenuation gain based on the second prior signal-to-noise ratio;
obtaining a noise reduction voice frame signal based on the noise reduction attenuation gain and the noisy voice frame signal;
and performing windowing and overlap-add processing on the noise-reduction voice frame signal to obtain a noise-reduction voice signal.
7. The voice data acquisition method of claim 1, wherein the step of acquiring a noisy voice signal is preceded by:
acquiring a negative original voice signal of a microphone of a reverse access circuit;
acquiring a positive original voice signal of a microphone of a forward access circuit;
and overlapping the negative original voice signal and the positive original voice signal to preliminarily remove noise to obtain a voice signal with noise.
8. The method for collecting voice data according to claim 1, wherein the step of determining whether the estimated voice ratio value is smaller than a preset voice ratio value further comprises:
and if the posterior signal-to-noise ratio is greater than or equal to the preset value, not performing noise reduction processing on the voice signal with noise.
9. A terminal, characterized in that the terminal comprises: a memory, a processor and a voice data acquisition program stored on the memory and running on the processor, the voice data acquisition program when executed by the processor implementing the steps of the voice data acquisition method of any one of claims 1 to 8.
10. A readable storage medium, characterized in that the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the speech data acquisition method according to any one of claims 1 to 8.
CN201911248621.0A 2019-12-06 2019-12-06 Voice data acquisition method, terminal and readable storage medium Pending CN110970051A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911248621.0A CN110970051A (en) 2019-12-06 2019-12-06 Voice data acquisition method, terminal and readable storage medium

Publications (1)

Publication Number Publication Date
CN110970051A true CN110970051A (en) 2020-04-07

Family

ID=70033397

Country Status (1)

Country Link
CN (1) CN110970051A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20200407