CN110322891A - Voice signal processing method, device, terminal and storage medium - Google Patents
Voice signal processing method, device, terminal and storage medium
- Publication number
- CN110322891A CN201910593752.6A
- Authority
- CN
- China
- Prior art keywords
- voice signal
- band voice
- narrow band
- frequency domain
- domain character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L19/00—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
- G10L19/04—Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis using predictive techniques
- G10L19/16—Vocoder architecture
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Signal Processing (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Compression, Expansion, Code Conversion, And Decoders (AREA)
Abstract
The embodiments of the invention disclose a voice signal processing method, device, terminal and storage medium. The method comprises: obtaining a compressed narrowband voice signal; extracting frequency-domain features of the narrowband voice signal; inputting the frequency-domain features of the narrowband voice signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band voice signal; and converting the frequency-domain features of the full-band voice signal into a power spectrum of the full-band voice signal and applying an inverse Fourier transform to that power spectrum, using the phase information of the corresponding narrowband signal, to obtain the full-band voice signal. By using a deep denoising autoencoder neural network model to restore the bandwidth of the compressed narrowband voice signal, the embodiments improve the quality and intelligibility of the voice signal.
Description
Technical field
The embodiments of the present invention relate to the field of voice processing technology, and in particular to a voice signal processing method, device, terminal and storage medium.
Background
Voice signals are one of the important ways in which humans communicate. With the rapid development of science and technology in particular, voice signals need to be transmitted between mobile phones and computers. Transmission requires compression coding of the voice signal to remove its redundancy and to reduce the transmission bit rate or storage space, so compression of voice signals is particularly important.
The vocoder first appeared at AT&T Labs in the United States and is mainly used for signal band compression, voice storage communication and secure communication. Channel vocoders are widely used for voice signal compression coding. A channel vocoder first extracts the frequency-domain feature parameters of the voice signal and encodes and encrypts them, and then recovers the original voice waveform from the feature parameters. Its working process is as follows: the time-frequency spectrum information of the voice signal is input to the vocoder; the bandpass filters in the vocoder divide the voice signal into signals in different channels with adjacent frequency bands; a Hilbert transform and a low-pass filter are then used to extract the envelope of each signal; a sinusoidal signal is used as a carrier and is amplitude-modulated by the extracted envelope information; finally, the processed signals are synthesized into an output voice signal.
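The channel-vocoder working process described above can be sketched for a single channel as follows. This is a minimal illustration under stated assumptions, not the vocoder used in the patent: the ideal DFT-based band-pass and low-pass steps, the choice of the band centre as carrier frequency, and all function names are hypothetical; only the sequence of band-pass filtering, Hilbert envelope extraction, low-pass filtering and sinusoidal-carrier modulation comes from the text.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Naive inverse DFT; returns complex samples."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]

def bin_freq(k, n, fs):
    """Signed frequency in Hz of DFT bin k."""
    return (k if k <= n // 2 else k - n) * fs / n

def band_filter(x, fs, lo, hi):
    """Ideal filter (an assumption): keep only bins with lo <= |f| <= hi."""
    n = len(x)
    spec = dft(x)
    kept = [s if lo <= abs(bin_freq(k, n, fs)) <= hi else 0
            for k, s in enumerate(spec)]
    return [v.real for v in idft(kept)]

def hilbert_envelope(x):
    """Envelope = magnitude of the analytic signal (Hilbert transform via DFT)."""
    n = len(x)
    spec = dft(x)
    for k in range(n):
        if k == 0 or (n % 2 == 0 and k == n // 2):
            continue                  # DC and Nyquist bins stay unchanged
        elif k < (n + 1) // 2:
            spec[k] *= 2              # double the positive frequencies
        else:
            spec[k] = 0               # discard the negative frequencies
    return [abs(v) for v in idft(spec)]

def vocode_channel(x, fs, lo, hi, env_cutoff):
    """Band-pass, extract and smooth the envelope, then modulate a sine carrier."""
    band = band_filter(x, fs, lo, hi)
    env = band_filter(hilbert_envelope(band), fs, 0.0, env_cutoff)  # low-pass
    fc = (lo + hi) / 2                # carrier at the band centre (assumption)
    return [e * math.sin(2 * math.pi * fc * t / fs) for t, e in enumerate(env)]
```

A full channel vocoder would run `vocode_channel` once per band and sum the outputs; the envelope cutoff `env_cutoff` plays the role of the vocoder's low-pass cutoff frequency.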
However, the vocoder exploits the human ear's insensitivity to the phase of voice signals: during analysis and synthesis it places requirements only on the amplitude spectrum of the signal. The waveform of a vocoder-synthesized voice signal is therefore difficult to compare with that of the original voice signal, and the quality and intelligibility of vocoder-synthesized speech can only be measured by subjective scoring. In addition, while transmitting model parameters gives the vocoder a good band compression effect, it also considerably harms the naturalness of the voice signal. Especially when a single-channel vocoder is used, the synthesized narrowband voice signal discards many details, which reduces the quality and intelligibility of the narrowband voice signal.
Summary of the invention
The embodiments of the present invention provide a voice signal processing method, device, server and storage medium, to improve the quality and intelligibility of voice signals.
In a first aspect, an embodiment of the present invention provides a voice signal processing method, comprising:
Obtaining a compressed narrowband voice signal;
Extracting frequency-domain features of the narrowband voice signal;
Inputting the frequency-domain features of the narrowband voice signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band voice signal;
Converting the frequency-domain features of the full-band voice signal into a power spectrum of the full-band voice signal, and applying an inverse Fourier transform to the power spectrum of the full-band voice signal to obtain the full-band voice signal.
Optionally, the frequency-domain features are mel-frequency cepstral coefficients.
Optionally, the deep denoising autoencoder neural network model uses the sigmoid function as its activation function, and the number of hidden layers is set to 2 to 4.
Optionally, obtaining the compressed narrowband voice signal comprises:
Inputting an original voice signal into a vocoder for compression, to obtain the compressed narrowband voice signal;
Pre-processing the narrowband voice signal.
Optionally, the vocoder is a channel vocoder.
Optionally, the low-pass cutoff frequency of the vocoder is set to 100 Hz, 300 Hz or 500 Hz.
Optionally, pre-processing the narrowband voice signal comprises:
Applying pre-emphasis to the narrowband voice signal, to obtain a pre-emphasized narrowband voice signal;
Resampling the pre-emphasized narrowband voice signal, to obtain a resampled narrowband voice signal;
Framing the resampled narrowband voice signal and smoothing it with a window, to obtain a framed narrowband voice signal;
Performing voice activity detection on the framed narrowband voice signal, to obtain a narrowband voice activity signal with the silent segments removed.
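The four pre-processing steps listed above can be sketched in plain Python as follows. The source does not give any parameter values or algorithms for these steps, so everything concrete here is an assumption made for illustration: the pre-emphasis coefficient 0.97, linear-interpolation resampling without an anti-aliasing filter, the Hamming window, the energy-threshold voice activity detector, and all function names.

```python
import math

def pre_emphasis(x, alpha=0.97):
    """y[n] = x[n] - alpha * x[n-1]; alpha = 0.97 is an assumed, commonly used value."""
    if not x:
        return []
    return [x[0]] + [x[i] - alpha * x[i - 1] for i in range(1, len(x))]

def resample_linear(x, fs_in, fs_out):
    """Resample by linear interpolation (assumption; no anti-aliasing filter)."""
    n_out = int(len(x) * fs_out / fs_in)
    out = []
    for i in range(n_out):
        pos = i * fs_in / fs_out          # position in the input signal
        j = int(pos)
        frac = pos - j
        nxt = x[min(j + 1, len(x) - 1)]   # clamp at the last sample
        out.append((1 - frac) * x[j] + frac * nxt)
    return out

def frame_and_window(x, frame_len, hop):
    """Split into overlapping frames and apply a Hamming window to each frame."""
    win = [0.54 - 0.46 * math.cos(2 * math.pi * i / (frame_len - 1))
           for i in range(frame_len)]
    return [[x[s + i] * win[i] for i in range(frame_len)]
            for s in range(0, len(x) - frame_len + 1, hop)]

def energy_vad(frames, threshold):
    """Keep only frames whose mean energy exceeds the threshold (silence removal)."""
    return [f for f in frames if sum(v * v for v in f) / len(f) > threshold]

def preprocess(x, fs_in, fs_out, frame_len, hop, threshold):
    """Pre-emphasis, resampling, framing with windowing, then voice activity detection."""
    y = resample_linear(pre_emphasis(x), fs_in, fs_out)
    return energy_vad(frame_and_window(y, frame_len, hop), threshold)
```

For example, `preprocess(signal, 8000, 4000, 40, 20, 1e-3)` would return the windowed 40-sample frames that the energy detector considers voiced; a production system would normally use a proper polyphase resampler and a more robust detector.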
In a second aspect, an embodiment of the present invention provides a voice signal processing device, comprising:
A narrowband voice signal obtaining module, for obtaining a compressed narrowband voice signal;
A narrowband frequency-domain feature extraction module, for extracting frequency-domain features of the narrowband voice signal;
A full-band frequency-domain feature obtaining module, for inputting the frequency-domain features of the narrowband voice signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band voice signal;
A full-band voice signal obtaining module, for converting the frequency-domain features of the full-band voice signal into a power spectrum of the full-band voice signal, and applying an inverse Fourier transform to the power spectrum of the full-band voice signal to obtain the full-band voice signal.
In a third aspect, an embodiment of the present invention further provides a terminal, comprising:
One or more processors;
A storage device, for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice signal processing method provided by any embodiment of the present invention.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the voice signal processing method provided by any embodiment of the present invention.
The embodiments of the present invention extract the frequency-domain features of the compressed narrowband voice signal, input these features into a trained deep denoising autoencoder neural network model for nonlinear fitting to obtain the frequency-domain features of a full-band voice signal, convert the frequency-domain features of the full-band voice signal into the power spectrum of the full-band voice signal, and then apply an inverse Fourier transform to obtain the full-band voice signal. The compressed narrowband voice signal is thereby restored to a full-band voice signal, improving the quality and intelligibility of the voice signal.
Brief description of the drawings
Fig. 1 is a flowchart of a voice signal processing method provided by Embodiment one of the present invention;
Fig. 2 is a flowchart of a voice signal processing method provided by Embodiment two of the present invention;
Fig. 3 is a flowchart of a method for pre-processing a narrowband voice signal provided by Embodiment three of the present invention;
Fig. 4 is a structural schematic diagram of a voice signal processing device provided by Embodiment four of the present invention;
Fig. 5 is a structural schematic diagram of a terminal provided by Embodiment five of the present invention.
Detailed description of the embodiments
The present invention is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.
Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as sequential processing, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps can be rearranged. The processing may be terminated when its operations are completed, but it may also have additional steps not included in the drawings. The processing may correspond to a method, function, procedure, subroutine, subprogram, and so on.
In addition, the terms "first", "second" and the like may be used herein to describe various directions, actions, steps or elements, but these directions, actions, steps or elements are not limited by these terms. The terms serve only to distinguish one direction, action, step or element from another. For example, without departing from the scope of the present application, a first training sample could be called a second training sample and, similarly, a second training sample could be called a first training sample. The first training sample and the second training sample are both training samples, but they are not the same training sample. The terms "first", "second" and the like are not to be understood as indicating or implying relative importance or as implicitly indicating the number of technical features referred to. A feature defined as "first" or "second" may therefore explicitly or implicitly include one or more such features. In the description of the present invention, "multiple" and "batch" mean at least two, for example two or three, unless otherwise explicitly and specifically limited.
Embodiment one
Fig. 1 is a flowchart of a voice signal processing method provided by Embodiment one of the present invention. This embodiment is applicable to restoring the bandwidth of a narrowband voice signal output by a vocoder. The method can be executed by a voice signal processing device, which can be implemented in software and/or hardware and can be integrated in a terminal such as a smartphone, tablet computer, personal computer (PC) or learning machine.
As shown in Fig. 1, the voice signal processing method provided by Embodiment one of the present invention may include:
S101, obtaining a compressed narrowband voice signal;
Specifically, the form in which speech is transmitted between electronic devices is called a voice signal. Transmitting a voice signal first requires compression coding of the speech, to remove the redundancy in the unprocessed original voice signal and to reduce the transmission bit rate or storage space. After the unprocessed original voice signal is input to a vocoder and band-compressed, it becomes a narrowband voice signal, whose voice quality and intelligibility are lower than those of the original voice signal. Therefore, the compressed narrowband voice signal must first be obtained and then undergo the subsequent processing that converts it into a full-band voice signal, so as to improve its voice quality and intelligibility.
S102, extracting frequency-domain features of the narrowband voice signal;
Specifically, the features of a voice signal fall into two main classes, time-domain features and frequency-domain features. Time-domain features include short-time average energy, short-time average zero-crossing rate, formants and pitch period. Frequency-domain features include linear prediction coefficients (Linear Predictive Coding, LPC), linear prediction cepstral coefficients (Linear Prediction Cepstral Coefficients, LPCC), line spectrum pair parameters (Line Spectrum Pairs, LSP), the short-time spectrum and mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC). When the frequency-domain features of the narrowband voice signal are extracted, the mel-frequency cepstral coefficients of the narrowband voice signal are preferably extracted for use in the subsequent voice signal processing.
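As an illustration of the preferred feature, the sketch below computes MFCCs for one already windowed frame using the textbook recipe: power spectrum, triangular mel filterbank, logarithm, DCT-II. The source does not specify how its MFCCs are computed, so the mel formula constants, the 20 filters, the 13 coefficients and all function names are assumptions.

```python
import cmath
import math

def hz_to_mel(f):
    """A common mel-scale formula (assumption; other variants exist)."""
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def power_spectrum(frame):
    """One-sided power spectrum |X[k]|^2 / N, computed with a naive DFT."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                    for t in range(n))) ** 2 / n
            for k in range(n // 2 + 1)]

def mel_filterbank(n_filters, n_fft, fs):
    """Triangular filters whose edges are equally spaced on the mel scale."""
    top = hz_to_mel(fs / 2)
    edges_hz = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [n_fft * f / fs for f in edges_hz]      # fractional bin positions
    bank = []
    for m in range(1, n_filters + 1):
        left, centre, right = bins[m - 1], bins[m], bins[m + 1]
        row = []
        for k in range(n_fft // 2 + 1):
            if left < k <= centre:
                row.append((k - left) / (centre - left))     # rising slope
            elif centre < k < right:
                row.append((right - k) / (right - centre))   # falling slope
            else:
                row.append(0.0)
        bank.append(row)
    return bank

def mfcc(frame, fs, n_filters=20, n_coeffs=13):
    """MFCCs of one frame: log mel filterbank energies followed by a DCT-II."""
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), fs)
    log_e = [math.log(sum(w * p for w, p in zip(row, spec)) + 1e-10)
             for row in bank]                      # 1e-10 avoids log(0)
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_coeffs)]
```

Calling `mfcc(frame, fs)` on each pre-processed frame yields one feature vector per frame; stacking those vectors gives the input of the neural network model in the next step.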
S103, inputting the frequency-domain features of the narrowband voice signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band voice signal;
Specifically, a neural network is a complex network system formed by a large number of simple, widely interconnected processing units (called neurons). It reflects many basic features of human brain function and is a highly complex nonlinear dynamic learning system. Neural networks offer large-scale parallelism, distributed storage and processing, self-organization, adaptivity and self-learning, and are especially suited to information-processing problems that are imprecise and fuzzy and that require many factors and conditions to be considered at once. The role of the deep denoising autoencoder neural network model is to perform nonlinear fitting between the narrowband voice signal and the full-band voice signal; its use mainly comprises a training stage and a test stage.
In the training stage, the deep denoising autoencoder neural network model comprises an input layer, hidden layers and an output layer. The input layer receives the input signal of the model, the output layer outputs the output signal of the model, and the hidden layers perform the nonlinear matching between the input signal and the output signal. The activation function introduces nonlinearity into the model, so that the model can complete the nonlinear fitting of the input-output signals. Preferably, the sigmoid function is used as the activation function, the number of hidden layers is set to 3, and the number of neurons in each layer is set to 500. The frequency-domain features of the narrowband voice signals of a large number of first training samples are input to the model; the frequency-domain features of the full-band voice signal that the model outputs for each first training sample are compared with the original voice signal corresponding to that sample, and a loss function is calculated; according to the loss function result, the activation function is controlled to obtain the nonlinear parameter fitting result of the input-output signals, which completes the training of the deep denoising autoencoder neural network model.
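The training stage described above can be sketched as a small fully connected network with sigmoid hidden layers, trained by gradient descent. The source names only the sigmoid activation, the 3 hidden layers and the 500 neurons per layer; the mean-squared-error loss, the linear output layer, the plain gradient-descent update, the tiny layer sizes and all function names below are assumptions made to keep the example short.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def init_layers(sizes, seed=0):
    """sizes = [n_in, h1, h2, h3, n_out]; returns a list of (weights, biases)."""
    rng = random.Random(seed)
    return [([[rng.uniform(-0.5, 0.5) for _ in range(sizes[i])]
              for _ in range(sizes[i + 1])],
             [0.0] * sizes[i + 1])
            for i in range(len(sizes) - 1)]

def forward(layers, x):
    """Return the activations of every layer; hidden layers use sigmoid,
    the output layer is linear (assumption)."""
    acts = [x]
    for idx, (w, b) in enumerate(layers):
        z = [sum(wi * a for wi, a in zip(row, acts[-1])) + bi
             for row, bi in zip(w, b)]
        is_output = idx == len(layers) - 1
        acts.append(z if is_output else [sigmoid(v) for v in z])
    return acts

def mse(pred, target):
    """Mean squared error (assumed loss; the source does not name one)."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)

def train_step(layers, x, target, lr=0.1):
    """One gradient-descent step; returns the loss measured before the update."""
    acts = forward(layers, x)
    out = acts[-1]
    loss = mse(out, target)
    delta = [2.0 * (o - t) / len(out) for o, t in zip(out, target)]
    for idx in range(len(layers) - 1, -1, -1):
        w, b = layers[idx]
        a_prev = acts[idx]
        # Gradient with respect to the previous activations, taken before
        # the weights of this layer are modified.
        grad_prev = [sum(w[j][i] * delta[j] for j in range(len(w)))
                     for i in range(len(a_prev))]
        for j in range(len(w)):
            for i in range(len(a_prev)):
                w[j][i] -= lr * delta[j] * a_prev[i]
            b[j] -= lr * delta[j]
        if idx > 0:
            # Sigmoid derivative a * (1 - a) of the hidden layer below.
            delta = [g * a * (1.0 - a) for g, a in zip(grad_prev, a_prev)]
    return loss
```

With the preferred configuration the layer sizes would be `[n_in, 500, 500, 500, n_out]`, where `x` holds narrowband features and `target` the matching full-band features; a pure-Python loop is far too slow at that size, so a numerical library would be used in practice.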
The test stage of the deep denoising autoencoder neural network model is the stage in which the trained model is used. The frequency-domain features of the narrowband voice signals of second training samples are input to the model; the frequency-domain features of the full-band voice signal that the model outputs for each second training sample are compared with the original voice signal corresponding to that sample, and a loss function is calculated; the loss function result determines whether training of the deep denoising autoencoder neural network model needs to continue.
S104, converting the frequency-domain features of the full-band voice signal into a power spectrum of the full-band voice signal, and applying an inverse Fourier transform to the power spectrum of the full-band voice signal to obtain the full-band voice signal.
Specifically, this embodiment performs short-time Fourier analysis on the frequency-domain features of the full-band voice signal, computing the discrete Fourier transform of each overlapping windowed frame to obtain the power spectrum of each frame; an inverse Fourier transform is then applied to the power spectrum of the full-band voice signal to obtain the bandwidth-restored full-band voice signal.
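A power spectrum alone carries no phase, and the abstract states that the phase information of the corresponding narrowband signal is used for the inverse transform. The sketch below shows that recombination for one frame, followed by overlap-add of the reconstructed frames. It is an illustration under assumptions: the power spectrum is taken as |X[k]|^2 over all N bins, the conversion from MFCCs to a power spectrum is not shown, and the function names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Naive O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft_real(spec):
    """Naive inverse DFT; returns the real part of each sample."""
    n = len(spec)
    return [(sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def reconstruct_frame(power, phase):
    """Combine a full-band power spectrum with a (narrowband) phase spectrum
    and invert the result to a time-domain frame."""
    spec = [cmath.rect(math.sqrt(p), ph) for p, ph in zip(power, phase)]
    return idft_real(spec)

def overlap_add(frames, hop):
    """Sum overlapping frames back into one signal."""
    out = [0.0] * (hop * (len(frames) - 1) + len(frames[0]))
    for i, frame in enumerate(frames):
        for j, v in enumerate(frame):
            out[i * hop + j] += v
    return out
```

In use, `phase` would come from `cmath.phase` applied to the DFT of the matching narrowband frame, while `power` would come from the model output; when both are taken from the same frame, `reconstruct_frame` returns that frame unchanged up to rounding error.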
Compared with the prior art, Embodiment one of the present invention extracts the frequency-domain features of the compressed narrowband voice signal, inputs these features into a trained deep denoising autoencoder neural network model for nonlinear fitting to obtain the frequency-domain features of a full-band voice signal, converts the frequency-domain features of the full-band voice signal into the power spectrum of the full-band voice signal, and then applies an inverse Fourier transform to obtain the full-band voice signal. The compressed narrowband voice signal is thereby restored to a full-band voice signal, improving the quality and intelligibility of the voice signal.
Embodiment two
Fig. 2 is a flowchart of a voice signal processing method provided by Embodiment two of the present invention. This embodiment further refines the technical solution above. As shown in Fig. 2, the method specifically includes:
S201, inputting an original voice signal into a vocoder for compression, to obtain a compressed narrowband voice signal.
Specifically, the original voice signal is the unprocessed voice signal. A vocoder is a codec that analyzes and synthesizes speech, also called a speech analysis-synthesis system or a voice band compression system, and is mainly used for signal band compression, voice storage communication and secure communication. The original voice signal is input to the vocoder, the vocoder band-compresses it, and the voice signal that is output is the compressed narrowband voice signal.
Optionally, a channel vocoder can be used to compress the original voice signal, with the low-pass cutoff frequency of the vocoder set to 100 Hz, 300 Hz or 500 Hz. A channel vocoder divides the frequency range of the voice signal into many adjacent frequency bands, or channels, and the narrowband voice signal it outputs approximates the amplitude spectrum of the voice signal. A channel vocoder therefore places requirements only on the amplitude spectrum of the voice signal, and the voice signal it outputs suffers some loss in frequency band; the loss is most severe for the voice signal output by a single-channel vocoder.
S202, pre-processing the narrowband voice signal.
Specifically, before the narrowband voice signal is analyzed and processed, it must undergo pre-processing operations such as pre-emphasis, framing and windowing. The purpose of these operations is to eliminate the effect on voice signal quality of factors such as aliasing, higher-harmonic distortion and high frequencies that are introduced by the human vocal organs themselves and by the equipment used to acquire the voice signal, so that the signal obtained for subsequent voice processing is as uniform and smooth as possible, providing good parameters for speech recognition and improving voice processing quality.
S203, extracting frequency-domain features of the narrowband voice signal;
Specifically, the features of a voice signal fall into two main classes, time-domain features and frequency-domain features. Time-domain features include short-time average energy, short-time average zero-crossing rate, formants and pitch period. Frequency-domain features include linear prediction coefficients (Linear Predictive Coding, LPC), linear prediction cepstral coefficients (Linear Prediction Cepstral Coefficients, LPCC), line spectrum pair parameters (Line Spectrum Pairs, LSP), the short-time spectrum and mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC). When the frequency-domain features of the narrowband voice signal are extracted, the mel-frequency cepstral coefficients of the narrowband voice signal are preferably extracted for use in the subsequent voice signal processing.
S204, inputting the frequency-domain features of the narrowband voice signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band voice signal;
Specifically, a neural network is a complex network system formed by a large number of simple, widely interconnected processing units (called neurons). It reflects many basic features of human brain function and is a highly complex nonlinear dynamic learning system. Neural networks offer large-scale parallelism, distributed storage and processing, self-organization, adaptivity and self-learning, and are especially suited to information-processing problems that are imprecise and fuzzy and that require many factors and conditions to be considered at once. The role of the deep denoising autoencoder neural network model is to perform nonlinear fitting between the narrowband voice signal and the full-band voice signal; its use mainly comprises a training stage and a test stage.
In the training stage, the deep denoising autoencoder neural network model comprises an input layer, hidden layers and an output layer. The input layer receives the input signal of the model, the output layer outputs the output signal of the model, and the hidden layers perform the nonlinear matching between the input signal and the output signal. The model also needs an activation function in order to work properly: the activation function introduces nonlinearity into the model, so that the model can complete the nonlinear fitting of the input-output signals. Preferably, the sigmoid function is used as the activation function, the number of hidden layers is set to 3, and the number of neurons in each layer is set to 500. The frequency-domain features of the narrowband voice signals of a large number of first training samples are input to the model; the frequency-domain features of the full-band voice signal that the model outputs for each first training sample are compared with the original voice signal corresponding to that sample, and a loss function is calculated; according to the loss function result, the activation function is controlled to obtain the nonlinear parameter fitting result of the input-output signals, which completes the training of the deep denoising autoencoder neural network model.
The test stage of the deep denoising autoencoder neural network model is the stage in which the trained model is used. The frequency-domain features of the narrowband voice signals of second training samples are input to the model; the frequency-domain features of the full-band voice signal that the model outputs for each second training sample are compared with the original voice signal corresponding to that sample, and a loss function is calculated; the loss function result determines whether training of the deep denoising autoencoder neural network model needs to continue.
S205, converting the frequency-domain feature of the full-band speech signal into the power spectrum of the full-band speech signal, and performing an inverse Fourier transform on the power spectrum of the full-band speech signal to obtain the full-band speech signal.
Short-time Fourier analysis is performed on the frequency-domain feature of the full-band speech signal: the discrete Fourier transform of each overlapping windowed frame is calculated to obtain the power spectrum of each frame. An inverse Fourier transform is then applied to the power spectrum of the full-band speech signal to obtain the full-band speech signal with restored bandwidth.
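The transforms involved in this step can be illustrated for a single frame as follows. This is a minimal sketch under stated assumptions: the text does not say how phase information is handled when going back from a power spectrum to a waveform, so the round trip below inverts the complex spectrum (phase retained), and the 1/N scaling of the power spectrum is one common convention rather than something the text prescribes.

```python
import cmath
import math

def dft(frame):
    """Discrete Fourier transform of one (already windowed) frame."""
    n = len(frame)
    return [sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT; returns the real part, which is the time-domain frame."""
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n)
                 for k in range(n)) / n).real
            for t in range(n)]

def power_spectrum(spectrum):
    """Per-bin power |X[k]|^2 / N (normalisation chosen so Parseval's relation holds)."""
    n = len(spectrum)
    return [abs(c) ** 2 / n for c in spectrum]
```

With this normalisation the sum of the power spectrum equals the energy of the frame, which gives a quick sanity check on any implementation. A full reconstruction would additionally overlap-add the recovered frames.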
Optionally, to verify that the intelligibility of the restored full-band speech signal has improved, the speech signal can be assessed using short-time objective intelligibility (Short-Time Objective Intelligibility, STOI). STOI values lie in the range [0, 1]; the higher the score, the higher the intelligibility. Table 1 compares the STOI scores of the compressed narrowband speech signal and of the restored full-band speech signal according to the embodiment of the present invention. As can be seen from Table 1, regardless of the low-pass cut-off frequency of the vocoder, the intelligibility of the restored full-band speech signal is higher than that of the compressed narrowband speech signal.
Table 1: STOI assessment of the compressed narrowband speech signal and the restored full-band speech signal
In Embodiment two of the present invention, the original speech signal is input into a vocoder for band compression to obtain a narrowband speech signal, the compressed narrowband speech signal is pre-processed, and the frequency-domain feature of the narrowband speech signal is then extracted and input into the trained deep denoising autoencoder neural network model for band restoration, thereby obtaining the full-band speech signal. This not only restores the narrowband speech signal to a full-band speech signal but also preserves the amplitude spectrum of the full-band speech signal, which improves the quality of speech processing and increases the intelligibility of the speech signal.
Embodiment three
Fig. 3 is a flowchart of a method for pre-processing a narrowband speech signal provided by Embodiment three of the present invention. On the basis of the above embodiments, this embodiment further refines the pre-processing of the narrowband speech signal. As shown in Fig. 3, the method specifically includes:
S301, performing pre-emphasis on the narrowband speech signal to obtain a pre-emphasized narrowband speech signal.
Specifically, pre-emphasis is a signal processing technique that compensates the high-frequency components of the input signal at the transmitting end. A speech signal is heavily degraded during transmission; to obtain a reasonably good speech waveform at the receiving end, the degraded speech signal must be compensated. The idea of pre-emphasis is to boost the high-frequency components of the speech signal at the start of the transmission line, so as to compensate for the excessive attenuation of the high-frequency components during transmission. After pre-emphasis of the narrowband speech signal, the output signal levels are similar and the attenuation is greatly reduced.
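Pre-emphasis is commonly realised as a first-order high-pass difference filter, y[n] = x[n] - a·x[n-1]. The sketch below illustrates that common form; the text does not give the filter or its coefficient, so the filter shape and the value a = 0.97 are assumptions.

```python
def pre_emphasize(samples, coefficient=0.97):
    """First-order pre-emphasis: y[n] = x[n] - coefficient * x[n-1].

    The first sample is passed through unchanged because it has no predecessor.
    The default coefficient is a conventional choice, not a value from the text.
    """
    if not samples:
        return []
    out = [samples[0]]
    for n in range(1, len(samples)):
        out.append(samples[n] - coefficient * samples[n - 1])
    return out
```

A constant (low-frequency) input is almost cancelled by this filter, while a rapidly alternating (high-frequency) input is amplified, which is the boost of high-frequency components described above.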
S302, resampling the pre-emphasized narrowband speech signal to obtain a resampled narrowband speech signal.
Specifically, resampling converts the original sampling frequency into a new sampling frequency to meet different sampling-frequency requirements. According to the Nyquist sampling theorem, when the sampling frequency is greater than or equal to twice the highest frequency component of the signal, the sampled digital signal fully retains the information in the original signal, and the original signal can be recovered from the sampled data. The frequency range of speech is usually 50 Hz to 6 kHz, and the frequency range of musical instruments is roughly 50 Hz to 8 kHz. Preferably, the sampling frequency used to resample the pre-emphasized narrowband speech signal can be set to 16 kHz.
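The arithmetic behind the 16 kHz choice, and a deliberately naive resampler, can be sketched as follows. The text does not say which resampling algorithm is used; linear interpolation is chosen here only because it is short, and a practical resampler would normally also apply an anti-aliasing low-pass filter. Both function names are hypothetical.

```python
def min_sampling_rate(max_frequency_hz):
    """Nyquist criterion as stated in the text: at least twice the highest frequency."""
    return 2.0 * max_frequency_hz

def resample_linear(samples, source_rate, target_rate):
    """Resample by linear interpolation between neighbouring input samples.

    Illustrative only: no anti-aliasing filter is applied.
    """
    if not samples:
        return []
    n_out = int(len(samples) * target_rate / source_rate)
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        position = i * source_rate / target_rate   # fractional index into the input
        left = min(int(position), last)
        right = min(left + 1, last)                # hold the last sample at the end
        frac = position - left
        out.append((1.0 - frac) * samples[left] + frac * samples[right])
    return out
```

For the 8 kHz upper bound quoted for musical instruments, `min_sampling_rate(8000)` gives 16000 Hz, which matches the preferred resampling frequency.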
S303, performing framing and window smoothing on the resampled narrowband speech signal to obtain a framed narrowband speech signal.
Specifically, a speech signal is a random sequence that varies over time and, viewed globally, is not a stationary random process. Over a sufficiently short period, however, it can be regarded as approximately stationary. Such an approximately stationary stretch of speech is one frame, and its length is called the frame length. Cutting a speech signal into multiple approximately stationary segments is called framing. During framing, each frame must be windowed so that its amplitude tapers towards zero at both ends. The time difference between the start positions of two adjacent frames is called the frame shift, which is commonly taken as half the frame length. Optionally, the frame length can be set to 16 ms with a corresponding frame shift of 8 ms, and a Hamming window is used for window smoothing.
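With the optional values above and a 16 kHz sampling frequency, a frame holds 256 samples and consecutive frames start 128 samples apart. A minimal sketch of framing with a Hamming window follows; dropping the trailing partial frame is an assumption, since the text does not say how the end of the signal is handled.

```python
import math

def hamming_window(length):
    """Hamming window: w[n] = 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2.0 * math.pi * n / (length - 1))
            for n in range(length)]

def frame_signal(samples, sample_rate=16000, frame_ms=16, shift_ms=8):
    """Split a signal into overlapping frames and apply a Hamming window to each.

    Any trailing samples that do not fill a whole frame are dropped (assumed behaviour).
    """
    frame_len = sample_rate * frame_ms // 1000   # 256 samples at 16 kHz
    hop = sample_rate * shift_ms // 1000         # 128 samples at 16 kHz
    window = hamming_window(frame_len)
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        segment = samples[start:start + frame_len]
        frames.append([s * w for s, w in zip(segment, window)])
        start += hop
    return frames
```

Note that a Hamming window tapers each frame edge to 0.08 of its original amplitude rather than exactly to zero.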
S304, performing voice activity detection on the framed narrowband speech signal to obtain a narrowband speech activity signal with silent segments removed.
Specifically, the purpose of voice activity detection is to detect whether speech is present, that is, whether the speech signal contains silent segments; if it does, those silent segments are removed. Voice activity detection reduces the resources and space occupied during speech transmission, avoids encoding and transmitting silent data packets, and saves computation time and bandwidth.
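One simple way to remove silent segments is to threshold the short-time energy of each frame. The text does not specify the detection algorithm, so the energy criterion and the threshold value below are assumptions chosen purely for illustration.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def remove_silent_frames(frames, threshold=1e-4):
    """Keep only frames whose energy exceeds the threshold.

    The threshold is a hypothetical value; in practice it would be tuned to
    the signal scale and noise floor.
    """
    return [f for f in frames if f and frame_energy(f) > threshold]
```

For example, `remove_silent_frames([[0.0, 0.0], [0.5, -0.5]])` keeps only the second frame, since the first has zero energy.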
By performing pre-emphasis, resampling, framing and windowing, and voice activity detection on the narrowband speech signal, this embodiment of the present invention pre-processes the compressed narrowband speech signal and eliminates the influence on speech quality of factors such as aliasing, higher-harmonic distortion and high-frequency effects that arise during transmission. This ensures that the signal used in subsequent speech processing is more uniform and smooth, provides good parameters for speech recognition, and improves the quality of speech processing.
Embodiment four
Fig. 4 is a schematic structural diagram of a speech signal processing device provided by Embodiment four of the present invention. This embodiment is applicable to restoring the bandwidth of a narrowband speech signal output by a vocoder. The device can be implemented in software and/or hardware and can be integrated in a terminal, such as a smartphone, tablet computer, personal computer (PC) or learning machine. The speech signal processing device provided by this embodiment can perform the speech signal processing method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the method. For content not described in detail in Embodiment four of the present invention, refer to the description in any method embodiment of the present invention.
As shown in Fig. 4, a speech signal processing device 400 provided by Embodiment four of the present invention includes:
a narrowband speech signal acquisition module 401, configured to acquire a compressed narrowband speech signal;
a narrowband frequency-domain feature extraction module 402, configured to extract the frequency-domain feature of the narrowband speech signal;
a full-band frequency-domain feature acquisition module 403, configured to input the frequency-domain feature of the narrowband speech signal into the trained deep denoising autoencoder neural network model for non-linear fitting to obtain the frequency-domain feature of the full-band speech signal;
a full-band speech signal acquisition module 404, configured to convert the frequency-domain feature of the full-band speech signal into the power spectrum of the full-band speech signal, and to perform an inverse Fourier transform on the power spectrum of the full-band speech signal to obtain the full-band speech signal.
Optionally, the frequency-domain feature is the Mel-frequency cepstral coefficient.
Optionally, the deep denoising autoencoder neural network model uses the sigmoid function as the activation function, and the number of hidden layers is set to 2 to 4.
Optionally, the narrowband speech signal acquisition module 401 includes:
a narrowband speech signal acquisition unit, configured to input the original speech signal into a vocoder for compression to obtain the compressed narrowband speech signal;
a narrowband speech signal pre-processing unit, configured to pre-process the narrowband speech signal.
Optionally, the vocoder is a channel vocoder.
Optionally, the low-pass cut-off frequency of the vocoder is set to 100 Hz, 300 Hz or 500 Hz.
Optionally, the narrowband speech signal pre-processing unit is specifically configured to:
perform pre-emphasis on the narrowband speech signal to obtain a pre-emphasized narrowband speech signal;
resample the pre-emphasized narrowband speech signal to obtain a resampled narrowband speech signal;
perform framing and window smoothing on the resampled narrowband speech signal to obtain a framed narrowband speech signal;
perform voice activity detection on the framed narrowband speech signal to obtain a narrowband speech activity signal with silent segments removed.
In Embodiment four of the present invention, the frequency-domain feature of the compressed narrowband speech signal is extracted and input into the trained deep denoising autoencoder neural network model for non-linear fitting to obtain the frequency-domain feature of the full-band speech signal; the frequency-domain feature of the full-band speech signal is then converted into the power spectrum of the full-band speech signal, and an inverse Fourier transform is applied to obtain the full-band speech signal. The compressed narrowband speech signal is thereby restored to a full-band speech signal, which improves the quality and intelligibility of the speech signal.
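The data flow through modules 402 to 404 can be summarised as a chain of four stages. The sketch below only shows that composition; every parameter name is hypothetical, and each stage is passed in as a callable because the text does not fix a particular implementation for any of them.

```python
def restore_full_band(narrowband_signal,
                      extract_features,
                      model,
                      features_to_power_spectrum,
                      inverse_transform):
    """Chain the stages of the device: features -> model -> power spectrum -> waveform.

    All four callables are placeholders supplied by the caller.
    """
    narrowband_features = extract_features(narrowband_signal)       # module 402
    full_band_features = model(narrowband_features)                  # module 403
    power_spectrum = features_to_power_spectrum(full_band_features)  # module 404, part 1
    return inverse_transform(power_spectrum)                         # module 404, part 2
```

Keeping the stages as separate callables mirrors the modular structure of the device and lets each stage be replaced or tested on its own.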
Embodiment five
Fig. 5 is a schematic structural diagram of a terminal provided by Embodiment five of the present invention. Fig. 5 shows a block diagram of an exemplary terminal 512 suitable for implementing embodiments of the present invention. The terminal 512 shown in Fig. 5 is only an example and should not impose any limitation on the function and scope of use of the embodiments of the present invention.
As shown in Fig. 5, the terminal 512 takes the form of a general-purpose terminal. The components of the terminal 512 may include, but are not limited to: one or more processors 516 (one processor is taken as an example in Fig. 5), a storage device 528, and a bus 518 connecting the different system components (including the storage device 528 and the processor 516).
The bus 518 represents one or more of several types of bus structures, including a storage device bus or storage device controller, a peripheral bus, an accelerated graphics port, a processor, or a local bus using any of a variety of bus structures. By way of example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
The terminal 512 typically includes a variety of computer-system-readable media. These media can be any available media that can be accessed by the terminal 512, including volatile and non-volatile media, and removable and non-removable media.
The storage device 528 may include computer-system-readable media in the form of volatile memory, such as random access memory (Random Access Memory, RAM) 530 and/or cache memory 532. The terminal 512 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 534 can be used to read from and write to a non-removable, non-volatile magnetic medium (not shown in Fig. 5, commonly called a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as may an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM) or other optical media). In these cases, each drive can be connected to the bus 518 through one or more data media interfaces. The storage device 528 may include at least one program product having a set of (for example, at least one) program modules that are configured to perform the functions of the embodiments of the present invention.
A program/utility 540 having a set of (at least one) program modules 542 may be stored, for example, in the storage device 528. Such program modules 542 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 542 generally perform the functions and/or methods of the embodiments described in the present invention.
The terminal 512 may also communicate with one or more external devices 514 (such as a keyboard, a pointing device, a display 524, etc.), with one or more devices that enable a user to interact with the terminal 512, and/or with any device (such as a network card, a modem, etc.) that enables the terminal 512 to communicate with one or more other computing terminals. Such communication can take place through an input/output (I/O) interface 522. The terminal 512 can also communicate with one or more networks (such as a local area network (Local Area Network, LAN), a wide area network (Wide Area Network, WAN) and/or a public network such as the Internet) through a network adapter 520. As shown in Fig. 5, the network adapter 520 communicates with the other modules of the terminal 512 through the bus 518. It should be understood that, although not shown in the figure, other hardware and/or software modules can be used in conjunction with the terminal 512, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.
The processor 516 runs the programs stored in the storage device 528 to execute various functional applications and data processing, for example to implement the speech signal processing method provided by any embodiment of the present invention. The method may include:
acquiring a compressed narrowband speech signal;
extracting the frequency-domain feature of the narrowband speech signal;
inputting the frequency-domain feature of the narrowband speech signal into the trained deep denoising autoencoder neural network model for non-linear fitting to obtain the frequency-domain feature of the full-band speech signal;
converting the frequency-domain feature of the full-band speech signal into the power spectrum of the full-band speech signal, and performing an inverse Fourier transform on the power spectrum of the full-band speech signal to obtain the full-band speech signal.
Embodiment six
Embodiment six of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the speech signal processing method provided by any embodiment of the present invention. The method may include:
acquiring a compressed narrowband speech signal;
extracting the frequency-domain feature of the narrowband speech signal;
inputting the frequency-domain feature of the narrowband speech signal into the trained deep denoising autoencoder neural network model for non-linear fitting to obtain the frequency-domain feature of the full-band speech signal;
converting the frequency-domain feature of the full-band speech signal into the power spectrum of the full-band speech signal, and performing an inverse Fourier transform on the power spectrum of the full-band speech signal to obtain the full-band speech signal.
The computer storage medium of Embodiment six of the present invention may adopt any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection with one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fibre, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program, where the program can be used by or in connection with an instruction execution system, apparatus or device.
A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium, and that medium can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device.
The program code contained on a computer-readable medium can be transmitted using any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.
Computer program code for carrying out the operations of the present invention can be written in one or more programming languages or a combination thereof. The programming languages include object-oriented programming languages such as Java, Smalltalk and C++, and also conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. Where a remote computer is involved, the remote computer can be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described herein, and that various obvious changes, readjustments and substitutions can be made by those skilled in the art without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, the present invention is not limited to the above embodiments and may include other equivalent embodiments without departing from the inventive concept; the scope of the present invention is determined by the scope of the appended claims.
Claims (10)
1. A method for processing a speech signal, characterized by comprising:
acquiring a compressed narrowband speech signal;
extracting the frequency-domain feature of the narrowband speech signal;
inputting the frequency-domain feature of the narrowband speech signal into a trained deep denoising autoencoder neural network model for non-linear fitting to obtain the frequency-domain feature of a full-band speech signal;
converting the frequency-domain feature of the full-band speech signal into the power spectrum of the full-band speech signal, and performing an inverse Fourier transform on the power spectrum of the full-band speech signal to obtain the full-band speech signal.
2. The method according to claim 1, characterized in that the frequency-domain feature is the Mel-frequency cepstral coefficient.
3. The method according to claim 1, characterized in that the deep denoising autoencoder neural network model uses the sigmoid function as the activation function, and the number of hidden layers is set to 2 to 4.
4. The method according to claim 1, characterized in that acquiring the compressed narrowband speech signal comprises:
inputting an original speech signal into a vocoder for compression to obtain the compressed narrowband speech signal;
pre-processing the narrowband speech signal.
5. The method according to claim 4, characterized in that the vocoder is a channel vocoder.
6. The method according to claim 4, characterized in that the low-pass cut-off frequency of the vocoder is set to 100 Hz, 300 Hz or 500 Hz.
7. The method according to claim 4, characterized in that pre-processing the narrowband speech signal comprises:
performing pre-emphasis on the narrowband speech signal to obtain a pre-emphasized narrowband speech signal;
resampling the pre-emphasized narrowband speech signal to obtain a resampled narrowband speech signal;
performing framing and window smoothing on the resampled narrowband speech signal to obtain a framed narrowband speech signal;
performing voice activity detection on the framed narrowband speech signal to obtain a narrowband speech activity signal with silent segments removed.
8. A device for processing a speech signal, characterized by comprising:
a narrowband speech signal acquisition module, configured to acquire a compressed narrowband speech signal;
a narrowband frequency-domain feature extraction module, configured to extract the frequency-domain feature of the narrowband speech signal;
a full-band frequency-domain feature acquisition module, configured to input the frequency-domain feature of the narrowband speech signal into a trained deep denoising autoencoder neural network model for non-linear fitting to obtain the frequency-domain feature of a full-band speech signal;
a full-band speech signal acquisition module, configured to convert the frequency-domain feature of the full-band speech signal into the power spectrum of the full-band speech signal, and to perform an inverse Fourier transform on the power spectrum of the full-band speech signal to obtain the full-band speech signal.
9. A terminal, characterized by comprising:
one or more processors;
a storage device, configured to store one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method for processing a speech signal according to any one of claims 1-7.
10. A computer-readable storage medium storing a computer program, characterized in that the program, when executed by a processor, implements the method for processing a speech signal according to any one of claims 1-7.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910593752.6A CN110322891B (en) | 2019-07-03 | 2019-07-03 | Voice signal processing method and device, terminal and storage medium |
PCT/CN2020/078944 WO2021000597A1 (en) | 2019-07-03 | 2020-03-12 | Voice signal processing method and device, terminal, and storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910593752.6A CN110322891B (en) | 2019-07-03 | 2019-07-03 | Voice signal processing method and device, terminal and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110322891A true CN110322891A (en) | 2019-10-11 |
CN110322891B CN110322891B (en) | 2021-12-10 |
Family
ID=68122557
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910593752.6A Active CN110322891B (en) | 2019-07-03 | 2019-07-03 | Voice signal processing method and device, terminal and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN110322891B (en) |
WO (1) | WO2021000597A1 (en) |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110956957A (en) * | 2019-12-23 | 2020-04-03 | 苏州思必驰信息科技有限公司 | Training method and system of speech enhancement model |
CN111292768A (en) * | 2020-02-07 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Method and device for hiding lost packet, storage medium and computer equipment |
CN111508500A (en) * | 2020-04-17 | 2020-08-07 | 五邑大学 | Voice emotion recognition method, system, device and storage medium |
CN112053421A (en) * | 2020-10-14 | 2020-12-08 | 腾讯科技(深圳)有限公司 | Signal noise reduction processing method, device, equipment and storage medium |
WO2021000597A1 (en) * | 2019-07-03 | 2021-01-07 | 南方科技大学 | Voice signal processing method and device, terminal, and storage medium |
CN114265373A (en) * | 2021-11-22 | 2022-04-01 | 煤炭科学研究总院 | Integrated control platform control system for fully mechanized mining face |
CN115063895A (en) * | 2022-06-10 | 2022-09-16 | 深圳市智远联科技有限公司 | Ticket selling method and system based on voice recognition |
CN117672247A (en) * | 2024-01-31 | 2024-03-08 | 中国电子科技集团公司第十五研究所 | Method and system for filtering narrowband noise through real-time audio |
Families Citing this family (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113903352A (en) * | 2021-09-28 | 2022-01-07 | 阿里云计算有限公司 | Single-channel speech enhancement method and device |
CN113708855B (en) * | 2021-09-29 | 2023-07-25 | 北京信息科技大学 | OTFS data driving and receiving method, system and medium based on deep learning |
CN114863940B (en) * | 2022-07-05 | 2022-09-30 | 北京百瑞互联技术有限公司 | Model training method for voice quality conversion, method, device and medium for improving voice quality |
CN116364063B (en) * | 2023-06-01 | 2023-09-05 | 蔚来汽车科技(安徽)有限公司 | Phoneme alignment method, apparatus, driving apparatus, and medium |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6691082B1 (en) * | 1999-08-03 | 2004-02-10 | Lucent Technologies Inc | Method and system for sub-band hybrid coding |
CN1988565A (en) * | 2005-12-23 | 2007-06-27 | Qnx软件操作系统(威美科)有限公司 | Bandwidth extension of narrowband speech |
CN104036781A (en) * | 2013-03-05 | 2014-09-10 | 深港产学研基地 | Voice signal bandwidth expansion device and method |
CN104658547A (en) * | 2013-11-20 | 2015-05-27 | 大连佑嘉软件科技有限公司 | Method for expanding artificial voice bandwidth |
CN105814631A (en) * | 2013-12-15 | 2016-07-27 | 高通股份有限公司 | Systems and methods of blind bandwidth extension |
CN106782511A (en) * | 2016-12-22 | 2017-05-31 | 太原理工大学 | Amendment linear depth autoencoder network audio recognition method |
CN107705801A (en) * | 2016-08-05 | 2018-02-16 | 中国科学院自动化研究所 | The training method and Speech bandwidth extension method of Speech bandwidth extension model |
CN108198571A (en) * | 2017-12-21 | 2018-06-22 | 中国科学院声学研究所 | A kind of bandwidth expanding method judged based on adaptive bandwidth and system |
CN109215635A (en) * | 2018-10-25 | 2019-01-15 | 武汉大学 | Broadband voice spectral tilt degree characteristic parameter method for reconstructing for speech intelligibility enhancing |
WO2019081070A1 (en) * | 2017-10-27 | 2019-05-02 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method or computer program for generating a bandwidth-enhanced audio signal using a neural network processor |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP2559026A1 (en) * | 2010-04-12 | 2013-02-20 | Freescale Semiconductor, Inc. | Audio communication device, method for outputting an audio signal, and communication system |
WO2013188562A2 (en) * | 2012-06-12 | 2013-12-19 | Audience, Inc. | Bandwidth extension via constrained synthesis |
CN107452389B (en) * | 2017-07-20 | 2020-09-01 | 大象声科(深圳)科技有限公司 | Universal single-track real-time noise reduction method |
CN110322891B (en) * | 2019-07-03 | 2021-12-10 | 南方科技大学 | Voice signal processing method and device, terminal and storage medium |
-
2019
- 2019-07-03 CN CN201910593752.6A patent/CN110322891B/en active Active
-
2020
- 2020-03-12 WO PCT/CN2020/078944 patent/WO2021000597A1/en active Application Filing
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6691082B1 (en) * | 1999-08-03 | 2004-02-10 | Lucent Technologies Inc | Method and system for sub-band hybrid coding |
CN1988565A (en) * | 2005-12-23 | 2007-06-27 | Qnx软件操作系统(威美科)有限公司 | Bandwidth extension of narrowband speech |
CN104036781A (en) * | 2013-03-05 | 2014-09-10 | 深港产学研基地 | Voice signal bandwidth expansion device and method |
CN104658547A (en) * | 2013-11-20 | 2015-05-27 | 大连佑嘉软件科技有限公司 | Method for expanding artificial voice bandwidth |
CN105814631A (en) * | 2013-12-15 | 2016-07-27 | 高通股份有限公司 | Systems and methods of blind bandwidth extension |
CN107705801A (en) * | 2016-08-05 | 2018-02-16 | 中国科学院自动化研究所 | The training method and Speech bandwidth extension method of Speech bandwidth extension model |
CN106782511A (en) * | 2016-12-22 | 2017-05-31 | 太原理工大学 | Amendment linear depth autoencoder network audio recognition method |
WO2019081070A1 (en) * | 2017-10-27 | 2019-05-02 | Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V. | Apparatus, method or computer program for generating a bandwidth-enhanced audio signal using a neural network processor |
CN108198571A (en) * | 2017-12-21 | 2018-06-22 | 中国科学院声学研究所 | Bandwidth extension method and system based on adaptive bandwidth judgment |
CN109215635A (en) * | 2018-10-25 | 2019-01-15 | 武汉大学 | Broadband voice spectral tilt degree characteristic parameter method for reconstructing for speech intelligibility enhancing |
Non-Patent Citations (4)
Title |
---|
A. UNCINI: "Frequency recovery of narrow-band speech using adaptive spline neural networks", IEEE International Conference on Acoustics, Speech, and Signal Processing * |
SHAHINA A: "Mapping neural networks for bandwidth extension of narrowband speech", INTERSPEECH * |
YE FUQIANG: "Effect of band power weighting on understanding sentences synthesized with temporal information", Journal of the Acoustical Society of America * |
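GU YU (顾宇): "Research on speech bandwidth extension methods based on neural networks", China Master's Theses Full-text Database, Information Science and Technology * |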
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021000597A1 (en) * | 2019-07-03 | 2021-01-07 | 南方科技大学 | Voice signal processing method and device, terminal, and storage medium |
CN110956957A (en) * | 2019-12-23 | 2020-04-03 | 苏州思必驰信息科技有限公司 | Training method and system of speech enhancement model |
CN110956957B (en) * | 2019-12-23 | 2022-05-17 | 思必驰科技股份有限公司 | Training method and system of speech enhancement model |
CN111292768B (en) * | 2020-02-07 | 2023-06-02 | 腾讯科技(深圳)有限公司 | Method, device, storage medium and computer equipment for hiding packet loss |
CN111292768A (en) * | 2020-02-07 | 2020-06-16 | 腾讯科技(深圳)有限公司 | Method and device for hiding lost packet, storage medium and computer equipment |
CN111508500A (en) * | 2020-04-17 | 2020-08-07 | 五邑大学 | Voice emotion recognition method, system, device and storage medium |
CN111508500B (en) * | 2020-04-17 | 2023-08-29 | 五邑大学 | Voice emotion recognition method, system, device and storage medium |
CN112053421A (en) * | 2020-10-14 | 2020-12-08 | 腾讯科技(深圳)有限公司 | Signal noise reduction processing method, device, equipment and storage medium |
CN112053421B (en) * | 2020-10-14 | 2023-06-23 | 腾讯科技(深圳)有限公司 | Signal noise reduction processing method, device, equipment and storage medium |
CN114265373A (en) * | 2021-11-22 | 2022-04-01 | 煤炭科学研究总院 | Integrated control platform control system for fully mechanized mining face |
CN115063895A (en) * | 2022-06-10 | 2022-09-16 | 深圳市智远联科技有限公司 | Ticket selling method and system based on voice recognition |
CN117672247A (en) * | 2024-01-31 | 2024-03-08 | 中国电子科技集团公司第十五研究所 | Method and system for filtering narrowband noise through real-time audio |
CN117672247B (en) * | 2024-01-31 | 2024-04-02 | 中国电子科技集团公司第十五研究所 | Method and system for filtering narrowband noise through real-time audio |
Also Published As
Publication number | Publication date |
---|---|
WO2021000597A1 (en) | 2021-01-07 |
CN110322891B (en) | 2021-12-10 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110322891A (en) | | A kind of processing method of voice signal, device, terminal and storage medium |
CN101183527B (en) | | Method and apparatus for encoding and decoding high frequency signal |
US6691090B1 (en) | | Speech recognition system including dimensionality reduction of baseband frequency signals |
US7035797B2 (en) | | Data-driven filtering of cepstral time trajectories for robust speech recognition |
CN102652336B (en) | | Speech signal restoration device and speech signal restoration method |
Ganapathy et al. | | Robust feature extraction using modulation filtering of autoregressive models |
KR20120090086A (en) | | Determining an upperband signal from a narrowband signal |
CN101140759A (en) | | Band-width spreading method and system for voice or audio signal |
Yu et al. | | Time-domain multi-modal bone/air conducted speech enhancement |
Ganapathy | | Signal analysis using autoregressive models of amplitude modulation |
JP2023548707A (en) | | Speech enhancement methods, devices, equipment and computer programs |
AU643769B2 (en) | | Coding of acoustic waveforms |
Bhatt | | Simulation and overall comparative evaluation of performance between different techniques for high band feature extraction based on artificial bandwidth extension of speech over proposed global system for mobile full rate narrow band coder |
Hsu et al. | | Revise: Self-supervised speech resynthesis with visual input for universal and generalized speech regeneration |
Cámara et al. | | Phase-Aware Transformations in Variational Autoencoders for Audio Effects |
CN114333893A (en) | | Voice processing method and device, electronic equipment and readable medium |
CN113744715A (en) | | Vocoder speech synthesis method, device, computer equipment and storage medium |
Schröter et al. | | CLC: complex linear coding for the DNS 2020 challenge |
CN109215635B (en) | | Broadband voice frequency spectrum gradient characteristic parameter reconstruction method for voice definition enhancement |
Prasad et al. | | Speech bandwidth extension aided by magnitude spectrum data hiding |
CN115035904A (en) | | High-quality vocoder model based on generative antagonistic neural network |
Jose | | Amrconvnet: Amr-coded speech enhancement using convolutional neural networks |
CN114333891A (en) | | Voice processing method and device, electronic equipment and readable medium |
Zivic | | Modern Communications Technology |
Sivaraman et al. | | Speech Bandwidth Expansion For Speaker Recognition On Telephony Audio. |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||