CN110322891A

CN110322891A - Speech signal processing method, device, terminal and storage medium

Info

Publication number: CN110322891A
Application number: CN201910593752.6A
Authority: CN
Inventors: 陈霏; 叶富强
Original assignee: Southwest University of Science and Technology
Current assignee: Southwest University of Science and Technology; Southern University of Science and Technology
Priority date: 2019-07-03
Filing date: 2019-07-03
Publication date: 2019-10-11
Anticipated expiration: 2039-07-03
Also published as: WO2021000597A1; CN110322891B

Abstract

An embodiment of the invention discloses a speech signal processing method, device, terminal and storage medium. The method comprises: obtaining a compressed narrowband speech signal; extracting frequency-domain features of the narrowband speech signal; inputting the frequency-domain features of the narrowband speech signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band speech signal; and converting the frequency-domain features of the full-band speech signal into a power spectrum of the full-band speech signal, then applying an inverse Fourier transform to that power spectrum using the phase information of the corresponding narrowband signal, to obtain the full-band speech signal. By using the deep denoising autoencoder neural network model to restore the bandwidth of the compressed narrowband speech signal, the embodiment improves the quality and intelligibility of the speech signal.

Description

Speech signal processing method, device, terminal and storage medium

Technical field

Embodiments of the invention relate to the field of speech processing technology, and in particular to a speech signal processing method, device, terminal and storage medium.

Background technique

Speech is one of the most important means of human communication. With the rapid advance of science and technology, speech signals increasingly need to be transmitted between mobile phones and computers. Transmission requires compression coding of the speech signal to remove its redundancy and reduce the transmission bit rate or storage space, so speech signal compression is particularly important.

The vocoder first appeared at AT&T Labs in the United States and is mainly used for signal band compression, voice storage communication and secure communication. Speech signal compression coding makes wide use of the channel vocoder, which first extracts the frequency-domain feature parameters of the speech signal, encodes and encrypts them, and then recovers the original speech waveform from the feature parameters. It works as follows: the time-frequency information of the speech signal is input into the vocoder; band-pass filters in the vocoder divide the speech signal into channels with adjacent frequency bands; a Hilbert transform and a low-pass filter extract the envelope of each channel signal; a sinusoidal carrier is amplitude-modulated by each extracted envelope; and finally the processed channel signals are combined into a single output speech signal.

However, the vocoder exploits the insensitivity of the human ear to the phase of a speech signal: in analysis and synthesis it places requirements only on the amplitude spectrum of the signal. The speech synthesized by a vocoder is therefore difficult to compare with the original speech signal at the waveform level, and the quality and intelligibility of vocoder-synthesized speech can only be measured by subjective scoring. In addition, while transmitting model parameters gives the vocoder good band compression, it also considerably harms the naturalness of the speech signal. Especially with a single-channel vocoder, the synthesized narrowband speech signal discards many details, which reduces its quality and intelligibility.

Summary of the invention

Embodiments of the invention provide a speech signal processing method, device, server and storage medium, so as to improve the quality and intelligibility of speech signals.

In a first aspect, an embodiment of the invention provides a speech signal processing method, comprising:

obtaining a compressed narrowband speech signal;

extracting frequency-domain features of the narrowband speech signal;

inputting the frequency-domain features of the narrowband speech signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band speech signal;

converting the frequency-domain features of the full-band speech signal into a power spectrum of the full-band speech signal, and applying an inverse Fourier transform to the power spectrum of the full-band speech signal, to obtain the full-band speech signal.

Optionally, the frequency-domain features are mel-frequency cepstral coefficients.

Optionally, the deep denoising autoencoder neural network model uses a sigmoid function as its activation function, and the number of hidden layers is set to 2 to 4.

Optionally, obtaining the compressed narrowband speech signal comprises:

inputting an original speech signal into a vocoder for compression, to obtain the compressed narrowband speech signal;

preprocessing the narrowband speech signal.

Optionally, the vocoder is a channel vocoder.

Optionally, the low-pass cutoff frequency of the vocoder is set to 100 Hz, 300 Hz or 500 Hz.

Optionally, preprocessing the narrowband speech signal comprises:

applying pre-emphasis to the narrowband speech signal, to obtain a pre-emphasized narrowband speech signal;

resampling the pre-emphasized narrowband speech signal, to obtain a resampled narrowband speech signal;

framing the resampled narrowband speech signal and smoothing it with a window, to obtain a framed narrowband speech signal;

performing voice activity detection on the framed narrowband speech signal, to obtain a narrowband speech activity signal with silent segments removed.

In a second aspect, an embodiment of the invention provides a speech signal processing device, comprising:

a narrowband speech signal obtaining module, configured to obtain a compressed narrowband speech signal;

a narrowband frequency-domain feature extraction module, configured to extract frequency-domain features of the narrowband speech signal;

a full-band frequency-domain feature obtaining module, configured to input the frequency-domain features of the narrowband speech signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band speech signal;

a full-band speech signal obtaining module, configured to convert the frequency-domain features of the full-band speech signal into a power spectrum of the full-band speech signal and apply an inverse Fourier transform to the power spectrum of the full-band speech signal, to obtain the full-band speech signal.

In a third aspect, an embodiment of the invention further provides a terminal, comprising:

one or more processors;

a storage device, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the speech signal processing method provided by any embodiment of the invention.

In a fourth aspect, an embodiment of the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the speech signal processing method provided by any embodiment of the invention.

Embodiments of the invention extract the frequency-domain features of the compressed narrowband speech signal, input them into the trained deep denoising autoencoder neural network model for nonlinear fitting to obtain the frequency-domain features of the full-band speech signal, convert those features into the power spectrum of the full-band speech signal, and then apply an inverse Fourier transform to obtain the full-band speech signal. The compressed narrowband speech signal is thereby restored to a full-band speech signal, improving the quality and intelligibility of the speech signal.

Brief description of the drawings

Fig. 1 is a flowchart of a speech signal processing method provided by embodiment one of the invention;

Fig. 2 is a flowchart of a speech signal processing method provided by embodiment two of the invention;

Fig. 3 is a flowchart of a method for preprocessing a narrowband speech signal provided by embodiment three of the invention;

Fig. 4 is a structural schematic diagram of a speech signal processing device provided by embodiment four of the invention;

Fig. 5 is a structural schematic diagram of a terminal provided by embodiment five of the invention.

Detailed description of the embodiments

The invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the invention and not to limit it. It should also be noted that, for ease of description, the drawings show only the parts related to the invention rather than the entire structure.

Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as a sequential process, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps can be rearranged. A process can be terminated when its operations are completed, but it may also have additional steps not included in the drawing. A process can correspond to a method, function, procedure, subroutine, subprogram and so on.

In addition, the terms "first", "second" and so on may be used herein to describe various directions, actions, steps or elements, but these directions, actions, steps or elements are not limited by these terms. The terms are only used to distinguish one direction, action, step or element from another. For example, without departing from the scope of the application, a first training sample could be called a second training sample and, similarly, a second training sample could be called a first training sample. The first training sample and the second training sample are both training samples, but they are not the same training sample. The terms "first", "second" and so on are not to be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include one or more such features. In the description of the invention, "multiple" and "batch" mean at least two, for example two or three, unless specifically defined otherwise.

Embodiment one

Fig. 1 is a flowchart of a speech signal processing method provided by embodiment one of the invention. This embodiment is applicable to restoring the bandwidth of a narrowband speech signal output by a vocoder. The method can be executed by a speech signal processing device, which can be implemented in software and/or hardware and integrated in a terminal such as a smartphone, tablet computer, personal computer (PC) or learning machine.

As shown in Fig. 1, the speech signal processing method provided by embodiment one of the invention may include:

S101: obtain a compressed narrowband speech signal.

Specifically, speech transmitted between electronic devices is referred to as a speech signal. Transmitting a speech signal first requires compression coding of the speech, to remove the redundancy in the unprocessed original speech signal and reduce the transmission bit rate or storage space. After the unprocessed original speech signal is input into a vocoder and band-compressed, it becomes a narrowband speech signal, whose speech quality and intelligibility are lower than those of the original speech signal. The compressed narrowband speech signal therefore has to be obtained first, so that the subsequent processing that converts it into a full-band speech signal can be applied to improve its speech quality and intelligibility.

S102: extract frequency-domain features of the narrowband speech signal.

Specifically, the features of a speech signal fall into two main classes, time-domain features and frequency-domain features. Time-domain features include short-time average energy, short-time average zero-crossing rate, formants and pitch period. Frequency-domain features include linear predictive coding coefficients (Linear Predictive Coding, LPC), linear prediction cepstral coefficients (Linear Predictive Cepstral Coding, LPCC), line spectrum pair parameters (Linear Spectrum Pairs, LSP), the short-time spectrum and mel-frequency cepstral coefficients (Mel-Frequency Cepstral Coefficients, MFCC). When extracting the frequency-domain features of the narrowband speech signal, the mel-frequency cepstral coefficients of the narrowband speech signal are preferably extracted for use in the subsequent processing steps.
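As a rough illustration of the mel-frequency scale that underlies MFCCs, the sketch below converts between hertz and mels and places filter-bank center frequencies evenly on the mel axis. The 2595·log10(1 + f/700) mapping is one widely used convention; the filter count and the frequency range are assumptions chosen for the example, since the patent does not specify them, and the function names are hypothetical.

```python
import math

def hz_to_mel(f_hz):
    """Map a frequency in Hz to mels (common 2595*log10 convention)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)

def mel_to_hz(mel):
    """Inverse of hz_to_mel."""
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)

def mel_center_frequencies(n_filters, f_min_hz, f_max_hz):
    """Center frequencies (Hz) of n_filters bands spaced evenly in mels."""
    lo, hi = hz_to_mel(f_min_hz), hz_to_mel(f_max_hz)
    step = (hi - lo) / (n_filters + 1)
    return [mel_to_hz(lo + step * (i + 1)) for i in range(n_filters)]

# Assumed values: 26 filters between 0 Hz and 8 kHz (Nyquist for 16 kHz audio).
centers = mel_center_frequencies(26, 0.0, 8000.0)
```

Because the spacing is uniform in mels, the centers are dense at low frequencies and sparse at high frequencies, which mirrors the frequency resolution of human hearing.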

S103: input the frequency-domain features of the narrowband speech signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band speech signal.

Specifically, a neural network is a complex network system formed by a large number of simple processing units (called neurons) that are widely interconnected. It reflects many basic characteristics of human brain function and is a highly complex nonlinear dynamic learning system. Neural networks offer large-scale parallelism, distributed storage and processing, self-organization, adaptivity and self-learning, and are especially suited to information processing problems that are imprecise or fuzzy and require many factors and conditions to be considered at once. The role of the deep denoising autoencoder neural network model is to perform nonlinear fitting between the narrowband speech signal and the full-band speech signal; it mainly involves a training stage and a test stage.

In the training stage, the deep denoising autoencoder neural network model comprises an input layer, hidden layers and an output layer. The input layer receives the input signal of the model, the output layer emits the output signal of the model, and the hidden layers perform the nonlinear matching between the input signal and the output signal. The activation function introduces nonlinearity into the model so that it can complete the nonlinear fitting between input and output signals. Preferably, a sigmoid function is used as the activation function, the number of hidden layers is set to 3, and each layer has 500 neurons. The frequency-domain features of the narrowband speech signals of a large number of first training samples are input into the model. The frequency-domain features of the full-band speech signal that the model outputs for each first training sample are compared with the original speech signal corresponding to that sample, and a loss function is computed. The nonlinear fitting performed through the activation function is then adjusted according to the computed loss to obtain the fitted nonlinear parameters between input and output signals, which completes the training of the model.
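The forward pass and loss evaluation described above can be sketched as follows, using the preferred configuration of 3 hidden layers with 500 sigmoid units each. This is only a minimal illustration under stated assumptions: the 13-dimensional feature vector, the linear output layer, the mean-squared-error loss, the random initialization and all function names are hypothetical choices, since the patent does not name them, and the weight update step is omitted.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out

def forward(layers, features):
    """Sigmoid hidden layers followed by a linear output layer (assumed)."""
    activ = features
    for idx, (weights, biases) in enumerate(layers):
        pre = [sum(w * a for w, a in zip(row, activ)) + b
               for row, b in zip(weights, biases)]
        is_output = idx == len(layers) - 1
        activ = pre if is_output else [sigmoid(v) for v in pre]
    return activ

def mse(predicted, target):
    """Mean squared error, used here as an assumed loss function."""
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / len(target)

rng = random.Random(0)
n_features = 13                                  # assumed feature dimension
sizes = [n_features, 500, 500, 500, n_features]  # 3 hidden layers of 500 units
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Random stand-ins for one frame of narrowband input and full-band target.
narrowband = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
fullband = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
loss = mse(forward(layers, narrowband), fullband)
```

In a real training loop the loss would drive a gradient-based update of the weights and biases; a numerical library would normally replace the pure-Python arithmetic used here for self-containment.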

The test stage applies the trained deep denoising autoencoder neural network model. The frequency-domain features of the narrowband speech signals of second training samples are input into the model. The frequency-domain features of the full-band speech signal that the model outputs for each second training sample are compared with the original speech signal corresponding to that sample, and a loss function is computed. The computed loss determines whether training of the model needs to continue.

S104: convert the frequency-domain features of the full-band speech signal into a power spectrum of the full-band speech signal, and apply an inverse Fourier transform to the power spectrum of the full-band speech signal, to obtain the full-band speech signal.

Specifically, this embodiment performs short-time Fourier analysis on the frequency-domain features of the full-band speech signal, computing the discrete Fourier transform of each overlapping windowed frame to obtain the power spectrum of each frame. It then applies an inverse Fourier transform to the power spectrum of the full-band speech signal to obtain the bandwidth-restored full-band speech signal.
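A power spectrum carries no phase, so the inverse transform needs phase from elsewhere; the abstract states that the phase information of the corresponding narrowband signal is used. A minimal per-frame sketch of that idea follows. It assumes the power spectrum is defined as the squared magnitude of the DFT, uses a direct O(N²) DFT for self-containment, and the function names are hypothetical; overlap-add of successive frames is not shown.

```python
import cmath
import math

def dft(frame):
    """Direct discrete Fourier transform of one real-valued frame."""
    n = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * k * t / n)
                for t, x in enumerate(frame))
            for k in range(n)]

def idft(spectrum):
    """Inverse DFT, keeping the real part of each output sample."""
    n = len(spectrum)
    return [sum(c * cmath.exp(2j * math.pi * k * t / n)
                for k, c in enumerate(spectrum)).real / n
            for t in range(n)]

def reconstruct_frame(power_spectrum, narrowband_frame):
    """Combine full-band power with the narrowband frame's phase, then invert."""
    phases = [cmath.phase(c) for c in dft(narrowband_frame)]
    spectrum = [cmath.rect(math.sqrt(p), ph)
                for p, ph in zip(power_spectrum, phases)]
    return idft(spectrum)

# Sanity check: a frame's own power spectrum and phase give back that frame.
frame = [math.sin(2 * math.pi * 3 * t / 32) + 0.5 * math.cos(2 * math.pi * 5 * t / 32)
         for t in range(32)]
power = [abs(c) ** 2 for c in dft(frame)]
restored = reconstruct_frame(power, frame)
```

In practice the power spectrum would come from the model's full-band output while the phase comes from the narrowband frame at the same time position.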

Compared with the prior art, embodiment one of the invention extracts the frequency-domain features of the compressed narrowband speech signal, inputs them into the trained deep denoising autoencoder neural network model for nonlinear fitting to obtain the frequency-domain features of the full-band speech signal, converts those features into the power spectrum of the full-band speech signal, and then applies an inverse Fourier transform to obtain the full-band speech signal. The compressed narrowband speech signal is thereby restored to a full-band speech signal, improving the quality and intelligibility of the speech signal.

Embodiment two

Fig. 2 is a flowchart of a speech signal processing method provided by embodiment two of the invention. This embodiment further refines the technical solution above. As shown in Fig. 2, the method specifically includes:

S201: input an original speech signal into a vocoder for compression, to obtain a compressed narrowband speech signal.

Specifically, the original speech signal is an unprocessed speech signal. A vocoder is a codec that analyzes and synthesizes speech, also called a speech analysis-synthesis system or speech band compression system, and is mainly used for signal band compression, voice storage communication and secure communication. The original speech signal is input into the vocoder, the vocoder band-compresses it, and the speech signal it outputs is the compressed narrowband speech signal.

Optionally, a channel vocoder can be used to compress the original speech signal, with the low-pass cutoff frequency of the vocoder set to 100 Hz, 300 Hz or 500 Hz. A channel vocoder divides the frequency range of the speech signal into many adjacent bands, or channels, and the narrowband speech signal it outputs approximates the amplitude spectrum of the speech signal. Because the channel vocoder places requirements only on the amplitude spectrum of the speech signal, its output loses some frequency band content; the band loss is most severe in the output of a single-channel vocoder.

S202: preprocess the narrowband speech signal.

Specifically, before the narrowband speech signal is analyzed and processed, it must undergo preprocessing operations such as pre-emphasis, framing and windowing. The purpose of these operations is to eliminate the influence on speech quality of factors such as aliasing, higher-harmonic distortion and high-frequency effects caused by the human vocal organs themselves and by the equipment used to acquire the speech signal, so that the signal passed to subsequent speech processing is as uniform and smooth as possible, good parameters are provided for speech recognition, and speech processing quality is improved.

S203: extract frequency-domain features of the narrowband speech signal.

S204: input the frequency-domain features of the narrowband speech signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band speech signal.

In the training stage, the deep denoising autoencoder neural network model comprises an input layer, hidden layers and an output layer. The input layer receives the input signal of the model, the output layer emits the output signal of the model, and the hidden layers perform the nonlinear matching between the input signal and the output signal. The model also needs an activation function in order to work properly: the activation function introduces nonlinearity into the model so that it can complete the nonlinear fitting between input and output signals. Preferably, a sigmoid function is used as the activation function, the number of hidden layers is set to 3, and each layer has 500 neurons. The frequency-domain features of the narrowband speech signals of a large number of first training samples are input into the model. The frequency-domain features of the full-band speech signal that the model outputs for each first training sample are compared with the original speech signal corresponding to that sample, and a loss function is computed. The nonlinear fitting performed through the activation function is then adjusted according to the computed loss to obtain the fitted nonlinear parameters between input and output signals, which completes the training of the model.

S205: convert the frequency-domain features of the full-band speech signal into a power spectrum of the full-band speech signal, and apply an inverse Fourier transform to the power spectrum of the full-band speech signal, to obtain the full-band speech signal.

Short-time Fourier analysis is performed on the frequency-domain features of the full-band speech signal, computing the discrete Fourier transform of each overlapping windowed frame to obtain the power spectrum of each frame. An inverse Fourier transform is then applied to the power spectrum of the full-band speech signal to obtain the bandwidth-restored full-band speech signal.

Optionally, to verify that the intelligibility of the restored full-band speech signal has increased, the speech signal can be assessed using short-time objective intelligibility (Short-Time Objective Intelligibility, STOI). STOI values lie in the range [0, 1]; the higher the score, the higher the intelligibility. Table 1 compares the STOI assessments of the compressed narrowband speech signal and the restored full-band speech signal in this embodiment. As Table 1 shows, regardless of the vocoder's low-pass cutoff frequency, the intelligibility of the restored full-band speech signal is higher than that of the compressed narrowband speech signal.

Table 1: STOI assessment of the compressed narrowband speech signal and the restored full-band speech signal

Embodiment two of the invention inputs the original speech signal into a vocoder for band compression to obtain a narrowband speech signal, preprocesses the compressed narrowband speech signal, then extracts the frequency-domain features of the narrowband speech signal and inputs them into the trained deep denoising autoencoder neural network model for band restoration, thereby obtaining a full-band speech signal. This not only restores the narrowband speech signal to a full-band speech signal but also preserves the amplitude spectrum of the full-band speech signal, improving speech processing quality and increasing the intelligibility of the speech signal.

Embodiment three

Fig. 3 is a flowchart of a method for preprocessing a narrowband speech signal provided by embodiment three of the invention. This embodiment builds on the embodiments above and further refines the preprocessing of the narrowband speech signal. As shown in Fig. 3, the method specifically includes:

S301: apply pre-emphasis to the narrowband speech signal, to obtain a pre-emphasized narrowband speech signal.

Specifically, pre-emphasis is a signal processing technique that compensates the high-frequency components of the input signal at the transmitting end. Speech signals suffer considerable loss during transmission, so to obtain a reasonably good speech waveform at the receiving end, the impaired speech signal must be compensated. The idea of pre-emphasis is to boost the high-frequency components of the speech signal at the start of the transmission line, compensating for the excessive attenuation of high-frequency components during transmission. After pre-emphasis, the output levels of the narrowband speech signal are more uniform and attenuation is greatly reduced.
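Pre-emphasis is commonly implemented as a first-order difference filter, y[n] = x[n] - a·x[n-1]. The sketch below assumes that form; the coefficient 0.97 is a conventional choice rather than a value given in the patent, and the function name is hypothetical.

```python
def pre_emphasize(signal, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    # The first sample has no predecessor, so it is passed through unchanged.
    return [signal[0]] + [signal[n] - alpha * signal[n - 1]
                          for n in range(1, len(signal))]

# A constant (zero-frequency) input is almost entirely suppressed after the
# first sample, which shows the filter's high-pass character.
emphasized = pre_emphasize([1.0, 1.0, 1.0, 1.0])
```

Slowly varying content is attenuated while rapid sample-to-sample changes pass through, which raises the relative level of the high-frequency components.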

S302: resample the pre-emphasized narrowband speech signal, to obtain a resampled narrowband speech signal.

Specifically, resampling converts the original sampling frequency into a new sampling frequency to meet a different sampling frequency requirement. According to the Nyquist sampling theorem, the sampling frequency must be at least twice the highest frequency component of the signal for the sampled digital signal to retain all the information in the original signal, so that the signal can be recovered from the sampled data. The frequency range of ordinary speech is 50 Hz to 6 kHz, and that of musical instruments is roughly 50 Hz to 8 kHz. Preferably, the sampling frequency for resampling the pre-emphasized narrowband speech signal can be set to 16 kHz.
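The arithmetic behind the 16 kHz choice can be checked with a few lines. The sketch below tests the Nyquist condition and derives the integer upsampling and downsampling factors that a rational-rate resampler would use; the 44.1 kHz source rate is a hypothetical example, since the patent does not state the original sampling frequency, and the function names are illustrative.

```python
from fractions import Fraction

def satisfies_nyquist(sample_rate_hz, max_signal_hz):
    """True if the sample rate is at least twice the highest signal frequency."""
    return sample_rate_hz >= 2 * max_signal_hz

def resampling_factors(source_rate_hz, target_rate_hz):
    """Return (up, down) integers with target = source * up / down."""
    ratio = Fraction(target_rate_hz, source_rate_hz)
    return ratio.numerator, ratio.denominator

# 16 kHz covers both the 6 kHz speech range and the 8 kHz instrument range.
covers_speech = satisfies_nyquist(16000, 6000)
covers_instruments = satisfies_nyquist(16000, 8000)

# Hypothetical 44.1 kHz source resampled to 16 kHz.
up, down = resampling_factors(44100, 16000)
```

A practical resampler would also apply a low-pass filter before reducing the rate, to prevent content above the new Nyquist frequency from aliasing.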

S303: frame the resampled narrowband speech signal and smooth it with a window, to obtain a framed narrowband speech signal.

Specifically, a speech signal is a random sequence that changes over time and, viewed globally, is not a stationary random process. Over a relatively short period, however, it can be regarded as approximately stationary. Such an approximately stationary segment of the speech signal is one frame, and its length is called the frame length. Cutting a speech signal into multiple approximately stationary segments is called framing. During framing, every frame must be windowed so that its amplitude tapers towards zero at both ends. The time difference between the start positions of two adjacent frames is called the frame shift, which is commonly taken as half the frame length. Optionally, the frame length can be set to 16 ms with a corresponding frame shift of 8 ms, and a Hamming window is used for the window smoothing.
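With the optional values above and a 16 kHz sampling frequency, a 16 ms frame holds 256 samples and an 8 ms shift is 128 samples. The sketch below frames a signal on that basis and applies a Hamming window to each frame. The function names are hypothetical, trailing samples that do not fill a whole frame are simply dropped (an assumption), and note that the standard Hamming window tapers to 0.08 at its edges rather than exactly to zero.

```python
import math

def hamming(n):
    """Standard Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]

def frame_signal(signal, sample_rate_hz=16000, frame_ms=16, shift_ms=8):
    """Split a signal into overlapping frames and window each one."""
    frame_len = sample_rate_hz * frame_ms // 1000   # 256 samples at 16 kHz
    hop = sample_rate_hz * shift_ms // 1000         # 128 samples at 16 kHz
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        chunk = signal[start:start + frame_len]
        frames.append([s * w for s, w in zip(chunk, window)])
    return frames

# One second of a constant test signal at 16 kHz.
frames = frame_signal([1.0] * 16000)
```

Because each frame overlaps its neighbor by half, every sample away from the signal edges contributes to two frames, which smooths the transitions between frames.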

S304: perform voice activity detection on the framed narrowband speech signal, to obtain a narrowband speech activity signal with silent segments removed.

Specifically, the purpose of voice activity detection is to detect whether speech is present, that is, whether the speech signal contains silent segments; if it does, those silent segments are removed. Voice activity detection reduces the resources and space occupied during speech signal transmission, avoids encoding and transmitting silent data packets, and saves computation time and bandwidth.

By applying pre-emphasis, resampling, framing and windowing, and voice activity detection to the narrowband speech signal, this embodiment of the invention preprocesses the compressed narrowband speech signal. It eliminates the influence on speech quality of factors such as aliasing, higher-harmonic distortion and high-frequency effects that arise during transmission, makes the signal passed to subsequent speech processing more uniform and smooth, provides good parameters for speech recognition, and improves speech processing quality.
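The patent does not say which detection method is used. As one simple possibility, the sketch below drops frames whose mean energy falls below a fraction of the loudest frame's energy; the energy criterion, the 0.1 threshold ratio and the function names are all assumptions made for illustration.

```python
def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)

def remove_silence(frames, threshold_ratio=0.1):
    """Keep frames whose energy exceeds a fraction of the loudest frame's."""
    if not frames:
        return []
    energies = [frame_energy(f) for f in frames]
    threshold = threshold_ratio * max(energies)
    return [f for f, e in zip(frames, energies) if e > threshold]

# Toy frames: two with speech-like amplitude, two that are nearly silent.
speech = [0.5, -0.4, 0.6, -0.5]
silence = [0.001, -0.002, 0.001, 0.0]
active = remove_silence([silence, speech, silence, speech])
```

A relative threshold adapts to the overall recording level, but it would also discard quiet speech in a recording dominated by loud passages; production detectors typically add noise-floor tracking and smoothing over time.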

Embodiment four

Fig. 4 is a structural schematic diagram of a speech signal processing device provided by embodiment four of the invention. This embodiment is applicable to restoring the bandwidth of a narrowband speech signal output by a vocoder. The device can be implemented in software and/or hardware and integrated in a terminal such as a smartphone, tablet computer, personal computer (PC) or learning machine. The speech signal processing device provided by this embodiment can execute the speech signal processing method provided by any embodiment of the invention, and has the functional modules and beneficial effects corresponding to that method. For content not described in detail in embodiment four, refer to the description in any method embodiment of the invention.

As shown in Fig. 4, the speech signal processing device 400 provided by embodiment four of the invention includes:

Narrow band voice signal obtains module 401, for obtaining compressed narrow band voice signal；

Narrowband frequency domain character extraction module 402, for extracting the frequency domain character of the narrow band voice signal；

Whole frequency band frequency domain character obtains module 403, for training the frequency domain character input of the narrow band voice signal Depth noise reduction self-encoding encoder neural network model carry out nonlinear fitting, obtain the frequency domain character of Whole frequency band voice signal；

Whole frequency band voice signal obtains module 404, for being converted to full the frequency domain character of the Whole frequency band voice signal The power spectrum of band speech signal does inverse Fourier transform to the power spectrum of the Whole frequency band voice signal, obtains Whole frequency band language Sound signal.
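The final step performed by module 404 can be sketched for a single frame. A power spectrum carries no phase, and the abstract states that the phase information of the corresponding narrowband signal is used for the inverse transform. The sketch below assumes the power spectrum is the squared magnitude of the discrete Fourier transform, and for checking purposes it reuses the phase of the same test frame; the helper names are hypothetical.

```python
import cmath
import math

def dft(x):
    """Plain O(N^2) discrete Fourier transform."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]

def idft(spec):
    """Inverse DFT, returning the real part of each sample."""
    n = len(spec)
    return [sum(spec[k] * cmath.exp(2j * math.pi * k * t / n)
                for k in range(n)).real / n
            for t in range(n)]

def reconstruct_frame(power_spectrum, phase):
    """Combine magnitude sqrt(P[k]) with phase[k], then invert to the time domain."""
    spec = [math.sqrt(p) * cmath.exp(1j * ph)
            for p, ph in zip(power_spectrum, phase)]
    return idft(spec)

# Round-trip check on a short test frame.
frame = [math.sin(2 * math.pi * 3 * n / 32) for n in range(32)]
spectrum = dft(frame)
power = [abs(c) ** 2 for c in spectrum]
phase = [cmath.phase(c) for c in spectrum]
rebuilt = reconstruct_frame(power, phase)
```

In the described method the power spectrum would come from the network output and the phase from the narrowband signal, so the reconstruction would be approximate rather than exact as it is in this round trip.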

Optionally, the narrowband voice signal acquisition module 401 includes:

A narrowband voice signal acquisition unit, configured to input an original voice signal into a vocoder for compression, to obtain the compressed narrowband voice signal;

A narrowband voice signal pre-processing unit, configured to pre-process the narrowband voice signal.

Optionally, the vocoder is a channel vocoder.

Optionally, the narrowband voice signal pre-processing unit is specifically configured to:

In embodiment four of the present invention, the frequency-domain features of the compressed narrowband voice signal are extracted and input into the trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain the frequency-domain features of the full-band voice signal. These features are then converted into the power spectrum of the full-band voice signal, and an inverse Fourier transform is applied to obtain the full-band voice signal. The compressed narrowband voice signal is thereby restored to a full-band voice signal, which improves the quality and intelligibility of the voice signal.
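The nonlinear fitting step can be illustrated with a forward pass through a small fully connected network. Claim 3 specifies a sigmoid activation and 2 to 4 hidden layers; everything else below is an assumption made for illustration: the 13-dimensional feature vectors, the 32-unit hidden layers, the linear output layer and the random (untrained) weights. Training on noise-corrupted inputs, which is what makes an autoencoder "denoising", is not shown.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def make_layer(n_in, n_out, rng):
    """Random weights and zero biases; a stand-in for trained parameters."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases

def forward(features, layers):
    """Sigmoid on the hidden layers, linear output layer (an assumption)."""
    activation = features
    for i, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = i == len(layers) - 1
        activation = z if is_output else [sigmoid(v) for v in z]
    return activation

rng = random.Random(0)
sizes = [13, 32, 32, 32, 13]  # input, three hidden layers, output
layers = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
narrowband = [rng.gauss(0.0, 1.0) for _ in range(13)]
fullband_estimate = forward(narrowband, layers)
```

With trained weights, `fullband_estimate` would play the role of the full-band frequency-domain features that are passed on to the power spectrum conversion.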

Embodiment five

Fig. 5 is a schematic structural diagram of a terminal provided by embodiment five of the present invention. Fig. 5 shows a block diagram of an exemplary terminal 512 suitable for implementing embodiments of the present invention. The terminal 512 shown in Fig. 5 is only an example and should not impose any limitation on the function or scope of use of the embodiments of the present invention.

As shown in Fig. 5, the terminal 512 takes the form of a general-purpose terminal. The components of the terminal 512 may include, but are not limited to: one or more processors 516 (one processor is taken as an example in Fig. 5), a storage device 528, and a bus 518 connecting the different system components (including the storage device 528 and the processor 516).

The bus 518 represents one or more of several types of bus structure, including a storage device bus or storage device controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus structures. For example, these architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.

The terminal 512 typically includes a variety of computer-system-readable media. These media may be any available media that can be accessed by the terminal 512, including volatile and non-volatile media, and removable and non-removable media.

The storage device 528 may include computer-system-readable media in the form of volatile memory, such as random access memory (RAM) 530 and/or cache memory 532. The terminal 512 may further include other removable/non-removable, volatile/non-volatile computer system storage media. By way of example only, the storage system 534 may be used to read from and write to non-removable, non-volatile magnetic media (not shown in Fig. 5, commonly called a "hard disk drive"). Although not shown in Fig. 5, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (such as a "floppy disk") may be provided, as may an optical disc drive for reading from and writing to a removable non-volatile optical disc (such as a Compact Disc Read-Only Memory (CD-ROM), a Digital Video Disc Read-Only Memory (DVD-ROM) or other optical media). In these cases, each drive may be connected to the bus 518 through one or more data media interfaces. The storage device 528 may include at least one program product having a set of (for example, at least one) program modules configured to perform the functions of the embodiments of the present invention.

A program/utility 540 having a set of (at least one) program modules 542 may be stored in, for example, the storage device 528. Such program modules 542 include, but are not limited to, an operating system, one or more application programs, other program modules and program data; each of these examples, or some combination of them, may include an implementation of a network environment. The program modules 542 generally perform the functions and/or methods of the embodiments described in the present invention.

The terminal 512 may also communicate with one or more external devices 514 (such as a keyboard, a pointing device, a display 524, etc.), with one or more devices that enable a user to interact with the terminal 512, and/or with any device (such as a network card, a modem, etc.) that enables the terminal 512 to communicate with one or more other computing terminals. Such communication may take place through an input/output (I/O) interface 522. The terminal 512 may also communicate with one or more networks (such as a local area network (LAN), a wide area network (WAN) and/or a public network such as the Internet) through a network adapter 520. As shown in Fig. 5, the network adapter 520 communicates with the other modules of the terminal 512 through the bus 518. It should be understood that, although not shown in the figure, other hardware and/or software modules may be used in conjunction with the terminal 512, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems.

The processor 516 runs the programs stored in the storage device 528 to execute various functional applications and data processing, for example to implement the voice signal processing method provided by any embodiment of the present invention. The method may include:

Obtaining a compressed narrowband voice signal;

Extracting frequency-domain features of the narrowband voice signal;

Embodiment six

Embodiment six of the present invention further provides a computer-readable storage medium on which a computer program is stored. When the program is executed by a processor, it implements the voice signal processing method provided by any embodiment of the present invention. The method may include:

Obtaining a compressed narrowband voice signal;

Extracting frequency-domain features of the narrowband voice signal;

The computer storage medium of embodiment six of the present invention may use any combination of one or more computer-readable media. A computer-readable medium may be a computer-readable signal medium or a computer-readable storage medium. A computer-readable storage medium may be, for example, but is not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: an electrical connection having one or more wires, a portable computer diskette, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In this document, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by, or in connection with, an instruction execution system, apparatus or device.

A computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by, or in connection with, an instruction execution system, apparatus or device.

The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, RF, etc., or any suitable combination of the above.

Computer program code for carrying out the operations of the present invention may be written in one or more programming languages or a combination thereof, including object-oriented programming languages such as Java, Smalltalk and C++, as well as conventional procedural programming languages such as the "C" language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or terminal. Where a remote computer is involved, it may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider).

Note that the above are only preferred embodiments of the present invention and the technical principles applied. Those skilled in the art will understand that the present invention is not limited to the specific embodiments described here, and that various obvious changes, readjustments and substitutions can be made without departing from the protection scope of the present invention. Therefore, although the present invention has been described in further detail through the above embodiments, it is not limited to them; it may include other equivalent embodiments without departing from the inventive concept, and its scope is determined by the scope of the appended claims.

Claims

1. A method for processing a voice signal, characterized by comprising:

Obtaining a compressed narrowband voice signal;

Extracting frequency-domain features of the narrowband voice signal;

Inputting the frequency-domain features of the narrowband voice signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band voice signal;

Converting the frequency-domain features of the full-band voice signal into a power spectrum of the full-band voice signal, and applying an inverse Fourier transform to the power spectrum of the full-band voice signal, to obtain the full-band voice signal.

2. The method according to claim 1, characterized in that the frequency-domain features are Mel-frequency cepstral coefficients.

3. The method according to claim 1, characterized in that the deep denoising autoencoder neural network model uses the sigmoid function as its activation function, and the number of hidden layers is set to 2 to 4.

4. The method according to claim 1, characterized in that obtaining the compressed narrowband voice signal comprises:

Pre-processing the narrowband voice signal.

5. The method according to claim 4, characterized in that the vocoder is a channel vocoder.

6. The method according to claim 4, characterized in that the low-pass cutoff frequency of the vocoder is set to 100 Hz, 300 Hz or 500 Hz.

7. The method according to claim 4, characterized in that pre-processing the narrowband voice signal comprises:

Performing framing and window smoothing on the resampled narrowband voice signal, to obtain a framed narrowband voice signal;

Performing voice activity detection on the framed narrowband voice signal, to obtain a narrowband voice activity signal with silent segments removed.

8. A device for processing a voice signal, characterized by comprising:

A full-band frequency-domain feature acquisition module, configured to input the frequency-domain features of the narrowband voice signal into a trained deep denoising autoencoder neural network model for nonlinear fitting, to obtain frequency-domain features of a full-band voice signal;

A full-band voice signal acquisition module, configured to convert the frequency-domain features of the full-band voice signal into a power spectrum of the full-band voice signal and to apply an inverse Fourier transform to the power spectrum of the full-band voice signal, to obtain the full-band voice signal.

9. A terminal, characterized by comprising:

One or more processors;

A storage device, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the voice signal processing method according to any one of claims 1-7.

10. A computer-readable storage medium on which a computer program is stored, characterized in that, when the program is executed by a processor, it implements the voice signal processing method according to any one of claims 1-7.