CN109119093A - Voice de-noising method, device, storage medium and mobile terminal

Info

Publication number: CN109119093A
Application number: CN201811273582.5A
Authority: CN
Inventors: 陈岩
Original assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Current assignee: Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date: 2018-10-30
Filing date: 2018-10-30
Publication date: 2019-01-01
Also published as: WO2020088154A1

Abstract

Embodiments of the present application disclose a voice de-noising method, device, storage medium and mobile terminal. The method includes: receiving a voice signal acquired by a single earphone microphone; extracting the initial amplitude spectrum and the phase spectrum of the voice signal; feeding the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum; applying masking processing to the first noise reduction amplitude spectrum to generate a second noise reduction amplitude spectrum; and recombining the second noise reduction amplitude spectrum with the phase spectrum to generate a first noise-reduced voice. With this scheme, the phase spectrum of the voice signal is kept unchanged, the noise reduction model denoises only the amplitude spectrum, and masking processing is applied to the resulting first noise reduction amplitude spectrum. This eliminates the distortion introduced during noise reduction, addresses the heavy noise in voice signals acquired in noisy environments, and achieves fast, high-accuracy noise reduction of voice signals acquired by an earphone.

Description

Voice de-noising method, device, storage medium and mobile terminal

Technical field

The embodiments of the present application relate to the field of voice processing technology, and in particular to a voice de-noising method, device, storage medium and mobile terminal.

Background technique

With the rapid development of mobile terminals such as mobile phones, earphones have become an important accessory for such terminals. Earphones support audio playback, instant voice messaging, making and receiving calls and other functions, and are accepted and used by more and more users.

During a phone call, the earphone microphone acquires the voice signal and sends it to the mobile terminal, which sends it to the other party's phone over the uplink. However, an earphone usually picks up sound with a single microphone, so its noise reduction capability is poor. When the user makes a call in a noisy public place such as a subway or a bus, the voice signal acquired by the earphone microphone is noisy, high-quality noise reduction cannot be performed on it, and call quality is poor.

Summary of the invention

Embodiments of the present application provide a voice de-noising method, device, storage medium and mobile terminal that perform high-quality noise reduction on voice signals acquired by an earphone microphone and improve call quality.

In a first aspect, an embodiment of the present application provides a voice de-noising method, comprising:

receiving a voice signal acquired by a single earphone microphone;

extracting the initial amplitude spectrum and the phase spectrum of the voice signal, and feeding the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum;

applying masking processing to the first noise reduction amplitude spectrum to generate a second noise reduction amplitude spectrum;

recombining the second noise reduction amplitude spectrum with the phase spectrum to generate a first noise-reduced voice.

In a second aspect, an embodiment of the present application provides a voice noise reduction device, comprising:

a voice receiving module, configured to receive a voice signal acquired by a single earphone microphone;

a first noise reduction amplitude spectrum generation module, configured to extract the initial amplitude spectrum and the phase spectrum of the voice signal and feed the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum;

a second noise reduction amplitude spectrum generation module, configured to apply masking processing to the first noise reduction amplitude spectrum to generate a second noise reduction amplitude spectrum;

a first noise-reduced voice generation module, configured to recombine the second noise reduction amplitude spectrum with the phase spectrum to generate a first noise-reduced voice.

In a third aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the voice de-noising method described in the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides a mobile terminal comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, wherein the processor implements the voice de-noising method described in the embodiments of the present application when executing the computer program.

The voice de-noising method provided in the embodiments of the present application receives a voice signal acquired by a single earphone microphone, extracts the initial amplitude spectrum and the phase spectrum of the voice signal, feeds the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum, applies masking processing to the first noise reduction amplitude spectrum to generate a second noise reduction amplitude spectrum, and recombines the second noise reduction amplitude spectrum with the phase spectrum to generate a first noise-reduced voice. With this scheme, the phase spectrum of the voice signal is kept unchanged, the noise reduction model denoises only the amplitude spectrum, and masking processing is applied to the resulting first noise reduction amplitude spectrum. This eliminates the distortion introduced during noise reduction, addresses the heavy noise in voice signals acquired in noisy environments, and achieves fast, high-accuracy noise reduction of voice signals acquired by an earphone.

Brief description of the drawings

Fig. 1 is a flow diagram of a voice de-noising method provided by an embodiment of the present application;

Fig. 2 is a flow diagram of another voice de-noising method provided by an embodiment of the present application;

Fig. 3 is a flow diagram of another voice de-noising method provided by an embodiment of the present application;

Fig. 4 is a structural diagram of a voice noise reduction device provided by an embodiment of the present application;

Fig. 5 is a structural diagram of a mobile terminal provided by an embodiment of the present application;

Fig. 6 is a structural diagram of another mobile terminal provided by an embodiment of the present application.

Specific embodiment

The technical solution of the present application is further described below with reference to the accompanying drawings and specific embodiments. It should be understood that the specific embodiments described here only explain the present application and do not limit it. It should also be noted that, for ease of description, the drawings show only the parts relevant to the present application rather than the entire structure.

Before the exemplary embodiments are discussed in more detail, it should be mentioned that some exemplary embodiments are described as processes or methods depicted as flowcharts. Although a flowchart describes the steps as sequential processing, many of the steps can be performed in parallel, concurrently or simultaneously. In addition, the order of the steps can be rearranged. The processing can be terminated when its operations are completed, but it may also have additional steps not shown in the drawings. The processing can correspond to a method, function, procedure, subroutine, subprogram and so on.

Fig. 1 is a flow diagram of a voice de-noising method provided by an embodiment of the present application. The method can be executed by a voice noise reduction device, which can be implemented in software and/or hardware and is generally integrated in a mobile terminal. As shown in Fig. 1, the method comprises:

Step 101: receive a voice signal acquired by a single earphone microphone.

Step 102: extract the initial amplitude spectrum and the phase spectrum of the voice signal, and feed the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum.

Step 103: apply masking processing to the first noise reduction amplitude spectrum to generate a second noise reduction amplitude spectrum.

Step 104: recombine the second noise reduction amplitude spectrum with the phase spectrum to generate a first noise-reduced voice.

Illustratively, the mobile terminal in the embodiments of the present application may be a smart device with a call function, such as a mobile phone or a tablet computer. The earphone in this embodiment can be connected to the mobile terminal by wire or wirelessly and sends the acquired voice signal to the mobile terminal. It should be noted that acquiring the voice signal with a single microphone in the earphone reduces the number of microphones in the earphone and lowers hardware cost.

After the earphone microphone acquires an analog signal, an analog-to-digital converter converts the acquired voice signal into a digital voice signal, which is transmitted to the mobile terminal. Optionally, the earphone segments the acquired analog signal and transmits each processed segment of the voice signal to the mobile terminal in real time. For example, the analog signal between every two time points at which the signal strength is 0 can be treated as one signal segment. Optionally, segmenting the analog signal includes: taking the moment at which the microphone starts acquiring the voice signal as the initial time, denoted start time point t0; searching the analog signal for the first time point after t0 at which the signal strength is 0, and taking that time point as end time point t1; dividing the analog signal between start time point t0 and end time point t1 into one signal segment; and continuing to detect whether a signal with strength 0 exists. If so, the time point at which a zero-strength signal first occurs after end time point t1 is taken as the start time point t0 of the next signal segment, and the above steps are repeated so that the analog signal acquired by the voice acquisition device is segmented in real time. Optionally, the analog signal can also be segmented at a fixed time interval. Illustratively, with a fixed interval T and the moment at which the sound acquisition device is triggered as the initial time, the segments of the analog signal can be 0 to T, T to 2T, 2T to 3T and so on, where T can be 500 ms or 1 s.
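The two segmentation strategies described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes the signal is already available as a list of sampled values, it assumes consecutive segments share their zero-strength boundary sample, and the names `segment_at_zeros`, `segment_fixed` and the tolerance `eps` are hypothetical.

```python
from typing import List, Sequence, Tuple


def segment_at_zeros(samples: Sequence[float], eps: float = 0.0) -> List[Tuple[int, int]]:
    """Split at zero-strength samples; returns (start, end) indices, end inclusive.

    Assumption: each zero-strength sample closes the current segment and
    also opens the next one. A trailing run with no closing zero is left
    open (not returned), as it would be in real-time acquisition.
    """
    segments = []
    start = 0  # t0: the moment acquisition starts
    for i in range(1, len(samples)):
        if abs(samples[i]) <= eps:  # first zero-strength sample after t0 is t1
            segments.append((start, i))
            start = i  # assumed: t1 becomes t0 of the next segment
    return segments


def segment_fixed(num_samples: int, sample_rate: int, interval_s: float) -> List[Tuple[int, int]]:
    """Split into fixed-length pieces; returns half-open (start, end) index ranges."""
    step = int(round(interval_s * sample_rate))
    if step <= 0:
        raise ValueError("interval too short for the given sample rate")
    return [(s, min(s + step, num_samples)) for s in range(0, num_samples, step)]
```

With a fixed interval of 500 ms at a hypothetical 16 kHz sample rate, `segment_fixed(n, 16000, 0.5)` yields pieces of 8000 samples each, with a shorter final piece if `n` is not a multiple of 8000.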

The mobile terminal receives the voice signal sent by the earphone in real time and applies a short-time Fourier transform to it, generating the initial amplitude spectrum and the phase spectrum of the voice signal. The initial amplitude spectrum is the curve of the amplitude of the voice signal as a function of frequency, and the phase spectrum is the curve of the phase of the voice signal as a function of frequency. In this embodiment, the noise reduction model processes the initial amplitude spectrum to denoise the voice signal while the phase spectrum is left unchanged, which preserves the accuracy of the voice signal after noise reduction.
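The split into amplitude and phase, and the later recombination, can be illustrated with a small standard-library sketch. It is an assumption-laden toy: it uses a naive O(n²) DFT per frame with a rectangular window and no overlap, whereas a practical implementation would use an FFT-based short-time Fourier transform with windowing and overlap-add. All function names are hypothetical.

```python
import cmath
import math
from typing import List, Sequence, Tuple


def dft(frame: Sequence[float]) -> List[complex]:
    """Naive discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum: Sequence[complex]) -> List[float]:
    """Naive inverse DFT; keeps the real part, assuming a real-valued signal."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


def stft(signal: Sequence[float], frame_len: int) -> List[List[complex]]:
    """Frame-wise DFT (rectangular window, hop == frame_len, partial tail dropped)."""
    return [
        dft(signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, frame_len)
    ]


def split_spectrum(spectrum: Sequence[complex]) -> Tuple[List[float], List[float]]:
    """Return (amplitude spectrum, phase spectrum) of one frame."""
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def recombine(amplitude: Sequence[float], phase: Sequence[float]) -> List[complex]:
    """Rebuild a complex spectrum from a (possibly denoised) amplitude and the original phase."""
    return [cmath.rect(a, p) for a, p in zip(amplitude, phase)]
```

In the scheme described here, only the amplitude list would be passed through the noise reduction model; the phase list is carried through untouched and reused in `recombine` before the inverse transform.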

Optionally, the noise reduction model is a deep neural network (DNN) model with a noise reduction function or a generative adversarial network (GAN) model. A DNN model with a noise reduction function is generated by training on a large number of samples. Each sample includes a clean voice and a noisy voice generated by adding a noise signal to that clean voice. The DNN model can be trained by supervised learning: the noisy voice is input into the DNN noise reduction model to be trained, which outputs a processed noise-reduced voice; the noise-reduced voice is compared with the clean voice; when the two differ, the weight parameters of the DNN noise reduction model are adjusted by backpropagation according to the deviation between the noise-reduced voice and the clean voice; and this training process is iterated until the noise-reduced voice output by the DNN noise reduction model reaches a preset similarity to the clean voice. The trained DNN noise reduction model has a voice noise reduction function: once the initial amplitude spectrum of the voice signal received by the mobile terminal is input into the trained model, the first noise reduction amplitude spectrum of the voice signal is obtained.
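The supervised loop described above (forward pass, comparison with the clean target, weight adjustment from the deviation, repetition until a preset similarity is reached) can be sketched with a deliberately tiny stand-in model. This is not a deep neural network: it assumes one learnable gain per frequency bin and a mean-squared-error criterion, purely to make the loop concrete. The function name, learning rate and tolerance are hypothetical.

```python
from typing import List, Sequence


def train_gain_model(
    noisy_frames: Sequence[Sequence[float]],
    clean_frames: Sequence[Sequence[float]],
    lr: float = 0.1,
    tol: float = 1e-6,
    max_iters: int = 1000,
) -> List[float]:
    """Fit one gain per frequency bin so that gain * noisy approximates clean."""
    bins = len(noisy_frames[0])
    gains = [1.0] * bins  # stand-in for the network's weight parameters
    for _ in range(max_iters):
        loss = 0.0
        grads = [0.0] * bins
        for noisy, clean in zip(noisy_frames, clean_frames):
            for k in range(bins):
                err = gains[k] * noisy[k] - clean[k]  # deviation from clean voice
                loss += err * err
                grads[k] += 2.0 * err * noisy[k]
        loss /= len(noisy_frames) * bins
        if loss < tol:  # "preset similarity" reached
            break
        for k in range(bins):  # adjust weights against the gradient
            gains[k] -= lr * grads[k] / len(noisy_frames)
    return gains
```

A real model would replace the per-bin gain with a multi-layer network and compute the gradients by backpropagation, but the control flow of the training loop stays the same.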

A GAN model with a noise reduction function includes a generation sub-network and a discrimination sub-network. The generation sub-network denoises the input amplitude spectrum, and the discrimination sub-network examines the input amplitude spectrum to determine whether it contains a noise signal. For example, the discrimination sub-model is trained first on the training samples, and its discrimination accuracy is improved by adjusting its network parameters. After the discrimination sub-model has been trained, its network parameters are fixed and the generation sub-model is trained: the network parameters of the generation sub-model are adjusted so that the probability with which the discrimination sub-model judges the amplitude spectrum output by the generation sub-model to contain a noise signal decreases. This training process is repeated, and the GAN model is considered trained when the output results of the discrimination sub-model and the generation sub-model meet a preset error. After the GAN model has been trained, the initial amplitude spectrum is input into the generation sub-model of the GAN model, and the amplitude spectrum output by the generation sub-model is taken as the first noise reduction amplitude spectrum. Optionally, before the initial amplitude spectrum is processed, it can first be input into the discrimination sub-model of the GAN model, and the output of the discrimination sub-model determines whether the initial amplitude spectrum contains a noise signal. If so, the initial amplitude spectrum is input into the generation sub-model of the GAN model; if not, the voice signal acquired by the earphone is determined to be a clean signal and no noise reduction is needed. In some embodiments, the first noise reduction amplitude spectrum can also be input into the discrimination sub-model of the GAN model. When the output of the discrimination sub-model indicates that the probability of the first noise reduction amplitude spectrum containing a noise signal is greater than a preset value, the first noise reduction amplitude spectrum is fed back into the generation sub-model of the GAN model for a further round of noise reduction, and this is repeated until the output meets the preset requirement. Checking the first noise reduction amplitude spectrum in this way improves noise reduction accuracy and further improves the clarity of the processed voice signal.
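The inference-time control flow described above (check with the discrimination sub-model, denoise with the generation sub-model, re-check and repeat) might look like the sketch below. The two sub-models are passed in as plain callables, so nothing here depends on a particular framework. The name `denoise_with_gan`, the threshold value and the `max_passes` safety cap are assumptions that do not come from the source.

```python
from typing import Callable, List, Sequence, Tuple

Spectrum = Sequence[float]


def denoise_with_gan(
    initial: Spectrum,
    generator: Callable[[Spectrum], List[float]],
    discriminator: Callable[[Spectrum], float],
    threshold: float = 0.5,
    max_passes: int = 5,
) -> Tuple[List[float], int]:
    """Return (amplitude spectrum, number of generator passes applied).

    discriminator(spectrum) is assumed to return the probability that the
    spectrum still contains noise; generator(spectrum) returns a denoised one.
    """
    spectrum = list(initial)
    passes = 0
    # Clean input: skip noise reduction entirely.
    if discriminator(spectrum) <= threshold:
        return spectrum, passes
    # Noisy input: denoise, then re-check; repeat while noise is still likely.
    while passes < max_passes:
        spectrum = list(generator(spectrum))
        passes += 1
        if discriminator(spectrum) <= threshold:
            break
    return spectrum, passes
```

The cap on passes is a design choice of this sketch: the source only says the loop runs until the output meets the preset requirement, and an unbounded loop would be risky in a real-time call path.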

In this embodiment, after the first noise reduction amplitude spectrum is determined, masking processing is applied to it to improve the quality of the voice signal and avoid the signal distortion caused by noise reduction. Masking processing is used to remove, or compensate for, distorted components in the first noise reduction amplitude spectrum. Optionally, whether signal distortion exists is judged from the first noise reduction amplitude spectrum. If not, step 103 is omitted and the first noise reduction amplitude spectrum is recombined with the phase spectrum to obtain the first noise-reduced voice. If so, masking processing is applied to the first noise reduction amplitude spectrum to obtain the second noise reduction amplitude spectrum, the second noise reduction amplitude spectrum is recombined with the phase spectrum, and an inverse short-time Fourier transform is applied to generate the first noise-reduced voice.

After the first noise-reduced voice is obtained, it is transmitted. Illustratively, when the mobile terminal is in a call state, the first noise-reduced voice is sent to the uplink of the call and transferred to the other party's mobile terminal. Illustratively, when the mobile terminal is in an instant messaging state, the first noise-reduced voice is sent to the instant messaging server and forwarded to the other party's mobile terminal. This achieves fast, high-accuracy noise reduction of the voice signal acquired by the earphone and improves call quality.

Fig. 2 is a flow diagram of another voice de-noising method provided by an embodiment of the present application. Referring to Fig. 2, the method of this embodiment includes the following steps:

Step 201: receive a voice signal acquired by a single earphone microphone.

Step 202: extract the initial amplitude spectrum and the phase spectrum of the voice signal, and feed the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum.

Step 203: smooth the amplitude value of each frequency bin of the current signal frame in the first noise reduction amplitude spectrum with the amplitude value of the corresponding frequency bin of the previous signal frame in the second noise reduction amplitude spectrum, generating the second noise reduction amplitude spectrum of the current signal frame.

Step 204: recombine the second noise reduction amplitude spectrum with the phase spectrum to generate a first noise-reduced voice.

In this embodiment, masking processing of the first noise reduction amplitude spectrum is performed frame by frame. For any signal frame, the first noise reduction amplitude spectrum of the current signal frame is smoothed using the second noise reduction amplitude spectrum of the preceding signal frame. Specifically, for any frequency bin in the current signal frame, the amplitude value of that bin is smoothed with the amplitude value of the same bin in the second noise reduction amplitude spectrum of the previous signal frame, which yields the amplitude value of that bin in the second noise reduction amplitude spectrum of the current signal frame. The masking factor used for smoothing can be a fixed preset value or can be determined from the amplitude values as they change in real time.

Optionally, the masking processing of the first noise reduction amplitude spectrum satisfies the following formula:

[formula not reproduced in the source text]

where λ(m, k) is the masking factor, which satisfies the following formulas:

[formulas not reproduced in the source text]

Here the two amplitude-spectrum symbols used in the formulas denote the second noise reduction amplitude spectrum and the first noise reduction amplitude spectrum respectively, m is the frame index of the voice signal, k is the frequency bin, and σ is the standard deviation.
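Because the formulas themselves are not recoverable from this text, the following sketch only illustrates the frame-by-frame smoothing described in prose. It assumes a first-order recursive form in which each bin of the current frame is blended with the same bin of the previous smoothed frame, and it uses a fixed preset masking factor (the source allows either a fixed or an adaptive factor). The blend formula, the handling of the first frame and the name `smooth_frames` are assumptions, not the patent's definition.

```python
from typing import List, Sequence


def smooth_frames(first_nr_frames: Sequence[Sequence[float]], lam: float = 0.5) -> List[List[float]]:
    """Recursive per-bin smoothing across frames with a fixed masking factor.

    Assumed form: second[m][k] = lam * second[m-1][k] + (1 - lam) * first[m][k].
    The first frame has no predecessor and is passed through unchanged.
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("masking factor must lie in [0, 1]")
    second: List[List[float]] = []
    for frame in first_nr_frames:
        if not second:
            current = list(frame)
        else:
            previous = second[-1]  # previous frame of the *second* spectrum
            current = [lam * p + (1.0 - lam) * x for p, x in zip(previous, frame)]
        second.append(current)
    return second
```

A larger `lam` leans more on the previous smoothed frame and suppresses abrupt frame-to-frame jumps more strongly, at the cost of slower response to genuine changes in the speech.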

The voice de-noising method provided in this embodiment applies masking processing to the first noise reduction amplitude spectrum, which avoids the signal distortion caused by noise reduction and improves the quality of the noise-reduced voice signal.

Fig. 3 is a flow diagram of another voice de-noising method provided by an embodiment of the present application. This embodiment is an optional variant of the above embodiments. As shown in Fig. 3, the method of this embodiment includes the following steps:

Step 301: receive a voice signal acquired by a single earphone microphone.

Step 302: extract the initial amplitude spectrum and the phase spectrum of the voice signal, and feed the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum.

Step 303: apply masking processing to the first noise reduction amplitude spectrum to generate a second noise reduction amplitude spectrum.

Step 304: recombine the second noise reduction amplitude spectrum with the phase spectrum to generate a first noise-reduced voice, send the first noise-reduced voice to the uplink of the call, and transfer it to the other party's mobile terminal.

Step 305: receive the call voice signal transmitted over the downlink, and perform noise estimation on the call voice signal.

Step 306: when the noise estimation determines that the call voice signal contains a noise signal, input the amplitude spectrum of the call voice signal into the pre-trained noise reduction model to obtain a third noise reduction amplitude spectrum.

Step 307: generate a second noise-reduced voice from the third noise reduction amplitude spectrum, and play the second noise-reduced voice.

In this embodiment, steps 301 to 304 acquire the voice signal, denoise it and transmit it to the other party's mobile terminal, which realizes the sending of the voice signal during a call. Steps 305 to 307 receive the voice signal sent by the other party's mobile terminal; when that voice signal contains a noise signal, the received voice signal is denoised and then played through the earphone or the loudspeaker of the mobile terminal, which realizes the receiving of the voice signal during a call.

In step 305, the call voice signal transmitted over the downlink is received and noise estimation is performed on it. When the call voice signal is a clean voice signal that contains no noise signal, it is played directly. When the call voice signal contains a noise signal, a short-time Fourier transform is applied to it to obtain its amplitude spectrum and phase spectrum, the pre-trained noise reduction model denoises the amplitude spectrum of the call voice signal to obtain the third noise reduction amplitude spectrum, the third noise reduction amplitude spectrum is recombined with the phase spectrum of the call voice signal, an inverse short-time Fourier transform is applied to obtain the second noise-reduced voice, and the second noise-reduced voice is sent to the loudspeaker of the mobile terminal or to the earphone for playback. Performing noise estimation on the call voice signal allows noise reduction to be targeted at call voice signals that contain noise, avoids needless noise reduction of clean voice signals, improves voice signal processing efficiency, avoids communication delay and improves call quality.

In this embodiment, when the generation sub-model of a GAN model is used to denoise the voice signal, the discrimination sub-model of the same GAN model is used to perform noise estimation on the call voice signal. Performing noise estimation on the call voice signal then comprises: inputting the call voice signal into the discrimination sub-model of the generative adversarial network model, and determining from the output of the discrimination sub-model whether the call voice signal contains a noise signal, where the discrimination sub-model is used to perform noise estimation on the input voice. Because noise estimation and noise reduction are both performed with the same GAN model, there is no need to provide separate network models, which both improves signal processing efficiency and saves the memory that the network models occupy.

In some embodiments, performing noise estimation on the call voice signal comprises: comparing the call voice signal with a preset template signal, calculating the power difference between the call voice signal and the preset template signal, and determining from the power difference whether the call voice signal contains a noise signal. The mobile terminal can store template signals for several different voice contents, and the template signal whose voice content is closest to that of the call voice signal is selected for noise evaluation of the call voice signal. Illustratively, the larger the power difference between the call voice signal and the preset template signal, the more noise the call voice signal contains; the smaller the power difference, the less noise it contains. A power threshold is set: when the power difference between the call voice signal and the preset template signal is greater than or equal to the power threshold, the call voice signal is determined to contain a noise signal; when the power difference is less than the power threshold, the call voice signal is determined not to contain a noise signal, that is, the call voice signal is a clean voice signal. The power threshold can be set according to user needs; a smaller power threshold is set if a high-quality call is desired. Noise evaluation can likewise be performed on the voice signal acquired by the single earphone microphone after it is received: when the acquired voice signal is determined not to contain a noise signal, it is sent directly to the uplink of the call and transferred to the other party's mobile terminal.
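The threshold test on the power difference can be sketched as follows. The source does not say how power is computed or whether the difference is signed, so this sketch assumes mean-square power over the samples and an absolute difference. The names `mean_power` and `contains_noise` are hypothetical.

```python
from typing import Sequence


def mean_power(signal: Sequence[float]) -> float:
    """Mean-square power of a sampled signal (assumed definition of 'power')."""
    if not signal:
        raise ValueError("signal must not be empty")
    return sum(x * x for x in signal) / len(signal)


def contains_noise(
    call_signal: Sequence[float],
    template_signal: Sequence[float],
    power_threshold: float,
) -> bool:
    """True when the power difference reaches or exceeds the threshold."""
    difference = abs(mean_power(call_signal) - mean_power(template_signal))
    return difference >= power_threshold
```

Lowering `power_threshold` makes the check stricter, so more call signals are routed through noise reduction, which matches the note above that a smaller threshold is chosen when a high-quality call is desired.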

It should be noted that this embodiment does not limit the order in which steps 301 to 304 and steps 305 to 307 are executed. In other embodiments, steps 305 to 307 can be executed first, followed by steps 301 to 304.

The voice de-noising method provided in this embodiment evaluates, in a call state, both the voice signal acquired by the earphone and the call voice signal sent by the other party's mobile terminal. When either contains a noise signal, noise reduction is performed with the preset noise reduction model. This addresses the poor call quality of calls made in noisy environments and improves noise reduction accuracy and call quality.

Fig. 4 is a structural block diagram of a voice noise reduction device provided by an embodiment of the present application. The device can be implemented in software and/or hardware, is generally integrated in a mobile terminal, and can denoise voice signals by executing the voice de-noising method of the mobile terminal. As shown in Fig. 4, the device includes a voice receiving module 401, a first noise reduction amplitude spectrum generation module 402, a second noise reduction amplitude spectrum generation module 403 and a first noise-reduced voice generation module 404.

The voice receiving module 401 is configured to receive a voice signal acquired by a single earphone microphone;

the first noise reduction amplitude spectrum generation module 402 is configured to extract the initial amplitude spectrum and the phase spectrum of the voice signal and feed the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum;

the second noise reduction amplitude spectrum generation module 403 is configured to apply masking processing to the first noise reduction amplitude spectrum to generate a second noise reduction amplitude spectrum;

the first noise-reduced voice generation module 404 is configured to recombine the second noise reduction amplitude spectrum with the phase spectrum to generate a first noise-reduced voice.

The voice noise reduction device provided in the embodiments of the present application keeps the phase spectrum of the voice signal unchanged, denoises the amplitude spectrum of the voice signal with the noise reduction model, and applies masking processing to the resulting first noise reduction amplitude spectrum. This eliminates the distortion introduced during noise reduction, addresses the heavy noise in voice signals acquired in noisy environments, and achieves fast, high-accuracy noise reduction of voice signals acquired by an earphone.

On the basis of the above embodiments, the noise reduction model is a deep neural network model or a generative adversarial network model.

On the basis of the above embodiments, the second noise reduction amplitude spectrum generation module 403 is configured to:

smooth the amplitude value of each frequency bin of the current signal frame in the first noise reduction amplitude spectrum with the amplitude value of the corresponding frequency bin of the previous signal frame in the second noise reduction amplitude spectrum, generating the second noise reduction amplitude spectrum of the current signal frame.

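On the basis of the above embodiments, the processing performed by the second noise reduction amplitude spectrum generation module 403 satisfies the following formula:

[formula not reproduced in the source text]

where the masking factor λ(m, k) satisfies the following formulas:

[formulas not reproduced in the source text]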

On the basis of the above embodiments, the initial amplitude spectrum of the voice signal and phase spectrum are based on believing the voice Number carry out Short Time Fourier Transform generation；

The reducing noise of voice based on to after recombination the second noise reduction amplitude spectrum and the phase spectrum carry out in short-term Fourier it is inverse Transformation generates.

On the basis of the above embodiments, further includes:

Noise estimation module, for receive downlink transmission call voice signal, to the call voice signal into The estimation of row noise；

Third noise reduction amplitude spectrum generation module, for determining that the call voice signal includes noise letter according to noise estimation Number when, the amplitude spectrum of the call voice signal is input in advance trained noise reduction model, obtains third noise reduction amplitude spectrum；

Second reducing noise of voice generation module, for generating the second reducing noise of voice according to third noise reduction amplitude spectrum, and will be described Second reducing noise of voice plays out.

On the basis of the above embodiments, the noise estimation module is configured to:

Compare the call voice signal with a preset template signal, calculate the power difference between the call voice signal and the preset template signal, and determine according to the power difference whether the call voice signal contains a noise signal.
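A power-difference check of this kind can be sketched as below. The text does not say how the power is measured or where the decision threshold lies, so the mean-square power, the decibel scale, the 3 dB default, and the names `mean_power` and `contains_noise` are all assumptions made for illustration.

```python
import math


def mean_power(samples):
    """Mean-square power of a block of samples (assumed power measure)."""
    return sum(s * s for s in samples) / len(samples)


def contains_noise(call_signal, template, threshold_db=3.0):
    """Flag the call signal as noisy when its power departs from the template.

    threshold_db is a hypothetical tuning parameter; the source gives no value.
    """
    eps = 1e-12  # guards against division by zero and log of zero
    ratio = (mean_power(call_signal) + eps) / (mean_power(template) + eps)
    diff_db = 10.0 * math.log10(ratio)
    return abs(diff_db) > threshold_db
```

For instance, a signal with four times the template's power differs by about 6 dB and is flagged under the default threshold, while a signal identical to the template is not.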

On the basis of the above embodiments, the noise estimation module is configured to:

Input the call voice signal into the discriminator sub-model of a generative adversarial network model, and determine according to the output of the discriminator sub-model whether the call voice signal contains a noise signal, wherein the discriminator sub-model is used to perform noise estimation on input voice.

The embodiment of the present application also provides a storage medium containing computer-executable instructions which, when executed by a computer processor, perform a voice noise reduction method, the method comprising:

Receiving a voice signal acquired by a single microphone of an earphone;

Storage medium: any of various types of memory devices or storage devices. The term "storage medium" is intended to include: installation media, such as a CD-ROM, floppy disk, or tape device; computer system memory or random access memory, such as DRAM, DDR RAM, SRAM, EDO RAM, Rambus RAM, etc.; non-volatile memory, such as flash memory or magnetic media (for example, a hard disk or optical storage); registers or other similar types of memory elements, etc. The storage medium may also include other types of memory or combinations thereof. In addition, the storage medium may be located in a first computer system in which the program is executed, or in a different, second computer system that is connected to the first computer system through a network (such as the Internet). The second computer system may provide program instructions to the first computer for execution. The term "storage medium" may include two or more storage media that reside in different locations (for example, in different computer systems connected through a network). The storage medium may store program instructions (for example, embodied as a computer program) executable by one or more processors.

Of course, in the storage medium containing computer-executable instructions provided by the embodiment of the present application, the computer-executable instructions are not limited to the voice noise reduction operations described above, and may also perform related operations in the voice noise reduction method provided by any embodiment of the present application.

The embodiment of the present application provides a mobile terminal in which the voice noise reduction device provided by the embodiment of the present application may be integrated. Fig. 5 is a schematic structural diagram of a mobile terminal provided by the embodiment of the present application. The mobile terminal 500 may include a memory 501, a processor 502, and a computer program stored on the memory 501 and executable on the processor 502; when executing the computer program, the processor 502 implements the voice noise reduction method described in the embodiments of the present application.

The mobile terminal provided by the embodiment of the present application keeps the phase spectrum of the voice signal unchanged, performs noise reduction on the amplitude spectrum of the voice signal based on the noise reduction model, and applies masking processing to the resulting first noise reduction amplitude spectrum. This eliminates the distortion introduced during noise reduction, solves the problem that voice signals acquired in noisy environments carry heavy noise, and achieves fast, high-accuracy noise reduction of the voice signal acquired by the earphone.

Fig. 6 is a schematic structural diagram of another mobile terminal provided by the embodiment of the present application. The mobile terminal may include: a housing (not shown), a memory 601, a central processing unit (CPU) 602 (also called a processor, hereinafter referred to as the CPU), a circuit board (not shown), and a power supply circuit (not shown). The circuit board is placed inside the space enclosed by the housing; the CPU 602 and the memory 601 are arranged on the circuit board; the power supply circuit supplies power to each circuit or device of the mobile terminal; the memory 601 stores executable program code; and the CPU 602 runs a computer program corresponding to the executable program code by reading the executable program code stored in the memory 601, so as to perform the following steps:

Receiving a voice signal acquired by a single microphone of an earphone;

The mobile terminal further includes: a peripheral interface 603, an RF (radio frequency) circuit 605, an audio circuit 606, a speaker 611, a power management chip 608, an input/output (I/O) subsystem 609, other input/control devices 610, a touch screen 612, and an external port 604. These components communicate through one or more communication buses or signal lines 607.

It should be understood that the illustrated mobile terminal 600 is only one example of a mobile terminal, and that the mobile terminal 600 may have more or fewer components than shown in the drawings, may combine two or more components, or may have a different configuration of components. The various components shown in the drawings may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application-specific integrated circuits.

The mobile terminal for voice noise reduction provided in this embodiment is described in detail below, taking a mobile phone as an example.

Memory 601: the memory 601 can be accessed by the CPU 602, the peripheral interface 603, and so on. The memory 601 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic disk storage devices, flash memory devices, or other volatile solid-state storage components.

Peripheral interface 603: the peripheral interface 603 can connect the input and output peripherals of the device to the CPU 602 and the memory 601.

I/O subsystem 609: the I/O subsystem 609 can connect the input/output peripherals of the device, such as the touch screen 612 and the other input/control devices 610, to the peripheral interface 603. The I/O subsystem 609 may include a display controller 6091 and one or more input controllers 6092 for controlling the other input/control devices 610. The one or more input controllers 6092 receive electrical signals from, or send electrical signals to, the other input/control devices 610, which may include physical buttons (push buttons, rocker buttons, etc.), dials, slide switches, joysticks, and click wheels. It is worth noting that the input controller 6092 can be connected to any of the following: a keyboard, an infrared port, a USB interface, and a pointing device such as a mouse.

Touch screen 612: the touch screen 612 is the input interface and output interface between the mobile terminal and the user, and displays visual output to the user; the visual output may include graphics, text, icons, video, and so on.

The display controller 6091 in the I/O subsystem 609 receives electrical signals from the touch screen 612 or sends electrical signals to the touch screen 612. The touch screen 612 detects contact on the touch screen, and the display controller 6091 converts the detected contact into interaction with user interface objects displayed on the touch screen 612, thereby realizing human-computer interaction. The user interface objects displayed on the touch screen 612 may be icons for running games, icons for connecting to corresponding networks, and so on. It is worth noting that the device may also include an optical mouse, which is a touch-sensitive surface that does not display visual output, or an extension of the touch-sensitive surface formed by the touch screen.

The RF circuit 605 is mainly used to establish communication between the mobile phone and the wireless network (i.e., the network side), and to receive and send data between the mobile phone and the wireless network, for example sending and receiving short messages and e-mail. Specifically, the RF circuit 605 receives and sends RF signals, which are also called electromagnetic signals; the RF circuit 605 converts electrical signals into electromagnetic signals or electromagnetic signals into electrical signals, and communicates with communication networks and other devices through the electromagnetic signals. The RF circuit 605 may include known circuitry for performing these functions, including but not limited to an antenna system, an RF transceiver, one or more amplifiers, a tuner, one or more oscillators, a digital signal processor, a CODEC (COder-DECoder) chipset, a subscriber identity module (SIM), and so on.

The audio circuit 606 is mainly used to receive audio data from the peripheral interface 603, convert the audio data into an electrical signal, and send the electrical signal to the speaker 611.

The speaker 611 is used to restore the voice signal that the mobile phone receives from the wireless network through the RF circuit 605 into sound, and to play the sound to the user.

The power management chip 608 is used to supply power to, and manage power for, the hardware connected to the CPU 602, the I/O subsystem, and the peripheral interface.

The voice noise reduction device, storage medium, and mobile terminal provided in the above embodiments can execute the voice noise reduction method provided by any embodiment of the present application, and have the corresponding functional modules and beneficial effects for executing the method. For technical details not described in detail in the above embodiments, refer to the voice noise reduction method provided by any embodiment of the present application.

Note that the above are only the preferred embodiments of the present application and the technical principles applied. Those skilled in the art will understand that the present application is not limited to the specific embodiments described here, and that various obvious changes, readjustments, and substitutions can be made by those skilled in the art without departing from the protection scope of the present application. Therefore, although the present application has been described in further detail through the above embodiments, it is not limited to them, and may include other equivalent embodiments without departing from the concept of the present application; the scope of the present application is determined by the scope of the appended claims.

Claims

1. A voice noise reduction method, characterized by comprising:

Receiving a voice signal acquired by a single microphone of an earphone;

Extracting the initial amplitude spectrum and the phase spectrum of the voice signal, and inputting the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum;

2. The method according to claim 1, characterized in that the noise reduction model is a deep neural network model or a generative adversarial network model.

3. The method according to claim 1, characterized in that performing masking processing on the first noise reduction amplitude spectrum to generate a second noise reduction amplitude spectrum comprises:

Smoothing the amplitude value of each frequency bin of the current signal frame in the first noise reduction amplitude spectrum with the amplitude value of the corresponding frequency bin of the previous signal frame in the second noise reduction amplitude spectrum, to generate the second noise reduction amplitude spectrum of the current signal frame.

4. The method according to claim 3, characterized in that smoothing the amplitude value of each frequency bin of the current signal frame in the first noise reduction amplitude spectrum with the amplitude value of the corresponding frequency bin of the previous signal frame in the second noise reduction amplitude spectrum satisfies the following formula:

Wherein, the masking factor λ(m, k) satisfies the following formula:

And

Wherein,For the second noise reduction amplitude spectrum,For the first noise reduction amplitude spectrum, m is the frame number of voice signal, and k is frequency point, and σ is Standard deviation.

5. The method according to claim 1, characterized in that the initial amplitude spectrum and the phase spectrum of the voice signal are generated by performing a short-time Fourier transform on the voice signal;

The noise-reduced voice is generated by performing an inverse short-time Fourier transform on the recombined second noise reduction amplitude spectrum and phase spectrum.

6. The method according to claim 1, characterized in that the method further comprises:

Receiving a call voice signal transmitted over the downlink, and performing noise estimation on the call voice signal;

When it is determined from the noise estimation that the call voice signal contains a noise signal, inputting the amplitude spectrum of the call voice signal into the pre-trained noise reduction model to obtain a third noise reduction amplitude spectrum;

Generating a second noise-reduced voice according to the third noise reduction amplitude spectrum, and playing the second noise-reduced voice.

7. The method according to claim 6, characterized in that performing noise estimation on the call voice signal comprises:

Comparing the call voice signal with a preset template signal, calculating the power difference between the call voice signal and the preset template signal, and determining according to the power difference whether the call voice signal contains a noise signal; or,

Inputting the call voice signal into the discriminator sub-model of a generative adversarial network model, and determining according to the output of the discriminator sub-model whether the call voice signal contains a noise signal, wherein the discriminator sub-model is used to perform noise estimation on input voice.

8. A voice noise reduction device, characterized by comprising:

A first noise reduction amplitude spectrum generation module, configured to extract the initial amplitude spectrum and the phase spectrum of the voice signal, and input the initial amplitude spectrum into a pre-trained noise reduction model to obtain a first noise reduction amplitude spectrum;

A second noise reduction amplitude spectrum generation module, configured to perform masking processing on the first noise reduction amplitude spectrum to generate a second noise reduction amplitude spectrum;

A first noise-reduced voice generation module, configured to recombine the second noise reduction amplitude spectrum and the phase spectrum to generate a first noise-reduced voice.

9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the voice noise reduction method according to any one of claims 1-7.

10. A mobile terminal, characterized by comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the voice noise reduction method according to claim 1 when executing the computer program.