CN110534123A - Speech enhancement method and apparatus, storage medium, and electronic device - Google Patents
Speech enhancement method and apparatus, storage medium, and electronic device
- Publication number
- CN110534123A CN110534123A CN201910663257.8A CN201910663257A CN110534123A CN 110534123 A CN110534123 A CN 110534123A CN 201910663257 A CN201910663257 A CN 201910663257A CN 110534123 A CN110534123 A CN 110534123A
- Authority
- CN
- China
- Prior art keywords
- voice
- preset
- sound
- speech
- channel
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
Embodiments of the present invention relate to a speech enhancement method and apparatus, a storage medium, and an electronic device. The method includes: calling a voice capture device to collect speech in the current environment; processing the speech according to a preset speech processing algorithm to obtain single-channel speech; performing sentence-break segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound; inputting the speech segment data stream into a preset speech enhancement network model to obtain enhanced speech corresponding to the speech segment data stream; and synthesizing the enhanced speech into a speech segment. In this way, the method can be applied in more scenarios, avoids the influence of noise, takes speech characteristics into account, and avoids introducing distortion, thereby avoiding damage to the speech.
Description
Technical field
Embodiments of the present invention relate to the field of automatic computer information processing, and in particular to a speech enhancement method and apparatus, a storage medium, and an electronic device.
Background technique
Speech, the material shell of language, is the external form of language and the most direct symbol system for recording the activity of thought; it is one of the most natural and effective means for users to exchange information. While acquiring a speech signal, a user inevitably picks up interference from ambient noise, room reverberation, and other users, which seriously degrades speech quality and in turn the performance of speech recognition. Speech enhancement arose in response: as a pre-processing stage, it is an effective way to suppress interference and improve far-field speech recognition accuracy.
Speech enhancement refers to the technology of extracting the useful speech signal from the noise background, and of suppressing and reducing noise interference, when the speech signal has been corrupted or even drowned out by various kinds of noise. In short, it extracts the cleanest possible original speech from noisy speech.
In the related art, traditional speech enhancement methods mainly include spectral subtraction, Wiener filtering, and short-time spectral amplitude enhancement based on the minimum mean-square error. Although traditional speech enhancement methods have advantages such as high speed and no need for large-scale training corpora, they depend heavily on an estimate of the noise, are applicable in few scenarios, fail to take speech characteristics into account, and inevitably introduce distortion, causing damage to the speech.
Summary of the invention
In view of this, to solve the above technical problems or some of them, embodiments of the present invention provide a speech enhancement method and apparatus, a storage medium, and an electronic device.
In a first aspect, an embodiment of the present invention provides a speech enhancement method, the method comprising:
calling a voice capture device to collect speech in the current environment;
processing the speech according to a preset speech processing algorithm to obtain single-channel speech;
performing sentence-break segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound;
inputting the speech segment data stream into a preset speech enhancement network model to obtain enhanced speech corresponding to the speech segment data stream; and
synthesizing the enhanced speech into a speech segment.
In a possible embodiment, processing the speech according to the preset speech processing algorithm to obtain single-channel speech comprises:
converting the speech through A/D conversion and sampling it at a preset sample rate to obtain single-channel speech.
In a possible embodiment, performing sentence-break segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound comprises:
performing sentence-break segmentation on the speech within a preset threshold range in the single-channel speech;
for any frame of speech within the preset threshold range in the single-channel speech, detecting with a pre-established neural network model whether the frame contains the preset type of sound;
if the frame contains the preset type of sound, retaining the frame; and
combining all speech frames containing the preset type of sound to obtain the speech segment data stream containing the preset type of sound.
In a possible embodiment, the method further comprises:
if the frame does not contain the preset type of sound, filtering out the frame.
In a second aspect, an embodiment of the present invention provides a speech enhancement apparatus, the apparatus comprising:
a voice acquisition module, configured to call a voice capture device and collect speech in the current environment;
a speech processing module, configured to process the speech according to a preset speech processing algorithm to obtain single-channel speech;
a speech segmentation module, configured to perform sentence-break segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound;
a speech enhancement module, configured to input the speech segment data stream into a preset speech enhancement network model to obtain enhanced speech corresponding to the speech segment data stream; and
a speech synthesis module, configured to synthesize the enhanced speech into a speech segment.
In a possible embodiment, the speech processing module is specifically configured to:
convert the speech through A/D conversion and sample it at a preset sample rate to obtain single-channel speech.
In a possible embodiment, the speech segmentation module is specifically configured to:
perform sentence-break segmentation on the speech within a preset threshold range in the single-channel speech;
for any frame of speech within the preset threshold range in the single-channel speech, detect with a pre-established neural network model whether the frame contains the preset type of sound;
if the frame contains the preset type of sound, retain the frame; and
combine all speech frames containing the preset type of sound to obtain the speech segment data stream containing the preset type of sound.
In a possible embodiment, the apparatus further comprises:
a speech filtering module, configured to filter out a frame if it does not contain the preset type of sound.
In a third aspect, an embodiment of the present invention provides a storage medium storing one or more programs, which can be executed by one or more processors to implement the aforementioned speech enhancement method.
In a fourth aspect, an embodiment of the present invention provides an electronic device comprising a processor and a memory, the processor being configured to execute a speech enhancement program stored in the memory to implement the aforementioned speech enhancement method.
In the technical solution provided by the embodiments of the present invention, single-channel speech is obtained by processing the collected speech; sentence-break segmentation is performed on the single-channel speech to obtain a speech segment data stream containing a preset type of sound; and the speech segment data stream is input into a preset speech enhancement network model. This avoids the influence of noise, takes speech characteristics into account, and avoids introducing distortion, thereby avoiding damage to the speech. The enhanced speech obtained in this way is synthesized into a speech segment, enabling application in more scenarios.
Detailed description of the invention
To explain the technical solutions in the embodiments of this specification or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some of the embodiments recorded in this specification; those of ordinary skill in the art can obtain other drawings from them.
Fig. 1 is a flow diagram of the speech enhancement method of an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of the speech enhancement apparatus of an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of the electronic device of an embodiment of the present invention.
Specific embodiment
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of the present invention.
To facilitate understanding of the embodiments of the present invention, they are further explained below with reference to the drawings and specific embodiments; the embodiments do not constitute a limitation on the present invention.
As shown in Fig. 1, which is a flow diagram of a speech enhancement method provided by an embodiment of the present invention, the method may specifically include the following steps:
S101: call a voice capture device and collect speech in the current environment.
In the embodiments of the present invention, the current environment may be a far-field, noisy acoustic environment; the embodiments of the present invention do not limit this.
In the current environment, a voice capture device such as a microphone is called to collect speech. The collected speech carries both the original speech of the target user and the noise in the current environment. The noise may be the speech of other users in the environment, music, impact sounds, and so on: relative to the original speech of the target user, all other sounds can be regarded as noise. The embodiments of the present invention do not limit this.
S102: process the speech according to a preset speech processing algorithm to obtain single-channel speech.
The speech collected in step S101 is processed according to a preset speech processing algorithm to obtain single-channel speech. An optional implementation of the preset speech processing algorithm is provided here:
the speech is converted through A/D conversion and sampled at a preset sample rate to obtain single-channel speech. Here A/D refers to the circuit that converts an analog signal into a digital signal, i.e., an analog-to-digital converter.
For example, a microphone is called to collect the speech in the current environment; the speech is converted through A/D conversion and sampled at a 16,000 Hz sample rate to obtain single-channel speech at a 16,000 Hz sample rate.
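The A/D and sampling step above can be sketched as follows. This is a minimal illustration, not the patent's implementation: real systems would low-pass filter before decimating, whereas this version simply averages the two channels and keeps every third sample (48 kHz to 16 kHz); the function names and rates are assumptions for illustration.

```python
def to_single_channel(left, right):
    """Average two channels into a mono (single-channel) signal."""
    return [(l + r) / 2.0 for l, r in zip(left, right)]

def decimate(samples, src_rate=48000, dst_rate=16000):
    """Naive resampling: keep every (src_rate // dst_rate)-th sample."""
    step = src_rate // dst_rate
    return samples[::step]

left = [0.0, 0.2, 0.4, 0.6, 0.8, 1.0]
right = [0.0, 0.0, 0.0, 0.0, 0.0, 0.0]
mono = to_single_channel(left, right)
mono_16k = decimate(mono)   # every 3rd sample of the mono signal
```

In practice the preset sample rate would be applied by the capture hardware or a proper resampler; the decimation here only shows where the rate change sits in the pipeline.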
S103: perform sentence-break segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound.
A neural network model is trained in advance to detect whether each frame of speech contains the preset type of sound; here the preset type of sound refers to the original speech of the target user.
Sentence-break segmentation is performed on the speech within a preset threshold range in the single-channel speech. For any frame of speech within the preset threshold range, the pre-established neural network model detects whether the frame contains the preset type of sound.
If the frame contains the preset type of sound, the frame is retained; if it does not, the frame is filtered out. In this way the pre-established neural network model filters out speech frames other than those containing the original speech of the target user, leaving the speech frames that contain the preset type of sound.
All speech frames containing the preset type of sound are combined to obtain the speech segment data stream containing the preset type of sound.
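The keep/filter/combine logic of this step can be sketched as below. The actual detector is a trained neural network; here a hypothetical energy threshold stands in for it purely so the control flow is runnable, and `contains_preset_sound` is an assumed name, not from the patent.

```python
def contains_preset_sound(frame, threshold=0.01):
    """Stand-in for the pre-trained detector network: a crude
    mean-energy test over the frame's samples."""
    energy = sum(s * s for s in frame) / len(frame)
    return energy > threshold

def build_segment_stream(frames):
    """Keep frames the detector flags, filter the rest, and
    combine the kept frames into one segment data stream."""
    kept = [f for f in frames if contains_preset_sound(f)]
    stream = []
    for f in kept:
        stream.extend(f)
    return stream

frames = [[0.5, -0.5], [0.0, 0.0], [0.3, 0.3]]
stream = build_segment_stream(frames)   # the silent frame is filtered out
```

The shape of the result matches the description: only frames judged to contain the target user's speech survive into the stream fed to the enhancement model.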
S104: input the speech segment data stream into the preset speech enhancement network model to obtain enhanced speech corresponding to the speech segment data stream.
The speech enhancement network model enhances the speech segment data stream end to end to obtain enhanced speech: the model's input is the speech segment data stream containing the preset type of sound, and its output is the enhanced speech.
In the embodiments of the present invention, the speech enhancement network model is a multi-scale time-domain speech enhancement model based on a gated convolutional network, and may specifically include an encoder module, an enhancement module, and a decoder module.
Encoder module: encodes the noisy waveform into an intermediate feature space. The input segment is converted into a high-dimensional feature representation by a one-dimensional convolutional neural network.
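A toy sketch of what the encoder does: a one-dimensional convolution slides over the waveform, and stacking several kernels yields one feature channel per kernel. The kernel values below are arbitrary illustrative numbers, not trained parameters, and the real model would use strided learned convolutions.

```python
def conv1d(signal, kernel, stride=1):
    """Valid 1-D convolution (cross-correlation form)."""
    k = len(kernel)
    return [sum(signal[i + j] * kernel[j] for j in range(k))
            for i in range(0, len(signal) - k + 1, stride)]

def encode(waveform, kernels):
    """One output channel per kernel: a crude higher-dimensional
    feature representation of the raw waveform."""
    return [conv1d(waveform, k) for k in kernels]

wave = [1.0, 2.0, 3.0, 4.0]
features = encode(wave, [[1.0, -1.0], [0.5, 0.5]])
# features[0] acts as a difference filter, features[1] as a smoother
```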
Enhancement module: operates on the encoded high-dimensional feature representation and comprises three operations: multi-scale feature extraction, convolution blocks, and multi-scale feature fusion.
Multi-scale feature extraction: gated convolution operations of different sizes are used in parallel to extract and merge features. Specifically, features are extracted with one-dimensional gated convolution operations; feature extraction at different scales is realized by gated convolutional networks with different kernel sizes. The outputs of these networks with different kernels are then spliced together and normalized by layer normalization before being output.
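A hedged sketch of one such multi-scale gated extraction step: each scale applies a gated 1-D convolution (a content branch modulated by a sigmoid gate branch, i.e., a gated linear unit), outputs from two kernel sizes are spliced, then layer-normalized. All weights are illustrative stand-ins for learned parameters, and the splice shown trims lengths crudely rather than padding as a real model would.

```python
import math

def conv1d(x, kernel):
    k = len(kernel)
    return [sum(x[i + j] * kernel[j] for j in range(k))
            for i in range(len(x) - k + 1)]

def gated_conv(x, w_content, w_gate):
    """Gated linear unit: content branch * sigmoid(gate branch)."""
    c = conv1d(x, w_content)
    g = conv1d(x, w_gate)
    return [ci * (1.0 / (1.0 + math.exp(-gi))) for ci, gi in zip(c, g)]

def layer_norm(v, eps=1e-5):
    """Normalize a feature vector to zero mean, unit variance."""
    mean = sum(v) / len(v)
    var = sum((a - mean) ** 2 for a in v) / len(v)
    return [(a - mean) / math.sqrt(var + eps) for a in v]

x = [0.1, 0.4, -0.2, 0.3, 0.0]
small = gated_conv(x, [1.0, 1.0], [0.5, 0.5])             # kernel size 2
large = gated_conv(x, [1.0, 0.0, 1.0], [0.2, 0.2, 0.2])   # kernel size 3
fused = layer_norm(small[:len(large)] + large)            # splice, then normalize
```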
Convolution blocks: the module is composed of several convolution blocks. Within each block a fully convolutional network is used, the convolution operation is repeated R times, and the dilation factor of the convolutional network is continually increased to extend the receptive field. By extending the receptive field, the network can capture long-term information.
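The receptive-field growth from stacked dilated convolutions can be made concrete: with kernel size k, each layer with dilation d adds (k - 1) * d samples of context, so the receptive field is 1 plus that sum over all layers. The kernel size 3 and doubling dilation schedule below are common assumptions for illustration, not values fixed by the description above.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field of a stack of dilated 1-D convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)

# R = 6 repeated convolutions with dilation doubling each time:
dilations = [2 ** i for i in range(6)]   # 1, 2, 4, 8, 16, 32
rf = receptive_field(3, dilations)       # 1 + 2 * 63 = 127 samples
```

This is why increasing the dilation factor cheaply extends how far back in time the block can see: the receptive field grows exponentially with depth while the parameter count grows only linearly.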
Multi-scale feature fusion: convolutional neural networks at different levels output different types of features, such as low-level texture (shallow layers) and semantic clues (deep layers), and these features contribute differently to the final task. Specifically, in the embodiments of the present invention, rather than directly taking the output of the last layer as the final output, the output of each convolution block is extracted and the outputs are fused into the model's final output. The output features of each convolution block represent details at different levels. A connection is established for each block, and the information of the different blocks is transmitted during training; this process is called feature transmission. Since the benefit of information from other layers is unknown, a gating mechanism is used to screen the useful information and control the information flow. Specifically, high-level features are transferred to the shallow layers step by step.
Decoder module: the inverse of the encoder module. It decodes the feature representation back into speech samples; specifically, a one-dimensional transposed convolution realizes the decoding process.
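A minimal sketch of the transposed-convolution decoding step: each feature value scatters a scaled copy of the kernel onto the output at a stride-spaced position, inverting the encoder's striding and recovering waveform-length samples. The stride and kernel values are illustrative assumptions.

```python
def transposed_conv1d(features, kernel, stride=2):
    """Each input value adds kernel * value at position i * stride;
    overlapping contributions accumulate."""
    out_len = (len(features) - 1) * stride + len(kernel)
    out = [0.0] * out_len
    for i, v in enumerate(features):
        for j, w in enumerate(kernel):
            out[i * stride + j] += v * w
    return out

samples = transposed_conv1d([1.0, 2.0], [0.5, 0.5], stride=2)
# -> [0.5, 0.5, 1.0, 1.0]: two features expand to four samples
```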
S105: synthesize the enhanced speech into a speech segment.
The speech segment data stream is processed by the above speech enhancement network model to obtain enhanced speech, and the enhanced speech is synthesized into a speech segment.
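The synthesis step can be sketched as simple in-order concatenation of the enhanced segments. This is a hedged illustration of the data flow only: the patent does not specify the joining method, and a practical system might apply overlap-add or crossfading at segment boundaries instead.

```python
def synthesize(enhanced_segments):
    """Join enhanced segments back into one contiguous speech segment."""
    out = []
    for seg in enhanced_segments:
        out.extend(seg)
    return out

speech = synthesize([[0.1, 0.2], [0.3], [0.4, 0.5]])
# -> [0.1, 0.2, 0.3, 0.4, 0.5]
```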
The speech enhancement method in the embodiments of the present invention constructs an efficient multi-scale time-domain speech enhancement model based on a gated convolutional network and uses it to capture the timing information of the speech signal. A gating mechanism is integrated into the model so that it can learn feature representations at different levels. Instead of selecting the output of the last layer as the final output, feature maps of different depths are fused as the final output: connections are established between layers at different depths, so that information learned by the deep layers can be passed to the shallow layers. Another gating mechanism is used to screen the useful information.
To verify the effectiveness of the speech enhancement method in the embodiments of the present invention, a multi-scale time-domain speech enhancement model based on a gated convolutional network was first constructed, with the output of the last convolution block as the final output, 3 convolution blocks, and a dilation factor of 6 as the experimental configuration. On this basis, multi-scale feature fusion and feature transmission were added step by step.
The experimental results show that the multi-scale time-domain speech enhancement model based on a gated convolutional network can effectively enhance speech, and that gradually adding feature fusion and feature transmission further improves the model's performance. Compared with the convolution-based model, the final model in the embodiments of the present invention obtains performance gains of 0.12 and 0.01 on PESQ (Perceptual Evaluation of Speech Quality) and STOI (Short-Time Objective Intelligibility), respectively. In addition, compared with the noisy speech, the model improves performance on PESQ and STOI by 0.43 and 0.123, respectively.
The experimental configuration with 4 convolution blocks and a dilation factor of 8 performs best: compared with the noisy speech, the best model in the embodiments of the present invention achieves performance improvements of 0.54 and 0.125 on PESQ and STOI, respectively. The enhancement model in the embodiments of the present invention not only effectively enhances noisy speech, but also outperforms other baseline systems: the multi-scale time-domain speech enhancement model based on a gated convolutional network is better than both the frequency-domain system and the recurrent-neural-network system. By extending the receptive field, the model can capture long-term dependencies, and it achieves a particularly significant performance improvement on STOI. This shows that through end-to-end training, the multi-scale time-domain speech enhancement model based on a gated convolutional network can enhance and estimate speech more accurately.
As described above for the technical solution provided by the embodiments of the present invention: single-channel speech is obtained by processing the collected speech; sentence-break segmentation is performed on the single-channel speech to obtain a speech segment data stream containing a preset type of sound; and the speech segment data stream is input into a preset speech enhancement network model. This avoids the influence of noise, takes speech characteristics into account, and avoids introducing distortion, thereby avoiding damage to the speech. The enhanced speech obtained in this way is synthesized into a speech segment, enabling application in more scenarios.
Corresponding to the method embodiment, an embodiment of the present invention also provides a speech enhancement apparatus. As shown in Fig. 2, the apparatus may include: a voice acquisition module 210, a speech processing module 220, a speech segmentation module 230, a speech enhancement module 240, and a speech synthesis module 250.
The voice acquisition module 210 is configured to call a voice capture device and collect speech in the current environment;
the speech processing module 220 is configured to process the speech according to a preset speech processing algorithm to obtain single-channel speech;
the speech segmentation module 230 is configured to perform sentence-break segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound;
the speech enhancement module 240 is configured to input the speech segment data stream into a preset speech enhancement network model to obtain enhanced speech corresponding to the speech segment data stream; and
the speech synthesis module 250 is configured to synthesize the enhanced speech into a speech segment.
According to a specific embodiment provided by the present invention, the speech processing module 220 is specifically configured to: convert the speech through A/D conversion and sample it at a preset sample rate to obtain single-channel speech.
According to a specific embodiment provided by the present invention, the speech segmentation module 230 is specifically configured to: perform sentence-break segmentation on the speech within a preset threshold range in the single-channel speech; for any frame of speech within the preset threshold range, detect with a pre-established neural network model whether the frame contains the preset type of sound; if the frame contains the preset type of sound, retain the frame; and combine all speech frames containing the preset type of sound to obtain the speech segment data stream containing the preset type of sound.
According to a specific embodiment provided by the present invention, the apparatus further comprises:
a speech filtering module 260, configured to filter out a frame if it does not contain the preset type of sound.
Fig. 3 is a schematic structural diagram of an electronic device provided by an embodiment of the present invention. The electronic device 300 shown in Fig. 3 includes: at least one processor 301, a memory 302, at least one network interface 304, and other user interfaces 303. The various components in the electronic device 300 are coupled together by a bus system 305. It can be understood that the bus system 305 is used to realize connection and communication among these components; in addition to a data bus, it also includes a power bus, a control bus, and a status signal bus. For clarity of explanation, however, all the buses are labeled as the bus system 305 in Fig. 3.
The user interface 303 may include a display, a keyboard, or a pointing device (for example, a mouse, a trackball, a touch pad, or a touch screen).
It can be understood that the memory 302 in the embodiments of the present invention may be a volatile memory or a non-volatile memory, or may include both. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or a flash memory. The volatile memory may be a random access memory (RAM), used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), and direct Rambus RAM (DRRAM). The memory 302 described herein is intended to include, but is not limited to, these and any other suitable types of memory.
In some embodiments, the memory 302 stores the following elements, executable units or data structures, or subsets or supersets of them: an operating system 3021 and application programs 3022.
The operating system 3021 includes various system programs, such as a framework layer, a core library layer, and a driver layer, for realizing various basic services and processing hardware-based tasks. The application programs 3022 include various application programs, such as a media player and a browser, for realizing various application services. A program implementing the method of an embodiment of the present invention may be included in the application programs 3022.
In an embodiment of the present invention, by calling a program or instructions stored in the memory 302 (specifically, a program or instructions stored in the application programs 3022), the processor 301 is configured to execute the method steps provided by each method embodiment, for example including:
calling a voice capture device to collect speech in the current environment; processing the speech according to a preset speech processing algorithm to obtain single-channel speech; performing sentence-break segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound; inputting the speech segment data stream into a preset speech enhancement network model to obtain enhanced speech corresponding to the speech segment data stream; and synthesizing the enhanced speech into a speech segment.
The methods disclosed in the above embodiments of the present invention can be applied to, or realized by, the processor 301. The processor 301 may be an integrated circuit chip with signal processing capability. During implementation, each step of the above method can be completed by an integrated logic circuit of hardware in the processor 301 or by instructions in the form of software. The processor 301 may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or discrete hardware components, and can implement or execute the methods, steps, and logic block diagrams disclosed in the embodiments of the present invention. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the method disclosed in the embodiments of the present invention can be directly embodied as being executed by a hardware decoding processor, or executed by a combination of hardware and software units in a decoding processor. The software unit may be located in a storage medium mature in the field, such as a random access memory, a flash memory, a read-only memory, a programmable read-only memory or electrically erasable programmable memory, or a register. The storage medium is located in the memory 302; the processor 301 reads the information in the memory 302 and completes the steps of the above method in combination with its hardware.
It can be understood that the embodiments described herein may be implemented in hardware, software, firmware, middleware, microcode, or a combination thereof. For a hardware implementation, the processing unit may be implemented in one or more application-specific integrated circuits (Application Specific Integrated Circuits, ASIC), digital signal processors (Digital Signal Processing, DSP), digital signal processing devices (DSP Device, DSPD), programmable logic devices (Programmable Logic Device, PLD), field-programmable gate arrays (Field-Programmable Gate Array, FPGA), general-purpose processors, controllers, microcontrollers, microprocessors, other electronic units for performing the functions described herein, or a combination thereof.
For a software implementation, the techniques described herein may be implemented by units that perform the functions described herein. The software code may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
The electronic device provided in this embodiment may be the electronic device shown in Fig. 3, and may perform all the steps of the speech enhancement method shown in Fig. 1, thereby achieving the technical effect of the speech enhancement method shown in Fig. 1. For details, please refer to the description related to Fig. 1; for brevity of description, it is not repeated here.
An embodiment of the present invention also provides a storage medium (a computer-readable storage medium). The storage medium stores one or more programs. The storage medium may include a volatile memory, such as a random access memory; it may also include a non-volatile memory, such as a read-only memory, a flash memory, a hard disk, or a solid-state disk; it may also include a combination of the above kinds of memory.
When the one or more programs in the storage medium are executed by one or more processors, the speech enhancement method executed on the speech enhancement device side described above is implemented.
The processor is configured to execute the speech enhancement program stored in the memory, so as to implement the following steps of the speech enhancement method executed on the speech enhancement device side:
calling a voice capture device to acquire speech in the current environment; processing the speech according to a preset speech processing algorithm to obtain single-channel speech; performing sentence segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound; inputting the speech segment data stream into a preset speech enhancement network model to obtain enhanced speech corresponding to the speech segment data stream; and synthesizing the enhanced speech into speech segments.
Those skilled in the art should further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two. To clearly illustrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally in terms of function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementation should not be considered beyond the scope of the present invention.
The steps of the method or algorithm described in connection with the embodiments disclosed herein can be implemented with hardware, a software module executed by a processor, or a combination of the two. A software module can reside in a random access memory (RAM), a memory, a read-only memory (ROM), an electrically programmable ROM, an electrically erasable programmable ROM, a register, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium well known in the technical field.
The specific embodiments described above further describe in detail the purpose, technical solutions, and beneficial effects of the present invention. It should be understood that the foregoing is merely a specific embodiment of the present invention and is not intended to limit the protection scope of the present invention. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (10)
1. A speech enhancement method, characterized in that the method comprises:
calling a voice capture device to acquire speech in the current environment;
processing the speech according to a preset speech processing algorithm to obtain single-channel speech;
performing sentence segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound;
inputting the speech segment data stream into a preset speech enhancement network model to obtain enhanced speech corresponding to the speech segment data stream;
synthesizing the enhanced speech into speech segments.
2. The method according to claim 1, characterized in that the processing the speech according to a preset speech processing algorithm to obtain single-channel speech comprises:
converting the speech through A/D conversion and sampling it at a preset sample rate to obtain single-channel speech.
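Claim 2 leaves the sampling step abstract. As a hedged sketch, assuming the A/D conversion has already produced a digital signal at some source rate, resampling it to the preset rate could look like the following (linear interpolation is an illustrative choice, not mandated by the claim):

```python
# Minimal sketch of "sample at a preset sample rate" (claim 2), assuming
# an already-digitized signal; linear interpolation is an illustrative
# resampling choice, not the claimed method.
import numpy as np

def resample(signal: np.ndarray, src_rate: int, preset_rate: int) -> np.ndarray:
    """Resample `signal` from src_rate to preset_rate by linear interpolation."""
    duration = len(signal) / src_rate
    n_out = int(round(duration * preset_rate))
    t_src = np.arange(len(signal)) / src_rate   # source sample instants
    t_out = np.arange(n_out) / preset_rate      # preset-rate sample instants
    return np.interp(t_out, t_src, signal)

# One second of a 440 Hz tone at 44.1 kHz, resampled to a preset 16 kHz.
x = np.sin(2 * np.pi * 440 * np.arange(44100) / 44100)
y = resample(x, 44100, 16000)
```

One second of input yields exactly 16 000 output samples at the preset rate, and the first sample is preserved since both time grids start at zero.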
3. The method according to claim 1, characterized in that the performing sentence segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound comprises:
performing sentence segmentation on the speech within a preset threshold range in the single-channel speech;
for any frame of speech within the preset threshold range in the single-channel speech, detecting whether it contains a preset type of sound by using a pre-established neural network model;
if the frame of speech contains the preset type of sound, retaining the frame of speech;
combining all speech frames containing the preset type of sound to obtain the speech segment data stream containing the preset type of sound.
4. The method according to claim 3, characterized in that the method further comprises:
if the frame of speech does not contain the preset type of sound, filtering out the frame of speech.
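The per-frame retain/filter decision of claims 3 and 4 can be illustrated with a toy scorer. The tiny random-weight feed-forward network below merely stands in for the pre-established neural network model; its layer sizes, weights, and 0.5 decision threshold are assumptions for illustration only.

```python
# Toy stand-in for the pre-established neural network model of claims
# 3-4: a two-layer feed-forward scorer with random (untrained) weights.
# Sizes, weights, and the 0.5 threshold are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)
W1 = rng.normal(size=(8, 400)) * 0.05   # input: one 400-sample frame
b1 = np.zeros(8)
W2 = rng.normal(size=8) * 0.05
b2 = 0.0

def frame_score(frame: np.ndarray) -> float:
    """Sigmoid score in (0, 1): likelihood the frame holds the preset sound."""
    h = np.tanh(W1 @ frame + b1)
    return 1.0 / (1.0 + np.exp(-(W2 @ h + b2)))

def keep_or_filter(frames, threshold=0.5):
    """Claim 3 retains frames at or above threshold; claim 4 filters the rest."""
    return [f for f in frames if frame_score(f) >= threshold]
```

With zero input, every layer output is zero, so the sigmoid returns exactly 0.5 and the frame sits right at the decision boundary; a trained model would be expected to push real frames well away from it.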
5. A speech enhancement apparatus, characterized in that the apparatus comprises:
a voice acquisition module, configured to call a voice capture device to acquire speech in the current environment;
a speech processing module, configured to process the speech according to a preset speech processing algorithm to obtain single-channel speech;
a speech segmentation module, configured to perform sentence segmentation on the single-channel speech to obtain a speech segment data stream containing a preset type of sound;
a speech enhancement module, configured to input the speech segment data stream into a preset speech enhancement network model to obtain enhanced speech corresponding to the speech segment data stream;
a speech synthesis module, configured to synthesize the enhanced speech into speech segments.
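The module structure of the apparatus claim above maps naturally onto a class whose callables play the roles of the five modules. The names and the trivial stand-in functions below are illustrative assumptions, not the claimed apparatus.

```python
# Illustrative mapping of the five claimed modules onto a class; the
# wiring and the trivial stand-in callables are assumptions.
class SpeechEnhancementDevice:
    def __init__(self, capture, process, segment, enhance, synthesize):
        self.acquire = capture        # voice acquisition module
        self.process = process        # speech processing module
        self.segment = segment        # speech segmentation module
        self.enhance = enhance        # speech enhancement module
        self.synthesize = synthesize  # speech synthesis module

    def run(self):
        voice = self.acquire()
        mono = self.process(voice)
        stream = self.segment(mono)
        enhanced = self.enhance(stream)
        return self.synthesize(enhanced)

# Wiring with trivial stand-ins to show the data flow between modules:
device = SpeechEnhancementDevice(
    capture=lambda: [0.1, 0.2, 0.3, 0.0],
    process=lambda v: v,                        # already single-channel
    segment=lambda v: [x for x in v if x > 0],  # keep "voiced" samples
    enhance=lambda s: [2 * x for x in s],
    synthesize=lambda s: s,
)
result = device.run()   # -> [0.2, 0.4, 0.6]
```

The zero sample is filtered out by the segmentation stand-in, and the remaining samples pass through the enhancement stand-in unchanged except for the gain.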
6. The apparatus according to claim 5, characterized in that the speech processing module is specifically configured to:
convert the speech through A/D conversion and sample it at a preset sample rate to obtain single-channel speech.
7. The apparatus according to claim 5, characterized in that the speech segmentation module is specifically configured to:
perform sentence segmentation on the speech within a preset threshold range in the single-channel speech;
for any frame of speech within the preset threshold range in the single-channel speech, detect whether it contains a preset type of sound by using a pre-established neural network model;
if the frame of speech contains the preset type of sound, retain the frame of speech;
combine all speech frames containing the preset type of sound to obtain the speech segment data stream containing the preset type of sound.
8. The apparatus according to claim 7, characterized in that the apparatus further comprises:
a speech filtering module, configured to filter out the frame of speech if the frame of speech does not contain the preset type of sound.
9. An electronic device, characterized by comprising a processor and a memory, the processor being configured to execute a speech enhancement program stored in the memory, so as to implement the speech enhancement method according to any one of claims 1 to 4.
10. A storage medium, characterized in that the storage medium stores one or more programs, and the one or more programs are executable by one or more processors, so as to implement the speech enhancement method according to any one of claims 1 to 4.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910663257.8A CN110534123B (en) | 2019-07-22 | 2019-07-22 | Voice enhancement method and device, storage medium and electronic equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110534123A true CN110534123A (en) | 2019-12-03 |
CN110534123B CN110534123B (en) | 2022-04-01 |
Family
ID=68660741
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910663257.8A Active CN110534123B (en) | 2019-07-22 | 2019-07-22 | Voice enhancement method and device, storage medium and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110534123B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312224A (en) * | 2020-02-20 | 2020-06-19 | 北京声智科技有限公司 | Training method and device of voice segmentation model and electronic equipment |
CN112309411A (en) * | 2020-11-24 | 2021-02-02 | 深圳信息职业技术学院 | Phase-sensitive gated multi-scale void convolutional network speech enhancement method and system |
CN112509593A (en) * | 2020-11-17 | 2021-03-16 | 北京清微智能科技有限公司 | Voice enhancement network model, single-channel voice enhancement method and system |
CN113571074A (en) * | 2021-08-09 | 2021-10-29 | 四川启睿克科技有限公司 | Voice enhancement method and device based on multi-band structure time domain audio separation network |
CN113870887A (en) * | 2021-09-26 | 2021-12-31 | 平安科技(深圳)有限公司 | Single-channel speech enhancement method and device, computer equipment and storage medium |
Citations (17)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US7392312B1 (en) * | 1998-09-11 | 2008-06-24 | Lv Partners, L.P. | Method for utilizing visual cue in conjunction with web access |
CN102124518A (en) * | 2008-08-05 | 2011-07-13 | 弗朗霍夫应用科学研究促进协会 | Apparatus and method for processing an audio signal for speech enhancement using a feature extraction |
CN103794221A (en) * | 2012-10-26 | 2014-05-14 | 索尼公司 | Signal processing device and method, and program |
US20160111107A1 (en) * | 2014-10-21 | 2016-04-21 | Mitsubishi Electric Research Laboratories, Inc. | Method for Enhancing Noisy Speech using Features from an Automatic Speech Recognition System |
CN106157953A (en) * | 2015-04-16 | 2016-11-23 | 科大讯飞股份有限公司 | continuous speech recognition method and system |
CN106898350A (en) * | 2017-01-16 | 2017-06-27 | 华南理工大学 | A kind of interaction of intelligent industrial robot voice and control method based on deep learning |
CN107845389A (en) * | 2017-12-21 | 2018-03-27 | 北京工业大学 | A kind of sound enhancement method based on multiresolution sense of hearing cepstrum coefficient and depth convolutional neural networks |
CN108172238A (en) * | 2018-01-06 | 2018-06-15 | 广州音书科技有限公司 | A kind of voice enhancement algorithm based on multiple convolutional neural networks in speech recognition system |
CN108564963A (en) * | 2018-04-23 | 2018-09-21 | 百度在线网络技术(北京)有限公司 | Method and apparatus for enhancing voice |
CN108877823A (en) * | 2018-07-27 | 2018-11-23 | 三星电子(中国)研发中心 | Sound enhancement method and device |
US20190043516A1 (en) * | 2018-06-22 | 2019-02-07 | Intel Corporation | Neural network for speech denoising trained with deep feature losses |
CN109326299A (en) * | 2018-11-14 | 2019-02-12 | 平安科技(深圳)有限公司 | Sound enhancement method, device and storage medium based on full convolutional neural networks |
US20190066713A1 (en) * | 2016-06-14 | 2019-02-28 | The Trustees Of Columbia University In The City Of New York | Systems and methods for speech separation and neural decoding of attentional selection in multi-speaker environments |
CN109410974A (en) * | 2018-10-23 | 2019-03-01 | 百度在线网络技术(北京)有限公司 | Sound enhancement method, device, equipment and storage medium |
CN109841226A (en) * | 2018-08-31 | 2019-06-04 | 大象声科(深圳)科技有限公司 | A kind of single channel real-time noise-reducing method based on convolution recurrent neural network |
CN110010144A (en) * | 2019-04-24 | 2019-07-12 | 厦门亿联网络技术股份有限公司 | Voice signals enhancement method and device |
CN110503940A (en) * | 2019-07-12 | 2019-11-26 | 中国科学院自动化研究所 | Sound enhancement method, device, storage medium, electronic equipment |
Non-Patent Citations (2)
Title |
---|
JEN-TZUNG CHIEN ET AL: "Convolutional Neural Turing Machine for Speech Separation", ISCSLP 2018 * |
SHI WENHUA ET AL: "Single-channel speech enhancement using a deep fully convolutional encoder-decoder network", Journal of Signal Processing (《信号处理》) * |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111312224A (en) * | 2020-02-20 | 2020-06-19 | 北京声智科技有限公司 | Training method and device of voice segmentation model and electronic equipment |
CN111312224B (en) * | 2020-02-20 | 2023-04-21 | 北京声智科技有限公司 | Training method and device of voice segmentation model and electronic equipment |
CN112509593A (en) * | 2020-11-17 | 2021-03-16 | 北京清微智能科技有限公司 | Voice enhancement network model, single-channel voice enhancement method and system |
CN112509593B (en) * | 2020-11-17 | 2024-03-08 | 北京清微智能科技有限公司 | Speech enhancement network model, single-channel speech enhancement method and system |
CN112309411A (en) * | 2020-11-24 | 2021-02-02 | 深圳信息职业技术学院 | Phase-sensitive gated multi-scale void convolutional network speech enhancement method and system |
CN112309411B (en) * | 2020-11-24 | 2024-06-11 | 深圳信息职业技术学院 | Phase-sensitive gating multi-scale cavity convolution network voice enhancement method and system |
CN113571074A (en) * | 2021-08-09 | 2021-10-29 | 四川启睿克科技有限公司 | Voice enhancement method and device based on multi-band structure time domain audio separation network |
CN113571074B (en) * | 2021-08-09 | 2023-07-25 | 四川启睿克科技有限公司 | Voice enhancement method and device based on multi-band structure time domain audio frequency separation network |
CN113870887A (en) * | 2021-09-26 | 2021-12-31 | 平安科技(深圳)有限公司 | Single-channel speech enhancement method and device, computer equipment and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN110534123B (en) | 2022-04-01 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110534123A (en) | Sound enhancement method, device, storage medium, electronic equipment | |
Macartney et al. | Improved speech enhancement with the wave-u-net | |
Hsieh et al. | Improving perceptual quality by phone-fortified perceptual loss using wasserstein distance for speech enhancement | |
Tan et al. | Real-time speech enhancement using an efficient convolutional recurrent network for dual-microphone mobile phones in close-talk scenarios | |
Al-Ali et al. | Enhanced forensic speaker verification using a combination of DWT and MFCC feature warping in the presence of noise and reverberation conditions | |
Mowlaee et al. | Harmonic phase estimation in single-channel speech enhancement using phase decomposition and SNR information | |
CN110503940A (en) | Sound enhancement method, device, storage medium, electronic equipment | |
Valentini-Botinhao et al. | Speech enhancement of noisy and reverberant speech for text-to-speech | |
CN113823308B (en) | Method for denoising voice by using single voice sample with noise | |
Siedenburg et al. | Persistent time-frequency shrinkage for audio denoising | |
Su et al. | Perceptually-motivated environment-specific speech enhancement | |
Wang et al. | Joint noise and mask aware training for DNN-based speech enhancement with sub-band features | |
KR102198598B1 (en) | Method for generating synthesized speech signal, neural vocoder, and training method thereof | |
US20240013775A1 (en) | Patched multi-condition training for robust speech recognition | |
CN117542373A (en) | Non-air conduction voice recovery system and method | |
CN116741144B (en) | Voice tone conversion method and system | |
Saeki et al. | SelfRemaster: Self-supervised speech restoration with analysis-by-synthesis approach using channel modeling | |
CN112233693B (en) | Sound quality evaluation method, device and equipment | |
Zheng et al. | Bandwidth extension WaveNet for bone-conducted speech enhancement | |
Nikitaras et al. | Fine-grained noise control for multispeaker speech synthesis | |
Barinov et al. | Channel compensation for forensic speaker identification using inverse processing | |
Schmidt et al. | Deep neural network based guided speech bandwidth extension | |
US20240079022A1 (en) | General speech enhancement method and apparatus using multi-source auxiliary information | |
Shahhoud et al. | PESQ enhancement for decoded speech audio signals using complex convolutional recurrent neural network | |
CN114678036B (en) | Speech enhancement method, electronic device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||