CN108806708A - Voice de-noising method based on Computational auditory scene analysis and generation confrontation network model - Google Patents
Voice de-noising method based on Computational auditory scene analysis and generation confrontation network model Download PDFInfo
- Publication number
- CN108806708A (application CN201810606145.4A)
- Authority
- CN
- China
- Prior art keywords
- voice
- arbiter
- scene analysis
- auditory scene
- confrontation network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0272—Voice signal separating
- G10L21/0308—Voice signal separating characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
Abstract
The present invention relates to a speech noise-reduction method based on computational auditory scene analysis and a generative adversarial network model, comprising: step 1, processing noisy speech with the generator and discriminator of a generative adversarial network to obtain an intermediate result; step 2, processing the intermediate result with a computational auditory scene analysis method to obtain the final result. The invention can remove part of the noise in speech signals acquired in complex-channel background environments while keeping the speech content largely undistorted.
Description
Technical field
The present invention relates to speech noise-reduction methods, and more particularly to a speech noise-reduction method based on computational auditory scene analysis and a generative adversarial network model.
Background technology
Speech is the most important means by which humans exchange information: a segment of speech carries rich information about the speaker's intent, identity, and emotional state. Speech signals can propagate through a variety of media such as air, water, and radio. During transmission, or because of limitations of the acquisition equipment, speech is usually corrupted by various kinds of noise. In certain professions in particular, ambient noise is unavoidable, and in many cases the noise is complex in type and high in intensity. Such noise severely degrades subsequent speech signal processing, for example by reducing the accuracy of speech recognition. Moreover, if such noisy speech data is processed manually, prolonged listening can damage the operator's hearing.
Invention content
The purpose of the present invention is to provide a speech noise-reduction method based on computational auditory scene analysis and a generative adversarial network model, so as to remove part of the noise in speech signals acquired in complex-channel background environments while keeping the speech content undistorted.
The present invention provides a speech noise-reduction method based on computational auditory scene analysis and a generative adversarial network model, comprising:
Step 1, processing noisy speech with the generator and discriminator of a generative adversarial network to obtain an intermediate result;
Step 2, processing the intermediate result with a computational auditory scene analysis method to obtain the final result.
Further, in step 1, the training process of the generative adversarial network comprises:
1) inputting noisy data and clean data into the discriminator, training the discriminator to judge them as different, and adjusting the discriminator's network parameters by back-propagation;
2) feeding the noisy data through the generator for noise reduction, then inputting the generator's output together with the noisy data into the discriminator, training the discriminator to judge them as identical, and adjusting the discriminator's network parameters by back-propagation;
3) fixing the discriminator parameters obtained in step 2), and adjusting the generator's network parameters by back-propagation, the goal being that the discriminator judges the generated pair as different.
Further, step 2 comprises:
taking the intermediate result as the input of computational auditory scene analysis, performing masking estimation on the input signal, and resynthesizing the intermediate result according to the estimate to obtain the noise-reduced speech data.
Compared with the prior art, the beneficial effect of the invention is that it can remove part of the noise in speech signals acquired in complex-channel background environments while keeping the speech content largely undistorted.
Description of the drawings
Fig. 1 is a flow chart of the speech noise-reduction method based on computational auditory scene analysis and a generative adversarial network model according to the present invention;
Fig. 2 shows the network structure of the generator;
Fig. 3 shows the training process of the generative adversarial network.
Specific implementation mode
The present invention is described in detail below with reference to the embodiments shown in the accompanying drawings. It should be noted, however, that these embodiments do not limit the invention; equivalent transformations or substitutions in function, method, or structure made by those of ordinary skill in the art according to these embodiments all fall within the protection scope of the present invention.
This embodiment provides a speech noise-reduction method based on computational auditory scene analysis (CASA) and a generative adversarial network (GAN) model, comprising:
Step 1, processing noisy speech with the generator and discriminator of a generative adversarial network to obtain an intermediate result;
Step 2, processing the intermediate result with a computational auditory scene analysis method to obtain the final result.
The speech noise-reduction method based on computational auditory scene analysis and a generative adversarial network model provided by this embodiment can remove part of the noise in speech signals acquired in complex-channel background environments while keeping the speech content largely undistorted.
In step 1, the training process of the generative adversarial network comprises:
1) inputting noisy data and clean data into the discriminator, training the discriminator to judge them as different, and adjusting the discriminator's network parameters by back-propagation;
2) feeding the noisy data through the generator for noise reduction, then inputting the generator's output together with the noisy data into the discriminator, training the discriminator to judge them as identical, and adjusting the discriminator's network parameters by back-propagation;
3) fixing the discriminator parameters obtained in step 2), and adjusting the generator's network parameters by back-propagation, the goal being that the discriminator judges the generated pair as different.
In the present embodiment, step 2 comprises:
taking the intermediate result as the input of computational auditory scene analysis, performing masking estimation on the input signal, and resynthesizing the intermediate result according to the estimate to obtain the noise-reduced speech data.
The invention is described in further detail below.
This embodiment performs speech noise reduction with a method based on computational auditory scene analysis and a generative adversarial network. CASA computes with the ideal binary mask (IBM) or the ideal ratio mask (IRM) as its target, converting the speech noise-reduction problem into a parameter-estimation and binary-classification problem. A GAN consists of a generator and a discriminator; its training process simulates a two-player zero-sum game in which the parameters of the generator and the discriminator are optimized in turn, the training objective being to learn an effective mapping between real data and training data. As shown in Fig. 1, y(n) is a noisy speech signal of length n with sampling rate f_s; after GAN processing an intermediate result is output, and after further CASA processing the final result x(n) is obtained. In this embodiment, the sampling rate f_s of all speech data is fixed at 16 kHz. The GAN-based and CASA-based noise-reduction stages are described in detail below.
1. Noise reduction based on GAN
The essence of a GAN is a zero-sum game between the generator and the discriminator. By adjusting the parameters throughout this continuous game, the network gradually learns the mapping relationship between specific data sets and, after training, can apply this mapping to entirely new data. For the speech noise-reduction problem in complex-channel background environments, what the GAN needs to learn is the mapping between y(n) and the noise-reduced output.
This method adopts the GAN architecture proposed by Pascual et al., in which the generator G serves as the final noise-reduction network, i.e. it performs the mapping from y(n) to the intermediate result; the discriminator D is only used for adversarial training of G during the training stage and is removed entirely at test time.
The network structure of G is shown in Fig. 2. It resembles an auto-encoder, consisting of two parts: the encoder in the lower half of Fig. 2 and the decoder in the upper half. This composition gives the network an end-to-end character: the input and output of the network are speech signals of the same length, which avoids a complicated feature-extraction procedure. The encoder and decoder have the same network structure, but with the layers arranged in reverse order, so the two are symmetric. The layers are fully convolutional, so the network contains no dense layers; the network attends to the temporal correlations of the input signal across all levels, and the number of trainable parameters is reduced.
G consists of 22 one-dimensional strided convolutional layers, each with filter width 31 and stride 2. The number of filters increases layer by layer while the width decreases. The output dimension of each encoder layer is (number of samples) x (feature maps), namely 16384 x 1, 8192 x 16, 4096 x 32, 2048 x 32, 1024 x 64, 512 x 64, 256 x 128, 128 x 128, 64 x 256, 32 x 256, 16 x 512, and 8 x 1024. The decoder network has the same filter widths and filter counts as the encoder. In addition to the connections between successive layers, G also connects each encoder layer to its corresponding decoder layer, bypassing the compression in the middle of the model; these are skip connections. In this way the low-level details are preserved, so the speech waveform can be reconstructed more accurately. The skip connections pass finely processed speech information directly to the decoding stage, and they also alleviate the vanishing-gradient problem to some extent, allowing gradients to be propagated deeper into the network during back-propagation.
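The encoder shape progression described above can be sketched numerically. The following is an illustrative reconstruction only (the channel list and stride are read off the dimensions quoted in this description; it is not the patent's actual code):

```python
# Encoder of G: 11 one-dimensional convolutions, filter width 31, stride 2.
# Each stride-2 layer halves the time axis while the channel count grows;
# the channel list below is taken from the dimensions quoted above.
channels = [1, 16, 32, 32, 64, 64, 128, 128, 256, 256, 512, 1024]
length = 16384
shapes = [(length, channels[0])]
for ch in channels[1:]:
    length //= 2              # effect of the stride-2 convolution
    shapes.append((length, ch))
print(shapes[0], shapes[-1])  # → (16384, 1) (8, 1024)
```

The final shape (8, 1024) matches the last encoder dimension listed above; the decoder simply traverses this progression in reverse.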
The network structure of the discriminator D is similar to the encoder part of G: it is a one-dimensional convolutional structure with the same topology as a convolutional classification network. The differences are: 1) the input of D has two channels, each of 16384 sampled points; 2) virtual batch normalization is applied before the LeakyReLU activation functions, with alpha = 0.3; 3) after the last activation layer there is a one-dimensional convolutional layer with filter width 1, which does not further down-sample the hidden activations. In this way the parameter count of the fully connected layer is reduced from 8 x 1024 = 8192 to 8, and the way the 1024 channels are merged can be learned through the convolution parameters.
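The width-1 convolution in point 3) can be illustrated as follows. This is a minimal numpy sketch with random features; the variable names and initialization are assumptions for illustration, not taken from the patent:

```python
import numpy as np

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 1024))  # final hidden map: 8 time steps x 1024 channels
w = 0.01 * rng.standard_normal(1024)       # one learnable width-1 filter weight per channel
merged = features @ w                      # shape (8,): the 1024 channels collapse to one
out = np.where(merged > 0, merged, 0.3 * merged)  # LeakyReLU with alpha = 0.3
print(merged.shape)                        # → (8,)
```

Only 1024 convolution weights are needed to merge the channels, and no down-sampling of the 8 time steps occurs, consistent with the description above.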
The training process of the network is shown in Fig. 3, where y denotes the noisy training data, paired with the corresponding noise-free data and with the data processed by G. The training speech data used by this method is the trainset portion of the database published by Valentini et al., containing 11572 clean utterances from 28 speakers. This method simulates speech data under complex-channel background conditions by adding Gaussian noise to the above clean speech; to imitate the complexity of real noise, noise is added at different signal-to-noise ratios, as detailed in Table 1. As the table shows, the proportions of data with relatively high SNR (40 dB) and relatively low SNR (20 dB) are small, while data at 30 dB SNR has the highest proportion; this design better simulates the noise conditions of real scenes.
Table 1. Noise levels added to the training data
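Adding Gaussian noise at a prescribed signal-to-noise ratio, as described above, amounts to scaling the noise so that the power ratio matches the target. A minimal sketch (the function name and tone signal are illustrative assumptions):

```python
import numpy as np

def add_noise_at_snr(clean, snr_db, rng=None):
    """Add white Gaussian noise to `clean` at the requested SNR in dB."""
    rng = rng or np.random.default_rng()
    noise = rng.standard_normal(clean.shape)
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    # scale the noise so that p_clean / p(scaled noise) = 10^(snr_db/10)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(1)
x = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s tone at f_s = 16 kHz
y = add_noise_at_snr(x, 30, rng)
snr = 10 * np.log10(np.mean(x ** 2) / np.mean((y - x) ** 2))
print(round(snr, 1))  # → 30.0
```

Repeating this at 20, 30, and 40 dB in the proportions of Table 1 would reproduce the mixing scheme described above.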
The training data is divided into batches of 100; for each batch, the learning rate is set to 0.0002, and the training process comprises the following three steps:
1) input the noisy data y and the clean data into D, train D to judge them as different (label 1), and adjust the network parameters of D by back-propagation;
2) feed the noisy data y through G for noise reduction to obtain its output, then input it together with y into D, train D to judge them as identical (label 0), and adjust the network parameters of D by back-propagation;
3) fix the network parameters of D obtained in the previous step and adjust the network parameters of G by back-propagation, the goal being that D judges the pair as different (label 1).
One pass over all training samples constitutes an epoch. Training is terminated after 86 epochs, after which the network parameters of G are fixed and G is used as the final noise-reduction network.
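The three-step update above can be sketched with a toy scalar GAN. This is an illustrative reconstruction under stated assumptions, not the patent's SEGAN code: D is a logistic model on a (candidate-clean, noisy) pair, G is a single gain parameter, and the labels follow the convention above (real pair → 1 "different", generated pair → 0 "same", G trained so that D outputs 1):

```python
import numpy as np

rng = np.random.default_rng(0)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-np.clip(z, -30, 30)))
w = np.zeros(3)    # D parameters: weights for [candidate, noisy] plus a bias
theta = 0.1        # G parameter: the clean-speech estimate is theta * y
lr = 0.05

def d_step(x, y, label):
    """One back-propagation update of D on the pair (x, y) toward `label`."""
    global w
    p = sigmoid(w[0] * x + w[1] * y + w[2])
    g = label - p  # gradient of the log-likelihood w.r.t. the logit
    w = w + lr * np.array([np.mean(g * x), np.mean(g * y), np.mean(g)])

for step in range(200):
    c = rng.standard_normal(64)               # clean batch
    y = c + 0.5 * rng.standard_normal(64)     # noisy batch
    d_step(c, y, 1.0)                         # 1) real pair, label 1
    d_step(theta * y, y, 0.0)                 # 2) generated pair, label 0
    # 3) freeze D; move theta so that D outputs 1 on (theta * y, y)
    p = sigmoid(w[0] * theta * y + w[1] * y + w[2])
    theta += lr * np.mean((1.0 - p) * w[0] * y)  # chain rule through D's input

print(np.isfinite(theta))  # → True
```

The real network replaces the scalar gain with the convolutional G of Fig. 2 and the logistic model with the convolutional D, but the alternation of the three updates is the same.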
2. Noise reduction based on CASA
After y(n) has been processed by G, the resulting signal serves as the input to CASA. As shown in Fig. 1, masking estimation is first performed on the input signal, which is then resynthesized according to the estimate, finally yielding the noise-reduced speech data x(n).
Assume the input is composed of clean speech s(n) and a noise signal l(n), i.e. it equals s(n) + l(n) (1). Its time-frequency representation Y ∈ R^{m×n} can then be decomposed into a sparse speech term S and a low-rank noise term L:
Y = S + L (2)
The above formula can be solved by the method of robust principal component analysis (RPCA):
min_{S,L} ||L||_* + λ||S||_1 subject to Y = S + L (3)
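The decomposition in formula (3) can be sketched with a standard inexact augmented-Lagrangian solver: singular value thresholding for the low-rank term and soft thresholding for the sparse term. This is a generic RPCA sketch under assumed parameter choices (the weight lam and penalty mu are common defaults), not the exact algorithm of the patent:

```python
import numpy as np

def rpca(Y, n_iter=100):
    """Split Y into a low-rank part L and a sparse part S (inexact ALM sketch)."""
    m, n = Y.shape
    lam = 1.0 / np.sqrt(max(m, n))          # standard sparsity weight
    mu = 0.25 * m * n / np.abs(Y).sum()     # assumed penalty parameter
    S = np.zeros_like(Y)
    Z = np.zeros_like(Y)                    # dual variable for Y = L + S
    for _ in range(n_iter):
        # L-update: singular value thresholding of Y - S + Z/mu
        U, sig, Vt = np.linalg.svd(Y - S + Z / mu, full_matrices=False)
        L = U @ (np.maximum(sig - 1.0 / mu, 0.0)[:, None] * Vt)
        # S-update: entrywise soft thresholding
        R = Y - L + Z / mu
        S = np.sign(R) * np.maximum(np.abs(R) - lam / mu, 0.0)
        Z = Z + mu * (Y - L - S)            # dual ascent on the constraint
    return L, S

rng = np.random.default_rng(0)
low_rank = np.outer(rng.standard_normal(20), rng.standard_normal(30))  # rank-1 "noise"
sparse = np.where(rng.random((20, 30)) < 0.05, 5.0, 0.0)               # sparse "speech"
L, S = rpca(low_rank + sparse)
print(L.shape, S.shape)  # → (20, 30) (20, 30)
```

In the speech setting, Y is the gammatone-domain spectrogram described below rather than a random test matrix.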
In view of the physical meaning of a spectrogram, the two components after decomposition should be non-negative, so:
min_{S,L} ||L||_* + λ||S||_1 subject to Y = S + L, S ≥ 0, L ≥ 0 (4)
The above model's conditions are too harsh, however, so a dense error term is introduced:
Y = S + L + E (5)
By introducing the auxiliary variables L+ and S+, formula (4) can be rewritten in an equality-constrained form (6), whose augmented Lagrangian is L_ρ (7), where Ω_Y, Ω_S and Ω_L are binary masking variables and ρ is a scale parameter.
The objective function in formula (7) is separable, so the ADMM algorithm can be applied: all variables in formula (7) can be updated alternately under the ADMM framework by solving the corresponding subproblems. Under the constraints of the two auxiliary variables and the three binary variables, L_ρ can be minimized by gradient descent.
After a gammatone filter-bank transformation, the input signal can thus be decomposed as above into three parts: a sparse term, a low-rank term, and a dense term. The IBM and IRM masking estimates can then be obtained as:
IBM(t, f) = 1 if S(t, f) > L(t, f), otherwise 0; IRM(t, f) = S(t, f) / (S(t, f) + L(t, f)) (8)
In this way the noisy speech signal can be resynthesized by mask-weighted summation on the spectrogram, realizing the separation of speech and noise and thereby achieving the goal of noise reduction.
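The masking estimates of formula (8) and the mask-weighted resynthesis reduce to elementwise operations on the time-frequency grid. A minimal numpy sketch with synthetic magnitudes (the shapes, names, and magnitude distributions are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
S = np.abs(rng.standard_normal((64, 100)))        # speech magnitudes (freq x time)
L = 0.3 * np.abs(rng.standard_normal((64, 100)))  # noise magnitudes
Y = S + L                                         # mixture spectrogram, as in (2)
ibm = (S > L).astype(float)                       # IBM: keep units where speech dominates
irm = S / (S + L + 1e-12)                         # IRM: soft ratio mask in [0, 1]
x_ibm = ibm * Y                                   # binary-masked resynthesis input
x_irm = irm * Y                                   # ratio-masked resynthesis input
print(ibm.min(), ibm.max())  # → 0.0 1.0
```

The masked spectrogram is then converted back to the time domain (in this method, through the gammatone filter bank) to obtain the noise-reduced waveform x(n).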
The present invention is a noise-reduction method for speech data in complex-channel background environments: it can reduce the noise of speech data acquired under complex noise conditions while keeping the speech content largely undistorted. The method can serve as a preprocessing stage for techniques such as manual speech transcription in interception environments, automatic speech recognition, voiceprint recognition, spoken keyword detection, and speech emotion analysis, reducing noise interference and improving recognition or detection accuracy. It can be applied in military fields such as information acquisition and analysis, as well as in civil fields such as big-data analysis.
The detailed descriptions listed above are merely specific illustrations of feasible embodiments of the invention and are not intended to limit its scope of protection; any equivalent implementations or modifications that do not depart from the technical spirit of the invention shall be included within its protection scope.
It is obvious to those skilled in the art that the invention is not limited to the details of the above exemplary embodiments and can be realized in other specific forms without departing from its spirit or essential attributes. Therefore, from whatever point of view, the embodiments should be regarded as exemplary and non-restrictive. The scope of the invention is defined by the appended claims rather than by the above description, and all changes falling within the meaning and range of equivalency of the claims are intended to be embraced therein.
Claims (3)
1. A speech noise-reduction method based on computational auditory scene analysis and a generative adversarial network model, characterized by comprising:
Step 1, processing noisy speech with the generator and discriminator of a generative adversarial network to obtain an intermediate result;
Step 2, processing the intermediate result with a computational auditory scene analysis method to obtain the final result.
2. The speech noise-reduction method based on computational auditory scene analysis and a generative adversarial network model according to claim 1, characterized in that, in step 1, the training process of the generative adversarial network comprises:
1) inputting noisy data and clean data into the discriminator, training the discriminator to judge them as different, and adjusting the discriminator's network parameters by back-propagation;
2) feeding the noisy data through the generator for noise reduction, then inputting the output together with the noisy data into the discriminator, training the discriminator to judge them as identical, and adjusting the discriminator's network parameters by back-propagation;
3) fixing the discriminator parameters obtained in step 2), and adjusting the generator's network parameters by back-propagation, the goal being that the discriminator judges the generated pair as different.
3. The speech noise-reduction method based on computational auditory scene analysis and a generative adversarial network model according to claim 2, characterized in that step 2 comprises:
taking the intermediate result as the input of computational auditory scene analysis, performing masking estimation on the input signal, and resynthesizing the intermediate result according to the estimate to obtain the noise-reduced speech data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810606145.4A CN108806708A (en) | 2018-06-13 | 2018-06-13 | Voice de-noising method based on Computational auditory scene analysis and generation confrontation network model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810606145.4A CN108806708A (en) | 2018-06-13 | 2018-06-13 | Voice de-noising method based on Computational auditory scene analysis and generation confrontation network model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108806708A true CN108806708A (en) | 2018-11-13 |
Family
ID=64085675
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810606145.4A Pending CN108806708A (en) | 2018-06-13 | 2018-06-13 | Voice de-noising method based on Computational auditory scene analysis and generation confrontation network model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108806708A (en) |
Cited By (16)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110310650A (en) * | 2019-04-08 | 2019-10-08 | 清华大学 | A kind of voice enhancement algorithm based on second-order differential microphone array |
CN110363751A (en) * | 2019-07-01 | 2019-10-22 | 浙江大学 | A kind of big enteroscope polyp detection method based on generation collaborative network |
CN110503976A (en) * | 2019-08-15 | 2019-11-26 | 广州华多网络科技有限公司 | Audio separation method, device, electronic equipment and storage medium |
CN110751960A (en) * | 2019-10-16 | 2020-02-04 | 北京网众共创科技有限公司 | Method and device for determining noise data |
CN110751958A (en) * | 2019-09-25 | 2020-02-04 | 电子科技大学 | Noise reduction method based on RCED network |
CN111383651A (en) * | 2018-12-29 | 2020-07-07 | Tcl集团股份有限公司 | Voice noise reduction method and device and terminal equipment |
CN111583954A (en) * | 2020-05-12 | 2020-08-25 | 中国人民解放军国防科技大学 | Speaker independent single-channel voice separation method |
CN111933187A (en) * | 2020-09-21 | 2020-11-13 | 深圳追一科技有限公司 | Emotion recognition model training method and device, computer equipment and storage medium |
CN112133293A (en) * | 2019-11-04 | 2020-12-25 | 重庆邮电大学 | Phrase voice sample compensation method based on generation countermeasure network and storage medium |
CN112259068A (en) * | 2020-10-21 | 2021-01-22 | 上海协格空调工程有限公司 | Active noise reduction air conditioning system and noise reduction control method thereof |
CN112466320A (en) * | 2020-12-12 | 2021-03-09 | 中国人民解放军战略支援部队信息工程大学 | Underwater acoustic signal noise reduction method based on generation countermeasure network |
CN112487914A (en) * | 2020-11-25 | 2021-03-12 | 山东省人工智能研究院 | ECG noise reduction method based on deep convolution generation countermeasure network |
CN113096673A (en) * | 2021-03-30 | 2021-07-09 | 山东省计算中心(国家超级计算济南中心) | Voice processing method and system based on generation countermeasure network |
CN113160844A (en) * | 2021-04-27 | 2021-07-23 | 山东省计算中心(国家超级计算济南中心) | Speech enhancement method and system based on noise background classification |
CN113409377A (en) * | 2021-06-23 | 2021-09-17 | 四川大学 | Phase unwrapping method for generating countermeasure network based on jump connection |
CN115392325A (en) * | 2022-10-26 | 2022-11-25 | 中国人民解放军国防科技大学 | Multi-feature noise reduction modulation identification method based on cycleGan |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120010881A1 (en) * | 2010-07-12 | 2012-01-12 | Carlos Avendano | Monaural Noise Suppression Based on Computational Auditory Scene Analysis |
CN102890930A (en) * | 2011-07-19 | 2013-01-23 | 上海上大海润信息系统有限公司 | Speech emotion recognizing method based on hidden Markov model (HMM) / self-organizing feature map neural network (SOFMNN) hybrid model |
CN104064196A (en) * | 2014-06-20 | 2014-09-24 | 哈尔滨工业大学深圳研究生院 | Method for improving speech recognition accuracy on basis of voice leading end noise elimination |
CN104538043A (en) * | 2015-01-16 | 2015-04-22 | 北京邮电大学 | Real-time emotion reminder for call |
US9215527B1 (en) * | 2009-12-14 | 2015-12-15 | Cirrus Logic, Inc. | Multi-band integrated speech separating microphone array processor with adaptive beamforming |
CN107452405A (en) * | 2017-08-16 | 2017-12-08 | 北京易真学思教育科技有限公司 | A kind of method and device that data evaluation is carried out according to voice content |
CN107452389A (en) * | 2017-07-20 | 2017-12-08 | 大象声科(深圳)科技有限公司 | A kind of general monophonic real-time noise-reducing method |
CN107563428A (en) * | 2017-08-25 | 2018-01-09 | 西安电子科技大学 | Classification of Polarimetric SAR Image method based on generation confrontation network |
CN107845389A (en) * | 2017-12-21 | 2018-03-27 | 北京工业大学 | A kind of sound enhancement method based on multiresolution sense of hearing cepstrum coefficient and depth convolutional neural networks |
CN107945811A (en) * | 2017-10-23 | 2018-04-20 | 北京大学 | A kind of production towards bandspreading resists network training method and audio coding, coding/decoding method |
- 2018-06-13: CN201810606145.4A patent application filed; published as CN108806708A (en); status: active, Pending
Patent Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9215527B1 (en) * | 2009-12-14 | 2015-12-15 | Cirrus Logic, Inc. | Multi-band integrated speech separating microphone array processor with adaptive beamforming |
US20120010881A1 (en) * | 2010-07-12 | 2012-01-12 | Carlos Avendano | Monaural Noise Suppression Based on Computational Auditory Scene Analysis |
CN102890930A (en) * | 2011-07-19 | 2013-01-23 | 上海上大海润信息系统有限公司 | Speech emotion recognizing method based on hidden Markov model (HMM) / self-organizing feature map neural network (SOFMNN) hybrid model |
CN104064196A (en) * | 2014-06-20 | 2014-09-24 | 哈尔滨工业大学深圳研究生院 | Method for improving speech recognition accuracy on basis of voice leading end noise elimination |
CN104538043A (en) * | 2015-01-16 | 2015-04-22 | 北京邮电大学 | Real-time emotion reminder for call |
CN107452389A (en) * | 2017-07-20 | 2017-12-08 | 大象声科(深圳)科技有限公司 | A kind of general monophonic real-time noise-reducing method |
CN107452405A (en) * | 2017-08-16 | 2017-12-08 | 北京易真学思教育科技有限公司 | A kind of method and device that data evaluation is carried out according to voice content |
CN107563428A (en) * | 2017-08-25 | 2018-01-09 | 西安电子科技大学 | Classification of Polarimetric SAR Image method based on generation confrontation network |
CN107945811A (en) * | 2017-10-23 | 2018-04-20 | 北京大学 | A kind of production towards bandspreading resists network training method and audio coding, coding/decoding method |
CN107845389A (en) * | 2017-12-21 | 2018-03-27 | 北京工业大学 | A kind of sound enhancement method based on multiresolution sense of hearing cepstrum coefficient and depth convolutional neural networks |
Non-Patent Citations (4)
Title |
---|
PASCUAL S. et al.: "SEGAN: speech enhancement generative adversarial network", arXiv preprint, 30 June 2017, pages 1-5 *
CHEN Long et al.: "Speech noise-reduction method for radio interception", 《电声技术》 (Audio Engineering), vol. 42, no. 4, 30 April 2018, pages 25-30 *
Cited By (26)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111383651A (en) * | 2018-12-29 | 2020-07-07 | Tcl集团股份有限公司 | Voice noise reduction method and device and terminal equipment |
CN110310650A (en) * | 2019-04-08 | 2019-10-08 | 清华大学 | A kind of voice enhancement algorithm based on second-order differential microphone array |
CN110363751B (en) * | 2019-07-01 | 2021-08-03 | 浙江大学 | Large intestine endoscope polyp detection method based on generation cooperative network |
CN110363751A (en) * | 2019-07-01 | 2019-10-22 | 浙江大学 | A kind of big enteroscope polyp detection method based on generation collaborative network |
CN110503976A (en) * | 2019-08-15 | 2019-11-26 | 广州华多网络科技有限公司 | Audio separation method, device, electronic equipment and storage medium |
CN110503976B (en) * | 2019-08-15 | 2021-11-23 | 广州方硅信息技术有限公司 | Audio separation method and device, electronic equipment and storage medium |
CN110751958A (en) * | 2019-09-25 | 2020-02-04 | 电子科技大学 | Noise reduction method based on RCED network |
CN110751960A (en) * | 2019-10-16 | 2020-02-04 | 北京网众共创科技有限公司 | Method and device for determining noise data |
CN110751960B (en) * | 2019-10-16 | 2022-04-26 | 北京网众共创科技有限公司 | Method and device for determining noise data |
CN112133293A (en) * | 2019-11-04 | 2020-12-25 | 重庆邮电大学 | Short-speech sample compensation method based on a generative adversarial network, and storage medium |
CN111583954A (en) * | 2020-05-12 | 2020-08-25 | 中国人民解放军国防科技大学 | Speaker-independent single-channel speech separation method |
CN111933187B (en) * | 2020-09-21 | 2021-02-05 | 深圳追一科技有限公司 | Emotion recognition model training method and device, computer equipment and storage medium |
CN111933187A (en) * | 2020-09-21 | 2020-11-13 | 深圳追一科技有限公司 | Emotion recognition model training method and device, computer equipment and storage medium |
CN112259068B (en) * | 2020-10-21 | 2023-04-11 | 上海协格空调工程有限公司 | Active noise reduction air conditioning system and noise reduction control method thereof |
CN112259068A (en) * | 2020-10-21 | 2021-01-22 | 上海协格空调工程有限公司 | Active noise reduction air conditioning system and noise reduction control method thereof |
CN112487914A (en) * | 2020-11-25 | 2021-03-12 | 山东省人工智能研究院 | ECG noise reduction method based on a deep convolutional generative adversarial network |
CN112487914B (en) * | 2020-11-25 | 2021-08-31 | 山东省人工智能研究院 | ECG noise reduction method based on a deep convolutional generative adversarial network |
CN112466320A (en) * | 2020-12-12 | 2021-03-09 | 中国人民解放军战略支援部队信息工程大学 | Underwater acoustic signal noise reduction method based on a generative adversarial network |
CN112466320B (en) * | 2020-12-12 | 2023-11-10 | 中国人民解放军战略支援部队信息工程大学 | Underwater acoustic signal noise reduction method based on a generative adversarial network |
CN113096673A (en) * | 2021-03-30 | 2021-07-09 | 山东省计算中心(国家超级计算济南中心) | Speech processing method and system based on a generative adversarial network |
CN113096673B (en) * | 2021-03-30 | 2022-09-30 | 山东省计算中心(国家超级计算济南中心) | Speech processing method and system based on a generative adversarial network |
CN113160844A (en) * | 2021-04-27 | 2021-07-23 | 山东省计算中心(国家超级计算济南中心) | Speech enhancement method and system based on noise background classification |
CN113409377A (en) * | 2021-06-23 | 2021-09-17 | 四川大学 | Phase unwrapping method based on a generative adversarial network with skip connections |
CN113409377B (en) * | 2021-06-23 | 2022-09-27 | 四川大学 | Phase unwrapping method based on a generative adversarial network with skip connections |
CN115392325A (en) * | 2022-10-26 | 2022-11-25 | 中国人民解放军国防科技大学 | Multi-feature noise-reduction modulation recognition method based on CycleGAN |
CN115392325B (en) * | 2022-10-26 | 2023-08-18 | 中国人民解放军国防科技大学 | Multi-feature noise-reduction modulation recognition method based on CycleGAN |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN108806708A (en) | Voice de-noising method based on Computational auditory scene analysis and generation confrontation network model | |
CN110619885B (en) | Speech enhancement method using a generative adversarial network based on a deep fully convolutional neural network | |
Braun et al. | A curriculum learning method for improved noise robustness in automatic speech recognition | |
CN109036465B (en) | Speech emotion recognition method | |
DE112015004785B4 (en) | Method for converting a noisy signal into an enhanced audio signal | |
CN109524020B (en) | Speech enhancement processing method | |
CN108172238A (en) | Speech enhancement algorithm based on multiple convolutional neural networks in a speech recognition system | |
CN105047194B (en) | Self-learning spectrogram feature extraction method for speech emotion recognition | |
CN107845389A (en) | Speech enhancement method based on multi-resolution auditory cepstral coefficients and deep convolutional neural networks | |
Shah et al. | Time-frequency mask-based speech enhancement using convolutional generative adversarial network | |
CN106653056A (en) | Fundamental frequency extraction model based on LSTM recurrent neural network and training method thereof | |
CN109559736A (en) | Automatic dubbing method for film actors based on an adversarial network | |
CN112802491B (en) | Speech enhancement method based on a time-frequency domain generative adversarial network | |
Hui et al. | Convolutional maxout neural networks for speech separation | |
CN109890043A (en) | Wireless signal noise reduction method based on a generative adversarial network | |
CN107967920A (en) | Improved autoencoder neural network speech enhancement algorithm | |
CN111429947A (en) | Speech emotion recognition method based on multi-stage residual convolutional neural network | |
CN114428234A (en) | Radar high-resolution range profile noise reduction identification method based on GAN and self-attention | |
CN110102051A (en) | Method and device for detecting game cheating plug-ins | |
CN107516065A (en) | Complex signal denoising method combining empirical mode decomposition and dictionary learning | |
CN106204482A (en) | Mixed noise removal method based on weighted sparsity | |
CN114863938B (en) | Bird call recognition method and system based on attention residuals and feature fusion | |
Zöhrer et al. | Representation learning for single-channel source separation and bandwidth extension | |
CN114283829A (en) | Speech enhancement method based on a dynamic gated convolutional recurrent network | |
Nair et al. | MFCC-based noise reduction in ASR using Kalman filtering | |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20181113 |
|