CN109326302A - Speech enhancement method based on voiceprint comparison and a generative adversarial network - Google Patents

Speech enhancement method based on voiceprint comparison and a generative adversarial network

Info

Publication number
CN109326302A
CN109326302A (application CN201811353760.5A)
Authority
CN
China
Prior art keywords
sound
generator
audio
discriminator
clean
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811353760.5A
Other languages
Chinese (zh)
Other versions
CN109326302B (en)
Inventor
钟艳如
张家豪
赵帅杰
李芳
蓝如师
罗笑南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guilin University of Electronic Technology
Original Assignee
Guilin University of Electronic Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guilin University of Electronic Technology
Priority to CN201811353760.5A
Publication of CN109326302A
Application granted
Publication of CN109326302B
Legal status: Active
Anticipated expiration

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 - Noise filtering
    • G10L21/0264 - Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
    • G10L17/00 - Speaker identification or verification

Abstract

The present invention discloses a speech enhancement method based on voiceprint comparison and a generative adversarial network: 1) establish three speech databases, corresponding respectively to the voiceprint recognition encoder, the noise separation system and the speech separation system; 2) train the voiceprint recognition encoder to extract the voiceprint features of the target speaker, obtaining the target voiceprint feature; 3) convert the noisy audio into a spectrogram and feed it into the generator of the noise separation system, obtaining the predicted clean audio; 4) feed the predicted clean audio and the true clean audio into the discriminator of the noise separation system for training; 5) adjust the discriminator's weight parameters so that it better distinguishes true clean audio from predicted clean audio, yielding a generator that produces near-real clean audio; 6) feed the speaker's voice into the trained generator to generate the predicted clean spectrogram and obtain the enhanced speech signal. The method has a small model size and low computational cost, is easy to port, preserves a degree of spatial invariance, and denoises well.

Description

Speech enhancement method based on voiceprint comparison and a generative adversarial network
Technical field
The present invention relates to the field of speech enhancement, and in particular to a speech enhancement method based on voiceprint comparison and a generative adversarial network.
Background technique
With the development of society and the ubiquity of electronic products, people's requirements on speech quality keep rising, and improving the communication quality of electronic products in noisy environments has become one of the most active research directions. Speech enhancement improves the quality and intelligibility of speech in noisy environments; it has important applications not only in hearing aids and cochlear implants, but has also been successfully applied as the preprocessing stage of speech recognition and speaker recognition systems.
Classical speech enhancement methods include spectral subtraction, Wiener filtering, statistical-model-based methods and subspace algorithms. Since the 1980s, neural networks have also been applied to speech enhancement, and in recent years denoising autoencoders have been widely adopted: for example, recurrent denoising autoencoders perform well at exploiting the contextual information of audio signals, and long short-term memory networks have recently been applied to the denoising task. Although these methods achieve good results, they demand large amounts of data and computation and are difficult to port to embedded devices. Moreover, they tend to depend on the training set: the clean audio they output is essentially an average over the training set's clean audio, so it sounds blurred and its handling of detail is unsatisfactory.
Summary of the invention
The object of the present invention is to overcome the deficiencies of the prior art and to provide a speech enhancement method based on voiceprint comparison and a generative adversarial network; the method has a small model size and low computational cost, is easy to port, preserves a degree of spatial invariance, and denoises well.
The technical solution that realizes the object of the invention is as follows:
A speech enhancement method based on voiceprint comparison and a generative adversarial network comprises the following steps:
1) establish three speech databases, corresponding respectively to the voiceprint recognition encoder, the noise separation system and the speech separation system;
2) train the voiceprint recognition encoder to extract the voiceprint features of the target speaker, obtaining the target voiceprint feature;
3) convert the noisy audio into a spectrogram and feed it into the generator of the noise separation system; according to the target voiceprint feature extracted by the voiceprint recognition encoder, the generator separates out the target speaker's voice, obtaining the predicted clean audio;
4) feed the predicted clean audio obtained in step 3) and the true clean audio from the speech separation system of step 1) into the discriminator of the noise separation system for training, so that the discriminator can judge whether the predicted spectrogram generated by the noise separation system from the speaker's voice matches the distribution of real audio;
5) adjust the discriminator's weight parameters so that it better distinguishes the true clean audio from the predicted clean audio generated by the generator, and update the generator's weight parameters according to the discriminator's decisions, until the discriminator can no longer distinguish the generator's predicted audio from the true clean audio, yielding a generator that can produce near-real clean audio;
6) convert the speaker's voice, collected by a microphone, into a spectrogram via the short-time Fourier transform and feed it into the trained generator to generate the predicted clean spectrogram; then convert it into an analog speech signal via the inverse short-time Fourier transform, and play the analog speech signal back through a loudspeaker to obtain the enhanced speech signal.
The voiceprint recognition encoder is the voiceprint recognition encoder of the 2000 NIST Speaker Recognition Evaluation corpus; the noise separation system is the noise separation system of the 100-nonspeech noise corpus; the speech separation system is the speech separation system of the TIMIT corpus.
In step 2), the voiceprint recognition encoder extracts the voiceprint features of the target speaker as follows: the audio signal is converted into frames of width 25 ms with a step of 10 ms, and each frame is passed through a Mel filterbank; an energy spectrum of size 40 is extracted from the result as the network input. Fixed-length sliding windows are constructed over these frames and a long short-term memory (LSTM) network is run over each window; the output of the LSTM at the last frame is then taken as the voiceprint feature (d-vector) of that sliding window.
The generator consists of an 8-layer convolutional network, a 1-layer long short-term memory recurrent network and a 2-layer fully connected network. Every layer uses the ReLU activation function except the last fully connected layer, which uses a sigmoid. After the spectrogram of the input signal passes through the convolutional layers, the voiceprint feature (d-vector) of the reference audio is concatenated frame by frame onto the convolutional output and the result is fed into the LSTM layer. Finally, the network outputs a mask with the same dimensions as the input spectrogram; multiplying the output mask with the input spectrogram yields the predicted clean spectrogram X̂ of the output audio.
The discriminator consists of a 2-layer convolutional network and a 2-layer fully connected neural network. Every layer uses the ReLU activation function except the last fully connected layer, which uses a sigmoid. The generator's predicted clean spectrogram X̂ is fed into the discriminator, and then the true clean audio X from step 1) is fed in, to train the discriminator network: the discriminator should judge the predicted clean spectrogram X̂ produced by the generator to be fake data and give it a low score (close to 0), and judge the true clean audio X from step 1) to be real data and give it a high score (close to 1). In this way it learns the distributions of the real and predicted data, so that in step 6) the discriminator can tell whether the predicted spectrogram generated from the speaker's voice by the noise separation system matches the distribution of real audio.
Adjusting the discriminator's weight parameters means, concretely, passing the real/fake decisions back to the generator: the generator adjusts the parameters of its network model to correct its output spectrogram, bringing it closer to the true distribution and eliminating the noise components that the discriminator judged to be fake, until the predicted clean spectrogram X̂ produced by the generator can "fool" the discriminator, i.e. the discriminator judges the generator's predicted clean spectrogram to be a spectrogram X of true clean audio from the TIMIT corpus. During neural network backpropagation, the discriminator becomes better at telling apart the true clean audio and the generator's predicted clean audio, that is, better at finding the features of true clean audio; likewise, the generator keeps adjusting its parameters against the continuously updated discriminator, moving the predicted spectrogram it generates toward the spectrogram of true clean audio.
The generator and the discriminator confront each other in a mutual game, forming the generative adversarial network, whose objective is:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim data}[\log D(x)] + \mathbb{E}_{n \sim noise}[\log(1 - D(G(n)))]

To solve the vanishing-gradient problem faced by the classical approach, the least-squares GAN is used in place of the cross-entropy loss, giving:

\min_D V(D) = \tfrac{1}{2} \mathbb{E}_{x \sim data}[(D(x) - 1)^2] + \tfrac{1}{2} \mathbb{E}_{n \sim noise}[D(G(n))^2]

\min_G V(G) = \tfrac{1}{2} \mathbb{E}_{n \sim noise}[(D(G(n)) - 1)^2]

In these formulas, G denotes the generator and D the discriminator, and V represents the loss value; data denotes the corpus of true clean audio in the speech separation system of step 1), x a true clean speech audio drawn from data, noise the noisy-audio corpus in the speech separation system of step 1), and n the noisy audio drawn from noise that corresponds to x. G(n) denotes the generator denoising the noisy speech to obtain the predicted clean audio X̂; D(G(n)) denotes the discriminator judging the predicted clean audio X̂ to be fake data and giving it a low score (close to 0), while judging the true clean audio X to be real data and giving it a high score (close to 1).
The speech enhancement method based on voiceprint comparison and a generative adversarial network provided by the present invention has a small model size and low computational cost, is easy to port, preserves a degree of spatial invariance, and denoises well.
Detailed description of the invention
Fig. 1 is a flowchart of the method of the present invention;
Fig. 2 is a schematic diagram of the voiceprint recognition encoder of the present invention;
Fig. 3 is a schematic diagram of the generator of the present invention;
Fig. 4 is a schematic diagram of the discriminator of the present invention.
Specific embodiment
The present invention is further elaborated below with reference to the accompanying drawings and an embodiment, which do not limit the invention.
Embodiment:
As shown in Fig. 1, a speech enhancement method based on voiceprint comparison and a generative adversarial network comprises the following steps:
1) establish three speech databases, corresponding respectively to the voiceprint recognition encoder, the noise separation system and the speech separation system;
2) train the voiceprint recognition encoder to extract the voiceprint features of the target speaker, obtaining the target voiceprint feature;
3) convert the noisy audio into a spectrogram and feed it into the generator of the noise separation system; according to the target voiceprint feature extracted by the voiceprint recognition encoder, the generator separates out the target speaker's voice, obtaining the predicted clean audio;
4) feed the predicted clean audio obtained in step 3) and the true clean audio from the speech separation system of step 1) into the discriminator of the noise separation system for training, so that the discriminator can judge whether the predicted spectrogram generated by the noise separation system from the speaker's voice matches the distribution of real audio;
5) adjust the discriminator's weight parameters so that it better distinguishes the true clean audio from the predicted clean audio generated by the generator, and update the generator's weight parameters according to the discriminator's decisions, until the discriminator can no longer distinguish the generator's predicted audio from the true clean audio, yielding a generator that can produce near-real clean audio;
6) convert the speaker's voice, collected by a microphone, into a spectrogram via the short-time Fourier transform and feed it into the trained generator to generate the predicted clean spectrogram; then convert it into an analog speech signal via the inverse short-time Fourier transform, and play the analog speech signal back through a loudspeaker to obtain the enhanced speech signal (a minimal sketch of this inference pipeline is given after this list).
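To make step 6) concrete, the following is a minimal Python sketch of the inference pipeline: short-time Fourier transform, trained mask-based generator (such as the one sketched below under Fig. 3), inverse short-time Fourier transform. The frame parameters (n_fft=1200, hop=300) and the reuse of the noisy phase for waveform reconstruction are illustrative assumptions; the patent specifies only the STFT and inverse-STFT steps themselves.

```python
import librosa
import numpy as np
import torch

def enhance(noisy_wav, generator, dvec, n_fft=1200, hop=300):
    """Step 6): STFT -> trained generator -> inverse STFT of the predicted spectrogram."""
    stft = librosa.stft(noisy_wav, n_fft=n_fft, hop_length=hop)
    mag, phase = np.abs(stft), np.angle(stft)        # magnitude spectrogram + phase
    spec = torch.tensor(mag, dtype=torch.float32).unsqueeze(0)   # (1, freq, time)
    with torch.no_grad():
        clean_mag = generator(spec, dvec.unsqueeze(0)).squeeze(0).numpy()
    # Rebuild a waveform from the predicted magnitude; reusing the noisy
    # phase here is an assumption, not something the patent specifies.
    return librosa.istft(clean_mag * np.exp(1j * phase), hop_length=hop)
```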
The voiceprint recognition encoder is the voiceprint recognition encoder of the 2000 NIST Speaker Recognition Evaluation corpus; the noise separation system is the noise separation system of the 100-nonspeech noise corpus; the speech separation system is the speech separation system of the TIMIT corpus.
The 2000 NIST Speaker Recognition Evaluation corpus is the most common dataset in the literature for extracting voiceprint features, usually referred to simply as "CALLHOME". It contains 500 conversations distributed over 6 languages: Arabic, English, German, Japanese, Mandarin and Spanish;
The TIMIT corpus is an acoustic-phonetic continuous speech corpus collected jointly by Texas Instruments (TI), the Massachusetts Institute of Technology (MIT) and the Stanford Research Institute (SRI). It contains 6300 sentences: 630 speakers from the 8 major dialect regions of the United States each speak 10 given sentences. All sentences are manually segmented and labeled at the phone level, and the data are split into a training set (70%) and a test set (30%) at a ratio of 7:3;
The 100-nonspeech noise corpus consists of 100 non-speech noise recordings collected by Guoning Hu's team.
The 2000 NIST Speaker Recognition Evaluation corpus is used as the first database, for training the voiceprint recognition encoder so that it can reliably extract the speaker's voiceprint feature (d-vector). Next, triples of data are needed to train the whole noise separation system, with inputs: 1. clean audio of the target speaker; 2. noisy audio; 3. a reference audio of the target speaker. The clean audio is selected from the TIMIT corpus and distorted with noise at different signal-to-noise ratios (SNR) to form the noisy audio; finally, a reference audio is randomly selected from the target speaker's remaining clean audio, completing the triple. These triples constitute the second database, as sketched below.
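The triple construction can be summarized with a short sketch. This is a minimal illustration of the SNR mixing described above, assuming 1-D float waveform arrays; the function names and the exact noise-length handling are illustrative, since the patent does not fix them.

```python
import numpy as np

def mix_at_snr(clean: np.ndarray, noise: np.ndarray, snr_db: float) -> np.ndarray:
    """Additively distort a clean utterance with noise at a target SNR in dB."""
    if len(noise) < len(clean):                      # loop the noise if it is too short
        noise = np.tile(noise, int(np.ceil(len(clean) / len(noise))))
    noise = noise[:len(clean)]
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2) + 1e-12
    # Choose a gain so that 10*log10(P_clean / P_scaled_noise) == snr_db.
    gain = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + gain * noise

def make_triple(clean, noise, reference, snr_db):
    """One training example: (noisy input, clean target, reference utterance)."""
    return mix_at_snr(clean, noise, snr_db), clean, reference
```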
In step 2), the voiceprint recognition encoder extracts the voiceprint features of the target speaker as shown in Fig. 2, specifically: the audio signal is converted into frames of width 25 ms with a step of 10 ms, and each frame is passed through a Mel filterbank; an energy spectrum of size 40 is extracted from the result as the network input. Fixed-length sliding windows are constructed over these frames and a long short-term memory network is run over each window; the output of the network at the last frame is then taken as the voiceprint feature (d-vector) of that sliding window.
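A minimal sketch of this encoder follows. The 25 ms / 10 ms framing, the 40-bin Mel energies, and taking the LSTM output at the last frame of each sliding window come from the text; the LSTM depth and sizes, the 160-frame window with 50% overlap, and the averaging of window-level d-vectors into one utterance embedding are assumptions.

```python
import librosa
import numpy as np
import torch
import torch.nn as nn

class DVectorEncoder(nn.Module):
    """LSTM speaker encoder; the last frame's output of each window is its d-vector."""
    def __init__(self, n_mels=40, hidden=256, emb=256):
        super().__init__()
        self.lstm = nn.LSTM(n_mels, hidden, num_layers=3, batch_first=True)
        self.proj = nn.Linear(hidden, emb)

    def forward(self, windows):                      # (batch, frames, n_mels)
        out, _ = self.lstm(windows)
        d = self.proj(out[:, -1])                    # keep only the last frame
        return d / d.norm(dim=1, keepdim=True)       # L2-normalised d-vector

def log_mel_frames(wav, sr=16000, n_mels=40):
    """25 ms frames with a 10 ms step, Mel filterbank, log energies."""
    mel = librosa.feature.melspectrogram(
        y=wav, sr=sr, n_fft=int(0.025 * sr), hop_length=int(0.010 * sr), n_mels=n_mels)
    return np.log(mel + 1e-6).T                      # (frames, n_mels)

def utterance_d_vector(wav, encoder, win=160, hop=80):
    frames = torch.tensor(log_mel_frames(wav), dtype=torch.float32)
    windows = [frames[i:i + win] for i in range(0, len(frames) - win + 1, hop)]
    if not windows:                                  # short clip: one window of all frames
        windows = [frames]
    embs = encoder(torch.stack(windows))
    return embs.mean(dim=0)                          # average over windows (assumed)
```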
As shown in Fig. 3, the generator consists of an 8-layer convolutional network, a 1-layer long short-term memory recurrent network and a 2-layer fully connected network. Every layer uses the ReLU activation function except the last fully connected layer, which uses a sigmoid. After the spectrogram of the input signal passes through the convolutional layers, the voiceprint feature (d-vector) of the reference audio is concatenated frame by frame onto the convolutional output and the result is fed into the LSTM layer. Finally, the network outputs a mask with the same dimensions as the input spectrogram; multiplying the output mask with the input spectrogram yields the predicted clean spectrogram X̂ of the output audio.
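The following sketch mirrors that structure: 8 convolutional layers, frame-wise concatenation of the d-vector, one LSTM layer, two fully connected layers with a final sigmoid, and multiplication of the mask with the input spectrogram. Channel counts, kernel sizes, hidden sizes and the 601 frequency bins (matching a 1200-point STFT) are assumptions, not values given by the patent.

```python
import torch
import torch.nn as nn

class Generator(nn.Module):
    """8 conv layers -> concat d-vector per frame -> 1 LSTM layer -> 2 FC layers -> mask."""
    def __init__(self, n_freq=601, d_emb=256, channels=64, hidden=400):
        super().__init__()
        layers, in_ch = [], n_freq
        for _ in range(8):                           # 8-layer convolutional front end, ReLU each
            layers += [nn.Conv1d(in_ch, channels, kernel_size=5, padding=2), nn.ReLU()]
            in_ch = channels
        self.convs = nn.Sequential(*layers)
        self.lstm = nn.LSTM(channels + d_emb, hidden, batch_first=True)
        self.fc = nn.Sequential(
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, n_freq), nn.Sigmoid()) # last FC layer: sigmoid mask in [0, 1]

    def forward(self, spec, dvec):
        # spec: (batch, freq, time); dvec: (batch, d_emb)
        h = self.convs(spec)                         # (batch, channels, time)
        dvec = dvec.unsqueeze(2).expand(-1, -1, h.size(2))
        h = torch.cat([h, dvec], dim=1).transpose(1, 2)  # d-vector spliced to every frame
        h, _ = self.lstm(h)
        mask = self.fc(h).transpose(1, 2)            # same shape as the input spectrogram
        return mask * spec                           # predicted clean spectrogram X̂
```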
As shown in Fig. 4, the discriminator consists of a 2-layer convolutional network and a 2-layer fully connected neural network. Every layer uses the ReLU activation function except the last fully connected layer, which uses a sigmoid. The generator's predicted clean spectrogram X̂ is fed into the discriminator, and then the true clean audio X from step 1) is fed in, to train the discriminator network: the discriminator judges the predicted clean spectrogram X̂ produced by the generator to be fake data and gives it a low score (close to 0), and judges the true clean audio X from step 1) to be real data and gives it a high score (close to 1). In this way it learns the distributions of the real and predicted data, so that in step 6) the discriminator can tell whether the predicted spectrogram generated from the speaker's voice by the noise separation system matches the distribution of real audio.
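A matching sketch of the discriminator follows, again with assumed channel counts, kernel sizes and strides; only the 2-conv + 2-FC layout, the ReLU layers and the final sigmoid score come from the text.

```python
import torch
import torch.nn as nn

class Discriminator(nn.Module):
    """2 conv layers + 2 FC layers; sigmoid score near 1 = real, near 0 = fake."""
    def __init__(self, channels=32):
        super().__init__()
        self.convs = nn.Sequential(
            nn.Conv2d(1, channels, kernel_size=5, stride=2, padding=2), nn.ReLU(),
            nn.Conv2d(channels, channels, kernel_size=5, stride=2, padding=2), nn.ReLU())
        self.fc = nn.Sequential(
            nn.Flatten(),
            nn.LazyLinear(128), nn.ReLU(),           # input size inferred on first call
            nn.Linear(128, 1), nn.Sigmoid())

    def forward(self, spec):                         # spec: (batch, freq, time)
        return self.fc(self.convs(spec.unsqueeze(1)))  # (batch, 1) realness score
```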
Adjusting the discriminator's weight parameters means, concretely, passing the real/fake decisions back to the generator: the generator adjusts the parameters of its network model to correct its output spectrogram, bringing it closer to the true distribution and eliminating the noise components that the discriminator judged to be fake, until the predicted clean spectrogram X̂ produced by the generator can "fool" the discriminator, i.e. the discriminator judges the generator's predicted clean spectrogram to be a spectrogram X of true clean audio from the TIMIT corpus. During neural network backpropagation, the discriminator becomes better at telling apart the true clean audio and the generator's predicted clean audio, that is, better at finding the features of true clean audio; likewise, the generator keeps adjusting its parameters against the continuously updated discriminator, moving the predicted spectrogram it generates toward the spectrogram of true clean audio.
The generator and the discriminator confront each other in a mutual game, forming the generative adversarial network, whose objective is:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim data}[\log D(x)] + \mathbb{E}_{n \sim noise}[\log(1 - D(G(n)))]

To solve the vanishing-gradient problem faced by the classical approach, the least-squares GAN is used in place of the cross-entropy loss, giving:

\min_D V(D) = \tfrac{1}{2} \mathbb{E}_{x \sim data}[(D(x) - 1)^2] + \tfrac{1}{2} \mathbb{E}_{n \sim noise}[D(G(n))^2]

\min_G V(G) = \tfrac{1}{2} \mathbb{E}_{n \sim noise}[(D(G(n)) - 1)^2]

In these formulas, G denotes the generator and D the discriminator, and V represents the loss value; data denotes the corpus of true clean audio in the speech separation system of step 1), x a true clean speech audio drawn from data, noise the noisy-audio corpus in the speech separation system of step 1), and n the noisy audio drawn from noise that corresponds to x. G(n) denotes the generator denoising the noisy speech to obtain the predicted clean audio X̂; D(G(n)) denotes the discriminator judging the predicted clean audio X̂ to be fake data and giving it a low score (close to 0), while judging the true clean audio X to be real data and giving it a high score (close to 1).
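One alternating update implementing the least-squares objectives above might look as follows. This is a minimal sketch assuming the Generator and Discriminator classes sketched earlier and externally constructed optimizers; the patent describes the alternating game but not a specific training loop.

```python
import torch

def lsgan_step(G, D, opt_g, opt_d, noisy_spec, clean_spec, dvec):
    """One alternating least-squares GAN update over a batch of spectrograms."""
    # Discriminator: push D(x) toward 1 (real) and D(G(n)) toward 0 (fake).
    fake = G(noisy_spec, dvec).detach()              # freeze G for the D update
    loss_d = 0.5 * ((D(clean_spec) - 1) ** 2).mean() + 0.5 * (D(fake) ** 2).mean()
    opt_d.zero_grad(); loss_d.backward(); opt_d.step()

    # Generator: push D(G(n)) toward 1 so the predicted spectrogram "fools" D.
    fake = G(noisy_spec, dvec)
    loss_g = 0.5 * ((D(fake) - 1) ** 2).mean()
    opt_g.zero_grad(); loss_g.backward(); opt_g.step()
    return loss_d.item(), loss_g.item()
```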

Claims (7)

1. A speech enhancement method based on voiceprint comparison and a generative adversarial network, characterized by comprising the following steps:
1) establishing three speech databases, corresponding respectively to a voiceprint recognition encoder, a noise separation system and a speech separation system;
2) training the voiceprint recognition encoder to extract the voiceprint features of the target speaker, obtaining the target voiceprint feature;
3) converting the noisy audio into a spectrogram and feeding it into the generator of the noise separation system, the generator separating out the target speaker's voice according to the target voiceprint feature extracted by the voiceprint recognition encoder, obtaining the predicted clean audio;
4) feeding the predicted clean audio obtained in step 3) and the true clean audio from the speech separation system of step 1) into the discriminator of the noise separation system for training, so that the discriminator can judge whether the predicted spectrogram generated by the noise separation system from the speaker's voice matches the distribution of real audio;
5) adjusting the discriminator's weight parameters so that the discriminator better distinguishes the true clean audio from the predicted clean audio generated by the generator, and updating the generator's weight parameters according to the discriminator's decisions, until the discriminator cannot distinguish the generator's predicted audio from the true clean audio, obtaining a generator that can produce near-real clean audio;
6) converting the speaker's voice, collected by a microphone, into a spectrogram via the short-time Fourier transform and feeding it into the trained generator to generate the predicted clean spectrogram, then converting it into an analog speech signal via the inverse short-time Fourier transform, the analog speech signal being played back through a loudspeaker to obtain the enhanced speech signal.
2. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that the voiceprint recognition encoder is the voiceprint recognition encoder of the 2000 NIST Speaker Recognition Evaluation corpus; the noise separation system is the noise separation system of the 100-nonspeech noise corpus; and the speech separation system is the speech separation system of the TIMIT corpus.
3. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that in step 2), the voiceprint recognition encoder extracts the voiceprint features of the target speaker as follows: the audio signal is converted into frames of width 25 ms with a step of 10 ms, and each frame is passed through a Mel filterbank; an energy spectrum of size 40 is extracted from the result as the network input; fixed-length sliding windows are constructed over these frames, a long short-term memory network is run over each window, and the output of the network at the last frame is taken as the voiceprint feature (d-vector) of that sliding window.
4. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that the generator consists of an 8-layer convolutional network, a 1-layer long short-term memory recurrent network and a 2-layer fully connected network; every layer uses the ReLU activation function except the last fully connected layer, which uses a sigmoid; after the spectrogram of the input signal passes through the convolutional layers, the voiceprint feature (d-vector) of the reference audio is concatenated frame by frame onto the convolutional output and fed into the LSTM layer; finally, the network outputs a mask with the same dimensions as the input spectrogram, and multiplying the output mask with the input spectrogram yields the predicted clean spectrogram X̂ of the output audio.
5. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that the discriminator consists of a 2-layer convolutional network and a 2-layer fully connected neural network; every layer uses the ReLU activation function except the last fully connected layer, which uses a sigmoid; the generator's predicted clean spectrogram X̂ is fed into the discriminator, and then the true clean audio X from step 1) is fed in, to train the discriminator network: the discriminator judges the predicted clean spectrogram X̂ produced by the generator to be fake data and gives it a low score (close to 0), and judges the true clean audio X from step 1) to be real data and gives it a high score (close to 1); in this way it learns the distributions of the real and predicted data, so that in step 6) the discriminator can tell whether the predicted spectrogram generated from the speaker's voice by the noise separation system matches the distribution of real audio.
6. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that adjusting the discriminator's weight parameters means passing the real/fake decisions back to the generator: the generator adjusts the parameters of its network model to correct its output spectrogram, bringing it closer to the true distribution and eliminating the noise components that the discriminator judged to be fake, until the predicted clean spectrogram X̂ produced by the generator can "fool" the discriminator, i.e. the discriminator judges the generator's predicted clean spectrogram to be a spectrogram X of true clean audio from the TIMIT corpus; during neural network backpropagation, the discriminator becomes better at telling apart the true clean audio and the generator's predicted clean audio, that is, better at finding the features of true clean audio; likewise, the generator keeps adjusting its parameters against the continuously updated discriminator, moving the predicted spectrogram it generates toward the spectrogram of true clean audio.
7. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that the generator and the discriminator confront each other in a mutual game, forming the generative adversarial network, whose objective is:

\min_G \max_D V(D, G) = \mathbb{E}_{x \sim data}[\log D(x)] + \mathbb{E}_{n \sim noise}[\log(1 - D(G(n)))]

To solve the vanishing-gradient problem faced by the classical approach, the least-squares GAN is used in place of the cross-entropy loss, giving:

\min_D V(D) = \tfrac{1}{2} \mathbb{E}_{x \sim data}[(D(x) - 1)^2] + \tfrac{1}{2} \mathbb{E}_{n \sim noise}[D(G(n))^2]

\min_G V(G) = \tfrac{1}{2} \mathbb{E}_{n \sim noise}[(D(G(n)) - 1)^2]

In these formulas, G denotes the generator and D the discriminator, and V represents the loss value; data denotes the corpus of true clean audio in the speech separation system of step 1), x a true clean speech audio drawn from data, noise the noisy-audio corpus in the speech separation system of step 1), and n the noisy audio drawn from noise that corresponds to x. G(n) denotes the generator denoising the noisy speech to obtain the predicted clean audio X̂; D(G(n)) denotes the discriminator judging the predicted clean audio X̂ to be fake data and giving it a low score (close to 0), while judging the true clean audio X to be real data and giving it a high score (close to 1).
CN201811353760.5A 2018-11-14 2018-11-14 Speech enhancement method based on voiceprint comparison and a generative adversarial network Active CN109326302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811353760.5A CN109326302B (en) 2018-11-14 2018-11-14 Speech enhancement method based on voiceprint comparison and a generative adversarial network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811353760.5A CN109326302B (en) 2018-11-14 2018-11-14 Speech enhancement method based on voiceprint comparison and a generative adversarial network

Publications (2)

Publication Number Publication Date
CN109326302A (en) 2019-02-12
CN109326302B CN109326302B (en) 2022-11-08

Family

ID=65257213

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811353760.5A Active CN109326302B (en) Speech enhancement method based on voiceprint comparison and a generative adversarial network

Country Status (1)

Country Link
CN (1) CN109326302B (en)


Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1327976A1 (en) * 2001-12-21 2003-07-16 Cortologic AG Method and system for recognition of speech in a noisy environment
WO2017168870A1 (en) * 2016-03-28 2017-10-05 ソニー株式会社 Information processing device and information processing method
CN108074244A (en) * 2017-09-07 2018-05-25 汉鼎宇佑互联网股份有限公司 A kind of safe city wagon flow statistical method for merging deep learning and Background difference
CN108597496A (en) * 2018-05-07 2018-09-28 广州势必可赢网络科技有限公司 A kind of speech production method and device for fighting network based on production
CN108682418A (en) * 2018-06-26 2018-10-19 北京理工大学 A kind of audio recognition method based on pre-training and two-way LSTM

Cited By (55)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20220358904A1 (en) * 2019-03-20 2022-11-10 Research Foundation Of The City University Of New York Method for extracting speech from degraded signals by predicting the inputs to a speech vocoder
CN110164470A (en) * 2019-06-12 2019-08-23 成都嗨翻屋科技有限公司 Voice separation method, device, user terminal and storage medium
CN110289004A (en) * 2019-06-18 2019-09-27 暨南大学 A kind of artificial synthesized vocal print detection system and method based on deep learning
CN110289004B (en) * 2019-06-18 2021-09-07 暨南大学 Artificial synthesis voiceprint detection system and method based on deep learning
CN110211591A (en) * 2019-06-24 2019-09-06 卓尔智联(武汉)研究院有限公司 Interview data analysing method, computer installation and medium based on emotional semantic classification
CN110211591B (en) * 2019-06-24 2021-12-21 卓尔智联(武汉)研究院有限公司 Interview data analysis method based on emotion classification, computer device and medium
CN110619885A (en) * 2019-08-15 2019-12-27 西北工业大学 Method for generating confrontation network voice enhancement based on deep complete convolution neural network
CN110619885B (en) * 2019-08-15 2022-02-11 西北工业大学 Method for generating confrontation network voice enhancement based on deep complete convolution neural network
CN110718232A (en) * 2019-09-23 2020-01-21 东南大学 Speech enhancement method for generating countermeasure network based on two-dimensional spectrogram and condition
CN110675891A (en) * 2019-09-25 2020-01-10 电子科技大学 Voice separation method and module based on multilayer attention mechanism
CN110675891B (en) * 2019-09-25 2020-09-18 电子科技大学 Voice separation method and module based on multilayer attention mechanism
CN110619886B (en) * 2019-10-11 2022-03-22 北京工商大学 End-to-end voice enhancement method for low-resource Tujia language
CN110619886A (en) * 2019-10-11 2019-12-27 北京工商大学 End-to-end voice enhancement method for low-resource Tujia language
CN110853663A (en) * 2019-10-12 2020-02-28 平安科技(深圳)有限公司 Speech enhancement method based on artificial intelligence, server and storage medium
CN110853663B (en) * 2019-10-12 2023-04-28 平安科技(深圳)有限公司 Speech enhancement method based on artificial intelligence, server and storage medium
WO2021068338A1 (en) * 2019-10-12 2021-04-15 平安科技(深圳)有限公司 Speech enhancement method based on artificial intelligence, server and storage medium
CN111128197B (en) * 2019-12-25 2022-05-13 北京邮电大学 Multi-speaker voice separation method based on voiceprint features and generation confrontation learning
CN111128197A (en) * 2019-12-25 2020-05-08 北京邮电大学 Multi-speaker voice separation method based on voiceprint features and generation confrontation learning
CN111261147A (en) * 2020-01-20 2020-06-09 浙江工业大学 Music embedding attack defense method facing voice recognition system
CN111276132A (en) * 2020-02-04 2020-06-12 北京声智科技有限公司 Voice processing method, electronic equipment and computer readable storage medium
CN111243569B (en) * 2020-02-24 2022-03-08 浙江工业大学 Emotional voice automatic generation method and device based on generation type confrontation network
CN111243569A (en) * 2020-02-24 2020-06-05 浙江工业大学 Emotional voice automatic generation method and device based on generation type confrontation network
CN111341304A (en) * 2020-02-28 2020-06-26 广州国音智能科技有限公司 Method, device and equipment for training speech characteristics of speaker based on GAN
WO2021203880A1 (en) * 2020-04-10 2021-10-14 华为技术有限公司 Speech enhancement method, neural network training method, and related device
JP2022536190A (en) * 2020-04-28 2022-08-12 平安科技(深▲せん▼)有限公司 Voiceprint Recognition Method, Apparatus, Equipment, and Storage Medium
JP7184236B2 (en) 2020-04-28 2022-12-06 平安科技(深▲せん▼)有限公司 Voiceprint Recognition Method, Apparatus, Equipment, and Storage Medium
US11514925B2 (en) * 2020-04-30 2022-11-29 Adobe Inc. Using a predictive model to automatically enhance audio having various audio quality issues
CN111524526A (en) * 2020-05-14 2020-08-11 中国工商银行股份有限公司 Voiceprint recognition method and device
CN111524526B (en) * 2020-05-14 2023-11-17 中国工商银行股份有限公司 Voiceprint recognition method and voiceprint recognition device
CN111862989B (en) * 2020-06-01 2024-03-08 北京捷通华声科技股份有限公司 Acoustic feature processing method and device
CN111862989A (en) * 2020-06-01 2020-10-30 北京捷通华声科技股份有限公司 Acoustic feature processing method and device
CN111785281A (en) * 2020-06-17 2020-10-16 国家计算机网络与信息安全管理中心 Voiceprint recognition method and system based on channel compensation
CN111883091A (en) * 2020-07-09 2020-11-03 腾讯音乐娱乐科技(深圳)有限公司 Audio noise reduction method and training method of audio noise reduction model
CN112216300A (en) * 2020-09-25 2021-01-12 三一专用汽车有限责任公司 Noise reduction method and device for sound in driving cab of mixer truck and mixer truck
CN112259112A (en) * 2020-09-28 2021-01-22 上海声瀚信息科技有限公司 Echo cancellation method combining voiceprint recognition and deep learning
CN115668366A (en) * 2020-10-15 2023-01-31 北京嘀嘀无限科技发展有限公司 Acoustic echo cancellation method and system
WO2022077305A1 (en) * 2020-10-15 2022-04-21 Beijing Didi Infinity Technology And Development Co., Ltd. Method and system for acoustic echo cancellation
CN112687275A (en) * 2020-12-25 2021-04-20 北京中科深智科技有限公司 Voice filtering method and filtering system
CN112802491B (en) * 2021-02-07 2022-06-14 武汉大学 Voice enhancement method for generating confrontation network based on time-frequency domain
CN112802491A (en) * 2021-02-07 2021-05-14 武汉大学 Voice enhancement method for generating countermeasure network based on time-frequency domain
CN112989108A (en) * 2021-02-24 2021-06-18 腾讯科技(深圳)有限公司 Language detection method and device based on artificial intelligence and electronic equipment
CN113035217A (en) * 2021-03-01 2021-06-25 武汉大学 Voice enhancement method based on voiceprint embedding under low signal-to-noise ratio condition
CN113035217B (en) * 2021-03-01 2023-11-10 武汉大学 Voice enhancement method based on voiceprint embedding under low signal-to-noise ratio condition
CN113571084A (en) * 2021-07-08 2021-10-29 咪咕音乐有限公司 Audio processing method, device, equipment and storage medium
CN113571084B (en) * 2021-07-08 2024-03-22 咪咕音乐有限公司 Audio processing method, device, equipment and storage medium
WO2023020500A1 (en) * 2021-08-17 2023-02-23 中移(苏州)软件技术有限公司 Speech separation method and apparatus, and storage medium
CN113707168A (en) * 2021-09-03 2021-11-26 合肥讯飞数码科技有限公司 Voice enhancement method, device, equipment and storage medium
CN113724713A (en) * 2021-09-07 2021-11-30 科大讯飞股份有限公司 Voice recognition method, device, equipment and storage medium
CN113823293B (en) * 2021-09-28 2024-04-26 武汉理工大学 Speaker recognition method and system based on voice enhancement
CN113823293A (en) * 2021-09-28 2021-12-21 武汉理工大学 Speaker recognition method and system based on voice enhancement
WO2023102930A1 (en) * 2021-12-10 2023-06-15 清华大学深圳国际研究生院 Speech enhancement method, electronic device, program product, and storage medium
CN114609493B (en) * 2022-05-09 2022-08-12 杭州兆华电子股份有限公司 Partial discharge signal identification method with enhanced signal data
CN114609493A (en) * 2022-05-09 2022-06-10 杭州兆华电子股份有限公司 Partial discharge signal identification method with enhanced signal data
CN116458894B (en) * 2023-04-21 2024-01-26 山东省人工智能研究院 Electrocardiosignal enhancement and classification method based on composite generation countermeasure network
CN116458894A (en) * 2023-04-21 2023-07-21 山东省人工智能研究院 Electrocardiosignal enhancement and classification method based on composite generation countermeasure network

Also Published As

Publication number Publication date
CN109326302B (en) 2022-11-08

Similar Documents

Publication Publication Date Title
CN109326302A (en) A kind of sound enhancement method comparing and generate confrontation network based on vocal print
CN104732978B (en) The relevant method for distinguishing speek person of text based on combined depth study
CN105632501A (en) Deep-learning-technology-based automatic accent classification method and apparatus
Hui et al. Convolutional maxout neural networks for speech separation
Kinoshita et al. Text-informed speech enhancement with deep neural networks.
Xu et al. Global variance equalization for improving deep neural network based speech enhancement
CN108615533A (en) A kind of high-performance sound enhancement method based on deep learning
CN108962229A (en) A kind of target speaker's voice extraction method based on single channel, unsupervised formula
CN110136709A (en) Audio recognition method and video conferencing system based on speech recognition
CN112331218B (en) Single-channel voice separation method and device for multiple speakers
CN111899757A (en) Single-channel voice separation method and system for target speaker extraction
Do et al. Speech source separation using variational autoencoder and bandpass filter
Soleymani et al. Prosodic-enhanced siamese convolutional neural networks for cross-device text-independent speaker verification
CN111798875A (en) VAD implementation method based on three-value quantization compression
CN106297769B (en) A kind of distinctive feature extracting method applied to languages identification
Bhardwaj et al. Deep neural network trained Punjabi children speech recognition system using Kaldi toolkit
CN105845143A (en) Speaker confirmation method and speaker confirmation system based on support vector machine
CN110246518A (en) Speech-emotion recognition method, device, system and storage medium based on more granularity sound state fusion features
Sekkate et al. Speaker identification for OFDM-based aeronautical communication system
Gadasin et al. Using Formants for Human Speech Recognition by Artificial Intelligence
Tan et al. Denoised senone i-vectors for robust speaker verification
CN112466276A (en) Speech synthesis system training method and device and readable storage medium
Le et al. Personalized speech enhancement combining band-split rnn and speaker attentive module
Wang et al. Robust speech recognition from ratio masks
CN111785262B (en) Speaker age and gender classification method based on residual error network and fusion characteristics

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant