CN109326302A - A kind of sound enhancement method comparing and generate confrontation network based on vocal print - Google Patents
Publication number: CN109326302A (application CN201811353760.5A)
Authority: CN (China)
Legal status: Granted
Classifications
- G10L21/0208 — Noise filtering (G: Physics; G10: Musical instruments, acoustics; G10L: Speech analysis or synthesis, speech recognition, speech or voice processing, speech or audio coding or decoding; G10L21/00: Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility; G10L21/02: Speech enhancement, e.g. noise reduction or echo cancellation)
- G10L17/00 — Speaker identification or verification
- G10L21/0264 — Noise filtering characterised by the type of parameter measurement, e.g. correlation techniques, zero crossing techniques or predictive techniques
Abstract
The invention discloses a speech enhancement method based on voiceprint comparison and a generative adversarial network: 1) establish three speech databases, corresponding respectively to a voiceprint recognition encoder, a noise separation system, and a speech separation system; 2) train the voiceprint recognition encoder to extract the voiceprint features of the target speaker, obtaining the target voiceprint feature; 3) convert the noisy audio into a spectrogram and feed it to the generator in the noise separation system to obtain a predicted clean audio; 4) feed the predicted clean audio and the true clean audio into the discriminator of the noise separation system for training; 5) adjust the discriminator's weight parameters so that it better distinguishes true clean audio from predicted clean audio, yielding a generator that produces near-true clean audio; 6) feed the speaker's voice into the trained generator to generate a predicted clean spectrogram and obtain the enhanced speech signal. The method has a small model size and low computational cost, is easy to port, maintains a degree of spatial invariance, and denoises well.
Description
Technical field
The present invention relates to the field of speech enhancement, and in particular to a speech enhancement method based on voiceprint comparison and a generative adversarial network.
Background technique
With the development of society and the spread of electronic products, people's demands on voice quality keep rising. Improving the mobile-communication quality of electronic products in noisy environments has become one of the most active research directions. Speech enhancement can improve the quality and intelligibility of speech in high-noise environments; it has important applications not only in hearing aids and cochlear implants, but has also been successfully applied as a preprocessing stage in speech recognition and speaker recognition systems.
Classical speech enhancement methods include spectral subtraction, Wiener filtering, statistical-model-based methods, and subspace algorithms. Since the 1980s, neural networks have also been applied to speech enhancement. In recent years, denoising autoencoders have been widely adopted; for example, recurrent denoising autoencoders handle the contextual information of audio signals well, and long short-term memory (LSTM) networks have recently been applied to denoising tasks. Although these methods achieve good results, they require large amounts of data and computation and are difficult to port to embedded devices. Moreover, they tend to depend on the training set: the clean audio they output is essentially an average of the training set's clean audio, so it is relatively blurry and the handling of detail is unsatisfactory.
Summary of the invention
The object of the invention is to overcome the deficiencies of the prior art by providing a speech enhancement method based on voiceprint comparison and a generative adversarial network. The method has a small model size and low computational cost, is easy to port, maintains a degree of spatial invariance, and denoises well.
The technical solution realizing the object of the invention is as follows:
A speech enhancement method based on voiceprint comparison and a generative adversarial network comprises the following steps:
1) Establish three speech databases, corresponding respectively to a voiceprint recognition encoder, a noise separation system, and a speech separation system.
2) Train the voiceprint recognition encoder to extract the voiceprint features of the target speaker, obtaining the target voiceprint feature.
3) Convert the noisy audio into a spectrogram and feed it into the generator of the noise separation system; according to the target voiceprint feature extracted by the voiceprint recognition encoder, the generator separates out the target speaker's voice, obtaining a predicted clean audio.
4) Feed the predicted clean audio obtained in step 3) and the true clean audio in the speech separation system of step 1) into the discriminator of the noise separation system for training, so that the discriminator can tell whether the predicted spectrogram of the speaker's voice generated by the noise separation system matches the distribution of true audio.
5) Adjust the discriminator's weight parameters so that it better distinguishes true clean audio from the predicted clean audio produced by the generator, and update the generator's weight parameters according to the discriminator's decisions, until the discriminator can no longer tell the generator's predicted audio from true clean audio; this yields a generator that produces near-true clean audio.
6) Convert the speaker's voice, collected by a microphone, into a spectrogram via the short-time Fourier transform and feed it into the trained generator to generate a predicted clean spectrogram; then convert it into an analog speech signal via the inverse short-time Fourier transform, and play the analog speech signal through a loudspeaker to obtain the enhanced speech signal.
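As a minimal sketch (not the patent's implementation), the step-6 pipeline of STFT, mask-based generation, and inverse STFT can be illustrated in NumPy. The window length, hop size, Hann window, and the identity-mask stand-in for the trained generator are all assumptions for illustration only:

```python
import numpy as np

def stft(x, n_fft=512, hop=160):
    """Short-time Fourier transform with a Hann window (assumed parameters)."""
    win = np.hanning(n_fft)
    frames = [x[i:i + n_fft] * win
              for i in range(0, len(x) - n_fft + 1, hop)]
    return np.fft.rfft(np.array(frames), axis=1)  # shape: (frames, bins)

def istft(S, n_fft=512, hop=160):
    """Inverse STFT by overlap-add (window normalization omitted for brevity)."""
    out = np.zeros(hop * (S.shape[0] - 1) + n_fft)
    for i, frame in enumerate(np.fft.irfft(S, n=n_fft, axis=1)):
        out[i * hop:i * hop + n_fft] += frame
    return out

def enhance(noisy, generator):
    """Waveform -> spectrogram -> mask -> enhanced waveform, as in step 6."""
    S = stft(noisy)
    mask = generator(np.abs(S))                            # mask in [0, 1]
    S_clean = mask * np.abs(S) * np.exp(1j * np.angle(S))  # keep noisy phase
    return istft(S_clean)

# Toy stand-in for the trained generator: an identity mask.
noisy = np.random.default_rng(0).standard_normal(16000)    # 1 s at 16 kHz
enhanced = enhance(noisy, lambda mag: np.ones_like(mag))
```

In the patent the `generator` would be the trained adversarial network; here a lambda that passes the spectrogram through unchanged merely exercises the plumbing.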
The voiceprint recognition encoder uses the 2000 NIST Speaker Recognition Evaluation corpus; the noise separation system uses the 100-nonspeech noise library; and the speech separation system uses the TIMIT corpus.
In step 2), the voiceprint recognition encoder extracts the voiceprint features of the target speaker as follows: convert the audio signal into frames of width 25 ms and step 10 ms; filter each frame with a mel filterbank and extract a 40-dimensional energy spectrum from the result as the network input; build fixed-length sliding windows over these frames and run an LSTM network on each window; then take the LSTM's output at the last frame as the voiceprint feature (d-vector) of that sliding window.
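The 25 ms / 10 ms framing and fixed-length sliding windows described above can be sketched as follows. The 16 kHz sampling rate and the window width/step of the sliding windows are assumptions, as the patent does not state them; the mel energies are random placeholders standing in for the filterbank output:

```python
import numpy as np

SR = 16000                      # assumed sampling rate
FRAME = int(0.025 * SR)         # 25 ms -> 400 samples
HOP = int(0.010 * SR)           # 10 ms -> 160 samples

def frame_signal(x, frame=FRAME, hop=HOP):
    """Split a waveform into overlapping 25 ms frames with a 10 ms step."""
    n = 1 + (len(x) - frame) // hop
    return np.stack([x[i * hop:i * hop + frame] for i in range(n)])

def sliding_windows(feats, width=80, step=40):
    """Fixed-length sliding windows over the 40-dim mel-energy frames;
    each window would be fed to the LSTM, whose last output is the d-vector."""
    return [feats[i:i + width] for i in range(0, len(feats) - width + 1, step)]

x = np.random.default_rng(1).standard_normal(SR)            # 1 s of audio
frames = frame_signal(x)                                    # (98, 400)
mel40 = np.random.default_rng(2).random((len(frames), 40))  # placeholder mel energies
wins = sliding_windows(mel40)                               # windows for the LSTM
```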
The generator consists of an 8-layer convolutional network, a 1-layer LSTM recurrent network, and a 2-layer fully connected network. Every layer uses the ReLU activation function except the last fully connected layer, which uses sigmoid. After the input signal's spectrogram passes through the convolutional layers, the voiceprint feature (d-vector) of the reference audio is concatenated frame by frame to the convolutional output, and the result is fed into the LSTM layer. Finally, the network outputs a mask with the same dimensions as the input spectrogram; multiplying the output mask with the input spectrogram yields the predicted clean audio spectrogram of the output audio.
The discriminator consists of a 2-layer convolutional network and a 2-layer fully connected neural network. Every layer uses ReLU except the last fully connected layer, which uses sigmoid. The generator's predicted clean audio spectrogram X̂ is fed into the discriminator, then the true clean audio X from step 1) is fed in, and the discriminator network is trained: it judges the generator's predicted clean audio spectrogram X̂ to be fake data and gives it a low score (close to 0), and judges the true clean audio X from step 1) to be real data and gives it a high score (close to 1). In this way it learns the distributions of real and predicted data, so that in step 6) it can tell whether the predicted spectrogram of the speaker's voice generated by the noise separation system matches the distribution of true audio.
Adjusting the discriminator's weight parameters means passing its real/fake decisions back to the generator: the generator adjusts the parameters of its network model to correct its output spectrogram, moving it closer to the true distribution and eliminating the noise components the discriminator judged as fake, so that the predicted clean spectrogram X̂ the generator produces can "fool" the discriminator into judging it to be the spectrogram X of true clean audio obtained from the TIMIT corpus. During neural network backpropagation, the discriminator becomes better at distinguishing true clean audio from the generator's predicted clean audio, i.e. it better learns the features of true clean audio; likewise, the generator keeps adjusting its parameters against the continually updated discriminator, moving the predicted spectrogram it generates toward the spectrogram of the true clean audio.
The generator and discriminator confront each other in a mutual game, forming the generative adversarial network algorithm, whose formula is as follows:

min_G max_D V(D, G) = E_{x~data}[log D(x)] + E_{n~noise}[log(1 − D(G(n)))]

To address the vanishing-gradient problem faced by the classical method, the least-squares GAN is used in place of the cross-entropy loss, giving:

min_D V(D) = (1/2) E_{x~data}[(D(x) − 1)²] + (1/2) E_{n~noise}[D(G(n))²]
min_G V(G) = (1/2) E_{n~noise}[(D(G(n)) − 1)²]

In the formulas above, G denotes the generator and D the discriminator; V represents the loss value; data denotes the corpus of true clean audio in the speech separation system of step 1), and x a true clean speech audio drawn from data; noise denotes the noisy-audio corpus in the speech separation system of step 1), and n the noisy audio drawn from noise that corresponds to x; G(n) denotes the generator denoising the noisy speech to obtain the predicted clean audio X̂; D(G(n)) denotes the discriminator judging the predicted clean audio X̂ to be fake with a low score (close to 0), while the true clean audio X is judged real with a high score (close to 1).
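The least-squares objectives above can be written directly in NumPy. Here `d_real` and `d_fake` stand for the discriminator's scores D(x) and D(G(n)) over a batch; the example score values are hypothetical:

```python
import numpy as np

def lsgan_d_loss(d_real, d_fake):
    """Least-squares discriminator loss: push D(x) toward 1 and D(G(n)) toward 0."""
    return 0.5 * np.mean((d_real - 1.0) ** 2) + 0.5 * np.mean(d_fake ** 2)

def lsgan_g_loss(d_fake):
    """Least-squares generator loss: push D(G(n)) toward 1 to 'fool' D."""
    return 0.5 * np.mean((d_fake - 1.0) ** 2)

# Hypothetical batch scores: discriminator mostly right, generator partly fooling it.
d_real = np.array([0.9, 0.8, 0.95])
d_fake = np.array([0.1, 0.4, 0.2])
d_loss = lsgan_d_loss(d_real, d_fake)
g_loss = lsgan_g_loss(d_fake)
```

A perfect discriminator (real scores all 1, fake scores all 0) drives the discriminator loss to zero, and a generator that fully fools it (fake scores all 1) drives the generator loss to zero, matching the objectives above.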
The speech enhancement method based on voiceprint comparison and a generative adversarial network provided by the invention has a small model size and low computational cost, is easy to port, maintains a degree of spatial invariance, and denoises well.
Brief description of the drawings
Fig. 1 is the flow chart of the method of the invention;
Fig. 2 is a schematic diagram of the voiceprint recognition encoder in the invention;
Fig. 3 is a schematic diagram of the generator in the invention;
Fig. 4 is a schematic diagram of the discriminator in the invention.
Specific embodiment
The present invention is further elaborated below with reference to the drawings and an embodiment, which is not a limitation of the invention.
Embodiment:
As shown in Fig. 1, a speech enhancement method based on voiceprint comparison and a generative adversarial network comprises the following steps:
1) Establish three speech databases, corresponding respectively to a voiceprint recognition encoder, a noise separation system, and a speech separation system.
2) Train the voiceprint recognition encoder to extract the voiceprint features of the target speaker, obtaining the target voiceprint feature.
3) Convert the noisy audio into a spectrogram and feed it into the generator of the noise separation system; according to the target voiceprint feature extracted by the voiceprint recognition encoder, the generator separates out the target speaker's voice, obtaining a predicted clean audio.
4) Feed the predicted clean audio obtained in step 3) and the true clean audio in the speech separation system of step 1) into the discriminator of the noise separation system for training, so that the discriminator can tell whether the predicted spectrogram of the speaker's voice generated by the noise separation system matches the distribution of true audio.
5) Adjust the discriminator's weight parameters so that it better distinguishes true clean audio from the predicted clean audio produced by the generator, and update the generator's weight parameters according to the discriminator's decisions, until the discriminator can no longer tell the generator's predicted audio from true clean audio; this yields a generator that produces near-true clean audio.
6) Convert the speaker's voice, collected by a microphone, into a spectrogram via the short-time Fourier transform and feed it into the trained generator to generate a predicted clean spectrogram; then convert it into an analog speech signal via the inverse short-time Fourier transform, and play the analog speech signal through a loudspeaker to obtain the enhanced speech signal.
The voiceprint recognition encoder uses the 2000 NIST Speaker Recognition Evaluation corpus; the noise separation system uses the 100-nonspeech noise library; and the speech separation system uses the TIMIT corpus.
The 2000 NIST Speaker Recognition Evaluation corpus is one of the datasets most commonly used in the literature for voiceprint feature extraction, where it is usually referred to directly as "CALLHOME". It contains 500 conversations distributed over 6 languages: Arabic, English, German, Japanese, Mandarin, and Spanish.
The TIMIT corpus is an acoustic-phonetic continuous speech corpus collected jointly by Texas Instruments (TI), the Massachusetts Institute of Technology (MIT), and Stanford Research Institute (SRI). It contains 6300 sentences: 630 speakers from 8 major American dialect regions each read 10 given sentences. All sentences are manually segmented and labeled at the phone level, and the dataset is split 7:3 into a training set (70%) and a test set (30%).
The 100-nonspeech noise library consists of 100 non-human noise sounds collected by Guoning Hu's team.
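The 7:3 TIMIT split described above can be sketched with the standard library as follows. Note that this sentence-level random shuffle is only an illustration of the ratio; a real split would typically be made by speaker to avoid speaker leakage between train and test:

```python
import random

def split_7_3(sentences, seed=0):
    """Shuffle a corpus and split it 70/30 into train and test sets."""
    rng = random.Random(seed)
    items = list(sentences)
    rng.shuffle(items)
    cut = int(0.7 * len(items))
    return items[:cut], items[cut:]

# TIMIT contains 6300 sentences in total.
train, test = split_7_3(range(6300))
```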
The 2000 NIST Speaker Recognition Evaluation corpus serves as the first database, used to train the voiceprint recognition encoder so that it can reliably extract a speaker's voiceprint feature (d-vector). Second, triples of data are needed to train the entire noise separation system, with three inputs: (1) clean audio from the target speaker, (2) noisy audio, and (3) a reference audio from the target speaker. The clean audio is selected from the TIMIT corpus and deformed with noise at different signal-to-noise ratios (SNR) to form the noisy audio; finally, a reference audio is randomly selected from the target speaker's remaining clean audio to complete the triple. These triples form the second database.
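Deforming clean audio with noise at a chosen signal-to-noise ratio, as described for building the triples, can be sketched as follows (the 16 kHz signal lengths and random signals are placeholders; real data would come from TIMIT and the noise library):

```python
import numpy as np

def mix_at_snr(clean, noise, snr_db):
    """Scale the noise so that the clean/noise power ratio equals the target
    SNR in dB, producing the noisy member of a (clean, noisy, reference) triple."""
    noise = np.resize(noise, clean.shape)  # loop or trim noise to match length
    p_clean = np.mean(clean ** 2)
    p_noise = np.mean(noise ** 2)
    scale = np.sqrt(p_clean / (p_noise * 10 ** (snr_db / 10)))
    return clean + scale * noise

rng = np.random.default_rng(3)
clean = rng.standard_normal(16000)   # stand-in for a TIMIT utterance
noise = rng.standard_normal(8000)    # stand-in for a noise-library clip
noisy = mix_at_snr(clean, noise, snr_db=5.0)
```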
In step 2), the voiceprint recognition encoder extracts the voiceprint features of the target speaker as shown in Fig. 2, specifically: convert the audio signal into frames of width 25 ms and step 10 ms; filter each frame with a mel filterbank and extract a 40-dimensional energy spectrum from the result as the network input; build fixed-length sliding windows over these frames and run an LSTM network on each window; then take the LSTM's output at the last frame as the voiceprint feature (d-vector) of that sliding window.
As shown in Fig. 3, the generator consists of an 8-layer convolutional network, a 1-layer LSTM recurrent network, and a 2-layer fully connected network. Every layer uses the ReLU activation function except the last fully connected layer, which uses sigmoid. After the input signal's spectrogram passes through the convolutional layers, the voiceprint feature (d-vector) of the reference audio is concatenated frame by frame to the convolutional output, and the result is fed into the LSTM layer. Finally, the network outputs a mask with the same dimensions as the input spectrogram; multiplying the output mask with the input spectrogram yields the predicted clean audio spectrogram of the output audio.
As shown in Fig. 4, the discriminator consists of a 2-layer convolutional network and a 2-layer fully connected neural network. Every layer uses ReLU except the last fully connected layer, which uses sigmoid. The generator's predicted clean audio spectrogram X̂ is fed into the discriminator, then the true clean audio X from step 1) is fed in, and the discriminator network is trained: it judges the generator's predicted clean audio spectrogram X̂ to be fake data and gives it a low score (close to 0), and judges the true clean audio X from step 1) to be real data and gives it a high score (close to 1). In this way it learns the distributions of real and predicted data, so that in step 6) it can tell whether the predicted spectrogram of the speaker's voice generated by the noise separation system matches the distribution of true audio.
Adjusting the discriminator's weight parameters means passing its real/fake decisions back to the generator: the generator adjusts the parameters of its network model to correct its output spectrogram, moving it closer to the true distribution and eliminating the noise components the discriminator judged as fake, so that the predicted clean spectrogram X̂ the generator produces can "fool" the discriminator into judging it to be the spectrogram X of true clean audio obtained from the TIMIT corpus. During neural network backpropagation, the discriminator becomes better at distinguishing true clean audio from the generator's predicted clean audio, i.e. it better learns the features of true clean audio; likewise, the generator keeps adjusting its parameters against the continually updated discriminator, moving the predicted spectrogram it generates toward the spectrogram of the true clean audio.
The generator and discriminator confront each other in a mutual game, forming the generative adversarial network algorithm, whose formula is as follows:

min_G max_D V(D, G) = E_{x~data}[log D(x)] + E_{n~noise}[log(1 − D(G(n)))]

To address the vanishing-gradient problem faced by the classical method, the least-squares GAN is used in place of the cross-entropy loss, giving:

min_D V(D) = (1/2) E_{x~data}[(D(x) − 1)²] + (1/2) E_{n~noise}[D(G(n))²]
min_G V(G) = (1/2) E_{n~noise}[(D(G(n)) − 1)²]

In the formulas above, G denotes the generator and D the discriminator; V represents the loss value; data denotes the corpus of true clean audio in the speech separation system of step 1), and x a true clean speech audio drawn from data; noise denotes the noisy-audio corpus in the speech separation system of step 1), and n the noisy audio drawn from noise that corresponds to x; G(n) denotes the generator denoising the noisy speech to obtain the predicted clean audio X̂; D(G(n)) denotes the discriminator judging the predicted clean audio X̂ to be fake with a low score (close to 0), while the true clean audio X is judged real with a high score (close to 1).
Claims (7)
1. A speech enhancement method based on voiceprint comparison and a generative adversarial network, characterized by comprising the following steps:
1) establish three speech databases, corresponding respectively to a voiceprint recognition encoder, a noise separation system, and a speech separation system;
2) train the voiceprint recognition encoder to extract the voiceprint features of the target speaker, obtaining the target voiceprint feature;
3) convert the noisy audio into a spectrogram and feed it into the generator of the noise separation system; according to the target voiceprint feature extracted by the voiceprint recognition encoder, the generator separates out the target speaker's voice, obtaining a predicted clean audio;
4) feed the predicted clean audio obtained in step 3) and the true clean audio in the speech separation system of step 1) into the discriminator of the noise separation system for training, so that the discriminator can tell whether the predicted spectrogram of the speaker's voice generated by the noise separation system matches the distribution of true audio;
5) adjust the discriminator's weight parameters so that it better distinguishes true clean audio from the predicted clean audio produced by the generator, and update the generator's weight parameters according to the discriminator's decisions, until the discriminator can no longer tell the generator's predicted audio from true clean audio, yielding a generator that produces near-true clean audio;
6) convert the speaker's voice, collected by a microphone, into a spectrogram via the short-time Fourier transform and feed it into the trained generator to generate a predicted clean spectrogram; then convert it into an analog speech signal via the inverse short-time Fourier transform, and play the analog speech signal through a loudspeaker to obtain the enhanced speech signal.
2. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that the voiceprint recognition encoder uses the 2000 NIST Speaker Recognition Evaluation corpus; the noise separation system uses the 100-nonspeech noise library; and the speech separation system uses the TIMIT corpus.
3. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that, in step 2), the voiceprint recognition encoder extracts the voiceprint features of the target speaker as follows: convert the audio signal into frames of width 25 ms and step 10 ms; filter each frame with a mel filterbank and extract a 40-dimensional energy spectrum from the result as the network input; build fixed-length sliding windows over these frames and run an LSTM network on each window; then take the LSTM's output at the last frame as the voiceprint feature (d-vector) of that sliding window.
4. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that the generator consists of an 8-layer convolutional network, a 1-layer LSTM recurrent network, and a 2-layer fully connected network; every layer uses the ReLU activation function except the last fully connected layer, which uses sigmoid; after the input signal's spectrogram passes through the convolutional layers, the voiceprint feature (d-vector) of the reference audio is concatenated frame by frame to the convolutional output and the result is fed into the LSTM layer; finally, the network outputs a mask with the same dimensions as the input spectrogram, and multiplying the output mask with the input spectrogram yields the predicted clean audio spectrogram of the output audio.
5. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that the discriminator consists of a 2-layer convolutional network and a 2-layer fully connected neural network; every layer uses ReLU except the last fully connected layer, which uses sigmoid; the generator's predicted clean audio spectrogram X̂ is fed into the discriminator, then the true clean audio X from step 1) is fed in, and the discriminator network is trained: it judges the generator's predicted clean audio spectrogram X̂ to be fake data and gives it a low score (close to 0), and judges the true clean audio X from step 1) to be real data and gives it a high score (close to 1); in this way it learns the distributions of real and predicted data, so that in step 6) it can tell whether the predicted spectrogram of the speaker's voice generated by the noise separation system matches the distribution of true audio.
6. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that adjusting the discriminator's weight parameters means passing its real/fake decisions back to the generator: the generator adjusts the parameters of its network model to correct its output spectrogram, moving it closer to the true distribution and eliminating the noise components the discriminator judged as fake, so that the predicted clean spectrogram X̂ the generator produces can "fool" the discriminator into judging it to be the spectrogram X of true clean audio obtained from the TIMIT corpus; during neural network backpropagation, the discriminator becomes better at distinguishing true clean audio from the generator's predicted clean audio, i.e. it better learns the features of true clean audio; likewise, the generator keeps adjusting its parameters against the continually updated discriminator, moving the predicted spectrogram it generates toward the spectrogram of the true clean audio.
7. The speech enhancement method based on voiceprint comparison and a generative adversarial network according to claim 1, characterized in that the generator and discriminator confront each other in a mutual game, forming the generative adversarial network algorithm, whose formula is as follows:

min_G max_D V(D, G) = E_{x~data}[log D(x)] + E_{n~noise}[log(1 − D(G(n)))]

To address the vanishing-gradient problem faced by the classical method, the least-squares GAN is used in place of the cross-entropy loss, giving:

min_D V(D) = (1/2) E_{x~data}[(D(x) − 1)²] + (1/2) E_{n~noise}[D(G(n))²]
min_G V(G) = (1/2) E_{n~noise}[(D(G(n)) − 1)²]

In the formulas above, G denotes the generator and D the discriminator; V represents the loss value; data denotes the corpus of true clean audio in the speech separation system of step 1), and x a true clean speech audio drawn from data; noise denotes the noisy-audio corpus in the speech separation system of step 1), and n the noisy audio drawn from noise that corresponds to x; G(n) denotes the generator denoising the noisy speech to obtain the predicted clean audio X̂; D(G(n)) denotes the discriminator judging the predicted clean audio X̂ to be fake with a low score (close to 0), while the true clean audio X is judged real with a high score (close to 1).
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
| --- | --- | --- | --- |
| CN201811353760.5A | 2018-11-14 | 2018-11-14 | Voice enhancement method based on voiceprint comparison and generation of confrontation network |

Publications (2)

| Publication Number | Publication Date |
| --- | --- |
| CN109326302A | 2019-02-12 |
| CN109326302B | 2022-11-08 |

Family: ID=65257213; application CN201811353760.5A is Active, granted as CN109326302B (CN).
Cited By (36)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110164470A (en) * | 2019-06-12 | 2019-08-23 | 成都嗨翻屋科技有限公司 | Voice separation method, device, user terminal and storage medium |
CN110211591A (en) * | 2019-06-24 | 2019-09-06 | 卓尔智联(武汉)研究院有限公司 | Interview data analysing method, computer installation and medium based on emotional semantic classification |
CN110289004A (en) * | 2019-06-18 | 2019-09-27 | 暨南大学 | A kind of artificial synthesized vocal print detection system and method based on deep learning |
CN110619885A (en) * | 2019-08-15 | 2019-12-27 | 西北工业大学 | Method for generating confrontation network voice enhancement based on deep complete convolution neural network |
CN110619886A (en) * | 2019-10-11 | 2019-12-27 | 北京工商大学 | End-to-end voice enhancement method for low-resource Tujia language |
CN110675891A (en) * | 2019-09-25 | 2020-01-10 | 电子科技大学 | Voice separation method and module based on multilayer attention mechanism |
CN110718232A (en) * | 2019-09-23 | 2020-01-21 | 东南大学 | Speech enhancement method for generating countermeasure network based on two-dimensional spectrogram and condition |
CN110853663A (en) * | 2019-10-12 | 2020-02-28 | 平安科技(深圳)有限公司 | Speech enhancement method based on artificial intelligence, server and storage medium |
CN111128197A (en) * | 2019-12-25 | 2020-05-08 | 北京邮电大学 | Multi-speaker voice separation method based on voiceprint features and generative adversarial learning |
CN111243569A (en) * | 2020-02-24 | 2020-06-05 | 浙江工业大学 | Automatic emotional speech generation method and device based on a generative adversarial network |
CN111261147A (en) * | 2020-01-20 | 2020-06-09 | 浙江工业大学 | Music-embedding attack defense method for speech recognition systems |
CN111276132A (en) * | 2020-02-04 | 2020-06-12 | 北京声智科技有限公司 | Voice processing method, electronic equipment and computer readable storage medium |
CN111341304A (en) * | 2020-02-28 | 2020-06-26 | 广州国音智能科技有限公司 | Method, device and equipment for training speech characteristics of speaker based on GAN |
CN111524526A (en) * | 2020-05-14 | 2020-08-11 | 中国工商银行股份有限公司 | Voiceprint recognition method and device |
CN111785281A (en) * | 2020-06-17 | 2020-10-16 | 国家计算机网络与信息安全管理中心 | Voiceprint recognition method and system based on channel compensation |
CN111862989A (en) * | 2020-06-01 | 2020-10-30 | 北京捷通华声科技股份有限公司 | Acoustic feature processing method and device |
CN111883091A (en) * | 2020-07-09 | 2020-11-03 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio noise reduction method and training method of audio noise reduction model |
CN112216300A (en) * | 2020-09-25 | 2021-01-12 | 三一专用汽车有限责任公司 | Noise reduction method and device for sound in driving cab of mixer truck and mixer truck |
CN112259112A (en) * | 2020-09-28 | 2021-01-22 | 上海声瀚信息科技有限公司 | Echo cancellation method combining voiceprint recognition and deep learning |
CN112687275A (en) * | 2020-12-25 | 2021-04-20 | 北京中科深智科技有限公司 | Voice filtering method and filtering system |
CN112802491A (en) * | 2021-02-07 | 2021-05-14 | 武汉大学 | Speech enhancement method based on a time-frequency domain generative adversarial network |
CN112989108A (en) * | 2021-02-24 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Language detection method and device based on artificial intelligence and electronic equipment |
CN113035217A (en) * | 2021-03-01 | 2021-06-25 | 武汉大学 | Voice enhancement method based on voiceprint embedding under low signal-to-noise ratio condition |
WO2021203880A1 (en) * | 2020-04-10 | 2021-10-14 | 华为技术有限公司 | Speech enhancement method, neural network training method, and related device |
CN113571084A (en) * | 2021-07-08 | 2021-10-29 | 咪咕音乐有限公司 | Audio processing method, device, equipment and storage medium |
CN113707168A (en) * | 2021-09-03 | 2021-11-26 | 合肥讯飞数码科技有限公司 | Voice enhancement method, device, equipment and storage medium |
CN113724713A (en) * | 2021-09-07 | 2021-11-30 | 科大讯飞股份有限公司 | Voice recognition method, device, equipment and storage medium |
CN113823293A (en) * | 2021-09-28 | 2021-12-21 | 武汉理工大学 | Speaker recognition method and system based on voice enhancement |
WO2022077305A1 (en) * | 2020-10-15 | 2022-04-21 | Beijing Didi Infinity Technology And Development Co., Ltd. | Method and system for acoustic echo cancellation |
CN114609493A (en) * | 2022-05-09 | 2022-06-10 | 杭州兆华电子股份有限公司 | Partial discharge signal identification method with enhanced signal data |
JP2022536190A (en) * | 2020-04-28 | 2022-08-12 | 平安科技(深圳)有限公司 | Voiceprint Recognition Method, Apparatus, Equipment, and Storage Medium |
US20220358904A1 (en) * | 2019-03-20 | 2022-11-10 | Research Foundation Of The City University Of New York | Method for extracting speech from degraded signals by predicting the inputs to a speech vocoder |
US11514925B2 (en) * | 2020-04-30 | 2022-11-29 | Adobe Inc. | Using a predictive model to automatically enhance audio having various audio quality issues |
WO2023020500A1 (en) * | 2021-08-17 | 2023-02-23 | 中移(苏州)软件技术有限公司 | Speech separation method and apparatus, and storage medium |
WO2023102930A1 (en) * | 2021-12-10 | 2023-06-15 | 清华大学深圳国际研究生院 | Speech enhancement method, electronic device, program product, and storage medium |
CN116458894A (en) * | 2023-04-21 | 2023-07-21 | 山东省人工智能研究院 | Electrocardiogram signal enhancement and classification method based on a composite generative adversarial network |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1327976A1 (en) * | 2001-12-21 | 2003-07-16 | Cortologic AG | Method and system for recognition of speech in a noisy environment |
WO2017168870A1 (en) * | 2016-03-28 | 2017-10-05 | ソニー株式会社 | Information processing device and information processing method |
CN108074244A (en) * | 2017-09-07 | 2018-05-25 | 汉鼎宇佑互联网股份有限公司 | Safe-city traffic flow statistics method fusing deep learning and background subtraction |
CN108597496A (en) * | 2018-05-07 | 2018-09-28 | 广州势必可赢网络科技有限公司 | Speech generation method and device based on a generative adversarial network |
CN108682418A (en) * | 2018-06-26 | 2018-10-19 | 北京理工大学 | Speech recognition method based on pre-training and bidirectional LSTM |
2018-11-14: Application CN201811353760.5A filed in China; granted as CN109326302B (status: Active)
Cited By (55)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20220358904A1 (en) * | 2019-03-20 | 2022-11-10 | Research Foundation Of The City University Of New York | Method for extracting speech from degraded signals by predicting the inputs to a speech vocoder |
CN110164470A (en) * | 2019-06-12 | 2019-08-23 | 成都嗨翻屋科技有限公司 | Voice separation method, device, user terminal and storage medium |
CN110289004A (en) * | 2019-06-18 | 2019-09-27 | 暨南大学 | Artificially synthesized voiceprint detection system and method based on deep learning |
CN110289004B (en) * | 2019-06-18 | 2021-09-07 | 暨南大学 | Artificial synthesis voiceprint detection system and method based on deep learning |
CN110211591A (en) * | 2019-06-24 | 2019-09-06 | 卓尔智联(武汉)研究院有限公司 | Interview data analysing method, computer installation and medium based on emotional semantic classification |
CN110211591B (en) * | 2019-06-24 | 2021-12-21 | 卓尔智联(武汉)研究院有限公司 | Interview data analysis method based on emotion classification, computer device and medium |
CN110619885A (en) * | 2019-08-15 | 2019-12-27 | 西北工业大学 | Generative adversarial network speech enhancement method based on a deep fully convolutional neural network |
CN110619885B (en) * | 2019-08-15 | 2022-02-11 | 西北工业大学 | Generative adversarial network speech enhancement method based on a deep fully convolutional neural network |
CN110718232A (en) * | 2019-09-23 | 2020-01-21 | 东南大学 | Speech enhancement method based on a two-dimensional spectrogram and a conditional generative adversarial network |
CN110675891A (en) * | 2019-09-25 | 2020-01-10 | 电子科技大学 | Voice separation method and module based on multilayer attention mechanism |
CN110675891B (en) * | 2019-09-25 | 2020-09-18 | 电子科技大学 | Voice separation method and module based on multilayer attention mechanism |
CN110619886B (en) * | 2019-10-11 | 2022-03-22 | 北京工商大学 | End-to-end voice enhancement method for low-resource Tujia language |
CN110619886A (en) * | 2019-10-11 | 2019-12-27 | 北京工商大学 | End-to-end voice enhancement method for low-resource Tujia language |
CN110853663A (en) * | 2019-10-12 | 2020-02-28 | 平安科技(深圳)有限公司 | Speech enhancement method based on artificial intelligence, server and storage medium |
CN110853663B (en) * | 2019-10-12 | 2023-04-28 | 平安科技(深圳)有限公司 | Speech enhancement method based on artificial intelligence, server and storage medium |
WO2021068338A1 (en) * | 2019-10-12 | 2021-04-15 | 平安科技(深圳)有限公司 | Speech enhancement method based on artificial intelligence, server and storage medium |
CN111128197B (en) * | 2019-12-25 | 2022-05-13 | 北京邮电大学 | Multi-speaker voice separation method based on voiceprint features and generative adversarial learning |
CN111128197A (en) * | 2019-12-25 | 2020-05-08 | 北京邮电大学 | Multi-speaker voice separation method based on voiceprint features and generative adversarial learning |
CN111261147A (en) * | 2020-01-20 | 2020-06-09 | 浙江工业大学 | Music-embedding attack defense method for speech recognition systems |
CN111276132A (en) * | 2020-02-04 | 2020-06-12 | 北京声智科技有限公司 | Voice processing method, electronic equipment and computer readable storage medium |
CN111243569B (en) * | 2020-02-24 | 2022-03-08 | 浙江工业大学 | Automatic emotional speech generation method and device based on a generative adversarial network |
CN111243569A (en) * | 2020-02-24 | 2020-06-05 | 浙江工业大学 | Automatic emotional speech generation method and device based on a generative adversarial network |
CN111341304A (en) * | 2020-02-28 | 2020-06-26 | 广州国音智能科技有限公司 | Method, device and equipment for training speech characteristics of speaker based on GAN |
WO2021203880A1 (en) * | 2020-04-10 | 2021-10-14 | 华为技术有限公司 | Speech enhancement method, neural network training method, and related device |
JP2022536190A (en) * | 2020-04-28 | 2022-08-12 | 平安科技(深圳)有限公司 | Voiceprint Recognition Method, Apparatus, Equipment, and Storage Medium |
JP7184236B2 (en) | 2020-04-28 | 2022-12-06 | 平安科技(深圳)有限公司 | Voiceprint Recognition Method, Apparatus, Equipment, and Storage Medium |
US11514925B2 (en) * | 2020-04-30 | 2022-11-29 | Adobe Inc. | Using a predictive model to automatically enhance audio having various audio quality issues |
CN111524526A (en) * | 2020-05-14 | 2020-08-11 | 中国工商银行股份有限公司 | Voiceprint recognition method and device |
CN111524526B (en) * | 2020-05-14 | 2023-11-17 | 中国工商银行股份有限公司 | Voiceprint recognition method and voiceprint recognition device |
CN111862989B (en) * | 2020-06-01 | 2024-03-08 | 北京捷通华声科技股份有限公司 | Acoustic feature processing method and device |
CN111862989A (en) * | 2020-06-01 | 2020-10-30 | 北京捷通华声科技股份有限公司 | Acoustic feature processing method and device |
CN111785281A (en) * | 2020-06-17 | 2020-10-16 | 国家计算机网络与信息安全管理中心 | Voiceprint recognition method and system based on channel compensation |
CN111883091A (en) * | 2020-07-09 | 2020-11-03 | 腾讯音乐娱乐科技(深圳)有限公司 | Audio noise reduction method and training method of audio noise reduction model |
CN112216300A (en) * | 2020-09-25 | 2021-01-12 | 三一专用汽车有限责任公司 | Noise reduction method and device for sound in driving cab of mixer truck and mixer truck |
CN112259112A (en) * | 2020-09-28 | 2021-01-22 | 上海声瀚信息科技有限公司 | Echo cancellation method combining voiceprint recognition and deep learning |
CN115668366A (en) * | 2020-10-15 | 2023-01-31 | 北京嘀嘀无限科技发展有限公司 | Acoustic echo cancellation method and system |
WO2022077305A1 (en) * | 2020-10-15 | 2022-04-21 | Beijing Didi Infinity Technology And Development Co., Ltd. | Method and system for acoustic echo cancellation |
CN112687275A (en) * | 2020-12-25 | 2021-04-20 | 北京中科深智科技有限公司 | Voice filtering method and filtering system |
CN112802491B (en) * | 2021-02-07 | 2022-06-14 | 武汉大学 | Speech enhancement method based on a time-frequency domain generative adversarial network |
CN112802491A (en) * | 2021-02-07 | 2021-05-14 | 武汉大学 | Speech enhancement method based on a time-frequency domain generative adversarial network |
CN112989108A (en) * | 2021-02-24 | 2021-06-18 | 腾讯科技(深圳)有限公司 | Language detection method and device based on artificial intelligence and electronic equipment |
CN113035217A (en) * | 2021-03-01 | 2021-06-25 | 武汉大学 | Voice enhancement method based on voiceprint embedding under low signal-to-noise ratio condition |
CN113035217B (en) * | 2021-03-01 | 2023-11-10 | 武汉大学 | Voice enhancement method based on voiceprint embedding under low signal-to-noise ratio condition |
CN113571084A (en) * | 2021-07-08 | 2021-10-29 | 咪咕音乐有限公司 | Audio processing method, device, equipment and storage medium |
CN113571084B (en) * | 2021-07-08 | 2024-03-22 | 咪咕音乐有限公司 | Audio processing method, device, equipment and storage medium |
WO2023020500A1 (en) * | 2021-08-17 | 2023-02-23 | 中移(苏州)软件技术有限公司 | Speech separation method and apparatus, and storage medium |
CN113707168A (en) * | 2021-09-03 | 2021-11-26 | 合肥讯飞数码科技有限公司 | Voice enhancement method, device, equipment and storage medium |
CN113724713A (en) * | 2021-09-07 | 2021-11-30 | 科大讯飞股份有限公司 | Voice recognition method, device, equipment and storage medium |
CN113823293B (en) * | 2021-09-28 | 2024-04-26 | 武汉理工大学 | Speaker recognition method and system based on voice enhancement |
CN113823293A (en) * | 2021-09-28 | 2021-12-21 | 武汉理工大学 | Speaker recognition method and system based on voice enhancement |
WO2023102930A1 (en) * | 2021-12-10 | 2023-06-15 | 清华大学深圳国际研究生院 | Speech enhancement method, electronic device, program product, and storage medium |
CN114609493B (en) * | 2022-05-09 | 2022-08-12 | 杭州兆华电子股份有限公司 | Partial discharge signal identification method with enhanced signal data |
CN114609493A (en) * | 2022-05-09 | 2022-06-10 | 杭州兆华电子股份有限公司 | Partial discharge signal identification method with enhanced signal data |
CN116458894B (en) * | 2023-04-21 | 2024-01-26 | 山东省人工智能研究院 | Electrocardiogram signal enhancement and classification method based on a composite generative adversarial network |
CN116458894A (en) * | 2023-04-21 | 2023-07-21 | 山东省人工智能研究院 | Electrocardiogram signal enhancement and classification method based on a composite generative adversarial network |
Also Published As
Publication number | Publication date |
---|---|
CN109326302B (en) | 2022-11-08 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109326302A (en) | Speech enhancement method based on voiceprint comparison and generative adversarial network | |
CN104732978B (en) | Text-dependent speaker recognition method based on combined deep learning | |
CN105632501A (en) | Deep-learning-technology-based automatic accent classification method and apparatus | |
Hui et al. | Convolutional maxout neural networks for speech separation | |
Kinoshita et al. | Text-informed speech enhancement with deep neural networks. | |
Xu et al. | Global variance equalization for improving deep neural network based speech enhancement | |
CN108615533A (en) | High-performance speech enhancement method based on deep learning | |
CN108962229A (en) | Single-channel, unsupervised target speaker voice extraction method | |
CN110136709A (en) | Speech recognition method and video conferencing system based on speech recognition | |
CN112331218B (en) | Single-channel voice separation method and device for multiple speakers | |
CN111899757A (en) | Single-channel voice separation method and system for target speaker extraction | |
Do et al. | Speech source separation using variational autoencoder and bandpass filter | |
Soleymani et al. | Prosodic-enhanced siamese convolutional neural networks for cross-device text-independent speaker verification | |
CN111798875A (en) | VAD implementation method based on three-value quantization compression | |
CN106297769B (en) | Discriminative feature extraction method applied to language identification | |
Bhardwaj et al. | Deep neural network trained Punjabi children speech recognition system using Kaldi toolkit | |
CN105845143A (en) | Speaker confirmation method and speaker confirmation system based on support vector machine | |
CN110246518A (en) | Speech emotion recognition method, device, system and storage medium based on multi-granularity dynamic and static fusion features | |
Sekkate et al. | Speaker identification for OFDM-based aeronautical communication system | |
Gadasin et al. | Using Formants for Human Speech Recognition by Artificial Intelligence | |
Tan et al. | Denoised senone i-vectors for robust speaker verification | |
CN112466276A (en) | Speech synthesis system training method and device and readable storage medium | |
Le et al. | Personalized speech enhancement combining band-split rnn and speaker attentive module | |
Wang et al. | Robust speech recognition from ratio masks | |
CN111785262B (en) | Speaker age and gender classification method based on residual error network and fusion characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||