CN108922517A - Method, apparatus and storage medium for training a blind source separation model - Google Patents
- Publication number
- CN108922517A CN108922517A CN201810717811.1A CN201810717811A CN108922517A CN 108922517 A CN108922517 A CN 108922517A CN 201810717811 A CN201810717811 A CN 201810717811A CN 108922517 A CN108922517 A CN 108922517A
- Authority
- CN
- China
- Prior art keywords
- voice signal
- noise
- blind source
- source separation
- training
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
Abstract
An embodiment of the present invention provides a method, apparatus and storage medium for training a blind source separation model. The method for training the blind source separation model includes: determining a training speech signal by adding noise online according to a noise-addition control parameter, where the noise-addition control parameter is a parameter that satisfies a preset distribution and is used to control the noise; and training a convolutional neural network with the training speech signal to obtain the blind source separation model. Embodiments of the present invention can thereby obtain a blind source separation model with better performance, i.e. one that suppresses the background sound as strongly as possible while damaging the foreground voice as little as possible.
Description
Technical field
Embodiments of the present invention relate to speech recognition technology, and in particular to a method, apparatus and storage medium for training a blind source separation model.
Background art
In recent years, speech recognition technology has been applied more and more widely in fields such as industry, household appliances, communications, automotive electronics, medical treatment, home services, and consumer electronics. In a quiet environment, the accuracy of speech recognition can reach 97%, exceeding the human auditory system; in a noisy environment, however, its accuracy is still far below that of the human auditory system. The human auditory system can pick out the sound of interest in a noisy environment, a phenomenon known as the "cocktail party effect".
Technically, the "cocktail party effect" is described as blind source separation: separating the "foreground voice" of interest from the noisy "background sound" without any reference signal. Blind source separation is essentially a regression model, i.e. a blind source separation model. Existing blind source separation models are trained with offline noise addition: noise is added to the speech, and the result is saved to hard disk.

The blind source separation models obtained by the above prior-art training perform poorly, which manifests itself in three situations: 1. the background sound is not eliminated; 2. the foreground voice is eliminated as well; 3. the background sound is not eliminated cleanly and the foreground voice is damaged.
Summary of the invention
Embodiments of the present invention provide a method, apparatus and storage medium for training a blind source separation model, in order to obtain a blind source separation model with better performance, i.e. one that suppresses the background sound as strongly as possible while damaging the foreground voice as little as possible.
In a first aspect, an embodiment of the present invention provides a method for training a blind source separation model, including: determining a training speech signal by adding noise online according to a noise-addition control parameter, where the noise-addition control parameter is a parameter that satisfies a preset distribution and is used to control the noise; and training a convolutional neural network with the training speech signal to obtain the blind source separation model.
In a possible design, the noise-addition control parameter is the signal-to-noise ratio.
In a possible design, the preset distribution is a uniform distribution or a Gaussian distribution.
In a possible design, determining the training speech signal by adding noise online according to the noise-addition control parameter includes: obtaining the noise-addition control parameter, a speech signal and noise; calculating a mixing coefficient of the speech signal and the noise according to the noise-addition control parameter; and determining the training speech signal according to the mixing coefficient, the speech signal and the noise.
In a possible design, training the convolutional neural network with the training speech signal to obtain the blind source separation model includes:

performing framing processing on the training speech signal to obtain multiple frames of speech signal;

and training the convolutional neural network with the multiple frames of speech signal to obtain the blind source separation model.
In a possible design, training the convolutional neural network with the multiple frames of speech signal to obtain the blind source separation model includes:

for each frame of speech signal, extracting a feature value of the speech signal in any of the following ways:

Way one: extracting the amplitude spectrum of the speech signal;

Way two: extracting the Mel spectrum of the speech signal;

Way three: extracting the Mel-frequency cepstral coefficients (MFCC) of the speech signal;

and taking the feature value corresponding to the speech signal as the input parameter of the convolutional neural network, and obtaining the blind source separation model by controlling the mean square error of the convolutional neural network.
In a second aspect, an embodiment of the present invention provides an apparatus for training a blind source separation model, including: a determining module, configured to determine a training speech signal by adding noise online according to a noise-addition control parameter, where the noise-addition control parameter is a parameter that satisfies a preset distribution and is used to control the noise; and a processing module, configured to train a convolutional neural network with the training speech signal to obtain the blind source separation model.
In a possible design, the noise-addition control parameter is the signal-to-noise ratio.
In a possible design, the preset distribution is a uniform distribution or a Gaussian distribution.
In a possible design, the determining module is specifically configured to:

obtain the noise-addition control parameter, a speech signal and noise;

calculate a mixing coefficient of the speech signal and the noise according to the noise-addition control parameter;

and determine the training speech signal according to the mixing coefficient, the speech signal and the noise.
In a possible design, the processing module includes:

a framing unit, configured to perform framing processing on the training speech signal to obtain multiple frames of speech signal;

and a training unit, configured to train the convolutional neural network with the multiple frames of speech signal to obtain the blind source separation model.
In a possible design, the training unit is specifically configured to:

for each frame of speech signal, extract a feature value of the speech signal in any of the following ways:

Way one: extract the amplitude spectrum of the speech signal;

Way two: extract the Mel spectrum of the speech signal;

Way three: extract the Mel-frequency cepstral coefficients (MFCC) of the speech signal;

and take the feature value corresponding to the speech signal as the input parameter of the convolutional neural network, and obtain the blind source separation model by controlling the mean square error of the convolutional neural network.
In a third aspect, an embodiment of the present invention provides an apparatus for training a blind source separation model, including a processor and a memory; the memory stores computer-executable instructions, and the processor executes the computer-executable instructions stored in the memory, so that the processor performs the method for training a blind source separation model according to any item of the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the method for training a blind source separation model according to any item of the first aspect.
With the method, apparatus and storage medium for training a blind source separation model provided by the embodiments of the present invention, a training speech signal is determined by adding noise online according to a noise-addition control parameter, where the noise-addition control parameter is a parameter that satisfies a preset distribution and is used to control the noise; a convolutional neural network is then trained with the training speech signal to obtain the blind source separation model. Because the noise-addition control parameter satisfies a preset distribution, compared with the prior art the embodiments of the present invention increase the quantity and type of noise; and because the noise is added online, the blind source separation model is easy to adjust.
Brief description of the drawings

In order to explain the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description show some embodiments of the present invention, and those of ordinary skill in the art can obtain other drawings from them without any creative effort.

Fig. 1 is a flowchart of a method for training a blind source separation model provided by an embodiment of the present invention;

Fig. 2 is a flowchart of a method for training a blind source separation model provided by another embodiment of the present invention;

Fig. 3 is a network architecture diagram for training a blind source separation model provided by an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of an apparatus for training a blind source separation model provided by an embodiment of the present invention;

Fig. 5 is a schematic structural diagram of an apparatus for training a blind source separation model provided by another embodiment of the present invention.
Specific embodiments

Exemplary embodiments are described in detail here, with examples illustrated in the accompanying drawings. When the following description refers to the drawings, unless otherwise indicated, the same numbers in different drawings denote the same or similar elements. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatuses and methods consistent with some aspects of the present invention as detailed in the appended claims.
The terms "comprise" and "have" and any variations thereof in the description and claims of this specification are intended to cover a non-exclusive inclusion. For example, a process, method, system, product or device comprising a series of steps or units is not limited to the listed steps or units, but optionally further comprises steps or units not listed, or optionally further comprises other steps or units inherent to the process, method, product or device.
"First", "second" and the like in the embodiments of the present invention serve only as labels, and should not be understood as indicating or implying an order, relative importance, or an implicit indication of the quantity of the technical features indicated. "Multiple" means two or more. "And/or" describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" can mean: A alone, both A and B, or B alone. The character "/" generally indicates an "or" relationship between the objects before and after it.
"One embodiment" or "an embodiment" mentioned throughout the specification means that a particular feature, structure or characteristic related to the embodiment is included in at least one embodiment of the present application. Therefore, "in one embodiment" or "in an embodiment" appearing throughout the specification does not necessarily refer to the same embodiment. It should be noted that, in the absence of conflict, the embodiments of the present invention and the features in the embodiments can be combined with each other.
The inventors found that blind source separation is necessary in the following respects:

On the one hand, blind source separation can extract the voice of a target speaker from an audio signal in which more than one speaker is talking at the same time. For example, the television in the living room is playing the news broadcast while the user wants to interact by voice with the smart speaker on the tea table. The smart speaker receives the user's voice request and, at the same time, the announcer's broadcast in the news program. That is, two people, the user and the news announcer, are speaking at the same moment, and the smart speaker needs to extract the user's voice from the audio signal in which both are speaking simultaneously.

On the other hand, blind source separation can separate speech from ambient noise. A typical example is speech recognition in a vehicle. While driving, the microphone of the in-car unit or the mobile phone receives various ambient noises: wind noise, road noise, horns and so on. Blind source separation can suppress these ambient noises and send only the enhanced speech to the speech recognition system.
The above examples are ideal situations. Blind source separation itself is a regression model; if the blind source separation model performs poorly, the following three situations occur:

1. The background sound is not eliminated. That is, the denoising effect is poor and the model's ability to suppress noise is low.

2. The foreground voice is eliminated as well. That is, the model eliminates not only the noise but also the speech.

3. The background sound is not eliminated cleanly and the foreground voice is damaged. This situation is the most common: at certain time-frequency points the noise is retained, while at other time-frequency points the speech is eliminated.

Therefore, the two most crucial abilities of blind source separation are noise suppression and not damaging the speech. A good blind source separation model should suppress the background sound as strongly as possible while damaging the foreground voice as little as possible.
Current blind source separation models are trained with offline noise addition: noise is added to the speech, which is then stored on hard disk. This approach has at least the following two major limitations:

First, the types and quantity of noise are limited.

Second, the noise-addition scheme is inflexible, and adjusting the blind source separation model is inefficient.

However, we want the blind source separation model to generalize to various noises, with noise-addition implementations that can vary according to the environment.
To address the above problems, embodiments of the present invention provide a method, apparatus and storage medium for training a blind source separation model. By designing the noise-addition control parameter as a parameter that satisfies a preset distribution and is used to control the noise, the types and quantity of noise are increased; and by adding the noise online, the flexibility of the noise-addition scheme is improved, so that the blind source separation model is easy to adjust.
Embodiments of the present invention can be applied to electronic devices with voice interaction functions, such as smart speakers, DuerOS devices, smart televisions and smart refrigerators; the method for training a blind source separation model is widely applicable.
Detailed embodiments are used below to illustrate how embodiments of the present invention realize the online training of a blind source separation model.
Fig. 1 is a flowchart of a method for training a blind source separation model provided by an embodiment of the present invention. This embodiment provides a method for training a blind source separation model, whose executing subject can be an apparatus for training a blind source separation model; the apparatus can be realized in software and/or hardware.
Specifically, the apparatus for training a blind source separation model can include, but is not limited to, at least one of the following: a user device, a network device, etc. The user device can include, but is not limited to, a computer, a smartphone, a personal digital assistant (PDA) and the electronic devices mentioned above. The network device can include, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing: a super virtual computer composed of a group of loosely coupled computers. This embodiment does not limit this.
As shown in Fig. 1, the method for training a blind source separation model includes:

S101: determining a training speech signal by adding noise online according to a noise-addition control parameter.
Here the noise-addition control parameter is a parameter that satisfies a preset distribution and is used to control the noise. The preset distribution refers to a parameter distribution, for example a normal distribution, a uniform distribution, an exponential distribution, etc. The role of the noise-addition control parameter is to control the noise in the training speech signal, for example the proportion of the noise or the gain of the noise.

For example, let the training speech signal be denoted x, the actual speech s, the noise n, and the noise gain a; then x = s + a × n. When the noise-addition control parameter is used to control the gain of the noise in the training speech signal, it is the a here.
Alternatively, the noise-addition control parameter controls the proportion of noise in the training speech signal, i.e. the signal-to-noise ratio. According to the calculation formula of the signal-to-noise ratio (Signal-to-Noise Ratio, SNR), the a that satisfies the current SNR value can be calculated by the following formula:

a = std(s) / (std(n) × 10^(snr/20))

In the above formula, std(·) denotes the standard deviation: std(n) is the standard deviation of the noise and std(s) is the standard deviation of the actual speech; snr denotes the current SNR value.
S102: training a convolutional neural network with the training speech signal to obtain the blind source separation model.

The convolutional neural network (CNN) may include multiple stacked convolutional layers. Illustratively, the bottom convolutional layer of the network includes 257 nodes (i.e. neurons).

In practical applications, the sigmoid function can be used as the activation function of the convolutional neural network and the mean square error (MSE) as its cost function, but embodiments of the present invention are not limited to this.
It should be noted that the length of the training speech signal in embodiments of the present invention can be set according to actual needs. Considering the real-time requirements of speech recognition, the training speech signal should not be set too long; for example, its length can be set to 10 milliseconds. Alternatively, the training speech signal of S101 is framed to obtain training speech signals of a preset length, the preset length being less than the length of the training speech signal in S101.

In this step, a series of preprocessing operations, such as the framing mentioned above, can be performed on the training speech signal determined in the previous step; the specific preprocessing is not limited here, and examples can be found in the subsequent embodiments.
It should be noted that the test phase is similar to that of any fitting problem based on machine learning and is not described again here. Note, however, that the test phase does not require additional noise addition: the speech signal input to the convolutional neural network already carries noise. Therefore, the structure of the convolutional neural network in the test phase is a sub-network of the network structure of the training phase.
In this embodiment of the present invention, a training speech signal is determined by adding noise online according to a noise-addition control parameter, where the noise-addition control parameter is a parameter that satisfies a preset distribution and is used to control the noise; a convolutional neural network is trained with the training speech signal to obtain the blind source separation model. Because the noise-addition control parameter satisfies a preset distribution, compared with the prior art this increases the quantity and type of noise; and because the noise is added online, the blind source separation model is easy to adjust.
The noise-addition control parameter mentioned in the above embodiment can specifically be the signal-to-noise ratio, and the preset distribution can be a uniform distribution, a Gaussian distribution, etc. Therefore, the cases where the signal-to-noise ratio satisfies a uniform distribution or a Gaussian distribution are further explained here.

A uniform distribution is an even distribution between two values, for example a uniform distribution between 5 dB and 30 dB. The benefit of adding noise with a signal-to-noise ratio drawn from such a preset distribution is that it is direct. The calculation formula of the signal-to-noise ratio is:

SNR = randm(A, B)

Here A is the lower limit, 5 dB in this example; B is the upper limit, 30 dB in this example; and randm is the uniform-distribution function.
A Gaussian distribution is also known as a normal distribution. For example, the signal-to-noise ratio follows a Gaussian distribution with expectation 15 dB and variance 5. The calculation formula of the signal-to-noise ratio is:

SNR = gauss(C, D)

Here C is the expected value, 15 in this example; D is the variance, 5 in this example; and gauss is the Gaussian-distribution function.
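Both samplers can be sketched with the Python standard library. The function names randm and gauss follow the formulas above, and the defaults mirror the 5-30 dB and 15 dB / variance-5 examples (note random.gauss takes a standard deviation, hence the square root):

```python
import random

def randm(a_db=5.0, b_db=30.0):
    """SNR = randm(A, B): uniform between lower limit A and upper limit B (dB)."""
    return random.uniform(a_db, b_db)

def gauss(c_db=15.0, d_var=5.0):
    """SNR = gauss(C, D): Gaussian with expectation C dB and variance D."""
    return random.gauss(c_db, d_var ** 0.5)

random.seed(0)
uniform_snrs = [randm() for _ in range(1000)]    # each in [5, 30] dB
gaussian_snrs = [gauss() for _ in range(1000)]   # centered near 15 dB
```

Drawing a fresh SNR per training utterance is what makes each online noise-addition pass produce a different mixture from the same speech/noise files.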
In some embodiments, S101, determining the training speech signal by adding noise online according to the noise-addition control parameter, may include: obtaining the noise-addition control parameter, a speech signal and noise; calculating a mixing coefficient of the speech signal and the noise according to the noise-addition control parameter; and determining the training speech signal according to the mixing coefficient, the speech signal and the noise.
Illustratively, let the training speech signal be denoted x, the speech signal (i.e. the actual speech) s, the noise n, and the noise gain a; then x = s + a × n. Here the noise-addition control parameter is the signal-to-noise ratio, used to control the gain a of the noise in the training speech signal, i.e. the mixing coefficient.

It can be understood that, when adding noise online, the electronic device reads three pieces of data: the speech signal, the noise, and the noise-addition control parameter (here, the signal-to-noise ratio). The mixing coefficient a of the speech signal and the noise is then calculated from the signal-to-noise ratio, and the training speech signal is obtained according to the relation x = s + a × n.

The speech signal and the noise can be of float type, i.e. with values between -1 and 1; alternatively, they can be of int type, with values between -32767 and 32767 (16-bit quantization), etc.
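Putting the three inputs together, the online noise-addition step might look as follows. The gain formula and the random choice of a noise segment are assumptions; the source only fixes the relation x = s + a × n:

```python
import numpy as np

def add_noise_online(s, n, snr_db, rng):
    """Online noise addition: read speech s, noise n and the SNR, then
    return x = s + a * n with a computed to attain snr_db.

    s and n are float arrays in [-1, 1]; int16 audio would first be
    divided by 32767. Taking a random noise segment is an assumed detail.
    """
    if len(n) > len(s):                          # align the two lengths
        start = int(rng.integers(0, len(n) - len(s) + 1))
        n = n[start:start + len(s)]
    a = np.std(s) / (np.std(n) * 10.0 ** (snr_db / 20.0))
    return s + a * n

rng = np.random.default_rng(1)
speech = 0.1 * np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)
noise = 0.05 * rng.standard_normal(32000)
x = add_noise_online(speech, noise, snr_db=20.0, rng=rng)
residual_db = 20.0 * np.log10(np.std(speech) / np.std(x - speech))
```

Because the SNR is sampled and the mixture is recomputed at read time, nothing noisy ever needs to be written to hard disk, which is the flexibility argument made above.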
Fig. 2 is a flowchart of a method for training a blind source separation model provided by another embodiment of the present invention. As shown in Fig. 2, on the basis of the flow shown in Fig. 1, S102, training the convolutional neural network with the training speech signal to obtain the blind source separation model, may include:
S201, sub-frame processing is carried out to training voice signal, obtains multiframe voice signal.
The step corresponds to framing layer, and main function is to carry out sub-frame processing to training voice signal, obtains one by one
Voice signal.
Wherein, voice signal is either continuously, can also take 10~30ms using overlapping framing, general frame length.Before
The overlapped portion of one frame voice signal and a later frame voice signal is known as frame shifting, and the ratio between frame shifting and frame length are generally taken as 0~1/2.
For example, it is 25ms that frame length, which can be set, it is 10ms that frame, which moves,.
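A minimal sketch of the framing layer under the example settings above; the 16 kHz sampling rate is an assumption, not stated in this passage:

```python
import numpy as np

def frame_signal(x, sr=16000, frame_ms=25, shift_ms=10):
    """Split a 1-D signal into overlapping frames
    (25 ms frame length, 10 ms frame shift, as in the example above)."""
    frame_len = int(sr * frame_ms / 1000)     # 400 samples at 16 kHz
    shift = int(sr * shift_ms / 1000)         # 160 samples
    n_frames = 1 + (len(x) - frame_len) // shift
    # index matrix: row i selects samples [i*shift, i*shift + frame_len)
    idx = np.arange(frame_len)[None, :] + shift * np.arange(n_frames)[:, None]
    return x[idx]                             # shape: (n_frames, frame_len)

x = np.arange(16000, dtype=float)             # 1 s dummy training speech signal
frames = frame_signal(x)
```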
Optionally, windowing is applied to each frame of the speech signal, i.e., the frame is multiplied by a window function w(n) to form a windowed speech signal. The purpose of windowing is to reduce the spectral leakage introduced by framing. Framing abruptly truncates the training speech signal, which is equivalent to convolving the spectrum of the training speech signal with the spectrum of the window. When the sidelobes of the window spectrum are high, the spectrum of the training speech signal develops a "smear", i.e., spectral leakage. For this reason, a Hamming window may be used: its low sidelobes effectively suppress the leakage phenomenon, and its smoother low-pass characteristic yields a smoother spectrum.
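Applying the window is simply a per-sample multiplication of the frame with w(n). A minimal sketch using the Hamming window mentioned above; the 400-sample frame length (25 ms at an assumed 16 kHz) is illustrative:

```python
import numpy as np

frame_len = 400                    # one 25 ms frame at an assumed 16 kHz rate
w = np.hamming(frame_len)          # Hamming window: w(n) = 0.54 - 0.46*cos(2*pi*n/(N-1))
frame = np.ones(frame_len)         # dummy speech frame (all ones makes the taper visible)
windowed = frame * w               # windowed speech signal
```

The taper forces the frame toward zero at both edges (w(0) = 0.08 rather than exactly 0), which is what lowers the sidelobes relative to the implicit rectangular window of plain truncation.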
S202: training a convolutional neural network with the multiple frames of speech signal to obtain the blind source separation model.
Optionally, this step includes processing each frame of speech signal as follows: extracting a feature value of the speech signal; feeding the feature value of the speech signal into the convolutional neural network as an input parameter; and obtaining the blind source separation model by controlling the mean-square error of the convolutional neural network.
Since the convolutional neural network takes acoustic features as input, a feature value is extracted for each frame of speech signal after framing.
Optionally, extracting the feature value of the speech signal in this step may be done in at least any one of the following implementations:
First implementation: extracting the amplitude spectrum of the speech signal.
Specifically, extracting the amplitude spectrum may include: applying a discrete Fourier transform to the speech signal, for example processing it with the fast Fourier transform (FFT) algorithm; then taking the magnitude (absolute value) of the discrete Fourier transform to obtain the amplitude spectrum of the speech signal.
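The first implementation can be sketched as follows; the 1 kHz test tone, 16 kHz rate, and 400-sample frame are illustrative assumptions:

```python
import numpy as np

sr = 16000
frame = np.sin(2 * np.pi * 1000.0 * np.arange(400) / sr)  # 25 ms frame of a 1 kHz tone
spec = np.fft.rfft(frame)             # discrete Fourier transform via FFT (real input)
mag = np.abs(spec)                    # amplitude spectrum: absolute value of the DFT
peak_bin = int(np.argmax(mag))        # strongest frequency bin
peak_hz = peak_bin * sr / len(frame)  # bin index converted back to Hz
```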
Second implementation: extracting the Mel spectrum (Mel filter banks) of the speech signal.
Third implementation: extracting the Mel-frequency cepstral coefficients (MFCC) of the speech signal.
The overall extraction processes of the Mel spectrum and the MFCC are similar; the MFCC only adds one more step, a DCT (discrete cosine transform). The overall extraction process may include the following steps:
1) Pre-emphasis, whose purpose is to counteract the effect of the vocal cords and lips during phonation, compensating the high-frequency portion of the speech signal suppressed by the articulatory system and highlighting the high-frequency formants. This step is optional.
2) Short-Time Fourier Transform (STFT), which yields the spectral feature vector; the energy (amplitude) spectrum is converted into a power spectrum (by squaring).
3) Mel filtering: the power spectrum is filtered by a Mel filter bank to obtain a spectrum that matches human auditory perception; finally, the logarithm is usually taken to convert the units to dB.
4) DCT: the discrete cosine transform yields the cepstral coefficients, i.e., the MFCC.
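The four steps above can be sketched end to end. The filter count (26), FFT size (512), cepstral order (13), and pre-emphasis coefficient (0.97) are conventional values assumed for illustration, not taken from this passage:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale (step 3)."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        fb[i - 1, l:c] = (np.arange(l, c) - l) / max(c - l, 1)   # rising edge
        fb[i - 1, c:r] = (r - np.arange(c, r)) / max(r - c, 1)   # falling edge
    return fb

def dct2_ortho(x):
    """Orthonormal DCT-II: the one extra step MFCC adds over the Mel spectrum."""
    N = x.size
    n = np.arange(N)
    basis = np.cos(np.pi * np.outer(np.arange(N), 2 * n + 1) / (2.0 * N))
    c = 2.0 * (basis @ x)
    c[0] *= np.sqrt(1.0 / (4.0 * N))
    c[1:] *= np.sqrt(1.0 / (2.0 * N))
    return c

def mfcc(frame, sr=16000, n_fft=512, n_filters=26, n_ceps=13, alpha=0.97):
    emph = np.append(frame[0], frame[1:] - alpha * frame[:-1])   # 1) pre-emphasis
    power = np.abs(np.fft.rfft(emph, n_fft)) ** 2                # 2) power spectrum
    log_mel = np.log(mel_filterbank(n_filters, n_fft, sr) @ power + 1e-10)  # 3) log-Mel
    return dct2_ortho(log_mel)[:n_ceps]                          # 4) DCT -> MFCC

frame = np.sin(2 * np.pi * 440.0 * np.arange(400) / 16000.0)     # one 25 ms test frame
ceps = mfcc(frame)
```

Stopping after step 3 yields the log-Mel spectrum (second implementation); the final `dct2_ortho` call yields the MFCC (third implementation).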
Later, in the test phase, a test speech signal is tested using the blind source separation model obtained in the above embodiments: the feature values of the test speech signal are combined with the blind source separation model by a dot (elementwise) product to obtain the test result corresponding to the test speech signal. Specifically, the feature value of each frame of the test speech signal is multiplied, frame by frame, with the blind source separation model:
y = h .* x
where .* denotes the dot (elementwise) product, x denotes the feature value, y denotes the test result corresponding to the test speech signal, and h denotes the blind source separation model obtained by training.
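The `.*` operation is an elementwise (Hadamard) product of the per-frame mask values h produced by the model with the feature values x. A minimal numpy sketch with hypothetical shapes (98 frames by 257 frequency bins):

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.random((98, 257))      # feature values: 98 frames x 257 frequency bins
h = rng.random((98, 257))      # mask values output by the trained model, in [0, 1)
y = h * x                      # y = h .* x, applied frame by frame
```

Because h lies in [0, 1), the product attenuates each time-frequency cell of x rather than amplifying it, which is the usual behavior of a ratio-mask-style separator.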
In one implementation, the network architecture for obtaining the above blind source separation model is shown in Fig. 3. Referring to Fig. 3, the architecture may include: a noise-addition layer and a feature extraction layer. The functions of the feature extraction layer may include: framing, feature-value extraction, and computation of the ideal ratio mask value.
The network architecture of the test phase may include a feature extraction layer and a test layer. Here, the functions of the feature extraction layer include at least: framing (frame) and the use of feature values and ideal ratio mask values. Illustratively, the test-phase network architecture is, from top to bottom: frame (framing), fft (fast Fourier transform), log (logarithm), conv1d64 (convolution), bn, relu, conv1d64, bn, relu, conv1d64, bn, relu, conv1d64, bn, relu, conv1d64, bn, relu, linear, bn, and relu. Each part may refer to the related art and is not described here again.
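One conv1d64 → bn → relu block from the stack above can be sketched in plain numpy to make the data flow concrete; the kernel size (3) and the 257-bin input width are assumptions, since the figure does not give them:

```python
import numpy as np

def conv1d(x, kernels, bias):
    """Valid 1-D convolution: x is (C_in, T), kernels is (C_out, C_in, K)."""
    c_out, c_in, k = kernels.shape
    t_out = x.shape[1] - k + 1
    y = np.zeros((c_out, t_out))
    for o in range(c_out):
        for i in range(c_in):
            for t in range(t_out):
                y[o, t] += np.dot(kernels[o, i], x[i, t:t + k])
        y[o] += bias[o]
    return y

def batch_norm(y, eps=1e-5):
    """Normalize each output channel to zero mean / unit variance (bn)."""
    return (y - y.mean(axis=1, keepdims=True)) / np.sqrt(y.var(axis=1, keepdims=True) + eps)

def relu(y):
    return np.maximum(y, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((1, 257))            # one log-spectrum frame (hypothetical size)
w = rng.standard_normal((64, 1, 3))          # conv1d64: 64 filters, kernel width 3 (assumed)
out = relu(batch_norm(conv1d(x, w, np.zeros(64))))   # one conv1d64 -> bn -> relu block
```

In the figure's architecture, five such blocks are chained, followed by a linear layer with bn and relu.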
Fig. 4 is a schematic structural diagram of an apparatus for training a blind source separation model provided by an embodiment of the present invention. An embodiment of the present invention provides an apparatus for training a blind source separation model; the apparatus may be implemented in software and/or hardware.
Specifically, the apparatus for training a blind source separation model may include, but is not limited to, at least one of the following: a user device or a network device. The user device may include, but is not limited to, a computer, a smartphone, a PDA, and the electronic devices mentioned above. The network device may include, but is not limited to, a single network server, a server group composed of multiple network servers, or a cloud composed of a large number of computers or network servers based on cloud computing, where cloud computing is a kind of distributed computing in which a group of loosely coupled computers forms one super virtual computer. The present embodiment is not limited in this respect.
As shown in Fig. 4, the apparatus 30 for training a blind source separation model includes: a determining module 31 and a processing module 32.
The determining module 31 is configured to determine a training speech signal by adding noise online according to a noise-addition control parameter, where the noise-addition control parameter is a parameter for controlling noise that satisfies a preset distribution.
The processing module 32 is configured to train a convolutional neural network with the training speech signal to obtain a blind source separation model.
The apparatus for training a blind source separation model provided in this embodiment may be used to perform the above method embodiments; its implementation and technical effects are similar and are not repeated here.
Illustratively, the noise-addition control parameter may be a signal-to-noise ratio.
Optionally, the preset distribution may be a uniform distribution, a Gaussian distribution, or the like.
In some embodiments, the determining module 31 may be specifically configured to: obtain the noise-addition control parameter, a speech signal, and noise; calculate a mixing coefficient of the speech signal and the noise according to the noise-addition control parameter; and determine the training speech signal according to the mixing coefficient, the speech signal, and the noise.
Further, the processing module 32 may include: a framing unit (not shown) and a training unit (not shown).
The framing unit is configured to perform framing processing on the training speech signal to obtain multiple frames of speech signal.
The training unit is configured to train the convolutional neural network with the multiple frames of speech signal to obtain the blind source separation model.
Optionally, the training unit may be specifically configured to:
for each frame of speech signal, extract the feature value of the speech signal in any one of the following ways:
Way one: extracting the amplitude spectrum of the speech signal;
Way two: extracting the Mel spectrum of the speech signal;
Way three: extracting the MFCC of the speech signal;
and feed the feature value corresponding to the speech signal into the convolutional neural network as an input parameter, obtaining the blind source separation model by controlling the mean-square error of the convolutional neural network.
In the above embodiments, a training speech signal is determined by adding noise online according to a noise-addition control parameter, where the noise-addition control parameter is a parameter for controlling noise that satisfies a preset distribution; a convolutional neural network is then trained with the training speech signal to obtain a blind source separation model. Because the noise-addition control parameter is a parameter for controlling noise that satisfies a preset distribution, compared with the prior art the embodiments of the present invention increase the amount and variety of noise by making the noise-addition control parameter satisfy the preset distribution; and adding noise online makes the blind source separation model easy to adjust.
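"Controlling the mean-square error" amounts to minimizing an MSE loss by gradient descent. The toy sketch below learns a single ratio-mask vector rather than the convolutional network described above, purely to illustrate the objective; all shapes and data are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.random((500, 257)) + 0.1      # noisy-feature frames (hypothetical data)
mask = rng.random(257)                # "true" per-bin ratio mask to be recovered
Y = mask * X                          # clean-feature targets implied by that mask

h = np.zeros(257)                     # model output (here: one learnable mask vector)
lr = 0.05
for _ in range(200):                  # minimize the mean-square error by gradient descent
    err = h * X - Y                   # per-frame prediction error
    h -= lr * 2.0 * np.mean(err * X, axis=0)   # gradient of per-bin MSE w.r.t. h
mse = np.mean((h * X - Y) ** 2)
```

In the actual method, h would be the convolutional network's per-frame output and the same MSE objective would be driven down over the noise-augmented training set.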
Fig. 5 is a schematic structural diagram of an apparatus for training a blind source separation model provided by another embodiment of the present invention. As shown in Fig. 5, the apparatus 40 for training a blind source separation model includes:
a processor 41 and a memory 42;
the memory 42 stores computer-executable instructions;
the processor 41 executes the computer-executable instructions stored in the memory 42, so that the processor 41 performs the method of training a blind source separation model described above.
For the specific implementation of the processor 41, reference may be made to the above method embodiments; the implementation principles and technical effects are similar and are not repeated here.
Optionally, the apparatus 40 for training a blind source separation model further includes a communication component 43. The processor 41, the memory 42, and the communication component 43 may be connected by a bus 44.
An embodiment of the present invention also provides a computer-readable storage medium in which computer-executable instructions are stored; when the computer-executable instructions are executed by a processor, the method of training a blind source separation model described above is implemented.
In the above embodiments, it should be understood that the disclosed apparatus and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a logical functional division, and there may be other divisions in actual implementation; for example, multiple modules may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be implemented through interfaces; the indirect couplings or communication connections between apparatuses or modules may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and components shown as modules may or may not be physical units; they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present invention may be integrated into one processing unit, or each module may exist physically alone, or two or more modules may be integrated into one unit. The above integrated modules may be implemented in the form of hardware, or in the form of hardware plus software functional units.
The above integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to perform some of the steps of the methods of the embodiments of the present application.
It should be understood that the above processor may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the methods disclosed in the present invention may be carried out directly by a hardware processor, or by a combination of hardware and software modules in a processor.
The memory may include high-speed RAM, and may also include non-volatile memory (NVM), for example at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, etc.
The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc. The bus may be divided into an address bus, a data bus, a control bus, etc. For convenience of representation, the bus in the drawings of the present application is not limited to only one bus or one type of bus.
The above storage medium may be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as a static random-access memory (SRAM), an electrically erasable programmable read-only memory (EEPROM), an erasable programmable read-only memory (EPROM), a programmable read-only memory (PROM), a read-only memory (ROM), a magnetic memory, a flash memory, a magnetic disk, or an optical disc. The storage medium may be any usable medium accessible by a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may be located in an application-specific integrated circuit (ASIC). Of course, the processor and the storage medium may also exist as discrete components in a terminal or server.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above method embodiments may be carried out by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when the program is executed, the steps of the above method embodiments are performed. The aforementioned storage medium includes various media that can store program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present invention, not to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those skilled in the art should understand that it is still possible to modify the technical solutions described in the foregoing embodiments, or to make equivalent replacements of some or all of the technical features; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (14)
1. A method of training a blind source separation model, characterized by comprising:
determining a training speech signal by adding noise online according to a noise-addition control parameter, wherein the noise-addition control parameter is a parameter for controlling noise that satisfies a preset distribution; and
training a convolutional neural network with the training speech signal to obtain a blind source separation model.
2. The method according to claim 1, characterized in that the noise-addition control parameter is a signal-to-noise ratio.
3. The method according to claim 1, characterized in that the preset distribution is a uniform distribution or a Gaussian distribution.
4. The method according to any one of claims 1 to 3, characterized in that determining the training speech signal by adding noise online according to the noise-addition control parameter comprises:
obtaining the noise-addition control parameter, a speech signal, and noise;
calculating a mixing coefficient of the speech signal and the noise according to the noise-addition control parameter; and
determining the training speech signal according to the mixing coefficient, the speech signal, and the noise.
5. The method according to any one of claims 1 to 3, characterized in that training the convolutional neural network with the training speech signal to obtain the blind source separation model comprises:
performing framing processing on the training speech signal to obtain multiple frames of speech signal; and
training the convolutional neural network with the multiple frames of speech signal to obtain the blind source separation model.
6. The method according to claim 5, characterized in that training the convolutional neural network with the multiple frames of speech signal to obtain the blind source separation model comprises:
for each frame of speech signal, extracting a feature value of the speech signal in any one of the following ways:
way one: extracting an amplitude spectrum of the speech signal;
way two: extracting a Mel spectrum of the speech signal;
way three: extracting Mel-frequency cepstral coefficients (MFCC) of the speech signal; and
feeding the feature value corresponding to the speech signal into the convolutional neural network as an input parameter, and obtaining the blind source separation model by controlling a mean-square error of the convolutional neural network.
7. An apparatus for training a blind source separation model, characterized by comprising:
a determining module, configured to determine a training speech signal by adding noise online according to a noise-addition control parameter, wherein the noise-addition control parameter is a parameter for controlling noise that satisfies a preset distribution; and
a processing module, configured to train a convolutional neural network with the training speech signal to obtain a blind source separation model.
8. The apparatus according to claim 7, characterized in that the noise-addition control parameter is a signal-to-noise ratio.
9. The apparatus according to claim 7, characterized in that the preset distribution is a uniform distribution or a Gaussian distribution.
10. The apparatus according to any one of claims 7 to 9, characterized in that the determining module is specifically configured to:
obtain the noise-addition control parameter, a speech signal, and noise;
calculate a mixing coefficient of the speech signal and the noise according to the noise-addition control parameter; and
determine the training speech signal according to the mixing coefficient, the speech signal, and the noise.
11. The apparatus according to any one of claims 7 to 9, characterized in that the processing module comprises:
a framing unit, configured to perform framing processing on the training speech signal to obtain multiple frames of speech signal; and
a training unit, configured to train the convolutional neural network with the multiple frames of speech signal to obtain the blind source separation model.
12. The apparatus according to claim 11, characterized in that the training unit is specifically configured to:
for each frame of speech signal, extract a feature value of the speech signal in any one of the following ways:
way one: extracting an amplitude spectrum of the speech signal;
way two: extracting a Mel spectrum of the speech signal;
way three: extracting Mel-frequency cepstral coefficients (MFCC) of the speech signal; and
feed the feature value corresponding to the speech signal into the convolutional neural network as an input parameter, and obtain the blind source separation model by controlling a mean-square error of the convolutional neural network.
13. An apparatus for training a blind source separation model, characterized by comprising: a processor and a memory;
the memory stores computer-executable instructions; and
the processor executes the computer-executable instructions stored in the memory, so that the processor performs the method of training a blind source separation model according to any one of claims 1 to 6.
14. A computer-readable storage medium, characterized in that computer-executable instructions are stored in the computer-readable storage medium, and when the computer-executable instructions are executed by a processor, they are used to implement the method of training a blind source separation model according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN201810717811.1A CN108922517A (en) | 2018-07-03 | 2018-07-03 | Method, apparatus and storage medium for training a blind source separation model
Publications (1)
Publication Number | Publication Date |
---|---|
CN108922517A true CN108922517A (en) | 2018-11-30 |
Family
ID=64423445
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810717811.1A Pending CN108922517A (en) | 2018-07-03 | 2018-07-03 | The method, apparatus and storage medium of training blind source separating model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108922517A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110222693A (en) * | 2019-06-03 | 2019-09-10 | 第四范式(北京)技术有限公司 | The method and apparatus for constructing character recognition model and identifying character |
CN111081222A (en) * | 2019-12-30 | 2020-04-28 | 北京明略软件系统有限公司 | Speech recognition method, speech recognition apparatus, storage medium, and electronic apparatus |
CN111243573A (en) * | 2019-12-31 | 2020-06-05 | 深圳市瑞讯云技术有限公司 | Voice training method and device |
WO2021027132A1 (en) * | 2019-08-12 | 2021-02-18 | 平安科技(深圳)有限公司 | Audio processing method and apparatus and computer storage medium |
CN112489675A (en) * | 2020-11-13 | 2021-03-12 | 北京云从科技有限公司 | Multi-channel blind source separation method and device, machine readable medium and equipment |
CN114067785A (en) * | 2022-01-05 | 2022-02-18 | 江苏清微智能科技有限公司 | Voice deep neural network training method and device, storage medium and electronic device |
CN117292703A (en) * | 2023-11-24 | 2023-12-26 | 国网辽宁省电力有限公司电力科学研究院 | Sound source positioning method and device for transformer equipment, electronic equipment and storage medium |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101366078A (en) * | 2005-10-06 | 2009-02-11 | Dts公司 | Neural network classifier for separating audio sources from a monophonic audio signal |
CN101710490A (en) * | 2009-11-20 | 2010-05-19 | 安徽科大讯飞信息科技股份有限公司 | Method and device for compensating noise for voice assessment |
CN106297819A (en) * | 2015-05-25 | 2017-01-04 | 国家计算机网络与信息安全管理中心 | A kind of noise cancellation method being applied to Speaker Identification |
US20170178664A1 (en) * | 2014-04-11 | 2017-06-22 | Analog Devices, Inc. | Apparatus, systems and methods for providing cloud based blind source separation services |
CN107481731A (en) * | 2017-08-01 | 2017-12-15 | 百度在线网络技术(北京)有限公司 | A kind of speech data Enhancement Method and system |
US9858949B2 (en) * | 2015-08-20 | 2018-01-02 | Honda Motor Co., Ltd. | Acoustic processing apparatus and acoustic processing method |
CN107680586A (en) * | 2017-08-01 | 2018-02-09 | 百度在线网络技术(北京)有限公司 | Far field Speech acoustics model training method and system |
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20181130 |