CN107123432A - Channel-adaptive method for audio event recognition based on self-matching Top-N Gaussian components - Google Patents
Classifications
- G10L25/51 — Speech or voice analysis techniques specially adapted for comparison or discrimination
- G10L15/06 — Creation of reference templates; training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063 — Training
- G10L25/03 — Speech or voice analysis techniques characterised by the type of extracted parameters
- G10L25/24 — The extracted parameters being the cepstrum
- G10L25/45 — Characterised by the type of analysis window
- G10L2015/0635 — Training: updating or merging of old and new templates; mean values; weighting
Abstract
The present invention relates to a channel-adaptive method for audio event recognition based on self-matching Top-N Gaussian components. From the viewpoint of application scenarios it belongs to the field of audio event recognition; from the viewpoint of implementation it also belongs to the fields of computer science and audio signal processing. The method first performs data preprocessing, including quantization, sampling, pre-emphasis and windowing, then feature extraction, i.e. the required low-level audio feature parameters are extracted. Feature vectors are then generated: the extracted frame-feature sequence is compressed into segment vectors according to a segment length and segment shift. Feature mapping follows, the process of mapping channel-dependent feature vector segments to channel-independent ones; the feature mapping (FM) module is divided into an FM-training part and an FM-application part. Finally, model training and recognition are carried out. The invention solves the Top-N selection problem for channel models with different numbers of Gaussian components (different K values) and the problem of uneven coverage of channel information, and provides a better channel-adaptive method for audio event recognition under the influence of varying network transmission codecs.
Description
Technical field
The present invention relates to a channel-adaptive method for audio event recognition based on self-matching Top-N Gaussian components. From the viewpoint of application scenarios it belongs to the field of audio event recognition technology; from the viewpoint of implementation it also belongs to the fields of computer science and audio processing.
Background technology
In practical applications, audio event recognition systems often suffer channel mismatch caused by differences in recording environments, acquisition devices and coding schemes; a relatively common class is the channel mismatch introduced by codec differences. Channel adaptation corrects the feature parameters distorted by channel mismatch so that the feature information of the original speech is reflected more accurately. Channel adaptation is usually divided into feature-domain, model-domain and score-domain adaptation, and one or more of these can be applied.
Feature-domain adaptation is currently the most widely used channel-adaptation approach. Feature-domain methods can be divided into linear and nonlinear channel adaptation; methods based on linear channel adaptation are generally more numerous and more effective, and are usually standard components of audio recognition systems. Typical linear and nonlinear channel-adaptation methods include:
1. Cepstral mean subtraction
Cepstral mean subtraction is a method widely used in speech recognition to remove convolutional channel noise. Its essence is that convolutional noise in the frequency domain becomes an additive bias in the cepstral domain, so subtracting the mean from the cepstral parameters removes the convolutional noise; the method is especially effective when the channel distortion is linear. However, if the speech is short or the speech segments are already clean, cepstral mean subtraction brings little benefit and may even degrade system performance, and when the channel distortion is nonlinear its effectiveness is also limited.
2. Cepstral mean and variance normalization
Cepstral variance normalization further normalizes the variance of the cepstral-domain feature parameters; combined with cepstral mean subtraction, it is called cepstral mean and variance normalization. Its idea and implementation are simple and it achieves good results in speech recognition, but its effect on nonlinear channel distortions is not obvious.
3. Vector Taylor series
Vector Taylor series is a relatively practical feature-compensation method. The generation of the noisy speech signal is usually described by an explicit model: assuming the clean speech and the noise obey a Gaussian mixture model and a single Gaussian distribution respectively, the nonlinear environment model is linearized by vector Taylor expansion, ensuring that the noisy speech also obeys a Gaussian mixture model. Assuming the training and test speech signals are stationary, the ambient-noise statistics are estimated with the EM algorithm, and finally the clean-speech features are estimated under the minimum mean square error criterion. The algorithm has good noise robustness, but it is typically performed offline with Gaussian mixture models of 128 or more components; it requires many iterations and heavy computation, and generally cannot meet real-time requirements. The classic algorithm needs improvement to raise its efficiency and real-time performance.
4. Feature mapping
The feature mapping method is based on the GMM-UBM model and was developed from the speaker-model synthesis method. Its purpose is to map channel-dependent speech features into a channel-independent space and to perform model training and recognition with the channel-independent feature vectors. The main process has two parts: channel-model training and eigentransformation. Feature mapping is currently one of the most widely applied channel-adaptation methods; it acts in the feature domain and offers high flexibility and convenience.
In summary, the existing feature mapping method adapts only with respect to the highest-scoring Gaussian component during eigentransformation. With M Gaussian components, the channel information contained in the remaining M-1 components is discarded; moreover, the highest-scoring component differs across channel models with different numbers of Gaussians, so generalization is typically poor.
The content of the invention
The purpose of the present invention is to solve the Top-N selection problem for channel models with different K values (numbers of Gaussian components) and the problem of uneven coverage of channel information, by proposing an audio event channel-adaptation method based on self-matching Top-N Gaussian components.
The design principle of the present invention is as follows. Data preprocessing is carried out first, including quantization, sampling, pre-emphasis and windowing. Feature extraction follows, i.e. the required low-level audio feature parameters are extracted. Feature vectors are then generated: the extracted frame-feature sequence is compressed into segment vectors according to a segment length and segment shift. Next comes feature mapping, the process of mapping channel-dependent feature vector segments to channel-independent ones; the feature mapping (FM) module is divided into FM training and FM application. Finally, model training and recognition are performed.
The technical scheme of the present invention is realized through the following steps:
Step 1, the preprocessing of audio recognition mainly includes pre-emphasis, framing and windowing. Before feature extraction the raw speech signal is usually pre-emphasized with a first-order digital filter to boost the high-frequency part of the spectrum. Framing follows; frames can be obtained by contiguous or overlapping segmentation, but overlapping segmentation is mostly used to ensure smoothness and continuity between adjacent frames. Finally windowing is applied to reduce the truncation effect of the speech frame and the slope of change at both ends of the frame; a suitable window length must be chosen.
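The preprocessing chain of step 1 (first-order pre-emphasis, overlapping framing, windowing) can be sketched as follows; the frame length, hop size and Hamming window are illustrative choices, not values fixed by the patent:

```python
import numpy as np

def preemphasis(signal, alpha=0.97):
    # First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]
    return np.append(signal[0], signal[1:] - alpha * signal[:-1])

def frame_signal(signal, frame_len, hop):
    # Overlapping segmentation for smoothness between adjacent frames
    n_frames = 1 + (len(signal) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return signal[idx]

def window_frames(frames):
    # Hamming window reduces the truncation effect at the frame edges
    return frames * np.hamming(frames.shape[1])

sr = 16000
sig = np.random.default_rng(0).standard_normal(sr)      # 1 s of audio
emph = preemphasis(sig)
frames = frame_signal(emph, frame_len=400, hop=160)      # 25 ms / 10 ms
windowed = window_frames(frames)
```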
Step 2, speech features are extracted with MFCC: the time-domain signal is transformed by FFT, the energy spectrum is filtered by a triangular filter bank distributed according to the Mel scale, the logarithmic energy of each filter output is calculated, and a discrete cosine transform is applied to the filter-bank output vector.
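A compact sketch of the MFCC pipeline of step 2 (FFT, Mel-scaled triangular filter bank, log energies, DCT); the FFT size, filter count and coefficient count are assumed, not specified by the patent:

```python
import numpy as np

def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    # Triangular filters spaced evenly on the Mel scale
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        lo, c, hi = bins[i - 1], bins[i], bins[i + 1]
        for k in range(lo, c):
            fb[i - 1, k] = (k - lo) / max(c - lo, 1)
        for k in range(c, hi):
            fb[i - 1, k] = (hi - k) / max(hi - c, 1)
    return fb

def mfcc(frame, sr=16000, n_fft=512, n_filters=26, n_ceps=13):
    spec = np.abs(np.fft.rfft(frame, n_fft)) ** 2        # power spectrum via FFT
    energies = mel_filterbank(n_filters, n_fft, sr) @ spec
    log_e = np.log(energies + 1e-10)                     # log filter-bank energies
    # DCT-II of the log energies gives the cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return dct @ log_e

coeffs = mfcc(np.random.default_rng(0).standard_normal(400))
```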
Step 3, after feature-parameter extraction, feature vectors are generated. For each dimension of the feature vectors of N consecutive frames, the mean or variance is computed, extracting the commonality of the frame features and weakening their differences; adjacent segments typically overlap by N-M frames to improve the smoothness of transitions.
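The segment-vector generation of step 3 (mean over N consecutive frames, with N-M frames of overlap between adjacent segments) might look like the following; the concrete segment length and shift are illustrative:

```python
import numpy as np

def segment_vectors(frames, seg_len, seg_shift):
    """Compress a frame-feature sequence into segment vectors by averaging
    seg_len consecutive frames; adjacent segments overlap by
    seg_len - seg_shift frames."""
    segs = []
    for start in range(0, len(frames) - seg_len + 1, seg_shift):
        segs.append(frames[start:start + seg_len].mean(axis=0))
    return np.array(segs)

frames = np.random.default_rng(0).standard_normal((50, 13))   # 50 frames, 13-dim
vectors = segment_vectors(frames, seg_len=10, seg_shift=5)    # 5-frame overlap
```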
Step 4, feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule. Features from different channels are mapped in a defined way onto the same channel-independent feature space, solving the degradation of recognition performance caused by inconsistency between training and test conditions in practical audio event recognition systems. The concrete realization is:
Step 4.1, a channel-independent UBM model (w_i, u_i, δ_i) is trained with data from all kinds of channels, where w_i is the weight of the i-th Gaussian probability density function, u_i the mean and δ_i the variance.
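Training the channel-independent UBM of step 4.1 amounts to fitting a GMM on data pooled from all channels. A minimal, library-free diagonal-covariance EM sketch with assumed toy data (the component count, data and iteration count are illustrative):

```python
import numpy as np

def train_ubm(data, n_components, n_iter=20, seed=0):
    """Minimal diagonal-covariance GMM trained with EM, pooling data from
    all channels to give a channel-independent UBM (w_i, u_i, delta_i)."""
    rng = np.random.default_rng(seed)
    n, d = data.shape
    w = np.full(n_components, 1.0 / n_components)
    u = data[rng.choice(n, n_components, replace=False)]
    var = np.tile(data.var(axis=0), (n_components, 1)) + 1e-6
    for _ in range(n_iter):
        # E-step: responsibilities from log Gaussian densities
        logp = (-0.5 * (((data[:, None, :] - u) ** 2) / var
                        + np.log(2 * np.pi * var)).sum(-1)
                + np.log(w))
        logp -= logp.max(axis=1, keepdims=True)
        r = np.exp(logp)
        r /= r.sum(axis=1, keepdims=True)
        # M-step: update weights, means and variances
        nk = r.sum(axis=0) + 1e-10
        w = nk / n
        u = (r.T @ data) / nk[:, None]
        var = (r.T @ (data ** 2)) / nk[:, None] - u ** 2 + 1e-6
    return w, u, var

# Pool data from several simulated channels and train the UBM on all of it.
rng = np.random.default_rng(1)
pooled = np.vstack([rng.normal(c, 1.0, (200, 4)) for c in (-2.0, 0.0, 2.0)])
w, u, var = train_ubm(pooled, n_components=3)
```

The channel-specific GMMs of step 4.2 would then be derived from this UBM by MAP adaptation rather than trained from scratch.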
Step 4.2, corresponding training data are selected according to the specific channel conditions, and the training feature data of each channel are used to adapt, one by one with the MAP method, a GMM model for that particular channel; (w_i^A, u_i^A, δ_i^A) denotes the GMM model under channel-A conditions.
Step 4.3, channel-model decision is performed on the training and test feature vectors of the whole recognition system. The feature parameters of the input data are extracted first, and the channel to which the data belong is then decided by the magnitude of the log likelihood; here we assume the data belong to channel A.
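The channel decision of step 4.3 can be sketched as picking the channel model under which the input features have the highest log likelihood; the two single-component "GMMs" below are hypothetical stand-ins for MAP-adapted channel models:

```python
import numpy as np

def log_likelihood(data, w, u, var):
    # Average per-frame log-likelihood under a diagonal-covariance GMM
    logp = (-0.5 * (((data[:, None, :] - u) ** 2) / var
                    + np.log(2 * np.pi * var)).sum(-1)
            + np.log(w))
    m = logp.max(axis=1, keepdims=True)
    return float(np.mean(m.squeeze(1) + np.log(np.exp(logp - m).sum(axis=1))))

def decide_channel(features, channel_models):
    # Pick the channel whose adapted GMM scores the input features highest
    scores = {name: log_likelihood(features, *gmm)
              for name, gmm in channel_models.items()}
    return max(scores, key=scores.get)

# Hypothetical single-Gaussian "GMMs" for two channels A and B.
gmm_a = (np.array([1.0]), np.array([[0.0, 0.0]]), np.array([[1.0, 1.0]]))
gmm_b = (np.array([1.0]), np.array([[5.0, 5.0]]), np.array([[1.0, 1.0]]))
rng = np.random.default_rng(0)
test_feats = rng.normal(0.0, 1.0, (50, 2))   # drawn near channel A's model
choice = decide_channel(test_feats, {"A": gmm_a, "B": gmm_b})
```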
Step 4.4, eigentransformation is performed with the self-matching Top-N Gaussian-component weighted mapping rule. For each frame feature vector of the test data from channel A, the N highest-scoring Gaussian components N(u_k^A, δ_k^A) (N < M, k = 1, 2, ..., N) are selected from the M Gaussian components of the channel-A Gaussian mixture model. A score threshold ε (0 < ε < 1) is set, and the concrete value of N is obtained by self-matching against this threshold: when the cumulative score of the top N Gaussian components reaches ε, that N is taken as the number of components in the self-matching Top-N weighted mapping:

$$\sum_{k=1}^{N} \frac{w_k^A\, N(u_k^A,\, \delta_k^A)}{\sum_{i=1}^{M} w_i^A\, N(u_i^A,\, \delta_i^A)} \;\geq\; \varepsilon$$

After N is selected, the weight β_k corresponding to the variance δ_k^A and the mean u_k^A of each Top-N Gaussian component in the eigentransformation is computed one by one, satisfying

$$\beta_k = \frac{w_k^A\, N(u_k^A,\, \delta_k^A)}{\sum_{j=1}^{N} w_j^A\, N(u_j^A,\, \delta_j^A)}$$

Denoting the linearly weighted baseline means and variances of the UBM and of the GMM under channel-A conditions by u_k^*, δ_k^*, u_k^{A*} and δ_k^{A*} respectively, the self-matching Top-N Gaussian-component weighted feature mapping equation is obtained:

$$y = \left(x - \sum_{k=1}^{N}\beta_k u_k^A\right)\frac{\sum_{k=1}^{N}\beta_k \delta_k}{\sum_{k=1}^{N}\beta_k \delta_k^A} + \sum_{k=1}^{N}\beta_k u_k = \left(x - u_k^{A*}\right)\frac{\delta_k^{*}}{\delta_k^{A*}} + u_k^{*}$$
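The self-matching Top-N selection and weighted mapping of step 4.4 can be sketched as follows; posterior scores stand in for the patent's component scores, and the toy two-component models (channel A shifting the UBM means by +1) are illustrative assumptions:

```python
import numpy as np

def gauss(x, u, var):
    # Diagonal Gaussian density N(x; u, var)
    return np.exp(-0.5 * (((x - u) ** 2) / var + np.log(2 * np.pi * var)).sum())

def self_matching_top_n_map(x, channel_gmm, ubm, eps=0.99999):
    """Map one channel-dependent frame x toward the channel-independent space.
    Components are ranked by score; N is self-matched as the smallest count
    whose cumulative score reaches the threshold eps."""
    wA, uA, vA = channel_gmm
    wU, uU, vU = ubm
    scores = np.array([wA[i] * gauss(x, uA[i], vA[i]) for i in range(len(wA))])
    post = scores / scores.sum()
    order = np.argsort(post)[::-1]
    cum = np.cumsum(post[order])
    n = int(np.searchsorted(cum, eps) + 1)          # self-matched Top-N
    top = order[:n]
    beta = post[top] / post[top].sum()              # weights beta_k
    uA_star = (beta[:, None] * uA[top]).sum(0)      # weighted channel-A mean
    vA_star = (beta[:, None] * vA[top]).sum(0)      # weighted channel-A variance
    u_star = (beta[:, None] * uU[top]).sum(0)       # weighted UBM baseline mean
    v_star = (beta[:, None] * vU[top]).sum(0)       # weighted UBM baseline variance
    return (x - uA_star) * v_star / vA_star + u_star

# Toy 2-component models: channel A shifts the UBM means by +1.
ubm = (np.array([0.5, 0.5]), np.array([[0.0], [4.0]]), np.ones((2, 1)))
gmm_a = (np.array([0.5, 0.5]), np.array([[1.0], [5.0]]), np.ones((2, 1)))
y = self_matching_top_n_map(np.array([1.2]), gmm_a, ubm)
```

With this construction the +1 channel shift is removed, so an input of 1.2 maps back to about 0.2.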
Step 5, model training and recognition are performed on the whole audio event using the channel-independent feature vectors.
Beneficial effects
Compared with the method of normalizing only the highest baseline score, the present invention does not discard the channel information contained in the remaining M-1 Gaussian components.
Compared with the Top-1 Gaussian-component feature mapping method and the feature mapping method with fixed Top-N Gaussian-component weighting, the present invention has better applicability and channel-adaptation performance, and can provide a better channel-adaptation method for audio event recognition under the influence of varying network transmission codecs.
Brief description of the drawings
Fig. 1 is the block diagram of the audio event recognition system of the present invention;
Fig. 2 shows the channel recognition rate for different K values under three kinds of channel mismatch;
Fig. 3 shows the channel-adaptation performance of the Top-1 and self-matching Top-N methods for different K values under mismatch 1;
Fig. 4 shows the same comparison under mismatch 2;
Fig. 5 shows the same comparison under mismatch 3.
Embodiments
To better illustrate the objects and advantages of the present invention, the method is described in further detail below with reference to embodiments.
A collection of gunshot sounds is selected as the audio event input data, and three tests are designed and deployed: (1) baseline-system parameter selection, channel-matched testing and channel-mismatch performance comparison; (2) channel-adaptation performance test of the Top-1 Gaussian-component feature mapping method; (3) test of the self-matching Top-N Gaussian-component weighted feature mapping method.
The three test procedures are described one by one below. All tests were completed on the same computer, configured with an Intel dual-core CPU (2.93 GHz), 4.00 GB of memory and the Windows 7 operating system.
1. Baseline-system channel-matched and channel-mismatch performance comparison
First, channel-matched data (the training and test data of the same channel, e.g. channel 1) are used to measure the recognition accuracy of the baseline system under channel-matched conditions. Channel-mismatched data are then used, mainly covering three mismatch conditions: training data of channel 1 with test data of channels 2, 3 and 4 respectively, measuring the recognition accuracy of the baseline system under each of the three mismatches. Considering system time complexity, recognition rate and simplicity of operation, the reference feature for the audio event recognition system is chosen as 13-dimensional MFCC plus 2-dimensional energy, together with their 15-dimensional first-order and 15-dimensional second-order differences, 45 dimensions in total.
2. Top-1 Gaussian-component feature mapping experiments
2.1 Channel-adaptation performance test of the Top-1 Gaussian-component feature mapping method
Different k values are set first, where k is the number of Gaussian components in the UBM-GMM channel model. The channel models are trained with UBM-GMM and model decision is performed; feature mapping is then carried out with the Top-1 Gaussian-component feature mapping method, and finally the gunshot collection is trained and recognized with Adaboost. The k values are 4, 8, 16, 32, 64, 128, 256, 512 and 1024; Fig. 2 gives the system channel recognition rate for the different k values under the three kinds of channel mismatch.
2.2 Channel-information scores of different Top-N Gaussian components and the corresponding channel-adaptation performance test
A test file under channel 2, consisting of many frames of data {x1, x2, ..., xn}, is taken first. After a correct channel decision, the probability-output score of each Gaussian component for the first ten frames is computed under the channel-2 model, including the six highest-scoring probability outputs, as shown in Table 1; the number of Gaussian components k is 64.
Table 1: scores of the test frame data for each Gaussian component under the channel model
For k = 64, the system recognition performance under the weighted mappings of Top-1 through Top-6 Gaussian components is tested under mismatch 1; the results are shown in Table 2.
Table 2: channel-adaptation performance of different Top-N Gaussian-component methods under mismatch 1 with the same k value
3. Self-matching Top-N Gaussian-component weighted feature mapping experiment
Under the three channel-mismatch conditions of experiment 1, the baseline system applies the self-matching Top-N Gaussian-component weighted feature mapping method to test the channel-mismatch adaptation performance of channel models with different K values. The parameter configuration of the baseline system follows the Top-1 Gaussian-component feature mapping experiments, and the results are then compared with the channel-adaptation performance of the Top-1 method. The self-matching Top-N weighted feature mapping method self-matches each frame of feature data with the score-threshold method, matching the corresponding number N of feature-mapping Gaussian components. The experimental threshold is set to ε = 0.99999.
Test results
For test (1), the baseline system generally has good recognition performance under channel-matched conditions, while under any of the channel-mismatch conditions it is strongly affected and its recognition performance drops sharply, from which the necessity of channel-mismatch adaptation follows.
For test (2), when k is 4, 8, 16 or 32 the recognition accuracy of the system rises, but at k = 64 the accuracy begins to decline, mainly because the training samples are relatively few, so that the model established at higher k values is not accurate enough. Overall, the channel-compensation effect of the Top-1 Gaussian-component feature mapping method is fairly good, and with a suitable k value it can even meet or exceed the recognition accuracy obtained under channel-matched conditions.
The fixed Top-N Gaussian-component weighted feature mapping method performs somewhat better in channel adaptation than the Top-1 method, because the distribution of a frame of data in feature space is usually determined jointly by several Gaussian components. Although several components cover channel information more widely, as k grows the output scores of the fixed Top-N components decrease and so does the channel information they contain, and a fixed choice of N cannot adapt well to channel models with different k values. The self-matching Top-N weighted feature mapping method avoids these problems while keeping a comparable channel-compensation ability.
For test (3), the self-matching Top-N Gaussian-component weighted feature mapping method solves the Top-N selection problem for Gaussian-component numbers under different channel models, yields an average improvement of 2.0% in segment F-value and 1.36% in duration F-value, and achieves better channel-adaptation performance than both the Top-1 and the fixed Top-N weighted feature mapping methods.
The present invention proposes an audio event channel-adaptation method based on self-matching Top-N Gaussian components. In channel-mismatched audio event recognition, the self-matching Top-N weighted feature mapping method solves the problems of how to select the Top-N number of Gaussian components under channel models with different K values and of uneven coverage of channel information. Its applicability and channel-adaptation performance are better than those of the Top-1 Gaussian-component feature mapping method and the fixed Top-N weighted feature mapping method, and it can provide a better channel-adaptation method for audio event recognition under the influence of varying network transmission codecs.
Claims (5)
1. A channel-adaptive method for audio event recognition based on self-matching Top-N Gaussian components, characterised in that the method comprises the following steps:
Step 1, the preprocessing of audio recognition mainly includes pre-emphasis, framing and windowing; before feature extraction the raw speech signal is usually pre-emphasized with a first-order digital filter to boost the high-frequency part of the spectrum; framing follows, using contiguous or, mostly, overlapping segmentation to ensure smoothness and continuity between adjacent frames; finally windowing is applied to reduce the truncation effect of the speech frame and the slope of change at both ends of the frame, with a suitable window length;
Step 2, speech features are extracted with MFCC: the time-domain signal is transformed by FFT, the energy spectrum is filtered by a triangular filter bank distributed according to the Mel scale, the logarithmic energy of each filter output is calculated, and a discrete cosine transform is applied to the filter-bank output vector;
Step 3, after feature-parameter extraction, feature vectors are generated: for each dimension of the feature vectors of N consecutive frames the mean or variance is computed, extracting the commonality of the frame features and weakening their differences, with adjacent segments typically overlapping by N-M frames to improve the smoothness of transitions;
Step 4, feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule: features from different channels are mapped in a defined way onto the same channel-independent feature space, solving the degradation of recognition performance caused by inconsistency between training and test conditions in practical audio event recognition systems;
Step 5, model training and recognition are performed on the whole audio event using the channel-independent feature vectors.
2. The feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule according to claim 1, characterised in that: a channel-independent UBM model (w_i, u_i, δ_i) is trained with data from all kinds of channels, where w_i is the weight of the i-th Gaussian probability density function, u_i the mean and δ_i the variance.
3. The feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule according to claim 1, characterised in that: corresponding training data are selected according to the specific channel conditions, and the training feature data of each channel are used to adapt, one by one with the MAP method, a GMM model for that particular channel, with (w_i^A, u_i^A, δ_i^A) denoting the GMM model under channel-A conditions.
4. The feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule according to claim 1, characterised in that: channel-model decision is performed on the training and test feature vectors of the whole recognition system; the feature parameters of the input data are extracted first, and the channel to which the data belong is then decided by the magnitude of the log likelihood, here assumed to be channel A.
5. The feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule according to claim 1, characterised in that: eigentransformation is performed with the self-matching Top-N Gaussian-component weighted mapping rule; for each frame feature vector of the test data from channel A, the N highest-scoring Gaussian components N(u_k^A, δ_k^A) (N < M, k = 1, 2, ..., N) are selected from the M Gaussian components of the channel-A Gaussian mixture model; a score threshold ε (0 < ε < 1) is set and the concrete value of N is obtained by self-matching against this threshold: when the cumulative score of the top N Gaussian components reaches ε, that N is taken as the number of components in the self-matching Top-N weighted mapping:
$$\sum_{k=1}^{N} \frac{w_k^A\, N(u_k^A,\, \delta_k^A)}{\sum_{i=1}^{M} w_i^A\, N(u_i^A,\, \delta_i^A)} \;\geq\; \varepsilon$$
After N is selected, the weight β_k corresponding to the variance δ_k^A and the mean u_k^A of each Top-N Gaussian component in the eigentransformation is computed one by one, and must satisfy
$$\beta_k = \frac{w_k^A\, N(u_k^A,\, \delta_k^A)}{\sum_{j=1}^{N} w_j^A\, N(u_j^A,\, \delta_j^A)}$$
Denoting the linearly weighted baseline means and variances of the UBM and of the GMM under channel-A conditions by u_k^*, δ_k^*, u_k^{A*} and δ_k^{A*} respectively, the self-matching Top-N Gaussian-component weighted feature mapping equation is obtained:
$$\begin{aligned}
y &= \left(x - \sum_{k=1}^{N}\beta_k\, u_k^A\right)\frac{\sum_{k=1}^{N}\beta_k\, \delta_k}{\sum_{k=1}^{N}\beta_k\, \delta_k^A} + \sum_{k=1}^{N}\beta_k\, u_k \\
  &= \left(x - u_k^{A*}\right)\frac{\delta_k^{*}}{\delta_k^{A*}} + u_k^{*}
\end{aligned} \tag{2}$$
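The mapping above can be sketched in code: compute the Top-N posterior weights β over the best-matching components of the channel-A GMM, form the linearly weighted baselines, then shift and scale the feature back toward the channel-independent (UBM) space. This is a minimal NumPy sketch under assumed diagonal covariances; the function and argument names are illustrative, not from the patent.

```python
import numpy as np

def feature_map_topn(x, w_A, u_A, d_A, u_ubm, d_ubm, n_top=5):
    """Self-matching Top-N weighted feature mapping (illustrative sketch).

    x            : (D,)   feature vector observed over channel A
    w_A          : (K,)   channel-A GMM mixture weights
    u_A, d_A     : (K, D) channel-A GMM means / diagonal variances
    u_ubm, d_ubm : (K, D) channel-independent UBM means / variances
    """
    # Per-component diagonal-Gaussian likelihoods under the channel-A GMM
    lik = np.exp(-0.5 * np.sum((x - u_A) ** 2 / d_A, axis=1)) \
          / np.sqrt(np.prod(2.0 * np.pi * d_A, axis=1))
    # Self-matching: keep only the Top-N best-matching components
    top = np.argsort(w_A * lik)[-n_top:]
    # Posterior weights beta_k, normalized over the Top-N components
    beta = w_A[top] * lik[top]
    beta /= beta.sum()
    # Linearly weighted baselines: u*, d* (UBM) and u^{A*}, d^{A*} (channel A)
    u_star,  d_star  = beta @ u_ubm[top], beta @ d_ubm[top]
    uA_star, dA_star = beta @ u_A[top],   beta @ d_A[top]
    # Equation (2): remove the channel-A bias, rescale, restore the baseline
    return (x - uA_star) * d_star / dA_star + u_star
```

A useful sanity check of the design: when the channel-A model coincides with the UBM, the weighted baselines are equal and the mapping reduces to the identity, so matched-channel features pass through unchanged.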
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710334633.XA CN107123432A (en) | 2017-05-12 | 2017-05-12 | A kind of Self Matching Top N audio events recognize channel self-adapted method |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107123432A true CN107123432A (en) | 2017-09-01 |
Family
ID=59728248
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201710334633.XA Pending CN107123432A (en) | 2017-05-12 | 2017-05-12 | A kind of Self Matching Top N audio events recognize channel self-adapted method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107123432A (en) |
Non-Patent Citations (1)
Title |
---|
吕英: ""音频事件识别信道自适应方法研究"", 《中国优秀硕士学位论文全文数据库信息科技辑》 * |
Cited By (13)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108417201A (en) * | 2018-01-19 | 2018-08-17 | 苏州思必驰信息科技有限公司 | The more speaker's identity recognition methods of single channel and system |
CN111602410B (en) * | 2018-02-27 | 2022-04-19 | 欧姆龙株式会社 | Suitability determination device, suitability determination method, and storage medium |
CN111602410A (en) * | 2018-02-27 | 2020-08-28 | 欧姆龙株式会社 | Suitability determination device, suitability determination method, and program |
CN111210809A (en) * | 2018-11-22 | 2020-05-29 | 阿里巴巴集团控股有限公司 | Voice training data adaptation method and device, voice data conversion method and electronic equipment |
CN111210809B (en) * | 2018-11-22 | 2024-03-19 | 阿里巴巴集团控股有限公司 | Voice training data adaptation method and device, voice data conversion method and electronic equipment |
CN110120230A (en) * | 2019-01-08 | 2019-08-13 | 国家计算机网络与信息安全管理中心 | A kind of acoustic events detection method and device |
CN110120230B (en) * | 2019-01-08 | 2021-06-01 | 国家计算机网络与信息安全管理中心 | Acoustic event detection method and device |
CN109599118A (en) * | 2019-01-24 | 2019-04-09 | 宁波大学 | A kind of voice playback detection method of robustness |
CN112489678B (en) * | 2020-11-13 | 2023-12-05 | 深圳市云网万店科技有限公司 | Scene recognition method and device based on channel characteristics |
CN112489678A (en) * | 2020-11-13 | 2021-03-12 | 苏宁云计算有限公司 | Scene recognition method and device based on channel characteristics |
CN112820318A (en) * | 2020-12-31 | 2021-05-18 | 西安合谱声学科技有限公司 | Impact sound model establishment and impact sound detection method and system based on GMM-UBM |
CN117373488A (en) * | 2023-12-08 | 2024-01-09 | 富迪科技(南京)有限公司 | Audio real-time scene recognition system |
CN117373488B (en) * | 2023-12-08 | 2024-02-13 | 富迪科技(南京)有限公司 | Audio real-time scene recognition system |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107123432A (en) | A kind of Self Matching Top N audio events recognize channel self-adapted method | |
CN103117059B (en) | Voice signal characteristics extracting method based on tensor decomposition | |
CN103310789B (en) | A kind of sound event recognition method of the parallel model combination based on improving | |
US20030236661A1 (en) | System and method for noise-robust feature extraction | |
CN106952643A (en) | A kind of sound pick-up outfit clustering method based on Gaussian mean super vector and spectral clustering | |
CN101853661B (en) | Noise spectrum estimation and voice mobility detection method based on unsupervised learning | |
EP2662854A1 (en) | Method and device for detecting fundamental tone | |
CN105206270A (en) | Isolated digit speech recognition classification system and method combining principal component analysis (PCA) with restricted Boltzmann machine (RBM) | |
CN103794207A (en) | Dual-mode voice identity recognition method | |
CN102789779A (en) | Speech recognition system and recognition method thereof | |
CN104978507A (en) | Intelligent well logging evaluation expert system identity authentication method based on voiceprint recognition | |
CN101640043A (en) | Speaker recognition method based on multi-coordinate sequence kernel and system thereof | |
Mallidi et al. | Novel neural network based fusion for multistream ASR | |
CN109378014A (en) | A kind of mobile device source discrimination and system based on convolutional neural networks | |
CN113327626A (en) | Voice noise reduction method, device, equipment and storage medium | |
Zhang et al. | An efficient perceptual hashing based on improved spectral entropy for speech authentication | |
Hassan et al. | Pattern classification in recognizing Qalqalah Kubra pronuncation using multilayer perceptrons | |
CN115758082A (en) | Fault diagnosis method for rail transit transformer | |
CN106297768B (en) | Speech recognition method | |
CN106941007A (en) | A kind of audio event model composite channel adaptive approach | |
CN112151067B (en) | Digital audio tampering passive detection method based on convolutional neural network | |
CN104392719A (en) | Center sub-band model adaptation method for voice recognition system | |
CN110808067A (en) | Low signal-to-noise ratio sound event detection method based on binary multiband energy distribution | |
CN107919136B (en) | Digital voice sampling frequency estimation method based on Gaussian mixture model | |
CN116312628A (en) | False audio detection method and system based on self knowledge distillation |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20170901 |