CN107123432A - Self-matching Top-N channel adaptation method for audio event recognition - Google Patents

Self-matching Top-N channel adaptation method for audio event recognition

Info

Publication number
CN107123432A
CN107123432A
Authority
CN
China
Prior art keywords
channel
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710334633.XA
Other languages
Chinese (zh)
Inventor
罗森林
佟彤
潘丽敏
吕英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Institute of Technology BIT
Original Assignee
Beijing Institute of Technology BIT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Institute of Technology BIT filed Critical Beijing Institute of Technology BIT
Priority to CN201710334633.XA priority Critical patent/CN107123432A/en
Publication of CN107123432A publication Critical patent/CN107123432A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training
    • G10L2015/0635 Training updating or merging of old and new templates; Mean values; Weighting
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques in which the extracted parameters are the cepstrum
    • G10L25/45 Speech or voice analysis techniques characterised by the type of analysis window
    • G10L25/48 Speech or voice analysis techniques specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques specially adapted for comparison or discrimination

Abstract

The invention relates to a channel adaptation method for audio event recognition based on self-matching Top-N Gaussian components. From the perspective of application scenarios it belongs to the field of audio event recognition; from the perspective of technical implementation it also belongs to the fields of computer science and audio signal processing. The method first performs data preprocessing, comprising quantization, sampling, pre-emphasis and windowing. It then performs feature extraction, i.e. the required low-level audio feature parameters are extracted. Next, feature vectors are generated, i.e. the extracted frame-feature sequence is compressed into segment vectors according to a segment length and segment shift. Feature mapping follows: the process of mapping channel-dependent feature vectors to channel-independent feature vectors, in which the feature mapping (FM) module is divided into FM training and FM application. Finally, model training and recognition are carried out. The invention solves the problem of selecting the number Top-N of Gaussian components for channel models with different k values and the problem of uneven coverage of channel information, and provides a better channel adaptation method for audio event recognition under the influence of coding differences in network transmission.

Description

Self-matching Top-N channel adaptation method for audio event recognition
Technical field
The invention relates to a channel adaptation method for audio event recognition based on self-matching Top-N Gaussian components. From the perspective of application scenarios it belongs to the field of audio event recognition; from the perspective of technical implementation it also belongs to the fields of computer science and audio signal processing.
Background technology
In practical applications, audio event recognition systems often suffer from channel mismatch caused by differences in recording environment, acquisition equipment and coding scheme; a relatively common class is the channel mismatch introduced by coding differences. Channel adaptation methods correct the feature parameters distorted by channel mismatch so that the feature information of the original speech is reflected more accurately. Channel adaptation is usually divided into feature-domain, model-domain and score-domain adaptation, and one or more of these can be applied.
Feature-domain adaptation is currently the most widely used form of channel adaptation. Feature-domain methods can be divided into linear and nonlinear channel adaptation; methods based on linear channel adaptation are generally more numerous and more effective, and are usually a standard component of audio recognition systems. Typical linear and nonlinear channel adaptation methods include:
1. Cepstral mean subtraction
Cepstral mean subtraction is widely used in speech recognition to remove convolutional channel noise. Its essence is that convolutional noise in the frequency domain becomes additive noise in the cepstral domain, so subtracting the mean from the cepstral parameters removes the convolutional noise; the method performs especially well when the channel distortion is linear. However, if the speech is short or the speech segments are relatively clean, the effect of cepstral mean subtraction is not obvious and may even degrade system performance, and when the channel distortion is nonlinear its effectiveness is also limited.
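Cepstral mean subtraction as described above can be sketched in a few lines; this is a generic illustration of the technique, not code from the patent, and the simulated channel offset is a made-up value.

```python
import numpy as np

def cepstral_mean_subtraction(cepstra):
    """Remove a stationary convolutional channel bias by subtracting the
    per-coefficient mean over time (cepstra: frames x coefficients)."""
    return cepstra - cepstra.mean(axis=0, keepdims=True)

# A linear, stationary channel adds a constant offset in the cepstral
# domain; subtracting the time mean cancels that offset.
clean = np.random.randn(200, 13)
channel_offset = np.full(13, 0.7)   # hypothetical channel bias
distorted = clean + channel_offset
restored = cepstral_mean_subtraction(distorted)
```

After subtraction the distorted features coincide with the mean-subtracted clean features, which is exactly the claim that the convolutional noise is removed.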
2. Cepstral mean and variance normalization
Cepstral variance normalization further normalizes the variance of the cepstral feature parameters; combined with cepstral mean subtraction it is called cepstral mean and variance normalization. Its idea and implementation are simple and it achieves good results in speech recognition, but its effect on nonlinear channel distortion is not significant.
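The combined mean-and-variance normalization can be sketched as follows; again a generic illustration, with an assumed small epsilon to guard against zero variance.

```python
import numpy as np

def cmvn(cepstra, eps=1e-8):
    """Cepstral mean and variance normalization: make each cepstral
    coefficient zero-mean and unit-variance over the utterance."""
    mu = cepstra.mean(axis=0, keepdims=True)
    sigma = cepstra.std(axis=0, keepdims=True)
    return (cepstra - mu) / (sigma + eps)

feats = np.random.randn(150, 13) * 3.0 + 1.5   # biased, scaled features
norm = cmvn(feats)
```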
3. Vector Taylor series
Vector Taylor series (VTS) is a relatively practical feature compensation method. An explicit model is used to describe the generation of the noisy speech signal: assuming that clean speech and noise obey a Gaussian mixture model and a single Gaussian distribution respectively, the nonlinear environment model is linearized by vector Taylor expansion, which guarantees that the noisy speech also obeys a Gaussian mixture model. Assuming the training and test speech signals are stationary, the ambient noise statistics are estimated with the EM algorithm, and the clean speech features are finally estimated under the minimum mean square error criterion. The VTS algorithm has good noise robustness, but it typically runs offline and its Gaussian mixture model usually has 128 or more components, so the iteration count and computational cost are high and real-time requirements are hard to meet; the classic algorithm needs improvement to raise its efficiency and real-time performance.
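The explicit environment model and its linearization mentioned above can be stated concretely. This is the textbook VTS formulation in the log-spectral domain, given here as an assumed illustration rather than the exact variant the patent refers to:

```latex
% Mismatch model: noisy log-spectrum y from clean x and noise n
y = x + g(x,n), \qquad g(x,n) = \log\!\bigl(1 + e^{\,n-x}\bigr)
% First-order vector Taylor expansion around the means (\mu_x, \mu_n),
% with G = \partial y/\partial x\,\big|_{(\mu_x,\mu_n)} = \bigl(1 + e^{\,\mu_n-\mu_x}\bigr)^{-1}:
y \approx \mu_x + g(\mu_x,\mu_n) + G\,(x-\mu_x) + (1-G)\,(n-\mu_n)
```

Because the expansion is linear in x and n, a Gaussian mixture on x and a Gaussian on n induce a Gaussian mixture on y, which is the property the prose above relies on.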
4. Feature mapping
The feature mapping method is based on the GMM-UBM model and was developed from the speaker model synthesis method. Its purpose is to map channel-dependent speech features into a channel-independent space, and to carry out model training and recognition with the channel-independent feature vectors. The main process has two parts: channel model training and feature transformation. Feature mapping is currently one of the most widely applied channel adaptation methods; it acts in the feature domain and offers great flexibility and convenience.
In summary, the existing feature mapping method adapts only with respect to the highest-scoring Gaussian component during feature transformation. With M Gaussian components, the channel information contained in the remaining M-1 components is discarded; moreover, the highest-scoring component differs across channel models with different numbers of Gaussians, so generalization is usually poor.
Summary of the invention
The purpose of the invention is to solve the problem of selecting the number Top-N of Gaussian components for channel models with different k values and the problem of uneven coverage of channel information, by proposing an audio event channel adaptation method based on self-matching Top-N Gaussian components.
The design principle of the invention is as follows. First, data preprocessing is performed, comprising quantization, sampling, pre-emphasis and windowing. Then feature extraction is carried out, i.e. the required low-level audio feature parameters are extracted. Next, feature vectors are generated, i.e. the extracted frame-feature sequence is compressed into segment vectors according to a segment length and segment shift. Feature mapping follows: the process of mapping channel-dependent feature vectors to channel-independent feature vectors, in which the feature mapping (FM) module is divided into FM training and FM application. Finally, model training and recognition are carried out.
The technical scheme of the invention is realized through the following steps:
Step 1: The preprocessing for audio recognition mainly comprises pre-emphasis, framing and windowing. Before feature extraction, the original speech signal is generally pre-emphasized with a first-order digital filter to boost the high-frequency part of the spectrum. Framing follows; frames may be cut with contiguous or overlapping segmentation, but overlapping segmentation is mostly used to guarantee smoothness and continuity between adjacent frames. Finally, windowing is applied to reduce the truncation effect of the speech frame and the slope of change at its two ends; a suitable window length must be chosen.
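The preprocessing chain of step 1 can be sketched in NumPy. The frame length, hop size, pre-emphasis coefficient and Hamming window below are common illustrative choices, not values stated in the patent.

```python
import numpy as np

def preprocess(signal, frame_len=400, hop=160, alpha=0.97):
    """Pre-emphasis, overlapping framing and windowing (step 1)."""
    # First-order pre-emphasis filter: y[n] = x[n] - alpha * x[n-1]
    emphasized = np.append(signal[0], signal[1:] - alpha * signal[:-1])
    # Overlapping segmentation into frames of frame_len samples, shifted by hop
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx]
    # Hamming window tapers frame edges to reduce the truncation effect
    return frames * np.hamming(frame_len)

sig = np.sin(2 * np.pi * 440 * np.arange(16000) / 16000)  # 1 s test tone
frames = preprocess(sig)
```

With a 16000-sample input, 400-sample frames and a 160-sample hop, this yields 98 windowed frames.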
Step 2: Speech features are extracted with MFCC. The time-domain signal is transformed with an FFT; the log energy spectrum is then passed through a triangular filter bank distributed on the Mel scale, the log energy of each filter output is computed, and a discrete cosine transform is applied to the filter-bank output vector.
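A compact MFCC pipeline along the lines of step 2 is sketched below, operating on windowed frames from the preprocessing stage. All parameter values (512-point FFT, 26 Mel filters, 13 cepstra, 16 kHz rate) are assumptions for illustration.

```python
import numpy as np

def mfcc(frames, sr=16000, n_fft=512, n_mels=26, n_ceps=13):
    """Frames -> FFT power spectrum -> Mel filterbank log energies -> DCT."""
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2           # (T, n_fft//2+1)
    # Triangular filters spaced evenly on the Mel scale
    mel = lambda f: 2595 * np.log10(1 + f / 700.0)
    imel = lambda m: 700 * (10 ** (m / 2595.0) - 1)
    pts = imel(np.linspace(mel(0), mel(sr / 2), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for i in range(n_mels):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising slope
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling slope
    logE = np.log(power @ fbank.T + 1e-10)                     # log filter energies
    # Type-II DCT decorrelates log energies into cepstral coefficients
    n = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_mels)))
    return logE @ dct.T

frames = np.random.randn(10, 400)
feats = mfcc(frames)
```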
Step 3: After feature parameter extraction, feature vectors are generated. The mean or variance of each dimension over N consecutive frame feature vectors is computed, extracting what the frame features have in common and weakening their differences; adjacent segments usually overlap by N-M frames to smooth the transitions.
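The segment-vector generation of step 3 can be sketched as follows, where `seg_len` plays the role of N and `seg_shift` of M (so consecutive segments overlap by N-M frames). The concrete values 10 and 5 are illustrative assumptions.

```python
import numpy as np

def segment_vectors(frame_feats, seg_len=10, seg_shift=5):
    """Compress a frame-feature sequence into overlapping segment vectors,
    summarizing each segment by its per-dimension mean and variance."""
    n = 1 + (len(frame_feats) - seg_len) // seg_shift
    segs = []
    for i in range(n):
        chunk = frame_feats[i * seg_shift : i * seg_shift + seg_len]
        segs.append(np.concatenate([chunk.mean(axis=0), chunk.var(axis=0)]))
    return np.stack(segs)

feats = np.random.randn(100, 13)   # 100 frames of 13-dim features
vecs = segment_vectors(feats)      # 19 segments of 26-dim (mean + var)
```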
Step 4: Feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule. Features from different channels are mapped onto the same channel-independent feature space, which solves the drop in recognition performance caused by inconsistent training and test conditions in practical audio event recognition systems. The concrete realization is:
Step 4.1: A channel-independent UBM model (w_i, u_i, δ_i) is trained with data from all channels, where w_i is the weight of the i-th Gaussian probability density function, u_i its mean and δ_i its variance.
Step 4.2: Corresponding training data are selected according to the specific channel situation; the training feature data of each channel are then used to adapt, one by one with the MAP method, a GMM for that particular channel, with (w_i^A, u_i^A, δ_i^A) denoting the GMM under channel condition A.
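The MAP adaptation of step 4.2 can be sketched in a simplified, mean-only form (full MAP adaptation also updates weights and variances). The relevance factor `r` and the toy two-component UBM are assumptions for illustration.

```python
import numpy as np

def map_adapt_means(ubm_w, ubm_mu, ubm_var, data, r=16.0):
    """Mean-only MAP adaptation of a diagonal-covariance UBM to one
    channel's data; r is the relevance factor."""
    # Posterior responsibility gamma[t, k] of component k for frame t
    d = data[:, None, :] - ubm_mu[None, :, :]
    logp = -0.5 * np.sum(d * d / ubm_var + np.log(2 * np.pi * ubm_var), axis=2)
    logp += np.log(ubm_w)
    logp -= logp.max(axis=1, keepdims=True)
    gamma = np.exp(logp)
    gamma /= gamma.sum(axis=1, keepdims=True)
    n_k = gamma.sum(axis=0)                                  # soft counts
    ex_k = gamma.T @ data / np.maximum(n_k, 1e-10)[:, None]  # 1st-order stats
    alpha = (n_k / (n_k + r))[:, None]                       # data/prior balance
    return alpha * ex_k + (1 - alpha) * ubm_mu               # interpolated means

rng = np.random.default_rng(0)
ubm_w = np.array([0.5, 0.5])
ubm_mu = np.array([[-2.0, 0.0], [2.0, 0.0]])
ubm_var = np.ones((2, 2))
data = rng.normal([-1.5, 0.3], 1.0, size=(500, 2))  # channel data near comp. 0
adapted = map_adapt_means(ubm_w, ubm_mu, ubm_var, data)
```

Components with many soft counts move toward the channel data, while sparsely observed components stay close to the UBM prior.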
Step 4.3: The channel model decision is made with the channel-dependent training and test feature vectors of the whole recognition system. The feature parameters of the input data are extracted first, and the channel the data belong to is then decided by the size of the log-likelihood; assume the data belong to channel A.
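The log-likelihood channel decision of step 4.3 reduces to scoring the input frames against each channel's GMM and keeping the best. The single-component toy GMMs below are assumed for illustration.

```python
import numpy as np

def gmm_loglik(x, w, mu, var):
    """Average per-frame log-likelihood of frames x under a diagonal GMM."""
    d = x[:, None, :] - mu[None, :, :]
    logp = -0.5 * np.sum(d * d / var + np.log(2 * np.pi * var), axis=2) + np.log(w)
    m = logp.max(axis=1, keepdims=True)            # stable log-sum-exp
    return float(np.mean(m.squeeze(1) + np.log(np.exp(logp - m).sum(axis=1))))

def decide_channel(x, channel_gmms):
    """Pick the channel whose GMM scores the input highest (step 4.3)."""
    scores = {name: gmm_loglik(x, *g) for name, g in channel_gmms.items()}
    return max(scores, key=scores.get)

rng = np.random.default_rng(1)
gmm_a = (np.array([1.0]), np.array([[0.0]]), np.array([[1.0]]))
gmm_b = (np.array([1.0]), np.array([[5.0]]), np.array([[1.0]]))
x = rng.normal(0.2, 1.0, size=(200, 1))   # data drawn near channel A's model
best = decide_channel(x, {"A": gmm_a, "B": gmm_b})
```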
Step 4.4: Feature transformation with the self-matching Top-N Gaussian-component weighted mapping rule. For each frame feature vector of the test data from channel A, the N highest-scoring Gaussian components N(u_k^A, δ_k^A) (N < M, k = 1, 2, ..., N) are selected from the M Gaussian components of the channel-A Gaussian mixture model. A score threshold ε (0 < ε < 1) is set, and the concrete value of N is obtained by self-matching against this threshold: when the summed scores of the top N Gaussian components reach ε, that N is taken as the number of components for the self-matching Top-N weighted mapping:

$$\sum_{k=1}^{N} \frac{w_k^A\, N(u_k^A, \delta_k^A)}{\sum_{i=1}^{M} w_i^A\, N(u_i^A, \delta_i^A)} \ge \varepsilon$$
After N is selected, the weight β_k corresponding to the variance δ_k^A and mean u_k^A of each Top-N Gaussian component in the feature transformation is computed one by one, and must satisfy

$$\beta_k = \frac{w_k^A\, N(u_k^A, \delta_k^A)}{\sum_{k=1}^{N} w_k^A\, N(u_k^A, \delta_k^A)}$$
Denoting by u_k^*, δ_k^*, u_k^{A*}, δ_k^{A*} the linearly weighted baseline means and variances of the UBM and of the GMM under channel condition A respectively, the self-matching Top-N Gaussian-component weighted feature mapping equation is obtained:

$$y = \left(x - \sum_{k=1}^{N}\beta_k u_k^A\right)\frac{\sum_{k=1}^{N}\beta_k \delta_k}{\sum_{k=1}^{N}\beta_k \delta_k^A} + \sum_{k=1}^{N}\beta_k u_k = \left(x - u_k^{A*}\right)\frac{\delta_k^*}{\delta_k^{A*}} + u_k^*$$
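The whole step-4.4 rule can be sketched end to end: score the frame against the channel-A components, take the smallest ranked prefix whose cumulative normalized score reaches ε, form the β-weighted statistics, and map from channel-A space back to UBM space. One assumption here: the spread parameters are treated as standard deviations so that the scale factor whitens and recolors properly, whereas the patent writes them as variances δ; the toy one-dimensional models are also invented for illustration.

```python
import numpy as np

def self_matching_topn_map(x, ubm, gmm_a, eps=0.99999):
    """Self-matching Top-N weighted feature mapping for one frame x.
    ubm and gmm_a are (weights, means, sigmas) with per-component std."""
    w_a, mu_a, sig_a = gmm_a
    w_u, mu_u, sig_u = ubm
    # Normalized weighted component likelihoods under the channel-A GMM
    d = (x - mu_a) / sig_a
    logp = np.log(w_a) - 0.5 * np.sum(d * d + np.log(2 * np.pi * sig_a**2), axis=1)
    p = np.exp(logp - logp.max())
    p /= p.sum()
    # Self-matching N: smallest prefix of ranked scores summing to >= eps
    order = np.argsort(p)[::-1]
    csum = np.cumsum(p[order])
    n = int(np.searchsorted(csum, eps) + 1)
    top = order[:min(n, len(order))]
    beta = p[top] / p[top].sum()                 # weights beta_k, sum to 1
    # Beta-weighted channel-A and UBM statistics, then map A -> UBM space
    mu_a_star = beta @ mu_a[top]
    sig_a_star = beta @ sig_a[top]
    mu_star = beta @ mu_u[top]
    sig_star = beta @ sig_u[top]
    return (x - mu_a_star) * sig_star / sig_a_star + mu_star

# Toy 1-D setup: channel A shifts both UBM component means by +1
ubm = (np.array([0.5, 0.5]), np.array([[-2.0], [2.0]]), np.array([[1.0], [1.0]]))
gmm_a = (np.array([0.5, 0.5]), np.array([[-1.0], [3.0]]), np.array([[1.0], [1.0]]))
y = self_matching_topn_map(np.array([3.1]), ubm, gmm_a)
```

Since both components are shifted by the same +1 and the β weights sum to one, the mapped frame 3.1 lands at 2.1, i.e. the channel shift is removed.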
Step 5: The audio event model is trained and recognition is performed with the channel-independent feature vectors.
Beneficial effects
Compared with the method that normalizes only against the highest-scoring baseline component, the invention does not discard the channel information contained in the remaining M-1 Gaussian components.
Compared with the Top-1 Gaussian-component feature mapping method and the fixed Top-N weighted feature mapping method, the invention has better applicability and channel adaptation performance, and can provide a better channel adaptation method for audio event recognition under the influence of coding differences in network transmission.
Brief description of the drawings
Fig. 1 is the block diagram of the audio event recognition system of the invention;
Fig. 2 shows the channel identification rate for different k values under three kinds of channel mismatch;
Fig. 3 shows the channel adaptation performance of the Top-1 and self-matching Top-N methods for different k values under mismatch 1;
Fig. 4 shows the channel adaptation performance of the Top-1 and self-matching Top-N methods for different k values under mismatch 2;
Fig. 5 shows the channel adaptation performance of the Top-1 and self-matching Top-N methods for different k values under mismatch 3.
Embodiments
To better illustrate the objectives and advantages of the invention, the embodiment of the method is described in further detail below with reference to examples.
The shot audio event data set is selected as input, and three tests are designed and deployed: (1) selection of baseline system parameters, with a channel-matched test and a channel-mismatch performance comparison; (2) channel adaptation performance test of the Top-1 Gaussian-component feature mapping method; (3) test of the self-matching Top-N Gaussian-component weighted feature mapping method.
The three test procedures are described one by one below. All tests were completed on the same computer, configured with an Intel dual-core CPU (2.93 GHz), 4.00 GB of memory and the Windows 7 operating system.
1. Baseline system channel-matched and channel-mismatch performance comparison
First, channel-matched data, i.e. the training and test data of one channel (for example channel 1), are used to test the recognition accuracy of the baseline system under matched conditions. Then channel-mismatched data are used, mainly covering three mismatch conditions: the training data of channel 1 with the test data of channels 2, 3 and 4 respectively, testing the recognition accuracy of the baseline system under each of the three mismatches. Considering the time complexity of the system, the recognition rate and the simplicity of operation, the baseline feature for the audio event recognition system test is chosen as 13-dimensional MFCC plus 2-dimensional energy, together with the first-order and second-order differences of these 15 dimensions, 45 dimensions in total.
2. Top-1 Gaussian-component feature mapping experiments
2.1 Channel adaptation performance test of the Top-1 Gaussian-component feature mapping method
Different k values are set first, k being the number of Gaussian components in the UBM-GMM channel model. For each k, the channel models are trained with UBM-GMM and the model decision is made; feature mapping is then carried out with the Top-1 Gaussian-component feature mapping method, and finally the shot data set is trained and recognized with Adaboost. The k values are 4, 8, 16, 32, 64, 128, 256, 512 and 1024; Fig. 2 gives the system channel recognition rate for the different k values under the three kinds of channel mismatch.
2.2 Channel information scores of different Top-N Gaussian components and the corresponding channel adaptation performance test
A test file under channel 2 is taken first; its extracted features form a sequence of frame data {x1, x2, ..., xn}. After the correct channel decision, the probability output score of each Gaussian component under the channel-2 model is computed for the first ten frames, including the six highest-scoring probability outputs, as shown in Table 1, with the number of Gaussian components k = 64.
Table 1: scores of each Gaussian component of the channel model for the test frame data
With k = 64, the recognition performance of the system under the Top-1 to Top-6 Gaussian-component weighted mappings is tested under mismatch condition 1; the results are shown in Table 2.
Table 2: channel adaptation performance of the different Top-N Gaussian-component methods under mismatch 1 with the same k value
3. Self-matching Top-N Gaussian-component weighted feature mapping experiment
Under the three channel-mismatch conditions of experiment 1, the baseline system applies the self-matching Top-N Gaussian-component weighted feature mapping method to test the channel-mismatch adaptation performance of the channel models for different k values. The parameter configuration of the baseline system follows the Top-1 Gaussian-component feature mapping experiments, and the result is then compared with the channel adaptation performance of the Top-1 method. The self-matching Top-N weighted feature mapping method uses the score-threshold rule to self-match each frame of feature data and obtain the corresponding number N of feature mapping Gaussian components. The experimental threshold is set to ε = 0.99999.
Test results
For test (1), the baseline system generally has good recognition performance under channel-matched conditions, while under every kind of channel mismatch it is strongly affected and its recognition performance drops sharply; this demonstrates the necessity of channel-mismatch adaptation.
For test (2), when k is 4, 8, 16 or 32 the recognition accuracy of the system rises, but at k = 64 the accuracy begins to fall; the main reason is that the training samples are relatively few, so the models built with higher k values are not accurate enough. Overall, the channel compensation effect of the Top-1 Gaussian-component feature mapping method is fairly good, and for suitable k values it can even meet or exceed the recognition accuracy of the system under channel-matched conditions.
The fixed Top-N Gaussian-component weighted feature mapping method has somewhat better channel adaptation performance than the Top-1 method, because the distribution of a frame of data in feature space is usually determined jointly by several Gaussian components. Although several Gaussian components cover wider channel information, as k grows the output scores of a fixed Top-N of components shrink and so does the channel information they contain, and a fixed choice of N cannot adapt well to channel models with different k values. The self-matching Top-N weighted feature mapping method avoids these problems while keeping comparable channel compensation ability.
For test (3), the self-matching Top-N Gaussian-component weighted feature mapping method solves the Top-N selection problem for the number of Gaussian components under different channel models, and obtains an average improvement of 2.0% in segment F-measure and 1.36% in duration F-measure, achieving better channel adaptation performance than both the Top-1 and the fixed Top-N weighted feature mapping methods.
The invention proposes an audio event channel adaptation method based on self-matching Top-N Gaussian components. In audio event recognition under channel mismatch, the self-matching Top-N weighted feature mapping method solves the problem of how to select the number Top-N of Gaussian components for channel models with different k values and the problem of uneven coverage of channel information. Its applicability and channel adaptation performance are better than those of the Top-1 and fixed Top-N weighted feature mapping methods, and it can provide a better channel adaptation method for audio event recognition under the influence of coding differences in network transmission.

Claims (5)

1. A self-matching Top-N channel adaptation method for audio event recognition, characterized in that the method comprises the following steps:
Step 1: the preprocessing for audio recognition mainly comprises pre-emphasis, framing and windowing; before feature extraction the original speech signal is generally pre-emphasized with a first-order digital filter to boost the high-frequency part of the spectrum; framing follows, using contiguous or overlapping segmentation, mostly overlapping segmentation to guarantee smoothness and continuity between adjacent frames; finally windowing is applied to reduce the truncation effect of the speech frame and the slope of change at its two ends, choosing a suitable window length;
Step 2: speech features are extracted with MFCC; the time-domain signal is transformed with an FFT, the log energy spectrum is passed through a triangular filter bank distributed on the Mel scale, the log energy of each filter output is computed, and a discrete cosine transform is applied to the filter-bank output vector;
Step 3: after feature parameter extraction, feature vectors are generated; the mean or variance of each dimension over N consecutive frame feature vectors is computed, extracting what the frame features have in common and weakening their differences, with adjacent segments usually overlapping by N-M frames to smooth the transitions;
Step 4: feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule; features from different channels are mapped onto the same channel-independent feature space, solving the drop in recognition performance caused by inconsistent training and test conditions in practical audio event recognition systems;
Step 5: the audio event model is trained and recognition is performed with the channel-independent feature vectors.
2. The feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule according to claim 1, characterized in that: a channel-independent UBM model (w_i, u_i, δ_i) is trained with data from all channels, where w_i is the weight of the i-th Gaussian probability density function, u_i its mean and δ_i its variance.
3. The feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule according to claim 1, characterized in that: corresponding training data are selected according to the specific channel situation, and the training feature data of each channel are then used to adapt, one by one with the MAP method, a GMM for that particular channel, with (w_i^A, u_i^A, δ_i^A) denoting the GMM under channel condition A.
4. The feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule according to claim 1, characterized in that: the channel model decision is made with the channel-dependent training and test feature vectors of the whole recognition system; the feature parameters of the input data are extracted first, and the channel the data belong to is then decided by the size of the log-likelihood; assume the data belong to channel A.
5. The feature mapping based on the self-matching Top-N Gaussian-component weighted mapping rule according to claim 1, characterized in that: feature transformation uses the self-matching Top-N Gaussian-component weighted mapping rule; for each frame feature vector of the test data from channel A, the N highest-scoring Gaussian components N(u_k^A, δ_k^A) (N < M, k = 1, 2, ..., N) are selected from the M Gaussian components of the channel-A Gaussian mixture model; a score threshold ε (0 < ε < 1) is set, and the concrete value of N is obtained by self-matching against the threshold: when the summed scores of the top N Gaussian components reach ε, that N is taken as the number of components for the self-matching Top-N weighted mapping:
$$\sum_{k=1}^{N} \frac{w_k^A\, N(u_k^A, \delta_k^A)}{\sum_{i=1}^{M} w_i^A\, N(u_i^A, \delta_i^A)} \ge \varepsilon$$
After N is selected, the weight β_k corresponding to the variance δ_k^A and mean u_k^A of each Top-N Gaussian component in the feature transformation is computed one by one, and must satisfy
$$\beta_k = \frac{w_k^A\, N(u_k^A, \delta_k^A)}{\sum_{k=1}^{N} w_k^A\, N(u_k^A, \delta_k^A)}$$
Denoting by u_k^*, δ_k^*, u_k^{A*}, δ_k^{A*} the linearly weighted baseline means and variances of the UBM and of the GMM under channel condition A respectively, the self-matching Top-N Gaussian-component weighted feature mapping equation is obtained:
$$y = \left(x - \sum_{k=1}^{N}\beta_k u_k^A\right)\frac{\sum_{k=1}^{N}\beta_k \delta_k}{\sum_{k=1}^{N}\beta_k \delta_k^A} + \sum_{k=1}^{N}\beta_k u_k = \left(x - u_k^{A*}\right)\frac{\delta_k^*}{\delta_k^{A*}} + u_k^*$$
CN201710334633.XA 2017-05-12 2017-05-12 Self-matching Top-N channel-adaptive method for audio event recognition Pending CN107123432A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710334633.XA CN107123432A (en) 2017-05-12 2017-05-12 Self-matching Top-N channel-adaptive method for audio event recognition


Publications (1)

Publication Number Publication Date
CN107123432A true CN107123432A (en) 2017-09-01

Family

ID=59728248

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710334633.XA Pending CN107123432A (en) 2017-05-12 2017-05-12 Self-matching Top-N channel-adaptive method for audio event recognition

Country Status (1)

Country Link
CN (1) CN107123432A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108417201A (en) * 2018-01-19 2018-08-17 AISpeech Co., Ltd. (Suzhou) Single-channel multi-speaker identity recognition method and system
CN109599118A (en) * 2019-01-24 2019-04-09 Ningbo University Robust voice playback detection method
CN110120230A (en) * 2019-01-08 2019-08-13 National Computer Network and Information Security Administration Center Acoustic event detection method and device
CN111210809A (en) * 2018-11-22 2020-05-29 Alibaba Group Holding Ltd. Voice training data adaptation method and device, voice data conversion method, and electronic device
CN111602410A (en) * 2018-02-27 2020-08-28 Omron Corporation Suitability determination device, suitability determination method, and program
CN112489678A (en) * 2020-11-13 2021-03-12 Suning Cloud Computing Co., Ltd. Scene recognition method and device based on channel characteristics
CN112820318A (en) * 2020-12-31 2021-05-18 Xi'an Hepu Acoustic Technology Co., Ltd. GMM-UBM-based impact sound modeling and impact sound detection method and system
CN117373488A (en) * 2023-12-08 2024-01-09 Fortemedia (Nanjing) Co., Ltd. Real-time audio scene recognition system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
LÜ YING: "Research on Channel Adaptation Methods for Audio Event Recognition", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108417201A (en) * 2018-01-19 2018-08-17 AISpeech Co., Ltd. (Suzhou) Single-channel multi-speaker identity recognition method and system
CN111602410B (en) * 2018-02-27 2022-04-19 Omron Corporation Suitability determination device, suitability determination method, and storage medium
CN111602410A (en) * 2018-02-27 2020-08-28 Omron Corporation Suitability determination device, suitability determination method, and program
CN111210809A (en) * 2018-11-22 2020-05-29 Alibaba Group Holding Ltd. Voice training data adaptation method and device, voice data conversion method, and electronic device
CN111210809B (en) * 2018-11-22 2024-03-19 Alibaba Group Holding Ltd. Voice training data adaptation method and device, voice data conversion method, and electronic device
CN110120230A (en) * 2019-01-08 2019-08-13 National Computer Network and Information Security Administration Center Acoustic event detection method and device
CN110120230B (en) * 2019-01-08 2021-06-01 National Computer Network and Information Security Administration Center Acoustic event detection method and device
CN109599118A (en) * 2019-01-24 2019-04-09 Ningbo University Robust voice playback detection method
CN112489678B (en) * 2020-11-13 2023-12-05 Shenzhen Yunwang Wandian Technology Co., Ltd. Scene recognition method and device based on channel characteristics
CN112489678A (en) * 2020-11-13 2021-03-12 Suning Cloud Computing Co., Ltd. Scene recognition method and device based on channel characteristics
CN112820318A (en) * 2020-12-31 2021-05-18 Xi'an Hepu Acoustic Technology Co., Ltd. GMM-UBM-based impact sound modeling and impact sound detection method and system
CN117373488A (en) * 2023-12-08 2024-01-09 Fortemedia (Nanjing) Co., Ltd. Real-time audio scene recognition system
CN117373488B (en) * 2023-12-08 2024-02-13 Fortemedia (Nanjing) Co., Ltd. Real-time audio scene recognition system

Similar Documents

Publication Publication Date Title
CN107123432A (en) Self-matching Top-N channel-adaptive method for audio event recognition
CN103117059B (en) Speech signal feature extraction method based on tensor decomposition
CN103310789B (en) Sound event recognition method based on improved parallel model combination
US20030236661A1 (en) System and method for noise-robust feature extraction
CN106952643A (en) Recording device clustering method based on Gaussian mean supervector and spectral clustering
CN101853661B (en) Noise spectrum estimation and voice activity detection method based on unsupervised learning
EP2662854A1 (en) Method and device for detecting fundamental tone
CN105206270A (en) Isolated digit speech recognition classification system and method combining principal component analysis (PCA) with restricted Boltzmann machine (RBM)
CN103794207A (en) Dual-mode voice identity recognition method
CN102789779A (en) Speech recognition system and recognition method thereof
CN104978507A (en) Intelligent well logging evaluation expert system identity authentication method based on voiceprint recognition
CN101640043A (en) Speaker recognition method based on multi-coordinate sequence kernel and system thereof
Mallidi et al. Novel neural network based fusion for multistream ASR
CN109378014A (en) Mobile device source identification method and system based on convolutional neural networks
CN113327626A (en) Voice noise reduction method, device, equipment and storage medium
Zhang et al. An efficient perceptual hashing based on improved spectral entropy for speech authentication
Hassan et al. Pattern classification in recognizing Qalqalah Kubra pronuncation using multilayer perceptrons
CN115758082A (en) Fault diagnosis method for rail transit transformer
CN106297768B (en) Speech recognition method
CN106941007A (en) Composite channel adaptation method for audio event models
CN112151067B (en) Digital audio tampering passive detection method based on convolutional neural network
CN104392719A (en) Center sub-band model adaptation method for voice recognition system
CN110808067A (en) Low signal-to-noise ratio sound event detection method based on binary multiband energy distribution
CN107919136B (en) Digital voice sampling frequency estimation method based on Gaussian mixture model
CN116312628A (en) False audio detection method and system based on self knowledge distillation

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170901

WD01 Invention patent application deemed withdrawn after publication