CN106297805A - A speaker recognition method based on breathing characteristics - Google Patents
A speaker recognition method based on breathing characteristics
- Publication number
- CN106297805A CN106297805A CN201610626034.0A CN201610626034A CN106297805A CN 106297805 A CN106297805 A CN 106297805A CN 201610626034 A CN201610626034 A CN 201610626034A CN 106297805 A CN106297805 A CN 106297805A
- Authority
- CN
- China
- Prior art keywords
- breathing
- unknown
- speaker
- frame
- sound bite
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification
- G10L17/06—Decision making techniques; Pattern matching strategies
Abstract
The invention discloses a speaker recognition method based on breathing characteristics. The method mainly comprises: inputting an unknown speech segment; extracting the breath sounds in the unknown speech segment by means of a breathing template built from Mel-frequency cepstral coefficients (MFCC), together with the zero-crossing rate (ZCR) and the short-time energy (E); then using a boundary-detection algorithm that eliminates false valleys to reject the false-positive portions of the breath sounds, obtaining cleanly separated breath sounds; and finally using the cleanly separated breath sounds to determine whether the speaker of the unknown speech segment comes from a set of sample speakers, or whether the speaker of the unknown speech segment is a legitimate speaker. The invention is the first to focus on and study the uniqueness of human breathing and to apply it effectively in a speaker recognition system, overcoming the two major challenges facing the development of breathing-based speaker recognition technology: breath-signal extraction and breath-signal processing. The speaker recognition system provided by the invention is therefore simple and efficient, and its recognition results are accurate and reliable.
Description
Technical field
The present invention relates to a method of contactless biometric signal acquisition, and in particular to a speaker recognition method based on breathing characteristics.
Background technology
Speaker recognition is a class of fundamental problems, subdivided into two sub-problems: speaker identification and speaker verification. The former determines whether an unknown speaker is a member of a known speaker sample database; the latter confirms whether a claimed speaker identity is legitimate. Recognizing a speaker is divided into a training stage and a testing stage: the training stage builds the speaker feature templates, while the testing stage computes the similarity between the test data and the feature templates and produces a decision. According to the degree of dependence on the speech text, speaker recognition is further divided into text-dependent (effective only for certain specific texts), text-independent (effective for any text) and text-prompted (effective for a designated text set). Although speech features can be weakened by the microphone or the channel, affected by health and emotion, and even imitated, speech processing technology has developed rapidly in recent years and many real-time applications have appeared, so speech processing problems have received increasing attention and research.
Existing speaker recognition schemes extract feature vectors based on the Source-Filter model, on the Source-System model, or on both simultaneously. Excitation-source information can be represented by linear prediction of the residual samples of the glottal waveform. Vocal-tract information can be captured by the cepstral signal. Prosodic information can be obtained from the statistics of duration, pitch and the time dynamics of energy. Breathing, which is aerodynamic, is one of the energy sources of sound production and can be extracted and processed as a complete segment of speech. Existing research has been devoted to detecting and removing breath signals from speech, in order to improve sound quality, improve speech-to-text conversion algorithms, train typists, identify psychological states, and so on.
Source-Filter theory regards speech as the response of the vocal-tract system and gives a good approximation of the nonlinear, time-varying speech signal. The "source" refers to four kinds of source excitation signals: the aspiration source, the frication source, the glottal (phonation) source and the transient source. The vocal tract acts like a filter: its input is produced by the four source signals above, and its output forms vowels, consonants or arbitrary speech. The vocal tract also governs pitch production, voice quality, harmonics, resonance characteristics, the radiation response, and so on.
In the source/system model, speech is modeled as a linear, slowly varying discrete-time system, excited by the random noise of an unvoiced speech source or the quasi-periodic pulses of a voiced speech source. The source contains error-prone speech features such as pitch, so the source model is rarely used in speaker recognition and is rarely reinforced by other features. In contrast, the system model corresponds to the smooth power spectral envelope, which is obtained by linear prediction or Mel filterbank analysis; this model is therefore widely used in cepstral-coefficient-based speaker recognition systems.
Both models treat breathing merely as part of the speech source, converting it into the speech of the voiced source or the noise of the unvoiced source. In fact, breathing is a mechanism by which energy is converted into sound. Moreover, breathing within speech is constrained: usually the exhalation time is longer than the inhalation time, whereas for the non-speech breathing of everyday life the exhalation and inhalation times are roughly equal.
The respiratory system comprises the lungs, the diaphragm, the intercostal muscles, and the respiratory channel formed by the bronchi, trachea, larynx, vocal tract and oral cavity. We regard breathing as the physiological fingerprint of the whole respiratory system, governed by intrapulmonary pressure, air flow and muscular movement. During inhalation the respiratory muscles contract, the intrapulmonary pressure decreases, and air flows from outside into the lungs. Similarly, during exhalation the intrapulmonary pressure increases, the space inside the lungs is compressed, and the air in the lungs is exhaled. According to anatomical principles, there must be a silent interval before and after each breath. Breathing is affected by factors such as age and sex; a breath normally lasts 100–400 milliseconds, and the silent gap lasts 20 milliseconds or more. The silent gap is the key to delimiting and separating breaths.
A breath is the joint result of the lungs, intrapulmonary pressure, diaphragm, vocal tract, trachea and respiratory muscles, and is a physiological fingerprint in the sense of the respiratory system. The flow of air is not completed instantaneously, so there is a silent gap (≥ 20 milliseconds) before and after each breath. Compared with an ordinary speech signal (containing no breathing), a breath signal is weak in energy, short in duration (100–400 milliseconds) and low in frequency of occurrence (12–18 breaths/min), and it overlaps with non-breath speech signals at low frequencies (100 Hz–1 kHz). Moreover, breath sounds are highly similar to aspirated phonemes and fricative consonants, such as /tʃ/ in "church" and /ʒ/ in "vision". The development of breathing-based speaker recognition technology therefore faces the two major challenges of breath-signal extraction and breath-signal processing, which is why breathing has not been exploited in speaker recognition technology and is usually rejected as breath noise.
Summary of the invention
The object of the present invention is as follows: in view of the fact that, in the prior art described above, breathing cannot be used effectively in speaker recognition technology, and that the development of breathing-based speaker recognition technology faces the two major challenges of breath-signal extraction and breath-signal processing, the present invention provides a speaker recognition method based on breathing characteristics.
The technical solution used in the present invention is as follows:
A speaker recognition method based on breathing characteristics, characterized by comprising the following steps:
Step 1: input a breath sample set, divide the breath sample set into frames to obtain breath frames, build a breathing template from the breath frames by means of Mel-frequency cepstral coefficients (MFCC), compute the similarity between each breath frame of the breath sample set and the breathing template, and take the minimum similarity Bm;
Step 2: input an unknown speech segment, divide the unknown speech segment into frames to obtain unknown speech frames, and compute the similarity between each unknown speech frame and the breathing template; compute the zero-crossing rate ZCR and the short-time energy E of the unknown speech segment; according to the similarity between the unknown speech segment and the breathing template, Bm, the zero-crossing rate ZCR of the unknown speech segment and the short-time energy E of the unknown speech segment, filter out the breath sounds in the unknown speech segment, the filtered-out breath sounds forming the preliminarily separated breath sounds;
Step 3: use a boundary-detection algorithm that eliminates false valleys to detect the silent gaps of the preliminarily separated breath sounds, and reject the false-positive portions of the preliminarily separated breath sounds according to the silent gaps, obtaining the cleanly separated breath sounds;
Step 4: choose a group of sample speakers, collect a breathing segment from each sample speaker, and build a speaker sample database; if it is necessary to judge whether the speaker of the unknown speech segment comes from the sample speakers, go to step 5; if it is necessary to judge whether the speaker of the unknown speech segment is a legitimate speaker, go to step 6;
Step 5: compute the similarity between the cleanly separated breath sounds of said unknown speech segment and each speaker's breath sample in the speaker sample database, take the sample speaker corresponding to the maximum similarity as the speaker of the unknown speech segment, and end;
Step 6: collect test samples from each sample speaker, and choose one test sample;
Step 7: compute the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database, and take the maximum of these similarities, obtaining one maximum similarity;
Step 8: choose another test sample and repeat step 7 until the maximum similarity corresponding to every test sample has been obtained, yielding a maximum-similarity group;
Step 9: collect a speech segment of the legitimate speaker, use the breath sample set to extract the breathing segment of the legitimate speaker, and compute the similarity between the cleanly separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker;
Step 10: if the similarity between the cleanly separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker is greater than the minimum of the maximum-similarity group, identify the speaker of the unknown speech segment as the legitimate speaker; otherwise, identify the speaker as illegitimate.
In the above scheme, said step 1 comprises the following steps:
Step 1.1: input the breath sample set and divide it into breath frames of length 100 milliseconds; divide each breath frame in turn into continuous, mutually overlapping breath subframes, each breath subframe being 10 ms long with an overlap of 5 ms between adjacent breath subframes;
Step 1.2: use a first-order difference filter to pre-emphasize each breath subframe, obtaining the pre-emphasized breath subframes; the first-order difference filter H is
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the signal sample data;
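In the time domain, the first-order difference filter H(z) = 1 − αz⁻¹ amounts to y[n] = x[n] − α·x[n−1]. A minimal sketch of the pre-emphasis of step 1.2:

```python
import numpy as np

def preemphasize(x, alpha=0.095):
    """First-order difference filter H(z) = 1 - alpha * z^-1:
    y[n] = x[n] - alpha * x[n-1], with x[-1] taken as 0."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y
```

Each 10 ms breath subframe would be passed through this filter before the MFCC computation of step 1.3.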
Step 1.3: compute the MFCC of each pre-emphasized breath subframe of each breath frame, obtaining the short-time cepstrum matrix of each breath frame; remove the DC component from every column of each breath frame's short-time cepstrum matrix, obtaining the MFCC cepstrum matrix of each breath frame;
Step 1.4: compute the mean matrix T of the breath sample set:
T = (1/N) · Σ_{i=1}^{N} M(Xi)
where N is the number of breath frames in the breath sample set and M(Xi) is the MFCC cepstrum matrix of the i-th breath frame, i ∈ [1, 2, …, N];
compute the variance matrix V of the breath sample set:
V = (1/N) · Σ_{i=1}^{N} (M(Xi) − T)²   (element-wise)
Step 1.5: concatenate the MFCC cepstrum matrices of all breath frames into one large matrix Mb: Mb = [M(X1), …, M(Xi), M(Xi+1), …, M(XN)];
perform singular value decomposition on the large matrix:
Mb = U Σ V*
where U is an m × m unitary matrix, Σ is a positive semi-definite m × n diagonal matrix, and V* is the conjugate transpose of V, an n × n unitary matrix; the elements {λ1, λ2, λ3, …} on the diagonal of Σ are the singular values of Mb, yielding the singular value vector {λ1, λ2, λ3, …};
normalize said singular value vector by the maximum singular value λm, obtaining the final normalized singular value vector S = {λ1/λm, λ2/λm, λ3/λm, …}, where λm = max{λ1, λ2, λ3, …};
Step 1.6: obtain a breathing template, said breathing template comprising the normalized singular value vector S, the variance matrix V of the breath sample set and the mean matrix T of the breath sample set.
In the above scheme, said step 2 comprises the following steps:
Step 2.1: input the unknown speech segment and divide it into frames, obtaining unknown speech frames and unknown speech subframes; compute the similarity B(X, T, V, S) between each unknown speech frame and the breathing template; compute the similarity between each breath frame of the breath sample set and the breathing template, taking the minimum similarity as Bm;
compute the short-time energy E of each unknown speech frame:
E = Σ_{n=N0}^{N0+N−1} x[n]²
where n indexes the n-th sample of the signal, x[n] is the n-th speech sample, N is the window length of the sample, and N0 is the sample index at which the window starts;
compute the mean short-time energy Ē of all unknown speech frames;
compute the zero-crossing rate ZCR of the unknown speech segment:
ZCR = (1/(2N)) · Σ_{n=N0+1}^{N0+N−1} |sgn(x[n]) − sgn(x[n−1])|
where n, x[n], N and N0 are as above;
Step 2.2: choose one unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) between the chosen unknown speech frame and the breathing template is greater than the threshold Bm/2, the zero-crossing rate ZCR of the unknown speech segment is less than 0.25, and the short-time energy E of the chosen unknown speech frame is less than the mean Ē of all unknown speech frames, judge the chosen unknown speech frame to be a breath sound; if these conditions are not satisfied, judge the chosen unknown speech frame to be a non-breath sound;
Step 2.4: choose the other unknown speech frames and repeat step 2.3 until every unknown speech frame in the unknown speech segment has been judged;
Step 2.5: keep the breath sounds and reject the non-breath sounds, obtaining the preliminarily separated breath sounds.
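The three-way test of step 2.3 reduces to a simple predicate. A sketch, where the arguments stand for the quantities defined above (the frame's template similarity, Bm, the segment ZCR, the frame energy and the mean frame energy):

```python
def is_breath_frame(b_sim, bm, zcr, energy, mean_energy):
    """Step 2.3 decision rule: a frame is judged a breath sound iff its
    template similarity exceeds Bm/2, the segment ZCR is below 0.25,
    and its short-time energy is below the mean frame energy."""
    return b_sim > bm / 2 and zcr < 0.25 and energy < mean_energy
```

Frames for which the predicate is true are kept (step 2.5) as the preliminarily separated breath sounds.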
In the above scheme, the method in said step 2.1 of computing the similarity between a breath frame or unknown speech frame and the breathing template comprises the following steps:
Step 2.1.1: input the breath sample set or the unknown speech segment and divide it into breath frames or unknown speech frames of length 100 milliseconds; divide each breath frame or unknown speech frame in turn into continuous, mutually overlapping breath subframes or unknown speech subframes, each breath subframe or unknown speech subframe being 10 ms long with an overlap of 5 ms between adjacent subframes;
Step 2.1.2: use a first-order difference filter to pre-emphasize each breath subframe or unknown speech subframe, obtaining the pre-emphasized breath frames or unknown speech frames; the first-order difference filter H is
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the signal sample data;
Step 2.1.3: compute the MFCC of each pre-emphasized breath subframe or unknown speech subframe of each breath frame or unknown speech frame, obtaining the short-time cepstrum matrix of each breath frame or unknown speech frame; remove the DC component from every column of the short-time cepstrum matrix, obtaining the MFCC cepstrum matrix M(X) of each breath frame or unknown speech frame;
Step 2.1.4: choose one breath frame or unknown speech frame X;
Step 2.1.5: compute the normalized difference matrix D of the chosen breath frame or unknown speech frame:
D = (M(X) − T) / V   (element-wise)
where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstrum matrix of the chosen breath frame or unknown speech frame;
Step 2.1.6: multiply every column of D by a half Hamming window so that the low-frequency cepstral coefficients are strengthened:
D(:, j) = D(:, j) · hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters in each breath subframe or unknown speech subframe, i.e. the number of columns of D, and hamming is the Hamming window.
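Step 2.1.6's column weighting can be sketched as follows. Here "half Hamming window" is read as the falling half of a Hamming window over the row (coefficient) index, which keeps the low-order cepstral coefficients strong and attenuates the high-order ones; this reading is an assumption about the patent's intent:

```python
import numpy as np

def weight_low_coefficients(D):
    """Multiply every column of D by the falling half of a Hamming window
    along the row index, so low-order cepstral coefficients are emphasised."""
    rows = D.shape[0]
    w = np.hamming(2 * rows)[rows:]   # ~1 at row 0, decaying toward the last row
    return D * w[:, None]             # broadcast the weight over all columns
```

The weighted D then feeds the similarity components Cp and Cn of step 2.1.7.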
Step 2.1.7: compute the component Cp of the similarity B(X, T, V, S) between the chosen breath frame or unknown speech frame X and the breathing template,
where n is the number of breath subframes or unknown speech subframes in the chosen breath frame or unknown speech frame X, k ∈ [1, n], and the operand is the j-th MFCC parameter of the k-th breath subframe or unknown speech subframe of X;
compute the other component Cn of the similarity B(X, T, V, S) between the chosen breath frame or unknown speech frame X and the breathing template;
Step 2.1.8: compute the similarity between the breath frame or unknown speech frame X and the breathing template:
B(X, T, V, S) = Cp · Cn;
Step 2.1.9: choose the MFCC cepstrum matrix of another breath frame or unknown speech frame and repeat steps 2.1.5–2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarity between every breath frame or unknown speech frame and the breathing template has been obtained.
In the above scheme, the boundary-detection algorithm that eliminates false valleys in said step 3 uses a breath-duration threshold, an energy threshold, upper and lower zero-crossing-rate (ZCR) thresholds and the spectral slope to locate the breath boundaries accurately, and said step 3 uses a binary 0-1 indicator to mark accurately the positions of breathing within the current speech segment.
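The refinement of step 3 can be sketched as a rule that keeps a candidate breath segment only if its duration, energy and ZCR fall inside the expected bands, emitting the binary 0-1 indicator mentioned above. All threshold values below are illustrative assumptions, not the patent's:

```python
def passes_breath_boundaries(duration_ms, energy, zcr,
                             min_dur=100, max_dur=400,
                             max_energy=0.5, zcr_low=0.05, zcr_high=0.25):
    """Step 3 style check: a candidate breath must last 100-400 ms,
    have low energy, and a zero-crossing rate inside the expected band."""
    return (min_dur <= duration_ms <= max_dur
            and energy < max_energy
            and zcr_low <= zcr <= zcr_high)

def breath_mask(candidates):
    """Binary 0-1 indicator over (duration_ms, energy, zcr) candidates."""
    return [1 if passes_breath_boundaries(*c) else 0 for c in candidates]
```

Candidates failing the check are the "false positives" rejected before the cleanly separated breath sounds are formed.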
In the above scheme, computing in said step 5 the similarity between the cleanly separated breath sounds of the unknown speech segment and a speaker's breath sample in the speaker sample database comprises the following steps:
Step 5.1: let the MFCC feature vector of the sample in the breath sample database be (a1, a2, …, an); compute the mean matrix M of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
M = (1/n) · Σ_{i=1}^{n} ai
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the speaker's breath sample in the speaker sample database and n is the number of MFCC cepstrum matrices in that feature vector, i ∈ [1, 2, …, n];
compute the variance matrix V of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
V = (1/n) · Σ_{i=1}^{n} (ai − M)²   (element-wise)
Step 5.2: compute the MFCC feature vectors of all the cleanly separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstrum matrix of the i-th cleanly separated breath sound of the unknown speech segment;
Step 5.3: normalize the feature vector (a1, a2, …, an) of the speaker's breath sample in the speaker sample database:
Sa_k(i, j) = (a_k(i, j) − M(i, j)) / V(i, j)
where r and c are the numbers of rows and columns of Sa_k, Sa_k is the normalized difference matrix of a_k, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.4: sort (Sa1, Sa2, …, San) in ascending order, obtaining (S1, S2, …, Sn);
Step 5.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all the cleanly separated breath sounds of the unknown speech segment in the same way:
Sb_k(i, j) = (b_k(i, j) − M(i, j)) / V(i, j)
where r and c are the numbers of rows and columns of Sb_k, Sb_k is the normalized difference matrix of b_k, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.6: compute the similarity degree Pk between b_k and the reference template: compare Sb_k one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements of the ordered vector smaller than Sb_k divided by the total number of elements; compute the mean of the Pk, obtaining the similarity between the cleanly separated breath sounds of the unknown speech segment and the sample in the breath sample database.
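Steps 5.4–5.6 amount to ranking each test statistic against the sorted reference statistics, i.e. an empirical-percentile similarity. A sketch, under the assumption that each Sa_k and Sb_k has been reduced to a scalar statistic before the comparison:

```python
import bisect

def percentile_similarity(sb_values, sa_values):
    """Steps 5.4-5.6: sort the reference statistics ascending; for each test
    statistic Sb_k, Pk is the fraction of reference elements smaller than it;
    the overall similarity is the mean of the Pk."""
    ordered = sorted(sa_values)
    n = len(ordered)
    pks = [bisect.bisect_left(ordered, sb) / n for sb in sb_values]
    return sum(pks) / len(pks)
```

A test statistic that falls in the middle of the reference distribution thus scores around 0.5, while outliers score near 0 or 1.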
In the above scheme, computing in said step 9 the similarity between the cleanly separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker comprises the following steps:
Step 9.1: let the MFCC feature vector of the breathing segment of the legitimate speaker be (a1, a2, …, an); compute the mean matrix M of the MFCC feature vector of the breathing segment of the legitimate speaker:
M = (1/n) · Σ_{i=1}^{n} ai
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the breathing segment of the legitimate speaker and n is the number of MFCC cepstrum matrices in that feature vector, i ∈ [1, 2, …, n];
compute the variance matrix V of the MFCC feature vector of the breathing segment of the legitimate speaker:
V = (1/n) · Σ_{i=1}^{n} (ai − M)²   (element-wise)
Step 9.2: compute the MFCC feature vectors of all the cleanly separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstrum matrix of the i-th cleanly separated breath sound of the unknown speech segment;
Step 9.3: normalize the feature vector (a1, a2, …, an) of the breathing segment of the legitimate speaker:
Sa_k(i, j) = (a_k(i, j) − M(i, j)) / V(i, j)
where r and c are the numbers of rows and columns of Sa_k, Sa_k is the normalized difference matrix of a_k, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 9.4: sort (Sa1, Sa2, …, San) in ascending order, obtaining (S1, S2, …, Sn);
Step 9.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all the cleanly separated breath sounds of the unknown speech segment in the same way:
Sb_k(i, j) = (b_k(i, j) − M(i, j)) / V(i, j)
where r and c are the numbers of rows and columns of Sb_k, Sb_k is the normalized difference matrix of b_k, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 9.6: compute the similarity degree Pk between b_k and the reference template: compare Sb_k one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements of the ordered vector smaller than Sb_k divided by the total number of elements; compute the mean of the Pk, obtaining the similarity between the cleanly separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker.
In the above scheme, the method in said step 7 of computing the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database is the same as the method in step 5 of computing the similarity between the cleanly separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database.
In the above scheme, the method of computing the MFCC in said steps 1.3 and 5.2 comprises: applying a fast Fourier transform to the signal whose MFCC is to be computed, then computing the complex sinusoid coefficients, and finally producing the output through a Mel-scale filterbank.
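The MFCC computation just described (FFT, then a Mel-scale filterbank) can be sketched as follows. The triangular filterbank construction and the final DCT are the standard MFCC recipe, assumed here rather than taken from the patent's exact description:

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the Mel scale, over FFT bins."""
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                       # rising edge
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                       # falling edge
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc_frame(x, sr, n_filters=20, n_ceps=13):
    """One frame: FFT -> power spectrum -> Mel filterbank -> log -> DCT."""
    spec = np.abs(np.fft.rfft(x)) ** 2
    fb = mel_filterbank(n_filters, len(x), sr)
    logmel = np.log(fb @ spec + 1e-10)
    n = np.arange(n_filters)                        # type-II DCT basis
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), (2 * n + 1) / (2 * n_filters)))
    return dct @ logmel
```

Applied to every 10 ms subframe, these vectors stacked column-wise give the per-frame MFCC cepstrum matrices used throughout the method.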
In summary, owing to the adoption of the above technical scheme, the beneficial effects of the invention are as follows:
1) As a breathing-based authentication system, the present invention is the first to focus on and study the uniqueness of human breathing and to apply it effectively in a speaker recognition system, overcoming the two major challenges facing the development of breathing-based speaker recognition technology: breath-signal extraction and breath-signal processing.
2) Based on knowledge of mathematical statistics, the present invention designs a lightweight similarity algorithm for decision making: the algorithm is a series of simple vector operations on the MFCC mean matrix and variance matrix. Compared with traditional classification algorithms, the similarity algorithm of the present invention has better classification performance.
3) The present invention can be applied both to speaker identification experiments and to speaker verification experiments; at the same time, if a person's respiratory organs are impaired, his breathing signature may be modified, so the invention can also be used to judge whether the human respiratory organs are impaired.
4) The present invention can achieve recognition in settings that require silence.
5) The present invention can achieve recognition of testers who cannot vocalize.
6) Compared with traditional classification methods based on multi-parameter, complex-model assumptions, the classification method used by the present invention has lower time complexity and space complexity. Moreover, the MFCC-based algorithm used by the present invention processes data faster and requires fewer training samples while guaranteeing recognition accuracy, so the speaker recognition system provided by the present invention is simple and efficient, and its recognition results are accurate and reliable.
Brief description of the drawings
Fig. 1 is the system framework diagram of the present invention for judging whether an unknown speaker's identity is legitimate;
Fig. 2 is the framework diagram of the preliminary breath detection in step 2 of the present invention;
Fig. 3 is the framework diagram of the final breath detection in step 3 of the present invention;
Fig. 4 is a table illustrating the experimental results of steps 6–8 of the present invention;
Fig. 5 shows the contrast between a breath signal and a non-breath speech signal after the Mel filterbank is applied in the present invention;
Fig. 6 shows the characteristics of the ZCR, the spectral slope and the STE in the present invention;
Fig. 7 shows the formants of a breath signal and of a non-breath speech signal in the present invention;
Fig. 8 shows breath signals under normal conditions and under abnormal conditions in the present invention;
Detailed description of the invention
All features disclosed in this specification may be combined in any manner, except for mutually exclusive features and/or steps.
The present invention is elaborated below with reference to Figs. 1–8.
The present invention proposes a speaker recognition method based on breathing characteristics; applying this method to speaker recognition achieves good results. The implementation of the whole algorithm is shown schematically in Fig. 1 and includes the following steps:
Step 1: as in Fig. 1, input the breath sample set, divide the breath sample set into frames to obtain breath frames, and build a breathing template from the breath frames by means of Mel-frequency cepstral coefficients (MFCC); step 1 specifically includes the following steps:
Step 1.1: input the breath sample set and divide it into breath frames of length 100 milliseconds; divide each breath frame in turn into continuous, mutually overlapping breath subframes, each breath subframe being 10 ms long with an overlap of 5 ms between adjacent breath subframes;
Step 1.2: use a first-order difference filter to pre-emphasize each breath subframe, obtaining the pre-emphasized breath subframes; the first-order difference filter H is
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the signal sample data;
Step 1.3: compute the MFCC of each pre-emphasized breath subframe of each breath frame, obtaining the short-time cepstrum matrix of each breath frame; remove the DC component from every column of each breath frame's short-time cepstrum matrix, obtaining the MFCC cepstrum matrix of each breath frame;
Step 1.4: compute the mean matrix T of the breath sample set:
T = (1/N) · Σ_{i=1}^{N} M(Xi)
where N is the number of breath frames in the breath sample set and M(Xi) is the MFCC cepstrum matrix of the i-th breath frame, i ∈ [1, 2, …, N];
compute the variance matrix V of the breath sample set:
V = (1/N) · Σ_{i=1}^{N} (M(Xi) − T)²   (element-wise)
Step 1.5: concatenate the MFCC cepstrum matrices of all breath frames into one large matrix Mb: Mb = [M(X1), …, M(Xi), M(Xi+1), …, M(XN)];
perform singular value decomposition on the large matrix:
Mb = U Σ V*
where U is an m × m unitary matrix, Σ is a positive semi-definite m × n diagonal matrix, and V* is the conjugate transpose of V, an n × n unitary matrix; the elements {λ1, λ2, λ3, …} on the diagonal of Σ are the singular values of Mb, yielding the singular value vector {λ1, λ2, λ3, …};
normalize said singular value vector by the maximum singular value λm, obtaining the final normalized singular value vector S = {λ1/λm, λ2/λm, λ3/λm, …}, where λm = max{λ1, λ2, λ3, …};
Step 1.6: obtain a breathing template, said breathing template comprising the normalized singular value vector S, the variance matrix V of the breath sample set and the mean matrix T of the breath sample set.
Step 2: as shown in Fig. 2, input an unknown speech segment and divide it into frames to obtain unknown speech frames; compute the similarity between each unknown speech frame and the breathing template, and compute the zero-crossing rate ZCR and the short-time energy E of the unknown speech segment. Using the similarity between the unknown speech segment and the breathing template, the threshold Bm, the zero-crossing rate ZCR of the unknown speech segment, and the short-time energy E of the unknown speech segment, filter out the breath sounds in the unknown speech segment; the filtered breath sounds form the coarsely separated breath sounds.
Step 2 comprises the following steps:
Step 2.1: input the unknown speech segment and divide it into frames to obtain unknown speech frames and unknown speech subframes; compute the similarity B(X, T, V, S) between each unknown speech frame and the breathing template.
Compute the similarity between each breath frame of the breath sample set and the breathing template, and take the minimum similarity as Bm.
Compute the short-time energy E of each unknown speech frame:
E = Σ x[n]², summed over n = N0, …, N0 + N − 1
where n indexes the n-th sample of the signal, x[n] is the n-th speech sample, N is the window length, and N0 indicates that the window starts at the N0-th sample.
Compute the mean value Ē of the short-time energies of all unknown speech frames.
Compute the zero-crossing rate ZCR of the unknown speech segment:
ZCR = (1/(2N)) Σ |sgn(x[n]) − sgn(x[n−1])|, summed over n = N0 + 1, …, N0 + N − 1
with the symbols defined as above.
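The short-time energy and zero-crossing rate are standard quantities; the patent's own formulas survive only as symbol definitions, so the sketch below uses the textbook definitions consistent with those definitions (function names are ours):

```python
import numpy as np

def short_time_energy(x, n0, n_win):
    """Short-time energy E: sum of squared samples over a window of
    length n_win starting at sample n0."""
    w = x[n0:n0 + n_win]
    return float(np.sum(w ** 2))

def zero_crossing_rate(x, n0, n_win):
    """Zero-crossing rate over the same window: half the mean absolute
    difference of adjacent sample signs."""
    w = x[n0:n0 + n_win]
    signs = np.sign(w)
    return float(np.sum(np.abs(np.diff(signs))) / (2 * len(w)))

# toy signal alternating in sign: it crosses zero between every pair of samples
x = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
E = short_time_energy(x, 0, 8)
zcr = zero_crossing_rate(x, 0, 8)
```

For this alternating toy signal the energy is 8 and the ZCR is 7/8, close to the theoretical maximum.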
Computing the similarity between a breath frame or unknown speech frame and the breathing template in step 2.1 comprises the following steps:
Step 2.1.1: input the breath sample set or the unknown speech segment and divide it into breath frames or unknown speech frames of length 100 milliseconds; divide each breath frame or unknown speech frame further into consecutive, mutually overlapping breath subframes or unknown speech subframes, each 10 ms long, with a 5 ms overlap between adjacent subframes;
Step 2.1.2: apply a first-order difference filter to pre-emphasize each unknown speech subframe, obtaining the pre-emphasized breath frames or unknown speech frames; the first-order difference filter H is:
H(z) = 1 − α·z⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
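The pre-emphasis filter and the overlapping subframe split of step 2.1.1 can be sketched as follows (a minimal sketch; the α value is the one stated above, and the 10 ms/5 ms split is expressed in samples):

```python
import numpy as np

def pre_emphasize(x, alpha=0.095):
    """First-order difference filter H(z) = 1 - alpha * z^-1,
    i.e. y[n] = x[n] - alpha * x[n-1], with y[0] = x[0]."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]
    y[1:] = x[1:] - alpha * x[:-1]
    return y

def split_subframes(frame, sub_len, hop):
    """Split one frame into consecutive overlapping subframes, e.g. 10 ms
    subframes advancing by 5 ms (so adjacent subframes overlap by 5 ms)."""
    return [frame[i:i + sub_len]
            for i in range(0, len(frame) - sub_len + 1, hop)]

x = np.ones(100)
y = pre_emphasize(x)
subs = split_subframes(y, sub_len=10, hop=5)   # 19 overlapping subframes
```

On the constant toy input every filtered sample after the first equals 1 − α = 0.905, which shows the filter attenuating the low-frequency (DC) content.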
Step 2.1.3: compute the MFCC for each pre-emphasized breath subframe or unknown speech subframe of each breath frame or unknown speech frame to obtain that frame's short-time cepstral matrix; remove the DC component from every column of the short-time cepstral matrix to obtain the MFCC cepstral matrix M(X) of each breath frame or unknown speech frame;
Step 2.1.4: select one breath frame or unknown speech frame X;
Step 2.1.5: compute the normalized difference matrix D of the selected breath frame or unknown speech frame:
D = (M(X) − T) / V
where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstral matrix of the selected breath frame or unknown speech frame;
Step 2.1.6: multiply every column of D by half a Hamming window so that the low-frequency cepstral coefficients are emphasized:
D(:, j) = D(:, j) · hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters in each breath subframe or unknown speech subframe, i.e. the number of columns of D, and hamming denotes the Hamming window.
Step 2.1.7: compute the component Cp of the similarity B(X, T, V, S) between the selected breath frame or unknown speech frame X and the breathing template, where n is the number of breath subframes or unknown speech subframes in the selected frame X, k ∈ [1, n], and Dkj is the j-th MFCC parameter of the k-th breath subframe or unknown speech subframe of frame X;
compute the other component Cn of the similarity B(X, T, V, S) between the selected frame X and the breathing template;
Step 2.1.8: compute the similarity between the breath frame or unknown speech frame X and the breathing template:
B(X, T, V, S) = Cp · Cn;
Step 2.1.9: select the MFCC cepstral matrix of another breath frame or unknown speech frame and repeat steps 2.1.5–2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarity between every breath frame or unknown speech frame and the breathing template has been obtained.
In the above scheme, the false-trough-eliminating boundary detection algorithm used in step 3 employs a breath duration threshold, an energy threshold, upper and lower zero-crossing-rate (ZCR) thresholds, and the spectral slope to locate breath boundaries precisely; step 3 uses a binary 0-1 sequence to indicate precisely where breathing occurs in the current speech segment.
Step 2.2: select one unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) between the selected unknown speech frame and the breathing template is greater than the threshold Bm/2, the zero-crossing rate ZCR of the unknown speech segment is below 0.25 (at a sampling rate of 44 kHz), and the short-time energy E of the selected unknown speech frame is below the mean value Ē of all unknown speech frames, judge the selected unknown speech frame to be a breath sound; if these conditions are not met, judge it to be a non-breath sound;
Step 2.4: select the other unknown speech frames in turn and repeat step 2.3 until every unknown speech frame in the unknown speech segment has been judged;
Step 2.5: keep the breath sounds and discard the non-breath sounds, obtaining the coarsely separated breath sounds;
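The three-way test of step 2.3 can be sketched as a single predicate (argument names and the toy values are illustrative):

```python
def is_breath_frame(similarity, frame_energy, mean_energy, segment_zcr,
                    b_m, zcr_limit=0.25):
    """Decision rule of step 2.3: a frame is kept as a breath sound when
    its template similarity exceeds Bm/2, the segment ZCR is below 0.25
    (stated for a 44 kHz sampling rate), and its short-time energy is
    below the mean energy of all unknown frames."""
    return (similarity > b_m / 2
            and segment_zcr < zcr_limit
            and frame_energy < mean_energy)

# a quiet, breath-like frame passes; a dissimilar frame fails
kept = is_breath_frame(similarity=0.9, frame_energy=0.2, mean_energy=0.5,
                       segment_zcr=0.1, b_m=1.0)
rejected = is_breath_frame(similarity=0.3, frame_energy=0.2, mean_energy=0.5,
                           segment_zcr=0.1, b_m=1.0)
```

All three conditions must hold simultaneously; failing any one classifies the frame as non-breath.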
Step 3: as shown in Fig. 3, detect the silent gaps in the coarsely separated breath sounds using the false-trough-eliminating boundary detection algorithm, and use those gaps to discard the false-positive portions of the coarsely separated breath sounds, obtaining the finely separated breath sounds. A concrete implementation of this boundary detection algorithm is described in "An effective algorithm for automatic detection and exact demarcation of breath sounds in speech and song", IEEE TRANSACTIONS ON AUDIO, SPEECH, AND LANGUAGE PROCESSING, vol. 15, no. 3, March 2007;
Step 4: choose a group of sample speakers, collect a breathing fragment from each sample speaker, and build a speaker sample database; if it must be decided whether the speaker of the unknown speech segment is one of the sample speakers, go to step 5; if it must be decided whether the speaker of the unknown speech segment is the legitimate speaker, go to step 6;
Step 5: compute the similarity between the finely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database, and take the sample speaker with the greatest similarity as the speaker of the unknown speech segment; end;
Computing, in step 5, the similarity between the finely separated breath sounds of the unknown speech segment and a speaker's breath sample in the speaker sample database comprises the following steps:
Step 5.1: let the MFCC feature vector of the speaker's breath sample in the speaker sample database be (a1, a2, …, an); compute the mean matrix M of this MFCC feature vector:
M = (1/n) Σ ai, summed over i = 1, …, n
where ai is the i-th MFCC cepstral matrix in the MFCC feature vector of the speaker's breath sample, n is the number of MFCC cepstral matrices in that feature vector, and i ∈ [1, 2, …, n].
Compute the variance matrix V of the MFCC feature vector of the speaker's breath sample.
Step 5.2: compute the MFCC feature vectors of all finely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstral matrix of the i-th finely separated breath sound of the unknown speech segment.
Step 5.3: normalize the feature vector (a1, a2, …, an) of the speaker's breath sample, where r and c denote the numbers of rows and columns respectively, Sak is the normalized difference statistic of ak, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c].
Step 5.4: sort (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn).
Step 5.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all finely separated breath sounds of the unknown speech segment in the same way, where Sbk is the normalized difference statistic of bk.
Step 5.6: compute the similarity Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements. Compute the mean of the Pk values to obtain the similarity between the finely separated breath sounds of the unknown speech segment and the sample in the breath sample database.
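Steps 5.4–5.6 amount to a percentile-rank comparison. The sketch below assumes the per-matrix normalization of steps 5.3/5.5 (whose formula survives only as a figure) has already produced one scalar statistic per cepstral matrix; names are illustrative:

```python
import numpy as np

def rank_similarity(sample_stats, unknown_stats):
    """Steps 5.4-5.6 sketch: sort the enrolled sample's per-matrix
    statistics ascending, score each unknown statistic by the fraction
    of sorted elements it exceeds, and return the mean of those scores."""
    s = np.sort(np.asarray(sample_stats))     # step 5.4: ascending order
    n = len(s)
    p = [np.sum(s < b) / n for b in unknown_stats]   # step 5.6: Pk per bk
    return float(np.mean(p))                  # mean of the Pk values

sim = rank_similarity([0.1, 0.2, 0.3, 0.4], [0.25, 0.35])
```

Here 0.25 exceeds 2 of 4 sorted elements (Pk = 0.5) and 0.35 exceeds 3 of 4 (Pk = 0.75), so the mean similarity is 0.625.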
Step 6: collect test samples from each sample speaker and select one test sample;
Step 7: as shown in Fig. 4, compute the similarity between the selected test sample and each speaker's breath sample in the speaker sample database, and take the maximum of these similarities, obtaining one maximum similarity.
The method of computing, in step 7, the similarity between the selected test sample and each speaker's breath sample in the speaker sample database is identical to the method of computing, in step 5, the similarity between the finely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database.
Step 8: as shown in Fig. 4, select another test sample and repeat step 7 until the maximum similarity corresponding to every test sample has been obtained, yielding the maximum-similarity group;
Step 9: collect the breathing fragment of the legitimate speaker and compute the similarity between the finely separated breath sounds of the unknown speech segment and the breathing fragment of the legitimate speaker.
Computing, in step 9, the similarity between the finely separated breath sounds of the unknown speech segment and the breathing fragment of the legitimate speaker comprises the following steps:
Step 9.1: let the MFCC feature vector of the legitimate speaker's breathing fragment be (a1, a2, …, an); compute the mean matrix M of this MFCC feature vector:
M = (1/n) Σ ai, summed over i = 1, …, n
where ai is the i-th MFCC cepstral matrix in the MFCC feature vector of the legitimate speaker's breathing fragment, n is the number of MFCC cepstral matrices in that feature vector, and i ∈ [1, 2, …, n].
Compute the variance matrix V of the MFCC feature vector of the legitimate speaker's breathing fragment.
Step 9.2: compute the MFCC feature vectors of all finely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstral matrix of the i-th finely separated breath sound of the unknown speech segment.
Step 9.3: normalize the feature vector (a1, a2, …, an) of the legitimate speaker's breathing fragment, where r and c denote the numbers of rows and columns respectively, Sak is the normalized difference statistic of ak, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c].
Step 9.4: sort (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn).
Step 9.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all finely separated breath sounds of the unknown speech segment in the same way, where Sbk is the normalized difference statistic of bk.
Step 9.6: compute the similarity Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements. Compute the mean of the Pk values to obtain the similarity between the finely separated breath sounds of the unknown speech segment and the breathing fragment of the legitimate speaker.
Step 10: if the similarity between the finely separated breath sounds of the unknown speech segment and the breathing fragment of the legitimate speaker is greater than the minimum of the maximum-similarity group, the speaker of the unknown speech segment is identified as the legitimate speaker; otherwise the speaker is judged illegitimate.
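The accept/reject rule of step 10 reduces to a single threshold comparison against the smallest of the per-test-sample maximum similarities; a minimal sketch (function and argument names illustrative):

```python
def verify_speaker(unknown_vs_legit, max_similarity_group):
    """Step 10: accept the unknown speaker as the legitimate speaker only
    if the similarity to the legitimate speaker's breathing fragment
    exceeds the minimum of the maximum-similarity group (steps 6-8)."""
    return unknown_vs_legit > min(max_similarity_group)

accepted = verify_speaker(0.8, [0.6, 0.7, 0.9])   # 0.8 > min = 0.6
denied = verify_speaker(0.5, [0.6, 0.7, 0.9])     # 0.5 <= 0.6
```

Using the minimum of the group as the threshold makes acceptance conservative with respect to the weakest enrolled test sample.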
The MFCC computation in steps 1.3 and 5.2 proceeds as follows: apply a fast Fourier transform to the signal for which the MFCC is to be computed, then compute the complex sinusoid coefficients, and finally produce the output through a mel-scale filter bank.
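The FFT-then-mel-filter-bank pipeline named above can be sketched as follows. This is an illustrative minimal implementation: the filter and coefficient counts (12) and the final DCT step are conventional choices, not values taken from the patent.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sr):
    """Triangular filters spaced evenly on the mel scale."""
    mels = np.linspace(hz_to_mel(0), hz_to_mel(sr / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mels) / sr).astype(int)
    fb = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        l, c, r = bins[i - 1], bins[i], bins[i + 1]
        for k in range(l, c):                 # rising slope
            fb[i - 1, k] = (k - l) / max(c - l, 1)
        for k in range(c, r):                 # falling slope
            fb[i - 1, k] = (r - k) / max(r - c, 1)
    return fb

def mfcc(subframe, sr, n_filters=12, n_coeffs=12):
    """FFT -> mel filter bank -> log -> DCT-II, the pipeline named in the text."""
    n_fft = len(subframe)
    power = np.abs(np.fft.rfft(subframe)) ** 2          # power spectrum
    energies = mel_filterbank(n_filters, n_fft, sr) @ power
    logs = np.log(energies + 1e-10)                     # log filter-bank energies
    n = np.arange(n_filters)                            # DCT-II basis
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs), (2 * n + 1))
                   / (2 * n_filters))
    return basis @ logs

coeffs = mfcc(np.random.default_rng(1).normal(size=441), sr=44100)
```

A 441-sample subframe corresponds to 10 ms at the 44 kHz sampling rate mentioned in step 2.3; production code would normally use a dedicated library implementation instead.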
The present invention has been illustrated by the above embodiments, but it should be understood that the above embodiments serve illustrative and descriptive purposes only and are not intended to limit the invention to the scope of the described embodiments. Those skilled in the art will further appreciate that the invention is not limited to the above embodiments, and that many variants and modifications may be made in accordance with the teachings of the invention, all of which fall within the claimed scope of the invention. The protection scope of the invention is defined by the appended claims and their equivalents.
Claims (9)
1. A speaker recognition method based on breathing characteristics, characterized in that it comprises the following steps:
Step 1: input a breath sample set and divide it into frames to obtain breath frames; build a breathing template from the breath frames using mel-frequency cepstral coefficients (MFCC); compute the similarity between each breath frame of the breath sample set and the breathing template, and take the minimum as Bm;
Step 2: input an unknown speech segment and divide it into frames to obtain unknown speech frames; compute the similarity between each unknown speech frame and the breathing template; compute the zero-crossing rate ZCR and the short-time energy E of the unknown speech segment; using the similarity between the unknown speech segment and the breathing template, Bm, the zero-crossing rate ZCR of the unknown speech segment, and the short-time energy E of the unknown speech segment, filter out the breath sounds in the unknown speech segment; the filtered breath sounds form the coarsely separated breath sounds;
Step 3: detect the silent gaps in the coarsely separated breath sounds using the false-trough-eliminating boundary detection algorithm, and use those gaps to discard the false-positive portions of the coarsely separated breath sounds, obtaining the finely separated breath sounds;
Step 4: choose a group of sample speakers, collect a breathing fragment from each sample speaker, and build a speaker sample database; if it must be decided whether the speaker of the unknown speech segment is one of the sample speakers, go to step 5;
if it must be decided whether the speaker of the unknown speech segment is the legitimate speaker, go to step 6;
Step 5: compute the similarity between the finely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database, and take the sample speaker with the greatest similarity as the speaker of the unknown speech segment; end;
Step 6: collect test samples from each sample speaker and select one test sample;
Step 7: compute the similarity between the selected test sample and each speaker's breath sample in the speaker sample database, and take the maximum of these similarities, obtaining one maximum similarity;
Step 8: select another test sample and repeat step 7 until the maximum similarity corresponding to every test sample has been obtained, yielding the maximum-similarity group;
Step 9: collect a speech segment of the legitimate speaker, extract the legitimate speaker's breathing fragment using the breath sample set, and compute the similarity between the finely separated breath sounds of the unknown speech segment and the breathing fragment of the legitimate speaker;
Step 10: if the similarity between the finely separated breath sounds of the unknown speech segment and the breathing fragment of the legitimate speaker is greater than the minimum of the maximum-similarity group, identify the speaker of the unknown speech segment as the legitimate speaker; otherwise judge the speaker illegitimate.
2. The speaker recognition method based on breathing characteristics according to claim 1, characterized in that step 1 comprises the following steps:
Step 1.1: input the breath sample set and divide it into breath frames of length 100 milliseconds; divide each breath frame further into consecutive, mutually overlapping breath subframes, each 10 ms long, with a 5 ms overlap between adjacent breath subframes;
Step 1.2: apply a first-order difference filter to pre-emphasize each breath subframe, obtaining the pre-emphasized breath subframes; the first-order difference filter H is:
H(z) = 1 − α·z⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
Step 1.3: compute the MFCC for each pre-emphasized breath subframe of every breath frame to obtain the short-time cepstral matrix of each breath frame; remove the DC component from every column of each breath frame's short-time cepstral matrix to obtain the MFCC cepstral matrix of each breath frame;
Step 1.4: compute the mean matrix T of the breath sample set:
T = (1/N) Σ M(Xi), summed over i = 1, …, N
where N is the number of breath frames in the breath sample set, M(Xi) is the MFCC cepstral matrix of the i-th breath frame, and i ∈ [1, 2, …, N];
compute the variance matrix V of the breath sample set;
Step 1.5: concatenate the MFCC cepstral matrices of all breath frames into one large matrix Mb:
Mb = [M(X1), …, M(Xi), M(Xi+1), …, M(XN)]
and apply singular value decomposition to it:
Mb = U Σ V*
where U is an m × m unitary matrix, Σ is a positive semidefinite m × n diagonal matrix, and V* denotes the conjugate transpose of V and is an n × n unitary matrix; the elements {λ1, λ2, λ3, …} on the diagonal of Σ are the singular values of Mb, giving the singular value vector {λ1, λ2, λ3, …};
normalize this singular value vector by the largest singular value λm to obtain the final normalized singular value vector S = {λ1/λm, λ2/λm, λ3/λm, …}, where λm = max{λ1, λ2, λ3, …};
Step 1.6: this yields one breathing template, comprising the normalized singular value vector S, the variance matrix V of the breath sample set, and the mean matrix T of the breath sample set.
3. The speaker recognition method based on breathing characteristics according to claim 1, characterized in that step 2 comprises the following steps:
Step 2.1: input the unknown speech segment and divide it into frames to obtain unknown speech frames and unknown speech subframes; compute the similarity B(X, T, V, S) between each unknown speech frame and the breathing template; compute the similarity between each breath frame of the breath sample set and the breathing template, and take the minimum similarity as Bm;
compute the short-time energy E of each unknown speech frame:
E = Σ x[n]², summed over n = N0, …, N0 + N − 1
where n indexes the n-th sample of the signal, x[n] is the n-th speech sample, N is the window length, and N0 indicates that the window starts at the N0-th sample;
compute the mean value Ē of the short-time energies of all unknown speech frames;
compute the zero-crossing rate ZCR of the unknown speech segment:
ZCR = (1/(2N)) Σ |sgn(x[n]) − sgn(x[n−1])|, summed over n = N0 + 1, …, N0 + N − 1
with the symbols defined as above;
Step 2.2: select one unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) between the selected unknown speech frame and the breathing template is greater than the threshold Bm/2, the zero-crossing rate ZCR of the unknown speech segment is below 0.25, and the short-time energy E of the selected unknown speech frame is below the mean value Ē of all unknown speech frames, judge the selected unknown speech frame to be a breath sound; if these conditions are not met, judge it to be a non-breath sound;
Step 2.4: select the other unknown speech frames in turn and repeat step 2.3 until every unknown speech frame in the unknown speech segment has been judged;
Step 2.5: keep the breath sounds and discard the non-breath sounds, obtaining the coarsely separated breath sounds.
4. The speaker recognition method based on breathing characteristics according to claim 3, characterized in that computing, in step 2.1, the similarity between a breath frame or unknown speech frame and the breathing template comprises the following steps:
Step 2.1.1: input the breath sample set or the unknown speech segment and divide it into breath frames or unknown speech frames of length 100 milliseconds; divide each breath frame or unknown speech frame further into consecutive, mutually overlapping breath subframes or unknown speech subframes, each 10 ms long, with a 5 ms overlap between adjacent subframes;
Step 2.1.2: apply a first-order difference filter to pre-emphasize each unknown speech subframe, obtaining the pre-emphasized breath frames or unknown speech frames; the first-order difference filter H is:
H(z) = 1 − α·z⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
Step 2.1.3: compute the MFCC for each pre-emphasized breath subframe or unknown speech subframe of each breath frame or unknown speech frame to obtain that frame's short-time cepstral matrix; remove the DC component from every column of the short-time cepstral matrix to obtain the MFCC cepstral matrix M(X) of each breath frame or unknown speech frame;
Step 2.1.4: select one breath frame or unknown speech frame X;
Step 2.1.5: compute the normalized difference matrix D of the selected breath frame or unknown speech frame:
D = (M(X) − T) / V
where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstral matrix of the selected breath frame or unknown speech frame;
Step 2.1.6: multiply every column of D by half a Hamming window so that the low-frequency cepstral coefficients are emphasized:
D(:, j) = D(:, j) · hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters in each breath subframe or unknown speech subframe, i.e. the number of columns of D, and hamming denotes the Hamming window;
Step 2.1.7: compute the component Cp of the similarity B(X, T, V, S) between the selected breath frame or unknown speech frame X and the breathing template, where n is the number of breath subframes or unknown speech subframes in the selected frame X, k ∈ [1, n], and Dkj is the j-th MFCC parameter of the k-th breath subframe or unknown speech subframe of frame X;
compute the other component Cn of the similarity B(X, T, V, S) between the selected frame X and the breathing template;
Step 2.1.8: compute the similarity between the breath frame or unknown speech frame X and the breathing template:
B(X, T, V, S) = Cp · Cn;
Step 2.1.9: select the MFCC cepstral matrix of another breath frame or unknown speech frame and repeat steps 2.1.5–2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarity between every breath frame or unknown speech frame and the breathing template has been obtained.
5. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 4, characterized in that the false-trough-eliminating boundary detection algorithm used in step 3 employs a breath duration threshold, an energy threshold, upper and lower zero-crossing-rate (ZCR) thresholds, and the spectral slope to locate breath boundaries precisely, and that step 3 uses a binary 0-1 sequence to indicate precisely where breathing occurs in the current speech segment.
6. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 5, characterized in that computing, in step 5, the similarity between the finely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database comprises the following steps:
Step 5.1: let the MFCC feature vector of the speaker's breath sample in the speaker sample database be (a1, a2, …, an); compute the mean matrix M of this MFCC feature vector:
M = (1/n) Σ ai, summed over i = 1, …, n
where ai is the i-th MFCC cepstral matrix in the MFCC feature vector of the speaker's breath sample, n is the number of MFCC cepstral matrices in that feature vector, and i ∈ [1, 2, …, n];
compute the variance matrix V of the MFCC feature vector of the speaker's breath sample;
Step 5.2: compute the MFCC feature vectors of all finely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstral matrix of the i-th finely separated breath sound of the unknown speech segment;
Step 5.3: normalize the feature vector (a1, a2, …, an) of the speaker's breath sample, where r and c denote the numbers of rows and columns respectively, Sak is the normalized difference statistic of ak, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.4: sort (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn);
Step 5.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all finely separated breath sounds of the unknown speech segment in the same way, where Sbk is the normalized difference statistic of bk;
Step 5.6: compute the similarity Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements; compute the mean of the Pk values to obtain the similarity between the finely separated breath sounds of the unknown speech segment and the sample in the breath sample database.
7. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 6, characterized in that computing, in step 9, the similarity between the finely separated breath sounds of the unknown speech segment and the breathing fragment of the legitimate speaker comprises the following steps:
Step 9.1: let the MFCC feature vector of the legitimate speaker's breathing fragment be (a1, a2, …, an); compute the mean matrix M of this MFCC feature vector:
M = (1/n) Σ ai, summed over i = 1, …, n
where ai is the i-th MFCC cepstral matrix in the MFCC feature vector of the legitimate speaker's breathing fragment, n is the number of MFCC cepstral matrices in that feature vector, and i ∈ [1, 2, …, n];
compute the variance matrix V of the MFCC feature vector of the legitimate speaker's breathing fragment;
Step 9.2: compute the MFCC feature vectors of all finely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstral matrix of the i-th finely separated breath sound of the unknown speech segment;
Step 9.3: normalize the feature vector (a1, a2, …, an) of the legitimate speaker's breathing fragment, where r and c denote the numbers of rows and columns respectively, Sak is the normalized difference statistic of ak, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 9.4: sort (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn);
Step 9.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all finely separated breath sounds of the unknown speech segment in the same way, where Sbk is the normalized difference statistic of bk;
Step 9.6: compute the similarity Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements; compute the mean of the Pk values to obtain the similarity between the finely separated breath sounds of the unknown speech segment and the breathing fragment of the legitimate speaker.
8. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 7, characterized in that the method of computing, in step 7, the similarity between the selected test sample and each speaker's breath sample in the speaker sample database is identical to the method of computing, in step 5, the similarity between the finely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database.
9. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 8, characterized in that computing the MFCC in steps 1.3 and 5.2 comprises: applying a fast Fourier transform to the signal for which the MFCC is to be computed, then computing the complex sinusoid coefficients, and finally producing the output through a mel-scale filter bank.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201610626034.0A (granted as CN106297805B) | 2016-08-02 | 2016-08-02 | A speaker recognition method based on breathing characteristics |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN106297805A | 2017-01-04 |
| CN106297805B | 2019-07-05 |
Family ID: 57664264
Family Applications (1)

| Application Number | Title | Priority Date | Filing Date |
|---|---|---|---|
| CN201610626034.0A (granted as CN106297805B, active) | A speaker recognition method based on breathing characteristics | 2016-08-02 | 2016-08-02 |

| Country | Link |
|---|---|
| CN (1) | CN106297805B (en) |
Cited By (2)

| Publication Number | Priority Date | Publication Date | Assignee | Title |
|---|---|---|---|---|
| CN110473563A * | 2019-08-19 | 2019-11-19 | Shandong Computer Science Center (National Supercomputer Center in Jinan) | Breathing detection method, system, device and medium based on time-frequency features |
| CN111568400A * | 2020-05-20 | 2020-08-25 | Shandong University | Human vital sign monitoring method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1547191A (en) * | 2003-12-12 | 2004-11-17 | 北京大学 | Semantic and sound groove information combined speaking person identity system |
JP2005530214A (en) * | 2002-06-19 | 2005-10-06 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Mega speaker identification (ID) system and method corresponding to its purpose |
CN101770774A (en) * | 2009-12-31 | 2010-07-07 | 吉林大学 | Embedded-based open set speaker recognition method and system thereof |
CN102324232A (en) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Method for recognizing sound-groove and system based on gauss hybrid models |
CN102486922A (en) * | 2010-12-03 | 2012-06-06 | 株式会社理光 | Speaker recognition method, device and system |
CN103280220A (en) * | 2013-04-25 | 2013-09-04 | 北京大学深圳研究生院 | Real-time recognition method for baby cry |
CN104112446A (en) * | 2013-04-19 | 2014-10-22 | 华为技术有限公司 | Breathing voice detection method and device |
US20150016617A1 (en) * | 2012-02-21 | 2015-01-15 | Tata Consultancy Services Limited | Modified mel filter bank structure using spectral characteristics for sound analysis |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110473563A (en) * | 2019-08-19 | 2019-11-19 | 山东省计算中心(国家超级计算济南中心) | Breathing detection method, system, equipment and medium based on time-frequency characteristics |
CN111568400A (en) * | 2020-05-20 | 2020-08-25 | 山东大学 | Human body sign information monitoring method and system |
CN111568400B (en) * | 2020-05-20 | 2024-02-09 | 山东大学 | Human body sign information monitoring method and system |
Also Published As
Publication number | Publication date |
---|---|
CN106297805B (en) | 2019-07-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Kinnunen | Spectral features for automatic text-independent speaker recognition | |
Kandali et al. | Emotion recognition from Assamese speeches using MFCC features and GMM classifier | |
Bocklet et al. | Automatic evaluation of parkinson's speech-acoustic, prosodic and voice related cues. | |
Kumar et al. | Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm | |
CN107293302A (en) | A kind of sparse spectrum signature extracting method being used in voice lie detection system | |
CN108922541A (en) | Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model | |
Yusnita et al. | Malaysian English accents identification using LPC and formant analysis | |
CN109727608A (en) | A kind of ill voice appraisal procedure based on Chinese speech | |
CN104992707A (en) | Cleft palate voice glottal stop automatic identification algorithm and device | |
Sun et al. | Investigating glottal parameters for differentiating emotional categories with similar prosodics | |
Fezari et al. | Acoustic analysis for detection of voice disorders using adaptive features and classifiers | |
Zhao et al. | Speaker identification from the sound of the human breath | |
Usman | On the performance degradation of speaker recognition system due to variation in speech characteristics caused by physiological changes | |
CN106297805B (en) | A kind of method for distinguishing speek person based on respiratory characteristic | |
Le et al. | A study of voice source and vocal tract filter based features in cognitive load classification | |
Kadiri et al. | Discriminating neutral and emotional speech using neural networks | |
Jha et al. | Discriminant feature vectors for characterizing ailment cough vs. simulated cough | |
Nandwana et al. | A new front-end for classification of non-speech sounds: a study on human whistle | |
Dumpala et al. | Analysis of the Effect of Speech-Laugh on Speaker Recognition System. | |
Hui et al. | Emotion classification of mandarin speech based on TEO nonlinear features | |
Mohamad Jamil et al. | A flexible speech recognition system for cerebral palsy disabled | |
Kumar et al. | Text dependent speaker identification in noisy environment | |
Kabir et al. | Vector quantization in text dependent automatic speaker recognition using mel-frequency cepstrum coefficient | |
Sahoo et al. | Detection of speech-based physical load using transfer learning approach | |
Julia et al. | Detection of emotional expressions in speech |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||