CN106297805B - Speaker recognition method based on breathing characteristics - Google Patents
Speaker recognition method based on breathing characteristics
- Publication number: CN106297805B (application CN201610626034.0A)
- Authority: CN (China)
- Prior art keywords: breathing, unknown, speaker, frame, breath
- Prior art date
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G10L17/02 — Speaker identification or verification techniques: preprocessing operations, e.g. segment selection; pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; feature selection or extraction
- G10L17/06 — Speaker identification or verification techniques: decision making techniques; pattern matching strategies
Abstract
The invention discloses a speaker recognition method based on breathing characteristics. The method mainly comprises: inputting an unknown speech segment; extracting the breath sounds in the unknown speech segment by means of a breathing template built from mel-frequency cepstral coefficients (MFCC), together with the zero-crossing rate (ZCR) and the short-time energy E; rejecting the false-positive portions of those breath sounds with a boundary-detection algorithm that eliminates false valleys, which yields the precisely separated breath sounds; and finally, using the precisely separated breath sounds, determining whether the speaker of the unknown speech segment comes from the sample speakers and judging whether that speaker is a legitimate speaker. The invention is the first to focus on and study the uniqueness of human breathing and to apply it effectively in a speaker recognition system, overcoming the two major challenges facing the development of breathing-based speaker recognition technology: "breath signal extraction" and "breath signal processing". The speaker recognition system provided by the invention is therefore simple and efficient, and its recognition results are accurate and reliable.
Description
Technical field
The present invention relates to a system and method for contactless biometric signal detection, and more particularly to a speaker recognition system and method based on breathing characteristics.
Background technique
Speaker recognition (Speaker Recognition) is a fundamental problem that divides into two classes: the speaker identification problem (Speaker Identification) and the speaker verification problem (Speaker Verification). The former decides whether an unknown speaker is a member of a known sample database of speakers; the latter confirms whether a stated speaker identity is legitimate. Recognizing a speaker involves a training stage and a test stage: the training stage builds the speaker feature templates, while the test stage computes the similarity between the test data and the feature templates and produces a decision. According to how strongly it depends on the speech text, speaker recognition is further divided into text-dependent (effective only for a specific text), text-independent (effective for any text) and text-prompted (effective for texts belonging to a specific set). Although voice features can be weakened by the microphone or the channel, are affected by health and mood, and can even be imitated, speech processing technology has developed rapidly in recent years and many real-time applications have already appeared, so speech processing problems have received growing attention and research.
Existing speaker recognition schemes are based on the source-filter model, on the source-system model, or extract feature vectors from both at once. Excitation-source information can be represented by linear-prediction residual samples of the glottal waveform; vocal-tract information can be captured by cepstral features; and prosodic information can be obtained from the time dynamics of duration, pitch and energy. Breathing, which from an aerodynamic standpoint is one of the energy sources of sound production, can be extracted and processed like a complete segment of speech. Existing research has concentrated on detecting and removing breath signals from speech in order to improve sound quality, improve speech-to-text conversion algorithms, train typists, or identify psychological states.
Source-filter theory holds that speech is the response of the vocal-tract system, and it gives a good approximation of the nonlinear, time-varying speech signal. The "source" refers to four kinds of source signals: the aspiration source, the frication source, the glottal (voicing) source and the transient source. The vocal tract acts like a filter: its input is produced by the four source signals above, and its output forms vowels, consonants, or any other speech sound. The vocal tract also governs pitch production, voice quality, harmonics, resonance characteristics, radiation response, and so on.
In the source-system model, speech is modelled by a linear, slowly varying discrete-time system. The system is excited either by random noise (the unvoiced speech source) or by a quasi-periodic pulse train (the voiced speech source). The source carries pitch and other error-prone speech features, so source models are rarely used in speaker recognition and are seldom combined with other features. By contrast, the system model corresponds to the smooth power spectral envelope, which is obtained by linear prediction or mel-filter analysis; it is therefore widely used in cepstral-coefficient-based speaker recognition systems.
Both models treat breathing merely as part of the speech source, folding it into the voiced source or into the noise of the unvoiced source. In fact, breathing is an energy-transfer mechanism that converts energy into sound. Moreover, breathing during speech is constrained: in general the expiration time is longer than the inspiration time, whereas in non-speech life the expiration and inspiration times are roughly equal.
The respiratory system comprises the lungs, the diaphragm, the intercostal muscles and the breathing channel formed by the bronchi, trachea, larynx, vocal tract and oral cavity. We regard breathing as the physiological fingerprint of the entire respiratory system, governed and controlled by intrapulmonary pressure, airflow direction and muscular movement. During inspiration the respiratory muscles contract, intrapulmonary pressure drops, and air flows from outside into the lungs; likewise, during expiration intrapulmonary pressure rises, the intrapulmonary space is compressed, and air is exhaled from the lungs to the outside. By anatomical principle there is necessarily a silent interval before and after each breath. Breathing is influenced by age and sex; a normal breath lasts 100-400 milliseconds, and the silent gap lasts 20 milliseconds or more. The silent gap is the key to separating and delimiting breaths.
A breath is the joint product of the lungs, intrapulmonary pressure, the diaphragm, the vocal tract, the trachea and the respiratory muscles, and is in that sense the physiological fingerprint of the respiratory system. Airflow is not instantaneous, so a silent gap (>= 20 milliseconds) exists both before and after each breath. Compared with the speech signal in the ordinary sense (breaths excluded), the breath signal is weak in energy, short in time (100-400 milliseconds), low in frequency of occurrence (12-18 breaths per minute), and overlaps the non-breath speech signal at low frequency (100 Hz-1 kHz). In addition, breath sounds are especially similar to fricative phonemes and consonants, such as /tʃ/ in "church" or /ʒ/ in "vision". The exploitation of breathing in speaker recognition technology therefore faces two major challenges, "breath signal extraction" and "breath signal processing", which is why breathing has not been exploited in speaker recognition technology and is usually rejected as breath noise.
Summary of the invention
The object of the invention is as follows: to address the fact that, in the prior art, breathing cannot be used effectively in speaker recognition technology, and that the development of breathing-based speaker recognition faces the two major challenges of "breath signal extraction" and "breath signal processing", the present invention provides a speaker recognition system and method based on breathing characteristics.
The technical solution adopted by the invention is as follows:
A speaker recognition method based on breathing characteristics, characterized by comprising the following steps:
Step 1: input a breath sample set and divide it into frames to obtain breath frames; build the breath frames into a breathing template by means of mel-frequency cepstral coefficients (MFCC); compute the similarity between each breath frame of the breath sample set and the breathing template, and take its minimum value Bm;
Step 2: input an unknown speech segment and divide it into frames to obtain unknown speech frames; compute the similarity between each unknown speech frame and the breathing template; compute the zero-crossing rate ZCR and the short-time energy E of each unknown speech frame; according to the similarity between the unknown speech frame and the breathing template, Bm, the ZCR and the short-time energy E, filter out the breath sounds in the unknown speech segment; the filtered-out breath sounds form the initially separated breath sounds;
Step 3: detect the silent gaps of the initially separated breath sounds with the boundary-detection algorithm that eliminates false valleys, and reject the false-positive portions of the initially separated breath sounds according to the silent gaps, obtaining the precisely separated breath sounds;
Step 4: choose a group of sample speakers, collect a breathing segment from each sample speaker, and build a speaker sample database; if it must be determined whether the speaker of the unknown speech segment comes from the sample speakers, go to step 5; if it must be determined whether the speaker of the unknown speech segment is a legitimate speaker, go to step 6;
Step 5: compute the similarity between the precisely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database, take the sample speaker with the maximum similarity as the speaker of the unknown speech segment, and finish;
Step 6: collect test samples from each sample speaker and choose one test sample;
Step 7: compute the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database, and take the maximum of those similarities, obtaining one maximum similarity;
Step 8: choose another test sample and repeat step 7 until the maximum similarity corresponding to every test sample has been obtained, yielding the maximum-similarity group;
Step 9: collect a speech segment of the legitimate speaker, extract the legitimate speaker's breathing segment using the breath sample set, and compute the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker;
Step 10: if the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker is greater than the minimum value of the maximum-similarity group, the speaker of the unknown speech segment is recognized as the legitimate speaker; otherwise the speaker is illegitimate.
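For illustration only, the identification and verification decisions of steps 4-10 can be sketched as follows. This is a minimal sketch, not part of the claimed method: `sim` stands for the Pk-averaging similarity measure defined later in steps 5 and 9, and all names are illustrative assumptions.

```python
def identify(unknown_breaths, sample_db, sim):
    """Step 5: return the sample speaker whose breath samples are most
    similar to the precisely separated breath sounds."""
    return max(sample_db, key=lambda spk: sim(unknown_breaths, sample_db[spk]))

def verify(unknown_breaths, legal_breaths, sample_db, test_samples, sim):
    """Steps 6-10: accept the claimed identity if the similarity to the
    legitimate speaker's breathing segment exceeds the minimum of the
    per-test-sample maximum similarities (the threshold of step 10)."""
    max_sims = [max(sim(t, sample_db[spk]) for spk in sample_db)
                for t in test_samples]        # steps 7-8: one maximum per test sample
    return sim(unknown_breaths, legal_breaths) > min(max_sims)   # step 10
```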
In the above scheme, step 1 comprises the following steps:
Step 1.1: input the breath sample set and divide it into breath frames 100 milliseconds long; divide each breath frame in turn into contiguous, mutually overlapping breath subframes, each breath subframe being 10 ms long with a 5 ms overlap between adjacent subframes;
Step 1.2: pre-emphasize each breath subframe with a first-order difference filter to obtain the pre-emphasized breath subframes, the first-order difference filter H being
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
Step 1.3: compute the MFCC of every pre-emphasized breath subframe of each breath frame to obtain the short-time cepstral matrix of each breath frame; remove the DC component from each column of the short-time cepstral matrix to obtain the MFCC cepstral matrix of each breath frame;
Step 1.4: compute the mean matrix T of the breath sample set:
T = (1/N) Σ M(Xi), the sum running over i = 1, ..., N
where N is the number of breath frames in the breath sample set and M(Xi) is the MFCC cepstral matrix of the i-th breath frame, i ∈ [1, 2, ..., N];
Compute the variance matrix V of the breath sample set:
V = (1/N) Σ (M(Xi) − T)², the sum running over i = 1, ..., N, the squares taken elementwise;
Step 1.5: concatenate the MFCC cepstral matrices of all breath frames into one big matrix Mb: Mb = [M(X1), ..., M(Xi), M(Xi+1), ..., M(XN)]
Perform a singular value decomposition of this big matrix:
Mb = UΣV*
where U is an m×m unitary matrix, Σ is a positive semidefinite m×n diagonal matrix, and V* denotes the conjugate transpose of V, an n×n unitary matrix; the elements {λ1, λ2, λ3, ...} on the diagonal of Σ are the singular values of Mb, giving the singular value vector {λ1, λ2, λ3, ...};
Normalize the singular value vector by the maximum singular value λm, finally obtaining the normalized singular value vector S = {λ1/λm, λ2/λm, λ3/λm, ...}, where λm = max{λ1, λ2, λ3, ...};
Step 1.6: obtain a breathing template, the breathing template comprising the normalized singular value vector S, the variance matrix V of the breath sample set and the mean matrix T of the breath sample set.
In the above scheme, step 2 comprises the following steps:
Step 2.1: input the unknown speech segment and divide it into frames, obtaining unknown speech frames and unknown speech subframes; compute the similarity B(X, T, V, S) between each unknown speech frame and the breathing template; compute the similarity between each breath frame of the breath sample set and the breathing template, and take the minimum similarity as Bm;
Compute the short-time energy E of each unknown speech frame:
E = Σ x[n]², the sum running over the window n = N0, ..., N0+N−1
where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length of the sample, and N0 is the sample index at which the window starts;
Compute the average of the short-time energies of all unknown speech frames;
Compute the zero-crossing rate ZCR of each unknown speech frame:
ZCR = (1/(2N)) Σ |sgn(x[n]) − sgn(x[n−1])|, the sum running over the window n = N0+1, ..., N0+N−1
where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length of the sample, and N0 is the sample index at which the window starts;
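The short-time energy and zero-crossing rate above follow the standard short-time definitions; as a sketch (the exact normalization in the patent's lost figures is an assumption), they can be computed as:

```python
import numpy as np

def short_time_energy(x, n0, n_win):
    """E: sum of squared samples over the window of length n_win starting
    at sample index n0."""
    w = x[n0:n0 + n_win].astype(float)
    return float(np.sum(w ** 2))

def zero_crossing_rate(x, n0, n_win):
    """ZCR: fraction of adjacent sample pairs in the window whose signs
    differ (|sgn(x[n]) - sgn(x[n-1])| is 2 at a crossing, hence the 1/2)."""
    s = np.sign(x[n0:n0 + n_win])
    return float(0.5 * np.mean(np.abs(np.diff(s))))
```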
Step 2.2: choose an unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) between the chosen unknown speech frame and the breathing template is greater than the threshold Bm/2, the zero-crossing rate ZCR of the frame is less than 0.25, and the short-time energy E of the chosen frame is less than the average over all unknown speech frames, judge the chosen unknown speech frame to be breath sound; if these conditions are not met, judge the chosen unknown speech frame to be non-breath sound.
Step 2.4: choose the other unknown speech frames in turn and repeat step 2.3 until every unknown speech frame of the unknown speech segment has been judged to be breath sound or not;
Step 2.5: retain the breath sounds and reject the non-breath sounds, obtaining the initially separated breath sounds.
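The three-way test of step 2.3 translates directly into code; a minimal sketch (per-frame feature extraction is assumed to be already done):

```python
def is_breath_frame(B, zcr, E, Bm, E_mean):
    """Step 2.3: a frame is breath sound iff its template similarity B
    exceeds Bm/2, its ZCR is below 0.25 and its short-time energy is
    below the average energy of all unknown speech frames."""
    return B > Bm / 2 and zcr < 0.25 and E < E_mean

# Steps 2.4-2.5: keep the breath frames, reject the rest (initial separation).
# breaths = [f for f, feats in zip(frames, features)
#            if is_breath_frame(*feats, Bm, E_mean)]
```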
In the above scheme, the method of computing the similarity between a breath frame or an unknown speech frame and the breathing template in step 2.1 comprises the following steps:
Step 2.1.1: input the breath sample set or the unknown speech segment and divide it into breath frames or unknown speech frames 100 milliseconds long; divide each breath frame or unknown speech frame in turn into contiguous, mutually overlapping breath subframes or unknown speech subframes, each subframe being 10 ms long with a 5 ms overlap between adjacent subframes;
Step 2.1.2: pre-emphasize each subframe with the first-order difference filter to obtain the pre-emphasized breath frames or unknown speech frames, the first-order difference filter H being
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
Step 2.1.3: compute the MFCC of every pre-emphasized subframe of each breath frame or unknown speech frame to obtain its short-time cepstral matrix; remove the DC component from each column of the short-time cepstral matrix to obtain the MFCC cepstral matrix M(X) of each breath frame or unknown speech frame;
Step 2.1.4: choose a breath frame or unknown speech frame X;
Step 2.1.5: compute the normalization difference matrix D of the chosen frame from M(X), T and V, where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstral matrix of the chosen breath frame or unknown speech frame;
Step 2.1.6: multiply each column of D by a half Hamming window so that the low-frequency cepstral coefficients are strengthened:
D(:, j) = D(:, j) · hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters in each breath subframe or unknown speech subframe, i.e. the number of columns of D, and hamming denotes the Hamming window.
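Step 2.1.6 can be sketched as below; which half of the Hamming window is used, and that it decays along the coefficient index, are assumptions consistent with the stated goal of strengthening the low-frequency cepstral coefficients:

```python
import numpy as np

def emphasize_low_quefrency(D):
    """Scale the j-th column of D (the j-th MFCC parameter) by a half
    Hamming window that decays with j, so that the low-order cepstral
    coefficients are relatively strengthened."""
    nc = D.shape[1]                      # Nc: number of MFCC parameters
    w = np.hamming(2 * nc)[nc:]          # decaying half of a Hamming window
    return D * w[None, :]
```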
Step 2.1.7: compute the component Cp of the similarity B(X, T, V, S) between the chosen breath frame or unknown speech frame X and the breathing template, where n is the number of breath subframes or unknown speech subframes in X, k ∈ [1, n], and Dkj is the j-th MFCC parameter of the k-th subframe of X;
Compute the other component Cn of the similarity B(X, T, V, S) between X and the breathing template;
Step 2.1.8: compute the similarity between X and the breathing template:
B(X, T, V, S) = Cp · Cn;
Step 2.1.9: choose the MFCC cepstral matrix of another breath frame or unknown speech frame and repeat steps 2.1.5-2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarity between every breath frame or unknown speech frame and the breathing template has been obtained.
In the above scheme, the values used by the boundary-detection algorithm that eliminates false valleys in step 3 comprise a breath duration threshold, an energy threshold, upper and lower zero-crossing-rate (ZCR) thresholds, and the spectral slope, with which the breath boundaries are located accurately; step 3 uses a binary 0-1 indicator to mark accurately the positions of breaths within the current speech segment.
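The full false-valley elimination follows the cited TASLP 2007 algorithm; the sketch below only illustrates the silencing-gap and duration tests on the binary frame-level indicator, with the 20 ms gap and 100-400 ms duration taken from the description (the 10 ms frame grid and the `is_silent` input are assumptions):

```python
import numpy as np

def refine_breath_boundaries(is_breath, is_silent, frame_ms=10,
                             min_gap_ms=20, min_dur_ms=100, max_dur_ms=400):
    """Keep a candidate breath run only if it lasts 100-400 ms and is
    flanked on both sides by silent gaps of at least 20 ms."""
    is_breath = np.asarray(is_breath)
    is_silent = np.asarray(is_silent)
    out = np.zeros_like(is_breath)
    lo, hi = min_dur_ms // frame_ms, max_dur_ms // frame_ms
    gap = min_gap_ms // frame_ms
    i = 0
    while i < len(is_breath):
        if is_breath[i]:
            j = i
            while j < len(is_breath) and is_breath[j]:
                j += 1                    # [i, j) is one candidate breath run
            dur_ok = lo <= j - i <= hi
            gap_ok = bool(is_silent[max(0, i - gap):i].all()
                          and is_silent[j:j + gap].all())
            if dur_ok and gap_ok:
                out[i:j] = 1              # confirmed breath; false positives dropped
            i = j
        else:
            i += 1
    return out
```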
In the above scheme, computing in step 5 the similarity between the precisely separated breath sounds of the unknown speech segment and a speaker's breath sample in the speaker sample database comprises the following steps:
Step 5.1: let the MFCC feature vector of the speaker's breath sample in the speaker sample database be (a1, a2, ..., an); compute the mean matrix M of the MFCC feature vector of the speaker's breath sample:
M = (1/n) Σ ai, the sum running over i = 1, ..., n
where ai is the i-th MFCC cepstral matrix of the MFCC feature vector of the speaker's breath sample in the speaker sample database, n is the number of MFCC cepstral matrices in that feature vector, and i ∈ [1, 2, ..., n];
Compute the variance matrix V of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
V = (1/n) Σ (ai − M)², the sum running over i = 1, ..., n, the squares taken elementwise;
Step 5.2: compute the MFCC feature vector of all the precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, ..., bn), where bi is the MFCC cepstral matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 5.3: normalize the feature vector (a1, a2, ..., an) of the speaker's breath sample in the speaker sample database, obtaining the normalization difference matrices Sak, where r and c are the numbers of rows and columns of Sak, Sak is the normalization difference matrix of ak, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 5.4: arrange (Sa1, Sa2, ..., San) in ascending order, obtaining (S1, S2, ..., Sn);
Step 5.5: normalize the MFCC feature vector (b1, b2, ..., bn) of all the precisely separated breath sounds of the unknown speech segment, obtaining the normalization difference matrices Sbk, where r and c are the numbers of rows and columns of Sbk, Sbk is the normalization difference matrix of bk, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 5.6: compute the degree of similarity Pk between bk and the reference template: compare Sbk with the elements of the ordered vector (S1, S2, ..., Sn) one by one; Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements; compute the average of the Pk to obtain the similarity between the precisely separated breath sounds of the unknown speech segment and the sample in the breath sample database.
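A sketch of the Pk measure of steps 5.4-5.6: Pk is the fraction of elements of the ascending reference vector smaller than Sb_k, and the final similarity is the mean of the Pk. Comparing the flattened matrices elementwise and averaging is an assumption about how the one-by-one comparison is carried out.

```python
import numpy as np

def breath_similarity(Sb_list, Sa_list):
    """Sb_list: normalization difference matrices of the unknown breaths;
    Sa_list: those of the reference sample, pooled and sorted ascending."""
    ref = np.sort(np.concatenate([np.asarray(a).ravel() for a in Sa_list]))
    pks = []
    for Sb in Sb_list:
        # for each element of Sb_k, the number of reference elements
        # strictly smaller than it; averaged, then divided by the total
        counts = np.searchsorted(ref, np.asarray(Sb).ravel(), side='left')
        pks.append(counts.mean() / ref.size)
    return float(np.mean(pks))            # average Pk = final similarity
```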
In the above scheme, computing in step 9 the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker comprises the following steps:
Step 9.1: let the MFCC feature vector of the breathing segment of the legitimate speaker be (a1, a2, ..., an); compute the mean matrix M of the MFCC feature vector of the legitimate speaker's breathing segment:
M = (1/n) Σ ai, the sum running over i = 1, ..., n
where ai is the i-th MFCC cepstral matrix of the MFCC feature vector of the legitimate speaker's breathing segment, n is the number of MFCC cepstral matrices in that feature vector, and i ∈ [1, 2, ..., n];
Compute the variance matrix V of the MFCC feature vector of the legitimate speaker's breathing segment:
V = (1/n) Σ (ai − M)², the sum running over i = 1, ..., n, the squares taken elementwise;
Step 9.2: compute the MFCC feature vector of all the precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, ..., bn), where bi is the MFCC cepstral matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 9.3: normalize the feature vector (a1, a2, ..., an) of the legitimate speaker's breathing segment, obtaining the normalization difference matrices Sak, where r and c are the numbers of rows and columns of Sak, Sak is the normalization difference matrix of ak, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 9.4: arrange (Sa1, Sa2, ..., San) in ascending order, obtaining (S1, S2, ..., Sn);
Step 9.5: normalize the MFCC feature vector (b1, b2, ..., bn) of all the precisely separated breath sounds of the unknown speech segment, obtaining the normalization difference matrices Sbk, where r and c are the numbers of rows and columns of Sbk, Sbk is the normalization difference matrix of bk, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 9.6: compute the degree of similarity Pk between bk and the reference template: compare Sbk with the elements of the ordered vector (S1, S2, ..., Sn) one by one; Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements; compute the average of the Pk to obtain the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker.
In the above scheme, the method of computing in step 7 the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database is identical to the method of computing in step 5 the similarity between the precisely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database.
In the above scheme, the method of computing the MFCC in step 1.3 and step 5.2 comprises: applying a fast Fourier transform (FFT) to the signal whose MFCC is to be computed, then computing the complex sinusoid coefficients, and finally producing the output through a mel-scale filter bank.
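A self-contained sketch of that MFCC pipeline (FFT, mel-scale filter bank, then a DCT to obtain cepstral coefficients). The filter-bank size, the coefficient count and the final DCT are assumptions; the description itself only names the FFT and the mel-scale filter bank. The 44 kHz default follows the sampling rate mentioned in the embodiment.

```python
import numpy as np

def mfcc_matrix(subframes, sr=44000, n_mels=20, n_ceps=13):
    """Return an (n_subframes x n_ceps) MFCC cepstral matrix."""
    mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    imel = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    n_fft = len(subframes[0])
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    # triangular mel-scale filter bank between 0 Hz and the Nyquist frequency
    edges = imel(np.linspace(mel(0.0), mel(sr / 2.0), n_mels + 2))
    fbank = np.zeros((n_mels, freqs.size))
    for i in range(n_mels):
        lo, mid, hi = edges[i], edges[i + 1], edges[i + 2]
        fbank[i] = np.clip(np.minimum((freqs - lo) / (mid - lo),
                                      (hi - freqs) / (hi - mid)), 0.0, None)
    # DCT-II matrix keeping the first n_ceps cepstral coefficients
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), k + 0.5) / n_mels)

    ceps = []
    for sf in subframes:
        spec = np.abs(np.fft.rfft(sf))            # FFT magnitude spectrum
        logmel = np.log(fbank @ spec + 1e-10)     # mel filter-bank output
        ceps.append(dct @ logmel)
    return np.array(ceps)
```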
In conclusion by adopting the above-described technical solution, the beneficial effects of the present invention are:
1) present invention as a set of Verification System based on breathing, paid close attention to by the uniqueness for realizing human body respiration for the first time
And research, and it is effectively applied the development and utilization that the speaker Recognition Technology based on breathing is overcome in Speaker Recognition System
" extraction of breath signal " and " breath signal processing " two faced is challenged greatly.
2) the present invention is based on the knowledge of mathematical statistics, devise a light similarity algorithm for decision: the calculation
Method is a series of simple vector operations using MFCC Mean Matrix and variance matrix.Compared with traditional classification algorithm, this hair
Similarity algorithm in bright has more preferably classification performance.
3) present invention can operate with speaker identification's experiment and speaker verification's experiment;Simultaneously because if people's exhales
Haustorium official is interfered, then his breathing signature may be modified, therefore the invention can be used for judging human body respiration organ
Whether it is interfered.
4) present invention can be achieved to need the identification under mute occasion.
5) present invention can be achieved can not sounding tester identification.
6) classification method that uses of the present invention is opposite with traditional complex model classification side based on multi-parameter, more assumed
Method has lower time complexity and space complexity.In addition, the present invention uses the algorithm process data based on MFCC more
Fastly, required training sample is less, and ensures recognition accuracy, thus Speaker Recognition System provided by the invention is simple and efficient,
And recognition result is accurate and reliable.
Description of the drawings
Fig. 1 is the system framework for judging whether an unknown speaker's identity is legitimate in the invention;
Fig. 2 is the framework of the preliminary breath detection in step 2 of the invention;
Fig. 3 is the framework of the final breath detection in step 3 of the invention;
Fig. 4 is a schematic table of the experimental results of steps 6-8 of the invention;
Fig. 5 shows the comparison after the mel filter bank is applied to the breath signal and to the non-breath speech signal in the invention;
Fig. 6 shows the characteristics of the ZCR, the spectral slope and the STE in the invention;
Fig. 7 shows the formants of the breath signal and of the non-breath speech signal in the invention;
Fig. 8 shows breathing under normal conditions and the breath signal under abnormal conditions in the invention.
Specific embodiment
All features disclosed in this specification may be combined in any manner, except for mutually exclusive features and/or steps.
The invention is elaborated below with reference to Figs. 1-8.
The invention proposes a speaker recognition method based on breathing characteristics, which achieves good results when applied to speaker recognition. The realization of the whole algorithm is shown schematically in Fig. 1 and comprises the steps:
Step 1: as in Fig. 1, input the breath sample set and divide it into frames to obtain breath frames; build the breath frames into a breathing template by means of mel-frequency cepstral coefficients (MFCC); step 1 specifically comprises the following steps:
Step 1.1: input the breath sample set and divide it into breath frames 100 milliseconds long; divide each breath frame in turn into contiguous, mutually overlapping breath subframes, each breath subframe being 10 ms long with a 5 ms overlap between adjacent subframes;
Step 1.2: pre-emphasize each breath subframe with a first-order difference filter to obtain the pre-emphasized breath subframes, the first-order difference filter H being
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
Step 1.3: compute the MFCC of every pre-emphasized breath subframe of each breath frame to obtain the short-time cepstral matrix of each breath frame; remove the DC component from each column of the short-time cepstral matrix to obtain the MFCC cepstral matrix of each breath frame;
Step 1.4: compute the mean matrix T of the breath sample set:
T = (1/N) Σ M(Xi), the sum running over i = 1, ..., N
where N is the number of breath frames in the breath sample set and M(Xi) is the MFCC cepstral matrix of the i-th breath frame, i ∈ [1, 2, ..., N];
Compute the variance matrix V of the breath sample set:
V = (1/N) Σ (M(Xi) − T)², the sum running over i = 1, ..., N, the squares taken elementwise;
Step 1.5: concatenate the MFCC cepstral matrices of all breath frames into one big matrix Mb: Mb = [M(X1), ..., M(Xi), M(Xi+1), ..., M(XN)]
Perform a singular value decomposition of this big matrix:
Mb = UΣV*
where U is an m×m unitary matrix, Σ is a positive semidefinite m×n diagonal matrix, and V* denotes the conjugate transpose of V, an n×n unitary matrix; the elements {λ1, λ2, λ3, ...} on the diagonal of Σ are the singular values of Mb, giving the singular value vector {λ1, λ2, λ3, ...};
Normalize the singular value vector by the maximum singular value λm, finally obtaining the normalized singular value vector S = {λ1/λm, λ2/λm, λ3/λm, ...}, where λm = max{λ1, λ2, λ3, ...};
Step 1.6: obtain a breathing template, the breathing template comprising the normalized singular value vector S, the variance matrix V of the breath sample set and the mean matrix T of the breath sample set.
Step 2: as in Fig. 2, input the unknown speech segment and divide it into frames to obtain unknown speech frames; compute the similarity between each unknown speech frame and the breathing template; compute the zero-crossing rate ZCR and the short-time energy E of each unknown speech frame; according to the similarity between the unknown speech frame and the breathing template, Bm, the ZCR and the short-time energy E, filter out the breath sounds in the unknown speech segment; the filtered-out breath sounds form the initially separated breath sounds;
Step 2 comprises the following steps:
Step 2.1: input the unknown speech segment and divide it into frames, obtaining unknown speech frames and unknown speech subframes; compute the similarity B(X, T, V, S) between each unknown speech frame and the breathing template;
Compute the similarity between each breath frame of the breath sample set and the breathing template, and take the minimum similarity as Bm;
Compute the short-time energy E of each unknown speech frame:
E = Σ x[n]², the sum running over the window n = N0, ..., N0+N−1
where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length of the sample, and N0 is the sample index at which the window starts;
Compute the average of the short-time energies of all unknown speech frames;
Compute the zero-crossing rate ZCR of each unknown speech frame:
ZCR = (1/(2N)) Σ |sgn(x[n]) − sgn(x[n−1])|, the sum running over the window n = N0+1, ..., N0+N−1
where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length of the sample, and N0 is the sample index at which the window starts;
The method of computing the similarity between a breath frame or an unknown speech frame and the breathing template in step 2.1 comprises the following steps:
Step 2.1.1: input the breath sample set or the unknown speech segment and divide it into breath frames or unknown speech frames 100 milliseconds long; divide each breath frame or unknown speech frame in turn into contiguous, mutually overlapping breath subframes or unknown speech subframes, each subframe being 10 ms long with a 5 ms overlap between adjacent subframes;
Step 2.1.2: pre-emphasize each subframe with the first-order difference filter to obtain the pre-emphasized breath frames or unknown speech frames, the first-order difference filter H being
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
Step 2.1.3: compute the MFCC of every pre-emphasized subframe of each breath frame or unknown speech frame to obtain its short-time cepstral matrix; remove the DC component from each column of the short-time cepstral matrix to obtain the MFCC cepstral matrix M(X) of each breath frame or unknown speech frame;
Step 2.1.4: choose a breath frame or unknown speech frame X;
Step 2.1.5: compute the normalization difference matrix D of the chosen frame from M(X), T and V, where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstral matrix of the chosen breath frame or unknown speech frame;
Step 2.1.6: multiply each column of D by a half Hamming window so that the low-frequency cepstral coefficients are strengthened:
D(:, j) = D(:, j) · hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters in each breath subframe or unknown speech subframe, i.e. the number of columns of D, and hamming denotes the Hamming window.
Step 2.1.7: compute the component Cp of the similarity B(X, T, V, S) between the chosen breath frame or unknown speech frame X and the breathing template, where n is the number of breath subframes or unknown speech subframes in X, k ∈ [1, n], and Dkj is the j-th MFCC parameter of the k-th subframe of X;
Compute the other component Cn of the similarity B(X, T, V, S) between X and the breathing template;
Step 2.1.8: compute the similarity between X and the breathing template:
B(X, T, V, S) = Cp · Cn;
Step 2.1.9: choose the MFCC cepstral matrix of another breath frame or unknown speech frame and repeat steps 2.1.5-2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarity between every breath frame or unknown speech frame and the breathing template has been obtained.
Step 2.2: choose an unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) between the chosen unknown speech frame and the breathing template is greater than the threshold Bm/2, the zero-crossing rate ZCR of the frame is less than 0.25 (at a sampling rate of 44 kHz), and the short-time energy E of the chosen frame is less than the average over all unknown speech frames, judge the chosen unknown speech frame to be breath sound; if these conditions are not met, judge the chosen unknown speech frame to be non-breath sound.
Step 2.4: choose the other unknown speech frames in turn and repeat step 2.3 until every unknown speech frame of the unknown speech segment has been judged to be breath sound or not;
Step 2.5: retain the breath sounds and reject the non-breath sounds, obtaining the initially separated breath sounds;
The values used by the boundary-detection algorithm that eliminates false valleys in step 3 comprise a breath duration threshold, an energy threshold, upper and lower zero-crossing-rate (ZCR) thresholds, and the spectral slope, with which the breath boundaries are located accurately; step 3 uses a binary 0-1 indicator to mark accurately the positions of breaths within the current speech segment.
Step 3: as in Fig. 3, detect the silent gaps of the initially separated breath sounds with the boundary-detection algorithm that eliminates false valleys, and reject the false-positive portions of the initially separated breath sounds according to the silent gaps, obtaining the precisely separated breath sounds; the boundary-detection algorithm that eliminates false valleys is implemented as in "An effective algorithm for automatic detection and exact demarcation of breath sounds in speech and song", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, March 2007;
Step 4: choose a group of sample speakers, collect a breathing segment from each sample speaker, and build a speaker sample database; if it must be determined whether the speaker of the unknown speech segment comes from the sample speakers, go to step 5; if it must be determined whether the speaker of the unknown speech segment is a legitimate speaker, go to step 6;
Step 5: compute the similarity between the precisely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database, take the sample speaker with the maximum similarity as the speaker of the unknown speech segment, and finish;
Computing in step 5 the similarity between the precisely separated breath sounds of the unknown speech segment and a speaker's breath sample in the speaker sample database comprises the following steps:
Step 5.1: let the MFCC feature vector of the speaker's breath sample in the speaker sample database be (a1, a2, ..., an); compute the mean matrix M of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
M = (1/n) Σ ai, the sum running over i = 1, ..., n
where ai is the i-th MFCC cepstral matrix of the MFCC feature vector of the speaker's breath sample in the speaker sample database, n is the number of MFCC cepstral matrices in that feature vector, and i ∈ [1, 2, ..., n];
Compute the variance matrix V of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
V = (1/n) Σ (ai − M)², the sum running over i = 1, ..., n, the squares taken elementwise;
Step 5.2: compute the MFCC feature vector of all the precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, ..., bn), where bi is the MFCC cepstral matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 5.3: normalize the feature vector (a1, a2, ..., an) of the speaker's breath sample in the speaker sample database, obtaining the normalization difference matrices Sak, where r and c are the numbers of rows and columns of Sak, Sak is the normalization difference matrix of ak, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 5.4: arrange (Sa1, Sa2, ..., San) in ascending order, obtaining (S1, S2, ..., Sn);
Step 5.5: normalize the MFCC feature vector (b1, b2, ..., bn) of all the precisely separated breath sounds of the unknown speech segment, obtaining the normalization difference matrices Sbk, where r and c are the numbers of rows and columns of Sbk, Sbk is the normalization difference matrix of bk, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 5.6: compute the degree of similarity Pk between bk and the reference template: compare Sbk with the elements of the ordered vector (S1, S2, ..., Sn) one by one; Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements; compute the average of the Pk to obtain the similarity between the precisely separated breath sounds of the unknown speech segment and the sample in the breath sample database.
Step 6: collect test samples from each sample speaker and choose one test sample;
Step 7: as in Fig. 4, compute the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database, and take the maximum of those similarities, obtaining one maximum similarity;
The method of computing in step 7 the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database is identical to the method of computing in step 5 the similarity between the precisely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database.
Step 8: as in Fig. 4, choose another test sample and repeat step 7 until the maximum similarity corresponding to every test sample has been obtained, yielding the maximum-similarity group;
Step 9: collect the breathing segment of the legitimate speaker, and compute the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker;
Computing in step 9 the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker comprises the following steps:
Step 9.1: let the MFCC feature vector of the breathing segment of the legitimate speaker be (a1, a2, ..., an); compute the mean matrix M of the MFCC feature vector of the legitimate speaker's breathing segment:
M = (1/n) Σ ai, the sum running over i = 1, ..., n
where ai is the i-th MFCC cepstral matrix of the MFCC feature vector of the legitimate speaker's breathing segment, n is the number of MFCC cepstral matrices in that feature vector, and i ∈ [1, 2, ..., n];
Compute the variance matrix V of the MFCC feature vector of the legitimate speaker's breathing segment:
V = (1/n) Σ (ai − M)², the sum running over i = 1, ..., n, the squares taken elementwise;
Step 9.2: compute the MFCC feature vector of all the precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, ..., bn), where bi is the MFCC cepstral matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 9.3: normalize the feature vector (a1, a2, ..., an) of the legitimate speaker's breathing segment, obtaining the normalization difference matrices Sak, where r and c are the numbers of rows and columns of Sak, Sak is the normalization difference matrix of ak, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 9.4: arrange (Sa1, Sa2, ..., San) in ascending order, obtaining (S1, S2, ..., Sn);
Step 9.5: normalize the MFCC feature vector (b1, b2, ..., bn) of all the precisely separated breath sounds of the unknown speech segment, obtaining the normalization difference matrices Sbk, where r and c are the numbers of rows and columns of Sbk, Sbk is the normalization difference matrix of bk, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 9.6: compute the degree of similarity Pk between bk and the reference template: compare Sbk with the elements of the ordered vector (S1, S2, ..., Sn) one by one; Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements; compute the average of the Pk to obtain the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker.
Step 10: if the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker is greater than the minimum value of the maximum-similarity group, the speaker of the unknown speech segment is recognized as the legitimate speaker; otherwise the speaker is illegitimate.
The method of computing the MFCC in step 1.3 and step 5.2 comprises: applying a fast Fourier transform to the signal whose MFCC is to be computed, then computing the complex sinusoid coefficients, and finally producing the output through a mel-scale filter bank.
The invention has been explained by the above embodiments, but it should be understood that these embodiments are intended only for the purpose of illustration and explanation and are not intended to limit the invention to their scope. Furthermore, those skilled in the art will understand that the invention is not limited to the above embodiments, and that many variants and modifications can be made according to the teaching of the invention, all of which fall within the claimed scope of the invention. The protection scope of the invention is defined by the appended claims and their equivalents.
Claims (9)
1. A speaker recognition method based on breathing characteristics, characterized by comprising the following steps:
Step 1: input a breath sample set and divide it into frames to obtain breath frames; build the breath frames into a breathing template by means of mel-frequency cepstral coefficients (MFCC); compute the similarity between each breath frame of the breath sample set and the breathing template, and take its minimum value Bm;
Step 2: input an unknown speech segment and divide it into frames to obtain unknown speech frames; compute the similarity between each unknown speech frame and the breathing template; compute the zero-crossing rate ZCR and the short-time energy E of each unknown speech frame; according to the similarity between the unknown speech frame and the breathing template, Bm, the ZCR and the short-time energy E, filter out the breath sounds in the unknown speech segment; the filtered-out breath sounds form the initially separated breath sounds;
Step 3: detect the silent gaps of the initially separated breath sounds with the boundary-detection algorithm that eliminates false valleys, and reject the false-positive portions of the initially separated breath sounds according to the silent gaps, obtaining the precisely separated breath sounds;
Step 4: choose a group of sample speakers, collect a breathing segment from each sample speaker, and build a speaker sample database; if it must be determined whether the speaker of the unknown speech segment comes from the sample speakers, go to step 5; if it must be determined whether the speaker of the unknown speech segment is a legitimate speaker, go to step 6;
Step 5: compute the similarity between the precisely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database, take the sample speaker with the maximum similarity as the speaker of the unknown speech segment, and finish;
Step 6: collect test samples from each sample speaker and choose one test sample;
Step 7: compute the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database, and take the maximum of those similarities, obtaining one maximum similarity;
Step 8: choose another test sample and repeat step 7 until the maximum similarity corresponding to every test sample has been obtained, yielding the maximum-similarity group;
Step 9: collect a speech segment of the legitimate speaker, extract the legitimate speaker's breathing segment using the breath sample set, and compute the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker;
Step 10: if the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker is greater than the minimum value of the maximum-similarity group, the speaker of the unknown speech segment is recognized as the legitimate speaker; otherwise the speaker is illegitimate.
2. The speaker recognition method based on breathing characteristics according to claim 1, characterized in that step 1 comprises the following steps:
Step 1.1: input the breath sample set and divide it into breath frames 100 milliseconds long; divide each breath frame in turn into contiguous, mutually overlapping breath subframes, each breath subframe being 10 ms long with a 5 ms overlap between adjacent subframes;
Step 1.2: pre-emphasize each breath subframe with a first-order difference filter to obtain the pre-emphasized breath subframes, the first-order difference filter H being
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
Step 1.3: compute the MFCC of every pre-emphasized breath subframe of each breath frame to obtain the short-time cepstral matrix of each breath frame; remove the DC component from each column of the short-time cepstral matrix to obtain the MFCC cepstral matrix of each breath frame;
Step 1.4: compute the mean matrix T of the breath sample set:
T = (1/N) Σ M(Xi), the sum running over i = 1, ..., N
where N is the number of breath frames in the breath sample set and M(Xi) is the MFCC cepstral matrix of the i-th breath frame, i ∈ [1, 2, ..., N];
Compute the variance matrix V of the breath sample set:
V = (1/N) Σ (M(Xi) − T)², the sum running over i = 1, ..., N, the squares taken elementwise;
Step 1.5: concatenate the MFCC cepstral matrices of all breath frames into one big matrix Mb:
Mb = [M(X1), ..., M(Xi), M(Xi+1), ..., M(XN)]
Perform a singular value decomposition of this big matrix:
Mb = UΣV*
where U is an m×m unitary matrix, Σ is a positive semidefinite m×n diagonal matrix, and V* denotes the conjugate transpose of V, an n×n unitary matrix; the elements {λ1, λ2, λ3, ...} on the diagonal of Σ are the singular values of Mb, giving the singular value vector {λ1, λ2, λ3, ...};
Normalize the singular value vector by the maximum singular value λm, finally obtaining the normalized singular value vector S = {λ1/λm, λ2/λm, λ3/λm, ...}, where λm = max{λ1, λ2, λ3, ...};
Step 1.6: obtain a breathing template, the breathing template comprising the normalized singular value vector S, the variance matrix V of the breath sample set and the mean matrix T of the breath sample set.
3. The respiratory-characteristic-based speaker recognition method according to claim 1, wherein step 2 comprises the following steps:
Step 2.1: input the unknown speech segment and split it into frames to obtain unknown speech frames and unknown speech subframes; compute the similarity B(X, T, V, S) between each unknown speech frame and the breathing template; compute the similarity between each breathing frame of the breath sample set and the breathing template, and take the minimum of these similarities as Bm;
Compute the short-time energy E of each unknown speech frame:
E = Σ x[n]², summed over the N samples of the window starting at sample N0
where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length, and N0 is the sample at which the window starts;
Compute the average short-time energy Ē of all unknown speech frames;
Compute the zero-crossing rate ZCR of each unknown speech frame:
ZCR = (1/(2N)) · Σ |sgn(x[n]) − sgn(x[n−1])|, summed over the same window
with n, x[n], N, and N0 as above;
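A sketch of the two frame statistics, following the standard definitions reconstructed above.

```python
import numpy as np

def short_time_energy(frame):
    return float(np.sum(np.asarray(frame, dtype=float) ** 2))

def zero_crossing_rate(frame):
    signs = np.sign(np.asarray(frame, dtype=float))
    # Each sign flip contributes |diff| = 2, hence the 2N denominator
    return float(np.sum(np.abs(np.diff(signs)))) / (2.0 * len(frame))
```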
Step 2.2: choose one unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) between the selected unknown speech frame and the breathing template is greater than the threshold Bm/2, the zero-crossing rate ZCR of the selected unknown speech frame is less than 0.25, and the short-time energy E of the selected unknown speech frame is less than the average energy Ē of all unknown speech frames, judge the selected unknown speech frame to be a breath sound; if these conditions are not met, judge the selected unknown speech frame to be a non-breath sound; here X denotes a breathing frame or unknown speech frame, T the mean matrix of the breath sample set, V the variance matrix of the speaker's breath samples, and S the normalized singular value vector;
Step 2.4: choose the remaining unknown speech frames in turn and repeat step 2.3 until every unknown speech frame of the unknown speech segment has been judged to be a breath sound or not;
Step 2.5: retain the breath sounds and reject the non-breath sounds to obtain the coarsely separated breath sounds.
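The step-2.3 test reads as a three-way conjunction; in this sketch `similarity` is the precomputed B(X, T, V, S) of the selected frame, and `Bm` and `mean_energy` come from step 2.1 (names are illustrative).

```python
def is_breath_frame(frame, similarity, Bm, mean_energy):
    return (similarity > Bm / 2.0
            and zero_crossing_rate(frame) < 0.25
            and short_time_energy(frame) < mean_energy)
```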
4. The respiratory-characteristic-based speaker recognition method according to claim 3, wherein the method of computing, in step 2.1, the similarity between a breathing frame or unknown speech frame and the breathing template comprises the following steps:
Step 2.1.1: input the breath sample set or the unknown speech segment; divide it into breathing frames or unknown speech frames of 100 ms length, and divide each breathing frame or unknown speech frame further into consecutive, mutually overlapping breathing subframes or unknown speech subframes, each 10 ms long with an overlap of 5 ms between adjacent subframes;
Step 2.1.2: apply pre-emphasis to each breathing subframe or unknown speech subframe with the first-order difference filter to obtain the pre-emphasized subframes, the first-order difference filter H being
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
Step 2.1.3: compute the MFCC of each pre-emphasized breathing subframe or unknown speech subframe of each frame to obtain the short-term cepstrum matrix of each breathing frame or unknown speech frame; remove the DC component from each column of the short-term cepstrum matrix to obtain the MFCC cepstrum matrix M(X) of each breathing frame or unknown speech frame;
Step 2.1.4: choose a breathing frame or unknown speech frame X;
Step 2.1.5: compute the normalization difference matrix D of the selected breathing frame or unknown speech frame:
D = (M(X) − T) / V, taken element-wise
where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstrum matrix of the selected breathing frame or unknown speech frame;
Step 2.1.6: multiply each column of D by a half Hamming window so that the low-order (low-frequency) cepstral coefficients are strengthened:
D(:, j) = D(:, j) · hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters per breathing subframe or unknown speech subframe, i.e. the number of columns of D, and hamming denotes the Hamming window;
Step 2.1.7: compute the component Cp of the similarity B(X, T, V, S) between the selected breathing frame or unknown speech frame X and the breathing template, where n is the number of breathing subframes or unknown speech subframes in the selected frame X, k ∈ [1, n], and Dkj is the j-th MFCC parameter of the k-th breathing subframe or unknown speech subframe of the frame X;
Compute the other component Cn of the similarity B(X, T, V, S) between the selected breathing frame or unknown speech frame X and the breathing template;
Step 2.1.8: compute the similarity between the breathing frame or unknown speech frame X and the breathing template as
B(X, T, V, S) = Cp · Cn;
Step 2.1.9: choose the MFCC cepstrum matrix of another breathing frame or unknown speech frame and repeat steps 2.1.5-2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarity to the breathing template has been obtained for every breathing frame or unknown speech frame.
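A sketch of the claim-4 pipeline. The text above does not spell out Cp and Cn, so they are passed in as assumed callables; the element-wise normalization, the choice of the decaying half of the Hamming window, and the orientation of D (rows = cepstral coefficients) are likewise assumptions.

```python
import numpy as np

def frame_similarity(M_X, T, V, S, component_cp, component_cn):
    D = (M_X - T) / V                                   # step 2.1.5 (assumed element-wise)
    n_rows = D.shape[0]
    half_hamming = np.hamming(2 * n_rows)[n_rows:]      # decaying half of a Hamming window
    D = D * half_hamming[:, None]                       # step 2.1.6: weight every column
    return component_cp(D) * component_cn(D, S)         # step 2.1.8: B = Cp * Cn
```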
5. The respiratory-characteristic-based speaker recognition method according to any one of claims 1-4, wherein the false-trough-eliminating boundary detection algorithm of step 3 uses values including a breathing duration threshold, an energy threshold, upper and lower zero-crossing rate (ZCR) thresholds, and the spectral slope to locate breathing boundaries precisely, and step 3 uses a binary 0-1 indicator to mark the precise position of each breath within the current speech segment.
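One plausible reading of claim 5, sketched below; every threshold value here is a placeholder assumption, since the patent does not state concrete numbers in this text.

```python
def is_true_breath(duration_s, energy, zcr, spectral_slope,
                   min_dur=0.2, max_dur=1.5, max_energy=1e-2,
                   zcr_low=0.05, zcr_high=0.25, max_slope=0.0):
    """Reject false troughs: a candidate counts as a breath only if its
    duration, energy, ZCR, and spectral slope all pass their thresholds."""
    return (min_dur <= duration_s <= max_dur
            and energy < max_energy
            and zcr_low < zcr < zcr_high
            and spectral_slope < max_slope)
```

Step 3 would then emit a 0/1 flag per frame, 1 inside a confirmed breath and 0 elsewhere.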
6. The respiratory-characteristic-based speaker recognition method according to any one of claims 1-4, wherein computing, in step 5, the similarity between the precisely separated breath sounds of the unknown speech segment and each speaker's breath samples in the speaker sample database comprises the following steps:
Step 5.1: let the MFCC feature vector of a speaker's breath samples in the speaker sample database be (a1, a2, …, an), and compute the mean matrix M of this MFCC feature vector:
M = (1/n) · Σ ai, summed over i = 1, …, n
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the speaker's breath samples in the speaker sample database, n is the number of MFCC cepstrum matrices in that feature vector, and i ∈ [1, 2, …, n];
Compute the variance matrix V of the MFCC feature vector of the speaker's breath samples in the speaker sample database:
V = (1/n) · Σ (ai − M)², summed over i = 1, …, n and taken element-wise;
Step 5.2: compute the MFCC feature vector of all precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 5.3: normalize the feature vector (a1, a2, …, an) of the speaker's breath samples in the speaker sample database to obtain (Sa1, Sa2, …, San), where Sak is obtained from the normalization difference matrix of ak with respect to M and V, r and c denote the numbers of rows and columns of that matrix, and k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c] index the matrix entries;
Step 5.4: arrange (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn);
Step 5.5: normalize the MFCC feature vector (b1, b2, …, bn) of all precisely separated breath sounds of the unknown speech segment in the same way to obtain (Sb1, Sb2, …, Sbn), where r and c denote the numbers of rows and columns of bk's normalization difference matrix, and k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c] index the matrix entries;
Step 5.6: compute the degree of similarity Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements; compute the average of the Pk values to obtain the similarity between the precisely separated breath sounds of the unknown speech segment and the samples in the breath sample database.
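Step 5.6 is an empirical percentile rank: each unknown score is located within the sorted enrollment scores, and the mean rank is the similarity. A sketch, assuming the Sa/Sb values reduce to scalars:

```python
import numpy as np

def percentile_similarity(enroll_scores, unknown_scores):
    S = np.sort(np.asarray(enroll_scores, dtype=float))        # step 5.4: ascending order
    ranks = np.searchsorted(S, np.asarray(unknown_scores, dtype=float),
                            side="left")                       # count of elements < Sb_k
    Pk = ranks / float(len(S))
    return float(Pk.mean())                                    # average of the Pk values
```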
7. The respiratory-characteristic-based speaker recognition method according to any one of claims 1-4, wherein computing, in step 9, the similarity between the precisely separated breath sounds of the unknown speech segment and the legitimate speaker's breathing segment comprises the following steps:
Step 9.1: let the MFCC feature vector of the legitimate speaker's breathing segment be (a1, a2, …, an), and compute the mean matrix M of this MFCC feature vector:
M = (1/n) · Σ ai, summed over i = 1, …, n
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the legitimate speaker's breathing segment, n is the number of MFCC cepstrum matrices in that feature vector, and i ∈ [1, 2, …, n];
Compute the variance matrix V of the MFCC feature vector of the legitimate speaker's breathing segment:
V = (1/n) · Σ (ai − M)², summed over i = 1, …, n and taken element-wise;
Step 9.2: compute the MFCC feature vector of all precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 9.3: normalize the feature vector (a1, a2, …, an) of the legitimate speaker's breathing segment to obtain (Sa1, Sa2, …, San), where Sak is obtained from the normalization difference matrix of ak with respect to M and V, r and c denote the numbers of rows and columns of that matrix, and k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c] index the matrix entries;
Step 9.4: arrange (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn);
Step 9.5: normalize the MFCC feature vector (b1, b2, …, bn) of all precisely separated breath sounds of the unknown speech segment in the same way to obtain (Sb1, Sb2, …, Sbn), where r and c denote the numbers of rows and columns of bk's normalization difference matrix, and k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c] index the matrix entries;
Step 9.6: compute the degree of similarity Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements of the ordered vector smaller than Sbk divided by the total number of elements; compute the average of the Pk values to obtain the similarity between the precisely separated breath sounds of the unknown speech segment and the legitimate speaker's breathing segment.
8. The respiratory-characteristic-based speaker recognition method according to any one of claims 1-4, wherein the method of computing, in step 7, the similarity between the selected test sample and each speaker's breath samples in the speaker sample database is identical to the method of computing, in step 5, the similarity between the precisely separated breath sounds of the unknown speech segment and each speaker's breath samples in the speaker sample database.
9. The respiratory-characteristic-based speaker recognition method according to claim 6, wherein the method of computing the MFCC in step 1.3 and step 5.2 comprises: applying the fast Fourier transform (FFT) to the signal whose MFCC is required, then computing the complex sinusoid coefficients, and finally producing the output through a mel-scale filter bank.
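A minimal sketch of the MFCC computation named in claim 9, here via librosa as one common implementation (FFT magnitudes through a mel filter bank, followed by the conventional log and DCT steps, which the claim text itself does not enumerate); the parameter values are assumptions.

```python
import librosa

def subframe_mfcc(subframe, sr, n_mfcc=13):
    # One MFCC vector per 10 ms subframe: FFT -> mel filter bank -> log -> DCT
    return librosa.feature.mfcc(y=subframe, sr=sr, n_mfcc=n_mfcc,
                                n_fft=len(subframe), hop_length=len(subframe),
                                center=False)
```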
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610626034.0A CN106297805B (en) | 2016-08-02 | 2016-08-02 | A kind of method for distinguishing speek person based on respiratory characteristic |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106297805A CN106297805A (en) | 2017-01-04 |
CN106297805B (en) | 2019-07-05
Family
ID=57664264
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610626034.0A Active CN106297805B (en) | 2016-08-02 | 2016-08-02 | A kind of method for distinguishing speek person based on respiratory characteristic |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106297805B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110473563A (en) * | 2019-08-19 | 2019-11-19 | 山东省计算中心(国家超级计算济南中心) | Breathing detection method, system, equipment and medium based on time-frequency characteristics |
CN111568400B (en) * | 2020-05-20 | 2024-02-09 | 山东大学 | Human body sign information monitoring method and system |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1547191A (en) * | 2003-12-12 | 2004-11-17 | 北京大学 | Semantic and sound groove information combined speaking person identity system |
JP2005530214A (en) * | 2002-06-19 | 2005-10-06 | コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ | Mega speaker identification (ID) system and method corresponding to its purpose |
CN101770774A (en) * | 2009-12-31 | 2010-07-07 | 吉林大学 | Embedded-based open set speaker recognition method and system thereof |
CN102324232A (en) * | 2011-09-12 | 2012-01-18 | 辽宁工业大学 | Method for recognizing sound-groove and system based on gauss hybrid models |
CN102486922A (en) * | 2010-12-03 | 2012-06-06 | 株式会社理光 | Speaker recognition method, device and system |
CN103280220A (en) * | 2013-04-25 | 2013-09-04 | 北京大学深圳研究生院 | Real-time recognition method for baby cry |
CN104112446A (en) * | 2013-04-19 | 2014-10-22 | 华为技术有限公司 | Breathing voice detection method and device |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9704495B2 (en) * | 2012-02-21 | 2017-07-11 | Tata Consultancy Services Limited | Modified mel filter bank structure using spectral characteristics for sound analysis |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||