CN106297805A - Speaker recognition method based on breathing characteristics - Google Patents

Speaker recognition method based on breathing characteristics

Info

Publication number
CN106297805A
CN106297805A (application CN201610626034.0A)
Authority
CN
China
Prior art keywords
breathing
unknown
speaker
frame
speech segment
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610626034.0A
Other languages
Chinese (zh)
Other versions
CN106297805B (en)
Inventor
鲁力 (Lu Li)
刘玲霜 (Liu Lingshuang)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201610626034.0A
Publication of CN106297805A
Application granted
Publication of CN106297805B
Legal status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification
    • G10L 17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 - Speaker identification or verification
    • G10L 17/06 - Decision making techniques; Pattern matching strategies

Abstract

The invention discloses a speaker recognition method based on breathing characteristics. The method mainly comprises: inputting an unknown speech segment; extracting the breath sounds in the unknown segment by means of a breathing template built from mel-frequency cepstral coefficients (MFCC), together with the zero-crossing rate (ZCR) and the short-time energy (E); then rejecting the false-positive portions of those breath sounds with a boundary detection algorithm that eliminates false troughs, obtaining cleanly separated breath sounds; and finally using the cleanly separated breath sounds to determine whether the speaker of the unknown segment comes from the sample speakers and whether the speaker of the unknown segment is a legitimate speaker. The invention is the first to focus on and study the uniqueness of human breathing and to apply it effectively in a speaker recognition system, overcoming the two major challenges facing the development of breathing-based speaker recognition: breath signal extraction and breath signal processing. The speaker recognition system provided by the invention is therefore simple and efficient, and its recognition results are accurate and reliable.

Description

Speaker recognition method based on breathing characteristics
Technical field
The present invention relates to a method of contactless biometric signal acquisition, and in particular to a speaker recognition method based on breathing characteristics.
Background technology
Speaker recognition is a class of fundamental problems subdivided into two: speaker identification, which determines whether an unknown speaker is a member of a known sample database of speakers, and speaker verification, which confirms whether a claimed speaker identity is legitimate. Recognition proceeds in two stages, training and testing: the training stage builds the speaker feature templates, while the testing stage computes the similarity between the test data and the templates and produces a decision. According to the degree of dependence on the spoken text, speaker recognition is further divided into text-dependent (valid only for specific text), text-independent (valid for any text) and text-prompted (valid for a designated set of texts). Although speech features can be weakened by microphone and channel effects, influenced by health and emotion, and easily imitated, speech-processing technology has developed rapidly in recent years and many real-time applications have appeared, so speech-processing problems have attracted increasing attention and research.
Existing speaker recognition schemes are based either on the source-filter model, or on the source-system model, or extract feature vectors from both at once. Excitation-source information can be represented by linear-prediction residual samples of the glottal waveform; vocal-tract information can be captured by cepstral features; prosodic information can be obtained from statistics of duration, pitch and the time dynamics of energy. Breathing, an aerodynamic process, is one of the energy sources of sound production and can be extracted and processed as a complete stretch of speech. Existing research has concentrated on detecting and removing breath signals from speech, in order to improve sound quality, improve speech-to-text algorithms, train typists, identify psychological states, and so on.
Source-filter theory holds that speech is the response of the vocal-tract system and gives a good approximation of the nonlinear, time-varying speech signal. The "source" refers to four kinds of excitation signals: the aspiration source, the friction source, the glottal (phonation) source and the transient source. The vocal tract acts like a filter whose input is produced by these four sources and whose output forms vowels, consonants or arbitrary speech. The vocal tract also governs pitch production, voice quality, harmonics, resonance characteristics, radiation response, and so on.
In the source-system model, speech is modelled by a linear, slowly varying, discrete-time system excited by quasi-periodic pulses for the voiced speech source or by random noise for the unvoiced speech source. The source contains error-prone features such as pitch, so source models are rarely used in speaker recognition and are rarely reinforced by other features. In contrast, the system model corresponds to the smooth power spectral envelope, obtained by linear prediction or mel filter-bank analysis, and is therefore widely used in speaker recognition systems built on cepstral coefficients.
Both models treat breathing merely as part of the speech source, converted into noise in the voiced or unvoiced source. In fact, breathing is a mechanism that converts energy into sound. Moreover, breathing during speech is constrained: the expiration phase usually lasts longer than the inspiration phase, whereas in ordinary breathing outside speech the two are roughly equal.
The respiratory system comprises the lungs, the diaphragm, the intercostal muscles, and the breathing channel formed by the bronchi, trachea, larynx, vocal tract and oral cavity. We regard breathing as the physiological fingerprint of the whole respiratory system, managed and controlled by intrapulmonary pressure, airflow and muscular movement. During inspiration the respiratory muscles contract, intrapulmonary pressure falls, and air flows from outside into the lungs; likewise, during expiration intrapulmonary pressure rises, the space in the lungs is compressed, and air is exhaled. By anatomical principle, a silent interval necessarily exists before and after each breath. Breathing is affected by age and sex; a breath normally lasts 100-400 milliseconds, and the silent gap lasts at least 20 milliseconds. The silent gap is the key to demarcating and separating breaths.
The production of a breath is the joint result of the lungs, intrapulmonary pressure, diaphragm, vocal tract, trachea and respiratory muscles, and is a physiological fingerprint in the sense of the respiratory system. Airflow is not instantaneous, so a silent gap (at least 20 milliseconds) exists before and after every breath. Compared with speech signals in the ordinary sense (excluding breaths), breath signals are weak in energy, short in duration (100-400 milliseconds), low in occurrence frequency (12-18 breaths per minute), and overlap with non-breath speech at low frequencies (100 Hz-1 kHz). In addition, breath sounds are acoustically similar to fricative phonemes and consonants, such as the /tʃ/ in "church" and the /ʒ/ in "vision". Exploiting breathing in speaker recognition therefore faces the two major challenges of breath signal extraction and breath signal processing, which is why breathing has so far not been exploited in speaker recognition technology and is usually rejected as breath noise.
Summary of the invention
The object of the invention is as follows: given that, in the prior art described above, breathing cannot be used effectively in speaker recognition, and that the development of breathing-based speaker recognition faces the two major challenges of breath signal extraction and breath signal processing, the present invention provides a speaker recognition method based on breathing characteristics.
The technical solution adopted by the present invention is as follows:
A speaker recognition method based on breathing characteristics, characterized by comprising the following steps:
Step 1: Input a breath sample set and divide it into frames to obtain breath frames; build a breathing template from the breath frames via mel-frequency cepstral coefficients (MFCC); compute the similarity between each breath frame of the sample set and the template, and take its minimum value Bm;
Step 2: Input an unknown speech segment and divide it into frames to obtain unknown speech frames; compute the similarity of each unknown speech frame to the breathing template; compute the zero-crossing rate ZCR and the short-time energy E of the unknown speech segment; using the similarity to the template, Bm, ZCR and E, filter out the breath sounds in the unknown segment, the filtered breath sounds forming the preliminarily separated breath sounds;
Step 3: Use the boundary detection algorithm that eliminates false troughs to detect the silent gaps of the preliminarily separated breath sounds, and reject their false-positive portions according to the silent gaps, obtaining the cleanly separated breath sounds;
Step 4: Choose a group of sample speakers, collect a breathing segment from each, and build a speaker sample database; if it must be decided whether the speaker of the unknown segment comes from the sample speakers, go to step 5; if it must be decided whether the speaker of the unknown segment is a legitimate speaker, go to step 6;
Step 5: Compute the similarity between the cleanly separated breath sounds of the unknown segment and each speaker breath sample in the speaker sample database; take the sample speaker with the maximum similarity as the speaker of the unknown segment; end;
Step 6: Collect test samples from every sample speaker and choose one test sample;
Step 7: Compute the similarity of the chosen test sample to each speaker breath sample in the database and take the maximum of these similarities, obtaining one maximum similarity;
Step 8: Choose another test sample and repeat step 7 until the maximum similarity of every test sample has been obtained, yielding the maximum-similarity group;
Step 9: Collect a speech segment of the legitimate speaker, use the breath sample set to extract the legitimate speaker's breathing segment, and compute the similarity between the cleanly separated breath sounds of the unknown segment and the legitimate speaker's breathing segment;
Step 10: If the similarity between the cleanly separated breath sounds of the unknown segment and the legitimate speaker's breathing segment exceeds the minimum of the maximum-similarity group, identify the speaker of the unknown segment as the legitimate speaker; otherwise identify him as an illegitimate speaker (the decision logic is sketched below).
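The two decision rules of steps 5 and 10 reduce to an argmax and a threshold test. A minimal sketch in Python, assuming a function similarity(breaths, sample) that implements the comparison of steps 5.1-5.6; all names here are illustrative, not taken from the patent:

```python
def identify(unknown_breaths, database, similarity):
    """Step 5: attribute the unknown segment to the sample speaker whose
    breath sample is most similar to its cleanly separated breath sounds."""
    return max(database, key=lambda spk: similarity(unknown_breaths, database[spk]))

def verify(unknown_breaths, legal_breaths, max_similarity_group, similarity):
    """Steps 7-10: accept the claimed identity only if the similarity to the
    legitimate speaker's breathing segment exceeds the minimum of the
    maximum-similarity group collected in steps 6-8."""
    return similarity(unknown_breaths, legal_breaths) > min(max_similarity_group)
```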
In the above scheme, step 1 comprises the following steps:
Step 1.1: Input the breath sample set and divide it into breath frames of length 100 milliseconds; divide each breath frame in turn into consecutive, mutually overlapping breath subframes, each subframe 10 ms long with 5 ms of overlap between adjacent subframes;
Step 1.2: Pre-emphasize each breath subframe with a first-order difference filter, obtaining the pre-emphasized breath subframes (a framing and pre-emphasis sketch follows); the filter H is

$$H(z) = 1 - \alpha z^{-1}$$

where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable of the sampled signal;
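As a sketch, the framing and pre-emphasis of steps 1.1-1.2 can be written as follows, assuming a one-dimensional NumPy signal and the parameters stated above (100 ms frames, 10 ms subframes, 5 ms overlap, α ≈ 0.095):

```python
import numpy as np

def frame_signal(x, sr, frame_ms=100, sub_ms=10, hop_ms=5):
    """Split x into 100 ms breath frames, each a list of 10 ms subframes
    overlapping by 5 ms (steps 1.1 / 2.1.1)."""
    frame_len, sub_len, hop = (int(sr * t / 1000) for t in (frame_ms, sub_ms, hop_ms))
    frames = [x[i:i + frame_len] for i in range(0, len(x) - frame_len + 1, frame_len)]
    return [[f[j:j + sub_len] for j in range(0, len(f) - sub_len + 1, hop)]
            for f in frames]

def preemphasize(sub, alpha=0.095):
    """First-order difference filter H(z) = 1 - alpha * z^-1 (step 1.2):
    y[0] = x[0], y[n] = x[n] - alpha * x[n-1]."""
    return np.append(sub[0], sub[1:] - alpha * sub[:-1])
```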
Step 1.3: Compute the MFCC of each pre-emphasized subframe of every breath frame, giving each breath frame's short-time cepstral matrix; remove the DC component from every column of that matrix, giving each breath frame's MFCC cepstral matrix;
Step 1.4: Compute the mean matrix T of the breath sample set:

$$T = \frac{1}{N}\sum_{i=1}^{N} M(X_i)$$

where N is the number of breath frames in the breath sample set and M(X_i) is the MFCC cepstral matrix of the i-th breath frame, i ∈ [1, 2, …, N];
Compute the variance matrix V of the breath sample set:

$$V = \frac{1}{N}\sum_{i=1}^{N} \left[ M(X_i) - T \right]^2$$

Step 1.5: Concatenate the MFCC cepstral matrices of all breath frames into one large matrix Mb:

$$M_b = [M(X_1), \ldots, M(X_i), M(X_{i+1}), \ldots, M(X_N)]$$

Perform a singular value decomposition of this large matrix:

$$M_b = U \Sigma V^{*}$$

where U is an m × m unitary matrix, Σ is a positive-semidefinite m × n diagonal matrix, and V* denotes the conjugate transpose of V, an n × n unitary matrix; the elements on the diagonal of Σ, {λ₁, λ₂, λ₃, …}, are the singular values of Mb, giving the singular value vector {λ₁, λ₂, λ₃, …};
Normalize the singular value vector by the maximum singular value λ_m, obtaining the final normalized singular value vector S = {λ₁/λ_m, λ₂/λ_m, λ₃/λ_m, …}, where λ_m = max{λ₁, λ₂, λ₃, …};
Step 1.6: One breathing template is obtained (a template-construction sketch follows); it consists of the normalized singular value vector S, the variance matrix V of the breath sample set, and the mean matrix T of the breath sample set.
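Steps 1.4-1.6 reduce to a few matrix operations. A minimal sketch, assuming every breath frame has already been converted to an MFCC cepstral matrix of the same shape (subframes × coefficients); the concatenation axis is an assumption, since the patent leaves the orientation of Mb implicit:

```python
import numpy as np

def build_template(frames_mfcc):
    """Returns the breathing template (S, T, V): normalized singular value
    vector, mean matrix and variance matrix of the breath sample set."""
    stack = np.stack(frames_mfcc)                # N x (subframes x Nc)
    T = stack.mean(axis=0)                       # step 1.4: mean matrix
    V = ((stack - T) ** 2).mean(axis=0)          # step 1.4: variance matrix
    Mb = np.concatenate(frames_mfcc, axis=0)     # step 1.5: one large matrix
    sing = np.linalg.svd(Mb, compute_uv=False)   # singular values of Mb
    S = sing / sing.max()                        # normalize by the largest one
    return S, T, V
```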
In the above scheme, step 2 comprises the following steps:
Step 2.1: Input the unknown speech segment and divide it into frames to obtain unknown speech frames and unknown speech subframes; compute the similarity B(X, T, V, S) of each unknown speech frame to the breathing template; compute the similarity of each breath frame of the breath sample set to the template and take the minimum as Bm;
Compute the short-time energy E of each unknown speech frame:

$$E = \frac{1}{N}\sum_{n=N_0}^{N_0+N-1} x^2[n]$$

where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length of the frame, and N₀ is the sample at which the window starts;
Compute the mean value Ē of the short-time energies of all unknown speech frames;
Compute the zero-crossing rate ZCR of the unknown speech segment:

$$ZCR = \frac{1}{N}\sum_{n=N_0+1}^{N_0+N-1} 0.5\,\left|\,\mathrm{sgn}(x[n]) - \mathrm{sgn}(x[n-1])\,\right|$$

with the same notation as above (a sketch of both quantities follows);
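Both quantities are standard short-time measures; a sketch over a window of N samples starting at sample N₀:

```python
import numpy as np

def short_time_energy(x, n0, n):
    """E = (1/N) * sum of squared samples over the window (step 2.1)."""
    w = x[n0:n0 + n]
    return np.mean(w ** 2)

def zero_crossing_rate(x, n0, n):
    """ZCR = (1/N) * sum of 0.5 * |sgn(x[k]) - sgn(x[k-1])| over the window."""
    w = x[n0:n0 + n]
    return np.sum(0.5 * np.abs(np.diff(np.sign(w)))) / len(w)
```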
Step 2.2: Choose one unknown speech frame;
Step 2.3: If the similarity B(X, T, V, S) of the chosen unknown speech frame to the breathing template exceeds the threshold Bm/2, the zero-crossing rate ZCR of the unknown speech segment is below 0.25, and the short-time energy E of the chosen frame is below the mean Ē over all unknown speech frames, judge the chosen frame to be a breath sound; if these conditions are not all met, judge it to be a non-breath sound;
Step 2.4: Choose the remaining unknown speech frames in turn and repeat step 2.3 until every unknown speech frame in the segment has been judged;
Step 2.5: Retain the breath sounds and reject the non-breath sounds, obtaining the preliminarily separated breath sounds (a sketch of this test follows);
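A sketch of the three-condition test of steps 2.2-2.5, assuming the per-frame similarities B and energies E have been computed as above and that zcr is the segment-level zero-crossing rate; the thresholds are those stated in step 2.3:

```python
import numpy as np

def preliminary_breath_separation(frames, B, Bm, E, zcr):
    """Keep a frame as a breath sound if B > Bm/2, the segment ZCR < 0.25,
    and the frame energy is below the mean energy of all frames."""
    mean_E = np.mean(E)
    return [f for f, b, e in zip(frames, B, E)
            if b > Bm / 2 and zcr < 0.25 and e < mean_E]
```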
In the above scheme, the method of step 2.1 for computing the similarity of a breath frame or unknown speech frame to the breathing template comprises the following steps:
Step 2.1.1: Input the breath sample set or the unknown speech segment and divide it into breath frames or unknown speech frames of length 100 milliseconds; divide each frame in turn into consecutive, mutually overlapping subframes, each subframe 10 ms long with 5 ms of overlap between adjacent subframes;
Step 2.1.2: Pre-emphasize each subframe with the first-order difference filter, obtaining the pre-emphasized breath frame or unknown speech frame; the filter H is

$$H(z) = 1 - \alpha z^{-1}$$

where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable of the sampled signal;
Step 2.1.3: Compute the MFCC of every pre-emphasized subframe of each frame, giving the frame's short-time cepstral matrix; remove the DC component from every column, giving the frame's MFCC cepstral matrix M(X);
Step 2.1.4: Choose one breath frame or unknown speech frame X;
Step 2.1.5: Compute the normalized difference matrix D of the chosen frame:

$$D = \frac{M(X) - T}{V}$$

where the division is element-wise, T is the mean matrix of the breath sample set, V is its variance matrix, and M(X) is the MFCC cepstral matrix of the chosen frame;
Step 2.1.6: Multiply every column of D by a half Hamming window, so that the low-order cepstral coefficients are strengthened:

D(:, j) = D(:, j) · hamming,  j ∈ [1, Nc]

where Nc is the number of MFCC parameters per subframe, i.e. the number of columns of D, and hamming denotes the Hamming window;
Step 2.1.7: Compute the component Cp of the similarity B(X, T, V, S) of the chosen frame to the breathing template:

$$C_p = \frac{1}{\sum_{k=1}^{n}\sum_{j=1}^{N_c} |D_{kj}|^{2}}$$

where n is the number of subframes in the chosen frame X, k ∈ [1, n], and D_{kj} is the j-th MFCC parameter of the k-th subframe of X;
Compute the other component Cn of the similarity B(X, T, V, S):

$$C_n = \sum_{j=1}^{N_c} D(:, j) \cdot S$$

Step 2.1.8: Compute the similarity of the chosen frame to the breathing template: B(X, T, V, S) = Cp · Cn;
Step 2.1.9: Choose the MFCC cepstral matrix of another breath frame or unknown speech frame and repeat steps 2.1.5-2.1.8;
Step 2.1.10: Repeat step 2.1.9 until the similarity of every breath frame or unknown speech frame to the breathing template has been obtained (a matrix-form sketch follows);
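The similarity of steps 2.1.5-2.1.8 in matrix form. A sketch assuming M_X, T and V are same-shaped arrays (subframes × Nc) and S is the normalized singular value vector; which half of the Hamming window is used, and the truncation of S to match the rows of D, are assumptions, since the patent leaves both implicit:

```python
import numpy as np

def breath_similarity(M_X, T, V, S):
    """B(X, T, V, S) = Cp * Cn for one frame (steps 2.1.5-2.1.8)."""
    D = (M_X - T) / V                                  # normalized difference matrix
    half = np.hamming(2 * D.shape[0])[D.shape[0]:]     # decaying half Hamming window
    D = D * half[:, None]                              # weight each column (step 2.1.6)
    Cp = 1.0 / np.sum(np.abs(D) ** 2)                  # step 2.1.7
    m = min(D.shape[0], len(S))                        # assumed alignment of S with D's rows
    Cn = np.sum(D[:m, :].T @ S[:m])                    # sum over j of D(:, j) . S
    return Cp * Cn
```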
In the above scheme, the boundary detection algorithm of step 3 that eliminates false troughs uses a breath duration threshold, an energy threshold, upper and lower ZCR thresholds and the spectral slope to locate breath boundaries accurately; step 3 uses a binary 0-1 indicator to mark exactly where the breaths lie within the current speech segment.
In the above scheme, the computation in step 5 of the similarity between the cleanly separated breath sounds of the unknown speech segment and a speaker breath sample in the speaker sample database comprises the following steps:
Step 5.1: Let the MFCC feature vectors of the speaker breath sample in the speaker sample database be (a₁, a₂, …, a_n); compute their mean matrix M:

$$M = \frac{1}{n}\sum_{i=1}^{n} a_i$$

where a_i is the i-th MFCC cepstral matrix of the speaker breath sample's MFCC feature vectors and n is the number of such matrices, i ∈ [1, 2, …, n];
Compute their variance matrix V:

$$V = \frac{1}{n}\sum_{i=1}^{n} \left[ a_i - M \right]^2$$

Step 5.2: Compute the MFCC feature vectors of all cleanly separated breath sounds of the unknown segment, denoted (b₁, b₂, …, b_n), where b_i is the MFCC cepstral matrix of the i-th cleanly separated breath sound;
Step 5.3: Normalize the feature vectors (a₁, a₂, …, a_n) of the speaker breath sample:

$$S_{a_k} = 1 \bigg/ \sum_{i=1}^{r}\sum_{j=1}^{c} \left| (D_{a_k})_{ij} \right|^2, \qquad D_{a_k} = \frac{a_k - M}{V}$$

where r and c are the numbers of rows and columns of D_{a_k}, the normalized difference matrix of a_k, with k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.4: Sort (S_{a_1}, S_{a_2}, …, S_{a_n}) in ascending order, obtaining (S₁, S₂, …, S_n);
Step 5.5: Normalize the MFCC feature vectors (b₁, b₂, …, b_n) of the cleanly separated breath sounds in the same way:

$$S_{b_k} = 1 \bigg/ \sum_{i=1}^{r}\sum_{j=1}^{c} \left| (D_{b_k})_{ij} \right|^2, \qquad D_{b_k} = \frac{b_k - M}{V}$$

where r and c are the numbers of rows and columns of D_{b_k}, the normalized difference matrix of b_k;
Step 5.6: Compute the similarity degree P_k of each b_k to the reference template: compare S_{b_k} one by one with the elements of the ordered vector (S₁, S₂, …, S_n); P_k is the number of elements smaller than S_{b_k} divided by the total number of elements; average the P_k to obtain the similarity between the cleanly separated breath sounds of the unknown segment and the sample in the breath sample database (sketched below).
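Steps 5.3-5.6 amount to scoring each unknown breath by the empirical percentile of its normalized statistic among the enrolled ones. A minimal sketch under the same notation (a_list holds the enrolled matrices a_k, b_list the unknown matrices b_k):

```python
import numpy as np

def scalar_score(mat, M, V):
    """S = 1 / sum of squared entries of the normalized difference matrix."""
    D = (mat - M) / V
    return 1.0 / np.sum(np.abs(D) ** 2)

def percentile_similarity(a_list, b_list):
    """Average P_k: the fraction of enrolled scores below each unknown score
    (steps 5.3-5.6)."""
    M = np.mean(a_list, axis=0)
    V = np.mean((np.asarray(a_list) - M) ** 2, axis=0)
    Sa = np.sort([scalar_score(a, M, V) for a in a_list])   # step 5.4
    Sb = [scalar_score(b, M, V) for b in b_list]            # step 5.5
    Pk = [np.searchsorted(Sa, s) / len(Sa) for s in Sb]     # step 5.6
    return float(np.mean(Pk))
```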
In the above scheme, the computation in step 9 of the similarity between the cleanly separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker comprises the following steps:
Step 9.1: Let the MFCC feature vectors of the legitimate speaker's breathing segment be (a₁, a₂, …, a_n); compute their mean matrix M:

$$M = \frac{1}{n}\sum_{i=1}^{n} a_i$$

where a_i is the i-th MFCC cepstral matrix of the breathing segment's MFCC feature vectors and n is the number of such matrices, i ∈ [1, 2, …, n];
Compute their variance matrix V:

$$V = \frac{1}{n}\sum_{i=1}^{n} \left[ a_i - M \right]^2$$

Step 9.2: Compute the MFCC feature vectors of all cleanly separated breath sounds of the unknown segment, denoted (b₁, b₂, …, b_n), where b_i is the MFCC cepstral matrix of the i-th cleanly separated breath sound;
Step 9.3: Normalize the feature vectors (a₁, a₂, …, a_n) of the legitimate speaker's breathing segment:

$$S_{a_k} = 1 \bigg/ \sum_{i=1}^{r}\sum_{j=1}^{c} \left| (D_{a_k})_{ij} \right|^2, \qquad D_{a_k} = \frac{a_k - M}{V}$$

where r and c are the numbers of rows and columns of D_{a_k}, the normalized difference matrix of a_k, with k ∈ [1, 2, …, n];
Step 9.4: Sort (S_{a_1}, S_{a_2}, …, S_{a_n}) in ascending order, obtaining (S₁, S₂, …, S_n);
Step 9.5: Normalize the MFCC feature vectors (b₁, b₂, …, b_n) of the cleanly separated breath sounds in the same way:

$$S_{b_k} = 1 \bigg/ \sum_{i=1}^{r}\sum_{j=1}^{c} \left| (D_{b_k})_{ij} \right|^2, \qquad D_{b_k} = \frac{b_k - M}{V}$$

where r and c are the numbers of rows and columns of D_{b_k}, the normalized difference matrix of b_k;
Step 9.6: Compute the similarity degree P_k of each b_k to the reference template: compare S_{b_k} one by one with the elements of the ordered vector (S₁, S₂, …, S_n); P_k is the number of elements smaller than S_{b_k} divided by the total number of elements; average the P_k to obtain the similarity between the cleanly separated breath sounds of the unknown segment and the breathing segment of the legitimate speaker.
In the above scheme, the method of step 7 for computing the similarity between the chosen test sample and each speaker breath sample in the speaker sample database is identical to the method of step 5 for computing the similarity between the cleanly separated breath sounds of the unknown speech segment and each speaker breath sample in the database.
In the above scheme, the computation of MFCC in steps 1.3 and 5.2 comprises: applying a fast Fourier transform to the signal for which MFCC are required, computing the complex sinusoid coefficients, and finally producing the output through a mel-scale filter bank (sketched below).
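A sketch of a conventional MFCC pipeline consistent with this description: FFT, mel-scale filter bank, log compression, then a cepstral (DCT) transform. The filter-bank construction is simplified and the coefficient counts are assumptions, not taken from the patent; a library routine could be substituted:

```python
import numpy as np

def mfcc(sub, sr, n_filters=26, n_coeffs=13):
    """MFCC of one pre-emphasized subframe: FFT -> mel filter bank ->
    log -> DCT (steps 1.3 / 2.1.3, simplified)."""
    spec = np.abs(np.fft.rfft(sub * np.hamming(len(sub)))) ** 2
    # triangular mel filter bank between 0 and sr/2
    mel = lambda f: 2595 * np.log10(1 + f / 700)
    inv = lambda m: 700 * (10 ** (m / 2595) - 1)
    edges = inv(np.linspace(mel(0), mel(sr / 2), n_filters + 2))
    bins = np.floor((len(sub) + 1) * edges / sr).astype(int)
    fbank = np.zeros((n_filters, len(spec)))
    for i in range(n_filters):
        lo, ce, hi = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, lo:ce] = (np.arange(lo, ce) - lo) / max(ce - lo, 1)
        fbank[i, ce:hi] = (hi - np.arange(ce, hi)) / max(hi - ce, 1)
    logE = np.log(fbank @ spec + 1e-10)
    # DCT-II decorrelates the log filter-bank energies into cepstral coefficients
    n = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_coeffs), 2 * n + 1) / (2 * n_filters))
    return dct @ logE
```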
In summary, by adopting the above technical scheme, the beneficial effects of the invention are:
1) As a breathing-based authentication system, the invention is the first to focus on and study the uniqueness of human breathing and to apply it effectively in a speaker recognition system, overcoming the two major challenges facing the development of breathing-based speaker recognition: breath signal extraction and breath signal processing.
2) Based on mathematical statistics, the invention designs a lightweight similarity algorithm for decision making, consisting of a series of simple vector operations on the MFCC mean matrix and variance matrix. Compared with traditional classification algorithms, this similarity algorithm has better classification performance.
3) The invention applies both to speaker identification experiments and to speaker verification experiments; at the same time, if a person's respiratory organs are disturbed, his breathing signature may change, so the invention can also be used to judge whether the human respiratory organs are disturbed.
4) The invention enables recognition in settings that must remain quiet.
5) The invention enables recognition of subjects who cannot vocalize.
6) Compared with traditional classification methods based on many parameters and complex model assumptions, the classification method used by the invention has lower time complexity and space complexity. In addition, the MFCC-based algorithm of the invention processes data faster and requires fewer training samples while maintaining recognition accuracy, so the speaker recognition system provided by the invention is simple and efficient, and its recognition results are accurate and reliable.
Brief description of the drawings
Fig. 1 is the system framework for judging whether the identity of an unknown speaker is legitimate;
Fig. 2 is the framework of the preliminary breath detection of step 2;
Fig. 3 is the framework of the final breath detection of step 3;
Fig. 4 is a table of experimental results for steps 6-8;
Fig. 5 shows the contrast between a breath signal and a non-breath speech signal after the mel filter bank is applied;
Fig. 6 shows the characteristics of ZCR, spectral slope and short-time energy (STE);
Fig. 7 shows the formants of a breath signal and a non-breath speech signal;
Fig. 8 shows breathing under normal conditions and the breath signal under abnormal conditions.
Detailed description of the invention
All features disclosed in this specification may be combined in any manner, except for mutually exclusive features and/or steps.
The present invention is elaborated below with reference to Figs. 1-8.
The present invention proposes a speaker recognition method based on breathing characteristics, and applying this model to speaker recognition achieves good results. The overall algorithm is illustrated in Fig. 1 and includes the following steps:
Step 1: As shown in Fig. 1, input the breath sample set, divide it into frames to obtain breath frames, and build the breathing template from the breath frames via mel-frequency cepstral coefficients (MFCC); step 1 specifically includes the following steps:
Step 1.1: Input the breath sample set and divide it into breath frames of length 100 milliseconds; divide each breath frame in turn into consecutive, mutually overlapping breath subframes, each subframe 10 ms long with 5 ms of overlap between adjacent subframes;
Step 1.2: Pre-emphasize each breath subframe with the first-order difference filter, obtaining the pre-emphasized breath subframes:

$$H(z) = 1 - \alpha z^{-1}$$

where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable of the sampled signal;
Step 1.3: Compute the MFCC of each pre-emphasized subframe of every breath frame, giving each breath frame's short-time cepstral matrix; remove the DC component from every column, giving each breath frame's MFCC cepstral matrix;
Step 1.4: Compute the mean matrix T of the breath sample set:

$$T = \frac{1}{N}\sum_{i=1}^{N} M(X_i)$$

where N is the number of breath frames in the breath sample set and M(X_i) is the MFCC cepstral matrix of the i-th breath frame, i ∈ [1, 2, …, N];
Compute the variance matrix V of the breath sample set:

$$V = \frac{1}{N}\sum_{i=1}^{N} \left[ M(X_i) - T \right]^2$$

Step 1.5: Concatenate the MFCC cepstral matrices of all breath frames into one large matrix

$$M_b = [M(X_1), \ldots, M(X_i), M(X_{i+1}), \ldots, M(X_N)]$$

and perform a singular value decomposition:

$$M_b = U \Sigma V^{*}$$

where U is an m × m unitary matrix, Σ is a positive-semidefinite m × n diagonal matrix, and V* denotes the conjugate transpose of V, an n × n unitary matrix; the diagonal elements {λ₁, λ₂, λ₃, …} of Σ are the singular values of Mb, giving the singular value vector {λ₁, λ₂, λ₃, …}; normalize this vector by the maximum singular value λ_m = max{λ₁, λ₂, λ₃, …} to obtain the final normalized singular value vector S = {λ₁/λ_m, λ₂/λ_m, λ₃/λ_m, …};
Step 1.6: One breathing template is obtained; it consists of the normalized singular value vector S, the variance matrix V of the breath sample set, and the mean matrix T of the breath sample set.
Step 2: As shown in Fig. 2, input the unknown speech segment, divide it into frames to obtain unknown speech frames, compute the similarity of each unknown speech frame to the breathing template, and compute the zero-crossing rate ZCR and the short-time energy E of the unknown speech segment; using the similarity to the template, Bm, ZCR and E, filter out the breath sounds in the unknown segment, the filtered breath sounds forming the preliminarily separated breath sounds;
Step 2 comprises the following steps:
Step 2.1: Input the unknown speech segment and divide it into frames to obtain unknown speech frames and unknown speech subframes; compute the similarity B(X, T, V, S) of each unknown speech frame to the breathing template;
Compute the similarity of each breath frame of the breath sample set to the breathing template and take the minimum as Bm;
Compute the short-time energy E of each unknown speech frame:

$$E = \frac{1}{N}\sum_{n=N_0}^{N_0+N-1} x^2[n]$$

where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length of the frame, and N₀ is the sample at which the window starts;
Compute the mean value Ē of the short-time energies of all unknown speech frames;
Compute the zero-crossing rate ZCR of the unknown speech segment:

$$ZCR = \frac{1}{N}\sum_{n=N_0+1}^{N_0+N-1} 0.5\,\left|\,\mathrm{sgn}(x[n]) - \mathrm{sgn}(x[n-1])\,\right|$$

with the same notation as above;
The method in step 2.1 for computing the similarity of a breath frame or unknown speech frame to the breathing template comprises the following steps:
Step 2.1.1: Input the breath sample set or the unknown speech segment and divide it into breath frames or unknown speech frames of length 100 milliseconds; divide each frame in turn into consecutive, mutually overlapping subframes, each subframe 10 ms long with 5 ms of overlap between adjacent subframes;
Step 2.1.2: Pre-emphasize each subframe with the first-order difference filter H(z) = 1 - αz⁻¹, with pre-emphasis parameter α ≈ 0.095 and z the z-transform variable of the sampled signal, obtaining the pre-emphasized breath frame or unknown speech frame;
Step 2.1.3: Compute the MFCC of every pre-emphasized subframe of each frame, giving the frame's short-time cepstral matrix; remove the DC component from every column, giving the frame's MFCC cepstral matrix M(X);
Step 2.1.4: Choose one breath frame or unknown speech frame X;
Step 2.1.5: Compute the normalized difference matrix D of the chosen frame:

$$D = \frac{M(X) - T}{V}$$

where the division is element-wise, T is the mean matrix of the breath sample set, V is its variance matrix, and M(X) is the MFCC cepstral matrix of the chosen frame;
Step 2.1.6: Multiply every column of D by a half Hamming window, so that the low-order cepstral coefficients are strengthened:

D(:, j) = D(:, j) · hamming,  j ∈ [1, Nc]

where Nc is the number of MFCC parameters per subframe, i.e. the number of columns of D, and hamming denotes the Hamming window;
Step 2.1.7: Compute the component Cp of the similarity B(X, T, V, S) of the chosen frame to the breathing template:

$$C_p = \frac{1}{\sum_{k=1}^{n}\sum_{j=1}^{N_c} |D_{kj}|^{2}}$$

where n is the number of subframes in the chosen frame X, k ∈ [1, n], and D_{kj} is the j-th MFCC parameter of the k-th subframe of X;
Compute the other component Cn of the similarity B(X, T, V, S):

$$C_n = \sum_{j=1}^{N_c} D(:, j) \cdot S$$

Step 2.1.8: Compute the similarity of the chosen frame to the breathing template: B(X, T, V, S) = Cp · Cn;
Step 2.1.9: Choose the MFCC cepstral matrix of another breath frame or unknown speech frame and repeat steps 2.1.5-2.1.8;
Step 2.1.10: Repeat step 2.1.9 until the similarity of every breath frame or unknown speech frame to the breathing template has been obtained.
Step 2.2: Choose one unknown speech frame;
Step 2.3: If the similarity B(X, T, V, S) of the chosen unknown speech frame to the breathing template exceeds the threshold Bm/2, the zero-crossing rate ZCR of the unknown speech segment is below 0.25 (at a sampling rate of 44 kHz), and the short-time energy E of the chosen frame is below the mean Ē over all unknown speech frames, judge the chosen frame to be a breath sound; if these conditions are not all met, judge it to be a non-breath sound;
Step 2.4: Choose the remaining unknown speech frames in turn and repeat step 2.3 until every unknown speech frame in the segment has been judged;
Step 2.5: Retain the breath sounds and reject the non-breath sounds, obtaining the preliminarily separated breath sounds;
Step 3: As shown in Fig. 3, use the boundary detection algorithm that eliminates false troughs to detect the silent gaps of the preliminarily separated breath sounds, and reject their false-positive portions according to the silent gaps, obtaining the cleanly separated breath sounds; a concrete implementation of this algorithm is described in "An effective algorithm for automatic detection and exact demarcation of breath sounds in speech and song signals", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, March 2007;
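Step 3 exploits the anatomical silent gap described in the Background: a genuine breath is bordered on both sides by at least 20 ms of near-silence. A hedged sketch of the false-positive rejection, with "silence" approximated by an assumed energy threshold; the full algorithm, with its duration, energy, ZCR and spectral-slope thresholds, is the one in the cited 2007 paper:

```python
import numpy as np

def reject_false_breaths(candidates, x, sr, gap_ms=20, silence_factor=0.1):
    """Keep a candidate breath (start, end) only if a silent gap of at
    least gap_ms borders it on both sides.  'Silence' is approximated here
    as energy below silence_factor * mean signal energy - an assumption."""
    gap = int(sr * gap_ms / 1000)
    limit = silence_factor * np.mean(x ** 2)

    def silent(a, b):
        a, b = max(a, 0), min(b, len(x))
        return b > a and np.mean(x[a:b] ** 2) < limit

    return [(s, e) for s, e in candidates
            if silent(s - gap, s) and silent(e, e + gap)]
```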
Step 4: Choose a group of sample speakers, collect a breathing segment from each, and build a speaker sample database; if it must be decided whether the speaker of the unknown segment comes from the sample speakers, go to step 5; if it must be decided whether the speaker of the unknown segment is a legitimate speaker, go to step 6;
Step 5: Compute the similarity between the cleanly separated breath sounds of the unknown speech segment and each speaker breath sample in the speaker sample database; take the sample speaker with the maximum similarity as the speaker of the unknown segment; end;
The computation in step 5 of the similarity between the cleanly separated breath sounds of the unknown segment and a speaker breath sample in the speaker sample database comprises the following steps:
Step 5.1: Let the MFCC feature vectors of the speaker breath sample in the speaker sample database be (a₁, a₂, …, a_n); compute their mean matrix M:

$$M = \frac{1}{n}\sum_{i=1}^{n} a_i$$

where a_i is the i-th MFCC cepstral matrix of the speaker breath sample's MFCC feature vectors and n is the number of such matrices, i ∈ [1, 2, …, n];
Compute their variance matrix V:

$$V = \frac{1}{n}\sum_{i=1}^{n} \left[ a_i - M \right]^2$$

Step 5.2: Compute the MFCC feature vectors of all cleanly separated breath sounds of the unknown segment, denoted (b₁, b₂, …, b_n), where b_i is the MFCC cepstral matrix of the i-th cleanly separated breath sound;
Step 5.3: Normalize the feature vectors (a₁, a₂, …, a_n) of the speaker breath sample:

$$S_{a_k} = 1 \bigg/ \sum_{i=1}^{r}\sum_{j=1}^{c} \left| (D_{a_k})_{ij} \right|^2, \qquad D_{a_k} = \frac{a_k - M}{V}$$

where r and c are the numbers of rows and columns of D_{a_k}, the normalized difference matrix of a_k, with k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.4: Sort (S_{a_1}, S_{a_2}, …, S_{a_n}) in ascending order, obtaining (S₁, S₂, …, S_n);
Step 5.5: Normalize the MFCC feature vectors (b₁, b₂, …, b_n) of the cleanly separated breath sounds in the same way:

$$S_{b_k} = 1 \bigg/ \sum_{i=1}^{r}\sum_{j=1}^{c} \left| (D_{b_k})_{ij} \right|^2, \qquad D_{b_k} = \frac{b_k - M}{V}$$

where r and c are the numbers of rows and columns of D_{b_k}, the normalized difference matrix of b_k;
Step 5.6: Compute the similarity degree P_k of each b_k to the reference template: compare S_{b_k} one by one with the elements of the ordered vector (S₁, S₂, …, S_n); P_k is the number of elements smaller than S_{b_k} divided by the total number of elements; average the P_k to obtain the similarity between the cleanly separated breath sounds of the unknown segment and the sample in the breath sample database.
Step 6: Collect test samples from every sample speaker and choose one test sample;
Step 7: As shown in Fig. 4, compute the similarity of the chosen test sample to each speaker breath sample in the speaker sample database and take the maximum of these similarities, obtaining one maximum similarity;
The method of step 7 for computing the similarity between the chosen test sample and each speaker breath sample in the database is identical to the method of step 5 for computing the similarity between the cleanly separated breath sounds of the unknown segment and each speaker breath sample in the database;
Step 8: As shown in Fig. 4, choose another test sample and repeat step 7 until the maximum similarity of every test sample has been obtained, yielding the maximum-similarity group;
Step 9: Collect the breathing segment of the legitimate speaker and compute the similarity between the cleanly separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker;
The computation in step 9 of this similarity comprises the following steps:
Step 9.1: Let the MFCC feature vectors of the legitimate speaker's breathing segment be (a₁, a₂, …, a_n); compute their mean matrix M:

$$M = \frac{1}{n}\sum_{i=1}^{n} a_i$$

where a_i is the i-th MFCC cepstral matrix of the breathing segment's MFCC feature vectors and n is the number of such matrices, i ∈ [1, 2, …, n];
Compute their variance matrix V:

$$V = \frac{1}{n}\sum_{i=1}^{n} \left[ a_i - M \right]^2$$

Step 9.2: Compute the MFCC feature vectors of all cleanly separated breath sounds of the unknown segment, denoted (b₁, b₂, …, b_n), where b_i is the MFCC cepstral matrix of the i-th cleanly separated breath sound;
Step 9.3: Normalize the feature vectors (a₁, a₂, …, a_n) of the legitimate speaker's breathing segment:

$$S_{a_k} = 1 \bigg/ \sum_{i=1}^{r}\sum_{j=1}^{c} \left| (D_{a_k})_{ij} \right|^2, \qquad D_{a_k} = \frac{a_k - M}{V}$$

where r and c are the numbers of rows and columns of D_{a_k}, the normalized difference matrix of a_k, with k ∈ [1, 2, …, n];
Step 9.4: Sort (S_{a_1}, S_{a_2}, …, S_{a_n}) in ascending order, obtaining (S₁, S₂, …, S_n);
Step 9.5: Normalize the MFCC feature vectors (b₁, b₂, …, b_n) of the cleanly separated breath sounds in the same way:

$$S_{b_k} = 1 \bigg/ \sum_{i=1}^{r}\sum_{j=1}^{c} \left| (D_{b_k})_{ij} \right|^2, \qquad D_{b_k} = \frac{b_k - M}{V}$$

where r and c are the numbers of rows and columns of D_{b_k}, the normalized difference matrix of b_k;
Step 9.6: Compute the similarity degree P_k of each b_k to the reference template: compare S_{b_k} one by one with the elements of the ordered vector (S₁, S₂, …, S_n); P_k is the number of elements smaller than S_{b_k} divided by the total number of elements; average the P_k to obtain the similarity between the cleanly separated breath sounds of the unknown segment and the breathing segment of the legitimate speaker.
Step 10: If the similarity between the cleanly separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker exceeds the minimum of the maximum-similarity group, identify the speaker of the unknown segment as the legitimate speaker; otherwise identify him as an illegitimate speaker.
The computation of MFCC in steps 1.3 and 5.2 comprises: applying a fast Fourier transform to the signal for which MFCC are required, computing the complex sinusoid coefficients, and finally producing the output through a mel-scale filter bank.
The present invention is illustrated by the above embodiments, but it should be understood that the above embodiments serve only exemplary and descriptive purposes and are not intended to limit the invention to the scope of the described embodiments. Those skilled in the art will further appreciate that the invention is not limited to the above embodiments, and that many more variants and modifications can be made in accordance with the teachings of the invention, all of which fall within the scope claimed for the invention. The protection scope of the invention is defined by the appended claims and their equivalent scope.

Claims (9)

1. A speaker recognition method based on breathing characteristics, characterized by comprising the following steps:
Step 1: inputting a breath sample set, dividing it into frames to obtain breath frames, building a breathing template from the breath frames via mel-frequency cepstral coefficients (MFCC), computing the similarity between each breath frame of the sample set and the template, and taking its minimum value Bm;
Step 2: inputting an unknown speech segment, dividing it into frames to obtain unknown speech frames, and computing the similarity of each unknown speech frame to the breathing template; computing the zero-crossing rate ZCR and the short-time energy E of the unknown speech segment; using the similarity to the template, Bm, ZCR and E to filter out the breath sounds in the unknown segment, the filtered breath sounds forming the preliminarily separated breath sounds;
Step 3: using the boundary detection algorithm that eliminates false troughs to detect the silent gaps of the preliminarily separated breath sounds, and rejecting their false-positive portions according to the silent gaps, obtaining the cleanly separated breath sounds;
Step 4: choosing a group of sample speakers, collecting a breathing segment from each, and building a speaker sample database; if it must be decided whether the speaker of the unknown segment comes from the sample speakers, going to step 5;
if it must be decided whether the speaker of the unknown segment is a legitimate speaker, going to step 6;
Step 5: computing the similarity between the cleanly separated breath sounds of the unknown segment and each speaker breath sample in the speaker sample database, and taking the sample speaker with the maximum similarity as the speaker of the unknown segment; ending;
Step 6: collecting test samples from every sample speaker and choosing one test sample;
Step 7: computing the similarity of the chosen test sample to each speaker breath sample in the database, and taking the maximum of these similarities, obtaining one maximum similarity;
Step 8: choosing another test sample and repeating step 7 until the maximum similarity of every test sample has been obtained, yielding the maximum-similarity group;
Step 9: collecting a speech segment of the legitimate speaker, using the breath sample set to extract the legitimate speaker's breathing segment, and computing the similarity between the cleanly separated breath sounds of the unknown segment and the legitimate speaker's breathing segment;
Step 10: if the similarity between the cleanly separated breath sounds of the unknown segment and the legitimate speaker's breathing segment exceeds the minimum of the maximum-similarity group, identifying the speaker of the unknown segment as the legitimate speaker, and otherwise as an illegitimate speaker.
2. The speaker recognition method based on breathing characteristics according to claim 1, characterized in that step 1 comprises the following steps:
Step 1.1: inputting the breath sample set and dividing it into breath frames of length 100 milliseconds; dividing each breath frame in turn into consecutive, mutually overlapping breath subframes, each subframe 10 ms long with 5 ms of overlap between adjacent subframes;
Step 1.2: pre-emphasizing each breath subframe with a first-order difference filter, obtaining the pre-emphasized breath subframes;
the first-order difference filter H being

$$H(z) = 1 - \alpha z^{-1}$$

where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable of the sampled signal;
Step 1.3: computing the MFCC of each pre-emphasized subframe of every breath frame, giving each breath frame's short-time cepstral matrix; removing the DC component from every column, giving each breath frame's MFCC cepstral matrix;
Step 1.4: computing the mean matrix T of the breath sample set:

$$T = \frac{1}{N}\sum_{i=1}^{N} M(X_i)$$

where N is the number of breath frames in the breath sample set and M(X_i) is the MFCC cepstral matrix of the i-th breath frame, i ∈ [1, 2, …, N];
computing the variance matrix V of the breath sample set:

$$V = \frac{1}{N}\sum_{i=1}^{N} \left[ M(X_i) - T \right]^2$$

Step 1.5: concatenating the MFCC cepstral matrices of all breath frames into one large matrix Mb:

$$M_b = [M(X_1), \ldots, M(X_i), M(X_{i+1}), \ldots, M(X_N)]$$

performing a singular value decomposition of this large matrix:

$$M_b = U \Sigma V^{*}$$

where U is an m × m unitary matrix, Σ is a positive-semidefinite m × n diagonal matrix, and V* denotes the conjugate transpose of V, an n × n unitary matrix; the diagonal elements {λ₁, λ₂, λ₃, …} of Σ are the singular values of Mb, giving the singular value vector {λ₁, λ₂, λ₃, …};
normalizing this vector by the maximum singular value λ_m to obtain the final normalized singular value vector S = {λ₁/λ_m, λ₂/λ_m, λ₃/λ_m, …}, where λ_m = max{λ₁, λ₂, λ₃, …};
Step 1.6: obtaining one breathing template consisting of the normalized singular value vector S, the variance matrix V of the breath sample set, and the mean matrix T of the breath sample set.
3. The speaker recognition method based on breathing characteristics according to claim 1, characterized in that step 2 comprises the following steps:
Step 2.1: inputting the unknown speech segment and dividing it into frames to obtain unknown speech frames and unknown speech subframes; computing the similarity B(X, T, V, S) of each unknown speech frame to the breathing template; computing the similarity of each breath frame of the breath sample set to the template and taking the minimum as Bm;
computing the short-time energy E of each unknown speech frame:

$$E = \frac{1}{N}\sum_{n=N_0}^{N_0+N-1} x^2[n]$$

where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length of the frame, and N₀ is the sample at which the window starts;
computing the mean value Ē of the short-time energies of all unknown speech frames;
computing the zero-crossing rate ZCR of the unknown speech segment:

$$ZCR = \frac{1}{N}\sum_{n=N_0+1}^{N_0+N-1} 0.5\,\left|\,\mathrm{sgn}(x[n]) - \mathrm{sgn}(x[n-1])\,\right|$$

with the same notation as above;
Step 2.2: choosing one unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) of the chosen unknown speech frame to the breathing template exceeds the threshold Bm/2, the zero-crossing rate ZCR of the unknown speech segment is below 0.25, and the short-time energy E of the chosen frame is below the mean Ē over all unknown speech frames, judging the chosen frame to be a breath sound, and if these conditions are not all met, judging it to be a non-breath sound;
Step 2.4: choosing the remaining unknown speech frames in turn and repeating step 2.3 until every unknown speech frame in the unknown speech segment has been judged;
Step 2.5: retaining the breath sounds and rejecting the non-breath sounds, obtaining the preliminarily separated breath sounds.
Method for distinguishing speek person based on respiratory characteristic the most according to claim 3, it is characterised in that in described step 2.1 Calculate and breathe the method for frame or unknown speech frame and the similarity of breathing template and comprise the following steps:
Step 2.1.1: input breath sample collection or unknown sound bite, is divided into length by breath sample collection or unknown sound bite It is the breathing frame of 100 milliseconds or unknown speech frame, each breathing frame or unknown speech frame are divided into again continuous and overlapped Breathe subframe or unknown speech subframe, each subframe or unknown a length of 10ms of speech subframe of breathing, and adjacent unknown voice A length of 5ms overlapped between subframe;
Step 2.1.2: pre-emphasize each breathing subframe or unknown speech subframe with a first-order difference filter, obtaining the pre-emphasized breathing frame or unknown speech frame; the first-order difference filter H is:
H(z) = 1 − αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.95, and z is the variable of the z-transform of the sampled signal;
Step 2.1.3: compute the MFCC of each pre-emphasized breathing subframe or unknown speech subframe of each breathing frame or unknown speech frame to obtain the short-time cepstrum matrix of each breathing frame or unknown speech frame; remove the DC component from every column of each short-time cepstrum matrix, obtaining the MFCC cepstrum matrix M(X) of each breathing frame or unknown speech frame;
Step 2.1.4: select a breathing frame or unknown speech frame X;
Step 2.1.5: compute the normalized difference matrix D of the selected breathing frame or unknown speech frame:
D = (M(X) − T) / V (element-wise)
where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstrum matrix of the selected breathing frame or unknown speech frame;
Step 2.1.6: multiply every column of D by a half Hamming window so that the low-frequency cepstral coefficients are strengthened:
D(:, j) = D(:, j) · hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters in each breathing subframe or unknown speech subframe, i.e., the number of columns of D, and hamming denotes the half Hamming window;
Step 2.1.7: compute the component Cp of the similarity B(X, T, V, S) between the selected breathing frame or unknown speech frame X and the breathing template, where n denotes the number of breathing subframes or unknown speech subframes in the selected breathing frame or unknown speech frame X, k ∈ [1, n], and Dkj denotes the j-th MFCC parameter of the k-th breathing subframe or unknown speech subframe of the breathing frame or unknown speech frame X;
Compute the other component Cn of the similarity B(X, T, V, S) between the selected breathing frame or unknown speech frame X and the breathing template;
Step 2.1.8: compute the similarity B(X, T, V, S) between the breathing frame or unknown speech frame X and the breathing template:
B(X, T, V, S) = Cp · Cn;
Step 2.1.9: select the MFCC cepstrum matrix of another breathing frame or unknown speech frame and repeat steps 2.1.5-2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarities between all breathing frames or unknown speech frames and the breathing template have been obtained.
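Because the defining formulas for the components Cp and Cn did not survive in this text, the sketch below reproduces only the pre-emphasis (step 2.1.2), the normalized difference matrix (step 2.1.5), and the half-Hamming weighting (step 2.1.6) from the claim; the two component expressions are explicitly marked as placeholders.

```python
import numpy as np

def pre_emphasis(x, alpha=0.95):
    """First-order difference filter H(z) = 1 - alpha * z^-1 (step 2.1.2)."""
    return np.append(x[0], x[1:] - alpha * x[:-1])

def template_similarity(M_X, T, V, S):
    """Sketch of steps 2.1.5-2.1.8 for one frame X with MFCC matrix M_X.
    The closed forms of Cp and Cn are omitted in the source text, so the
    two component expressions below are placeholders, not the claimed
    formulas."""
    D = (M_X - T) / V                      # normalized difference matrix (step 2.1.5)
    r = D.shape[0]
    half_hamming = np.hamming(2 * r)[:r]   # one half of a Hamming window (which half is assumed)
    D = D * half_hamming[:, None]          # weight every column of D (step 2.1.6)
    Cp = float(np.exp(-np.mean(D ** 2)))   # placeholder closeness component
    Cn = float(np.mean(S))                 # placeholder singular-value component
    return Cp * Cn                         # B(X, T, V, S) = Cp * Cn (step 2.1.8)
```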
5. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 4, characterized in that in said step 3 the boundary detection algorithm that eliminates false valleys uses a breath duration threshold, an energy threshold, upper and lower zero-crossing rate (ZCR) thresholds, and the spectral slope to locate the breath boundaries precisely, and said step 3 uses a binary 0-1 indicator to mark the exact position of each breath within the current speech segment.
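The claim names the thresholds but not their values, so the following sketch shows only the shape of such a pass: a binary 0-1 breath indicator in which runs shorter than an assumed duration threshold are cleared as false detections.

```python
import numpy as np

def breath_indicator(is_breath, min_breath_frames=3):
    """Sketch of the binary 0-1 breath indicator of step 3, assuming the
    per-frame breath/non-breath decisions from step 2. Runs of breath
    frames shorter than the duration threshold are treated as false
    detections and cleared; the threshold value is an assumption."""
    ind = np.array(is_breath, dtype=int)        # copy so the input is untouched
    start = None
    for i, v in enumerate(np.append(ind, 0)):   # trailing 0 flushes the last run
        if v and start is None:
            start = i                           # run of breath frames begins
        elif not v and start is not None:
            if i - start < min_breath_frames:   # run too short: false positive
                ind[start:i] = 0
            start = None
    return ind
```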
6. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 5, characterized in that computing, in said step 5, the similarity between the precisely separated breath sounds of said unknown speech segment and each speaker's breath sample in the speaker sample database comprises the following steps:
Step 5.1: let the MFCC feature vector of a speaker's breath sample in the speaker sample database be (a1, a2, ..., an); compute the mean matrix M of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
M = (1/n) Σ_{i=1}^{n} ai
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the speaker's breath sample in the speaker sample database, n is the number of MFCC cepstrum matrices in that MFCC feature vector, and i ∈ [1, 2, ..., n];
Compute the variance matrix V of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
V = (1/n) Σ_{i=1}^{n} (ai − M)^2 (element-wise);
Step 5.2: compute the MFCC feature vectors of all precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, ..., bn), where bi is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 5.3: normalize the feature vector (a1, a2, ..., an) of the speaker's breath sample in the speaker sample database, obtaining the normalized scores (Sa1, Sa2, ..., San), where r and c denote respectively the numbers of rows and columns of the normalized difference matrix of ak (formed from ak, M, and V as in step 2.1.5), k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 5.4: sort (Sa1, Sa2, ..., San) in ascending order, obtaining (S1, S2, ..., Sn);
Step 5.5: normalize the MFCC feature vectors (b1, b2, ..., bn) of all precisely separated breath sounds of the unknown speech segment in the same way, obtaining (Sb1, Sb2, ..., Sbn), where r and c denote respectively the numbers of rows and columns of the normalized difference matrix of bk, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 5.6: compute the similarity degree Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, ..., Sn); Pk is the number of elements in the ordered vector smaller than Sbk divided by the total number of elements; compute the mean value of the Pk, obtaining the similarity between the precisely separated breath sounds of the unknown speech segment and the sample in the breath sample database.
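Steps 5.4-5.6 amount to a percentile-rank comparison. A minimal sketch, assuming the scalar scores Sa_k and Sb_k of steps 5.3 and 5.5 have already been computed (their aggregation formula is not reproduced in the text above):

```python
import numpy as np

def rank_similarity(Sa, Sb):
    """Sketch of steps 5.4-5.6: percentile-rank similarity between the
    unknown breath scores Sb and a speaker's reference scores Sa."""
    ordered = np.sort(np.asarray(Sa))   # step 5.4: ascending reference scores
    n = len(ordered)
    # step 5.6: for each unknown score, the fraction of reference scores
    # strictly below it (searchsorted with side='left' counts them)
    P = [np.searchsorted(ordered, s, side='left') / n for s in Sb]
    return float(np.mean(P))            # mean of the Pk = overall similarity
```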
7. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 6, characterized in that computing, in said step 9, the similarity between the precisely separated breath sounds of the unknown speech segment and the breath fragments of the legitimate speaker comprises the following steps:
Step 9.1: let the MFCC feature vector of the breath fragments of the legitimate speaker be (a1, a2, ..., an); compute the mean matrix M of the MFCC feature vector of the breath fragments of the legitimate speaker:
M = (1/n) Σ_{i=1}^{n} ai
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the breath fragments of the legitimate speaker, n is the number of MFCC cepstrum matrices in that MFCC feature vector, and i ∈ [1, 2, ..., n];
Compute the variance matrix V of the MFCC feature vector of the breath fragments of the legitimate speaker:
V = (1/n) Σ_{i=1}^{n} (ai − M)^2 (element-wise);
Step 9.2: compute the MFCC feature vectors of all precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, ..., bn), where bi is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 9.3: normalize the feature vector (a1, a2, ..., an) of the breath fragments of the legitimate speaker, obtaining the normalized scores (Sa1, Sa2, ..., San), where r and c denote respectively the numbers of rows and columns of the normalized difference matrix of ak, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 9.4: sort (Sa1, Sa2, ..., San) in ascending order, obtaining (S1, S2, ..., Sn);
Step 9.5: normalize the MFCC feature vectors (b1, b2, ..., bn) of all precisely separated breath sounds of the unknown speech segment in the same way, obtaining (Sb1, Sb2, ..., Sbn), where r and c denote respectively the numbers of rows and columns of the normalized difference matrix of bk, k ∈ [1, 2, ..., n], i ∈ [1, 2, ..., r], j ∈ [1, 2, ..., c];
Step 9.6: compute the similarity degree Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, ..., Sn); Pk is the number of elements in the ordered vector smaller than Sbk divided by the total number of elements; compute the mean value of the Pk, obtaining the similarity between the precisely separated breath sounds of the unknown speech segment and the breath fragments of the legitimate speaker.
8. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 7, characterized in that the method of computing, in said step 7, the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database is identical to the method of computing, in step 5, the similarity between the precisely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database.
9. The speaker recognition method based on breathing characteristics according to any one of claims 1 to 8, characterized in that the method of computing the MFCC in said steps 1.3 and 5.2 comprises: applying a fast Fourier transform to the signal for which the MFCC is to be computed, then computing the complex sinusoid coefficients, and finally producing the output through a mel-scale filter bank.
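Claim 9 names the classic MFCC pipeline. A self-contained sketch under common default choices follows; the sampling rate, filter count, coefficient count, and the log/DCT step used to obtain cepstral coefficients are standard assumptions rather than values fixed by the claim.

```python
import numpy as np

def mfcc_sketch(frame, sr=16000, n_mels=26, n_ceps=13):
    """Sketch of the MFCC computation named in claim 9: FFT, mel-scale
    filter bank, log energies, and a DCT to obtain cepstral coefficients.
    sr, n_mels and n_ceps are assumed defaults, not values from the claim."""
    def hz_to_mel(f):
        return 2595.0 * np.log10(1.0 + f / 700.0)

    def mel_to_hz(m):
        return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

    n_fft = len(frame)
    power = np.abs(np.fft.rfft(frame)) ** 2   # FFT power spectrum
    # Triangular mel filter bank spanning 0 Hz .. sr/2
    pts = mel_to_hz(np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2))
    bins = np.floor((n_fft + 1) * pts / sr).astype(int)
    fbank = np.zeros((n_mels, len(power)))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        fbank[m - 1, left:center] = (np.arange(left, center) - left) / max(center - left, 1)
        fbank[m - 1, center:right] = (right - np.arange(center, right)) / max(right - center, 1)
    log_mel = np.log(fbank @ power + 1e-10)   # log mel filter-bank output
    # DCT-II to decorrelate, keeping the first n_ceps coefficients
    k = np.arange(n_mels)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps), 2 * k + 1) / (2 * n_mels))
    return dct @ log_mel
```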
CN201610626034.0A 2016-08-02 2016-08-02 A kind of method for distinguishing speek person based on respiratory characteristic Active CN106297805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610626034.0A CN106297805B (en) 2016-08-02 2016-08-02 A kind of method for distinguishing speek person based on respiratory characteristic

Publications (2)

Publication Number Publication Date
CN106297805A true CN106297805A (en) 2017-01-04
CN106297805B CN106297805B (en) 2019-07-05

Family

ID=57664264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610626034.0A Active CN106297805B (en) 2016-08-02 2016-08-02 A kind of method for distinguishing speek person based on respiratory characteristic

Country Status (1)

Country Link
CN (1) CN106297805B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005530214A (en) * 2002-06-19 2005-10-06 コーニンクレッカ フィリップス エレクトロニクス エヌ ヴィ Mega speaker identification (ID) system and method corresponding to its purpose
CN1547191A (en) * 2003-12-12 2004-11-17 北京大学 Semantic and sound groove information combined speaking person identity system
CN101770774A (en) * 2009-12-31 2010-07-07 吉林大学 Embedded-based open set speaker recognition method and system thereof
CN102486922A (en) * 2010-12-03 2012-06-06 株式会社理光 Speaker recognition method, device and system
CN102324232A (en) * 2011-09-12 2012-01-18 辽宁工业大学 Method for recognizing sound-groove and system based on gauss hybrid models
US20150016617A1 (en) * 2012-02-21 2015-01-15 Tata Consultancy Services Limited Modified mel filter bank structure using spectral characteristics for sound analysis
CN104112446A (en) * 2013-04-19 2014-10-22 华为技术有限公司 Breathing voice detection method and device
CN103280220A (en) * 2013-04-25 2013-09-04 北京大学深圳研究生院 Real-time recognition method for baby cry

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473563A (en) * 2019-08-19 2019-11-19 山东省计算中心(国家超级计算济南中心) Breathing detection method, system, equipment and medium based on time-frequency characteristics
CN111568400A (en) * 2020-05-20 2020-08-25 山东大学 Human body sign information monitoring method and system
CN111568400B (en) * 2020-05-20 2024-02-09 山东大学 Human body sign information monitoring method and system

Also Published As

Publication number Publication date
CN106297805B (en) 2019-07-05

Similar Documents

Publication Publication Date Title
Kinnunen Spectral features for automatic text-independent speaker recognition
Kandali et al. Emotion recognition from Assamese speeches using MFCC features and GMM classifier
Bocklet et al. Automatic evaluation of parkinson's speech-acoustic, prosodic and voice related cues.
Kumar et al. Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm
CN107293302A (en) A kind of sparse spectrum signature extracting method being used in voice lie detection system
CN108922541A (en) Multidimensional characteristic parameter method for recognizing sound-groove based on DTW and GMM model
Yusnita et al. Malaysian English accents identification using LPC and formant analysis
CN109727608A (en) A kind of ill voice appraisal procedure based on Chinese speech
CN104992707A (en) Cleft palate voice glottal stop automatic identification algorithm and device
Sun et al. Investigating glottal parameters for differentiating emotional categories with similar prosodics
Fezari et al. Acoustic analysis for detection of voice disorders using adaptive features and classifiers
Zhao et al. Speaker identification from the sound of the human breath
Usman On the performance degradation of speaker recognition system due to variation in speech characteristics caused by physiological changes
CN106297805B (en) A kind of method for distinguishing speek person based on respiratory characteristic
Le et al. A study of voice source and vocal tract filter based features in cognitive load classification
Kadiri et al. Discriminating neutral and emotional speech using neural networks
Jha et al. Discriminant feature vectors for characterizing ailment cough vs. simulated cough
Nandwana et al. A new front-end for classification of non-speech sounds: a study on human whistle
Dumpala et al. Analysis of the Effect of Speech-Laugh on Speaker Recognition System.
Hui et al. Emotion classification of mandarin speech based on TEO nonlinear features
Mohamad Jamil et al. A flexible speech recognition system for cerebral palsy disabled
Kumar et al. Text dependent speaker identification in noisy environment
Kabir et al. Vector quantization in text dependent automatic speaker recognition using mel-frequency cepstrum coefficient
Sahoo et al. Detection of speech-based physical load using transfer learning approach
Julia et al. Detection of emotional expressions in speech

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant