CN106297805B - Speaker recognition method based on breathing characteristics - Google Patents

Speaker recognition method based on breathing characteristics

Info

Publication number
CN106297805B
CN106297805B (application CN201610626034.0A)
Authority
CN
China
Prior art keywords
breathing
unknown
speaker
frame
breath
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610626034.0A
Other languages
Chinese (zh)
Other versions
CN106297805A (en)
Inventor
鲁力
刘玲霜
Current Assignee
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN201610626034.0A
Publication of CN106297805A
Application granted
Publication of CN106297805B
Status: Active
Anticipated expiration


Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/06 - Decision making techniques; Pattern matching strategies

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Measurement Of The Respiration, Hearing Ability, Form, And Blood Characteristics Of Living Organisms (AREA)

Abstract

The invention discloses a speaker recognition method based on breathing characteristics. The method mainly comprises: inputting an unknown speech segment; extracting the breath sounds in the unknown speech segment by means of a breathing template built from mel-frequency cepstral coefficients (MFCC), the zero-crossing rate (ZCR), and the short-time energy E; removing false-positive portions of the breath sounds with a boundary-detection algorithm that eliminates false troughs, yielding precisely separated breath sounds; and finally, using the precisely separated breath sounds, identifying whether the speaker of the unknown speech segment belongs to a set of sample speakers and verifying whether that speaker is a legitimate speaker. The invention is the first to study and exploit the uniqueness of human breathing, applying it effectively in a speaker recognition system and overcoming the two major challenges facing breathing-based speaker recognition: breath-signal extraction and breath-signal processing. The speaker recognition system provided by the invention is therefore simple and efficient, and its recognition results are accurate and reliable.

Description

Speaker recognition method based on breathing characteristics
Technical field
The present invention relates to a system and method for contactless biometric signal detection, and more particularly to a speaker recognition system and method based on breathing characteristics.
Background technique
Speaker recognition (Speaker Recognition) is a fundamental problem that divides into two sub-problems: the speaker identification problem (Speaker Identification) and the speaker verification problem (Speaker Verification). The former decides whether an unknown speaker is a member of a known sample database of speakers; the latter confirms whether a claimed identity is legitimate. Speaker recognition proceeds in two stages, training and testing: the training stage builds the speaker feature templates, while the testing stage computes the similarity between the test data and the feature templates and produces a decision. According to the degree of dependence on the spoken text, speaker recognition further divides into text-dependent (valid only for a specific text), text-independent (valid for any text), and text-prompted (valid for a designated set of texts). Although voice features can be weakened by the microphone or the channel, are affected by health and mood, and can even be imitated, speech-processing technology has developed rapidly in recent years and many real-time applications have appeared, so speech-processing problems have attracted increasing attention and research.
Existing speaker recognition schemes are based on the Source-Filter (source-filter) model, on the Source-System (source-system) model, or extract feature vectors from both simultaneously. Excitation-source information can be represented by linear-prediction residual samples of the glottal waveform. Vocal-tract information can be captured by cepstral features. Prosodic information can be obtained from the temporal dynamics of duration, pitch, and energy. Breathing, which is aerodynamic in nature, is one of the energy sources of speech production and can be extracted and processed like a complete stretch of speech. Existing research has focused on detecting and removing breath signals from speech in order to improve sound quality, improve speech-to-text conversion, train typists, identify psychological state, and so on.
Source-Filter (source-filter) theory holds that speech is the response of the vocal-tract system and gives a good approximation of the nonlinear, time-varying speech signal. The "source" refers to four kinds of source signals: the aspiration source, the frication source, the glottal (phonation) source, and the transient source. The vocal tract acts as a filter: its input is generated by the above four source signals, and its output forms vowels, consonants, or any other speech sound. The vocal tract also governs pitch production, voice quality, harmonics, resonance characteristics, radiation response, and so on.
In the source/system (source/system) model, speech is modeled as a linear, slowly time-varying discrete-time system. The system is excited either by random noise (the unvoiced speech source) or by a quasi-periodic pulse train (the voiced speech source). Because the source contains error-prone features such as pitch, source models are rarely used in speaker recognition and are seldom combined with other features. In contrast, the system model corresponds to the smooth power-spectral envelope, which can be obtained by linear prediction or mel-filter analysis; it is therefore widely used in speaker recognition systems based on cepstral coefficients.
Both models treat breathing merely as part of the speech source, converting it into the voiced source or into noise in the unvoiced source. In fact, breathing is an energy-transfer mechanism that converts airflow energy into sound. Moreover, breathing during speech is constrained: exhalation generally lasts longer than inhalation, whereas outside speech the exhalation and inhalation times are roughly equal.
The respiratory system comprises the lungs, the diaphragm, the intercostal muscles, and the respiratory channel formed by the bronchi, trachea, larynx, vocal tract, and oral cavity. We regard breathing as a physiological fingerprint of the entire respiratory system, governed by intrapulmonary pressure, airflow direction, and muscular movement. During inhalation the respiratory muscles contract, intrapulmonary pressure drops, and air flows from outside into the lungs. Similarly, during exhalation intrapulmonary pressure rises, the lung volume is compressed, and air is exhaled from the lungs to the outside. By anatomical principle, a silent interval necessarily exists before and after each breath. Breathing is affected by age and sex; a breath normally lasts 100-400 milliseconds, and the silent gap lasts at least 20 milliseconds. The silent gap is the key to describing and separating breaths.
A breath is the joint product of the lungs, intrapulmonary pressure, diaphragm, vocal tract, trachea, and respiratory muscles, and is in this sense a physiological fingerprint of the respiratory system. Because airflow does not start or stop instantaneously, a silent gap (at least 20 milliseconds) exists both before and after each breath. Compared with ordinary speech signals (excluding breaths), a breath signal is weak in energy, short in duration (100-400 milliseconds), and low in frequency of occurrence (12-18 breaths per minute), and it overlaps non-breath speech signals at low frequencies (100 Hz - 1 kHz). In addition, breath sounds closely resemble certain phonemes, especially fricative consonants such as /tʃ/ in "church" and /ʒ/ in "vision". The exploitation of breathing in speaker recognition therefore faces two major challenges, breath-signal extraction and breath-signal processing; as a result, breathing has not been exploited in speaker recognition and is usually rejected as breath noise.
Summary of the invention
In view of the fact that breathing has not hitherto been effectively used in speaker recognition, and that breathing-based speaker recognition faces the two major challenges of breath-signal extraction and breath-signal processing, the object of the present invention is to provide a speaker recognition system and method based on breathing characteristics.
The technical solution adopted by the invention is as follows:
A speaker recognition method based on breathing characteristics, characterized by comprising the following steps:
Step 1: input a breath sample set and divide it into frames, obtaining breathing frames; build a breathing template from the breathing frames by means of mel-frequency cepstral coefficients (MFCC); compute the similarity between each breathing frame of the breath sample set and the breathing template, and take the minimum similarity Bm;
Step 2: input an unknown speech segment and divide it into frames, obtaining unknown speech frames; compute the similarity between each unknown speech frame and the breathing template; compute the zero-crossing rate ZCR and the short-time energy E of each unknown speech frame; according to the similarity to the breathing template, Bm, the zero-crossing rate ZCR, and the short-time energy E, filter out the breath sounds in the unknown speech segment; the filtered-out breath sounds constitute the initially separated breath sounds;
Step 3: detect the silent gaps of the initially separated breath sounds using a boundary-detection algorithm that eliminates false troughs, and reject the false-positive portions of the initially separated breath sounds according to the silent gaps, obtaining the precisely separated breath sounds;
Step 4: choose a group of sample speakers, collect a breathing segment from each sample speaker, and build a speaker sample database; if it is necessary to decide whether the speaker of the unknown speech segment comes from the sample speakers, go to step 5; if it is necessary to decide whether the speaker of the unknown speech segment is a legitimate speaker, go to step 6;
Step 5: compute the similarity between the precisely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database, take the sample speaker with the maximum similarity as the speaker of the unknown speech segment, and end;
Step 6: collect test samples from each sample speaker and choose one test sample;
Step 7: compute the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database, and take the maximum of these similarities as the maximum similarity of that test sample;
Step 8: choose another test sample and repeat step 7 until the maximum similarity of every test sample has been obtained, yielding a group of maximum similarities;
Step 9: collect a speech segment of the legitimate speaker, extract the legitimate speaker's breathing segment using the breath sample set, and compute the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker;
Step 10: if the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker is greater than the minimum of the group of maximum similarities, identify the speaker of the unknown speech segment as the legitimate speaker; otherwise identify the speaker as illegitimate.
In the above scheme, step 1 comprises the following steps:
Step 1.1: input the breath sample set and divide it into breathing frames of 100 milliseconds; divide each breathing frame in turn into consecutive, mutually overlapping breathing subframes, each 10 ms long, with a 5 ms overlap between adjacent subframes;
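The framing scheme of step 1.1 (100 ms frames, 10 ms subframes, 5 ms overlap) can be sketched as follows. The 8 kHz sampling rate is an assumption for illustration; the patent does not state one.

```python
def split_frames(signal, fs=8000, frame_ms=100, sub_ms=10, overlap_ms=5):
    # Samples per 100 ms breathing frame and per 10 ms subframe.
    frame_len = fs * frame_ms // 1000
    sub_len = fs * sub_ms // 1000
    hop = sub_len - fs * overlap_ms // 1000  # advance 5 ms between subframes
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    # Each breathing frame becomes a list of overlapping subframes.
    return [[f[j:j + sub_len] for j in range(0, len(f) - sub_len + 1, hop)]
            for f in frames]

frames = split_frames([0.0] * 8000)  # one second of signal at 8 kHz
```

With these parameters, one second of audio yields 10 breathing frames of 19 overlapping subframes each.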
Step 1.2: apply pre-emphasis to each breathing subframe with a first-order difference filter, obtaining the pre-emphasized breathing subframes; the first-order difference filter H is
H(z) = 1 - α·z⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
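The filter H(z) = 1 - α·z⁻¹ corresponds to the difference equation y[n] = x[n] - α·x[n-1]; a minimal sketch follows. Note the patent states α ≈ 0.095, whereas conventional speech front ends usually use α ≈ 0.95.

```python
def preemphasis(subframe, alpha=0.095):
    # y[n] = x[n] - alpha * x[n-1]; the first sample passes through unchanged.
    return [subframe[0]] + [subframe[n] - alpha * subframe[n - 1]
                            for n in range(1, len(subframe))]

y = preemphasis([1.0, 1.0, 1.0])  # -> [1.0, 0.905, 0.905]
```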
Step 1.3: compute the MFCC of each pre-emphasized breathing subframe of each breathing frame to obtain the short-time cepstrum matrix of each breathing frame; remove the DC component from each column of the short-time cepstrum matrix to obtain the MFCC cepstrum matrix of each breathing frame;
Step 1.4: compute the mean matrix T of the breath sample set:
T = (1/N) · Σᵢ M(Xᵢ), i ∈ [1, 2, …, N]
where N is the number of breathing frames in the breath sample set and M(Xᵢ) is the MFCC cepstrum matrix of the i-th breathing frame;
Compute the variance matrix V of the breath sample set:
V = (1/N) · Σᵢ (M(Xᵢ) - T)², i ∈ [1, 2, …, N] (element-wise square)
Step 1.5: concatenate the MFCC cepstrum matrices of all breathing frames into one large matrix Mb: Mb = [M(X1), …, M(Xi), M(Xi+1), …, M(XN)]
Perform a singular value decomposition of this large matrix:
Mb = U·Σ·V*
where U is an m × m unitary matrix, Σ is a positive semi-definite m × n diagonal matrix, and V* is the conjugate transpose of V, an n × n unitary matrix; the elements {λ1, λ2, λ3, …} on the diagonal of Σ are the singular values of Mb, giving the singular value vector {λ1, λ2, λ3, …};
Normalize the singular value vector by the largest singular value λm = max{λ1, λ2, λ3, …}, obtaining the normalized singular value vector S = {λ1/λm, λ2/λm, λ3/λm, …};
Step 1.6: the resulting breathing template comprises the normalized singular value vector S, the variance matrix V of the breath sample set, and the mean matrix T of the breath sample set.
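For the singular values, the template-building of steps 1.5-1.6 reduces to dividing the vector by its largest entry; a minimal sketch of that normalization follows (in practice the SVD itself would come from a numerical library such as numpy.linalg.svd).

```python
def normalize_singular_values(sv):
    # Divide every singular value by the largest one, so the normalized
    # vector starts at 1.0 regardless of the overall scale of the data.
    lam_max = max(sv)
    return [lam / lam_max for lam in sv]

S = normalize_singular_values([4.0, 2.0, 1.0])  # -> [1.0, 0.5, 0.25]
```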
In the above scheme, step 2 comprises the following steps:
Step 2.1: input the unknown speech segment and divide it into frames, obtaining unknown speech frames and unknown speech subframes; compute the similarity B(X, T, V, S) between each unknown speech frame and the breathing template; compute the similarity between each breathing frame of the breath sample set and the breathing template, and take the minimum similarity as Bm;
Compute the short-time energy E of each unknown speech frame:
E = Σₙ x[n]², n ∈ [N0, N0 + N - 1]
where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length of the sample, and N0 is the sample at which the window starts;
Compute the average short-time energy of all unknown speech frames;
Compute the zero-crossing rate ZCR of each unknown speech frame:
ZCR = (1/(N - 1)) · Σₙ |sgn(x[n]) - sgn(x[n - 1])| / 2, n ∈ [N0 + 1, N0 + N - 1]
where n indexes the signal samples, x[n] is the n-th speech sample, sgn(·) is the sign function, N is the window length of the sample, and N0 is the sample at which the window starts;
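Short-time energy and zero-crossing rate as used in step 2.1 can be sketched as follows; the exact forms are the standard textbook definitions and are assumed here, since the patent's equation images are not reproduced in the text.

```python
def short_time_energy(x, n0, n):
    # Sum of squared samples over a window of length n starting at n0.
    return sum(s * s for s in x[n0:n0 + n])

def zero_crossing_rate(x):
    # Fraction of adjacent sample pairs whose signs differ.
    crossings = sum(1 for a, b in zip(x, x[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(x) - 1)

sig = [1.0, -1.0, 1.0, -1.0]
e = short_time_energy(sig, 0, 4)   # -> 4.0
z = zero_crossing_rate(sig)        # -> 1.0
```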
Step 2.2: choose an unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) between the chosen unknown speech frame and the breathing template is greater than the threshold Bm/2, the zero-crossing rate ZCR of the chosen frame is less than 0.25, and the short-time energy E of the chosen frame is less than the average over all unknown speech frames, judge the chosen unknown speech frame to be a breath sound; if any of these conditions is not satisfied, judge the chosen unknown speech frame to be a non-breath sound.
Step 2.4: choose the remaining unknown speech frames in turn and repeat step 2.3 until every unknown speech frame in the unknown speech segment has been judged;
Step 2.5: retain the breath sounds and reject the non-breath sounds, obtaining the initially separated breath sounds;
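The three-way test of step 2.3 can be written directly; the frame-level feature values are assumed to be precomputed by the routines of step 2.1.

```python
def is_breath_frame(similarity, bm, zcr, energy, mean_energy):
    # Step 2.3: a frame is breath iff its template similarity exceeds Bm/2,
    # its ZCR is below 0.25, and its short-time energy is below the
    # average energy over all unknown speech frames.
    return similarity > bm / 2 and zcr < 0.25 and energy < mean_energy

keep = is_breath_frame(similarity=0.8, bm=1.0, zcr=0.1,
                       energy=0.2, mean_energy=0.5)   # -> True
```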
In the above scheme, the similarity between a breathing frame or an unknown speech frame and the breathing template in step 2.1 is computed as follows:
Step 2.1.1: input the breath sample set or the unknown speech segment and divide it into breathing frames or unknown speech frames of 100 milliseconds; divide each breathing frame or unknown speech frame in turn into consecutive, mutually overlapping breathing subframes or unknown speech subframes, each 10 ms long, with a 5 ms overlap between adjacent subframes;
Step 2.1.2: apply pre-emphasis to each subframe with the first-order difference filter H, obtaining the pre-emphasized breathing frames or unknown speech frames:
H(z) = 1 - α·z⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable;
Step 2.1.3: compute the MFCC of each pre-emphasized subframe of each breathing frame or unknown speech frame to obtain its short-time cepstrum matrix; remove the DC component from each column of the short-time cepstrum matrix, obtaining the MFCC cepstrum matrix M(X) of each breathing frame or unknown speech frame;
Step 2.1.4: choose a breathing frame or unknown speech frame X;
Step 2.1.5: compute the normalized difference matrix D of the chosen breathing frame or unknown speech frame:
where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstrum matrix of the chosen breathing frame or unknown speech frame;
Step 2.1.6: multiply each column of D by half a Hamming window so that the low-frequency cepstral coefficients are emphasized:
D(:, j) = D(:, j) * hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters in each breathing subframe or unknown speech subframe, i.e. the number of columns of D, and hamming denotes the Hamming window.
Step 2.1.7: compute the component Cp of the similarity B(X, T, V, S) between the chosen breathing frame or unknown speech frame X and the breathing template:
where n is the number of breathing subframes or unknown speech subframes in the chosen frame X, k ∈ [1, n], and Dkj is the j-th MFCC parameter of the k-th subframe of the frame X;
Compute the other component Cn of the similarity B(X, T, V, S) between the chosen frame X and the breathing template:
Step 2.1.8: compute the similarity B(X, T, V, S) between the chosen breathing frame or unknown speech frame X and the breathing template:
B(X, T, V, S) = Cp * Cn;
Step 2.1.9: choose the MFCC cepstrum matrix of another breathing frame or unknown speech frame and repeat steps 2.1.5-2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarity to the breathing template of every breathing frame or unknown speech frame has been obtained.
In the above scheme, the boundary-detection algorithm eliminating false troughs in step 3 uses a breath-duration threshold, an energy threshold, upper and lower ZCR thresholds, and the spectral slope to locate breath boundaries accurately; step 3 uses a binary 0-1 mask to indicate precisely where breaths occur in the current speech segment.
In the above scheme, the similarity in step 5 between the precisely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database is computed as follows:
Step 5.1: let the MFCC feature vector of a sample in the breath sample database be (a1, a2, …, an); compute the mean matrix M of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
M = (1/n) · Σₖ aₖ, k ∈ [1, 2, …, n]
where aᵢ is the i-th MFCC cepstrum matrix of the MFCC feature vector of the speaker's breath sample and n is the number of MFCC cepstrum matrices in that feature vector, i ∈ [1, 2, …, n];
Compute the variance matrix V of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
Step 5.2: compute the MFCC feature vectors of all the precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bᵢ is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 5.3: normalize the feature vector (a1, a2, …, an) of the speaker's breath sample in the speaker sample database:
where r and c are the numbers of rows and columns of Saₖ, Saₖ is the normalized difference matrix of aₖ, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.4: sort (Sa1, Sa2, …, San) in ascending order, obtaining (S1, S2, …, Sn);
Step 5.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all the precisely separated breath sounds of the unknown speech segment:
where r and c are the numbers of rows and columns of Sbₖ, Sbₖ is the normalized difference matrix of bₖ, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.6: compute the similarity Pk between bk and the reference template: compare Sbk element by element with the ordered vector (S1, S2, …, Sn); Pk is the number of elements in the ordered vector smaller than Sbk divided by the total number of elements; compute the average of Pk over all k to obtain the similarity between the precisely separated breath sounds of the unknown speech segment and the sample in the breath sample database.
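The rank-based comparison of steps 5.4-5.6 can be sketched with a sorted reference vector and a binary search; for brevity the normalized matrices are flattened to scalar values here, which is an illustrative assumption rather than the patent's exact matrix-wise procedure.

```python
import bisect

def rank_similarity(reference_sorted, test_values):
    # Step 5.6: for each test value, Pk is the fraction of reference
    # entries strictly smaller than it; return the average Pk.
    n = len(reference_sorted)
    pks = [bisect.bisect_left(reference_sorted, t) / n for t in test_values]
    return sum(pks) / len(pks)

sim = rank_similarity([0.1, 0.2, 0.3, 0.4], [0.25, 0.35])  # -> 0.625
```

`bisect_left` returns the count of sorted elements strictly below the probe, which matches the "number of elements less than Sbk" in step 5.6.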
In the above scheme, the similarity in step 9 between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker is computed as follows:
Step 9.1: let the MFCC feature vector of the breathing segment of the legitimate speaker be (a1, a2, …, an); compute the mean matrix M of that feature vector:
M = (1/n) · Σₖ aₖ, k ∈ [1, 2, …, n]
where aᵢ is the i-th MFCC cepstrum matrix of the feature vector of the legitimate speaker's breathing segment and n is the number of MFCC cepstrum matrices in that feature vector, i ∈ [1, 2, …, n];
Compute the variance matrix V of the MFCC feature vector of the breathing segment of the legitimate speaker:
Step 9.2: compute the MFCC feature vectors of all the precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bᵢ is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 9.3: normalize the feature vector (a1, a2, …, an) of the breathing segment of the legitimate speaker:
where r and c are the numbers of rows and columns of Saₖ, Saₖ is the normalized difference matrix of aₖ, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 9.4: sort (Sa1, Sa2, …, San) in ascending order, obtaining (S1, S2, …, Sn);
Step 9.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all the precisely separated breath sounds of the unknown speech segment:
where r and c are the numbers of rows and columns of Sbₖ, Sbₖ is the normalized difference matrix of bₖ, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 9.6: compute the similarity Pk between bk and the reference template: compare Sbk element by element with the ordered vector (S1, S2, …, Sn); Pk is the number of elements in the ordered vector smaller than Sbk divided by the total number of elements; compute the average of Pk over all k to obtain the similarity between the precisely separated breath sounds of the unknown speech segment and the breathing segment of the legitimate speaker.
In the above scheme, the similarity in step 7 between the chosen test sample and each speaker's breath sample in the speaker sample database is computed by the same method as the similarity in step 5 between the precisely separated breath sounds of the unknown speech segment and each speaker's breath sample in the speaker sample database.
In the above scheme, the MFCC in steps 1.3 and 5.2 is computed as follows: apply a fast Fourier transform (FFT) to the signal whose MFCC is required, compute the complex sinusoid coefficients, and finally pass the result through a mel-scale filter bank to obtain the output.
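The mel-scale filter bank mentioned above places filter centres uniformly on the mel scale; a minimal sketch using the common 2595·log10(1 + f/700) mapping follows (the patent does not specify which mel formula it uses, so this mapping is an assumption).

```python
import math

def hz_to_mel(f):
    # Common mel-scale mapping.
    return 2595.0 * math.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    # Inverse of hz_to_mel.
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filter_edges(n_filters, f_low, f_high):
    # n_filters triangular filters need n_filters + 2 edge frequencies,
    # equally spaced on the mel scale and mapped back to Hz.
    m_low, m_high = hz_to_mel(f_low), hz_to_mel(f_high)
    step = (m_high - m_low) / (n_filters + 1)
    return [mel_to_hz(m_low + i * step) for i in range(n_filters + 2)]

edges = mel_filter_edges(10, 0.0, 4000.0)  # 12 edges from 0 Hz to 4 kHz
```

Because the edges are uniform in mel rather than in Hz, the filters are narrow at low frequencies and wide at high frequencies, which is what lets the filter bank emphasize the low-frequency region where breath energy concentrates.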
In conclusion by adopting the above-described technical solution, the beneficial effects of the present invention are:
1) present invention as a set of Verification System based on breathing, paid close attention to by the uniqueness for realizing human body respiration for the first time And research, and it is effectively applied the development and utilization that the speaker Recognition Technology based on breathing is overcome in Speaker Recognition System " extraction of breath signal " and " breath signal processing " two faced is challenged greatly.
2) the present invention is based on the knowledge of mathematical statistics, devise a light similarity algorithm for decision: the calculation Method is a series of simple vector operations using MFCC Mean Matrix and variance matrix.Compared with traditional classification algorithm, this hair Similarity algorithm in bright has more preferably classification performance.
3) present invention can operate with speaker identification's experiment and speaker verification's experiment;Simultaneously because if people's exhales Haustorium official is interfered, then his breathing signature may be modified, therefore the invention can be used for judging human body respiration organ Whether it is interfered.
4) present invention can be achieved to need the identification under mute occasion.
5) present invention can be achieved can not sounding tester identification.
6) classification method that uses of the present invention is opposite with traditional complex model classification side based on multi-parameter, more assumed Method has lower time complexity and space complexity.In addition, the present invention uses the algorithm process data based on MFCC more Fastly, required training sample is less, and ensures recognition accuracy, thus Speaker Recognition System provided by the invention is simple and efficient, And recognition result is accurate and reliable.
Brief description of the drawings
Fig. 1 is the system framework for judging whether an unknown speaker's identity is legitimate;
Fig. 2 is the framework of the preliminary breath detection in step 2;
Fig. 3 is the framework of the final breath detection in step 3;
Fig. 4 is a table of experimental results for steps 6-8;
Fig. 5 compares the effect of the mel filter bank on a breath signal and on a non-breath speech signal;
Fig. 6 shows the characteristics of ZCR, spectral slope, and short-time energy (STE);
Fig. 7 shows the formants of a breath signal and of a non-breath speech signal;
Fig. 8 shows breath signals under normal and abnormal conditions.
Specific embodiments
All features disclosed in this specification may be combined in any manner, except for mutually exclusive features and/or steps.
The present invention is elaborated below with reference to Figs. 1-8.
The invention proposes a speaker recognition method based on breathing characteristics, which achieves good results when applied to speaker recognition. The overall algorithm is illustrated in Fig. 1 and comprises the following steps:
Step 1: such as Fig. 1, inputting breath sample collection, sub-frame processing is carried out to breath sample collection, obtain breathing frame, pass through plum Your frequency cepstral coefficient MFCC will breathe frame and be established as breathing template;Step 1 specifically includes the following steps:
Step 1.1: the breath sample collection is divided into the breathing frame that length is 100 milliseconds by input breath sample collection, will be every A breathing frame is divided into continuous and overlapped breathing subframe again, and each subframe lengths that breathe is 10ms, and adjacent breather Overlapped length is 5ms between frame;
Step 1.2 carries out preemphasis to each breathing subframe using first-order difference filter, the breather after obtaining preemphasis Frame;Wherein, first-order difference filter H:
H (z)=1- α z-1
Wherein, α is pre-emphasis parameters α ≈ 0.095, and z is signal sampling point data;
Step 1.3: compute the MFCC of each pre-emphasized breathing subframe of each breathing frame to obtain the short-time cepstrum matrix of each breathing frame; remove the DC component from each column of the short-time cepstrum matrix of each breathing frame to obtain the MFCC cepstrum matrix of each breathing frame;
Step 1.4: calculate the mean matrix T of the breath sample set:
T = (1/N) · Σ M(Xi), i = 1, …, N
where N is the number of breathing frames in the breath sample set and M(Xi) is the MFCC cepstrum matrix of the i-th breathing frame, i ∈ [1, 2, …, N];
Calculate the variance matrix V of the breath sample set:
V = (1/N) · Σ (M(Xi) - T)², element-wise, i = 1, …, N;
Step 1.5: concatenate the MFCC cepstrum matrices of all breathing frames into one big matrix Mb: Mb = [M(X1), …, M(Xi), M(Xi+1), …, M(XN)]
Perform singular value decomposition on this big matrix:
Mb = UΣV*
where U is an m×m unitary matrix; Σ is a positive semi-definite m×n diagonal matrix; V* denotes the conjugate transpose of V and is an n×n unitary matrix. The elements {λ1, λ2, λ3, …} on the diagonal of Σ are the singular values of Mb, giving the singular value vector {λ1, λ2, λ3, …};
Normalize the singular value vector by the maximum singular value λm to obtain the normalized singular value vector S = {λ1/λm, λ2/λm, λ3/λm, …}, where λm = max{λ1, λ2, λ3, …};
Step 1.6: obtain one set of breathing templates, a breathing template comprising the normalized singular value vector S, the variance matrix V of the breath sample set, and the mean matrix T of the breath sample set.
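Steps 1.4-1.6 admit the following NumPy sketch. Stacking equally sized MFCC matrices and taking the element-wise population variance are assumptions where the patent's equation figures are not reproduced:

```python
import numpy as np

def build_breath_template(mfcc_mats):
    """Build the breathing template (S, V, T) from the per-frame MFCC cepstrum
    matrices of the breath sample set (steps 1.4-1.6)."""
    stack = np.stack(mfcc_mats)             # shape (N, rows, cols)
    T = stack.mean(axis=0)                  # mean matrix of the sample set
    V = stack.var(axis=0)                   # element-wise variance matrix
    Mb = np.concatenate(mfcc_mats, axis=1)  # big matrix Mb = [M(X1), ..., M(XN)]
    sv = np.linalg.svd(Mb, compute_uv=False)
    S = sv / sv.max()                       # normalize by the largest singular value
    return S, V, T
```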
Step 2: as in Fig. 2, input an unknown speech segment and perform frame division on it to obtain unknown speech frames; calculate the similarity between each unknown speech frame and the breathing template; calculate the zero-crossing rate ZCR and the short-time energy E of the unknown speech frames; according to the similarity between the unknown speech frames and the breathing template, the threshold Bm, the zero-crossing rate ZCR and the short-time energy E, filter out the breath sounds in the unknown speech segment, the filtered-out breath sounds constituting the roughly separated breath sound;
Step 2 comprises the following steps:
Step 2.1: input the unknown speech segment, perform frame division on it to obtain unknown speech frames and unknown speech subframes, and calculate the similarity B(X, T, V, S) between each unknown speech frame and the breathing template;
Calculate the similarity between each breathing frame of the breath sample set and the breathing template, and take the minimum similarity as Bm;
Calculate the short-time energy E of each unknown speech frame:
E = Σ x[n]², n = N0, …, N0+N-1
where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length, and N0 is the sample index at which the window starts;
Calculate the average short-time energy Ē of all unknown speech frames;
Calculate the zero-crossing rate ZCR of each unknown speech frame:
ZCR = (1/(2N)) · Σ |sgn(x[n]) - sgn(x[n-1])|, n = N0+1, …, N0+N-1
with the same notation as above;
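The exact energy and zero-crossing expressions appear only as figures in the original; the standard short-time definitions they describe can be sketched as:

```python
import numpy as np

def short_time_energy(x):
    """Short-time energy: sum of squared samples over the analysis window."""
    return float(np.sum(np.asarray(x, dtype=float) ** 2))

def zero_crossing_rate(x):
    """Fraction of adjacent sample pairs whose signs differ (standard ZCR form;
    the patent's exact normalization constant is not reproduced here)."""
    x = np.asarray(x, dtype=float)
    signs = np.sign(x)
    return float(np.mean(signs[1:] != signs[:-1]))
```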
The method for calculating the similarity between a breathing frame or unknown speech frame and the breathing template in step 2.1 comprises the following steps:
Step 2.1.1: input the breath sample set or the unknown speech segment and divide it into breathing frames or unknown speech frames 100 milliseconds in length; divide each breathing frame or unknown speech frame further into consecutive, mutually overlapping breathing subframes or unknown speech subframes, each subframe being 10 ms long with a 5 ms overlap between adjacent subframes;
Step 2.1.2: apply pre-emphasis to each subframe with the first-order difference filter to obtain the pre-emphasized breathing frames or unknown speech frames, the first-order difference filter H being:
H(z) = 1 - αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable of the sampled signal;
Step 2.1.3: compute the MFCC of each pre-emphasized subframe of each breathing frame or unknown speech frame to obtain the short-time cepstrum matrix of each frame; remove the DC component from each column of the short-time cepstrum matrix to obtain the MFCC cepstrum matrix M(X) of each breathing frame or unknown speech frame;
Step 2.1.4: choose one breathing frame or unknown speech frame X;
Step 2.1.5: calculate the normalized difference matrix D of the chosen breathing frame or unknown speech frame:
where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstrum matrix of the chosen breathing frame or unknown speech frame;
Step 2.1.6: multiply each column of D by a half Hamming window so that the low-order cepstral coefficients are strengthened:
D(:, j) = D(:, j) · hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters in each breathing subframe or unknown speech subframe, i.e., the number of columns of D, and hamming denotes the Hamming window.
Step 2.1.7: calculate the component Cp of the similarity B(X, T, V, S) between the chosen breathing frame or unknown speech frame X and the breathing template:
where n is the number of breathing subframes or unknown speech subframes in the chosen frame X, k ∈ [1, n], and Dkj is the j-th MFCC parameter of the k-th subframe of frame X;
Calculate the other component Cn of the similarity B(X, T, V, S) between the chosen frame X and the breathing template;
Step 2.1.8: calculate the similarity between frame X and the breathing template:
B(X, T, V, S) = Cp · Cn;
Step 2.1.9: choose the MFCC cepstrum matrix of another breathing frame or unknown speech frame and repeat steps 2.1.5-2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarities between all breathing frames or unknown speech frames and the breathing template are obtained.
In the above scheme, the values used by the false-trough-eliminating boundary detection algorithm in step 3 include a breath duration threshold, an energy threshold, upper and lower zero-crossing rate ZCR thresholds and the spectral slope, so as to find the breathing boundaries accurately; step 3 uses binary 0-1 indicators to mark the positions of breaths in the current speech segment precisely.
Step 2.2: choose one unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) between the chosen unknown speech frame and the breathing template is greater than the threshold Bm/2, the zero-crossing rate ZCR of the frame is less than 0.25 (at a sampling rate of 44 kHz), and the short-time energy E of the chosen frame is less than the average energy Ē of all unknown speech frames, then the chosen unknown speech frame is judged to be breath sound; if these conditions are not met, it is judged to be non-breath sound.
Step 2.4: choose the other unknown speech frames in turn and repeat step 2.3 until every unknown speech frame in the unknown speech segment has been judged;
Step 2.5: retain the breath sounds and reject the non-breath sounds to obtain the roughly separated breath sound;
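The frame-level decision of steps 2.3-2.5 reduces to three threshold tests; `is_breath_frame` and `rough_separation` are illustrative names, and each frame is represented here by a precomputed (similarity, zcr, energy) triple:

```python
def is_breath_frame(similarity, zcr, energy, Bm, mean_energy):
    """Step 2.3: a frame is breath sound only if all three conditions hold
    (the 0.25 ZCR threshold assumes 44 kHz sampling, as stated above)."""
    return (similarity > Bm / 2) and (zcr < 0.25) and (energy < mean_energy)

def rough_separation(frames, Bm):
    """Steps 2.4-2.5: keep only the frames judged to be breath sound."""
    mean_e = sum(e for _, _, e in frames) / len(frames)
    return [f for f in frames if is_breath_frame(f[0], f[1], f[2], Bm, mean_e)]
```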
Step 3: as in Fig. 3, detect the silence gaps of the roughly separated breath sound with the false-trough-eliminating boundary detection algorithm, and reject the false-positive parts of the roughly separated breath sound according to the silence gaps to obtain the precisely separated breath sound. For a concrete implementation of the false-trough-eliminating boundary detection algorithm, see "An effective algorithm for automatic detection and exact demarcation of breath sounds in speech and song", IEEE Transactions on Audio, Speech, and Language Processing, vol. 15, no. 3, March 2007;
Step 4: choose a group of sample speakers and collect the breathing segments of each sample speaker to establish a speaker sample database; if it is to be determined whether the speaker of the unknown speech segment comes from the sample speakers, go to step 5; if it is to be determined whether the speaker of the unknown speech segment is the legitimate speaker, go to step 6;
Step 5: calculate the similarity between the precisely separated breath sound of the unknown speech segment and each speaker's breath sample in the speaker sample database; take the sample speaker with the maximum similarity as the speaker of the unknown speech segment, and terminate;
Calculating the similarity between the precisely separated breath sound of the unknown speech segment and a speaker's breath sample in the speaker sample database in step 5 comprises the following steps:
Step 5.1: let the MFCC feature vector of a speaker's breath sample in the speaker sample database be (a1, a2, …, an); calculate the mean matrix M of this MFCC feature vector:
M = (1/n) · Σ ai, i = 1, …, n
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the speaker's breath sample and n is the number of MFCC cepstrum matrices in that feature vector, i ∈ [1, 2, …, n];
Calculate the variance matrix V of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
Step 5.2: calculate the MFCC feature vectors of all precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 5.3: normalize the feature vector (a1, a2, …, an) of the speaker's breath sample in the speaker sample database:
where r and c are the numbers of rows and columns of Sak, Sak being the normalized difference matrix of ak, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.4: sort (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn);
Step 5.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all precisely separated breath sounds of the unknown speech segment:
where r and c are the numbers of rows and columns of Sbk, Sbk being the normalized difference matrix of bk, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.6: calculate the similarity degree Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements in the ordered vector smaller than Sbk divided by the total number of elements. Calculate the average of Pk to obtain the similarity between the precisely separated breath sound of the unknown speech segment and the sample in the breath sample database.
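Step 5.6 is a percentile-rank comparison. The sketch below flattens each normalized matrix Sbk and ranks its entries against the sorted reference values; the element-wise handling of the matrices is an assumption, since the patent describes the comparison only in scalar terms:

```python
import numpy as np

def percentile_similarity(Sb_list, S_ref):
    """For each normalized test matrix Sbk, Pk is the fraction of reference
    values smaller than its entries; the mean of Pk over k is the similarity."""
    ref = np.sort(np.ravel(S_ref))
    Pk = [np.mean(np.searchsorted(ref, np.ravel(Sb), side='left') / ref.size)
          for Sb in Sb_list]
    return float(np.mean(Pk))
```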
Step 6: collect test samples for each sample speaker and choose one test sample;
Step 7: as in Fig. 4, calculate the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database, and take the maximum of these similarities to obtain one maximum similarity;
The method for calculating the similarity between the chosen test sample and each speaker's breath sample in step 7 is the same as the method in step 5 for calculating the similarity between the precisely separated breath sound of the unknown speech segment and each speaker's breath sample in the speaker sample database.
Step 8: as in Fig. 4, choose another test sample and repeat step 7 until the maximum similarity corresponding to every test sample is obtained, yielding the maximum similarity group;
Step 9: collect the breathing segment of the legitimate speaker, and calculate the similarity between the precisely separated breath sound of the unknown speech segment and the breathing segment of the legitimate speaker;
Calculating the similarity between the precisely separated breath sound of the unknown speech segment and the breathing segment of the legitimate speaker in step 9 comprises the following steps:
Step 9.1: let the MFCC feature vector of the breathing segment of the legitimate speaker be (a1, a2, …, an); calculate the mean matrix M of this MFCC feature vector:
M = (1/n) · Σ ai, i = 1, …, n
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the breathing segment of the legitimate speaker and n is the number of MFCC cepstrum matrices in that feature vector, i ∈ [1, 2, …, n];
Calculate the variance matrix V of the MFCC feature vector of the breathing segment of the legitimate speaker:
Step 9.2: calculate the MFCC feature vectors of all precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 9.3: normalize the feature vector (a1, a2, …, an) of the breathing segment of the legitimate speaker:
where r and c are the numbers of rows and columns of Sak, Sak being the normalized difference matrix of ak, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 9.4: sort (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn);
Step 9.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all precisely separated breath sounds of the unknown speech segment:
where r and c are the numbers of rows and columns of Sbk, Sbk being the normalized difference matrix of bk, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 9.6: calculate the similarity degree Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements in the ordered vector smaller than Sbk divided by the total number of elements. Calculate the average of Pk to obtain the similarity between the precisely separated breath sound of the unknown speech segment and the breathing segment of the legitimate speaker.
Step 10: if the similarity between the precisely separated breath sound of the unknown speech segment and the breathing segment of the legitimate speaker is greater than the minimum value of the maximum similarity group, the speaker of the unknown speech segment is identified as the legitimate speaker; otherwise, as an illegitimate speaker.
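The acceptance test of step 10 is a single comparison against the minimum of the maximum similarity group collected in steps 6-8:

```python
def verify_speaker(test_similarity, max_similarity_group):
    """Step 10: accept the unknown speaker as the legitimate speaker iff the
    similarity to the legitimate speaker's breathing segment exceeds the
    minimum of the per-test-sample maximum similarities."""
    return test_similarity > min(max_similarity_group)
```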
The method of computing the MFCC in steps 1.3 and 5.2 comprises: applying a fast Fourier transform to the signal whose MFCC is to be computed, then computing the complex sinusoid coefficients, and finally producing the output through a filter bank based on the mel scale.
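The front end just described can be sketched with a standard triangular mel filter bank. The patent only names the FFT, the sinusoid coefficients and a mel-scale filter bank, so the triangular filter shapes, the filter count and the FFT size below are assumptions, and the log/DCT stages of a full MFCC chain are omitted:

```python
import numpy as np

def mel_filterbank_energies(subframe, fs=44100, n_filters=26, n_fft=512):
    """FFT magnitude spectrum passed through a triangular mel-scale filter bank."""
    spec = np.abs(np.fft.rfft(subframe, n_fft))        # 257 bins for n_fft = 512
    to_mel = lambda f: 2595.0 * np.log10(1.0 + f / 700.0)
    to_hz = lambda m: 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    # filter edges equally spaced on the mel scale from 0 Hz to fs/2
    mel_pts = np.linspace(to_mel(0.0), to_mel(fs / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * to_hz(mel_pts) / fs).astype(int)
    fbank = np.zeros((n_filters, len(spec)))
    for i in range(n_filters):
        l, c, r = bins[i], bins[i + 1], bins[i + 2]
        fbank[i, l:c] = (np.arange(l, c) - l) / max(c - l, 1)  # rising edge
        fbank[i, c:r] = (r - np.arange(c, r)) / max(r - c, 1)  # falling edge
    return fbank @ spec
```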
The present invention has been explained by the above embodiments, but it should be understood that the above embodiments are given for the purpose of illustration and explanation only and are not intended to limit the invention to their scope. Furthermore, those skilled in the art will understand that the invention is not limited to the above embodiments, and that many further variants and modifications can be made in accordance with its teaching, all of which fall within the claimed scope. The protection scope of the present invention is defined by the appended claims and their equivalents.

Claims (9)

1. A speaker recognition method based on respiratory characteristics, characterized by comprising the following steps:
Step 1: input a breath sample set, perform frame division on the breath sample set to obtain breathing frames, build the breathing frames into a breathing template via Mel-frequency cepstral coefficients (MFCC), calculate the similarity between each breathing frame of the breath sample set and the breathing template, and obtain its minimum value Bm;
Step 2: input an unknown speech segment, perform frame division on it to obtain unknown speech frames, and calculate the similarity between each unknown speech frame and the breathing template; calculate the zero-crossing rate ZCR and the short-time energy E of the unknown speech frames; according to the similarity between the unknown speech frames and the breathing template, Bm, the zero-crossing rate ZCR and the short-time energy E of the unknown speech frames, filter out the breath sounds in the unknown speech segment, the filtered-out breath sounds constituting the roughly separated breath sound;
Step 3: detect the silence gaps of the roughly separated breath sound with a false-trough-eliminating boundary detection algorithm, and reject the false-positive parts of the roughly separated breath sound according to the silence gaps to obtain the precisely separated breath sound;
Step 4: choose a group of sample speakers, collect the breathing segments of each sample speaker, and establish a speaker sample database; if it is to be determined whether the speaker of the unknown speech segment comes from the sample speakers, go to step 5; if it is to be determined whether the speaker of the unknown speech segment is the legitimate speaker, go to step 6;
Step 5: calculate the similarity between the precisely separated breath sound of the unknown speech segment and each speaker's breath sample in the speaker sample database, take the sample speaker with the maximum similarity as the speaker of the unknown speech segment, and terminate;
Step 6: collect test samples for each sample speaker and choose one test sample;
Step 7: calculate the similarity between the chosen test sample and each speaker's breath sample in the speaker sample database, and take the maximum of these similarities to obtain one maximum similarity;
Step 8: choose another test sample and repeat step 7 until the maximum similarity corresponding to every test sample is obtained, yielding the maximum similarity group;
Step 9: collect the speech segment of the legitimate speaker, extract the breathing segment of the legitimate speaker using the breath sample set, and calculate the similarity between the precisely separated breath sound of the unknown speech segment and the breathing segment of the legitimate speaker;
Step 10: if the similarity between the precisely separated breath sound of the unknown speech segment and the breathing segment of the legitimate speaker is greater than the minimum value of the maximum similarity group, identify the speaker of the unknown speech segment as the legitimate speaker; otherwise, as an illegitimate speaker.
2. The speaker recognition method based on respiratory characteristics according to claim 1, characterized in that step 1 comprises the following steps:
Step 1.1: input the breath sample set and divide it into breathing frames 100 milliseconds in length; divide each breathing frame further into consecutive, mutually overlapping breathing subframes, each breathing subframe being 10 ms long with a 5 ms overlap between adjacent breathing subframes;
Step 1.2: apply pre-emphasis to each breathing subframe with a first-order difference filter to obtain the pre-emphasized breathing subframes, the first-order difference filter H being:
H(z) = 1 - αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable of the sampled signal;
Step 1.3: compute the MFCC of each pre-emphasized breathing subframe of each breathing frame to obtain the short-time cepstrum matrix of each breathing frame, and remove the DC component from each column of the short-time cepstrum matrix of each breathing frame to obtain the MFCC cepstrum matrix of each breathing frame;
Step 1.4: calculate the mean matrix T of the breath sample set:
T = (1/N) · Σ M(Xi), i = 1, …, N
where N is the number of breathing frames in the breath sample set and M(Xi) is the MFCC cepstrum matrix of the i-th breathing frame, i ∈ [1, 2, …, N];
calculate the variance matrix V of the breath sample set:
V = (1/N) · Σ (M(Xi) - T)², element-wise, i = 1, …, N;
Step 1.5: concatenate the MFCC cepstrum matrices of all breathing frames into one big matrix Mb:
Mb = [M(X1), …, M(Xi), M(Xi+1), …, M(XN)]
perform singular value decomposition on this big matrix:
Mb = UΣV*
where U is an m×m unitary matrix, Σ is a positive semi-definite m×n diagonal matrix, and V* denotes the conjugate transpose of V and is an n×n unitary matrix; the elements {λ1, λ2, λ3, …} on the diagonal of Σ are the singular values of Mb, giving the singular value vector {λ1, λ2, λ3, …};
normalize the singular value vector by the maximum singular value λm to obtain the normalized singular value vector S = {λ1/λm, λ2/λm, λ3/λm, …}, where λm = max{λ1, λ2, λ3, …};
Step 1.6: obtain one set of breathing templates, a breathing template comprising the normalized singular value vector S, the variance matrix V of the breath sample set, and the mean matrix T of the breath sample set.
3. The speaker recognition method based on respiratory characteristics according to claim 1, characterized in that step 2 comprises the following steps:
Step 2.1: input the unknown speech segment, perform frame division on it to obtain unknown speech frames and unknown speech subframes, and calculate the similarity B(X, T, V, S) between each unknown speech frame and the breathing template; calculate the similarity between each breathing frame of the breath sample set and the breathing template, and take the minimum similarity as Bm;
calculate the short-time energy E of each unknown speech frame:
E = Σ x[n]², n = N0, …, N0+N-1
where n indexes the signal samples, x[n] is the n-th speech sample, N is the window length, and N0 is the sample index at which the window starts;
calculate the average short-time energy Ē of all unknown speech frames;
calculate the zero-crossing rate ZCR of each unknown speech frame:
ZCR = (1/(2N)) · Σ |sgn(x[n]) - sgn(x[n-1])|, n = N0+1, …, N0+N-1
with the same notation as above;
Step 2.2: choose one unknown speech frame;
Step 2.3: if the similarity B(X, T, V, S) between the chosen unknown speech frame and the breathing template is greater than the threshold Bm/2, the zero-crossing rate ZCR of the frame is less than 0.25, and the short-time energy E of the chosen frame is less than the average energy Ē of all unknown speech frames, judge the chosen unknown speech frame to be breath sound; if the above conditions are not met, judge it to be non-breath sound, where X denotes a breathing frame or unknown speech frame, T denotes the mean matrix of the breath sample set, V denotes the variance matrix of the speaker's breath sample, and S denotes the normalized singular value vector;
Step 2.4: choose the other unknown speech frames in turn and repeat step 2.3 until every unknown speech frame in the unknown speech segment has been judged;
Step 2.5: retain the breath sounds and reject the non-breath sounds to obtain the roughly separated breath sound.
4. The speaker recognition method based on respiratory characteristics according to claim 3, characterized in that the method of calculating the similarity between a breathing frame or unknown speech frame and the breathing template in step 2.1 comprises the following steps:
Step 2.1.1: input the breath sample set or the unknown speech segment and divide it into breathing frames or unknown speech frames 100 milliseconds in length; divide each breathing frame or unknown speech frame further into consecutive, mutually overlapping breathing subframes or unknown speech subframes, each subframe being 10 ms long with a 5 ms overlap between adjacent subframes;
Step 2.1.2: apply pre-emphasis to each subframe with the first-order difference filter to obtain the pre-emphasized breathing frames or unknown speech frames, the first-order difference filter H being:
H(z) = 1 - αz⁻¹
where α is the pre-emphasis parameter, α ≈ 0.095, and z is the z-transform variable of the sampled signal;
Step 2.1.3: compute the MFCC of each pre-emphasized subframe of each breathing frame or unknown speech frame to obtain the short-time cepstrum matrix of each frame, and remove the DC component from each column of the short-time cepstrum matrix to obtain the MFCC cepstrum matrix M(X) of each breathing frame or unknown speech frame;
Step 2.1.4: choose one breathing frame or unknown speech frame X;
Step 2.1.5: calculate the normalized difference matrix D of the chosen breathing frame or unknown speech frame:
where T is the mean matrix of the breath sample set, V is the variance matrix of the breath sample set, and M(X) is the MFCC cepstrum matrix of the chosen frame;
Step 2.1.6: multiply each column of D by a half Hamming window so that the low-order cepstral coefficients are strengthened:
D(:, j) = D(:, j) · hamming, j ∈ [1, Nc]
where Nc is the number of MFCC parameters in each subframe, i.e., the number of columns of D, and hamming denotes the Hamming window;
Step 2.1.7: calculate the component Cp of the similarity B(X, T, V, S) between the chosen frame X and the breathing template:
where n is the number of subframes in the chosen frame X, k ∈ [1, n], and Dkj is the j-th MFCC parameter of the k-th subframe of frame X;
calculate the other component Cn of the similarity B(X, T, V, S) between the chosen frame X and the breathing template;
Step 2.1.8: calculate the similarity between frame X and the breathing template:
B(X, T, V, S) = Cp · Cn;
Step 2.1.9: choose the MFCC cepstrum matrix of another breathing frame or unknown speech frame and repeat steps 2.1.5-2.1.8;
Step 2.1.10: repeat step 2.1.9 until the similarities between all breathing frames or unknown speech frames and the breathing template are obtained.
5. The speaker recognition method based on respiratory characteristics according to any one of claims 1-4, characterized in that the values used by the false-trough-eliminating boundary detection algorithm in step 3 include a breath duration threshold, an energy threshold, upper and lower zero-crossing rate ZCR thresholds and the spectral slope, so as to find the breathing boundaries accurately, and step 3 uses binary 0-1 indicators to mark the positions of breaths in the current speech segment precisely.
6. The speaker recognition method based on respiratory characteristics according to any one of claims 1-4, characterized in that calculating the similarity between the precisely separated breath sound of the unknown speech segment and each speaker's breath sample in the speaker sample database in step 5 comprises the following steps:
Step 5.1: let the MFCC feature vector of a speaker's breath sample in the speaker sample database be (a1, a2, …, an); calculate the mean matrix M of this MFCC feature vector:
M = (1/n) · Σ ai, i = 1, …, n
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the speaker's breath sample and n is the number of MFCC cepstrum matrices in that feature vector, i ∈ [1, 2, …, n];
calculate the variance matrix V of the MFCC feature vector of the speaker's breath sample in the speaker sample database:
Step 5.2: calculate the MFCC feature vectors of all precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 5.3: normalize the feature vector (a1, a2, …, an) of the speaker's breath sample:
where r and c are the numbers of rows and columns of Sak, Sak being the normalized difference matrix of ak, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.4: sort (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn);
Step 5.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all precisely separated breath sounds of the unknown speech segment:
where r and c are the numbers of rows and columns of Sbk, Sbk being the normalized difference matrix of bk, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 5.6: calculate the similarity degree Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements in the ordered vector smaller than Sbk divided by the total number of elements; calculate the average of Pk to obtain the similarity between the precisely separated breath sound of the unknown speech segment and the sample in the breath sample database.
7. The speaker recognition method based on respiratory characteristics according to any one of claims 1-4, characterized in that calculating the similarity between the precisely separated breath sound of the unknown speech segment and the breathing segment of the legitimate speaker in step 9 comprises the following steps:
Step 9.1: let the MFCC feature vector of the breathing segment of the legitimate speaker be (a1, a2, …, an); calculate the mean matrix M of this MFCC feature vector:
M = (1/n) · Σ ai, i = 1, …, n
where ai is the i-th MFCC cepstrum matrix of the MFCC feature vector of the breathing segment of the legitimate speaker and n is the number of MFCC cepstrum matrices in that feature vector, i ∈ [1, 2, …, n];
calculate the variance matrix V of the MFCC feature vector of the breathing segment of the legitimate speaker:
Step 9.2: calculate the MFCC feature vectors of all precisely separated breath sounds of the unknown speech segment, denoted (b1, b2, …, bn), where bi is the MFCC cepstrum matrix of the i-th precisely separated breath sound of the unknown speech segment;
Step 9.3: normalize the feature vector (a1, a2, …, an) of the breathing segment of the legitimate speaker:
where r and c are the numbers of rows and columns of Sak, Sak being the normalized difference matrix of ak, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 9.4: sort (Sa1, Sa2, …, San) in ascending order to obtain (S1, S2, …, Sn);
Step 9.5: normalize the MFCC feature vectors (b1, b2, …, bn) of all precisely separated breath sounds of the unknown speech segment:
where r and c are the numbers of rows and columns of Sbk, Sbk being the normalized difference matrix of bk, k ∈ [1, 2, …, n], i ∈ [1, 2, …, r], j ∈ [1, 2, …, c];
Step 9.6: calculate the similarity degree Pk between bk and the reference template: compare Sbk one by one with the elements of the ordered vector (S1, S2, …, Sn); Pk is the number of elements in the ordered vector smaller than Sbk divided by the total number of elements; calculate the average of Pk to obtain the similarity between the precisely separated breath sound of the unknown speech segment and the breathing segment of the legitimate speaker.
8. The speaker recognition method based on breathing characteristics according to any one of claims 1-4, wherein the method in step 7 for calculating the similarity between the selected test sample and each speaker's breath sample in the speaker sample database is the same as the method in step 5 for calculating the similarity between the breath sounds precisely separated from the unknown speech segment and each speaker's breath sample in the speaker sample database.
9. The speaker recognition method based on breathing characteristics according to claim 6, wherein the method for calculating the MFCC in step 1.3 and step 5.2 comprises: applying a Fast Fourier Transform (FFT) to the signal for which the MFCC is to be calculated, then calculating the complex sinusoid coefficients, and finally producing the output through a filter bank based on the Mel scale.
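Claim 9 names only the FFT and the Mel-scale filter bank; the sketch below fills in the textbook MFCC pipeline (power spectrum, triangular mel filters, log energies, DCT-II) as an assumption. The function name `mfcc_frame` and all parameter defaults are hypothetical, not from the patent:

```python
import numpy as np

def mfcc_frame(frame, sr=16000, n_mels=26, n_ceps=13):
    """Minimal MFCC for one windowed frame: FFT -> mel filter bank -> log -> DCT."""
    spec = np.abs(np.fft.rfft(frame)) ** 2           # power spectrum via FFT
    n_fft = len(frame)
    # standard triangular mel filter bank construction (assumed, not from claim 9)
    def hz_to_mel(f): return 2595.0 * np.log10(1.0 + f / 700.0)
    def mel_to_hz(m): return 700.0 * (10.0 ** (m / 2595.0) - 1.0)
    mel_pts = np.linspace(hz_to_mel(0.0), hz_to_mel(sr / 2.0), n_mels + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sr).astype(int)
    fbank = np.zeros((n_mels, n_fft // 2 + 1))
    for m in range(1, n_mels + 1):
        left, center, right = bins[m - 1], bins[m], bins[m + 1]
        for k in range(left, center):                # rising slope
            fbank[m - 1, k] = (k - left) / (center - left)
        for k in range(center, right):               # falling slope
            fbank[m - 1, k] = (right - k) / (right - center)
    energies = np.log(fbank @ spec + 1e-10)          # log mel-filter energies
    # DCT-II of the log energies gives the cepstrum coefficients
    n = np.arange(n_mels)
    return np.array([np.sum(energies * np.cos(np.pi * k * (n + 0.5) / n_mels))
                     for k in range(n_ceps)])
```

A usage example would window a 512-sample frame (e.g. with a Hamming window) before passing it in, matching the framing and windowing described elsewhere in the patent's steps.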
CN201610626034.0A 2016-08-02 2016-08-02 A kind of method for distinguishing speek person based on respiratory characteristic Active CN106297805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610626034.0A CN106297805B (en) 2016-08-02 2016-08-02 A kind of method for distinguishing speek person based on respiratory characteristic

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610626034.0A CN106297805B (en) 2016-08-02 2016-08-02 A kind of method for distinguishing speek person based on respiratory characteristic

Publications (2)

Publication Number Publication Date
CN106297805A CN106297805A (en) 2017-01-04
CN106297805B true CN106297805B (en) 2019-07-05

Family

ID=57664264

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610626034.0A Active CN106297805B (en) 2016-08-02 2016-08-02 A kind of method for distinguishing speek person based on respiratory characteristic

Country Status (1)

Country Link
CN (1) CN106297805B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110473563A (en) * 2019-08-19 2019-11-19 山东省计算中心(国家超级计算济南中心) Breathing detection method, system, equipment and medium based on time-frequency characteristics
CN111568400B (en) * 2020-05-20 2024-02-09 山东大学 Human body sign information monitoring method and system

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1547191A (en) * 2003-12-12 2004-11-17 北京大学 Semantic and sound groove information combined speaking person identity system
JP2005530214A (en) * 2002-06-19 2005-10-06 Koninklijke Philips Electronics N.V. Mega speaker identification (ID) system and method corresponding to its purpose
CN101770774A (en) * 2009-12-31 2010-07-07 吉林大学 Embedded-based open set speaker recognition method and system thereof
CN102324232A (en) * 2011-09-12 2012-01-18 辽宁工业大学 Method for recognizing sound-groove and system based on gauss hybrid models
CN102486922A (en) * 2010-12-03 2012-06-06 株式会社理光 Speaker recognition method, device and system
CN103280220A (en) * 2013-04-25 2013-09-04 北京大学深圳研究生院 Real-time recognition method for baby cry
CN104112446A (en) * 2013-04-19 2014-10-22 华为技术有限公司 Breathing voice detection method and device

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9704495B2 (en) * 2012-02-21 2017-07-11 Tata Consultancy Services Limited Modified mel filter bank structure using spectral characteristics for sound analysis

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2005530214A (en) * 2002-06-19 2005-10-06 Koninklijke Philips Electronics N.V. Mega speaker identification (ID) system and method corresponding to its purpose
CN1547191A (en) * 2003-12-12 2004-11-17 北京大学 Semantic and sound groove information combined speaking person identity system
CN101770774A (en) * 2009-12-31 2010-07-07 吉林大学 Embedded-based open set speaker recognition method and system thereof
CN102486922A (en) * 2010-12-03 2012-06-06 株式会社理光 Speaker recognition method, device and system
CN102324232A (en) * 2011-09-12 2012-01-18 辽宁工业大学 Method for recognizing sound-groove and system based on gauss hybrid models
CN104112446A (en) * 2013-04-19 2014-10-22 华为技术有限公司 Breathing voice detection method and device
CN103280220A (en) * 2013-04-25 2013-09-04 北京大学深圳研究生院 Real-time recognition method for baby cry

Also Published As

Publication number Publication date
CN106297805A (en) 2017-01-04

Similar Documents

Publication Publication Date Title
Kinnunen Spectral features for automatic text-independent speaker recognition
Kumar et al. Design of an automatic speaker recognition system using MFCC, vector quantization and LBG algorithm
Bocklet et al. Automatic evaluation of Parkinson's speech: acoustic, prosodic and voice related cues.
Patel et al. Speech recognition and verification using MFCC & VQ
Samantaray et al. A novel approach of speech emotion recognition with prosody, quality and derived features using SVM classifier for a class of North-Eastern Languages
Yusnita et al. Malaysian English accents identification using LPC and formant analysis
Zhao et al. Speaker identification from the sound of the human breath
Usman On the performance degradation of speaker recognition system due to variation in speech characteristics caused by physiological changes
CN106297805B (en) A kind of method for distinguishing speek person based on respiratory characteristic
Chamoli et al. Detection of emotion in analysis of speech using linear predictive coding techniques (LPC)
Kamble et al. Emotion recognition for instantaneous Marathi spoken words
Kadiri et al. Discriminating neutral and emotional speech using neural networks
Kumari et al. An efficient algorithm for Gender Detection using voice samples
Deshpande et al. Automatic Breathing Pattern Analysis from Reading-Speech Signals
Sahoo et al. Analyzing the vocal tract characteristics for out-of-breath speech
Kumar et al. Text dependent speaker identification in noisy environment
Mohamad Jamil et al. A flexible speech recognition system for cerebral palsy disabled
Dumpala et al. Analysis of the Effect of Speech-Laugh on Speaker Recognition System.
Tavi Prosodic cues of speech under stress: Phonetic exploration of finnish emergency calls
Kabir et al. Vector quantization in text dependent automatic speaker recognition using mel-frequency cepstrum coefficient
Stadelmann et al. Unfolding speaker clustering potential: a biomimetic approach
Julia et al. Detection of emotional expressions in speech
Ozdas Analysis of paralinguistic properties of speech for near-term suicidal risk assessment
Elisha et al. Automatic detection of obstructive sleep apnea using speech signal analysis
Patil et al. Person recognition using humming, singing and speech

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant