CN110211606A

CN110211606A - A replay attack detection method for a voice authentication system

Info

Publication number: CN110211606A
Application number: CN201910303649.3A
Authority: CN
Inventors: 冀晓宇; 龙颜; 徐文渊; 闫琛
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-04-12
Filing date: 2019-04-12
Publication date: 2019-09-06
Anticipated expiration: 2039-04-12
Also published as: CN110211606B

Abstract

The invention discloses a replay attack detection method for a voice authentication system based on the time-domain polarity of the voice signal. A voice signal is acquired and recorded by the voice authentication system, the positive-polarity and negative-polarity components of the voice signal are extracted, and the proportions of the two are compared to judge whether the voice signal is a replay attack or live voice: if the positive and negative polarity proportions differ substantially and the positive-polarity proportion is higher than the negative-polarity proportion, the signal is judged to be live voice; if the two proportions are comparable, or the positive-polarity proportion is not higher than the negative-polarity proportion, the signal is judged to be a replay attack. The invention can accurately and effectively detect replay attacks against a voice authentication system.

Description

A replay attack detection method for a voice authentication system

Technical field

The invention belongs to the fields of voice authentication technology and security technology, and in particular relates to a software processing method capable of detecting replay attacks against a voice authentication system.

Background art

A voice authentication system is a security authentication system that uses voice authentication technology to extract the distinctive characteristics of a speaker's voice and identifies the speaker by matching voice feature patterns. Because it has low hardware requirements, is inexpensive, is simple and convenient to use, and supports remote, contactless authentication, it is increasingly becoming a mainstream means of user authentication and access control. However, existing voice authentication systems are generally vulnerable to replay attacks.

A replay attack on a voice authentication system means that an attacker records in advance speech sample segments of a genuine legitimate user and plays them back through a loudspeaker, either directly or after splicing, to deceive the voice authentication system. A replay attack does not require the attacker to have any knowledge of speech signal processing, and with the development of electronic device technology, high-quality, low-cost loudspeakers have become increasingly common. These factors make the replay attack the simplest yet most threatening attack against voice authentication systems; at the same time, replay attacks are extremely difficult to detect and defend against.

To detect and defend against replay attacks, it is necessary to understand the acoustic-to-electric and electric-to-acoustic conversion mechanisms of microphones and loudspeakers. Microphones and loudspeakers are transducers that convert between sound waves and electrical signals. A microphone relies on Faraday's electromagnetic induction: the sound wave makes a diaphragm vibrate, and the mechanical energy of that vibration is converted into the electrical energy of an electric signal. A loudspeaker performs the reverse conversion, turning the electric signal into kinetic energy of the diaphragm, which disturbs the air to form a sound wave and thereby reproduces the sound that had been converted into the electric signal.

Ideally, the conversions performed by the microphone and the loudspeaker are exactly reciprocal; that is, in Fig. 1, acoustic signal 1 should be identical to acoustic signal 2. In practice, however, the two signals usually differ, for two main reasons: 1) circuits in the electrical signal paths of the microphone and loudspeaker, such as power amplifiers, input and output filters, and AD/DA converters, introduce noise into the electric signal; 2) when the diaphragm vibrates to perform the electric-to-acoustic and acoustic-to-electric conversion, various mechanical resistances alter its motion pattern, so that the signal before and after conversion is inconsistent.

In a replay attack, the voice signal (here an abstraction covering both the acoustic signal and the electric signal) passes through an additional set of microphone-loudspeaker attack hardware between being uttered by the human and being received by the microphone of the voice authentication system, compared with direct authentication by a live user. The voice signal of a replay attack therefore contains more noise, and more distortion caused by changes in the diaphragm motion pattern, than that of live authentication. By detecting these distortions, replay attacks can in theory be detected and defended against.

Many existing studies detect replay attacks by detecting the noise introduced by the attack hardware. Such detection methods usually have relatively low detection accuracy and are strongly affected by the quality of the microphone and loudspeaker used in the replay attack. However, no research has yet addressed the voice signal distortion caused by changes in the diaphragm motion pattern in the hardware path of the attack device.

Summary of the invention

To solve the technical problems described in the background art above, the present invention provides a replay attack detection method for a voice authentication system based on the time-domain polarity of the voice signal. By examining the time-domain polarity features of the voice signal collected by the voice authentication system, the method can accurately and effectively detect replay attacks.

The present invention adopts the following technical scheme:

The present invention acquires and records a voice signal through the voice authentication system, extracts the positive-polarity and negative-polarity components of the voice signal, and compares the proportions of the two to judge whether the voice signal is a replay attack (i.e. sound emitted by a recording device) or live voice (i.e. sound uttered by a live user):

If the positive and negative polarity proportions differ substantially and the positive-polarity proportion is higher than the negative-polarity proportion, the signal is judged to be live voice;

If the two proportions are comparable, or the positive-polarity proportion is not higher than the negative-polarity proportion, the signal is judged to be a replay attack.

The method is specifically as follows:

1) Voice activity detection is performed on the voice signal collected by the voice authentication system at a given sampling frequency: noise is removed from the voice signal, and part of the voice audio signal is extracted as the pure human voice portion;

The voice activity detection used in the method of the present invention judges whether a given segment of the voice signal is pure human voice or noise mainly by signal amplitude and duration.

2) The polarity index is calculated for the resulting time-domain pure human voice signal:

The pure human voice signal sequence S is a sequence of N sample points. The number of sample points whose sample value is positive is N_pos, and the absolute value of the sum of the sample values of all positive-valued sample points is |Sum_pos|; the number of sample points whose sample value is negative is N_neg, and the absolute value of the sum of the sample values of all negative-valued sample points is |Sum_neg|. The polarity index I is obtained using the following formula:

3) The polarity index I is compared with a preset polarity threshold I_thr: when I is greater than I_thr, the signal is judged to be live voice; otherwise, it is judged to be a replay attack.
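Steps 2) and 3) can be sketched in Python as follows. The formula for I is not reproduced in this text, so the sketch ASSUMES, purely for illustration, that I is the average of the positive share of the sample counts and the positive share of the summed magnitudes. That formula, and the function names `polarity_counts`, `polarity_index` and `classify`, are hypothetical and not taken from the patent; only the four quantities N_pos, N_neg, |Sum_pos|, |Sum_neg| and the threshold comparison come from the description.

```python
from typing import Sequence, Tuple


def polarity_counts(samples: Sequence[float]) -> Tuple[int, int, float, float]:
    """Return (N_pos, N_neg, |Sum_pos|, |Sum_neg|) as defined in step 2)."""
    n_pos = sum(1 for s in samples if s > 0)
    n_neg = sum(1 for s in samples if s < 0)
    sum_pos = abs(sum(s for s in samples if s > 0))
    sum_neg = abs(sum(s for s in samples if s < 0))
    return n_pos, n_neg, sum_pos, sum_neg


def polarity_index(samples: Sequence[float]) -> float:
    """ASSUMED illustrative index, not the patent's formula.

    Averages the positive share of the sample counts and the positive
    share of the summed magnitudes, so a balanced signal gives 0.5.
    """
    n_pos, n_neg, sum_pos, sum_neg = polarity_counts(samples)
    if n_pos + n_neg == 0:
        return 0.5  # no non-zero samples: treat as perfectly balanced
    count_share = n_pos / (n_pos + n_neg)
    magnitude_share = sum_pos / (sum_pos + sum_neg)
    return 0.5 * (count_share + magnitude_share)


def classify(samples: Sequence[float], i_thr: float) -> str:
    """Step 3): live voice if I > I_thr, otherwise replay attack."""
    return "live" if polarity_index(samples) > i_thr else "replay"


if __name__ == "__main__":
    mostly_positive = [3, 1, 2, -1]
    print(polarity_counts(mostly_positive), classify(mostly_positive, 0.52))
```

Whatever the actual formula is, the decision in step 3) only needs a single scalar that grows as the positive-polarity proportion grows, which is why the comparison against I_thr is kept separate from the index computation in this sketch.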

Step 1) is specifically as follows:

1.2) From the voice signal Sa, extract all sample points whose absolute sample value is greater than the signal amplitude threshold |Athr| to form the first sequence (Sa_i1, Sa_i2, Sa_i3, ... Sa_ix), with 1 ≤ i₁ < i₂ < i₃ < ... < i_x ≤ N, where i is the index of a sample point within the voice signal sequence Sa and N is the total number of sample points in Sa;

1.3) In the first sequence (Sa_i1, Sa_i2, Sa_i3, ... Sa_ix), initially take the i₁-th sample point as the reference sample point, and, starting from the index of the i₁-th sample point, traverse the indices of the sample points toward later sample points: if the difference between the index of the i_p-th sample point and the index of the i_(p-1)-th sample point is greater than the preset ordinal threshold D₁, then all sample points of the first sequence between the i₁-th sample point and the i_(p-1)-th sample point form the 1st subsequence Ssub1;

1.4) Then, taking the i_p-th sample point as the new starting (reference) sample point, repeat step 1.3) onward: all sample points of the first sequence between the i_q-th (q ≥ p) sample point and the nearest preceding reference sample point form the next subsequence, until the traversal reaches the last sample point Sa_ix, finally yielding the y-th subsequence Ssuby;

1.5) For the 1st subsequence Ssub1 through the y-th subsequence Ssuby (y ≥ 1), judge for each subsequence whether the difference between the largest and the smallest sample-point index it contains is greater than the preset index threshold D₂; finally, merge all subsequences for which this difference is greater than D₂ into the pure human voice signal sequence S.
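Steps 1.2) to 1.5) can be sketched in Python as follows. This is a minimal illustration under stated assumptions: step 1.1) does not appear in this text and is not modelled, indices are 0-based rather than 1-based (only index differences matter), and the function name `extract_pure_voice` and its parameter names are hypothetical.

```python
from typing import List, Sequence


def extract_pure_voice(sa: Sequence[float], a_thr: float,
                       d1: int, d2: int) -> List[float]:
    """Amplitude-and-duration voice activity detection (steps 1.2 to 1.5)."""
    # Step 1.2: indices of samples whose magnitude exceeds |Athr|.
    kept = [i for i, value in enumerate(sa) if abs(value) > abs(a_thr)]
    if not kept:
        return []

    # Steps 1.3 and 1.4: start a new subsequence whenever two consecutive
    # retained indices are more than D1 apart.
    groups: List[List[int]] = [[kept[0]]]
    for prev, cur in zip(kept, kept[1:]):
        if cur - prev > d1:
            groups.append([cur])
        else:
            groups[-1].append(cur)

    # Step 1.5: keep only subsequences whose index span exceeds D2,
    # then merge them in order into the pure voice sequence S.
    pure: List[float] = []
    for group in groups:
        if group[-1] - group[0] > d2:
            pure.extend(sa[i] for i in group)
    return pure


if __name__ == "__main__":
    signal = [0, 5, -6, 7, 0, 0, 0, 0, 9, 0, 0, 0, 0, 4, -5, 6, -7]
    # The isolated spike (value 9) forms a subsequence of span 0 and is dropped.
    print(extract_pure_voice(signal, a_thr=1, d1=3, d2=1))
```

In this sketch D₁ controls how long a quiet gap must be before a burst is split in two, and D₂ discards bursts too short to be speech, which matches the description's statement that the detection relies on signal amplitude and duration.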

The present invention finds that, because the vibration and sound-production pattern of the human vocal cords is relatively fixed, the live voice recorded directly by the authentication system during live authentication generally shows a large gap between the positive and negative polarity proportions, with the positive-polarity proportion higher than the negative-polarity proportion.

In a replay attack, by contrast, because of the change in diaphragm vibration pattern introduced by the hardware path of the attack device, the voice signal generally shows comparable positive and negative polarity proportions, or even a negative-polarity proportion higher than the positive-polarity proportion.

By comparing the positive-polarity and negative-polarity components (the time-domain polarity) of the voice signal collected by the voice authentication system hardware, the present invention can simply but effectively judge whether the voice signal comes from a live speaker or from a replay-attack loudspeaker.

The beneficial effects of the present invention are:

The present invention detects and defends against replay attacks while processing only the time-domain signal of the voice authentication. Because the method is simple and effective, with few processing steps and low algorithmic complexity, it has the advantages of high efficiency and low latency. Moreover, because the feature being detected is independent of the noise mixed in along the electrical signal paths of the microphone and loudspeaker, the detection success rate of the method is not affected by the sound quality of the microphone and loudspeaker used in the replay attack; that is, the method provides the same protection against attacks launched with loudspeakers and microphones of different quality grades.

The present invention can accurately and effectively detect the Replay Attack in voice authentication system.

Brief description of the drawings

Fig. 1 is a schematic diagram of the conversion process of a microphone and loudspeaker in the ideal case.

Fig. 2 is a flow chart of the detection method of the present invention.

Fig. 3 is the voice signal detection diagram of the embodiment.

Specific embodiments

The present invention is further described below with reference to the drawings and embodiments.

The specific implementation process of the present invention is as follows:

1) Voice activity detection is performed on the voice signal collected by the voice authentication system at sampling intervals: noise is removed from the voice signal, and part of the voice audio signal is extracted as the pure human voice portion;

1.2) From the voice signal Sa, extract all sample points whose absolute sample value is greater than the signal amplitude threshold |Athr| to form the first sequence (Sa_i1, Sa_i2, Sa_i3, ... Sa_ix), where Sa_i1, Sa_i2, Sa_i3, ... Sa_ix denote the sample values of the i₁-th through i_x-th sample points, with 1 ≤ i₁ < i₂ < i₃ < ... < i_x ≤ N, where i is the index of a sample point within the voice signal sequence Sa and N is the total number of sample points in Sa;

3) The polarity index I is compared with the preset polarity threshold I_thr: when I is greater than I_thr, i.e. I > I_thr, the voice signal is considered to match the polarity feature of a live user's voice signal and is judged to be live voice; otherwise, it is judged to be a replay attack.

Embodiment one:

In Fig. 3, the upper channel is the live authentication voice signal obtained by the voice authentication system, and the lower channel is the voice signal obtained from a replay attack using a HiVi speaker. It can be clearly seen that the positive-polarity proportion of the live voice signal is much higher than its negative-polarity proportion, whereas the replay attack signal is just the opposite. After processing by the first two steps of the detection method (voice activity detection and polarity index calculation), the polarity index of the live authentication voice signal is calculated to be 0.583, clearly greater than the polarity index of the replay attack voice signal, which is 0.494.

Embodiment two:

This embodiment collects the live authentication voices of 20 people in total (14 male, 6 female) and carries out replay attacks with 8 loudspeakers spanning a wide range of quality, including the aforementioned HiVi speaker. The decision threshold is set to 0.52; that is, voice with a polarity index greater than 0.52 is judged to be live voice, and otherwise it is judged to be a replay attack. The resulting detection accuracy is 93.2% for live voice and 96.5% for replay attacks.
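The decision rule of this embodiment can be illustrated by applying the 0.52 threshold to the two polarity index values reported in Embodiment 1. The following is a minimal Python sketch; the function name `decide` and the dictionary labels are illustrative and not part of the patent text.

```python
I_THR = 0.52  # decision threshold used in Embodiment 2


def decide(polarity_index: float, i_thr: float = I_THR) -> str:
    """Live voice if the polarity index is strictly greater than the threshold."""
    return "live" if polarity_index > i_thr else "replay"


# Polarity index values reported in Embodiment 1.
reported = {
    "live authentication recording": 0.583,
    "HiVi speaker replay": 0.494,
}

if __name__ == "__main__":
    for label, value in reported.items():
        print(f"{label}: I = {value} -> {decide(value)}")
```

Because the comparison is strict, a signal whose index equals the threshold exactly is treated as a replay attack in this sketch, following the "otherwise" branch of the rule.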

Claims

1. A replay attack detection method for a voice authentication system, characterized in that: a voice signal is acquired and recorded by the voice authentication system, the positive-polarity and negative-polarity components of the voice signal are extracted, and the proportions of the two are compared to judge whether the voice signal is a replay attack or live voice: if the positive and negative polarity proportions differ substantially and the positive-polarity proportion is higher than the negative-polarity proportion, the signal is judged to be live voice; if the two proportions are comparable, or the positive-polarity proportion is not higher than the negative-polarity proportion, the signal is judged to be a replay attack.

2. The replay attack detection method for a voice authentication system according to claim 1, characterized in that the method is specifically as follows:

1) Voice activity detection is performed on the voice signal collected by the voice authentication system at sampling intervals: noise is removed from the voice signal, and part of the voice audio signal is extracted as the pure human voice portion;

The pure human voice signal sequence S is a sequence of N sample points. The number of sample points whose sample value is positive is N_pos, and the absolute value of the sum of the sample values of all positive-valued sample points is |Sum_pos|; the number of sample points whose sample value is negative is N_neg, and the absolute value of the sum of the sample values of all negative-valued sample points is |Sum_neg|. The polarity index I is obtained using the following formula:

3) The polarity index I is compared with a preset polarity threshold I_thr: when I is greater than I_thr, the signal is judged to be live voice; otherwise, it is judged to be a replay attack.

3. The replay attack detection method for a voice authentication system according to claim 2, characterized in that:

Step 1) is specifically as follows:

1.2) From the voice signal Sa, extract all sample points whose absolute sample value is greater than the signal amplitude threshold |Athr| to form the first sequence (Sa_i1, Sa_i2, Sa_i3, ... Sa_ix), with 1 ≤ i₁ < i₂ < i₃ < ... < i_x ≤ N, where i is the index of a sample point within the voice signal sequence Sa and N is the total number of sample points in Sa;

1.3) In the first sequence (Sa_i1, Sa_i2, Sa_i3, ... Sa_ix), initially take the i₁-th sample point as the reference sample point, and, starting from the index of the i₁-th sample point, traverse the indices of the sample points toward later sample points: if the difference between the index of the i_p-th sample point and the index of the i_(p-1)-th sample point is greater than the preset ordinal threshold D₁, then all sample points of the first sequence between the i₁-th sample point and the i_(p-1)-th sample point form the 1st subsequence Ssub1;

1.4) Then, taking the i_p-th sample point as the new starting (reference) sample point, repeat step 1.3) onward: all sample points of the first sequence between the i_q-th (q ≥ p) sample point and the nearest preceding reference sample point form the next subsequence, until the traversal reaches the last sample point Sa_ix, finally yielding the y-th subsequence Ssuby;

1.5) For the 1st subsequence Ssub1 through the y-th subsequence Ssuby (y ≥ 1), judge for each subsequence whether the difference between the largest and the smallest sample-point index it contains is greater than the preset index threshold D₂; finally, merge all subsequences for which this difference is greater than D₂ into the pure human voice signal sequence S.