CN102436810A - Record replay attack detection method and system based on channel mode noise - Google Patents

Record replay attack detection method and system based on channel mode noise

Info

Publication number
CN102436810A
Authority
CN
China
Prior art keywords
mode noise
replay attack
noise
attack detection
voice signal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2011103305987A
Other languages
Chinese (zh)
Inventor
贺前华
王志锋
罗海宇
陈芬
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN2011103305987A priority Critical patent/CN102436810A/en
Priority to PCT/CN2011/084868 priority patent/WO2013060079A1/en
Publication of CN102436810A publication Critical patent/CN102436810A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech

Abstract

The invention relates to the technical fields of intelligent speech signal processing, pattern recognition and artificial intelligence, and in particular to a method and system, based on channel mode noise, for detecting recording replay attacks in a speaker recognition system. The invention discloses a simpler and more efficient recording replay attack detection method for speaker recognition systems. The method comprises the following steps: (1) inputting a speech signal to be recognized; (2) pre-processing the speech signal; (3) extracting the channel mode noise from the pre-processed speech signal; (4) extracting long-term statistical features based on the channel mode noise; and (5) classifying the long-term statistical features according to a channel noise classification decision model. Because detection relies on the channel mode noise, the extracted feature has low dimensionality, the computational complexity is low, and the recognition error rate is low; the security of the speaker recognition system is therefore greatly improved, and the method and system are easier to deploy in practice.

Description

A recording replay attack detection method and system based on channel mode noise
Technical field
The present invention relates to the fields of intelligent speech signal processing, pattern recognition and artificial intelligence, and in particular to a method and system, based on channel mode noise, for detecting recording replay attacks in a speaker recognition system.
Background
With the continuous development of speaker recognition technology, speaker recognition systems have come into wide use, for example in judicial forensics, e-commerce and the financial sector. At the same time, the security problems these systems face have restricted their development and application. Two common attacks on a speaker recognition system are the speaker impersonation attack and the recording replay attack. In a speaker impersonation attack, the attacker attacks the system by imitating the voice of an enrolled user. Speaker recognition experiments on a twins speech corpus show that existing speaker recognition technology can distinguish twins' voices with similar acoustic characteristics; an impersonation attack therefore requires exceptional imitation skill to make the attacker's voice highly similar to the user's, which makes such attacks hard to carry out. In a recording replay attack, the attacker covertly records an enrolled user's speech in advance with a high-fidelity recording device and then plays it back at the system input through a high-fidelity amplifier. For a text-dependent speaker recognition system, the attack can be mounted by covertly recording the user's speech while the user accesses the system, or by recording a large amount of the user's speech and splicing syllables together. For a text-independent system, obtaining only part of the user's speech is enough. Compared with impersonated speech, replayed speech genuinely originates from the user and therefore poses a greater threat to a speaker recognition system. Moreover, high-performance high-fidelity recording and playback devices keep appearing, at ever lower prices and in ever smaller sizes; they are easy to carry and hard to notice, which makes recording replay attacks increasingly easy.
One strategy for preventing recording replay attacks is to have the user read aloud a sentence randomly chosen by the system; during speaker recognition the system also checks whether the user read the sentence as requested. This method requires a sufficiently large speech corpus prepared in advance and requires the user to follow the prompted content; a user who reads according to his or her own pronunciation habits may fail recognition, and this unfriendly mode of interaction is not readily accepted by users. The method also sacrifices the protection that a user-specific text gives the speaker recognition system and can introduce other security problems. In practice, the method can only be used in text-dependent speaker recognition systems, and speech-to-text recognition must be performed alongside speaker recognition, which reduces the overall efficiency of the system.
Another approach compares sentence similarity: although the user enters the same password text each time, two utterances never yield identical samples, so an input sentence whose similarity to a stored sentence exceeds a certain range is treated as a recording replay attack. This method has obvious defects. First, it can only be applied to replay attack detection in text-dependent speaker recognition systems. Second, every sample with which a user accesses the system must be kept, which requires a large amount of storage. Third, each new sample must be compared for similarity against all stored samples, which is computationally expensive. Fourth, if the replayed speech was not recorded while the user was accessing the system, for example if it was recorded privately or obtained by syllable splicing, the method fails. Fifth, the method depends heavily on the threshold setting: speaker recognition is itself a similarity comparison in which high similarity indicates the same speaker, so the boundary between the similarity threshold for replay attack defense and that for speaker verification is hard to determine.
Summary of the invention
The object of the invention is to overcome the defects and deficiencies of the prior art by providing a recording replay attack detection method based on channel mode noise which, when used in a speaker recognition system, improves the success rate of recording replay attack detection.
Another object of the invention is to provide a system that implements the above method.
The objects of the invention are achieved through the following technical solutions:
A recording replay attack detection method based on channel mode noise, characterized in that the method comprises the following steps:
(1) inputting a speech signal to be recognized;
(2) pre-processing the speech signal;
(3) extracting the channel mode noise from the pre-processed speech signal;
(4) extracting long-term statistical features based on the channel mode noise;
(5) classifying the long-term statistical features according to a channel noise classification decision model to obtain the decision result of recording replay attack detection.
The pre-processing in step (2) comprises pre-emphasis, framing and windowing.
Step (3) comprises the following steps:
(31) applying denoising filtering to the pre-processed speech signal;
(32) performing statistical frame analysis separately on the signals before and after denoising filtering;
(33) extracting the log power spectrum of each of the two signals after statistical frame analysis and subtracting one from the other to obtain the channel mode noise of the input speech signal.
The statistical frame is obtained by applying the discrete Fourier transform to the short-time frames of the speech signal and taking the mean of each frequency component across frames.
Step (4) comprises the following steps:
(41) extracting the Legendre polynomial coefficients of orders 0 to 5 of the channel mode noise;
(42) extracting six statistical features of the channel mode noise;
(43) merging the values obtained in the above steps into one 12-dimensional long-term statistical feature vector, which serves as the feature vector for recording replay attack detection.
The six statistical features in step (42) are the minimum, maximum, mean, median and standard deviation of the channel mode noise, and the difference between its maximum and minimum.
The channel noise classification decision model of step (5) is built as follows:
(51) inputting training speech signals;
(52) repeating steps (2) to (4) to obtain the long-term statistical features of the channel mode noise of the training speech;
(53) classifying with a support vector machine (SVM) to build the channel noise classification decision model.
A system implementing the above method comprises:
---an input module 100, for inputting training speech signals or speech signals to be recognized;
---a pre-processing module 200, for pre-processing the speech signal, comprising pre-emphasis, framing and windowing units;
---a channel mode noise extraction module 300, for extracting the channel mode noise from the pre-processed speech signal;
---a long-term statistical feature extraction module 400, for extracting long-term statistical features based on the channel mode noise;
---a channel noise model module 500, for classifying the long-term statistical features of the training speech with an SVM and building the channel noise classification decision model;
---a recognition decision module 600, for classifying the long-term statistical features of the speech signal to be recognized with the channel noise classification decision model to obtain the decision result of recording replay attack detection;
---an output module 700, for outputting the decision result for the speech signal to be recognized.
The basic principle of the invention is to detect recording replay attacks through the channel mode noise extracted from the speech signal. In a speaker recognition system, original speech is the user's speech as captured directly by the system, and playback speech is the speech of a recording replay attack. Before entering the recording channel of the speaker recognition system, playback speech has already passed through one additional recording and playback process. Different recording and playback devices introduce their own channel noise (the microphone, loudspeaker, dither circuit, preamplifier, power amplifier, input and output filters, A/D converter, sample-and-hold circuit and so on all introduce corresponding noise); this channel noise is superimposed on the playback speech, so that playback speech differs subtly from original speech. The invention calls the noise introduced by the transducers (microphone, loudspeaker) and the various circuits in different recording and playback devices the channel mode noise. Original speech contains only the channel mode noise of the system's recording device, whereas playback speech contains not only the channel mode noise of the system but also that of the covert recording device and the playback device; extracting the channel mode noise of the speech to be recognized therefore makes recording replay attack detection possible. The invention extracts the channel mode noise with a denoising filter, extracts long-term statistical features from the channel mode noise, and then uses an SVM to build a channel noise model that decides whether the input to the speaker recognition system is a recording replay attack.
Compared with existing recording replay attack detection methods, the invention has the following advantages and beneficial effects:
(1) It can be applied to text-dependent speaker recognition systems and also to text-independent speaker recognition systems.
(2) The classification of original speech versus playback speech can be performed either before or after speaker recognition; the channel noise model can therefore be used to build either a front-end or a back-end recording replay attack detector, which makes the detection algorithm more flexible to apply.
(3) Compared with MFCC (Mel Frequency Cepstrum Coefficient) features, the long-term statistical features have a markedly lower dimensionality, so feature extraction in the training stage is markedly more efficient. In addition, the samples with which each user accesses the system need not be stored, which saves a large amount of storage space and computing resources.
Description of drawings
Fig. 1 is the system structure diagram of the invention.
Fig. 2 is the flow chart of channel mode noise extraction and of long-term statistical feature extraction based on the channel mode noise.
Fig. 3 is the flow chart of statistical frame extraction.
Fig. 4 is the comparison diagram after connection to a speaker recognition system.
Embodiment
The implementation of the invention is further described below with reference to the drawings and an embodiment, but the implementation of the invention is not limited thereto.
The recording replay attack detection method of the invention can be implemented in an embedded system according to the following steps:
Step (1): input the training speech, which comprises original speech signals and playback speech signals.
Step (2): pre-process the input speech signal, which comprises pre-emphasis, framing and windowing. Pre-emphasis applies a high-pass filter to the speech signal; the transfer function of the filter is $H(z) = 1 - a z^{-1}$, where $a = 0.975$. For framing, the frame length is 512 points and the frame shift is 256 points. The window applied to the speech signal is the Hamming window, whose function is:
$$\omega_H(n) = \begin{cases} 0.54 - 0.46\cos\left(\dfrac{2\pi n}{N-1}\right), & 0 \le n \le N-1 \\ 1, & \text{otherwise} \end{cases}$$
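The pre-processing in step (2) can be sketched as follows. This is a minimal illustration using only the parameters stated above (a = 0.975, frame length 512, frame shift 256, Hamming window); the function names are hypothetical, and the handling of the first sample and of trailing samples that do not fill a frame are assumptions.

```python
import math

def pre_emphasis(signal, a=0.975):
    """Apply H(z) = 1 - a*z^-1, i.e. y[n] = x[n] - a*x[n-1].
    Assumption: x[-1] is treated as 0, so the first sample passes unchanged."""
    out = [signal[0]] if signal else []
    for n in range(1, len(signal)):
        out.append(signal[n] - a * signal[n - 1])
    return out

def hamming(N):
    """Hamming window values for 0 <= n <= N-1."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (N - 1)) for n in range(N)]

def frame_and_window(signal, frame_len=512, frame_shift=256):
    """Split the signal into overlapping frames and apply the Hamming window.
    Assumption: trailing samples that do not fill a whole frame are dropped."""
    window = hamming(frame_len)
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
        start += frame_shift
    return frames

# Example: a synthetic 2048-sample signal stands in for recorded speech.
samples = [math.sin(0.01 * n) for n in range(2048)]
frames = frame_and_window(pre_emphasis(samples))
```

With a 256-point shift, consecutive frames overlap by half, so each sample away from the signal edges contributes to two frames.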
Step (3): extract the channel mode noise from the pre-processed speech signal; the extraction steps are shown in Fig. 2. The extraction of the channel mode noise is divided into the following steps:
Step S301: feed the speech pre-processed in step (2) into the channel mode noise extraction module 300;
Step S302: pass the signal from step S301 through the denoising filter, which is designed as follows:
$$H(z) = 1 - \frac{\sum_{n=1}^{N} \alpha^{n} z^{-n}}{\sum_{n=1}^{N} \alpha^{n}}$$
where $N = 32$ and $\alpha = 0.94$;
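A time-domain sketch of the filter in step S302 follows. It assumes the transfer function reads as 1 minus a normalized, exponentially weighted sum of the previous N samples, which is one reading of the formula in the extracted text; the function name is hypothetical, and samples before the start of the sequence are assumed to be zero.

```python
def denoise_filter(x, N=32, alpha=0.94):
    """y[m] = x[m] - (sum_{n=1..N} alpha^n * x[m-n]) / (sum_{n=1..N} alpha^n).
    Assumption: x[m-n] is taken as 0 when m - n < 0."""
    weights = [alpha ** n for n in range(1, N + 1)]
    norm = sum(weights)
    y = []
    for m in range(len(x)):
        acc = 0.0
        for n in range(1, N + 1):
            if m - n >= 0:
                acc += weights[n - 1] * x[m - n]
        y.append(x[m] - acc / norm)
    return y

# Example: a constant input is fully cancelled once N past samples are available.
filtered = denoise_filter([1.0] * 100)
```

Under this reading the filter has zero gain at DC, so slowly varying content is suppressed while rapid fluctuations pass through.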
Step S303: perform statistical frame analysis separately on the denoising-filtered speech signal from step S302 and on the unfiltered speech signal from step S301. The statistical frame is the mean of each frequency component over the short-time frames of the speech signal. Let $X = \{x_1[n], \dots, x_T[n]\}$ denote a speech signal of $T$ frames; the discrete Fourier transform of the $i$-th frame $x_i[n]$ ($1 \le i \le T$, $0 \le n \le N-1$) is:
$$X_i[k] = \sum_{n=0}^{N-1} x_i[n]\, e^{-j\frac{2\pi k n}{N}}, \quad 0 \le k \le N-1$$
The statistical frame $S[k]$ is then:
$$S[k] = \frac{1}{T}\sum_{i=1}^{T} X_i[k] = \frac{1}{T}\sum_{i=1}^{T}\sum_{n=0}^{N-1} x_i[n]\, e^{-j\frac{2\pi k n}{N}}$$
As shown in Fig. 3, the statistical frame extraction in step S303 is divided into the following steps:
Step S3031: apply the discrete Fourier transform to the signals processed in steps S301 and S302;
Step S3032: sum each frequency component across all frames of the transformed signal from step S3031;
Step S3033: average the summed spectrum from step S3032 to obtain the statistical frame of the input signal.
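Steps S3031 to S3033 can be sketched as below. The code follows the formula for S[k] literally by averaging the complex DFT values across frames; a direct O(N^2) DFT is used only to keep the example dependency-free, and a real implementation would presumably use an FFT. The function names are hypothetical.

```python
import cmath

def dft(frame):
    """Direct discrete Fourier transform of one frame (O(N^2), for illustration)."""
    N = len(frame)
    return [sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]

def statistical_frame(frames):
    """S[k] = (1/T) * sum_i X_i[k]: per-bin mean of the DFT over all T frames."""
    T = len(frames)
    N = len(frames[0])
    acc = [0j] * N
    for frame in frames:
        X = dft(frame)            # S3031: transform each frame
        for k in range(N):
            acc[k] += X[k]        # S3032: accumulate each frequency bin
    return [v / T for v in acc]   # S3033: average over the frames

# Example with two tiny 4-point frames.
S = statistical_frame([[1.0, 0.0, 0.0, 0.0], [3.0, 0.0, 0.0, 0.0]])
```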
Step S304: compute the log power spectrum. The log power spectrum is extracted from each of the two signals that underwent statistical frame analysis in step S303, and the signal that passed through the denoising filter is then subtracted from the signal that did not, giving the channel mode noise of the input speech signal:
$$N = \log\left[\frac{1}{T}\sum_{i=1}^{T}\sum_{n=0}^{N-1} x_i[n]\, e^{-j\frac{2\pi k n}{N}}\right] - \log\left[\frac{1}{T}\sum_{i=1}^{T}\sum_{n=0}^{N-1} \{\mathrm{Defilter}(x_i[n])\}\, e^{-j\frac{2\pi k n}{N}}\right]$$
where $\mathrm{Defilter}(\cdot)$ is the denoising filter designed in step S302.
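The subtraction in step S304 can be sketched as follows, taking two statistical frames (complex spectra) as input. Since the text speaks of a log power spectrum, the sketch takes the logarithm of the squared magnitude; that choice, the small `eps` guarding against log(0), and the function names are assumptions rather than details given in the source.

```python
import math

def log_power(spectrum, eps=1e-12):
    """Log power of each frequency bin; eps (an assumption) avoids log(0)."""
    return [math.log(abs(s) ** 2 + eps) for s in spectrum]

def channel_mode_noise(stat_frame_raw, stat_frame_filtered):
    """Log power spectrum of the unfiltered path minus that of the filtered path."""
    raw = log_power(stat_frame_raw)
    filt = log_power(stat_frame_filtered)
    return [r - f for r, f in zip(raw, filt)]

# Example with two-bin spectra: the unfiltered path has twice the magnitude.
noise = channel_mode_noise([2 + 0j, 4j], [1 + 0j, 2j])
```

Because the difference is taken in the log domain, it corresponds to the ratio of the two power spectra in each bin.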
Step (4): extract two groups of long-term statistical features from the channel mode noise obtained in the steps above; one group is the Legendre polynomial coefficients of orders 0 to 5, and the other is six statistical features of the channel mode noise.
Step S401, extraction of the Legendre polynomial coefficients: the Legendre polynomial coefficients of orders 0 to 5 are used to fit the extracted channel mode noise.
The Legendre polynomial expansion has the following form:
$$f(x) = \sum_{n=0}^{\infty} L_n P_n(x)$$
where $P_n(x)$ is the Legendre polynomial of order $n$ and $L_n$ is the Legendre polynomial coefficient. After the channel mode noise is extracted, it is expanded in Legendre polynomials to obtain the coefficients $L_0$ to $L_5$. Each coefficient reflects one aspect of the channel mode noise: $L_0$ is the DC component of the channel mode noise; $L_1$ is the slope of the channel mode noise distribution curve; $L_2$ is the curvature of the curve; $L_3$ is the S-curvature of the curve; $L_4$ and $L_5$ carry finer detail of the curve.
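One way to obtain the coefficients of orders 0 to 5 is a least-squares fit of the Legendre basis to the noise curve, sketched below. The source does not say how the fit is computed, so the least-squares approach, the mapping of frequency-bin indices onto [-1, 1], and all function names are assumptions.

```python
def legendre_basis(x, max_order=5):
    """[P_0(x), ..., P_max_order(x)] from the recurrence
    (n+1) P_{n+1}(x) = (2n+1) x P_n(x) - n P_{n-1}(x)."""
    p = [1.0, x]
    for n in range(1, max_order):
        p.append(((2 * n + 1) * x * p[n] - n * p[n - 1]) / (n + 1))
    return p[:max_order + 1]

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def legendre_coefficients(noise, max_order=5):
    """Least-squares estimates of L_0..L_max_order for the noise curve.
    Assumption: bin indices are mapped linearly onto [-1, 1]."""
    m = len(noise)
    xs = [-1.0 + 2.0 * i / (m - 1) for i in range(m)]
    basis = [legendre_basis(x, max_order) for x in xs]
    k = max_order + 1
    # Normal equations (B^T B) L = B^T y of the least-squares problem.
    A = [[sum(b[i] * b[j] for b in basis) for j in range(k)] for i in range(k)]
    rhs = [sum(b[i] * y for b, y in zip(basis, noise)) for i in range(k)]
    return solve(A, rhs)
```

For example, a curve that is exactly `2 + 0.5*x` on the mapped axis yields coefficients close to 2 and 0.5 for orders 0 and 1 and close to zero for the higher orders.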
Step S402: extract the statistical features based on the channel mode noise. This group comprises the following six features:
● PN_min: the minimum of the channel mode noise;
● PN_max: the maximum of the channel mode noise;
● PN_mean: the mean of the channel mode noise;
● PN_median: the median of the channel mode noise;
● PN_diff: the difference between the maximum and the minimum;
● PN_stdev: the standard deviation of the channel mode noise.
The two groups of long-term statistical features are merged into one 12-dimensional long-term statistical feature vector, which serves as the feature vector for recording replay attack detection.
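The six statistics and the 12-dimensional vector can be assembled as in the sketch below. The ordering of the features follows the list above; whether the standard deviation is the population or the sample version is not stated in the source, so the use of `statistics.pstdev` is an assumption, as are the function names.

```python
import statistics

def statistical_features(noise):
    """[PN_min, PN_max, PN_mean, PN_median, PN_diff, PN_stdev] of the noise curve.
    Assumption: population standard deviation (pstdev)."""
    lo, hi = min(noise), max(noise)
    return [lo, hi, statistics.mean(noise), statistics.median(noise),
            hi - lo, statistics.pstdev(noise)]

def long_term_feature_vector(legendre_coeffs, noise):
    """Concatenate L_0..L_5 with the six statistics into a 12-dimensional vector."""
    if len(legendre_coeffs) != 6:
        raise ValueError("expected the six Legendre coefficients of orders 0 to 5")
    return list(legendre_coeffs) + statistical_features(noise)

# Example with a toy noise curve and placeholder coefficients.
vec = long_term_feature_vector([0.0] * 6, [1.0, 2.0, 3.0, 4.0, 10.0])
```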
Step (5): build the SVM channel noise classification decision model, which is used to decide whether the input speech to be recognized is original speech or playback speech. The SVM channel noise model is trained on positive and negative samples. The positive samples are the long-term statistical features based on the channel mode noise obtained from original speech signals through steps (2) to (4). The negative samples are the long-term statistical features based on the channel mode noise obtained from playback speech signals through steps (2) to (4).
SVM classification requires a separating hyperplane that not only separates the two classes of samples correctly but also maximizes the margin between them. A sample set $(x_i, y_i)$, $i = 1, \dots, n$, $x_i \in R^d$, $y_i \in \{-1, +1\}$, can be normalized so that it satisfies:
$$y_i[(w \cdot x_i) + b] - 1 \ge 0, \quad i = 1, \dots, n$$
The margin then equals $2/\|w\|$, so maximizing the margin is equivalent to minimizing $\|w\|^2$. The hyperplane that satisfies the constraint above and minimizes
$$\frac{1}{2}\|w\|^2$$
is called the optimal separating hyperplane, and the training samples that lie on the margin are called support vectors.
Solving with the Lagrange optimization method, the Lagrange function is:
$$L(w, b, \alpha) = \frac{1}{2}(w \cdot w) - \sum_{i=1}^{n} \alpha_i \{ y_i[(w \cdot x_i) + b] - 1 \}$$
This is converted into the Wolfe dual problem: under the constraints
$$\sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, \dots, n$$
solve for the $\alpha_i$ that maximize the following function:
$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$
Here $\alpha_i$ is the Lagrange multiplier corresponding to each constraint $y_i[(w \cdot x_i) + b] - 1 \ge 0$, $i = 1, \dots, n$, of the original problem. Let the optimal solution obtained be $\alpha_i^*$ and $b^*$, and let $x$ be the input data to be classified. The optimal classification function (that is, the output function of the SVM) is:
$$f(x) = \mathrm{sgn}\{(w \cdot x) + b^*\} = \mathrm{sgn}\left\{\sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x) + b^*\right\}$$
In practice speech samples are noisy and are not fully linearly separable, so the SVM classifier is used in the linearly non-separable case. A slack variable $\xi_i \ge 0$ is added to the constraint
$$y_i[(w \cdot x_i) + b] - 1 \ge 0, \quad i = 1, \dots, n$$
so that the constraint becomes:
$$y_i[(w \cdot x_i) + b] - 1 + \xi_i \ge 0, \quad i = 1, \dots, n$$
The Lagrange function is then:
$$L(w, b, \alpha) = \frac{1}{2}(w \cdot w) + C\left(\sum_{i=1}^{n} \xi_i\right)$$
Converting to the Wolfe dual problem gives: under the conditions
$$\sum_{i=1}^{n} y_i \alpha_i = 0$$
and $0 \le \alpha_i \le C$, $i = 1, \dots, n$, maximize:
$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2}\sum_{i,j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j)$$
where $C$ is a constant, called the penalty factor, that controls how heavily misclassified samples are penalized.
In the linearly non-separable case the output function of the SVM can therefore be expressed as:
$$f(x) = \mathrm{sgn}\left(\sum_{i=1}^{n} \alpha_i^* y_i K(x, x_i) + b^*\right)$$
where $0 \le \alpha_i \le C$, $i = 1, \dots, n$, and $\mathrm{sgn}(\cdot)$ is the sign function.
$K(x_i, x_j)$ is the radial basis inner-product function, which can serve as the kernel function of the SVM:
$$K(x, x_i) = \exp(-\lambda \|x - x_i\|), \quad \lambda > 0$$
Different kernel functions can be chosen in practice.
The penalty factor $C$ and $\lambda$ are determined with the SMO (Sequential Minimal Optimization) algorithm and a grid search, and are used to train the channel noise model. One parameter set obtained through actual optimization is $C = 0.03125$, $\lambda = 0.0078125$.
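The SVM output function for the non-separable case can be sketched as follows, given support vectors, multipliers and a bias that are assumed to come from an already trained model (training with SMO is not shown). The kernel is written with the unsquared norm exactly as the formula above is printed; widely used RBF kernels square the norm, so the printed form may have lost an exponent in extraction. All names are hypothetical, and the mapping of +1 to original speech follows the positive-sample definition in step (5).

```python
import math

def rbf_kernel(x, xi, lam=0.0078125):
    """K(x, x_i) = exp(-lam * ||x - x_i||), using the norm as printed in the text."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, xi)))
    return math.exp(-lam * dist)

def svm_decide(x, support_vectors, alphas, labels, bias, lam=0.0078125):
    """f(x) = sgn(sum_i alpha_i * y_i * K(x, x_i) + b).
    Returns +1 (original speech) or -1 (playback speech)."""
    s = sum(a * y * rbf_kernel(x, sv, lam)
            for a, y, sv in zip(alphas, labels, support_vectors))
    return 1 if s + bias >= 0 else -1

# Toy model: one support vector per class (values are illustrative only).
svs = [[0.0, 0.0], [10.0, 10.0]]
decision_near_original = svm_decide([1.0, 1.0], svs, [1.0, 1.0], [+1, -1], 0.0)
decision_near_playback = svm_decide([9.0, 9.0], svs, [1.0, 1.0], [+1, -1], 0.0)
```

In the method described here, `x` would be the 12-dimensional long-term statistical feature vector of the speech to be recognized.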
Step (6): classify original speech versus playback speech. Input the speech signal to be recognized, obtain its long-term statistical features based on the channel mode noise through steps (2) to (4), and finally use the channel noise model built in step (5) to perform recording replay attack detection and output the decision result.
As shown in Fig. 1, the recording replay attack detection system of the invention comprises:
---an input module 100, for inputting training speech signals or speech signals to be recognized;
---a pre-processing module 200, for pre-processing the speech signal, comprising pre-emphasis, framing and windowing units;
---a channel mode noise extraction module 300, for extracting the channel mode noise from the pre-processed speech signal;
---a long-term statistical feature extraction module 400, for extracting long-term statistical features based on the channel mode noise;
---a channel noise model module 500, for classifying the long-term statistical features of the training speech with an SVM and building the channel noise classification decision model;
---a recognition decision module 600, for using the channel noise model module to decide whether the input speech to be recognized is recording replay attack speech;
---an output module 700, for outputting the decision result for the speech signal to be recognized.
The recording replay attack detection method based on channel mode noise provided by the invention was compared with the method based on sentence similarity comparison on the Authentic and Playback Speech Database (APSD); as shown in Table 1, the method based on channel mode noise has the lower error rate.
Table 1
As shown in Fig. 4, the recording replay attack detectors built with the two methods were each connected to a real speaker recognition system. On data containing replay attack speech, a speaker recognition system with no replay attack detection module has a very high error rate and very low security. With the replay attack detection module based on channel mode noise loaded, the system has the lowest equal error rate (EER), 10.2564%. With the replay attack detection module based on sentence similarity comparison loaded, the system's equal error rate is 29.0598%.
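For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below estimates it from two lists of detector scores; the threshold sweep, the score convention (higher means more likely genuine), the function name and the toy scores are illustrative assumptions, not the evaluation procedure or data of the source.

```python
def equal_error_rate(genuine_scores, attack_scores):
    """Sweep thresholds over all observed scores and return the average of the
    false acceptance and false rejection rates where the two are closest."""
    thresholds = sorted(set(genuine_scores) | set(attack_scores))
    best = None
    for t in thresholds:
        # A trial is accepted when its score is at or above the threshold.
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        far = sum(s >= t for s in attack_scores) / len(attack_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2)
    return best[1]

# Toy scores, purely illustrative.
eer = equal_error_rate([0.9, 0.8, 0.7, 0.4], [0.1, 0.2, 0.3, 0.6])
```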
The recording replay attack detection method based on channel mode noise proposed by the invention is not only simple to implement and computationally efficient but also has a low error rate. It is therefore well suited to efficient use on embedded recognition devices and other smart devices.

Claims (8)

1. A recording replay attack detection method based on channel mode noise, characterized in that the method comprises the following steps:
(1) inputting a speech signal to be recognized;
(2) pre-processing the speech signal;
(3) extracting the channel mode noise from the pre-processed speech signal;
(4) extracting long-term statistical features based on the channel mode noise;
(5) classifying the long-term statistical features according to a channel noise classification decision model to obtain the decision result of recording replay attack detection.
2. The recording replay attack detection method as claimed in claim 1, characterized in that the pre-processing in step (2) comprises pre-emphasis, framing and windowing.
3. The recording replay attack detection method as claimed in claim 1, characterized in that step (3) further comprises the following steps:
(31) applying denoising filtering to the pre-processed speech signal;
(32) performing statistical frame analysis separately on the signals before and after denoising filtering;
(33) extracting the log power spectrum of each of the two signals after statistical frame analysis and subtracting one from the other to obtain the channel mode noise of the input speech signal.
4. The recording replay attack detection method as claimed in claim 3, characterized in that the statistical frame is obtained by applying the discrete Fourier transform to the short-time frames of the speech signal and taking the mean of each frequency component across frames.
5. The recording replay attack detection method as claimed in claim 1, characterized in that step (4) further comprises the following steps:
(41) extracting the Legendre polynomial coefficients of orders 0 to 5 of the channel mode noise;
(42) extracting six statistical features of the channel mode noise;
(43) merging the values obtained in the above steps into one 12-dimensional long-term statistical feature vector, which serves as the feature vector for recording replay attack detection.
6. The recording replay attack detection method as claimed in claim 5, characterized in that the six statistical features in step (42) are the minimum, maximum, mean, median and standard deviation of the channel mode noise, and the difference between its maximum and minimum.
7. The recording replay attack detection method as claimed in claim 1, characterized in that the channel noise classification decision model of step (5) is built by the following steps:
(51) inputting training speech signals;
(52) repeating steps (2) to (4) to obtain the long-term statistical features of the channel mode noise of the training speech;
(53) classifying with a support vector machine (SVM) to build the channel noise classification decision model.
8. A recording replay attack detection system based on channel mode noise, characterized in that it comprises:
---an input module (100), for inputting training speech signals or speech signals to be recognized;
---a pre-processing module (200), for pre-processing the training speech signals or speech signals to be recognized, comprising pre-emphasis, framing and windowing units;
---a channel mode noise extraction module (300), for extracting the channel mode noise from the pre-processed training speech signals or speech signals to be recognized;
---a long-term statistical feature extraction module (400), for extracting the long-term statistical features, based on the channel mode noise, of the training speech signals or speech signals to be recognized;
---a channel noise model module (500), for classifying the long-term statistical features of the training speech signals with an SVM and building the channel noise classification decision model;
---a recognition decision module (600), for classifying the long-term statistical features of the speech signal to be recognized with the channel noise classification decision model to obtain the decision result of recording replay attack detection;
---an output module (700), for outputting the decision result for the speech signal to be recognized.
CN2011103305987A 2011-10-26 2011-10-26 Record replay attack detection method and system based on channel mode noise Pending CN102436810A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN2011103305987A CN102436810A (en) 2011-10-26 2011-10-26 Record replay attack detection method and system based on channel mode noise
PCT/CN2011/084868 WO2013060079A1 (en) 2011-10-26 2011-12-29 Record playback attack detection method and system based on channel mode noise

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2011103305987A CN102436810A (en) 2011-10-26 2011-10-26 Record replay attack detection method and system based on channel mode noise

Publications (1)

Publication Number Publication Date
CN102436810A true CN102436810A (en) 2012-05-02

Family

ID=45984833

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2011103305987A Pending CN102436810A (en) 2011-10-26 2011-10-26 Record replay attack detection method and system based on channel mode noise

Country Status (2)

Country Link
CN (1) CN102436810A (en)
WO (1) WO2013060079A1 (en)

Cited By (23)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820034A (en) * 2012-07-16 2012-12-12 中国民航大学 Noise sensing and identifying device and method for civil aircraft
CN104569551A (en) * 2015-01-08 2015-04-29 漳州科华技术有限责任公司 DC component detecting method applied to inversion voltages
CN105023571A (en) * 2015-07-28 2015-11-04 苏州宏展信息科技有限公司 Voice feature extraction control method for recording pen
CN105513598A (en) * 2016-01-14 2016-04-20 宁波大学 Playback voice detection method based on distribution of information quantity in frequency domain
CN105869630A (en) * 2016-06-27 2016-08-17 上海交通大学 Method and system for detecting voice spoofing attack of speakers on basis of deep learning
CN105913855A (en) * 2016-04-11 2016-08-31 宁波大学 Long window scaling factor-based playback voice attack detection algorithm
CN106297772A (en) * 2016-08-24 2017-01-04 武汉大学 Replay attack detection method based on the voice signal distortion characteristics introduced by the loudspeaker
WO2017000813A1 (en) * 2015-06-30 2017-01-05 芋头科技(杭州)有限公司 Indoor noise pollution automatic identification and monitoring system
CN106409298A (en) * 2016-09-30 2017-02-15 广东技术师范学院 Identification method of sound rerecording attack
CN106531172A (en) * 2016-11-23 2017-03-22 湖北大学 Speaker voice playback identification method and system based on environmental noise change detection
CN108039176A (en) * 2018-01-11 2018-05-15 广州势必可赢网络科技有限公司 Voiceprint authentication method and device for preventing recording attack and access control system
CN108281158A (en) * 2018-01-12 2018-07-13 平安科技(深圳)有限公司 Voice liveness detection method, server and storage medium based on deep learning
CN109243487A (en) * 2018-11-30 2019-01-18 宁波大学 Playback voice detection method for normalized constant Q cepstrum features
CN109599117A (en) * 2018-11-14 2019-04-09 厦门快商通信息技术有限公司 Audio data recognition method and human voice anti-replay recognition system
CN109754817A (en) * 2017-11-02 2019-05-14 北京三星通信技术研究有限公司 Signal processing method and terminal device
CN110299141A (en) * 2019-07-04 2019-10-01 苏州大学 Acoustic feature extraction method for detecting playback attack of sound record in voiceprint recognition
CN110459226A (en) * 2019-08-19 2019-11-15 效生软件科技(上海)有限公司 Method for identity verification by detecting human voice or machine voice through a voiceprint engine
CN110718229A (en) * 2019-11-14 2020-01-21 国微集团(深圳)有限公司 Detection method for record playback attack and training method corresponding to detection model
CN111445904A (en) * 2018-12-27 2020-07-24 北京奇虎科技有限公司 Cloud-based voice control method and device and electronic equipment
CN111462737A (en) * 2020-03-26 2020-07-28 中国科学院计算技术研究所 Method for training grouping model for voice grouping and voice noise reduction method
CN112599149A (en) * 2020-12-10 2021-04-02 中国传媒大学 Detection method and device for replay attack voice
CN113012684A (en) * 2021-03-04 2021-06-22 电子科技大学 Synthesized voice detection method based on voice segmentation
CN114441029A (en) * 2022-01-20 2022-05-06 深圳壹账通科技服务有限公司 Recording noise detection method, device, equipment and medium of voice labeling system

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
BR102014023647B1 (en) * 2014-09-24 2022-12-06 Fundacao Cpqd - Centro De Pesquisa E Desenvolvimento Em Telecomunicacoes METHOD AND SYSTEM FOR FRAUD DETECTION IN APPLICATIONS BASED ON VOICE PROCESSING
CN105044478B (en) * 2015-07-23 2018-03-13 国家电网公司 A kind of multi channel signals extracting method of transmission line of electricity audible noise

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1445752A (en) * 2002-03-15 2003-10-01 松下电器产业株式会社 Method and device for joint compensation of channel and additive noise in the feature domain
CN1912993A (en) * 2005-08-08 2007-02-14 中国科学院声学研究所 Voice end detection method based on energy and harmonic
CN101199006A (en) * 2005-06-20 2008-06-11 微软公司 Multi-sensory speech enhancement using a clean speech prior
CN101223574A (en) * 2005-12-08 2008-07-16 韩国电子通信研究院 Voice recognition apparatus and method using vocal band signal

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1123863C (en) * 2000-11-10 2003-10-08 清华大学 Information check method based on speech recognition
US8190437B2 (en) * 2008-10-24 2012-05-29 Nuance Communications, Inc. Speaker verification methods and apparatus

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1445752A (en) * 2002-03-15 2003-10-01 松下电器产业株式会社 Method and device for joint compensation of channel and additive noise in the feature domain
CN101199006A (en) * 2005-06-20 2008-06-11 微软公司 Multi-sensory speech enhancement using a clean speech prior
CN1912993A (en) * 2005-08-08 2007-02-14 中国科学院声学研究所 Voice end detection method based on energy and harmonic
CN101223574A (en) * 2005-12-08 2008-07-16 韩国电子通信研究院 Voice recognition apparatus and method using vocal band signal

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
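王志锋, 贺前华, 张雪源, 罗海宇, 苏卓生: "Recording replay attack detection based on channel mode noise", Journal of South China University of Technology (Natural Science Edition) *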

Cited By (33)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102820034B (en) * 2012-07-16 2014-05-21 中国民航大学 Noise sensing and identifying device and method for civil aircraft
CN102820034A (en) * 2012-07-16 2012-12-12 中国民航大学 Noise sensing and identifying device and method for civil aircraft
CN104569551A (en) * 2015-01-08 2015-04-29 漳州科华技术有限责任公司 DC component detecting method applied to inversion voltages
CN104569551B (en) * 2015-01-08 2016-03-23 漳州科华技术有限责任公司 DC component detection method applied to inverter voltage
WO2017000813A1 (en) * 2015-06-30 2017-01-05 芋头科技(杭州)有限公司 Indoor noise pollution automatic identification and monitoring system
CN105023571A (en) * 2015-07-28 2015-11-04 苏州宏展信息科技有限公司 Voice feature extraction control method for recording pen
CN105513598B (en) * 2016-01-14 2019-04-23 宁波大学 Playback voice detection method based on distribution of information quantity in frequency domain
CN105513598A (en) * 2016-01-14 2016-04-20 宁波大学 Playback voice detection method based on distribution of information quantity in frequency domain
CN105913855A (en) * 2016-04-11 2016-08-31 宁波大学 Long window scaling factor-based playback voice attack detection algorithm
CN105869630A (en) * 2016-06-27 2016-08-17 上海交通大学 Method and system for detecting voice spoofing attack of speakers on basis of deep learning
CN105869630B (en) * 2016-06-27 2019-08-02 上海交通大学 Speaker's voice spoofing attack detection method and system based on deep learning
CN106297772A (en) * 2016-08-24 2017-01-04 武汉大学 Replay attack detection method based on the voice signal distortion characteristics introduced by the loudspeaker
CN106297772B (en) * 2016-08-24 2019-06-25 武汉大学 Replay attack detection method based on the voice signal distorted characteristic that loudspeaker introduces
CN106409298A (en) * 2016-09-30 2017-02-15 广东技术师范学院 Identification method of sound rerecording attack
CN106531172B (en) * 2016-11-23 2019-06-14 湖北大学 Speaker's audio playback discrimination method and system based on ambient noise variation detection
CN106531172A (en) * 2016-11-23 2017-03-22 湖北大学 Speaker voice playback identification method and system based on environmental noise change detection
CN109754817A (en) * 2017-11-02 2019-05-14 北京三星通信技术研究有限公司 signal processing method and terminal device
CN108039176B (en) * 2018-01-11 2021-06-18 广州势必可赢网络科技有限公司 Voiceprint authentication method and device for preventing recording attack and access control system
CN108039176A (en) * 2018-01-11 2018-05-15 广州势必可赢网络科技有限公司 Voiceprint authentication method and device for preventing recording attack and access control system
CN108281158A (en) * 2018-01-12 2018-07-13 平安科技(深圳)有限公司 Voice liveness detection method, server and storage medium based on deep learning
CN109599117A (en) * 2018-11-14 2019-04-09 厦门快商通信息技术有限公司 Audio data recognition method and human voice anti-replay recognition system
CN109243487A (en) * 2018-11-30 2019-01-18 宁波大学 Playback voice detection method for normalized constant Q cepstrum features
CN109243487B (en) * 2018-11-30 2022-12-27 宁波大学 Playback voice detection method for normalized constant Q cepstrum features
CN111445904A (en) * 2018-12-27 2020-07-24 北京奇虎科技有限公司 Cloud-based voice control method and device and electronic equipment
CN110299141B (en) * 2019-07-04 2021-07-13 苏州大学 Acoustic feature extraction method for detecting playback attack of sound record in voiceprint recognition
CN110299141A (en) * 2019-07-04 2019-10-01 苏州大学 Acoustic feature extraction method for detecting playback attack of sound record in voiceprint recognition
CN110459226A (en) * 2019-08-19 2019-11-15 效生软件科技(上海)有限公司 Method for identity verification by detecting human voice or machine voice through a voiceprint engine
CN110718229A (en) * 2019-11-14 2020-01-21 国微集团(深圳)有限公司 Detection method for record playback attack and training method corresponding to detection model
CN111462737A (en) * 2020-03-26 2020-07-28 中国科学院计算技术研究所 Method for training grouping model for voice grouping and voice noise reduction method
CN111462737B (en) * 2020-03-26 2023-08-08 中国科学院计算技术研究所 Method for training grouping model for voice grouping and voice noise reduction method
CN112599149A (en) * 2020-12-10 2021-04-02 中国传媒大学 Detection method and device for replay attack voice
CN113012684A (en) * 2021-03-04 2021-06-22 电子科技大学 Synthesized voice detection method based on voice segmentation
CN114441029A (en) * 2022-01-20 2022-05-06 深圳壹账通科技服务有限公司 Recording noise detection method, device, equipment and medium of voice labeling system

Also Published As

Publication number Publication date
WO2013060079A1 (en) 2013-05-02

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20120502