CN102436810A - Record replay attack detection method and system based on channel mode noise - Google Patents
- Publication number
- CN102436810A
- Authority
- CN
- China
- Prior art keywords
- mode noise
- replay attack
- noise
- attack detection
- voice signal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Telephonic Communication Services (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
The invention relates to the technical fields of intelligent speech signal processing, pattern recognition and artificial intelligence, and in particular to a recording replay attack detection method and system, based on channel mode noise, for use in a speaker recognition system. The invention discloses a simpler and more efficient method for detecting recording replay attacks in a speaker recognition system. The method comprises the following steps: (1) inputting a speech signal to be recognized; (2) pre-processing the speech signal; (3) extracting the channel mode noise from the pre-processed speech signal; (4) extracting long-term statistical features based on the channel mode noise; and (5) classifying the long-term statistical features with a channel noise classification decision model. Because detection relies on channel mode noise, the extracted features are low-dimensional, the computational complexity is low, and the recognition error rate is low. The security of the speaker recognition system is therefore greatly improved, and the method and system are easier to deploy in practice.
Description
Technical field
The present invention relates to the fields of intelligent speech signal processing, pattern recognition and artificial intelligence, and in particular to a method and system, based on channel mode noise, for detecting recording replay attacks in a speaker recognition system.
Background art
With the continuing development of speaker recognition technology, speaker recognition systems have come into very wide use, for example in forensic evidence collection, e-commerce and the financial sector. At the same time, the security problems these systems face have restricted their development and application. The two common attacks on a speaker recognition system are the speaker impersonation attack and the recording replay attack. In an impersonation attack, the attacker imitates the voice of a user enrolled in the system. Speaker recognition experiments on a twins speech corpus show that existing technology can distinguish the voices of twins, which have similar acoustic characteristics. An impersonation attack therefore demands exceptional imitation skill to make the attacker's voice closely match that of the user, so such attacks are hard to carry out. In a recording replay attack, the attacker covertly records the voice of an enrolled user in advance with a high-fidelity recording device and then plays it back through a high-fidelity amplifier at the system input. For a text-dependent speaker recognition system, the attacker can record the user's speech while the user is accessing the system, or record a large amount of the user's speech and assemble the required utterance by syllable splicing. For a text-independent system, obtaining only part of the user's speech is enough to mount a replay attack. Unlike imitated speech, playback speech genuinely originates from the user, so it poses a greater threat to a speaker recognition system. Moreover, high-fidelity recording and playback devices with good performance keep appearing. They are becoming cheaper and smaller, and they are easy to carry and hard to notice, all of which makes recording replay attacks easier and easier.
One strategy for preventing recording replay attacks is to have the user read aloud a sentence that the system picks at random; during speaker recognition the system also checks whether the user read the sentence as requested. This approach needs a sufficiently large speech corpus prepared in advance and requires the user to follow the prompted content. A user who speaks according to his or her own pronunciation habits may fail recognition, and such an unfriendly mode of interaction is not readily accepted by users. The approach also sacrifices the protection that a specific text gives a specific user, which can create other security problems. In practice it can only be used in text-dependent speaker recognition systems, and it requires recognizing the text of the speech in addition to the speaker, which lowers the overall efficiency of the system.
Another approach compares sentence similarity. Although the user enters the same password text every time, no two utterances yield an identical sample; if the similarity between the input sentence and a stored sentence exceeds a certain range, the input is judged to be a recording replay attack. This approach has obvious defects. First, it can only detect recording replay attacks on text-dependent speaker recognition systems. Second, every sample a user submits on entering the system must be retained, which requires a large amount of storage. Third, each new sample must be compared for similarity against all stored samples, so the computational load is very high. Fourth, if the playback speech was not recorded while the user was entering the system, for example if it was recorded privately or obtained by syllable splicing, the method fails. Fifth, the method depends heavily on the threshold setting: speaker recognition is itself a similarity comparison in which high similarity indicates the same speaker, so the boundary between the similarity threshold for detecting an attack and the threshold for recognizing the speaker is hard to determine.
Summary of the invention
The object of the present invention is to overcome the defects and deficiencies of the prior art by providing a recording replay attack detection method based on channel mode noise that, when used in a speaker recognition system, improves the success rate of recording replay attack detection.
Another object of the present invention is to provide a system that implements the above method.
The objects of the invention are achieved through the following technical solutions:
A recording replay attack detection method based on channel mode noise, characterized in that the method comprises the following steps:
(1) inputting a speech signal to be recognized;
(2) pre-processing the speech signal;
(3) extracting the channel mode noise from the pre-processed speech signal;
(4) extracting long-term statistical features based on the channel mode noise;
(5) classifying the long-term statistical features according to a channel noise classification decision model to obtain the decision result of recording replay attack detection.
The pre-processing in step (2) comprises pre-emphasis, framing and windowing.
Step (3) comprises the following steps:
(31) applying denoising filtering to the pre-processed speech signal;
(32) performing statistical frame analysis separately on the signals before and after denoising filtering;
(33) extracting the log power spectrum of each of the two signals after statistical frame analysis and subtracting one from the other to obtain the channel mode noise of the input speech signal.
The statistical frame is obtained by applying the discrete Fourier transform to the short-time frames of the speech signal and taking the mean of each frequency component across frames.
Step (4) comprises the following steps:
(41) extracting the Legendre polynomial coefficients of orders 0 to 5 of the channel mode noise;
(42) extracting six statistical features of the channel mode noise;
(43) merging the values obtained in the above steps into a 12-dimensional long-term statistical feature vector, which serves as the feature vector for recording replay attack detection.
The six statistical features in step (42) are the minimum, maximum, mean, median and standard deviation of the channel mode noise, and the difference between its maximum and minimum.
Building the channel noise classification decision model of step (5) comprises the following steps:
(51) inputting training speech signals;
(52) repeating steps (2) to (4) to obtain the long-term statistical features of the channel mode noise of the training speech;
(53) classifying with a support vector machine (SVM) to build the channel noise classification decision model.
A system implementing the above method comprises:
---an input module 100, for inputting training speech signals or speech signals to be recognized;
---a pre-processing module 200, for pre-processing the speech signal, comprising pre-emphasis, framing and windowing units;
---a channel mode noise extraction module 300, for extracting the channel mode noise from the pre-processed speech signal;
---a long-term statistical feature extraction module 400, for extracting long-term statistical features based on the channel mode noise;
---a channel noise model module 500, for classifying the long-term statistical features of the training speech with an SVM to build the channel noise classification decision model;
---a recognition decision module 600, for classifying the long-term statistical features of the speech signal to be recognized with the channel noise classification decision model to obtain the decision result of recording replay attack detection;
---an output module 700, for outputting the decision result for the speech signal to be recognized.
The basic principle of the present invention is to detect recording replay attacks through the channel mode noise extracted from the speech signal. In a speaker recognition system, original speech is the user's speech as captured directly by the system, and playback speech is the speech used in a recording replay attack. Before entering the recording channel of the speaker recognition system, playback speech has already passed through one recording and one playback process. Each recording or playback device introduces its own channel noise (the microphone, loudspeaker, dither circuit, preamplifier, power amplifier, input and output filters, A/D conversion, sample-and-hold circuit and so on all introduce noise). This channel noise is superimposed on the playback speech, so playback speech differs subtly from original speech. The present invention calls the noise introduced by the transducers (microphone, loudspeaker) and the various circuits of the recording and playback devices channel mode noise. Original speech contains only the channel mode noise of the system's own recording device, whereas playback speech contains, in addition, the channel mode noise of the covert recording device and the playback device. Extracting the channel mode noise from the speech to be recognized therefore makes recording replay attack detection possible. The invention extracts the channel mode noise with a denoising filter, extracts long-term statistical features from it, and then uses an SVM to build a channel noise model that decides whether the input to the speaker recognition system is a recording replay attack.
Compared with existing recording replay attack detection methods, the present invention has the following advantages and beneficial effects:
(1) It can be applied to text-dependent speaker recognition systems as well as to text-independent ones.
(2) Original speech and playback speech can be classified either before or after speaker recognition. The channel noise model can therefore be used to build a front-end or a back-end recording replay attack detector, which makes the detection algorithm more flexible to apply.
(3) Compared with MFCC (Mel Frequency Cepstrum Coefficient) features, the long-term statistical features have a markedly lower dimensionality, so feature extraction in the training stage is markedly more efficient. In addition, the samples each user submits on entering the system do not need to be stored, which saves a large amount of storage space and computation.
Description of the drawings
Fig. 1 is a structural diagram of the system of the present invention.
Fig. 2 is a flow chart of channel mode noise extraction and of long-term statistical feature extraction based on the channel mode noise.
Fig. 3 is a flow chart of statistical frame extraction.
Fig. 4 is a comparison chart of the results after connection to a speaker recognition system.
Embodiment
The implementation of the present invention is further described below with reference to the drawings and an embodiment, but the implementation is not limited thereto.
The recording replay attack detection method of the present invention can be implemented in an embedded system according to the following steps:
Step (1): input the training speech, which comprises original speech signals and playback speech signals.
Step (2): pre-process the input speech signal by pre-emphasis, framing and windowing. Pre-emphasis applies a high-pass filter to the speech signal; the transfer function of the filter is $H(z) = 1 - a z^{-1}$, where $a = 0.975$. The speech signal is divided into frames with a frame length of 512 points and a frame shift of 256 points. The window applied to the speech signal is a Hamming window, whose standard form for a frame of $N$ points is:
$$w[n] = 0.54 - 0.46\cos\left(\frac{2\pi n}{N-1}\right), \quad 0 \le n \le N-1$$
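The pre-processing of step (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the patent's implementation: the function names are hypothetical, the standard 0.54/0.46 Hamming window is assumed, and dropping trailing samples that do not fill a whole frame is an assumption the text does not address.

```python
import math


def pre_emphasis(signal, a=0.975):
    """Apply the high-pass filter H(z) = 1 - a*z^-1 to a list of samples."""
    if not signal:
        return []
    # The first sample has no predecessor, so it passes through unchanged.
    return [signal[0]] + [signal[n] - a * signal[n - 1] for n in range(1, len(signal))]


def hamming(length):
    """Standard Hamming window (assumed form: 0.54 - 0.46*cos(2*pi*n/(N-1)))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1)) for n in range(length)]


def frame_and_window(signal, frame_len=512, frame_shift=256):
    """Split the signal into overlapping frames and window each frame.

    Assumption: trailing samples that do not fill a whole frame are dropped.
    """
    window = hamming(frame_len)
    frames = []
    for start in range(0, len(signal) - frame_len + 1, frame_shift):
        frames.append([signal[start + n] * window[n] for n in range(frame_len)])
    return frames


def preprocess(signal, a=0.975, frame_len=512, frame_shift=256):
    """Pre-emphasis followed by framing and Hamming windowing."""
    return frame_and_window(pre_emphasis(signal, a), frame_len, frame_shift)
```

With a frame length of 512 and a shift of 256, consecutive frames overlap by half, so a 1024-sample signal yields three frames.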
Step (3): extract the channel mode noise from the pre-processed speech signal; the extraction steps are shown in Fig. 2. The extraction is divided into the following steps:
Step S301: feed the speech pre-processed in step (2) into the channel mode noise extraction module 300;
Step S302: pass the signal from step S301 through the denoising filter. The denoising filter is designed as follows:
Step S303: perform statistical frame analysis separately on the denoised signal from step S302 and on the undenoised signal from step S301. The statistical frame is the mean of each frequency component over the short-time frames of the speech signal. Let $X = \{x_1[n], \ldots, x_T[n]\}$ denote a speech signal of $T$ frames. The discrete Fourier transform of the $i$-th frame $x_i[n]$ ($1 \le i \le T$, $0 \le n \le N-1$) is:
$$X_i[k] = \sum_{n=0}^{N-1} x_i[n]\, e^{-j 2\pi k n / N}, \quad 0 \le k \le N-1$$
The statistical frame $S[k]$ is then:
$$S[k] = \frac{1}{T} \sum_{i=1}^{T} \left| X_i[k] \right|$$
As shown in Figure 3, the method for distilling of statistics frame is divided into following steps among the step S303:
Step S3031 will carry out discrete Fourier transformation through the signal that step S301, S302 handle;
Step S3032 is with superposeing through same frequency composition in the every frame of signal of discrete Fourier transformation among the step S3031;
Step S3033 asks the frequency spectrum that superposes among the step S3032 on average, obtains the statistics frame of input signal.
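Steps S3031 to S3033 can be sketched as follows. This is an illustration under stated assumptions: the function names are hypothetical, the magnitude of each DFT coefficient is what gets averaged (the text only says the same frequency components are averaged), and a naive O(N^2) transform is used to keep the example dependency-free, where a real system would use an FFT routine.

```python
import cmath


def dft(frame):
    """Naive O(N^2) discrete Fourier transform of one frame."""
    n_points = len(frame)
    return [
        sum(frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_points) for n in range(n_points))
        for k in range(n_points)
    ]


def statistical_frame(frames):
    """Mean, over all frames, of the magnitude of each frequency component."""
    if not frames:
        raise ValueError("at least one frame is required")
    spectra = [dft(frame) for frame in frames]  # S3031: transform every frame
    n_points = len(spectra[0])
    # S3032: sum the same frequency bin across frames (magnitude is an assumption)
    summed = [sum(abs(spectrum[k]) for spectrum in spectra) for k in range(n_points)]
    # S3033: divide by the number of frames to get the average spectrum
    return [value / len(spectra) for value in summed]
```

The result has one value per frequency bin regardless of how many frames the utterance contains, which is what makes the later features "long-term": they summarize the whole utterance rather than individual frames.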
Step S304: compute the log power spectrum. The log power spectrum is extracted from each of the two signals analysed in step S303, and the spectrum of the signal that passed through the denoising filter is subtracted from that of the signal that did not, giving the channel mode noise of the input speech signal:
$$PN[k] = \log\left|S_{X}[k]\right|^{2} - \log\left|S_{\mathrm{Defilter}(X)}[k]\right|^{2}$$
where $S_X[k]$ is the statistical frame of the unfiltered signal $X$, $S_{\mathrm{Defilter}(X)}[k]$ is the statistical frame of the filtered signal, and $\mathrm{Defilter}(\cdot)$ is the denoising filter designed in step S302.
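The subtraction in step S304 can be illustrated with the sketch below. Several parts are assumptions and not taken from the patent: the denoising filter design is not given in this text, so `moving_average_denoise` is a purely hypothetical stand-in for Defilter(); the natural logarithm is used because the base is unspecified; the magnitude spectrum is averaged; and a small floor guards against log(0).

```python
import cmath
import math


def moving_average_denoise(frame, width=3):
    """Hypothetical stand-in for Defilter(); NOT the patent's filter design."""
    half = width // 2
    smoothed = []
    for n in range(len(frame)):
        lo, hi = max(0, n - half), min(len(frame), n + half + 1)
        smoothed.append(sum(frame[lo:hi]) / (hi - lo))
    return smoothed


def statistical_frame(frames):
    """Mean magnitude of each DFT bin across frames (naive O(N^2) DFT)."""
    n_points = len(frames[0])
    acc = [0.0] * n_points
    for frame in frames:
        for k in range(n_points):
            coeff = sum(
                frame[n] * cmath.exp(-2j * cmath.pi * k * n / n_points)
                for n in range(n_points)
            )
            acc[k] += abs(coeff)
    return [value / len(frames) for value in acc]


def log_power(spectrum, floor=1e-12):
    """Log power spectrum; `floor` avoids log(0) and is an added safeguard."""
    return [math.log(max(value * value, floor)) for value in spectrum]


def channel_mode_noise(frames, denoise=moving_average_denoise):
    """Log power of the unfiltered signal minus log power of the filtered one."""
    original = log_power(statistical_frame(frames))
    filtered = log_power(statistical_frame([denoise(frame) for frame in frames]))
    return [o - f for o, f in zip(original, filtered)]
```

Because the difference is taken in the log domain, it measures the ratio of power before and after filtering in each bin; if the filter left the signal unchanged, the result would be zero everywhere.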
Step (4): on the basis of the channel mode noise obtained in the above steps, extract two groups of long-term statistical features: one group is the Legendre polynomial coefficients of orders 0 to 5, and the other is six statistical features of the channel mode noise.
Step S401: extraction of the Legendre polynomial coefficients. The channel mode noise is fitted with Legendre polynomials of orders 0 to 5.
The Legendre polynomial expansion has the following form:
$$PN(x) \approx \sum_{n=0}^{5} L_n P_n(x)$$
where $P_n(x)$ is the Legendre polynomial of order $n$ and $L_n$ is the corresponding Legendre polynomial coefficient. After the channel mode noise has been extracted, it is expanded in Legendre polynomials to obtain the coefficients $L_0$ to $L_5$. Each coefficient captures one aspect of the channel mode noise: $L_0$ is the DC component of the channel mode noise; $L_1$ is the slope of the channel mode noise distribution curve; $L_2$ is the curvature of the curve; $L_3$ is the S-curvature of the curve; $L_4$ and $L_5$ capture finer detail of the curve.
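One way to obtain the coefficients of orders 0 to 5 is an ordinary least-squares fit, sketched below with the standard library only. The function names are hypothetical, and mapping the frequency-bin index linearly onto [-1, 1] (the interval on which Legendre polynomials are defined) is an assumption; the text does not say how the fit is carried out. With NumPy available, `numpy.polynomial.legendre.legfit` performs the same kind of fit.

```python
def legendre_basis(x, degree):
    """Values P_0(x)..P_degree(x) computed with Bonnet's recurrence."""
    values = [1.0, x]
    for n in range(1, degree):
        values.append(((2 * n + 1) * x * values[n] - n * values[n - 1]) / (n + 1))
    return values[: degree + 1]


def solve_linear(matrix, rhs):
    """Solve a square linear system by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, size):
            factor = aug[row][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[row][c] -= factor * aug[col][c]
    solution = [0.0] * size
    for row in range(size - 1, -1, -1):
        tail = sum(aug[row][c] * solution[c] for c in range(row + 1, size))
        solution[row] = (aug[row][size] - tail) / aug[row][row]
    return solution


def legendre_coefficients(noise, degree=5):
    """Least-squares coefficients L_0..L_degree of the channel mode noise curve."""
    count = len(noise)
    if count < degree + 1:
        raise ValueError("need at least degree + 1 samples")
    # Assumption: bin index i is mapped linearly onto x in [-1, 1].
    xs = [-1.0 + 2.0 * i / (count - 1) for i in range(count)]
    basis = [legendre_basis(x, degree) for x in xs]
    # Normal equations (B^T B) L = B^T y of the least-squares problem.
    normal = [
        [sum(b[r] * b[c] for b in basis) for c in range(degree + 1)]
        for r in range(degree + 1)
    ]
    rhs = [sum(b[r] * y for b, y in zip(basis, noise)) for r in range(degree + 1)]
    return solve_linear(normal, rhs)
```

A curve that is exactly a straight line is recovered with only the first two coefficients non-zero, which matches the interpretation of the order-0 coefficient as the DC level and the order-1 coefficient as the slope.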
Step S402: extract the statistical features of the channel mode noise. This group comprises the following six features:
● PN_min: the minimum of the channel mode noise;
● PN_max: the maximum of the channel mode noise;
● PN_mean: the mean of the channel mode noise;
● PN_median: the median of the channel mode noise;
● PN_diff: the difference between the maximum and the minimum;
● PN_stdev: the standard deviation of the channel mode noise.
The two groups of long-term statistical features are merged into a 12-dimensional long-term statistical feature vector, which is used as the feature vector for recording replay attack detection.
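The six statistics and the merged 12-dimensional vector can be sketched as below. The function names are hypothetical; the use of the population standard deviation and the ordering of components inside the vector (Legendre coefficients first, then the statistics in the order listed above) are assumptions, since the text specifies neither.

```python
import statistics

FEATURE_ORDER = ("PN_min", "PN_max", "PN_mean", "PN_median", "PN_diff", "PN_stdev")


def statistical_features(noise):
    """Six summary statistics of the channel mode noise curve."""
    lowest, highest = min(noise), max(noise)
    return {
        "PN_min": lowest,
        "PN_max": highest,
        "PN_mean": statistics.mean(noise),
        "PN_median": statistics.median(noise),
        "PN_diff": highest - lowest,
        # Assumption: population standard deviation; the text does not say which.
        "PN_stdev": statistics.pstdev(noise),
    }


def long_term_feature_vector(legendre_coeffs, noise):
    """Concatenate six Legendre coefficients and six statistics into 12 values."""
    if len(legendre_coeffs) != 6:
        raise ValueError("expected Legendre coefficients of orders 0 to 5")
    stats = statistical_features(noise)
    return list(legendre_coeffs) + [stats[name] for name in FEATURE_ORDER]
```

Whatever the length of the utterance, the classifier therefore sees exactly twelve numbers per recording.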
Step (5): build the SVM channel noise classification decision model, which distinguishes whether the input speech to be recognized is original speech or playback speech. The SVM is trained on positive and negative samples. The positive samples are the long-term statistical features, based on channel mode noise, obtained from original speech signals through steps (2) to (4) above. The negative samples are the corresponding features obtained from playback speech signals through steps (2) to (4).
SVM classification requires that the separating hyperplane not only separate the two classes of samples correctly but also maximize the margin between them. For a sample set $(x_i, y_i)$, $i = 1, \ldots, n$, with $x_i \in \mathbb{R}^d$ and $y_i \in \{-1, +1\}$, the hyperplane can be normalized so that it satisfies:
$$y_i[(w \cdot x_i) + b] - 1 \ge 0, \quad i = 1, \ldots, n$$
The margin then equals $2/\|w\|$, so maximizing the margin is equivalent to minimizing $\|w\|^2$. The hyperplane that satisfies the constraint above and minimizes $\frac{1}{2}\|w\|^2$ is called the optimal separating hyperplane, and the training samples for which the constraint holds with equality are called support vectors.
The problem is solved by the Lagrange optimization method. The Lagrange function is:
$$L(w, b, \alpha) = \frac{1}{2}\|w\|^2 - \sum_{i=1}^{n} \alpha_i \left\{ y_i[(w \cdot x_i) + b] - 1 \right\}$$
This is converted into the Wolfe dual problem: under the constraints
$$\sum_{i=1}^{n} y_i \alpha_i = 0, \quad \alpha_i \ge 0, \quad i = 1, \ldots, n$$
solve for the $\alpha_i$ that maximize the function:
$$Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j (x_i \cdot x_j)$$
Here $\alpha_i$ is the Lagrange multiplier corresponding to the constraint $y_i[(w \cdot x_i) + b] - 1 \ge 0$, $i = 1, \ldots, n$, of the original problem. Let the optimal solution be $\alpha_i^*$, $w^* = \sum_{i=1}^{n} \alpha_i^* y_i x_i$ and $b^*$, and let $x$ be the input data to be classified. The optimal classification function (the output function of the SVM) is then:
$$f(x) = \operatorname{sgn}\left\{ (w^* \cdot x) + b^* \right\} = \operatorname{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^* y_i (x_i \cdot x) + b^* \right\}$$
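In practice speech samples are noisy and cannot be assumed to be perfectly linearly separable, so the SVM classifier is used in the linearly non-separable case. A slack variable $\xi_i \ge 0$ is added to the constraint
$$y_i[(w \cdot x_i) + b] - 1 \ge 0, \quad i = 1, \ldots, n$$
so that it becomes:
$$y_i[(w \cdot x_i) + b] - 1 + \xi_i \ge 0, \quad i = 1, \ldots, n$$
The Lagrange function is then:
$$L(w, b, \xi, \alpha, \mu) = \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left\{ y_i[(w \cdot x_i) + b] - 1 + \xi_i \right\} - \sum_{i=1}^{n} \mu_i \xi_i$$
where $\mu_i \ge 0$ are the multipliers for the constraints $\xi_i \ge 0$. Converting to the Wolfe dual problem, with the inner product replaced by a kernel function $K(x_i, x_j)$, gives:
$$\max_{\alpha}\; Q(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j y_i y_j K(x_i, x_j) \quad \text{subject to} \quad \sum_{i=1}^{n} y_i \alpha_i = 0, \quad 0 \le \alpha_i \le C$$
Here $C$ is a constant that controls the degree of punishment for misclassified samples and is called the penalty factor.
In the linearly non-separable case the output function of the SVM can therefore be expressed as:
$$f(x) = \operatorname{sgn}\left\{ \sum_{i=1}^{n} \alpha_i^* y_i K(x_i, x) + b^* \right\}$$
where $0 \le \alpha_i \le C$, $i = 1, \ldots, n$, $\operatorname{sgn}(\cdot)$ is the sign function, and $K(x_i, x_j)$ is a radial basis inner-product function used as the kernel function of the SVM:
$$K(x, x_i) = \exp(-\lambda \|x - x_i\|), \quad \lambda > 0$$
Different kernel functions may be selected in practice.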
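Once training has produced the multipliers, labels, support vectors and bias, evaluating the SVM output function is straightforward, as the sketch below shows. The function names are hypothetical, the kernel follows the formula as printed above (a plain norm in the exponent; many libraries, LIBSVM among them, use the squared distance instead), and treating a score of exactly zero as +1 is an assumption.

```python
import math


def radial_kernel(x, support_vector, lam=0.0078125):
    """K(x, x_i) = exp(-lam * ||x - x_i||), following the formula in the text."""
    distance = math.sqrt(sum((a - b) ** 2 for a, b in zip(x, support_vector)))
    return math.exp(-lam * distance)


def svm_decide(x, support_vectors, alphas, labels, bias, kernel=radial_kernel):
    """Return +1 (original speech) or -1 (playback speech) for feature vector x."""
    score = bias + sum(
        alpha * label * kernel(x, sv)
        for alpha, label, sv in zip(alphas, labels, support_vectors)
    )
    # Assumption: a score of exactly zero is assigned to the positive class.
    return 1 if score >= 0 else -1
```

Only the support vectors enter the sum, so the cost of classifying one utterance grows with the number of support vectors rather than with the size of the training set.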
The penalty factor $C$ and the kernel parameter $\lambda$ are determined with the SMO (Sequential Minimal Optimization) algorithm and a grid search, and are then used to train the channel noise model. One parameter setting obtained through actual optimization is $C = 0.03125$ and $\lambda = 0.0078125$.
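The reported values are exact powers of two (0.03125 = 2^-5 and 0.0078125 = 2^-7), which is what a grid search over exponentially spaced candidates would produce. The sketch below shows such a search. The grid ranges and step are assumptions, and `evaluate` is a hypothetical callback that would train an SVM (for example with SMO) for one parameter pair and return its validation error; neither is specified in the text.

```python
def power_of_two_grid(low_exp, high_exp, step=2):
    """Candidate values 2**low_exp, 2**(low_exp + step), ..., up to 2**high_exp."""
    return [2.0 ** e for e in range(low_exp, high_exp + 1, step)]


def grid_search(evaluate, c_values, lam_values):
    """Return (error, C, lam) for the pair with the lowest error.

    `evaluate(c, lam)` is a hypothetical callback supplied by the caller.
    """
    best = None
    for c in c_values:
        for lam in lam_values:
            error = evaluate(c, lam)
            if best is None or error < best[0]:
                best = (error, c, lam)
    return best


# Assumed ranges, chosen only so that the reported values lie on the grid.
C_GRID = power_of_two_grid(-5, 15)
LAMBDA_GRID = power_of_two_grid(-15, 3)
```

Each call to `evaluate` involves a full training run, so the cost of the search is the product of the two grid sizes.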
Step (6): classification of original speech and playback speech. Input the speech signal to be recognized, obtain its long-term statistical features based on channel mode noise through steps (2) to (4) above, and finally use the channel noise model built in step (5) to perform recording replay attack detection and output the decision result.
As shown in Fig. 1, the recording replay attack detection system of the present invention comprises:
---an input module 100, for inputting training speech signals or speech signals to be recognized;
---a pre-processing module 200, for pre-processing the speech signal, comprising pre-emphasis, framing and windowing units;
---a channel mode noise extraction module 300, for extracting the channel mode noise from the pre-processed speech signal;
---a long-term statistical feature extraction module 400, for extracting long-term statistical features based on the channel mode noise;
---a channel noise model module 500, for classifying the long-term statistical features of the training speech with an SVM to build the channel noise classification decision model;
---a recognition decision module 600, for using the channel noise model module to decide whether the input speech to be recognized is recording replay attack speech;
---an output module 700, for outputting the decision result for the speech signal to be recognized.
The recording replay attack detection method based on channel mode noise provided by the invention was compared with the method based on sentence similarity comparison on the Authentic and Playback Speech Database (APSD). As shown in Table 1, the method based on channel mode noise has the lower error rate.
Table 1
As shown in Fig. 4, the recording replay attack detectors built with the two methods were each connected to an actual speaker recognition system. On data containing replay attack speech, the speaker recognition system without a replay attack detection module has a very high error rate and very poor security. With the replay attack detection module based on channel mode noise loaded, the system equal error rate is the lowest, at 10.2564%. With the replay attack detection module based on sentence similarity comparison loaded, the system equal error rate is 29.0598%.
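The system error rates quoted above appear to be equal error rates (EER), the operating point at which the false acceptance rate and the false rejection rate coincide. The sketch below shows one simple way such a figure can be computed from trial scores. It is an illustration only: the function name is hypothetical, higher scores are assumed to mean "accept", and the patent does not describe how its error rates were computed.

```python
def equal_error_rate(genuine_scores, attack_scores):
    """Approximate EER by sweeping the threshold over all observed scores.

    A trial is accepted when its score is >= the threshold (assumption).
    """
    thresholds = sorted(set(genuine_scores) | set(attack_scores))
    best_gap, eer = None, None
    for threshold in thresholds:
        # False rejection: genuine trials scoring below the threshold.
        frr = sum(s < threshold for s in genuine_scores) / len(genuine_scores)
        # False acceptance: attack trials scoring at or above the threshold.
        far = sum(s >= threshold for s in attack_scores) / len(attack_scores)
        gap = abs(far - frr)
        if best_gap is None or gap < best_gap:
            best_gap, eer = gap, (far + frr) / 2
    return eer
```

With a finite number of trials the two rates rarely match exactly, so the function reports their midpoint at the threshold where they are closest.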
The recording replay attack detection method based on channel mode noise proposed by the invention is simple to implement, algorithmically efficient, and has a low error rate. It will offer higher efficiency when used on embedded recognition devices and other smart devices.
Claims (8)
1. A recording replay attack detection method based on channel mode noise, characterized in that the method comprises the following steps:
(1) inputting a speech signal to be recognized;
(2) pre-processing the speech signal;
(3) extracting the channel mode noise from the pre-processed speech signal;
(4) extracting long-term statistical features based on the channel mode noise;
(5) classifying the long-term statistical features according to a channel noise classification decision model to obtain the decision result of recording replay attack detection.
2. The recording replay attack detection method according to claim 1, characterized in that the pre-processing in step (2) comprises pre-emphasis, framing and windowing.
3. The recording replay attack detection method according to claim 1, characterized in that step (3) further comprises the following steps:
(31) applying denoising filtering to the pre-processed speech signal;
(32) performing statistical frame analysis separately on the signals before and after denoising filtering;
(33) extracting the log power spectrum of each of the two signals after statistical frame analysis and subtracting one from the other to obtain the channel mode noise of the input speech signal.
4. The recording replay attack detection method according to claim 3, characterized in that the statistical frame is obtained by applying the discrete Fourier transform to the short-time frames of the speech signal and taking the mean of each frequency component across frames.
5. The recording replay attack detection method according to claim 1, characterized in that step (4) further comprises the following steps:
(41) extracting the Legendre polynomial coefficients of orders 0 to 5 of the channel mode noise;
(42) extracting six statistical features of the channel mode noise;
(43) merging the values obtained in the above steps into a 12-dimensional long-term statistical feature vector, which serves as the feature vector for recording replay attack detection.
6. The recording replay attack detection method according to claim 5, characterized in that the six statistical features in step (42) are the minimum, maximum, mean, median and standard deviation of the channel mode noise, and the difference between its maximum and minimum.
7. The recording replay attack detection method according to claim 1, characterized in that building the channel noise classification decision model of step (5) comprises the following steps:
(51) inputting training speech signals;
(52) repeating steps (2) to (4) to obtain the long-term statistical features of the channel mode noise of the training speech;
(53) classifying with a support vector machine (SVM) to build the channel noise classification decision model.
8. A recording replay attack detection system based on channel mode noise, characterized in that it comprises:
---an input module (100), for inputting a training speech signal or a speech signal to be recognized;
---a pre-processing module (200), for pre-processing the training speech signal or the speech signal to be recognized, comprising pre-emphasis, framing and windowing units;
---a channel mode noise extraction module (300), for extracting the channel mode noise from the pre-processed training speech signal or speech signal to be recognized;
---a long-term statistical feature extraction module (400), for extracting the long-term statistical features, based on channel mode noise, of the training speech signal or the speech signal to be recognized;
---a channel noise model module (500), for classifying the long-term statistical features of the training speech signal with an SVM to build the channel noise classification decision model;
---a recognition decision module (600), for classifying the long-term statistical features of the speech signal to be recognized with the channel noise classification decision model to obtain the decision result of recording replay attack detection;
---an output module (700), for outputting the decision result for the speech signal to be recognized.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103305987A CN102436810A (en) | 2011-10-26 | 2011-10-26 | Record replay attack detection method and system based on channel mode noise |
PCT/CN2011/084868 WO2013060079A1 (en) | 2011-10-26 | 2011-12-29 | Record playback attack detection method and system based on channel mode noise |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103305987A CN102436810A (en) | 2011-10-26 | 2011-10-26 | Record replay attack detection method and system based on channel mode noise |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102436810A (en) | 2012-05-02 |
Family
ID=45984833
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2011103305987A Pending CN102436810A (en) | 2011-10-26 | 2011-10-26 | Record replay attack detection method and system based on channel mode noise |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN102436810A (en) |
WO (1) | WO2013060079A1 (en) |
Cited By (23)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102820034A (en) * | 2012-07-16 | 2012-12-12 | 中国民航大学 | Noise sensing and identifying device and method for civil aircraft |
CN104569551A (en) * | 2015-01-08 | 2015-04-29 | 漳州科华技术有限责任公司 | DC component detecting method applied to inversion voltages |
CN105023571A (en) * | 2015-07-28 | 2015-11-04 | 苏州宏展信息科技有限公司 | Voice feature extraction control method for recording pen |
CN105513598A (en) * | 2016-01-14 | 2016-04-20 | 宁波大学 | Playback voice detection method based on distribution of information quantity in frequency domain |
CN105869630A (en) * | 2016-06-27 | 2016-08-17 | 上海交通大学 | Method and system for detecting voice spoofing attack of speakers on basis of deep learning |
CN105913855A (en) * | 2016-04-11 | 2016-08-31 | 宁波大学 | Long window scaling factor-based playback voice attack detection algorithm |
CN106297772A (en) * | 2016-08-24 | 2017-01-04 | 武汉大学 | Detection method is attacked in the playback of voice signal distorted characteristic based on speaker introducing |
WO2017000813A1 (en) * | 2015-06-30 | 2017-01-05 | 芋头科技(杭州)有限公司 | Indoor noise pollution automatic identification and monitoring system |
CN106409298A (en) * | 2016-09-30 | 2017-02-15 | 广东技术师范学院 | Identification method of sound rerecording attack |
CN106531172A (en) * | 2016-11-23 | 2017-03-22 | 湖北大学 | Speaker voice playback identification method and system based on environmental noise change detection |
CN108039176A (en) * | 2018-01-11 | 2018-05-15 | 广州势必可赢网络科技有限公司 | Voiceprint authentication method and device for preventing recording attack and access control system |
CN108281158A (en) * | 2018-01-12 | 2018-07-13 | 平安科技(深圳)有限公司 | Voice biopsy method, server and storage medium based on deep learning |
CN109243487A (en) * | 2018-11-30 | 2019-01-18 | 宁波大学 | A kind of voice playback detection method normalizing normal Q cepstrum feature |
CN109599117A (en) * | 2018-11-14 | 2019-04-09 | 厦门快商通信息技术有限公司 | A kind of audio data recognition methods and human voice anti-replay identifying system |
CN109754817A (en) * | 2017-11-02 | 2019-05-14 | 北京三星通信技术研究有限公司 | signal processing method and terminal device |
CN110299141A (en) * | 2019-07-04 | 2019-10-01 | 苏州大学 | The acoustic feature extracting method of recording replay attack detection in a kind of Application on Voiceprint Recognition |
CN110459226A (en) * | 2019-08-19 | 2019-11-15 | 效生软件科技(上海)有限公司 | A method of voice is detected by vocal print engine or machine sound carries out identity veritification |
CN110718229A (en) * | 2019-11-14 | 2020-01-21 | 国微集团(深圳)有限公司 | Detection method for record playback attack and training method corresponding to detection model |
CN111445904A (en) * | 2018-12-27 | 2020-07-24 | 北京奇虎科技有限公司 | Cloud-based voice control method and device and electronic equipment |
CN111462737A (en) * | 2020-03-26 | 2020-07-28 | 中国科学院计算技术研究所 | Method for training grouping model for voice grouping and voice noise reduction method |
CN112599149A (en) * | 2020-12-10 | 2021-04-02 | 中国传媒大学 | Detection method and device for replay attack voice |
CN113012684A (en) * | 2021-03-04 | 2021-06-22 | 电子科技大学 | Synthesized voice detection method based on voice segmentation |
CN114441029A (en) * | 2022-01-20 | 2022-05-06 | 深圳壹账通科技服务有限公司 | Recording noise detection method, device, equipment and medium of voice labeling system |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
BR102014023647B1 (en) * | 2014-09-24 | 2022-12-06 | Fundacao Cpqd - Centro De Pesquisa E Desenvolvimento Em Telecomunicacoes | METHOD AND SYSTEM FOR FRAUD DETECTION IN APPLICATIONS BASED ON VOICE PROCESSING |
CN105044478B (en) * | 2015-07-23 | 2018-03-13 | 国家电网公司 | A kind of multi channel signals extracting method of transmission line of electricity audible noise |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1445752A (en) * | 2002-03-15 | 2003-10-01 | 松下电器产业株式会社 | Method and device for channel and additivity noise joint compensation in characteristic field |
CN1912993A (en) * | 2005-08-08 | 2007-02-14 | 中国科学院声学研究所 | Voice end detection method based on energy and harmonic |
CN101199006A (en) * | 2005-06-20 | 2008-06-11 | 微软公司 | Multi-sensory speech enhancement using a clean speech prior |
CN101223574A (en) * | 2005-12-08 | 2008-07-16 | 韩国电子通信研究院 | Voice recognition apparatus and method using vocal band signal |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1123863C (en) * | 2000-11-10 | 2003-10-08 | 清华大学 | Information check method based on speed recognition |
US8190437B2 (en) * | 2008-10-24 | 2012-05-29 | Nuance Communications, Inc. | Speaker verification methods and apparatus |
2011
- 2011-10-26: CN application CN2011103305987A, published as CN102436810A (status: Pending)
- 2011-12-29: WO application PCT/CN2011/084868, published as WO2013060079A1 (status: Application Filing)
Non-Patent Citations (1)
Title |
---|
王志锋,贺前华,张雪源,罗海宇,苏卓生: "基于信道模式噪声的录音回放攻击检测", 《华南理工大学学报(自然科学版)》 * |
Cited By (34)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102820034B (en) * | 2012-07-16 | 2014-05-21 | 中国民航大学 | Noise sensing and identifying device and method for civil aircraft |
CN102820034A (en) * | 2012-07-16 | 2012-12-12 | 中国民航大学 | Noise sensing and identifying device and method for civil aircraft |
CN104569551A (en) * | 2015-01-08 | 2015-04-29 | 漳州科华技术有限责任公司 | DC component detecting method applied to inversion voltages |
CN104569551B (en) * | 2015-01-08 | 2016-03-23 | 漳州科华技术有限责任公司 | A kind of DC component detection method being applied to inverter voltage |
WO2017000813A1 (en) * | 2015-06-30 | 2017-01-05 | 芋头科技(杭州)有限公司 | Indoor noise pollution automatic identification and monitoring system |
CN105023571A (en) * | 2015-07-28 | 2015-11-04 | 苏州宏展信息科技有限公司 | Voice feature extraction control method for recording pen |
CN105513598B (en) * | 2016-01-14 | 2019-04-23 | 宁波大学 | A kind of voice playback detection method based on the distribution of frequency domain information amount |
CN105513598A (en) * | 2016-01-14 | 2016-04-20 | 宁波大学 | Playback voice detection method based on distribution of information quantity in frequency domain |
CN105913855A (en) * | 2016-04-11 | 2016-08-31 | 宁波大学 | Long window scaling factor-based playback voice attack detection algorithm |
CN105869630A (en) * | 2016-06-27 | 2016-08-17 | 上海交通大学 | Method and system for detecting voice spoofing attack of speakers on basis of deep learning |
CN105869630B (en) * | 2016-06-27 | 2019-08-02 | 上海交通大学 | Speaker's voice spoofing attack detection method and system based on deep learning |
CN106297772A (en) * | 2016-08-24 | 2017-01-04 | 武汉大学 | Detection method is attacked in the playback of voice signal distorted characteristic based on speaker introducing |
CN106297772B (en) * | 2016-08-24 | 2019-06-25 | 武汉大学 | Replay attack detection method based on the voice signal distorted characteristic that loudspeaker introduces |
CN106409298A (en) * | 2016-09-30 | 2017-02-15 | 广东技术师范学院 | Identification method of sound rerecording attack |
CN106531172B (en) * | 2016-11-23 | 2019-06-14 | 湖北大学 | Speaker's audio playback discrimination method and system based on ambient noise variation detection |
CN106531172A (en) * | 2016-11-23 | 2017-03-22 | 湖北大学 | Speaker voice playback identification method and system based on environmental noise change detection |
CN109754817A (en) * | 2017-11-02 | 2019-05-14 | 北京三星通信技术研究有限公司 | signal processing method and terminal device |
CN108039176B (en) * | 2018-01-11 | 2021-06-18 | 广州势必可赢网络科技有限公司 | Voiceprint authentication method and device for preventing recording attack and access control system |
CN108039176A (en) * | 2018-01-11 | 2018-05-15 | 广州势必可赢网络科技有限公司 | Voiceprint authentication method and device for preventing recording attack and access control system |
CN108281158A (en) * | 2018-01-12 | 2018-07-13 | 平安科技(深圳)有限公司 | Voice biopsy method, server and storage medium based on deep learning |
CN109599117A (en) * | 2018-11-14 | 2019-04-09 | 厦门快商通信息技术有限公司 | A kind of audio data recognition methods and human voice anti-replay identifying system |
CN109243487B (en) * | 2018-11-30 | 2022-12-27 | 宁波大学 | Playback voice detection method for normalized constant Q cepstrum features |
CN109243487A (en) * | 2018-11-30 | 2019-01-18 | 宁波大学 | A kind of voice playback detection method normalizing normal Q cepstrum feature |
CN111445904A (en) * | 2018-12-27 | 2020-07-24 | 北京奇虎科技有限公司 | Cloud-based voice control method and device and electronic equipment |
CN110299141A (en) * | 2019-07-04 | 2019-10-01 | 苏州大学 | The acoustic feature extracting method of recording replay attack detection in a kind of Application on Voiceprint Recognition |
CN110299141B (en) * | 2019-07-04 | 2021-07-13 | 苏州大学 | Acoustic feature extraction method for detecting playback attack of sound record in voiceprint recognition |
CN110459226A (en) * | 2019-08-19 | 2019-11-15 | 效生软件科技(上海)有限公司 | A method of voice is detected by vocal print engine or machine sound carries out identity veritification |
CN110718229A (en) * | 2019-11-14 | 2020-01-21 | 国微集团(深圳)有限公司 | Detection method for record playback attack and training method corresponding to detection model |
CN111462737A (en) * | 2020-03-26 | 2020-07-28 | 中国科学院计算技术研究所 | Method for training grouping model for voice grouping and voice noise reduction method |
CN111462737B (en) * | 2020-03-26 | 2023-08-08 | 中国科学院计算技术研究所 | Method for training grouping model for voice grouping and voice noise reduction method |
CN112599149A (en) * | 2020-12-10 | 2021-04-02 | 中国传媒大学 | Detection method and device for replay attack voice |
CN112599149B (en) * | 2020-12-10 | 2024-06-04 | 中国传媒大学 | Method and device for detecting replay attack voice |
CN113012684A (en) * | 2021-03-04 | 2021-06-22 | 电子科技大学 | Synthesized voice detection method based on voice segmentation |
CN114441029A (en) * | 2022-01-20 | 2022-05-06 | 深圳壹账通科技服务有限公司 | Recording noise detection method, device, equipment and medium of voice labeling system |
Also Published As
Publication number | Publication date |
---|---|
WO2013060079A1 (en) | 2013-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN102436810A (en) | Record replay attack detection method and system based on channel mode noise | |
CN105913855B (en) | A kind of voice playback attack detecting algorithm based on long window scale factor | |
CN102394062B (en) | Method and system for automatically identifying voice recording equipment source | |
Gomez-Alanis et al. | A gated recurrent convolutional neural network for robust spoofing detection | |
Wang et al. | Channel pattern noise based playback attack detection algorithm for speaker recognition | |
CN112201255B (en) | Voice signal spectrum characteristic and deep learning voice spoofing attack detection method | |
WO2012075641A1 (en) | Device and method for pass-phrase modeling for speaker verification, and verification system | |
Paul et al. | Countermeasure to handle replay attacks in practical speaker verification systems | |
CN113192504B (en) | Silent voice attack detection method based on domain adaptation | |
Hassan et al. | Voice spoofing countermeasure for synthetic speech detection | |
CN105513598A (en) | Playback voice detection method based on distribution of information quantity in frequency domain | |
Khan et al. | Battling voice spoofing: a review, comparative analysis, and generalizability evaluation of state-of-the-art voice spoofing counter measures | |
Jin et al. | Speaker verification based on single channel speech separation | |
US20220108702A1 (en) | Speaker recognition method | |
CN114512134A (en) | Method and device for voiceprint information extraction, model training and voiceprint recognition | |
Kooshan et al. | Singer identification by vocal parts detection and singer classification using lstm neural networks | |
CN116863956A (en) | Robust snore detection method and system based on convolutional neural network | |
Ali et al. | Fake audio detection using hierarchical representations learning and spectrogram features | |
Dwijayanti et al. | Speaker identification using a convolutional neural network | |
Wang et al. | Detection of voice transformation spoofing based on dense convolutional network | |
Qin et al. | From Speaker Verification to Deepfake Algorithm Recognition: Our Learned Lessons from ADD2023 Track 3. | |
CN115881093A (en) | Method and system for acquiring voice of target speaker | |
Saleh et al. | Multimodal person identification through the fusion of face and voice biometrics | |
Hajipour et al. | Listening to sounds of silence for audio replay attack detection | |
Ashhad et al. | Improved vehicle sub-type classification for acoustic traffic monitoring |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C02 | Deemed withdrawal of patent application after publication (patent law 2001) | ||
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20120502 |