CN109767776A - Deception speech detection method based on dense neural network - Google Patents

Deception speech detection method based on dense neural network

Info

Publication number
CN109767776A
CN109767776A (application number CN201910033384.XA)
Authority
CN
China
Prior art keywords
deception
layer
dense
formula
network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910033384.XA
Other languages
Chinese (zh)
Other versions
CN109767776B (en)
Inventor
王泳
苏卓艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University
Priority to CN201910033384.XA
Publication of CN109767776A
Application granted
Publication of CN109767776B
Legal status: Active
Anticipated expiration


Abstract

The invention discloses a deception speech detection method based on a dense neural network, and relates in particular to the field of information security. The method comprises the following detection steps. Step 1: build a VT deception speech transformation model, which uses the STFT to break the coupling between the traditional time and frequency characteristics while keeping the rhythm unchanged. Step 2: build a convolutional neural network in which the output of each layer is sent to the next layer as input and transformed by a nonlinear operation. By establishing a dense convolutional network, the invention guarantees maximum information flow between layers and strengthens feature propagation; the dense connections have a regularizing effect that reduces overfitting on tasks with small training sets; and the dense convolutional network allows narrow layers, which greatly reduces the number of parameters, mitigates the degradation problem, supports the reuse of a limited number of neurons, and avoids relearning redundant feature maps, which simplifies training.

Description

Deception speech detection method based on dense neural network
Technical field
The present invention relates to the field of information security, and more particularly to a deception speech detection method based on a dense neural network.
Background technique
In today's society, speech deception is widespread and poses a serious challenge to public security, so identifying disguised speech among genuine speech is very important. Most current research concentrates on voice conversion (VC), speech synthesis and replay attacks. However, there is another deception mode in which speaker A's voice is transformed into some different voice (not a specific target speaker) so that an identification system cannot attribute the speech to A. This kind of transformation is known as VT (Voice Transformation), and it has received much less attention.
The patent application with publication number CN 106875007 A discloses an end-to-end convolutional long short-term memory deep neural network for speech fraud detection. Because the convolutional LSTM deep neural network directly optimizes feature extraction and classification for the task at hand, the learned input representation is more robust and effective, and the detection results improve across the board. Suitable features are assessed directly by training jointly with the classifier, so the model can adapt to any related task. Eliminating the front end greatly simplifies the pipeline, especially API calls: by combining classification and optimization in a single model, no separate classifier or feature extraction method has to be invoked with its own parameters.
In actual use, however, such networks still have drawbacks: as the number of layers increases, degradation may occur, and the conventional connection pattern leaves many layers contributing very little while consuming a large amount of computation.
Summary of the invention
To overcome the above drawbacks of the prior art, embodiments of the present invention provide a deception speech detection method based on a dense neural network. By establishing a dense convolutional network, the method guarantees maximum information flow between layers and strengthens feature propagation; the dense connections have a regularizing effect that reduces overfitting on tasks with small training sets; and the dense convolutional network allows narrow layers, greatly reducing the number of parameters, mitigating the degradation problem, supporting the reuse of a limited number of neurons, and avoiding relearning redundant feature maps, which simplifies training and thereby solves the problems mentioned in the background above.
To achieve the above object, the invention provides the following technical scheme: a deception speech detection method based on a dense neural network, specifically comprising the following detection steps.
Step 1: build the VT deception speech transformation model. The STFT is used to break the coupling between the traditional time and frequency characteristics while keeping the rhythm unchanged, wherein the VT deception can be described as follows.
Assume that xt(n) is a frame of length N of the input speech signal at time t. First, the FFT coefficients of xt(n) are given by formula (1):
where w(n) denotes a Hamming or Hanning window and k denotes the frequency index.
Then the instantaneous magnitude |F(k)| and the instantaneous frequency ω(k) are calculated in formulas (2) and (3), respectively:
where Δ denotes the deviation of the k-th frequency and Fs denotes the sampling frequency.
For VT deception, the instantaneous frequency ω(k) is modified by formula (4), where α denotes the scale factor, i.e. the deception factor:
ω'(k*α) = ω(k)*α, 0 ≤ k < N/2, 0 ≤ k*α < N/2   (4)
Linear interpolation is commonly used to modify the instantaneous magnitude, as shown in formula (5), where 0 ≤ k, k' < N/2, k = ⌊k'/α⌋ and μ = k'/α - k:
|F(k')| = μ|F(k)| + (1 - μ)|F(k+1)|   (5)
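As an illustrative aside (not part of the patent text), the interpolation of formula (5) can be sketched in NumPy; the magnitude array `mags` and the choice of semitone shift are illustrative assumptions:

```python
import numpy as np

def interp_magnitude(F_mag, alpha):
    """Interpolate |F| onto the scaled bin grid k' per formula (5).

    For each target bin k': k = floor(k'/alpha), mu = k'/alpha - k, and
    |F(k')| = mu*|F(k)| + (1 - mu)*|F(k+1)|, exactly as printed in the
    patent (a conventional linear interpolation would swap mu and 1-mu).
    """
    N = len(F_mag)
    out = np.zeros(N)
    for kp in range(N):
        pos = kp / alpha
        k = int(pos)              # k = floor(k'/alpha)
        if k + 1 >= N:            # past the last usable bin pair
            break
        mu = pos - k
        out[kp] = mu * F_mag[k] + (1 - mu) * F_mag[k + 1]
    return out

mags = np.array([1.0, 2.0, 3.0, 4.0, 5.0])            # toy magnitude spectrum
scaled = interp_magnitude(mags, alpha=2 ** (4 / 12))  # s = +4 semitones
```

The loop walks the target bins k' and blends the two neighbouring source bins, which is how the spectral envelope gets compressed or stretched by α.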
Another method of changing the instantaneous magnitude is the energy-preserving modification, as shown in formula (6).
Using the modified instantaneous frequency ω'(k) and instantaneous magnitude |F'(k)| indexed by k, the instantaneous phase φ'(k) is then calculated from ω'(k), and the transformed FFT coefficients are obtained by formula (7):
F'(k) = |F'(k)| e^(jφ'(k))   (7)
Finally, an inverse FFT is applied to F'(k) to obtain the VT speech.
As can be seen from formulas (4) and (5), the VT deception operation changes the spectral magnitudes, so implicit artifacts may be introduced into the deceptive speech signal. The spectrogram of the speech can therefore be used as the input to a deep neural network, from which deep features are extracted for classification. The spectrogram of an input speech signal is obtained by the short-time Fourier transform (STFT) given in formula (8), where the window size is 175 with 50% overlap.
In phonetics, the strength of the VT deception operation is measured by the deception factor α derived from the 12 semitones, as shown in formula (9):
α(s) = 2^(s/12)   (9)
Here s can take any integer value in the range [-12, +12]. A modification that is too weak or too strong causes the deception to fail or sound unnatural; therefore, in the experiments we selected the intermediate intervals [-8, -4] and [+4, +8], which have the strongest deception capability, for testing.
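The semitone-to-factor mapping of formula (9) and the tested intervals can be checked with a short Python sketch (illustrative only, not part of the patent):

```python
def deception_factor(s):
    """Formula (9): alpha(s) = 2**(s/12) for a shift of s semitones."""
    return 2 ** (s / 12)

# Intervals [-8, -4] and [+4, +8], reported to deceive most strongly.
tested_s = list(range(-8, -3)) + list(range(4, 9))
factors = {s: deception_factor(s) for s in tested_s}
```

A full octave (s = ±12) doubles or halves the frequencies, which is exactly the "too strong" end of the range the experiments avoid.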
Step 2: build a convolutional neural network in which the output Xl-1 of the previous layer is sent to the next layer as input and transformed by the nonlinear operation Hl into the output Xl, where Xl can be expressed as follows:
Xl = Hl(Xl-1)   (10)
As the number of layers increases, degradation may occur. Residual networks, highway networks and fractal networks all create a short path Xl-n from an earlier layer to a later layer, which effectively suppresses the degradation phenomenon, as shown in formula (11):
Xl = Hl(Xl-1) + Xl-n   (11)
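The plain feed-forward rule of formula (10) and the short-path rule of formula (11) can be illustrated with a toy NumPy sketch; the weighted-ReLU layer `H` is a stand-in assumption for the real convolutional layers:

```python
import numpy as np

def H(x, w):
    """Toy nonlinear layer (weighted ReLU), standing in for conv + activation."""
    return np.maximum(0.0, w * x)

def plain_layer(x_prev, w):
    # Formula (10): Xl = Hl(Xl-1)
    return H(x_prev, w)

def residual_layer(x_prev, x_skip, w):
    # Formula (11): Xl = Hl(Xl-1) + Xl-n, a short path from an earlier layer
    return H(x_prev, w) + x_skip

x0 = np.array([1.0, -2.0, 3.0])
x1 = plain_layer(x0, w=0.5)
x2 = residual_layer(x1, x0, w=2.0)   # the skip lets x0's values flow through
```

Note how the negative component of x0 survives into x2 only through the skip connection; in the plain chain of formula (10) it is destroyed by the nonlinearity.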
Step 3: performance measurement. The detection accuracy of VT deception is tested on the experimental corpora, where the detection accuracy can be described as follows:
D = (Gd + Sd)/(G + S)
where G and S are the numbers of genuine and deceptive segments in the test set, respectively, and Gd and Sd are the numbers of genuine segments correctly detected from G and deceptive segments correctly detected from S, respectively.
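The detection-accuracy definition translates directly into code; the segment counts below are illustrative, not the patent's experimental numbers:

```python
def detection_accuracy(Gd, Sd, G, S):
    """D = (Gd + Sd) / (G + S): fraction of genuine and deceptive
    segments in the test set that are correctly detected."""
    return (Gd + Sd) / (G + S)

# Illustrative counts only.
D = detection_accuracy(Gd=950, Sd=900, G=1000, S=1000)
```

Because the two error types are pooled into a single ratio, D weights genuine and deceptive segments by their share of the test set.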
In a preferred embodiment, step 2 further includes a dense convolutional network with an improved structure. In the dense convolutional network, any layer is connected directly to all succeeding layers, specifically expressed as
Xl = Hl([X0, X1, ..., Xl-1])
where X0, X1, ..., Xl-1 denote the outputs of all layers preceding layer l and [...] denotes concatenation. In addition, the output of each layer has k feature maps, where k is usually set to a small value.
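The dense connectivity rule Xl = Hl([X0, ..., Xl-1]) and the growth rate k can be illustrated with a toy NumPy sketch (the mean-based layer is a stand-in assumption for the real bottleneck layers):

```python
import numpy as np

def dense_layer(prev_outputs, k):
    """Hl([X0, ..., Xl-1]): concatenate all earlier outputs and emit
    k new feature maps (a toy mean-based map stands in for convolution)."""
    cat = np.concatenate(prev_outputs)   # the [...] concatenation
    return np.full(k, cat.mean())

k = 4                                    # growth rate, "a small value"
outputs = [np.ones(k)]                   # X0 from the initialization layer
input_widths = []
for _ in range(3):                       # three dense layers
    input_widths.append(sum(len(o) for o in outputs))
    outputs.append(dense_layer(outputs, k))
```

The input width grows by only k maps per layer, which is why dense connectivity keeps the layers narrow while still exposing every earlier feature map to every later layer.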
In a preferred embodiment, the input to the dense convolutional network is a set of single-channel spectrograms obtained by the STFT, each of size 90 × 88. The network consists of an initialization layer, three dense blocks, two transition layers, a global pooling layer and a linear layer. The three dense blocks consist of 6, 12 and 48 bottleneck layers, respectively. The linear layer is a fully connected layer followed by a softmax with two outputs, representing the probabilities of "genuine" and "deception". Each bottleneck layer contains 2 convolutional layers, so the entire dense convolutional network contains 2 × (6 + 12 + 48) + 1 + 1 + 1 = 135 convolutional layers.
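The stated layer count can be verified arithmetically; attributing the three "+1" terms to the initialization layer and the two transition layers is an assumption, since the patent does not label them:

```python
# Each bottleneck layer contributes 2 convolutional layers; the dense
# blocks hold 6, 12 and 48 bottlenecks. The three "+1" terms presumably
# cover the initialization layer and the two transition layers.
blocks = [6, 12, 48]
conv_layers = 2 * sum(blocks) + 1 + 1 + 1
```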
In a preferred embodiment, the bottleneck layer consists of a 1 × 1 convolution followed by a 3 × 3 convolution, replacing two 3 × 3 convolutions, and a transition layer connects two adjacent dense blocks to further reduce the size of the feature maps.
In a preferred embodiment, the experimental corpora in step 3 include Timit, NIST and UME, all in WAV format with an 8 kHz sampling rate, 16-bit quantization and a single channel.
In a preferred embodiment, Timit, NIST and UME each comprise a training set and a test set; the training sets are Timit-1, NIST-1 and UME-1, and the test sets are Timit-2, NIST-2 and UME-2, respectively.
Technical effects and advantages of the invention:
By establishing a dense convolutional network, the present invention guarantees maximum information flow between layers and strengthens feature propagation; the dense connections have a regularizing effect that reduces overfitting on tasks with small training sets; and the dense convolutional network allows narrow layers, greatly reducing the number of parameters, mitigating the degradation problem, supporting the reuse of a limited number of neurons, and avoiding relearning redundant feature maps, which simplifies training. As a result, the present invention does not need to manually select one or more specific features and then classify them with a separate classifier, as traditional machine learning methods do; instead, the proposed dense neural network spontaneously extracts the relevant features, from shallow edge features to deep features, and then classifies them, simplifying the whole pipeline while achieving a better result.
Detailed description of the invention
Fig. 1 is the speech detection flowchart of the invention;
Fig. 2 is the dense neural network structure diagram of the invention;
Fig. 3 is the internal structure diagram of the dense neural network of the invention.
Specific embodiment
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only a part of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Embodiment 1
As shown in Figs. 1-3, the present invention provides a deception speech detection method based on a dense neural network, specifically comprising the following detection steps.
Step 1: build the VT deception speech transformation model. The STFT is used to break the coupling between the traditional time and frequency characteristics while keeping the rhythm unchanged, wherein the VT deception can be described as follows.
Assume that xt(n) is a frame of length N of the input speech signal at time t. First, the FFT coefficients of xt(n) are given by formula (1):
where w(n) denotes a Hamming or Hanning window and k denotes the frequency index.
Then the instantaneous magnitude |F(k)| and the instantaneous frequency ω(k) are calculated in formulas (2) and (3), respectively:
where Δ denotes the deviation of the k-th frequency and Fs denotes the sampling frequency.
For VT deception, the instantaneous frequency ω(k) is modified by formula (4), where α denotes the scale factor, i.e. the deception factor:
ω'(k*α) = ω(k)*α, 0 ≤ k < N/2, 0 ≤ k*α < N/2   (4)
Linear interpolation is commonly used to modify the instantaneous magnitude, as shown in formula (5), where 0 ≤ k, k' < N/2, k = ⌊k'/α⌋ and μ = k'/α - k:
|F(k')| = μ|F(k)| + (1 - μ)|F(k+1)|   (5)
Another method of changing the instantaneous magnitude is the energy-preserving modification, as shown in formula (6).
Using the modified instantaneous frequency ω'(k) and instantaneous magnitude |F'(k)| indexed by k, the instantaneous phase φ'(k) is then calculated from ω'(k), and the transformed FFT coefficients are obtained by formula (7):
F'(k) = |F'(k)| e^(jφ'(k))   (7)
Finally, an inverse FFT is applied to F'(k) to obtain the VT speech.
As can be seen from formulas (4) and (5), the VT deception operation changes the spectral magnitudes, so implicit artifacts may be introduced into the deceptive speech signal. The spectrogram of the speech can therefore be used as the input to a deep neural network, from which deep features are extracted for classification. The spectrogram of an input speech signal is obtained by the short-time Fourier transform (STFT) given in formula (8), where the window size is 175 with 50% overlap.
In phonetics, the strength of the VT deception operation is measured by the deception factor α derived from the 12 semitones, as shown in formula (9):
α(s) = 2^(s/12)   (9)
Here s can take any integer value in the range [-12, +12]. A modification that is too weak or too strong causes the deception to fail or sound unnatural; therefore, in the experiments we selected the intermediate intervals [-8, -4] and [+4, +8], which have the strongest deception capability, for testing.
Step 2: build a convolutional neural network (CNN) in which the output Xl-1 of the previous layer is sent to the next layer as input and transformed by the nonlinear operation Hl into the output Xl, where Xl can be expressed as follows:
Xl = Hl(Xl-1)   (10)
As the number of layers increases, degradation may occur. Residual networks (ResNets), highway networks (Highway Networks) and fractal networks (FractalNets) all create a short path Xl-n from an earlier layer to a later layer, which effectively suppresses the degradation phenomenon, as shown in formula (11):
Xl = Hl(Xl-1) + Xl-n   (11)
Step 3: performance measurement. The detection accuracy of VT deception is tested on the experimental corpora, where the detection accuracy can be described as follows:
D = (Gd + Sd)/(G + S)
where G and S are the numbers of genuine and deceptive segments in the test set, respectively, and Gd and Sd are the numbers of genuine segments correctly detected from G and deceptive segments correctly detected from S, respectively.
Further, the experimental corpora in step 3 include Timit (6300 segments, 630 speakers), NIST (3560 segments, 356 speakers) and UME (4040 segments, 202 speakers), all in WAV format with an 8 kHz sampling rate, 16-bit quantization and a single channel.
Further, Timit (6300 segments, 630 speakers), NIST (3560 segments, 356 speakers) and UME (4040 segments, 202 speakers) each comprise a training set and a test set: the training sets are Timit-1 (3000 segments), NIST-1 (2000 segments) and UME-1 (2040 segments), and the test sets are Timit-2 (3300 segments), NIST-2 (1560 segments) and UME-2 (2000 segments), respectively.
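The corpus splits above can be checked for consistency against the stated totals with a short sketch over the numbers given in the description:

```python
# (training segments, test segments, total segments) per corpus,
# as listed in the description.
corpora = {
    "Timit": (3000, 3300, 6300),
    "NIST": (2000, 1560, 3560),
    "UME": (2040, 2000, 4040),
}
splits_consistent = all(tr + te == tot for tr, te, tot in corpora.values())
```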
Embodiment 2
Unlike Embodiment 1, step 2 further includes a dense convolutional network (DenseNet) with an improved structure. In the dense convolutional network (DenseNet), any layer is connected directly to all succeeding layers, specifically expressed as
Xl = Hl([X0, X1, ..., Xl-1])
where X0, X1, ..., Xl-1 denote the outputs of all layers preceding layer l and [...] denotes concatenation. In addition, the output of each layer has k feature maps, where k is usually set to a small value.
Further, the input to the dense convolutional network (DenseNet) is a set of single-channel spectrograms obtained by the STFT, each of size 90 × 88. The network consists of an initialization layer, three dense blocks, two transition layers, a global pooling layer and a linear layer. The three dense blocks consist of 6, 12 and 48 bottleneck layers, respectively. The linear layer is a fully connected layer followed by a softmax with two outputs, representing the probabilities of "genuine" and "deception". Each bottleneck layer contains 2 convolutional layers, so the entire dense convolutional network (DenseNet) contains 2 × (6 + 12 + 48) + 1 + 1 + 1 = 135 convolutional layers, which facilitates the automatic extraction of deep features through the 135-layer dense convolutional network and thereby improves computational efficiency.
Further, the bottleneck layer consists of a 1 × 1 convolution followed by a 3 × 3 convolution, replacing two 3 × 3 convolutions to reduce computation, and a transition layer connects two adjacent dense blocks to further reduce the size of the feature maps.
Based on Embodiment 2, an intra-corpus evaluation and a cross-corpus evaluation are carried out on the test and training sets, respectively:
(1) Intra-corpus evaluation
When the test set and the training set come from the same corpus, the detection results of this method and of the other methods are shown in the table below.
From the data in the table, the average detection accuracy of the proposed method is 2.58% higher than that of the traditional CNN model and 3.66% higher than that of the SVM model, because the decision in the dense convolutional network uses both the deep features and the early edge features, which further improves the accuracy.
(2) Cross-corpus evaluation
In real scenarios, the test speech and the training speech may come from different sources. One of the three corpora is chosen as the test data set and the other two serve as the training set; the experimental results are shown in the table below.
From the data in the table, the results of the first two schemes are both good, but scheme 3 is unsatisfactory. One possible reason is that the amount of NIST data is larger than that of the other two groups shown in Table 1, which suggests that the model trained on NIST has better generalization ability. Moreover, in scheme 1 the accuracy of the GNN method is 94.37% while our accuracy is 96.45%, showing that the proposed method is better than the GNN method.
Finally, it should be noted that the above are only preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the present invention shall be included within its protection scope.

Claims (6)

1. A deception speech detection method based on a dense neural network, characterized in that it specifically comprises the following detection steps:
Step 1: build the VT deception speech transformation model, wherein the STFT is used to break the coupling between the traditional time and frequency characteristics while keeping the rhythm unchanged, and the VT deception can be described as follows:
assume that xt(n) is a frame of length N of the input speech signal at time t; first, the FFT coefficients of xt(n) are given by formula (1):
where w(n) denotes a Hamming or Hanning window and k denotes the frequency index;
then the instantaneous magnitude |F(k)| and the instantaneous frequency ω(k) are calculated in formulas (2) and (3), respectively:
where Δ denotes the deviation of the k-th frequency and Fs denotes the sampling frequency;
for VT deception, the instantaneous frequency ω(k) is modified by formula (4), where α denotes the scale factor, i.e. the deception factor:
ω'(k*α) = ω(k)*α, 0 ≤ k < N/2, 0 ≤ k*α < N/2   (4)
linear interpolation is commonly used to modify the instantaneous magnitude, as shown in formula (5), where 0 ≤ k, k' < N/2, k = ⌊k'/α⌋ and μ = k'/α - k:
|F(k')| = μ|F(k)| + (1 - μ)|F(k+1)|   (5)
another method of changing the instantaneous magnitude is the energy-preserving modification, as shown in formula (6);
using the modified instantaneous frequency ω'(k) and instantaneous magnitude |F'(k)| indexed by k, the instantaneous phase φ'(k) is then calculated from ω'(k), and the transformed FFT coefficients are obtained by formula (7):
F'(k) = |F'(k)| e^(jφ'(k))   (7)
finally, an inverse FFT is applied to F'(k) to obtain the VT speech;
as can be seen from formulas (4) and (5), the VT deception operation changes the spectral magnitudes, so implicit artifacts may be introduced into the deceptive speech signal; the spectrogram of the speech can therefore be used as the input to a deep neural network, from which deep features are extracted for classification, and the spectrogram of an input speech signal is obtained by the short-time Fourier transform (STFT) given in formula (8),
where the window size is 175 with 50% overlap; in phonetics, the strength of the VT deception operation is measured by the deception factor α derived from the 12 semitones, as shown in formula (9):
α(s) = 2^(s/12)   (9)
here s can take any integer value in the range [-12, +12]; a modification that is too weak or too strong causes the deception to fail or sound unnatural; therefore, in the experiments we selected the intermediate intervals [-8, -4] and [+4, +8], which have the strongest deception capability, for testing;
Step 2: build a convolutional neural network in which the output Xl-1 of the previous layer is sent to the next layer as input and transformed by the nonlinear operation Hl into the output Xl, where Xl can be expressed as follows:
Xl = Hl(Xl-1)   (10)
as the number of layers increases, degradation may occur; residual networks, highway networks and fractal networks all create a short path Xl-n from an earlier layer to a later layer, which effectively suppresses the degradation phenomenon, as shown in formula (11):
Xl = Hl(Xl-1) + Xl-n   (11);
Step 3: performance measurement, wherein the detection accuracy of VT deception is tested on the experimental corpora and can be described as follows:
D = (Gd + Sd)/(G + S)
where G and S are the numbers of genuine and deceptive segments in the test set, respectively, and Gd and Sd are the numbers of genuine segments correctly detected from G and deceptive segments correctly detected from S, respectively.
2. The deception speech detection method based on a dense neural network according to claim 1, characterized in that step 2 further includes a dense convolutional network with an improved structure; in the dense convolutional network, any layer is connected directly to all succeeding layers, specifically expressed as
Xl = Hl([X0, X1, ..., Xl-1])
where X0, X1, ..., Xl-1 denote the outputs of all layers preceding layer l and [...] denotes concatenation; in addition, the output of each layer has k feature maps, where k is usually set to a small value.
3. The deception speech detection method based on a dense neural network according to claim 2, characterized in that the input to the dense convolutional network is a set of single-channel spectrograms obtained by the STFT, each of size 90 × 88; the network consists of an initialization layer, three dense blocks, two transition layers, a global pooling layer and a linear layer; the three dense blocks consist of 6, 12 and 48 bottleneck layers, respectively; the linear layer is a fully connected layer followed by a softmax with two outputs representing the probabilities of "genuine" and "deception"; each bottleneck layer contains 2 convolutional layers, so the entire dense convolutional network contains 2 × (6 + 12 + 48) + 1 + 1 + 1 = 135 convolutional layers.
4. The deception speech detection method based on a dense neural network according to claim 3, characterized in that the bottleneck layer consists of a 1 × 1 convolution followed by a 3 × 3 convolution, replacing two 3 × 3 convolutions, and a transition layer connects two adjacent dense blocks to further reduce the size of the feature maps.
5. The deception speech detection method based on a dense neural network according to claim 1, characterized in that the experimental corpora in step 3 include Timit, NIST and UME, all in WAV format with an 8 kHz sampling rate, 16-bit quantization and a single channel.
6. The deception speech detection method based on a dense neural network according to claim 5, characterized in that Timit, NIST and UME each comprise a training set and a test set, wherein the training sets are Timit-1, NIST-1 and UME-1, and the test sets are Timit-2, NIST-2 and UME-2, respectively.
CN201910033384.XA 2019-01-14 2019-01-14 Deception voice detection method based on dense neural network Active CN109767776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910033384.XA CN109767776B (en) 2019-01-14 2019-01-14 Deception voice detection method based on dense neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910033384.XA CN109767776B (en) 2019-01-14 2019-01-14 Deception voice detection method based on dense neural network

Publications (2)

Publication Number Publication Date
CN109767776A (en) 2019-05-17
CN109767776B CN109767776B (en) 2023-12-15

Family

ID=66452939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910033384.XA Active CN109767776B (en) 2019-01-14 2019-01-14 Deception voice detection method based on dense neural network

Country Status (1)

Country Link
CN (1) CN109767776B (en)


Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231277A (en) * 2011-06-29 2011-11-02 电子科技大学 Method for protecting mobile terminal privacy based on voiceprint recognition
US20130138428A1 (en) * 2010-01-07 2013-05-30 The Trustees Of The Stevens Institute Of Technology Systems and methods for automatically detecting deception in human communications expressed in digital form
CN105845127A (en) * 2015-01-13 2016-08-10 阿里巴巴集团控股有限公司 Voice recognition method and system
CN105869630A (en) * 2016-06-27 2016-08-17 上海交通大学 Method and system for detecting voice spoofing attack of speakers on basis of deep learning
CN106875007A (en) * 2017-01-25 2017-06-20 上海交通大学 End-to-end convolutional long short-term memory deep neural network for voice fraud detection
CN107293302A (en) * 2017-06-27 2017-10-24 苏州大学 Sparse spectral feature extraction method for a voice lie detection system
CN108597540A (en) * 2018-04-11 2018-09-28 南京信息工程大学 Speech emotion recognition method based on variational mode decomposition and extreme learning machine
CN108806698A (en) * 2018-03-15 2018-11-13 中山大学 Disguised voice recognition method based on convolutional neural networks


Cited By (11)

Publication number Priority date Publication date Assignee Title
CN110232928A * 2019-06-13 2019-09-13 苏州思必驰信息科技有限公司 Text-independent speaker verification method and device
CN110232928B * 2019-06-13 2021-05-25 思必驰科技股份有限公司 Text-independent speaker verification method and device
CN110211604A * 2019-06-17 2019-09-06 广东技术师范大学 Deep residual network structure for voice deformation detection
CN110390952A * 2019-06-21 2019-10-29 江南大学 Urban sound event classification method based on parallel dual-feature 2-DenseNet
CN110390952B * 2019-06-21 2021-10-22 江南大学 Urban sound event classification method based on parallel dual-feature 2-DenseNet
CN111243621A * 2020-01-14 2020-06-05 四川大学 Construction method of a GRU-SVM deep learning model for synthetic speech detection
CN111933154A * 2020-07-16 2020-11-13 平安科技(深圳)有限公司 Method and device for identifying counterfeit voice, and computer-readable storage medium
WO2021135454A1 * 2020-07-16 2021-07-08 平安科技(深圳)有限公司 Method, device, and computer-readable storage medium for recognizing fake speech
CN111933154B * 2020-07-16 2024-02-13 平安科技(深圳)有限公司 Method, device and computer-readable storage medium for recognizing fake voice
CN113506583A * 2021-06-28 2021-10-15 杭州电子科技大学 Disguised voice detection method using a residual network
CN113506583B * 2021-06-28 2024-01-05 杭州电子科技大学 Disguised voice detection method using a residual network

Also Published As

Publication number Publication date
CN109767776B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
CN109767776A (en) A kind of deception speech detection method based on intensive neural network
CN105139857B (en) Countermeasure method against voice spoofing in automatic speaker identification
CN103617799B (en) English sentence pronunciation quality detection method suitable for mobile devices
CN108564942A (en) Sensitivity-adjustable speech emotion recognition method and system
CN102820033A (en) Voiceprint identification method
CN108711436A (en) Replay attack detection method for speaker verification systems based on high-frequency and bottleneck features
Auckenthaler et al. Improving a GMM speaker verification system by phonetic weighting
CN103578481B (en) Cross-language speech emotion recognition method
CN110120230B (en) Acoustic event detection method and device
JPH1083194A (en) Two-stage group selection method for speaker verification systems
CN109545191B (en) Real-time detection method for the starting position of the singing voice in a song
CN110211604A (en) Deep residual network structure for voice deformation detection
CN106409298A (en) Identification method for sound re-recording attacks
CN106531174A (en) Animal sound recognition method based on wavelet packet decomposition and spectrogram features
CN109346084A (en) Speaker recognition method based on a deep stacked autoencoder network
CN104575519A (en) Feature extraction method and device, and stress detection method and device
CN111611566B (en) Speaker verification system and replay attack detection method thereof
CN109920447B (en) Recording fraud detection method based on adaptive-filter amplitude-phase feature extraction
Fathullah et al. Improved large-margin softmax loss for speaker diarisation
Xiao Adaptive margin circle loss for speaker verification
CN105070300A (en) Speech emotion feature selection method based on speaker normalization
CN112767951A (en) Voice conversion visual detection method based on a deep dense network
CN112349267A (en) Synthesized speech detection method based on attention-mechanism features
CN110415707A (en) Speaker recognition method based on speech feature fusion and GMM
CN108665901A (en) Phoneme/syllable extraction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong.

Applicant after: GUANGDONG POLYTECHNIC NORMAL University

Address before: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong.

Applicant before: GUANGDONG POLYTECHNIC NORMAL University

GR01 Patent grant