CN109767776A - A deception speech detection method based on a dense neural network - Google Patents
A deception speech detection method based on a dense neural network
- Publication number
- CN109767776A (application CN201910033384.XA)
- Authority
- CN
- China
- Prior art keywords
- deception
- layer
- dense
- formula
- network
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
The invention discloses a deception speech detection method based on a dense neural network, relating in particular to the field of information security technology. The method comprises the following detection steps. Step 1: build a VT (voice transformation) spoofing model, which uses the STFT to break the coupling between the traditional time and frequency characteristics while keeping the rhythm unchanged. Step 2: build a convolutional neural network in which the output of each layer is sent to the next layer as input and transformed by a nonlinear operation. By establishing a densely connected convolutional network, the invention guarantees maximum information flow between layers and strengthens feature propagation, and the dense connections have a regularizing effect that reduces overfitting on tasks with smaller training sets. The dense convolutional network also allows the network layers to be narrow, which greatly reduces the number of parameters, mitigates the degradation problem, and supports the reuse of a limited number of neurons without relearning redundant feature maps, making the network easy to train.
Description
Technical field
The present invention relates to the field of information security technology, and more particularly to a deception speech detection method based on a dense neural network.
Background art
In today's society, speech deception is widespread and poses a serious challenge to public security. Distinguishing disguised speech from genuine speech is therefore of great importance. Most current research concentrates on voice conversion (VC), speech synthesis, and replay attacks. However, another mode of speech deception also exists: speaker A's voice is transformed into some different voice (not a specific target speaker's) so that an identification system cannot attribute the speech to A. This kind of transformation is known as VT (Voice Transformation, voice disguise), and it has received far less attention.
The patent application CN 106875007 A discloses a convolutional long short-term memory, end-to-end deep neural network for speech fraud detection. The convolutional LSTM deep neural network can directly optimize feature extraction and classification for the task at hand, so the learned input representation is more robust and effective, and the detection results improve across the board. Suitable features are assessed directly through joint training with the classifier, so the model can adapt to any related task. Because the front end is eliminated, the model greatly simplifies the pipeline, especially API calls: with classification and optimization combined in a single model, no separate classifier or feature-extraction method with additional parameters needs to be invoked.
In actual use, however, this approach still has notable drawbacks. For example, as the number of layers increases, degradation can occur, and this connection pattern leaves many network layers contributing very little while still consuming a large amount of computation.
Summary of the invention
To overcome the above drawbacks of the prior art, embodiments of the present invention provide a deception speech detection method based on a dense neural network. By establishing a densely connected convolutional network, the method guarantees maximum information flow between layers and strengthens feature propagation, and the dense connections have a regularizing effect that reduces overfitting on tasks with smaller training sets. The dense convolutional network also allows the network layers to be narrow, greatly reducing the number of parameters, mitigating the degradation problem, and supporting the reuse of a limited number of neurons without relearning redundant feature maps, which makes training easier, thereby solving the problems raised in the background above.
To achieve the above object, the invention provides the following technical scheme: a deception speech detection method based on a dense neural network, specifically comprising the following detection steps:
Step 1: build the VT spoofing voice transformation model. The STFT is used to break the coupling between the traditional time and frequency characteristics while keeping the rhythm unchanged. VT spoofing can be described as follows:

Assume x_t(n) is a frame of length N of the input speech signal at time t. First, the FFT coefficients of x_t(n) are given by formula (1):

F(k) = Σ_{n=0}^{N-1} x_t(n)·w(n)·e^(-j2πnk/N) (1)

where w(n) denotes a Hamming or Hanning window and k denotes the frequency bin index.

Then the instantaneous magnitude |F(k)| and the instantaneous frequency ω(k) are calculated by formulas (2) and (3), respectively:

|F(k)| = √(Re{F(k)}² + Im{F(k)}²) (2)

ω(k) = (k + Δ)·Fs/N (3)

where Δ denotes the frequency deviation of the k-th bin and Fs denotes the sampling frequency.

For VT spoofing, the instantaneous frequency ω(k) is modified by formula (4), where α denotes the scale factor, i.e. the deception factor:

ω'(k·α) = ω(k)·α, 0 ≤ k < N/2, 0 ≤ k·α < N/2 (4)

Linear interpolation is commonly used to modify the instantaneous magnitude, as shown in formula (5), where 0 ≤ k, k' < N/2, k = ⌊k'/α⌋ and μ = k'/α − k:

|F'(k')| = μ·|F(k)| + (1 − μ)·|F(k+1)| (5)

Another way to change the instantaneous magnitude is the energy-preserving modification shown in formula (6), which uses the modified instantaneous frequency ω'(k) and the modified instantaneous magnitude |F'(k)| indexed by k.

The instantaneous phase φ'(k) is then calculated from the instantaneous frequency ω'(k), and the transformed FFT coefficients are obtained by formula (7):

F'(k) = |F'(k)|·e^(jφ'(k)) (7)

Finally, an inverse FFT is applied to F'(k) to obtain the VT-transformed speech.

As can be seen from formulas (4) and (5), VT spoofing changes the spectral magnitudes, so implicit artifacts may be introduced into the spoofed speech signal. The spectrogram of the speech can therefore be used as the input of a deep neural network, from which deep features are extracted for classification. The spectrogram of an input speech signal is obtained by the short-time Fourier transform (STFT) of formula (8), with a window size of 175 samples and an overlap of 50%.

In phonetics, the strength of VT spoofing is measured by the deception factor α induced by a shift of s semitones (out of the 12 in an octave), as shown in formula (9):

α(s) = 2^(s/12) (9)

where s can take any integer value in the range [−12, +12]. A modification that is too weak or too strong will either fail to deceive or sound unnatural; therefore, in the experiments we selected the intermediate intervals [−8, −4] and [+4, +8], which have the strongest deception capability, for testing;
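To make Step 1 concrete, the following is a minimal NumPy sketch of the VT frequency-scaling operation. It assumes a Hanning window and reuses the original phases for simplicity; the energy-preserving variant of formula (6) and the full phase update of formula (7) are not reproduced, so this is an illustration rather than the patented implementation.

```python
# Minimal sketch of the VT (voice transformation) operation of Step 1.
# Assumptions: Hanning window; original phases reused (a full phase vocoder
# would recompute phi'(k) from the scaled instantaneous frequency).
import numpy as np

def deception_factor(s: int) -> float:
    """Formula (9): alpha(s) = 2^(s/12), s an integer number of semitones."""
    return 2.0 ** (s / 12.0)

def vt_transform_frame(frame: np.ndarray, alpha: float) -> np.ndarray:
    """Scale the frequency axis of one analysis frame by the deception factor."""
    n = len(frame)
    spec = np.fft.rfft(frame * np.hanning(n))   # windowed FFT, as in formula (1)
    mag, phase = np.abs(spec), np.angle(spec)   # |F(k)| and phi(k)
    half = len(spec)

    new_mag = np.zeros_like(mag)
    for k_prime in range(half):                 # target bin k'
        pos = k_prime / alpha                   # source position k'/alpha
        k = int(pos)
        if k + 1 >= half:
            break
        mu = pos - k                            # mu = k'/alpha - k
        # Linear interpolation of formula (5):
        # |F'(k')| = mu*|F(k)| + (1 - mu)*|F(k+1)|, as printed in the text.
        new_mag[k_prime] = mu * mag[k] + (1.0 - mu) * mag[k + 1]

    # Formula (7): rebuild complex coefficients from magnitude and phase,
    # then inverse FFT back to the time domain.
    return np.fft.irfft(new_mag * np.exp(1j * phase), n)

# Example: raise a 175-sample frame by 5 semitones (window size from the text).
frame = np.random.randn(175)
shifted = vt_transform_frame(frame, deception_factor(5))
```

Note that a 175-sample window yields a one-sided FFT of 175 // 2 + 1 = 88 bins, which is consistent with the 88-bin dimension of the 90 × 88 spectrograms used as the network input below.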
Step 2: build a convolutional neural network in which the output X_{l-1} of the previous layer is sent to the next layer as input and mapped by a nonlinear operation H_l to the output X_l, where X_l can be expressed as follows:

X_l = H_l(X_{l-1}) (10)

As the number of layers increases, degradation can occur. Residual networks, highway networks, and fractal networks all create short paths X_{l-n} from earlier layers to later layers, which strongly suppress the degradation phenomenon, as shown in formula (11):

X_l = H_l(X_{l-1}) + X_{l-n} (11)
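As an illustration of formulas (10) and (11), the following PyTorch sketch contrasts a plain layer with a short-path (residual-style) layer; the layer composition (BN-ReLU-Conv) and the choice n = 1 are assumptions made for the example, not details from the text.

```python
# Sketch of formulas (10) and (11): a plain layer X_l = H_l(X_{l-1}) versus a
# short-path layer X_l = H_l(X_{l-1}) + X_{l-n}; here n = 1 (identity skip).
import torch.nn as nn

class PlainLayer(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.h = nn.Sequential(                  # nonlinear operation H_l
            nn.BatchNorm2d(channels), nn.ReLU(inplace=True),
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):
        return self.h(x)                         # formula (10)

class ShortPathLayer(PlainLayer):
    def forward(self, x):
        return self.h(x) + x                     # formula (11) with n = 1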
Step 3: measure the VT spoofing detection performance. The detection accuracy is tested on the experimental corpora, where detection accuracy can be described as follows:

D = (Gd + Sd) / (G + S)

where G and S are the numbers of genuine and spoofed segments in the test set, respectively, and Gd and Sd are the numbers of genuine segments correctly detected from G and spoofed segments correctly detected from S, respectively.
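The accuracy measure of Step 3 amounts to the following few lines of plain Python; the label names "genuine" and "spoof" are assumptions for the example.

```python
# Sketch of the Step 3 detection-accuracy measure D = (Gd + Sd) / (G + S).
def detection_accuracy(y_true, y_pred) -> float:
    """y_true/y_pred: sequences of labels, each 'genuine' or 'spoof'."""
    G = sum(1 for t in y_true if t == "genuine")     # genuine segments
    S = sum(1 for t in y_true if t == "spoof")       # spoofed segments
    Gd = sum(1 for t, p in zip(y_true, y_pred) if t == p == "genuine")
    Sd = sum(1 for t, p in zip(y_true, y_pred) if t == p == "spoof")
    return (Gd + Sd) / (G + S)
```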
In a preferred embodiment, step 2 further includes a densely connected convolutional network with an improved structure. In the dense convolutional network, every layer is directly connected to all subsequent layers, which is specifically expressed as follows:

X_l = H_l([X_0, X_1, ..., X_{l-1}])

where X_0, X_1, ..., X_{l-1} denote the outputs of all layers preceding layer l and [...] denotes the concatenation operation. In addition, the output of each layer has k feature maps, where k (the growth rate) is usually set to a small value.
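A minimal PyTorch sketch of this connectivity rule follows; the composite function H_l (BN-ReLU-Conv) and the growth rate value are illustrative assumptions.

```python
# Sketch of the dense connectivity rule X_l = H_l([X_0, X_1, ..., X_{l-1}]):
# each layer consumes the channel-wise concatenation of all earlier outputs
# and emits k (the growth rate) new feature maps.
import torch
import torch.nn as nn

class DenseLayer(nn.Module):
    def __init__(self, in_channels: int, growth_rate: int):
        super().__init__()
        self.h = nn.Sequential(                   # H_l
            nn.BatchNorm2d(in_channels), nn.ReLU(inplace=True),
            nn.Conv2d(in_channels, growth_rate, kernel_size=3, padding=1),
        )

    def forward(self, features: list) -> torch.Tensor:
        return self.h(torch.cat(features, dim=1))  # [...] = concatenation

# Usage: keep a running list of every layer's output.
x0 = torch.randn(1, 16, 32, 32)
layers = [DenseLayer(16 + i * 12, growth_rate=12) for i in range(3)]
features = [x0]
for layer in layers:
    features.append(layer(features))
```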
In a preferred embodiment, the input to the dense convolutional network is a set of single-channel spectrograms obtained by the STFT, all of size 90 × 88. The network consists of an initialization layer, three dense blocks, two transition layers, a global pooling layer, and a linear layer. The three dense blocks are composed of 6, 12, and 48 bottleneck layers, respectively. The linear layer is a fully connected layer followed by a softmax with two outputs, representing the probabilities of "genuine" and "spoofed". Each convolutional bottleneck layer contains 2 convolutional layers, so the entire dense convolutional network contains 2 × (6 + 12 + 48) + 1 + 1 + 1 = 135 convolutional layers.
In a preferred embodiment, each bottleneck layer consists of a 1 × 1 convolutional layer followed by a 3 × 3 convolutional layer, and a transition layer connects two adjacent dense blocks to further reduce the size of the feature maps.
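Under the layout just described, a candidate implementation can be sketched in PyTorch as follows. The block sizes (6, 12, 48), the 90 × 88 single-channel input, and the two-way output follow the text; the growth rate k, the 4k bottleneck width, and the transition compression factor are assumptions borrowed from the standard DenseNet design.

```python
# Sketch of the network above: initialization conv, three dense blocks of
# 6/12/48 bottleneck layers (each bottleneck = 1x1 conv then 3x3 conv),
# transition layers between blocks, global average pooling, and a 2-way
# linear classifier. Conv count: 1 (init) + 2*(6+12+48) + 2 (transitions) = 135.
import torch
import torch.nn as nn

K = 12  # growth rate (the "small k" mentioned in the text; value assumed)

class Bottleneck(nn.Module):
    def __init__(self, in_ch: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(True),
            nn.Conv2d(in_ch, 4 * K, 1, bias=False),          # 1x1 conv
            nn.BatchNorm2d(4 * K), nn.ReLU(True),
            nn.Conv2d(4 * K, K, 3, padding=1, bias=False),   # 3x3 conv
        )

    def forward(self, x):
        return torch.cat([x, self.net(x)], dim=1)            # dense connection

def dense_block(in_ch: int, n_layers: int):
    layers = [Bottleneck(in_ch + i * K) for i in range(n_layers)]
    return nn.Sequential(*layers), in_ch + n_layers * K

def transition(in_ch: int):
    out_ch = in_ch // 2                                      # shrink feature maps
    return nn.Sequential(
        nn.BatchNorm2d(in_ch), nn.ReLU(True),
        nn.Conv2d(in_ch, out_ch, 1, bias=False),
        nn.AvgPool2d(2),
    ), out_ch

def build_densenet() -> nn.Sequential:
    modules, ch = [nn.Conv2d(1, 2 * K, 3, padding=1, bias=False)], 2 * K
    for i, n in enumerate((6, 12, 48)):                      # three dense blocks
        block, ch = dense_block(ch, n)
        modules.append(block)
        if i < 2:                                            # two transition layers
            trans, ch = transition(ch)
            modules.append(trans)
    modules += [nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                nn.Linear(ch, 2)]                            # softmax applied at loss
    return nn.Sequential(*modules)

# Example: one 90x88 single-channel spectrogram.
logits = build_densenet()(torch.randn(1, 1, 90, 88))
```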
In a preferred embodiment, the experimental corpora in step 3 include Timit, NIST, and UME, all in WAV format with an 8 kHz sampling rate, 16-bit quantization, and a single (mono) channel.
In a preferred embodiment, Timit, NIST, and UME each comprise a training set and a test set, where the training sets are Timit-1, NIST-1, and UME-1, and the test sets are Timit-2, NIST-2, and UME-2, respectively.
Technical effects and advantages of the invention:

By establishing a densely connected convolutional network, the present invention guarantees maximum information flow between layers and strengthens feature propagation, and the dense connections have a regularizing effect that reduces overfitting on tasks with smaller training sets. The dense convolutional network allows the network layers to be narrow, greatly reduces the number of parameters, mitigates the degradation problem, and supports the reuse of a limited number of neurons without relearning redundant feature maps, which makes training easier. As a result, the present invention does not need to manually select one or more specific features and then classify them with a separate classifier, as traditional machine-learning methods do. Instead, the proposed dense neural network spontaneously extracts the relevant features, from shallow edge features through to deep features, and classifies them, simplifying the entire pipeline while achieving better results.
Description of the drawings
Fig. 1 is the speech detection flowchart of the present invention;
Fig. 2 is the structure diagram of the dense neural network of the present invention;
Fig. 3 is the internal structure diagram of the dense neural network of the present invention.
Specific embodiments
The technical solutions in the embodiments of the present invention will be described below clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, rather than all, of the embodiments of the present invention. Based on the embodiments of the present invention, all other embodiments obtained by a person of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
Embodiment 1
The present invention provides a deception speech detection method based on a dense neural network as shown in Figs. 1-3, specifically comprising the following detection steps:
Step 1: build the VT spoofing voice transformation model. The STFT is used to break the coupling between the traditional time and frequency characteristics while keeping the rhythm unchanged. VT spoofing can be described as follows:

Assume x_t(n) is a frame of length N of the input speech signal at time t. First, the FFT coefficients of x_t(n) are given by formula (1):

F(k) = Σ_{n=0}^{N-1} x_t(n)·w(n)·e^(-j2πnk/N) (1)

where w(n) denotes a Hamming or Hanning window and k denotes the frequency bin index.

Then the instantaneous magnitude |F(k)| and the instantaneous frequency ω(k) are calculated by formulas (2) and (3), respectively:

|F(k)| = √(Re{F(k)}² + Im{F(k)}²) (2)

ω(k) = (k + Δ)·Fs/N (3)

where Δ denotes the frequency deviation of the k-th bin and Fs denotes the sampling frequency.

For VT spoofing, the instantaneous frequency ω(k) is modified by formula (4), where α denotes the scale factor, i.e. the deception factor:

ω'(k·α) = ω(k)·α, 0 ≤ k < N/2, 0 ≤ k·α < N/2 (4)

Linear interpolation is commonly used to modify the instantaneous magnitude, as shown in formula (5), where 0 ≤ k, k' < N/2, k = ⌊k'/α⌋ and μ = k'/α − k:

|F'(k')| = μ·|F(k)| + (1 − μ)·|F(k+1)| (5)

Another way to change the instantaneous magnitude is the energy-preserving modification shown in formula (6), which uses the modified instantaneous frequency ω'(k) and the modified instantaneous magnitude |F'(k)| indexed by k.

The instantaneous phase φ'(k) is then calculated from the instantaneous frequency ω'(k), and the transformed FFT coefficients are obtained by formula (7):

F'(k) = |F'(k)|·e^(jφ'(k)) (7)

Finally, an inverse FFT is applied to F'(k) to obtain the VT-transformed speech.

As can be seen from formulas (4) and (5), VT spoofing changes the spectral magnitudes, so implicit artifacts may be introduced into the spoofed speech signal. The spectrogram of the speech can therefore be used as the input of a deep neural network, from which deep features are extracted for classification. The spectrogram of an input speech signal is obtained by the short-time Fourier transform (STFT) of formula (8), with a window size of 175 samples and an overlap of 50%.

In phonetics, the strength of VT spoofing is measured by the deception factor α induced by a shift of s semitones (out of the 12 in an octave), as shown in formula (9):

α(s) = 2^(s/12) (9)

where s can take any integer value in the range [−12, +12]. A modification that is too weak or too strong will either fail to deceive or sound unnatural; therefore, in the experiments we selected the intermediate intervals [−8, −4] and [+4, +8], which have the strongest deception capability, for testing;
Step 2: build a convolutional neural network (CNN) in which the output X_{l-1} of the previous layer is sent to the next layer as input and mapped by a nonlinear operation H_l to the output X_l, where X_l can be expressed as follows:

X_l = H_l(X_{l-1}) (10)

As the number of layers increases, degradation can occur. Residual networks (ResNets), highway networks (Highway Networks), and fractal networks (FractalNets) all create short paths X_{l-n} from earlier layers to later layers, which strongly suppress the degradation phenomenon, as shown in formula (11):

X_l = H_l(X_{l-1}) + X_{l-n} (11)
Step 3: measure the VT spoofing detection performance. The detection accuracy is tested on the experimental corpora, where detection accuracy can be described as follows:

D = (Gd + Sd) / (G + S)

where G and S are the numbers of genuine and spoofed segments in the test set, respectively, and Gd and Sd are the numbers of genuine segments correctly detected from G and spoofed segments correctly detected from S, respectively.
Further, the experimental corpora in step 3 include Timit (6300 segments, 630 speakers), NIST (3560 segments, 356 speakers), and UME (4040 segments, 202 speakers), all in WAV format with an 8 kHz sampling rate, 16-bit quantization, and a single channel.
Further, Timit (6300 segments, 630 speakers), NIST (3560 segments, 356 speakers), and UME (4040 segments, 202 speakers) each comprise a training set and a test set, where the training sets are Timit-1 (3000 segments), NIST-1 (2000 segments), and UME-1 (2040 segments), and the test sets are Timit-2 (3300 segments), NIST-2 (1560 segments), and UME-2 (2000 segments), respectively.
Embodiment 2
Unlike Embodiment 1, step 2 further includes a densely connected convolutional network (DenseNet) with an improved structure. In the DenseNet, every layer is directly connected to all subsequent layers, which is specifically expressed as follows:

X_l = H_l([X_0, X_1, ..., X_{l-1}])

where X_0, X_1, ..., X_{l-1} denote the outputs of all layers preceding layer l and [...] denotes the concatenation operation. In addition, the output of each layer has k feature maps, where k is usually set to a small value.
Further, the DenseNet input is a set of single-channel spectrograms obtained by the STFT, all of size 90 × 88. The network consists of an initialization layer, three dense blocks, two transition layers, a global pooling layer, and a linear layer. The three dense blocks are composed of 6, 12, and 48 bottleneck layers, respectively. The linear layer is a fully connected layer followed by a softmax with two outputs, representing the probabilities of "genuine" and "spoofed". Each convolutional bottleneck layer contains 2 convolutional layers, so the entire DenseNet contains 2 × (6 + 12 + 48) + 1 + 1 + 1 = 135 convolutional layers. Such a 135-layer dense convolutional network is conducive to automatically extracting deep features, thereby improving computational efficiency.
Further, each bottleneck layer consists of a 1 × 1 convolutional layer followed by a 3 × 3 convolutional layer to reduce computation, and a transition layer connects two adjacent dense blocks to further reduce the size of the feature maps.
Based on Embodiment 2, intra-corpus and cross-corpus evaluations are carried out on the test and training sets, respectively:

(1) Intra-corpus evaluation

In the intra-corpus setting, the test set and the training set come from the same corpus. The detection results of this method and of other methods are shown in the following table. As the data in the table show, the average detection accuracy of the proposed method is 2.58% higher than that of the traditional CNN model and 3.66% higher than that of the SVM model, because decisions in the dense convolutional network draw on both deep features and early edge features, which further improves the accuracy.
(2) Cross-corpus evaluation

In real-world scenarios, the test speech and the training speech may come from different sources. One of the three corpora is chosen as the test data set while the other two serve as the training set, and the experimental results are shown in the following table. As the data show, the results in the first two configurations are good, but scheme 3 is unsatisfactory. A possible reason is that the amount of NIST data is larger than that of the other two groups shown in Table 1, indicating that a model trained on NIST has better generalization ability. Moreover, in the GNN method the accuracy of scheme 1 is 94.37%, whereas our accuracy is 96.45%, showing that the proposed method outperforms the GNN method.
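The cross-corpus protocol above can be sketched as a simple leave-one-corpus-out loop. In this sketch, `corpora` maps corpus names to their data, and `train_model` and `evaluate` are hypothetical placeholders standing in for the DenseNet training routine and the accuracy measure D of Step 3.

```python
# Sketch of the cross-corpus evaluation: each corpus in turn is held out as
# the test set while the other two form the training set.
def cross_corpus_evaluation(corpora: dict, train_model, evaluate) -> dict:
    results = {}
    for test_name in corpora:
        train_parts = [data for name, data in corpora.items()
                       if name != test_name]          # the other two corpora
        model = train_model(train_parts)
        results[test_name] = evaluate(model, corpora[test_name])  # accuracy D
    return results

# Example wiring (each scheme: train on two corpora, test on the third):
# results = cross_corpus_evaluation({"Timit": timit, "NIST": nist, "UME": ume},
#                                   train_model, evaluate)
```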
Finally, it should be noted that the foregoing are only preferred embodiments of the present invention and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.
Claims (6)
1. A deception speech detection method based on a dense neural network, characterized by comprising the following detection steps:

Step 1: build the VT spoofing voice transformation model: use the STFT to break the coupling between the traditional time and frequency characteristics while keeping the rhythm unchanged, wherein VT spoofing can be described as follows:

assume x_t(n) is a frame of length N of the input speech signal at time t; first, the FFT coefficients of x_t(n) are given by formula (1):

F(k) = Σ_{n=0}^{N-1} x_t(n)·w(n)·e^(-j2πnk/N) (1)

wherein w(n) denotes a Hamming or Hanning window and k denotes the frequency bin index;

then the instantaneous magnitude |F(k)| and the instantaneous frequency ω(k) are calculated by formulas (2) and (3), respectively:

|F(k)| = √(Re{F(k)}² + Im{F(k)}²) (2)

ω(k) = (k + Δ)·Fs/N (3)

wherein Δ denotes the frequency deviation of the k-th bin and Fs denotes the sampling frequency;

for VT spoofing, the instantaneous frequency ω(k) is modified by formula (4), wherein α denotes the scale factor, i.e. the deception factor:

ω'(k·α) = ω(k)·α, 0 ≤ k < N/2, 0 ≤ k·α < N/2 (4)

linear interpolation is commonly used to modify the instantaneous magnitude, as shown in formula (5), wherein 0 ≤ k, k' < N/2, k = ⌊k'/α⌋ and μ = k'/α − k:

|F'(k')| = μ·|F(k)| + (1 − μ)·|F(k+1)| (5)

another way to change the instantaneous magnitude is the energy-preserving modification shown in formula (6), which uses the modified instantaneous frequency ω'(k) and the modified instantaneous magnitude |F'(k)| indexed by k;

the instantaneous phase φ'(k) is then calculated from the instantaneous frequency ω'(k), and the transformed FFT coefficients are obtained by formula (7):

F'(k) = |F'(k)|·e^(jφ'(k)) (7)

finally, an inverse FFT is applied to F'(k) to obtain the VT-transformed speech;

as can be seen from formulas (4) and (5), VT spoofing changes the spectral magnitudes, so implicit artifacts may be introduced into the spoofed speech signal; the spectrogram of the speech can be used as the input of a deep neural network, from which deep features are extracted for classification, the spectrogram of an input speech signal being obtained by the short-time Fourier transform (STFT) of formula (8), with a window size of 175 samples and an overlap of 50%; in phonetics, the strength of VT spoofing is measured by the deception factor α induced by a shift of s semitones (out of the 12 in an octave), as shown in formula (9):

α(s) = 2^(s/12) (9)

wherein s can take any integer value in the range [−12, +12]; a modification that is too weak or too strong will either fail to deceive or sound unnatural, so in the experiments the intermediate intervals [−8, −4] and [+4, +8], which have the strongest deception capability, are selected for testing;

Step 2: build a convolutional neural network in which the output X_{l-1} of the previous layer is sent to the next layer as input and mapped by a nonlinear operation H_l to the output X_l, wherein X_l can be expressed as follows:

X_l = H_l(X_{l-1}) (10)

as the number of layers increases, degradation can occur; residual networks, highway networks, and fractal networks all create short paths X_{l-n} from earlier layers to later layers, which strongly suppress the degradation phenomenon, as shown in formula (11):

X_l = H_l(X_{l-1}) + X_{l-n} (11);

Step 3: measure the VT spoofing detection performance: the detection accuracy is tested on the experimental corpora, wherein detection can be described as follows:

D = (Gd + Sd) / (G + S)

wherein G and S are the numbers of genuine and spoofed segments in the test set, respectively, and Gd and Sd are the numbers of genuine segments correctly detected from G and spoofed segments correctly detected from S, respectively.
2. The deception speech detection method based on a dense neural network according to claim 1, characterized in that: step 2 further includes a densely connected convolutional network with an improved structure, in which every layer is directly connected to all subsequent layers, specifically expressed as follows:

X_l = H_l([X_0, X_1, ..., X_{l-1}])

wherein X_0, X_1, ..., X_{l-1} denote the outputs of all layers preceding layer l and [...] denotes the concatenation operation; in addition, the output of each layer has k feature maps, wherein k is usually set to a small value.
3. The deception speech detection method based on a dense neural network according to claim 2, characterized in that: the input to the dense convolutional network is a set of single-channel spectrograms obtained by the STFT, all of size 90 × 88, and the network consists of an initialization layer, three dense blocks, two transition layers, a global pooling layer, and a linear layer; the three dense blocks are composed of 6, 12, and 48 bottleneck layers, respectively; the linear layer is a fully connected layer followed by a softmax with two outputs, representing the probabilities of "genuine" and "spoofed"; each convolutional bottleneck layer contains 2 convolutional layers, so the entire dense convolutional network contains 2 × (6 + 12 + 48) + 1 + 1 + 1 = 135 convolutional layers.
4. The deception speech detection method based on a dense neural network according to claim 3, characterized in that: each bottleneck layer consists of a 1 × 1 convolutional layer followed by a 3 × 3 convolutional layer, and a transition layer connects two adjacent dense blocks to further reduce the size of the feature maps.
5. The deception speech detection method based on a dense neural network according to claim 1, characterized in that: the experimental corpora in step 3 include Timit, NIST, and UME, all in WAV format with an 8 kHz sampling rate, 16-bit quantization, and a single channel.
6. The deception speech detection method based on a dense neural network according to claim 5, characterized in that: Timit, NIST, and UME each comprise a training set and a test set, wherein the training sets are Timit-1, NIST-1, and UME-1, and the test sets are Timit-2, NIST-2, and UME-2, respectively.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910033384.XA CN109767776B (en) | 2019-01-14 | 2019-01-14 | Deception voice detection method based on dense neural network |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910033384.XA CN109767776B (en) | 2019-01-14 | 2019-01-14 | Deception voice detection method based on dense neural network |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109767776A true CN109767776A (en) | 2019-05-17 |
CN109767776B CN109767776B (en) | 2023-12-15 |
Family
ID=66452939
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910033384.XA Active CN109767776B (en) | 2019-01-14 | 2019-01-14 | Deception voice detection method based on dense neural network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109767776B (en) |
- 2019-01-14: Application CN201910033384.XA filed; granted as CN109767776B (status: Active)
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130138428A1 (en) * | 2010-01-07 | 2013-05-30 | The Trustees Of The Stevens Institute Of Technology | Systems and methods for automatically detecting deception in human communications expressed in digital form |
CN102231277A (en) * | 2011-06-29 | 2011-11-02 | 电子科技大学 | Method for protecting mobile terminal privacy based on voiceprint recognition |
CN105845127A (en) * | 2015-01-13 | 2016-08-10 | 阿里巴巴集团控股有限公司 | Voice recognition method and system |
CN105869630A (en) * | 2016-06-27 | 2016-08-17 | 上海交通大学 | Method and system for detecting voice spoofing attack of speakers on basis of deep learning |
CN106875007A (en) * | 2017-01-25 | 2017-06-20 | 上海交通大学 | End-to-end deep neural network is remembered based on convolution shot and long term for voice fraud detection |
CN107293302A (en) * | 2017-06-27 | 2017-10-24 | 苏州大学 | A kind of sparse spectrum signature extracting method being used in voice lie detection system |
CN108806698A (en) * | 2018-03-15 | 2018-11-13 | 中山大学 | A kind of camouflage audio recognition method based on convolutional neural networks |
CN108597540A (en) * | 2018-04-11 | 2018-09-28 | 南京信息工程大学 | A kind of speech-emotion recognition method based on variation mode decomposition and extreme learning machine |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110232928A (en) * | 2019-06-13 | 2019-09-13 | 苏州思必驰信息科技有限公司 | The unrelated speaker validation method of text and device |
CN110232928B (en) * | 2019-06-13 | 2021-05-25 | 思必驰科技股份有限公司 | Text-independent speaker verification method and device |
CN110211604A (en) * | 2019-06-17 | 2019-09-06 | 广东技术师范大学 | A kind of depth residual error network structure for voice deformation detection |
CN110390952A (en) * | 2019-06-21 | 2019-10-29 | 江南大学 | City sound event classification method based on bicharacteristic 2-DenseNet parallel connection |
CN110390952B (en) * | 2019-06-21 | 2021-10-22 | 江南大学 | City sound event classification method based on dual-feature 2-DenseNet parallel connection |
CN111243621A (en) * | 2020-01-14 | 2020-06-05 | 四川大学 | Construction method of GRU-SVM deep learning model for synthetic speech detection |
CN111933154A (en) * | 2020-07-16 | 2020-11-13 | 平安科技(深圳)有限公司 | Method and device for identifying counterfeit voice and computer readable storage medium |
WO2021135454A1 (en) * | 2020-07-16 | 2021-07-08 | 平安科技(深圳)有限公司 | Method, device, and computer-readable storage medium for recognizing fake speech |
CN111933154B (en) * | 2020-07-16 | 2024-02-13 | 平安科技(深圳)有限公司 | Method, equipment and computer readable storage medium for recognizing fake voice |
CN113506583A (en) * | 2021-06-28 | 2021-10-15 | 杭州电子科技大学 | Disguised voice detection method using residual error network |
CN113506583B (en) * | 2021-06-28 | 2024-01-05 | 杭州电子科技大学 | Camouflage voice detection method using residual error network |
Also Published As
Publication number | Publication date |
---|---|
CN109767776B (en) | 2023-12-15 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109767776A (en) | A deception speech detection method based on a dense neural network | |
CN105139857B (en) | For the countercheck of voice deception in a kind of automatic Speaker Identification | |
CN103617799B (en) | A kind of English statement pronunciation quality detection method being adapted to mobile device | |
CN108564942A (en) | One kind being based on the adjustable speech-emotion recognition method of susceptibility and system | |
CN102820033A (en) | Voiceprint identification method | |
CN108711436A (en) | Speaker verification's system Replay Attack detection method based on high frequency and bottleneck characteristic | |
Auckenthaler et al. | Improving a GMM speaker verification system by phonetic weighting | |
CN103578481B (en) | A kind of speech-emotion recognition method across language | |
CN110120230B (en) | Acoustic event detection method and device | |
JPH1083194A (en) | Two-stage group selection method for speaker collation system | |
CN109545191B (en) | Real-time detection method for initial position of human voice in song | |
CN110211604A (en) | A kind of depth residual error network structure for voice deformation detection | |
CN106409298A (en) | Identification method of sound rerecording attack | |
CN106531174A (en) | Animal sound recognition method based on wavelet packet decomposition and spectrogram features | |
CN109346084A (en) | Method for distinguishing speek person based on depth storehouse autoencoder network | |
CN104575519A (en) | Feature extraction method and device as well as stress detection method and device | |
CN111611566B (en) | Speaker verification system and replay attack detection method thereof | |
CN109920447B (en) | Recording fraud detection method based on adaptive filter amplitude phase characteristic extraction | |
Fathullah et al. | Improved large-margin softmax loss for speaker diarisation | |
Xiao | Adaptive margin circle loss for speaker verification | |
CN105070300A (en) | Voice emotion characteristic selection method based on speaker standardization change | |
CN112767951A (en) | Voice conversion visual detection method based on deep dense network | |
CN112349267A (en) | Synthesized voice detection method based on attention mechanism characteristics | |
CN110415707A (en) | A kind of method for distinguishing speek person based on phonetic feature fusion and GMM | |
CN108665901A (en) | A kind of phoneme/syllable extracting method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication ||
SE01 | Entry into force of request for substantive examination ||
CB02 | Change of applicant information |
Address after: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong. Applicant after: GUANGDONG POLYTECHNIC NORMAL University Address before: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong. Applicant before: GUANGDONG POLYTECHNIC NORMAL University |
|
GR01 | Patent grant | ||