CN109767776B - Deception voice detection method based on dense neural network - Google Patents

Deception voice detection method based on dense neural network

Info

Publication number
CN109767776B
CN109767776B (application CN201910033384.XA)
Authority
CN
China
Prior art keywords
speech
neural network
layer
dense
spoofed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910033384.XA
Other languages
Chinese (zh)
Other versions
CN109767776A (en)
Inventor
王泳
苏卓艺
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Polytechnic Normal University
Original Assignee
Guangdong Polytechnic Normal University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Polytechnic Normal University filed Critical Guangdong Polytechnic Normal University
Priority to CN201910033384.XA
Publication of CN109767776A
Application granted
Publication of CN109767776B
Legal status: Active

Landscapes

  • Circuit For Audible Band Transducer (AREA)
  • Complex Calculations (AREA)

Abstract

The invention discloses a deception voice detection method based on a dense neural network, relating to the technical field of information security. The method comprises the following detection steps. Step one: construct a VT spoofed speech conversion model, in which the STFT is used to break the conventional coupling between time and frequency characteristics while keeping the rhythm unchanged. Step two: construct a convolutional neural network in which the output of each layer is passed to the next layer as input and transformed into that layer's output by a nonlinear operation. By building a dense convolutional network, the invention maximizes information flow between layers and strengthens feature propagation; the dense connections have a regularizing effect that reduces overfitting on tasks with small training sets; the network layers can be made narrower, markedly reducing the number of parameters and alleviating degradation; and limited neurons are reused, so redundant feature maps need not be relearned, which eases training.

Description

Deception voice detection method based on dense neural network
Technical Field
The invention relates to the technical field of information security, in particular to a deception voice detection method based on a dense neural network.
Background
Speech fraud is common in today's society and poses great challenges to social security, so distinguishing disguised voices from genuine ones is important. Most research focuses on voice conversion (VC), speech synthesis and replay attacks. However, another spoofing method exists: changing the voice of speaker A into a different voice (no target speaker is needed) so that a recognition system can no longer attribute the speech to A. This transformation is called VT (Voice Transformation, or voice morphing), and it has received far less attention.
The invention patent with application publication number CN 106875007 A discloses a convolutional long short-term memory end-to-end deep neural network for speech spoofing detection. The adopted network can directly optimize feature extraction and classification for the task at hand, so that a given input is represented more robustly and effectively and the detection result is improved overall; suitable features are evaluated directly through joint classifier training, so the model can adapt to any related task; and removing the front-end program greatly simplifies the pipeline, in particular the API calls: by combining classification and optimization within a single model, that invention eliminates the need to invoke multiple parameters for separate classifiers and feature extraction methods.
However, in practical use there are still many drawbacks: degradation occurs as the number of layers increases, and this connection pattern leaves many network layers contributing little while consuming a great deal of computation.
Disclosure of Invention
In order to overcome the above drawbacks of the prior art, an embodiment of the present invention provides a spoofed speech detection method based on a dense neural network. By building a dense convolutional network, the method maximizes information flow between layers and strengthens feature propagation, while the dense connections have a regularizing effect that reduces overfitting on tasks with small training sets. The dense convolutional network can also narrow the network layers, significantly reducing the number of parameters, alleviating degradation, and supporting reuse of limited neurons without relearning redundant feature maps, which eases training and addresses the problems identified in the background art above.
In order to achieve the above purpose, the present invention provides the following technical solutions: a deception voice detection method based on a dense neural network specifically comprises the following detection steps:
Step one: construction of the VT spoofed speech conversion model: the STFT is used to break the conventional coupling between time and frequency characteristics while keeping the rhythm unchanged, where VT spoofed speech can be described as follows:
Let x_t(n) be a frame of length N taken from the input speech signal at time t. It is first windowed by w(n), and an FFT is then applied to the windowed signal to obtain F(k), as given by equation (1):
F(k) = Σ_{n=0}^{N−1} x_t(n)·w(n)·e^{−j2πkn/N}   (1)
where w(n) denotes a Hamming or Hanning window and k denotes the frequency bin index.
The instantaneous magnitude |F(k)| and the instantaneous frequency ω(k) are then computed as shown in equations (2) and (3), respectively:
|F(k)| = √(Re{F(k)}² + Im{F(k)}²)   (2)
ω(k) = (k + δ)·F_S/N   (3)
where δ denotes the deviation of the k-th frequency bin and F_S denotes the sampling frequency.
For VT spoofed speech, the instantaneous frequency ω(k) is modified according to equation (4), where α denotes a scale factor, i.e. the spoofing factor:
ω′(k) = α·ω(k)   (4)
Linear interpolation is used to modify the instantaneous magnitudes, as shown in equation (5):
|F′(k′)| = (1 − μ)·|F(k)| + μ·|F(k+1)|   (5)
where 0 ≤ k, k′ < N/2, k = ⌊k′/α⌋ and μ = k′/α − k.
Another method of modifying the instantaneous magnitude is the energy-preserving correction shown in equation (6).
For simplicity, k is still used as the frequency bin index for the modified instantaneous frequency ω′ and instantaneous magnitude F′.
The instantaneous phase φ′(k) is then computed from the instantaneous frequency ω′(k), and the converted FFT coefficients are obtained through equation (7):
F′(k) = |F′(k)|·e^{jφ′(k)}   (7)
Finally, an inverse FFT is applied to F′(k) to obtain the VT spoofed speech.
As equations (4) and (5) show, VT spoofing changes the spectral amplitude, so implicit features may be introduced into the spoofed speech. These depth features can be classified by using the spectrogram of the speech as input to a deep neural network, and a spectrogram of the input speech signal is obtained via the short-time Fourier transform (STFT), as given in equation (8),
where the window size is 175 and the overlap is 50%. The VT spoofing perturbation is measured by a spoofing factor α spanning 12 semitones, as shown in equation (9):
α(s) = 2^{s/12}   (9)
where s takes any integer value in the range [−12, +12];
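For concreteness, the following is a minimal numpy sketch of the VT transformation described in step one, written as a phase-vocoder-style pitch shifter. The frame length, hop size and function names are illustrative assumptions, not values from the patent; only the spoofing factor of equation (9) and the window-175, 50%-overlap spectrogram of equation (8) come from the text above.

import numpy as np
from scipy.signal import stft

def vt_spoof(x, s, n_fft=1024, hop=256, fs=8000):
    """Shift the voice of signal x by s semitones (spoofing factor alpha = 2**(s/12))."""
    alpha = 2.0 ** (s / 12.0)                        # equation (9)
    w = np.hanning(n_fft)                            # w(n): a Hanning window
    n_bins = n_fft // 2 + 1
    bin_freq = np.arange(n_bins) * fs / n_fft        # nominal centre frequency of each bin
    prev_phase = np.zeros(n_bins)
    acc_phase = np.zeros(n_bins)
    y = np.zeros(len(x) + n_fft)
    for t in range(0, len(x) - n_fft, hop):
        F = np.fft.rfft(w * x[t:t + n_fft])          # equation (1)
        mag, phase = np.abs(F), np.angle(F)          # |F(k)|, equation (2)
        # instantaneous frequency omega(k), equation (3): nominal bin frequency
        # plus the deviation recovered from the frame-to-frame phase increment
        dphi = phase - prev_phase - 2 * np.pi * hop * np.arange(n_bins) / n_fft
        dphi -= 2 * np.pi * np.round(dphi / (2 * np.pi))
        omega = bin_freq + dphi * fs / (2 * np.pi * hop)
        prev_phase = phase
        omega_mod = alpha * omega                    # equation (4): omega'(k) = alpha * omega(k)
        # equation (5): linear interpolation of the magnitudes,
        # |F'(k')| = (1 - mu)|F(k)| + mu|F(k+1)| with k = floor(k'/alpha)
        src = np.arange(n_bins) / alpha
        k0 = np.clip(np.floor(src).astype(int), 0, n_bins - 2)
        mu = src - k0
        mag_mod = (1 - mu) * mag[k0] + mu * mag[k0 + 1]
        acc_phase += 2 * np.pi * omega_mod * hop / fs            # accumulate phi'(k)
        frame = np.fft.irfft(mag_mod * np.exp(1j * acc_phase))   # equation (7), then inverse FFT
        y[t:t + n_fft] += w * frame                  # overlap-add resynthesis
    return y[:len(x)]

def spectrogram(x, fs=8000, win=175):
    """Equation (8): spectrogram magnitude with window size 175 and 50% overlap."""
    _, _, Z = stft(x, fs=fs, nperseg=win, noverlap=win // 2)
    return np.abs(Z)

Applying vt_spoof with s drawn from [−12, +12] and feeding the resulting spectrogram to a classifier reproduces, under these assumptions, the data-generation pipeline described in step one.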
Step two: construct a convolutional neural network in which the output X_{l−1} of the previous layer is passed to the next layer as input and transformed by a nonlinear operation H_l(·) into the output X_l, which can be expressed as equation (10):
X_l = H_l(X_{l−1})   (10)
Degradation occurs as the number of layers increases; residual networks, highway networks and fractal networks all create a short path X_{l−n} from early layers to later layers, which effectively suppresses degradation, as shown in equation (11):
X_l = H_l(X_{l−1}) + X_{l−n}   (11)
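As a toy PyTorch illustration of the two connection patterns above, the sketch below contrasts the plain chain of equation (10) with a residual short path (equation (11), shown here for n = 1); the layer widths and depth are arbitrary choices for the sketch.

import torch
import torch.nn as nn

# four identical nonlinear operations H_l: convolution + batch norm + ReLU
H = nn.ModuleList([
    nn.Sequential(nn.Conv2d(16, 16, 3, padding=1), nn.BatchNorm2d(16), nn.ReLU())
    for _ in range(4)])

x = torch.randn(1, 16, 32, 32)

out = x
for layer in H:           # plain chain, equation (10): X_l = H_l(X_{l-1})
    out = layer(out)

out = x
for layer in H:           # residual short path, equation (11): X_l = H_l(X_{l-1}) + X_{l-1}
    out = layer(out) + out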
Step three: performance measurement: the detection accuracy of the VT spoofed voice is tested through an experimental corpus, wherein the detection can be described as follows:
d=(G d +S d )/(G+S)
where d is the detection accuracy, G and S are the number of real and spoofed segments in the test set, respectively, and Gd and Sd are the number of real segments correctly detected from G and spoofed segments correctly detected from S, respectively.
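This measure translates directly into code; the segment counts in the example call below are illustrative, not experimental results from the patent.

def detection_accuracy(g_correct, s_correct, g_total, s_total):
    """d = (G_d + S_d) / (G + S): fraction of test segments classified correctly."""
    return (g_correct + s_correct) / (g_total + s_total)

# e.g. 950 of 1000 genuine and 930 of 1000 spoofed segments detected correctly
print(detection_accuracy(950, 930, 1000, 1000))  # 0.94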
In a preferred embodiment, step two further comprises a dense convolutional network of improved structure, in which every layer is directly connected to all subsequent layers, expressed as
X_l = H_l([X_0, X_1, …, X_{l−1}])
where X_0, X_1, …, X_{l−1} denote the outputs of all layers preceding layer l and [·] denotes concatenation; furthermore, the output of each layer has q feature maps, where q is a natural number.
In a preferred embodiment, the input to the dense convolutional network is a single-channel spectrogram obtained by STFT, of size 90 × 88. The network consists of an initialization layer, three dense modules, two transition layers, a global pooling layer and a linear layer; the three dense modules contain 6, 12 and 48 bottleneck layers, respectively. The linear layer is a fully connected layer followed by a normalized exponential function whose two outputs represent the probabilities of "true" and "spoof", respectively. Each bottleneck layer comprises 2 convolution layers, so the entire dense convolutional network contains 2 × (6 + 12 + 48) + 1 + 1 + 1 = 135 convolution layers.
In a preferred embodiment, each bottleneck layer comprises a 1 × 1 convolution layer and a 3 × 3 convolution layer, and each transition layer connects two adjacent dense modules to further reduce the size of the feature maps, as sketched below.
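A PyTorch sketch of this preferred embodiment follows. The single-channel 90 × 88 input, the three dense modules of 6, 12 and 48 bottleneck layers, the 1 × 1 plus 3 × 3 bottleneck structure, the two transition layers, the global pooling layer and the two-output linear layer with a normalized exponential are taken from the description above; the growth rate q = 12, the initial channel count, the 4q bottleneck width and the average-pool transitions are assumptions of this sketch.

import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """1x1 convolution followed by 3x3 convolution; output is concatenated with the input."""
    def __init__(self, in_ch, growth):
        super().__init__()
        self.body = nn.Sequential(
            nn.BatchNorm2d(in_ch), nn.ReLU(),
            nn.Conv2d(in_ch, 4 * growth, kernel_size=1, bias=False),
            nn.BatchNorm2d(4 * growth), nn.ReLU(),
            nn.Conv2d(4 * growth, growth, kernel_size=3, padding=1, bias=False))

    def forward(self, x):
        # dense connection: every layer sees the outputs of all earlier layers
        return torch.cat([x, self.body(x)], dim=1)

def transition(in_ch, out_ch):
    """Transition layer between dense modules, shrinking the feature maps."""
    return nn.Sequential(nn.BatchNorm2d(in_ch), nn.ReLU(),
                         nn.Conv2d(in_ch, out_ch, kernel_size=1, bias=False),
                         nn.AvgPool2d(2))

def build_network(growth=12, init_ch=24):
    layers = [nn.Conv2d(1, init_ch, kernel_size=3, padding=1, bias=False)]  # initialization layer
    ch = init_ch
    for i, n in enumerate((6, 12, 48)):            # the three dense modules
        layers += [Bottleneck(ch + j * growth, growth) for j in range(n)]
        ch += n * growth
        if i < 2:                                  # the two transition layers
            layers.append(transition(ch, ch // 2))
            ch //= 2
    layers += [nn.BatchNorm2d(ch), nn.ReLU(),
               nn.AdaptiveAvgPool2d(1), nn.Flatten(),  # global pooling layer
               nn.Linear(ch, 2)]                   # linear layer: "true" / "spoof" logits
    return nn.Sequential(*layers)

net = build_network()
spec = torch.randn(8, 1, 90, 88)                   # batch of single-channel spectrograms
probs = torch.softmax(net(spec), dim=1)            # normalized exponential over the two classes

Counting convolutions as the text does gives 2 × (6 + 12 + 48) = 132 in the bottlenecks plus one initialization convolution and two transition convolutions, i.e. 135.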
In a preferred embodiment, the experimental corpora in step three include Timit, NIST and UME, all in WAV format with an 8 kHz sampling rate, 16-bit quantization and mono speech.
In a preferred embodiment, Timit, NIST and UME each comprise a training set and a test set: the training sets are Timit-1, NIST-1 and UME-1, and the test sets are Timit-2, NIST-2 and UME-2, respectively.
The invention has the technical effects and advantages that:
the invention ensures the maximum information flow between layers by establishing the dense convolution network, enhances the feature propagation, has regularization effect by dense connection, reduces the overfitting of tasks with smaller training set, can narrow the network layer, obviously reduces the parameter number, reduces the degradation problem, supports the reuse of limited neurons, does not need to relearn redundant feature graphs, is convenient for training, does not need to manually select specific one or more features like the traditional machine learning method, and then carries out classification by using a classifier, but can spontaneously extract related features including some shallow edge features and deep features by using the proposed dense neural network and then classify the features, thereby simplifying the whole flow and achieving better effect.
Drawings
FIG. 1 is a flow chart of the voice detection of the present invention;
FIG. 2 is a block diagram of a dense neural network of the present invention;
fig. 3 is an internal structural diagram of the dense neural network of the present invention.
Detailed Description
The following clearly and completely describes the embodiments of the present invention with reference to the accompanying drawings. Apparently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
Example 1
The invention provides a deception voice detection method based on a dense neural network as shown in fig. 1-3, which specifically comprises the following detection steps:
Step one: construction of the VT spoofed speech conversion model: the STFT is used to break the conventional coupling between time and frequency characteristics while keeping the rhythm unchanged, where VT spoofed speech can be described as follows:
Let x_t(n) be a frame of length N taken from the input speech signal at time t. It is first windowed by w(n), and an FFT is then applied to the windowed signal to obtain F(k), as given by equation (1):
F(k) = Σ_{n=0}^{N−1} x_t(n)·w(n)·e^{−j2πkn/N}   (1)
where w(n) denotes a Hamming or Hanning window and k denotes the frequency bin index.
The instantaneous magnitude |F(k)| and the instantaneous frequency ω(k) are then computed as shown in equations (2) and (3), respectively:
|F(k)| = √(Re{F(k)}² + Im{F(k)}²)   (2)
ω(k) = (k + δ)·F_S/N   (3)
where δ denotes the deviation of the k-th frequency bin and F_S denotes the sampling frequency.
For VT spoofed speech, the instantaneous frequency ω(k) is modified according to equation (4), where α denotes a scale factor, i.e. the spoofing factor:
ω′(k) = α·ω(k)   (4)
Linear interpolation is used to modify the instantaneous magnitudes, as shown in equation (5):
|F′(k′)| = (1 − μ)·|F(k)| + μ·|F(k+1)|   (5)
where 0 ≤ k, k′ < N/2, k = ⌊k′/α⌋ and μ = k′/α − k.
Another method of modifying the instantaneous magnitude is the energy-preserving correction shown in equation (6).
For simplicity, k is still used as the frequency bin index for the modified instantaneous frequency ω′ and instantaneous magnitude F′.
The instantaneous phase φ′(k) is then computed from the instantaneous frequency ω′(k), and the converted FFT coefficients are obtained through equation (7):
F′(k) = |F′(k)|·e^{jφ′(k)}   (7)
Finally, an inverse FFT is applied to F′(k) to obtain the VT spoofed speech.
As equations (4) and (5) show, VT spoofing changes the spectral amplitude, so implicit features may be introduced into the spoofed speech. These depth features can be classified by using the spectrogram of the speech as input to a deep neural network, and a spectrogram of the input speech signal is obtained via the short-time Fourier transform (STFT), as given in equation (8),
where the window size is 175 and the overlap is 50%. The VT spoofing perturbation is measured by a spoofing factor α spanning 12 semitones, as shown in equation (9):
α(s) = 2^{s/12}   (9)
where s takes any integer value in the range [−12, +12];
Step two: construct a convolutional neural network (CNN) in which the output X_{l−1} of the previous layer is passed to the next layer as input and transformed by a nonlinear operation H_l(·) into the output X_l, which can be expressed as equation (10):
X_l = H_l(X_{l−1})   (10)
Degradation occurs as the number of layers increases; residual networks (ResNet), highway networks (Highway Networks) and fractal networks (FractalNets) all create a short path X_{l−n} from early layers to later layers, which effectively suppresses degradation, as shown in equation (11):
X_l = H_l(X_{l−1}) + X_{l−n}   (11)
Step three: performance measurement: the detection accuracy of the VT spoofed voice is tested through an experimental corpus, wherein the detection can be described as follows:
d=(G d +S d )/(G+S)
where G and S are the number of real and spoofed segments in the test set, respectively, and Gd and Sd are the number of real segments correctly detected from G and spoofed segments correctly detected from S, respectively.
Further, the experimental corpora in step three include Timit (6300 segments, 630 speakers), NIST (3560 segments, 356 speakers) and UME (4040 segments, 202 speakers), all in WAV format with an 8 kHz sampling rate, 16-bit quantization and mono speech.
Further, each corpus is divided into a training set and a test set: the training sets are Timit-1 (3000 segments), NIST-1 (2000 segments) and UME-1 (2040 segments), and the test sets are Timit-2 (3300 segments), NIST-2 (1560 segments) and UME-2 (2000 segments), respectively.
Example 2
Unlike Example 1, step two further comprises a dense convolutional network (DenseNet) of improved structure, in which every layer is directly connected to all subsequent layers, expressed as
X_l = H_l([X_0, X_1, …, X_{l−1}])
where X_0, X_1, …, X_{l−1} denote the outputs of all layers preceding layer l and [·] denotes concatenation; furthermore, the output of each layer has k feature maps, where k is kept small.
Further, the input to the dense convolutional network (DenseNet) is a single-channel spectrogram obtained by STFT, of size 90 × 88. The network consists of an initialization layer, three dense modules, two transition layers, a global pooling layer and a linear layer; the three dense modules contain 6, 12 and 48 bottleneck layers, respectively. The linear layer is a fully connected layer followed by a normalized exponential function whose two outputs represent the probabilities of "true" and "spoof", respectively. Each bottleneck layer comprises 2 convolution layers, so the entire dense convolutional network contains 2 × (6 + 12 + 48) + 1 + 1 + 1 = 135 convolution layers, through which the depth features can be extracted automatically with improved computational efficiency.
Further, each bottleneck layer includes one 1 × 1 convolution layer and one 3 × 3 convolution layer instead of two 3 × 3 convolution layers to reduce computation, and each transition layer connects two adjacent dense modules to further reduce the size of the feature maps.
Based on Example 2, homologous-corpus evaluation and cross-corpus evaluation were performed with the test sets and training sets as follows:
(1) Homologous corpus evaluation
In the internal-database case, where the test set and the training set come from the same corpus, the test results of this method and of other methods are shown in the following table.
as can be seen from the data in the table, the average detection precision of the method provided by the invention is 2.58% higher than that of the traditional CNN model and 3.66% higher than that of the SVM model, so that in the dense convolution network, the decision has depth characteristics and refers to early edge characteristics, and the precision can be further improved.
(2) Cross-corpus evaluation
In a real-world scenario, the test speech and the training speech may come from different sources. One of the three corpora is therefore selected as the test set while the other two serve as the training set; the experimental results are shown in the following table.
from the data in the table, the results of the first two cases are good, but scheme 3 is not ideal, one possible reason is that the data amount of NIST is greater than the other two groups shown in table 1, indicating that the NIST-trained model has better generalization ability, and that in the GNN method, the accuracy of scheme 1 is 94.37% and our accuracy is 96.45%, indicating that our proposed method is superior to the GNN method.
Finally, it should be noted that: the foregoing description of the preferred embodiments of the invention is not intended to limit the invention to the precise form disclosed, and any such modifications, equivalents, and alternatives falling within the spirit and principles of the invention are intended to be included within the scope of the invention.

Claims (4)

1. A deception voice detection method based on a dense neural network, characterized in that it specifically comprises the following detection steps:
step one: construction of the VT spoofed speech conversion model: the STFT is used to break the conventional coupling between time and frequency characteristics while keeping the rhythm unchanged, wherein VT spoofed speech can be described as follows:
let x_t(n) be a frame of length N taken from the input speech signal at time t; it is first windowed by w(n), and an FFT is then applied to the windowed signal to obtain F(k), as given by equation (1):
F(k) = Σ_{n=0}^{N−1} x_t(n)·w(n)·e^{−j2πkn/N}   (1)
where w(n) denotes a Hamming or Hanning window and k denotes the frequency bin index;
the instantaneous magnitude |F(k)| and the instantaneous frequency ω(k) are then computed as shown in equations (2) and (3), respectively:
|F(k)| = √(Re{F(k)}² + Im{F(k)}²)   (2)
ω(k) = (k + δ)·F_S/N   (3)
where δ denotes the deviation of the k-th frequency bin and F_S denotes the sampling frequency;
for VT spoofed speech, the instantaneous frequency ω(k) is modified according to equation (4), where α denotes a scale factor, i.e. the spoofing factor:
ω′(k) = α·ω(k)   (4)
linear interpolation is used to modify the instantaneous magnitudes, as shown in equation (5):
|F′(k′)| = (1 − μ)·|F(k)| + μ·|F(k+1)|   (5)
where 0 ≤ k, k′ < N/2, k = ⌊k′/α⌋ and μ = k′/α − k;
another method of modifying the instantaneous magnitude is the energy-preserving correction shown in equation (6);
for simplicity, k is still used as the frequency bin index for the modified instantaneous frequency ω′ and instantaneous magnitude F′;
the instantaneous phase φ′(k) is then computed from the instantaneous frequency ω′(k), and the converted FFT coefficients are obtained through equation (7):
F′(k) = |F′(k)|·e^{jφ′(k)}   (7)
finally, an inverse FFT is applied to F′(k) to obtain the VT spoofed speech;
as can be seen from equations (4) and (5), the VT spoofing perturbation changes the spectral amplitude, so that implicit features can be introduced into the VT spoofed speech; these implicit features are extracted by using the spectrogram of the speech as the input to the deep neural network, and a spectrogram of the input speech signal is obtained by the short-time Fourier transform (STFT), as given in equation (8),
where the window size is 175 and the overlap is 50%; the VT spoofing perturbation is measured by a spoofing factor α spanning 12 semitones, as shown in equation (9):
α(s) = 2^{s/12}   (9)
where s takes any integer value in the range [−12, +12];
step two: construct a convolutional neural network in which the output X_{l−1} of the previous layer is passed to the next layer as input and transformed by a nonlinear operation H_l(·) into the output X_l, which can be expressed as equation (10):
X_l = H_l(X_{l−1})   (10)
degradation occurs as the number of layers increases, and the constructed convolutional neural network suppresses this degradation well by means of the short path X_{l−n} shown in equation (11):
X_l = H_l(X_{l−1}) + X_{l−n}   (11)
Step three: performance measurement: the detection accuracy of the VT spoofed voice is tested through an experimental corpus, wherein the detection can be described as follows:
d=(G d +S d )/(G+S)
where d is the detection accuracy, G and S are the number of real and spoofed segments in the test set, respectively, and Gd and Sd are the number of real segments correctly detected from G and spoofed segments correctly detected from S, respectively.
2. The method for detecting spoofed speech based on a dense neural network of claim 1, characterized in that: step two further comprises a dense convolutional network of improved structure, in which every layer is directly connected to all subsequent layers, expressed as
X_l = H_l([X_0, X_1, …, X_{l−1}])
where X_0, X_1, …, X_{l−1} denote the outputs of all layers preceding layer l and [·] denotes concatenation; furthermore, the output of each layer has q feature maps, where q is a natural number.
3. The method for detecting spoofed speech based on a dense neural network of claim 1, characterized in that: the experimental corpora in step three include Timit, NIST and UME, all in WAV format with an 8 kHz sampling rate and 16-bit quantization.
4. The method for detecting spoofed speech based on a dense neural network of claim 3, characterized in that: Timit, NIST and UME each comprise a training set and a test set, wherein the training sets are Timit-1, NIST-1 and UME-1, and the test sets are Timit-2, NIST-2 and UME-2, respectively.
CN201910033384.XA 2019-01-14 2019-01-14 Deception voice detection method based on dense neural network Active CN109767776B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910033384.XA CN109767776B (en) 2019-01-14 2019-01-14 Deception voice detection method based on dense neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910033384.XA CN109767776B (en) 2019-01-14 2019-01-14 Deception voice detection method based on dense neural network

Publications (2)

Publication Number Publication Date
CN109767776A CN109767776A (en) 2019-05-17
CN109767776B (en) 2023-12-15

Family

ID=66452939

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910033384.XA Active CN109767776B (en) 2019-01-14 2019-01-14 Deception voice detection method based on dense neural network

Country Status (1)

Country Link
CN (1) CN109767776B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110232928B (en) * 2019-06-13 2021-05-25 思必驰科技股份有限公司 Text-independent speaker verification method and device
CN110211604A (en) * 2019-06-17 2019-09-06 广东技术师范大学 A kind of depth residual error network structure for voice deformation detection
CN110390952B (en) * 2019-06-21 2021-10-22 江南大学 City sound event classification method based on dual-feature 2-DenseNet parallel connection
CN111243621A (en) * 2020-01-14 2020-06-05 四川大学 Construction method of GRU-SVM deep learning model for synthetic speech detection
CN111933154B (en) * 2020-07-16 2024-02-13 平安科技(深圳)有限公司 Method, equipment and computer readable storage medium for recognizing fake voice
CN112767951A (en) * 2021-01-22 2021-05-07 广东技术师范大学 Voice conversion visual detection method based on deep dense network
CN113506583B (en) * 2021-06-28 2024-01-05 杭州电子科技大学 Camouflage voice detection method using residual error network

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231277A (en) * 2011-06-29 2011-11-02 电子科技大学 Method for protecting mobile terminal privacy based on voiceprint recognition
CN105845127A (en) * 2015-01-13 2016-08-10 阿里巴巴集团控股有限公司 Voice recognition method and system
CN105869630A (en) * 2016-06-27 2016-08-17 上海交通大学 Method and system for detecting voice spoofing attack of speakers on basis of deep learning
CN106875007A (en) * 2017-01-25 2017-06-20 上海交通大学 End-to-end deep neural network is remembered based on convolution shot and long term for voice fraud detection
CN107293302A (en) * 2017-06-27 2017-10-24 苏州大学 A kind of sparse spectrum signature extracting method being used in voice lie detection system
CN108597540A (en) * 2018-04-11 2018-09-28 南京信息工程大学 A kind of speech-emotion recognition method based on variation mode decomposition and extreme learning machine
CN108806698A (en) * 2018-03-15 2018-11-13 中山大学 A kind of camouflage audio recognition method based on convolutional neural networks

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
GB2493875A (en) * 2010-04-26 2013-02-20 Trustees Of Stevens Inst Of Technology Systems and methods for automatically detecting deception in human communications expressed in digital form

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102231277A (en) * 2011-06-29 2011-11-02 电子科技大学 Method for protecting mobile terminal privacy based on voiceprint recognition
CN105845127A (en) * 2015-01-13 2016-08-10 阿里巴巴集团控股有限公司 Voice recognition method and system
CN105869630A (en) * 2016-06-27 2016-08-17 上海交通大学 Method and system for detecting voice spoofing attack of speakers on basis of deep learning
CN106875007A (en) * 2017-01-25 2017-06-20 上海交通大学 End-to-end deep neural network is remembered based on convolution shot and long term for voice fraud detection
CN107293302A (en) * 2017-06-27 2017-10-24 苏州大学 A kind of sparse spectrum signature extracting method being used in voice lie detection system
CN108806698A (en) * 2018-03-15 2018-11-13 中山大学 A kind of camouflage audio recognition method based on convolutional neural networks
CN108597540A (en) * 2018-04-11 2018-09-28 南京信息工程大学 A kind of speech-emotion recognition method based on variation mode decomposition and extreme learning machine

Also Published As

Publication number Publication date
CN109767776A (en) 2019-05-17

Similar Documents

Publication Publication Date Title
CN109767776B (en) Deception voice detection method based on dense neural network
CN111261147B (en) Music embedding attack defense method for voice recognition system
CN111724770B (en) Audio keyword identification method for generating confrontation network based on deep convolution
CN104900235A (en) Voiceprint recognition method based on pitch period mixed characteristic parameters
WO2020024396A1 (en) Music style recognition method and apparatus, computer device, and storage medium
CN112802484A (en) Panda sound event detection method and system under mixed audio frequency
CN108091345B (en) Double-ear voice separation method based on support vector machine
Zheng et al. When automatic voice disguise meets automatic speaker verification
US10522160B2 (en) Methods and apparatus to identify a source of speech captured at a wearable electronic device
KR101140896B1 (en) Method and apparatus for speech segmentation
Yadav et al. ASSD: Synthetic Speech Detection in the AAC Compressed Domain
Chakravarty et al. Data augmentation and hybrid feature amalgamation to detect audio deep fake attacks
CN112767951A (en) Voice conversion visual detection method based on deep dense network
CN109545198A (en) A kind of Oral English Practice mother tongue degree judgment method based on convolutional neural networks
CN116153337B (en) Synthetic voice tracing evidence obtaining method and device, electronic equipment and storage medium
CN116665649A (en) Synthetic voice detection method based on prosody characteristics
Lu et al. Detecting Unknown Speech Spoofing Algorithms with Nearest Neighbors.
CN112309404B (en) Machine voice authentication method, device, equipment and storage medium
CN113012684B (en) Synthesized voice detection method based on voice segmentation
CN115293214A (en) Underwater sound target recognition model optimization method based on sample expansion network
CN108269566A (en) A kind of thorax mouth wave recognition methods based on multiple dimensioned sub-belt energy collection feature
CN104575518B (en) Rhythm event detecting method and device
Nosek et al. Synthesized speech detection based on spectrogram and convolutional neural networks
Dou et al. Dynamically mitigating data discrepancy with balanced focal loss for replay attack detection
CN117393000B (en) Synthetic voice detection method based on neural network and feature fusion

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong.
Applicant after: GUANGDONG POLYTECHNIC NORMAL University
Address before: 510665 293 Zhongshan Avenue, Tianhe District, Guangzhou, Guangdong.
Applicant before: GUANGDONG POLYTECHNIC NORMAL University

GR01 Patent grant