CN111564163B - RNN-based multiple fake operation voice detection method

Info

Publication number
CN111564163B
CN111564163B (application CN202010382185.2A)
Authority
CN
China
Prior art keywords
voice
lfcc
rnn
matrix
original
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010382185.2A
Other languages
Chinese (zh)
Other versions
CN111564163A (en)
Inventor
Yan Diqun (严迪群)
Wu Tingting (乌婷婷)
Wang Rangding (王让定)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo University
Original Assignee
Ningbo University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo University
Priority to CN202010382185.2A
Publication of CN111564163A
Application granted
Publication of CN111564163B
Legal status: Active

Classifications

    • G10L25/24: Speech or voice analysis techniques characterised by the type of extracted parameters, the extracted parameters being the cepstrum
    • G10L25/30: Speech or voice analysis techniques characterised by the analysis technique, using neural networks
    • G10L25/51: Speech or voice analysis techniques specially adapted for particular use, for comparison or discrimination
    • Y02T10/40: Engine management systems (Y02T: climate change mitigation technologies related to transportation)

Abstract

The invention discloses an RNN-based voice detection method for multiple forgery operations, comprising the following steps: 1) obtain original voice samples and apply M kinds of forgery processing, yielding M kinds of forged voice and 1 kind of unprocessed original voice; extract features from these voices to obtain the LFCC matrices of the training voice samples, and feed them into an RNN classifier network for training, producing a multi-class training model; 2) obtain a segment of test voice, extract its features to obtain the LFCC matrix of the test voice data, and feed the matrix into the RNN classifier trained in step 1) for classification; each test voice yields an output probability, and all output probabilities are combined into the final prediction: if the prediction is the original voice, the test voice is recognized as original voice; if the prediction is a voice that has undergone a certain forgery operation, the test voice is recognized as forged voice subjected to the corresponding operation.

Description

RNN-based multiple fake operation voice detection method
Technical Field
The invention relates to voice detection methods, and in particular to an RNN-based voice detection method for multiple forgery operations.
Background
As voice-editing software becomes ever more capable, modifying voice content is within easy reach even of non-professionals. If lawbreakers maliciously forge and modify voice, and the modified voice is then used in fields such as news reporting, judicial forensics, or scientific research, it can pose a serious threat and have unpredictable effects on social stability. Digital voice forensics, namely the detection of forgery operations, plays a crucial role in verifying the originality and authenticity of audio material and is an important research topic in the multimedia forensics field.
Most existing digital voice forensic techniques detect a single forgery operation; that is, the examiner assumes that the voice under test may have undergone one specific operation. Mengyu Qiao et al. proposed a detection algorithm based on statistical features of quantized MDCT coefficients and their derivatives for detecting up-transcoded and down-transcoded MP3 files: reference audio signals are generated by recompressing and calibrating the audio, and a support vector machine performs the classification. Experimental results show that the method effectively detects MP3 double compression and can recover the processing history of digital evidence. Similarly, Wang Lihua et al. proposed convolutional-neural-network-based detection of pitch-shifting processing history: four different pitch-shifting tools were applied to three voice corpora, and a CNN detected the pitch-shifting factors within and across corpora and across shifting methods, achieving detection rates above 90%.
Existing digital voice forensic techniques can thus detect a single forgery operation with very high accuracy. In practical applications, however, the examiner usually cannot predict which specific forgery operation was applied, and detecting forgery with a classifier built for one specific operation can lead to false judgments.
At present, most digital forensics work covering multiple forgery operations concentrates on digital images; research on digital voice forensics is still comparatively scarce. In the digital voice field, Luo Weiqi's team designed a convolutional neural network model that detects the default-setting audio-processing operations of two different audio editors and reports good results. Although that work pioneered the detection of multiple voice forgery operations, it suffers from problems such as excessive computational complexity and overly idealized forgery scenarios.
Disclosure of Invention
To address the deficiencies of the prior art, the invention provides an RNN-based voice detection method for multiple forgery operations that improves detection accuracy.
The technical solution adopted to solve this problem is as follows: an RNN-based voice detection method for multiple forgery operations, characterized by comprising the following steps:
1) Training the network: obtain original voice samples and apply M kinds of forgery processing, yielding M kinds of forged voice and 1 kind of unprocessed original voice; extract features from the M kinds of forged voice and the 1 original voice to obtain the LFCC matrices of the training voice samples, and feed them into an RNN classifier network for training, producing a multi-class training model;
2) Voice recognition: obtain a segment of test voice, extract its features to obtain the LFCC matrix of the test voice data, and feed the matrix into the RNN classifier trained in step 1) for classification; each test voice yields an output probability, and all output probabilities are combined into the final prediction: if the prediction is the original voice, the test voice is recognized as original voice; if the prediction is a voice that has undergone a certain forgery operation, the test voice is recognized as forged voice subjected to the corresponding operation.
Preferably, in steps 1) and 2), the LFCC matrix is obtained as follows:
1) FFT: first pre-process the voice (framing and windowing), then compute the spectral energy E(i,k) of each voice frame after the FFT:

$$E(i,k)=\left|\sum_{m=0}^{N-1}x_i(m)\,e^{-j2\pi km/N}\right|^2$$

where i is the index of the voice frame, k is the frequency component, x_i(m) is the voice signal data of the i-th frame, and N is the number of FFT points;
then compute the energy of the per-frame spectral energy E(i,k) after it passes through the triangular filter bank:

$$S(i,l)=\sum_{k=0}^{N-1}E(i,k)\,H_l(k),\qquad 1\le l\le L$$

where H_l(k) is the frequency response of the l-th triangular filter with filter function f(l), S(i,l) is the spectral-line energy after passing through the triangular filter bank, l is the index of the triangular filter, and L is the total number of triangular filters;
2) DCT (discrete cosine transform): compute the output data lfcc(i,n) of each triangular filter bank using the DCT:

$$lfcc(i,n)=\sqrt{\frac{2}{L}}\sum_{l=1}^{L}\ln S(i,l)\,\cos\!\left(\frac{\pi n(2l-1)}{2L}\right)$$

where n is the index of the spectral line after the DCT of the i-th frame;
3) Obtain the LFCC statistical moments: take the first 12 orders of LFCC coefficients from lfcc(i,n) and compute the mean and correlation coefficients, giving the LFCC matrix extracted from a segment of voice with s frames:

$$\mathrm{LFCC}=\begin{bmatrix}x_{1,1}&\cdots&x_{1,n}\\ \vdots&\ddots&\vdots\\ x_{s,1}&\cdots&x_{s,n}\end{bmatrix}$$

where x_{s,1} … x_{s,n} are the n LFCCs computed for the s-th frame voice data.
Preferably, the RNN classifier comprises an LSTM network, a Dropout layer, a fully connected layer, and a Softmax layer connected in sequence, the Dropout layer being connected to the last LSTM layer.
Preferably, the parameters of the two LSTM layers are set to (64,128) and (128,64), respectively.
Preferably, the LSTM network uses the tanh activation function.
Preferably, the dropout rate of the Dropout layer is 0.5.
Preferably, the original speech is in WAV format.
Compared with the prior art, the invention has the following advantages: voice cepstral features are adopted and class probabilities are output by a recurrent neural network, improving the accuracy of voice detection; the method is well suited to digital voice carriers and can identify the traces of different forgery operations; and, compared with existing deep-learning-based methods, parameter sharing in the RNN greatly reduces the computational complexity.
Drawings
Fig. 1 is a process diagram of LFCC statistical-moment extraction in the voice detection method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the overall framework of the voice detection method according to an embodiment of the present invention;
Fig. 3 is a network structure diagram of the voice detection method according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions.
In the description of the present invention, it should be understood that terms indicating orientation or positional relationships, such as "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", and "circumferential", are based on the orientations or positional relationships shown in the drawings. They serve only to describe the invention and simplify the description, and do not indicate or imply that the referenced device or element must have a specific orientation or be constructed and operated in a specific orientation. Because the disclosed embodiments may be arranged in different orientations, these directional terms are illustrative and should not be construed as limitations; for example, "upper" and "lower" are not necessarily limited to orientations opposite to or aligned with the direction of gravity. Furthermore, features defined as "first" or "second" may explicitly or implicitly include one or more such features.
An RNN-based (recurrent neural network) voice detection method for multiple forgery operations is realized by constructing a recurrent-neural-network framework on top of cepstral features. Referring to fig. 2, the framework consists of two parts: first, the cepstral features of the voice sample are extracted; then the features are fed into the designed network for classification, accomplishing the task of identifying multiple forgery operations.
Specifically, in the present invention, feature extraction of speech is carried out as follows. The cepstral features adopted are Linear Frequency Cepstral Coefficients (LFCC). Cepstral features are among the most commonly used feature parameters in voice technology; they characterize human auditory properties and are widely used for speaker recognition.
LFCC allocates the band-pass filters uniformly from the low-frequency to the high-frequency band. The extraction process of the LFCC statistical moments of the present invention is shown in fig. 1:
1) FFT: first pre-process the voice (framing and windowing), then compute the spectral energy E(i,k) of each voice frame after the Fast Fourier Transform (FFT):

$$E(i,k)=\left|\sum_{m=0}^{N-1}x_i(m)\,e^{-j2\pi km/N}\right|^2$$

where i is the index of the voice frame, k is the frequency component, x_i(m) is the voice signal data of the i-th frame, and N is the number of FFT points.
Then compute the energy of the per-frame spectral energy E(i,k) after it passes through the triangular filter bank:

$$S(i,l)=\sum_{k=0}^{N-1}E(i,k)\,H_l(k),\qquad 1\le l\le L$$

where H_l(k) is the frequency response of the l-th triangular filter with filter function f(l), S(i,l) is the spectral-line energy after passing through the triangular filter bank, l is the index of the triangular filter, and L is the total number of triangular filters.
2) DCT: next, compute the output data lfcc(i,n) of each triangular filter bank using the Discrete Cosine Transform (DCT):

$$lfcc(i,n)=\sqrt{\frac{2}{L}}\sum_{l=1}^{L}\ln S(i,l)\,\cos\!\left(\frac{\pi n(2l-1)}{2L}\right)$$

where n is the index of the spectral line after the DCT of the i-th frame.
3) Obtain the LFCC statistical moments: take the first 12 orders of LFCC coefficients from lfcc(i,n) and compute the mean and correlation coefficients; these steps can be realized with existing MATLAB functions. Assuming a pre-processed voice segment has s frames in total, the extracted LFCC matrix is:

$$\mathrm{LFCC}=\begin{bmatrix}x_{1,1}&\cdots&x_{1,n}\\ \vdots&\ddots&\vdots\\ x_{s,1}&\cdots&x_{s,n}\end{bmatrix}$$

where x_{s,1} … x_{s,n} are the n LFCCs computed for the s-th frame voice data.
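For illustration, a minimal Python/NumPy sketch of this extraction follows. The frame length, hop, FFT size, and filter count are assumed values not given in the text, and the final statistical-moment computation (means and correlation coefficients) is omitted; the function returns the per-frame LFCC matrix defined above.

```python
import numpy as np
from scipy.fftpack import dct

def lfcc_matrix(signal, frame_len=400, hop=160,
                n_fft=512, n_filters=24, n_coeffs=12):
    # 1) Pre-processing: overlapping Hamming-windowed frames
    frames = np.stack([signal[t:t + frame_len] * np.hamming(frame_len)
                       for t in range(0, len(signal) - frame_len + 1, hop)])

    # 2) FFT: per-frame spectral energy E(i, k)
    E = np.abs(np.fft.rfft(frames, n_fft)) ** 2          # shape (s, n_fft//2 + 1)

    # 3) Triangular filters spaced linearly in frequency (the "L" in LFCC):
    #    unlike MFCC, the band edges are equally spaced from low to high.
    edges = np.linspace(0, n_fft // 2, n_filters + 2).astype(int)
    H = np.zeros((n_filters, n_fft // 2 + 1))
    for l in range(n_filters):
        lo, mid, hi = edges[l], edges[l + 1], edges[l + 2]
        H[l, lo:mid] = np.linspace(0.0, 1.0, mid - lo, endpoint=False)
        H[l, mid:hi] = np.linspace(1.0, 0.0, hi - mid, endpoint=False)

    # 4) Filter-bank energies S(i, l), then log and DCT -> lfcc(i, n);
    #    keep the first 12 coefficients per frame, one matrix row per frame.
    S = np.maximum(E @ H.T, 1e-10)                       # floor to avoid log(0)
    return dct(np.log(S), type=2, axis=1, norm='ortho')[:, :n_coeffs]
```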
Referring to fig. 3, the network framework employs an RNN classifier. The choice of the number of network layers is critical to the optimization: a deeper network can learn more, but takes longer to train and may overfit. The proposed network structure of the RNN classifier is therefore as shown in fig. 3. It comprises 2 LSTM layers, with parameters set to (64,128) and (128,64) respectively, using the tanh activation function to improve model performance. The network further comprises a Dropout layer, a fully connected layer (dense), and a Softmax layer connected in sequence, with the Dropout layer connected to the last LSTM layer. Setting the dropout rate to 0.5 helps reduce overfitting; after dimension reduction by the fully connected layer, the Softmax layer (softmax classifier) outputs the class probabilities. The overall network framework is trained for 50 epochs; adjustments can be made during specific training.
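For concreteness, a minimal Keras sketch of this classifier follows. Interpreting the (64,128) and (128,64) pairs as the unit sizes of the two stacked LSTM layers is an assumption, as are the optimizer and loss settings; the patent fixes only the layer sequence, the tanh activation, the 0.5 dropout rate, and the 50 training epochs.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_rnn_classifier(n_frames, n_coeffs, n_classes):
    model = models.Sequential([
        layers.Input(shape=(n_frames, n_coeffs)),        # LFCC matrix: one row per frame
        layers.LSTM(128, activation="tanh", return_sequences=True),
        layers.LSTM(64, activation="tanh"),              # last LSTM feeds the Dropout layer
        layers.Dropout(0.5),                             # dropout rate 0.5 reduces overfitting
        layers.Dense(n_classes),                         # fully connected dimension reduction
        layers.Softmax(),                                # output probabilities per class
    ])
    model.compile(optimizer="adam",                      # optimizer is an assumption
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model

# Example: M = 5 forgery operations -> 6 output classes, 50 training epochs
# model = build_rnn_classifier(n_frames=200, n_coeffs=12, n_classes=6)
# model.fit(X_train, y_train, epochs=50, validation_split=0.1)
```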
Referring back to fig. 2, the voice detection method includes the following steps:
1) First, the network framework must be trained. Assuming there are M kinds of forgery operations, each is applied to the original voice, yielding M+1 kinds of voice samples: M kinds of forged voice and 1 kind of unprocessed original voice. The invention places a constraint on the input: a sufficient library of WAV-format audio samples must be provided as training data for the network framework. Features are extracted from the M+1 kinds of voice samples to obtain the LFCC matrices of the training samples, which are fed into the designed RNN classifier network for training, producing a multi-class training model. Multiple original voice samples can be stored in a database, with each sample undergoing feature extraction before being sent to the RNN classifier for training.
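A hypothetical sketch of assembling the (M+1)-class training set of this step follows. The forgery operations are placeholders, since the patent does not fix which M operations are applied; lfcc_matrix is the helper sketched earlier, and all utterances are assumed to be cropped to equal length so the matrices stack into one array.

```python
import numpy as np

def build_dataset(originals, forgery_ops):
    """originals: list of WAV waveforms; forgery_ops: M callables, each
    mapping a waveform to its forged version (placeholders here)."""
    X, y = [], []
    for wav in originals:
        X.append(lfcc_matrix(wav)); y.append(0)           # class 0: original voice
        for m, op in enumerate(forgery_ops, start=1):
            X.append(lfcc_matrix(op(wav))); y.append(m)   # classes 1..M: forged voice
    return np.stack(X), np.asarray(y)
```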
2) Then, the detection result is obtained from the trained network framework: when a segment of test voice is obtained, its features are extracted to yield the LFCC matrix of the test voice data, which is fed into the trained RNN classifier for classification. Each test voice yields an output probability, and all output probabilities are combined into the final prediction. If the prediction is the original voice, the test voice is recognized as original voice; if the prediction is a voice subjected to a certain forgery operation, the test voice is recognized as that forged voice. The examiner can judge from this result whether a segment of voice has been forged.
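A sketch of this recognition step follows. The patent does not spell out how the output probabilities are combined, so simple averaging over the test voice's feature matrices is assumed here.

```python
import numpy as np

def detect(model, lfcc_batch, class_names):
    """lfcc_batch: (num_segments, n_frames, n_coeffs) LFCC matrices of one
    test voice; class_names[0] is 'original', the rest name forgery ops."""
    probs = model.predict(lfcc_batch)      # (num_segments, M+1) probabilities
    combined = probs.mean(axis=0)          # combine all output probabilities
    return class_names[int(np.argmax(combined))]
```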

Claims (6)

1. An RNN-based voice detection method for multiple forgery operations, characterized by comprising the following steps:
1) Training the network: obtain original voice samples and apply M kinds of forgery processing, yielding M kinds of forged voice and 1 kind of unprocessed original voice; extract features from the M kinds of forged voice and the 1 original voice to obtain the LFCC matrices of the training voice samples, and feed them into an RNN classifier network for training, producing a multi-class training model;
2) Voice recognition: obtain a segment of test voice, extract its features to obtain the LFCC matrix of the test voice data, and feed the matrix into the RNN classifier trained in step 1) for classification; each test voice yields an output probability, and all output probabilities are combined into the final prediction: if the prediction is the original voice, the test voice is recognized as original voice; if the prediction is a voice that has undergone a certain forgery operation, the test voice is recognized as forged voice subjected to the corresponding forgery operation;
wherein, in steps 1) and 2), the LFCC matrix is obtained as follows:
1) FFT: first pre-process the voice, then compute the spectral energy E(i,k) of each voice frame after the FFT:

$$E(i,k)=\left|\sum_{m=0}^{N-1}x_i(m)\,e^{-j2\pi km/N}\right|^2$$

where i is the index of the voice frame, k is the frequency component, x_i(m) is the voice signal data of the i-th frame, and N is the number of FFT points;
then compute the energy of the per-frame spectral energy E(i,k) after it passes through the triangular filter bank:

$$S(i,l)=\sum_{k=0}^{N-1}E(i,k)\,H_l(k),\qquad 1\le l\le L$$

where H_l(k) is the frequency response of the l-th triangular filter with filter function f(l), S(i,l) is the spectral-line energy after passing through the triangular filter bank, l is the index of the triangular filter, and L is the total number of triangular filters;
2) DCT (discrete cosine transform): compute the output data lfcc(i,n) of each triangular filter bank using the DCT:

$$lfcc(i,n)=\sqrt{\frac{2}{L}}\sum_{l=1}^{L}\ln S(i,l)\,\cos\!\left(\frac{\pi n(2l-1)}{2L}\right)$$

where n is the index of the spectral line after the DCT of the i-th frame;
3) Obtain the LFCC statistical moments: take the first 12 orders of LFCC coefficients from lfcc(i,n) and compute the mean and correlation coefficients, giving the LFCC matrix extracted from a segment of voice with s frames:

$$\mathrm{LFCC}=\begin{bmatrix}x_{1,1}&\cdots&x_{1,n}\\ \vdots&\ddots&\vdots\\ x_{s,1}&\cdots&x_{s,n}\end{bmatrix}$$

where x_{s,1} … x_{s,n} are the n LFCCs computed for the s-th frame voice data.
2. The RNN-based voice detection method for multiple forgery operations according to claim 1, characterized in that: the RNN classifier comprises an LSTM network, a Dropout layer, a fully connected layer, and a Softmax layer connected in sequence, the Dropout layer being connected to the last LSTM layer.
3. The RNN-based voice detection method for multiple forgery operations according to claim 2, characterized in that: the parameters of the two LSTM layers are set to (64,128) and (128,64), respectively.
4. The RNN-based voice detection method for multiple forgery operations according to claim 2, characterized in that: the LSTM network uses the tanh activation function.
5. The RNN-based voice detection method for multiple forgery operations according to claim 2, characterized in that: the dropout rate of the Dropout layer is 0.5.
6. The RNN-based voice detection method for multiple forgery operations according to claim 1, characterized in that: the original voice is in WAV format.
CN202010382185.2A 2020-05-08 2020-05-08 RNN-based multiple fake operation voice detection method Active CN111564163B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010382185.2A CN111564163B (en) 2020-05-08 2020-05-08 RNN-based multiple fake operation voice detection method

Publications (2)

Publication Number Publication Date
CN111564163A CN111564163A (en) 2020-08-21
CN111564163B (en) 2023-12-15

Family

ID=72071821

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010382185.2A Active CN111564163B (en) 2020-05-08 2020-05-08 RNN-based multiple fake operation voice detection method

Country Status (1)

Country Link
CN (1) CN111564163B (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113488073B (en) * 2021-07-06 2023-11-24 浙江工业大学 Fake voice detection method and device based on multi-feature fusion
CN113299315B (en) * 2021-07-27 2021-10-15 中国科学院自动化研究所 Method for generating voice features through continuous learning without original data storage
CN113362814B (en) * 2021-08-09 2021-11-09 中国科学院自动化研究所 Voice identification model compression method fusing combined model information
CN113488027A (en) * 2021-09-08 2021-10-08 中国科学院自动化研究所 Hierarchical classification generated audio tracing method, storage medium and computer equipment
CN113555007B (en) * 2021-09-23 2021-12-14 中国科学院自动化研究所 Voice splicing point detection method and storage medium
CN115249487B (en) * 2022-07-21 2023-04-14 中国科学院自动化研究所 Incremental generated voice detection method and system for playback boundary load sample
CN116229960B (en) * 2023-03-08 2023-10-31 江苏微锐超算科技有限公司 Robust detection method, system, medium and equipment for deceptive voice
CN117690455A (en) * 2023-12-21 2024-03-12 合肥工业大学 Sliding window-based partial synthesis fake voice detection method and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10860858B2 (en) * 2018-06-15 2020-12-08 Adobe Inc. Utilizing a trained multi-modal combination model for content and text-based evaluation and distribution of digital video content to client devices

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9299364B1 (en) * 2008-06-18 2016-03-29 Gracenote, Inc. Audio content fingerprinting based on two-dimensional constant Q-factor transform representation and robust audio identification for time-aligned applications
KR20160125628A (en) * 2015-04-22 2016-11-01 (주)사운드렉 A method for recognizing sound based on acoustic feature extraction and probabillty model
GB201514943D0 (en) * 2015-08-21 2015-10-07 Validsoft Uk Ltd Replay attack detection
WO2018107810A1 (en) * 2016-12-15 2018-06-21 平安科技(深圳)有限公司 Voiceprint recognition method and apparatus, and electronic device and medium
CN108806698A (en) * 2018-03-15 2018-11-13 中山大学 A kind of camouflage audio recognition method based on convolutional neural networks
CN109599116A (en) * 2018-10-08 2019-04-09 中国平安财产保险股份有限公司 The method, apparatus and computer equipment of supervision settlement of insurance claim based on speech recognition
CN110491391A (en) * 2019-07-02 2019-11-22 厦门大学 A kind of deception speech detection method based on deep neural network
CN110931022A (en) * 2019-11-19 2020-03-27 天津大学 Voiceprint identification method based on high-frequency and low-frequency dynamic and static characteristics

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
Combining Phase-based Features for Replay Spoof Detection System; Kantheti Srinivas; 2018 11th International Symposium on Chinese Spoken Language Processing; pp. 151-155 *
Mapping model of network scenarios and routing metrics in DTN; Qin Zhenzhen; Journal of Nanjing University of Science and Technology; vol. 40, no. 3; pp. 291-296 *
Digital voice forensics algorithm for multiple forgery operations (针对多种伪造操作的数字语音取证算法); Wu Tingting et al.; Wireless Communication Technology, no. 3; pp. 37-45 *
Research on voiceprint spoofing detection based on deep neural networks (基于深度神经网络的声纹欺骗检测研究); Chen Zhuxin; China Master's Theses Full-text Database, Information Science and Technology, no. 1, 2020; pp. I136-340 *

Also Published As

Publication number Publication date
CN111564163A (en) 2020-08-21

Similar Documents

Publication Publication Date Title
CN111564163B (en) RNN-based multiple fake operation voice detection method
CN108198574B (en) Sound change detection method and device
CN110310666B (en) Musical instrument identification method and system based on SE convolutional network
US20040260550A1 (en) Audio processing system and method for classifying speakers in audio data
CN110033756B (en) Language identification method and device, electronic equipment and storage medium
CN109949824B (en) City sound event classification method based on N-DenseNet and high-dimensional mfcc characteristics
CN111986699B (en) Sound event detection method based on full convolution network
CN110120230B (en) Acoustic event detection method and device
CN111275165A (en) Network intrusion detection method based on improved convolutional neural network
CN107145778B (en) Intrusion detection method and device
Halkias et al. Classification of mysticete sounds using machine learning techniques
CN106910495A (en) A kind of audio classification system and method for being applied to abnormal sound detection
CN113571067A (en) Voiceprint recognition countermeasure sample generation method based on boundary attack
CN114495950A (en) Voice deception detection method based on deep residual shrinkage network
Lu et al. Unsupervised speaker segmentation and tracking in real-time audio content analysis
CN113299315B (en) Method for generating voice features through continuous learning without original data storage
CN115910045B (en) Model training method and recognition method for voice wake-up word
CN116229960B (en) Robust detection method, system, medium and equipment for deceptive voice
Qin et al. Multi-branch feature aggregation based on multiple weighting for speaker verification
WO2021088176A1 (en) Binary multi-band power distribution-based low signal-to-noise ratio sound event detection method
CN115331661A (en) Voiceprint recognition backdoor attack defense method based on feature clustering analysis and feature dimension reduction
CN115064175A (en) Speaker recognition method
CN112967712A (en) Synthetic speech detection method based on autoregressive model coefficient
Chun et al. Research on music classification based on MFCC and BP neural network
CN111783534B (en) Sleep stage method based on deep learning

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant