CN111564163B - RNN-based multiple fake operation voice detection method - Google Patents
- Publication number
- CN111564163B (application CN202010382185.2A)
- Authority
- CN
- China
- Prior art keywords
- voice
- lfcc
- rnn
- matrix
- original
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/03—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
- G10L25/24—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/27—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique
- G10L25/30—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the analysis technique using neural networks
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02T—CLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO TRANSPORTATION
- Y02T10/00—Road transport of goods or passengers
- Y02T10/10—Internal combustion engine [ICE] based vehicles
- Y02T10/40—Engine management systems
Abstract
The invention discloses an RNN-based method for detecting multiple forgery operations on speech, comprising the following steps: 1) obtain original voice samples, apply M kinds of forgery processing to them to obtain M kinds of forged voice plus 1 kind of unprocessed original voice, extract features from these voices to obtain the LFCC matrix of each training voice sample, and feed the LFCC matrices into an RNN classifier network for training to obtain a multi-classification model; 2) obtain a section of test voice, extract its features to obtain the LFCC matrix of the test voice data, and feed it into the RNN classifier trained in step 1) for classification; each test voice yields an output probability, and all output probabilities are combined into the final prediction: if the prediction is the original class, the test voice is recognized as original voice; if the prediction is a class corresponding to some forgery operation, the test voice is recognized as forged voice that has undergone that operation.
Description
Technical Field
The invention relates to a voice detection method, and in particular to an RNN-based voice detection method for multiple forgery operations.
Background
With the increasing power of voice editing software, modifying voice content is now easy even for non-professionals. If lawbreakers maliciously forge and modify voice, and the modified voice is then used in fields such as news reporting, judicial forensics, and scientific research, it can pose a huge threat and even have unpredictable effects on social stability. Digital voice forensics is the detection of such forgery operations; it plays a crucial role in verifying the originality and authenticity of audio material and is an important research subject in the current multimedia forensics field.
Most existing digital voice forensic detection techniques detect a single forgery operation; that is, the forensic examiner assumes that the voice to be detected has undergone one specific forgery operation. Mengyu Qiao et al. proposed a detection algorithm based on statistical features of quantized MDCT coefficients and their derivatives for detecting up-converted and down-converted MP3 audio files: a reference audio signal is generated by recompressing and calibrating the audio, and a support vector machine is then used for classification. Experimental results show that the method effectively detects MP3 double compression and can recover the audio processing history of digital evidence. Wang Lihua et al. proposed a convolutional neural network-based detection of pitch-shifting processing history, which applies four different pitch-shifting programs to three voice corpora and uses a CNN to detect the pitch-shifting factor within and across corpora and pitch-shifting methods, with a detection rate above 90%.
Existing digital voice forensic techniques can thus detect a single forgery operation with a very high detection rate. In practical applications, however, the forensic examiner often cannot predict which specific forgery operation was used, and misjudgments may occur when a classifier trained for one specific operation is used for detection.
At present, most digital forensic work applicable to multiple forgery operations is concentrated in the field of digital images, and research on digital voice forensics is still relatively scarce. In the digital voice field, Luo Weiqi's team designed a convolutional neural network model that can detect audio processing operations applied with default settings in two different audio editing programs, with good results. However, although that work pioneered the detection of multiple voice forgery operations, it has some problems, such as excessive computational complexity and an overly idealized application scenario for the forgery operations.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides an RNN-based speech detection method for multiple forgery operations which can improve detection accuracy.
The technical scheme adopted for solving the technical problems is as follows: a speech detection method for multiple fake operations based on RNN is characterized in that: the method comprises the following steps:
1) Training network: obtaining an original voice sample, performing M kinds of forging treatment on the original voice sample to obtain M kinds of forged voices and 1 untreated original voice, extracting features of the M kinds of forged voices and 1 original voice to obtain an LFCC matrix of a training voice sample, and sending the LFCC matrix into an RNN classifier network for training to obtain a multi-classification training model;
2) And (3) voice recognition: obtaining a section of test voice, extracting features of the test voice to obtain an LFCC matrix of test voice data, sending the LFCC matrix into the RNN classifier trained in the step 1) to classify, obtaining an output probability for each test voice, and combining all the output probabilities as a final prediction result: if the predicted result is the original voice, the test voice is recognized as the original voice; if the predicted result is a voice that has undergone a certain falsification operation, the test voice is recognized as a falsified voice that has undergone the corresponding falsification operation.
Preferably, in steps 1) and 2), the step of obtaining the LFCC matrix is:
1) FFT: firstly, the voice is preprocessed (framed and windowed), and the spectral energy E(i, k) of each voice frame after the FFT is calculated:

E(i, k) = |Σ_{m=0}^{N−1} x_i(m)·e^{−j2πkm/N}|²

where i is the frame index, k is the frequency component, x_i(m) is the voice signal data of the i-th frame, and N is the number of FFT points;
the energy of the per-frame spectral energy E(i, k) after passing through the triangular filter bank is then calculated:

S(i, l) = Σ_k E(i, k)·H_l(k),  l = 1, 2, …, L

where H_l(k) is the frequency response of the l-th triangular filter, f(l) is the filter function of the l-th triangular filter, S(i, l) is the spectral-line energy after the filter bank, l is the filter index, and L is the total number of triangular filters;
2) DCT: the output data lfcc(i, n) of the triangular filter bank is calculated using the discrete cosine transform (DCT):

lfcc(i, n) = Σ_{l=1}^{L} log S(i, l)·cos(πn(2l − 1)/(2L))

where n is the index of the spectral line of the i-th frame after the DCT;
3) Obtaining the LFCC statistical moment: 12th-order LFCC coefficients are taken from lfcc(i, n), and their mean values and correlation coefficients are calculated, giving the LFCC matrix extracted from a section of voice:

X = [x_{i,j}] (s rows, n columns)

where x_{s,1} … x_{s,n} are the n LFCCs computed for the s-th frame of voice data.
Preferably, the RNN classifier includes an LSTM network, a Dropout layer, a full connection layer, and a Softmax layer sequentially connected, where the Dropout layer is connected to the last LSTM network.
Preferably, the parameters of the two LSTM networks are set to (64, 128) and (128, 64), respectively.
Preferably, the LSTM network uses a tanh activation function.
Preferably, the dropout rate of the Dropout layer is 0.5.
Preferably, the original speech is in WAV format.
Compared with the prior art, the invention has the following advantages: voice cepstral features are used and the class probabilities are output by a recurrent-neural-network classifier, which improves the accuracy of voice detection, suits digital voice carriers better, and can identify the traces of different forgery operations; moreover, compared with existing deep-learning methods, the parameter sharing in the RNN greatly reduces the computational complexity.
Drawings
Fig. 1 is a process diagram of LFCC statistical moment extraction in the voice detection method according to the embodiment of the present invention;
FIG. 2 is a schematic diagram of the overall framework of a voice detection method according to an embodiment of the present invention;
fig. 3 is a network configuration diagram of a voice detection method according to an embodiment of the present invention.
Detailed Description
Embodiments of the present invention are described in detail below, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer to the same or similar elements or elements having the same or similar functions.
In the description of the present invention, it should be understood that the terms "center", "longitudinal", "lateral", "length", "width", "thickness", "upper", "lower", "front", "rear", "left", "right", "vertical", "horizontal", "top", "bottom", "inner", "outer", "clockwise", "counterclockwise", "axial", "radial", "circumferential", etc. indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings are merely for purposes of describing the present invention and simplifying the description, and do not indicate or imply that the device or element being referred to must have a specific orientation, be configured and operated in a specific orientation, and because the disclosed embodiments of the present invention may be arranged in different orientations, these directional terms are merely for illustration and should not be construed as limitations, such as "upper", "lower" are not necessarily limited to orientations opposite or coincident with the direction of gravity. Furthermore, features defining "first", "second" may include one or more such features, either explicitly or implicitly.
The RNN (recurrent neural network)-based detection method for multiple speech forgery operations is realized by constructing a recurrent-neural-network framework on top of cepstral features. Referring to fig. 2, the framework consists of two parts: first, the cepstral features of the voice sample are extracted; then the features are fed into the designed network framework for classification, accomplishing the task of identifying multiple forgery operations.
Specifically, feature extraction of speech in the invention is performed as follows. The cepstral features adopted are Linear Frequency Cepstral Coefficients (LFCC). Cepstral features are among the most commonly used feature parameters in speech technology; they characterize human auditory properties and are widely used for speaker recognition.
LFCC distributes its band-pass filters evenly from the low to the high frequencies. The extraction process of the LFCC statistical moments is shown in fig. 1:
1) FFT: firstly, the voice is preprocessed (framed and windowed), and the spectral energy E(i, k) of each voice frame after the Fast Fourier Transform (FFT) is calculated:

E(i, k) = |Σ_{m=0}^{N−1} x_i(m)·e^{−j2πkm/N}|²

where i is the frame index, k is the frequency component, x_i(m) is the voice signal data of the i-th frame, and N is the number of FFT points.
The energy of each frame's spectral energy E(i, k) after passing through the triangular filter bank is then calculated:

S(i, l) = Σ_k E(i, k)·H_l(k),  l = 1, 2, …, L

where H_l(k) is the frequency response of the l-th triangular filter, f(l) is the filter function of the l-th triangular filter, S(i, l) is the spectral-line energy after the filter bank, l is the filter index, and L is the total number of triangular filters.
2) DCT: then, the output data lfcc(i, n) of the triangular filter bank is calculated using the Discrete Cosine Transform (DCT):

lfcc(i, n) = Σ_{l=1}^{L} log S(i, l)·cos(πn(2l − 1)/(2L))

where n is the index of the spectral line of the i-th frame after the DCT.
3) Obtaining the LFCC statistical moment: 12th-order LFCC coefficients are taken from lfcc(i, n), and their mean values and correlation coefficients are calculated (these steps can be implemented with existing MATLAB functions). Assuming the preprocessed voice segment has s frames in total, the extracted LFCC matrix is:

X = [x_{i,j}] (s rows, n columns)

where x_{s,1} … x_{s,n} are the n LFCCs computed for the s-th frame of voice data.
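The three extraction steps above can be sketched in Python as follows. This is a minimal illustration assuming the standard LFCC formulas; the frame length, hop size, and filter count below are hypothetical choices for demonstration, not values fixed by the patent:

```python
import numpy as np

def lfcc_matrix(signal, frame_len=512, hop=256, n_filters=24, n_ceps=12):
    """Sketch of the LFCC statistical-moment extraction described above."""
    # 1) Framing + FFT: spectral energy E(i, k) of each windowed frame
    frames = [signal[t:t + frame_len] * np.hamming(frame_len)
              for t in range(0, len(signal) - frame_len + 1, hop)]
    E = np.abs(np.fft.rfft(frames, n=frame_len)) ** 2      # shape (s, frame_len//2 + 1)

    # Triangular filters H_l(k) spaced *linearly* (not mel) over the spectrum
    n_bins = E.shape[1]
    centers = np.linspace(0, n_bins - 1, n_filters + 2)
    k = np.arange(n_bins)
    H = np.zeros((n_filters, n_bins))
    for l in range(1, n_filters + 1):
        left, mid, right = centers[l - 1], centers[l], centers[l + 1]
        H[l - 1] = np.clip(np.minimum((k - left) / (mid - left + 1e-9),
                                      (right - k) / (right - mid + 1e-9)), 0, None)
    S = E @ H.T                                            # filter-bank energies S(i, l)

    # 2) DCT of the log filter-bank energies -> lfcc(i, n)
    l_idx = np.arange(1, n_filters + 1)
    n_idx = np.arange(n_ceps).reshape(-1, 1)
    dct_basis = np.cos(np.pi * n_idx * (2 * l_idx - 1) / (2 * n_filters))
    lfcc = np.log(S + 1e-9) @ dct_basis.T                  # shape (s, n_ceps)

    # 3) The 12th-order coefficients form the s x n LFCC matrix fed to the RNN
    return lfcc
```

The returned s × 12 matrix corresponds to the LFCC matrix X above, one row per frame.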
Referring to fig. 3, the network framework employs an RNN classifier. The choice of the number of network layers is critical: a deeper network can learn more, but takes longer to train and may overfit. The proposed network structure of the RNN classifier is therefore as shown in fig. 3. It comprises 2 LSTM networks, with parameters set to (64, 128) and (128, 64) respectively, using the tanh activation function to improve model performance. These are followed in sequence by a Dropout layer, a fully connected (dense) layer, and a Softmax layer, with the Dropout layer connected to the last LSTM network. Setting the dropout rate to 0.5 helps reduce overfitting; after dimensionality reduction by the fully connected layer, the Softmax layer (Softmax classifier) outputs the class probabilities. The overall iterative training of the network framework is set to 50 epochs, which can be adjusted during actual training.
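As a sketch, the classifier described above could be written in Keras roughly as follows. The framework choice and the mapping of the (64,128)/(128,64) parameter pairs onto LSTM unit counts of 128 and 64 are assumptions — the patent gives the pairs without naming a framework or their exact meaning:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_rnn_classifier(n_frames, n_ceps=12, n_classes=5):
    """Sketch of the RNN classifier: 2 LSTM layers -> Dropout -> Dense -> Softmax."""
    model = models.Sequential([
        tf.keras.Input(shape=(n_frames, n_ceps)),  # LFCC matrix: s frames x n coefficients
        layers.LSTM(128, activation="tanh", return_sequences=True),
        layers.LSTM(64, activation="tanh"),        # last LSTM feeds the Dropout layer
        layers.Dropout(0.5),                       # dropout rate 0.5 to reduce overfitting
        layers.Dense(n_classes),                   # fully connected (dense) layer
        layers.Softmax(),                          # output probabilities for M+1 classes
    ])
    model.compile(optimizer="adam", loss="categorical_crossentropy")
    return model
```

Training for the 50 epochs mentioned above would then be `model.fit(X_train, y_train, epochs=50)`.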
Referring back to fig. 2, the voice detection method includes the following steps:
1) First, the network framework must be trained. Assuming there are M kinds of forgery operations, each of the M forgery processes is applied to the original voice, yielding M + 1 kinds of voice samples: M kinds of forged voice and 1 kind of unprocessed original voice. The invention places a constraint on the original voice input: a sufficiently large library of WAV-format audio samples must be provided as training data for the network framework. Features are extracted from the M + 1 kinds of voice samples to obtain the LFCC matrices of the training voice samples, which are fed into the designed RNN classifier network for training to obtain a multi-classification model. Multiple original voice samples can be stored in a database, with each original voice sample undergoing feature extraction before being sent to the RNN classifier for training.
2) Then, the detection and identification result is obtained through the trained network framework: when a section of test voice is obtained, its features are extracted to obtain the LFCC matrix of the test voice data, which is fed into the trained RNN classifier for classification. Each test voice yields an output probability, and all output probabilities are combined into the final prediction. If the prediction is the original class, the test voice is recognized as original voice; if the prediction is a class corresponding to some forgery operation, the test voice is recognized as forged voice. From this result, the forensic examiner can judge whether a section of voice has been forged.
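The decision step — combining the per-segment output probabilities into one final prediction — can be sketched as follows. The combination rule (averaging) and the class names are illustrative assumptions, since the text only says the output probabilities are combined:

```python
import numpy as np

def classify_speech(prob_segments, class_names):
    """Combine per-segment Softmax outputs into one prediction.

    prob_segments: array of shape (n_segments, n_classes), each row a
    Softmax output; class 0 is taken to be the original (untampered) voice.
    """
    probs = np.asarray(prob_segments, dtype=float)
    combined = probs.mean(axis=0)          # combine all output probabilities
    label = int(np.argmax(combined))
    if label == 0:
        verdict = "original voice"
    else:
        verdict = f"forged voice ({class_names[label]})"
    return label, verdict

# Example with M = 2 hypothetical forgery operations (3 classes in total)
names = ["original", "pitch-shifted", "recompressed"]
label, verdict = classify_speech([[0.2, 0.7, 0.1],
                                  [0.1, 0.8, 0.1]], names)
```

Here the averaged probabilities favour class 1, so the voice would be reported as forged by the pitch-shifting operation.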
Claims (6)
1. A speech detection method for multiple fake operations based on RNN is characterized in that: the method comprises the following steps:
1) Training network: obtaining an original voice sample, performing M kinds of forging treatment on the original voice sample to obtain M kinds of forged voices and 1 untreated original voice, extracting features of the M kinds of forged voices and 1 original voice to obtain an LFCC matrix of a training voice sample, and sending the LFCC matrix into an RNN classifier network for training to obtain a multi-classification training model;
2) And (3) voice recognition: obtaining a section of test voice, extracting features of the test voice to obtain an LFCC matrix of test voice data, sending the LFCC matrix into the RNN classifier trained in the step 1) to classify, obtaining an output probability for each test voice, and combining all the output probabilities as a final prediction result: if the predicted result is the original voice, the test voice is recognized as the original voice; if the predicted result is the voice which is subjected to a certain fake operation, the test voice is recognized as the fake voice which is subjected to the corresponding fake operation;
in steps 1) and 2), the step of obtaining the LFCC matrix is:
1) FFT: firstly, the voice is preprocessed, and the spectral energy E(i, k) of each voice frame after the FFT is calculated:

E(i, k) = |Σ_{m=0}^{N−1} x_i(m)·e^{−j2πkm/N}|²

where i is the frame index, k is the frequency component, x_i(m) is the voice signal data of the i-th frame, and N is the number of FFT points;
the energy of the per-frame spectral energy E(i, k) after passing through the triangular filter bank is then calculated:

S(i, l) = Σ_k E(i, k)·H_l(k),  l = 1, 2, …, L

where H_l(k) is the frequency response of the l-th triangular filter, f(l) is the filter function of the l-th triangular filter, S(i, l) is the spectral-line energy after the filter bank, l is the filter index, and L is the total number of triangular filters;
2) DCT: the output data lfcc(i, n) of the triangular filter bank is calculated using the discrete cosine transform (DCT):

lfcc(i, n) = Σ_{l=1}^{L} log S(i, l)·cos(πn(2l − 1)/(2L))

where n is the index of the spectral line of the i-th frame after the DCT;
3) Obtaining the LFCC statistical moment: 12th-order LFCC coefficients are taken from lfcc(i, n), and their mean values and correlation coefficients are calculated, giving the LFCC matrix extracted from a section of voice:

X = [x_{i,j}] (s rows, n columns)

where x_{s,1} … x_{s,n} are the n LFCCs computed for the s-th frame of voice data.
2. The RNN-based multiple counterfeit operation voice detection method according to claim 1, wherein: the RNN classifier comprises an LSTM network, a Dropout layer, a full connection layer and a Softmax layer which are sequentially connected, wherein the Dropout layer is connected with the last LSTM network.
3. The RNN-based multiple counterfeit operation voice detection method according to claim 2, wherein: the parameters of the two LSTM networks are set to (64, 128) and (128, 64), respectively.
4. The RNN-based multiple counterfeit operation voice detection method according to claim 2, wherein: the LSTM network uses a tanh activation function.
5. The RNN-based multiple counterfeit operation voice detection method according to claim 2, wherein: the Dropout function value of the Dropout layer is 0.5.
6. The RNN-based multiple counterfeit operation voice detection method according to claim 1, wherein: the original speech is in WAV format.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010382185.2A CN111564163B (en) | 2020-05-08 | 2020-05-08 | RNN-based multiple fake operation voice detection method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010382185.2A CN111564163B (en) | 2020-05-08 | 2020-05-08 | RNN-based multiple fake operation voice detection method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111564163A CN111564163A (en) | 2020-08-21 |
CN111564163B true CN111564163B (en) | 2023-12-15 |
Family
ID=72071821
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010382185.2A Active CN111564163B (en) | 2020-05-08 | 2020-05-08 | RNN-based multiple fake operation voice detection method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111564163B (en) |
Families Citing this family (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113488073B (en) * | 2021-07-06 | 2023-11-24 | 浙江工业大学 | Fake voice detection method and device based on multi-feature fusion |
CN113299315B (en) * | 2021-07-27 | 2021-10-15 | 中国科学院自动化研究所 | Method for generating voice features through continuous learning without original data storage |
CN113362814B (en) * | 2021-08-09 | 2021-11-09 | 中国科学院自动化研究所 | Voice identification model compression method fusing combined model information |
CN113488027A (en) * | 2021-09-08 | 2021-10-08 | 中国科学院自动化研究所 | Hierarchical classification generated audio tracing method, storage medium and computer equipment |
CN113555007B (en) * | 2021-09-23 | 2021-12-14 | 中国科学院自动化研究所 | Voice splicing point detection method and storage medium |
CN115249487B (en) * | 2022-07-21 | 2023-04-14 | 中国科学院自动化研究所 | Incremental generated voice detection method and system for playback boundary load sample |
CN116229960B (en) * | 2023-03-08 | 2023-10-31 | 江苏微锐超算科技有限公司 | Robust detection method, system, medium and equipment for deceptive voice |
CN117690455A (en) * | 2023-12-21 | 2024-03-12 | 合肥工业大学 | Sliding window-based partial synthesis fake voice detection method and system |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
GB201514943D0 (en) * | 2015-08-21 | 2015-10-07 | Validsoft Uk Ltd | Replay attack detection |
US9299364B1 (en) * | 2008-06-18 | 2016-03-29 | Gracenote, Inc. | Audio content fingerprinting based on two-dimensional constant Q-factor transform representation and robust audio identification for time-aligned applications |
KR20160125628A (en) * | 2015-04-22 | 2016-11-01 | (주)사운드렉 | A method for recognizing sound based on acoustic feature extraction and probabillty model |
WO2018107810A1 (en) * | 2016-12-15 | 2018-06-21 | 平安科技(深圳)有限公司 | Voiceprint recognition method and apparatus, and electronic device and medium |
CN108806698A (en) * | 2018-03-15 | 2018-11-13 | 中山大学 | A kind of camouflage audio recognition method based on convolutional neural networks |
CN109599116A (en) * | 2018-10-08 | 2019-04-09 | 中国平安财产保险股份有限公司 | The method, apparatus and computer equipment of supervision settlement of insurance claim based on speech recognition |
CN110491391A (en) * | 2019-07-02 | 2019-11-22 | 厦门大学 | A kind of deception speech detection method based on deep neural network |
CN110931022A (en) * | 2019-11-19 | 2020-03-27 | 天津大学 | Voiceprint identification method based on high-frequency and low-frequency dynamic and static characteristics |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10860858B2 (en) * | 2018-06-15 | 2020-12-08 | Adobe Inc. | Utilizing a trained multi-modal combination model for content and text-based evaluation and distribution of digital video content to client devices |
- 2020-05-08 CN CN202010382185.2A patent/CN111564163B/en active Active
Patent Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9299364B1 (en) * | 2008-06-18 | 2016-03-29 | Gracenote, Inc. | Audio content fingerprinting based on two-dimensional constant Q-factor transform representation and robust audio identification for time-aligned applications |
KR20160125628A (en) * | 2015-04-22 | 2016-11-01 | (주)사운드렉 | A method for recognizing sound based on acoustic feature extraction and probabillty model |
GB201514943D0 (en) * | 2015-08-21 | 2015-10-07 | Validsoft Uk Ltd | Replay attack detection |
WO2018107810A1 (en) * | 2016-12-15 | 2018-06-21 | 平安科技(深圳)有限公司 | Voiceprint recognition method and apparatus, and electronic device and medium |
CN108806698A (en) * | 2018-03-15 | 2018-11-13 | 中山大学 | A kind of camouflage audio recognition method based on convolutional neural networks |
CN109599116A (en) * | 2018-10-08 | 2019-04-09 | 中国平安财产保险股份有限公司 | The method, apparatus and computer equipment of supervision settlement of insurance claim based on speech recognition |
CN110491391A (en) * | 2019-07-02 | 2019-11-22 | 厦门大学 | A kind of deception speech detection method based on deep neural network |
CN110931022A (en) * | 2019-11-19 | 2020-03-27 | 天津大学 | Voiceprint identification method based on high-frequency and low-frequency dynamic and static characteristics |
Non-Patent Citations (4)
Title |
---|
Combining Phase-based Features for Replay Spoof Detection System; Kantheti Srinivas; 2018 11th International Symposium on Chinese Spoken Language Processing; pp. 151-155 *
Mapping model of network scenarios and routing metrics in DTN; Qin Zhenzhen; Journal of Nanjing University of Science and Technology; Vol. 40, No. 3; pp. 291-296 *
Digital speech forensics algorithm for multiple forgery operations; Wu Tingting et al.; Wireless Communication Technology; No. 3; pp. 37-45 *
Chen Zhuxin. Research on voiceprint spoofing detection based on deep neural networks. China Master's Theses Full-text Database, Information Science and Technology; 2020; No. 1; pp. I136-340. *
Also Published As
Publication number | Publication date |
---|---|
CN111564163A (en) | 2020-08-21 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111564163B (en) | RNN-based multiple fake operation voice detection method | |
CN108198574B (en) | Sound change detection method and device | |
CN110310666B (en) | Musical instrument identification method and system based on SE convolutional network | |
US20040260550A1 (en) | Audio processing system and method for classifying speakers in audio data | |
CN110033756B (en) | Language identification method and device, electronic equipment and storage medium | |
CN109949824B (en) | City sound event classification method based on N-DenseNet and high-dimensional mfcc characteristics | |
CN111986699B (en) | Sound event detection method based on full convolution network | |
CN110120230B (en) | Acoustic event detection method and device | |
CN111275165A (en) | Network intrusion detection method based on improved convolutional neural network | |
CN107145778B (en) | Intrusion detection method and device | |
Halkias et al. | Classification of mysticete sounds using machine learning techniques | |
CN106910495A (en) | A kind of audio classification system and method for being applied to abnormal sound detection | |
CN113571067A (en) | Voiceprint recognition countermeasure sample generation method based on boundary attack | |
CN114495950A (en) | Voice deception detection method based on deep residual shrinkage network | |
Lu et al. | Unsupervised speaker segmentation and tracking in real-time audio content analysis | |
CN113299315B (en) | Method for generating voice features through continuous learning without original data storage | |
CN115910045B (en) | Model training method and recognition method for voice wake-up word | |
CN116229960B (en) | Robust detection method, system, medium and equipment for deceptive voice | |
Qin et al. | Multi-branch feature aggregation based on multiple weighting for speaker verification | |
WO2021088176A1 (en) | Binary multi-band power distribution-based low signal-to-noise ratio sound event detection method | |
CN115331661A (en) | Voiceprint recognition backdoor attack defense method based on feature clustering analysis and feature dimension reduction | |
CN115064175A (en) | Speaker recognition method | |
CN112967712A (en) | Synthetic speech detection method based on autoregressive model coefficient | |
Chun et al. | Research on music classification based on MFCC and BP neural network | |
CN111783534B (en) | Sleep stage method based on deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |