CN113488027A - Hierarchical classification generated audio tracing method, storage medium and computer equipment - Google Patents

Hierarchical classification generated audio tracing method, storage medium and computer equipment

Info

Publication number
CN113488027A
Authority
CN
China
Prior art keywords
audio
training
classification
trained
generated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111046475.0A
Other languages
Chinese (zh)
Inventor
陶建华
马浩鑫
易江燕
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202111046475.0A
Publication of CN113488027A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L15/06 - Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 - Training
    • G10L15/08 - Speech classification or search
    • G10L15/16 - Speech classification or search using artificial neural networks

Abstract

The invention provides a hierarchical classification generated audio tracing method, a storage medium and computer equipment, comprising the following steps: extracting acoustic features of the training audio; inputting the acoustic features of the training audio into a binary classification model and training it to obtain a trained binary classification model; labeling the generated training audio with different labels according to its generation method, and inputting the acoustic features of the generated training audio into a multi-class model for training to obtain a trained multi-class model; and extracting acoustic features of the test audio, inputting them into the trained binary classification model, and judging whether the test audio is real or generated speech; if it is judged real, prediction terminates, and if it is judged generated, the acoustic features of the test audio are input into the trained multi-class model to predict the type of its generation source.

Description

Hierarchical classification generated audio tracing method, storage medium and computer equipment
Technical Field
The invention relates to the fields of speech processing and image processing, and in particular to a hierarchical classification generated audio tracing method.
Background
Current generated-speech detection networks output only a binary real/fake result. In practical forensic settings such as public security and court evidence collection, however, people care not only about whether the audio is authentic but also, if it is synthetic, re-recorded, or otherwise generated, about what its generation source is. Research on multi-class audio source tracing is still blank at present.
Generated-speech detection judges whether an input speech sample is generated speech and outputs a binary detection result. Current detection schemes mainly pursue two improvements: more discriminative acoustic features and more effective classifiers. Although end-to-end models in recent years no longer separate the feature extraction module from the classifier, mainstream research still adopts a feature-extractor-plus-classifier architecture. At the classifier level, most studies select a neural network for classification training, such as a residual neural network (ResNet) or a lightweight convolutional neural network (LightCNN), and attend only to judging speech authenticity. In practical application scenarios such as forensics, however, people care not only about the authenticity of the audio but also need to know the source of the fake audio (i.e., which synthesis method generated the fake speech, which company's technology generated the audio, which model of recording device recorded it, etc.).
Publication No. CN113299315A provides a method for generating speech features through continuous learning without storing raw data, comprising: collecting audio data and extracting acoustic features to obtain linear cepstral coefficient features; training a deep learning network model with the linear cepstral coefficient features to obtain a source-domain model; and adding a regularization loss to the training loss function of the source-domain model to constrain the direction of parameter optimization, then updating the parameters of the source-domain model with newly collected audio data to obtain a target-domain model.
That method is characterized by continuously updating the model: the original model is updated with new data while retaining memory of old knowledge. Its innovation lies in the continual-learning training and updating process; what is learned are features of generated speech, after which a classification task is performed and the input audio yields a real/generated classification result. Publication No. CN113314148A provides a lightweight neural network generated-speech discrimination method and system based on raw waveforms, comprising: sampling an audio file at a fixed sampling rate to obtain its raw waveform points, and segmenting them into raw audio frames to obtain a raw audio frame sequence; constructing a search network whose first layer is a fixed one-dimensional convolution layer, followed by a structure of stacked conventional modules and dimension-reduction modules, then an average pooling layer, and finally a fully connected layer; inputting the raw audio frame sequence into the search network and searching for the optimal operation connections between the neurons in the conventional and dimension-reduction modules to obtain an optimal model structure; and training the searched optimal model structure with the raw audio frame sequence to obtain a trained search network.
That method emphasizes the process of generating the model through training: via network structure search, raw audio serves as the input of a network that acts as both feature extractor and classifier, yielding an end-to-end network structure, while the search removes the redundancy of hand-designed networks. Its main innovation is the generation of the model structure; once the model is complete, it still takes audio as input and performs real/generated speech classification.
The technical problem to be solved by the present application is tracing the source of generated speech, not discriminating real from generated speech.
The prior art has the following defects: it produces only a binary real/generated result, which is not detailed enough; no generation source type is given, so audio tracing cannot be performed and no basis of judgment can be provided for judicial evidence collection. Audio tracing is of great significance for judicial forensics; with only a binary real/generated result and no generation source type, the persuasiveness of audio evidence is greatly weakened.
Disclosure of Invention
In view of the above, a first aspect of the present invention provides a hierarchical classification generated audio tracing method, comprising:
S1: extracting acoustic features of the training audio;
S2: inputting the acoustic features of the training audio into a binary classification model and training it to obtain a trained binary classification model;
S3: labeling the generated training audio with different labels according to its generation method, and inputting the acoustic features of the generated training audio into a multi-class model for training to obtain a trained multi-class model;
S4: extracting acoustic features of the test audio, inputting them into the trained binary classification model, and judging whether the test audio is real or generated speech; if it is judged real, prediction terminates; if it is judged generated, the acoustic features of the test audio are input into the trained multi-class model to predict the type of its generation source.
In some specific embodiments, the specific method for extracting the acoustic features of the training audio comprises: sampling the training audio into raw waveform points, then performing pre-emphasis, framing, windowing, fast Fourier transform, linear filter bank filtering, taking the logarithm, and discrete cosine transform to obtain 60-dimensional linear frequency cepstral coefficient (LFCC) features of the audio.
In some specific embodiments, the window length used for windowing is 25 frames.
In some specific embodiments, the fast Fourier transform is a 512-point fast Fourier transform (FFT).
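The extraction pipeline in these embodiments can be sketched in NumPy as follows. Only the 512-point FFT and the 60-dimensional output are stated above; the frame length, hop, filter count, and the name `extract_lfcc` are illustrative assumptions:

```python
import numpy as np

def extract_lfcc(signal, n_fft=512, frame_len=400, hop=160,
                 n_filters=64, n_ceps=60):
    """Sketch of the LFCC pipeline: pre-emphasis, framing, windowing,
    FFT, linear filter bank, log, DCT. Defaults besides n_fft and
    n_ceps are illustrative, not taken from the patent."""
    # Pre-emphasis: y[t] = x[t] - 0.97 * x[t-1]
    y = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Framing plus Hamming windowing
    n_frames = 1 + (len(y) - frame_len) // hop
    frames = np.stack([y[i * hop : i * hop + frame_len]
                       for i in range(n_frames)]) * np.hamming(frame_len)
    # Magnitude spectrum via 512-point FFT
    mag = np.abs(np.fft.rfft(frames, n_fft))
    # Linearly spaced triangular filter bank (linear, not mel: the "L" in LFCC)
    edges = np.linspace(0, n_fft // 2, n_filters + 2).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for m in range(1, n_filters + 1):
        lo, mid, hi = edges[m - 1], edges[m], edges[m + 1]
        fbank[m - 1, lo:mid] = (np.arange(lo, mid) - lo) / max(mid - lo, 1)
        fbank[m - 1, mid:hi] = (hi - np.arange(mid, hi)) / max(hi - mid, 1)
    # Log filter-bank energies, then DCT-II to obtain cepstral coefficients
    log_e = np.log(mag @ fbank.T + 1e-10)
    k = np.arange(n_filters)
    dct = np.cos(np.pi * np.outer(np.arange(n_ceps),
                                  (2 * k + 1) / (2 * n_filters)))
    return log_e @ dct.T  # shape: (n_frames, n_ceps)
```

For a one-second 16 kHz input this yields a frames-by-60 feature matrix of the kind the classification models consume.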
In some specific embodiments, the binary classification model employs a LightCNN network.
In some specific embodiments, the binary classification model is trained for 150 rounds with an adaptive moment estimation (adam) optimizer, an initial learning rate of 0.001, and a batch size of 128.
In some specific embodiments, the multi-class model employs a ResNet18 network.
In some specific embodiments, the multi-class model is trained for 100 rounds with the adam optimizer, an initial learning rate of 0.001, and a batch size of 128.
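The optimizer settings shared by both models (adam, initial learning rate 0.001, batch size 128) can be illustrated with a minimal NumPy adam update on a toy batch. The LightCNN and ResNet18 networks themselves are not reproduced here; the logistic-regression objective is purely an assumption for demonstration:

```python
import numpy as np

def adam_step(w, grad, state, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    # One parameter update of the adaptive moment estimation (adam)
    # optimizer, using the 0.001 initial learning rate from the embodiments.
    state["t"] += 1
    state["m"] = beta1 * state["m"] + (1 - beta1) * grad
    state["v"] = beta2 * state["v"] + (1 - beta2) * grad ** 2
    m_hat = state["m"] / (1 - beta1 ** state["t"])
    v_hat = state["v"] / (1 - beta2 ** state["t"])
    return w - lr * m_hat / (np.sqrt(v_hat) + eps)

# Toy binary objective: logistic regression on one batch of 128 samples
# (the batch size from the embodiments); the actual models are CNNs.
rng = np.random.default_rng(0)
X = rng.normal(size=(128, 8))
y = (X[:, 0] > 0).astype(float)
w = np.zeros(8)
state = {"t": 0, "m": np.zeros(8), "v": np.zeros(8)}
for _ in range(150):                      # 150 training rounds
    p = 1 / (1 + np.exp(-X @ w))          # sigmoid predictions
    grad = X.T @ (p - y) / len(y)         # cross-entropy gradient
    w = adam_step(w, grad, state)
p = 1 / (1 + np.exp(-X @ w))
final_loss = -np.mean(y * np.log(p) + (1 - y) * np.log(1 - p))
```

The loop drives the cross-entropy below its ln(2) starting value, which is all the sketch is meant to show; real training would iterate over many batches per round.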
A second aspect of the present invention provides a readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the hierarchical classification generated audio tracing method according to the first aspect.
A third aspect of the invention provides computer equipment comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor is configured to implement the steps of the hierarchical classification generated audio tracing method according to the first aspect when executing the computer program stored in the memory.
Compared with the prior art, the technical scheme provided by the embodiment of the application has the following advantages:
The method provides a basis of judgment for audio evidence collection: on top of judging that audio is generated, it further provides a criterion for the type of generation source, supporting audio forensics.
Drawings
Fig. 1 is a flowchart of a hierarchical classification generated audio tracing method according to an embodiment of the present invention;
Fig. 2 is a diagram of the prediction process of the hierarchical classification generated audio tracing method according to an embodiment of the present invention.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numerals in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with the present invention; rather, they are merely examples of apparatus and methods consistent with certain aspects of the invention, as detailed in the appended claims.
Example 1:
As illustrated in Fig. 1, this embodiment of the present application provides a hierarchical classification generated audio tracing method:
S1: extracting acoustic features of the training audio;
S2: inputting the acoustic features of the training audio into a binary classification model and training it to obtain a trained binary classification model;
S3: labeling the generated training audio with different labels according to its generation method, and inputting the acoustic features of the generated training audio into a multi-class model for training to obtain a trained multi-class model;
S4: extracting acoustic features of the test audio, inputting them into the trained binary classification model, and judging whether the test audio is real or generated speech; if it is judged real, prediction terminates; if it is judged generated, the acoustic features of the test audio are input into the trained multi-class model to predict the type of its generation source.
In some specific embodiments, the specific method for extracting the acoustic features of the training audio comprises: sampling the training audio into raw waveform points, then performing pre-emphasis, framing, windowing, fast Fourier transform, linear filter bank filtering, taking the logarithm, and discrete cosine transform to obtain 60-dimensional linear frequency cepstral coefficient (LFCC) features of the audio.
In some specific embodiments, the window length used for windowing is 25 frames.
In some specific embodiments, the fast Fourier transform is a 512-point fast Fourier transform (FFT).
In some specific embodiments, the binary classification model employs a LightCNN network.
In some specific embodiments, the binary classification model is trained for 150 rounds with an adaptive moment estimation (adam) optimizer, an initial learning rate of 0.001, and a batch size of 128.
In some specific embodiments, the multi-class model employs a ResNet18 network.
In some specific embodiments, the multi-class model is trained for 100 rounds with the adam optimizer, an initial learning rate of 0.001, and a batch size of 128.
Example 2:
As shown in Fig. 1, for some specific application fields, this embodiment adopts the scheme described in Example 1 and provides a specific embodiment of the hierarchical classification generated audio tracing method; the specific method and steps are as follows:
s1: the method for extracting the acoustic features of the training audio comprises the following steps: sampling training audio to an original waveform point, then performing pre-emphasis, framing, windowing, fast Fourier transform, linear filter bank, logarithm taking and discrete cosine transform to obtain 60-dimensional LFCC characteristics of the audio, wherein the window length is 25 frames, and 512-dimensional FFT is performed;
s2: inputting the acoustic characteristics of the training audio into a two-classification model, and performing two-classification model training to obtain a trained two-classification model; the two-classification model adopts a LightCNN network, the two-classification model is trained for 150 rounds, an adam optimizer is selected, the initial learning rate is set to be 0.001, and the batch data size is 128;
s3: marking different labels on the generated training audio according to the generation method of the training audio, and inputting the acoustic characteristics of the generated training audio into a multi-classification model for training to obtain a trained multi-classification model; the multi-classification model adopts a ResNet18 network, the multi-classification model is trained for 100 rounds, an adam optimizer is selected, the initial learning rate is set to be 0.001, and the batch data size is 128;
s4: as shown in fig. 2, extracting the acoustic features of the test audio, inputting the acoustic features of the test audio into the trained two-class model, performing the discrimination of the real/generated speech, if the acoustic features of the test audio are discriminated to be real, terminating the prediction, and if the acoustic features of the test audio are discriminated to be generated, inputting the acoustic features of the generated test audio into the trained multi-class model to predict the generation source type of the multi-class model.
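The two-stage prediction of step S4 can be sketched as follows. The `.predict()` interface and the stub models are illustrative assumptions standing in for the trained LightCNN and ResNet18:

```python
def trace_audio(features, binary_model, multi_model):
    """Hierarchical prediction (sketch): the binary classifier gates
    the multi-class source tracer, as in step S4."""
    if binary_model.predict(features) == "real":
        return {"verdict": "real", "source": None}   # prediction terminates
    # Only audio judged "generated" reaches source attribution
    return {"verdict": "generated", "source": multi_model.predict(features)}

class StubBinary:
    # Stand-in for the trained LightCNN: calls audio "generated"
    # when the mean feature value is negative (arbitrary toy rule).
    def predict(self, feats):
        return "generated" if sum(feats) / len(feats) < 0 else "real"

class StubMulti:
    # Stand-in for the trained ResNet18 source tracer.
    def predict(self, feats):
        return "synthesis-method-A"   # hypothetical source label
```

With real models, the first stage terminates prediction for audio judged real, and only generated audio receives a source label.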
Example 3:
The present invention further provides a readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the hierarchical classification generated audio tracing method according to the embodiment of the first aspect.
Example 4:
The invention additionally provides computer equipment comprising a processor and a memory, wherein the memory is used to store a computer program, and the processor is configured to implement the steps of the hierarchical classification generated audio tracing method according to the embodiment of the first aspect when executing the computer program stored in the memory.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present invention. The word "if" as used herein may be interpreted as "when", "upon", or "in response to determining", depending on the context.
The implementation process of the functions and actions of each unit in the above device is specifically described in the implementation process of the corresponding step in the above method, and is not described herein again.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the invention. One of ordinary skill in the art can understand and implement it without inventive effort.
Embodiments of the subject matter and the functional operations described in this specification can be implemented in: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this specification and their structural equivalents, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions, encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on an artificially generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by the data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this specification can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general and/or special purpose microprocessors, or any other type of central processing unit. Generally, a central processing unit will receive instructions and data from a read-only memory and/or a random access memory. The basic components of a computer include a central processing unit for implementing or executing instructions and one or more memory devices for storing instructions and data. Generally, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks. However, a computer does not necessarily have such a device. Moreover, a computer may be embedded in another device, e.g., a mobile telephone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer-readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices, including by way of example semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., an internal hard disk or a removable disk), magneto-optical disks, and CD ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this specification contains many specific implementation details, these should not be construed as limitations on the scope of any invention or of what may be claimed, but rather as descriptions of features specific to particular embodiments of particular inventions. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. In other instances, features described in connection with one embodiment may be implemented as discrete components or in any suitable subcombination. Moreover, although features may be described above as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, while operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Further, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (10)

1. A hierarchical classification generated audio tracing method, the method comprising:
S1: extracting acoustic features of the training audio;
S2: inputting the acoustic features of the training audio into a binary classification model and training it to obtain a trained binary classification model;
S3: labeling the generated training audio with different labels according to its generation method, and inputting the acoustic features of the generated training audio into a multi-class model for training to obtain a trained multi-class model;
S4: extracting acoustic features of the test audio, inputting them into the trained binary classification model, and judging whether the test audio is real or generated speech; if it is judged real, prediction terminates; if it is judged generated, the acoustic features of the test audio are input into the trained multi-class model to predict the type of its generation source.
2. The hierarchical classification generated audio tracing method according to claim 1, wherein the specific method for extracting the acoustic features of the training audio comprises: sampling the training audio into raw waveform points, then performing pre-emphasis, framing, windowing, fast Fourier transform, linear filter bank filtering, taking the logarithm, and discrete cosine transform to obtain 60-dimensional linear frequency cepstral coefficient features of the audio.
3. The hierarchical classification generated audio tracing method according to claim 2, wherein the window length used for windowing is 25 frames.
4. The hierarchical classification generated audio tracing method according to claim 3, wherein the fast Fourier transform is a 512-point fast Fourier transform.
5. The hierarchical classification generated audio tracing method according to claim 1, wherein the binary classification model employs a lightweight convolutional neural network.
6. The hierarchical classification generated audio tracing method according to claim 5, wherein the binary classification model is trained for 150 rounds with an adaptive moment estimation optimizer, an initial learning rate of 0.001, and a batch size of 128.
7. The hierarchical classification generated audio tracing method according to claim 1, wherein the multi-class model employs an 18-layer residual neural network.
8. The hierarchical classification generated audio tracing method according to claim 7, wherein the multi-class model is trained for 100 rounds with an adaptive moment estimation optimizer, an initial learning rate of 0.001, and a batch size of 128.
9. A readable storage medium storing one or more programs, the one or more programs being executable by one or more processors to implement the steps of the hierarchical classification generated audio tracing method of any one of claims 1-8.
10. Computer equipment comprising a processor and a memory, wherein the memory is configured to store a computer program, and the processor, when executing the computer program stored on the memory, is configured to implement the steps of the hierarchical classification generated audio tracing method according to any one of claims 1 to 8.
CN202111046475.0A 2021-09-08 2021-09-08 Hierarchical classification generated audio tracing method, storage medium and computer equipment Pending CN113488027A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111046475.0A CN113488027A (en) 2021-09-08 2021-09-08 Hierarchical classification generated audio tracing method, storage medium and computer equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111046475.0A CN113488027A (en) 2021-09-08 2021-09-08 Hierarchical classification generated audio tracing method, storage medium and computer equipment

Publications (1)

Publication Number Publication Date
CN113488027A 2021-10-08

Family

ID=77947339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111046475.0A Pending CN113488027A (en) 2021-09-08 2021-09-08 Hierarchical classification generated audio tracing method, storage medium and computer equipment

Country Status (1)

Country Link
CN (1) CN113488027A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083422A (en) * 2022-07-21 2022-09-20 中国科学院自动化研究所 Voice traceability evidence obtaining method and device, equipment and storage medium

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180254046A1 (en) * 2017-03-03 2018-09-06 Pindrop Security, Inc. Method and apparatus for detecting spoofing conditions
CN110491391A (en) * 2019-07-02 2019-11-22 厦门大学 A kind of deception speech detection method based on deep neural network
CN110600053A (en) * 2019-07-30 2019-12-20 广东工业大学 Cerebral stroke dysarthria risk prediction method based on ResNet and LSTM network
CN110808033A (en) * 2019-09-25 2020-02-18 武汉科技大学 Audio classification method based on dual data enhancement strategy
CN111564163A (en) * 2020-05-08 2020-08-21 宁波大学 RNN-based voice detection method for various counterfeit operations
CN111613246A (en) * 2020-05-28 2020-09-01 腾讯音乐娱乐科技(深圳)有限公司 Audio classification prompting method and related equipment
CN111724810A (en) * 2019-03-19 2020-09-29 杭州海康威视数字技术股份有限公司 Audio classification method and device
US20200322377A1 (en) * 2019-04-08 2020-10-08 Pindrop Security, Inc. Systems and methods for end-to-end architectures for voice spoofing detection
CN111859011A (en) * 2020-07-16 2020-10-30 腾讯音乐娱乐科技(深圳)有限公司 Audio processing method and device, storage medium and electronic equipment
CN112201255A (en) * 2020-09-30 2021-01-08 浙江大学 Voice signal spectrum characteristic and deep learning voice spoofing attack detection method
CN112712809A (en) * 2021-03-29 2021-04-27 北京远鉴信息技术有限公司 Voice detection method and device, electronic equipment and storage medium
CN113128619A (en) * 2021-05-10 2021-07-16 北京瑞莱智慧科技有限公司 Method for training detection model of counterfeit sample, method for identifying counterfeit sample, apparatus, medium, and device
CN113241079A (en) * 2021-04-29 2021-08-10 江西师范大学 Voice spoofing detection method based on residual error neural network
CN113284508A (en) * 2021-07-21 2021-08-20 中国科学院自动化研究所 Hierarchical differentiation based generated audio detection system
CN113299315A (en) * 2021-07-27 2021-08-24 中国科学院自动化研究所 Method for generating voice features through continuous learning without original data storage

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Zhang Li et al.: "Disguised speech detection algorithm based on high-frequency speech information", Data Communication *
Zhang Xiongwei et al.: "Research status and prospects of speech spoofing detection methods", Journal of Data Acquisition and Processing *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115083422A (en) * 2022-07-21 2022-09-20 中国科学院自动化研究所 Voice traceability evidence obtaining method and device, equipment and storage medium
CN115083422B (en) * 2022-07-21 2022-11-15 中国科学院自动化研究所 Voice traceability evidence obtaining method and device, equipment and storage medium

Similar Documents

Publication Publication Date Title
Kong et al. Weakly labelled audioset tagging with attention neural networks
Lidy et al. CQT-based Convolutional Neural Networks for Audio Scene Classification.
CN110852215B (en) Multi-mode emotion recognition method and system and storage medium
CN111444967B (en) Training method, generating method, device, equipment and medium for generating countermeasure network
CN112912897A (en) Sound classification system
CN104903954A (en) Speaker verification and identification using artificial neural network-based sub-phonetic unit discrimination
CN112528920A (en) Pet image emotion recognition method based on depth residual error network
Cartwright et al. Tricycle: Audio representation learning from sensor network data using self-supervision
Imran et al. An analysis of audio classification techniques using deep learning architectures
Thornton Audio recognition using mel spectrograms and convolution neural networks
Oo et al. Fusion of Log-Mel Spectrogram and GLCM feature in acoustic scene classification
Wang et al. Automated call detection for acoustic surveys with structured calls of varying length
CN113488027A (en) Hierarchical classification generated audio tracing method, storage medium and computer equipment
Sattigeri et al. A scalable feature learning and tag prediction framework for natural environment sounds
McLoughlin et al. Early detection of continuous and partial audio events using CNN
Xie et al. Investigation of acoustic and visual features for frog call classification
CN113314148B (en) Light-weight neural network generated voice identification method and system based on original waveform
CN113111855B (en) Multi-mode emotion recognition method and device, electronic equipment and storage medium
CN102308307B (en) Method for pattern discovery and recognition
Islam et al. DCNN-LSTM based audio classification combining multiple feature engineering and data augmentation techniques
Xie et al. Image processing and classification procedure for the analysis of australian frog vocalisations
Rajesh et al. Combined evidence of MFCC and CRP features using machine learning algorithms for singer identification
Guo UL-net: Fusion Spatial and Temporal Features for Bird Voice Detection
CN116052725B (en) Fine granularity borborygmus recognition method and device based on deep neural network
Ajitha et al. Emotion Recognition in Speech Using MFCC and Classifiers

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination