CN111755014A - Domain-adaptive replay attack detection method and system - Google Patents

Domain-adaptive replay attack detection method and system

Publication number
CN111755014A
Authority
CN
China
Prior art keywords
replay
shared
module
detection module
voiceprint feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010630019.XA
Other languages
Chinese (zh)
Other versions
CN111755014B (en)
Inventor
伍强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sichuan Changhong Electric Co Ltd
Original Assignee
Sichuan Changhong Electric Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Sichuan Changhong Electric Co Ltd
Priority to CN202010630019.XA
Publication of CN111755014A
Application granted
Publication of CN111755014B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/18 Artificial neural networks; Connectionist approaches
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/30 Authentication, i.e. establishing the identity or authorisation of security principals
    • G06F21/31 User authentication
    • G06F21/32 User authentication using biometric data, e.g. fingerprints, iris scans or voiceprints
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 Payment architectures, schemes or protocols
    • G06Q20/38 Payment protocols; Details thereof
    • G06Q20/40 Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 Transaction verification
    • G06Q20/4014 Identity check for transactions
    • G06Q20/40145 Biometric identity checks
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861 Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Biomedical Technology (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Circuit For Audible Band Transducer (AREA)
  • Soundproofing, Sound Blocking, And Sound Damping (AREA)

Abstract

The invention discloses a domain-adaptive method for detecting recording replay attacks, comprising the following steps: extracting acoustic features from at least one segment of a recording; extracting a shared voiceprint feature vector from the acoustic features; and detecting, from the shared voiceprint feature vector and by a domain-adaptive method, whether the recording is a replayed recording. The invention keeps the replay attack detection system robust when the replay devices, replay environments, and speakers vary across domains.

Description

Domain-adaptive replay attack detection method and system
Technical Field
The invention relates to the technical field of speech signal processing, and in particular to a domain-adaptive recording replay attack detection method and system.
Background
In recent years, with the rapid development of artificial intelligence technology, more and more products using it have appeared in people's daily life; smart speakers in particular have risen to prominence. Voiceprint recognition is standard on almost all smart speakers, and a user can log in to an account, pay for purchases, and so on with his or her own voice. Recording replay attack detection is an extremely important part of a voiceprint recognition system: it judges whether a voice comes from a live person or from a recording. Because replay devices, environments, and speakers are diverse, this domain diversity degrades the performance of replay attack detection systems.
Disclosure of Invention
To address the domain diversity of recording replay attacks, the invention provides a domain-adaptive recording replay attack detection method and system. A shared voiceprint feature extraction module is designed: the acoustic features of the speech are input into the shared module, the shared voiceprint features are extracted, and these features are then input into four sub-classification modules: a replay attack detection module, a replay device detection module, a replay environment detection module, and a replay speaker detection module. The error gradients of the replay attack detection module are fed back directly to the replay attack detection module and to the shared voiceprint feature extraction module. The error gradients of the replay device detection module, the replay environment detection module, and the replay speaker detection module are fed back to their respective modules and, in addition, are inverted before being fed back to the shared voiceprint feature extraction module. The method and system strengthen the domain adaptivity of the system and improve its replay attack detection capability.
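The inverted feedback described above resembles what is commonly called gradient reversal. The following is a minimal sketch of that idea only, not the patent's implementation: the function names, the list-based gradients, and the numeric values are all hypothetical, and the per-task scaling factors are assumed to play the role of the weights α1, α2, α3 defined later in the text.

```python
from typing import List, Sequence


def reverse_gradient(grad: Sequence[float], alpha: float) -> List[float]:
    """Backward step of gradient reversal: flip the sign and scale by alpha."""
    return [-alpha * g for g in grad]


def shared_module_gradient(
    grad_attack: Sequence[float],
    domain_grads: Sequence[Sequence[float]],
    alphas: Sequence[float],
) -> List[float]:
    """Combine what the shared module receives from all four heads.

    The replay attack gradient is passed through unchanged; each domain
    gradient (device, speaker, environment) is inverted first.
    """
    total = list(grad_attack)
    for grad, alpha in zip(domain_grads, alphas):
        for i, g in enumerate(reverse_gradient(grad, alpha)):
            total[i] += g
    return total


# Hypothetical gradients of each head's error w.r.t. a 2-element shared weight.
grad_attack = [1.0, 2.0]
domain_grads = [[0.5, 0.5], [1.0, 0.0], [0.0, 2.0]]  # device, speaker, environment
alphas = [1.0, 0.5, 0.25]
combined = shared_module_gradient(grad_attack, domain_grads, alphas)
```

Descending along `combined` lowers the replay attack error while raising the device, speaker, and environment errors, which pushes the shared features toward being uninformative about the domain.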
The invention realizes the purpose through the following technical scheme:
A domain-adaptive recording replay attack detection method comprises the following steps:
calculating and extracting acoustic features from at least one segment of the recording, wherein the acoustic features comprise Mel-Frequency Cepstral Coefficients (MFCC) or Power-Normalized Cepstral Coefficients (PNCC);
extracting a shared voiceprint feature vector from the acoustic features;
and detecting, from the shared voiceprint feature vector and by a domain-adaptive method, whether the recording is a replayed recording.
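The acoustic feature step can be illustrated with a simplified MFCC computation for one frame. This is a sketch under stated assumptions rather than the feature extractor used in the patent: the frame length, sample rate, filter count, and number of coefficients are illustrative choices, the DFT is computed directly for brevity, and pre-emphasis and liftering are omitted. PNCC is not shown.

```python
import math
from typing import List


def hz_to_mel(f: float) -> float:
    return 2595.0 * math.log10(1.0 + f / 700.0)


def mel_to_hz(m: float) -> float:
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def power_spectrum(frame: List[float]) -> List[float]:
    """Power spectrum of a Hamming-windowed frame via a direct DFT (O(n^2))."""
    n = len(frame)
    win = [x * (0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)))
           for i, x in enumerate(frame)]
    spec = []
    for k in range(n // 2 + 1):
        re = sum(x * math.cos(2 * math.pi * k * i / n) for i, x in enumerate(win))
        im = -sum(x * math.sin(2 * math.pi * k * i / n) for i, x in enumerate(win))
        spec.append((re * re + im * im) / n)
    return spec


def mel_filterbank(n_filters: int, n_fft: int, sample_rate: int) -> List[List[float]]:
    """Triangular filters whose centres are evenly spaced on the mel scale."""
    top = hz_to_mel(sample_rate / 2)
    points = [mel_to_hz(top * i / (n_filters + 1)) for i in range(n_filters + 2)]
    bins = [int(math.floor((n_fft + 1) * p / sample_rate)) for p in points]
    bank = []
    for m in range(1, n_filters + 1):
        lo, mid, hi = bins[m - 1], bins[m], bins[m + 1]
        row = [0.0] * (n_fft // 2 + 1)
        for k in range(lo, mid):          # rising edge
            row[k] = (k - lo) / (mid - lo)
        for k in range(mid, hi):          # falling edge
            row[k] = (hi - k) / (hi - mid)
        bank.append(row)
    return bank


def mfcc(frame: List[float], sample_rate: int,
         n_filters: int = 20, n_ceps: int = 13) -> List[float]:
    spec = power_spectrum(frame)
    bank = mel_filterbank(n_filters, len(frame), sample_rate)
    # Log filterbank energies, floored to avoid log(0).
    log_e = [math.log(max(sum(w * p for w, p in zip(row, spec)), 1e-10))
             for row in bank]
    # DCT-II decorrelates the log energies; keep the first n_ceps coefficients.
    return [sum(e * math.cos(math.pi * c * (j + 0.5) / n_filters)
                for j, e in enumerate(log_e))
            for c in range(n_ceps)]


# Hypothetical input: one 256-sample frame of a 440 Hz tone sampled at 8 kHz.
frame = [math.sin(2 * math.pi * 440 * i / 8000) for i in range(256)]
coeffs = mfcc(frame, 8000)
```

A recording would be split into overlapping frames and each frame converted this way, giving the frame-by-coefficient matrix that the shared voiceprint feature module consumes.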
Further, in a detection phase, the shared voiceprint feature vectors are used to detect corresponding targets of at least one domain adaptive countermeasure task associated with the replay attack detection, the domain adaptive countermeasure task comprising: a playback device detection task, a playback environment detection task, and a playback speaker detection task.
Furthermore, the shared voiceprint feature vector is extracted through a shared voiceprint feature module, whether the record is replayed or not is detected through a replay attack detection module, the replay device detection task is achieved through a replay device detection module, the replay environment detection task is achieved through a replay environment detection module, and the replay speaker detection task is achieved through a replay speaker detection module.
Further, the shared voiceprint feature module, the replay attack detection module, the replay device detection module, the replay environment detection module, and the replay speaker detection module are each formed of a deep neural network comprising one or a combination of a Convolutional Neural Network (CNN), a recurrent neural network (RNN, LSTM, GRU), and a Time-Delay Neural Network (TDNN).
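As one illustration of the listed network types, a time-delay layer followed by pooling over time turns a variable-length sequence of acoustic frames into a fixed-length vector. The sketch below assumes a single layer, a (-1, 0, 1) context, ReLU activation, and mean pooling; none of these choices is specified by the patent.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def tdnn_layer(frames: Matrix, weights: Sequence[Matrix],
               bias: List[float], offsets: Sequence[int]) -> Matrix:
    """One time-delay layer: each output frame mixes the input frames found
    at the given time offsets. weights[i] is the (outputs x inputs) matrix
    applied to the frame at offsets[i]."""
    start = max(0, -min(offsets))
    end = len(frames) - max(0, max(offsets))
    out = []
    for t in range(start, end):            # positions with a full context
        row = []
        for h in range(len(bias)):
            acc = bias[h]
            for w, off in zip(weights, offsets):
                acc += sum(wi * xi for wi, xi in zip(w[h], frames[t + off]))
            row.append(max(0.0, acc))      # ReLU
        out.append(row)
    return out


def mean_pool(frames: Matrix) -> List[float]:
    """Average over time to get one fixed-length vector per recording."""
    return [sum(col) / len(frames) for col in zip(*frames)]


# Hypothetical toy input: four 1-dimensional frames, one output unit.
frames = [[1.0], [2.0], [3.0], [4.0]]
weights = [[[1.0]], [[1.0]], [[1.0]]]      # one matrix per offset
hidden = tdnn_layer(frames, weights, bias=[0.0], offsets=(-1, 0, 1))
vector = mean_pool(hidden)
```

With these toy weights each output frame is the sum of three neighbouring inputs, so `hidden` has two frames and `vector` has one element.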
Further, the method also comprises a training method for each module. The weight of the shared voiceprint feature module is W_f, the weight of the replay attack detection module is W_a, the weight of the replay device detection module is W_d, the weight of the replay speaker detection module is W_s, and the weight of the replay environment detection module is W_e. The training steps are as follows:
S0: inputting the acoustic features of the recording into the shared voiceprint feature module and extracting the shared voiceprint feature vector;
S1: inputting the shared voiceprint feature vector from S0 into the replay attack detection module and outputting a classification error L_a;
S2: inputting the shared voiceprint feature vector from S0 into the replay device detection module and outputting a classification error L_d;
S3: inputting the shared voiceprint feature vector from S0 into the replay speaker detection module and outputting a classification error L_s;
S4: inputting the shared voiceprint feature vector from S0 into the replay environment detection module and outputting a classification error L_e;
S5: the update method of each weight is as follows:
Figure BDA0002567058020000031
Figure BDA0002567058020000032
Figure BDA0002567058020000033
Figure BDA0002567058020000034
Figure BDA0002567058020000035
where the step-size coefficient is the learning rate, and α1, α2, α3 are the weights of the replay device detection module, the replay speaker detection module, and the replay environment detection module, respectively.
S6: repeating steps S0 to S5 until the modules converge.
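The five update formulas of S5 are images in the original publication and are not reproduced in this text. A plausible form, consistent with the surrounding description (each detection module descends its own error, while the shared module receives the replay attack gradient unchanged and the three domain gradients inverted), is sketched below. The learning rate symbol μ is an assumption, since the symbol is missing from the text, and the exact form in the patent figures may differ.

```latex
\begin{aligned}
W_a &\leftarrow W_a - \mu \,\frac{\partial L_a}{\partial W_a} \\
W_d &\leftarrow W_d - \mu \,\frac{\partial L_d}{\partial W_d} \\
W_s &\leftarrow W_s - \mu \,\frac{\partial L_s}{\partial W_s} \\
W_e &\leftarrow W_e - \mu \,\frac{\partial L_e}{\partial W_e} \\
W_f &\leftarrow W_f - \mu \left( \frac{\partial L_a}{\partial W_f}
      - \alpha_1 \frac{\partial L_d}{\partial W_f}
      - \alpha_2 \frac{\partial L_s}{\partial W_f}
      - \alpha_3 \frac{\partial L_e}{\partial W_f} \right)
\end{aligned}
```

Under this reading, the minus signs on the α terms are the inversion: the shared module is updated to reduce L_a while increasing L_d, L_s, and L_e.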
An embodiment of the invention further provides a domain-adaptive recording replay attack detection system, which comprises the following modules:
an acoustic feature extraction module, used for extracting acoustic features from at least one segment of the recording;
a shared voiceprint feature extraction module, used for extracting a shared voiceprint feature vector from the acoustic features;
a detection module, used for detecting, from the shared voiceprint feature vector, whether the recording is a replay attack.
Further, the detection module is also used to detect at least one domain-adaptive countermeasure task associated with the replay attack.
Further, the shared voiceprint feature extraction module and the detection module each comprise a deep neural network module.
Further, the system also comprises a training module, used for training the deep neural network modules in the shared voiceprint feature extraction module and the detection module.
The invention has the beneficial effects that:
The invention addresses the performance degradation of recording replay attack detection systems caused by the domain diversity of replay devices, environments, and speakers; the detection system remains robust when the replay devices, environments, and speakers vary across domains.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments or of the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1: a schematic diagram of the domain-adaptive replay attack detection method;
FIG. 2: a schematic diagram of the training method in the domain-adaptive replay attack detection method;
FIG. 3: a schematic structural diagram of the domain-adaptive replay attack detection system.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more apparent, the technical solutions of the present invention will be described in detail below. It is to be understood that the described embodiments are merely exemplary of the invention, and not restrictive of the full scope of the invention. All other embodiments, which can be derived by a person skilled in the art from the examples given herein without any inventive step, are within the scope of the present invention.
Example one
A domain-adaptive replay attack detection method proposed by the present invention is described with reference to fig. 1 and 2, where fig. 1 shows a flowchart of the replay attack detection method and fig. 2 shows a training flowchart of the domain-adaptive replay attack detection method.
In step S101, acoustic features are extracted from at least one segment of the recording; the acoustic features include Mel-Frequency Cepstral Coefficients (MFCC) or Power-Normalized Cepstral Coefficients (PNCC);
in step S102, a shared voiceprint feature vector is extracted from the acoustic features extracted in step S101;
in step S103, whether the recording is a replay attack is detected from the shared voiceprint feature vector extracted in step S102. In the detection phase, the shared voiceprint feature vector is also used to detect the corresponding targets of at least one domain-adaptive countermeasure task associated with replay attack detection, including but not limited to a replay device detection task, a replay environment detection task, and a replay speaker detection task, and the detection results of all domain-adaptive countermeasure tasks are obtained. The shared voiceprint feature vector is extracted by a shared voiceprint feature module, whether the recording is replayed is detected by a replay attack detection module, the replay device detection task is performed by a replay device detection module, the replay environment detection task is performed by a replay environment detection module, and the replay speaker detection task is performed by a replay speaker detection module. The shared voiceprint feature module, the replay attack detection module, the replay device detection module, the replay environment detection module, and the replay speaker detection module are each formed of a deep neural network comprising one or a combination of Convolutional Neural Networks (CNN), recurrent neural networks (RNN, LSTM, GRU), and Time-Delay Neural Networks (TDNN).
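Steps S102 and S103 amount to one shared embedding feeding several classification heads. The sketch below shows that data flow with deliberately tiny stand-ins: mean pooling plus a single tanh projection for the shared module, and a linear softmax layer per head. The layer shapes, class counts, head names, and zero-initialised (untrained) parameters are assumptions for illustration only.

```python
import math
from typing import Dict, List

Matrix = List[List[float]]


def zeros(rows: int, cols: int) -> Matrix:
    return [[0.0] * cols for _ in range(rows)]


def linear(x: List[float], weights: Matrix, bias: List[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, x)) + b for row, b in zip(weights, bias)]


def softmax(z: List[float]) -> List[float]:
    m = max(z)
    e = [math.exp(v - m) for v in z]
    s = sum(e)
    return [v / s for v in e]


def shared_vector(features: Matrix, weights: Matrix, bias: List[float]) -> List[float]:
    """S102: pool the frame-level features over time, then project them."""
    pooled = [sum(col) / len(features) for col in zip(*features)]
    return [math.tanh(v) for v in linear(pooled, weights, bias)]


def detect(features: Matrix, shared: dict, heads: Dict[str, dict]) -> Dict[str, List[float]]:
    """S103: every head reads the same shared vector and returns class probabilities."""
    vec = shared_vector(features, shared["weights"], shared["bias"])
    return {name: softmax(linear(vec, h["weights"], h["bias"]))
            for name, h in heads.items()}


# Hypothetical sizes: 2-dim acoustic frames, 4-dim shared vector.
shared = {"weights": zeros(4, 2), "bias": [0.0] * 4}
heads = {
    "replay_attack": {"weights": zeros(2, 4), "bias": [0.0] * 2},  # genuine / replay
    "replay_device": {"weights": zeros(3, 4), "bias": [0.0] * 3},
    "replay_environment": {"weights": zeros(3, 4), "bias": [0.0] * 3},
    "replay_speaker": {"weights": zeros(5, 4), "bias": [0.0] * 5},
}
features = [[0.2, 0.1], [0.4, 0.3], [0.6, 0.5]]
probs = detect(features, shared, heads)
```

Because the parameters here are untrained zeros, every head returns a uniform distribution; after training, the `replay_attack` probabilities would drive the accept or reject decision.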
In addition, the method of the present invention further comprises a training method of a shared voiceprint feature module, a replay attack detection module, a replay device detection module, a replay environment detection module and a replay speaker detection module, as shown in fig. 2.
In step S201, a recording sample in the training set, its ground-truth replay label, and the ground-truth labels of the domain-adaptive countermeasure tasks are acquired;
in step S202, the weight of the shared voiceprint feature module is W_f, the weight of the replay attack detection module is W_a, the weight of the replay device detection module is W_d, the weight of the replay speaker detection module is W_s, and the weight of the replay environment detection module is W_e. The acoustic features of the recording are input into the shared voiceprint feature module to extract the shared voiceprint feature vector, which is then input into the replay attack detection module, the replay device detection module, the replay speaker detection module, and the replay environment detection module to obtain the replay detection result and the detection results of the domain-adaptive countermeasure tasks;
in step S203, the replay detection result is compared with the ground-truth replay label to obtain the detection error L_a;
in step S204, the parameters of the replay attack detection module are updated by back-propagation as follows:
Figure BDA0002567058020000051
where the step-size coefficient is the learning rate;
in step S205, the detection results of the domain-adaptive countermeasure tasks are compared with their ground-truth labels to obtain the detection errors L_d, L_s, and L_e;
in step S206, the parameters of the domain-adaptive countermeasure task detection modules are updated by back-propagation as follows:
Figure BDA0002567058020000061
where the step-size coefficient is the learning rate;
in step S207, the parameters of the shared voiceprint feature module are updated by back-propagation using the detection error of the replay attack together with the inverted detection errors of the domain-adaptive countermeasure tasks, as follows:
Figure BDA0002567058020000062
where the step-size coefficient is the learning rate, and α1, α2, α3 are the weights of the replay device detection module, the replay speaker detection module, and the replay environment detection module, respectively.
In step S208, it is determined whether the modules have converged, whether the number of training iterations has reached the set maximum, or whether the module error has fallen to the set minimum error; if any one of these conditions is satisfied, training terminates; otherwise steps S201 to S208 are repeated.
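The training flow S201 to S208 can be sketched end to end on a toy model. Everything specific here is an assumption made to keep the example small: the shared module is a single linear unit producing a scalar feature, each detection module is a one-weight logistic classifier with binary labels, the error is cross-entropy, and the data, learning rate, α values, and thresholds are hypothetical. The actual modules in the patent are deep neural networks.

```python
import math
from typing import Dict, List, Sequence, Tuple

HEADS = ("a", "d", "s", "e")  # replay attack, device, speaker, environment
Sample = Tuple[Sequence[float], Dict[str, int]]


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train(samples: List[Sample], alphas: Dict[str, float], lr: float = 0.1,
          max_iters: int = 200, min_error: float = 0.05, tol: float = 1e-6):
    w_f = [0.1] * len(samples[0][0])            # shared module weights W_f
    heads = {k: [0.1, 0.0] for k in HEADS}      # [weight, bias] for W_a, W_d, W_s, W_e
    prev_loss, loss_a, iters = float("inf"), float("inf"), 0
    for iters in range(1, max_iters + 1):
        total_a = 0.0
        for x, labels in samples:               # S201: sample and its labels
            f = sum(w * v for w, v in zip(w_f, x))   # S202: shared feature
            grad_f = 0.0
            for k in HEADS:
                w, b = heads[k]
                p = sigmoid(w * f + b)
                err = p - labels[k]             # S203/S205: d(error)/d(logit)
                if k == "a":
                    total_a -= math.log(max(p if labels[k] else 1.0 - p, 1e-12))
                    grad_f += err * w           # attack gradient passes unchanged
                else:
                    grad_f -= alphas[k] * err * w   # domain gradient is inverted
                heads[k] = [w - lr * err * f, b - lr * err]      # S204/S206
            w_f = [w - lr * grad_f * v for w, v in zip(w_f, x)]  # S207
        loss_a = total_a / len(samples)         # running average over this pass
        # S208: stop on minimum error or convergence; the loop bound caps iterations.
        if loss_a < min_error or abs(prev_loss - loss_a) < tol:
            break
        prev_loss = loss_a
    return w_f, heads, loss_a, iters


# Hypothetical data: the replay label follows x[0]; the device and environment
# labels follow x[1]; the speaker label follows neither.
samples = [
    ([1.0, 1.0], {"a": 1, "d": 1, "s": 0, "e": 1}),
    ([1.0, -1.0], {"a": 1, "d": 0, "s": 1, "e": 0}),
    ([-1.0, 1.0], {"a": 0, "d": 1, "s": 1, "e": 1}),
    ([-1.0, -1.0], {"a": 0, "d": 0, "s": 0, "e": 0}),
]
w_f, heads, loss_a, iters = train(samples, {"d": 0.1, "s": 0.1, "e": 0.1})
```

Setting every α to zero turns the inverted feedback off, which gives a simple baseline for comparing how much domain information the shared feature retains.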
The domain-adaptive recording replay attack detection method provided by this embodiment keeps the detection system robust when the replay devices, environments, and speakers vary across domains.
Example two
A domain-adaptive replay attack detection system proposed by the present invention is described with reference to fig. 3, which shows its constituent modules. The system includes an acoustic feature extraction module 301, a shared voiceprint feature extraction module 302, a detection module 303, and a training module 304.
The acoustic feature extraction module 301 extracts acoustic features from at least one segment of the recording data or from the whole recording;
the shared voiceprint feature extraction module 302 extracts a shared voiceprint feature vector from the acoustic features produced by the acoustic feature extraction module 301;
the detection module 303 detects, from the shared voiceprint feature vector produced by the shared voiceprint feature extraction module 302, whether the recording is a replayed recording. The detection module 303 may also detect, from the same vector, at least one domain-adaptive countermeasure task associated with the replay attack; these tasks include a replay device detection task, a replay environment detection task, and a replay speaker detection task, and the detection results of all of them are obtained.
The training module 304 trains the deep neural network modules in the shared voiceprint feature extraction module 302 and the detection module 303; replay attack detection and the at least one associated domain-adaptive countermeasure task are trained simultaneously. For the training steps, refer to Example one above.
The domain-adaptive replay attack detection system provided by this embodiment keeps the detection system robust when the replay devices, environments, and speakers vary across domains.
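One way to picture how modules 301 to 303 are wired together is a small container that holds each module as a callable. This is an illustrative sketch, not the patented system: the class name, the type aliases, and the stub modules are hypothetical, and the training module 304 is left out.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Features = List[List[float]]
Embedding = List[float]


@dataclass
class ReplayDetectionSystem:
    acoustic_extractor: Callable[[List[float]], Features]   # module 301
    shared_extractor: Callable[[Features], Embedding]       # module 302
    detectors: Dict[str, Callable[[Embedding], str]]        # module 303, one per task

    def detect(self, recording: List[float]) -> Dict[str, str]:
        """Run 301 -> 302 once, then let every detector read the same vector."""
        embedding = self.shared_extractor(self.acoustic_extractor(recording))
        return {task: detector(embedding) for task, detector in self.detectors.items()}


# Hypothetical stub modules standing in for trained networks.
system = ReplayDetectionSystem(
    acoustic_extractor=lambda rec: [[abs(s)] for s in rec],
    shared_extractor=lambda feats: [sum(f[0] for f in feats) / len(feats)],
    detectors={
        "replay_attack": lambda e: "replay" if e[0] > 0.5 else "genuine",
        "replay_device": lambda e: "unknown",
    },
)
result = system.detect([0.9, -0.8, 1.0])
```

Keeping the detectors in a dictionary makes it easy to add or drop a countermeasure task without touching the shared path.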
Those skilled in the art will understand that all or part of the processes of the above method embodiments may be implemented by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above describes only specific embodiments of the present invention, but the scope of protection is not limited thereto: any change or substitution that a person skilled in the art can readily conceive within the technical scope disclosed here falls within the scope of the present invention. The protection scope of the present invention is therefore defined by the appended claims. The technical features described in the above embodiments can be combined in any suitable manner provided they do not contradict one another; to avoid unnecessary repetition, the possible combinations are not described separately. The various embodiments of the present invention may likewise be combined, and such combinations should be regarded as part of the disclosure as long as they do not depart from the spirit of the invention.

Claims (10)

1. A domain-adaptive replay attack detection method is characterized by comprising the following steps:
extracting acoustic features from at least one recording region of the recording;
extracting a shared voiceprint feature vector from the acoustic features; and
and detecting whether the sound recording is a replay sound recording or not from the shared voiceprint feature vector by a domain adaptive method.
2. The domain-adaptive replay attack detection method of claim 1, wherein the acoustic features include mel-frequency cepstral coefficients or power-normalized cepstral coefficients.
3. The method of claim 1, wherein the shared voiceprint feature vector is used to detect a corresponding target of at least one domain adaptive countermeasure task associated with playback attack detection, the domain adaptive countermeasure task comprising: a playback device detection task, a playback environment detection task, and a playback speaker detection task.
4. The domain-adaptive replay attack detection method of claim 3, wherein the shared voiceprint feature vector is extracted by a shared voiceprint feature extraction module, whether the recording is replayed is detected by a replay attack detection module, the replay device detection task is realized by a replay device detection module, the replay environment detection task is realized by a replay environment detection module, and the replay speaker detection task is realized by a replay speaker detection module.
5. The method and system for domain-adaptive replay attack detection of sound recordings according to claim 4, wherein the shared voiceprint feature extraction module, replay attack detection module, replay device detection module, replay environment detection module and replay speaker detection module are each formed by a deep neural network, and the deep neural network comprises one or more networks selected from the group consisting of a convolutional neural network, a recurrent neural network and a time-delay neural network.
6. A domain adaptive replay attack detection method according to any one of claims 4 to 5 in which the training steps for each module are as follows:
wherein the weight of the shared voiceprint feature module is W_f, the weight of the replay attack detection module is W_a, the weight of the replay device detection module is W_d, the weight of the replay speaker detection module is W_s, and the weight of the replay environment detection module is W_e;
S0: inputting the acoustic features of the recording into the shared voiceprint feature module and extracting the shared voiceprint feature vector;
S1: inputting the shared voiceprint feature vector from S0 into the replay attack detection module and outputting a classification error L_a;
S2: inputting the shared voiceprint feature vector from S0 into the replay device detection module and outputting a classification error L_d;
S3: inputting the shared voiceprint feature vector from S0 into the replay speaker detection module and outputting a classification error L_s;
S4: inputting the shared voiceprint feature vector from S0 into the replay environment detection module and outputting a classification error L_e;
S5: the update method of each weight is as follows:
Figure FDA0002567058010000021
Figure FDA0002567058010000022
Figure FDA0002567058010000023
Figure FDA0002567058010000024
Figure FDA0002567058010000025
where the step-size coefficient is the learning rate, and α1, α2, α3 are the weights of the replay device detection module, the replay speaker detection module, and the replay environment detection module, respectively.
S6: repeating steps S0 to S5 until the modules converge.
7. The domain adaptive replay attack detection method detection system of any one of claims 1 to 8, comprising:
the acoustic feature extraction module is used for extracting acoustic features of at least one section of recording area in the recording;
the shared voiceprint feature extraction module is used for extracting a shared voiceprint feature vector from the acoustic features;
a detection module for detecting whether the shared voiceprint feature vector is a replay attack.
8. The domain-adaptive replay attack detection system of claim 7, wherein the detection module is further configured to detect at least one domain-adaptive countermeasure task associated with a replay attack.
9. The domain-adaptive replay attack detection system of claim 7, wherein the shared voiceprint feature extraction module and detection module further comprises a deep neural network module.
10. A domain adaptive replay attack detection system according to claims 7-9 further including a training module for training the deep neural network module in the shared voiceprint feature extraction module and the detection module.
CN202010630019.XA 2020-07-02 2020-07-02 Domain-adaptive replay attack detection method and system Active CN111755014B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010630019.XA CN111755014B (en) 2020-07-02 2020-07-02 Domain-adaptive replay attack detection method and system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010630019.XA CN111755014B (en) 2020-07-02 2020-07-02 Domain-adaptive replay attack detection method and system

Publications (2)

Publication Number Publication Date
CN111755014A (en) 2020-10-09
CN111755014B (en) 2022-06-03

Family

ID=72678889

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010630019.XA Active CN111755014B (en) 2020-07-02 2020-07-02 Domain-adaptive replay attack detection method and system

Country Status (1)

Country Link
CN (1) CN111755014B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113284486A (en) * 2021-07-26 2021-08-20 中国科学院自动化研究所 Robust voice identification method for environmental countermeasure

Patent Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105139857A (en) * 2015-09-02 2015-12-09 广东顺德中山大学卡内基梅隆大学国际联合研究院 Countercheck method for automatically identifying speaker aiming to voice deception
CN105702263A (en) * 2016-01-06 2016-06-22 清华大学 Voice playback detection method and device
CN105869630A (en) * 2016-06-27 2016-08-17 上海交通大学 Method and system for detecting voice spoofing attack of speakers on basis of deep learning
US20190013033A1 (en) * 2016-08-19 2019-01-10 Amazon Technologies, Inc. Detecting replay attacks in voice-based authentication
US20200118577A1 (en) * 2016-08-19 2020-04-16 Amazon Technologies, Inc. Detecting replay attacks in voice-based authentication
US10510352B2 (en) * 2016-08-19 2019-12-17 Amazon Technologies, Inc. Detecting replay attacks in voice-based authentication
CN106531172A (en) * 2016-11-23 2017-03-22 湖北大学 Speaker voice playback identification method and system based on environmental noise change detection
US20190115033A1 (en) * 2017-10-13 2019-04-18 Cirrus Logic International Semiconductor Ltd. Detection of liveness
CN108039176A (en) * 2018-01-11 2018-05-15 广州势必可赢网络科技有限公司 Voiceprint authentication method and device for preventing recording attack and access control system
CN108806698A (en) * 2018-03-15 2018-11-13 中山大学 A kind of camouflage audio recognition method based on convolutional neural networks
CN109841219A (en) * 2019-03-15 2019-06-04 慧言科技(天津)有限公司 Replay Attack method is cheated using speech amplitude information and a variety of phase-detection voices
CN110491391A (en) * 2019-07-02 2019-11-22 厦门大学 A kind of deception speech detection method based on deep neural network
CN110718229A (en) * 2019-11-14 2020-01-21 国微集团(深圳)有限公司 Detection method for record playback attack and training method corresponding to detection model

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
VALENTI G: "An end-to-end spoofing countermeasure for automatic speaker verification using evolving recurrent neural networks", 《ODYSSEY 2018 THE SPEAKER AND LANGUAGE RECOGNITION WORKSHOP》 *
ZHUXIN CHEN: "Recurrent Neural Networks for Automatic Replay Spoofing Attack Detection", 《ICASSP》 *

Also Published As

Publication number Publication date
CN111755014B (en) 2022-06-03

Similar Documents

Publication Publication Date Title
EP3504703B1 (en) A speech recognition method and apparatus
Li et al. Online direction of arrival estimation based on deep learning
Cui et al. Data augmentation for deep neural network acoustic modeling
Li et al. Developing far-field speaker system via teacher-student learning
CN110503970A (en) A kind of audio data processing method, device and storage medium
Cheng et al. Replay detection using CQT-based modified group delay feature and ResNeWt network in ASVspoof 2019
CN109584896A (en) A kind of speech chip and electronic equipment
CN108122563A (en) Improve voice wake-up rate and the method for correcting DOA
JP6703460B2 (en) Audio processing device, audio processing method, and audio processing program
CN108417224A (en) The training and recognition methods of two way blocks model and system
Ge et al. Deep learning approach in DOA estimation: A systematic literature review
CN114708857A (en) Speech recognition model training method, speech recognition method and corresponding device
Basbug et al. Acoustic scene classification using spatial pyramid pooling with convolutional neural networks
Takeda et al. Unsupervised adaptation of neural networks for discriminative sound source localization with eliminative constraint
Chang et al. Audio adversarial examples generation with recurrent neural networks
CN110930996A (en) Model training method, voice recognition method, device, storage medium and equipment
CN115208507A (en) Privacy protection method and device based on white-box voice countermeasure sample
CN112180318A (en) Sound source direction-of-arrival estimation model training and sound source direction-of-arrival estimation method
CN111755014B (en) Domain-adaptive replay attack detection method and system
CN112133293A (en) Phrase voice sample compensation method based on generation countermeasure network and storage medium
Suh et al. Phoneme segmentation of continuous speech using multi-layer perceptron
Salvati et al. End-to-End Speaker Identification in Noisy and Reverberant Environments Using Raw Waveform Convolutional Neural Networks.
CN114664288A (en) Voice recognition method, device, equipment and storage medium
KR20210131067A (en) Method and appratus for training acoustic scene recognition model and method and appratus for reconition of acoustic scene using acoustic scene recognition model
Iqbal et al. Enhancing audio augmentation methods with consistency learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant