CN113113021A - Voice biological recognition authentication real-time detection method and system - Google Patents

Voice biological recognition authentication real-time detection method and system

Info

Publication number
CN113113021A
Authority
CN
China
Prior art keywords
voice
watermark
real
authentication
audio file
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110396974.6A
Other languages
Chinese (zh)
Inventor
张寅�
张翼
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Effective Software Technology Shanghai Co ltd
Original Assignee
Effective Software Technology Shanghai Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Effective Software Technology Shanghai Co ltd
Priority to CN202110396974.6A
Publication of CN113113021A

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/018 Audio watermarking, i.e. embedding inaudible data in the audio signal

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Collating Specific Patterns (AREA)

Abstract

The invention provides a real-time detection method and a real-time detection system for voice biometric authentication. The detection method comprises the following steps: step S1: generating an audio file for voice verification; step S2: the voice biometric recognition server receives the audio file generated in step S1; step S3: the audio file from step S2 is processed by the voice biometric recognition server and then sent to the watermark processing module in the voice biometric recognition server for watermark detection. The system comprises: an identity authentication client for initiating an identity authentication request; a speaker for playing the audio watermark; and a voice biometric recognition server, to which the audio file is sent for processing after being converted into a suitable format.

Description

Voice biological recognition authentication real-time detection method and system
Technical Field
The invention belongs to the field of voice biological recognition systems, and particularly relates to a voice biological recognition authentication real-time detection method and system.
Background
Voice biometric recognition (also known as speaker recognition) compares the voice model of an enrolled user with the voice model derived from a matching request and gives a probability score that the two speech samples come from the same person, similar to the working principle of fingerprint or facial recognition. Voice biometric recognition is divided into two categories: text-dependent and text-independent. In text-dependent mode, the phrase used for enrollment, such as "my voice is my password", must be the same as the phrase spoken at authentication. In text-independent mode, the enrollment speech may differ from the authentication speech, but enrollment takes longer than in text-dependent mode. Typically, text-dependent voice biometric systems are used for identity verification, such as application login or access control. Text-dependent systems, however, are susceptible to a type of spoofing attack known as a playback or replay attack: an authentication phrase from a legitimate user is captured by a recording device and then played through a loudspeaker to defeat the text-dependent system. The system judges the loudspeaker audio as coming from the legitimate user and allows access to the application.
To combat such attacks, text-dependent systems are typically equipped with a real-time detection module containing algorithms that determine whether the audio sample comes from a live user or from a loudspeaker. However, such algorithms are statistical classifiers and therefore cannot reliably prevent replay attacks, especially when both the recording and the playback loudspeaker are of high quality.
In addition, some other text-dependent systems use an element of randomness, such as random numbers, to ensure that the sample is live. For example, the system may prompt the user to speak a unique eight-digit random number sequence. In theory, a recorded authentication session is then useless because the next session will use a new set of digits. The drawback is that this approach requires speech recognition so that the system can verify that the spoken elements match the prompted elements. Because speech recognition is not perfectly accurate, it is unreliable when a user speaks with an accent or in an unsupported language or dialect, which severely reduces the overall accuracy of the voice biometric system and limits the number of people who can use voice biometrics as an authentication channel.
Disclosure of Invention
In order to overcome the defects in the prior art, the invention provides a voice biometric authentication real-time detection method and system, which solve the problem described in the background that a biometric system cannot accurately judge whether a received audio sample was recorded live or pre-recorded.
The invention provides a voice biological identification authentication real-time detection method, which comprises the following steps:
step S1: generating an audio file for voice verification;
step S11: the authentication client initiates an authentication request;
step S12: outputting the sound watermark by a loudspeaker;
the step S11 and the step S12 occur simultaneously or sequentially in the time domain;
step S2: the voice biological recognition server receives the audio file generated in the step S1;
step S3: and after being processed by the voice biometric identification server, the audio file in the step S2 is sent to a watermarking processing module in the voice biometric identification server for watermarking detection.
In one embodiment of the present invention, the audio file generated in step S1 includes the voice captured when the authentication request of step S11 is executed and the sound watermark produced in step S12, superimposed on each other.
In one embodiment of the present invention, the audio file of step S1 is processed and converted into a file type that the voice biometric recognition server can recognize, and is then transmitted to the voice biometric recognition server.
In one embodiment of the present invention, an algorithm module in the watermark processing module detects the watermark in the file received by the watermark processing module.
In one embodiment of the present invention, the method further includes the steps of:
step S4: the watermark detected by said step S3 is compared with the perturbation provided by the client;
step S5: and returning the detection result to the voice biological recognition server.
In one embodiment of the present invention, if the detected watermark is consistent with the perturbation, the signal is judged to be a real-time signal and this result is returned to the voice biometric recognition server; if the detected watermark is inconsistent with the perturbation, the signal is judged to be a replay signal and this result is returned to the voice biometric recognition server.
In one embodiment of the present invention, the watermark is judged to be inconsistent with the perturbation based on: the time at which the watermark occurs, the watermark length, or a combination of these.
The invention also provides a voice biometric authentication real-time detection system to which the above real-time detection method is applied, wherein the system comprises:
an identity authentication client for initiating an identity authentication request;
a speaker for playing the audio watermark;
and a voice biometric recognition server, to which the system sends the audio file for processing after converting it into a suitable format.
In one embodiment of the present invention, the voice biometric server further includes a watermark processing module, and the watermark processing module is configured to detect a watermark.
The invention has the beneficial effects that:
(1) The invention realizes real-time detection without adding friction to the user experience.
(2) The invention protects each voice authentication session against reuse of recordings made during it, thereby reducing the risk of playback attacks.
(3) The voiceprint watermark ensures the uniqueness of the voice password and improves the safety of user identity authentication and the confidentiality of user biological characteristic information.
Drawings
FIG. 1 is a schematic flow chart of a real-time detection method for voice biometric authentication according to the present invention;
FIG. 2 is a block diagram of a voice biometric authentication real-time detection system of the present invention.
Detailed Description
To facilitate understanding, the invention is described more fully below with reference to the accompanying drawings, in which several embodiments are shown. The invention may, however, be embodied in many different forms and is not limited to the embodiments described herein; these embodiments are provided so that the disclosure is more thorough.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs; the terminology used herein in the description of the invention is for the purpose of describing particular embodiments only and is not intended to be limiting of the invention; as used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
For any playback attack to occur, the text-dependent authentication phrase must first be recorded from the target. It may be assumed that text-dependent phrases such as "my voice is my password" or "please verify my transaction" are not used in normal conversations outside the authentication context. Thus, the most likely opportunity for a recording device to capture such a phrase is during an authentication session.
The method described in the present invention demonstrates how a voice biometric authentication system can detect a replay attack without using the conventional method mentioned in the background.
The invention provides a voice biological identification authentication real-time detection method, which comprises the following steps:
step S1: generating an audio file for voice verification;
step S2: the voice biometric identification server receives the audio file generated in step S1;
step S3: the audio file in step S2 is processed by the speech biometric identification server, and then sent to the watermark processing module in the speech biometric identification server for watermark detection.
Step S1 includes the following substeps:
step S11: the authentication client initiates an authentication request;
step S12: outputting the sound watermark by a loudspeaker;
step S11 and step S12 occur simultaneously or sequentially in the time domain.
Further, the audio file generated at step S1 includes the voice generated when the authentication request is performed at step S11 and the sound watermark generated when step S12 is performed, which are superimposed.
Further, the audio file of step S1 is processed and converted into a file type that can be recognized by the speech biometric recognition server, and then transmitted to the speech biometric recognition server.
Furthermore, an algorithm module in the watermark processing module detects the watermark in the file received by that module.
Further, the detection method of the present invention further comprises the steps of:
step S4: the watermark detected in step S3 is compared with the perturbation provided by the client;
step S5: and returning the detection result to the voice biological recognition server.
In detail, if the detected watermark is consistent with the perturbation, the signal is judged to be a real-time signal and this result is returned to the voice biometric recognition server; if the detected watermark is inconsistent with the perturbation, the signal is judged to be a replay signal and this result is returned to the voice biometric recognition server.
The watermark is judged to be inconsistent with the perturbation based on: the time at which the watermark occurs, the watermark length, or a combination of these.
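The comparison in steps S4 and S5 can be sketched as follows. This is a minimal illustration rather than the patented implementation: the `Watermark` type, the function names, and the 0.05-second tolerance are assumptions introduced here, and only the two criteria named in the text (time of occurrence and length) are checked.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Watermark:
    onset_s: float   # time at which the watermark starts, in seconds
    length_s: float  # duration of the watermark, in seconds


def mismatches(detected: Watermark, expected: Watermark, tol_s: float = 0.05) -> list[str]:
    """List the criteria on which the detected watermark disagrees with the perturbation."""
    problems = []
    if abs(detected.onset_s - expected.onset_s) > tol_s:
        problems.append("time")
    if abs(detected.length_s - expected.length_s) > tol_s:
        problems.append("length")
    return problems


def detection_result(detected: Watermark, expected: Watermark) -> str:
    """Step S4 (compare) followed by step S5 (result returned to the server)."""
    return "replay signal" if mismatches(detected, expected) else "real-time signal"


expected = Watermark(onset_s=0.3, length_s=0.5)   # perturbation provided by the client
print(detection_result(Watermark(0.31, 0.5), expected))
print(mismatches(Watermark(0.3, 0.8), expected))
```

A tolerance is used because the detected onset and length are estimates; an exact equality test would reject genuine sessions on small timing errors.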
The real-time detection method is applied to the following voice biometric authentication real-time detection system, which comprises:
an identity authentication client for initiating an identity authentication request;
a speaker for playing the audio watermark;
and a voice biometric recognition server, to which the audio file is sent for processing after being converted into a suitable format.
Furthermore, the voice biological recognition server also comprises a watermark processing module, and the watermark processing module is used for detecting the watermark.
If the authentication client broadcasts a different signal in each session to all nearby recording devices, the authentication session is effectively watermarked, making any recording of it unusable for playback attacks.
Each time the authentication client initiates an authentication request, the speaker on the device plays an audio watermark that is captured by all nearby recorders. The watermark is randomized and is played at a random time between the start and the end of the authentication request. When the final audio file is sent to the voice biometric recognition server for processing, the system also sends information describing the watermark (the perturbation), such as its sound signature, the moment at which it was played, and its timing relative to the beginning and end of the file.
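A minimal sketch of how a client might choose the random watermark timing and assemble the information it sends along with the audio file. The field names, the 0.3 to 0.7 second length range, and the 14 kHz default frequency are illustrative assumptions; the text does not specify how the random values are drawn.

```python
import random
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class WatermarkInfo:
    frequency_hz: float  # tone used as the watermark signature
    onset_s: float       # offset from the start of the recording
    length_s: float      # how long the tone is played


def plan_watermark(session_s: float, rng: random.Random,
                   frequency_hz: float = 14_000.0,
                   min_len_s: float = 0.3, max_len_s: float = 0.7) -> WatermarkInfo:
    """Pick a random length, then a random onset such that the tone ends within the session."""
    length_s = round(rng.uniform(min_len_s, max_len_s), 2)
    onset_s = round(rng.uniform(0.0, session_s - length_s), 2)
    return WatermarkInfo(frequency_hz, onset_s, length_s)


# random.Random is used so the sketch can be seeded; a deployed client would
# more likely draw from random.SystemRandom() so the timing is hard to predict.
info = plan_watermark(session_s=3.0, rng=random.Random())
print(asdict(info))  # metadata sent to the server together with the audio file
```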
The watermark processing module in the voice biometric recognition server searches the audio file for the watermark described by the perturbation that the system provides, using a signal processing formula based on the Discrete Fourier Transform (DFT). If exactly one watermark is present in the audio file and it matches the signature provided by the device, the system may conclude that the audio file comes from a genuine user.
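As an illustration of a DFT-based search, the sketch below evaluates the DFT of short frames at the single watermark frequency and reports each run of consecutive frames in which that frequency is strong. The frame size, the threshold, and the 48 kHz sample rate are assumptions, and the test signal is synthetic; this is not the module's actual algorithm.

```python
from __future__ import annotations

import cmath
import math


def tone_power(frame: list[float], freq_hz: float, sample_rate: int) -> float:
    """Squared magnitude of the frame's DFT at one frequency, normalised by frame length."""
    acc = 0j
    for n, x in enumerate(frame):
        acc += x * cmath.exp(-2j * math.pi * freq_hz * n / sample_rate)
    return abs(acc) ** 2 / len(frame) ** 2


def find_watermarks(samples: list[float], freq_hz: float, sample_rate: int,
                    frame_s: float = 0.01, threshold: float = 0.01) -> list[tuple[float, float]]:
    """Return (onset_s, length_s) for every run of frames whose tone power exceeds threshold."""
    frame_len = int(frame_s * sample_rate)
    n_frames = len(samples) // frame_len
    runs, start = [], None
    for k in range(n_frames):
        frame = samples[k * frame_len:(k + 1) * frame_len]
        active = tone_power(frame, freq_hz, sample_rate) > threshold
        if active and start is None:
            start = k                                   # a run of tone frames begins
        elif not active and start is not None:
            runs.append((start * frame_s, (k - start) * frame_s))
            start = None                                # the run has ended
    if start is not None:                               # tone lasted until the end of the audio
        runs.append((start * frame_s, (n_frames - start) * frame_s))
    return runs


# Synthetic check: 1 s of silence containing a 14 kHz tone from 0.30 s to 0.80 s.
SR = 48_000
audio = [0.5 * math.sin(2 * math.pi * 14_000 * n / SR) if 14_400 <= n < 38_400 else 0.0
         for n in range(SR)]
for onset, length in find_watermarks(audio, 14_000, SR):
    print(f"watermark at {onset:.2f} s, length {length:.2f} s")
```

Evaluating a single frequency per frame keeps the cost linear in the number of samples; real speech and room noise would call for a threshold chosen from measured data rather than the fixed value assumed here.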
The signal processing formula is as follows:
[Formula image BDA0003018921920000061; not reproduced in this text]
where t0 is the time at which the watermark begins, t1 is the time at which the watermark ends, g(f) is the audio signal as a function of frequency, g(t) is the audio signal as a function of time, i is the imaginary unit (the square root of -1, so that i² = -1), and f is the frequency (related to the sampling rate).
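A form consistent with these symbol definitions, assuming the standard Fourier transform restricted to the watermark interval, would be the following. The sign convention, the lack of a normalisation factor, and the hat used to distinguish the frequency function from the time function are assumptions, as is the sampled version with sample rate f_s and sample indices n_0 to n_1.

```latex
\hat{g}(f) = \int_{t_0}^{t_1} g(t)\, e^{-2\pi i f t}\, dt, \qquad i^2 = -1

% Sampled (DFT-style) counterpart for audio g[n] recorded at sample rate f_s,
% with n_0 = \lfloor t_0 f_s \rfloor and n_1 = \lfloor t_1 f_s \rfloor:
\hat{g}(f) \approx \sum_{n = n_0}^{n_1} g[n]\, e^{-2\pi i f n / f_s}
```

A large magnitude of this quantity at the watermark frequency over the interval from t0 to t1 indicates that the watermark tone is present there.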
If another recording device is nearby, recording the authentication phrase for a future playback attack, the watermark will be present in its recording.
When that recording is used in a replay attack, the watermark processing module detects two separate watermarks: the first matches the signature provided by the device for the current session, and the second is the watermark embedded in the recording. If multiple watermarks are detected, the watermark processing module should conclude that the authentication request probably comes from a playback source rather than from the legitimate speaker. In this way each voice authentication session is protected against reuse of recordings made during it, thereby reducing the risk of playback attacks.
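The counting rule can be sketched as below. The function name, the tuple layout, and the tolerance are assumptions introduced here; treating an audio file with no watermark at all as not live is also an assumption, since the text only discusses the cases of one matching watermark and of additional watermarks.

```python
from __future__ import annotations


def classify_session(detected: list[tuple[float, float]],
                     expected: tuple[float, float],
                     tol_s: float = 0.05) -> str:
    """detected: (onset_s, length_s) pairs found in the audio.
    expected: the (onset_s, length_s) reported by the client for this session."""
    def same(a: tuple[float, float], b: tuple[float, float]) -> bool:
        return abs(a[0] - b[0]) <= tol_s and abs(a[1] - b[1]) <= tol_s

    matching = [w for w in detected if same(w, expected)]
    extra = [w for w in detected if not same(w, expected)]
    if len(matching) == 1 and not extra:
        return "real-time"
    # Extra watermarks, or no matching watermark at all (assumption), are treated as not live.
    return "replay"


print(classify_session([(0.3, 0.5)], expected=(0.3, 0.5)))
print(classify_session([(0.9, 0.4), (0.3, 0.5)], expected=(0.9, 0.4)))
```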
Specifically, with reference to FIG. 1 and FIG. 2, an example follows:
The authentication client initiates an authentication request and at the same time activates the speaker to broadcast an unnatural-sounding watermark, e.g., a 14 kHz sine wave played at 60 dB for 0.3 seconds, starting at a timestamp of 1.0 second after the microphone is turned on. The client then records this information, together with the approximate time, and transmits it to the server side. The audio file is saved as a 16 kHz, 16-bit WAV file, converted to Base64, and sent to the voice biometric recognition server for processing. A copy of the Base64 data is then sent to a separate watermark processing module (WPM), where the algorithm searches for the watermark described by the client-provided perturbation. If a watermark is detected and is consistent with the perturbation, the WPM returns a "real-time signal" result to the voice biometric recognition server. If other watermarks of different lengths are detected at different times, or the watermark does not match the information in the metadata, the WPM returns a "replay signal" result to the voice biometric recognition server, which may take corresponding action.
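The packaging step (16-bit WAV, then Base64) can be sketched with the Python standard library. Here a synthesized tone stands in for the microphone recording, and the function name, amplitude, and parameters are assumptions. The sketch uses a 48 kHz sample rate rather than the 16 kHz of the example, because a 14 kHz tone lies above the 8 kHz Nyquist limit of a 16 kHz recording and could not be represented in it; the 60 dB playback level is not modelled.

```python
import base64
import io
import math
import struct
import wave


def tone_wav_base64(freq_hz: float = 14_000, length_s: float = 0.3,
                    sample_rate: int = 48_000, amplitude: float = 0.25) -> str:
    """Synthesize a sine tone, store it as mono 16-bit PCM WAV, and return it Base64-encoded."""
    n = round(length_s * sample_rate)
    pcm = b"".join(
        struct.pack("<h", int(amplitude * 32767 * math.sin(2 * math.pi * freq_hz * i / sample_rate)))
        for i in range(n)
    )
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)            # mono
        wav.setsampwidth(2)            # 2 bytes per sample = 16-bit
        wav.setframerate(sample_rate)
        wav.writeframes(pcm)
    return base64.b64encode(buf.getvalue()).decode("ascii")


payload = tone_wav_base64()
print(len(payload), "Base64 characters")
```

On the server side, `base64.b64decode` followed by `wave.open` on an `io.BytesIO` recovers the samples for watermark detection.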
Assume that a real user is using a voice biometric recognition system that uses "my voice is my password" as the given phrase. When the user presses a button to initiate an authentication session, the authentication client selects a random point in time after the button press but before the session ends, e.g., 0.3 seconds after the button press, and triggers the speaker on the device to emit a unique, non-naturally-occurring audio signature at a particular frequency (e.g., a 14 kHz sine wave) for 0.5 seconds, at a volume loud enough to be recorded by all microphones in the vicinity, including the microphone on the authentication device. After the audio sample is sent to the server for processing, the watermark processing module searches the audio file for this specific watermark signature at the 0.3-second mark with a length of 0.5 seconds. The watermark processing module also looks for the 14 kHz sine wave at any other point in time in the audio sample.
Assume that the genuine user was recorded by a malicious party during the session and that the attacker uses the voice sample to compromise the genuine user's account. When the attacker's authentication session is initiated, a new watermark is emitted 0.9 seconds after the button is pressed, for a length of 0.4 seconds. When the audio file arrives at the server for processing, the watermark processing module finds the watermark at the 0.9-second mark, lasting 0.4 seconds, and also finds a watermark at the 0.3-second mark, lasting 0.5 seconds. The system can therefore conclude that the audio sample probably comes from a recorded source rather than a live user, because it contains a watermark that does not match what was expected. The watermarks may also overlap: for example, if the new watermark is emitted at the 0.4-second mark with a length of 0.7 seconds, the audio file contains a single watermark starting at the 0.3-second mark and lasting 0.8 seconds in total. The watermark processing module then detects that both the start time and the length of the watermark differ from the perturbation provided by the client application.
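The overlap arithmetic in this example can be checked with a short interval-merging sketch. The function name and the rounding to six decimals are assumptions made for illustration; the numbers are those given in the text.

```python
from __future__ import annotations


def merge_watermarks(marks: list[tuple[float, float]], eps: float = 1e-9) -> list[tuple[float, float]]:
    """marks: (onset_s, length_s) pairs. Overlapping tones fuse into one continuous watermark."""
    spans = sorted((onset, onset + length) for onset, length in marks)
    merged: list[list[float]] = []
    for start, end in spans:
        if merged and start <= merged[-1][1] + eps:
            merged[-1][1] = max(merged[-1][1], end)   # extend the current span
        else:
            merged.append([start, end])               # start a new, separate span
    return [(start, round(end - start, 6)) for start, end in merged]


# Old watermark (0.3 s, length 0.5 s) replayed under a new one (0.4 s, length 0.7 s):
print(merge_watermarks([(0.3, 0.5), (0.4, 0.7)]))
# Old watermark (0.3 s, length 0.5 s) and a non-overlapping new one (0.9 s, length 0.4 s):
print(merge_watermarks([(0.9, 0.4), (0.3, 0.5)]))
```

In the overlapping case the result is a single span from 0.3 s to 1.1 s, which matches neither the start time (0.4 s) nor the length (0.7 s) that the client reported.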
Assume now an injection attack, in which the attacker is somehow able to bypass the integrity checks of the authentication client and inject the recording directly into the server. This type of attack would defeat any playback detection module, but not the watermark processing module. The watermark description that the server receives from the client differs from the watermark in the injected audio: because nothing is emitted during an injection, the injected audio contains only the original watermark at the 0.3-second mark with a length of 0.5 seconds. The probability that the watermark description for the attacked authentication session also specifies the 0.3-second mark and a 0.5-second length is very small.
In a conventional playback attack, initiating the authentication session and starting the playback are two separate manual operations, so the probability that the two watermarks coincide exactly, and thereby become indistinguishable to the watermark processing module, is very small.
The above-mentioned embodiments only express a certain implementation mode of the present invention, and the description thereof is specific and detailed, but not construed as limiting the scope of the present invention; it should be noted that, for those skilled in the art, without departing from the concept of the present invention, several variations and modifications can be made, which are within the protection scope of the present invention; therefore, the protection scope of the present patent shall be subject to the appended claims.

Claims (9)

1. A voice biological identification authentication real-time detection method is characterized by comprising the following steps:
step S1: generating an audio file for voice verification;
step S11: the authentication client initiates an authentication request;
step S12: outputting the sound watermark by a loudspeaker;
the step S11 and the step S12 occur simultaneously or sequentially in time domain;
step S2: the voice biological recognition server receives the audio file generated in the step S1;
step S3: and after being processed by the voice biometric identification server, the audio file in the step S2 is sent to a watermarking processing module in the voice biometric identification server for watermarking detection.
2. The voice biometric authentication real-time detection method according to claim 1, wherein the audio file generated in the step S1 includes a voice generated when the authentication request is performed in the step S11 and a sound watermark generated when the step S12 is performed, which are superimposed.
3. The method for real-time detection of voice biometric authentication according to claim 2, wherein the audio file of step S1 is processed and converted into a file type recognizable by the voice biometric server, and then is transmitted to the voice biometric server.
4. The voice biometric authentication real-time detection method according to claim 3, wherein the watermark is detected from the file received by the watermarking module via an algorithm module in the watermarking module.
5. The voice biometric authentication real-time detection method according to claim 1, further comprising the steps of:
step S4: the watermark detected by said step S3 is compared with the perturbation provided by the client;
step S5: and returning the detection result to the voice biological recognition server.
6. The voice biometric authentication real-time detection method according to claim 5, wherein the detected watermark is consistent with the disturbance, and a signal is a real-time signal and is returned to the voice biometric server; the detected watermark is inconsistent with the disturbance, and the signal is a replay signal, which is returned to the voice biological recognition server.
7. The voice biometric authentication real-time detection method according to claim 6, wherein the watermark being inconsistent with the perturbation comprises: the time when the watermark occurs, the watermark length, or a combination of one or more of these.
8. A voice biometric authentication system to which the voice biometric authentication real-time detection method according to any one of claims 1 to 7 is applied, the system comprising:
the identity authentication client is used for starting an identity authentication request;
a speaker for playing the audio watermark;
and the system converts the audio file into a format and then sends the audio file to the voice biological recognition server for processing.
9. The voice biometric authentication system of claim 8, wherein the voice biometric server comprises a watermarking module, the watermarking module configured to detect a watermark.
CN202110396974.6A 2021-04-13 2021-04-13 Voice biological recognition authentication real-time detection method and system Pending CN113113021A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110396974.6A CN113113021A (en) 2021-04-13 2021-04-13 Voice biological recognition authentication real-time detection method and system

Publications (1)

Publication Number Publication Date
CN113113021A true CN113113021A (en) 2021-07-13

Family

ID=76716470

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110396974.6A Pending CN113113021A (en) 2021-04-13 2021-04-13 Voice biological recognition authentication real-time detection method and system

Country Status (1)

Country Link
CN (1) CN113113021A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1291324A (en) * 1997-01-31 2001-04-11 T-内提克斯公司 System and method for detecting a recorded voice
CN103208289A (en) * 2013-04-01 2013-07-17 上海大学 Digital audio watermarking method capable of resisting re-recording attack
WO2015012680A2 (en) * 2013-07-22 2015-01-29 Universiti Putra Malaysia A method for speech watermarking in speaker verification

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115035903A (en) * 2022-08-10 2022-09-09 杭州海康威视数字技术股份有限公司 Physical voice watermark injection method, voice tracing method and device
CN115035903B (en) * 2022-08-10 2022-12-06 杭州海康威视数字技术股份有限公司 Physical voice watermark injection method, voice tracing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination