WO2017114307A1 - Voiceprint authentication method capable of preventing recording attack, server, terminal, and system - Google Patents

Publication number
WO2017114307A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
user
voice
voiceprint authentication
user voice
Prior art date
Application number
PCT/CN2016/111714
Other languages
French (fr)
Chinese (zh)
Inventor
徐燕军
何朔
尹亚伟
万四爽
Original Assignee
中国银联股份有限公司
Priority date
Filing date
Publication date
Application filed by 中国银联股份有限公司
Publication of WO2017114307A1

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/08 Network architectures or network communication protocols for network security for authentication of entities
    • H04L63/0861 Network architectures or network communication protocols for network security for authentication of entities using biometrical features, e.g. fingerprint, retina-scan
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/08 Speech classification or search
    • G10L15/18 Speech classification or search using natural language modelling
    • G10L15/183 Speech classification or search using natural language modelling using context dependencies, e.g. language models
    • G10L15/187 Phonemic context, e.g. pronunciation rules, phonotactical constraints or phoneme n-grams
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/14 Network architectures or network communication protocols for network security for detecting or protecting against malicious traffic
    • H04L63/1441 Countermeasures against malicious traffic
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials

Definitions

  • the present application belongs to the field of voiceprint recognition, and particularly relates to a voiceprint authentication method, a server, a terminal, and a system capable of preventing a recording attack.
  • A voiceprint, like a fingerprint, is an important biometric feature that can identify a person. Compared with traditional means such as password authentication, voiceprints offer high security and convenience.
  • The most common attacks on voiceprint authentication are recording replay attacks, speaker impersonation attacks, and forged authentication voice attacks.
  • In a recording replay attack, the attacker obtains samples of the user's voice by various means using high-fidelity recording equipment, takes either the original recording or a "true speaker voice" synthesized from it by cutting and splicing, and plays it back through a high-fidelity amplifier while the authentication system is collecting the user's voice.
  • In a speaker impersonation attack, an attacker who is good at imitating other people's voices mimics the speaker's manner of speaking and pronunciation characteristics.
  • In a forged authentication voice attack, the attacker forges the victim's voice by technical means such as synthesis, conversion, and splicing.
  • A speaker impersonation attack requires the attacker to be a skilled imitator, and a forged authentication voice attack usually requires considerable professional skill, so both attacks are inherently difficult to mount. Moreover, neither an imitated voice nor a forged voice is the real voice.
  • Existing voiceprint recognition technology can basically cope with these two types of attack.
  • Recording replay attacks are a very important problem in voiceprint recognition: the attacker acquires the voice and then attacks using software synthesis. There are two cases. In one, the user's voice is stolen while the user is speaking in some other situation; in the other, malware records the user's voice while the user is performing voiceprint recognition.
  • In the prior art there are two main countermeasures to recording attacks. The first scheme distinguishes a recording from live speech by analyzing the differences in channel characteristics between the two; the second scheme verifies the content of the speech as well as the speaker's voiceprint, because the recording attacker doesn't know what will have to be said this time.
  • In the second scheme, if the user is asked to read out a long random passage each time, the user experience is poor. The amount of voice input can be reduced, as in the patent with application number 201310123555.0 (title: identity confirmation system and method based on dynamic password voice), which selects and combines characters from the 26 English letters and 10 digits: a dynamic password is generated as a random combination each time and the user speaks it. Because the dynamic password isn't known in advance, this resists simple recording attacks and is a comparatively good solution. However, because that patent combines only 36 characters, an attacker who segments recordings to isolate those 36 characters can answer any random string simply by splicing the 36 character recordings together.
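The splicing weakness described above can be illustrated with a small sketch. It isn't taken from the cited patent: the names `random_password`, `splice_attack`, and `stolen_clips` are hypothetical, and the audio clips are stand-in byte strings rather than real recordings.

```python
import random
import string
from typing import Dict

# 26 English letters + 10 digits = the 36 symbols of the dynamic password.
ALPHABET = string.ascii_uppercase + string.digits


def random_password(length: int = 6) -> str:
    """Draw a dynamic password from the 36-symbol alphabet (length is illustrative)."""
    return "".join(random.choice(ALPHABET) for _ in range(length))


def splice_attack(password: str, clips: Dict[str, bytes]) -> bytes:
    """Concatenate one pre-recorded clip per character, as a splicing attacker would."""
    return b"".join(clips[ch] for ch in password)


# Hypothetical stolen material: one placeholder clip per symbol, 36 clips in total.
stolen_clips = {ch: f"<{ch}>".encode() for ch in ALPHABET}
```

Whatever password is drawn, `splice_attack(random_password(), stolen_clips)` always succeeds in assembling a reply, which is why content verification alone doesn't stop an attacker who holds one clip per character.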
  • The present application provides a voiceprint authentication method, server, terminal, and system capable of preventing recording attacks, to remedy the inability of the prior art to prevent recording attacks effectively.
  • the present application provides a voiceprint authentication method capable of preventing a recording attack, including:
  • the present application further provides a voiceprint authentication server capable of preventing a recording attack, including:
  • a generating unit configured to generate a character combination and pronunciation rules of the characters according to a request of the user;
  • a sending unit configured to send the character combination and the pronunciation rules of the characters to the requesting terminal, and to send the voiceprint authentication result to the requesting terminal;
  • a receiving unit configured to receive the user voice input at the requesting terminal according to the character combination and the pronunciation rules of the characters;
  • a sound detecting unit configured to perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rules of the characters.
  • the present application further provides a voiceprint authentication terminal capable of preventing a recording attack, including:
  • a requesting unit configured to send a user's voiceprint authentication request to the server
  • a receiving unit configured to receive and display a character combination sent by the server and a pronunciation rule of the character, and receive a voiceprint authentication result sent by the server;
  • An input unit configured to receive a user voice input by a user according to the character combination and a pronunciation rule of the character
  • a sending unit configured to send the user voice to the server.
  • the present application further provides a voiceprint authentication system capable of preventing a recording attack, the system comprising a server and a requesting terminal, wherein the server is configured to: generate a character combination and pronunciation rules of the characters according to a user's voiceprint authentication request; send the character combination and the pronunciation rules of the characters to the requesting terminal; receive the user voice input at the requesting terminal according to the character combination and the pronunciation rules of the characters; perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rules of the characters; and send the voiceprint authentication result to the requesting terminal;
  • the requesting terminal is configured to: send the user's voiceprint authentication request to the server; receive and display the character combination and the pronunciation rules of the characters sent by the server; receive the user voice input by the user according to the character combination and the pronunciation rules of the characters; send the user voice to the server; and receive the voiceprint authentication result sent by the server.
  • The voiceprint authentication method, server, terminal, and system proposed by the present application prevent recording attacks by verifying whether the characters and the pronunciation in the user voice are consistent with the character combination and the pronunciation rules generated by the server. Even if an attacker obtains, through other channels, a user voice with the right content, that voice cannot satisfy the required pronunciation. Further, to guard against replay of a voice that the user has previously input, once the characters and pronunciation in the user voice have been found consistent with the server-generated character combination and pronunciation rules, the voice under verification is also compared with the user's voices in the historical voice library; if it is consistent with one of them, a recording attack is present. The present application can thus effectively prevent recording attacks in voiceprint authentication.
  • FIG. 1 is a flowchart of a voiceprint authentication method capable of preventing a recording attack according to an embodiment of the present application
  • FIG. 2 is a flowchart of a voiceprint authentication process capable of preventing a recording attack according to an embodiment of the present application
  • FIG. 3 is a flowchart of a voiceprint authentication process capable of preventing a recording attack according to an embodiment of the present application
  • FIG. 5 is a flowchart of a voiceprint authentication method capable of preventing a recording attack according to an embodiment of the present application
  • FIG. 6 is a voiceprint authentication server capable of preventing a recording attack according to an embodiment of the present application
  • FIG. 7 is a voiceprint authentication terminal capable of preventing a recording attack according to an embodiment of the present application.
  • FIG. 8 is a voiceprint authentication system capable of preventing a recording attack according to an embodiment of the present application.
  • FIG. 9 is a flowchart of a voiceprint authentication method with a function of preventing a recording attack according to an embodiment of the present application.
  • FIG. 1 is a flowchart of a voiceprint authentication method capable of preventing a recording attack according to an embodiment of the present application.
  • This embodiment is a voiceprint authentication method described on the server side.
  • the voiceprint authentication is performed according to the user voice fed back by the terminal, the character combination generated by the server, and the pronunciation rule of the character. This embodiment can prevent recording attacks to a certain extent.
  • the voiceprint authentication method capable of preventing a recording attack includes the following steps:
  • Step 101: Generate a character combination and pronunciation rules of the characters according to the user's voiceprint authentication request;
  • the character combination includes, but is not limited to, letters, digits, and Chinese characters;
  • the pronunciation rules of the characters include, but are not limited to, the pitch and the duration of the pronunciation;
  • in one implementation, each character in the character combination corresponds to one pronunciation rule; in another implementation, every two characters in the character combination correspond to one pronunciation rule. The present application doesn't limit the specific form of the character combination or of the pronunciation rules of its characters;
  • the character combination and the pronunciation rules of the characters are randomly generated.
  • Step 102: Send the character combination and the pronunciation rules of the characters to the requesting terminal;
  • the terminals described in the present application include, but are not limited to, mobile phones, tablets (PADs), computers, and notebooks.
  • Step 103: Receive the user voice input at the requesting terminal according to the character combination and the pronunciation rules of the characters;
  • Step 104: Perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rules of the characters;
  • Step 105: Send the voiceprint authentication result to the requesting terminal.
  • Because a recording attacker cannot obtain the pronunciation rules of the characters in advance, adding verification of the pronunciation rules effectively prevents recording attacks.
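Step 101 can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the digit-only character set, the two pitch and two duration options, and the names `PronunciationRule`, `Challenge`, and `generate_challenge` are all assumptions, and it uses the variant in which each character has its own rule.

```python
import secrets
from dataclasses import dataclass
from typing import Tuple

CHARSET = "0123456789"       # assumed character set; the text also allows letters and Chinese characters
PITCHES = ("high", "low")    # assumed pitch options
LENGTHS = ("short", "long")  # assumed duration options


@dataclass(frozen=True)
class PronunciationRule:
    pitch: str
    length: str


@dataclass(frozen=True)
class Challenge:
    characters: str
    rules: Tuple[PronunciationRule, ...]  # one rule per character


def generate_challenge(n_chars: int = 6) -> Challenge:
    """Randomly generate a character combination and one pronunciation rule per character."""
    characters = "".join(secrets.choice(CHARSET) for _ in range(n_chars))
    rules = tuple(
        PronunciationRule(secrets.choice(PITCHES), secrets.choice(LENGTHS))
        for _ in range(n_chars)
    )
    return Challenge(characters, rules)
```

The `secrets` module is used instead of `random` so that the challenge is drawn from a cryptographically strong source; that is a design choice of this sketch, since the text only requires random generation.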
  • step 104 further includes:
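  • if the user voice and the user's historically input voice belong to the same person, the characters in the user voice are the same as the characters in the character combination, and the pronunciation of the characters in the user voice conforms to the pronunciation rules of the characters, the voiceprint authentication passes; in all other cases it fails. That is, if the user voice and the user's historically input voice don't belong to the same person, and/or the characters in the user voice differ from the characters in the character combination, and/or the pronunciation of the characters in the user voice doesn't conform to the pronunciation rules of the characters, the voiceprint authentication fails.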
  • the present application does not limit the order of the above-mentioned judging process, and any combination of sequences can realize the judgment of voiceprint authentication.
  • step 104 further includes:
  • Step 201: First determine whether the user voice and the user's historically input voice are the voice of the same person. If not, the voiceprint authentication fails; if so, proceed to step 202;
  • the user voice sent by the client is segmented by character, and the characters in the user voice are then extracted;
  • Step 202: Determine whether the characters in the user voice are the same as the characters in the character combination;
  • if they differ, the voiceprint authentication fails;
  • if they are the same, proceed to step 203;
  • Step 203: Determine whether the pronunciation of the characters in the user voice conforms to the pronunciation rules of the characters;
  • if it conforms, the voiceprint authentication passes.
  • Performing voiceprint authentication in this order can speed up authentication, prevent recording attacks, and improve the user experience.
  • Unless otherwise specified, voiceprint authentication is performed in the order described in this embodiment.
  • the method further includes storing the user voice into the historical voice library, so as to facilitate subsequent retrieval of the voice information input by the user.
  • the method further includes:
  • Step 204: Determine whether the user voice is consistent with a voice of the user in the historical voice library;
  • if it is consistent, the voiceprint authentication fails;
  • if it isn't, the voiceprint authentication passes, and the user voice is stored in the historical voice library.
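The ordered checks of steps 201 to 204 can be sketched as a short-circuiting function. The four checks are passed in as hypothetical callables (the speaker, character, pronunciation, and history comparisons themselves aren't specified here), so that a later check runs only if the earlier ones have succeeded.

```python
from typing import Callable, Optional


def authenticate(
    same_speaker: Callable[[], bool],
    characters_match: Callable[[], bool],
    pronunciation_matches: Callable[[], bool],
    matches_history: Callable[[], bool],
) -> Optional[str]:
    """Return None if authentication passes, else the name of the step that failed."""
    if not same_speaker():            # step 201: same person as the enrolled voice?
        return "speaker"
    if not characters_match():        # step 202: spoken characters equal the challenge?
        return "characters"
    if not pronunciation_matches():   # step 203: pronunciation follows the rules?
        return "pronunciation"
    if matches_history():             # step 204: consistent with an earlier input, i.e. a replay
        return "replay"
    return None
```

On a `None` result the caller would store the user voice in the historical voice library, as the text describes; that storage step is left out of the sketch.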
  • step 204 of the previous embodiment further includes:
  • the predetermined threshold value described in this embodiment can be determined based on the difference in the same sound that a person makes.
  • the detailed process of determining whether the user voice is consistent with the voice of the user in the historical voice library is:
  • the user voice is divided into multiple segments of speech according to characters, and each segment of speech is preprocessed, including framing, pre-emphasis, windowing, etc., to obtain a segment of sound that can be further calculated.
  • FIG. 4 is a waveform diagram corresponding to the pronunciation of the digit "0". As FIG. 4 shows, there are many silent segments or low-level noise segments before and after the sound. If these invalid parts of the signal aren't removed, an attacker could manipulate the invalid segments of a recording and thereby degrade the recording detection.
  • the start point and the end point of the effective part of the voice can be judged by the short-time energy and the short-time zero-crossing rate.
  • Short-time energy is the sum of the intensities of the speech signal within one frame; E_n denotes the short-time energy of the n-th frame of the speech signal,
  • where m is the index of a sample point within the n-th frame,
  • N is the size of the frame,
  • and x_n(m) is the normalized frequency of the m-th sample point of the n-th frame.
  • The short-time zero-crossing rate is the number of times the speech signal within one frame crosses the horizontal axis, denoted Z_n,
  • where m is the index of a sample point within the n-th frame,
  • N is the size of the frame,
  • and x_n(m) is the normalized frequency of the m-th sample point of the n-th frame.
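The formulas for the short-time energy and the short-time zero-crossing rate aren't reproduced in this text. Under the usual textbook definitions, which are assumed here rather than taken from the source, they would read as follows, with x_n(m) treated as the value of the m-th sample of the n-th frame:

```latex
E_n = \sum_{m=0}^{N-1} x_n(m)^2
\qquad
Z_n = \frac{1}{2} \sum_{m=1}^{N-1}
      \left| \operatorname{sgn}\bigl(x_n(m)\bigr) - \operatorname{sgn}\bigl(x_n(m-1)\bigr) \right|
```

Each sign change between consecutive samples contributes 2 to the sum of absolute differences, so the factor 1/2 makes Z_n count the crossings.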
  • ... the speech is at the start of the effective voice; when the short-time energy E_n is lower than the threshold E or the short-time zero-crossing rate Z_n is lower than the threshold Z, the speech is at the end of the effective voice.
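The endpoint detection described above can be sketched as follows. The sketch assumes that the start condition mirrors the stated end condition, so a frame counts as effective only when both its energy and its zero-crossing count reach their thresholds; it also assumes non-overlapping frames and squared-sample energy. The function names and threshold values are hypothetical.

```python
from typing import Optional, Sequence, Tuple


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared sample values in one frame (assumed definition of E_n)."""
    return sum(x * x for x in frame)


def zero_crossing_count(frame: Sequence[float]) -> int:
    """Number of sign changes between consecutive samples in one frame (Z_n)."""
    return sum(1 for prev, cur in zip(frame, frame[1:]) if (prev >= 0) != (cur >= 0))


def find_effective_span(
    signal: Sequence[float],
    frame_size: int,
    energy_threshold: float,
    zcr_threshold: int,
) -> Optional[Tuple[int, int]]:
    """Return (start_frame, end_frame) of the effective voice, end exclusive, or None."""
    frames = [
        signal[i:i + frame_size]
        for i in range(0, len(signal) - frame_size + 1, frame_size)
    ]
    start = None
    for index, frame in enumerate(frames):
        active = (
            short_time_energy(frame) >= energy_threshold
            and zero_crossing_count(frame) >= zcr_threshold
        )
        if start is None and active:
            start = index          # first frame that reaches both thresholds
        elif start is not None and not active:
            return start, index    # first frame that falls below a threshold
    return None if start is None else (start, len(frames))
```

Frames outside the returned span are the silent or noise segments that the text says should be removed before comparison.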
  • MFCC (Mel-scale cepstral coefficients)
  • let T denote the speech of a given character spoken by the user:
  • T has N frame vectors {T(1), T(2), ..., T(n), ..., T(N)}, where T(n) is the speech feature vector of the n-th frame.
  • d(T(i_n), R(i_m)) denotes the Euclidean distance between the feature of frame i_n in T and the feature of frame i_m in R; if the two waveforms coincide completely in a given frame, the distance d is 0.
  • the distance D[T, R] between them can then be calculated; the smaller the distance, the higher the similarity.
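The distance D[T, R] can be sketched as an accumulated frame-to-frame Euclidean distance. The sketch assumes the accumulation is done by dynamic time warping, which the frame-index notation suggests but this text doesn't state; the feature vectors stand in for MFCC frames, and `looks_like_replay` with its `threshold` parameter is a hypothetical wrapper for the comparison against the historical voice library.

```python
import math
from typing import List, Sequence

Frame = Sequence[float]


def euclidean(a: Frame, b: Frame) -> float:
    """Euclidean distance d between two frame feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dtw_distance(T: List[Frame], R: List[Frame]) -> float:
    """Accumulated distance D[T, R] along the cheapest monotonic frame alignment."""
    n, m = len(T), len(R)
    inf = float("inf")
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = euclidean(T[i - 1], R[j - 1])
            # Extend the cheapest of the three allowed predecessor alignments.
            D[i][j] = cost + min(D[i - 1][j], D[i][j - 1], D[i - 1][j - 1])
    return D[n][m]


def looks_like_replay(T: List[Frame], history: List[List[Frame]], threshold: float) -> bool:
    """Flag a replay when T is closer than the threshold to any stored utterance."""
    return any(dtw_distance(T, R) < threshold for R in history)
```

A live repetition of the same character by the same person is expected to differ slightly from every stored utterance, so a distance below the threshold points to a replayed recording rather than a fresh utterance.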
  • The voiceprint authentication method capable of preventing recording attacks proposed by the present application prevents recording attacks by verifying whether the characters and the pronunciation in the user voice are consistent with the character combination and the pronunciation rules generated by the server.
  • Even if a user voice obtained by an attacker through other channels has the right content, it cannot satisfy the required pronunciation.
  • Further, once the characters and pronunciation in the user voice have been found consistent with the server-generated character combination and pronunciation rules, the voice under verification is also compared with the user's voices in the historical voice library; if it is consistent with one of them, a recording attack is present.
  • This application can effectively prevent recording attacks in voiceprint authentication.
  • FIG. 5 is a flowchart of a voiceprint authentication method capable of preventing a recording attack according to an embodiment of the present application. The method is described from the requesting terminal side. Specifically, the voiceprint authentication method includes:
  • Step 501: Send the user's voiceprint authentication request to the server;
  • Step 502: Receive and display the character combination and the pronunciation rules of the characters sent by the server;
  • Step 503: Receive the user voice input by the user according to the character combination and the pronunciation rules of the characters;
  • Step 504: Send the user voice to the server;
  • Step 505: Receive the voiceprint authentication result sent by the server.
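The terminal-side steps 501 to 505 can be sketched as a single function. The transport to the server, the display, and the microphone are passed in as hypothetical callables, and the shape of the challenge (a string plus one rule description per character) is an assumption of this sketch.

```python
from typing import Callable, List, Tuple

Challenge = Tuple[str, List[str]]  # (character combination, pronunciation rule per character)


def run_terminal(
    user_id: str,
    request_challenge: Callable[[str], Challenge],
    display: Callable[[str, List[str]], None],
    record_voice: Callable[[], bytes],
    submit_voice: Callable[[str, bytes], bool],
) -> bool:
    """Run one authentication attempt from the requesting terminal's side."""
    characters, rules = request_challenge(user_id)  # steps 501-502: request, then receive
    display(characters, rules)                      # step 502: show what to say and how
    voice = record_voice()                          # step 503: capture the user voice
    return submit_voice(user_id, voice)             # steps 504-505: send voice, get result
```

Keeping the transport behind callables lets the same flow be exercised against an in-memory stand-in for the server.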
  • FIG. 6 is a voiceprint authentication server capable of preventing a recording attack according to an embodiment of the present invention.
  • the server 600 includes a generating unit 601, configured to generate a character combination and pronunciation rules of the characters according to a request of a user;
  • the sending unit 602 is configured to send the character combination and the pronunciation rule of the character to the requesting terminal, and send the voiceprint authentication result to the requesting terminal;
  • the receiving unit 603 is configured to receive a user voice input by the requesting terminal according to the character combination and a pronunciation rule of the character;
  • the sound detecting unit 604 is configured to perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rule of the character.
  • FIG. 7 is a voiceprint authentication terminal capable of preventing a recording attack according to an embodiment of the present application.
  • the authentication terminal 700 includes: a requesting unit 701, configured to send a voiceprint authentication request of a user to a server;
  • the receiving unit 702 is configured to receive and display a character combination sent by the server and a pronunciation rule of the character, and receive a voiceprint authentication result sent by the server;
  • the input unit 703 is configured to receive a user voice input by the user according to the character combination and the pronunciation rule of the character;
  • the sending unit 704 is configured to send the user voice to the server.
  • FIG. 8 is a voiceprint authentication system capable of preventing a recording attack according to an embodiment of the present application.
  • the voiceprint authentication system includes a server 600 and a requesting terminal 700, wherein the server 600 is configured to: generate a character combination and pronunciation rules of the characters according to a user's voiceprint authentication request; send the character combination and the pronunciation rules of the characters to the requesting terminal; receive the user voice input at the requesting terminal according to the character combination and the pronunciation rules of the characters; perform voiceprint authentication according to the user voice, the character combination, and the pronunciation rules of the characters; and send the voiceprint authentication result to the requesting terminal;
  • the requesting terminal 700 is configured to: send the user's voiceprint authentication request to the server; receive and display the character combination and the pronunciation rules of the characters sent by the server; receive the user voice input by the user according to the character combination and the pronunciation rules of the characters; send the user voice to the server; and receive the voiceprint authentication result sent by the server.
  • The voiceprint authentication method, server, terminal, and system proposed by the present application prevent recording attacks by verifying whether the characters and the pronunciation in the user voice are consistent with the character combination and the pronunciation rules generated by the server. Even if an attacker obtains, through other channels, a user voice with the right content, that voice cannot satisfy the required pronunciation. Further, to guard against replay of a voice that the user has previously input, once the characters and pronunciation in the user voice have been found consistent with the server-generated character combination and pronunciation rules, the voice under verification is also compared with the user's voices in the historical voice library; if it is consistent with one of them, a recording attack is present. The present application can thus effectively prevent recording attacks in voiceprint authentication.
  • the system workflow for preventing a recording attack is:
  • Step 901: The client sends an identity authentication request to the server;
  • Step 902: The server receives the identity authentication request;
  • Step 903: In response to the identity authentication request, the server randomly generates a character combination to be verified and the pronunciation mode of its characters, and sends them to the client;
  • Step 904: After receiving the character combination to be verified and the pronunciation rules of the characters from the server, the client prompts the user to read the characters as required;
  • Step 905: The client receives the user voice read by the user and sends it to the server;
  • Step 906: The server performs voiceprint verification, determining whether the received user voice and the user's pre-stored voice belong to the same person; a conventional voiceprint verification algorithm may be used in a specific implementation;
  • Step 907: Verify whether the characters in the user voice are the same as the characters in the character combination generated by the server. If they differ, character verification fails and a user authentication failure is returned to the client; if they are the same, character verification passes and the process proceeds to step 908;
  • Step 908: Verify whether the pronunciation mode of the characters in the user voice is the same as the pronunciation mode generated by the server. If it differs, pronunciation verification fails and a user authentication failure is returned to the client; if it is the same, pronunciation verification passes and the process proceeds to step 909;
  • Step 909: Check whether the user voice already exists in the historical voice library. If it does, a recording attack is present: the authentication fails and the failure result is sent to the client. If it doesn't, the voiceprint authentication passes, the user voice is stored in the historical voice library, and the result that voiceprint authentication has passed is sent to the client.
  • an electronic device includes: a processor; and a memory including computer readable instructions that, when executed, cause the processor to perform the following operations:
  • an electronic device comprising: a processor; and a memory including computer readable instructions that, when executed, cause the processor to perform the following operations :
  • embodiments of the present application can be provided as a method, system, or computer program product.
  • the present application can take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment in combination of software and hardware.
  • the application can take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, optical storage, etc.) including computer usable program code.
  • the computer program instructions can also be stored in a computer-readable memory that can direct a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture that includes an instruction apparatus,
  • which implements the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions can also be loaded onto a computer or other programmable data processing device, so that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing,
  • whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowchart and/or one or more blocks of the block diagram.

Abstract

This application provides a voiceprint authentication method capable of preventing a recording attack, a server, a terminal, and a system, the voiceprint authentication method comprising: generating a character combination and a character pronunciation rule on the basis of a voiceprint authentication request of a user; sending the character combination and the character pronunciation rule to a requesting terminal; receiving a user voice input by the requesting terminal on the basis of the character combination and the character pronunciation rule; performing voiceprint authentication on the basis of the user voice, the character combination, and the character pronunciation rule; and sending a voiceprint authentication result to the requesting terminal. The application can effectively prevent a recording attack.

Description

能够防止录音攻击的声纹认证方法、服务器、终端及系统Voiceprint authentication method, server, terminal and system capable of preventing recording attacks 技术领域Technical field
本申请属于声纹识别领域,特别涉及一种能够防止录音攻击的声纹认证方法、服务器、终端及系统。The present application belongs to the field of voiceprint recognition, and particularly relates to a voiceprint authentication method, a server, a terminal, and a system capable of preventing a recording attack.
Background
Like a fingerprint, a voiceprint is an important biometric feature that can characterize a person's identity. Compared with traditional means such as password authentication, voiceprints offer high security and convenience. The most common attacks on voiceprint authentication are recording replay attacks, speaker impersonation attacks, and forged authentication voice attacks.
In a recording replay attack, the attacker uses a high-fidelity recording device to obtain samples of the user's voice by various means, uses either the user's original recording or a "real speaker voice" assembled from it by cutting and splicing, and plays it back through a high-fidelity amplifier while the authentication system is collecting the user's voice. In a speaker impersonation attack, an attacker who is good at mimicking other people's voices imitates the speaker's manner of speaking and pronunciation characteristics. In a forged authentication voice attack, the attacker forges the victim's voice through technical means such as synthesis, conversion, and splicing.
A speaker impersonation attack requires strong imitation skills, and a forged authentication voice attack usually requires considerable professional expertise, so both attacks are inherently difficult to carry out. Moreover, neither an imitated voice nor a forged voice is the real voice, and existing voiceprint recognition technology can largely cope with these two types of attack.
The recording replay attack is a very important problem for voiceprint recognition: the attacker obtains the voice and then attacks using software synthesis. There are two cases. In the first, the user's voice is stolen while the user is speaking in some other setting. In the second, malware records the user's voice while the user is performing voiceprint recognition.
In the prior art there are two main solutions to recording attacks:
The first solution identifies recorded content by analyzing the difference in channel characteristic patterns between a recording and the original speech. The second solution verifies the content of the speech as well as the speaker's voiceprint, because the recording attacker does not know what content will be requested this time.
However, the first solution places high demands on sound signal quality, signal-to-noise ratio, channel quality, and so on, and its results in practical applications are not very good.
In the second solution, having the user read aloud a long random passage each time gives a poor user experience. The amount of voice input can be reduced. For example, patent application No. 201310123555.0 (title: identity confirmation system and method based on dynamic password voice) selects combinations from the 26 English letters and 10 digits, generates a dynamic password by random combination each time, and has the user enter it by voice. Because the dynamic password generated each time is not known in advance, this resists simple recording attacks and is a relatively good solution. However, that patent combines only 36 characters (26 English letters and 10 digits). An attacker who segments recordings to isolate these 36 characters can answer any random string simply by splicing together the recorded characters.
Summary of the invention
The present application provides a voiceprint authentication method, server, and terminal with a function of preventing recording attacks, to address the defect that prior-art methods of preventing recording attacks have loopholes and cannot prevent such attacks effectively.
To solve the above technical problem, the present application provides a voiceprint authentication method capable of preventing recording attacks, including:
generating a character combination and character pronunciation rules according to a user's voiceprint authentication request;
sending the character combination and the character pronunciation rules to the requesting terminal;
receiving the user voice input at the requesting terminal according to the character combination and the character pronunciation rules;
performing voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules;
sending the voiceprint authentication result to the requesting terminal.
The present application further provides a voiceprint authentication method capable of preventing recording attacks, including:
sending a user's voiceprint authentication request to a server;
receiving and displaying the character combination and the character pronunciation rules sent by the server;
receiving the user voice input by the user according to the character combination and the character pronunciation rules;
sending the user voice to the server;
receiving the voiceprint authentication result sent by the server.
The present application further provides a voiceprint authentication server capable of preventing recording attacks, including:
a generating unit, configured to generate a character combination and character pronunciation rules according to a user's request;
a sending unit, configured to send the character combination and the character pronunciation rules to the requesting terminal, and to send the voiceprint authentication result to the requesting terminal;
a receiving unit, configured to receive the user voice input at the requesting terminal according to the character combination and the character pronunciation rules;
a sound detecting unit, configured to perform voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules.
The present application further provides a voiceprint authentication terminal capable of preventing recording attacks, including:
a requesting unit, configured to send a user's voiceprint authentication request to a server;
a receiving unit, configured to receive and display the character combination and the character pronunciation rules sent by the server, and to receive the voiceprint authentication result sent by the server;
an input unit, configured to receive the user voice input by the user according to the character combination and the character pronunciation rules;
a sending unit, configured to send the user voice to the server.
The present application further provides a voiceprint authentication system capable of preventing recording attacks. The system includes a server and a requesting terminal. The server is configured to: generate a character combination and character pronunciation rules according to a user's voiceprint authentication request; send the character combination and the character pronunciation rules to the requesting terminal; receive the user voice input at the requesting terminal according to the character combination and the character pronunciation rules; perform voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules; and send the voiceprint authentication result to the requesting terminal.
The requesting terminal is configured to: send a user's voiceprint authentication request to the server; receive and display the character combination and the character pronunciation rules sent by the server; receive the user voice input by the user according to the character combination and the character pronunciation rules; send the user voice to the server; and receive the voiceprint authentication result sent by the server.
The voiceprint authentication method, server, terminal, and system proposed in the present application verify whether the characters and the manner of pronunciation in the user voice are consistent with the character combination and the character pronunciation rules generated by the server. This effectively prevents recording attacks: even if an attacker obtains, through other channels, user speech that matches the required content, that speech cannot satisfy the required manner of pronunciation. Further, to guard against a recording attack that replays a user voice the user has entered before, after determining that the characters and manner of pronunciation in the user voice are consistent with the character combination and pronunciation rules generated by the server, the method also determines whether the voice currently being verified is consistent with that user's voice in the historical voice library. If it is consistent, a recording attack is present. The present application can effectively prevent recording attacks in voiceprint authentication.
Brief description of the drawings
To explain the technical solutions of the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of the present application; a person of ordinary skill in the art can obtain other drawings from them without creative effort.
FIG. 1 is a flowchart of a voiceprint authentication method capable of preventing recording attacks according to an embodiment of the present application;
FIG. 2 is a flowchart of a voiceprint authentication process capable of preventing recording attacks according to an embodiment of the present application;
FIG. 3 is a flowchart of a voiceprint authentication process capable of preventing recording attacks according to an embodiment of the present application;
FIG. 4 is a waveform diagram corresponding to the pronunciation of the digit "0" according to an embodiment of the present application;
FIG. 5 is a flowchart of a voiceprint authentication method capable of preventing recording attacks according to an embodiment of the present application;
FIG. 6 shows a voiceprint authentication server capable of preventing recording attacks according to an embodiment of the present application;
FIG. 7 shows a voiceprint authentication terminal capable of preventing recording attacks according to an embodiment of the present application;
FIG. 8 shows a voiceprint authentication system capable of preventing recording attacks according to an embodiment of the present application;
FIG. 9 is a flowchart of a voiceprint authentication method with a function of preventing recording attacks according to an embodiment of the present application.
Detailed description
To make the technical features and effects of the present application clearer, the technical solutions of the present application are further described below with reference to the drawings. The present application may also be described or implemented through other specific examples, and any equivalent transformation made by a person skilled in the art within the scope of the claims falls within the scope of protection of the present application.
FIG. 1 is a flowchart of a voiceprint authentication method capable of preventing recording attacks according to an embodiment of the present application.
This embodiment describes the voiceprint authentication method from the server side: voiceprint authentication is performed according to the user voice returned by the terminal and the character combination and character pronunciation rules generated by the server. This embodiment can prevent recording attacks to a certain extent.
Specifically, the voiceprint authentication method capable of preventing recording attacks includes the following steps:
Step 101: generate a character combination and character pronunciation rules according to a user's voiceprint authentication request;
The character combination includes, but is not limited to, letters, digits, and Chinese characters. The pronunciation rules include, but are not limited to, the pitch of the pronunciation and the length of the pronunciation. In one embodiment, each character in the character combination corresponds to one pronunciation rule; in another embodiment, two characters in the character combination correspond to one pronunciation rule. The present application does not limit the specific form of the character combination or of the pronunciation rules for its characters.
In an embodiment of the present application, the character combination and the character pronunciation rules are randomly generated.
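Step 101 can be illustrated with a minimal Python sketch. It assumes one rule per character, a 36-character alphabet, and a rule made of a pitch label and a length label; the names `PITCHES`, `LENGTHS`, `PronunciationRule`, and `generate_challenge` are hypothetical and do not come from the application, which leaves the concrete form of the rules open.

```python
import secrets
import string
from dataclasses import dataclass

# Hypothetical rule vocabularies: the application only names pitch and
# length as examples of what a pronunciation rule may constrain.
PITCHES = ("low", "normal", "high")
LENGTHS = ("short", "long")
ALPHABET = string.ascii_uppercase + string.digits  # 26 letters + 10 digits


@dataclass(frozen=True)
class PronunciationRule:
    pitch: str
    length: str


def generate_challenge(n_chars: int = 6):
    """Return (characters, rules) with one randomly chosen rule per character."""
    chars = [secrets.choice(ALPHABET) for _ in range(n_chars)]
    rules = [
        PronunciationRule(secrets.choice(PITCHES), secrets.choice(LENGTHS))
        for _ in chars
    ]
    return chars, rules


chars, rules = generate_challenge()
for c, r in zip(chars, rules):
    print(f"{c}: pitch={r.pitch}, length={r.length}")
```

The `secrets` module is used here instead of `random` because the challenge should be hard for an attacker to predict; that choice is a design assumption of this sketch.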
Step 102: send the character combination and the character pronunciation rules to the requesting terminal;
The terminals described in the present application include, but are not limited to, mobile phones, PADs, computers, and notebooks.
Step 103: receive the user voice input at the requesting terminal according to the character combination and the character pronunciation rules;
Step 104: perform voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules;
Step 105: send the voiceprint authentication result to the requesting terminal.
In this embodiment, even if an attacker can obtain the spoken character information, the attacker cannot obtain the character pronunciation rules. Adding authentication of the pronunciation rules therefore effectively prevents recording attacks.
In detail, step 104 further includes:
determining whether the user voice and the voice historically input by the user are the voice of the same person;
determining whether the characters in the user voice are the same as the characters in the character combination;
determining whether the manner of pronunciation of the characters in the user voice matches the character pronunciation rules;
Voiceprint authentication passes only when all three conditions hold at once: the user voice and the voice historically input by the user belong to the same person, the characters in the user voice are the same as the characters in the character combination, and the manner of pronunciation of the characters in the user voice matches the character pronunciation rules. In all other cases voiceprint authentication fails. That is, authentication fails if the user voice and the voice historically input by the user do not belong to the same person, and/or the characters in the user voice differ from the characters in the character combination, and/or the manner of pronunciation of the characters in the user voice does not match the character pronunciation rules.
The present application does not limit the order of the above determinations; any order achieves the voiceprint authentication decision.
Optionally, as shown in FIG. 2, step 104 further includes:
Step 201: first determine whether the user voice and the voice historically input by the user are the voice of the same person; if they are not, voiceprint authentication fails; if they are, continue to step 202;
In a specific implementation, before step 202 is performed, the user voice uploaded by the client must first be segmented by character, and the characters in the user voice are then extracted.
Step 202: determine whether the characters in the user voice are the same as the characters in the character combination;
if the characters in the user voice differ from the characters in the character combination, voiceprint authentication fails;
if the characters in the user voice are the same as the characters in the character combination, continue to step 203;
Step 203: determine whether the manner of pronunciation of the characters in the user voice matches the character pronunciation rules;
if the manner of pronunciation of the characters in the user voice does not match the character pronunciation rules, voiceprint authentication fails;
if the manner of pronunciation of the characters in the user voice matches the character pronunciation rules, voiceprint authentication passes.
Performing voiceprint authentication in the order described in this embodiment speeds up authentication, preventing recording attacks while improving the user experience. In the following embodiments, unless otherwise stated, voiceprint authentication is performed in the order described in this embodiment.
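The ordering of steps 201 to 203 can be sketched as a short-circuiting check. This is only an illustration: the three checks are passed in as callables so that a later, possibly more expensive, check is skipped once an earlier one fails. The function name `authenticate` and the representation of rules as plain strings are assumptions of this sketch, and the actual speaker verification and speech recognition are outside its scope.

```python
from typing import Callable, Sequence


def authenticate(
    same_speaker: Callable[[], bool],
    recognized_chars: Callable[[], Sequence[str]],
    observed_rules: Callable[[], Sequence[str]],
    expected_chars: Sequence[str],
    expected_rules: Sequence[str],
) -> bool:
    """Run steps 201-203 in order, stopping at the first failed check."""
    # Step 201: speaker check against the user's historical voice.
    if not same_speaker():
        return False
    # Step 202: spoken characters must equal the issued character combination.
    if list(recognized_chars()) != list(expected_chars):
        return False
    # Step 203: the manner of pronunciation must match the issued rules.
    return list(observed_rules()) == list(expected_rules)


calls = []
ok = authenticate(
    same_speaker=lambda: False,
    recognized_chars=lambda: calls.append("chars") or ["A", "7"],
    observed_rules=lambda: calls.append("rules") or ["high", "long"],
    expected_chars=["A", "7"],
    expected_rules=["high", "long"],
)
# When step 201 fails, the later checks are never evaluated.
print(ok, calls)
```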
Referring again to FIG. 2, after it is determined that the user voice and the voice historically input by the user belong to the same person, that the characters in the user voice are the same as the characters in the character combination, and that the manner of pronunciation of the characters in the user voice matches the character pronunciation rules, the method further includes storing the user voice in the historical voice library so that the voice information input by the user can be retrieved later.
As shown in FIG. 3, in an embodiment of the present application, after it is determined that the user voice and the voice historically input by the user belong to the same person, that the characters in the user voice are the same as the characters in the character combination, and that the manner of pronunciation of the characters in the user voice matches the character pronunciation rules, the method further includes:
Step 204: determine whether the user voice is consistent with the user's voice in the historical voice library;
if the user voice is consistent with the user's voice in the historical voice library, voiceprint authentication fails;
if the user voice is inconsistent with the user's voice in the historical voice library, voiceprint authentication passes, and the user voice is stored in the historical voice library.
Verifying whether the user voice is consistent with that user's voice in the historical voice library prevents a recording attack that replays the same user voice the same user entered in a different authentication session.
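Step 204 and the store-on-success behavior can be sketched as follows. This is a simplified illustration under stated assumptions: each utterance is reduced to one fixed-length feature vector, consistency is judged with a plain Euclidean distance against a threshold, and the class name `HistoryLibrary` and its method are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence


class HistoryLibrary:
    """In-memory stand-in for the historical voice library (illustrative only)."""

    def __init__(self, threshold: float) -> None:
        self.threshold = threshold
        self._store: Dict[str, List[List[float]]] = defaultdict(list)

    def check_and_store(self, user_id: str, features: Sequence[float]) -> bool:
        """Return False if the utterance matches a stored one (suspected replay).

        Otherwise store the utterance for future comparisons and return True.
        """
        for old in self._store[user_id]:
            if math.dist(features, old) < self.threshold:
                return False  # consistent with history: authentication fails
        self._store[user_id].append(list(features))
        return True


library = HistoryLibrary(threshold=0.5)
print(library.check_and_store("alice", [1.0, 2.0]))  # first use of this utterance
print(library.check_and_store("alice", [1.0, 2.0]))  # identical utterance again
```

Storing only utterances that pass keeps replayed audio out of the library; whether rejected audio should also be retained (for auditing, say) is not addressed by the source.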
In an embodiment of the present application, step 204 of the previous embodiment further includes:
extracting feature parameters of the user voice;
calculating the Euclidean distance between the feature parameters of the user voice and the feature parameters of the user's voice in the historical voice library. When the Euclidean distance is less than a predetermined threshold, the user voice is consistent with the user's voice in the historical voice library; when the Euclidean distance is greater than the predetermined threshold, the user voice is inconsistent with the user's voice in the historical voice library.
The predetermined threshold in this embodiment can be determined from the natural variability between a person's repeated utterances of the same sound.
In a specific implementation, the detailed process of determining whether the user voice is consistent with the user's voice in the historical voice library is:
1) Divide the user voice into multiple speech segments by character and preprocess each segment, including framing, pre-emphasis, and windowing, to obtain a segment of sound suitable for further computation.
2) Find the start and end points of the effective speech portion in each segment.
FIG. 4 is a waveform diagram corresponding to the pronunciation of the digit "0". As FIG. 4 shows, there are many silent or faint-noise segments before and after the sound. If these invalid portions of the signal are not removed, an attacker can manipulate the invalid portions of a recording and thereby degrade recording detection.
In a specific implementation, the start and end points of the effective speech portion can be determined from the short-time energy and the short-time zero-crossing rate.
The short-time energy is the sum of the signal intensity over one frame of speech. The short-time energy E_n of the n-th frame is:
[Equation image PCTCN2016111714-appb-000001 in the original publication]
where m denotes the m-th sample point of the n-th frame, N is the frame size, and x_n(m) is the normalized frequency of the m-th sample point of the n-th frame.
The short-time zero-crossing rate is the number of times the waveform of one frame of the speech signal crosses the horizontal axis, denoted Z_n:
[Equation image PCTCN2016111714-appb-000002 in the original publication]
where m denotes the m-th sample point of the n-th frame, N is the frame size, and x_n(m) is the normalized frequency of the m-th sample point of the n-th frame.
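The two equation images are not reproduced in this text. As a hedged reading, the standard textbook definitions that agree with the symbols defined above (m running over the N sample points of frame n) would be as follows; this is an assumption, taking x_n(m) as the normalized value of the m-th sample, and not a transcription of the original images.

```latex
E_n = \sum_{m=1}^{N} x_n(m)^2

Z_n = \frac{1}{2} \sum_{m=2}^{N}
      \bigl| \operatorname{sgn}\bigl(x_n(m)\bigr) - \operatorname{sgn}\bigl(x_n(m-1)\bigr) \bigr|,
\qquad
\operatorname{sgn}(x) =
\begin{cases}
  1,  & x \ge 0 \\
  -1, & x < 0
\end{cases}
```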
When the short-time energy E_n exceeds the threshold E or the short-time zero-crossing rate Z_n exceeds the threshold Z, that frame marks the start of the effective speech; when the short-time energy E_n falls below the threshold E or the short-time zero-crossing rate Z_n falls below the threshold Z, that frame marks the end of the effective speech.
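The endpoint rule can be sketched in Python under some simplifying assumptions: frames do not overlap, energy is the sum of squared samples, the zero-crossing count is a count of sign changes, and a frame is treated as active when either measure exceeds its threshold. The effective speech is then taken to run from the first to the last active frame, which is one possible reading of the start and end conditions above. All function names and the threshold values are illustrative.

```python
from typing import List, Optional, Sequence, Tuple


def frame_signal(samples: Sequence[float], frame_size: int) -> List[Sequence[float]]:
    """Split samples into consecutive non-overlapping frames (overlap omitted)."""
    last_start = len(samples) - frame_size
    return [samples[i:i + frame_size] for i in range(0, last_start + 1, frame_size)]


def short_time_energy(frame: Sequence[float]) -> float:
    """Sum of squared sample values in one frame."""
    return sum(x * x for x in frame)


def zero_crossings(frame: Sequence[float]) -> int:
    """Number of sign changes between neighbouring samples in one frame."""
    return sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))


def find_endpoints(
    samples: Sequence[float], frame_size: int, energy_thr: float, zcr_thr: int
) -> Optional[Tuple[int, int]]:
    """Return (first, last) indices of active frames, or None if none is active."""
    frames = frame_signal(samples, frame_size)
    active = [
        i for i, f in enumerate(frames)
        if short_time_energy(f) > energy_thr or zero_crossings(f) > zcr_thr
    ]
    if not active:
        return None
    return active[0], active[-1]


# Silence, then an alternating burst, then silence again.
signal = [0.0] * 8 + [0.5, -0.5] * 4 + [0.0] * 8
print(find_endpoints(signal, frame_size=4, energy_thr=0.1, zcr_thr=10))
```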
3) Extract feature parameters from the effective speech using Mel-frequency cepstral coefficients (MFCC). This is a widely used feature extraction method in current sound processing and is not described further here.
After the preceding three steps (preprocessing, removal of the invalid speech portions, and feature extraction), the user's current utterance of a given character is denoted T:
T consists of N frame vectors, T = {T(1), T(2), …, T(n), …, T(N)}, where T(n) is the speech feature vector of the n-th frame.
The same user's pronunciation of that character in the historical library, after the same preprocessing, removal of the invalid speech portions, and feature extraction, is denoted R:
R consists of M frame vectors, R = {R(1), R(2), …, R(m), …, R(M)}, where R(m) is the speech feature vector of the m-th frame.
4) Compute the similarity between the user's voice and the voice stored in the historical voice library, that is, the similarity between T and R. This similarity can be assessed by computing the Euclidean distance between T and R.
d(T(i_n), R(i_m)) denotes the Euclidean distance between the i_n-th frame feature of T and the i_m-th frame feature of R; if the two waveforms coincide exactly at a given frame, the distance d is 0. To compare their similarity, the distance D[T, R] between them can be computed: the smaller the distance, the higher the similarity.
If N = M, that is, the two utterances have the same length, the Euclidean distance between the user voice and the voice stored in the historical voice library can be computed directly as D[T, R] = d(1, 1) + d(2, 2) + … + d(N, N). If the two utterances are identical, D[T, R] = 0. This approach can only determine whether T and R are exactly the same. In a real attack, however, the attacker often stretches, shortens, or deletes parts of the original recording, so simply computing this distance does not defend well against such attacks.
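For the equal-length case, the frame-by-frame sum can be sketched as below. The function name `direct_distance` and the toy two-dimensional feature vectors are illustrative assumptions; real MFCC vectors would have more dimensions. The second call shows why this measure is fragile: repeating one frame of an otherwise identical recording already produces a nonzero distance.

```python
import math
from typing import Sequence


def direct_distance(T: Sequence[Sequence[float]], R: Sequence[Sequence[float]]) -> float:
    """D[T, R] = d(1, 1) + d(2, 2) + ... + d(N, N) for sequences of equal length."""
    if len(T) != len(R):
        raise ValueError("direct comparison requires N == M")
    return sum(math.dist(t, r) for t, r in zip(T, R))


T = [[0.0, 0.0], [1.0, 1.0], [2.0, 2.0]]
shifted = [[0.0, 0.0], [0.0, 0.0], [1.0, 1.0]]  # first frame repeated, last dropped

print(direct_distance(T, T))        # identical sequences
print(direct_distance(T, shifted))  # same material, shifted by one frame
```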
When N and M differ, T(n) and R(m) must be aligned. Alignment can use linear expansion: if N < M, T can be mapped linearly onto a sequence of M frames, and the distance between that sequence and {R(1), R(2), …, R(M)} is then computed. However, an attacker usually does not process the whole utterance but only parts of it, so this method would judge the similarity between the two voices to be very low.
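Linear expansion can be sketched as a nearest-index resampling of T onto M frames, followed by the frame-by-frame sum. This is one simple way to realize a linear mapping, chosen here for illustration; the application does not specify the interpolation, and the names `linear_stretch` and `linear_aligned_distance` are hypothetical.

```python
import math
from typing import List, Sequence


def linear_stretch(T: Sequence[Sequence[float]], M: int) -> List[Sequence[float]]:
    """Map an N-frame sequence onto M frames by nearest-index linear resampling."""
    N = len(T)
    if N == 1 or M == 1:
        return [T[0]] * M
    return [T[round(k * (N - 1) / (M - 1))] for k in range(M)]


def linear_aligned_distance(
    T: Sequence[Sequence[float]], R: Sequence[Sequence[float]]
) -> float:
    """Stretch T linearly to len(R) frames, then sum the per-frame distances."""
    stretched = linear_stretch(T, len(R))
    return sum(math.dist(t, r) for t, r in zip(stretched, R))


T = [[0.0], [1.0], [2.0]]
print(len(linear_stretch(T, 5)))
print(linear_aligned_distance(T, T))
```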
Comparing the similarity of utterances T and R therefore requires combining time warping with distance measurement: find a function i_m = Φ(i_n) that maps the time axis n of T nonlinearly onto the time axis m of R, such that the distance D[T, R] between T and R satisfies:
[Equation image PCTCN2016111714-appb-000003 in the original publication]
where:
[Equation image PCTCN2016111714-appb-000004 in the original publication]
Φ(i_n + 1) ≥ Φ(i_n)
Φ(i_n + 1) - Φ(i_n) ≤ 1
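The two equation images are not reproduced in this text. Under the assumption that they follow the usual dynamic time warping formulation, the objective and the boundary conditions would read as below; this is a hedged reconstruction from the surrounding definitions of d, Φ, N, and M, not a transcription of the original images.

```latex
D[T, R] = \min_{\Phi} \sum_{i_n = 1}^{N} d\bigl(T(i_n),\, R(\Phi(i_n))\bigr)

\Phi(1) = 1, \qquad \Phi(N) = M
```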
This problem clearly satisfies the conditions for dynamic programming and can be solved with a dynamic programming algorithm, using the recurrence:
D(i_n, i_m) = d(T(i_n), R(i_m)) + min{D(i_n - 1, i_m), D(i_n - 1, i_m - 1), D(i_n - 1, i_m - 2)}
Starting the search from the point (1, 1) (with D(1, 1) = 0) and applying the recurrence repeatedly until (N, M) is reached yields the optimal path, and D(N, M) is the matching distance corresponding to the best matching path.
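The recurrence can be turned into a small Python sketch. It follows the text literally: D(1, 1) is set to 0 and each cell takes its predecessor from (i_n - 1, i_m), (i_n - 1, i_m - 1), or (i_n - 1, i_m - 2). The function name `dtw_distance` and the choice to return infinity when (N, M) cannot be reached with these step sizes are assumptions of the sketch.

```python
import math
from typing import Sequence


def dtw_distance(T: Sequence[Sequence[float]], R: Sequence[Sequence[float]]) -> float:
    """Return D(N, M) under the recurrence above, or math.inf if unreachable."""
    N, M = len(T), len(R)
    if N == 0 or M == 0:
        raise ValueError("both sequences must contain at least one frame")
    INF = math.inf
    # D uses 1-based indices to match the text; row 0 and column 0 are padding.
    D = [[INF] * (M + 1) for _ in range(N + 1)]
    D[1][1] = 0.0  # the text starts the search with D(1, 1) = 0
    for i in range(2, N + 1):
        for j in range(1, M + 1):
            best = min(
                D[i - 1][j],
                D[i - 1][j - 1],
                D[i - 1][j - 2] if j >= 2 else INF,
            )
            if best < INF:
                D[i][j] = math.dist(T[i - 1], R[j - 1]) + best
    return D[N][M]


T = [[0.0], [1.0], [2.0], [3.0]]
stretched = [[0.0], [1.0], [1.0], [2.0], [3.0]]  # one frame repeated
shortened = [[0.0], [1.0], [3.0]]                # one frame deleted

print(dtw_distance(T, T))
print(dtw_distance(T, stretched))
print(dtw_distance(T, shortened))
```

In this toy example the stretched copy still aligns with zero cost, which is the property that motivates warping over the direct and linear comparisons. Because every step advances the T index by exactly one frame, the point (N, M) is reachable only when M - 1 is at most 2(N - 1).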
Because each person's pronunciation is affected by many factors, two utterances of the same character by the same person cannot be exactly alike as sound waves; some difference is always present, and this difference is defined as the predetermined threshold for the decision. If D(N, M) = 0, the two utterances T and R are identical, which shows that T and R are the same sound, and a recording attack may be present. If D(N, M) < threshold, T and R are highly similar, and a recording attack may likewise be present. If D(N, M) >= threshold, T and R are not the same utterance, and no recording attack is present.
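The three-way decision rule can be written out as a small sketch. The enum and function names are hypothetical, and the threshold value in the usage lines is arbitrary; the source only says the threshold reflects natural variation between a person's repeated utterances.

```python
from enum import Enum


class ReplayVerdict(Enum):
    IDENTICAL = "identical"
    HIGHLY_SIMILAR = "highly similar"
    DIFFERENT = "different"


def classify_distance(distance: float, threshold: float) -> ReplayVerdict:
    """Apply the decision rule for D(N, M) against the predetermined threshold."""
    if threshold <= 0:
        raise ValueError("threshold must be positive")
    if distance == 0:
        return ReplayVerdict.IDENTICAL
    if distance < threshold:
        return ReplayVerdict.HIGHLY_SIMILAR
    return ReplayVerdict.DIFFERENT


def replay_suspected(distance: float, threshold: float) -> bool:
    """A recording attack is suspected unless the utterances are judged different."""
    return classify_distance(distance, threshold) is not ReplayVerdict.DIFFERENT


for d in (0.0, 0.3, 1.0, 2.5):
    print(d, classify_distance(d, threshold=1.0).value, replay_suspected(d, 1.0))
```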
The voiceprint authentication method capable of preventing recording attacks proposed in the present application verifies whether the characters and the manner of pronunciation in the user voice are consistent with the character combination and the character pronunciation rules generated by the server. This effectively prevents recording attacks: even if an attacker obtains, through other channels, user speech that matches the required content, that speech cannot satisfy the required manner of pronunciation. Further, to guard against a recording attack that replays a user voice the user has entered before, after determining that the characters and manner of pronunciation in the user voice are consistent with the character combination and pronunciation rules generated by the server, the method also determines whether the voice currently being verified is consistent with that user's voice in the historical voice library. If it is consistent, a recording attack is present. The present application can effectively prevent recording attacks in voiceprint authentication.
FIG. 5 is a flowchart of a voiceprint authentication method capable of preventing recording attacks according to an embodiment of this application. The method is described from the requesting terminal side. Specifically, the voiceprint authentication method includes:
Step 501: send a user's voiceprint authentication request to the server;
Step 502: receive and display the character combination and character pronunciation rules sent by the server;
Step 503: receive the user voice input by the user according to the character combination and character pronunciation rules;
Step 504: send the user voice to the server;
Step 505: receive the voiceprint authentication result sent by the server.
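Steps 501 to 505 can be sketched as a single terminal-side function. This is an illustrative sketch only: the `server` object with its `request_challenge` and `submit_voice` methods, the `display` and `record` callables, and the dictionary keys are all hypothetical names that do not appear in the application.

```python
def authenticate(user_id, server, display, record):
    """Terminal-side flow of steps 501 to 505 (all interfaces are assumed)."""
    # Step 501: send the user's voiceprint authentication request.
    challenge = server.request_challenge(user_id)
    # Step 502: receive and display the character combination and rules.
    display(challenge["characters"], challenge["pronunciation_rules"])
    # Step 503: capture the user voice spoken according to the challenge.
    audio = record()
    # Steps 504 and 505: send the voice and receive the authentication result.
    return server.submit_voice(user_id, audio)
```

Passing the transport, display, and recording functions as parameters keeps the sketch independent of any particular network protocol or audio API.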
FIG. 6 shows a voiceprint authentication server capable of preventing recording attacks according to an embodiment of this application. The server 600 includes a generating unit 601, configured to generate a character combination and character pronunciation rules according to a user's request;
a sending unit 602, configured to send the character combination and character pronunciation rules to the requesting terminal, and to send the voiceprint authentication result to the requesting terminal;
a receiving unit 603, configured to receive the user voice input at the requesting terminal according to the character combination and character pronunciation rules;
a sound detection unit 604, configured to perform voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules.
FIG. 7 shows a voiceprint authentication terminal capable of preventing recording attacks according to an embodiment of this application. Specifically, the authentication terminal 700 includes: a requesting unit 701, configured to send a user's voiceprint authentication request to the server;
a receiving unit 702, configured to receive and display the character combination and character pronunciation rules sent by the server, and to receive the voiceprint authentication result sent by the server;
an input unit 703, configured to receive the user voice input by the user according to the character combination and character pronunciation rules;
a sending unit 704, configured to send the user voice to the server.
FIG. 8 shows a voiceprint authentication system capable of preventing recording attacks according to an embodiment of this application.
The voiceprint authentication system includes a server 600 and a requesting terminal 700. The server 600 is configured to: generate a character combination and character pronunciation rules according to a user's voiceprint authentication request; send the character combination and character pronunciation rules to the requesting terminal; receive the user voice input at the requesting terminal according to the character combination and character pronunciation rules; perform voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules; and send the voiceprint authentication result to the requesting terminal;
the requesting terminal 700 is configured to: send a user's voiceprint authentication request to the server; receive and display the character combination and character pronunciation rules sent by the server; receive the user voice input by the user according to the character combination and character pronunciation rules; send the user voice to the server; and receive the voiceprint authentication result sent by the server.
The voiceprint authentication method, server, terminal, and system proposed in this application, which are capable of preventing recording attacks, verify whether the characters and pronunciation mode in the user voice are consistent with the character combination and character pronunciation rules generated by the server, and can thereby effectively prevent recording attacks: even if an attacker obtains, through other channels, user voice that satisfies the required voice content, that voice cannot satisfy the pronunciation mode requirement. Further, to guard against a recording attack that replays user voice the user has previously input, after determining that the characters and pronunciation mode in the user voice are consistent with the server-generated character combination and pronunciation rules, it is also determined whether the voice currently to be verified is consistent with the user's voice in the historical voice library; if it is, a recording attack exists. This application can effectively prevent recording attacks in voiceprint authentication.
To explain the technical solution of this application more clearly, a specific embodiment is described below. With reference to FIG. 9, the workflow of the system for preventing recording attacks is:
Step 901: the client sends an identity authentication request to the server;
Step 902: the server receives the identity authentication request;
Step 903: the server randomly generates a verification character combination and the pronunciation mode of the characters according to the identity authentication request, and sends them to the client;
Step 904: after receiving the character combination to be verified and the character pronunciation rules issued by the server, the client prompts the user to read the characters aloud as required;
Step 905: the client receives the user voice read in by the user and sends it to the server;
Step 906: the server performs voiceprint verification, determining whether the received user voice and the pre-stored voice of the user belong to the same person; a conventional voiceprint verification algorithm may be used in a specific implementation;
if voiceprint verification finds that they are not the same person, a user authentication failure is returned directly to the client;
if voiceprint verification finds that they are the same person, recording detection continues;
Step 907: verify whether the characters in the user voice are the same as the characters in the character combination generated by the server. If they are not the same, character verification of the user voice fails and a user authentication failure is returned to the client; if they are the same, character verification of the user voice passes and the process continues to step 908;
Step 908: verify whether the pronunciation mode of the characters in the user voice is the same as the character pronunciation mode generated by the server. If it is not the same, pronunciation mode verification of the user voice fails and a user authentication failure is returned to the client; if it is the same, pronunciation mode verification of the user voice passes and the process continues to step 909;
Step 909: verify whether the user voice exists in the historical voice library. If it does, a recording attack is shown to exist, authentication fails, and the authentication failure result is sent to the client; if it does not, voiceprint authentication passes, the user voice is stored in the historical voice library, and the voiceprint authentication pass result is sent to the client.
The process of verifying whether the user voice exists in the historical voice library has been described in detail in the foregoing embodiments and is not repeated here. After voiceprint authentication passes, the client continues with the corresponding operation, which this application does not restrict.
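The server-side checks of steps 906 to 909 can be sketched as a short-circuiting pipeline. This is a minimal sketch under stated assumptions: the `Checks` container, its four callables, and the `verify` function are hypothetical names, and the actual speaker verification, speech recognition, pronunciation analysis, and similarity comparison are left to whatever algorithms an implementation chooses.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Checks:
    """Pluggable checks (assumed interfaces, not defined by the application)."""
    same_speaker: Callable[[bytes], bool]            # step 906
    recognize: Callable[[bytes], str]                # step 907: voice -> characters
    pronunciation_ok: Callable[[bytes, list], bool]  # step 908
    is_replay: Callable[[bytes, bytes], bool]        # step 909: compare with one stored voice


def verify(voice: bytes, characters: str, rules: list,
           history: List[bytes], checks: Checks) -> bool:
    """Return True only if every check of steps 906 to 909 passes."""
    if not checks.same_speaker(voice):               # step 906 failed
        return False
    if checks.recognize(voice) != characters:        # step 907 failed
        return False
    if not checks.pronunciation_ok(voice, rules):    # step 908 failed
        return False
    if any(checks.is_replay(voice, old) for old in history):
        return False                                 # step 909: recording attack
    history.append(voice)                            # store only after a pass
    return True
```

The voice is appended to the history only after all checks pass, which mirrors step 909: a later replay of the same recording is then caught by the history comparison.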
The voiceprint authentication method, server, terminal, and system proposed in this application, which are capable of preventing recording attacks, verify whether the characters and pronunciation mode in the user voice are consistent with the character combination and character pronunciation rules generated by the server, and can thereby effectively prevent recording attacks: even if an attacker obtains, through other channels, user voice that satisfies the required voice content, that voice cannot satisfy the pronunciation mode requirement. Further, to guard against a recording attack that replays user voice the user has previously input, after determining that the characters and pronunciation mode in the user voice are consistent with the server-generated character combination and pronunciation rules, it is also determined whether the voice currently to be verified is consistent with the user's voice in the historical voice library; if it is, a recording attack exists. This application can effectively prevent recording attacks in voiceprint authentication.
An embodiment of this application further provides an electronic device, including: a processor; and a memory containing computer-readable instructions which, when executed, cause the processor to perform the following operations:
generating a character combination and character pronunciation rules according to a user's voiceprint authentication request;
sending the character combination and character pronunciation rules to the requesting terminal;
receiving the user voice input at the requesting terminal according to the character combination and character pronunciation rules;
performing voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules; and sending the voiceprint authentication result to the requesting terminal.
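The first of these operations, generating the character combination and pronunciation rules, might look like the sketch below. It is an illustration only: the digit character set, the rule labels in `RULES`, the default length, and the function name `generate_challenge` are assumptions, since this section does not enumerate the concrete characters or rules.

```python
import secrets

CHARSET = "0123456789"                           # assumed character set
RULES = ("normal", "prolonged", "loud", "soft")  # hypothetical rule labels


def generate_challenge(length=6):
    """Randomly pick characters and one pronunciation rule per character."""
    characters = "".join(secrets.choice(CHARSET) for _ in range(length))
    rules = [secrets.choice(RULES) for _ in range(length)]
    return {"characters": characters, "pronunciation_rules": rules}
```

The `secrets` module is used instead of `random` because the challenge should be unpredictable to an attacker who has collected earlier recordings.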
Another embodiment of this application further provides an electronic device, including: a processor; and a memory containing computer-readable instructions which, when executed, cause the processor to perform the following operations:
sending a user's voiceprint authentication request to the server;
receiving and displaying the character combination and character pronunciation rules sent by the server;
receiving the user voice input by the user according to the character combination and character pronunciation rules;
sending the user voice to the server;
receiving the voiceprint authentication result sent by the server.
Those skilled in the art will appreciate that embodiments of this application may be provided as a method, a system, or a computer program product. Accordingly, this application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Moreover, this application may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, CD-ROM, and optical storage) containing computer-usable program code.
This application is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of this application. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions can be provided to a processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor of the computer or other programmable data processing device produce means for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, such that a series of operational steps is performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of a flowchart and/or one or more blocks of a block diagram.
The foregoing is intended only to illustrate the technical solution of this application. Any person of ordinary skill in the art may modify and change the above embodiments without departing from the spirit and scope of this application. The scope of protection of this application is therefore determined by the scope of the claims.

Claims (11)

  1. A voiceprint authentication method capable of preventing recording attacks, comprising:
    generating a character combination and character pronunciation rules according to a user's voiceprint authentication request;
    sending the character combination and character pronunciation rules to a requesting terminal;
    receiving user voice input at the requesting terminal according to the character combination and character pronunciation rules;
    performing voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules; and sending the voiceprint authentication result to the requesting terminal.
  2. The voiceprint authentication method capable of preventing recording attacks according to claim 1, wherein performing voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules further comprises:
    determining whether the user voice and the voice historically input by the user are the voice of the same person;
    determining whether the characters in the user voice are the same as the characters in the character combination;
    determining whether the pronunciation mode of the characters in the user voice matches the character pronunciation rules;
    wherein voiceprint authentication passes only when the user voice and the voice historically input by the user belong to the same person, the characters in the user voice are the same as the characters in the character combination, and the pronunciation mode of the characters in the user voice matches the character pronunciation rules; in all other cases voiceprint authentication fails.
  3. The voiceprint authentication method capable of preventing recording attacks according to claim 2, further comprising, after determining that the user voice and the voice historically input by the user belong to the same person, that the characters in the user voice are the same as the characters in the character combination, and that the pronunciation mode of the characters in the user voice matches the character pronunciation rules:
    storing the user voice in a historical voice library.
  4. The voiceprint authentication method capable of preventing recording attacks according to claim 2, further comprising, after determining that the user voice and the voice historically input by the user belong to the same person, that the characters in the user voice are the same as the characters in the character combination, and that the pronunciation mode of the characters in the user voice matches the character pronunciation rules:
    determining whether the user voice is consistent with the user's voice in a historical voice library;
    if the user voice is consistent with the user's voice in the historical voice library, voiceprint authentication fails;
    if the user voice is not consistent with the user's voice in the historical voice library, voiceprint authentication passes and the user voice is stored in the historical voice library.
  5. The voiceprint authentication method capable of preventing recording attacks according to claim 4, wherein determining whether the user voice is consistent with the user's voice in the historical voice library further comprises:
    extracting feature parameters of the user voice;
    calculating the Euclidean distance between the feature parameters of the user voice and the feature parameters of the user's voice in the history database, wherein when the Euclidean distance is less than a predetermined threshold, the user voice is consistent with the user's voice in the historical voice library, and when the Euclidean distance is greater than the predetermined threshold, the user voice is not consistent with the user's voice in the historical voice library.
  6. The voiceprint authentication method capable of preventing recording attacks according to claim 5, wherein extracting the feature parameters of the user voice further comprises:
    preprocessing the user voice and dividing the user voice into multiple voice segments by character;
    finding the start point and end point of the valid voice portion in each voice segment;
    extracting the feature parameters of the valid voice portion.
  7. The voiceprint authentication method capable of preventing recording attacks according to claim 1, wherein the character combination and character pronunciation rules are randomly generated.
  8. A voiceprint authentication method capable of preventing recording attacks, comprising:
    sending a user's voiceprint authentication request to a server;
    receiving and displaying a character combination and character pronunciation rules sent by the server;
    receiving user voice input by the user according to the character combination and character pronunciation rules;
    sending the user voice to the server;
    receiving a voiceprint authentication result sent by the server.
  9. A voiceprint authentication server capable of preventing recording attacks, comprising:
    a generating unit, configured to generate a character combination and character pronunciation rules according to a user's request;
    a sending unit, configured to send the character combination and character pronunciation rules to a requesting terminal, and to send a voiceprint authentication result to the requesting terminal;
    a receiving unit, configured to receive user voice input at the requesting terminal according to the character combination and character pronunciation rules;
    a sound detection unit, configured to perform voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules.
  10. A voiceprint authentication terminal capable of preventing recording attacks, comprising:
    a requesting unit, configured to send a user's voiceprint authentication request to a server;
    a receiving unit, configured to receive and display a character combination and character pronunciation rules sent by the server, and to receive a voiceprint authentication result sent by the server;
    an input unit, configured to receive user voice input by the user according to the character combination and character pronunciation rules;
    a sending unit, configured to send the user voice to the server.
  11. A voiceprint authentication system capable of preventing recording attacks, comprising a server and a requesting terminal, wherein the server is configured to: generate a character combination and character pronunciation rules according to a user's voiceprint authentication request; send the character combination and character pronunciation rules to the requesting terminal; receive user voice input at the requesting terminal according to the character combination and character pronunciation rules; perform voiceprint authentication according to the user voice, the character combination, and the character pronunciation rules; and send the voiceprint authentication result to the requesting terminal;
    the requesting terminal is configured to: send a user's voiceprint authentication request to the server; receive and display the character combination and character pronunciation rules sent by the server; receive user voice input by the user according to the character combination and character pronunciation rules; send the user voice to the server; and receive the voiceprint authentication result sent by the server.
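The comparison recited in claims 5 and 6 can be illustrated with a small Python sketch. It is not the claimed implementation: it assumes short-time energy as the endpoint criterion and a fixed-length feature vector per voice segment, neither of which the claims specify, and the names `find_endpoints` and `is_consistent`, the frame length, and the thresholds are hypothetical.

```python
import math


def find_endpoints(samples, frame_len=4, energy_threshold=0.01):
    """Return (start, end) sample indices of the valid voice portion, or None.

    Assumption: a frame counts as voiced when its mean energy exceeds
    energy_threshold; the claims do not name a detection criterion.
    """
    active = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy > energy_threshold:
            active.append(start)
    if not active:
        return None
    return active[0], min(active[-1] + frame_len, len(samples))


def is_consistent(features_a, features_b, threshold):
    """True when the Euclidean distance between feature vectors is below threshold."""
    return math.dist(features_a, features_b) < threshold
```

A consistent result means the new voice is suspiciously close to a stored one, so under claim 4 it leads to an authentication failure rather than a pass.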
PCT/CN2016/111714 2015-12-30 2016-12-23 Voiceprint authentication method capable of preventing recording attack, server, terminal, and system WO2017114307A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201511020257.4 2015-12-30
CN201511020257.4A CN105933272A (en) 2015-12-30 2015-12-30 Voiceprint recognition method capable of preventing recording attack, server, terminal, and system

Publications (1)

Publication Number Publication Date
WO2017114307A1 true WO2017114307A1 (en) 2017-07-06

Family

ID=56839979

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/111714 WO2017114307A1 (en) 2015-12-30 2016-12-23 Voiceprint authentication method capable of preventing recording attack, server, terminal, and system

Country Status (2)

Country Link
CN (1) CN105933272A (en)
WO (1) WO2017114307A1 (en)

Families Citing this family (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105933272A (en) * 2015-12-30 2016-09-07 中国银联股份有限公司 Voiceprint recognition method capable of preventing recording attack, server, terminal, and system
CN110169014A (en) * 2017-01-03 2019-08-23 诺基亚技术有限公司 Device, method and computer program product for certification
WO2019002831A1 (en) 2017-06-27 2019-01-03 Cirrus Logic International Semiconductor Limited Detection of replay attack
GB201713697D0 (en) 2017-06-28 2017-10-11 Cirrus Logic Int Semiconductor Ltd Magnetic detection of replay attack
GB2563953A (en) 2017-06-28 2019-01-02 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
CN109218269A (en) * 2017-07-05 2019-01-15 阿里巴巴集团控股有限公司 Identity authentication method, device, equipment and data processing method
GB201801532D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for audio playback
GB201801530D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for authentication
GB201801527D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Method, apparatus and systems for biometric processes
GB201801526D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Methods, apparatus and systems for authentication
GB201801528D0 (en) 2017-07-07 2018-03-14 Cirrus Logic Int Semiconductor Ltd Method, apparatus and systems for biometric processes
GB201804843D0 (en) 2017-11-14 2018-05-09 Cirrus Logic Int Semiconductor Ltd Detection of replay attack
GB201801664D0 (en) 2017-10-13 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of liveness
GB2567503A (en) 2017-10-13 2019-04-17 Cirrus Logic Int Semiconductor Ltd Analysing speech signals
GB201801663D0 (en) 2017-10-13 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of liveness
GB201801661D0 (en) 2017-10-13 2018-03-21 Cirrus Logic International Uk Ltd Detection of liveness
GB201801659D0 (en) 2017-11-14 2018-03-21 Cirrus Logic Int Semiconductor Ltd Detection of loudspeaker playback
US11264037B2 (en) 2018-01-23 2022-03-01 Cirrus Logic, Inc. Speaker identification
US11735189B2 (en) 2018-01-23 2023-08-22 Cirrus Logic, Inc. Speaker identification
US11475899B2 (en) 2018-01-23 2022-10-18 Cirrus Logic, Inc. Speaker identification
US10692490B2 (en) 2018-07-31 2020-06-23 Cirrus Logic, Inc. Detection of replay attack
CN109087647B (en) * 2018-08-03 2023-06-13 平安科技(深圳)有限公司 Voiceprint recognition processing method and device, electronic equipment and storage medium
US10915614B2 (en) 2018-08-31 2021-02-09 Cirrus Logic, Inc. Biometric authentication
US11037574B2 (en) 2018-09-05 2021-06-15 Cirrus Logic, Inc. Speaker recognition and speaker change detection
CN109935233A (en) * 2019-01-29 2019-06-25 天津大学 A kind of recording attack detection method based on amplitude and phase information
CN111524528B (en) * 2020-05-28 2022-10-21 Oppo广东移动通信有限公司 Voice awakening method and device for preventing recording detection
CN112735426A (en) * 2020-12-24 2021-04-30 深圳市声扬科技有限公司 Voice verification method and system, computer device and storage medium
CN113012684B (en) * 2021-03-04 2022-05-31 电子科技大学 Synthesized voice detection method based on voice segmentation
CN114826709A (en) * 2022-04-15 2022-07-29 马上消费金融股份有限公司 Identity authentication and acoustic environment detection method, system, electronic device and medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1808567A (en) * 2006-01-26 2006-07-26 覃文华 Voice-print authentication device and method of authenticating people presence
CN105873050A (en) * 2010-10-14 2016-08-17 阿里巴巴集团控股有限公司 Wireless service identity authentication, server and system
CN102543084A (en) * 2010-12-29 2012-07-04 盛乐信息技术(上海)有限公司 Online voiceprint recognition system and implementation method thereof
CN104717219B (en) * 2015-03-20 2017-03-15 百度在线网络技术(北京)有限公司 Vocal print login method and device based on artificial intelligence

Cited By (3)

Publication number Priority date Publication date Assignee Title
CN109754817A (en) * 2017-11-02 2019-05-14 北京三星通信技术研究有限公司 Signal processing method and terminal device
CN112365895A (en) * 2020-10-09 2021-02-12 深圳前海微众银行股份有限公司 Audio processing method and device, computing equipment and storage medium
CN112365895B (en) * 2020-10-09 2024-04-19 深圳前海微众银行股份有限公司 Audio processing method, device, computing equipment and storage medium

Also Published As

Publication number Publication date
CN105933272A (en) 2016-09-07

Similar Documents

Publication Publication Date Title
WO2017114307A1 (en) Voiceprint authentication method capable of preventing recording attack, server, terminal, and system
Wang et al. Voicepop: A pop noise based anti-spoofing system for voice authentication on smartphones
Zhang et al. Voicelive: A phoneme localization based liveness detection for voice authentication on smartphones
Gałka et al. Playback attack detection for text-dependent speaker verification over telephone channels
US9646614B2 (en) Fast, language-independent method for user authentication by voice
CN107104803B (en) User identity authentication method based on digital password and voiceprint joint confirmation
WO2017215558A1 (en) Voiceprint recognition method and device
US7447632B2 (en) Voice authentication system
US20180146370A1 (en) Method and apparatus for secured authentication using voice biometrics and watermarking
US20190013026A1 (en) System and method for efficient liveness detection
US11979398B2 (en) Privacy-preserving voiceprint authentication apparatus and method
WO2017162053A1 (en) Identity authentication method and device
CN102737634A (en) Authentication method and device based on voice
WO2018129869A1 (en) Voiceprint verification method and apparatus
US20210304783A1 (en) Voice conversion and verification
US11081115B2 (en) Speaker recognition
KR101754954B1 (en) Certification system and method using autograph and voice
Firc et al. The dawn of a text-dependent society: Deepfakes as a threat to speech verification systems
Nykytyuk et al. The Method of User Identification by Speech Signal.
Kuznetsov et al. Methods of countering speech synthesis attacks on voice biometric systems in banking
Mishra et al. Speaker identification, differentiation and verification using deep learning for human machine interface
CN108630207B (en) Speaker verification method and speaker verification apparatus
US20240126851A1 (en) Authentication system and method
Kadu et al. Voice Based Authentication System for Web Applications using Machine Learning.
RU2747935C2 (en) Method and system for user authentication using voice biometrics

Legal Events

Date Code Title Description
121 EP: The EPO has been informed by WIPO that EP was designated in this application
Ref document number: 16881096; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 16881096; Country of ref document: EP; Kind code of ref document: A1