CN108550368B - Voice data processing method - Google Patents

Voice data processing method

Info

Publication number
CN108550368B
CN108550368B (application CN201810225485.2A)
Authority
CN
China
Prior art keywords
voice
authentication
user
pos machine
character string
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810225485.2A
Other languages
Chinese (zh)
Other versions
CN108550368A (en)
Inventor
李仁超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou Helipay Payment Technology Co ltd
Original Assignee
Guangzhou Helipay Payment Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou Helipay Payment Technology Co., Ltd.
Priority to CN201810225485.2A
Publication of CN108550368A
Application granted
Publication of CN108550368B
Legal status: Active

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 - Payment architectures, schemes or protocols
    • G06Q20/08 - Payment architectures
    • G06Q20/20 - Point-of-sale [POS] network systems
    • G06Q20/206 - Point-of-sale [POS] network systems comprising security or operator identification provisions, e.g. password entry
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q20/00 - Payment architectures, schemes or protocols
    • G06Q20/38 - Payment protocols; Details thereof
    • G06Q20/40 - Authorisation, e.g. identification of payer or payee, verification of customer or shop credentials; Review and approval of payers, e.g. check credit lines or negative lists
    • G06Q20/401 - Transaction verification
    • G06Q20/4014 - Identity check for transactions
    • G06Q20/40145 - Biometric identity checks
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification techniques
    • G10L17/22 - Interactive procedures; Man-machine interfaces
    • H - ELECTRICITY
    • H04 - ELECTRIC COMMUNICATION TECHNIQUE
    • H04L - TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L9/00 - Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/32 - Cryptographic mechanisms or arrangements including means for verifying the identity or authority of a user of the system or for message authentication, e.g. authorization, entity authentication, data integrity or data verification, non-repudiation, key authentication or verification of credentials
    • H04L9/3247 - Cryptographic mechanisms including means for verifying identity or authority, involving digital signatures
    • H04L9/3249 - Cryptographic mechanisms including means for verifying identity or authority, involving digital signatures using RSA or related signature schemes, e.g. Rabin scheme

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Accounting & Taxation (AREA)
  • General Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • General Business, Economics & Management (AREA)
  • Finance (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Strategic Management (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)
  • Cash Registers Or Receiving Machines (AREA)

Abstract

The invention provides a voice data processing method, comprising the following steps: establishing a connection between an intelligent POS machine and a payment platform through a secure channel; the intelligent POS machine client performing voice recognition on the user; and performing user identity authentication based on the voice recognition result. The method stores, compares, and operates on the identity authentication data locally on the intelligent POS terminal, requires no hardware cryptographic device, does not upload the data to the payment platform, and therefore offers higher security.

Description

Voice data processing method
Technical Field
The present invention relates to speech recognition, and more particularly, to a method for processing speech data.
Background
At present, the network security of point-of-sale terminals, and of intelligent POS machines in particular, is drawing growing attention, as are the security issues of transmitting information through such devices. Current intelligent POS applications authenticate users with a user name and password, issue a digital certificate to the intelligent POS user, and strengthen identity security by exploiting the non-exportability of the private key held in a hardware cryptographic terminal. However, any hardware cryptographic device is an external physical device attached to the intelligent POS machine, which reduces the usability of the scheme and increases the operational complexity for the user. Prior-art fingerprint identification, for its part, must transmit the identification information, which challenges security; and if the feature library stored on the payment platform is lost, identity authentication becomes impossible.
Disclosure of Invention
In order to solve the problems in the prior art, the present invention provides a voice data processing method, comprising:
establishing a connection between the intelligent POS machine and the payment platform through a secure channel;
the intelligent POS machine client performing voice recognition on the user;
and performing user identity authentication based on the voice recognition result.
Preferably, the intelligent POS machine client performing voice recognition on the user further comprises:
the intelligent POS machine obtaining, from the payment platform, a voice recognition request enabled in advance;
and judging whether the intelligent POS machine supports voice recognition based on the recognition modes it currently supports.
Preferably, if the judgment result is that voice recognition is supported, the intelligent POS machine client performs user identity verification using the recognition result of the user's voice;
or, alternatively,
the authentications available on the current intelligent POS machine are screened out according to the enabled authentication requests and the authentication modes supported by the current machine, and displayed to the user for selection and verification.
Preferably, the random number is encrypted with the user private key of an RSA key pair generated by the authentication module of the intelligent POS machine in a secure environment when voice recognition is enabled, and the encrypted value is returned to the payment platform;
the payment platform verifies the validity of the encrypted value using the user public key stored when voice recognition was enabled;
after the authentication module of the intelligent POS machine completes identity authentication, on the next use of the machine the private key of the RSA key pair stored in the trusted storage block is called directly to encrypt the digest, and the encrypted value is transmitted to the payment platform for verification.
Compared with the prior art, the invention has the following advantages:
the invention provides a voice data processing method, which realizes local storage, comparison and operation of the identity authentication data of the intelligent POS machine terminal, does not need to configure hardware password equipment, does not need to upload the data to a payment platform, and has higher safety.
Drawings
Fig. 1 is a flowchart of a method for processing voice data according to an embodiment of the present invention.
Detailed Description
A detailed description of one or more embodiments of the invention is provided below along with accompanying figures that illustrate the principles of the invention. The invention is described in connection with such embodiments, but the invention is not limited to any embodiment. The scope of the invention is limited only by the claims and the invention encompasses numerous alternatives, modifications and equivalents. In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention. These details are provided for the purpose of example and the invention may be practiced according to the claims without some or all of these specific details.
One aspect of the present invention provides a method for processing voice data. Fig. 1 is a flowchart of a method for processing voice data according to an embodiment of the present invention.
The intelligent POS machine establishes a connection with the payment platform through a secure channel. The intelligent POS machine obtains, from the payment platform, a voice recognition request enabled in advance, and judges whether it supports voice recognition based on the recognition modes it currently supports.
And if the voice recognition is supported, the intelligent POS machine client performs user identity verification by using the recognition result of the user voice.
If the verification passes, the random number is encrypted with the private key of an RSA key pair generated when identity authentication was enabled, yielding a first encrypted value, and the first encrypted value is sent to the payment platform through the intelligent POS machine client so that the payment platform can perform identity authentication based on the first encrypted value and the user public key obtained when identity authentication was enabled.
During user identity authentication, the intelligent POS machine downloads from the payment platform the authentication requests enabled for the current machine; the client of the intelligent POS machine determines the recognition modes supported by the current machine, screens out the authentications available on it according to the enabled authentication requests and the supported authentication modes, and displays them to the user for selection and verification.
After the user passes verification, the random number is encrypted with the user private key of an RSA key pair generated by the authentication module of the intelligent POS machine in a secure environment when voice recognition was enabled, and the encrypted value is returned to the payment platform. The payment platform verifies the validity of the encrypted value using the user public key stored when voice recognition was enabled.
After the encrypted value is obtained, whether identity authentication succeeds is judged by whether the encrypted value is valid: if the encrypted value is valid, identity authentication succeeds; if it is invalid, identity authentication fails.
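As a non-limiting illustration of this challenge-response step, the sketch below models the private-key encryption of the random number as an RSA signature, using Python and the `cryptography` package; the library choice and the function names are assumptions of the sketch, not part of the claimed method.

```python
# Sketch of the challenge-response exchange, assuming the private-key
# operation on the platform's random number is an RSA signature.
import os
from cryptography.hazmat.primitives.asymmetric import rsa, padding
from cryptography.hazmat.primitives import hashes

# Generated once by the POS authentication module in its secure environment
# when voice recognition is enabled; the private key never leaves the device.
private_key = rsa.generate_private_key(public_exponent=65537, key_size=2048)
public_key = private_key.public_key()  # uploaded to the payment platform

def pos_respond(challenge: bytes) -> bytes:
    """POS side: produce the 'encrypted value' proving key possession."""
    return private_key.sign(challenge, padding.PKCS1v15(), hashes.SHA256())

def platform_verify(challenge: bytes, response: bytes) -> bool:
    """Platform side: check the response against the stored user public key."""
    try:
        public_key.verify(response, challenge, padding.PKCS1v15(), hashes.SHA256())
        return True
    except Exception:  # InvalidSignature on a bad response
        return False

nonce = os.urandom(32)  # the platform's random number
assert platform_verify(nonce, pos_respond(nonce))
```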
Before authentication is enabled, the intelligent POS machine and the payment platform must negotiate a recognition mode. The specific authentication-enabling process comprises the following steps:
the intelligent POS machine acquires the negotiated recognition mode from the payment platform, enumerates the recognition modes it currently supports, and judges whether it supports voice recognition;
if voice recognition is supported, the intelligent POS machine client verifies the user identity using voice recognition; if the user identity passes verification, the authentication module generates an RSA key pair in a secure environment and encrypts the user public key of the pair with the authentication module private key inside the intelligent POS machine to produce a second encrypted value;
the authentication module then uploads the second encrypted value, i.e. the user public key encrypted with the authentication module private key, to the payment platform through the intelligent POS machine client, so that the payment platform can verify with the authentication module public key whether the second encrypted value is valid.
During this process, the intelligent POS machine client determines the recognition modes supported by the current machine, screens out the available authentications accordingly, and displays them to the user; after the user passes verification, the authentication module of the intelligent POS machine generates an RSA key, and the public key together with the enabled authentication request is returned to the authentication management platform for storage.
After voice recognition is enabled, an RSA key pair is generated in the trusted storage block of the intelligent POS machine, the user public key of the pair is exported, and it is transmitted to the payment platform through an encrypted transmission protocol. On the next use of the intelligent POS machine, after the authentication module completes identity verification, the private key of the RSA key pair stored in the trusted storage block is called directly to encrypt the digest, and the encrypted value is transmitted to the payment platform for verification.
The payment platform receives, through the interface of the trusted storage block, a voice recognition request sent by the intelligent POS machine client, creates a corresponding recognition process according to the received identity recognition request, and, by executing that process, manages the authentication module and the voice acquisition module so that they jointly complete recognition.
Specifically, when the payment platform receives a voice recognition request sent by the intelligent POS client through the interface of the trusted storage block, it creates a recognition process for the request and, while executing it, sends a call instruction to the authentication module.
Next, after receiving the call instruction sent by the payment platform, the authentication module determines from it an acquisition instruction for invoking the voice acquisition module and returns that instruction to the payment platform, so that the platform forwards the acquisition instruction to the voice acquisition module.
The voice acquisition module then, according to the acquisition instruction forwarded by the payment platform, calls the voice input device of the intelligent POS machine through the interface of the trusted storage block to acquire a voice segment, and returns the acquired voice segment to the authentication module via the payment platform.
The authentication module receives the voice segment collected by the voice acquisition module and forwarded by the payment platform. If the call instruction sent by the payment platform carries identity information to be recognized, the authentication module may create an association between the voice segment and the identity information to be recognized, and return the two to the payment platform as the voice information to be recognized. Alternatively, the authentication module extracts, by a preset algorithm, the user voice feature template to be recognized that corresponds to the voice segment, establishes the association between the template and the identity information to be recognized, and returns the two to the payment platform as the voice information to be recognized.
When the call instruction sent by the payment platform does not carry identity information to be recognized, the authentication module may return the voice segment directly, or return the extracted user voice feature template to be recognized. When the payment platform receives voice information to be recognized, it encrypts the information according to security rules agreed in advance and returns it to the intelligent POS machine client through the interface of the trusted storage block. When it receives only a voice segment or a user voice feature template to be recognized, it determines the corresponding identity information to be recognized from the calling service, thereby determining the voice information to be recognized, and returns the encrypted voice information to the intelligent POS machine client through the interface of the trusted storage block.
In a preferred embodiment of the present invention, the intelligent POS machine client verifying the user identity using the recognition result of the user's voice further comprises: verifying the input voice and, after verification passes, generating a public/private key pair for the user ID that logs in to the bank-card reading program, the private key being stored securely in the trusted storage block of the intelligent POS machine; and encrypting the public key of the user ID, the user ID itself, and the voice feature sequence of the logged-in user ID with the terminal private key built into the trusted storage block of the intelligent POS machine.
The terminal private key is preset in a secure storage area of the device when the intelligent POS machine leaves the factory, and the public/private key pair of each POS machine is unique.
When the voice of the logged-in user ID is encrypted, it is the feature sequence of the voice that is encrypted. The feature sequence is generated when the voice information is stored in the trusted storage block of the intelligent POS machine; the generation rule may follow any suitable audio-database retrieval rule, and the voice segment corresponding to a feature sequence is unique.
The public key, the user ID, and the voice feature sequence, encrypted with the terminal private key, are sent to the payment platform as an authentication request, so that the payment platform verifies the request upon receipt and stores the public key, the user ID, and the voice feature sequence.
Because the terminal private key is preset in the secure storage area when the trusted storage block of the intelligent POS machine leaves the factory, the corresponding terminal public key can be sent to the payment platform in advance by the intelligent POS terminal for storage, or stored directly on the payment platform; the terminal public key and the terminal private key are identified by the unique device identifier.
After the payment platform receives the authentication request, since the information contained in it was encrypted with the terminal private key of the intelligent POS terminal, the platform retrieves the terminal public key corresponding to that private key via the encrypted information to complete verification. After verification passes, the public key, the user ID, and the voice feature sequence in the authentication request are stored, and the payment platform feeds the recognition result back to the trusted storage block of the intelligent POS machine.
After registration is complete, when the registered user ID logs in to the bank-card reading program again, voice is entered for the verification operation, and the user ID and the feature sequence of the voice are encrypted with the private key of the user ID stored in the trusted storage block of the intelligent POS machine.
The authentication request containing the user ID and the voice feature sequence is sent to the payment platform so that the platform can verify it upon receipt, checking whether the voice feature sequence in the request is consistent with the voice feature sequence stored for that user ID at registration, and producing an authentication result.
If authentication fails, the trusted storage block of the intelligent POS machine sends a re-authentication request, and the payment platform may add the voice feature sequence that failed authentication to the authentication record, so that a voice feature sequence inconsistent with the one captured at registration can still be granted the authority to use the bank-card reading program service.
For a voice feature sequence inconsistent with the registered one, if the initiated re-authentication request provides an execution verification code entitled to execute the bank-card reading program service, the voice feature sequence in the authentication request is stored in the authentication record and identity authentication is completed.
Before matching and recognition are performed on the voice, pre-emphasis, filtering, windowing, framing, and endpoint detection are required. Silence and speech are distinguished by short-time power and the zero-crossing rate (ZCR). Before detection, thresholds are determined for the short-time power and the ZCR; both quantities are then computed continuously, the thresholds are adjusted, state analysis is performed, and it is judged whether the silent segment has ended.
In endpoint detection, the frequency band is divided into 4 sub-bands, and the power ratio SE of sub-band i is calculated as:
SE_i = ∫_{L_i}^{U_i} |X(ω)|² dω / Σ_{j=1}^{4} ∫_{L_j}^{U_j} |X(ω)|² dω
where U_i and L_i respectively denote the upper and lower limit frequencies of sub-band i, i = 1, 2, 3, 4, and X(ω) denotes the amplitude of the signal at frequency ω.
If the power and ZCR of a frame signal are below their thresholds and the SE values of the 4 sub-bands are approximately equal, the frame is judged to be a silent segment.
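A minimal sketch of the short-time power and ZCR silence test described above, in Python with NumPy; the frame length, hop size, and the two thresholds are illustrative assumptions (in practice the thresholds are adapted continuously as described):

```python
import numpy as np

def frame_signal(x, frame_len=256, hop=128):
    """Split a 1-D signal into frames (lengths are illustrative assumptions)."""
    n = 1 + max(0, (len(x) - frame_len) // hop)
    return np.stack([x[i * hop:i * hop + frame_len] for i in range(n)])

def short_time_power(frames):
    """Mean squared amplitude of each frame."""
    return np.mean(frames.astype(np.float64) ** 2, axis=1)

def zcr(frames):
    """Fraction of adjacent sample pairs whose sign changes within each frame."""
    signs = np.signbit(frames).astype(np.int8)
    return np.mean(np.diff(signs, axis=1) != 0, axis=1)

def is_silence(frames, power_thr, zcr_thr):
    # A frame is flagged as silence only when both its short-time power and
    # its ZCR fall below their (continuously adjusted) thresholds.
    return (short_time_power(frames) < power_thr) & (zcr(frames) < zcr_thr)
```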
Preferably, detection of the voice-signal endpoints is realized by combining a neural network with a particle swarm algorithm:
1: The hidden-layer nodes of the one-dimensional neural network contain K×L values of θ and K values of λ, and the output-layer nodes contain K×N values of θ and N values of λ, where K is the number of hidden nodes, L the number of input nodes, N the number of output-layer nodes, and θ and λ are the phase rotation coefficient and the phase control factor respectively; the parameters of the particle swarm and of the one-dimensional neural network are initialized.
2: A signal segment containing both a voice section and noise is selected at random; the short-time power, the cyclic average magnitude difference function, and the band variance are taken as inputs of the one-dimensional neural network, and the marked start and end of each frame as its outputs, completing construction of the training samples.
3: The training samples are fed into the one-dimensional neural network, which is optimized by the particle swarm until its output meets the pre-designed requirements relative to the ideal output, completing training. The specific optimization steps for the network parameters are as follows:
1) the parameters to be optimized and learned are initialized; the particle positions and velocity vectors used for optimization are arranged as a matrix in which each row represents a parameter to be learned and each column represents an optimizing particle;
2) to evaluate the output |Y⟩_n of the whole one-dimensional neural network, a fitness function is defined as:
fitness = Σ_n ‖ |O⟩_n − |Y⟩_n ‖²
where |O⟩_n denotes the target output of the n-th output neuron and |Y⟩_n the actual output of the n-th output neuron;
3) the current velocity and position of each particle are updated through the particle swarm velocity and position formulas; the velocity of particle i is updated as:
v_i^{t+1} = v_i^t + c_1 r_1 (p_i^t − x_i^t) + c_2 r_2 (g^t − x_i^t)
and its position as:
x_i^{t+1} = x_i^t + v_i^{t+1}
where p_i^t is the individual optimal position of particle i, g^t is the global optimal position, r_1 and r_2 are independent random numbers in [0, 1], and c_1 and c_2 are acceleration limiting factors, c_1 adjusting the step size of a particle travelling toward its own optimal position and c_2 adjusting the step size toward the global optimal position;
4) the fitness of each particle is calculated and evaluated so as to update the individual extremum and the global extremum;
5) when the termination condition is met, the optimal values of the hidden-layer and output-layer parameters θ and λ of the one-dimensional neural network are obtained; the parameters are stored and the optimization ends; otherwise the process returns to 3) to continue the search.
After the neural network training is finished, the trained one-dimensional neural network is run on the original training samples and the detection result is output: if the output exceeds the threshold, the current frame is considered a voice frame, otherwise a non-voice frame. The actual output is then compared with the labelled voice frames, and if the training effect of the one-dimensional neural network is poor, it is retrained.
Voice endpoint detection is then carried out: a segment of the voice signal is taken, its feature quantities are extracted, the trained one-dimensional neural network performs the detection, and the voice endpoint detection result is output.
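The particle swarm update in steps 1) through 5) can be sketched as follows; the swarm size, iteration count, acceleration factors, search range, and the row-per-particle layout (the transpose of the matrix convention described above) are assumptions of this sketch:

```python
import numpy as np

def pso_minimize(fitness, dim, n_particles=30, iters=200, c1=2.0, c2=2.0):
    """Minimize a fitness function (e.g. the squared network error above)."""
    rng = np.random.default_rng(0)
    x = rng.uniform(-1.0, 1.0, (n_particles, dim))   # particle positions
    v = np.zeros((n_particles, dim))                 # particle velocities
    pbest = x.copy()                                 # individual best positions
    pbest_f = np.array([fitness(p) for p in x])
    gbest = pbest[np.argmin(pbest_f)].copy()         # global best position
    for _ in range(iters):
        r1 = rng.random((n_particles, dim))
        r2 = rng.random((n_particles, dim))
        v = v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)  # velocity update
        x = x + v                                              # position update
        f = np.array([fitness(p) for p in x])
        better = f < pbest_f                      # update individual extrema
        pbest[better], pbest_f[better] = x[better], f[better]
        gbest = pbest[np.argmin(pbest_f)].copy()  # update global extremum
    return gbest, pbest_f.min()
```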
After endpoint detection is finished, the voice signal is divided into R equal-length, non-overlapping frames, denoted f_k = {f_k(n) | n = 1, 2, …, L/R; k = 1, 2, …, R}, where L is the length of the voice signal, R is the total number of frames, and f_k(n) is the n-th sample of the k-th frame.
After preprocessing, a short-time Fourier transform is applied to each frame, and the sub-bands are divided according to:
B_i = exp[lg F_min + i (lg F_max − lg F_min)/M]
where i denotes the sub-band number, i = 1, 2, 3, …, M; M is the number of sub-bands; F_min and F_max are the lower and upper limits of the auditory bandwidth; and the bandwidth range of sub-band i is [B_{i−1}, B_i]. The sub-band power e_i is then calculated on each sub-band, yielding M sub-band powers.
The dynamic change of the audio power is calculated from the power differences between adjacent frames and adjacent sub-bands:
E(k)_n = e(k)_{n+1} − e(k)_n
dE(k)_n = E(k+1)_n − E(k)_n
F(k)_n = 0 if dE(k)_n ≤ 0,
F(k)_n = 1 if dE(k)_n > 0,
where n = 0, 1, 2, …, M−1 denotes the sub-band number and k denotes the frame number.
That is, the power difference E(k)_n is first formed between adjacent sub-bands, and the difference dE(k)_n of that differential power between adjacent frames is then thresholded to obtain the feature F(k)_n.
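A sketch of the feature F(k)_n defined above, assuming the sub-band powers e(k)_n are already available as a frames-by-sub-bands NumPy array:

```python
import numpy as np

def power_difference_fingerprint(e):
    """e: array of shape (R, M) holding the sub-band powers e(k)_n."""
    E = e[:, 1:] - e[:, :-1]           # E(k)_n: difference between adjacent sub-bands
    dE = E[1:, :] - E[:-1, :]          # dE(k)_n: difference between adjacent frames
    return (dE > 0).astype(np.uint8)   # F(k)_n = 1 if dE(k)_n > 0, else 0
```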
The frequency range [0, f_s/2] is divided into N sub-bands, and the centre of gravity of the m-th sub-band is calculated as:
C_m = ∫_{l_m}^{h_m} f · P(f) df / ∫_{l_m}^{h_m} P(f) df
where l_m and h_m are the lower and upper limit frequencies of the sub-band, and P(f) is the band power at frequency f.
The sub-band centre of gravity is then regularized so that its value is not affected by the choice of sub-band:
NC_m = [C_m − (h_m + l_m)/2] / (h_m − l_m)
where NC_m is the regularized sub-band centre of gravity.
A parameterized hash index table maps the original entries into the table: given a fingerprint F(k)_n, the hash index value is
H(F(k)_n) = F(k)_n mod Maxlen
where Maxlen is the size of the hash index table and H(F(k)_n) takes values 0 to Maxlen − 1.
A short-time ZCR computation on the k-th frame f_k(n) then yields the power ratio of each frame:
C_k = B_k / (R_k + b)
where B_k is the short-time power of the k-th frame, b is an anti-overflow constant, and R_k is the short-time ZCR of the k-th frame.
The power ratios give the hash vector H = {H(F(k)_n), C_k | k = 1, 2, …, R}.
The hash sequence H is then encrypted by scrambling. First, a pseudo-random sequence S = [s_1, s_2, …, s_R] of the same length as the hash sequence is generated; the hash sequence is then rearranged according to the values of the pseudo-random sequence, the encrypted sequence being h(s_i) = h(i),
where h(i) = 1 only when H(i) > H(i−1), and h(i) = 0 otherwise.
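A sketch of the hash indexing and the pseudo-random scrambling just described; the seeded NumPy permutation stands in for the unspecified pseudo-random sequence S:

```python
import numpy as np

def hash_index(f_words, maxlen):
    """H(F(k)_n) = F(k)_n mod Maxlen, mapping entries into the index table."""
    return np.asarray(f_words) % maxlen

def scramble(h, seed=0):
    """Rearrange the hash sequence so the encrypted sequence obeys h_enc[s_i] = h[i]."""
    h = np.asarray(h)
    rng = np.random.default_rng(seed)
    s = rng.permutation(len(h))   # pseudo-random sequence S of the same length as h
    out = np.empty_like(h)
    out[s] = h
    return out
```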
In the voice authentication process, the similarity of voice signals is measured with the Hamming distance. For two audio segments θ_1 and θ_2, let h_1 denote the hash index value of the voice signal θ_1 and h_2 the hash index value of θ_2, and let D denote the distance between h_1 and h_2. The regularized Hamming distance D is the ratio of the number of differing bits of the hash index values to the total number of bits, calculated as:
D(h_1, h_2) = (1/R) Σ_{i=1}^{R} |h_1(i) − h_2(i)|
If the two audio segments θ_1 and θ_2 are the same, then D(h_1, h_2) < τ; if they are not the same, then D(h_1, h_2) ≥ τ, where τ is the recognition and authentication threshold. If the distance satisfies D(h_1, h_2) < τ, the two audio segments θ_1 and θ_2 are considered to have the same characteristics and authentication passes; otherwise authentication fails.
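The distance computation and the threshold decision above can be sketched as follows; the threshold value 0.15 is illustrative only:

```python
import numpy as np

def normalized_hamming(h1, h2):
    """D = (number of differing bits) / (total number of bits)."""
    h1, h2 = np.asarray(h1, dtype=bool), np.asarray(h2, dtype=bool)
    return float(np.mean(h1 != h2))

def authenticate(h1, h2, tau=0.15):
    # Authentication passes when the regularized Hamming distance D < tau.
    return normalized_hamming(h1, h2) < tau
```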
In another preferred embodiment, an unregistered user may also register with the payment platform through a random voice character string. Specifically, the payment platform generates a random character string and sends it to the intelligent POS machine user; the user records the received random character string as voice and sends the voice to the payment platform; after receiving the user's voice, the payment platform extracts its MFCC features;
the voice is converted into character-string text according to its MFCC features; if the resulting text is identical to the content of the pre-generated random string, the voice segment is marked as valid registration voice, and otherwise as invalid voice.
Correspondingly, in the verification phase: when an intelligent POS machine user sends an identity authentication request, the payment platform first generates a random character string and sends it to the user; the user records the received string in the order specified by the platform to obtain the authentication voice, and sends the generated authentication voice to the payment platform; if the user fails to enter the voice within a certain duration, the current random string becomes invalid and user authentication fails.
After receiving the authentication voice, the payment platform extracts its MFCC features and verifies whether the user characteristics of the authentication voice belong to the current user and whether the content matches the correct character-string text, obtaining a voice matching value S_1 and a text matching value S_2 respectively.
The voice matching value S_1 and the text matching value S_2 are weighted and summed to obtain a final score, which is compared against a set threshold: when the final score exceeds the set threshold, the authentication voice is considered to come from a registered user of the intelligent POS machine and its text content to be correct, and verification passes; otherwise verification fails.
The final score is calculated as follows:
S = w·S_1 + (1 − w)·S_2
where S is the final score and w is the weight, 0 < w < 1.
Here, verifying whether the user characteristics of the authentication voice belong to the current user and whether the content matches the correct character-string text further comprises:
constructing a first HMM in the order of the correct string text;
obtaining, from the MFCC features of the authentication voice and the first HMM, the mapping between the MFCC features of the authentication voice and the first HMM states by the Viterbi algorithm, so that:
Φ*_t = argmax_Φ p(X_t | H, Φ_t)
where X_t is the MFCC feature set {x_t(1), x_t(2), …, x_t(N_t)} of the authentication voice, N_t is the total number of authentication-voice features, the subscript t denotes the authentication voice segment, H is the first HMM, Φ_t is a mapping from authentication-voice MFCC features to HMM states, p(X_t | H, Φ_t) is the overall likelihood probability of the feature set X_t under the first HMM and the state correspondence Φ_t, and Φ*_t is the optimal mapping between the authentication-voice MFCC features and the first HMM states found by the Viterbi algorithm;
from the mapping between the MFCC features of the authentication voice and the first HMM states, the mapping between the MFCC features and each character is further obtained, and the log-likelihood ratio of the authentication voice between the specific-user voice GMM models and the generic GMM models is calculated as the voice matching value S_1, whose expression is:
S_1 = (1/N_t^1) Σ_{n=1}^{N_t^1} [log p(x_t(n) | Λ⁰_{d(n)}) − log p(x_t(n) | Λ_{d(n)})]
where x_t(n) is the n-th frame MFCC feature of the authentication voice, N_t^1 is the number of MFCC features corresponding to all character text in the authentication voice, d(n) is the character corresponding to the n-th frame MFCC feature under the correct string text, Λ⁰_{d(n)} and Λ_{d(n)} are the specific-user GMM model and the generic GMM model of character d(n), and p(x_t(n) | Λ⁰_{d(n)}) and p(x_t(n) | Λ_{d(n)}) are the overall likelihood probabilities of x_t(n) under the two GMM models;
identifying the character-string content of the authentication voice and taking the recognized string content as the optimal string; constructing a second HMM from the optimal string using the generic GMM models;
obtaining the mapping between the MFCC features of the authentication voice and the second HMM states by the Viterbi algorithm, and from it the mapping between the MFCC features and each character;
from the obtained mappings of the authentication-voice MFCC features to the characters under the correct string text and under the optimal string, the log-likelihood ratio of the authentication voice between the generic GMM models of the two texts is calculated as the text matching value S_2, whose expression is:
S_2 = (1/N_t^2) Σ_{n=1}^{N_t^2} [log p(x_t(n) | Λ_{d(n)}) − log p(x_t(n) | Λ_{d_2(n)})]
where N_t^2 is the number of MFCC features corresponding to the optimal character text in the authentication voice, d_2(n) is the character corresponding to the n-th frame MFCC feature under the optimal string, Λ_{d_2(n)} is the generic GMM model corresponding to d_2(n), and p(x_t(n) | Λ_{d_2(n)}) is the overall likelihood probability of x_t(n) under that generic GMM model.
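A sketch of the frame-level log-likelihood-ratio scoring and of the fusion S = w·S_1 + (1 − w)·S_2; a single pair of GMMs (trained elsewhere with scikit-learn) stands in for the per-character models Λ⁰_{d(n)} and Λ_{d(n)}, and the weight and decision threshold are illustrative:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def llr_score(mfcc, target_gmm: GaussianMixture, background_gmm: GaussianMixture):
    """Mean per-frame log-likelihood ratio, mirroring the S1/S2 expressions."""
    return float(np.mean(target_gmm.score_samples(mfcc)
                         - background_gmm.score_samples(mfcc)))

def verify(s1, s2, w=0.6, threshold=0.0):
    """Final score S = w*S1 + (1-w)*S2, compared against a set threshold."""
    return w * s1 + (1 - w) * s2 > threshold
```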
To eliminate the effect of channel mismatch, when the user identification model is estimated, modelling is performed simultaneously in the user identification space and the channel space based on factor analysis: a segment of speech is represented by a composite vector, i.e. the speech space consists of composite vectors of users and channels.
The composite vector M is expressed by the following formulas:
M = s + c
s = m + Vy + Dz
c = Ux
where s is the user feature space vector, c is the channel space vector, m is the generic GMM vector, and V, D, and U are space matrices. The components of the vector x serve as channel factors, the components of y as user identification factors, and the components of z are called residual factors. The factor-analysis process is completed by estimating the space matrices, building the user identification model, and testing.
In the space-matrix estimation process, given a speech-output user s and speech feature vectors x_1, x_2, …, x_T, the following statistics are obtained:
N_c(s) = Σ_{t=1}^{T} γ_t(c)
F_c(s) = Σ_{t=1}^{T} γ_t(c) (x_t − m_c)
S_c(s) = diag( Σ_{t=1}^{T} γ_t(c) (x_t − m_c)(x_t − m_c)^T )
where m_c denotes the mean subvector of the c-th GMM component, γ_t(c) is the occupation (state) probability of each GMM component at frame t, and N_c(s), F_c(s), and S_c(s) are the zero-, first-, and second-order statistics of user s on the c-th GMM component.
The statistics are then concatenated: N_c(s) into a CF × CF diagonal matrix N(s); F_c(s) into a CF × 1 column vector F(s); and S_c(s) into a CF × CF diagonal matrix S(s), where CF is the dimension of the generic GMM vector.
The intermediate variable of each user is then calculated:
L(s) = I + V^T Ψ^{−1} N(s) V
where Ψ is the covariance matrix of the generic GMM.
Using L(s), the first- and second-order expectations of the user identification factor y(s) are calculated:
E[y(s)] = L^{−1}(s) V^T Ψ^{−1} F(s)
E[y(s) y^T(s)] = E[y(s)] E[y^T(s)] + L^{−1}(s)
where N(s), F(s), and S(s) are the zero-, first-, and second-order statistics of the feature space vector of user s.
The user identification space matrix V and the covariance matrix Ψ are then updated:
V = ( Σ_s F(s) E[y^T(s)] ) ( Σ_s N(s) E[y(s) y^T(s)] )^{−1}
Ψ_new = ( Σ_s N(s) )^{−1} { Σ_s S(s) − diag( Σ_s F(s) E[y^T(s)] V^T ) }
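A sketch of the zero- and first-order statistics and of the point estimate E[y(s)] defined above, assuming a diagonal generic-GMM covariance Ψ stored as a vector of inverse variances:

```python
import numpy as np

def baum_welch_stats(X, gamma, means):
    """X: (T, F) features x_t; gamma: (T, C) posteriors gamma_t(c);
    means: (C, F) mean subvectors m_c. Returns N_c(s) and centered F_c(s)."""
    N = gamma.sum(axis=0)                  # N_c(s) = sum_t gamma_t(c)
    F1 = gamma.T @ X - N[:, None] * means  # F_c(s) = sum_t gamma_t(c)(x_t - m_c)
    return N, F1

def estimate_y(N, F1, V, psi_inv):
    """E[y(s)] = L^{-1}(s) V^T Psi^{-1} F(s), with
    L(s) = I + V^T Psi^{-1} N(s) V and N(s) expanded to the CF diagonal."""
    C, F = F1.shape
    Nd = np.repeat(N, F)                   # diagonal entries of N(s), length CF
    Fvec = F1.reshape(-1)                  # F(s) as a CF-vector
    L = np.eye(V.shape[1]) + V.T @ ((psi_inv * Nd)[:, None] * V)
    return np.linalg.solve(L, V.T @ (psi_inv * Fvec))
```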
In summary, the invention provides a voice data processing method that stores, compares, and operates on the identity authentication data locally on the intelligent POS terminal, requires no hardware cryptographic device, does not upload the data to the payment platform, and is therefore more secure.
It will be apparent to those skilled in the art that the modules or steps of the present invention described above may be implemented in a general purpose computing system, centralized on a single computing system, or distributed across a network of computing systems, and optionally implemented in program code that is executable by the computing system, such that the program code is stored in a storage system and executed by the computing system. Thus, the present invention is not limited to any specific combination of hardware and software.
It should be understood that the above-described embodiments of the present invention merely illustrate or explain the principles of the invention and are not to be construed as limiting it. Therefore, any modification, equivalent replacement, improvement, and the like made without departing from the spirit and scope of the present invention shall fall within its protection scope. Further, the appended claims are intended to cover all such variations and modifications as fall within their scope and boundaries, or the equivalents thereof.

Claims (4)

1. A method for processing voice data, comprising:
establishing connection between the intelligent POS machine and the payment platform through a safety channel;
the intelligent POS machine client performs voice recognition on the user;
performing user identity authentication based on the voice recognition result;
the payment platform generates a random character string and sends it to the intelligent POS machine user; the user records the received random character string as voice and sends the voice to the payment platform; after the payment platform receives the user's voice, it extracts the MFCC features of the voice;
the voice is converted into character-string text according to its MFCC features; if the resulting character-string text is identical to the content of the pre-generated random string, the voice segment is marked as valid registration voice, and otherwise as invalid voice;
in the verification phase: when an intelligent POS machine user sends an identity authentication request, the payment platform first generates a random character string and sends it to the user; the user records the received string in the order specified by the payment platform to obtain the authentication voice, and sends the generated authentication voice to the payment platform; if the user fails to enter the voice within a certain duration, the current random string becomes invalid and user authentication fails;
after receiving the authentication voice, the payment platform extracts its MFCC features, and verifies whether the user characteristics of the authentication voice belong to the current user and whether the content matches the correct character-string text, obtaining a voice matching value S_1 and a text matching value S_2 respectively;
the voice matching value S_1 and the text matching value S_2 are weighted and summed to obtain a final score, which is compared against a set threshold: when the final score exceeds the set threshold, the authentication voice is considered to come from a registered user of the intelligent POS machine and its text content to be correct, and verification passes; otherwise verification fails;
the final score is calculated as follows:
S = w·S_1 + (1 − w)·S_2
where S is the final score and w is the weight, 0 < w < 1;
verifying whether the user characteristics of the authentication voice belong to the current user and whether the content matches the correct character-string text further comprises:
constructing a first HMM in the order of the correct string text;
obtaining, from the MFCC features of the authentication voice and the first HMM, the mapping between the MFCC features of the authentication voice and the first HMM states by the Viterbi algorithm, so that:
Φ*_t = argmax_Φ p(X_t | H, Φ_t)
where X_t is the MFCC feature set {x_t(1), x_t(2), …, x_t(N_t)} of the authentication voice, N_t is the total number of authentication-voice features, the subscript t denotes the authentication voice segment, H is the first HMM, Φ_t is a mapping from authentication-voice MFCC features to HMM states, p(X_t | H, Φ_t) is the overall likelihood probability of the feature set X_t under the first HMM and the state correspondence Φ_t, and Φ*_t is the optimal mapping between the authentication-voice MFCC features and the first HMM states found by the Viterbi algorithm;
from the mapping between the MFCC features of the authentication voice and the first HMM states, the mapping between the MFCC features and each character is further obtained, and the log-likelihood ratio of the authentication voice between the specific-user voice GMM models and the generic GMM models is calculated as the voice matching value S_1, whose expression is:
S_1 = (1/N_t^1) Σ_{n=1}^{N_t^1} [log p(x_t(n) | Λ⁰_{d(n)}) − log p(x_t(n) | Λ_{d(n)})]
where x_t(n) is the n-th frame MFCC feature of the authentication voice, N_t^1 is the number of MFCC features corresponding to all character text in the authentication voice, d(n) is the character corresponding to the n-th frame MFCC feature under the correct string text, Λ⁰_{d(n)} and Λ_{d(n)} are the specific-user GMM model and the generic GMM model of character d(n), and p(x_t(n) | Λ⁰_{d(n)}) and p(x_t(n) | Λ_{d(n)}) are the overall likelihood probabilities of x_t(n) under the two GMM models;
identifying the character-string content of the authentication voice and taking the recognized string content as the optimal string; constructing a second HMM from the optimal string using the generic GMM models;
obtaining the mapping between the MFCC features of the authentication voice and the second HMM states by the Viterbi algorithm, and from it the mapping between the MFCC features and each character;
from the obtained mappings of the authentication-voice MFCC features to the characters under the correct string text and under the optimal string, the log-likelihood ratio of the authentication voice between the generic GMM models of the two texts is calculated as the text matching value S_2, whose expression is:
S_2 = (1/N_t^2) Σ_{n=1}^{N_t^2} [log p(x_t(n) | Λ_{d(n)}) − log p(x_t(n) | Λ_{d_2(n)})]
where N_t^2 is the number of MFCC features corresponding to the optimal character text in the authentication voice, d_2(n) is the character corresponding to the n-th frame MFCC feature under the optimal string, Λ_{d_2(n)} is the generic GMM model corresponding to d_2(n), and p(x_t(n) | Λ_{d_2(n)}) is the overall likelihood probability of x_t(n) under that generic GMM model.
2. The method of claim 1, wherein the intelligent POS machine client performing voice recognition on the user further comprises:
the intelligent POS machine acquires a voice recognition request started in advance from the payment platform;
and judging whether the intelligent POS machine supports the voice recognition or not based on the recognition mode currently supported by the intelligent POS machine.
3. The method of claim 2, further comprising:
if the judgment result is that voice recognition is supported, the intelligent POS machine client performs user identity verification using the recognition result of the user's voice;
or, alternatively,
the authentications available on the current intelligent POS machine are screened out according to the enabled authentication requests and the authentication modes supported by the current machine, and displayed to the user for selection and verification.
4. The method of claim 1, further comprising:
encrypting the random number with the user private key of an RSA key pair generated by the authentication module of the intelligent POS machine in a secure environment when voice recognition is enabled, and returning the encrypted value to the payment platform;
the payment platform verifying the validity of the encrypted value using the user public key stored when voice recognition was enabled;
after the authentication module of the intelligent POS machine completes identity authentication, on the next use of the machine, directly calling the private key of the RSA key pair stored in the trusted storage block to encrypt the digest, and transmitting the encrypted value to the payment platform for verification.
CN201810225485.2A 2018-03-19 2018-03-19 Voice data processing method Active CN108550368B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810225485.2A CN108550368B (en) 2018-03-19 2018-03-19 Voice data processing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810225485.2A CN108550368B (en) 2018-03-19 2018-03-19 Voice data processing method

Publications (2)

Publication Number Publication Date
CN108550368A CN108550368A (en) 2018-09-18
CN108550368B true CN108550368B (en) 2022-05-31

Family

ID=63516562

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810225485.2A Active CN108550368B (en) 2018-03-19 2018-03-19 Voice data processing method

Country Status (1)

Country Link
CN (1) CN108550368B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111369986A (en) * 2018-12-26 2020-07-03 成都启英泰伦科技有限公司 Intelligent safe voice transmission system and method
CN113495715A (en) * 2020-04-08 2021-10-12 北京意锐新创科技有限公司 Voice issuing method and device suitable for payment equipment management and control platform

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
IL176262A0 (en) * 2006-06-12 2006-10-05 Cidway Technologies Ltd Secure and friendly payment system
US20090307140A1 (en) * 2008-06-06 2009-12-10 Upendra Mardikar Mobile device over-the-air (ota) registration and point-of-sale (pos) payment
CN104700261B (en) * 2013-12-10 2018-11-27 中国银联股份有限公司 The safe networking initial method and its system of POS terminal
CN104392353A (en) * 2014-10-08 2015-03-04 无锡指网生物识别科技有限公司 Payment method and system of voice recognition terminal

Also Published As

Publication number Publication date
CN108550368A (en) 2018-09-18

Similar Documents

Publication Publication Date Title
US11545155B2 (en) System and method for speaker recognition on mobile devices
US11847199B2 (en) Remote usage of locally stored biometric authentication data
WO2018166187A1 (en) Server, identity verification method and system, and a computer-readable storage medium
US8384516B2 (en) System and method for radio frequency identifier voice signature
Monrose et al. Using voice to generate cryptographic keys
US20080256613A1 (en) Voice print identification portal
WO2016015687A1 (en) Voiceprint verification method and device
CN110169014A (en) Device, method and computer program product for certification
JP2016511475A (en) Method and system for distinguishing humans from machines
CN110659468B (en) File encryption and decryption system based on C/S architecture and speaker identification technology
US9106422B2 (en) System and method for personalized security signature
US20060229879A1 (en) Voiceprint identification system for e-commerce
CN108550368B (en) Voice data processing method
CN108416592B (en) High-speed voice recognition method
Nagakrishnan et al. A robust cryptosystem to enhance the security in speech based person authentication
CN111710340A (en) Method, device, server and storage medium for identifying user identity based on voice
Zhang et al. Volere: Leakage resilient user authentication based on personal voice challenges
KR101424962B1 (en) Authentication system and method based by voice
CN108447491B (en) Intelligent voice recognition method
KR20010110964A (en) The method for verifying users by using voice recognition on the internet and the system thereof
Duraibi et al. Suitability of Voice Recognition Within the IoT Environment
Nagakrishnan et al. Novel secured speech communication for person authentication
Yang Security in voice authentication
Aloufi et al. On-Device Voice Authentication with Paralinguistic Privacy
ABDUL-HASSAN et al. CENTRAL INTELLIGENT BIOMETRIC AUTHENTICATION BASED ON VOICE RECOGNITION AND FUZZY LOGIC

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
TA01 Transfer of patent application right

Effective date of registration: 20220509

Address after: 510000 room 3201, No. 2, Huitong Second Street, Hengli Town, Nansha District, Guangzhou City, Guangdong Province

Applicant after: GUANGZHOU HELIPAY PAYMENT TECHNOLOGY Co.,Ltd.

Address before: No.11, 10th floor, building 1, NO.666, Jitai Road, high tech Zone, Chengdu, Sichuan 610000

Applicant before: CHENGDU CINDA OUTWIT TECHNOLOGY CO.,LTD.

GR01 Patent grant