CN107517207A

CN107517207A - Server, auth method and computer-readable recording medium

Info

Publication number: CN107517207A
Application number: CN201710715433.9A
Authority: CN
Inventors: 王健宗; 查高密; 程宁; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2017-03-13
Filing date: 2017-08-20
Publication date: 2017-12-26
Also published as: CN107068154A; TW201833810A; WO2018166187A1; WO2018166112A1; TWI641965B

Abstract

The present invention relates to a kind of server, auth method and computer-readable recording medium, server includes memory and the processor being connected with memory, the authentication system that can be run on a processor is stored with memory, following steps are realized when authentication system is executed by processor：After authentication request is received, send voice to the client at random and obtain text；The password voice that the user that client is sent reports is received, identifies code characters corresponding to password voice；If code characters standard cipher character corresponding with voice acquisition text is consistent, then build the current vocal print feature vector of the password voice, and the standard vocal print feature vector according to corresponding to determining predetermined mapping relations, it is vectorial the distance between with identified standard vocal print feature vector that current vocal print feature is calculated using predetermined distance calculation formula, and authentication is carried out to user according to distance.The present invention can improve the security of authentication.

Description

Server, auth method and computer-readable recording medium

Technical field

The present invention relates to communication technical field, more particularly to a kind of server, auth method and computer-readable deposit Storage media.

Background technology

At present, the scope of business of large-scale financing corporation is related to multiple business such as insurance, bank, investment, each business Category is generally required for same client to be linked up, and the mode of communication has a variety of (such as telephonic communication or communications etc. face-to-face). Before being linked up, checking is carried out as the important component of service security is ensured to the identity of client.

In order to meet the real-time demand of business, financing corporation is much analyzed the identity of client using manual type Checking, but because customer group is huge, carries out discriminant analysis by artificial in a manner of the identity to verifying client accuracy is not Height, efficiency is also low, and in order to solve this problem, in other existing schemes, financing corporation is also carried out using a kind of vocal print scheme Authentication, but this kind of scheme can not exclude criminal and pass through vocal print authentication using false recording, have necessarily Safety risks.

The content of the invention

It is an object of the invention to provide a kind of server, auth method and computer-readable recording medium, it is intended to Improve the security of authentication.

To achieve the above object, the present invention provides a kind of server, the server include memory and with the storage The processor of device connection, is stored with the authentication system that can be run on the processor, the identity in the memory Following steps are realized when checking system is by the computing device：

S1, after the authentication request of carrying identity of client transmission is received, sent at random to the client Text is obtained for the voice of user response；

S2, the password voice that client obtains user's report that text is sent based on the voice is received, and to described close Code voice carries out character recognition, identifies code characters corresponding to the password voice；

S3, if code characters standard cipher character corresponding with voice acquisition text is consistent, build the password The current vocal print feature vector of voice, and determine to be somebody's turn to do according to the mapping relations of predetermined identity and standard vocal print feature vector Standard vocal print feature vector corresponding to the identity of user, it is special to calculate current vocal print using predetermined distance calculation formula The distance between vectorial and identified standard vocal print feature vector is levied, authentication is carried out to user according to the distance.

Preferably, the step S2 includes：

The password voice that the user that client is sent reports is received, analyzes whether the password voice can use, if described close Code voice is unavailable, then the recording for prompting client to re-start password voice, or, it is right if the password voice can use The password voice carries out character recognition.

Preferably, when the authentication system is by the computing device, following steps are also realized：

If code characters standard cipher character corresponding with voice acquisition text is inconsistent, again at random to this Client sends the voice acquisition text for user response；

The voice for adding up to send to client obtains the number of text, if the number is more than or equal to preset times, eventually Only to the response of the authentication request.

Preferably, the step of the current vocal print feature vector for building the password voice includes：

The password voice is handled using Predetermined filter to carry out the extraction of preset kind vocal print feature, and base Vocal print feature vector corresponding to the password voice is built in the preset kind vocal print feature of extraction；

It is special to construct the current vocal print by the background channel model of the vocal print feature vector input training in advance of structure Sign vector；

It is described to utilize the current vocal print feature of predetermined distance calculation formula calculating vectorial and identified standard vocal print The distance between characteristic vector, the step of user's progress authentication, is included according to the distance：

Calculate the current COS distance between vocal print discriminant vectorses and identified standard vocal print feature vector： It is vectorial for the standard vocal print feature,For current vocal print feature vector；

If the COS distance is less than or equal to default distance threshold, authentication passes through；

If the COS distance is more than default distance threshold, authentication does not pass through.

To achieve the above object, the present invention also provides a kind of server, and the server includes memory and deposited with described The processor of reservoir connection, is stored with the identity based on Application on Voiceprint Recognition that can be run on the processor in the memory and tests The system of card, following steps are realized when the system of the authentication based on Application on Voiceprint Recognition is by the computing device：

S101, after the speech data for the user for carrying out authentication is received, the vocal print for obtaining the speech data is special Sign, and based on vocal print feature vector corresponding to vocal print feature structure；

S102, by the background channel model of vocal print feature vector input training in advance generation, the predicate to construct Current vocal print discriminant vectorses corresponding to sound data；

S103, calculate the sky between the current vocal print discriminant vectorses and the standard vocal print discriminant vectorses of the user to prestore Between distance, authentication is carried out to the user based on the distance, and generates the result.

To achieve the above object, the present invention also provides a kind of auth method, and the auth method includes：

Preferably, the step S2 includes：

Preferably, also include after the step S2：

Preferably, the background channel model is gauss hybrid models, and the training background channel model includes：

The speech data sample of predetermined number is obtained, and obtains vocal print feature corresponding to each speech data sample, and is based on Vocal print feature corresponding to each speech data sample builds vocal print feature vector corresponding to each speech data sample；

Vocal print feature vector corresponding to each speech data sample is divided into the training set of the first ratio and testing for the second ratio Card collection, first ratio and the second ratio and less than or equal to 1；

Gauss hybrid models are trained using the vocal print feature vector in the training set, and after the completion of training, Verified using the accuracy rate of the gauss hybrid models after the checking set pair training；

If the accuracy rate is more than predetermined threshold value, model training terminates, and institute is used as using the gauss hybrid models after training Background channel model is stated, or, if the accuracy rate is less than or equal to predetermined threshold value, increase the number of the speech data sample Amount, and training is re-started based on the speech data sample after increase.

The present invention also provides a kind of computer-readable recording medium, and identity is stored with the computer-readable recording medium Checking system, the authentication system realizes above-mentioned auth method when being executed by processor the step of.

The beneficial effects of the invention are as follows：If other people carry out authentication using false recording that is existing or being ready for, due to The voice of transmission obtains the randomness of text, then the obtained code characters identified should differ with corresponding standard cipher character Cause, can so prevent other people from carrying out authentication using false recording that is existing or being ready for；If other people record oneself Sound carries out authentication, then can not be verified by vocal print feature afterwards.Therefore, the present embodiment is equivalent to carrying out identity twice Checking, there is the effect of double verification, while the accuracy rate and efficiency of subscriber authentication is ensured, improve authentication Security.

Brief description of the drawings

Fig. 1 is each optional application environment schematic diagram of embodiment one of the present invention；

Fig. 2 is the schematic flow sheet of the embodiment of auth method one of the present invention.

Embodiment

In order to make the purpose , technical scheme and advantage of the present invention be clearer, it is right below in conjunction with drawings and Examples The present invention is further elaborated.It should be appreciated that specific embodiment described herein is only to explain the present invention, not For limiting the present invention.Based on the embodiment in the present invention, those of ordinary skill in the art are not before creative work is made The every other embodiment obtained is put, belongs to the scope of protection of the invention.

It should be noted that the description for being related to " first ", " second " etc. in the present invention is only used for describing purpose, and can not It is interpreted as indicating or implies its relative importance or imply the quantity of the technical characteristic indicated by indicating.Thus, define " the One ", at least one this feature can be expressed or be implicitly included to the feature of " second ".In addition, the skill between each embodiment Art scheme can be combined with each other, but must can be implemented as basis with those of ordinary skill in the art, when technical scheme With reference to occurring conflicting or will be understood that the combination of this technical scheme is not present when can not realize, also not in application claims Protection domain within.

As shown in fig.1, it is the application environment schematic diagram of the preferred embodiment of auth method of the present invention.This applies ring Border schematic diagram includes server 1 and terminal device 2.Server 1 can by network, near-field communication technology etc. be adapted to technology with Terminal device 2 carries out data interaction.

Client for sending from authentication request to server 1 is installed on terminal device 2, terminal device 2 includes, But it is not limited to, any one can enter pedestrian with user by modes such as keyboard, mouse, remote control, touch pad or voice-operated devices The electronic product of machine interaction, for example, personal computer, tablet personal computer, smart mobile phone, personal digital assistant (Personal Digital Assistant, PDA), game machine, IPTV (Internet Protocol Television, IPTV), the movable equipment of intellectual Wearable, guider etc., or such as digital TV, desktop computer, pen The fixed terminal of note sheet, server etc..

The server 1 be it is a kind of can according to the instruction for being previously set or storing, it is automatic carry out numerical computations and/or The equipment of information processing.The server 1 can be computer, can also be single network server, multiple webservers The server group of the composition either cloud being made up of a large amount of main frames or the webserver based on cloud computing, wherein cloud computing is point One kind that cloth calculates, a super virtual computer being made up of the computer collection of a group loose couplings.

In the present embodiment, server 1 may include, but be not limited only to, and depositing for connection can be in communication with each other by system bus Reservoir 11, processor 12, network interface 13, memory 11 are stored with the authentication system that can be run on the processor 12.Need It is noted that Fig. 1 illustrate only the server 1 with component 11-13, it should be understood that being not required for implementing to own The component shown, what can be substituted implements more or less components.

Wherein, memory 11 includes internal memory and the readable storage medium storing program for executing of at least one type.Inside save as the operation of server 1 Caching is provided；Readable storage medium storing program for executing can be if flash memory, hard disk, multimedia card, card-type memory are (for example, SD or DX memories Deng), random access storage device (RAM), static random-access memory (SRAM), read-only storage (ROM), electric erasable can compile Journey read-only storage (EEPROM), programmable read only memory (PROM), magnetic storage, disk, CD etc. it is non-volatile Storage medium.In certain embodiments, readable storage medium storing program for executing can be the internal storage unit of server 1, such as the server 1 Hard disk；In further embodiments, the non-volatile memory medium can also be the External memory equipment of server 1, such as The plug-in type hard disk being equipped with server 1, intelligent memory card (Smart Media Card, SMC), secure digital (Secure Digital, SD) card, flash card (Flash Card) etc..In the present embodiment, the readable storage medium storing program for executing of memory 11 is generally used for Storage is installed on the operating system and types of applications software of server 1, such as the authentication system in one embodiment of the invention Program code etc..In addition, memory 11 can be also used for temporarily storing the Various types of data that has exported or will export.

The processor 12 can be in certain embodiments central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor or other data processing chips.The processor 12 is generally used for controlling the clothes The overall operation of business device 1, such as perform the control and processing related to the terminal device 2 progress data interaction or communication Deng.In the present embodiment, the processor 12 is used to run the program code stored in the memory 11 or processing data, example Such as run authentication system.

The network interface 13 may include radio network interface or wired network interface, and the network interface 13 is generally used for Communication connection is established between the server 1 and other electronic equipments.In the present embodiment, network interface 13 is mainly used in servicing Device 1 is connected with one or more terminal devices 2, and data transfer is established between server 1 and one or more terminal devices 2 and is led to Road and communication connection.

The authentication system is stored in memory 11, including at least one computer being stored in memory 11 Readable instruction, at least one computer-readable instruction can be performed by processor device 12, to realize the side of each embodiment of the application Method；And at least one computer-readable instruction is different according to the function that its each several part is realized, can be divided into different patrol Collect module.

In one embodiment, following steps are realized when above-mentioned authentication system is performed by the processor 12：

Step S1, after the authentication request of carrying identity of client transmission is received, at random to the client The voice sent for user response obtains text；

Wherein, user is operated on the client, and the authentication request for carrying identity, clothes are sent to server After business device receives the authentication request, the voice acquisition text for user response is sent to client at random.

Wherein, identity can be the identification card number of user or the phone number etc. of user；For user response Voice, which obtains text, to be had a variety of, and server sends one kind therein to client at random, it is therefore intended that prevents other people using existing False recording carry out authentication.It can need text corresponding to the random cipher of voice recording that the voice, which obtains text, Or can be the text of the enquirement for the random cipher for needing voice recording.For example, voice obtains text as that " please record a string Digital * * * ", user according to the voice obtain text responded when record " string number * * * " voice please be record, and for example, Voice obtains text to put question to text " where is your birthplace ", and user obtains when text is responded according to the voice and recorded " my birthplace is in * * * ".

Step S2, the password voice that client obtains user's report that text is sent based on the voice is received, and to institute State password voice and carry out character recognition, identify code characters corresponding to the password voice；

In the present embodiment, user can be in the mode of the client recording password voice：User obtains text according to voice This, after user presses predetermined physical button or virtual key, control sound recording unit carries out voice recording, After user discharges the button, stop voice recording, the voice recorded is sent to server as password voice.

Wherein, when carrying out password voice recording, should try one's best the interference for preventing ambient noise and voice recording equipment.Voice Recording arrangement keeps suitable distance with user, and does not have to the big voice recording equipment of distortion as far as possible, and power supply preferably uses civil power, and Keep electric current stable；Sensor should be used when carrying out telephonograph.

After server receives the password voice, character recognition is carried out to the password voice, i.e., is converted into password voice Character one by one, wherein it is possible to which password voice directly is converted into character, noise treatment can be carried out to password voice, Disturbed with further reduce.In order to extract to obtain the vocal print feature of password voice, the password voice recorded is present count According to the speech data of length, or it is the speech data more than preset data length.

Step S3, if code characters standard cipher character corresponding with voice acquisition text is consistent, structure should The current vocal print feature vector of password voice, and it is true according to predetermined identity and the mapping relations of standard vocal print feature vector Standard vocal print feature vector corresponding to the identity of the fixed user, current sound is calculated using predetermined distance calculation formula The distance between line characteristic vector and identified standard vocal print feature vector, identity is carried out to user according to the distance and tested Card.

In the present embodiment, voice, which obtains text, to be had a variety of, and the standard cipher character to be prestored on server also has a variety of, voice Text is obtained to correspond with standard cipher character respectively.After code characters corresponding to password voice are identified, acquisition and institute The voice of transmission obtains standard cipher character corresponding to text, and the obtained code characters and corresponding standard that judgement is identified are close Whether code character is consistent.

If the obtained code characters identified should be consistent with corresponding standard cipher character, it is close further to build this The current vocal print feature vector of code voice.Wherein, vocal print feature includes polytype, such as broadband vocal print, arrowband vocal print, amplitude Vocal print etc., the vocal print feature of the present embodiment is preferably mel-frequency cepstrum coefficient (the Mel Frequency of speech data Cepstrum Coefficient, MFCC).In vocal print feature vector corresponding to structure, by the vocal print feature group of password voice Into characteristic matrix, this feature data matrix is the vocal print feature vector of password voice.

Vector has a variety of, including COS distance and Euclidean distance etc. with the distance between vector, it is preferable that the present embodiment Current vocal print feature it is vectorial the distance between with identified standard vocal print feature vector be COS distance, COS distance is sharp Measurement by the use of two vectorial angle cosine values in vector space as the size for weighing two interindividual variations.

Wherein, standard vocal print feature vector is vectorial for the vocal print feature prestored.Before distance is calculated, marked according to user Know standard vocal print feature vector corresponding to obtaining.

Wherein, when the distance being calculated is less than or equal to pre-determined distance threshold value, it is verified, conversely, then authentication failed.

Compared with prior art, if other people carry out authentication using false recording that is existing or being ready for, due to sending Voice obtain text randomness, then the obtained code characters identified should be inconsistent with corresponding standard cipher character, It can so prevent other people from carrying out authentication using false recording that is existing or being ready for；If other people record the sound of oneself Authentication is carried out, then can not be verified by vocal print feature afterwards.Therefore, the present embodiment is tested equivalent to identity twice is carried out Card, there is the effect of double verification, while the accuracy rate and efficiency of subscriber authentication is ensured, improve the peace of authentication Quan Xing.

In a preferred embodiment, in order to prevent the audio quality of password voice from influenceing the result of vocal print feature checking, On the basis of above-mentioned Fig. 1 embodiment, the step S2 includes：The password voice that the user that client is sent reports is received, Analyze whether the password voice can use, if the password voice is unavailable, prompt client to re-start password voice Record, or, if the password voice can use, character recognition is carried out to the password voice.

Wherein, it is based on following analyses that whether password voice is available：Whether analysis user speaks part duration more than pre- If whether the background noise volume of duration, password voice is less than the first default volume and/or speaking volume is more than the second default sound Amount, the password voice can use if the analysis result in above-mentioned is satisfied by, and can perform the operation such as follow-up character recognition；Instead It, if user speaks, part duration is less than preset duration, or the background noise volume of password voice is more than or equal to the first default sound Amount, or speaking volume are less than or equal to the second default volume, then the password voice is unavailable, now, prompt client to re-start The recording of password voice.

In a preferred embodiment, when the authentication system is by the computing device, following steps are also realized： If code characters standard cipher character corresponding with voice acquisition text is inconsistent, sent out at random to the client again The voice for user response is sent to obtain text；The voice for adding up to send to client obtains the number of text, if the number is big In equal to preset times, then the response to the authentication request is terminated.

If user recorded the password voice of mistake, i.e. code characters standard cipher word corresponding with voice acquisition text When according with inconsistent, the chance for sending the voice acquisition text for user response to the client at random again can be provided, meanwhile, In order to prevent excessive password authentification from wasting computer resource, the number that can limit password authentification is less than preset times, i.e., tired The number for counting the voice acquisition text sent to client is less than preset times, and whole when the number is more than or equal to preset times Only to the response of authentication request.

In a preferred embodiment, on the basis of above-described embodiment, the password voice is built in above-mentioned steps S3 The step of current vocal print feature vector includes：The password voice is handled to carry out preset kind using Predetermined filter The extraction of vocal print feature, and the preset kind vocal print feature based on extraction builds vocal print feature vector corresponding to the password voice； It is vectorial to construct the current vocal print feature by the background channel model of the vocal print feature vector input training in advance of structure.

Wherein, Predetermined filter is preferably Mel wave filter.First, to the password voice carry out preemphasis, framing and Windowing process；In the present embodiment, after the password voice for the user for carrying out authentication is received, at password voice Reason.Wherein, preemphasis processing is really high-pass filtering processing, filters out low-frequency data so that the high frequency characteristics in password voice is more Add and highlight, specifically, the transmission function of high-pass filtering is：H (Z)=1- α Z^-1, wherein, Z is speech data, and α is constant factor, Preferably, α value is 0.97；Because stationarity is only presented in voice signal within a short period of time, therefore by one section of voice signal It is divided into the signal (i.e. N frames) of N section short time, and is lost in order to avoid the continuity Characteristics of sound, has one section of weight between consecutive frame Multiple region, repeat region are generally 1/2 per frame length；After framing is carried out to password voice, each frame signal is all as steady Signal is handled, but the presence of Gibbs' effect, and the start frame and end frame of password voice be discontinuous, after framing, More deviate from raw tone, therefore, it is necessary to windowing process is carried out to password voice.

Fourier transform is carried out to each adding window and obtains corresponding frequency spectrum；

The frequency spectrum is inputted into Mel wave filter to export to obtain Mel frequency spectrum；

Cepstral analysis is carried out on Mel frequency spectrum to obtain mel-frequency cepstrum coefficient MFCC, based on the mel-frequency Vocal print feature vector corresponding to cepstrum coefficient MFCC compositions.Wherein, cepstral analysis is, for example, and takes the logarithm, does inverse transformation, inverse transformation Realized generally by DCT discrete cosine transforms, take the 2nd after DCT to the 13rd coefficient as MFCC coefficients.Mel frequency Rate cepstrum coefficient MFCC is the vocal print feature of this frame password voice, by the mel-frequency cepstrum coefficient MFCC composition characteristics of every frame Data matrix, this feature data matrix are the vocal print feature vector of password voice.

Then, by the background channel model of vocal print feature vector input training in advance generation, it is preferable that the background channel mould Type is gauss hybrid models, calculates vocal print feature vector using the background channel model, draws corresponding current vocal print feature Vectorial (i.e. i-vector).

Specifically, the calculating process includes：

1) Gauss model, is selected：First, every frame data are calculated using the parameter in common background channel model in difference The likelihood logarithm value of Gauss model, by likelihood logarithm value matrix each column sorting in parallel, choosing top n Gauss model, finally Obtain a matrix per frame data numerical value in mixed Gauss model：

Loglike=E (X) * D (X)^-1*X^T-0.5*D(X)^-1*(X.²)^T,

Wherein, Loglike is likelihood logarithm value matrix, and E (X) is that common background channel model trains the average square come Battle array, D (X) are covariance matrix, and X is data matrix, X.²Each it is worth for matrix squared.

2) posterior probability, is calculated：X*XT calculating will be carried out per frame data X, and obtain a symmetrical matrix, three can be reduced to down Angular moment battle array, and element is arranged as to 1 row in order, become the vector that a N frame is multiplied by the lower triangular matrix number latitude Calculated, the vector of all frames is combined into new data matrix, while the association for probability being calculated in universal background model Variance matrix, each matrix are also reduced to lower triangular matrix, become with matrix as new data matrix class, passing through common background Mean Matrix and covariance matrix in channel model calculate the likelihood logarithm under the Gauss model of the selection of every frame data Value, Softmax recurrence is then carried out, operation is finally normalized, obtain every frame in mixed Gauss model Posterior probability distribution, The ProbabilityDistribution Vector of every frame is formed into probability matrix.

3) current vocal print feature vector, is extracted：Carry out single order first, the calculating of second order coefficient, coefficient of first order calculates can be with Obtained by probability matrix row summation：

Wherein, Gamma_iFor i-th of element of coefficient of first order vector, loglikes_jiFor the jth row of probability matrix, i-th yuan Element.

Second order coefficient can be multiplied by data matrix acquisition by the transposition of probability matrix：

X=Loglike^T* feats, wherein, X is second order coefficient matrix, and loglike is probability matrix, and feats is characterized Data matrix.

It is being calculated single order, after second order coefficient, parallel computation first order and quadratic term, is then passing through first order and two Current vocal print feature vector is calculated in secondary item.

In a preferred embodiment, on the basis of above-described embodiment, in above-mentioned steps S3 using it is predetermined away from It is vectorial the distance between with identified standard vocal print feature vector to calculate current vocal print feature from calculation formula, according to it is described away from Include from the step of carrying out authentication to user：Calculate the current vocal print discriminant vectorses and identified standard vocal print feature COS distance between vector：

Wherein,It is vectorial for the standard vocal print feature,For current vocal print feature vector.If the COS distance is less than Or pass through equal to default distance threshold, then authentication；If the COS distance is more than default distance threshold, identity Checking does not pass through.

The present invention also provides another server, and the server is similar with the hardware structure of above-mentioned Fig. 1 server, including Memory and the processor being connected with memory, and be connected by network interface with the terminal device of outside.Except that deposit The system for the authentication based on Application on Voiceprint Recognition that can be run on the processor is stored with reservoir, Application on Voiceprint Recognition should be based on Authentication system by the computing device when realize following steps：

In the present embodiment, speech data collects (voice capture device is, for example, microphone) by voice capture device, The system that the speech data of collection is sent to the authentication based on Application on Voiceprint Recognition by voice capture device.

When gathering speech data, should try one's best prevents the interference of ambient noise and voice capture device.Voice capture device Suitable distance is kept with user, and does not have to the big voice capture device of distortion as far as possible, power supply preferably uses civil power, and keeps electric current It is stable；Sensor should be used when carrying out telephonograph., can be to voice number before the vocal print feature in extracting speech data According to noise treatment is carried out, disturbed with further reduce.In order to extract to obtain the vocal print feature of speech data, gathered Speech data is the speech data of preset data length, or is the speech data more than preset data length.

Vocal print feature includes polytype, such as broadband vocal print, arrowband vocal print, amplitude vocal print etc., the vocal print of the present embodiment Be characterized as preferably speech data mel-frequency cepstrum coefficient (Mel Frequency Cepstrum Coefficient, MFCC).In vocal print feature vector corresponding to structure, by the vocal print feature composition characteristic data matrix of speech data, this feature Data matrix is the vocal print feature vector of speech data.

Wherein, by the background channel model of vocal print feature vector input training in advance generation, it is preferable that the background channel mould Type is gauss hybrid models, and vocal print feature vector is calculated using the background channel model, show that corresponding current vocal print differentiates Vectorial (i.e. i-vector).

Specifically, the calculating process includes：

Loglike=E (X) * D (X)^-1*X^T-0.5*D(X)^-1*(X.²)^T,

3) current vocal print discriminant vectorses, are extracted：Carry out single order first, the calculating of second order coefficient, coefficient of first order calculates can be with Obtained by probability matrix row summation：

It is being calculated single order, after second order coefficient, parallel computation first order and quadratic term, is then passing through first order and two Secondary item calculates current vocal print discriminant vectorses.

Preferably, background channel model is gauss hybrid models, is included before above-mentioned steps S1：

If the accuracy rate is more than predetermined threshold value, model training terminates, and institute is used as using the gauss hybrid models after training Step S2 background channel model is stated, or, if the accuracy rate is less than or equal to predetermined threshold value, increase the speech data sample This quantity, and training is re-started based on the speech data sample after increase.

Wherein, when the vocal print feature vector in using training set is trained to gauss hybrid models, the D that extracts Likelihood probability corresponding to dimension vocal print feature can be expressed as with K Gaussian component：

Wherein, the probability (mixed Gauss model) that P (x) is generated for speech data sample by gauss hybrid models, w_kTo be each high The weight of this model, and p (x | k) it is the probability that sample is generated by k-th of Gauss model, K is Gauss model quantity.

The parameter of whole gauss hybrid models can be expressed as：{w_i,μ_i,Σ_i, w_iFor the weight of i-th of Gauss model, μ_i For the average of i-th of Gauss model, ∑_iFor the covariance of i-th of Gauss model.The gauss hybrid models are trained to use non-prison The EM algorithms superintended and directed.After the completion of training, the weight vectors of gauss hybrid models, constant vector, N number of covariance matrix, average are obtained It is multiplied by matrix of covariance etc., the gauss hybrid models after as one training.

Vector has a variety of, including COS distance and Euclidean distance etc. with the distance between vector, it is preferable that the present embodiment Space length be COS distance, COS distance is as measurement two by the use of two vectorial angle cosine values in vector space The measurement of the size of interindividual variation.

Wherein, standard vocal print discriminant vectorses are the vocal print discriminant vectorses for being obtained ahead of time and storing, standard vocal print discriminant vectorses The identification information of its corresponding user is carried in storage, it is capable of the identity of user corresponding to accurate representation.Calculating space Before distance, the identification information provided according to user obtains the vocal print discriminant vectorses of storage.

Wherein, when the space length being calculated is less than or equal to pre-determined distance threshold value, it is verified, conversely, then verifying Failure.

Compared with prior art, the background channel model of the present embodiment training in advance generation is by a large amount of speech datas Excavation obtained with comparing training, this model can to greatest extent retain user vocal print feature while, accurately portray Background vocal print feature when user speaks, and can remove this feature in identification, and extract the intrinsic spy of user voice Sign, can significantly improve the accuracy rate of subscriber authentication, and improve the efficiency of authentication；In addition, the present embodiment is abundant Vocal print feature related to sound channel in voice is make use of, this vocal print feature need not be simultaneously any limitation as to text, thus entered There is larger flexibility during row identification and checking.

As shown in Fig. 2 Fig. 2 is the schematic flow sheet of the embodiment of auth method one of the present invention, the auth method Comprise the following steps：

In a preferred embodiment, in order to prevent the audio quality of password voice from influenceing the result of vocal print feature checking, On the basis of above-mentioned Fig. 2 embodiment, the step S2 includes：The password voice that the user that client is sent reports is received, Analyze whether the password voice can use, if the password voice is unavailable, prompt client to re-start password voice Record, or, if the password voice can use, character recognition is carried out to the password voice.

In a preferred embodiment, on the basis of above-mentioned Fig. 2 embodiment, the auth method also includes as follows Step：If code characters standard cipher character corresponding with voice acquisition text is inconsistent, again at random to the visitor Family end sends the voice acquisition text for user response；The voice for adding up to send to client obtains the number of text, if described Number is more than or equal to preset times, then terminates the response to the authentication request.

Specifically, the calculating process includes：

Loglike=E (X) * D (X)^-1*X^T-0.5*D(X)^-1*(X.²)^T,

In a preferred embodiment, on the basis of above-described embodiment, background channel model is gauss hybrid models, instruction Practicing background channel model includes：

If the accuracy rate is more than predetermined threshold value, model training terminates, using the gauss hybrid models after training as upper Background channel model to be applied is stated, or, if the accuracy rate is less than or equal to predetermined threshold value, increase the speech data sample This quantity, and training is re-started based on the speech data sample after increase.

The embodiments of the present invention are for illustration only, do not represent the quality of embodiment.

Through the above description of the embodiments, those skilled in the art can be understood that above-described embodiment side Method can add the mode of required general hardware platform to realize by software, naturally it is also possible to by hardware, but in many cases The former is more preferably embodiment.Based on such understanding, technical scheme is substantially done to prior art in other words Going out the part of contribution can be embodied in the form of software product, and the computer software product is stored in a storage medium In (such as ROM/RAM, magnetic disc, CD), including some instructions to cause a station terminal equipment (can be mobile phone, computer, clothes Be engaged in device, air conditioner, or network equipment etc.) perform method described in each embodiment of the present invention.

The preferred embodiments of the present invention are these are only, are not intended to limit the scope of the invention, it is every to utilize this hair The equivalent structure or equivalent flow conversion that bright specification and accompanying drawing content are made, or directly or indirectly it is used in other related skills Art field, is included within the scope of the present invention.

Claims

1. a kind of server, it is characterised in that the server includes memory and the processor being connected with the memory, institute The authentication system that is stored with and can run on the processor in memory is stated, the authentication system is by the processing Device realizes following steps when performing：

S1, receive client transmission carrying identity authentication request after, at random to the client send for The voice of family response obtains text；

S2, the password voice that client obtains user's report that text is sent based on the voice is received, and to the password language Sound carries out character recognition, identifies code characters corresponding to the password voice；

S3, if code characters standard cipher character corresponding with voice acquisition text is consistent, build the password voice Current vocal print feature vector, and according to predetermined identity and standard vocal print feature vector mapping relations determine the user Identity corresponding to standard vocal print feature vector, using predetermined distance calculation formula calculate current vocal print feature to The distance between amount and identified standard vocal print feature vector, authentication is carried out to user according to the distance.

2. server according to claim 1, it is characterised in that the step S2 includes：

The password voice that the user that client is sent reports is received, analyzes whether the password voice can use, if the password language Sound is unavailable, then the recording for prompting client to re-start password voice, or, if the password voice can use, to described Password voice carries out character recognition.

3. server according to claim 1 or 2, it is characterised in that the authentication system is held by the processor During row, following steps are also realized：

If code characters standard cipher character corresponding with voice acquisition text is inconsistent, again at random to the client End sends the voice acquisition text for user response；

The voice for adding up to send to client obtains the number of text, if the number is more than or equal to preset times, termination pair The response of the authentication request.

4. server according to claim 1 or 2, it is characterised in that the current vocal print for building the password voice is special The step of sign vector includes：

The password voice is handled using Predetermined filter to carry out the extraction of preset kind vocal print feature, and be based on carrying The preset kind vocal print feature taken builds vocal print feature vector corresponding to the password voice；

By the background channel model of the vocal print feature of structure vector input training in advance, with construct the current vocal print feature to Amount；

It is described to utilize the current vocal print feature of predetermined distance calculation formula calculating vectorial and identified standard vocal print feature The distance between vector, the step of user's progress authentication, is included according to the distance：

5. a kind of server, it is characterised in that the server includes memory and the processor being connected with the memory, institute The system that the authentication based on Application on Voiceprint Recognition that can be run on the processor is stored with memory is stated, it is described to be based on sound Following steps are realized when the system of the authentication of line identification is by the computing device：

S101, after the speech data for the user for carrying out authentication is received, the vocal print feature of the speech data is obtained, and Based on vocal print feature vector corresponding to vocal print feature structure；

S102, the background channel model that vocal print feature vector input training in advance is generated, to construct the voice number According to corresponding current vocal print discriminant vectorses；

S103, calculate space between the current vocal print discriminant vectorses and the standard vocal print discriminant vectorses of the user to prestore away from From carrying out authentication to the user based on the distance, and generate the result.

6. a kind of auth method, it is characterised in that the auth method includes：

7. auth method according to claim 6, it is characterised in that the step S2 includes：

8. the auth method according to claim 6 or 7, it is characterised in that also include after the step S2：

9. the auth method according to claim 6 or 7, it is characterised in that described to build the current of the password voice The step of vocal print feature vector includes：

10. auth method according to claim 9, it is characterised in that the background channel model is Gaussian Mixture Model, the training background channel model include：

The speech data sample of predetermined number is obtained, and obtains vocal print feature corresponding to each speech data sample, and is based on each language Vocal print feature corresponding to sound data sample builds vocal print feature vector corresponding to each speech data sample；

Vocal print feature vector corresponding to each speech data sample is divided into the training set of the first ratio and the checking collection of the second ratio, First ratio and the second ratio and less than or equal to 1；

Gauss hybrid models are trained using the vocal print feature vector in the training set, and after the completion of training, utilized The accuracy rate of gauss hybrid models after the checking set pair training is verified；

If the accuracy rate is more than predetermined threshold value, model training terminates, and the back of the body is used as using the gauss hybrid models after training Scape channel model, or, if the accuracy rate is less than or equal to predetermined threshold value, increase the quantity of the speech data sample, and Training is re-started based on the speech data sample after increase.

11. a kind of computer-readable recording medium, it is characterised in that be stored with identity on the computer-readable recording medium and test Card system, realize that the identity as any one of claim 6 to 10 is tested when the authentication system is executed by processor The step of card method.