CN109257362A

CN109257362A - Method, apparatus, computer device and storage medium for voiceprint verification

Info

Publication number: CN109257362A
Application number: CN201811184775.3A
Authority: CN
Inventors: 杨翘楚; 王健宗; 肖京
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2018-10-11
Filing date: 2018-10-11
Publication date: 2019-01-22
Also published as: WO2020073519A1

Abstract

This application discloses a voiceprint verification method, apparatus, computer device and storage medium. The voiceprint verification method includes: extracting, by a client server, the voice signal of the identity to be verified, and extracting the MFCC-type voiceprint feature corresponding to each frame of voice data in the voice signal; constructing, by the client server, the MFCC-type voiceprint features into the voiceprint feature vector corresponding to each frame of voice data, so as to form a first voiceprint feature; receiving, by a voiceprint verification server, the first voiceprint feature sent by the client server; judging, by the voiceprint verification server, whether the feature distance value between the voiceprint discrimination vector (i-vector) corresponding to the first voiceprint feature and the i-vector corresponding to a pre-stored voiceprint feature meets a preset requirement; if so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature, and otherwise that they are different. The application moves the extraction of the voiceprint feature vector forward to the client server, which then transmits the result to the voiceprint verification server for voiceprint verification.

Description

Method, apparatus, computer device and storage medium for voiceprint verification

Technical field

This application relates to the field of voiceprint verification, and in particular to a voiceprint verification method, apparatus, computer device and storage medium.

Background

At present, the business scope of many large financial companies covers multiple lines such as insurance, banking and investment. Each line usually needs to communicate with the same customers and to perform anti-fraud identification, so customer identity verification and anti-fraud identification have become important parts of ensuring business security. In the identity verification stage, voiceprint verification has been adopted by many companies because of its real-time nature and convenience. Both training a customer's voiceprint model and verifying the customer's identity require collecting the customer's voice data, which often comes from recordings of conversations between the financial company and the customer. However, because business negotiations often involve confidential content, transmitting the voice data over the network to a back end for extraction of speech feature parameters is detrimental to data confidentiality.

Summary of the invention

The main purpose of this application is to provide a voiceprint verification method that solves the technical problem that, in existing voiceprint verification, the voice data collected by the client must be sent to a back end for voiceprint feature extraction, which leaves the voice data poorly protected during transmission.

This application proposes a voiceprint verification method, comprising:

extracting, by a client server, the voice signal of the identity to be verified, and extracting the MFCC-type voiceprint feature corresponding to each frame of voice data in the voice signal;

constructing, by the client server, the MFCC-type voiceprint features into the voiceprint feature vector corresponding to each frame of voice data, so as to form a first voiceprint feature;

receiving, by a voiceprint verification server, the first voiceprint feature sent by the client server;

judging, by the voiceprint verification server, whether the feature distance value between the voiceprint discrimination vector (i-vector) corresponding to the first voiceprint feature and the i-vector corresponding to a pre-stored voiceprint feature meets a preset requirement;

if so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature; otherwise, determining that they are different.

Preferably, the step in which the voiceprint verification server judges whether the feature distance value between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature meets the preset requirement comprises:

mapping the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional voiceprint discrimination vector (i-vector);

calculating, by a cosine distance formula, the cosine distance value between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature, where x denotes the i-vector corresponding to the pre-stored voiceprint feature and y denotes the i-vector corresponding to the first voiceprint feature;

judging whether the cosine distance value meets a preset condition;

if so, determining that the feature distance value between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature meets the preset requirement; otherwise, determining that it does not.

Preferably, the step of mapping the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional i-vector comprises:

inputting the extracted voiceprint feature vector corresponding to each frame of voice data into a GMM-UBM model to obtain a Gaussian supervector that characterizes the probability distribution of each frame of voice data over the Gaussian components;

calculating, from each Gaussian supervector and the formula $m_r = \mu + T\omega_r$, the low-dimensional i-vector corresponding to each frame of voice data, where $m_r$ is the Gaussian supervector of each frame of voice data, $\mu$ is the mean supervector of the GMM-UBM model, $\omega_r$ is the low-dimensional i-vector of each frame of voice data, and $T$ is the transformation matrix that maps $\omega_r$ to the high-dimensional Gaussian space.

Preferably, the step of judging whether the cosine distance value meets the preset condition comprises:

obtaining the first cosine distance values between the first voiceprint feature and each pre-stored voiceprint feature in pre-stored voiceprint feature data of multiple people, where the voiceprint feature data of the multiple people includes the pre-stored voiceprint feature of the target person;

sorting the first cosine distance values in ascending order;

judging whether the first cosine distance values ranked within a preset number from the top include the first cosine distance value corresponding to the pre-stored voiceprint feature of the target person;

if so, determining that the cosine distance value meets the preset condition; otherwise, determining that it does not.

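Preferably, the step of judging whether the cosine distance value meets the preset condition comprises:

obtaining a second cosine distance value between the pre-stored voiceprint feature of the target person and the first voiceprint feature;

judging whether the second cosine distance value is less than or equal to a preset threshold;

if so, determining that the cosine distance value meets the preset condition; otherwise, determining that it does not.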

This application also provides a voiceprint verification system, including a client, a client server and a voiceprint verification server;

the client collects the voice signal of the identity to be verified and sends the voice signal to the client server;

the client server receives the voice signal, performs voiceprint feature extraction on the voice signal to obtain a first voiceprint feature, and transmits the first voiceprint feature to the voiceprint verification server;

the voiceprint verification server receives the first voiceprint feature, compares it with a pre-stored voiceprint feature to judge whether the first voiceprint feature and the pre-stored voiceprint feature are the same, and feeds the judgment result back to the client server;

the client server controls the client to give a feedback response according to the judgment result.

Preferably, the continuous analog signal of the voice signal is sampled by the client at a specified sampling period to form a discrete analog signal, which is quantized into a digital signal according to a specified coding rule; the process in which the client server receives the voice signal and performs voiceprint feature extraction on it to obtain the first voiceprint feature comprises:

after the client server performs pre-emphasis on the digital signal, framing the pre-emphasized digital signal to obtain each frame of voice data;

mapping each frame of voice data from the linear spectrum domain to the Mel spectrum domain according to the linear-to-Mel conversion formula, where $f_{Mel}$ denotes the Mel spectrum value and $f$ denotes the linear spectrum value;

inputting each frame of voice data converted to the Mel spectrum domain into a bank of Mel triangular filters, and calculating the logarithmic energy output by the Mel triangular filter of each frequency band to obtain the logarithmic energy sequence corresponding to each frame of voice data;

performing a discrete cosine transform on each logarithmic energy sequence to obtain the MFCC-type voiceprint feature corresponding to each frame of voice data;

constructing the MFCC-type voiceprint features into the voiceprint feature vector corresponding to each frame of voice data, so as to form the first voiceprint feature.

Preferably, the judgment result is that the first voiceprint feature and the pre-stored voiceprint feature are different, and the process in which the client server controls the client to give a feedback response according to the judgment result comprises:

generating, by the client server, feedback information indicating that identity verification was unsuccessful, and sending it to the client;

judging whether the number of times that unsuccessful identity verification feedback information has been generated for the first voiceprint feature within a preset time exceeds a preset number of times;

if the preset number of times is exceeded, controlling the client to enter a disabled state and raising an alarm.
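The failure-counting steps above can be sketched as a sliding-window counter. This is only an illustration under stated assumptions: the class name `FailureMonitor`, the `max_failures` and `window_seconds` parameters, and the choice of a sliding window are all hypothetical and are not specified in the application.

```python
import time
from collections import deque


class FailureMonitor:
    """Hypothetical helper: count failed verifications inside a time window."""

    def __init__(self, max_failures, window_seconds):
        self.max_failures = max_failures      # the "preset number of times" (assumed value)
        self.window_seconds = window_seconds  # the "preset time" (assumed value)
        self.failures = deque()               # timestamps of recent failures
        self.disabled = False

    def record_failure(self, now=None):
        """Register one failure; return True if the client should be disabled."""
        now = time.time() if now is None else now
        self.failures.append(now)
        # Forget failures that fall outside the time window.
        while self.failures and now - self.failures[0] > self.window_seconds:
            self.failures.popleft()
        if len(self.failures) > self.max_failures:
            # The caller would disable the client and raise the alarm here.
            self.disabled = True
        return self.disabled
```

Passing `now` explicitly makes the behaviour easy to check without waiting for real time to elapse.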

This application also provides a computer device, including a memory and a processor, the memory storing a computer program, where the processor implements the steps of the above method when executing the computer program.

This application also provides a computer-readable storage medium on which a computer program is stored, where the computer program implements the steps of the above method when executed by a processor.

This application moves the extraction of the voiceprint feature vector forward to the client server. After the client collects the voice signal by recording, the voiceprint feature vector of the voice signal is extracted directly on the local client server and then transmitted to the verification server provided by a third-party technical support provider, which carries out voiceprint verification, training of the voiceprint verification model and speaker identification. Because the voiceprint feature vector cannot be reverse-engineered back into the original voice signal data, this helps keep the customer's recorded voice signal confidential, improves information security, and makes the customer identity verification process more secure. The application transmits only the data obtained after voiceprint feature vector extraction to the server for voiceprint verification; the voiceprint feature vector data is lighter than the original voice signal data, which improves transmission efficiency. Based on GMM-UBM, the application maps each voiceprint feature vector to a low-dimensional voiceprint discrimination vector (i-vector), which reduces computation cost and the cost of using voiceprint verification. By comparing against the pre-stored data of multiple people during verification, the equal error rate of voiceprint verification is reduced, as is the impact of model error.

Description of the drawings

Fig. 1 is a schematic flowchart of the voiceprint verification method according to an embodiment of this application;

Fig. 2 is a schematic diagram of the internal structure of the computer device according to an embodiment of this application.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application and not to limit it.

Referring to Fig. 1, in the voiceprint verification method of an embodiment of this application, information is collected by a client and voiceprint verification is carried out by servers. The method includes:

S1: extracting, by the client server, the voice signal of the identity to be verified, and extracting the MFCC-type voiceprint feature corresponding to each frame of voice data in the voice signal.

The MFCC (Mel Frequency Cepstrum Coefficient) type voiceprint feature of this embodiment is nonlinear, so the analysis of the customer's voice signal in each frequency band is closer to the characteristics of real human speech, which improves the effect of voiceprint verification.

S2: constructing, by the client server, the MFCC-type voiceprint features into the voiceprint feature vector corresponding to each frame of voice data, so as to form the first voiceprint feature.

This embodiment constructs the voiceprint feature vector corresponding to each frame of voice data from the extracted MFCC-type voiceprint features, and then combines the corresponding MFCC-type voiceprint features in the order of the frames of the voice signal to obtain the first voiceprint feature corresponding to the customer's voice signal. This construction is also completed on the client server, which strengthens data confidentiality during transmission.

S3: receiving, by the voiceprint verification server, the first voiceprint feature sent by the client server.

This embodiment moves the extraction of the first voiceprint feature forward to the client server: after the client server receives the customer's voice signal collected by recording, it extracts the corresponding first voiceprint feature directly on the client server and then transmits it to the voiceprint verification server provided by a third-party technical support provider for voiceprint verification. Because the first voiceprint feature cannot be reverse-engineered back into the original voice signal, this helps keep the customer's recorded voice signal confidential, improves information security, and makes the customer identity verification process more secure. In addition, the first voiceprint feature has a smaller data volume than the voice signal, which improves transmission efficiency. The client server extracts the voiceprint feature from the collected voice signal and transmits the extracted feature to the voiceprint verification server for verification, so the client server that extracts the voiceprint feature is separate from the voiceprint verification server.
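The data-volume argument can be made concrete with a short calculation. The figures below are assumptions chosen for illustration only (16 kHz sampling, 25 ms frames, a 10 ms frame shift, 13 coefficients per frame); the application does not state these values.

```python
def num_frames(num_samples, frame_len, hop_len):
    """Number of full frames obtained from a signal of num_samples samples."""
    if num_samples < frame_len:
        return 0
    return 1 + (num_samples - frame_len) // hop_len


# All values below are assumed for illustration, not taken from the application.
SAMPLE_RATE = 16000   # samples per second
DURATION_S = 1        # length of the recording in seconds
FRAME_LEN = 400       # 25 ms frame at 16 kHz
HOP_LEN = 160         # 10 ms frame shift at 16 kHz
N_COEFFS = 13         # MFCC coefficients kept per frame

raw_values = SAMPLE_RATE * DURATION_S
frames = num_frames(raw_values, FRAME_LEN, HOP_LEN)
feature_values = frames * N_COEFFS

print(raw_values, frames, feature_values)  # prints: 16000 98 1274
```

Under these assumptions one second of audio is 16000 sample values, while the per-frame feature vectors for the same second hold 1274 values.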

S4: judging, by the voiceprint verification server, whether the feature distance value between the voiceprint discrimination vector (i-vector) corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature meets the preset requirement.

The preset requirement of this embodiment includes, for example, the feature distance value falling within a specified preset threshold range. It can be customized for the specific application scenario so as to meet a wider range of individual needs.

S5: if so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature; otherwise, determining that they are different.

When this embodiment determines that the first voiceprint feature is the same as the pre-stored voiceprint feature, the server feeds back a verification-passed result to the client; otherwise, it feeds back a verification-failed result, so that the client can carry out further application operations according to the feedback. For example, an intelligent door is controlled to open after verification passes. As another example, after a preset number of verification failures the security system is controlled to lock the screen, to prevent criminals from further compromising an e-banking system.

Further, step S4 of this embodiment includes:

S41: mapping the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional voiceprint discrimination vector (i-vector).

This embodiment uses GMM-UBM (Gaussian Mixture Model-Universal Background Model) to map the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional i-vector. The GMM-UBM of this embodiment is trained as follows:

B1: obtaining a preset number (for example, 100,000) of voice data samples, each of which corresponds to one voiceprint discrimination vector. The voice samples can be collected from different people in different environments, and are used to train a universal background model (GMM-UBM) that characterizes general speech properties;

B2: processing each voice data sample to extract its preset-type voiceprint feature, and constructing the voiceprint feature vector of each voice data sample from that feature;

B3: dividing all constructed preset-type voiceprint feature vectors into a training set of a first percentage and a validation set of a second percentage, where the sum of the first percentage and the second percentage is less than or equal to 100%;

B4: training the model with the voiceprint feature vectors in the training set and, after training is complete, verifying the accuracy of the trained model with the validation set;

B5: if the accuracy is greater than a preset accuracy (for example, 98.5%), ending model training; otherwise, increasing the number of voice data samples and re-executing steps B2, B3, B4 and B5 on the enlarged sample set.
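Steps B3 to B5 can be sketched as a split followed by a retrain-until-accurate loop. The function names, the 70/30 split, the `max_rounds` cap and the callback parameters `fit`, `evaluate` and `get_more` are hypothetical placeholders; the actual GMM-UBM fitting and scoring are deliberately left to the caller.

```python
import random


def split_vectors(vectors, train_pct, val_pct, seed=0):
    """B3: split into a training set and a validation set by percentage."""
    if train_pct + val_pct > 100:
        raise ValueError("percentages must sum to at most 100")
    shuffled = list(vectors)
    random.Random(seed).shuffle(shuffled)
    n_train = len(shuffled) * train_pct // 100
    n_val = len(shuffled) * val_pct // 100
    return shuffled[:n_train], shuffled[n_train:n_train + n_val]


def train_until_accurate(samples, fit, evaluate, get_more,
                         target=0.985, max_rounds=5):
    """B4/B5: retrain on a growing sample set until accuracy exceeds target.

    fit, evaluate and get_more are caller-supplied callbacks (assumed here):
    fit(train_set) returns a model, evaluate(model, val_set) returns an
    accuracy in [0, 1], and get_more() returns additional samples.
    """
    for _ in range(max_rounds):
        train_set, val_set = split_vectors(samples, 70, 30)
        model = fit(train_set)
        if evaluate(model, val_set) > target:
            return model
        samples = list(samples) + list(get_more())
    return None  # target accuracy not reached within max_rounds
```

The `max_rounds` cap is an addition for safety in this sketch; the text itself describes the loop as repeating until the accuracy target is met.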

The voiceprint discrimination vector of this embodiment is expressed as an i-vector. The i-vector is a vector whose dimension is lower than that of the Gaussian space, which helps reduce computation cost.

S42: calculating, by a cosine distance formula, the distance value between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature, where x denotes the i-vector corresponding to the pre-stored voiceprint feature and y denotes the i-vector corresponding to the first voiceprint feature.

S43: judging whether the cosine distance value meets the preset condition.

The preset condition of this embodiment includes, for example, the cosine distance value falling within a specified threshold range, and can be set as needed. In this embodiment, the first cosine distance values calculated between the first voiceprint feature and each pre-stored voiceprint feature in the pre-stored voiceprint feature data of multiple people are sorted in ascending order, and it is judged whether the first cosine distance values ranked within a preset number from the top include the one corresponding to the pre-stored voiceprint feature of the target person; if they do, the cosine distance value is determined to meet the preset condition. In another embodiment of this application, it is judged whether the second cosine distance value between the pre-stored voiceprint feature of the target person and the first voiceprint feature is less than or equal to a preset threshold; if it is, the cosine distance value is determined to meet the preset condition.

S44: if the cosine distance value meets the preset condition, determining that the feature distance value between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature meets the preset requirement; otherwise, determining that it does not.

Further, step S41 of this embodiment includes:

S410: inputting the extracted voiceprint feature vector corresponding to each frame of voice data into the GMM-UBM model to obtain a Gaussian supervector that characterizes the probability distribution of each frame of voice data over the Gaussian components.

S411: calculating, from each Gaussian supervector and the formula $m_r = \mu + T\omega_r$, the low-dimensional i-vector corresponding to each frame of voice data, where $m_r$ is the Gaussian supervector of each frame of voice data, $\mu$ is the mean supervector of the GMM-UBM model, $\omega_r$ is the low-dimensional i-vector of each frame of voice data, and $T$ is the transformation matrix that maps $\omega_r$ to the high-dimensional Gaussian space.
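The relation $m_r = \mu + T\omega_r$ can be illustrated numerically. The sketch below is a simplification: it recovers $\omega_r$ by ordinary least squares from a synthetic supervector, whereas practical i-vector extractors usually estimate $\omega_r$ as a posterior mean from sufficient statistics. The dimensions, the random data and the function name `estimate_ivector` are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed toy dimensions: a 24-dimensional supervector and a 4-dimensional i-vector.
sv_dim, iv_dim = 24, 4
mu = rng.normal(size=sv_dim)            # mean supervector of the background model
T = rng.normal(size=(sv_dim, iv_dim))   # transformation matrix (low -> high dimension)
omega_true = rng.normal(size=iv_dim)    # the low-dimensional vector we want back

# Forward model from the text: m_r = mu + T @ omega_r
m_r = mu + T @ omega_true


def estimate_ivector(m_r, mu, T):
    """Least-squares solution of T @ omega = m_r - mu (a simplified stand-in)."""
    omega, *_ = np.linalg.lstsq(T, m_r - mu, rcond=None)
    return omega


omega_est = estimate_ivector(m_r, mu, T)
```

Because the synthetic supervector is generated exactly from the model, the least-squares estimate matches `omega_true` up to floating-point error; real supervectors would only be approximated.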

The matrix T in this embodiment is trained with the EM algorithm. The EM (Expectation-Maximization) algorithm is an iterative algorithm used in statistics to find the maximum likelihood estimate of the parameters of a probabilistic model that depends on unobservable latent variables. The EM algorithm alternates between two steps: 1) the expectation (E) step, which uses the current estimates of the model parameters to compute the expectation of the latent variables; 2) the maximization (M) step, which uses the expectation of the latent variables obtained in the E step to compute the maximum likelihood estimate of the model parameters. The parameter estimates found in the M step are used in the next E step, and the two steps alternate repeatedly.
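The E and M alternation can be seen in a much smaller setting than training T. The sketch below fits a two-component one-dimensional Gaussian mixture to synthetic data; it illustrates the EM idea only and is not the procedure used to train T. The data, the initialisation and the function name `em_gmm_1d` are assumptions.

```python
import numpy as np


def em_gmm_1d(x, n_iter=50):
    """Fit a two-component 1-D Gaussian mixture with EM (illustrative only)."""
    means = np.array([x.min(), x.max()], dtype=float)   # crude initialisation
    variances = np.array([x.var(), x.var()], dtype=float)
    weights = np.array([0.5, 0.5])
    for _ in range(n_iter):
        # E step: responsibility of each component for each data point,
        # computed from the current parameter estimates.
        diff = x[:, None] - means[None, :]
        dens = weights * np.exp(-0.5 * diff ** 2 / variances) \
            / np.sqrt(2 * np.pi * variances)
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M step: maximum likelihood update of the parameters given
        # the responsibilities from the E step.
        nk = resp.sum(axis=0)
        weights = nk / len(x)
        means = (resp * x[:, None]).sum(axis=0) / nk
        variances = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
    return weights, means, variances


rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 300), rng.normal(10.0, 1.0, 300)])
weights, means, variances = em_gmm_1d(x)
```

With two well-separated clusters the estimated means settle close to the values used to generate the data.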

Further, step S43 of this embodiment includes:

S430: obtaining the first cosine distance values between the first voiceprint feature and each pre-stored voiceprint feature in the pre-stored voiceprint feature data of multiple people, where the voiceprint feature data of the multiple people includes the pre-stored voiceprint feature of the target person.

This embodiment uses the pre-stored voiceprint feature data of multiple people, including the target person, together to judge whether the voiceprint feature of the currently collected voice signal is the same as that of the target person, which improves judgment accuracy. The embodiment uses the cosine distance formula to express the first cosine distance value between each pre-stored voiceprint feature and the first voiceprint feature, where x denotes each pre-stored voiceprint discrimination vector and y denotes the i-vector of the first voiceprint feature. The smaller the cosine distance value, the closer the two voiceprint features are, up to being identical. The term "first" in this embodiment is used only for distinction and not for limitation; the same applies elsewhere and is not repeated.
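A minimal sketch of a cosine distance follows. It assumes the distance is defined as one minus the cosine similarity, which is consistent with the statement that a smaller value means closer features, but the exact formula used in the application is not shown in this text. The function name is hypothetical.

```python
import math


def cosine_distance(x, y):
    """Assumed definition: 1 - (x . y) / (|x| * |y|); smaller means closer."""
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0 or norm_y == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_x * norm_y)
```

Under this definition, vectors pointing in the same direction have distance 0, orthogonal vectors have distance 1, and opposite vectors have distance 2, regardless of their lengths.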

S431: sorting the first cosine distance values in ascending order.

This embodiment sorts the first cosine distance values between each pre-stored voiceprint feature and the first voiceprint feature in ascending order, so as to analyze more accurately how the similarity between the first voiceprint feature and each pre-stored voiceprint feature is distributed and thus verify the first voiceprint feature more accurately.

S432: judging whether the first cosine distance values ranked within a preset number from the top include the first cosine distance value corresponding to the pre-stored voiceprint feature of the target person.

In this embodiment, if the first cosine distance values ranked within the preset number from the top include the one corresponding to the pre-stored voiceprint feature of the target person, the first voiceprint feature is determined to be the same as the pre-stored voiceprint feature of the target person, which reduces the equal error rate of identification caused by model error. The equal error rate is the operating point at which "the frequency of verification failures when verification should pass equals the frequency of verification passes when verification should fail". The preset number of first cosine distance values in this embodiment may be 1, 2, 3 and so on, and can be set according to usage needs.

S433: if so, determining that the cosine distance value meets the preset condition; otherwise, determining that it does not.
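Steps S430 to S433 amount to a rank check. In the sketch below, `distances` is a hypothetical mapping from a speaker identifier to that speaker's first cosine distance value, and `top_n` stands for the preset number; both names and the example values are illustrative.

```python
def target_in_top_n(distances, target_id, top_n):
    """Return True if target_id is among the top_n smallest distances."""
    # Sort speaker identifiers by ascending distance (S431).
    ranked = sorted(distances, key=distances.get)
    # Check whether the target person appears in the leading top_n entries (S432).
    return target_id in ranked[:top_n]


# Illustrative distances only; not data from the application.
example = {"alice": 0.12, "bob": 0.45, "carol": 0.08, "dave": 0.90}
print(target_in_top_n(example, "alice", 2))  # prints: True
```

Here "carol" and "alice" have the two smallest distances, so a claim to be "alice" passes with `top_n=2` while a claim to be "bob" does not.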

Further, step S43 of another embodiment of this application includes:

S434: obtaining the second cosine distance value between the pre-stored voiceprint feature of the target person and the first voiceprint feature.

By comparing only the single second cosine distance value, this embodiment performs a targeted comparison, reduces the amount of computation, and improves verification speed.

S435: judging whether the second cosine distance value is less than or equal to a preset threshold.

This embodiment achieves effective voiceprint verification by setting a threshold on the distance between the first voiceprint feature and the pre-stored voiceprint feature of the target user. For example, the preset threshold is 0.6.

S436: if so, determining that the cosine distance value meets the preset condition; otherwise, determining that it does not.

In this embodiment, if the calculated cosine distance between the first voiceprint feature and the pre-stored voiceprint feature of the target user is less than or equal to the preset threshold, the cosine distance value is determined to meet the preset condition, the first voiceprint feature is determined to be the same as the pre-stored voiceprint feature of the target user, and verification passes. If the calculated cosine distance is greater than the preset threshold, the distance value is determined not to meet the preset condition, the first voiceprint feature is determined to be different from the pre-stored voiceprint feature of the target user, and verification fails.
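The choice of preset threshold trades false rejections against false acceptances, and the equal error rate is the point where the two are equal. The sketch below shows one simple way to approximate that point from labelled distances; the function names and the sample distances are made up for illustration, and the approach is an assumption rather than the application's procedure.

```python
def error_rates(genuine, impostor, threshold):
    """Return (false rejection rate, false acceptance rate) at a threshold.

    A trial is accepted when its distance is less than or equal to threshold.
    """
    frr = sum(d > threshold for d in genuine) / len(genuine)
    far = sum(d <= threshold for d in impostor) / len(impostor)
    return frr, far


def approximate_eer(genuine, impostor):
    """Pick the observed distance at which FRR and FAR are closest."""
    candidates = sorted(set(genuine) | set(impostor))

    def gap(t):
        frr, far = error_rates(genuine, impostor, t)
        return abs(frr - far)

    best = min(candidates, key=gap)
    frr, far = error_rates(genuine, impostor, best)
    return best, (frr + far) / 2


# Made-up distances for same-speaker and different-speaker trials.
genuine = [0.1, 0.2, 0.3, 0.7]
impostor = [0.5, 0.8, 0.9, 1.0]
print(approximate_eer(genuine, impostor))  # prints: (0.5, 0.25)
```

With these sample values a threshold of 0.5 rejects one of four genuine trials and accepts one of four impostor trials, so both error rates are 0.25.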

Further, in this embodiment the continuous analog signal of the voice signal is sampled by the client at a specified sampling period to form a discrete analog signal, which is quantized into a digital signal according to a specified coding rule. The client server receives the voice signal and performs voiceprint feature extraction on it to obtain the first voiceprint feature as follows:

S101: after the client server performs pre-emphasis on the digital signal, framing the pre-emphasized digital signal to obtain each frame of voice data; S102: mapping each frame of voice data from the linear spectrum domain to the Mel spectrum domain according to the linear-to-Mel conversion formula, where $f_{Mel}$ denotes the Mel spectrum value and $f$ denotes the linear spectrum value; S103: inputting each frame of voice data converted to the Mel spectrum domain into a bank of Mel triangular filters, and calculating the logarithmic energy output by the Mel triangular filter of each frequency band to obtain the logarithmic energy sequence corresponding to each frame of voice data; S104: performing a discrete cosine transform on each logarithmic energy sequence to obtain the MFCC-type voiceprint feature corresponding to each frame of voice data; and constructing the MFCC-type voiceprint features into the voiceprint feature vector corresponding to each frame of voice data, so as to form the first voiceprint feature.
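For reference, a widely used form of the linear-to-Mel mapping in step S102 is given below. It is offered as an assumption about the intended formula, since the constants actually used in the application are not shown in this text:

$$f_{Mel} = 2595 \cdot \log_{10}\left(1 + \frac{f}{700}\right)$$

With this form, $f = 1000$ Hz maps to approximately $1000$ on the Mel scale, and the inverse mapping is $f = 700\left(10^{f_{Mel}/2595} - 1\right)$.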

Regarding pre-emphasis: because of human physiology, the high-frequency components of a voice signal are often suppressed, and the role of pre-emphasis is to compensate for them. Regarding framing: because a voice signal is only stationary over short intervals, spectral analysis is carried out on short segments by dividing the signal into frames (generally 10 to 30 milliseconds per frame), and features are then extracted frame by frame. Windowing is applied after framing to reduce signal discontinuity at the start and end of each frame; this embodiment uses a Hamming window.
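The whole extraction chain (pre-emphasis, framing, Hamming window, Mel filter bank, logarithmic energy, discrete cosine transform) can be sketched with NumPy as follows. Everything numeric here is an assumption made for illustration: the 0.97 pre-emphasis coefficient, 16 kHz sampling, 25 ms frames with a 10 ms shift, a 512-point FFT, 26 filters, 13 coefficients, and the common 2595/700 form of the Mel mapping. As is usual in practice, the Mel mapping is applied when placing the triangular filters rather than as a separate step.

```python
import numpy as np


def hz_to_mel(f):
    # Common form of the Mel mapping (assumed, not taken from the application).
    return 2595.0 * np.log10(1.0 + f / 700.0)


def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters whose centres are evenly spaced on the Mel scale."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):        # rising edge of the triangle
            bank[i - 1, k] = (k - left) / (center - left)
        for k in range(center, right):       # falling edge of the triangle
            bank[i - 1, k] = (right - k) / (right - center)
    return bank


def mfcc(signal, sample_rate=16000, frame_len=400, hop_len=160,
         n_fft=512, n_filters=26, n_coeffs=13, pre_emphasis=0.97):
    # Pre-emphasis: boost the high-frequency components.
    emphasized = np.append(signal[0], signal[1:] - pre_emphasis * signal[:-1])
    # Framing: overlapping frames of frame_len samples, hop_len apart.
    n_frames = 1 + (len(emphasized) - frame_len) // hop_len
    idx = np.arange(frame_len)[None, :] + hop_len * np.arange(n_frames)[:, None]
    # Windowing: a Hamming window reduces discontinuities at the frame edges.
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum of each frame (zero-padded to n_fft points).
    power = np.abs(np.fft.rfft(frames, n_fft)) ** 2 / n_fft
    # Logarithmic energy at the output of each Mel triangular filter.
    energies = power @ mel_filterbank(n_filters, n_fft, sample_rate).T
    log_energies = np.log(np.maximum(energies, 1e-10))
    # Discrete cosine transform (type II), keeping the first n_coeffs terms.
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.arange(n_coeffs)[:, None]
                   * (2 * n[None, :] + 1) / (2 * n_filters))
    return log_energies @ basis.T  # shape: (n_frames, n_coeffs)


# Usage on one second of a synthetic 440 Hz tone.
t = np.arange(16000) / 16000.0
features = mfcc(np.sin(2 * np.pi * 440.0 * t))
```

Each row of `features` is the feature vector of one frame, and stacking the rows in frame order corresponds to the first voiceprint feature described in the text.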

This embodiment moves the extraction of the voiceprint feature vector forward to the client server. After the client collects the voice signal by recording, the voiceprint feature vector of the voice signal is extracted directly on the local client server and then transmitted to the verification server provided by a third-party technical support provider, which carries out voiceprint verification, training of the voiceprint verification model and speaker identification. Because the voiceprint feature vector cannot be reverse-engineered back into the original voice signal data, this helps keep the customer's recorded voice signal confidential, improves information security, and makes the customer identity verification process more secure. The embodiment transmits only the data obtained after voiceprint feature vector extraction to the server for voiceprint verification; the voiceprint feature vector data is lighter than the original voice signal data, which improves transmission efficiency. Based on GMM-UBM, the embodiment maps each voiceprint feature vector to a low-dimensional voiceprint discrimination vector (i-vector), which reduces computation cost and the cost of using voiceprint verification. By comparing against the pre-stored data of multiple people during verification, the equal error rate of voiceprint verification is reduced, as is the impact of model error.

Further, where the judgment result is that the first voiceprint feature and the pre-stored voiceprint feature are not the same, the process by which the client server controls the client to give a feedback response according to the judgment result comprises:

This voiceprint verification system includes an alarm and a security management and control device, which make the system functionally more complete in practical application and improve management security and information security.

Referring to Fig. 2, an embodiment of the present application further provides a computer device, which may be a server and whose internal structure may be as shown in Fig. 2. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores all data required by the voiceprint verification process. The network interface of the computer device communicates with external terminals over a network connection. The computer program, when executed by the processor, implements the voiceprint verification method.

The voiceprint verification method executed by the processor comprises: extracting, by a client server, the voice signal of an identity to be verified, and extracting the MFCC-type voiceprint feature corresponding to each frame of voice data in the voice signal; constructing, by the client server, the MFCC-type voiceprint features into a voiceprint feature vector corresponding to each frame of voice data, so as to form a first voiceprint feature; receiving, by a voiceprint verification server, the first voiceprint feature sent by the client server; judging, by the voiceprint verification server, whether the feature distance value between the voiceprint discrimination vectors (i-vectors) respectively corresponding to the first voiceprint feature and a pre-stored voiceprint feature meets a preset requirement; and if so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature, and otherwise determining that they are not the same.

In the computer device described above, the voiceprint feature vector extraction function is moved forward to the client server. After the client acquires a voice signal by recording, the voiceprint feature vectors of the voice signal are extracted directly on the local client server and are then transmitted to a verification server supported by third-party technology, which carries out voiceprint verification, including training of the voiceprint verification model and speaker identification. Because the voiceprint feature vectors cannot be inverted to restore the original voice signal data, the customer's recorded voice signal is kept confidential, which improves data security and the security of the customer identity authentication process. Because only the data obtained after extracting the voiceprint feature vectors is transmitted to the server for voiceprint verification, and voiceprint feature vector data is lighter than the original voice signal data, transmission efficiency is greatly increased. Each voiceprint feature vector is mapped to a low-dimensional voiceprint discrimination vector (i-vector) based on GMM-UBM, which reduces computation cost and thus the cost of using voiceprint verification. Comparing against the pre-stored data of multiple people during verification reduces the equal error rate of voiceprint verification and lessens the impact of errors in the voiceprint verification model.

In one embodiment, the step in which the processor judges whether the feature distance value between the voiceprint discrimination vectors (i-vectors) respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement comprises: mapping the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional voiceprint discrimination vector (i-vector); calculating, by the cosine distance formula cos(x, y) = (x · y) / (‖x‖ · ‖y‖), the cosine distance value cos(x, y) between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature, where x denotes the i-vector corresponding to the pre-stored voiceprint feature and y denotes the i-vector corresponding to the first voiceprint feature; judging whether the cosine distance value meets a preset condition; and if so, determining that the feature distance value between the i-vectors respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement, and otherwise determining that it does not.
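
The cosine computation between two i-vectors can be sketched as follows. The sketch assumes that the cosine distance formula referred to above is the standard normalized inner product of the two vectors; the function name and the example vectors are hypothetical.

```python
import math


def cosine_value(x, y):
    """Return cos(x, y) = (x . y) / (|x| * |y|) for two equal-length vectors."""
    if len(x) != len(y):
        raise ValueError("vectors must have the same dimension")
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    if norm_x == 0.0 or norm_y == 0.0:
        raise ValueError("cosine is undefined for a zero vector")
    return dot / (norm_x * norm_y)


if __name__ == "__main__":
    stored = [0.2, -0.5, 0.1]      # x: i-vector of the pre-stored voiceprint feature
    incoming = [0.25, -0.4, 0.05]  # y: i-vector of the first voiceprint feature
    print(cosine_value(stored, incoming))
```

The value depends only on the angle between the two vectors, so rescaling either i-vector leaves it unchanged.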

In one embodiment, the step in which the processor maps the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional voiceprint discrimination vector (i-vector) comprises:

Inputting the extracted voiceprint feature vector corresponding to each frame of voice data into a GMM-UBM model to obtain a Gaussian supervector characterizing the probability distribution of each frame of voice data over the Gaussian components; and calculating, from each Gaussian supervector and the formula m_r = μ + T ω_r, the low-dimensional i-vector corresponding to each frame of voice data, where m_r is the Gaussian supervector of each frame of voice data, μ is the mean supervector of the GMM-UBM model, T is the transition matrix that maps the low-dimensional i-vector into the high-dimensional Gaussian space, and ω_r is the low-dimensional i-vector of each frame of voice data.
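
The relation m_r = μ + T ω_r can be illustrated numerically. The sketch below is a toy under stated assumptions and not the extraction procedure of this embodiment: it treats the matrix as already known and recovers the low-dimensional vector from a supervector by ordinary least squares, ω_r = (TᵀT)⁻¹ Tᵀ (m_r − μ). Practical i-vector extractors instead estimate ω_r from sufficient statistics collected against the GMM-UBM, which is not shown here. All function names and values are hypothetical.

```python
def supervector(mu, T, w):
    """Forward model: m = mu + T w, with T given as a list of rows."""
    return [m + sum(t * v for t, v in zip(row, w)) for m, row in zip(mu, T)]


def solve_linear(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    aug = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for row in range(col + 1, n):
            factor = aug[row][col] / aug[col][col]
            for k in range(col, n + 1):
                aug[row][k] -= factor * aug[col][k]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][k] * x[k] for k in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def estimate_low_dim_vector(m, mu, T):
    """Least-squares estimate of w in m = mu + T w (a simplification, see lead-in)."""
    d = [a - b for a, b in zip(m, mu)]
    cols = len(T[0])
    TtT = [[sum(row[i] * row[j] for row in T) for j in range(cols)]
           for i in range(cols)]
    Ttd = [sum(row[i] * di for row, di in zip(T, d)) for i in range(cols)]
    return solve_linear(TtT, Ttd)
```

Here the supervector has as many entries as T has rows, and the recovered vector has as many entries as T has columns, which is why the mapped vector is low-dimensional.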

In one embodiment, the step in which the processor judges whether the cosine distance value meets the preset condition comprises: obtaining the first cosine distance values between the first voiceprint feature and each pre-stored voiceprint feature in the pre-stored voiceprint feature data of multiple people, where the voiceprint feature data of the multiple people includes the pre-stored voiceprint feature of a target person; sorting the first cosine distance values in ascending order; judging whether the first cosine distance values ranked within a preset number from the top include the first cosine distance value corresponding to the target person's pre-stored voiceprint feature; and if so, determining that the cosine distance value meets the preset condition, and otherwise determining that it does not.
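
The ranking check described above can be sketched as follows. The sketch follows the ascending order stated in the text and takes the distance values as given; whether a smaller value means a closer match depends on how the cosine distance value is defined, which is treated here as an assumption. The function name, the person identifiers and the values are hypothetical.

```python
def target_in_top_ranked(distances_by_person, target_person, preset_quantity):
    """Return True if the target person's distance value is among the first
    preset_quantity values after sorting all values in ascending order.

    distances_by_person maps a person identifier to the first cosine distance
    value between that person's pre-stored feature and the incoming feature.
    """
    if target_person not in distances_by_person:
        raise KeyError("target person has no pre-stored voiceprint feature")
    ranked = sorted(distances_by_person, key=lambda p: distances_by_person[p])
    return target_person in ranked[:preset_quantity]


if __name__ == "__main__":
    distances = {"alice": 0.12, "bob": 0.40, "carol": 0.05, "dave": 0.33}
    print(target_in_top_ranked(distances, "alice", 2))
```

Comparing against several enrolled people in this way makes the decision relative to the other candidates rather than dependent on a single absolute threshold.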

In one embodiment, the step in which the processor judges whether the cosine distance value meets the preset condition comprises: obtaining a second cosine distance value between the target person's pre-stored voiceprint feature and the first voiceprint feature; judging whether the second cosine distance value is less than or equal to a preset threshold; and if so, determining that the cosine distance value meets the preset condition, and otherwise determining that it does not.

Those skilled in the art will understand that the structure shown in Fig. 2 is merely a block diagram of the part of the structure relevant to the solution of the present application, and does not limit the computer devices to which the solution of the present application is applied.

An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. The computer program, when executed by a processor, implements a voiceprint verification method comprising: extracting, by a client server, the voice signal acquired by the client, and extracting the MFCC-type voiceprint feature corresponding to each frame of voice data in the voice signal; constructing, by the client server, the MFCC-type voiceprint features into a voiceprint feature vector corresponding to each frame of voice data, so as to form a first voiceprint feature; remotely receiving the first voiceprint feature sent by the client server; judging whether the feature distance value between the voiceprint discrimination vectors (i-vectors) respectively corresponding to the first voiceprint feature and a pre-stored voiceprint feature meets a preset requirement; and if so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature, and otherwise determining that they are not the same.

With the computer-readable storage medium described above, the voiceprint feature vector extraction function is moved forward to the client server. After the client acquires a voice signal by recording, the voiceprint feature vectors of the voice signal are extracted directly on the local client server and are then transmitted to a verification server supported by third-party technology, which carries out voiceprint verification, including training of the voiceprint verification model and speaker identification. Because the voiceprint feature vectors cannot be inverted to restore the original voice signal data, the customer's recorded voice signal is kept confidential, which improves data security and the security of the customer identity authentication process. Because only the data obtained after extracting the voiceprint feature vectors is transmitted to the server for voiceprint verification, and voiceprint feature vector data is lighter than the original voice signal data, transmission efficiency is greatly increased. Each voiceprint feature vector is mapped to a low-dimensional voiceprint discrimination vector (i-vector) based on GMM-UBM, which reduces computation cost and thus the cost of using voiceprint verification. Comparing against the pre-stored data of multiple people during verification reduces the equal error rate of voiceprint verification and lessens the impact of errors in the voiceprint verification model.

In one embodiment, the step in which the processor judges whether the feature distance value between the voiceprint discrimination vectors (i-vectors) respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement comprises: mapping the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional voiceprint discrimination vector (i-vector); calculating, by the cosine distance formula cos(x, y) = (x · y) / (‖x‖ · ‖y‖), the cosine distance value between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature, where x denotes the i-vector corresponding to the pre-stored voiceprint feature and y denotes the i-vector corresponding to the first voiceprint feature; judging whether the cosine distance value meets a preset condition; and if so, determining that the feature distance value between the i-vectors respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement, and otherwise determining that it does not.

Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be carried out by a computer program instructing the relevant hardware. The computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database or other media used in the present application and its embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).

It should be noted that, in this document, the terms "include", "comprise" and any other variants thereof are intended to cover a non-exclusive inclusion, so that a process, apparatus, article or method that includes a series of elements includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, apparatus, article or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the presence of additional identical elements in the process, apparatus, article or method that includes the element.

The foregoing are merely preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application thereof in other related technical fields, is likewise included within the scope of patent protection of the present application.

Claims

1. A voiceprint verification method, characterized by comprising:

Extracting, by a client server, the voice signal of an identity to be verified, and extracting the MFCC-type voiceprint feature corresponding to each frame of voice data in the voice signal;

Constructing, by the client server, the MFCC-type voiceprint features into a voiceprint feature vector corresponding to each frame of voice data, so as to form a first voiceprint feature;

Receiving, by a voiceprint verification server, the first voiceprint feature sent by the client server;

Judging, by the voiceprint verification server, whether the feature distance value between the voiceprint discrimination vectors (i-vectors) respectively corresponding to the first voiceprint feature and a pre-stored voiceprint feature meets a preset requirement;

If so, determining that the first voiceprint feature is the same as the pre-stored voiceprint feature, and otherwise determining that they are not the same.

2. The voiceprint verification method according to claim 1, characterized in that the step in which the voiceprint verification server judges whether the feature distance value between the voiceprint discrimination vectors (i-vectors) respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement comprises:

Mapping the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional voiceprint discrimination vector (i-vector);

Calculating, by the cosine distance formula cos(x, y) = (x · y) / (‖x‖ · ‖y‖), the cosine distance value cos(x, y) between the i-vector corresponding to the first voiceprint feature and the i-vector corresponding to the pre-stored voiceprint feature, where x denotes the i-vector corresponding to the pre-stored voiceprint feature and y denotes the i-vector corresponding to the first voiceprint feature;

Judging whether the cosine distance value meets a preset condition;

If so, determining that the feature distance value between the i-vectors respectively corresponding to the first voiceprint feature and the pre-stored voiceprint feature meets the preset requirement, and otherwise determining that it does not.

3. The voiceprint verification method according to claim 2, characterized in that the step of mapping the voiceprint feature vector corresponding to each frame of voice data to a low-dimensional voiceprint discrimination vector (i-vector) comprises:

Inputting the extracted voiceprint feature vector corresponding to each frame of voice data into a GMM-UBM model to obtain a Gaussian supervector characterizing the probability distribution of each frame of voice data over the Gaussian components;

Calculating, from each Gaussian supervector and the formula m_r = μ + T ω_r, the low-dimensional i-vector corresponding to each frame of voice data, where m_r is the Gaussian supervector of each frame of voice data, μ is the mean supervector of the GMM-UBM model, T is the transition matrix that maps the low-dimensional i-vector into the high-dimensional Gaussian space, and ω_r is the low-dimensional i-vector of each frame of voice data.

4. The voiceprint verification method according to claim 2, characterized in that the step of judging whether the cosine distance value meets the preset condition comprises:

Obtaining the first cosine distance values between the first voiceprint feature and each pre-stored voiceprint feature in the pre-stored voiceprint feature data of multiple people, where the voiceprint feature data of the multiple people includes the pre-stored voiceprint feature of a target person;

Sorting the first cosine distance values in ascending order;

Judging whether the first cosine distance values ranked within a preset number from the top include the first cosine distance value corresponding to the target person's pre-stored voiceprint feature;

If so, determining that the first cosine distance value meets the preset condition, and otherwise determining that it does not.

5. The voiceprint verification method according to claim 2, characterized in that the step of judging whether the cosine distance value meets the preset condition comprises:

Obtaining a second cosine distance value between the target person's pre-stored voiceprint feature and the first voiceprint feature;

Judging whether the second cosine distance value is less than or equal to a preset threshold;

If so, determining that the second cosine distance value meets the preset condition, and otherwise determining that it does not.

6. A voiceprint verification system, characterized by comprising a client, a client server and a voiceprint verification server;

The client acquires the voice signal of an identity to be verified and sends the voice signal to the client server;

The client server receives the voice signal, performs voiceprint feature extraction on the voice signal to obtain a first voiceprint feature, and transmits the first voiceprint feature to the voiceprint verification server;

The voiceprint verification server receives the first voiceprint feature, compares the first voiceprint feature with a pre-stored voiceprint feature to judge whether the first voiceprint feature and the pre-stored voiceprint feature are the same, and feeds the judgment result back to the client server;

7. The voiceprint verification system according to claim 6, characterized in that the continuous analog signal of the voice signal is sampled by the client at a specified sampling period to form a discrete analog signal, which is quantized into a digital signal according to a specified coding rule; and the process in which the client server receives the voice signal and performs voiceprint feature extraction on the voice signal to obtain the first voiceprint feature comprises:

The client server performs pre-emphasis on the digital signal and then divides the pre-emphasized digital signal into frames to obtain each frame of voice data;

Each frame of voice data is mapped from the linear spectrum domain to the Mel spectrum domain according to f_Mel = 2595 · log10(1 + f / 700), where f_Mel denotes the Mel spectrum value and f denotes the linear spectrum value;

Each frame of voice data converted to the Mel spectrum domain is input to a bank of Mel triangular filters, and the logarithmic energy output by the Mel triangular filter of each frequency band is calculated to obtain the logarithmic energy sequence corresponding to each frame of voice data;

Each logarithmic energy sequence is subjected to a discrete cosine transform to obtain the MFCC-type voiceprint feature corresponding to each frame of voice data;

The MFCC-type voiceprint features are constructed into a voiceprint feature vector corresponding to each frame of voice data, so as to form the first voiceprint feature.

8. The voiceprint verification system according to claim 6, characterized in that, where the judgment result is that the first voiceprint feature and the pre-stored voiceprint feature are not the same, the process by which the client server controls the client to give a feedback response according to the judgment result comprises:

Judging, according to the first voiceprint feature, whether the number of times feedback information indicating unsuccessful identity verification has been generated within a preset time exceeds a preset number of times.

9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 5.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 5.