CN106098068A

CN106098068A - Voiceprint recognition method and device

Info

Publication number: CN106098068A
Application number: CN201610416650.3A
Authority: CN
Inventors: 李为; 钱柄桦; 金星明; 李科; 吴富章; 吴永坚; 黄飞跃
Original assignee: Tencent Technology Shenzhen Co Ltd
Current assignee: Tencent Technology Shenzhen Co Ltd; Tencent Cloud Computing Beijing Co Ltd
Priority date: 2016-06-12
Filing date: 2016-06-12
Publication date: 2016-11-09
Anticipated expiration: 2036-06-12
Also published as: CN106098068B; WO2017215558A1

Abstract

Embodiments of the invention disclose a voiceprint recognition method and device. The method comprises: obtaining verification voice information produced when a verification user reads a first character string aloud; performing speech recognition on the verification voice information to obtain the voice segments in it that correspond respectively to multiple characters in the first character string; extracting the voiceprint feature of the voice segment corresponding to each character; training, according to the voiceprint feature of the voice segment corresponding to each character and in combination with the preset universal background model of the respective character, the feature vector corresponding to each character in the verification voice information; and calculating the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in preset registration voice information, and, if the similarity score reaches a preset verification threshold, determining the verification user to be the registered user corresponding to the registration voice information. The invention can effectively improve voiceprint recognition accuracy.

Description

Voiceprint recognition method and device

Technical field

The present invention relates to the field of voice recognition technology, and in particular to a voiceprint recognition method and device.

Background

Voiceprint recognition, as a method of biometric identification, comprises two stages: user registration and user identity recognition. The registration stage maps speech to a user model through a series of processing steps. In the recognition stage, a segment of speech of unknown identity is matched against the model for similarity, and a judgment is then made as to whether the identity of the unknown speech is consistent with the identity of the registration speech. Existing voiceprint modeling methods generally describe the speaker's identity characteristics by text-independent modeling, but text-independent modeling has low recognition accuracy when the user reads different content aloud and has difficulty meeting requirements.

Summary of the invention

In view of this, embodiments of the present invention provide a voiceprint recognition method and device that can effectively improve voiceprint recognition accuracy.

To solve the above technical problem, an embodiment of the present invention provides a voiceprint recognition method, the method comprising:

Obtaining verification voice information produced when a verification user reads a first character string aloud;

Performing speech recognition on the verification voice information to obtain the voice segments contained in the verification voice information that correspond respectively to multiple characters in the first character string;

Extracting the voiceprint feature of the voice segment corresponding to each character;

Training, according to the voiceprint feature of the voice segment corresponding to each character and in combination with the preset universal background model corresponding to the respective character, the feature vector corresponding to each character in the verification voice information;

Calculating the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in preset registration voice information, and, if the similarity score reaches a preset verification threshold, determining the verification user to be the registered user corresponding to the registration voice information.

Correspondingly, an embodiment of the present invention further provides a voiceprint recognition device, the device comprising:

A voice acquisition module, configured to obtain verification voice information produced when a verification user reads a first character string aloud;

A voice segment identification module, configured to perform speech recognition on the verification voice information to obtain the voice segments contained in the verification voice information that correspond respectively to multiple characters in the first character string;

A voiceprint feature extraction module, configured to extract the voiceprint feature of the voice segment corresponding to each character in the verification voice information;

A feature model training module, configured to train, according to the voiceprint feature of the voice segment corresponding to each character and in combination with the preset universal background model corresponding to the respective character, the feature vector corresponding to each character in the verification voice information;

A similarity judgment module, configured to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in preset registration voice information;

A user identification module, configured to determine the verification user to be the registered user corresponding to the registration voice information if the similarity score reaches a preset verification threshold.

In this embodiment, the voiceprint feature of the voice segment corresponding to each character in the verification user's verification voice information is obtained, the feature vector corresponding to each character in the verification voice information is trained in combination with the preset UBM of the respective character, and the feature vector corresponding to each character in the verification voice information is compared for similarity with the feature vector of the respective character in the registration voice information, thereby determining the identity of the verification user. Because the user feature vectors being compared correspond to specific characters, this approach fully takes into account the voiceprint features of the user reading different characters aloud and can therefore effectively improve voiceprint recognition accuracy.

Brief description of the drawings

To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.

Fig. 1 is a schematic overview of the stages of the voiceprint recognition method in an embodiment of the present invention;

Fig. 2 is a schematic flowchart of a voiceprint recognition method in an embodiment of the present invention;

Fig. 3 is a schematic diagram of the principle of identifying, from voice information, the voice segments corresponding to multiple characters in an embodiment of the present invention;

Fig. 4 is a schematic diagram of the principle of obtaining, from voice information, the feature vector corresponding to each character in an embodiment of the present invention;

Fig. 5 is a schematic flowchart of voiceprint registration of a registered user in an embodiment of the present invention;

Fig. 6 is a schematic flowchart of a voiceprint recognition method in another embodiment of the present invention;

Fig. 7 is a schematic structural diagram of a voiceprint recognition device in an embodiment of the present invention;

Fig. 8 is a schematic structural diagram of the voice segment identification module in an embodiment of the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings in the embodiments. Evidently, the described embodiments are only some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.

Embodiments of the present invention provide a voiceprint recognition method and device. The method and device can be applied to any scenario or equipment that needs to identify a user of unknown identity. The characters in the character string used for voiceprint recognition may be Arabic numerals, English letters, characters of other languages, and so on. To simplify the description, the embodiments of the present invention use Arabic numerals as an example.

The voiceprint recognition method in the embodiments of the present invention can be divided into two stages, as shown in Fig. 1:

1) Voiceprint registration stage of the registered user

In the voiceprint registration stage, a registered user reads a registration character string aloud (the second character string referred to below). The voiceprint recognition device collects the registration voice information produced when the registered user reads the registration character string aloud, and then performs speech recognition on the registration voice information to obtain the voice segments it contains that correspond respectively to multiple characters in the registration character string. The device then performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character: according to the voiceprint feature of the voice segment corresponding to each character, and in combination with the preset universal background model (Universal Background Model, UBM, i.e. GMM-UBM) corresponding to the respective character, it trains the feature vector corresponding to each character in the registration voice information. The voiceprint recognition device then saves, for each registered user, the feature vectors corresponding to the multiple characters in the registration voice information read aloud in the voiceprint registration stage into the model library of the voiceprint recognition device.

For example, if the registration character string is the digit string 0185851, which contains the four digits "0", "1", "5" and "8", the voiceprint recognition device performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character in the registration voice information, obtains the voiceprint features of the voice segments corresponding to "0", "1", "5" and "8", and then, in combination with the preset UBM of the respective character, trains the feature vector corresponding to each character in the registration voice information, namely the feature vectors corresponding to the digits "0", "1", "5" and "8".
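The model library described above can be pictured as a mapping from registered user to character to feature vector. The following is only a minimal sketch under assumptions: the names `model_library` and `register`, the user id `user_A`, and the placeholder vectors are all hypothetical and are not specified by the embodiment.

```python
from collections import defaultdict

# Hypothetical in-memory model library: user id -> character -> feature vector.
model_library = defaultdict(dict)

def register(user_id, char_vectors):
    """Store one feature vector per distinct character for a registered user."""
    for char, vector in char_vectors.items():
        model_library[user_id][char] = list(vector)

# Registration string from the example; only its distinct characters get a vector.
registration_string = "0185851"
distinct_chars = sorted(set(registration_string))

# Placeholder vectors (made-up numbers) standing in for trained feature vectors.
placeholder = {c: [float(int(c)), 1.0] for c in distinct_chars}
register("user_A", placeholder)
```

With this layout, looking up the vector for digit "5" of a given user is a plain dictionary access, which is convenient when the verification stage compares vectors character by character.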

2) Identity recognition stage of the verification user

In the identity recognition stage, the verification user, that is, a user of unknown identity, reads a verification character string aloud (the first character string referred to below; the second character string and the first character string have at least one identical character). The voiceprint recognition device collects the verification voice information produced when the verification user reads the verification character string aloud, and then performs speech recognition on the verification voice information to obtain the voice segments it contains that correspond respectively to multiple characters in the verification character string. The device then performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character: according to the voiceprint feature of the voice segment corresponding to each character, and in combination with the preset UBM corresponding to the respective character, it trains the feature vector corresponding to each character in the verification voice information. Finally, it calculates the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information; if the similarity score reaches a preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information.

For example, if the verification character string is the digit string 85851510, the voiceprint recognition device performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character in the verification voice information produced when the verification user reads aloud, and obtains the GMMs corresponding to "0", "1", "5" and "8". In combination with the preset UBM of the respective character, it can then calculate the feature vectors of the verification user's verification voice information, namely the feature vectors corresponding to the digits "0", "1", "5" and "8". It then calculates the similarity scores between the feature vectors corresponding to "0", "1", "5" and "8" in the verification voice information and the feature vectors corresponding to "0", "1", "5" and "8" in the registration voice information, respectively; if the similarity score reaches the preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information.

It should be noted that the voiceprint registration stage of the registered user and the identity recognition stage of the verification user may be implemented in the same equipment or device, or in different equipment or devices. For example, the voiceprint registration stage of the registered user is implemented in a first device, the first device then sends the feature vectors corresponding to the multiple characters in the registration voice information to a second device, and the identity recognition stage of the verification user is implemented in the second device.

The two processes are described in detail below through specific embodiments.

Fig. 2 is a schematic flowchart of a voiceprint recognition method in an embodiment of the present invention. As shown in the figure, the voiceprint recognition flow in this embodiment may include:

S201: obtain verification voice information produced when a verification user reads a first character string aloud.

The verification user is a user of unknown identity whose identity needs to be verified by the voiceprint recognition device. The first character string is the character string used for authenticating the verification user. It may be randomly generated, or it may be a preset fixed character string, for example a character string that is at least partly identical to the second character string corresponding to previously generated registration voice information. Specifically, the character string may contain m characters, of which n are mutually different, where m and n are positive integers and m >= n.

For example, the first character string is "12358948", which has 8 characters in total and includes 7 mutually different characters: "1", "2", "3", "4", "5", "8" and "9".
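The relationship between m and n in the example can be checked with a few lines. This is an illustrative sketch only; the function name `count_characters` is hypothetical.

```python
def count_characters(s):
    """Return (m, n, distinct): total length, number of distinct characters, and the sorted distinct characters."""
    distinct = sorted(set(s))
    return len(s), len(distinct), distinct

# The first character string from the example above.
m, n, distinct = count_characters("12358948")
```

Here m is 8 and n is 7 because the character "8" occurs twice, so m >= n holds as required.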

In an optional embodiment, the voiceprint recognition device may generate and display the first character string so that the verification user reads it aloud from the display.

S202: perform speech recognition on the verification voice information to obtain the voice segments contained in the verification voice information that correspond respectively to multiple characters in the first character string.

As shown in Fig. 3, the voiceprint recognition device may divide the verification voice information into the voice segments corresponding to multiple characters by means of speech recognition and sound-intensity filtering. Optionally, invalid voice segments may also be discarded so that they do not take part in subsequent processing.
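The embodiment does not say how sound-intensity filtering is performed. As one possible illustration, the sketch below assumes that speech recognition has already produced (character, samples) pairs and discards segments whose root-mean-square amplitude falls below a threshold; the RMS criterion, the threshold value, and the names `rms` and `filter_segments` are assumptions, not part of the source.

```python
import math

def rms(samples):
    """Root-mean-square amplitude of a list of audio samples."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def filter_segments(segments, min_rms):
    """Keep only (char, samples) pairs whose RMS intensity reaches min_rms."""
    return [(c, s) for c, s in segments if rms(s) >= min_rms]

# Made-up segments: the middle one is nearly silent and is treated as invalid.
segments = [
    ("1", [0.5, -0.5, 0.5]),
    ("2", [0.001, -0.001]),
    ("3", [0.3, 0.3]),
]
valid = filter_segments(segments, min_rms=0.05)
kept_chars = [c for c, _ in valid]
```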

S203: extract the voiceprint feature of the voice segment corresponding to each character.

Specifically, the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive) coefficients of the voice segment corresponding to each character as the voiceprint feature of that voice segment.

S204: according to the voiceprint feature of the voice segment corresponding to each character, and in combination with the preset universal background model corresponding to the respective character, train the feature vector corresponding to each character in the verification voice information.

The universal background model (UBM) in the embodiments of the present invention is a Gaussian mixture model trained jointly on speech segments of one specific digit from a large number of speakers. It characterizes the distribution of the speech of the corresponding digit in feature space. Because the training data comes from a large number of speakers, it does not characterize any specific speaker and is identity-independent, so it can be regarded as a universal background model. Illustratively, the UBM may be trained with speech samples from more than 1000 speakers with a total duration of more than 20 hours, in which the frequency of occurrence of each character is relatively balanced. The mathematical expression of the UBM is:

P(x) = \sum_{i=1}^{C} a_i \, N(x \mid \mu_i, \Sigma_i) \qquad \text{formula (1)}

where P(x) denotes the probability distribution of the UBM, C denotes the number of Gaussian components in the UBM, ∑ denotes summation over the components, a_i denotes the weight of the i-th Gaussian component, μ_i denotes the mean of the i-th Gaussian component, Σ_i denotes the variance of the i-th Gaussian component, N(x | μ_i, Σ_i) denotes the Gaussian distribution, and x denotes a sample, that is, the input voiceprint feature.
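As an illustration of formula (1), the sketch below evaluates the mixture density at one feature vector. It assumes diagonal variances for simplicity (the source only says "variance"), and the function names and the toy parameters are hypothetical.

```python
import math

def gaussian_diag(x, mean, var):
    """Density N(x | mean, diag(var)) of a Gaussian with diagonal covariance."""
    log_p = 0.0
    for xd, md, vd in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return math.exp(log_p)

def ubm_density(x, weights, means, variances):
    """P(x) = sum_i a_i * N(x | mu_i, Sigma_i), as in formula (1)."""
    return sum(a * gaussian_diag(x, mu, var)
               for a, mu, var in zip(weights, means, variances))

# Toy UBM with C = 2 one-dimensional components (made-up parameters).
weights = [0.5, 0.5]
means = [[0.0], [4.0]]
variances = [[1.0], [1.0]]
p = ubm_density([0.0], weights, means, variances)
```

In a real system x would be an MFCC or PLP frame and C would be much larger; the structure of the computation stays the same.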

The voiceprint recognition device may use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (Maximum A Posteriori, MAP) algorithm to adjust the parameters of the preset universal background model corresponding to the respective character. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the parameters of the preset universal background model of the respective character are adjusted iteratively so that the posterior probability P(x) is maximized, and the feature vector corresponding to the respective character in the verification voice information is then determined from the parameters that maximize P(x).

Because a large number of experiments and papers have demonstrated that the means of the Gaussian components in a UBM can be used to distinguish the identity information of speakers, we define the mean supervector of the UBM as:

\begin{pmatrix} \mu_{1} \\ \mu_{2} \\ \vdots \\ \mu_{C} \end{pmatrix}

Thus, the voiceprint recognition device may use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data and use the maximum a posteriori (MAP) algorithm to adjust the mean supervector of the preset universal background model corresponding to the respective character. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, the mean supervector is adjusted iteratively so that the posterior probability P(x) is maximized, and the mean supervector that maximizes P(x) is taken as the feature vector corresponding to the respective character in the verification voice information.
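The embodiment does not give the update rule used for MAP adjustment. One commonly used form is relevance-factor MAP adaptation of the component means, sketched below as an assumption rather than as the method of the embodiment: each mean is pulled toward the data mean of the frames assigned to it, with strength n_i / (n_i + r). The relevance factor `relevance`, the diagonal variances, and all function names are hypothetical.

```python
import math

def gaussian_diag(x, mean, var):
    """Density of a Gaussian with diagonal covariance at point x."""
    log_p = 0.0
    for xd, md, vd in zip(x, mean, var):
        log_p += -0.5 * (math.log(2.0 * math.pi * vd) + (xd - md) ** 2 / vd)
    return math.exp(log_p)

def map_adapt_means(frames, weights, means, variances, relevance=16.0):
    """One pass of relevance-MAP adaptation of the UBM means (assumed update rule)."""
    C, D = len(weights), len(means[0])
    n = [0.0] * C                          # soft frame counts per component
    first = [[0.0] * D for _ in range(C)]  # responsibility-weighted sums of frames
    for x in frames:
        likes = [a * gaussian_diag(x, mu, var)
                 for a, mu, var in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue  # frame too far from every component to assign
        for i in range(C):
            g = likes[i] / total
            n[i] += g
            for d in range(D):
                first[i][d] += g * x[d]
    adapted = []
    for i in range(C):
        alpha = n[i] / (n[i] + relevance)
        data_mean = [f / n[i] for f in first[i]] if n[i] > 0.0 else means[i]
        adapted.append([alpha * e + (1.0 - alpha) * m
                        for e, m in zip(data_mean, means[i])])
    return adapted

def supervector(means):
    """Stack the component means mu_1 ... mu_C into one mean supervector."""
    return [v for mu in means for v in mu]

# Toy example: 16 identical frames close to the first component.
ubm_means = [[0.0], [10.0]]
adapted = map_adapt_means([[1.0]] * 16, [0.5, 0.5], ubm_means, [[1.0], [1.0]])
sv = supervector(adapted)
```

In this toy case the first mean moves halfway from 0.0 toward the data mean 1.0 (16 frames against a relevance factor of 16), while the second mean stays essentially at 10.0 because almost no frames are assigned to it.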

In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, probabilistic principal component analysis (PPCA) is used to restrict the range of variation of the mean supervector to a subspace. The voiceprint recognition device may use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, adjust the mean supervector of the preset universal background model corresponding to the respective character with the maximum a posteriori algorithm, and combine it with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information. In a specific implementation, the following formula may be used to adjust the mean supervector of the preset universal background model corresponding to the respective character so that the posterior probability of the adjusted universal background model of the respective character is maximized:

M = m + Tω, where M denotes the mean supervector of the universal background model of a given character after adjustment, m denotes the mean supervector of the universal background model of the respective character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the respective character in the verification voice information. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as the input sample, iteratively adjusting ω adjusts the mean supervector in formula (1) so that the posterior probability P(x) is maximized, and the ω that maximizes P(x) is taken as the feature vector corresponding to the respective character in the verification voice information. The supervector subspace matrix T is determined from the correlations between the dimensions of the mean supervector of the Gaussian mixture model.
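The relation M = m + Tω can be made concrete with a small numeric sketch. It only shows the forward computation of M from a given ω; the search for the ω that maximizes the posterior probability is not shown. The function name `adjusted_supervector` and the toy values of m, T and ω are made up for illustration.

```python
def adjusted_supervector(m, T, omega):
    """Return M = m + T * omega, with T given as a list of rows (D rows, R columns)."""
    if len(T) != len(m) or any(len(row) != len(omega) for row in T):
        raise ValueError("shape mismatch between m, T and omega")
    return [mi + sum(t * w for t, w in zip(row, omega))
            for mi, row in zip(m, T)]

# Toy sizes: supervector dimension D = 3, subspace dimension R = 2.
m = [0.0, 10.0, 20.0]          # UBM mean supervector before adjustment
T = [[1.0, 0.0],
     [0.0, 1.0],
     [1.0, 1.0]]               # preset supervector subspace matrix
omega = [0.5, -2.0]            # low-dimensional feature vector
M = adjusted_supervector(m, T, omega)
```

The point of the subspace is that only the R entries of ω are free, so the adjustment explores far fewer parameters than the full D-dimensional supervector.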

S205: calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information; if the similarity score reaches a preset verification threshold, determine the verification user to be the registered user corresponding to the registration voice information.

Specifically, the voiceprint recognition device may obtain the registration voice information of the registered user in the voiceprint registration stage and, through voiceprint feature extraction and voiceprint model training similar to those of this embodiment, obtain the feature vector corresponding to the voice segment of each character in the registration voice information. The registration voice information may be registration voice information, obtained by the voiceprint recognition device, that is produced when the registered user reads a second character string aloud. The second character string and the first character string have at least one identical character; that is, the second character string corresponding to the registration voice information is at least partly identical to the first character string. In an optional embodiment, the voiceprint recognition device may also obtain the feature vectors corresponding to the respective characters in the registration voice information from outside: after the registered user enters the registration voice information through another device, that device or a server obtains, through voiceprint feature extraction and voiceprint model training, the feature vector corresponding to the voice segment of each character in the registration voice information, and the voiceprint recognition device obtains these feature vectors from that device or server so that, in the identity recognition stage of the verification user, they can be compared with the feature vector corresponding to each character in the verification voice information.

In a specific implementation, the similarity score is a score that measures the degree of similarity between the two feature vectors of the same character, obtained by the voiceprint recognition device after it compares the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the respective character in the preset registration voice information. In an optional embodiment, the cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information may be calculated as the similarity score; that is, the similarity score between the feature vector corresponding to a given character in the verification voice information and the corresponding feature vector in the registration voice information is calculated with the following formula:

score = \frac{\omega_{i}(tar)^{T} \, \omega_{i}(test)}{\| \omega_{i}(tar) \| \, \| \omega_{i}(test) \|}

where the subscript i denotes the i-th character common to the verification voice information and the registration voice information, ω_i(tar) denotes the feature vector corresponding to this character in the verification voice information, and ω_i(test) denotes the feature vector corresponding to this character in the registration voice information. If the verification voice information and the registration voice information share multiple identical characters, the similarity scores calculated for the individual characters with the above formula may be averaged; if the mean similarity score of the characters reaches the corresponding preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, such as registered users A, B and C shown in Fig. 1, the similarity between the feature vector of a given character of the verification user and the feature vector of the respective character of each registered user may be calculated; when the similarity score between the feature vector of the respective character of a registered user and the feature vector of this character in the verification speech is the highest and reaches the preset verification threshold, that registered user is taken as the identity recognition result for the verification user.
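The scoring and decision logic described above can be sketched as follows. The score is the normalized dot product of the two vectors, per-character scores are averaged over the shared characters, and the best-scoring registered user is accepted only if the mean reaches the threshold. The function names, the toy vectors, and the threshold 0.8 are hypothetical.

```python
import math

def cosine_score(u, v):
    """Normalized dot product of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_score(verify, enrolled):
    """Average the per-character scores over the characters both sides share."""
    shared = sorted(set(verify) & set(enrolled))
    if not shared:
        return None
    return sum(cosine_score(verify[c], enrolled[c]) for c in shared) / len(shared)

def identify(verify, library, threshold):
    """Return (user, score) for the best registered user, or (None, score) if below threshold."""
    best_user, best = None, None
    for user, enrolled in library.items():
        s = mean_score(verify, enrolled)
        if s is not None and (best is None or s > best):
            best_user, best = user, s
    if best is not None and best >= threshold:
        return best_user, best
    return None, best

# Toy library with two registered users and per-character vectors (made-up numbers).
library = {
    "A": {"0": [1.0, 0.0], "1": [0.0, 1.0]},
    "B": {"0": [0.0, 1.0], "1": [1.0, 0.0]},
}
verify = {"0": [2.0, 0.0], "1": [0.0, 3.0], "7": [1.0, 1.0]}
user, score = identify(verify, library, threshold=0.8)
```

Character "7" is ignored here because it has no counterpart in the registration data, which mirrors the requirement that only characters common to both character strings are compared.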

In an optional embodiment, if the same character occurs more than once in the verification voice information, for example 1, 0, 5 and 8 each occur twice in the verification voice information shown in Fig. 2, then the feature vectors obtained by processing the two voice segments corresponding to character 0 may each be scored against the feature vector of character 0 in the preset registration voice information, and the mean of these similarity scores is taken as the similarity score between the feature vector of character 0 in the verification voice information and the feature vector of character 0 in the preset registration voice information, and so on for the other characters.
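Averaging over repeated occurrences of one character can be sketched as below. The function names and the toy vectors are hypothetical, and the score function is the normalized dot product used as the similarity score in this embodiment.

```python
import math

def cosine_score(u, v):
    """Normalized dot product of two feature vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def averaged_char_score(occurrence_vectors, enrolled_vector):
    """Mean score of every occurrence of a character against its registered vector."""
    scores = [cosine_score(v, enrolled_vector) for v in occurrence_vectors]
    return sum(scores) / len(scores)

# Character "0" was spoken twice during verification (made-up vectors).
occurrences_of_0 = [[1.0, 0.0], [0.0, 1.0]]
enrolled_0 = [1.0, 0.0]
score_0 = averaged_char_score(occurrences_of_0, enrolled_0)
```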

It should be noted that there are many other ways to measure the similarity between two feature vectors; the above is only one embodiment provided by the present invention. On the basis of the disclosed solution, a person skilled in the art can, without creative effort, obtain further ways of calculating the similarity score between the feature vectors of the characters common to the verification voice information and the registration voice information, and the present invention does not need to enumerate them exhaustively.

Thus, in this embodiment, the voiceprint feature of the voice segment corresponding to each character in the verification user's verification voice information is obtained, the feature vector corresponding to each character in the verification voice information is trained in combination with the preset UBM of the respective character, and the feature vector corresponding to each character in the verification voice information is compared for similarity with the feature vector of the respective character in the registration voice information, thereby determining the identity of the verification user. Because the user feature vectors being compared correspond to specific characters, this approach fully takes into account the voiceprint features of the user reading different characters aloud and can therefore effectively improve voiceprint recognition accuracy.

Fig. 5 is the voiceprint registration schematic flow sheet registering user in the embodiment of the present invention, as shown in the figure in the present embodiment Voiceprint registration flow process may include that

S501, obtains registration user and reads aloud the second character string produced registration voice messaging, described second character string with Described first character string has at least one identical character.

Described registration user i.e. determines the user of legal identity, and described second character string is for gathering registration user's vocal print The character string of characteristic vector, can be randomly generated, it is also possible to be to preset a fixing character string.Concrete, described the Two character strings also can comprise m character, wherein has n mutually different character, and m, n are positive integer, and m >=n.

In an alternative embodiment, voice print identification device can generate and show described second character string, allows and registers user's root Read aloud according to described second character string of display.

S502, to described registration voice messaging carry out speech recognition obtain described registration voice messaging in comprise respectively with The corresponding sound bite of multiple characters in described second character string；

Voice print identification device can pass through speech recognition and intensity of sound filters, and divides described checking voice messaging To the corresponding sound bite of multiple characters, optionally invalid voice fragment can also be weeded out, be not involved in follow-up process Journey.

S503, extract the voiceprint feature of the speech segment corresponding to each character in the registration voice information.

S504, according to the voiceprint feature of the speech segment corresponding to each character in the registration voice information, train in combination with the preset universal background model (UBM) corresponding to the respective character to obtain the feature vector corresponding to each character in the registration voice information.

For the expression of the UBM, refer to the embodiments above. This step of the voiceprint registration flow is similar to S204 of the voiceprint recognition flow. The voiceprint recognition device may use the voiceprint feature of the speech segment corresponding to each character in the registration voice information as training sample data, and use the maximum a posteriori (MAP) algorithm to adjust the parameters of the preset UBM corresponding to the respective character. That is, after the voiceprint feature of the speech segment corresponding to each character in the registration voice information is substituted into formula (1) as an input sample, the parameters of the preset UBM corresponding to the respective character are adjusted iteratively until the posterior probability P(x) is maximized, and the feature vector corresponding to the respective character in the registration voice information is determined from the parameters that maximize P(x).

Moreover, since the means of the Gaussian components in the UBM can be used to distinguish speaker identity, the voiceprint recognition device may use the voiceprint feature of the speech segment corresponding to each character in the registration voice information as training sample data, and use the MAP algorithm to adjust the mean supervector of the preset UBM corresponding to the respective character. That is, after the voiceprint feature of the speech segment corresponding to each character in the registration voice information is substituted into formula (1) as an input sample, the mean supervector is adjusted iteratively until the posterior probability P(x) is maximized, and the mean supervector that maximizes P(x) is taken as the feature vector corresponding to the respective character in the registration voice information.
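The mean-supervector adjustment described above can be illustrated with a minimal sketch. It assumes the widely used relevance-factor form of MAP mean adaptation for a diagonal-covariance Gaussian mixture; formula (1) is not reproduced in this section, so the function names, the relevance factor `r` and the data layout are hypothetical and are not taken from the patent.

```python
import math

def gaussian_pdf(x, mean, var):
    """Density of a diagonal-covariance Gaussian evaluated at frame x."""
    log_p = 0.0
    for xi, mi, vi in zip(x, mean, var):
        log_p += -0.5 * (math.log(2 * math.pi * vi) + (xi - mi) ** 2 / vi)
    return math.exp(log_p)

def map_adapt_means(frames, weights, means, variances, r=16.0):
    """Adapt the UBM means of one character to that character's frames.

    Assumption: relevance-factor MAP adaptation of the means only.
    Returns the adapted means concatenated into one mean supervector.
    """
    C, D = len(means), len(means[0])
    n = [0.0] * C                        # soft frame count per component
    s = [[0.0] * D for _ in range(C)]    # posterior-weighted sums of frames
    for x in frames:
        likes = [w * gaussian_pdf(x, m, v)
                 for w, m, v in zip(weights, means, variances)]
        total = sum(likes)
        if total == 0.0:
            continue                     # frame too far from every component
        for c in range(C):
            g = likes[c] / total         # posterior of component c for x
            n[c] += g
            for d in range(D):
                s[c][d] += g * x[d]
    supervector = []
    for c in range(C):
        alpha = n[c] / (n[c] + r)        # data-dependent adaptation weight
        for d in range(D):
            e = s[c][d] / n[c] if n[c] > 0 else means[c][d]
            supervector.append(alpha * e + (1 - alpha) * means[c][d])
    return supervector
```

In this sketch, components that see little data stay close to the UBM means, while well-observed components move toward the speaker's data; one such supervector would be produced per character.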

In another optional embodiment, the following formula may be used to adjust the mean supervector of the preset universal background model corresponding to the respective character, so that the posterior probability of the adjusted universal background model corresponding to that character is maximized:

M = m + Tω, where M denotes the mean supervector of the universal background model of a given character after adjustment, m denotes the mean supervector of the universal background model of that character before adjustment, T is a preset supervector subspace matrix, and ω is the feature vector corresponding to the respective character in the registration voice information. That is, after the voiceprint feature of the speech segment corresponding to each character in the registration voice information is substituted into formula (1) as an input sample, iteratively adjusting ω adjusts the mean supervector in formula (1) until the posterior probability P(x) is maximized, and the ω that maximizes P(x) is taken as the feature vector corresponding to the respective character in the registration voice information.
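As a small illustration of the relation M = m + Tω, the sketch below composes an adjusted mean supervector from a low-dimensional ω. It only shows the forward mapping; the search for the ω that maximizes P(x) is not shown, and the function name and the plain-list representation are assumptions made for this example.

```python
def adapted_supervector(m, T, omega):
    """Return M = m + T * omega using plain Python lists.

    m:     UBM mean supervector before adjustment (length K)
    T:     supervector subspace matrix as K rows of length R
    omega: low-dimensional feature vector (length R)
    """
    if len(T) != len(m) or any(len(row) != len(omega) for row in T):
        raise ValueError("dimension mismatch between m, T and omega")
    return [mi + sum(t * w for t, w in zip(row, omega))
            for mi, row in zip(m, T)]
```

Because ω has far fewer entries than M, adjusting ω instead of M directly reduces the number of free parameters that have to be estimated from a short speech segment.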

Fig. 6 is a schematic flowchart of a voiceprint recognition method in another embodiment of the present invention. As shown in the figure, the voiceprint recognition method in this embodiment may include the following flow:

S601, randomly generate a first character string and display it.

S602, obtain verification voice information produced by the verification user reading the first character string aloud.

S603, identify the valid speech segments and the invalid speech segments in the verification voice information.

Specifically, the verification voice may be divided according to sound intensity, and speech segments with low sound intensity are regarded as invalid speech segments (including, for example, silent sections and impulse noise).
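The intensity-based division described above can be sketched as a simple frame-energy threshold. The frame length, the threshold value and the function name below are hypothetical choices for illustration; the patent does not specify them.

```python
def split_by_intensity(samples, frame_len=160, threshold=0.01):
    """Return (start, end) sample ranges whose frame energy reaches threshold.

    Assumption: intensity is measured as the mean squared amplitude of
    fixed-length frames; frames below the threshold count as invalid.
    """
    segments = []
    run_start = None
    n_frames = len(samples) // frame_len
    # One extra iteration acts as a sentinel that closes a trailing run.
    for i in range(n_frames + 1):
        loud = False
        if i < n_frames:
            frame = samples[i * frame_len:(i + 1) * frame_len]
            energy = sum(v * v for v in frame) / frame_len
            loud = energy >= threshold
        if loud and run_start is None:
            run_start = i
        elif not loud and run_start is not None:
            segments.append((run_start * frame_len, i * frame_len))
            run_start = None
    return segments
```

The returned ranges are the candidate valid speech segments; everything outside them would be treated as invalid and excluded from later processing.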

S604, perform speech recognition on the valid speech segments to obtain the speech segments respectively corresponding to multiple characters in the first character string.

The speech segments respectively corresponding to multiple characters in the first character string may be obtained through speech recognition.

S605, determine that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.

To effectively prevent the voice information of a registered user from being illegally recorded or copied and then used for voiceprint recognition, a different first character string may be randomly generated each time. In this step, it is judged whether the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string. If it is not, voiceprint recognition may be determined to have failed; if it is, the subsequent flow is performed.
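The random prompt and the order check can be sketched as follows. The digit alphabet, the prompt length and both function names are assumptions for illustration; the recognized characters are assumed to arrive from the speech recognizer in spoken order.

```python
import secrets

DIGITS = "0123456789"  # hypothetical alphabet; the patent does not fix one

def generate_prompt(length=8, alphabet=DIGITS):
    """Randomly generate the character string the user is asked to read."""
    return "".join(secrets.choice(alphabet) for _ in range(length))

def order_matches(prompt, recognized_chars):
    """True only if the recognized characters follow the prompt's order."""
    return list(recognized_chars) == list(prompt)
```

A replayed recording of an earlier session would, with high probability, contain the characters in a different order from a freshly generated prompt and would therefore fail `order_matches` before any voiceprint comparison takes place.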

S606, extract the voiceprint feature of the speech segment corresponding to each character.

S607, use the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori algorithm to adjust the mean supervector of the preset universal background model corresponding to the respective character, thereby estimating the feature vector corresponding to each character in the verification voice information.

Since a large body of experiments and papers has shown that the means of the Gaussian components in a UBM can be used to distinguish speaker identity, the voiceprint recognition device may use the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori (MAP) algorithm to adjust the mean supervector of the preset universal background model corresponding to the respective character. That is, after the voiceprint feature of the speech segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the mean supervector is adjusted iteratively until the posterior probability P(x) is maximized, and the mean supervector that maximizes P(x) is taken as the feature vector corresponding to the respective character in the verification voice information.

In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, the voiceprint recognition device may use the following formula to adjust the mean supervector of the preset universal background model corresponding to the respective character, so that the posterior probability of the adjusted universal background model corresponding to that character is maximized:

M = m + Tω, where M denotes the mean supervector of the universal background model of a given character after adjustment, m denotes the mean supervector of the universal background model of that character before adjustment, T is a preset supervector subspace matrix, and ω is the feature vector corresponding to the respective character in the verification voice information. That is, after the voiceprint feature of the speech segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, iteratively adjusting ω adjusts the mean supervector in formula (1) until the posterior probability P(x) is maximized, and the ω that maximizes P(x) is taken as the feature vector corresponding to the respective character in the verification voice information.

S608, calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information; if the similarity score reaches a preset verification threshold, determine the verification user to be the registered user corresponding to the registration voice information.

In this embodiment, the voiceprint recognition device may calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information, and use it as the similarity score. That is, the similarity score between the feature vector of a given character in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:

score = \frac{\omega_{i}(tar)^{T} \cdot \omega_{i}(test)}{\|\omega_{i}(tar)\| \cdot \|\omega_{i}(test)\|}

Thus, this embodiment compares the feature vector corresponding to each character in the verification voice information with the feature vector of the respective character in the registration voice information for similarity, and additionally checks the order of the speech segments, which further ensures that the identity of the verification user is determined accurately.
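The cosine similarity score used in S608 can be written as a short function. This is a minimal sketch; the function and parameter names are chosen here for illustration, and feature vectors are assumed to be plain lists of floats.

```python
import math

def cosine_score(w_tar, w_test):
    """Cosine of the angle between two feature vectors of one character."""
    if len(w_tar) != len(w_test):
        raise ValueError("feature vectors must have the same length")
    dot = sum(a * b for a, b in zip(w_tar, w_test))
    norm = (math.sqrt(sum(a * a for a in w_tar))
            * math.sqrt(sum(b * b for b in w_test)))
    if norm == 0.0:
        raise ValueError("zero-length feature vector")
    return dot / norm
```

The score lies between -1 and 1 and does not depend on the magnitude of either vector, so it compares only the direction of the two feature vectors.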

Fig. 7 is a schematic structural diagram of a voiceprint recognition device in an embodiment of the present invention. As shown in the figure, the voiceprint recognition device in this embodiment may include:

Voice acquisition module 710, configured to obtain verification voice information produced by the verification user reading the first character string aloud.

Speech segment recognition module 720, configured to perform speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that respectively correspond to multiple characters in the first character string.

As shown in Fig. 3, the speech segment recognition module 720 may divide the verification voice information into speech segments corresponding to multiple characters by means of speech recognition and sound-intensity filtering. Optionally, invalid speech segments may also be discarded so that they do not take part in subsequent processing.

In an optional embodiment, as shown in Fig. 8, the speech segment recognition module may further include:

Valid segment recognition unit 721, configured to identify the valid speech segments and the invalid speech segments in the verification voice information.

Specifically, the valid segment recognition unit 721 may divide the verification voice according to sound intensity, and regard speech segments with low sound intensity as invalid speech segments (including, for example, silent sections and impulse noise).

Speech recognition unit 722, configured to perform speech recognition on the valid speech segments to obtain the speech segments respectively corresponding to multiple characters in the first character string.

Voiceprint feature extraction module 730, configured to extract the voiceprint feature of the speech segment corresponding to each character in the verification voice information.

Specifically, the voiceprint feature extraction module 730 may extract the MFCC (Mel Frequency Cepstrum Coefficient) or the PLP (Perceptual Linear Predictive coefficient) of the speech segment corresponding to each character as the voiceprint feature of that speech segment.

Feature model training module 740, configured to train, according to the voiceprint feature of the speech segment corresponding to each character and in combination with the preset universal background model corresponding to the respective character, to obtain the feature vector corresponding to each character in the verification voice information.

The feature model training module 740 may use the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori (MAP) algorithm to adjust the parameters of the preset universal background model corresponding to the respective character. That is, after the voiceprint feature of the speech segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the parameters of the preset universal background model corresponding to the respective character are adjusted iteratively until the posterior probability P(x) is maximized, and the feature model training module 740 may then determine the feature vector corresponding to the respective character in the verification voice information from the parameters that maximize P(x).

\begin{pmatrix} \mu_{1} \\ \mu_{2} \\ \vdots \\ \mu_{C} \end{pmatrix}

Thus, the feature model training module 740 may use the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori (MAP) algorithm to adjust the mean supervector of the preset universal background model corresponding to the respective character. That is, after the voiceprint feature of the speech segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the mean supervector is adjusted iteratively until the posterior probability P(x) is maximized, and the feature model training module 740 may take the mean supervector that maximizes P(x) as the feature vector corresponding to the respective character in the verification voice information.

In another optional embodiment, to mitigate the slow convergence caused by the high dimensionality of the supervector, the range of variation of the mean supervector is confined to a subspace by probabilistic principal component analysis (PPCA). The feature model training module 740 may use the voiceprint feature of the speech segment corresponding to each character in the verification voice information as training sample data, use the maximum a posteriori algorithm to adjust the mean supervector of the preset universal background model corresponding to the respective character, and combine this with a preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information. In a specific implementation, the feature model training module 740 may use the following formula to adjust the mean supervector of the preset universal background model corresponding to the respective character, so that the posterior probability of the adjusted universal background model corresponding to that character is maximized:

M = m + Tω, where M denotes the mean supervector of the universal background model of a given character after adjustment, m denotes the mean supervector of the universal background model of that character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the respective character in the verification voice information.

Similarity judgment module 750, configured to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information.

Specifically, the voiceprint recognition device may obtain the registration voice information of the registered user in the voiceprint registration stage, and obtain the feature vector corresponding to the speech segment of each character in the registration voice information by means of the speech segment recognition module 720, the voiceprint feature extraction module 730 and the feature model training module 740. The registration voice information may be registration voice information that the voiceprint recognition device obtains from the registered user reading a second character string aloud, the second character string having at least one character in common with the first character string; that is, the second character string corresponding to the registration voice information is at least partly identical to the first character string. Alternatively, in an optional embodiment, the voiceprint recognition device may obtain the feature vectors corresponding to the respective characters in the registration voice information from outside. That is, after the registered user enters the registration voice information through another device, that device or a server obtains the feature vector corresponding to the speech segment of each character in the registration voice information through voiceprint feature extraction and voiceprint model training, and the voiceprint recognition device obtains these feature vectors from that device or server, so that in the stage of identifying the verification user the similarity judgment module 750 can compare them with the feature vector corresponding to each character in the verification voice information.

In a specific implementation, the similarity score is a value that measures the degree of similarity between the two feature vectors of the same character, obtained after the voiceprint recognition device compares the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the respective character in the preset registration voice information. In an optional embodiment, the similarity judgment module 750 may calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information, and use it as the similarity score. That is, the similarity score between the feature vector of a given character in the verification voice information and its feature vector in the registration voice information is calculated by the following formula:

score = \frac{\omega_{i}(tar)^{T} \cdot \omega_{i}(test)}{\|\omega_{i}(tar)\| \cdot \|\omega_{i}(test)\|}

Here, the subscript i denotes the i-th character common to the verification voice information and the registration voice information, ω_i(tar) denotes the feature vector corresponding to this character in the verification voice information, and ω_i(test) denotes the feature vector corresponding to this character in the registration voice information. In an optional embodiment, the same character may occur more than once in the verification voice information; for example, in the verification voice information shown in Fig. 2, the characters 0, 1, 5 and 8 each occur twice. In that case, the feature vectors obtained from the two speech segments corresponding to character 0 are each scored against the feature vector of character 0 in the preset registration voice information, and the mean of the two similarity scores is taken as the similarity score between the feature vector of character 0 in the verification voice information and the feature vector of character 0 in the preset registration voice information, and so on for the other characters.
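The handling of repeated characters can be sketched as follows: each occurrence is scored against the registered vector of the same character, and the scores are averaged per character. The data layout (a list of pairs for the verification side, a dictionary for the registered side) and the function names are assumptions made for this example.

```python
import math
from collections import defaultdict

def cosine(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a))
                  * math.sqrt(sum(y * y for y in b)))

def per_character_scores(verification, registered):
    """Average the score over every occurrence of a shared character.

    verification: list of (character, vector) pairs in spoken order
    registered:   dict mapping character -> vector (hypothetical layout)
    """
    collected = defaultdict(list)
    for char, vector in verification:
        if char in registered:           # only shared characters are scored
            collected[char].append(cosine(vector, registered[char]))
    return {char: sum(s) / len(s) for char, s in collected.items()}
```

Characters that appear only in the verification string are skipped, since no registered feature vector exists for them to be compared against.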

User identification module 760, configured to determine the verification user to be the registered user corresponding to the registration voice information if the similarity score reaches a preset verification threshold.

If the verification voice information and the registration voice information have multiple characters in common, the user identification module 760 may average the similarity scores of the individual characters calculated by the similarity judgment module 750; if the mean of these similarity scores reaches the corresponding preset verification threshold, the verification user is determined to be the registered user corresponding to the registration voice information. If there are multiple registered users, such as registered users A, B and C shown in Fig. 1, the user identification module 760 may compare the feature vector of a given character of the verification user with the feature vector of the respective character of each registered user. When the feature vector of the respective character of a certain registered user has the highest similarity score with the feature vector of that character in the verification voice, and the similarity reaches the preset verification threshold, that registered user is taken as the identification result for the verification user.
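The decision step can be sketched as below, combining the two rules described above: per-character scores are averaged for each registered user, and the best-scoring user is accepted only if that mean reaches the threshold. Combining the rules in this way, the threshold value and the function names are assumptions for illustration.

```python
def mean_score(char_scores):
    """Mean of the per-character similarity scores of one registered user."""
    return sum(char_scores.values()) / len(char_scores)

def identify(scores_by_user, threshold):
    """Return the best-matching registered user, or None if none qualifies.

    scores_by_user: dict mapping user -> {character: similarity score}
    threshold:      preset verification threshold (hypothetical value)
    """
    best_user, best = None, None
    for user, char_scores in scores_by_user.items():
        if not char_scores:
            continue                     # no shared characters to compare
        avg = mean_score(char_scores)
        if best is None or avg > best:
            best_user, best = user, avg
    if best is not None and best >= threshold:
        return best_user
    return None
```

With a single registered user this reduces to plain verification: the user is accepted when the mean score reaches the threshold and rejected otherwise.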

Further, in an optional embodiment, the voice acquisition module 710 is also configured to obtain registration voice information produced by a registered user reading a second character string aloud, the second character string having at least one character in common with the first character string;

The speech segment recognition module 720 is also configured to perform speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that respectively correspond to multiple characters in the second character string;

The voiceprint feature extraction module 730 is also configured to extract the voiceprint feature of the speech segment corresponding to each character in the registration voice information;

The feature model training module 740 is also configured to train, according to the voiceprint feature of the speech segment corresponding to each character in the registration voice information and in combination with the preset universal background model corresponding to the respective character, to obtain the feature vector corresponding to each character in the registration voice information.

In an optional embodiment, the voiceprint recognition device may further include:

Character order determination module 770, configured to determine that the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.

To effectively prevent the voice information of a registered user from being illegally recorded or copied and then used for voiceprint recognition, a different first character string may be randomly generated each time, and this module judges whether the order of the speech segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string. If it is not, voiceprint recognition may be determined to have failed; if it is, the voiceprint feature extraction module 730 or the feature model training module 740 may be notified to perform feature extraction and voiceprint training on the verification voice information.

Character string display module 700, configured to randomly generate the first character string and display it.

In an actual test case with training samples from 1,000 people and 290,000 tests (about 10,000 identity-matching tests and about 280,000 non-matching tests), a recall rate of 79.8% was achieved at an error rate of one in a thousand, and the equal error rate (EER) was 3.39%. Compared with the traditional text-independent modeling method, voiceprint recognition performance improved by more than 40%.

A person of ordinary skill in the art will understand that all or part of the flows in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the flows of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.

What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the rights of the present invention. Equivalent changes made in accordance with the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims

1. A voiceprint recognition method, characterized in that the method comprises:

performing speech recognition on the verification voice information to obtain the speech segments, contained in the verification voice information, that respectively correspond to multiple characters in the first character string;

according to the voiceprint feature of the speech segment corresponding to each character, training in combination with the preset universal background model corresponding to the respective character to obtain the feature vector corresponding to each character in the verification voice information;

calculating a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in preset registration voice information, and, if the similarity score reaches a preset verification threshold, determining the verification user to be the registered user corresponding to the registration voice information.

2. The voiceprint recognition method of claim 1, characterized in that, before the obtaining of the verification voice information produced by the verification user reading the first character string aloud, the method further comprises:

obtaining registration voice information produced by a registered user reading a second character string aloud, the second character string having at least one character in common with the first character string;

performing speech recognition on the registration voice information to obtain the speech segments, contained in the registration voice information, that respectively correspond to multiple characters in the second character string;

extracting the voiceprint feature of the speech segment corresponding to each character in the registration voice information;

according to the voiceprint feature of the speech segment corresponding to each character in the registration voice information, training in combination with the preset universal background model corresponding to the respective character to obtain the feature vector corresponding to each character in the registration voice information.

3. method for recognizing sound-groove as claimed in claim 1, it is characterised in that the corresponding voice of each character described in described basis The vocal print feature of fragment, is verified in voice messaging each in conjunction with the respective symbols corresponding universal background model training preset Character characteristic of correspondence vector includes:

Using the vocal print feature of the corresponding sound bite of each character in checking voice messaging as training sample data, use maximum The average super vector to the default corresponding universal background model of respective symbols for the posterior probability algorithm is adjusted, thus estimates Each character characteristic of correspondence vector in checking voice messaging.

4. method for recognizing sound-groove as claimed in claim 3, it is characterised in that described by each character pair in checking voice messaging The vocal print feature of the sound bite answered, as training sample data, uses maximal posterior probability algorithm to default respective symbols pair The average super vector of the universal background model answered is adjusted, thus it is corresponding to estimate to be verified in voice messaging each character Characteristic vector includes:

Using the vocal print feature of the corresponding sound bite of each character in checking voice messaging as training sample data, use maximum The average super vector to the default corresponding universal background model of respective symbols for the posterior probability algorithm is adjusted, and combines default Super vector subspace matrices thus be verified in voice messaging each character characteristic of correspondence vector.

5. method for recognizing sound-groove as claimed in claim 4, it is characterised in that described by each character pair in checking voice messaging The vocal print feature of the sound bite answered, as training sample data, uses maximal posterior probability algorithm to default respective symbols pair The average super vector of the universal background model answered is adjusted, and combine preset super vector subspace matrices thus be verified In voice messaging, each character characteristic of correspondence vector includes:

Using the vocal print feature of the corresponding sound bite of each character in checking voice messaging as training sample data, use following formula The average super vector of the default corresponding universal background model of respective symbols is adjusted so that the respective symbols pair after adjustment The posterior probability of the universal background model answered is maximum:

M=m+T ω, wherein M represents the average super vector of the universal background model of certain character after adjusting, and m represents before adjusting The average super vector of universal background model of respective symbols, T is the super vector subspace matrices preset, and ω is checking voice Respective symbols characteristic of correspondence vector in information.

6. method for recognizing sound-groove as claimed in claim 4, it is characterised in that described super vector subspace matrices is for according to described In universal background model each Gauss module weight between correlation determine and obtain.

7. method for recognizing sound-groove as claimed in claim 1, it is characterised in that each character in described calculating checking voice messaging Characteristic of correspondence vector includes with the similarity score of respective symbols characteristic of correspondence vector in the registration voice messaging preset:

Calculate each character characteristic of correspondence vector and respective symbols pair in the registration voice messaging preset in checking voice messaging COS distance value between the characteristic vector answered is as described similarity score.

8. method for recognizing sound-groove as claimed in claim 1, it is characterised in that described voice is carried out to described checking voice messaging Identify obtain described checking voice messaging in comprise respectively with the corresponding voice sheet of multiple characters in described first character string Section includes:

Identify the efficient voice fragment in described checking voice messaging and invalid voice fragment；

Carry out speech recognition to described efficient voice fragment and obtain corresponding with the multiple characters in described first character string respectively Sound bite.

9. method for recognizing sound-groove as claimed in claim 1, it is characterised in that described described checking user is defined as described note Also include before volume voice messaging corresponding registration user:

Determine that the sequence of the sound bite of the described multiple characters verified in voice messaging is corresponding to described first character string The sequence of character is consistent.

10. The voiceprint recognition method as claimed in any one of claims 1-9, characterized in that, before acquiring the verification voice information produced by the verifying user reading aloud the first character string, the method further comprises:

randomly generating and displaying the first character string.
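A minimal sketch of random generation of the first character string follows. The claim fixes neither the alphabet nor the length, so the decimal digits, the default length of 8, and the name `generate_first_string` are assumptions of this example. Python's `secrets` module is used so that the prompt is hard to predict.

```python
import secrets
import string


def generate_first_string(length=8, alphabet=string.digits):
    """Return a random prompt string for the user to read aloud.

    The default length and digit alphabet are assumptions of this sketch.
    """
    if length <= 0 or not alphabet:
        raise ValueError("length must be positive and alphabet non-empty")
    return "".join(secrets.choice(alphabet) for _ in range(length))


prompt = generate_first_string()
print(prompt)  # display the string to the verifying user
```

Because the string changes on every attempt, a previously recorded utterance is unlikely to match the new prompt.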

11. A voiceprint recognition device, characterized in that the device comprises:

a voice segment recognition module, configured to perform speech recognition on the verification voice information to obtain the voice segments contained in the verification voice information that respectively correspond to the multiple characters in the first character string;

a voiceprint feature extraction module, configured to extract the voiceprint feature of the voice segment corresponding to each character in the verification voice information;

a feature model training module, configured to obtain, by training, the feature vector corresponding to each character in the verification voice information according to the voiceprint feature of the voice segment corresponding to each character, in combination with the preset universal background model corresponding to the respective character;

a similarity judgment module, configured to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the respective character in the preset registration voice information;

a user identification module, configured to determine the verifying user to be the registered user corresponding to the registration voice information if the similarity score reaches a preset verification threshold.

12. The voiceprint recognition device as claimed in claim 11, characterized in that:

the voice acquisition module is further configured to acquire registration voice information produced by a registered user reading aloud a second character string, the second character string having at least one character in common with the first character string;

the voice segment recognition module is further configured to perform speech recognition on the registration voice information to obtain the voice segments contained in the registration voice information that respectively correspond to the multiple characters in the second character string;

the voiceprint feature extraction module is further configured to extract the voiceprint feature of the voice segment corresponding to each character in the registration voice information;

the feature model training module is further configured to obtain, by training, the feature vector corresponding to each character in the registration voice information according to the voiceprint feature of the voice segment corresponding to each character in the registration voice information, in combination with the preset universal background model corresponding to the respective character.

13. The voiceprint recognition device as claimed in claim 11, characterized in that the feature vector calculation module is configured to:

14. The voiceprint recognition device as claimed in claim 13, characterized in that the feature vector calculation module is configured to:

15. The voiceprint recognition device as claimed in claim 14, characterized in that the feature vector calculation module is specifically configured to:

16. The voiceprint recognition device as claimed in claim 14, characterized in that the supervector subspace matrix is determined according to the correlation between the dimension vectors in the mean supervector of the Gaussian mixture model.

17. The voiceprint recognition device as claimed in claim 11, characterized in that the similarity judgment module is configured to:

18. The voiceprint recognition device as claimed in claim 11, characterized in that the voice segment recognition module comprises:

a valid segment recognition unit, configured to identify the valid voice segments and the invalid voice segments in the verification voice information;

a speech recognition unit, configured to perform speech recognition on the valid voice segments to obtain the voice segments that respectively correspond to the multiple characters in the first character string.

19. The voiceprint recognition device as claimed in claim 11, characterized in that the device further comprises:

a character order determining module, configured to determine that the order of the voice segments of the multiple characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.

20. The voiceprint recognition device as claimed in any one of claims 11-19, characterized in that the device further comprises:

a character string display module, configured to randomly generate and display the first character string.