WO2017215558A1 - Voiceprint recognition method and device


Info

Publication number
WO2017215558A1
Authority
WO
WIPO (PCT)
Prior art keywords
character
voice information
verification
voice
voiceprint
Application number
PCT/CN2017/087911
Other languages
English (en)
Chinese (zh)
Inventor
李为
钱柄桦
金星明
李科
吴富章
吴永坚
黄飞跃
Original Assignee
腾讯科技(深圳)有限公司
Application filed by 腾讯科技(深圳)有限公司
Publication of WO2017215558A1

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G10L 17/04: Training, enrolment or model building
    • G10L 17/06: Decision making techniques; Pattern matching strategies
    • G10L 17/08: Use of distortion metrics or a particular distance between probe pattern and reference templates

Definitions

  • the present invention relates to the field of voice recognition technology, and in particular, to a voiceprint recognition method and apparatus.
  • Voiceprint recognition is a method of biometric information recognition, including user registration and user identification.
  • the registration phase maps speech through a series of processes to a user model.
  • In the identification phase, the unknown voice is matched against the model for similarity; if the match succeeds, the identity of the unknown voice is judged to be consistent with the identity of the registered voice.
  • Existing voiceprint modeling methods usually model at the text-independent level to describe the speaker's identity characteristics, but such text-independent modeling struggles to meet the accuracy requirement when the user reads different content.
  • the embodiment of the invention provides a voiceprint recognition method and device, which can effectively improve the accuracy of voiceprint recognition.
  • a first aspect of the embodiments of the present invention provides a voiceprint recognition method, where the method includes:
  • when the similarity reaches a preset threshold, the verification user is determined to be the registered user corresponding to the registration voice information.
  • Before the calculating of the similarity between the voiceprint features of the voice segment corresponding to each character in the verification voice information and those of the voice segment corresponding to the corresponding character in the preset registration voice information, the method includes:
  • the voiceprint features of the voice segments corresponding to the respective characters are extracted.
  • The calculating of the similarity between the voiceprint features of the voice segment corresponding to each character in the verification voice information and those of the voice segment corresponding to the corresponding character in the preset registration voice information includes:
  • Before the obtaining of the verification voice information generated by the verification user reading the first character string, the method further includes:
  • the feature vector corresponding to each character in the registered voice information is obtained according to the voiceprint feature corresponding to each character in the registered voice information, combined with the common background model corresponding to the preset corresponding character.
  • Training, according to the voiceprint feature of the voice segment corresponding to each character, combined with the common background model corresponding to the preset corresponding character, to obtain the feature vector corresponding to each character in the verification voice information includes:
  • The maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, and the feature vector corresponding to each character in the verification voice information is obtained.
  • Using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, and using the maximum a posteriori probability algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character, so as to obtain the feature vector corresponding to each character in the verification voice information, includes:
  • The maximum a posteriori probability algorithm is used to adjust the mean supervector of the universal background model corresponding to the preset corresponding character, and, combined with the preset supervector subspace matrix, the feature vector corresponding to each character in the verification voice information is obtained.
  • Using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, using the maximum a posteriori probability algorithm to adjust the mean supervector of the universal background model corresponding to the preset corresponding character, and combining the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information includes:
  • The mean supervector of the common background model corresponding to the preset corresponding character is adjusted by using the following formula, so that the adjusted universal background model corresponding to the corresponding character has the largest posterior probability:

    M = m + Tω

    where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information.
  • the preset super-vector sub-space matrix is determined according to a correlation between weights of respective Gaussian modules in the universal background model.
  • The calculating of a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information includes:
  • The performing of voice recognition on the verification voice information to obtain the voice segments, respectively included in the verification voice information, corresponding to the plurality of characters in the first character string includes:
  • Before the determining of the verification user as the registered user corresponding to the registration voice information, the method further includes: determining whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; when the similarity reaches the preset threshold and the orderings are consistent, the verification user is determined as the registered user corresponding to the registration voice information.
  • Before the obtaining of the verification voice information generated by the verification user reading the first character string, the method further includes:
  • the first character string is randomly generated, and the first character string is displayed.
  • a second aspect of the embodiments of the present invention provides a voiceprint recognition apparatus, where the apparatus includes:
  • a voice acquiring module configured to obtain verification voice information generated by a verification user reading the first character string;
  • a similarity judging module configured to calculate the similarity between the voiceprint features of the voice segment corresponding to each character in the verification voice information and those of the voice segment corresponding to the corresponding character in the preset registration voice information;
  • a user identification module configured to determine the verification user as the registered user corresponding to the registration voice information when the similarity reaches a preset threshold.
  • the voiceprint recognition apparatus further includes:
  • a voice segment identification module configured to perform voice recognition on the verification voice information, and obtain a voice segment respectively included in the verification voice information corresponding to multiple characters in the first character string;
  • the voiceprint feature extraction module is configured to extract a voiceprint feature of the voice segment corresponding to each character in the verification voice information.
  • the voiceprint recognition apparatus further includes:
  • a feature model training module configured to train, according to the voiceprint feature of the voice segment corresponding to each character, combined with the common background model corresponding to the preset corresponding character, to obtain a feature vector corresponding to each character in the verification voice information;
  • the similarity judging module is configured to calculate a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information, as the similarity between the voiceprint features of the voice segment corresponding to each character in the verification voice information and those of the voice segment corresponding to the corresponding character in the preset registration voice information.
  • the voice acquiring module is further configured to obtain registration voice information generated by a registered user reading a second character string, where the second character string has at least one character identical with the first character string;
  • the feature model training module is further configured to combine the voiceprint feature corresponding to each character in the registration voice information with the common background model corresponding to the preset corresponding character, to obtain a feature vector corresponding to each character in the registration voice information.
  • the feature model training module is configured to:
  • The maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, and the feature vector corresponding to each character in the verification voice information is obtained.
  • the feature model training module is configured to:
  • The voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, combined with the preset supervector subspace matrix, to obtain the feature vector corresponding to each character in the verification voice information.
  • the feature model training module is specifically configured to:
  • The voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the mean supervector of the common background model corresponding to the preset corresponding character is adjusted by using the following formula, so that the adjusted universal background model corresponding to the corresponding character has the largest posterior probability:

    M = m + Tω

    where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information.
  • the preset super-vector sub-space matrix is determined according to a correlation between respective dimension vectors in the mean super-vector of the Gaussian mixture model.
  • the similarity determination module is configured to:
  • the voice segment identification module includes:
  • a valid segment identification unit configured to identify a valid speech segment and an invalid speech segment in the verification speech information
  • the voice recognition unit is configured to perform voice recognition on the valid voice segment to obtain a voice segment corresponding to multiple characters in the first character string.
  • the voiceprint recognition apparatus further includes:
  • a character order determining module configured to determine whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string;
  • the user identification module is further configured to determine the verification user as the registered user corresponding to the registration voice information when the similarity reaches a preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string.
  • the voiceprint recognition apparatus further includes:
  • a string display module configured to randomly generate the first string, and display the first string.
  • a third aspect of the embodiments of the present invention further provides a voiceprint recognition apparatus, including:
  • a memory storing computer executable program code, and a processor for invoking the computer executable program code to perform the following operations:
  • when the similarity reaches a preset threshold, the verification user is determined to be the registered user corresponding to the registration voice information.
  • Before the obtaining of the verification voice information generated by the verification user reading the first character string, the processor further calls the computer executable program code to perform the following operations:
  • the voiceprint features of the voice segments corresponding to the respective characters are extracted.
  • The processor invokes the computer executable program code to perform the following operations to calculate the similarity between the voiceprint features of the voice segment corresponding to each character in the verification voice information and those of the voice segment corresponding to the corresponding character in the preset registration voice information:
  • Before the obtaining of the verification voice information generated by the verification user reading the first character string, the processor further calls the computer executable program code to perform the following operations:
  • the feature vector corresponding to each character in the registered voice information is obtained according to the voiceprint feature corresponding to each character in the registered voice information, combined with the common background model corresponding to the preset corresponding character.
  • The processor invokes the computer executable program code to perform the following operations to train, according to the voiceprint feature of the voice segment corresponding to each character, combined with the common background model corresponding to the preset corresponding character, to obtain a feature vector corresponding to each character in the verification voice information:
  • The maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, and the feature vector corresponding to each character in the verification voice information is obtained.
  • The processor invokes the computer executable program code to perform the following operations to use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, and use the maximum a posteriori probability algorithm to adjust the mean supervector of the universal background model corresponding to the preset corresponding character, thereby obtaining a feature vector corresponding to each character in the verification voice information:
  • the voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character. And combining the preset super vector subspace matrix to obtain a feature vector corresponding to each character in the verification voice information.
  • The processor invokes the computer executable program code to perform the following operations to use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, use the maximum a posteriori probability algorithm to adjust the mean supervector of the universal background model corresponding to the preset corresponding character, and combine the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information:
  • The mean supervector of the common background model corresponding to the preset corresponding character is adjusted by using the following formula, so that the adjusted universal background model corresponding to the corresponding character has the largest posterior probability:

    M = m + Tω

    where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information.
  • the preset super-vector sub-space matrix is determined according to a correlation between weights of respective Gaussian modules in the universal background model.
  • The processor calls the computer executable program code to perform the following operations to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector of the corresponding character in the preset registration voice information:
  • The processor calls the computer executable program code to perform the following operations to perform voice recognition on the verification voice information, obtaining the voice segments, respectively included in the verification voice information, corresponding to the plurality of characters in the first character string:
  • Before the determining of the verification user as the registered user corresponding to the registration voice information, the processor further invokes the computer executable program code to perform the following operations:
  • If the similarity reaches a preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, the verification user is determined as the registered user corresponding to the registration voice information.
  • Before the obtaining of the verification voice information generated by the verification user reading the first character string, the processor further calls the computer executable program code to perform the following operations:
  • the first character string is randomly generated, and the first character string is displayed through the user interface.
  • A fourth aspect of the embodiments of the present invention further provides a storage medium storing a computer program, where the computer program is used to perform the method according to any implementation manner of the first aspect of the embodiments of the present invention.
  • In the embodiments of the present invention, the voiceprint feature of the voice segment corresponding to each character in the verification voice information of the user is obtained, and UBM training corresponding to the preset corresponding character is used to obtain the feature vector corresponding to each character in the verification voice information; the feature vector corresponding to each character in the verification voice information is then compared with the feature vector of the corresponding character in the registration voice information, thereby determining the user identity of the verification user. Since the user feature vectors used for comparison correspond to specific characters, the voiceprint features of the different characters the user reads aloud are fully considered, which can effectively improve the accuracy of voiceprint recognition.
  • FIG. 1 is a schematic diagram showing the stages of a voiceprint recognition method in an embodiment of the present invention
  • FIG. 2 is a schematic flow chart of a voiceprint recognition method according to an embodiment of the present invention.
  • FIG. 3 is a schematic diagram showing the principle of identifying a voice segment corresponding to a plurality of characters from voice information in the embodiment of the present invention
  • FIG. 4 is a schematic diagram of a principle for acquiring feature vectors corresponding to respective characters from voice information according to an embodiment of the present invention
  • FIG. 5 is a schematic diagram of a voiceprint registration process of a registered user in an embodiment of the present invention.
  • FIG. 6 is a schematic flow chart of a voiceprint recognition method in another embodiment of the present invention.
  • FIG. 7 is a schematic structural diagram of a voiceprint recognition apparatus according to an embodiment of the present invention.
  • FIG. 8 is a schematic structural diagram of a voice segment identification module in an embodiment of the present invention.
  • FIG. 9 is a schematic structural diagram of another voiceprint recognition apparatus in an embodiment of the present invention.
  • Embodiments of the present invention provide a voiceprint recognition method and apparatus.
  • the voiceprint recognition method and apparatus can be applied to all scenes or devices that need to identify an unknown user.
  • the characters in the string used for voiceprint recognition may be Arabic numerals, English letters or other language characters.
  • the characters in the embodiments of the present invention are exemplified by taking Arabic numerals as an example.
  • the voiceprint recognition method in the embodiment of the present invention can be divided into two stages, as shown in FIG. 1:
  • In the voiceprint registration stage, the registered user reads a registration string (i.e., the second character string appearing later) aloud, and the voiceprint recognition device collects the registration voice information while the registered user reads the registration string, then processes the registration voice information in combination with the UBM (Universal Background Model) to obtain the feature vectors corresponding to the characters.
  • the voiceprint recognition device can respectively save the feature vectors corresponding to the plurality of characters in the registered voice information that the different registered users read in the voiceprint registration stage in the model library of the voiceprint recognition device.
  • For example, the registration string is the numeric string 0185851, which includes four distinct numbers "0", "1", "5", and "8". The voiceprint recognition device performs voiceprint feature extraction and voiceprint model training according to the voice segment corresponding to each character in the registration voice information, obtains the voiceprint features of the voice segments corresponding to "0", "1", "5", and "8", and then, combined with the UBM training corresponding to the preset corresponding characters, obtains the feature vectors corresponding to the characters in the registration voice information, including a feature vector corresponding to the number "0", a feature vector corresponding to the number "1", a feature vector corresponding to the number "5", and a feature vector corresponding to the number "8".
  • In the identity recognition stage, the verification user reads a verification string aloud (i.e., the first character string appearing later; the second character string has at least one character in common with the first character string). The voiceprint recognition device collects the verification voice information while the verification user reads the verification string, then performs voice recognition on the verification voice information to obtain the voice segments, respectively included in the verification voice information, corresponding to the plurality of characters in the verification string. It then performs voiceprint feature extraction and voiceprint model training on the voice segment corresponding to each character, extracting the voiceprint feature of the voice segment corresponding to each character and using the UBM training corresponding to the preset corresponding character to obtain the feature vector corresponding to each character in the verification voice information. Finally, it calculates the similarity score between the feature vector corresponding to each character in the verification voice information and that of the corresponding character in the preset registration voice information; if the similarity score reaches the preset verification threshold, the verification user is determined as the registered user corresponding to the registration voice information.
  • For example, if the verification string also contains the numbers "0", "1", "5", and "8", the voiceprint recognition device performs voiceprint feature extraction and voiceprint model training according to the voice segment corresponding to each character in the verification voice information generated when the verification user reads aloud, obtains the GMMs corresponding to "0", "1", "5", and "8", and then, combined with the preset UBMs corresponding to the corresponding characters, can calculate the feature vectors of the verification voice information of the verification user, including the feature vector corresponding to the number "0", the feature vector corresponding to the number "1", the feature vector corresponding to the number "5", and the feature vector corresponding to the number "8". It then respectively calculates the similarity scores between the feature vectors corresponding to "0", "1", "5", and "8" in the verification voice information and the feature vectors corresponding to "0", "1", "5", and "8" in the registration voice information; if the similarity scores reach the preset verification threshold, the verification user is determined as the registered user corresponding to the registration voice information.
  • The voiceprint registration phase of the registered user and the identity recognition phase of the verification user may be implemented in the same device, or in different devices. For example, the voiceprint registration phase of the registered user is implemented in a first device, and the first device sends the feature vectors corresponding to the multiple characters in the registration voice information to a second device, so that the identity recognition phase of the verification user can be implemented in the second device.
  • FIG. 2 is a schematic flow chart of a method for identifying a voiceprint according to an embodiment of the present invention. As shown in the figure, the flow of the voiceprint recognition method in the embodiment may include:
  • The verification user is a user whose identity needs to be verified through the voiceprint recognition device.
  • The first character string is a character string used for the identity verification of the verification user, and may be randomly generated, or may be a preset fixed string, for example, a string at least partially identical to the second character string corresponding to the pre-generated registration voice information.
  • the first character string may include m characters, wherein n characters are different from each other, m and n are positive integers, and m ⁇ n.
  • For example, the first string is "12358948", with a total of 8 characters, including 7 different characters "1", "2", "3", "4", "5", "8", and "9".
  • the voiceprint recognition device may generate and display the first character string for the verification user to read aloud according to the displayed first character string.
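  • As an illustrative aside (not part of the patent text): a minimal Python sketch of generating such a first character string, assuming digit-only strings and the m/n constraint above; the function name and the use of Python's random module are choices of this sketch.

```python
import random

def generate_verification_string(m: int = 8, n: int = 7) -> str:
    """Randomly generate an m-character digit string containing exactly
    n distinct digits (m >= n), e.g. "12358948" with m=8, n=7."""
    assert m >= n
    distinct = random.sample("0123456789", n)             # n distinct digits
    chars = distinct + random.choices(distinct, k=m - n)  # pad with repeats
    random.shuffle(chars)
    return "".join(chars)

print(generate_verification_string())  # e.g. "84123985"
```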
  • S202 Perform speech recognition on the verification voice information to obtain a voice segment respectively included in the verification voice information and corresponding to multiple characters in the first character string.
  • Specifically, the voiceprint recognition device can divide the verification voice information into voice segments corresponding to multiple characters through voice recognition and voice intensity filtering, optionally removing the invalid voice segments so that they do not participate in subsequent processing.
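  • The patent does not detail the voice intensity filtering; the following is a minimal energy-based sketch of dropping silent or low-energy (invalid) audio, with an assumed frame length and threshold.

```python
import numpy as np

def keep_valid_audio(y: np.ndarray, sr: int = 16000,
                     frame_ms: int = 25, rel_db: float = -35.0) -> np.ndarray:
    """Keep only frames whose RMS energy is within rel_db of the loudest
    frame; quieter frames (silence, faint noise) are treated as invalid."""
    hop = int(sr * frame_ms / 1000)
    n = (len(y) // hop) * hop
    frames = y[:n].reshape(-1, hop)                 # (num_frames, hop)
    rms = np.sqrt((frames ** 2).mean(axis=1)) + 1e-12
    db = 20.0 * np.log10(rms / rms.max())
    return frames[db > rel_db].reshape(-1)          # concatenated valid audio
```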
  • Specifically, the voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive Coefficient) of the voice segment corresponding to each character as the voiceprint feature of the voice segment corresponding to that character.
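  • For illustration, a short sketch of MFCC extraction; librosa is one common library choice (the patent names no library), and the 13-coefficient default is an assumption of this sketch.

```python
import numpy as np
import librosa

def extract_voiceprint_features(segment: np.ndarray, sr: int = 16000,
                                n_mfcc: int = 13) -> np.ndarray:
    """Per-frame MFCCs for one character's voice segment, shape
    (num_frames, n_mfcc); PLP features would be an equally valid choice."""
    return librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=n_mfcc).T
```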
  • The universal background model (UBM) in the embodiments of the present invention is a Gaussian mixture model trained on the pooled speech segments of a plurality of speakers, and represents the distribution of the speech of the corresponding digit in the feature space. Since the data comes from a large number of speakers, it does not represent a specific speaker and has an identity-independent nature, so it can be regarded as a general background model.
  • For example, speech samples from more than 1000 speakers with a total duration of more than 20 hours, in which the occurrence frequency of each character is relatively balanced, can be used to train the UBM.
  • The mathematical expression of the UBM is:

    P(x) = Σ_{i=1..C} a_i · N(x; μ_i, Σ_i)    (1)

    where P(x) represents the probability distribution of the UBM, C represents the total number of Gaussian modules in the UBM, a_i represents the weight of the i-th Gaussian module, μ_i represents the mean of the i-th Gaussian module, Σ_i represents the variance of the i-th Gaussian module, N(·) represents a Gaussian distribution, and x represents the voiceprint feature of the input sample.
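  • As a sketch of formula (1): a per-character UBM can be fitted as a Gaussian mixture model, here with scikit-learn (a library choice of this sketch, not of the patent); the number of Gaussian modules C is an assumed hyperparameter.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def train_ubm(pooled_features: np.ndarray, C: int = 64) -> GaussianMixture:
    """Fit the per-character UBM of formula (1) on voiceprint features
    pooled over many speakers; pooled_features is (num_frames, dim)."""
    ubm = GaussianMixture(n_components=C, covariance_type="diag",
                          max_iter=200, random_state=0)
    ubm.fit(pooled_features)
    return ubm

# For a feature x, formula (1) is P(x) = sum_i a_i * N(x; mu_i, Sigma_i);
# GaussianMixture.score_samples(X) returns log P(x) for each row of X.
```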
  • In one implementation, the voiceprint recognition device may use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, and adopt a maximum a posteriori probability (MAP) algorithm to adjust the parameters of the common background model corresponding to the preset corresponding character. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the parameters of the common background model corresponding to the preset corresponding character are continuously adjusted so that the posterior probability P(x) is the largest; the feature vector corresponding to the corresponding character in the verification voice information can then be determined according to the parameters that maximize P(x).
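  • The patent states only that MAP adjusts the model toward the character's samples. One standard concrete form is Reynolds-style relevance MAP adaptation of the component means, sketched below; the relevance factor is an assumed hyperparameter, not taken from the patent.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def map_adapt_means(ubm: GaussianMixture, X: np.ndarray,
                    relevance: float = 16.0) -> np.ndarray:
    """Adapt the UBM component means to the frames X of one character's
    voice segment; returns adapted means of shape (C, dim)."""
    gamma = ubm.predict_proba(X)                   # (T, C) frame posteriors
    n_c = gamma.sum(axis=0)                        # soft counts per Gaussian
    f_c = gamma.T @ X / np.maximum(n_c, 1e-10)[:, None]  # first-order stats
    alpha = (n_c / (n_c + relevance))[:, None]     # adaptation coefficient
    return alpha * f_c + (1.0 - alpha) * ubm.means_  # pull toward the data
```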
  • The mean supervector of the UBM is defined as the concatenation of the means of its C Gaussian modules: m = [μ_1; μ_2; …; μ_C].
  • Alternatively, the voiceprint recognition device can use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, and adopt the maximum a posteriori probability (MAP) algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the mean supervector is continuously adjusted so that the posterior probability P(x) is maximized; the mean supervector that maximizes P(x) can then be used as the feature vector corresponding to the corresponding character in the verification voice information.
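  • Continuing the sketch above, the adapted per-Gaussian means can be stacked into the mean supervector used as the character's feature vector:

```python
import numpy as np

def mean_supervector(means: np.ndarray) -> np.ndarray:
    """Concatenate the C per-Gaussian means (C, dim) into one C*dim
    mean supervector M = [mu_1; mu_2; ...; mu_C]."""
    return means.reshape(-1)

# e.g. M = mean_supervector(map_adapt_means(ubm, frames_of_character))
```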
  • In order to reduce the computation, the adjustment of the mean supervector can be limited to a subspace by probabilistic principal component analysis (PPCA). The voiceprint recognition device can use the voiceprint features of the voice segments corresponding to the characters in the verification voice information as training sample data, use the maximum a posteriori probability algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character, and combine the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.
  • Specifically, the mean supervector of the common background model corresponding to the preset corresponding character may be adjusted by using the following formula, so that the posterior probability of the universal background model corresponding to the adjusted corresponding character is the largest:

    M = m + Tω

    where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, ω is continuously adjusted so that the mean supervector in (1) maximizes the posterior probability P(x); the ω that maximizes P(x) can then be used as the feature vector corresponding to the corresponding character in the verification voice information.
  • the super-vector subspace matrix T is determined according to the correlation between each dimension vector in the mean supervector of the Gaussian mixture model.
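  • For illustration: given M = m + Tω, a point estimate of ω under a standard normal prior reduces to the regularized least squares below. This simplification is an assumption of the sketch; a full i-vector style estimator would also weight by per-Gaussian occupation counts and covariances.

```python
import numpy as np

def estimate_omega(M: np.ndarray, m: np.ndarray, T: np.ndarray) -> np.ndarray:
    """Solve for omega in M = m + T @ omega; with a standard normal prior
    on omega, the MAP solution is (T'T + I)^-1 T'(M - m)."""
    A = T.T @ T + np.eye(T.shape[1])   # the prior contributes the identity
    return np.linalg.solve(A, T.T @ (M - m))
```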
  • The voiceprint recognition device can obtain the registration voice information of the registered user in the voiceprint registration stage and perform voiceprint feature extraction and voiceprint model training similar to the above, so that the feature vectors corresponding to the characters in the registration voice information can be obtained.
  • The registration voice information may be acquired by the voiceprint recognition device when the registered user reads the second character string aloud, where the second character string has at least one character in common with the first character string; that is, the second character string corresponding to the registration voice information is at least partially identical to the first character string.
  • Alternatively, the voiceprint recognition device may acquire the feature vectors corresponding to the characters in the registration voice information from outside. That is, after the registered user inputs the registration voice information through another device, the other device or a server obtains, through voiceprint feature extraction and voiceprint model training, the feature vector corresponding to the voice segment of each character in the registration voice information, and the voiceprint recognition device acquires these feature vectors from the other device or server, for comparison with the feature vectors corresponding to the characters in the verification voice information during the identity recognition phase of the verification user.
  • The similarity score is obtained by the voiceprint recognition device comparing the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the corresponding character in the preset registration voice information, and measures the degree of similarity between the two feature vectors of the same character.
  • Specifically, a cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information may be calculated and used as the similarity score; that is, the similarity score between the corresponding feature vectors in the verification voice information and the registration voice information is calculated by the following formula:

    score_i = cos(ω_i(tar), ω_i(test)) = ω_i(tar) · ω_i(test) / (‖ω_i(tar)‖ ‖ω_i(test)‖)

    where the subscript i indicates the i-th character common to the verification voice information and the registration voice information, ω_i(tar) indicates the feature vector of that character in the verification voice information, and ω_i(test) indicates the feature vector of that character in the registration voice information.
  • In some embodiments, the verification user may be identified according to the similarity between the feature vector of a certain character of the verification user and that of the corresponding character of each registered user: the registered user whose feature vectors of the corresponding characters have the highest similarity score with the feature vectors of the characters of the verification voice, with the similarity reaching the preset verification threshold, is taken as the identification result of the verification user.
  • If the same character appears in the verification voice information more than once, for example if the character 0 appears twice, the average of the similarity scores between the feature vectors obtained from the voice segments corresponding to the two occurrences of character 0 and the feature vector of character 0 in the preset registration voice information is used as the similarity score between the feature vector of character 0 in the verification voice information and the feature vector of character 0 in the preset registration voice information, and so on.
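  • How the per-character scores are fused into one accept/reject decision is not spelled out beyond the threshold test; averaging, sketched below with an illustrative threshold, is one simple choice consistent with the duplicate-averaging rule above.

```python
import numpy as np

def overall_score(scores_per_character: dict) -> float:
    """Map of shared character -> list of per-occurrence cosine scores;
    duplicates are averaged first, then averaged across characters."""
    per_char = [float(np.mean(s)) for s in scores_per_character.values()]
    return float(np.mean(per_char))

# Accept if the fused score reaches the preset verification threshold.
scores = {"0": [0.81, 0.77], "1": [0.70], "5": [0.90], "8": [0.64]}
accepted = overall_score(scores) >= 0.7   # threshold value is illustrative
```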
  • In the embodiment of the present invention, the UBM training corresponding to the preset corresponding character is used to obtain the feature vector corresponding to each character in the verification voice information, and the feature vector corresponding to each character in the verification voice information is compared with the feature vector of the corresponding character in the registration voice information, thereby determining the user identity of the verification user. The user feature vectors used for comparison correspond to specific characters, fully considering the voiceprint features of the different characters the user reads aloud, so the accuracy of voiceprint recognition can be effectively improved.
  • FIG. 5 is a schematic diagram of a voiceprint registration process of a registered user in the embodiment of the present invention. As shown in the figure, the voiceprint registration process in this embodiment may include:
  • The registered user is a user whose legal identity has been determined.
  • the second character string is a string used to collect the voiceprint feature vector of the registered user, and may be randomly generated or a fixed string.
  • the second character string may also include m characters, wherein there are n characters different from each other, m and n are positive integers, and m ⁇ n.
  • the voiceprint recognition device may generate and display the second character string for the registered user to read aloud according to the displayed second character string.
  • Similarly, the voiceprint recognition device can divide the registration voice information into voice segments corresponding to multiple characters through voice recognition and voice intensity filtering, optionally removing the invalid voice segments so that they do not participate in subsequent processing.
  • The voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive Coefficient) of the voice segment corresponding to each character as the voiceprint feature of the voice segment corresponding to that character.
  • UBM can refer to the previous embodiment.
  • This step of the voiceprint registration process is similar to S204 of the voiceprint recognition process.
  • In one implementation, the voiceprint recognition device can use the voiceprint feature of the voice segment corresponding to each character in the registration voice information as the training sample data, and adopt the Maximum A Posteriori (MAP) algorithm to adjust the parameters of the common background model corresponding to the preset corresponding character. That is, after the voiceprint feature of the voice segment corresponding to each character in the registration voice information is substituted into formula (1) as an input sample, the parameters of the common background model corresponding to the preset corresponding character are continuously adjusted so that the posterior probability P(x) is the largest; the feature vector corresponding to the corresponding character in the registration voice information can then be determined according to the parameters that maximize P(x).
  • Alternatively, the voiceprint recognition device can use the voiceprint feature of the voice segment corresponding to each character in the registration voice information as the training sample data, and use the Maximum A Posteriori (MAP) algorithm to adjust the mean supervector of the universal background model corresponding to the preset corresponding character. That is, after the voiceprint feature of the voice segment corresponding to each character in the registration voice information is substituted into formula (1) as an input sample, the mean supervector is continuously adjusted so that the posterior probability P(x) is maximized; the mean supervector that maximizes P(x) can then be used as the feature vector corresponding to the corresponding character in the registration voice information.
  • Specifically, the mean supervector of the universal background model corresponding to the preset corresponding character may be adjusted by using the following formula, so that the posterior probability of the universal background model corresponding to the adjusted corresponding character is the largest:

    M = m + Tω

    where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the registration voice information. That is, after the voiceprint feature of the voice segment corresponding to each character in the registration voice information is substituted into formula (1) as an input sample, ω is continuously adjusted so that the mean supervector in (1) maximizes the posterior probability P(x); the ω that maximizes P(x) can then be used as the feature vector corresponding to the corresponding character in the registration voice information.
  • FIG. 6 is a schematic flow chart of a voiceprint recognition method according to another embodiment of the present invention, as shown in the figure.
  • the voiceprint recognition method in the embodiment may include the following processes:
  • The verification voice may be divided according to the sound intensity, and voice segments with low sound intensity are regarded as invalid voice segments (for example, silent segments and impulse noise).
  • a voice segment respectively corresponding to a plurality of characters in the first character string can be obtained by voice recognition.
  • A different first character string may be randomly generated each time. In this step, it is determined whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; if not, it can be judged that the voiceprint recognition fails, and if consistent, the subsequent process is performed.
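  • A minimal sketch of the ordering check, assuming recognized_chars is the concatenation of the per-segment recognition results in temporal order:

```python
def ordering_consistent(recognized_chars: str, first_string: str) -> bool:
    """True when the characters recognized from the verification speech
    appear in exactly the order of the displayed first character string."""
    return recognized_chars == first_string
```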
  • The voiceprint recognition device may extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive Coefficient) of the voice segment corresponding to each character as the voiceprint feature of the voice segment corresponding to that character.
  • The voiceprint feature of the voice segment corresponding to each character in the verification voice information is used as the training sample data, and the maximum a posteriori probability algorithm is used to adjust the mean supervector of the common background model corresponding to the preset corresponding character, so that the feature vector corresponding to each character in the verification voice information is estimated.
  • Specifically, the voiceprint recognition device can use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, and use the Maximum A Posteriori (MAP) algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character.
  • The voiceprint recognition apparatus may adjust the mean supervector of the common background model corresponding to the preset corresponding character by using the following formula, so that the posterior probability of the universal background model corresponding to the adjusted corresponding character is the largest:

    M = m + Tω

    where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, ω is continuously adjusted so that the mean supervector in (1) maximizes the posterior probability P(x); the ω that maximizes P(x) can then be used as the feature vector corresponding to the corresponding character in the verification voice information.
  • The voiceprint recognition apparatus may calculate a cosine distance value between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, and use the cosine distance value as the similarity score; that is, the similarity score between the corresponding feature vectors in the verification voice information and the registration voice information is calculated by the following formula:

    score_i = cos(ω_i(tar), ω_i(test)) = ω_i(tar) · ω_i(test) / (‖ω_i(tar)‖ ‖ω_i(test)‖)

    where the subscript i indicates the i-th character common to the verification voice information and the registration voice information, ω_i(tar) indicates the feature vector of that character in the verification voice information, and ω_i(test) indicates the feature vector of that character in the registration voice information.
  • In some embodiments, the verification user may be identified according to the similarity between the feature vector of a certain character of the verification user and that of the corresponding character of each registered user: the registered user whose feature vectors of the corresponding characters have the highest similarity score with the feature vectors of the characters of the verification voice, with the similarity reaching the preset verification threshold, is taken as the identification result of the verification user.
  • In this embodiment, by additionally checking the character ordering, the accuracy of verifying the user identity can be further improved.
  • FIG. 7 is a schematic structural diagram of a voiceprint identifying apparatus according to an embodiment of the present invention. As shown in the figure, the voiceprint identifying apparatus in this embodiment may include:
  • The voice obtaining module 710 is configured to obtain verification voice information generated by the verification user reading the first character string.
  • The verification user is a user whose identity needs to be verified through the voiceprint recognition device.
  • The first character string is a character string used for the identity verification of the verification user, and may be randomly generated, or may be a preset fixed string, for example, a string at least partially identical to the second character string corresponding to the pre-generated registration voice information.
  • the first character string may include m characters, wherein n characters are different from each other, m and n are positive integers, and m ⁇ n.
  • For example, the first string is "12358948", with a total of 8 characters, including 7 different characters "1", "2", "3", "4", "5", "8", and "9".
  • the voice segment identification module 720 is configured to perform voice recognition on the verification voice information to obtain voice segments respectively corresponding to the plurality of characters in the first character string included in the verification voice information.
  • Specifically, the voice segment identification module 720 can divide the verification voice information into voice segments corresponding to multiple characters through voice recognition and voice intensity filtering, optionally removing the invalid voice segments so that they do not participate in subsequent processing.
  • the voice segment identification module may further include:
  • the valid segment identification unit 721 is configured to identify a valid speech segment and an invalid speech segment in the verification speech information.
  • the effective segment recognizing unit 721 can divide the verification speech according to the sound intensity, and treat the speech segment with a small sound intensity as an invalid speech segment (for example, including a silent segment and impulse noise).
  • a voice recognition unit 722 configured to perform voice recognition on the valid voice segment to obtain the voice segments respectively corresponding to the plurality of characters in the first character string.
  • the voiceprint feature extraction module 730 is configured to extract a voiceprint feature of the voice segment corresponding to each character in the verification voice information.
  • Specifically, the voiceprint feature extraction module 730 can extract the MFCC (Mel Frequency Cepstrum Coefficient) or PLP (Perceptual Linear Predictive Coefficient) of the voice segment corresponding to each character as the voiceprint feature corresponding to that character.
  • the feature model training module 740 is configured to train, according to the voiceprint feature of the voice segment corresponding to the respective characters, a feature vector corresponding to each character in the verification voice information according to the common background model corresponding to the preset corresponding character.
  • In one implementation, the feature model training module 740 may use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, and adopt a maximum a posteriori probability (MAP) algorithm to adjust the parameters of the common background model corresponding to the preset corresponding character. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the parameters of the common background model corresponding to the preset corresponding character are continuously adjusted so that the posterior probability P(x) is the largest; the feature model training module 740 can then determine the feature vector corresponding to the corresponding character in the verification voice information according to the parameters that maximize P(x).
  • Alternatively, the feature model training module 740 can use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, and use the Maximum A Posteriori (MAP) algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, the mean supervector is continuously adjusted so that the posterior probability P(x) is maximized; the feature model training module 740 may then use the mean supervector that maximizes P(x) as the feature vector corresponding to the corresponding character in the verification voice information.
  • In order to reduce the computation, the feature model training module 740 may use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, adopt the maximum a posteriori probability algorithm to adjust the mean supervector of the common background model corresponding to the preset corresponding character, and combine the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.
  • Specifically, the feature model training module 740 may adjust the mean supervector of the common background model corresponding to the preset corresponding character by using the following formula, so that the posterior probability of the common background model corresponding to the adjusted corresponding character is the largest:

    M = m + Tω

    where M represents the mean supervector of the universal background model of the adjusted character, m represents the mean supervector of the universal background model of the corresponding character before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the corresponding character in the verification voice information. That is, after the voiceprint feature of the voice segment corresponding to each character in the verification voice information is substituted into formula (1) as an input sample, ω is continuously adjusted so that the mean supervector in (1) maximizes the posterior probability P(x); the ω that maximizes P(x) can then be used as the feature vector corresponding to the corresponding character in the verification voice information.
  • the super-vector subspace matrix T is determined according to the correlation between each dimension vector in the mean supervector of the Gaussian mixture model.
  • The similarity determination module 750 is configured to calculate a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information.
  • The voiceprint recognition device can obtain the registration voice information of the registered user in the voiceprint registration stage, and can process the registration voice information through the voice segment identification module 720, the voiceprint feature extraction module 730, and the feature model training module 740.
  • The registration voice information may be acquired by the voiceprint recognition device when the registered user reads a second character string aloud, the second character string having at least one character in common with the first character string; that is, the second character string corresponding to the registration voice information is at least partially identical to the first character string.
  • The voiceprint recognition device may instead acquire the feature vectors corresponding to the characters in the registration voice information directly. That is, after the registered user inputs the registration voice information through another device, that device or a server obtains, through voiceprint feature extraction and voiceprint model training, a feature vector for the voice segment of each character in the registration voice information, and the voiceprint recognition device obtains those feature vectors from the other device or server, so that in the user identity verification stage the similarity determination module 750 can compare them with the feature vectors corresponding to the respective characters in the verification voice information.
  • The similarity score is a measure of the degree of similarity between two feature vectors of the same character: the voiceprint recognition device compares the feature vector corresponding to each character in the verification voice information with the feature vector corresponding to the corresponding character in the preset registration voice information.
  • The similarity determination module 750 may calculate the cosine distance between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information, and use the cosine distance as the similarity score. That is, the similarity score between corresponding feature vectors in the verification voice information and the registration voice information is calculated by the following formula:

score_i = cos(ω_i(tar), ω_i(test)) = (ω_i(tar) · ω_i(test)) / (‖ω_i(tar)‖ · ‖ω_i(test)‖)

where the subscript i indexes the i-th character common to the verification voice information and the registration voice information, ω_i(tar) denotes the feature vector of that character in the verification voice information, and ω_i(test) denotes the feature vector of that character in the registration voice information. If the same character appears in the verification voice information more than once (for example, in the verification voice information shown in FIG. 2 the characters 0, 1, 5, and 8 appear and the character 0 occurs twice), the average of the similarity scores between the feature vectors derived from the two voice segments of character 0 and the feature vector of character 0 in the preset registration voice information is used as the similarity score between the feature vector of character 0 in the verification voice information and the feature vector of character 0 in the registration voice information, and so on.
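A minimal sketch of this per-character cosine scoring, including the averaging rule for characters that occur more than once in the verification utterance; the dictionary layout is an assumed representation, not something the passage prescribes.

```python
import numpy as np

def cosine_score(a, b):
    # Cosine distance between two feature vectors, used as the similarity score.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def character_scores(verify_vectors, enroll_vectors):
    """verify_vectors: {character: [feature vectors, one per occurrence]}
    enroll_vectors:    {character: feature vector from registration}
    Returns one score per character shared by both strings; scores of
    repeated occurrences of a character are averaged, as described above."""
    scores = {}
    for ch, occurrences in verify_vectors.items():
        if ch not in enroll_vectors:
            continue  # only characters common to both utterances are compared
        per_occurrence = [cosine_score(v, enroll_vectors[ch]) for v in occurrences]
        scores[ch] = sum(per_occurrence) / len(per_occurrence)
    return scores
```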
  • A user identification module 760 configured to determine, if the similarity score reaches a preset verification threshold, the verification user as the registered user corresponding to the registration voice information.
  • The user identification module 760 may take the average of the per-character similarity scores calculated by the similarity determination module 750, and determine the verification user as the registered user corresponding to the registration voice information if that average reaches the preset verification threshold. If there are multiple registered users, such as the registered users A, B, and C shown in FIG. 1, the user identification module 760 may compare the feature vector of each character of the verification user against the feature vector of the corresponding character of every registered user; if the feature vectors of one registered user achieve the highest similarity scores with the feature vectors of the verification voice and that similarity reaches the preset verification threshold, that registered user is taken as the identification result for the verification user.
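The decision logic for one or several registered users might then look like the following sketch, where per_user_scores would be produced by the character_scores() sketch above, once per registered user; the data layout and function names are assumptions.

```python
def identify_user(per_user_scores, threshold):
    """per_user_scores: {user_id: {character: similarity score}}, e.g. the
    output of character_scores() computed once per registered user.
    Picks the registered user with the highest mean per-character score
    and accepts only if that mean reaches the preset verification threshold."""
    best_user, best_mean = None, float("-inf")
    for user_id, scores in per_user_scores.items():
        if not scores:
            continue  # no characters in common with this user's enrollment
        mean_score = sum(scores.values()) / len(scores)
        if mean_score > best_mean:
            best_user, best_mean = user_id, mean_score
    return best_user if best_mean >= threshold else None
```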
  • The voice acquiring module 710 is further configured to obtain registration voice information generated by a registered user reading a second character string aloud, the second character string having at least one character in common with the first character string;
  • The voice segment identification module 720 is further configured to perform voice recognition on the registration voice information to obtain the voice segments, included in the registration voice information, respectively corresponding to the plurality of characters in the second character string;
  • The voiceprint feature extraction module 730 is further configured to extract the voiceprint feature of the voice segment corresponding to each character in the registration voice information (an illustrative feature-extraction sketch follows this list);
  • The feature model training module 740 is further configured to train, according to the voiceprint feature of the voice segment corresponding to each character in the registration voice information and the universal background model corresponding to each such character, the feature vector corresponding to each character in the registration voice information.
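The passage does not fix an acoustic feature type for module 730. As one illustrative choice, the sketch below extracts MFCC features with librosa; the feature type, function name, and parameters are assumptions.

```python
import librosa

def voiceprint_features(segment, sr, n_mfcc=13):
    """Extract per-frame features from one character's voice segment.

    segment: 1-D float array of audio samples; sr: sampling rate.
    Returns an (n_frames, n_mfcc) array suitable as training sample data."""
    mfcc = librosa.feature.mfcc(y=segment, sr=sr, n_mfcc=n_mfcc)
    return mfcc.T
```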
  • the voiceprint recognition apparatus may further include:
  • the character order determining module 770 is configured to determine whether the order of the voice segments of the plurality of characters in the verification voice information is consistent with the order of the corresponding characters in the first character string.
  • A different first character string may be randomly generated for each verification. This module determines whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; if not, it may be determined that the voiceprint recognition fails, and if the orderings are consistent, the voiceprint feature extraction module 730 or the feature model training module 740 may be notified to perform feature extraction and voiceprint training on the verification voice information.
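A minimal sketch of the resulting joint acceptance condition (names and signature are assumptions):

```python
def accept(recognized_characters, prompt_string, mean_score, threshold):
    """Joint acceptance condition: the mean similarity score must reach the
    preset verification threshold AND the characters must have been spoken
    in the order of the randomly generated first character string."""
    ordering_ok = list(recognized_characters) == list(prompt_string)
    return ordering_ok and mean_score >= threshold
```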
  • the voiceprint recognition apparatus may further include:
  • The string display module 700 is configured to randomly generate the first character string and display it.
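A sketch of the random prompt generation; the digit alphabet and the length of eight are assumptions (the example around FIG. 2 suggests digit prompts), and a cryptographically strong generator is used so prompts are hard to predict.

```python
import secrets

def generate_prompt(length=8, alphabet="0123456789"):
    """Randomly generate the first character string shown to the user; a
    fresh, unpredictable string per attempt helps resist replayed audio."""
    return "".join(secrets.choice(alphabet) for _ in range(length))
```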
  • In this embodiment, training with the universal background model (UBM) corresponding to each preset character is used to obtain the feature vector corresponding to each character in the verification voice information, and the feature vector corresponding to each character in the verification voice information is compared with the feature vector of the corresponding character in the registration voice information, thereby determining the user identity of the verification user. Because the user feature vectors used for comparison correspond to specific characters, the voiceprint characteristics of the user reading different characters aloud are fully taken into account, so the accuracy of voiceprint recognition can be effectively improved.
  • FIG. 9 is a schematic structural diagram of another voiceprint recognition apparatus according to an embodiment of the present invention.
  • the voiceprint recognition apparatus 1000 may include at least one processor 1001 (eg, a CPU), a user interface 1003, a memory 1005, and at least one communication bus 1002.
  • the communication bus 1002 is used to implement connection communication between these components.
  • the user interface 1003 may include a display, a keyboard, a microphone, and the like.
  • the user interface 1003 may further include a standard wired interface and a wireless interface.
  • the memory 1005 may be a high speed RAM memory or a non-volatile memory such as at least one disk memory.
  • The memory 1005 may store an operating system, a user interface module, and computer executable program code (e.g., a voiceprint recognition program).
  • The processor 1001 can be used to call the computer executable program code stored in the memory 1005, and specifically perform the following steps: obtaining verification voice information generated by a verification user reading a first character string; performing voice recognition on the verification voice information to obtain the voice segments, included in the verification voice information, respectively corresponding to the plurality of characters in the first character string; extracting the voiceprint feature of the voice segment corresponding to each character; training, according to the voiceprint feature of the voice segment corresponding to each character and the universal background model corresponding to that character, the feature vector corresponding to each character in the verification voice information; calculating a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information; and, if the similarity score reaches a preset verification threshold, determining the verification user as the registered user corresponding to the registration voice information.
  • The processor 1001 also calls the computer executable program code to perform the following operation: extracting the voiceprint features of the voice segments corresponding to the respective characters.
  • The processor 1001 invokes the computer executable program code to perform operations for calculating the similarity between the voiceprint feature of the voice segment corresponding to each character in the verification voice information and that of the voice segment corresponding to the corresponding character in the preset registration voice information.
  • Before the verification voice information generated by the verification user reading the first character string is obtained, the processor 1001 further invokes the computer executable program code to perform the following operations: acquiring registration voice information generated by a registered user reading a second character string, the second character string having at least one character in common with the first character string; performing voice recognition on the registration voice information to obtain the voice segments, included in the registration voice information, respectively corresponding to the plurality of characters in the second character string; extracting the voiceprint feature of the voice segment corresponding to each character in the registration voice information; and training the voiceprint feature of the voice segment corresponding to each character in the registration voice information in conjunction with the universal background model corresponding to that character to obtain the feature vector corresponding to each character in the registration voice information.
  • The processor 1001 invokes the computer executable program code to perform the following operations to train, according to the voiceprint features of the voice segments corresponding to the respective characters and the universal background models corresponding to those characters, the feature vector corresponding to each character in the verification voice information: using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data, and adjusting, with a maximum a posteriori algorithm, the mean supervector of the universal background model corresponding to each character to obtain the feature vector corresponding to each character in the verification voice information.
  • The processor 1001 invokes the computer executable program code to perform the following operations to use the voiceprint feature of the voice segment corresponding to each character in the verification voice information as training sample data and to adjust, with the maximum a posteriori algorithm, the mean supervector of the universal background model corresponding to each character, thereby obtaining the feature vector corresponding to each character in the verification voice information: using the voiceprint feature of the voice segment corresponding to each character in the verification voice information as the training sample data, adjusting the mean supervector of the universal background model corresponding to each character with the maximum a posteriori algorithm, and combining the result with the preset supervector subspace matrix to obtain the feature vector corresponding to each character in the verification voice information.
  • The processor 1001 invokes the computer executable program code to carry out the above adjustment by adjusting the mean supervector of the universal background model corresponding to each character with the following formula:

M = m + Tω

where M is the adjusted mean supervector of the character's universal background model, m is the mean supervector of the universal background model before adjustment, T is the preset supervector subspace matrix, and ω is the feature vector corresponding to the respective character in the verification voice information.
  • The preset supervector subspace matrix is determined according to the correlation between the weights of the respective Gaussian components in the universal background model.
  • The processor 1001 invokes the computer executable program code to calculate the similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in the preset registration voice information by calculating the cosine distance between the two feature vectors and determining the cosine distance as the similarity score.
  • The processor 1001 invokes the computer executable program code to perform the following operations to carry out voice recognition on the verification voice information and obtain the voice segments, included in the verification voice information, respectively corresponding to the plurality of characters in the first character string: identifying the valid speech segments and the invalid speech segments in the verification voice information; and performing speech recognition on the valid speech segments to obtain the speech segments respectively corresponding to the plurality of characters in the first character string.
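The text does not specify how valid and invalid segments are distinguished. A simple short-time-energy voice activity detector is one plausible realization; the frame sizes and threshold below are assumed values.

```python
import numpy as np

def valid_frame_mask(samples, sr, frame_ms=25, hop_ms=10, threshold_db=-35.0):
    """Mark frames of the verification voice information as valid speech
    (True) or invalid silence/noise (False) by short-time energy.

    samples: 1-D float array of audio samples; sr: sampling rate."""
    frame = int(sr * frame_ms / 1000)
    hop = int(sr * hop_ms / 1000)
    n_frames = max(0, 1 + (len(samples) - frame) // hop)
    mask = np.zeros(n_frames, dtype=bool)
    for i in range(n_frames):
        chunk = samples[i * hop : i * hop + frame]
        rms = np.sqrt(np.mean(chunk ** 2) + 1e-12)
        mask[i] = 20.0 * np.log10(rms) > threshold_db
    return mask
```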
  • Before the verification user is determined as the registered user corresponding to the registration voice information, the processor 1001 further invokes the computer executable program code to perform the following operations: determining whether the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string; and, when the similarity reaches the preset threshold and the ordering of the voice segments of the plurality of characters in the verification voice information is consistent with the ordering of the corresponding characters in the first character string, determining the verification user as the registered user corresponding to the registration voice information.
  • Before the verification voice information generated by the verification user reading the first character string is obtained, the processor 1001 further invokes the computer executable program code to perform the following operations: randomly generating the first character string and displaying it.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Game Theory and Decision Science (AREA)
  • Business, Economics & Management (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A voiceprint recognition method and device are provided. The method comprises: obtaining verification voice information generated by a verification user reading a first character string (S201); performing voice recognition on the verification voice information to obtain the speech segments included in the verification voice information and respectively corresponding to a plurality of characters in the first character string (S202); extracting a voiceprint feature of the speech segment corresponding to each character (S203); obtaining, according to the voiceprint feature of the speech segment corresponding to each character and on the basis of training with a preset universal background model corresponding to the corresponding character, a feature vector corresponding to each character in the verification voice information (S204); and calculating a similarity score between the feature vector corresponding to each character in the verification voice information and the feature vector corresponding to the corresponding character in preset registration voice information and, if the similarity score reaches a preset verification threshold, determining that the verification user is a registered user corresponding to the registration voice information (S205). The method can effectively improve the accuracy of voiceprint recognition.
PCT/CN2017/087911 2016-06-12 2017-06-12 Voiceprint recognition method and device WO2017215558A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610416650.3 2016-06-12
CN201610416650.3A CN106098068B (zh) 2016-06-12 2016-06-12 Voiceprint recognition method and apparatus

Publications (1)

Publication Number Publication Date
WO2017215558A1 true WO2017215558A1 (fr) 2017-12-21

Family

ID=57846666

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/087911 WO2017215558A1 (fr) 2016-06-12 2017-06-12 Voiceprint recognition method and device

Country Status (2)

Country Link
CN (1) CN106098068B (fr)
WO (1) WO2017215558A1 (fr)


Families Citing this family (32)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106098068B (zh) * 2016-06-12 2019-07-16 腾讯科技(深圳)有限公司 Voiceprint recognition method and apparatus
US11283631B2 (en) 2017-01-03 2022-03-22 Nokia Technologies Oy Apparatus, method and computer program product for authentication
CN108447471B (zh) * 2017-02-15 2021-09-10 腾讯科技(深圳)有限公司 Speech recognition method and speech recognition apparatus
CN107610708B (zh) * 2017-06-09 2018-06-19 平安科技(深圳)有限公司 Method and device for voiceprint recognition
CN109102812B (zh) * 2017-06-21 2021-08-31 北京搜狗科技发展有限公司 Voiceprint recognition method and system, and electronic device
CN107492379B (zh) 2017-06-30 2021-09-21 百度在线网络技术(北京)有限公司 Voiceprint creation and registration method and apparatus
CN107248410A (zh) * 2017-07-19 2017-10-13 浙江联运知慧科技有限公司 Method for opening a garbage bin door through voiceprint recognition
CN109559759B (zh) * 2017-09-27 2021-10-08 华硕电脑股份有限公司 Electronic device with incremental registration unit and method thereof
CN110310647B (zh) * 2017-09-29 2022-02-25 腾讯科技(深圳)有限公司 Voice identity feature extractor and classifier training method, and related device
CN107886943A (zh) * 2017-11-21 2018-04-06 广州势必可赢网络科技有限公司 Voiceprint recognition method and apparatus
CN108154588B (zh) * 2017-12-29 2020-11-27 深圳市艾特智能科技有限公司 Unlocking method and system, readable storage medium and smart device
CN110047491A (zh) * 2018-01-16 2019-07-23 中国科学院声学研究所 Speaker recognition method and apparatus based on random digit passwords
CN108269590A (zh) * 2018-01-17 2018-07-10 广州势必可赢网络科技有限公司 Vocal cord recovery scoring method and apparatus
CN108447489B (zh) * 2018-04-17 2020-05-22 清华大学 Continuous voiceprint authentication method and system with feedback
CN110875044B (zh) * 2018-08-30 2022-05-03 中国科学院声学研究所 Speaker recognition method based on character-dependent score calculation
CN109117622B (zh) * 2018-09-19 2020-09-01 北京容联易通信息技术有限公司 Identity authentication method based on audio fingerprints
CN109257362A (zh) * 2018-10-11 2019-01-22 平安科技(深圳)有限公司 Voiceprint verification method and apparatus, computer device and storage medium
CN109473107B (zh) * 2018-12-03 2020-12-22 厦门快商通信息技术有限公司 Semi-text-dependent voiceprint recognition method and system
CN111669350A (zh) * 2019-03-05 2020-09-15 阿里巴巴集团控股有限公司 Identity verification method, verification information generation method, payment method and apparatus
CN110600041B (zh) * 2019-07-29 2022-04-29 华为技术有限公司 Voiceprint recognition method and device
CN110517695A (zh) * 2019-09-11 2019-11-29 国微集团(深圳)有限公司 Voiceprint-based verification method and apparatus
CN110971763B (zh) * 2019-12-10 2021-01-26 Oppo广东移动通信有限公司 Arrival reminder method and apparatus, storage medium and electronic device
CN110956732A (zh) * 2019-12-19 2020-04-03 重庆特斯联智慧科技股份有限公司 IoT-based secure access control
CN111081260A (zh) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Wake-word voiceprint recognition method and system
CN111081256A (zh) * 2019-12-31 2020-04-28 苏州思必驰信息科技有限公司 Digit-string voiceprint password verification method and system
CN111597531A (zh) * 2020-04-07 2020-08-28 北京捷通华声科技股份有限公司 Identity authentication method and apparatus, electronic device and readable storage medium
CN111613230A (zh) * 2020-06-24 2020-09-01 泰康保险集团股份有限公司 Voiceprint verification method, apparatus, device and storage medium
CN112820299B (zh) * 2020-12-29 2021-09-14 马上消费金融股份有限公司 Voiceprint recognition model training method and apparatus, and related device
CN113113022A (zh) * 2021-04-15 2021-07-13 吉林大学 Method for automatic identity recognition based on speaker voiceprint information
CN113570754B (zh) * 2021-07-01 2022-04-29 汉王科技股份有限公司 Voiceprint lock control method and apparatus, and electronic device
CN116530944B (zh) * 2023-07-06 2023-10-20 荣耀终端有限公司 Sound processing method and electronic device
CN116978368B (zh) * 2023-09-25 2023-12-15 腾讯科技(深圳)有限公司 Wake-word detection method and related apparatus


Family Cites Families (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102254559A (zh) * 2010-05-20 2011-11-23 盛乐信息技术(上海)有限公司 Voiceprint-based identity authentication system and method
CN102314877A (zh) * 2010-07-08 2012-01-11 盛乐信息技术(上海)有限公司 Voiceprint recognition method with character content prompting
CN101997689B (zh) * 2010-11-19 2012-08-08 吉林大学 USB identity authentication method based on voiceprint recognition and system thereof
CN102163427B (zh) * 2010-12-20 2012-09-12 北京邮电大学 Audio abnormal event detection method based on an environment model
CN102737634A (zh) * 2012-05-29 2012-10-17 百度在线网络技术(北京)有限公司 Voice-based authentication method and apparatus
CN103679452A (zh) * 2013-06-20 2014-03-26 腾讯科技(深圳)有限公司 Payment verification method, apparatus and system
CN104282303B (zh) * 2013-07-09 2019-03-29 威盛电子股份有限公司 Method for speech recognition using voiceprint recognition and electronic device thereof
CN104064189A (zh) * 2014-06-26 2014-09-24 厦门天聪智能软件有限公司 Modeling and verification method for voiceprint dynamic passwords
CN104575504A (zh) * 2014-12-24 2015-04-29 上海师范大学 Method for personalized television voice wake-up using voiceprint and speech recognition
CN104901808A (zh) * 2015-04-14 2015-09-09 时代亿宝(北京)科技有限公司 Voiceprint authentication system and method based on time-based dynamic passwords

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20050033573A1 (en) * 2001-08-09 2005-02-10 Sang-Jin Hong Voice registration method and system, and voice recognition method and system based on voice registration method and system
CN102238189A (zh) * 2011-08-01 2011-11-09 安徽科大讯飞信息科技股份有限公司 Voiceprint password authentication method and system
CN105096121A (zh) * 2015-06-25 2015-11-25 百度在线网络技术(北京)有限公司 Voiceprint authentication method and apparatus
CN105656887A (zh) * 2015-12-30 2016-06-08 百度在线网络技术(北京)有限公司 Artificial-intelligence-based voiceprint authentication method and apparatus
CN106098068A (zh) * 2016-06-12 2016-11-09 腾讯科技(深圳)有限公司 Voiceprint recognition method and apparatus

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109147767A (zh) * 2018-08-16 2019-01-04 平安科技(深圳)有限公司 Digit recognition method and apparatus in speech, computer device and storage medium
CN111199729A (zh) * 2018-11-19 2020-05-26 阿里巴巴集团控股有限公司 Voiceprint recognition method and apparatus
CN111199729B (zh) 2018-11-19 2023-09-26 阿里巴巴集团控股有限公司 Voiceprint recognition method and apparatus
CN110738998A (zh) * 2019-09-11 2020-01-31 深圳壹账通智能科技有限公司 Voice-based personal credit assessment method, apparatus, terminal and storage medium
CN112435673A (zh) * 2020-12-15 2021-03-02 北京声智科技有限公司 Model training method and electronic terminal
CN112435673B (zh) 2020-12-15 2024-05-14 北京声智科技有限公司 Model training method and electronic terminal
WO2024077588A1 (fr) * 2022-10-14 2024-04-18 Qualcomm Incorporated Voice-based user authentication
CN115641852A (zh) * 2022-10-18 2023-01-24 中国电信股份有限公司 Voiceprint recognition method and apparatus, electronic device and computer-readable storage medium
CN115550075A (zh) * 2022-12-01 2022-12-30 中网道科技集团股份有限公司 Anti-counterfeiting processing method and device for public welfare activity data of community correction subjects
CN115550075B (zh) 2022-12-01 2023-05-09 中网道科技集团股份有限公司 Anti-counterfeiting processing method and device for public welfare activity data of community correction subjects

Also Published As

Publication number Publication date
CN106098068B (zh) 2019-07-16
CN106098068A (zh) 2016-11-09

Similar Documents

Publication Publication Date Title
WO2017215558A1 (fr) Voiceprint recognition method and device
KR102239129B1 (ko) End-to-end speaker recognition using deep neural networks
RU2738325C2 (ru) Method and device for identity authentication
US11727942B2 (en) Age compensation in biometric systems using time-interval, gender and age
Dey et al. Speech biometric based attendance system
WO2017113658A1 (fr) Artificial-intelligence-based method and device for voiceprint authentication
Chen et al. Robust deep feature for spoofing detection—The SJTU system for ASVspoof 2015 challenge
WO2016150032A1 (fr) Artificial-intelligence-based voiceprint login method and device
Das et al. Development of multi-level speech based person authentication system
WO2017114307A1 (fr) Voiceprint authentication method for preventing recording attacks, server, terminal, and system
US11348590B2 (en) Methods and devices for registering voiceprint and for authenticating voiceprint
WO2017162053A1 (fr) Identity authentication method and device
Saquib et al. A survey on automatic speaker recognition systems
Mansour et al. Voice recognition using dynamic time warping and mel-frequency cepstral coefficients algorithms
JP2007133414A (ja) Method and device for estimating speech discrimination capability, and method and device for registration and evaluation of speaker authentication
EP2879130A1 (fr) Methods and systems for splitting a digital signal
Korshunov et al. Impact of score fusion on voice biometrics and presentation attack detection in cross-database evaluations
CN110767239A (zh) Deep-learning-based voiceprint recognition method, apparatus and device
US10909991B2 (en) System for text-dependent speaker recognition and method thereof
Chen et al. Towards understanding and mitigating audio adversarial examples for speaker recognition
CN110111798B (zh) Speaker identification method, terminal and computer-readable storage medium
CN107346568A (zh) Authentication method and apparatus for an access control system
JP7259981B2 (ja) Speaker authentication system, method, and program
JP6280068B2 (ja) Parameter learning device, speaker recognition device, parameter learning method, speaker recognition method, and program
Shirvanian et al. Quantifying the breakability of voice assistants

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17812669

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 17812669

Country of ref document: EP

Kind code of ref document: A1