CN108269575B - Voice recognition method for updating voiceprint data, terminal device and storage medium - Google Patents


Info

Publication number
CN108269575B
Authority
CN
China
Prior art keywords
voice
verification
threshold value
scoring
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810030623.1A
Other languages
Chinese (zh)
Other versions
CN108269575A (en)
Inventor
王健宗
郑斯奇
于夕畔
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810030623.1A priority Critical patent/CN108269575B/en
Priority to PCT/CN2018/089415 priority patent/WO2019136911A1/en
Publication of CN108269575A publication Critical patent/CN108269575A/en
Application granted granted Critical
Publication of CN108269575B publication Critical patent/CN108269575B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/06 - Decision making techniques; Pattern matching strategies
    • G10L17/12 - Score normalisation

Abstract

The invention discloses a voice recognition method for updating voiceprint data, which comprises the following steps: registering a preset number of registration voices and calculating a feature voice vector of each registration voice; comparing the feature voice vectors of the registration voices in pairs and scoring each pair; obtaining a verification voice and calculating its feature voice vector; comparing the feature voice vector of the verification voice with the feature voice vector of each registration voice, scoring each pair, and updating the preset number of registration voices according to the verification voice. The invention also provides a terminal device and a storage medium. Because the comparison and update procedure runs at every verification, the voice recognition method, terminal device and storage medium for updating voiceprint data improve the accuracy of subsequent voiceprint verification and adapt to the way a registrant's voice fluctuates over time.

Description

Voice recognition method for updating voiceprint data, terminal device and storage medium
Technical Field
The present invention relates to the field of voice recognition, and in particular, to a voice recognition method, a terminal apparatus, and a storage medium for updating voiceprint data.
Background
The conventional voiceprint registration and identification method generally comprises the following steps: 1. Feature extraction: after the user's registration voice data is obtained, voice features are extracted from the data. 2. An authentication vector is generated. 3. Comparison and verification: the feature vector (i-vector) computed when the user registered is kept in the voiceprint library; at each verification, the i-vector extracted from the verification voice is compared with the i-vector from registration, that is, the distance between the current authentication vector and the user's registration vector is calculated with a predetermined cosine distance formula. If the distance is within a set threshold range, the two i-vectors are considered to come from the voice of the same person and the verification succeeds; otherwise, a failure is returned.
However, under such conventional verification, the vector used for comparison is always the authentication vector generated when the user first enrolled. The user's voice may change over time due to factors such as age, physical condition and environment, so comparing against the original registration i-vector at every verification may cause the verification to fail.
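For illustration only, the following is a minimal sketch of the conventional single-template check described above, assuming i-vectors are NumPy arrays; the 0.6 threshold and the cosine-similarity form of the "cosine distance" comparison are assumptions, not values taken from this patent.

```python
import numpy as np

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two i-vectors; 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def conventional_verify(enrolled_ivector: np.ndarray,
                        test_ivector: np.ndarray,
                        threshold: float = 0.6) -> bool:
    # Accept only if the verification i-vector is close enough to the single
    # i-vector stored at first enrollment; the stored template is never updated.
    return cosine_score(enrolled_ivector, test_ivector) >= threshold
```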
Disclosure of Invention
In view of the above, the present invention provides a voice recognition method, a terminal device and a storage medium for updating voiceprint data, which can improve the accuracy of subsequent voiceprint verification and adapt to the fluctuating voice change of a registrant over time.
To achieve the above object, the present invention provides a terminal device, which includes a memory and a processor, wherein the memory stores a voice recognition program for updating voiceprint data, the voice recognition program being executable on the processor, and the voice recognition program for updating voiceprint data implements the following steps when executed by the processor:
registering a preset number of user registration voices, and calculating a feature voice vector of each user registration voice in the preset number of user registration voices;
comparing the feature voice vectors of the user registration voices with one another in pairs and scoring each pair, and taking the first scoring average value as a first threshold value;
obtaining a verification voice and calculating a feature voice vector of the verification voice;
comparing the feature voice vector of the verification voice with the feature voice vector of each registration voice, scoring each pair, and obtaining a second scoring average value;
judging whether the second scoring average value is greater than a second threshold value, the second threshold value being the sum of the first threshold value and a preset value; and
if the second scoring average value is greater than the second threshold value, updating the registration voices according to the verification voice.
Optionally, the step of calculating a feature speech vector of each user registration speech in the preset number of user registration speeches includes:
extracting the MFCC features of each frame of each voice by using the MFCC method and forming a matrix; and
screening out the most core features in the matrix by using a UBM and a feature speech vector extractor to form the feature speech vector.
Optionally, the step of updating the registration voice according to the verification voice further includes:
judging whether the second scoring average value is greater than a third threshold value, the third threshold value being greater than the second threshold value; and
updating the verification voice whose second scoring average value is greater than the third threshold value to a registration voice, and performing registration according to the updated registration voices.
Optionally, the third threshold is determined by:
selecting all verification voices whose second scoring average value is higher than the second threshold value, and counting their number as N;
sorting the second scoring average values corresponding to the selected verification voices from high to low; and
selecting the (N/3)-th second scoring average value as the third threshold value.
In addition, to achieve the above object, the present invention further provides a voice recognition method for updating voiceprint data, which is applied to a terminal device, and the method includes:
registering a preset number of user registration voices, and calculating a feature voice vector of each user registration voice in the preset number of user registration voices;
comparing the feature voice vectors of the user registration voices with one another in pairs and scoring each pair, and taking the first scoring average value as a first threshold value;
obtaining a verification voice and calculating a feature voice vector of the verification voice;
comparing the feature voice vector of the verification voice with the feature voice vector of each registration voice, scoring each pair, and obtaining a second scoring average value;
judging whether the second scoring average value is greater than a second threshold value, the second threshold value being the sum of the first threshold value and a preset value; and
if the second scoring average value is greater than the second threshold value, updating the registration voices according to the verification voice.
Optionally, the step of calculating a feature speech vector of each user registration speech in the preset number of user registration speeches includes:
extracting the MFCC features of each frame of each voice by using the MFCC method and forming a matrix; and
screening out the most core features in the matrix by using a UBM and a feature speech vector extractor to form the feature speech vector.
Optionally, the step of comparing the feature speech vectors of the user registration speeches in pairs and scoring each pair includes:
comparing and scoring the feature voice vectors of the voices in pairs by using a vector dot product algorithm and a PLDA algorithm.
Optionally, the step of updating the registration voice according to the verification voice further includes:
judging whether the second scoring average value is greater than a third threshold value, the third threshold value being greater than the second threshold value; and
updating the verification voice whose second scoring average value is greater than the third threshold value to a registration voice, and performing registration according to the updated registration voices.
Optionally, the third threshold is determined by:
selecting all verification voices whose second scoring average value is higher than the second threshold value, and counting their number as N;
sorting the second scoring average values corresponding to the selected verification voices from high to low; and
selecting the (N/3)-th second scoring average value as the third threshold value.
Further, to achieve the above object, the present invention also provides a storage medium storing a voice recognition program for updating voiceprint data, the voice recognition program for updating voiceprint data being executable by at least one processor to cause the at least one processor to execute the steps of the voice recognition method for updating voiceprint data as described above.
Compared with the prior art, the voice recognition method, terminal device and storage medium for updating voiceprint data provided by the invention first register a preset number of user registration voices and calculate a feature voice vector of each of them; second, they compare the feature voice vectors of the registration voices in pairs, score each pair, and take the first scoring average value as a first threshold value; then they obtain a verification voice and calculate its feature voice vector; next, they compare the feature voice vector of the verification voice with the feature voice vector of each registration voice, score each pair, and obtain a second scoring average value; further, they judge whether the second scoring average value is greater than a second threshold value, the second threshold value being the first threshold value plus a preset value; and finally, if the second scoring average value is greater than the second threshold value, they update the preset number of user registration voices according to the verification voice. This overcomes the defect of the conventional voice recognition method, in which every verification is compared against the i-vector captured at registration even though the user's voice may change due to factors such as age, physical condition and environment, causing verification failures. Because the comparison and update procedure runs at every verification, each verification that meets the requirements also refreshes the user's registration information in the voiceprint library, which improves the accuracy of subsequent voiceprint verification and adapts to the way a registrant's voice fluctuates over time.
Drawings
FIG. 1 is a diagram of an operating environment of a terminal device according to a preferred embodiment of the present invention;
FIG. 2 is a block diagram of an embodiment of a voice recognition program for updating voiceprint data in accordance with the present invention;
FIG. 3 is a flowchart of an embodiment of a voice recognition method for updating voiceprint data according to the present invention.
The implementation, functional features and advantages of the present invention will be further explained with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning by themselves. Thus, "module", "component" and "unit" may be used interchangeably.
The terminal device may be implemented in various forms. For example, the terminal device described in the present invention may include a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and a fixed terminal such as a Digital TV, a desktop computer, and the like.
The following description takes a mobile terminal as an example, and those skilled in the art will understand that the construction according to the embodiments of the present invention can also be applied to fixed terminals, except for elements used specifically for mobile purposes.
Fig. 1 is a diagram illustrating an operating environment of a terminal device 100 according to a preferred embodiment of the invention. The terminal device 100 includes a voice recognition program 300 for updating voiceprint data, a memory 20, a processor 30, a sensing unit 40, and the like. The sensing unit 40 may be any of various sensors that capture the user's voice, and is mainly used to acquire the verification voice.
The memory 20 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The processor 30 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
Based on the operating environment diagram of the terminal device 100, various embodiments of the method of the present invention are provided.
First, the present invention provides a speech recognition program 300 for updating voiceprint data.
Referring to FIG. 2, a block diagram of a first embodiment of the voice recognition program 300 for updating voiceprint data is shown.
In this embodiment, the voice recognition program 300 for updating voiceprint data includes a series of computer program instructions stored in the memory 20, which when executed by the processor 30, can implement the voice recognition operation for updating voiceprint data according to embodiments of the present invention. In some embodiments, the speech recognition program 300 that updates the voiceprint data can be divided into one or more modules based on the particular operations implemented by the portions of the computer program instructions. For example, in fig. 2, the voice recognition program 300 for updating voiceprint data can be divided into a registration module 301, a first comparison module 302, an acquisition module 303, a second comparison module 304, a determination module 305, and an update module 306. Wherein:
the registration module 301 is configured to register a preset number of registration voices, and calculate a feature voice vector of each registration voice in the preset number of registration voices. For example, the registration module 301 registers N registration voices. In this embodiment, the voice recognition program 300 for updating the voiceprint data is stored in the terminal device 100, and the terminal device 100 of this embodiment may be any terminal with a voice recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal, an access control device, and the like, and these devices may implement some specific functions and applications through a voice recognition technology. In addition, the terminal device 100 obtains the valid voice when the user performs voice registration, and the valid voice can be obtained from the time when the user clicks the voice input until the user stops the voice input, so that some unnecessary noise interference can be avoided, and the purity of the voice sample to be processed can be improved. The N registered voices are preferably 3 registered voices, but N may be selected as another suitable positive integer as needed.
In this embodiment, the registration module 301 calculates the feature speech vector of each registration speech in the preset number of registration speeches respectively by:
the registration module 301 extracts the MFCC features of each frame of speech in each piece of speech by using the mel-frequency cepstrum coefficient MFCC method and forms a matrix, and screens out the most core features in the matrix by using a Universal Background Model (UBM) and a feature speech vector (i-vector) extractor (extractor) to form the feature speech vector.
Where MFCC is an abbreviation for Mel-Frequency Cepstral Coefficients, comprising two key steps: conversion to mel frequency and then cepstrum analysis. In the embodiment, each voice is subjected to voice framing to obtain voice spectrums of a plurality of frames; then, the acquired frequency spectrum is processed by a Mel filter bank to obtain a Mel frequency spectrum, wherein the Mel filter bank can convert non-uniform frequency into uniform frequency; and finally, performing cepstrum analysis on the Mel frequency spectrum to obtain Mel frequency cepstrum coefficients MFCC which are the characteristics of the frame of voice, wherein the cepstrum analysis is to logarithm the Mel frequency spectrum and perform inverse transformation, the actual inverse transformation is generally realized by DCT discrete cosine transformation, and the 2 nd to 13 th coefficients after DCT are taken as MFCC coefficients. Thus, the MFCCs of each frame of speech are combined into a vector matrix, and the most core vector in the matrix is selected by a background model (UBM) and a feature speech vector (i-vector) extractor (extractor), and the vector is used as the feature speech vector of the speech, wherein the most core vector in the matrix is selected by the background model (UBM) and the feature speech vector (i-vector) extractor (extractor) and belongs to the existing calculation method of vector matrix calculation, which is not repeated herein.
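As a rough illustration of this feature-extraction step, the sketch below frames an utterance and computes its MFCC matrix with librosa; dropping the 0th coefficient to approximate the "2nd to 13th DCT coefficients" above is an interpretation, and `ubm` and `ivector_extractor` (with its `extract` method) are hypothetical stand-ins for the trained UBM and i-vector extractor that the description assumes already exist.

```python
import numpy as np
import librosa

def mfcc_matrix(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Frame the utterance and return an (n_frames, n_coeffs) MFCC matrix."""
    y, _ = librosa.load(wav_path, sr=sr)
    # librosa applies the mel filter bank, log, and DCT described above.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    return mfcc[1:].T  # drop c0, keeping 12 coefficients per frame (assumption)

def extract_feature_vector(wav_path: str, ubm, ivector_extractor) -> np.ndarray:
    """Map the MFCC matrix to a single feature voice vector (i-vector).
    `ubm` and `ivector_extractor.extract` are hypothetical placeholders."""
    frames = mfcc_matrix(wav_path)
    return ivector_extractor.extract(ubm, frames)
```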
The first comparison module 302 is configured to compare the feature voice vectors of the registration voices in pairs, score each pair, and take the first scoring average value as a first threshold value. Specifically, the first comparison module 302 performs the pairwise comparison and scoring with a vector dot-product algorithm and a PLDA algorithm. Both are existing algorithms and are not described further here.
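A minimal sketch of this step, using only the dot product as the pairwise score (the PLDA scoring mentioned above is not shown) and assuming the feature voice vectors are NumPy arrays:

```python
from itertools import combinations
import numpy as np

def pairwise_scores(vectors: list) -> list:
    """Dot-product score for every pair of enrolled feature voice vectors."""
    return [float(np.dot(a, b)) for a, b in combinations(vectors, 2)]

def first_threshold(enrolled_vectors: list) -> float:
    """First threshold = average of all pairwise enrollment scores."""
    scores = pairwise_scores(enrolled_vectors)  # 3 scores when 3 voices are enrolled
    return sum(scores) / len(scores)
```

With three enrolled voices there are C(3, 2) = 3 pairs, so the first threshold is the mean of three scores.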
The obtaining module 303 is configured to obtain a verification voice, and calculate a feature voice vector of the verification voice. In this embodiment, after obtaining the verification speech, the obtaining module 303 calculates the feature speech vector of the verification speech by using the MFCC algorithm, the UBM model and the vector extractor.
The second comparison module 304 is configured to compare the feature voice vector of the verification voice with the feature voice vector of each registration voice, score each pair, and obtain a second scoring average value. In this embodiment, the feature voice vector of the verification voice is compared and scored against each of the feature voice vectors of the registration voices, producing one second scoring value per registration voice, and the average of these values is the second scoring average value. For example, when the registration module 301 has registered 3 registration voices, the corresponding feature voice vectors form a group of 3 vectors, while the verification process yields a single verification voice and its one feature voice vector. The second comparison module 304 then scores the verification vector against each of the 3 vectors in the group, obtaining three scoring values, and averages them to obtain the second scoring average value.
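Continuing the sketch above (dot-product scoring assumed, PLDA omitted), the second scoring average value can be computed as:

```python
import numpy as np

def second_score_average(enrolled_vectors: list, verification_vector: np.ndarray) -> float:
    """Average score between the verification vector and each enrolled vector
    (three scores when three voices are enrolled)."""
    scores = [float(np.dot(v, verification_vector)) for v in enrolled_vectors]
    return sum(scores) / len(scores)
```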
The determining module 305 is configured to determine whether the second scoring average value is greater than a second threshold value. In this embodiment, the second threshold value is the sum of the first threshold value and a preset value, so it is greater than the first threshold value; the preset difference can be set by the user according to actual conditions through repeated experiments.
The updating module 306 is configured to update the registration voice according to the verification voice when the second scored average value is greater than the second threshold. In this embodiment, the updating module 306 updates the registration voice by the following method:
the update module 306 further determines whether the second scored average value is greater than a third threshold, where the third threshold is greater than the second threshold. The updating module 306 updates the verification speech with the second scored average value larger than the third threshold value to the registration speech, and performs registration according to the updated registration speech. Wherein the third threshold may be determined by:
the updating module 306 selects all verification voices with the second scoring average value higher than the second threshold value, counts the verification voices as N, and then sorts the second scoring average values corresponding to the selected verification voices from high to low; and finally, selecting the second scoring average value of the Nth/3 th as the third threshold value. And dynamically setting a third threshold value by utilizing a second scoring average value corresponding to the verification voice in the continuous verification process, and further dynamically updating the registration voice according to the third threshold value and ensuring that the registration voice can change according to the change of the user in different periods.
It should be noted that the three thresholds satisfy: the first threshold value is smaller than the second threshold value, and the second threshold value is smaller than the third threshold value. The first threshold value (the first scoring average value) essentially measures the average difference among the registration voices, while the second scoring average value essentially reflects the difference between the verification voice and the registration voices. If the verification voice differs from the registration voices by more than the average difference among the registration voices plus the preset value, the verification voice differs noticeably from the earlier registrations, and the registration voices need to be updated. The third threshold value, set above the second threshold value, further screens out the verification voices that differ markedly from the registration voices, and the registration voices are updated according to those verification voices.
By executing the program modules 301 to 306, the defect of the existing voice recognition method, in which every verification is compared against the i-vector captured at registration even though the user's voice may change over time due to factors such as age, physical condition and environment, causing verification failures, can be overcome. The comparison and update procedure runs at every verification, so each verification that meets the requirements also refreshes the user's registration information in the voiceprint library, which improves the accuracy of subsequent voiceprint verification and adapts to the way a registrant's voice fluctuates over time.
In addition, the invention also provides a voice recognition method for updating the voiceprint data.
Fig. 3 is a schematic flow chart showing the implementation of the first embodiment of the voice recognition method for updating voiceprint data according to the present invention. In this embodiment, the execution order of the steps in the flowchart shown in fig. 3 may be changed and some steps may be omitted according to different requirements.
Step S401, register a preset number of registration voices, and calculate a feature voice vector of each registration voice in the preset number of registration voices. For example, the terminal device 100 registers N registration voices. In this embodiment, the voice recognition method for updating voiceprint data runs on the terminal device 100, which may be any terminal with a voice recognition function, such as a mobile phone, portable computer, personal digital assistant, bank payment terminal or access control device; such devices can implement specific functions and applications through voice recognition technology. In addition, the terminal device 100 captures only the valid voice when the user performs voice registration: the valid voice is recorded from the moment the user starts voice input until the user stops it, which avoids unnecessary noise interference and improves the purity of the voice sample to be processed. N is preferably 3, but another suitable positive integer may be chosen as needed.
In this embodiment, the terminal device 100 calculates the feature speech vector of each registered speech in the preset number of registered speeches respectively by:
the terminal device 100 extracts the MFCC features of each frame of voice in each voice by using the mel-frequency cepstrum coefficient MFCC method and forms a matrix, and screens out the most core features in the matrix by using a Universal Background Model (UBM) and a feature voice vector (i-vector) extractor (extractor) to form the feature voice vector.
Where MFCC is an abbreviation for Mel-Frequency Cepstral Coefficients, comprising two key steps: conversion to mel frequency and then cepstrum analysis. In the embodiment, each voice is subjected to voice framing to obtain voice spectrums of a plurality of frames; then, the acquired frequency spectrum is processed by a Mel filter bank to obtain a Mel frequency spectrum, wherein the Mel filter bank can convert non-uniform frequency into uniform frequency; and finally, performing cepstrum analysis on the Mel frequency spectrum to obtain Mel frequency cepstrum coefficients MFCC, wherein the MFCC is the characteristics of the frame of voice, the cepstrum analysis is to logarithm the Mel frequency spectrum and then perform inverse transformation, the inverse transformation is generally realized by DCT discrete cosine transformation, and the coefficients from 2 nd to 13 th after DCT are taken as MFCC coefficients. Thus, the MFCC of each frame of voice forms a vector matrix, the most core vector in the matrix is screened out through a background model (UBM) and a feature voice vector (i-vector) extractor (extractor), and the vector is taken as the feature voice vector of the voice, wherein the most core vector in the matrix is screened out through the background model (UBM) and the feature voice vector (i-vector) extractor (extractor) and belongs to the existing data algorithm of vector matrix calculation, and the description is not repeated herein.
Step S402, compare the feature voice vectors of the registration voices in pairs, score each pair, and take the first scoring average value as a first threshold value. Specifically, the terminal device 100 performs the pairwise comparison and scoring with a vector dot-product algorithm and a PLDA algorithm. Both are existing algorithms and are not described further here.
Step S403, obtain a verification voice and calculate the feature voice vector of the verification voice. In this embodiment, after obtaining the verification voice, the terminal device 100 calculates its feature voice vector by using the MFCC algorithm, the UBM model and the vector extractor.
Step S404, compare the feature voice vector of the verification voice with the feature voice vector of each registration voice, score each pair, and obtain a second scoring average value. The feature voice vector of the verification voice is compared and scored against each of the feature voice vectors of the registration voices, producing one second scoring value per registration voice, and the average of these values is the second scoring average value. For example, when a preset number of registration voices have been registered, say 3, the corresponding feature voice vectors form a group of 3 vectors, while the verification process yields a single verification voice and its one feature voice vector. The terminal device 100 then scores the verification vector against each of the 3 vectors in the group, obtaining three scoring values, and averages them to obtain the second scoring average value.
Step S405, determine whether the second scoring average value is greater than a second threshold value. In this embodiment, the second threshold value is the sum of the first threshold value and a preset value, so it is greater than the first threshold value; the preset difference can be set by the user according to actual conditions through repeated experiments. If the second scoring average value is greater than the second threshold value, step S406 is executed; otherwise, the process ends.
Step S406, updating the registration voice according to the verification voice. In this embodiment, the terminal device 100 further updates the registration voice by:
the terminal device 100 first determines whether the second scored average value is greater than a third threshold, where the third threshold is greater than the second threshold. Then, the terminal device 100 updates the verification voice, of which the second scored average value is greater than the third threshold value, to the registration voice, and performs registration according to the updated registration voice. In this embodiment, the third threshold is determined by:
the terminal device 100 selects all verification voices with the second scoring average value higher than a second threshold value, counts the verification voices to be N, and then sorts the second scoring average values corresponding to the selected verification voices from high to low; and finally, selecting the second scoring average value of the Nth/3 th as the third threshold value. And dynamically setting a third threshold value by utilizing a second scoring average value corresponding to the verification voice in the continuous verification process, and further dynamically updating the registration voice according to the third threshold value and ensuring that the registration voice can change according to the change of the user in different periods.
It should be noted that the three thresholds satisfy: the first threshold value is smaller than the second threshold value, and the second threshold value is smaller than the third threshold value. The first threshold value (the first scoring average value) essentially measures the average difference among the registration voices, while the second scoring average value essentially reflects the difference between the verification voice and the registration voices. If the verification voice differs from the registration voices by more than the average difference among the registration voices plus the preset value, the verification voice differs noticeably from the earlier registrations, and the registration voices need to be updated. The third threshold value, set above the second threshold value, further screens out the verification voices that differ markedly from the registration voices, and the registration voices are updated according to those verification voices.
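Putting steps S404 to S406 together, the following sketch shows one possible shape of the decision flow; dot-product scoring, an append-style enrollment update and the integer-division reading of N/3 are all assumptions, since the text does not fix them:

```python
import numpy as np

def maybe_update_enrollment(enrolled: list,
                            verification_vector: np.ndarray,
                            first_threshold: float,
                            preset_value: float,
                            passed_averages: list) -> bool:
    """Run one verification and update the enrollment set if warranted.
    Returns True when the second threshold (step S405) is passed.
    `passed_averages` accumulates second scoring averages that passed the
    second threshold; it is used to derive the dynamic third threshold."""
    second_avg = float(np.mean([np.dot(v, verification_vector) for v in enrolled]))
    second_threshold = first_threshold + preset_value
    if second_avg <= second_threshold:
        return False  # step S405 fails, nothing to update

    passed_averages.append(second_avg)
    ranked = sorted(passed_averages, reverse=True)
    third_threshold = ranked[max(len(ranked) // 3 - 1, 0)]
    if second_avg > third_threshold:
        enrolled.append(verification_vector)  # update the registration voices (step S406)
    return True
```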
Through steps S401 to S406, the voice recognition method for updating voiceprint data provided by the invention overcomes the defect of the existing voice recognition method, in which every verification is compared against the i-vector captured at registration even though the user's voice may change over time due to factors such as age, physical condition and environment, causing verification failures. The comparison and update procedure runs at every verification, so each verification that meets the requirements also refreshes the user's registration information in the voiceprint library, which improves the accuracy of subsequent voiceprint verification and adapts to the way a registrant's voice fluctuates over time.
The present invention further provides another embodiment: a storage medium storing a voice recognition program for updating voiceprint data, the program being executable by at least one processor to cause the at least one processor to perform the steps of the voice recognition method for updating voiceprint data described above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. A voice recognition method for updating voiceprint data is applied to a terminal device, and is characterized by comprising the following steps:
registering a preset number of registered voices and calculating a feature voice vector of each registered voice in the preset number of registered voices;
comparing the feature voice vectors of the registered voices in pairs and scoring each pair, and taking the first scoring average value as a first threshold value;
obtaining a verification voice and calculating a feature voice vector of the verification voice;
comparing the feature voice vector of the verification voice with the feature voice vector of each registered voice, scoring each pair, and obtaining a second scoring average value;
judging whether the second scoring average value is greater than a second threshold value, wherein the second threshold value is the sum of the first threshold value and a preset value; and
if the second scoring average is greater than the second threshold, updating the registration voice according to the verification voice, including: judging whether the second scoring average value is larger than a third threshold value, updating the verification voice with the second scoring average value larger than the third threshold value into the registration voice, and registering according to the updated registration voice, wherein the third threshold value is larger than the second threshold value, and the third threshold value is determined through the following steps:
selecting all verification voices whose second scoring average value is higher than the second threshold value, and counting their number as N;
sorting the second scoring average values corresponding to the selected verification voices from high to low; and
selecting the (N/3)-th second scoring average value as the third threshold value.
2. The speech recognition method for updating voiceprint data according to claim 1, wherein the step of calculating the feature speech vector of each of the predetermined number of registered speeches includes:
extracting the MFCC features of each frame of each speech by using the MFCC method and forming a matrix; and
screening out the most core features in the matrix by using a UBM and a feature speech vector extractor to form the feature speech vector.
3. The speech recognition method for updating voiceprint data according to claim 1, wherein the step of comparing the feature speech vectors of the registered speeches in pairs and scoring each pair comprises:
comparing and scoring the feature speech vectors of the registered speeches in pairs by using a vector dot product algorithm and a PLDA algorithm.
4. A terminal device, comprising a memory and a processor, wherein the memory stores a voice recognition program for updating voiceprint data, the voice recognition program being executable on the processor, and the voice recognition program for updating voiceprint data implements the following steps when executed by the processor:
registering a preset number of registered voices and calculating a feature voice vector of each registered voice in the preset number of registered voices;
comparing the feature voice vectors of the registered voices in pairs and scoring each pair, and taking the first scoring average value as a first threshold value;
obtaining a verification voice and calculating a feature voice vector of the verification voice;
comparing the feature voice vector of the verification voice with the feature voice vector of each registered voice, scoring each pair, and obtaining a second scoring average value;
judging whether the second scoring average value is greater than a second threshold value, wherein the second threshold value is the sum of the first threshold value and a preset value; and
if the second scoring average is greater than the second threshold, updating the registration voice according to the verification voice, including: judging whether the second scoring average value is larger than a third threshold value, updating the verification voice with the second scoring average value larger than the third threshold value into the registration voice, and registering according to the updated registration voice, wherein the third threshold value is larger than the second threshold value, and the third threshold value is determined through the following steps:
selecting all verification voices whose second scoring average value is higher than the second threshold value, and counting their number as N;
sorting the second scoring average values corresponding to the selected verification voices from high to low; and
selecting the (N/3)-th second scoring average value as the third threshold value.
5. The terminal apparatus according to claim 4, wherein the step of calculating the feature speech vector of each of the preset number of registered speeches comprises:
extracting the MFCC features of each frame of each speech by using the MFCC method and forming a matrix; and
screening out the most core features in the matrix by using a UBM and a feature speech vector extractor to form the feature speech vector.
6. A storage medium storing a speech recognition program for updating voiceprint data, the speech recognition program for updating voiceprint data being executable by at least one processor to cause the at least one processor to perform the steps of the method for speech recognition of updating voiceprint data according to any one of claims 1 to 3.
CN201810030623.1A 2018-01-12 2018-01-12 Voice recognition method for updating voiceprint data, terminal device and storage medium Active CN108269575B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810030623.1A CN108269575B (en) 2018-01-12 2018-01-12 Voice recognition method for updating voiceprint data, terminal device and storage medium
PCT/CN2018/089415 WO2019136911A1 (en) 2018-01-12 2018-05-31 Voice recognition method for updating voiceprint data, terminal device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810030623.1A CN108269575B (en) 2018-01-12 2018-01-12 Voice recognition method for updating voiceprint data, terminal device and storage medium

Publications (2)

Publication Number Publication Date
CN108269575A CN108269575A (en) 2018-07-10
CN108269575B true CN108269575B (en) 2021-11-02

Family

ID=62775513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810030623.1A Active CN108269575B (en) 2018-01-12 2018-01-12 Voice recognition method for updating voiceprint data, terminal device and storage medium

Country Status (2)

Country Link
CN (1) CN108269575B (en)
WO (1) WO2019136911A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400567B (en) * 2019-07-30 2021-10-19 深圳秋田微电子股份有限公司 Dynamic update method for registered voiceprint and computer storage medium
CN110660398B (en) * 2019-09-19 2020-11-20 北京三快在线科技有限公司 Voiceprint feature updating method and device, computer equipment and storage medium
CN111599365B (en) * 2020-04-08 2023-05-05 云知声智能科技股份有限公司 Adaptive threshold generation system and method for voiceprint recognition system
CN111785280A (en) * 2020-06-10 2020-10-16 北京三快在线科技有限公司 Identity authentication method and device, storage medium and electronic equipment
CN112289322B (en) * 2020-11-10 2022-11-15 思必驰科技股份有限公司 Voiceprint recognition method and device
CN112487804B (en) * 2020-11-25 2024-04-19 合肥三恩信息科技有限公司 Chinese novel speech synthesis system based on semantic context scene
TWI787996B (en) * 2021-09-08 2022-12-21 華南商業銀行股份有限公司 Voiceprint identification device for financial transaction system and method thereof
TWI817897B (en) * 2021-09-08 2023-10-01 華南商業銀行股份有限公司 Low-noise voiceprint identification device for financial transaction system and method thereof

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1905445B (en) * 2005-07-27 2012-02-15 国际商业机器公司 System and method of speech identification using mobile speech identification card
CN102760434A (en) * 2012-07-09 2012-10-31 华为终端有限公司 Method for updating voiceprint feature model and terminal
WO2016015687A1 (en) * 2014-07-31 2016-02-04 腾讯科技(深圳)有限公司 Voiceprint verification method and device
CN104616655B (en) * 2015-02-05 2018-01-16 北京得意音通技术有限责任公司 The method and apparatus of sound-groove model automatic Reconstruction
CN106157959B (en) * 2015-03-31 2019-10-18 讯飞智元信息科技有限公司 Sound-groove model update method and system
CN107424614A (en) * 2017-07-17 2017-12-01 广东讯飞启明科技发展有限公司 A kind of sound-groove model update method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104184587A (en) * 2014-08-08 2014-12-03 腾讯科技(深圳)有限公司 Voiceprint generation method, voiceprint generation server, client and voiceprint generation system
CN106782564A (en) * 2016-11-18 2017-05-31 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech data
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition

Also Published As

Publication number Publication date
CN108269575A (en) 2018-07-10
WO2019136911A1 (en) 2019-07-18

Similar Documents

Publication Publication Date Title
CN108269575B (en) Voice recognition method for updating voiceprint data, terminal device and storage medium
WO2019179036A1 (en) Deep neural network model, electronic device, identity authentication method, and storage medium
US11875799B2 (en) Method and device for fusing voiceprint features, voice recognition method and system, and storage medium
US6772119B2 (en) Computationally efficient method and apparatus for speaker recognition
WO2019179029A1 (en) Electronic device, identity verification method and computer-readable storage medium
WO2019134247A1 (en) Voiceprint registration method based on voiceprint recognition model, terminal device, and storage medium
US11545154B2 (en) Method and apparatus with registration for speaker recognition
CN110556126B (en) Speech recognition method and device and computer equipment
WO2019200744A1 (en) Self-updated anti-fraud method and apparatus, computer device and storage medium
WO2019136912A1 (en) Electronic device, identity authentication method and system, and storage medium
US11062120B2 (en) High speed reference point independent database filtering for fingerprint identification
CN113223536B (en) Voiceprint recognition method and device and terminal equipment
US20100045787A1 (en) Authenticating apparatus, authenticating system, and authenticating method
US6389392B1 (en) Method and apparatus for speaker recognition via comparing an unknown input to reference data
CN108694952B (en) Electronic device, identity authentication method and storage medium
CN112735437A (en) Voiceprint comparison method, system and device and storage mechanism
WO2019136811A1 (en) Audio comparison method, and terminal and computer-readable storage medium
CN113053395A (en) Pronunciation error correction learning method and device, storage medium and electronic equipment
CN108630208B (en) Server, voiceprint-based identity authentication method and storage medium
CN111640438A (en) Audio data processing method and device, storage medium and electronic equipment
JP6996627B2 (en) Information processing equipment, control methods, and programs
JP2014182270A (en) Information processor and information processing method
CN113035230A (en) Authentication model training method and device and electronic equipment
JP2543528B2 (en) Voice recognition device
CN117313723B (en) Semantic analysis method, system and storage medium based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant