WO2019136911A1 - Voice recognition method for updating voiceprint data, terminal device, and storage medium - Google Patents

Voice recognition method for updating voiceprint data, terminal device, and storage medium Download PDF

Info

Publication number
WO2019136911A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice
speech
threshold
verification
feature
Prior art date
Application number
PCT/CN2018/089415
Other languages
French (fr)
Chinese (zh)
Inventor
王健宗
郑斯奇
于夕畔
肖京
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2019136911A1 publication Critical patent/WO2019136911A1/en

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/06Decision making techniques; Pattern matching strategies
    • G10L17/12Score normalisation
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/02Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction

Definitions

  • the present application relates to the field of voice recognition, and in particular, to a voice recognition method, a terminal device, and a storage medium for updating voiceprint data.
  • The conventional voiceprint registration and recognition method generally includes the following steps: 1. Feature extraction: after the user's registration voice data is obtained, acoustic features are extracted from it. 2. Authentication-vector generation. 3. Comparison and verification: the feature vector (i-vector) from the user's registration is retained in the voiceprint library, and at each verification the i-vector extracted from the verification voice is compared with the registration-time i-vector; that is, a predetermined cosine-distance formula computes the distance between the current authentication vector and the user's registration vector. If the distance is within the set threshold range, the two i-vectors are considered to be produced by the same person's voice and verification succeeds; otherwise, failure is returned.
  • However, under this conventional registration verification, every comparison uses the authentication vector generated when the user first registered a voice. Over time, the user's voice may change due to factors such as age, physical condition, and environment, so performing each verification against the registration-time i-vector may cause verification to fail.
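The conventional comparison step described above can be sketched as follows. This is a minimal illustration only: cosine similarity is assumed as the distance measure, and the vectors and threshold are made-up placeholders, not values from the application.

```python
import numpy as np

def cosine_score(enroll_ivec, test_ivec):
    # Cosine similarity between the enrollment i-vector kept in the
    # voiceprint library and the i-vector extracted from the verification voice.
    a, b = np.asarray(enroll_ivec, float), np.asarray(test_ivec, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def verify(enroll_ivec, test_ivec, threshold=0.8):
    # Verification succeeds when the score falls within the set threshold
    # range; the threshold value here is an arbitrary placeholder.
    return cosine_score(enroll_ivec, test_ivec) >= threshold

accepted = verify([0.2, 0.9, 0.1], [0.25, 0.85, 0.12])
```

Note the drawback the text points out: `enroll_ivec` here never changes after the first registration, so a gradually drifting voice can fall below the threshold.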
  • the present application proposes a voice recognition method, a terminal device, and a storage medium for updating voiceprint data.
  • the present application provides a terminal device, where the terminal device includes a memory and a processor, and the memory stores a voice recognition program for updating voiceprint data that can be run on the processor.
  • the speech recognition program for updating the voiceprint data is executed by the processor to implement the following steps:
  • the registration voice is updated according to the verification voice.
  • the present application further provides a voice recognition method for updating voiceprint data, which is applied to a terminal device, and the method includes:
  • the registration voice is updated according to the verification voice.
  • the present application further provides a storage medium storing a voice recognition program for updating voiceprint data, and the voice recognition program for updating voiceprint data may be executed by at least one processor.
  • The voice recognition method, terminal device, and storage medium for updating voiceprint data proposed by the present application first register a preset number of user registration voices and calculate the feature speech vector of each of them; next, the feature speech vectors of the registration voices are scored pairwise against one another, and the first score average is taken as a first threshold; then a verification voice is acquired and its feature speech vector is calculated; the feature speech vector of the verification voice is then scored against the feature speech vector of each registration voice, and a second score average is obtained; it is further determined whether the second score average is greater than a second threshold, where the second threshold is greater than the first threshold; finally, if the second score average is greater than the second threshold, the preset number of user registration voices is updated according to the verification voice.
  • In existing speech recognition methods, the user's voice may change over time due to factors such as age, physical condition, and environment, and performing each verification against the registration-time i-vector may therefore cause verification to fail.
  • With the present method, each verification by the user follows a compare-and-update process: whenever a verification meets the requirements, the user's registration information in the voiceprint library is updated. This improves the accuracy of subsequent voiceprint verification and adapts to changes in the registrant's voice over time.
  • FIG. 1 is a diagram showing an operating environment of a terminal device according to a preferred embodiment of the present application.
  • FIG. 2 is a program block diagram of an embodiment of a speech recognition program for updating voiceprint data according to the present application.
  • FIG. 3 is a flow chart of an embodiment of a voice recognition method for updating voiceprint data according to the present application.
  • the terminal device can be implemented in various forms.
  • The terminal device described in the present application may include mobile terminals such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a personal digital assistant (PDA), a portable media player (PMP), a navigation device, wearable devices, smart bracelets, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
  • FIG. 1 is a diagram showing an operating environment of a terminal device 100 according to a preferred embodiment of the present application.
  • The terminal device 100 also includes a voice recognition program 300 that updates voiceprint data, a memory 20, a processor 30, a sensing unit 40, and the like.
  • the sensing unit 40 may be various sensors that sense user voices, and are mainly used to acquire verification voices.
  • The memory 20 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card-type memory (e.g., SD or DX memory), a random access memory (RAM), a static random access memory (SRAM), a read-only memory (ROM), an electrically erasable programmable read-only memory (EEPROM), a programmable read-only memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like.
  • the processor 30 can be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or other data processing chip.
  • the present application proposes a speech recognition program 300 for updating voiceprint data.
  • FIG. 2 is a program module diagram of the first embodiment of the speech recognition program 300 for updating voiceprint data in the present application.
  • The voice recognition program 300 for updating voiceprint data comprises a series of computer program instructions stored in the memory 20; when these computer program instructions are executed by the processor 30, the embodiments of the present application can be implemented.
  • The speech recognition program 300 that updates the voiceprint data can be divided into one or more modules based on the particular operations implemented by the various portions of the computer program instructions. For example, in FIG. 2, the voice recognition program 300 for updating voiceprint data may be divided into a registration module 301, a first comparison module 302, an acquisition module 303, a second comparison module 304, a determination module 305, and an update module 306, wherein:
  • the registration module 301 is configured to register a preset number of registered voices, and calculate a feature voice vector of each registered voice in the preset number of registered voices. For example, the registration module 301 registers N registered voices.
  • the voice recognition program 300 for updating the voiceprint data is stored in the terminal device 100.
  • The terminal device 100 of this embodiment may be any terminal having a voice recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal, or an access control device; these devices can implement specific functions and applications through voice recognition technology.
  • The terminal device 100 acquires an effective voice when the user performs voice registration; acquisition can start when the user clicks to begin voice input and continue until the user stops the voice input, thereby avoiding unnecessary noise interference and improving the purity of the voice sample to be processed.
  • The above N registered voices are preferably three; of course, N may be set to another suitable positive integer as needed.
  • the registration module 301 separately calculates a feature speech vector of each registered voice in the preset number of registered voices by:
  • The registration module 301 extracts the MFCC features of each frame of each voice using the Mel-frequency cepstral coefficient (MFCC) method to form a matrix, and then applies a universal background model (UBM) and a feature speech vector (i-vector) extractor.
  • MFCC is an abbreviation of Mel-Frequency Cepstral Coefficients, and its computation involves two key steps: conversion to the Mel frequency scale, followed by cepstrum analysis.
  • First, each voice is segmented into frames to obtain a multi-frame speech spectrum; the spectrum is then passed through a Mel filter bank to obtain the Mel spectrum, whereby the non-uniformly spaced (linear) frequencies are converted to a uniform spacing on the Mel scale. Finally, cepstrum analysis is performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), which constitute the features of that frame of speech.
  • Cepstrum analysis consists of taking the logarithm of the Mel spectrum and then applying an inverse transform; in practice the inverse transform is generally implemented as a discrete cosine transform (DCT), and the second through thirteenth DCT coefficients are taken as the MFCC coefficients.
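The pipeline just described (Mel filter bank, logarithm, DCT, coefficients two through thirteen) can be sketched in NumPy as follows. This is a simplified illustration: framing, pre-emphasis, and windowing are omitted, and the sample rate, FFT size, and filter count are illustrative assumptions rather than values taken from the application.

```python
import numpy as np

def hz_to_mel(f):
    return 2595.0 * np.log10(1.0 + f / 700.0)

def mel_to_hz(m):
    return 700.0 * (10.0 ** (m / 2595.0) - 1.0)

def mel_filterbank(n_filters, n_fft, sample_rate):
    # Triangular filters spaced uniformly on the Mel scale (non-uniform in Hz).
    mel_pts = np.linspace(hz_to_mel(0), hz_to_mel(sample_rate / 2), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_pts) / sample_rate).astype(int)
    fbank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):
            fbank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):
            fbank[i - 1, k] = (right - k) / max(right - center, 1)
    return fbank

def mfcc(frame, sample_rate=16000, n_filters=26, n_coeffs=12):
    # MFCC of one speech frame: power spectrum -> Mel filter bank -> log -> DCT,
    # keeping the second through thirteenth coefficients as described above.
    n_fft = len(frame)
    spectrum = np.abs(np.fft.rfft(frame)) ** 2
    mel_energies = mel_filterbank(n_filters, n_fft, sample_rate) @ spectrum
    log_mel = np.log(mel_energies + 1e-10)          # cepstrum step 1: logarithm
    # cepstrum step 2: DCT-II implemented directly as a cosine basis
    n = np.arange(n_filters)
    basis = np.cos(np.pi * np.outer(np.arange(n_filters), 2 * n + 1) / (2 * n_filters))
    cepstrum = basis @ log_mel
    return cepstrum[1:1 + n_coeffs]                 # coefficients 2..13

frame = np.sin(2 * np.pi * 300 * np.arange(512) / 16000)  # toy 512-sample frame
features = mfcc(frame)
```

Per the text, the per-frame vectors produced this way are stacked into the matrix that the UBM/i-vector extractor then operates on.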
  • The MFCCs of all frames of a voice are assembled into a vector matrix, and the most representative vector in this matrix is extracted with the universal background model (UBM) and the feature speech vector (i-vector) extractor; this vector serves as the feature speech vector of that voice. The UBM/i-vector extraction itself is an existing vector-matrix computation and is not repeated here.
  • The first comparison module 302 is configured to score the feature speech vectors of the registered voices pairwise against one another and to take the first score average as the first threshold. Specifically, the first comparison module 302 performs the pairwise scoring of the feature speech vectors using a vector dot-product algorithm and the PLDA algorithm.
  • The vector dot-product algorithm and the PLDA algorithm are existing algorithms and are not described in detail herein.
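As a sketch of this step, the code below uses a length-normalised dot product (cosine similarity) as a stand-in for the vector dot-product/PLDA scoring named in the text; the enrollment vectors are made-up examples.

```python
import numpy as np
from itertools import combinations

def score(u, v):
    # Cosine similarity as an assumed stand-in for the dot-product/PLDA backend.
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def first_threshold(enroll_vectors):
    # Score every pair of enrollment feature vectors; their mean is the
    # first threshold used later as the baseline for verification.
    pair_scores = [score(u, v) for u, v in combinations(enroll_vectors, 2)]
    return sum(pair_scores) / len(pair_scores)

enrolled = [[0.9, 0.1], [0.85, 0.2], [0.95, 0.05]]  # e.g. N = 3 registered voices
t1 = first_threshold(enrolled)
```

With three registered voices this averages the three pairwise scores, matching the pairwise comparison the module performs.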
  • the obtaining module 303 is configured to acquire a verification voice, and calculate a feature voice vector of the verification voice. In the present embodiment, after acquiring the verification speech, the acquisition module 303 also calculates the feature speech vector of the verification speech by using the MFCC algorithm and the UBM model and the vector extractor.
  • The second comparison module 304 is configured to score the feature speech vector of the verification voice against the feature speech vector of each registered voice and to obtain a second score average.
  • Specifically, the feature speech vector of the verification voice is scored against each of the feature speech vectors of the plurality of registered voices to obtain a plurality of corresponding second score values, which are then averaged to obtain the second score average.
  • For example, the registration module 301 registers a preset number of voices, say three, whose corresponding feature speech vectors form a feature-speech-vector group containing three vectors.
  • The second comparison module 304 then scores the feature speech vector of the single acquired verification voice separately against each of the three vectors in the group, obtaining three score values, which are averaged to yield the second score average.
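Continuing the sketch under the same assumptions (cosine similarity in place of the actual scoring backend, made-up vectors), the second score average is the mean of the verification vector's scores against each registered vector:

```python
import numpy as np

def score(u, v):
    # Cosine similarity as an assumed stand-in for the actual scoring backend.
    u, v = np.asarray(u, float), np.asarray(v, float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def second_average(enroll_vectors, verify_vector):
    # Score the single verification vector against each of the registered
    # feature vectors and average the resulting scores.
    scores = [score(e, verify_vector) for e in enroll_vectors]
    return sum(scores) / len(scores)

enrolled = [[0.9, 0.1], [0.85, 0.2], [0.95, 0.05]]  # three registered voices
avg = second_average(enrolled, [0.88, 0.12])        # one verification voice
```

The resulting `avg` is what the determination module compares against the second threshold.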
  • the determining module 305 is configured to determine whether the second score average is greater than a second threshold.
  • the second threshold is the first threshold plus a preset value.
  • The second threshold is greater than the first threshold; the preset value by which they differ can be tuned empirically through repeated experiments, and the present application does not limit this preset value.
  • the update module 306 is configured to update the registration voice according to the verification voice when the second score average is greater than the second threshold. In this embodiment, the update module 306 updates the registration voice in the following manner:
  • the update module 306 further determines whether the second score average is greater than a third threshold, and the third threshold is greater than the second threshold.
  • the update module 306 updates the verification voice whose second score average value is greater than the third threshold value into the registration voice, and registers according to the updated registration voice.
  • the third threshold can be determined by the following steps:
  • The update module 306 first selects all verification voices whose second score average is higher than the second threshold, counting them as N; it then sorts the second score averages of the selected verification voices from high to low; finally, it selects the (N/3)-th of these averages as the third threshold.
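A sketch of this selection step follows, under the assumption that "the N/3" means the (N/3)-th value of the descending sort; the exact indexing convention and the behaviour with very little history are not spelled out in the text and are assumptions here.

```python
def third_threshold(second_averages, second_threshold):
    # Select verification voices whose second score average exceeded the
    # second threshold, sort those averages from high to low, and take
    # the (N/3)-th one as the third threshold.
    passed = sorted((s for s in second_averages if s > second_threshold), reverse=True)
    n = len(passed)
    if n < 3:
        return None            # assumption: too little history, keep prior threshold
    return passed[n // 3 - 1]  # one reading of "the (N/3)-th" (1-based)

t3 = third_threshold([0.99, 0.95, 0.97, 0.91, 0.96, 0.93], 0.90)
```

Because the history of passed verifications grows over time, recomputing this value keeps the third threshold dynamic, as the next paragraph describes.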
  • The third threshold is thus set dynamically from the second score averages observed over successive verifications, so that the registration voice is updated dynamically according to the third threshold and can track the user's voice as it changes across different periods.
  • The magnitude relationship among the first, second, and third thresholds is: the first threshold is smaller than the second threshold, and the second threshold is smaller than the third threshold. The first threshold, being the first score average, essentially represents the average difference among the registered voices, while the second score average essentially reflects the difference between the verification voice and the registered voices; if the difference between the verification voice and the registered voices exceeds that average difference by more than the preset value, the verification voice differs considerably from the earlier registration voices, and the registration voice needs to be updated.
  • Setting a third threshold above the second threshold further filters out the verification voices that differ significantly from the registered voices, and the registration voice is then updated according to those verification voices.
  • the present application also proposes a voice recognition method for updating voiceprint data.
  • FIG. 3 is a schematic flowchart of the implementation of the first embodiment of the voice recognition method for updating voiceprint data in the present application.
  • the order of execution of the steps in the flowchart shown in FIG. 3 may be changed according to different requirements, and some steps may be omitted.
  • Step S401 Register a preset number of registered voices, and calculate a feature voice vector of each registered voice in the preset number of registered voices.
  • the terminal device 100 registers N registered voices.
  • the voice recognition method for updating the voiceprint data is stored in the terminal device 100.
  • The terminal device 100 in this embodiment may be any terminal having a voice recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal, or an access control device; these devices can implement specific functions and applications through voice recognition technology.
  • The terminal device 100 acquires an effective voice when the user performs voice registration; acquisition can start when the user clicks to begin voice input and continue until the user stops the voice input, thereby avoiding unnecessary noise interference and improving the purity of the voice sample to be processed.
  • The above N registered voices are preferably three; of course, N may be set to another suitable positive integer as needed.
  • the terminal device 100 separately calculates a feature speech vector of each registered voice in the preset number of registered voices by:
  • The terminal device 100 extracts the MFCC features of each frame of each voice using the Mel-frequency cepstral coefficient (MFCC) method to form a matrix, and then applies a universal background model (UBM) and a feature speech vector (i-vector) extractor.
  • MFCC is an abbreviation of Mel-Frequency Cepstral Coefficients, and its computation involves two key steps: conversion to the Mel frequency scale, followed by cepstrum analysis.
  • First, each voice is segmented into frames to obtain a multi-frame speech spectrum; the spectrum is then passed through a Mel filter bank to obtain the Mel spectrum, whereby the non-uniformly spaced (linear) frequencies are converted to a uniform spacing on the Mel scale. Finally, cepstrum analysis is performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), which constitute the features of that frame of speech.
  • The so-called cepstrum analysis consists of taking the logarithm of the Mel spectrum and then applying an inverse transform; in practice the inverse transform is generally implemented as a discrete cosine transform (DCT), and the second through thirteenth DCT coefficients are taken as the MFCC coefficients.
  • The MFCCs of all frames of a voice are assembled into a vector matrix, and the most representative vector in this matrix is extracted with the universal background model (UBM) and the feature speech vector (i-vector) extractor; this vector serves as the feature speech vector of that voice. The UBM/i-vector extraction itself is an existing vector-matrix computation and is not repeated here.
  • Step S402: scoring the feature speech vectors of the registered voices pairwise against one another, and taking the first score average as the first threshold.
  • Specifically, the terminal device 100 performs the pairwise scoring of the feature speech vectors using a vector dot-product algorithm and the PLDA algorithm.
  • The vector dot-product algorithm and the PLDA algorithm are existing algorithms and are not described in detail herein.
  • Step S403: acquiring a verification voice, and calculating a feature speech vector of the verification voice.
  • In this embodiment, after acquiring the verification voice, the terminal device 100 likewise calculates its feature speech vector using the MFCC algorithm together with the UBM model and the i-vector extractor.
  • Step S404: scoring the feature speech vector of the verification voice against the feature speech vector of each registered voice, and obtaining a second score average.
  • Specifically, the feature speech vector of the verification voice is scored against each of the feature speech vectors of the plurality of registered voices to obtain a plurality of corresponding second score values, which are then averaged to obtain the second score average.
  • For example, when a preset number of voices is registered, say three, the corresponding feature speech vectors form a feature-speech-vector group containing three vectors; during verification there is a single acquired verification voice with one feature speech vector.
  • This vector is scored separately against each of the three vectors in the group, yielding three score values, which are averaged to obtain the second score average.
  • Step S405: determining whether the second score average is greater than a second threshold, where the second threshold is the first threshold plus a preset value. If it is, step S406 is performed; otherwise, the flow ends.
  • The second threshold is greater than the first threshold; the preset value by which they differ can be tuned empirically through repeated experiments, and the present application does not limit this preset value.
  • Step S406 updating the registration voice according to the verification voice.
  • the terminal device 100 further updates the registration voice by:
  • the terminal device 100 first determines whether the second scoring average is greater than a third threshold, wherein the third threshold is greater than the second threshold. Then, the terminal device 100 updates the verification voice whose second score average value is greater than the third threshold value into the registration voice, and registers according to the updated registration voice.
  • the third threshold is determined by the following steps:
  • The terminal device 100 first selects all verification voices whose second score average is higher than the second threshold, counting them as N; it then sorts the second score averages of the selected verification voices from high to low; finally, it selects the (N/3)-th of these averages as the third threshold.
  • The third threshold is thus set dynamically from the second score averages observed over successive verifications, so that the registration voice is updated dynamically according to the third threshold and can track the user's voice as it changes across different periods.
  • The magnitude relationship among the first, second, and third thresholds is: the first threshold is smaller than the second threshold, and the second threshold is smaller than the third threshold. The first threshold, being the first score average, essentially represents the average difference among the registered voices, while the second score average essentially reflects the difference between the verification voice and the registered voices; if the difference between the verification voice and the registered voices exceeds that average difference by more than the preset value, the verification voice differs considerably from the earlier registration voices, and the registration voice needs to be updated.
  • Setting a third threshold above the second threshold further filters out the verification voices that differ significantly from the registered voices, and the registration voice is then updated according to those verification voices.
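Steps S401 through S406 described in this section can be drawn together into one sketch. All earlier caveats apply: cosine similarity stands in for the dot-product/PLDA scoring, the preset margin between the first and second thresholds is an arbitrary placeholder, and the replacement policy when refreshing enrollment (dropping the oldest registered vector) is an assumption not specified in the text.

```python
import numpy as np

class VoiceprintUpdater:
    def __init__(self, enroll_vectors, margin=0.002):
        # S401: register a preset number of voices (their feature vectors).
        self.enrolled = [np.asarray(v, float) for v in enroll_vectors]
        self.history = []  # second score averages of verifications that passed
        # S402: pairwise-score the enrollment vectors; the mean is the first threshold.
        n = len(self.enrolled)
        pairs = [self._score(self.enrolled[i], self.enrolled[j])
                 for i in range(n) for j in range(i + 1, n)]
        self.first_threshold = sum(pairs) / len(pairs)
        self.second_threshold = self.first_threshold + margin  # margin is a placeholder
        self.third_threshold = None

    @staticmethod
    def _score(u, v):
        # Cosine similarity as a stand-in for the actual scoring backend.
        return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

    def verify(self, verify_vector):
        v = np.asarray(verify_vector, float)
        # S403/S404: average the verification vector's scores against each
        # enrollment vector (feature extraction itself happens upstream).
        second_avg = sum(self._score(e, v) for e in self.enrolled) / len(self.enrolled)
        # S405: compare against the second threshold.
        if second_avg <= self.second_threshold:
            return False
        self.history.append(second_avg)
        # Dynamic third threshold from the history of passed verifications.
        ranked = sorted(self.history, reverse=True)
        if len(ranked) >= 3:
            self.third_threshold = ranked[len(ranked) // 3 - 1]
        # S406: refresh enrollment when the average also clears the third threshold.
        if self.third_threshold is not None and second_avg > self.third_threshold:
            self.enrolled.pop(0)    # assumed policy: drop the oldest vector
            self.enrolled.append(v)
        return True

updater = VoiceprintUpdater([[0.9, 0.1], [0.85, 0.2], [0.95, 0.05]])
accepted = updater.verify([0.88, 0.12])
```

On the first few verifications there is not yet enough history to derive a third threshold, so the sketch accepts without updating enrollment until the history holds at least three passed verifications.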
  • The voice recognition method for updating voiceprint data proposed by the present application addresses a drawback of existing voice recognition methods: over time the user's voice may change due to factors such as age, physical condition, and environment, so performing each verification against the registration-time i-vector may cause verification to fail. With the present method, each verification by the user follows a compare-and-update process: whenever a verification meets the requirements, the user's registration information in the voiceprint library is updated, which improves the accuracy of subsequent voiceprint verification and adapts to changes in the registrant's voice over time.
  • The present application further provides another embodiment, namely a storage medium storing a voice recognition program for updating voiceprint data, the program being executable by at least one processor so as to cause the at least one processor to perform the steps of the voice recognition method for updating voiceprint data described above.
  • The methods of the foregoing embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also purely in hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, or the part of it that contributes over the prior art, may be embodied as a software product stored in a storage medium (such as a ROM/RAM, magnetic disk, or optical disc) that includes a number of instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Telephonic Communication Services (AREA)
  • Telephone Function (AREA)

Abstract

Disclosed is a voice recognition method for updating voiceprint data, comprising: registering a preset number of registration voices and calculating a feature voice vector of each registration voice; scoring the feature voice vectors of the registration voices pairwise against one another; acquiring a verification voice and calculating its feature voice vector; scoring the feature voice vector of the verification voice against the feature voice vectors of the registration voices; and updating the preset number of registration voices according to the verification voice. The present application further provides a terminal device and a storage medium. By means of the voice recognition method, terminal device, and storage medium provided by the present application, each verification of a user can be carried out according to a compare-and-update process, improving the accuracy of follow-up voiceprint verification; moreover, the method can adapt to changes in a registered person's voice as it fluctuates over time.

Description

更新声纹数据的语音识别方法、终端装置及存储介质Voice recognition method, terminal device and storage medium for updating voiceprint data
优先权申明Priority claim
本申请要求于2018年1月12日提交中国专利局、申请号为201810030623.1,发明名称为“更新声纹数据的语音识别方法、终端装置及存储介质”的中国专利申请的优先权,其内容全部通过引用结合在本申请中。This application claims priority to Chinese Patent Application No. 201810030623.1, entitled "Voice Recognition Method, Terminal Device and Storage Medium for Updating Voiceprint Data", which was submitted to the Chinese Patent Office on January 12, 2018. This is incorporated herein by reference.
技术领域Technical field
本申请涉及语音识别领域,尤其涉及一种更新声纹数据的语音识别方法、终端装置及存储介质。The present application relates to the field of voice recognition, and in particular, to a voice recognition method, a terminal device, and a storage medium for updating voiceprint data.
背景技术Background technique
目前传统的声纹注册识别方法中,一般包括如下几个步骤:1.特征提取,在获得用户注册语音数据后,对数据进行声音特征的提取。2.生成鉴别向量。3.比对验证,声纹库中保留用户注册时的特征向量(i-vector),每次验证时,验证语音提取的i-vector与注册时i-vector进行比较,即利用预先确定的余弦距离公式计算当前鉴别向量与用户对应的注册向量之间的距离,若距离在设定的阈值范围内,则认为两个i-vector为同一人的语音所产生,即验证成功;否则返回失败。At present, the conventional voiceprint registration recognition method generally includes the following steps: 1. Feature extraction, after obtaining the user registration voice data, extracting the sound features of the data. 2. Generate an authentication vector. 3. Alignment verification, the feature vector (i-vector) of the user registration is retained in the voiceprint library, and the i-vector for verifying the voice extraction is compared with the i-vector at the time of registration for each verification, that is, the predetermined cosine is utilized. The distance formula calculates the distance between the current authentication vector and the registration vector corresponding to the user. If the distance is within the set threshold range, the two i-vectors are considered to be generated by the same person's voice, that is, the verification is successful; otherwise, the return fails.
However, under this conventional registration-and-verification scheme, the identity vector used in every comparison is the one generated when the user first registered a voice. Over time, the user's voice may change due to factors such as age, physical condition, and environment, so comparing against the registration-time i-vector at every verification may cause the verification to fail.
Summary of the Invention
In view of this, the present application proposes a voice recognition method, a terminal device, and a storage medium for updating voiceprint data. By implementing the approach described herein, the accuracy of subsequent voiceprint verification can be improved, and the method can adapt to changes in the registrant's voice that fluctuate over time.
First, to achieve the above object, the present application provides a terminal device. The terminal device includes a memory and a processor, and the memory stores a voice recognition program for updating voiceprint data that can be run on the processor. When executed by the processor, the voice recognition program for updating voiceprint data implements the following steps:
registering a preset number of user registration voices, and calculating a feature voice vector of each user registration voice among the preset number of user registration voices;
comparing and scoring the feature voice vectors of the user registration voices pairwise, and taking the first score average as a first threshold;
obtaining a verification voice, and calculating a feature voice vector of the verification voice;
comparing and scoring the feature voice vector of the verification voice against each feature voice vector of the registration voices, and obtaining a second score average;
determining whether the second score average is greater than a second threshold, the second threshold being the first threshold plus a preset value; and
if the second score average is greater than the second threshold, updating the registration voices according to the verification voice.
In addition, to achieve the above object, the present application further provides a voice recognition method for updating voiceprint data, applied to a terminal device, the method comprising:
registering a preset number of user registration voices, and calculating a feature voice vector of each user registration voice among the preset number of user registration voices;
comparing and scoring the feature voice vectors of the user registration voices pairwise, and taking the first score average as a first threshold;
obtaining a verification voice, and calculating a feature voice vector of the verification voice;
comparing and scoring the feature voice vector of the verification voice against each feature voice vector of the registration voices, and obtaining a second score average;
determining whether the second score average is greater than a second threshold, the second threshold being the first threshold plus a preset value; and
if the second score average is greater than the second threshold, updating the registration voices according to the verification voice.
Further, to achieve the above object, the present application also provides a storage medium storing a voice recognition program for updating voiceprint data. The voice recognition program for updating voiceprint data can be executed by at least one processor, so that the at least one processor performs the steps of the voice recognition method for updating voiceprint data as described above.
Compared with the prior art, the voice recognition method, terminal device, and storage medium for updating voiceprint data proposed in the present application operate as follows. First, a preset number of user registration voices are registered, and a feature voice vector is calculated for each of them. Second, the feature voice vectors of the registration voices are compared and scored pairwise, and the first score average is taken as a first threshold. Next, a verification voice is obtained and its feature voice vector is calculated. The feature voice vector of the verification voice is then compared and scored against the feature voice vectors of the registration voices, and a second score average is obtained. Further, it is determined whether the second score average is greater than a second threshold, the second threshold being greater than the first threshold. Finally, if the second score average is greater than the second threshold, the preset number of registration voices are updated according to the verification voice. In this way, the drawback of existing voice recognition methods is overcome, namely that the user's voice may change over time due to factors such as age, physical condition, and environment, so that comparing against the registration-time i-vector at every verification may cause the verification to fail. Instead, every verification performed by the user follows the comparison-and-update process: each time a verification meeting the requirements passes, the user's registration information in the voiceprint library is updated, which improves the accuracy of subsequent voiceprint verification and adapts to changes in the registrant's voice that fluctuate over time.
Brief Description of the Drawings
FIG. 1 is a diagram of the operating environment of a terminal device according to a preferred embodiment of the present application;
FIG. 2 is a block diagram of the program modules of an embodiment of the voice recognition program for updating voiceprint data of the present application;
FIG. 3 is a flowchart of an embodiment of the voice recognition method for updating voiceprint data of the present application.
The realization of the objects, functional features, and advantages of the present application will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are merely intended to explain the present application and are not intended to limit it.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are adopted merely to facilitate the description of the present application and have no specific meaning in themselves. Therefore, "module", "component", and "unit" may be used interchangeably.
The terminal device may be implemented in various forms. For example, the terminal device described in the present application may include mobile terminals such as mobile phones, tablet computers, notebook computers, palmtop computers, personal digital assistants (PDA), portable media players (PMP), navigation devices, wearable devices, smart wristbands, and pedometers, as well as fixed terminals such as digital TVs and desktop computers.
In the following description, a mobile terminal is taken as an example. Those skilled in the art will understand that, apart from elements used specifically for mobile purposes, the configurations according to the embodiments of the present application can also be applied to fixed-type terminals.
Referring to FIG. 1, it is a diagram of the operating environment of a terminal device 100 according to a preferred embodiment of the present application. The terminal device 100 includes a voice recognition program 300 for updating voiceprint data, a memory 20, a processor 30, a sensing unit 40, and the like. The sensing unit 40 may be any of various sensors that sense the user's voice and is mainly used to acquire the verification voice.
The memory 20 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, and the like. The processor 30 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
Based on the above operating-environment diagram of the terminal device 100, various embodiments of the method of the present application are proposed.
First, the present application proposes a voice recognition program 300 for updating voiceprint data.
Referring to FIG. 2, it is a block diagram of the program modules of the first embodiment of the voice recognition program 300 for updating voiceprint data of the present application.
In this embodiment, the voice recognition program 300 for updating voiceprint data includes a series of computer program instructions stored in the memory 20. When these computer program instructions are executed by the processor 30, the voice recognition operations for updating voiceprint data of the embodiments of the present application can be implemented. In some embodiments, the voice recognition program 300 for updating voiceprint data may be divided into one or more modules based on the specific operations implemented by the respective parts of the computer program instructions. For example, in FIG. 2, the voice recognition program 300 for updating voiceprint data may be divided into a registration module 301, a first comparison module 302, an acquisition module 303, a second comparison module 304, a determination module 305, and an update module 306, wherein:
The registration module 301 is configured to register a preset number of registration voices and calculate a feature voice vector of each of the preset number of registration voices. For example, the registration module 301 registers N registration voices. In this embodiment, the voice recognition program 300 for updating voiceprint data is stored in the terminal device 100. The terminal device 100 of this embodiment may be any terminal having a voice recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal, or an access control device; through voice recognition technology, such devices can implement various specific functions and applications. In addition, the terminal device 100 acquires the valid voice when the user performs voice registration, starting from the moment the user initiates voice recording and ending when the user stops recording; in this way, unnecessary noise interference can be avoided and the purity of the voice samples to be processed can be improved. The N registration voices are preferably 3 registration voices; of course, N may also be chosen as another suitable positive integer as required.
In this embodiment, the registration module 301 calculates the feature voice vector of each of the preset number of registration voices in the following manner:
the registration module 301 uses the Mel-frequency cepstral coefficient (MFCC) method to extract the MFCC features of each frame of each voice and assembles them into a matrix, and uses a universal background model (UBM) and a feature voice vector (i-vector) extractor to select the most essential features in the matrix, which form the feature voice vector.
Here, MFCC is the abbreviation of Mel-Frequency Cepstral Coefficients, and the computation involves two key steps: conversion to the Mel frequency scale, followed by cepstral analysis. In this embodiment, each voice is first divided into frames and the speech spectrum of each of the multiple frames is obtained; the obtained spectrum is then passed through a Mel filter bank to obtain the Mel spectrum, where the Mel filter bank converts non-uniform frequencies to a uniform scale. Finally, cepstral analysis is performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), which constitute the features of that frame of speech. The so-called cepstral analysis consists of taking the logarithm of the Mel spectrum and then performing an inverse transform; in practice the inverse transform is generally implemented as a discrete cosine transform (DCT), and the 2nd to 13th coefficients after the DCT are taken as the MFCC coefficients. In this way, the MFCCs of all frames of a voice are assembled into a vector matrix, and the most essential vector in the matrix is selected through the universal background model (UBM) and the feature voice vector (i-vector) extractor; this vector serves as the feature voice vector of the voice. Selecting the most essential vector in the matrix through the UBM and the i-vector extractor is an existing vector-matrix computation method and is not described further here.
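The framing, Mel-spectrum, and DCT steps described above can be sketched roughly as follows. This is a toy illustration under stated simplifications: the "filter bank" here is a crude equal-width band pooling rather than a true Mel-scaled filter bank, the frame length, hop size, and band count are placeholder choices, and the UBM/i-vector extraction stage is not shown.

```python
import numpy as np
from scipy.fftpack import dct

def mfcc_frames(signal, frame_len=400, hop=160, n_bands=26, n_coeff=12):
    """Toy MFCC sketch: frame the signal, take a windowed log power
    spectrum, pool it into n_bands bands (a crude stand-in for a Mel
    filter bank), then keep DCT coefficients 2..13 as described above."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, hop)]
    window = np.hamming(frame_len)
    feats = []
    for frame in frames:
        power = np.abs(np.fft.rfft(frame * window)) ** 2
        bands = np.array_split(power, n_bands)            # crude "filter bank"
        log_mel = np.log(np.array([b.sum() for b in bands]) + 1e-10)
        feats.append(dct(log_mel, norm='ortho')[1:1 + n_coeff])  # coeffs 2..13
    return np.array(feats)  # one row of coefficients per frame: the matrix
```

The returned matrix corresponds to the per-frame feature matrix from which the UBM and i-vector extractor would then derive a single feature voice vector.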
The first comparison module 302 is configured to compare and score the feature voice vectors of the registration voices pairwise, and to take the first score average as a first threshold. Specifically, the first comparison module 302 uses a vector dot-product algorithm and the PLDA algorithm to perform the pairwise comparison and scoring of the feature voice vectors of the voices. In this embodiment, the vector dot-product algorithm and the PLDA algorithm are existing algorithms and are not described further here.
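The pairwise scoring and first-threshold computation might be sketched as below. Cosine similarity is used here as a stand-in scorer; the application itself names dot-product and PLDA scoring, which are not reproduced in this sketch.

```python
import numpy as np
from itertools import combinations

def cosine(a, b):
    """Stand-in pairwise scorer (the application uses dot-product/PLDA)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def first_threshold(enroll_vectors):
    """First threshold: the average of the pairwise scores taken over
    every pair of enrollment feature voice vectors."""
    scores = [cosine(u, v) for u, v in combinations(enroll_vectors, 2)]
    return sum(scores) / len(scores)
```

For the preferred case of 3 registration voices this averages exactly three pairwise scores.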
The acquisition module 303 is configured to obtain a verification voice and calculate the feature voice vector of the verification voice. In this embodiment, after obtaining the verification voice, the acquisition module 303 likewise uses the above MFCC algorithm together with the UBM model and the vector extractor to calculate the feature voice vector of the verification voice.
The second comparison module 304 is configured to compare and score the feature voice vector of the verification voice against each feature voice vector of the registration voices, and to obtain a second score average. In this embodiment, the feature voice vector of the verification voice is compared and scored against the feature voice vectors of the multiple registration voices one by one, yielding multiple corresponding second scores, and the multiple second scores are averaged to obtain the second score average. In this embodiment, the registration module 301 has registered a preset number of registration voices, for example 3 registration voices, in which case the feature voice vectors corresponding to the registration voices form a feature voice vector group containing 3 voice feature vectors. During voice verification, only one verification voice is obtained, and accordingly there is one feature voice vector of the verification voice. The second comparison module 304 compares and scores the feature voice vector of the verification voice against the feature voice vectors of the registration voices pairwise; specifically, the obtained verification voice is compared and scored against each of the 3 feature voice vectors in the feature voice vector group of the registration voices, yielding three scores, and the three scores are averaged to obtain the second score average.
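A minimal sketch of the second-score-average computation, again with cosine similarity standing in for the dot-product/PLDA scorers named in the text:

```python
import numpy as np

def cosine(a, b):
    """Stand-in pairwise scorer (the application uses dot-product/PLDA)."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def second_score_average(verify_vector, enroll_vectors):
    """Score the single verification i-vector against each enrollment
    i-vector in turn and average the resulting scores."""
    scores = [cosine(verify_vector, e) for e in enroll_vectors]
    return sum(scores) / len(scores)
```

With 3 registration voices this is simply the mean of three scores, as described above.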
The determination module 305 is configured to determine whether the second score average is greater than a second threshold. In this embodiment, the second threshold is the first threshold plus a preset value. The second threshold is greater than the first threshold, and the preset value by which they differ can be set by the user through repeated experiments according to the actual situation; the present application does not limit the magnitude of this preset value.
The update module 306 is configured to update the registration voices according to the verification voice when the second score average is greater than the second threshold. In this embodiment, the update module 306 updates the registration voices in the following manner:
the update module 306 further determines whether the second score average is greater than a third threshold, the third threshold being greater than the second threshold. The update module 306 updates those verification voices whose second score average is greater than the third threshold into the registration voices, and performs registration according to the updated registration voices. The third threshold can be determined through the following steps:
the update module 306 first selects all verification voices whose second score average is higher than the second threshold and counts them as N; then the second score averages corresponding to the selected verification voices are sorted from high to low; finally, the N/3-th second score average is taken as the third threshold. By dynamically setting the third threshold using the second score averages of the verification voices accumulated during ongoing verification, the registration voices are dynamically updated according to the third threshold, ensuring that the registration voices can change as the user's voice changes across different periods.
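The dynamic third-threshold selection might be sketched as follows. Treating "the N/3-th value" as the 1-indexed position ⌊N/3⌋ in the descending list (clamped to the first element for small N) is an assumption, since the exact indexing convention is not spelled out in the text.

```python
def third_threshold(passing_averages):
    """Dynamic third threshold: sort the second score averages of the
    verification voices that cleared the second threshold from high to
    low, then take the value at position N/3 (assumed 1-indexed and
    floored; this indexing convention is an assumption)."""
    ranked = sorted(passing_averages, reverse=True)
    n = len(ranked)
    return ranked[max(n // 3 - 1, 0)]
```

Because the list of passing averages grows as verifications accumulate, the returned threshold drifts with the user's recent voice, which is the adaptive behavior described above.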
It should be noted that the magnitude relationship among the first, second, and third thresholds is that the first threshold is less than the second threshold and the second threshold is less than the third threshold. The first threshold, that is, the first score average, is essentially an average difference between the registration voices, while the second score average essentially reflects a difference between the verification voice and the registration voices. If the difference between the verification voice and the registration voices exceeds the average difference among the registration voices by a preset value, the gap between the verification voice and the earlier registration voices is relatively large, and the registration voices need to be updated at that point. Furthermore, setting a third threshold on top of the second threshold makes it possible to further screen out the verification voices that differ markedly from the registration voices, and to update the registration voices according to such verification voices.
By executing the above program modules 301-306, the drawback of existing voice recognition methods can be overcome, namely that the user's voice may change over time due to factors such as age, physical condition, and environment, so that comparing against the registration-time i-vector at every verification may cause the verification to fail. Instead, every verification performed by the user follows the comparison-and-update process: each time a verification meeting the requirements passes, the user's registration information in the voiceprint library is updated, which improves the accuracy of subsequent voiceprint verification and adapts to changes in the registrant's voice that fluctuate over time.
In addition, the present application further proposes a voice recognition method for updating voiceprint data.
Referring to FIG. 3, it is a schematic flowchart of the implementation of the first embodiment of the voice recognition method for updating voiceprint data of the present application. In this embodiment, the order of execution of the steps in the flowchart shown in FIG. 3 may be changed according to different requirements, and some steps may be omitted.
Step S401: register a preset number of registration voices, and calculate a feature voice vector of each of the preset number of registration voices. For example, the terminal device 100 registers N registration voices. In this embodiment, the voice recognition method for updating voiceprint data is carried out by the terminal device 100. The terminal device 100 of this embodiment may be any terminal having a voice recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal, or an access control device; through voice recognition technology, such devices can implement various specific functions and applications. In addition, the terminal device 100 acquires the valid voice when the user performs voice registration, starting from the moment the user initiates voice recording and ending when the user stops recording; in this way, unnecessary noise interference can be avoided and the purity of the voice samples to be processed can be improved. The N registration voices are preferably 3 registration voices; of course, N may also be chosen as another suitable positive integer as required.
In this embodiment, the terminal device 100 calculates the feature voice vector of each of the preset number of registration voices in the following manner:
the terminal device 100 uses the Mel-frequency cepstral coefficient (MFCC) method to extract the MFCC features of each frame of each voice and assembles them into a matrix, and uses a universal background model (UBM) and a feature voice vector (i-vector) extractor to select the most essential features in the matrix, which form the feature voice vector.
Here, MFCC is the abbreviation of Mel-Frequency Cepstral Coefficients, and the computation involves two key steps: conversion to the Mel frequency scale, followed by cepstral analysis. In this embodiment, each voice is first divided into frames and the speech spectrum of each of the multiple frames is obtained; the obtained spectrum is then passed through a Mel filter bank to obtain the Mel spectrum, where the Mel filter bank converts non-uniform frequencies to a uniform scale. Finally, cepstral analysis is performed on the Mel spectrum to obtain the Mel-frequency cepstral coefficients (MFCC), which constitute the features of that frame of speech. The so-called cepstral analysis consists of taking the logarithm of the Mel spectrum and then performing an inverse transform; in practice the inverse transform is generally implemented as a discrete cosine transform (DCT), and the 2nd to 13th coefficients after the DCT are taken as the MFCC coefficients. In this way, the MFCCs of all frames of a voice are assembled into a vector matrix, and the most essential vector in the matrix is selected through the universal background model (UBM) and the feature voice vector (i-vector) extractor; this vector serves as the feature voice vector of the voice. Selecting the most essential vector in the matrix through the UBM and the i-vector extractor is an existing vector-matrix computation method and is not described further here.
Step S402: compare and score the feature voice vectors of the registration voices pairwise, and take the first score average as a first threshold. Specifically, the terminal device 100 uses a vector dot-product algorithm and the PLDA algorithm to perform the pairwise comparison and scoring of the feature voice vectors of the voices. In this embodiment, the vector dot-product algorithm and the PLDA algorithm are existing algorithms and are not described further here.
Step S403: obtain a verification voice and calculate the feature voice vector of the verification voice. In this embodiment, after obtaining the verification voice, the terminal device 100 likewise uses the above MFCC algorithm together with the UBM model and the vector extractor to calculate the feature voice vector of the verification voice.
Step S404: compare and score the feature voice vector of the verification voice against each feature voice vector of the registration voices, and obtain a second score average. The feature voice vector of the verification voice is compared and scored against the feature voice vectors of the multiple registration voices one by one, yielding multiple corresponding second scores, and the multiple second scores are averaged to obtain the second score average. In this embodiment, a preset number of registration voices have been registered, for example 3 registration voices, in which case the feature voice vectors corresponding to the registration voices form a feature voice vector group containing 3 voice feature vectors. During voice verification, only one verification voice is obtained, and accordingly there is one feature voice vector of the verification voice. The terminal device 100 compares and scores the feature voice vector of the verification voice against the feature voice vectors of the registration voices pairwise; specifically, the obtained verification voice is compared and scored against each of the 3 feature voice vectors in the feature voice vector group of the registration voices, yielding three scores, and the three scores are averaged to obtain the second score average.
Step S405: determine whether the second score average is greater than a second threshold. In this embodiment, the second threshold is the first threshold plus a preset value. If the second score average is greater than the second threshold, step S406 is performed; otherwise, the flow ends. The second threshold is greater than the first threshold, and the preset value by which they differ can be set by the user through repeated experiments according to the actual situation; the present application does not limit the magnitude of this preset value.
Step S406: update the registered speeches according to the verification speech. In this embodiment, the terminal device 100 updates the registered speeches as follows:
The terminal device 100 first determines whether the second average score is greater than a third threshold, where the third threshold is greater than the second threshold. The terminal device 100 then adds the verification speech whose second average score is greater than the third threshold to the registered speeches, and registers according to the updated registered speeches. In this embodiment, the third threshold is determined by the following steps:
The terminal device 100 first selects all verification speeches whose second average score is higher than the second threshold and counts them as N; it then sorts the second average scores of the selected verification speeches from high to low; finally, it selects the N/3-th second average score as the third threshold. By dynamically setting the third threshold from the second average scores of verification speeches gathered over successive verifications, the registered speeches are updated dynamically according to the third threshold, ensuring that they track changes in the user's voice across different periods.
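The dynamic third-threshold selection just described can be sketched as follows. This is a sketch under a stated assumption: we read "the N/3-th" as the 1-based position N/3, rounded down and clamped to the top score, in the descending-sorted list; the function name is illustrative.

```python
def third_threshold(second_avgs, second_threshold):
    """Derive the third threshold from past verification scores.

    Keeps every second average score above the second threshold,
    sorts the kept scores from high to low, and returns the N/3-th one.
    """
    selected = [s for s in second_avgs if s > second_threshold]
    if not selected:
        return None  # no qualifying history yet; caller keeps its old threshold
    selected.sort(reverse=True)
    n = len(selected)
    index = max(n // 3 - 1, 0)  # 1-based "N/3-th" entry, clamped (assumed reading)
    return selected[index]
```

For six qualifying scores, N/3 = 2, so the second-highest score becomes the new third threshold.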
It should be noted that the first threshold is smaller than the second threshold, and the second threshold is smaller than the third threshold. The first threshold, i.e. the first average score, is essentially an average difference among the registered speeches, while the second average score essentially reflects the difference between the verification speech and the registered speeches. If the difference between the verification speech and the registered speeches exceeds the average difference among the registered speeches by a preset value, the verification speech has drifted considerably from the earlier registered speeches, and the registered speeches need to be updated. Further setting a third threshold above the second threshold filters out the verification speeches that differ markedly from the registered speeches, and the registered speeches are then updated with those verification speeches.
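The threshold ordering described in this paragraph (first threshold < second threshold = first threshold + preset value < third threshold) implies the following decision flow. This is our reading of steps S405–S406, with illustrative names and return labels:

```python
def registration_update_decision(second_avg, first_threshold, preset, third_threshold):
    """Decide what to do with a verification speech, given the thresholds.

    Assumes first_threshold < second_threshold (= first_threshold + preset)
    < third_threshold, as stated in the description.
    """
    second_threshold = first_threshold + preset
    if second_avg <= second_threshold:
        return "end"        # step S405 fails: the flow ends without updating
    if second_avg > third_threshold:
        return "update"     # step S406: fold the verification speech into the registration
    return "no_update"      # passes the second threshold but not the third
```

Only verification speeches that clear both the second and third thresholds are folded back into the registered speeches.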
Through the above steps S401–S406, the voice recognition method for updating voiceprint data proposed in this application addresses a drawback of existing voice recognition methods: a user's voice may change over time with age, physical condition, environment, and other factors, so comparing every verification against the i-vector captured at registration may cause verification to fail. With this method, each verification follows the comparison-and-update flow, i.e. every time a verification meets the requirements and passes, the user's registration information in the voiceprint library is updated, which improves the accuracy of subsequent voiceprint verification and adapts to fluctuations in the registrant's voice over time.
This application further provides another embodiment: a storage medium storing a voice recognition program for updating voiceprint data, the program being executable by at least one processor to cause the at least one processor to perform the steps of the voice recognition method for updating voiceprint data described above.
The serial numbers of the above embodiments of this application are for description only and do not indicate the relative merits of the embodiments.
From the description of the above embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, though in many cases the former is the better implementation. Based on this understanding, the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied as a software product stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and comprising a number of instructions that cause a terminal device (which may be a mobile phone, computer, server, air conditioner, network device, or the like) to perform the methods described in the embodiments of this application.
The above are only preferred embodiments of this application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of this application, applied directly or indirectly in other related technical fields, likewise falls within the scope of patent protection of this application.

Claims (20)

  1. A voice recognition method for updating voiceprint data, applied to a terminal device, wherein the method comprises the steps of:
    registering a preset number of registered speeches, and calculating a feature speech vector of each of the preset number of registered speeches;
    scoring the feature speech vectors of the registered speeches in pairwise comparisons, and obtaining a first average score as a first threshold;
    acquiring a verification speech, and calculating a feature speech vector of the verification speech;
    scoring the feature speech vector of the verification speech against the feature speech vectors of the registered speeches in pairwise comparisons, and obtaining a second average score;
    determining whether the second average score is greater than a second threshold, the second threshold being the first threshold plus a preset value; and
    if the second average score is greater than the second threshold, updating the registered speeches according to the verification speech.
  2. The voice recognition method for updating voiceprint data according to claim 1, wherein the step of calculating the feature speech vector of each of the preset number of registered speeches comprises:
    extracting, using the MFCC method, the MFCC features of each frame of each speech and forming them into a matrix; and
    filtering out the most essential features in the matrix using a UBM and a feature speech vector extractor to form the feature speech vector.
  3. The voice recognition method for updating voiceprint data according to claim 1, wherein the step of calculating the feature speech vector of the verification speech comprises:
    extracting, using the MFCC method, the MFCC features of each frame of the verification speech and forming them into a matrix; and
    filtering out the most essential features in the matrix using a UBM and a feature speech vector extractor to form the feature speech vector of the verification speech.
  4. The voice recognition method for updating voiceprint data according to claim 2 or 3, wherein the step of extracting the MFCC features using the MFCC method and forming them into a matrix comprises:
    framing each speech to obtain the speech spectra of multiple frames;
    passing the speech spectra through a Mel filter bank to obtain Mel spectra;
    performing cepstral analysis on the Mel spectra to obtain Mel-frequency cepstral coefficients (MFCC); and
    forming the MFCC of the speech spectrum of each frame into a vector matrix.
  5. The voice recognition method for updating voiceprint data according to claim 1, wherein the step of scoring the feature speech vectors of the registered speeches in pairwise comparisons comprises:
    scoring the feature speech vectors of the speeches in pairwise comparisons using a vector dot-product algorithm and a PLDA algorithm.
  6. The voice recognition method for updating voiceprint data according to claim 1, wherein the step of updating the preset number of registered speeches according to the verification speech further comprises:
    determining whether the second average score is greater than a third threshold, the third threshold being greater than the second threshold; and
    adding the verification speech whose second average score is greater than the third threshold to the registered speeches, and registering according to the updated registered speeches.
  7. The voice recognition method for updating voiceprint data according to claim 6, wherein the third threshold is determined by the following steps:
    selecting all verification speeches whose second average score is higher than the second threshold, and counting them as N;
    sorting the second average scores of the selected verification speeches from high to low; and
    selecting the N/3-th second average score as the third threshold.
  8. A terminal device, comprising a memory and a processor, the memory storing a voice recognition program for updating voiceprint data that is executable on the processor, the program, when executed by the processor, implementing the following steps:
    registering a preset number of registered speeches, and calculating a feature speech vector of each of the preset number of registered speeches;
    scoring the feature speech vectors of the registered speeches in pairwise comparisons, and obtaining a first average score as a first threshold;
    acquiring a verification speech, and calculating a feature speech vector of the verification speech;
    scoring the feature speech vector of the verification speech against the feature speech vectors of the registered speeches in pairwise comparisons, and obtaining a second average score;
    determining whether the second average score is greater than a second threshold, the second threshold being the first threshold plus a preset value; and
    if the second average score is greater than the second threshold, updating the registered speeches according to the verification speech.
  9. The terminal device according to claim 8, wherein the step of calculating the feature speech vector of each of the preset number of registered speeches comprises:
    extracting, using the MFCC method, the MFCC features of each frame of each speech and forming them into a matrix; and
    filtering out the most essential features in the matrix using a UBM and a feature speech vector extractor to form the feature speech vector.
  10. The terminal device according to claim 8, wherein the step of calculating the feature speech vector of the verification speech comprises:
    extracting, using the MFCC method, the MFCC features of each frame of the verification speech and forming them into a matrix; and
    filtering out the most essential features in the matrix using a UBM and a feature speech vector extractor to form the feature speech vector of the verification speech.
  11. The terminal device according to claim 9 or 10, wherein the step of extracting the MFCC features using the MFCC method and forming them into a matrix comprises:
    framing each speech to obtain the speech spectra of multiple frames;
    passing the speech spectra through a Mel filter bank to obtain Mel spectra;
    performing cepstral analysis on the Mel spectra to obtain Mel-frequency cepstral coefficients (MFCC); and
    forming the MFCC of the speech spectrum of each frame into a vector matrix.
  12. The terminal device according to claim 8, wherein the step of scoring the feature speech vectors of the registered speeches in pairwise comparisons comprises:
    scoring the feature speech vectors of the speeches in pairwise comparisons using a vector dot-product algorithm and a PLDA algorithm.
  13. The terminal device according to claim 8, wherein the step of updating the preset number of registered speeches according to the verification speech further comprises:
    determining whether the second average score is greater than a third threshold, the third threshold being greater than the second threshold; and
    adding the verification speech whose second average score is greater than the third threshold to the registered speeches, and registering according to the updated registered speeches.
  14. The terminal device according to claim 13, wherein the third threshold is determined by the following steps:
    selecting all verification speeches whose second average score is higher than the second threshold, and counting them as N;
    sorting the second average scores of the selected verification speeches from high to low; and
    selecting the N/3-th second average score as the third threshold.
  15. A storage medium storing a voice recognition program for updating voiceprint data, the program being executable by at least one processor to cause the at least one processor to perform the following steps:
    registering a preset number of registered speeches, and calculating a feature speech vector of each of the preset number of registered speeches;
    scoring the feature speech vectors of the registered speeches in pairwise comparisons, and obtaining a first average score as a first threshold;
    acquiring a verification speech, and calculating a feature speech vector of the verification speech;
    scoring the feature speech vector of the verification speech against the feature speech vectors of the registered speeches in pairwise comparisons, and obtaining a second average score;
    determining whether the second average score is greater than a second threshold, the second threshold being the first threshold plus a preset value; and
    if the second average score is greater than the second threshold, updating the registered speeches according to the verification speech.
  16. The storage medium according to claim 15, wherein the step of calculating the feature speech vector of each of the preset number of registered speeches comprises:
    extracting, using the MFCC method, the MFCC features of each frame of each speech and forming them into a matrix; and
    filtering out the most essential features in the matrix using a UBM and a feature speech vector extractor to form the feature speech vector.
  17. The storage medium according to claim 16, wherein the step of extracting the MFCC features using the MFCC method and forming them into a matrix comprises:
    framing each speech to obtain the speech spectra of multiple frames;
    passing the speech spectra through a Mel filter bank to obtain Mel spectra;
    performing cepstral analysis on the Mel spectra to obtain Mel-frequency cepstral coefficients (MFCC); and
    forming the MFCC of the speech spectrum of each frame into a vector matrix.
  18. The storage medium according to claim 15, wherein the step of scoring the feature speech vectors of the registered speeches in pairwise comparisons comprises:
    scoring the feature speech vectors of the speeches in pairwise comparisons using a vector dot-product algorithm and a PLDA algorithm.
  19. The storage medium according to claim 15, wherein the step of updating the preset number of registered speeches according to the verification speech further comprises:
    determining whether the second average score is greater than a third threshold, the third threshold being greater than the second threshold; and
    adding the verification speech whose second average score is greater than the third threshold to the registered speeches, and registering according to the updated registered speeches.
  20. The storage medium according to claim 19, wherein the third threshold is determined by the following steps:
    selecting all verification speeches whose second average score is higher than the second threshold, and counting them as N;
    sorting the second average scores of the selected verification speeches from high to low; and
    selecting the N/3-th second average score as the third threshold.
PCT/CN2018/089415 2018-01-12 2018-05-31 Voice recognition method for updating voiceprint data, terminal device, and storage medium WO2019136911A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810030623.1A CN108269575B (en) 2018-01-12 2018-01-12 Voice recognition method for updating voiceprint data, terminal device and storage medium
CN201810030623.1 2018-01-12

Publications (1)

Publication Number Publication Date
WO2019136911A1 true WO2019136911A1 (en) 2019-07-18

Family

ID=62775513

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/089415 WO2019136911A1 (en) 2018-01-12 2018-05-31 Voice recognition method for updating voiceprint data, terminal device, and storage medium

Country Status (2)

Country Link
CN (1) CN108269575B (en)
WO (1) WO2019136911A1 (en)


Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400567B (en) * 2019-07-30 2021-10-19 深圳秋田微电子股份有限公司 Dynamic update method for registered voiceprint and computer storage medium
CN110660398B (en) * 2019-09-19 2020-11-20 北京三快在线科技有限公司 Voiceprint feature updating method and device, computer equipment and storage medium
CN111785280A (en) * 2020-06-10 2020-10-16 北京三快在线科技有限公司 Identity authentication method and device, storage medium and electronic equipment
CN112289322B (en) * 2020-11-10 2022-11-15 思必驰科技股份有限公司 Voiceprint recognition method and device

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070036289A1 (en) * 2005-07-27 2007-02-15 Fu Guo K Voice authentication system and method using a removable voice id card
CN104616655A (en) * 2015-02-05 2015-05-13 清华大学 Automatic vocal print model reconstruction method and device
CN106157959A (en) * 2015-03-31 2016-11-23 讯飞智元信息科技有限公司 Sound-groove model update method and system
CN106782564A (en) * 2016-11-18 2017-05-31 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech data
US9685161B2 (en) * 2012-07-09 2017-06-20 Huawei Device Co., Ltd. Method for updating voiceprint feature model and terminal
CN107424614A (en) * 2017-07-17 2017-12-01 广东讯飞启明科技发展有限公司 A kind of sound-groove model update method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104184587B (en) * 2014-08-08 2016-04-20 腾讯科技(深圳)有限公司 Vocal print generation method, server, client and system
WO2016015687A1 (en) * 2014-07-31 2016-02-04 腾讯科技(深圳)有限公司 Voiceprint verification method and device
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition


Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111599365A (en) * 2020-04-08 2020-08-28 云知声智能科技股份有限公司 Adaptive threshold generation system and method for voiceprint recognition system
CN111599365B (en) * 2020-04-08 2023-05-05 云知声智能科技股份有限公司 Adaptive threshold generation system and method for voiceprint recognition system
CN112487804A (en) * 2020-11-25 2021-03-12 合肥三恩信息科技有限公司 Chinese novel speech synthesis system based on semantic context scene
CN112487804B (en) * 2020-11-25 2024-04-19 合肥三恩信息科技有限公司 Chinese novel speech synthesis system based on semantic context scene
TWI787996B (en) * 2021-09-08 2022-12-21 華南商業銀行股份有限公司 Voiceprint identification device for financial transaction system and method thereof
TWI817897B (en) * 2021-09-08 2023-10-01 華南商業銀行股份有限公司 Low-noise voiceprint identification device for financial transaction system and method thereof

Also Published As

Publication number Publication date
CN108269575A (en) 2018-07-10
CN108269575B (en) 2021-11-02


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18899180

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.10.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18899180

Country of ref document: EP

Kind code of ref document: A1