CN108269575B - Voice recognition method for updating voiceprint data, terminal device and storage medium - Google Patents


Info

Publication number
CN108269575B
Authority
CN
China
Prior art keywords
voice
verification
threshold value
scoring
vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810030623.1A
Other languages
Chinese (zh)
Other versions
CN108269575A (en)
Inventor
王健宗
郑斯奇
于夕畔
肖京
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201810030623.1A priority Critical patent/CN108269575B/en
Priority to PCT/CN2018/089415 priority patent/WO2019136911A1/en
Publication of CN108269575A publication Critical patent/CN108269575A/en
Application granted granted Critical
Publication of CN108269575B publication Critical patent/CN108269575B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/02 - Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 - Speaker identification or verification
    • G10L17/06 - Decision making techniques; Pattern matching strategies
    • G10L17/12 - Score normalisation

Abstract

The invention discloses a voice recognition method for updating voiceprint data, which comprises the following steps: registering a preset number of registration voices and calculating a feature voice vector of each registration voice; comparing the feature voice vectors of the registration voices in pairs and scoring each pair; obtaining a verification voice and calculating its feature voice vector; comparing the feature voice vector of the verification voice with the feature voice vector of each registration voice, scoring each pair, and updating the preset number of registration voices according to the verification voice. The invention also provides a terminal device and a storage medium. Because the comparison and update procedure runs at every verification, the voice recognition method, terminal device and storage medium for updating voiceprint data improve the accuracy of subsequent voiceprint verification and adapt to the way a registrant's voice fluctuates over time.

Description

Voice recognition method for updating voiceprint data, terminal device and storage medium
Technical Field
The present invention relates to the field of voice recognition, and in particular, to a voice recognition method, a terminal apparatus, and a storage medium for updating voiceprint data.
Background
The conventional voiceprint registration and identification method generally comprises the following steps: 1. Feature extraction: after the user's registration voice data is obtained, voice features are extracted from the data. 2. An authentication vector is generated. 3. Comparison and verification: the feature vector (i-vector) computed when the user registered is kept in the voiceprint library; at each verification, the i-vector extracted from the verification voice is compared with the i-vector from registration, that is, the distance between the current authentication vector and the user's registration vector is calculated with a predetermined cosine distance formula. If the distance is within a set threshold range, the two i-vectors are considered to come from the voice of the same person and the verification succeeds; otherwise, a failure is returned.
However, under such conventional verification, the vector used for comparison is always the authentication vector generated when the user first enrolled. The user's voice may change over time due to factors such as age, physical condition and environment, so comparing against the original registration i-vector at every verification may cause the verification to fail.
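For illustration only, the following is a minimal sketch of the conventional single-template check described above, assuming i-vectors are NumPy arrays; the 0.6 threshold and the cosine-similarity form of the "cosine distance" comparison are assumptions, not values taken from this patent.

```python
import numpy as np

def cosine_score(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two i-vectors; 1.0 means identical direction.
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def conventional_verify(enrolled_ivector: np.ndarray,
                        test_ivector: np.ndarray,
                        threshold: float = 0.6) -> bool:
    # Accept only if the verification i-vector is close enough to the single
    # i-vector stored at first enrollment; the stored template is never updated.
    return cosine_score(enrolled_ivector, test_ivector) >= threshold
```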
Disclosure of Invention
In view of the above, the present invention provides a voice recognition method, a terminal device and a storage medium for updating voiceprint data, which can improve the accuracy of subsequent voiceprint verification and adapt to the fluctuating voice change of a registrant over time.
To achieve the above object, the present invention provides a terminal device, which includes a memory and a processor, wherein the memory stores a voice recognition program for updating voiceprint data, the voice recognition program being executable on the processor, and the voice recognition program for updating voiceprint data implements the following steps when executed by the processor:
registering a preset number of user registration voices, and calculating a feature voice vector of each user registration voice in the preset number of user registration voices;
comparing the feature voice vectors of the user registration voices with one another in pairs and scoring each pair, and taking the first scoring average value as a first threshold value;
obtaining a verification voice and calculating a feature voice vector of the verification voice;
comparing the feature voice vector of the verification voice with the feature voice vector of each registration voice, scoring each pair, and obtaining a second scoring average value;
judging whether the second scoring average value is greater than a second threshold value, the second threshold value being the sum of the first threshold value and a preset value; and
if the second scoring average value is greater than the second threshold value, updating the registration voices according to the verification voice.
Optionally, the step of calculating a feature speech vector of each user registration speech in the preset number of user registration speeches includes:
extracting the MFCC features of each frame of each voice by using the MFCC method and forming a matrix; and
screening out the most core features in the matrix by using a UBM and a feature speech vector extractor to form the feature speech vector.
Optionally, the step of updating the registration voice according to the verification voice further includes:
judging whether the second scoring average value is greater than a third threshold value, the third threshold value being greater than the second threshold value; and
updating the verification voice whose second scoring average value is greater than the third threshold value to a registration voice, and performing registration according to the updated registration voices.
Optionally, the third threshold is determined by:
selecting all verification voices whose second scoring average value is higher than the second threshold value, and counting their number as N;
sorting the second scoring average values corresponding to the selected verification voices from high to low; and
selecting the (N/3)-th second scoring average value as the third threshold value.
In addition, to achieve the above object, the present invention further provides a voice recognition method for updating voiceprint data, which is applied to a terminal device, and the method includes:
registering a preset number of user registration voices, and calculating a feature voice vector of each user registration voice in the preset number of user registration voices;
comparing the feature voice vectors of the user registration voices with one another in pairs and scoring each pair, and taking the first scoring average value as a first threshold value;
obtaining a verification voice and calculating a feature voice vector of the verification voice;
comparing the feature voice vector of the verification voice with the feature voice vector of each registration voice, scoring each pair, and obtaining a second scoring average value;
judging whether the second scoring average value is greater than a second threshold value, the second threshold value being the sum of the first threshold value and a preset value; and
if the second scoring average value is greater than the second threshold value, updating the registration voices according to the verification voice.
Optionally, the step of calculating a feature speech vector of each user registration speech in the preset number of user registration speeches includes:
extracting the MFCC features of each frame of each voice by using the MFCC method and forming a matrix; and
screening out the most core features in the matrix by using a UBM and a feature speech vector extractor to form the feature speech vector.
Optionally, the step of comparing the feature speech vectors of the user registration speeches in pairs and scoring each pair includes:
comparing and scoring the feature voice vectors of the voices in pairs by using a vector dot product algorithm and a PLDA algorithm.
Optionally, the step of updating the registration voice according to the verification voice further includes:
judging whether the second scoring average value is greater than a third threshold value, the third threshold value being greater than the second threshold value; and
updating the verification voice whose second scoring average value is greater than the third threshold value to a registration voice, and performing registration according to the updated registration voices.
Optionally, the third threshold is determined by:
selecting all verification voices whose second scoring average value is higher than the second threshold value, and counting their number as N;
sorting the second scoring average values corresponding to the selected verification voices from high to low; and
selecting the (N/3)-th second scoring average value as the third threshold value.
Further, to achieve the above object, the present invention also provides a storage medium storing a voice recognition program for updating voiceprint data, the voice recognition program for updating voiceprint data being executable by at least one processor to cause the at least one processor to execute the steps of the voice recognition method for updating voiceprint data as described above.
Compared with the prior art, the voice recognition method, terminal device and storage medium for updating voiceprint data provided by the invention first register a preset number of user registration voices and calculate a feature voice vector of each of them; second, they compare the feature voice vectors of the registration voices in pairs, score each pair, and take the first scoring average value as a first threshold value; then they obtain a verification voice and calculate its feature voice vector; next, they compare the feature voice vector of the verification voice with the feature voice vector of each registration voice, score each pair, and obtain a second scoring average value; further, they judge whether the second scoring average value is greater than a second threshold value, the second threshold value being the first threshold value plus a preset value; and finally, if the second scoring average value is greater than the second threshold value, they update the preset number of user registration voices according to the verification voice. This overcomes the defect of the conventional voice recognition method, in which every verification is compared against the i-vector captured at registration even though the user's voice may change due to factors such as age, physical condition and environment, causing verification failures. Because the comparison and update procedure runs at every verification, each verification that meets the requirements also refreshes the user's registration information in the voiceprint library, which improves the accuracy of subsequent voiceprint verification and adapts to the way a registrant's voice fluctuates over time.
Drawings
FIG. 1 is a diagram of an operating environment of a terminal device according to a preferred embodiment of the present invention;
FIG. 2 is a block diagram of an embodiment of a voice recognition program for updating voiceprint data in accordance with the present invention;
FIG. 3 is a flowchart of an embodiment of a voice recognition method for updating voiceprint data according to the present invention.
The implementation, functional features and advantages of the present invention will be further explained with reference to the accompanying drawings in conjunction with the embodiments.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning by themselves. Thus, "module", "component" and "unit" may be used interchangeably.
The terminal device may be implemented in various forms. For example, the terminal device described in the present invention may include a mobile terminal such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and a fixed terminal such as a Digital TV, a desktop computer, and the like.
The following description takes a mobile terminal as an example, and those skilled in the art will understand that the construction according to the embodiments of the present invention can also be applied to fixed terminals, except for elements used specifically for mobile purposes.
Fig. 1 is a diagram illustrating an operating environment of a terminal device 100 according to a preferred embodiment of the invention. The terminal device 100 includes a voice recognition program 300 for updating voiceprint data, a memory 20, a processor 30, a sensing unit 40, and the like. The sensing unit 40 may be any of various sensors that capture the user's voice, and is mainly used to acquire the verification voice.
The memory 20 includes at least one type of readable storage medium, including a flash memory, a hard disk, a multimedia card, a card type memory (e.g., SD or DX memory), a Random Access Memory (RAM), a Static Random Access Memory (SRAM), a Read Only Memory (ROM), an Electrically Erasable Programmable Read Only Memory (EEPROM), a Programmable Read Only Memory (PROM), a magnetic memory, a magnetic disk, an optical disk, and the like. The processor 30 may be a Central Processing Unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip.
Based on the operating environment diagram of the terminal device 100, various embodiments of the method of the present invention are provided.
First, the present invention provides a speech recognition program 300 for updating voiceprint data.
Referring to FIG. 2, a block diagram of a first embodiment of the voice recognition program 300 for updating voiceprint data is shown.
In this embodiment, the voice recognition program 300 for updating voiceprint data includes a series of computer program instructions stored in the memory 20, which when executed by the processor 30, can implement the voice recognition operation for updating voiceprint data according to embodiments of the present invention. In some embodiments, the speech recognition program 300 that updates the voiceprint data can be divided into one or more modules based on the particular operations implemented by the portions of the computer program instructions. For example, in fig. 2, the voice recognition program 300 for updating voiceprint data can be divided into a registration module 301, a first comparison module 302, an acquisition module 303, a second comparison module 304, a determination module 305, and an update module 306. Wherein:
the registration module 301 is configured to register a preset number of registration voices, and calculate a feature voice vector of each registration voice in the preset number of registration voices. For example, the registration module 301 registers N registration voices. In this embodiment, the voice recognition program 300 for updating the voiceprint data is stored in the terminal device 100, and the terminal device 100 of this embodiment may be any terminal with a voice recognition function, such as a mobile phone, a portable computer, a personal digital assistant, a bank payment terminal, an access control device, and the like, and these devices may implement some specific functions and applications through a voice recognition technology. In addition, the terminal device 100 obtains the valid voice when the user performs voice registration, and the valid voice can be obtained from the time when the user clicks the voice input until the user stops the voice input, so that some unnecessary noise interference can be avoided, and the purity of the voice sample to be processed can be improved. The N registered voices are preferably 3 registered voices, but N may be selected as another suitable positive integer as needed.
In this embodiment, the registration module 301 calculates the feature speech vector of each registration speech in the preset number of registration speeches respectively by:
the registration module 301 extracts the MFCC features of each frame of speech in each piece of speech by using the mel-frequency cepstrum coefficient MFCC method and forms a matrix, and screens out the most core features in the matrix by using a Universal Background Model (UBM) and a feature speech vector (i-vector) extractor (extractor) to form the feature speech vector.
Where MFCC is an abbreviation for Mel-Frequency Cepstral Coefficients, comprising two key steps: conversion to mel frequency and then cepstrum analysis. In the embodiment, each voice is subjected to voice framing to obtain voice spectrums of a plurality of frames; then, the acquired frequency spectrum is processed by a Mel filter bank to obtain a Mel frequency spectrum, wherein the Mel filter bank can convert non-uniform frequency into uniform frequency; and finally, performing cepstrum analysis on the Mel frequency spectrum to obtain Mel frequency cepstrum coefficients MFCC which are the characteristics of the frame of voice, wherein the cepstrum analysis is to logarithm the Mel frequency spectrum and perform inverse transformation, the actual inverse transformation is generally realized by DCT discrete cosine transformation, and the 2 nd to 13 th coefficients after DCT are taken as MFCC coefficients. Thus, the MFCCs of each frame of speech are combined into a vector matrix, and the most core vector in the matrix is selected by a background model (UBM) and a feature speech vector (i-vector) extractor (extractor), and the vector is used as the feature speech vector of the speech, wherein the most core vector in the matrix is selected by the background model (UBM) and the feature speech vector (i-vector) extractor (extractor) and belongs to the existing calculation method of vector matrix calculation, which is not repeated herein.
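As a rough illustration of this feature-extraction step, the sketch below frames an utterance and computes its MFCC matrix with librosa; dropping the 0th coefficient to approximate the "2nd to 13th DCT coefficients" above is an interpretation, and `ubm` and `ivector_extractor` (with its `extract` method) are hypothetical stand-ins for the trained UBM and i-vector extractor that the description assumes already exist.

```python
import numpy as np
import librosa

def mfcc_matrix(wav_path: str, sr: int = 16000, n_mfcc: int = 13) -> np.ndarray:
    """Frame the utterance and return an (n_frames, n_coeffs) MFCC matrix."""
    y, _ = librosa.load(wav_path, sr=sr)
    # librosa applies the mel filter bank, log, and DCT described above.
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape (n_mfcc, n_frames)
    return mfcc[1:].T  # drop c0, keeping 12 coefficients per frame (assumption)

def extract_feature_vector(wav_path: str, ubm, ivector_extractor) -> np.ndarray:
    """Map the MFCC matrix to a single feature voice vector (i-vector).
    `ubm` and `ivector_extractor.extract` are hypothetical placeholders."""
    frames = mfcc_matrix(wav_path)
    return ivector_extractor.extract(ubm, frames)
```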
The first comparison module 302 is configured to compare the feature voice vectors of the registration voices in pairs, score each pair, and take the first scoring average value as a first threshold value. Specifically, the first comparison module 302 performs the pairwise comparison and scoring with a vector dot-product algorithm and a PLDA algorithm. Both are existing algorithms and are not described further here.
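A minimal sketch of this step, using only the dot product as the pairwise score (the PLDA scoring mentioned above is not shown) and assuming the feature voice vectors are NumPy arrays:

```python
from itertools import combinations
import numpy as np

def pairwise_scores(vectors: list) -> list:
    """Dot-product score for every pair of enrolled feature voice vectors."""
    return [float(np.dot(a, b)) for a, b in combinations(vectors, 2)]

def first_threshold(enrolled_vectors: list) -> float:
    """First threshold = average of all pairwise enrollment scores."""
    scores = pairwise_scores(enrolled_vectors)  # 3 scores when 3 voices are enrolled
    return sum(scores) / len(scores)
```

With three enrolled voices there are C(3, 2) = 3 pairs, so the first threshold is the mean of three scores.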
The obtaining module 303 is configured to obtain a verification voice, and calculate a feature voice vector of the verification voice. In this embodiment, after obtaining the verification speech, the obtaining module 303 calculates the feature speech vector of the verification speech by using the MFCC algorithm, the UBM model and the vector extractor.
The second comparison module 304 is configured to compare the feature voice vector of the verification voice with the feature voice vector of each registration voice, score each pair, and obtain a second scoring average value. In this embodiment, the feature voice vector of the verification voice is compared and scored against each of the feature voice vectors of the registration voices, producing one second scoring value per registration voice, and the average of these values is the second scoring average value. For example, when the registration module 301 has registered 3 registration voices, the corresponding feature voice vectors form a group of 3 vectors, while the verification process yields a single verification voice and its one feature voice vector. The second comparison module 304 then scores the verification vector against each of the 3 vectors in the group, obtaining three scoring values, and averages them to obtain the second scoring average value.
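Continuing the sketch above (dot-product scoring assumed, PLDA omitted), the second scoring average value can be computed as:

```python
import numpy as np

def second_score_average(enrolled_vectors: list, verification_vector: np.ndarray) -> float:
    """Average score between the verification vector and each enrolled vector
    (three scores when three voices are enrolled)."""
    scores = [float(np.dot(v, verification_vector)) for v in enrolled_vectors]
    return sum(scores) / len(scores)
```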
The determining module 305 is configured to determine whether the second scoring average value is greater than a second threshold value. In this embodiment, the second threshold value is the sum of the first threshold value and a preset value, so it is greater than the first threshold value; the preset difference can be set by the user according to actual conditions through repeated experiments.
The updating module 306 is configured to update the registration voice according to the verification voice when the second scored average value is greater than the second threshold. In this embodiment, the updating module 306 updates the registration voice by the following method:
the update module 306 further determines whether the second scored average value is greater than a third threshold, where the third threshold is greater than the second threshold. The updating module 306 updates the verification speech with the second scored average value larger than the third threshold value to the registration speech, and performs registration according to the updated registration speech. Wherein the third threshold may be determined by:
the updating module 306 selects all verification voices with the second scoring average value higher than the second threshold value, counts the verification voices as N, and then sorts the second scoring average values corresponding to the selected verification voices from high to low; and finally, selecting the second scoring average value of the Nth/3 th as the third threshold value. And dynamically setting a third threshold value by utilizing a second scoring average value corresponding to the verification voice in the continuous verification process, and further dynamically updating the registration voice according to the third threshold value and ensuring that the registration voice can change according to the change of the user in different periods.
It should be noted that the three thresholds satisfy: the first threshold value is smaller than the second threshold value, and the second threshold value is smaller than the third threshold value. The first threshold value (the first scoring average value) essentially measures the average difference among the registration voices, while the second scoring average value essentially reflects the difference between the verification voice and the registration voices. If the verification voice differs from the registration voices by more than the average difference among the registration voices plus the preset value, the verification voice differs noticeably from the earlier registrations, and the registration voices need to be updated. The third threshold value, set above the second threshold value, further screens out the verification voices that differ markedly from the registration voices, and the registration voices are updated according to those verification voices.
By executing the program modules 301 to 306, the defect of the existing voice recognition method, in which every verification is compared against the i-vector captured at registration even though the user's voice may change over time due to factors such as age, physical condition and environment, causing verification failures, can be overcome. The comparison and update procedure runs at every verification, so each verification that meets the requirements also refreshes the user's registration information in the voiceprint library, which improves the accuracy of subsequent voiceprint verification and adapts to the way a registrant's voice fluctuates over time.
In addition, the invention also provides a voice recognition method for updating the voiceprint data.
Fig. 3 is a schematic flow chart showing the implementation of the first embodiment of the voice recognition method for updating voiceprint data according to the present invention. In this embodiment, the execution order of the steps in the flowchart shown in fig. 3 may be changed and some steps may be omitted according to different requirements.
Step S401, register a preset number of registration voices, and calculate a feature voice vector of each registration voice in the preset number of registration voices. For example, the terminal device 100 registers N registration voices. In this embodiment, the voice recognition method for updating voiceprint data runs on the terminal device 100, which may be any terminal with a voice recognition function, such as a mobile phone, portable computer, personal digital assistant, bank payment terminal or access control device; such devices can implement specific functions and applications through voice recognition technology. In addition, the terminal device 100 captures only the valid voice when the user performs voice registration: the valid voice is recorded from the moment the user starts voice input until the user stops it, which avoids unnecessary noise interference and improves the purity of the voice sample to be processed. N is preferably 3, but another suitable positive integer may be chosen as needed.
In this embodiment, the terminal device 100 calculates the feature speech vector of each registered speech in the preset number of registered speeches respectively by:
the terminal device 100 extracts the MFCC features of each frame of voice in each voice by using the mel-frequency cepstrum coefficient MFCC method and forms a matrix, and screens out the most core features in the matrix by using a Universal Background Model (UBM) and a feature voice vector (i-vector) extractor (extractor) to form the feature voice vector.
Where MFCC is an abbreviation for Mel-Frequency Cepstral Coefficients, comprising two key steps: conversion to mel frequency and then cepstrum analysis. In the embodiment, each voice is subjected to voice framing to obtain voice spectrums of a plurality of frames; then, the acquired frequency spectrum is processed by a Mel filter bank to obtain a Mel frequency spectrum, wherein the Mel filter bank can convert non-uniform frequency into uniform frequency; and finally, performing cepstrum analysis on the Mel frequency spectrum to obtain Mel frequency cepstrum coefficients MFCC, wherein the MFCC is the characteristics of the frame of voice, the cepstrum analysis is to logarithm the Mel frequency spectrum and then perform inverse transformation, the inverse transformation is generally realized by DCT discrete cosine transformation, and the coefficients from 2 nd to 13 th after DCT are taken as MFCC coefficients. Thus, the MFCC of each frame of voice forms a vector matrix, the most core vector in the matrix is screened out through a background model (UBM) and a feature voice vector (i-vector) extractor (extractor), and the vector is taken as the feature voice vector of the voice, wherein the most core vector in the matrix is screened out through the background model (UBM) and the feature voice vector (i-vector) extractor (extractor) and belongs to the existing data algorithm of vector matrix calculation, and the description is not repeated herein.
Step S402, compare the feature voice vectors of the registration voices in pairs, score each pair, and take the first scoring average value as a first threshold value. Specifically, the terminal device 100 performs the pairwise comparison and scoring with a vector dot-product algorithm and a PLDA algorithm. Both are existing algorithms and are not described further here.
Step S403, obtain a verification voice and calculate the feature voice vector of the verification voice. In this embodiment, after obtaining the verification voice, the terminal device 100 calculates its feature voice vector by using the MFCC algorithm, the UBM model and the vector extractor.
Step S404, compare the feature voice vector of the verification voice with the feature voice vector of each registration voice, score each pair, and obtain a second scoring average value. The feature voice vector of the verification voice is compared and scored against each of the feature voice vectors of the registration voices, producing one second scoring value per registration voice, and the average of these values is the second scoring average value. For example, when a preset number of registration voices have been registered, say 3, the corresponding feature voice vectors form a group of 3 vectors, while the verification process yields a single verification voice and its one feature voice vector. The terminal device 100 then scores the verification vector against each of the 3 vectors in the group, obtaining three scoring values, and averages them to obtain the second scoring average value.
Step S405, determine whether the second scoring average value is greater than a second threshold value. In this embodiment, the second threshold value is the sum of the first threshold value and a preset value, so it is greater than the first threshold value; the preset difference can be set by the user according to actual conditions through repeated experiments. If the second scoring average value is greater than the second threshold value, step S406 is executed; otherwise, the process ends.
Step S406, updating the registration voice according to the verification voice. In this embodiment, the terminal device 100 further updates the registration voice by:
the terminal device 100 first determines whether the second scored average value is greater than a third threshold, where the third threshold is greater than the second threshold. Then, the terminal device 100 updates the verification voice, of which the second scored average value is greater than the third threshold value, to the registration voice, and performs registration according to the updated registration voice. In this embodiment, the third threshold is determined by:
the terminal device 100 selects all verification voices with the second scoring average value higher than a second threshold value, counts the verification voices to be N, and then sorts the second scoring average values corresponding to the selected verification voices from high to low; and finally, selecting the second scoring average value of the Nth/3 th as the third threshold value. And dynamically setting a third threshold value by utilizing a second scoring average value corresponding to the verification voice in the continuous verification process, and further dynamically updating the registration voice according to the third threshold value and ensuring that the registration voice can change according to the change of the user in different periods.
It should be noted that the three thresholds satisfy: the first threshold value is smaller than the second threshold value, and the second threshold value is smaller than the third threshold value. The first threshold value (the first scoring average value) essentially measures the average difference among the registration voices, while the second scoring average value essentially reflects the difference between the verification voice and the registration voices. If the verification voice differs from the registration voices by more than the average difference among the registration voices plus the preset value, the verification voice differs noticeably from the earlier registrations, and the registration voices need to be updated. The third threshold value, set above the second threshold value, further screens out the verification voices that differ markedly from the registration voices, and the registration voices are updated according to those verification voices.
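Putting steps S404 to S406 together, the following sketch shows one possible shape of the decision flow; dot-product scoring, an append-style enrollment update and the integer-division reading of N/3 are all assumptions, since the text does not fix them:

```python
import numpy as np

def maybe_update_enrollment(enrolled: list,
                            verification_vector: np.ndarray,
                            first_threshold: float,
                            preset_value: float,
                            passed_averages: list) -> bool:
    """Run one verification and update the enrollment set if warranted.
    Returns True when the second threshold (step S405) is passed.
    `passed_averages` accumulates second scoring averages that passed the
    second threshold; it is used to derive the dynamic third threshold."""
    second_avg = float(np.mean([np.dot(v, verification_vector) for v in enrolled]))
    second_threshold = first_threshold + preset_value
    if second_avg <= second_threshold:
        return False  # step S405 fails, nothing to update

    passed_averages.append(second_avg)
    ranked = sorted(passed_averages, reverse=True)
    third_threshold = ranked[max(len(ranked) // 3 - 1, 0)]
    if second_avg > third_threshold:
        enrolled.append(verification_vector)  # update the registration voices (step S406)
    return True
```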
Through steps S401 to S406, the voice recognition method for updating voiceprint data provided by the invention overcomes the defect of the existing voice recognition method, in which every verification is compared against the i-vector captured at registration even though the user's voice may change over time due to factors such as age, physical condition and environment, causing verification failures. The comparison and update procedure runs at every verification, so each verification that meets the requirements also refreshes the user's registration information in the voiceprint library, which improves the accuracy of subsequent voiceprint verification and adapts to the way a registrant's voice fluctuates over time.
The present invention further provides another embodiment: a storage medium storing a voice recognition program for updating voiceprint data, the program being executable by at least one processor to cause the at least one processor to perform the steps of the voice recognition method for updating voiceprint data described above.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (6)

1. A voice recognition method for updating voiceprint data is applied to a terminal device, and is characterized by comprising the following steps:
registering a preset number of registered voices and calculating a feature voice vector of each registered voice in the preset number of registered voices;
comparing the feature voice vectors of the registered voices in pairs and scoring each pair, and taking the first scoring average value as a first threshold value;
obtaining a verification voice and calculating a feature voice vector of the verification voice;
comparing the feature voice vector of the verification voice with the feature voice vector of each registered voice, scoring each pair, and obtaining a second scoring average value;
judging whether the second scoring average value is greater than a second threshold value, wherein the second threshold value is the sum of the first threshold value and a preset value; and
if the second scoring average is greater than the second threshold, updating the registration voice according to the verification voice, including: judging whether the second scoring average value is larger than a third threshold value, updating the verification voice with the second scoring average value larger than the third threshold value into the registration voice, and registering according to the updated registration voice, wherein the third threshold value is larger than the second threshold value, and the third threshold value is determined through the following steps:
selecting all verification voices whose second scoring average value is higher than the second threshold value, and counting their number as N;
sorting the second scoring average values corresponding to the selected verification voices from high to low; and
selecting the (N/3)-th second scoring average value as the third threshold value.
2. The speech recognition method for updating voiceprint data according to claim 1, wherein the step of calculating the feature speech vector of each of the predetermined number of registered speeches includes:
extracting the MFCC features of each frame of each speech by using the MFCC method and forming a matrix; and
screening out the most core features in the matrix by using a UBM and a feature speech vector extractor to form the feature speech vector.
3. The speech recognition method for updating voiceprint data according to claim 1, wherein the step of comparing the feature speech vectors of the registered speeches in pairs and scoring each pair comprises:
comparing and scoring the feature speech vectors of the registered speeches in pairs by using a vector dot product algorithm and a PLDA algorithm.
4. A terminal device, comprising a memory and a processor, wherein the memory stores a voice recognition program for updating voiceprint data, the voice recognition program being executable on the processor, and the voice recognition program for updating voiceprint data implements the following steps when executed by the processor:
registering a preset number of registered voices and calculating a feature voice vector of each registered voice in the preset number of registered voices;
comparing the feature voice vectors of the registered voices in pairs and scoring each pair, and taking the first scoring average value as a first threshold value;
obtaining a verification voice and calculating a feature voice vector of the verification voice;
comparing the feature voice vector of the verification voice with the feature voice vector of each registered voice, scoring each pair, and obtaining a second scoring average value;
judging whether the second scoring average value is greater than a second threshold value, wherein the second threshold value is the sum of the first threshold value and a preset value; and
if the second scoring average is greater than the second threshold, updating the registration voice according to the verification voice, including: judging whether the second scoring average value is larger than a third threshold value, updating the verification voice with the second scoring average value larger than the third threshold value into the registration voice, and registering according to the updated registration voice, wherein the third threshold value is larger than the second threshold value, and the third threshold value is determined through the following steps:
selecting all verification voices whose second scoring average value is higher than the second threshold value, and counting their number as N;
sorting the second scoring average values corresponding to the selected verification voices from high to low; and
selecting the (N/3)-th second scoring average value as the third threshold value.
5. The terminal apparatus according to claim 4, wherein the step of calculating the feature speech vector of each of the preset number of registered speeches comprises:
extracting the MFCC features of each frame of each speech by using the MFCC method and forming a matrix; and
screening out the most core features in the matrix by using a UBM and a feature speech vector extractor to form the feature speech vector.
6. A storage medium storing a speech recognition program for updating voiceprint data, the speech recognition program for updating voiceprint data being executable by at least one processor to cause the at least one processor to perform the steps of the method for speech recognition of updating voiceprint data according to any one of claims 1 to 3.
CN201810030623.1A 2018-01-12 2018-01-12 Voice recognition method for updating voiceprint data, terminal device and storage medium Active CN108269575B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201810030623.1A CN108269575B (en) 2018-01-12 2018-01-12 Voice recognition method for updating voiceprint data, terminal device and storage medium
PCT/CN2018/089415 WO2019136911A1 (en) 2018-01-12 2018-05-31 Voice recognition method for updating voiceprint data, terminal device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810030623.1A CN108269575B (en) 2018-01-12 2018-01-12 Voice recognition method for updating voiceprint data, terminal device and storage medium

Publications (2)

Publication Number Publication Date
CN108269575A CN108269575A (en) 2018-07-10
CN108269575B true CN108269575B (en) 2021-11-02

Family

ID=62775513

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810030623.1A Active CN108269575B (en) 2018-01-12 2018-01-12 Voice recognition method for updating voiceprint data, terminal device and storage medium

Country Status (2)

Country Link
CN (1) CN108269575B (en)
WO (1) WO2019136911A1 (en)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110400567B (en) * 2019-07-30 2021-10-19 深圳秋田微电子股份有限公司 Dynamic update method for registered voiceprint and computer storage medium
CN110660398B (en) * 2019-09-19 2020-11-20 北京三快在线科技有限公司 Voiceprint feature updating method and device, computer equipment and storage medium
CN111599365B (en) * 2020-04-08 2023-05-05 云知声智能科技股份有限公司 Adaptive threshold generation system and method for voiceprint recognition system
CN111785280A (en) * 2020-06-10 2020-10-16 北京三快在线科技有限公司 Identity authentication method and device, storage medium and electronic equipment
CN112289322B (en) * 2020-11-10 2022-11-15 思必驰科技股份有限公司 Voiceprint recognition method and device
CN112487804B (en) * 2020-11-25 2024-04-19 合肥三恩信息科技有限公司 Chinese novel speech synthesis system based on semantic context scene
TWI787996B (en) * 2021-09-08 2022-12-21 華南商業銀行股份有限公司 Voiceprint identification device for financial transaction system and method thereof
TWI817897B (en) * 2021-09-08 2023-10-01 華南商業銀行股份有限公司 Low-noise voiceprint identification device for financial transaction system and method thereof

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1905445B (en) * 2005-07-27 2012-02-15 国际商业机器公司 System and method of speech identification using mobile speech identification card
CN102760434A (en) * 2012-07-09 2012-10-31 华为终端有限公司 Method for updating voiceprint feature model and terminal
WO2016015687A1 (en) * 2014-07-31 2016-02-04 腾讯科技(深圳)有限公司 Voiceprint verification method and device
CN104616655B (en) * 2015-02-05 2018-01-16 北京得意音通技术有限责任公司 The method and apparatus of sound-groove model automatic Reconstruction
CN106157959B (en) * 2015-03-31 2019-10-18 讯飞智元信息科技有限公司 Sound-groove model update method and system
CN107424614A (en) * 2017-07-17 2017-12-01 广东讯飞启明科技发展有限公司 A kind of sound-groove model update method

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104184587A (en) * 2014-08-08 2014-12-03 腾讯科技(深圳)有限公司 Voiceprint generation method, voiceprint generation server, client and voiceprint generation system
CN106782564A (en) * 2016-11-18 2017-05-31 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech data
CN107068154A (en) * 2017-03-13 2017-08-18 平安科技(深圳)有限公司 The method and system of authentication based on Application on Voiceprint Recognition

Also Published As

Publication number Publication date
CN108269575A (en) 2018-07-10
WO2019136911A1 (en) 2019-07-18

Similar Documents

Publication Publication Date Title
CN108269575B (en) Voice recognition method for updating voiceprint data, terminal device and storage medium
WO2019179036A1 (en) Deep neural network model, electronic device, identity authentication method, and storage medium
US11875799B2 (en) Method and device for fusing voiceprint features, voice recognition method and system, and storage medium
US6772119B2 (en) Computationally efficient method and apparatus for speaker recognition
WO2019179029A1 (en) Electronic device, identity verification method and computer-readable storage medium
WO2019134247A1 (en) Voiceprint registration method based on voiceprint recognition model, terminal device, and storage medium
US11545154B2 (en) Method and apparatus with registration for speaker recognition
CN110556126B (en) Speech recognition method and device and computer equipment
WO2019200744A1 (en) Self-updated anti-fraud method and apparatus, computer device and storage medium
WO2019136912A1 (en) Electronic device, identity authentication method and system, and storage medium
US11062120B2 (en) High speed reference point independent database filtering for fingerprint identification
CN113223536B (en) Voiceprint recognition method and device and terminal equipment
US20100045787A1 (en) Authenticating apparatus, authenticating system, and authenticating method
US6389392B1 (en) Method and apparatus for speaker recognition via comparing an unknown input to reference data
CN108694952B (en) Electronic device, identity authentication method and storage medium
CN112735437A (en) Voiceprint comparison method, system and device and storage mechanism
WO2019136811A1 (en) Audio comparison method, and terminal and computer-readable storage medium
CN113053395A (en) Pronunciation error correction learning method and device, storage medium and electronic equipment
CN108630208B (en) Server, voiceprint-based identity authentication method and storage medium
CN111640438A (en) Audio data processing method and device, storage medium and electronic equipment
JP6996627B2 (en) Information processing equipment, control methods, and programs
JP2014182270A (en) Information processor and information processing method
CN113035230A (en) Authentication model training method and device and electronic equipment
JP2543528B2 (en) Voice recognition device
CN117313723B (en) Semantic analysis method, system and storage medium based on big data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant