CN110648671A - Voiceprint model reconstruction method, terminal, device and readable storage medium - Google Patents

Voiceprint model reconstruction method, terminal, device and readable storage medium Download PDF

Info

Publication number
CN110648671A
Authority
CN
China
Prior art keywords
voiceprint
sub
sample data
voiceprint model
voice sample
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910775992.8A
Other languages
Chinese (zh)
Inventor
陈昊亮 (Chen Haoliang)
罗伟航 (Luo Weihang)
李炳霖 (Li Binglin)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangzhou National Acoustic Intelligent Technology Co Ltd
Original Assignee
Guangzhou National Acoustic Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangzhou National Acoustic Intelligent Technology Co Ltd
Priority to CN201910775992.8A
Publication of CN110648671A
Legal status: Pending (current)

Links

Images

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques
    • G10L17/04 Training, enrolment or model building
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
    • G06V40/172 Classification, e.g. identification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00 Speaker identification or verification techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Acoustics & Sound (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • General Health & Medical Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention discloses a voiceprint model reconstruction method comprising the following steps: obtaining voice sample data, which comprises a plurality of sub-voice sample data, and generating an initial voiceprint model based on the voice sample data; obtaining a voiceprint feature vector of each sub-voice sample data based on the initial voiceprint model; clustering the voice sample data based on a K-Means algorithm and the voiceprint feature vectors, thereby dividing the voice sample data into a preset number of sub-sample sets; and generating a target voiceprint model based on the preset number of sub-sample sets. The invention also discloses a device, a terminal and a readable storage medium. By clustering and grouping the voice sample data and iteratively training the voiceprint model on the grouped sub-sample sets, the method improves both the training efficiency and the robustness of the voiceprint model.

Description

Voiceprint model reconstruction method, terminal, device and readable storage medium
Technical Field
The invention relates to the field of voiceprint recognition, in particular to a voiceprint model reconstruction method, a terminal, a device and a readable storage medium.
Background
Voiceprints are the spectra of sound waves carrying verbal information, displayed with an electro-acoustic instrument. Modern research shows that a voiceprint is not only specific to the individual but also relatively stable: after adulthood, a person's voice remains relatively stable for a long time. A voiceprint recognition algorithm establishes a voiceprint recognition model by learning various voice features from the voice spectrogram, thereby identifying the speaker.
At present, however, voiceprint models are trained on user speech data under a supervised, sample-guided paradigm; specifically, the training data can be modeled with a Gaussian Mixture Model-Universal Background Model (GMM-UBM), a Total Variability (TV) system, or a deep neural network system, and a large amount of user speech is used during training to learn a feature vector representing the user. In practice, voice samples commonly lack labels, and labeling them manually introduces large errors: it is very difficult for an annotator to identify a user from unfamiliar speech, so the error is large and the labeling cost is very high. Therefore, how to train a better voiceprint model on user training data with incomplete labels is a problem to be solved at present.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a voiceprint model reconstruction method, a terminal, a device and a readable storage medium, and aims to solve the technical problem that a voiceprint model trained on user training data with incomplete labels is not robust.
In order to achieve the above object, the present invention provides a method for reconstructing a voiceprint model, which comprises the following steps:
acquiring voice sample data, and generating an initial voiceprint model based on the voice sample data, wherein the voice sample data comprises a plurality of sub-voice sample data;
acquiring a voiceprint feature vector of each sub-voice sample data based on the initial voiceprint model, clustering the voice sample data based on a K-Means algorithm and each voiceprint feature vector, and dividing the voice sample data into a preset number of sub-sample sets;
and generating a target voiceprint model based on the preset number of sub-sample sets.
Further, in an embodiment, the preset number of sub-sample sets includes a first sub-sample set, a second sub-sample set, and a third sub-sample set, and a first preset value is smaller than a second preset value; the step of clustering the voice sample data based on the K-Means algorithm and each voiceprint feature vector and dividing the voice sample data into the preset number of sub-sample sets includes:
calculating the distance between each voiceprint characteristic vector and a preset clustering center based on the K-Means algorithm;
when a first sub-distance smaller than or equal to the first preset value exists in all the distances, taking sub-voice sample data corresponding to the first sub-distance as voice sample data in a first sub-sample set;
when a second sub-distance which is greater than the first preset value and less than or equal to the second preset value exists in all the distances, taking sub-voice sample data corresponding to the second sub-distance as voice sample data in a second sub-sample set;
and when a third sub-distance larger than the second preset value exists in all the distances, taking the sub-voice sample data corresponding to the third sub-distance as the voice sample data in a third sub-sample set.
Further, in one embodiment, the cluster center is calculated by:
and calculating the average value of each voiceprint feature vector, and taking the average value as the clustering center.
Further, in an embodiment, the step of generating the target voiceprint model based on the preset number of subsample sets comprises:
generating a first voiceprint model based on the first set of subsamples;
generating a target voiceprint model based on the first set of subsamples, the second set of subsamples, the third set of subsamples, and the first voiceprint model.
Further, in an embodiment, the step of generating a target voiceprint model based on the first set of subsamples, the second set of subsamples, the third set of subsamples and the first voiceprint model comprises:
generating a second voiceprint model based on the first set of subsamples, the second set of subsamples, and the first voiceprint model;
generating a target voiceprint model based on the first set of subsamples, the second set of subsamples, the third set of subsamples, and the second voiceprint model.
Further, in an embodiment, after the step of generating the target voiceprint model based on the preset number of subsample sets, the method further includes:
when a voiceprint authentication request is received, acquiring voice data to be authenticated based on the voiceprint authentication request;
and determining a voiceprint authentication result of the voice data to be authenticated based on the target voiceprint model.
Further, in an embodiment, after the step of determining the voiceprint authentication result of the voice data to be authenticated based on the target voiceprint model, the method further includes:
and when the voiceprint authentication result is that the voiceprint authentication is passed, sending prompt information that the voiceprint authentication request is passed to a preset terminal.
Further, in an embodiment, the voiceprint model reconstruction apparatus includes:
an acquisition module, configured to acquire voice sample data and generate an initial voiceprint model based on the voice sample data, wherein the voice sample data comprises a plurality of sub-voice sample data;
the processing module is used for acquiring the voiceprint characteristic vector of each sub-voice sample data based on the initial voiceprint model, clustering the voice sample data based on a K-Means algorithm and each voiceprint characteristic vector, and dividing the voice sample data into a preset number of sub-sample sets;
and the generating module is used for generating a target voiceprint model based on the sub-sample sets with the preset number.
In addition, to achieve the above object, the present invention also provides a terminal, including: a memory, a processor and a voiceprint model reconstruction program stored on the memory and executable on the processor, the voiceprint model reconstruction program when executed by the processor implementing the steps of the voiceprint model reconstruction method of any one of the above.
In addition, to achieve the above object, the present invention further provides a readable storage medium having stored thereon a voiceprint model reconstruction program, which when executed by a processor, implements the steps of the voiceprint model reconstruction method according to any one of the above.
The method obtains voice sample data and generates an initial voiceprint model based on it, the voice sample data comprising a plurality of sub-voice sample data; then obtains a voiceprint feature vector of each sub-voice sample data based on the initial voiceprint model; clusters the voice sample data based on a K-Means algorithm and the voiceprint feature vectors, dividing it into a preset number of sub-sample sets; and then generates a target voiceprint model based on the preset number of sub-sample sets. The voice sample data is clustered and grouped by the unsupervised K-Means algorithm, which weakens the influence of missing labels on model training; the voiceprint model is trained iteratively on the grouped voice sample data in order from easy to difficult, which improves the performance of the voiceprint model; and the difficult voice sample data is used for model training through repeated iteration, which effectively improves the robustness of the voiceprint model.
Drawings
Fig. 1 is a schematic structural diagram of a terminal in a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a voiceprint model reconstruction method according to the present invention;
FIG. 3 is a flowchart illustrating a voiceprint model reconstruction method according to a second embodiment of the present invention;
fig. 4 is a functional block diagram of an embodiment of a voiceprint model reconstruction apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
As shown in fig. 1, fig. 1 is a schematic structural diagram of a terminal in a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the terminal may include: a processor 1001, such as a CPU, a network interface 1004, a client interface 1003, a memory 1005, and a communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The client interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the client interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface). The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a storage device separate from the processor 1001.
Optionally, the terminal may further include a camera, a Radio Frequency (RF) circuit, a sensor, an audio circuit, a WiFi module, and the like. Sensors such as light sensors and motion sensors are not described in detail herein.
Those skilled in the art will appreciate that the system architecture shown in fig. 1 is not intended to be limiting and may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a readable storage medium, may include therein an operating system, a network communication module, a client interface module, and a voiceprint model reconstruction program.
In the system shown in fig. 1, the network interface 1004 is mainly used for connecting to a backend server and communicating with it; the client interface 1003 is mainly used for connecting to a client and exchanging data with it; and the processor 1001 may be used to invoke the voiceprint model reconstruction program stored in the memory 1005.
In this embodiment, the terminal includes: the system comprises a memory 1005, a processor 1001 and a voiceprint model reconstruction program stored in the memory 1005 and capable of running on the processor 1001, wherein when the processor 1001 calls the voiceprint model reconstruction program stored in the memory 1005, the steps of the voiceprint model reconstruction method provided by each embodiment of the application are executed.
The invention also provides a voiceprint model reconstruction method, and referring to fig. 2, fig. 2 is a schematic flow chart of a first embodiment of the voiceprint model reconstruction method of the invention.
While a logical order is shown in the flowchart, in some cases, the steps shown or described may be performed in a different order than presented.
In this embodiment, the method for reconstructing a voiceprint model includes:
step S100, obtaining voice sample data, and generating an initial voiceprint model based on the voice sample data, wherein the voice sample data comprises a plurality of sub-voice sample data;
in this embodiment, voiceprint recognition, a type of biometric technology, also known as speaker recognition, is classified into two categories, namely speaker recognition and speaker verification. Different tasks and applications may use different voiceprint recognition techniques, such as recognition techniques may be required to narrow criminal investigation, and validation techniques may be required for banking transactions. Voiceprint recognition is the conversion of acoustic signals into electrical signals, which are then recognized by a computer. A large amount of voice sample data are needed for training the voiceprint model, and the voice data can be collected by a voice data collecting system and stored in a database and used during voiceprint model training. Since the collected voice data is limited, more voice data needs to be acquired to improve the accuracy of the voiceprint model. The difficulty and the cost for acquiring the standard voice data are high, so that a large amount of voice data without labels exist in the voice sample data.
Further, an initial voiceprint model is generated from the acquired voice sample data; the voiceprint model can be trained by methods such as a Gaussian Mixture Model-Universal Background Model (GMM-UBM), a Total Variability (TV) system, or a deep neural network system.
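For illustration only, the following is a minimal sketch of this step under the assumption that a Gaussian mixture fitted over pooled frame-level MFCC features stands in for the UBM-style initial model; the sampling rate, feature dimension, component count, and the extract_mfcc helper are hypothetical choices, not values taken from this disclosure:

```python
# Minimal sketch: fit a GMM over pooled MFCC features as a stand-in for the
# UBM-style initial voiceprint model. All parameters (16 kHz audio, 20 MFCCs,
# 64 diagonal-covariance components) are illustrative assumptions.
import numpy as np
import librosa
from sklearn.mixture import GaussianMixture

def extract_mfcc(wav_path, sr=16000, n_mfcc=20):
    """Hypothetical helper: frame-level MFCC features for one utterance."""
    audio, _ = librosa.load(wav_path, sr=sr)
    return librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=n_mfcc).T  # (frames, n_mfcc)

def train_initial_model(wav_paths, n_components=64):
    features = np.vstack([extract_mfcc(p) for p in wav_paths])
    ubm = GaussianMixture(n_components=n_components, covariance_type="diag")
    ubm.fit(features)
    return ubm
```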
S200, acquiring a voiceprint characteristic vector of each sub-voice sample data based on the initial voiceprint model, clustering the voice sample data based on a K-Means algorithm and each voiceprint characteristic vector, and dividing the voice sample data into a preset number of sub-sample sets;
in this embodiment, first, a voiceprint feature vector of each voice sample data is obtained by using an initial voiceprint model, then, the voice sample data is clustered according to a K-Means algorithm and each voiceprint feature vector, the voice sample data is divided into a plurality of sub-sample sets, and the number of the sub-sample sets is determined according to an actual situation. The K-Means algorithm is a clustering analysis algorithm for iterative solution, and the method comprises the steps of randomly selecting K objects as initial clustering centers, then calculating the distance between each object and each seed clustering center, and allocating each object to the nearest clustering center. The cluster centers and the objects assigned to them represent a cluster. The cluster center of a cluster is recalculated for each sample assigned based on the objects existing in the cluster. This process will be repeated until some termination condition is met. The termination condition may be that no (or minimum number) objects are reassigned to different clusters, no (or minimum number) cluster centers are changed again, and the sum of squared errors is locally minimal. In the present invention, the average value of each voiceprint feature vector is calculated and used as the clustering center.
Specifically, step S200 includes:
step S210, calculating the distance between each voiceprint characteristic vector and a preset clustering center based on the K-Means algorithm;
in this embodiment, a cluster center is determined first, and the determination method is as follows: and calculating the average value of each voiceprint feature vector, and taking the average value as the clustering center, wherein the process of calculating the voiceprint feature vector is well known to those skilled in the art and is not described herein again.
Next, when calculating the distance between each voiceprint feature vector and the clustering center according to the K-Means algorithm, the Euclidean distance is used as the distance measure:

$$d(x, c) = \sqrt{\sum_{i=1}^{n} (x_i - c_i)^2}$$

where $c$ and $x$ respectively represent the clustering center and a voiceprint feature vector.
Step S220, when a first sub-distance smaller than or equal to the first preset value exists in all distances, taking sub-voice sample data corresponding to the first sub-distance as voice sample data in a first sub-sample set;
in this embodiment, taking three subsample sets as an example, two preset values, namely a first preset value and a second preset value, are set empirically, where the first preset value is smaller than the second preset value, then distances between each voiceprint feature vector and a cluster center are calculated according to a K-Means algorithm, and voice sample data corresponding to the voiceprint feature vectors whose distances among all distances are smaller than or equal to the first preset value is divided into the first subsample set.
Step S230, when a second sub-distance greater than the first preset value and less than or equal to the second preset value exists in all distances, taking sub-voice sample data corresponding to the second sub-distance as voice sample data in a second sub-sample set;
in this embodiment, among the distances between each voiceprint feature vector and the center of the cluster calculated according to the K-Means algorithm, the voice sample data corresponding to all the voiceprint feature vectors whose distances are greater than the first preset value and less than or equal to the second preset value are divided into a second sub-sample set.
Step S240, when a third sub-distance greater than the second preset value exists in all distances, taking sub-voice sample data corresponding to the third sub-distance as voice sample data in a third sub-sample set.
In this embodiment, among all the distances, the voice sample data corresponding to the voiceprint feature vectors whose distance is greater than the second preset value are divided into the third sub-sample set.
It should be noted that the voiceprint feature vectors in the first sub-sample set are closest to the cluster center and are therefore considered voice samples that are easy to learn, while those in the second and third sub-sample sets are farther from the cluster center and are voice samples that are difficult to recognize; these are precisely the samples that the voiceprint model most needs to learn.
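A minimal sketch of the partitioning in steps S210 to S240, assuming the mean of the voiceprint feature vectors as the single clustering center described above; the two preset values t1 and t2 are illustrative placeholders that would in practice be set empirically:

```python
import numpy as np

def partition_samples(voiceprint_vectors, t1, t2):
    """Split sub-voice samples into three sub-sample sets by distance to the center.

    t1 < t2 play the role of the first and second preset values.
    Returns index arrays for the first (easy), second (harder),
    and third (hardest) sub-sample sets.
    """
    center = voiceprint_vectors.mean(axis=0)                     # mean as cluster center
    dists = np.linalg.norm(voiceprint_vectors - center, axis=1)  # Euclidean distances
    first = np.where(dists <= t1)[0]                             # step S220
    second = np.where((dists > t1) & (dists <= t2))[0]           # step S230
    third = np.where(dists > t2)[0]                              # step S240
    return first, second, third
```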
And step S300, generating a target voiceprint model based on the preset number of the sub-sample sets.
In this embodiment, the target voiceprint model is generated from the preset number of sub-sample sets; for example, when there are three sub-sample sets, the target voiceprint model is generated from the first sub-sample set, the second sub-sample set, and the third sub-sample set.
Specifically, step S300 includes:
step S310, generating a first voiceprint model based on the first subsample set;
in the present embodiment, the learning order is set according to how easy it is to be (i.e., the closer the distance from the cluster center is considered, the easier the distance is considered, and the farther the distance is considered, the harder the distance is), that is, the first set of subsamples is set as simple voice sample data, the second set of subsamples is set as the harder voice sample data, and the third set of subsamples is set as the hardest voice sample data. The learning sequence includes learning simple voice sample data, learning harder voice sample data and finally learning the hardest voice sample data. That is, the simplest first set of subsamples is learned first, then the harder first set of subsamples is learned, and finally the hardest first set of subsamples is learned.
Specifically, a first voiceprint model is generated from the first subsample set; the voiceprint model can be trained by methods such as a Gaussian Mixture Model-Universal Background Model (GMM-UBM), a Total Variability (TV) system, or a deep neural network system.
Step S320, generating a target voiceprint model based on the first subsample set, the second subsample set, the third subsample set and the first voiceprint model.
In this embodiment, after the first voiceprint model is generated according to the first sub-sample set, the first voiceprint model is trained by using the first sub-sample set, the second sub-sample set, and the third sub-sample set, so as to generate the target voiceprint model.
Specifically, step S320 includes:
step S321, generating a second voiceprint model based on the first subsample set, the second subsample set and the first voiceprint model;
in this embodiment, a first voiceprint model is used as an initial model, the first voiceprint model is trained by using a first subsample set and a second subsample set, and the learning rate of the speech training of the first subsample set and the second subsample set is set according to empirical data. Specifically, the first voiceprint Model is used as an initial Model, and all the voice sample data in the first sub-sample set and the second sub-sample set are input into the first voiceprint Model for training by using a Gaussian Mixture Model-Universal Background Model (GMM-UBM), a total variance modeling (TV) system or a deep neural network system, and the like, so as to obtain the second voiceprint Model.
Step S322, generating a target voiceprint model based on the first set of subsamples, the second set of subsamples, the third set of subsamples and the second voiceprint model.
In this embodiment, after the second voiceprint model is generated according to the first subsample set and the second subsample set, the second voiceprint model is trained by using the first subsample set, the second subsample set, and the third subsample set, so as to generate the target voiceprint model.
Specifically, the method first obtains the relatively pure voices closest to the clustering center (the first subsample set), then the voices farther from the clustering center that are considered difficult (the second subsample set), and finally the voices farthest from the clustering center that are considered the most difficult (the third subsample set). A voiceprint model is first trained with the first subsample set (the first voiceprint model); then, with the first voiceprint model as the initial model, the difficult second subsample set is trained together with the first subsample set to obtain a second voiceprint model; finally, with the second voiceprint model as the initial model, the most difficult third subsample set is trained together with the second and first subsample sets, yielding the target voiceprint model. The training process simulates the way humans learn knowledge from simple to difficult, makes good use of the difficult training voice samples, effectively improves the robustness of the voiceprint model, and gives the voiceprint model better performance.
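As a rough sketch of this easy-to-hard schedule, the snippet below reuses a warm-started Gaussian mixture as a stand-in for whichever trainer (GMM-UBM, TV, or deep neural network system) is actually used; the component count is an illustrative assumption:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def build_target_model(first_set, second_set, third_set, n_components=64):
    """Easy-to-hard iterative training; each *_set is an array of feature rows."""
    model = GaussianMixture(n_components=n_components,
                            covariance_type="diag", warm_start=True)
    # warm_start=True makes each fit() continue from the previous solution.
    model.fit(first_set)                                      # first voiceprint model
    model.fit(np.vstack([first_set, second_set]))             # second voiceprint model
    model.fit(np.vstack([first_set, second_set, third_set]))  # target voiceprint model
    return model
```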
The voiceprint model reconstruction method provided by this embodiment obtains voice sample data and generates an initial voiceprint model based on it, the voice sample data comprising a plurality of sub-voice sample data; then obtains a voiceprint feature vector of each sub-voice sample data based on the initial voiceprint model; clusters the voice sample data based on a K-Means algorithm and the voiceprint feature vectors, dividing it into a preset number of sub-sample sets; and then generates a target voiceprint model based on the preset number of sub-sample sets. The voice sample data is clustered and grouped by the unsupervised K-Means algorithm, weakening the influence of missing labels on model training; iterative training on the grouped voice sample data in order from easy to difficult gives the voiceprint model better performance; and using the difficult voice sample data for model training through multiple iterations effectively improves the robustness of the voiceprint model.
Based on the first embodiment, referring to fig. 3, a second embodiment of the voiceprint model reconstruction method of the present invention is provided, in this embodiment, after step S300, the method further includes:
step S400, when a voiceprint authentication request is received, acquiring voice data to be authenticated based on the voiceprint authentication request;
in this embodiment, the Voiceprint (Voiceprint) is a spectrum of sound waves carrying speech information displayed by an electro-acoustic apparatus. The generation of human language is a complex physiological and physical process between the human language center and the vocal organs, and the vocal print maps of any two people are different because the size and the shape of the vocal organs, namely tongue, teeth, larynx, lung and nasal cavity, used by a person during speaking are different greatly. The speech acoustic characteristics of each individual are both relatively stable and variable, not absolute, but invariable. The variation can come from physiology, pathology, psychology, simulation, camouflage and is also related to environmental interference. However, since the pronunciation organs of each person are different, in general, people can distinguish different sounds or judge whether the sounds are the same. Voiceprint recognition has two categories, Speaker Identification (Speaker Identification) and Speaker Verification (Speaker Verification). The former is used for judging which one of a plurality of people said a certain section of voice, and is a 'one-out-of-multiple' problem; the latter is used to confirm whether a certain speech is spoken by a specified person, which is a one-to-one discrimination problem. Different voiceprint recognition techniques are used for different tasks and applications, such as identification techniques may be required for criminal investigation and validation techniques for bank transactions. Therefore, voiceprint recognition is widely applied in the field of identity authentication.
Specifically, when a voiceprint authentication request is received, the voice data to be authenticated is obtained according to the request; a voiceprint feature vector can then be extracted from this voice data by the voiceprint model for voiceprint authentication.
Step S500, determining the voiceprint authentication result of the voice data to be authenticated based on the target voiceprint model.
In this embodiment, the voice data to be authenticated is used as input to the target voiceprint model, which produces the corresponding voiceprint feature vector; this vector is compared with the voiceprint feature vector registered by the user to determine the voiceprint authentication result. Specifically, a matching value is calculated between the voiceprint feature vector corresponding to the voice data to be authenticated and the user's registered voiceprint feature vector; if the matching value is greater than or equal to a preset threshold, the voiceprint authentication passes, and if the matching value is smaller than the preset threshold, the voiceprint authentication fails.
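A minimal sketch of this comparison, assuming embedding-style voiceprint feature vectors and cosine similarity as the matching value; the preset threshold of 0.7 is an illustrative placeholder:

```python
import numpy as np

def voiceprint_authenticate(vector_to_verify, enrolled_vector, threshold=0.7):
    """Compare the voiceprint feature vector of the voice data to be
    authenticated against the user's registered voiceprint feature vector."""
    a = vector_to_verify / np.linalg.norm(vector_to_verify)
    b = enrolled_vector / np.linalg.norm(enrolled_vector)
    score = float(np.dot(a, b))   # cosine similarity as the matching value
    return score >= threshold     # pass iff the matching value meets the threshold
```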
And step S600, when the voiceprint authentication result is that the voiceprint authentication is passed, sending a prompt message that the voiceprint authentication request is passed to a preset terminal.
In this embodiment, when the voiceprint authentication result is that the voiceprint authentication passes, the prompt message that the voiceprint authentication request passes is sent to the preset terminal, and similarly, when the voiceprint authentication result is that the voiceprint authentication fails, the prompt message that the voiceprint authentication request fails is sent to the preset terminal.
According to the voiceprint model reconstruction method provided by this embodiment, when a voiceprint authentication request is received, the voice data to be authenticated is obtained based on the request, and the voiceprint authentication result of that voice data is determined based on the target voiceprint model, so that the user's identity is authenticated by the user's own voiceprint, improving convenience of use and user experience.
The invention further provides a voiceprint model reconstruction device, and referring to fig. 4, fig. 4 is a functional module schematic diagram of an embodiment of the voiceprint model reconstruction device of the invention.
The acquisition module 10 is configured to acquire voice sample data and generate an initial voiceprint model based on the voice sample data, wherein the voice sample data comprises a plurality of sub-voice sample data;
the processing module 20 is configured to obtain a voiceprint feature vector of each sub-voice sample data based on the initial voiceprint model, perform clustering on the voice sample data based on a K-Means algorithm and each voiceprint feature vector, and divide the voice sample data into a preset number of sub-sample sets;
and a generating module 30 for generating a target voiceprint model based on the preset number of sub-sample sets.
Further, the processing module 20 is further configured to:
calculating the distance between each voiceprint characteristic vector and a preset clustering center based on the K-Means algorithm;
when a first sub-distance smaller than or equal to the first preset value exists in all the distances, taking sub-voice sample data corresponding to the first sub-distance as voice sample data in a first sub-sample set;
when a second sub-distance which is greater than the first preset value and less than or equal to the second preset value exists in all the distances, taking sub-voice sample data corresponding to the second sub-distance as voice sample data in a second sub-sample set;
and when a third sub-distance larger than the second preset value exists in all the distances, taking the sub-voice sample data corresponding to the third sub-distance as the voice sample data in a third sub-sample set.
Further, the processing module 20 is further configured to:
and calculating the average value of each voiceprint feature vector, and taking the average value as the clustering center.
Further, the generating module 30 is further configured to:
generating a first voiceprint model based on the first set of subsamples;
generating a target voiceprint model based on the first set of subsamples, the second set of subsamples, the third set of subsamples, and the first voiceprint model.
Further, the generating module 30 is further configured to:
generating a second voiceprint model based on the first set of subsamples, the second set of subsamples, and the first voiceprint model;
generating a target voiceprint model based on the first set of subsamples, the second set of subsamples, the third set of subsamples, and the second voiceprint model.
Further, the voiceprint model reconstruction device further includes:
an acquisition module, configured to acquire voice data to be authenticated based on a voiceprint authentication request when the voiceprint authentication request is received;
and the determining module is used for determining the voiceprint authentication result of the voice data to be authenticated based on the target voiceprint model.
Further, the voiceprint model reconstruction device further includes:
and the sending module is used for sending prompt information that the voiceprint authentication request passes to a preset terminal when the voiceprint authentication result is that the voiceprint authentication passes.
In addition, an embodiment of the present invention further provides a readable storage medium, where a voiceprint model reconstruction program is stored on the readable storage medium, and when being executed by a processor, the voiceprint model reconstruction program implements the steps of the voiceprint model reconstruction method in the foregoing embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better embodiment. Based on such understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a readable storage medium (such as ROM/RAM, magnetic disk, optical disk) as described above and includes instructions for enabling a system device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the present specification and drawings, or used directly or indirectly in other related fields, are included in the scope of the present invention.

Claims (10)

1. A voiceprint model reconstruction method is characterized by comprising the following steps:
acquiring voice sample data, and generating an initial voiceprint model based on the voice sample data, wherein the voice sample data comprises a plurality of sub-voice sample data;
acquiring a voiceprint characteristic vector of each sub-voice sample data based on the initial voiceprint model, clustering the voice sample data based on a K-Means algorithm and each voiceprint characteristic vector, and dividing the voice sample data into a preset number of sub-sample sets;
and generating a target voiceprint model based on the preset number of sub-sample sets.
2. The method of claim 1, wherein the preset number of sub-sample sets includes a first sub-sample set, a second sub-sample set, and a third sub-sample set, a first preset value is smaller than a second preset value, and the clustering of the voice sample data based on the K-Means algorithm and each voiceprint feature vector and the dividing of the voice sample data into the preset number of sub-sample sets include:
calculating the distance between each voiceprint characteristic vector and a preset clustering center based on the K-Means algorithm;
when a first sub-distance smaller than or equal to the first preset value exists in all the distances, taking sub-voice sample data corresponding to the first sub-distance as voice sample data in a first sub-sample set;
when a second sub-distance which is greater than the first preset value and less than or equal to the second preset value exists in all the distances, taking sub-voice sample data corresponding to the second sub-distance as voice sample data in a second sub-sample set;
and when a third sub-distance larger than the second preset value exists in all the distances, taking the sub-voice sample data corresponding to the third sub-distance as the voice sample data in a third sub-sample set.
3. The voiceprint model reconstruction method of claim 2 wherein the cluster center is calculated by:
and calculating the average value of each voiceprint feature vector, and taking the average value as the clustering center.
4. The voiceprint model reconstruction method of claim 2 wherein said step of generating a target voiceprint model based on said preset number of subsample sets comprises:
generating a first voiceprint model based on the first set of subsamples;
generating a target voiceprint model based on the first set of subsamples, the second set of subsamples, the third set of subsamples, and the first voiceprint model.
5. The method of voiceprint model reconstruction according to claim 4 wherein said step of generating a target voiceprint model based on said first set of subsamples, said second set of subsamples, said third set of subsamples and said first voiceprint model comprises:
generating a second voiceprint model based on the first set of subsamples, the second set of subsamples, and the first voiceprint model;
generating a target voiceprint model based on the first set of subsamples, the second set of subsamples, the third set of subsamples, and the second voiceprint model.
6. The voiceprint model reconstruction method according to any one of claims 1 to 5, wherein said step of generating a target voiceprint model based on said preset number of sets of subsamples is followed by further comprising:
when a voiceprint authentication request is received, acquiring voice data to be authenticated based on the voiceprint authentication request;
and determining a voiceprint authentication result of the voice data to be authenticated based on the target voiceprint model.
7. The voiceprint model reconstruction method according to claim 6, wherein after the step of determining the voiceprint authentication result of the voice data to be authenticated based on the target voiceprint model, further comprising:
and when the voiceprint authentication result is that the voiceprint authentication is passed, sending prompt information that the voiceprint authentication request is passed to a preset terminal.
8. A voiceprint model reconstruction apparatus, comprising:
the acquisition module acquires voice sample data and generates an initial voiceprint model based on the voice sample data, wherein the voice sample data comprises a plurality of sub-voice sample data;
the processing module is used for acquiring the voiceprint characteristic vector of each sub-voice sample data based on the initial voiceprint model, clustering the voice sample data based on a K-Means algorithm and each voiceprint characteristic vector, and dividing the voice sample data into a preset number of sub-sample sets;
and the generating module is used for generating a target voiceprint model based on the sub-sample sets with the preset number.
9. A terminal, characterized in that the terminal comprises: a memory, a processor and a voiceprint model reconstruction program stored on the memory and executable on the processor, the voiceprint model reconstruction program when executed by the processor implementing the steps of the voiceprint model reconstruction method of any one of claims 1 to 7.
10. A readable storage medium, having stored thereon the voiceprint model reconstruction program which, when executed by a processor, implements the steps of the voiceprint model reconstruction method according to any one of claims 1 to 7.
CN201910775992.8A 2019-08-21 2019-08-21 Voiceprint model reconstruction method, terminal, device and readable storage medium Pending CN110648671A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910775992.8A CN110648671A (en) 2019-08-21 2019-08-21 Voiceprint model reconstruction method, terminal, device and readable storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910775992.8A CN110648671A (en) 2019-08-21 2019-08-21 Voiceprint model reconstruction method, terminal, device and readable storage medium

Publications (1)

Publication Number Publication Date
CN110648671A 2020-01-03

Family

ID=68990284

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910775992.8A Pending CN110648671A (en) 2019-08-21 2019-08-21 Voiceprint model reconstruction method, terminal, device and readable storage medium

Country Status (1)

Country Link
CN (1) CN110648671A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180211670A1 (en) * 2015-01-26 2018-07-26 Verint Systems Ltd. Acoustic signature building for a speaker from multiple sessions
CN106782564A (en) * 2016-11-18 2017-05-31 百度在线网络技术(北京)有限公司 Method and apparatus for processing speech data
CN109145148A (en) * 2017-06-28 2019-01-04 百度在线网络技术(北京)有限公司 Information processing method and device
CN108460081A (en) * 2018-01-12 2018-08-28 平安科技(深圳)有限公司 Voice data base establishing method, voiceprint registration method, apparatus, equipment and medium
CN109378003A (en) * 2018-11-02 2019-02-22 科大讯飞股份有限公司 A kind of method and system of sound-groove model training

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111063360A (en) * 2020-01-21 2020-04-24 北京爱数智慧科技有限公司 Voiceprint library generation method and device
CN111063360B (en) * 2020-01-21 2022-08-19 北京爱数智慧科技有限公司 Voiceprint library generation method and device
CN111415669A (en) * 2020-04-15 2020-07-14 厦门快商通科技股份有限公司 Voiceprint model construction method, device and equipment
CN111785283A (en) * 2020-05-18 2020-10-16 北京三快在线科技有限公司 Voiceprint recognition model training method and device, electronic equipment and storage medium
CN111833851A (en) * 2020-06-16 2020-10-27 杭州云嘉云计算有限公司 Method for automatically learning and optimizing acoustic model
CN111833851B (en) * 2020-06-16 2021-03-16 杭州云嘉云计算有限公司 Method for automatically learning and optimizing acoustic model
CN112530409A (en) * 2020-12-01 2021-03-19 平安科技(深圳)有限公司 Voice sample screening method and device based on geometry and computer equipment
CN112530409B (en) * 2020-12-01 2024-01-23 平安科技(深圳)有限公司 Speech sample screening method and device based on geometry and computer equipment
CN113409795A (en) * 2021-08-19 2021-09-17 北京世纪好未来教育科技有限公司 Training method, voiceprint recognition method and device and electronic equipment

Similar Documents

Publication Publication Date Title
CN106683680B (en) Speaker recognition method and device, computer equipment and computer readable medium
CN110648671A (en) Voiceprint model reconstruction method, terminal, device and readable storage medium
EP3477519B1 (en) Identity authentication method, terminal device, and computer-readable storage medium
JP6429945B2 (en) Method and apparatus for processing audio data
US11875799B2 (en) Method and device for fusing voiceprint features, voice recognition method and system, and storage medium
CN112435684B (en) Voice separation method and device, computer equipment and storage medium
CN111079791A (en) Face recognition method, face recognition device and computer-readable storage medium
CN112233698B (en) Character emotion recognition method, device, terminal equipment and storage medium
CN112989108B (en) Language detection method and device based on artificial intelligence and electronic equipment
CN112071322A (en) End-to-end voiceprint recognition method, device, storage medium and equipment
CN109448732B (en) Digital string voice processing method and device
CN111179940A (en) Voice recognition method and device and computing equipment
CN113223536A (en) Voiceprint recognition method and device and terminal equipment
CN111862945A (en) Voice recognition method and device, electronic equipment and storage medium
CN113327620A (en) Voiceprint recognition method and device
CN111243604B (en) Training method for speaker recognition neural network model supporting multiple awakening words, speaker recognition method and system
CN111613230A (en) Voiceprint verification method, voiceprint verification device, voiceprint verification equipment and storage medium
CN116895273B (en) Output method and device for synthesized audio, storage medium and electronic device
CN110827834B (en) Voiceprint registration method, system and computer readable storage medium
CN113053395A (en) Pronunciation error correction learning method and device, storage medium and electronic equipment
CN116631380A (en) Method and device for waking up audio and video multi-mode keywords
CN116486789A (en) Speech recognition model generation method, speech recognition method, device and equipment
CN114220177A (en) Lip syllable recognition method, device, equipment and medium
CN113870896A (en) Motion sound false judgment method and device based on time-frequency graph and convolutional neural network
CN113823294B (en) Cross-channel voiceprint recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20200103)