WO2017197953A1

WO2017197953A1 - Voiceprint-based identity recognition method and device

Info

Publication number: WO2017197953A1
Application number: PCT/CN2017/075346
Authority: WO
Inventors: 彭丹丹
Original assignee: 腾讯科技（深圳）有限公司
Priority date: 2016-05-16
Filing date: 2017-03-01
Publication date: 2017-11-23
Also published as: CN107395352A; CN107395352B

Abstract

An embodiment of the present invention discloses a voiceprint-based identity recognition method. The method comprises: acquiring voice data transmitted by a user account as a sender in an instant messaging application; training a voiceprint recognition model according to the acquired voice data, and creating a voiceprint feature library corresponding to the user account; receiving an initiated identity verification request, and acquiring an input target user account and target voice data; finding a voiceprint feature library matching the target user account, and, if the target voice data matches the found voiceprint feature library, confirming verification of the identity of the target user account. In addition, an embodiment of the present invention also correspondingly discloses a voiceprint-based identity recognition device. The present invention can improve operational convenience in the recording of sample voiceprints of users.

Description

Voiceprint based identification method and device

Cross-reference to related applications

The present application claims priority to the Chinese patent application filed with the Chinese Patent Office on May 16, 2016, the entire contents of which are incorporated herein by reference.

Technical field

The present invention relates to the field of computer technologies, and in particular, to a voiceprint based identification method and apparatus.

Background technique

Voiceprint recognition, also known as speaker recognition, extracts features from a voice signal that characterize the identity of the speaker, such as the fundamental frequency, which reflects how fast the glottis opens and closes, and spectral features, which reflect the size and shape of the oral cavity and the length of the vocal tract, and uses them to identify the speaker. It can be widely used in information security, telephone banking, smart access control, and value-added entertainment services. The security provided by voiceprint recognition is comparable to that of other biometric technologies (fingerprint, palm shape, and iris), yet it needs only a telephone or a microphone and no special equipment, so data collection is convenient and inexpensive. This makes it an economical, reliable, simple, and secure means of identification: the speaker's voice can be input at any time and the speaker identified from a unique voiceprint. Voiceprint recognition is particularly prominent over telephone channels and is the only contactless biometric technology that can be used for remote control.

However, to improve the confidence of the voiceprint features used as samples, that is, to improve the accuracy of voiceprint recognition, the user is generally required to read a large amount of text when recording a sample voiceprint so that relatively complete voiceprint features can be extracted. As a result, recording the sample voiceprint takes the user a long time, and the operation is not convenient enough.

Summary of the invention

In view of this, an embodiment of the present invention proposes a voiceprint-based identification method to solve the problem in the conventional technology that operation is inconvenient because the user needs to read a large amount of text when recording a sample voiceprint so that relatively complete voiceprint features can be extracted.

A voiceprint-based identification method, comprising:

collecting voice data transmitted by a user account as a sender in an instant messaging application;

performing voiceprint recognition model training according to the collected voice data, and creating a voiceprint feature library corresponding to the user account;

receiving an initiated identity verification request, and obtaining an input target user account and target voice data; and

searching for the voiceprint feature library that matches the target user account, and determining that identity verification of the target user account passes when the target voice data matches the found voiceprint feature library.

In addition, to solve the technical problem in the conventional technology that operation is inconvenient because the user needs to read a large amount of text when recording a sample voiceprint, an embodiment of the present invention also proposes a voiceprint-based identification device.

A voiceprint based identification device comprising:

a voice data collecting module, configured to collect voice data transmitted by a user account as a sender in an instant messaging application;

a voiceprint feature library creating module, configured to perform a voiceprint recognition model training according to the collected voice data, and create a voiceprint feature library corresponding to the user account;

a target information obtaining module, configured to receive the initiated identity verification request, and obtain the input target user account and the target voice data;

a voiceprint matching module, configured to search for the voiceprint feature library that matches the target user account, and determine that identity verification of the target user account passes when the target voice data matches the found voiceprint feature library.

With the voiceprint-based identification method and device described above, the user does not need to read a large amount of training text to record voiceprint features and build a voiceprint feature library. Instead, the terminal or server can collect the voice data in the instant messaging messages sent by the user and use it as training samples for the user's voiceprint features, which saves the time the user would spend recording voiceprint features and makes the operation more convenient.

Description of the drawings

To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a schematic flow chart of a voiceprint-based identification method in one embodiment;

FIG. 2 is a schematic diagram of an instant messaging application interface for sending a voice clip in one embodiment;

FIG. 3 is a schematic diagram of an interface that provides verification by reading a random code aloud in one embodiment;

FIG. 4 is a schematic structural diagram of a voiceprint-based identification device in one embodiment;

FIG. 5 is a schematic structural diagram of a computer device that runs the aforementioned voiceprint-based identification method in one embodiment.

Detailed description

The technical solutions in the embodiments of the present invention are clearly and completely described in the following with reference to the accompanying drawings in the embodiments of the present invention. It is obvious that the described embodiments are only a part of the embodiments of the present invention, but not all embodiments. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative efforts are within the scope of the present invention.

To solve the technical problem in the conventional technology that operation is inconvenient because the user needs to read a large amount of text when recording a sample voiceprint so that relatively complete voiceprint features can be extracted, an embodiment of the present invention provides a voiceprint-based identification method. The method may be implemented by a computer program running on a computer system based on the von Neumann architecture. The computer program may be a client program or a server program of an instant messaging application, or of a social networking application with an instant messaging function. The computer system executing the program may be a terminal device that runs the client program, or a server device that runs the server program.

Specifically, as shown in FIG. 1, the voiceprint-based identification method includes:

Step S102: Collect voice data transmitted by the user account as the sender in the instant messaging application.

Instant messaging applications such as WeChat and QQ provide a function for sending voice clips. As shown in FIG. 2, the user can record a piece of voice data through the microphone of the mobile phone by long-pressing a virtual button, and the voice data is sent to the receiving user after the virtual button is released.

To use the instant messaging application, a user first needs to log in to a user account. In this embodiment, the terminal collects only the voice data sent by the logged-in user account and does not collect the voice data that account receives. When the instant messaging application captures voice data input by the user through the mobile phone microphone, the data is usually cached at a preset storage address. Once a complete voice input has been captured (that is, when the user releases the virtual button, one capture is complete and the corresponding voice data file is generated), it is sent to the server or to another terminal. When the terminal performs the voiceprint-based identification method, it can obtain the voice data from the cached storage address.
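
The sender-only collection rule can be sketched as follows. This is a minimal illustration rather than the implementation of this embodiment: the names `VoiceMessage`, `SenderVoiceCollector`, and `on_voice_message` are hypothetical, and an in-memory dictionary stands in for the cached storage address.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class VoiceMessage:
    sender: str    # account that sent the clip
    receiver: str  # account that receives the clip
    pcm: bytes     # one complete recorded clip


class SenderVoiceCollector:
    """Keeps a clip only when the logged-in account is its sender."""

    def __init__(self, logged_in_account: str):
        self.logged_in_account = logged_in_account
        self.samples = defaultdict(list)  # account -> collected clips

    def on_voice_message(self, msg: VoiceMessage) -> bool:
        # Clips received from other accounts carry other speakers' voices.
        if msg.sender != self.logged_in_account:
            return False
        self.samples[msg.sender].append(msg.pcm)
        return True


collector = SenderVoiceCollector("account_a")
sent = collector.on_voice_message(VoiceMessage("account_a", "account_b", b"\x01\x02"))
received = collector.on_voice_message(VoiceMessage("account_b", "account_a", b"\x03\x04"))
```

Only the clip sent by `account_a` is stored; the clip it receives from `account_b` is ignored, so the samples for an account contain only that account holder's own voice.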

Step S104: Perform voiceprint recognition model training according to the collected voice data, and create a voiceprint feature library corresponding to the user account (i.e., a database containing one or more voiceprint features). Many algorithms can be used for voiceprint recognition modeling, such as Dynamic Time Warping (DTW), Artificial Neural Networks (ANN), Hidden Markov Models (HMM), and Gaussian Mixture Models (GMM). Because a GMM fits the distribution of speech acoustic features well, the GMM method has become a mainstream method in speech recognition systems. For the sake of recognition accuracy and efficiency, this description uses the GMM as an example modeling method.

For example, in one specific implementation of the voiceprint recognition model training, the input voice data sequence (a Pulse Code Modulation (PCM) stream) is first preprocessed to remove non-speech and silent segments, and the speech signal is divided into frames for subsequent processing. The Mel-Frequency Cepstral Coefficient (MFCC) parameters of each frame of the speech signal are then extracted and saved. Finally, the extracted MFCC parameters are used to train a GMM for the user (i.e., the speaker), yielding a GMM voiceprint model specific to that user.
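
The preprocessing stage (framing and silence removal) can be sketched as follows. The sketch makes several assumptions that are not stated in this description: PCM samples are given as a list of integers, a frame is 400 samples with a hop of 160 samples (25 ms and 10 ms at an assumed 16 kHz sampling rate), and silence is detected with a simple mean-energy threshold. MFCC extraction and GMM training are not shown.

```python
def frame_signal(samples, frame_len=400, hop=160):
    """Split samples into overlapping frames; a trailing remainder is dropped."""
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]


def frame_energy(frame):
    """Mean squared amplitude of one frame."""
    return sum(s * s for s in frame) / len(frame)


def drop_silence(frames, energy_threshold):
    """Keep only frames whose mean energy reaches the threshold."""
    return [f for f in frames if frame_energy(f) >= energy_threshold]


# Toy PCM stream: 800 silent samples followed by 800 loud samples.
pcm = [0] * 800 + [1000, -1000] * 400
frames = frame_signal(pcm)
voiced = drop_silence(frames, energy_threshold=1000.0)
```

With this toy input, the frames that lie entirely inside the silent half are discarded, and the remaining frames would be passed on to feature extraction.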

Because users use instant messaging applications such as WeChat and QQ frequently, they also send voice clips often. Therefore, when step S102 is performed multiple times, multiple voice clips (that is, multiple pieces of voice data) corresponding to the same logged-in user account are collected. The collected pieces of voice data can be used as samples and input into the voiceprint recognition model for machine learning.

For example, feature values of each piece of collected voice data, such as spectrum, cepstrum, formants, pitch, reflection coefficients, rhythm, speaking rate, intonation, and volume, can be extracted and then used to train an existing voiceprint recognition model, so as to obtain the voiceprint feature library corresponding to the logged-in user account.

Step S106: Receive an initiated identity verification request, and obtain the input target user account and target voice data.

Step S108: Search for the voiceprint feature library that matches the target user account, and determine that the identity verification of the target user account passes when the target voice data matches the found voiceprint feature library.

After the voiceprint feature library is created, user identity can be verified against it (when little voice feature data has been collected, or the voiceprint feature library has not been created, the user can be prompted to switch to another verification method). When logging in on the terminal, the user can select voiceprint verification, enter the target user account, and speak a piece of speech (the target voice data) into the microphone. The terminal first searches for the voiceprint feature library corresponding to the entered target user account and then matches the target voice data against that library. If the match succeeds, the identity verification of the target user account is determined to pass. Still taking the GMM as an example, a matching function between the input speech and the GMM voiceprint model (defined as needed) can be provided in this step to determine whether the input target voice data matches the voiceprint (i.e., the model). In a specific implementation, the matching can be performed using the maximum a posteriori (MAP) criterion.
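
One way to read the MAP-based matching is sketched below. It is an illustrative sketch, not the matching function of this embodiment: it assumes diagonal-covariance GMMs, a hypothetical background model that represents other speakers, equal prior probabilities, and toy two-dimensional feature vectors in place of MFCC parameters. All function names are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Total log-likelihood of feature frames under a GMM.

    gmm is a list of (weight, mean, variance) components.
    """
    total = 0.0
    for x in frames:
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)  # log-sum-exp keeps the sum numerically stable
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total


def map_accepts(frames, speaker_gmm, background_gmm, prior_speaker=0.5):
    """Accept when the claimed-speaker hypothesis has the higher posterior."""
    log_post_speaker = math.log(prior_speaker) + gmm_log_likelihood(frames, speaker_gmm)
    log_post_other = math.log(1 - prior_speaker) + gmm_log_likelihood(frames, background_gmm)
    return log_post_speaker > log_post_other


# Toy models: the account holder's voiceprint and an assumed background model.
speaker = [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [2.0, 2.0], [1.0, 1.0])]
background = [(1.0, [8.0, 8.0], [4.0, 4.0])]
genuine = [[0.1, -0.2], [1.9, 2.1]]
impostor = [[8.2, 7.9], [7.5, 8.4]]
```

Because both posteriors share the same normalizing term, comparing prior times likelihood is sufficient for the decision. In this toy setup `map_accepts(genuine, speaker, background)` accepts and `map_accepts(impostor, speaker, background)` rejects.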

It should be noted that the above method may also be performed by the server of an instant messaging application or of a social application with an instant messaging function:

In an application scenario in which the above method is performed by the server, voice clips transmitted between terminals are forwarded through the server, and audio data is not transmitted directly between terminals. When forwarding voice data between terminals, the server can collect the voice data sent by the terminal on which the sender's user account is logged in, and establish a mapping between the collected voice data and the sender's user account.

For example, the voice data that user account A sends to friends after logging in on a terminal must be forwarded through the server. The server can therefore collect the voice data sent by user account A and generate a voiceprint feature library corresponding to user account A. When the user later logs in with user account A on another terminal, the user inputs target voice data through that terminal, and the terminal uploads it to the server. The server searches for the voiceprint feature library corresponding to user account A and determines whether the uploaded target voice data matches it; if so, user account A completes the login on the server.

In addition, the voiceprint-based identification method is not limited to login scenarios and may also be used for password recovery or appeals for a user account. For example, in one application scenario, the QQ and WeChat accounts of a user are associated with each other. When using the QQ password recovery function, the user can choose verification through the WeChat account as the account verification method. The server then finds the WeChat account associated with the QQ number whose password is to be recovered, searches for the voiceprint feature library corresponding to that WeChat account, and receives the target voice data that the user inputs through the microphone for identity verification. If the match succeeds, the server determines that the verification passes and prompts the user to reset the QQ password, or sends the password to a pre-bound mailbox.
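
The lookup chain in the password recovery scenario can be sketched as follows. Everything here is hypothetical: the account identifiers, the two dictionaries standing in for the server's association table and trained voiceprint libraries, and the `toy_matcher` that replaces real voiceprint matching with a string comparison.

```python
# Hypothetical server-side tables (stand-ins for real storage).
ASSOCIATED_ACCOUNTS = {"qq:10001": "wechat:alice"}
VOICEPRINT_LIBRARIES = {"wechat:alice": "alice-voiceprint"}


def toy_matcher(voice_data, library):
    """Placeholder for voiceprint matching; compares labels only."""
    return voice_data == library


def verify_for_recovery(target_account, voice_data, matcher=toy_matcher):
    """Verify a recovery request through the associated account's voiceprint."""
    associated = ASSOCIATED_ACCOUNTS.get(target_account)
    if associated is None:
        return False  # no associated account to verify against
    library = VOICEPRINT_LIBRARIES.get(associated)
    if library is None:
        return False  # no voiceprint library has been created yet
    return matcher(voice_data, library)
```

When either lookup fails, the sketch simply rejects; in line with the description above, a real system could instead prompt the user to switch to another verification method.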

Further, in one application scenario, after receiving the initiated identity verification request, the server may also generate target text content, display it to the user on the terminal, and prompt the user to read it aloud. The target voice data corresponding to the displayed target text content is then received, that is, the voice data input when the user reads aloud the target text content displayed on the terminal.

In this embodiment, when determining whether the identity verification of the target user account passes, the target voice data may also be converted into text data by speech recognition, and the identity verification of the target user account is determined to pass when the text data matches the target text content.

As shown in FIG. 3, when the user performs identity verification, the terminal also displays a string of text content, "85274196", generated by the terminal or the server, and prompts the user to read the digits aloud. The target voice data produced when the user reads these digits is uploaded to the server. The server not only extracts feature vectors of the target voice data, such as spectrum, cepstrum, formants, pitch, reflection coefficients, rhythm, speaking rate, intonation, or volume, but also performs speech recognition on the voice data to identify its semantic content. The user's identity verification passes only if the voiceprint matches and the recognized content is also "85274196", or is recognized as the pinyin string corresponding to "85274196".

Combining voiceprint verification with semantic verification in this way prevents criminals from authenticating with recordings of other users. For example, if only the voiceprint were used for identity verification, user B, holding a recording of user A, could log in with user A's account and input the target voice data by playing the recording, thereby passing verification, logging in to the system as user A, and stealing the user's private data. With the combined verification described above, even if user B holds a recording of user A, the text that user B is prompted to read can be randomly generated, so playing the recording can pass only the voiceprint verification and not the semantic verification, which improves the security of identity verification.
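
The combined check can be sketched as follows. The sketch assumes an eight-digit random challenge and takes the speech recognition output and the voiceprint decision as inputs; neither recognizer is implemented here, and the function names are hypothetical.

```python
import secrets


def generate_challenge(length=8):
    """Random digit string that the user is asked to read aloud."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def verify(challenge, recognized_text, voiceprint_matches):
    """Pass only when the voiceprint matches AND the spoken digits equal the challenge."""
    spoken = "".join(ch for ch in recognized_text if ch.isdigit())
    return voiceprint_matches and spoken == challenge


challenge = generate_challenge()

# Genuine user reads the displayed code.
genuine_ok = verify("85274196", "8527 4196", voiceprint_matches=True)
# Replayed recording: the voiceprint matches, but the old recording says other digits.
replay_ok = verify("85274196", "13572468", voiceprint_matches=True)
```

A replayed recording fails the text comparison even though its voiceprint matches, which is the property the combined verification relies on.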

In this embodiment, to save computing resources, after the voiceprint feature library corresponding to the user account is created, it may also be determined whether the confidence of the created voiceprint feature library is greater than or equal to a threshold; if so, collection of the voice data transmitted by the user account as the sender in the instant messaging application is stopped.

For example, suppose the server has collected 100 voice data samples and generated a voiceprint feature library. When the 101st sample is collected, it can be matched against the created voiceprint feature library, and the probability of a successful match is taken as the confidence of the library. A high confidence means that the voiceprint feature library can already identify the voiceprint fairly accurately, so collection of voice data samples can be stopped, which saves computing resources.
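
The stop condition can be sketched as follows. The description does not say how the match probability is estimated, so this sketch assumes it is the fraction of newly collected samples that match the existing library, evaluated only after a minimum number of trials. The class name, the threshold of 0.8, and the minimum of 10 trials are all hypothetical.

```python
class CollectionController:
    """Stops sample collection once the library's confidence is high enough."""

    def __init__(self, threshold=0.8, min_trials=10):
        self.threshold = threshold
        self.min_trials = min_trials
        self.trials = 0
        self.hits = 0
        self.collecting = True

    @property
    def confidence(self):
        # Assumed estimate: share of new samples that matched the library.
        return self.hits / self.trials if self.trials else 0.0

    def record_trial(self, matched):
        """Record whether a newly collected sample matched the existing library."""
        if not self.collecting:
            return
        self.trials += 1
        self.hits += int(matched)
        if self.trials >= self.min_trials and self.confidence >= self.threshold:
            self.collecting = False


controller = CollectionController(threshold=0.8, min_trials=10)
for matched in [True] * 9 + [False]:
    controller.record_trial(matched)
```

After nine matches in ten trials the estimated confidence exceeds the threshold, so `controller.collecting` becomes `False` and later samples are ignored.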

In this embodiment, obtaining the input target user account and target voice data includes receiving the input target voice data at least once. Before the identity verification of the target user account is determined to pass, the method further includes: determining the number or proportion of matches between the target voice data received at least once and the found voiceprint feature library, and determining that the target voice data matches the found voiceprint feature library when the number or proportion of matches is greater than or equal to a threshold.

Because voiceprint feature matching may be inaccurate when there are few samples, identity can be verified through multiple matches. When most, or a large proportion, of the target voice data that the user inputs during identity verification is matched successfully, the identity verification is determined to pass, which improves its accuracy.
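
The count and proportion rules can be sketched as follows, assuming each attempt has already been reduced to a boolean match result. The function names and the default thresholds are hypothetical.

```python
def passes_by_count(match_results, min_matches=2):
    """Pass when the number of matching attempts reaches the threshold."""
    return sum(match_results) >= min_matches


def passes_by_proportion(match_results, min_proportion=0.6):
    """Pass when the share of matching attempts reaches the threshold."""
    if not match_results:
        return False  # no attempts received, nothing to verify
    return sum(match_results) / len(match_results) >= min_proportion


attempts = [True, True, False]  # three inputs, two of which matched
```

With two of three attempts matching, both rules pass at the default thresholds; a single match out of three would fail both.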

In one embodiment, after the voiceprint feature library matching the target user account is found, the target user account may be locked when the number of consecutive mismatches between the target voice data and the found voiceprint feature library is greater than or equal to a threshold.

That is, if the user's voice fails verification several times in a row, the account the user is trying to log in to can be locked so that further login attempts are refused until the account is unlocked through another verification method. Alternatively, the target user account can be locked for a certain period; when the lock period expires, the account is unlocked and login is allowed again. This prevents criminals from repeatedly attempting verification with imitated voices and improves security.
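
The timed lock variant can be sketched as follows. The class name, the limit of three consecutive failures, and the 600-second lock period are hypothetical, and the current time is passed in explicitly as a number of seconds so that the behavior is easy to follow.

```python
class AccountLock:
    """Locks an account after consecutive voiceprint mismatches."""

    def __init__(self, max_failures=3, lock_seconds=600):
        self.max_failures = max_failures
        self.lock_seconds = lock_seconds
        self.failures = 0
        self.locked_until = None

    def is_locked(self, now):
        return self.locked_until is not None and now < self.locked_until

    def record_attempt(self, matched, now):
        """Return True when this attempt logs the user in."""
        if self.is_locked(now):
            return False  # attempts are refused while the lock is active
        if self.locked_until is not None:
            # The lock period has expired: unlock and start counting afresh.
            self.locked_until = None
            self.failures = 0
        if matched:
            self.failures = 0
            return True
        self.failures += 1
        if self.failures >= self.max_failures:
            self.locked_until = now + self.lock_seconds
        return False


lock = AccountLock(max_failures=3, lock_seconds=600)
for t in (0, 1, 2):
    lock.record_attempt(matched=False, now=t)
```

After the third consecutive mismatch the account is locked until time 602, so even a matching voice is refused during that period; once the period has expired, a matching attempt succeeds again.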

In addition, to solve the technical problem in the conventional technology that operation is inconvenient because the user needs to read a large amount of text when recording a sample voiceprint so that relatively complete voiceprint features can be extracted, one embodiment also proposes a voiceprint-based identification device. As shown in FIG. 4, the voiceprint-based identification device includes a voice data collecting module 102, a voiceprint feature library creating module 104, a target information obtaining module 106, and a voiceprint matching module 108, wherein:

The voice data collecting module 102 is configured to collect voice data transmitted by the user account as the sender in the instant messaging application;

The voiceprint feature library creating module 104 is configured to perform voiceprint recognition model training according to the collected voice data, and create a voiceprint feature library corresponding to the user account;

The target information obtaining module 106 is configured to receive the initiated identity verification request, and obtain the input target user account and the target voice data.

The voiceprint matching module 108 is configured to search for the voiceprint feature library that matches the target user account, and determine that the identity verification of the target user account passes when the target voice data matches the found voiceprint feature library.

Optionally, in one embodiment, as shown in FIG. 4, the target information obtaining module 106 is further configured to generate and display target text content, obtain the input target user account, and receive the input target voice data corresponding to the displayed target text content.

Optionally, in one embodiment, the voiceprint matching module 108 is further configured to convert the target voice data into text data by speech recognition, and determine that the identity verification of the target user account passes when the text data matches the target text content.

Optionally, in one embodiment, as shown in FIG. 4, the foregoing device further includes a voice data collection stop module 110, configured to determine whether the confidence of the created voiceprint feature library corresponding to the user account is greater than or equal to a threshold, and if so, stop collecting the voice data transmitted by the user account as the sender in the instant messaging application.

Optionally, in one embodiment, the target information obtaining module 106 is further configured to receive the input target voice data at least once; the voiceprint matching module 108 is further configured to determine the number or proportion of matches between the target voice data received at least once and the found voiceprint feature library, and determine that the target voice data matches the found voiceprint feature library when the number or proportion of matches is greater than or equal to a threshold.

Optionally, in one embodiment, as shown in FIG. 4, the foregoing device further includes a target user account locking module 112, configured to lock the target user account when the target voice data does not match the found voiceprint feature library.

In one embodiment, FIG. 5 illustrates a terminal 10, a von Neumann system-based computer system that runs the voiceprint-based identification method described above. The computer system can be a terminal device such as a smartphone, a tablet computer, a palmtop computer, a notebook computer, or a personal computer. Specifically, the terminal 10 may include an external input interface 1001, a processor 1002, a memory 1003, and an output interface 1004 connected through a system bus. The external input interface 1001 can optionally include at least a network interface 10012. The memory 1003 may include an external memory 10032 (e.g., a hard disk, an optical disk, or a floppy disk) and an internal memory 10034. The output interface 1004 can include at least a device such as a display 10042.

The processor 1002 (or Central Processing Unit (CPU)) is the computing core and control core of the terminal 10; it can parse the various instructions in the terminal 10 and process the various types of data of the terminal 10. The memory 1003 is a storage device in the terminal 10 for storing programs and data, and may include, but is not limited to, ROM, RAM, CD-ROM, and other removable memories. The memory 1003 provides storage space that can hold the operating system of the terminal 10 as well as program code, function modules, and the like. The operating system can include, but is not limited to, a Windows system, an Android system, and the like.

The method according to an embodiment of the present invention may be carried out by a computer program whose program files are stored in the external memory 10032 of the aforementioned von Neumann system-based computer system 10. At runtime the program is loaded into the internal memory 10034, compiled into machine code, and passed to the processor 1002 for execution, so that the logical voice data collecting module 102, voiceprint feature library creating module 104, target information obtaining module 106, voiceprint matching module 108, voice data collection stop module 110, and target user account locking module 112 are formed in the von Neumann system-based computer system 10. During execution of the voiceprint-based identification method, input parameters are received through the external input interface 1001, transferred to a buffer in the memory 1003, and then input to the processor 1002 for processing; the resulting data is either cached in the memory 1003 for subsequent processing or passed to the output interface 1004 for output.

A person skilled in the art can understand that all or part of the processes of the above method embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer readable storage medium and, when run on a data processing device, causes the data processing device to perform the processes of the foregoing method embodiments. For details, refer to the description of the embodiments in conjunction with the accompanying drawings; they are not repeated here.

The storage medium mentioned in the text may be a magnetic disk, an optical disk, a read-only memory (ROM), or a random access memory (RAM).

In addition, the above computer readable storage medium may also be various types of recording media that the computer device can access through a network or a communication link, for example, a recording medium that can extract data therein through a router, the Internet, a local area network, or the like. Furthermore, the computer readable storage medium described above may also be a plurality of computer readable storage media located in the same computer system, or a computer readable storage medium distributed across a plurality of computer systems or computing devices.

The above is only the preferred embodiment of the present invention, and the scope of the present invention is not limited thereto, and thus equivalent changes made in the claims of the present invention are still within the scope of the present invention.

Claims

A voiceprint-based identification method, comprising:

Collecting voice data transmitted by the user account of the sender in the instant messaging application;

Performing a voiceprint recognition model training according to the collected voice data, and creating a voiceprint feature library corresponding to the user account;

Receiving the initiated authentication request, obtaining the input target user account and the target voice data;

And searching for a voiceprint feature database that matches the target user account, and determining that the identity verification of the target user account passes when the target voice data matches the found voiceprint feature database.
A voiceprint based identification method according to claim 1, wherein

After receiving the initiated authentication request, the method further includes:

Generate target text content and display it;

The obtaining the input target user account and the target voice data includes:

Obtaining the input target user account, and receiving target voice data input corresponding to the target text content of the presentation.
The voiceprint-based identification method according to claim 2, wherein the determining the identity verification of the target user account further comprises:

Converting the target voice data into text data by voice recognition;

When the text data matches the target text content, it is determined that the identity verification of the target user account passes.
The voiceprint-based identification method according to claim 1, wherein the creating a voiceprint feature library corresponding to the user account further comprises:

Determining whether the confidence level of the created voiceprint feature database corresponding to the user account is greater than or equal to a threshold, and if so, stopping collecting voice data transmitted by the user account as the sender in the instant messaging application.
A voiceprint based identification method according to claim 1, wherein

The obtaining the input target user account and the target voice data includes:

Receiving at least one input target voice data;

Before determining the identity verification of the target user account, the method further includes:

Determining a number of matches/proportion of the at least one received target voice data and the found voiceprint feature library, and determining the target voice data and the finding when the number of matches/proportion is greater than or equal to a threshold The soundprint feature library is matched.
The voiceprint-based identification method according to claim 1, wherein the searching for the voiceprint feature library matching the target user account further comprises:

The target user account is locked when the target voice data does not match the found voiceprint feature library.
The voiceprint-based identification method according to claim 1, further comprising:

When account verification is performed on the target user account, setting the verification mode to verification by means of a first user account, wherein the target user account and the first user account are associated accounts of the same user;

Finding the first user account, and searching for a first voiceprint feature library corresponding to the first user account;

Receiving first voice data input by the user for identity verification; and

Matching the first voice data with the first voiceprint feature library, and determining that the account verification of the target user account passes when the matching is successful.
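The associated-account flow above can be sketched as follows. It assumes that account associations and voiceprint feature libraries are held in simple dictionaries and that the voiceprint comparison is supplied as a callable; these data structures and all names are hypothetical and serve only to show the order of the steps.

```python
from typing import Callable, Dict


def verify_via_associated_account(
        target_account: str,
        associations: Dict[str, str],      # target account -> associated first account
        libraries: Dict[str, object],      # account -> voiceprint feature library
        first_voice_data: object,
        matcher: Callable[[object, object], bool]) -> bool:
    """Verify the target account using the voiceprint feature library of
    an associated account belonging to the same user."""
    first_account = associations.get(target_account)
    if first_account is None:
        return False                       # no associated account to verify with
    first_library = libraries.get(first_account)
    if first_library is None:
        return False                       # associated account has no library yet
    return matcher(first_voice_data, first_library)
```

This arrangement lets an account with no voice history of its own rely on the library built from the user's instant messaging account.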
A voiceprint-based identification device, comprising:

a voice data collecting module, configured to collect voice data transmitted by a user account as a sender in an instant messaging application;

a voiceprint feature library creating module, configured to perform a voiceprint recognition model training according to the collected voice data, and create a voiceprint feature library corresponding to the user account;

a target information obtaining module, configured to receive the initiated identity verification request, and obtain the input target user account and the target voice data;

a voiceprint matching module, configured to search for a voiceprint feature library that matches the target user account, and determine that the identity verification of the target user account passes when the target voice data matches the found voiceprint feature library.
The voiceprint-based identification device according to claim 8, wherein the target information obtaining module is further configured to generate target text content and display it, to obtain the input target user account, and to receive target voice data input in correspondence with the displayed target text content.
The voiceprint-based identification device according to claim 9, wherein the voiceprint matching module is further configured to convert the target voice data into text data by voice recognition, and to determine that the identity verification of the target user account passes when the text data matches the target text content.
The voiceprint-based identification device according to claim 8, wherein the device further comprises a voice data collection stop module, configured to determine whether the confidence level of the created voiceprint feature library corresponding to the user account is greater than or equal to a threshold, and if so, to stop collecting voice data transmitted by the user account as the sender in the instant messaging application.
The voiceprint-based identification device according to claim 8, wherein the target information obtaining module is further configured to receive at least one input target voice data;

The voiceprint matching module is further configured to determine a number or proportion of matches between the at least one received target voice data and the found voiceprint feature library, and to determine that the target voice data matches the found voiceprint feature library when the number or proportion of matches is greater than or equal to a threshold.
The voiceprint-based identification device according to claim 8, wherein the device further comprises a target user account locking module, configured to lock the target user account when the target voice data does not match the found voiceprint feature library.
The voiceprint-based identification device according to claim 8, wherein:

The target information obtaining module is further configured to, when account verification is performed on the target user account, set the verification mode to verification by means of a first user account, wherein the target user account and the first user account are associated accounts of the same user;

The voiceprint matching module is further configured to find the first user account, and search for a first voiceprint feature library corresponding to the first user account;

The target information obtaining module is further configured to receive first voice data input by the user for identity verification;

The voiceprint matching module is further configured to match the first voice data with the first voiceprint feature library, and to determine that the account verification of the target user account passes when the matching is successful.
A computer readable storage medium, configured to store computer readable instructions that, when executed on a data processing device, cause the data processing device to perform a predetermined operation, the predetermined operation comprising:

Collecting voice data transmitted by a user account as a sender in an instant messaging application;

Performing a voiceprint recognition model training according to the collected voice data, and creating a voiceprint feature library corresponding to the user account;

Receiving an initiated identity verification request, and obtaining an input target user account and target voice data;

Searching for a voiceprint feature library that matches the target user account, and determining that the identity verification of the target user account passes when the target voice data matches the found voiceprint feature library.
The computer readable storage medium according to claim 15, wherein:

After receiving the initiated authentication request, the predetermined operation further includes:

Generating target text content and displaying it;

The obtaining the input target user account and the target voice data includes:

Obtaining the input target user account, and receiving target voice data input in correspondence with the displayed target text content.
The computer readable storage medium according to claim 16, wherein the determining that the identity verification of the target user account passes further comprises:

Converting the target voice data into text data by voice recognition;

When the text data matches the target text content, it is determined that the identity verification of the target user account passes.
The computer readable storage medium according to claim 15, wherein after the creation of the voiceprint feature library corresponding to the user account, the predetermined operation further comprises:

Determining whether the confidence level of the created voiceprint feature library corresponding to the user account is greater than or equal to a threshold, and if so, stopping collecting voice data transmitted by the user account as the sender in the instant messaging application.
The computer readable storage medium according to claim 15, wherein:

The obtaining the input target user account and the target voice data includes:

Receiving at least one input target voice data;

Before determining that the identity verification of the target user account is passed, the predetermined operation further includes:

Determining a number or proportion of matches between the at least one received target voice data and the found voiceprint feature library, and determining that the target voice data matches the found voiceprint feature library when the number or proportion of matches is greater than or equal to a threshold.
The computer readable storage medium according to claim 15, wherein after the searching for the voiceprint feature library matching the target user account, the predetermined operation further comprises:

Locking the target user account when the target voice data does not match the found voiceprint feature library.