CN111785280A - Identity authentication method and device, storage medium and electronic equipment

Info

Publication number: CN111785280A
Application number: CN202010524896.9A
Authority: CN
Inventors: 丁科; 何选基; 万广鲁
Original assignee: Beijing Sankuai Online Technology Co Ltd
Current assignee: Beijing Sankuai Online Technology Co Ltd
Priority date: 2020-06-10
Filing date: 2020-06-10
Publication date: 2020-10-16

Abstract

The present disclosure relates to an identity authentication method and apparatus, a storage medium, and an electronic device, the method including: responding to an identity authentication request of a first user, randomly generating authentication text content and providing the authentication text content for the first user, wherein the authentication text content comprises preset text content which is text content matched with registered voice content input by a registered user; acquiring authentication voice content input by the first user based on the authentication text content; under the condition that the authentication voice content is determined to be matched with the authentication text content, extracting a target voice segment matched with the preset text content from the authentication voice content; and according to the target voice fragment, performing identity authentication on the first user and obtaining an authentication result of the identity authentication.

Description

Identity authentication method and device, storage medium and electronic equipment

Technical Field

The present disclosure relates to the field of voice recognition, and in particular, to an identity authentication method and apparatus, a storage medium, and an electronic device.

Background

Voiceprint authentication is a widely used identity authentication mode. Owing to differences in gender, age, accent, habit, and the like, each person's speech has unique characteristics, which is why people can recognize different speakers simply by listening to them speak; these characteristics are the premise on which voiceprint authentication rests. As with fingerprint authentication, face authentication, and similar modes, sound can be processed to extract an individual's unique voiceprint for comparison and authentication.

Because voiceprint extraction depends on many pronunciation habits, traditional voiceprint authentication must collect a large amount of user speech in the feature learning stage in order to extract features such as the user's character tones, word pronunciations, and phrase rhythms, so that arbitrary or specified speech spoken by the user in the authentication stage can be compared against those features and the user can be accurately authenticated. However, this method requires the user to upload a large amount of speech at registration, which is time-consuming and inconvenient for the user. Another voiceprint authentication method addresses this problem by using the same audio for registration and authentication: only the pronunciation habits in that audio need to be learned in the feature learning stage, and no other characters or phrases are involved during authentication, so the user's voice can be learned and recognized conveniently.

Disclosure of Invention

An object of the present disclosure is to provide an identity authentication method and apparatus, a storage medium, and an electronic device, so as to solve the above technical problems.

In order to achieve the above object, in a first aspect of the present disclosure, there is provided an identity authentication method, including:

responding to an identity authentication request of a first user, randomly generating authentication text content and providing the authentication text content for the first user, wherein the authentication text content comprises preset text content which is text content matched with registered voice content input by a registered user; acquiring authentication voice content input by the first user based on the authentication text content; under the condition that the authentication voice content is determined to be matched with the authentication text content, extracting a target voice segment matched with the preset text content from the authentication voice content; and according to the target voice fragment, performing identity authentication on the first user and obtaining an authentication result of the identity authentication.

Optionally, the randomly generating authentication text content in response to the identity authentication request of the first user and providing the authentication text content to the first user includes: responding to the identity authentication request of the first user, and randomly extracting a registration voice segment from the registration voice content; recognizing the text content of the registered voice fragment, and taking the text content as the preset text content; providing the first user with authentication text content including the preset text content.

Optionally, the method further includes: responding to an identity registration request of a second user, and providing preset registration text content to the second user; acquiring voice content input by the second user based on the preset registration text content; and under the condition that the voice content input by the second user is matched with the preset registered text content, taking the voice content input by the second user as the registered voice content of the second user, and taking the preset registered text content as the preset text content.

Optionally, the method further includes: responding to an identity registration request of a third user, and acquiring voice content input by the third user; recognizing text content of the voice content input by the third user; and taking the voice content input by the third user as the registered voice content, and taking the text content obtained by recognition as the preset text content.

Optionally, the method further includes: responding to an identity registration request of a fourth user, and acquiring voice content input by the fourth user; randomly extracting a plurality of voice segments from voice content input by the fourth user to serve as a plurality of preset voice segments, extracting a plurality of sections of text content consistent with the preset voice segments from text content of the voice content input by the fourth user to serve as a plurality of preset text contents, and correspondingly storing the preset voice segments and the preset text content consistent with the preset voice segments; the randomly generating authentication text content and providing the authentication text content to the first user in response to the identity authentication request of the first user comprises: responding to an identity authentication request of a first user, randomly extracting one preset text content from a plurality of preset text contents, and randomly generating an authentication text content based on the preset text content; the performing identity authentication on the first user and obtaining an authentication result of the identity authentication according to the target voice segment includes: determining a preset voice segment which is stored corresponding to preset text content used for generating the authentication text content from a plurality of preset voice segments, judging whether the sound characteristic of the target voice segment is consistent with the sound characteristic of the preset voice segment, and if the sound characteristic of the target voice segment is consistent with the sound characteristic of the preset voice segment, determining that the first user is a registered user; the text content of the voice content input by the fourth user is obtained by recognizing the voice content input by the fourth user, or the text content of the voice content input by the fourth user is provided to the fourth user as a preset registered text content before the voice content input by the fourth user is acquired.

Optionally, the randomly generating authentication text content in response to the identity authentication request of the first user and providing the authentication text content to the first user includes: responding to an identity authentication request of the first user, determining the preset text content corresponding to the user identification according to the user identification in the identity authentication request, and displaying the authentication text content comprising the preset text content to the first user; the performing identity authentication on the first user and obtaining an authentication result of the identity authentication according to the target voice segment includes: judging whether the sound characteristics of the target voice fragment are consistent with the sound characteristics of the registered voice content corresponding to the user identification; and if the sound characteristics of the target voice fragment are consistent with the sound characteristics of the registered voice content corresponding to the user identification, determining that the first user is a registered user consistent with the user identification.

Optionally, the authentication text content further includes randomly generated random text content; the method further comprises the following steps: after the first user passes the identity authentication, recording the random text content, and randomly determining a target text content from all the random text contents of the history record as the preset text content of the next identity authentication of the first user; and extracting a voice segment corresponding to the target text content from the authentication voice content, and using the voice segment as the registration voice content of the first user.

Optionally, the performing, according to the target voice fragment, identity authentication on the first user and obtaining an authentication result of the identity authentication includes: judging whether there exists registered voice content whose voice similarity with the target voice fragment is greater than a similarity threshold; and if such registered voice content exists, determining that the first user is a registered user.

In a second aspect of the present disclosure, an identity authentication apparatus is provided, including: a providing module, used for responding to an identity authentication request of a first user, randomly generating authentication text content and providing the authentication text content to the first user, wherein the authentication text content comprises preset text content, and the preset text content is text content matched with registered voice content input by a registered user; an acquisition module, used for acquiring the authentication voice content input by the first user based on the authentication text content; an extraction module, used for extracting a target voice segment matched with the preset text content from the authentication voice content under the condition that the authentication voice content is matched with the authentication text content; and a processing module, used for carrying out identity authentication on the first user according to the target voice fragment and obtaining an authentication result of the identity authentication.

Optionally, the providing module is configured to randomly extract a registration voice segment from the registration voice content in response to the identity authentication request of the first user; recognizing the text content of the registered voice fragment, and taking the text content as the preset text content; providing the first user with authentication text content including the preset text content.

Optionally, the apparatus further includes a first registration module, configured to provide preset registration text content to a second user in response to an identity registration request of the second user; acquiring voice content input by the second user based on the preset registration text content; and under the condition that the voice content input by the second user is matched with the preset registered text content, taking the voice content input by the second user as the registered voice content of the second user, and taking the preset registered text content as the preset text content.

Optionally, the apparatus further comprises: the second registration module is used for responding to an identity registration request of a third user and acquiring voice content input by the third user; recognizing text content of the voice content input by the third user; and taking the voice content input by the third user as the registered voice content, and taking the text content obtained by recognition as the preset text content.

Optionally, the apparatus further comprises: the third registration module is used for responding to an identity registration request of a fourth user and acquiring voice content input by the fourth user; randomly extracting a plurality of voice segments from voice content input by the fourth user to serve as a plurality of preset voice segments, extracting a plurality of sections of text content consistent with the preset voice segments from text content of the voice content input by the fourth user to serve as a plurality of preset text contents, and correspondingly storing the preset voice segments and the preset text content consistent with the preset voice segments; the providing module is used for responding to an identity authentication request of a first user, randomly extracting one preset text content from a plurality of preset text contents, and randomly generating an authentication text content based on the preset text content; the processing module is configured to perform identity authentication on the first user and obtain an authentication result of the identity authentication according to the target voice segment, and includes: determining a preset voice segment which is stored corresponding to preset text content used for generating the authentication text content from a plurality of preset voice segments, judging whether the sound characteristic of the target voice segment is consistent with the sound characteristic of the preset voice segment, and if the sound characteristic of the target voice segment is consistent with the sound characteristic of the preset voice segment, determining that the first user is a registered user; the text content of the voice content input by the fourth user is obtained by recognizing the voice content input by the fourth user, or the text content of the voice content input by the fourth user is provided to the fourth user as a preset registered text content before the voice content input by the fourth user is acquired.

Optionally, the providing module is further configured to respond to an identity authentication request of the first user, determine, according to a user identifier in the identity authentication request, the preset text content corresponding to the user identifier, and display the authentication text content including the preset text content to the first user; the processing module is used for judging whether the sound characteristics of the target voice fragment are consistent with the sound characteristics of the registered voice content corresponding to the user identification; and if the sound characteristics of the target voice fragment are consistent with the sound characteristics of the registered voice content corresponding to the user identification, determining that the first user is a registered user consistent with the user identification.

Optionally, the authentication text content further includes randomly generated random text content; the device further comprises a recording module, which is used for recording the random text content after the first user passes the identity authentication and randomly determining a target text content from all the random text contents in the history record as the preset text content of the next identity authentication of the first user; and extracting a voice segment corresponding to the target text content from the authentication voice content, and using the voice segment as the registration voice content of the first user.

Optionally, the processing module is configured to determine whether there is a registered voice content whose voice similarity with the target voice segment is greater than a similarity threshold; and if such registered voice content exists, determining that the first user is a registered user.

In a third aspect of the disclosure, a computer-readable storage medium is provided, on which a computer program is stored, which program, when being executed by a processor, carries out the steps of the method of any one of the first aspect of the disclosure.

In a fourth aspect of the present disclosure, an electronic device is provided, which includes a memory and a processor, wherein the memory stores a computer program thereon, and the processor is configured to execute the computer program in the memory to implement the steps of the method in any one of the first aspect of the present disclosure.

Through the above technical solution, at least the following technical effects can be achieved: during identity authentication, authentication text content is generated that consists of two parts, namely the text of voice content recorded when the user registered (that is, the preset text content) and random content. The random content ensures that the authentication voice is produced on the spot rather than forged, and the preset text content allows the user's identity to be confirmed through voiceprint comparison.
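
The two checks described above can be illustrated with a minimal Python sketch. It is not the implementation of the disclosure: the representation of an utterance as a list of (word, feature vector) pairs, the averaging of per-word vectors, the cosine similarity measure, the threshold value 0.8, and every function name are assumptions made only for illustration. In practice a speech recognizer and a voiceprint model would supply the words and features.

```python
import math

def words(text):
    """Lower-case and split on whitespace; a stand-in for real text normalization."""
    return text.lower().split()

def find_run(haystack, needle):
    """Index of the first contiguous occurrence of needle in haystack, or -1."""
    for i in range(len(haystack) - len(needle) + 1):
        if haystack[i:i + len(needle)] == needle:
            return i
    return -1

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def authenticate(auth_text, preset_text, utterance, registered_print, threshold=0.8):
    """utterance: list of (word, feature_vector) pairs (hypothetical format).

    Check 1: the spoken words must equal the randomly generated prompt,
             which guards against replaying an old recording.
    Check 2: the features of the preset part must resemble the registered
             voiceprint, which identifies the speaker.
    """
    spoken = [w.lower() for w, _ in utterance]
    if spoken != words(auth_text):
        return False
    preset = words(preset_text)
    start = find_run(spoken, preset)
    if start < 0 or not preset:
        return False
    # Target voice segment: only the part matching the preset text content.
    target = [vec for _, vec in utterance[start:start + len(preset)]]
    mean = [sum(col) / len(target) for col in zip(*target)]
    return cosine(mean, registered_print) > threshold
```

Under these assumptions, a recording that contains only the preset words fails the first check because it does not match the full prompt, while an utterance of the full prompt by a different speaker fails the second.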

Additional features and advantages of the disclosure will be set forth in the detailed description which follows.

Drawings

The accompanying drawings, which are included to provide a further understanding of the disclosure and are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description serve to explain the disclosure without limiting the disclosure. In the drawings:

Fig. 1 is a flow chart illustrating a method of identity authentication according to an exemplary disclosed embodiment.

Fig. 2 is a flow chart illustrating a method of identity authentication according to an exemplary disclosed embodiment.

Fig. 3 is a block diagram illustrating an identity authentication device according to an exemplary disclosed embodiment.

Fig. 4 is a block diagram illustrating an electronic device according to an exemplary disclosed embodiment.

Detailed Description

The following detailed description of specific embodiments of the present disclosure is provided in connection with the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating the present disclosure, are given by way of illustration and explanation only, not limitation.

The present disclosure may be applied across two devices. For example, the user-end device may display the authentication text content and acquire the authentication voice content input by the user; the user-end device or the authentication-end device may match the authentication voice content against the authentication text content; and the authentication-end device may extract the target voice segment and match the voice characteristics. This arrangement can be used when remote identity authentication is required, such as logging in to a network account or purchasing goods online. In order to avoid unnecessary repetition, the various possible combinations are not described separately in this disclosure.

The method can also be applied to a single device, with all of the technical content completed on that device, for example in scenarios such as opening a security door, opening a safe, clocking in at an enterprise, or identity authentication with a password shield.

Fig. 1 is a flowchart illustrating an identity authentication method according to an exemplary disclosed embodiment. The method may be applied to a user device, such as a security door, a safe, a mobile phone, or a computer, or to a server, such as a cloud authentication platform. As shown in Fig. 1, the method includes the following steps:

S11, responding to the identity authentication request of the first user, randomly generating authentication text content and providing the authentication text content for the first user, wherein the authentication text content comprises preset text content, and the preset text content is text content matched with the registered voice content input by the registered user.

The identity authentication request may be bundled with other operation requests. For example, it may include a request to open a security door, a safe, or a security system, a payment request for an order, or a request for some other network operation; in that case the operation request passes when the identity authentication request passes and fails when the identity authentication request fails. The identity authentication request may also serve as a precondition for other operation requests. For example, it may be arranged that operation requests can continue to be sent to the target only after the identity authentication request has passed, in which case whether the identity authentication request passes and whether a given operation request passes are not necessarily related.

The authentication text content is randomly generated, but every randomly generated authentication text content includes preset text content, that is, text content matched with registered voice content entered when the user registered. It should be noted that the preset text content may correspond to the entire registered voice content entered during registration, or to a segment of it. In other words, the authentication text content has two parts: random text content that is randomly generated, and text content matched with the registered voice content.

For example, if the registered voice content entered by the user at registration is "like kiwi fruit", the preset text content may be "like kiwi fruit", and when the authentication text content is generated, text containing "like kiwi fruit" may be randomly generated, such as "I do not like kiwi fruit", "I like kiwi fruit, right", or "I like watermelon and kiwi fruit, right". The random text content in these examples is "I do not ...", "I ... right", and "I ... watermelon and ... right", respectively. That is, the random text content may be positioned before the preset text content, after it, on both sides of it, or in the middle of it.
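
The placement options just described (before, after, on both sides of, or in the middle of the preset text content) can be sketched as follows. This is a minimal sketch under stated assumptions: the filler vocabulary `FILLER_PHRASES`, the function name `generate_auth_text`, and the word-level splitting are hypothetical, since the disclosure does not specify how the random text content is produced.

```python
import random

# Hypothetical filler vocabulary used as the source of random text content.
FILLER_PHRASES = ["I do not", "right", "watermelon and", "today"]
POSITIONS = ["before", "after", "both", "middle"]

def generate_auth_text(preset_text, rng):
    """Return (auth_text, parts).

    parts is a list of (kind, words) with kind in {"preset", "random"};
    concatenating the "preset" parts always reproduces preset_text.
    """
    preset = preset_text.split()

    def filler():
        return rng.choice(FILLER_PHRASES).split()

    position = rng.choice(POSITIONS)
    if position == "before":
        parts = [("random", filler()), ("preset", preset)]
    elif position == "after":
        parts = [("preset", preset), ("random", filler())]
    elif position == "both":
        parts = [("random", filler()), ("preset", preset), ("random", filler())]
    else:
        # "middle": split the preset text and put random text inside it.
        cut = rng.randint(1, len(preset) - 1) if len(preset) > 1 else len(preset)
        parts = [("preset", preset[:cut]), ("random", filler()),
                 ("preset", preset[cut:])]
    parts = [(kind, ws) for kind, ws in parts if ws]  # drop empty pieces
    text = " ".join(w for _, ws in parts for w in ws)
    return text, parts
```

Returning the tagged parts alongside the text is a design choice of this sketch: it lets a later step know which words belong to the preset text content when the target voice segment is extracted.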

It should be noted that, when the user registers, a plurality of pieces of registered voice content may be input, and when the authentication text content is generated, any one piece of registered voice content and the preset text content corresponding to the registered voice content may be extracted from the plurality of pieces of registered voice content, and the authentication text content may be generated based on the preset text content.

In one possible implementation mode, in response to the identity authentication request of the first user, randomly extracting a registration voice segment from the registration voice content; recognizing the text content of the registered voice fragment, and taking the text content as the preset text content; providing the first user with authentication text content including the preset text content.

For example, if the registered voice content entered by the user at registration is "I do not like to eat pears, I prefer to eat apples", the authentication text content may be "I do not like to eat pears, I prefer to eat mangos", where "I do not like to eat pears" is preset text content matched with the registered voice content (it is a fragment of the registered voice content) and "I prefer to eat mangos" is randomly generated random text content. The authentication text content may also be "I do not like to eat pears, I do not like to eat mangos, I prefer to eat apples", where "I do not like to eat pears" and "I prefer to eat apples" are preset text content and "I do not like to eat mangos" is random text content. The authentication text content may also be "I do not like to eat pears, I prefer to eat apples and mangos", which contains the text of the complete registered voice content as the preset text content together with the random text content "and mangos".

It should be noted that the user may input a section of registered voice content during registration, or may input multiple sections of registered voice content, and when randomly extracting a registered voice segment from the registered voice content, may extract an arbitrary registered voice segment from an arbitrary section of registered voice.

The method in this embodiment may be applied to a user terminal or a server. When it is applied to a server, providing the authentication text content to the first user may consist of the server sending the authentication text content to the user terminal, which then presents it. The authentication text content may be presented as text, for example by displaying the randomly generated authentication text content to the first user in a text format or as a picture containing the characters. It may also be presented as voice, for example by voice broadcast, such as playing to the user "Please read after pressing the authentication button: I do not like eating mangos", or by adding a voice playback control after the text display box, so that the voice of the authentication text content is played when the user clicks the control.

It should be noted that the registered voice content may be audio of designated text content recorded by the user during registration, or audio of arbitrary content recorded during registration; in the latter case, the text content corresponding to the registered voice content may be obtained by recognizing the user's audio. The present disclosure does not limit how the registered voice content is acquired.

In one possible embodiment, the registration text content and the registration voice content are obtained by: responding to an identity registration request of a second user, and displaying preset registration text content to the second user; acquiring voice content input by the second user; and under the condition that the voice content input by the second user is matched with the preset registered text content, taking the voice content input by the second user as the registered voice content of the second user, and taking the preset registered text content as the preset text content.

That is to say, when registering, the user records speech of content specified by the system as the basis for later identity verification. The recorded speech is the registered voice content, and the content specified by the system is the registration text content.

In one possible embodiment, the registration text content and the registration voice content are obtained by: responding to an identity registration request of a third user, and acquiring voice content input by the third user; recognizing text content of the voice content input by the third user; and taking the voice content input by the third user as the registered voice content, and taking the text content obtained by recognition as the preset text content.

That is to say, when registering, the user records arbitrary content of the user's own choosing as the basis for later identity verification. The recording of that arbitrary content is the registration voice content, and the text content obtained by recognizing the registration voice content is the registration text content.

In a possible implementation manner, in response to an identity registration request of a fourth user, acquiring a voice content input by the fourth user, randomly extracting a plurality of voice segments from the voice content input by the fourth user as a plurality of preset voice segments, extracting a plurality of text contents consistent with the plurality of preset voice segments from the text contents of the voice content input by the fourth user as a plurality of preset text contents, and correspondingly storing the preset voice segments and the preset text contents consistent with the preset voice segments.

That is to say, a plurality of voice segments can be randomly extracted from the registered voice content input by the user during registration and stored in correspondence with the corresponding preset text content, so that a pair of preset voice segments and preset text content can be directly extracted from the library during authentication, and the time for randomly extracting the preset text content in the process of generating the authentication text content is reduced. The text content of the voice content input by the fourth user is obtained by recognizing the voice content input by the fourth user, or the text content of the voice content input by the fourth user is provided to the fourth user as a preset registration text content before the voice content input by the fourth user is acquired.
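
A minimal sketch of this pre-extraction is given below. It assumes a word-level alignment between the registered audio and its text is already available; the `AlignedWord` and `PresetPair` types, the sample-index representation of audio, and the `min_words` parameter are hypothetical and are not part of the disclosure.

```python
import random
from dataclasses import dataclass

@dataclass(frozen=True)
class AlignedWord:
    text: str
    start: int  # index of the first audio sample of the word (inclusive)
    end: int    # index one past the last audio sample of the word

@dataclass(frozen=True)
class PresetPair:
    text: str       # preset text content
    samples: tuple  # preset voice segment stored alongside it

def build_preset_pairs(samples, alignment, count, rng, min_words=2):
    """Randomly cut `count` contiguous word runs out of the registered audio
    and store each run's audio together with its text."""
    if not alignment:
        raise ValueError("alignment must contain at least one word")
    pairs = []
    for _ in range(count):
        length = rng.randint(min(min_words, len(alignment)), len(alignment))
        first = rng.randint(0, len(alignment) - length)
        chosen = alignment[first:first + length]
        text = " ".join(w.text for w in chosen)
        segment = tuple(samples[chosen[0].start:chosen[-1].end])
        pairs.append(PresetPair(text, segment))
    return pairs

def pick_pair(pairs, rng):
    """At authentication time, draw one stored pair instead of re-cutting audio."""
    return rng.choice(pairs)
```

Because each pair is stored at registration, the authentication step only performs a random lookup, which is the time saving the paragraph above describes.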

It should be noted that, after the identity authentication is passed, the segment of the authentication voice content submitted during authentication that corresponds to the random text content can be added to the voice authentication library as registered voice content. In subsequent identity authentication, new preset text content can then be drawn from the random text content corresponding to that authentication voice content, which increases the complexity of the authentication text content during voice authentication and helps prevent the registered voice content from being pre-recorded and forged.

In a possible implementation manner, the authentication text content further includes randomly generated random text content, and after the first user is determined to be a registered user, the random text content is recorded, and a target text content is randomly determined from all random text contents in the history record to serve as the preset text content for the next identity authentication of the first user; and extracting a voice segment corresponding to the target text content from the authentication voice content, and using the voice segment as the registration voice content of the first user.

For example, suppose the voice content recorded by the user at registration is "I like to eat apples" and the authentication voice content at the first authentication is "I like to eat apples, I dislike rainy days". The voice segment "I dislike rainy days" may be added to the registered voice content, and authentication text content such as "I like sunny days, I dislike rainy days" may be generated at a subsequent authentication. When the sound characteristics are then matched, the "I dislike rainy days" segment recorded at the user's previous authentication is compared with the "I dislike rainy days" segment recorded this time. After that authentication is passed, "I like sunny days" can in turn be added to the registered voice content.
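
The bookkeeping in this example can be sketched as below. The `UserProfile` structure, the use of opaque bytes for voice segments, and the choice to store every random segment at the time it is recorded (so that any historical random text chosen later already has audio) are assumptions of this sketch rather than details given in the disclosure.

```python
import random
from dataclasses import dataclass, field

@dataclass
class UserProfile:
    # Maps text content to the registered voice segment that speaks it.
    registered: dict = field(default_factory=dict)
    # Random text contents from all authentications that have passed.
    history: list = field(default_factory=list)
    # Preset text content to embed in the next authentication prompt.
    next_preset: str = ""

def record_successful_auth(profile, random_text, random_segment, rng):
    """Call only after the user has passed authentication.

    Records the random text content, stores its voice segment as registered
    voice content, and randomly picks the preset text for the next attempt.
    """
    profile.history.append(random_text)
    profile.registered[random_text] = random_segment
    profile.next_preset = rng.choice(profile.history)
    return profile.next_preset
```

For instance, starting from a profile registered with "I like to eat apples", a passed authentication whose random part was "I dislike rainy days" makes that phrase the preset text content for the next prompt.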

It should be noted that, after the registered voice content reaches the preset number threshold, the voice feature information of the user may be learned based on all the registered voice contents through the voiceprint matching model, and in the subsequent authentication process, a random authentication text content may be generated, which may be unrelated to the registered voice content. Because the learning sample size is enough, the voiceprint matching model can fully learn the voice characteristics of the user based on a large number of samples, and the identity of the user can be verified without character-by-character characteristic comparison with the original voice fragment. That is, in the present disclosure, the authentication voice content may be used as a learning sample of the voiceprint matching model, and by learning the voice feature of the authentication voice content, the authentication accuracy of the voiceprint matching model can be improved, and the identity of the user can be discriminated by the voice irrelevant to the existing registered voice content.

S12, acquiring the authentication voice content input by the first user based on the authentication text content.

The authentication voice content input by the first user may be collected by a microphone originally used for other voice functions by the authentication device, may also be collected by a microphone set for voice authentication by the authentication device, or may also be transmitted to the authentication device after being collected by other devices or other microphones. For example, when the method is applied to authentication equipment on articles such as a security door, a safe and the like, the microphone of the original call system of the security door can be used for collecting authentication voice contents, and the microphone can be newly arranged on the safe for collecting the authentication voice contents; when the method is applied to the mobile terminal, the voice content can be collected through a microphone of the mobile terminal; when the method is applied to cloud authentication equipment and a server, the authentication voice content can be collected through a microphone originally used for conversation of mobile equipment (such as a mobile phone, a smart watch and the like) in communication connection with the authentication equipment.

S13, under the condition that the authentication voice content is determined to be matched with the authentication text content, extracting a target voice segment matched with the preset text content from the authentication voice content.

The authentication voice content can be converted into the authentication word content through a voice recognition technology, and when the authentication word content is the same as the authentication text content (or the similarity is higher than a certain threshold), the authentication voice content can be considered to be matched with the authentication text content.
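A minimal Python sketch of this check is given below, assuming the speech recognizer has already produced a text string. The normalization step, the word-level comparison with `difflib.SequenceMatcher`, and the default threshold of 0.9 are illustrative assumptions; the disclosure only requires that the texts be identical or that their similarity exceed some threshold.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase, drop punctuation, and split into words so that formatting
    differences in the recognizer output do not affect the comparison."""
    kept = "".join(ch for ch in text.lower() if ch.isalnum() or ch.isspace())
    return kept.split()


def text_matches(recognized, expected, threshold=0.9):
    """Return True if the recognized words are identical to the expected words
    or their similarity ratio is higher than the threshold."""
    a, b = normalize(recognized), normalize(expected)
    if a == b:
        return True
    return SequenceMatcher(None, a, b).ratio() > threshold
```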

After the authentication voice content is matched with the authentication text content, a segment aligned with the preset voice segment can be extracted from the authentication voice content through an audio track alignment technique to serve as the target voice segment, where the preset voice segment is a voice segment that is extracted from the registered voice content and matches the preset text content. Alternatively, the portion of the authentication voice content whose recognized text is consistent with the preset text content can be identified and used as the target voice segment.
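The second option, locating the portion whose recognized text is consistent with the preset text content, might look like the sketch below. It assumes a recognizer that returns word-level timestamps as `(word, start_sec, end_sec)` tuples; that input format and the `find_segment` name are hypothetical and not taken from the disclosure.

```python
def find_segment(words, preset_text):
    """words: list of (word, start_sec, end_sec) tuples from a hypothetical
    recognizer with word-level timestamps.
    Returns (start_sec, end_sec) of the first run of words equal to
    preset_text, or None if the preset text does not occur."""
    target = preset_text.lower().split()
    if not target:
        return None
    tokens = [w.lower() for w, _, _ in words]
    n = len(target)
    for i in range(len(tokens) - n + 1):
        if tokens[i:i + n] == target:
            # Span runs from the start of the first word to the end of the last.
            return words[i][1], words[i + n - 1][2]
    return None
```

The returned time span can then be used to cut the target voice segment out of the recorded audio.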

S14, according to the target voice fragment, performing identity authentication on the first user and obtaining an authentication result of the identity authentication.

The voice characteristics of the target voice segment may be compared with the voice characteristics of all the registered voice content to judge whether the database contains registered voice content whose voice characteristics are consistent with those of the target voice segment; if so, the first user may be determined to be a registered user.

In one possible implementation, after the target voice segment is determined, its voice characteristics may be compared with all the registered voice content or with the preset voice segments in all the registered voice content. When there is registered voice content (or a preset voice segment extracted from registered voice content) whose voice similarity to the target voice segment is greater than the similarity threshold, that registered voice content may be used as the target registered voice content consistent with the voice characteristics of the target voice segment.
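The threshold comparison can be illustrated with the following Python sketch. Representing voice characteristics as numeric feature vectors, scoring them with cosine similarity, and the threshold value of 0.8 are all assumptions for illustration; the disclosure does not name a particular feature or similarity measure.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def find_target_registration(target_features, registered, threshold=0.8):
    """registered: dict mapping a registration id to its feature vector.
    Returns the id of the most similar registration whose similarity is
    greater than the threshold, or None if there is none."""
    best_id, best_score = None, threshold
    for reg_id, features in registered.items():
        score = cosine_similarity(target_features, features)
        if score > best_score:
            best_id, best_score = reg_id, score
    return best_id
```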

It is worth mentioning that, during registration, a user may enter one section of registered voice content or multiple sections; the registered voice content library may store the registered voice of one user or of a plurality of users.

When the registered voice content library contains only one or more sections of registered voice content of a single user, any one section can be selected from all the registered voice content when the authentication text content is generated, and the text content matching it serves as the preset text content. For example, if the user entered the two sections "I like apples" and "I hate pears" during registration, one of the two can be randomly selected to generate the preset text content; during matching, the target voice segment can be matched against all the registered voice content in the library to judge whether target registered voice content with consistent sound characteristics exists.

When the library contains registered voice content of a plurality of users, the text content of the registered voice content should be consistent or grouped, that is, all users may enter the same one or more sections of content at registration. When the authentication text content is generated, any one section can be selected from all the registered voice content, and the text content matching it is used as the preset text content. For example, if all users entered the two sections "I like apples" and "I hate pears" at registration, either one may be selected to generate the preset text content; during matching, the target voice segment is matched only against that section of registered voice content of every user in the library to determine whether target registered voice content with consistent voice characteristics exists (for example, when "I like apples" is used as the preset text content, only the "I like apples" registered voice content of all users is matched, and the "I hate pears" registered voice content is not).

When preset voice segments are randomly extracted from the registered voice content in advance and stored together with their corresponding preset text content, the preset text content in the authentication text content is taken directly from the library. The preset voice segment corresponding to the preset text content used to generate the authentication text content can therefore be determined from the plurality of preset voice segments, and its voice characteristics can be compared directly with those of the target voice segment, so that the user identity is authenticated more accurately.
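One possible way to build such a store at registration time is sketched below in Python. The word-timestamp input format, the fixed span length, and the `build_preset_store` name are hypothetical choices made for illustration only.

```python
import random


def build_preset_store(words, count, span, rng=random):
    """words: list of (word, start_sec, end_sec) for the registered voice content.
    Randomly picks up to `count` runs of `span` consecutive words and returns
    {preset_text: (start_sec, end_sec)}, so each preset text is stored with
    the time span of its preset voice segment."""
    store = {}
    starts = list(range(len(words) - span + 1))
    for i in rng.sample(starts, min(count, len(starts))):
        chunk = words[i:i + span]
        text = " ".join(w for w, _, _ in chunk)
        store[text] = (chunk[0][1], chunk[-1][2])
    return store
```

At authentication time, the preset text used in the challenge is a key of this store, so the matching preset voice segment is found by a direct lookup rather than by searching the whole registered recording.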

Here, determining that the first user is a registered user means that the user's identity authentication has passed. When the identity authentication request includes other operation requests, those operations may be performed for the user once the user's registered identity is confirmed.

It should be noted that a registered user identity may be bound to each registered voice content, and the operations each identity is permitted to perform differ. For example, an administrator identity may be bound to registered voice content A; when the first user is confirmed as the registered user "administrator" through the identity authentication request, the operations in the operation request that fall within the administrator's authority may be performed for the first user. An ordinary user identity may be bound to registered voice content B; when the first user is confirmed as the registered user "ordinary user" through the identity authentication request, an operation request sent along with it that requires the "administrator" identity will not be passed.
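A minimal Python sketch of such an authority check follows. The role names mirror the example above, while the operation names and the `ROLE_PERMISSIONS` table are hypothetical placeholders.

```python
# Hypothetical mapping from a registered identity to the operations it may perform.
ROLE_PERMISSIONS = {
    "administrator": {"open_door", "add_user"},
    "ordinary_user": {"open_door"},
}


def authorize(role, operation):
    """Return True only if the confirmed identity is permitted to perform the operation."""
    return operation in ROLE_PERMISSIONS.get(role, set())
```

An operation request bundled with the identity authentication request would be executed only when both the authentication passes and `authorize` returns True for the confirmed identity.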

Fig. 2 is a flowchart illustrating an identity authentication method according to an exemplary disclosed embodiment. The method may be applied to a user device, such as a security door, a safe, a mobile phone, or a computer, or to a server, such as a cloud authentication platform. As shown in Fig. 2, the method includes the following steps:

S21, responding to the identity authentication request of the first user, determining the preset text content corresponding to the user identification according to the user identification in the identity authentication request, randomly generating authentication text content based on the preset text content, and providing the authentication text content for the first user.

The authentication text content is randomly generated and comprises preset text content, and the preset text content is matched with the registered voice content input by the user.

The user identifier may be a user number, a user name, or other identification information that uniquely determines the user identity. The user obtains a user identifier when registering, and the user identifier is associated with the registered voice content of the user and with the text content of the registered voice content (or the preset text content obtained from the registered voice content).

The identity authentication request may include other operation requests, for example, the identity authentication request may include an opening request for a security door, a safe, a security system, a payment request for an order, an operation request for other network operations, and the like, and when the identity authentication request passes, the operation requests may also pass at the same time, and when the identity authentication request does not pass, the operation requests are not passed. The identity authentication request may also be implemented as a precondition for other operation requests, for example, it may be set that the operation request may be continuously sent to the target after the identity authentication request passes, and the passing or not of the identity authentication request and the operation request is not necessarily related.

The authentication text content is randomly generated on the basis of the preset text content, that is, every randomly generated authentication text content includes the preset text content, and the preset text content is the text content matched with the registered voice content input when the user registers. It should be noted that the preset text content may be completely consistent with the registered voice content entered during registration, or may be consistent with a segment of the registered voice content. That is, the authentication text content includes two parts: random text content that is randomly generated, and preset text content that matches the registered voice content.
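As a rough illustration of how such text could be assembled, the Python sketch below joins the preset text with one phrase drawn from a phrase pool. The pool `RANDOM_PHRASES`, the function name `generate_auth_text`, and the random ordering of the two parts are assumptions made for illustration; the disclosure does not specify how the random text is produced.

```python
import random

# Hypothetical pool of phrases used as the random part of the challenge.
RANDOM_PHRASES = ["I dislike rainy days", "I like sunny days", "the sky is blue"]


def generate_auth_text(preset_text, rng=random):
    """Return (authentication_text, random_text). The authentication text
    always contains the preset text plus one randomly chosen phrase, in
    random order, so the challenge differs between requests."""
    random_text = rng.choice(RANDOM_PHRASES)
    parts = [preset_text, random_text]
    rng.shuffle(parts)
    return ", ".join(parts), random_text
```

Because the random part changes from request to request, a recording of an earlier session is unlikely to match the text of a new challenge.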

When the preset text content is generated, the registered voice content associated with the user identifier or the text content consistent with the registered voice content can be extracted, and then the preset text content is generated according to the registered voice content or the text content.

It should be noted that the registered voice content may be audio of a designated text content recorded during registration of the user, or may be any text content recorded during registration, and the text content corresponding to the registered voice content may be created by recognizing the audio of the user. The contents of the registered voice contents input at the time of registration of each user may be identical or may be different from each other. The present disclosure does not limit the acquisition mode of the registered voice content.

That is to say, when a user registers, a piece of voice of system-specified content needs to be recorded as a basis for later identity verification; this voice is the registered voice content, and the system-specified content is the registered text content.

S22, acquiring the authentication voice content input by the first user based on the authentication text content.

The authentication voice content input by the first user can be collected by a microphone originally used for other voice functions of the authentication device, can also be collected by a microphone set for voice authentication by the authentication device, and can also be transmitted to the authentication device after being collected by other devices or other microphones. For example, when the method is applied to authentication equipment on articles such as a security door, a safe and the like, the microphone of the original call system of the security door can be used for collecting authentication voice contents, and the microphone can be newly arranged on the safe for collecting the authentication voice contents; when the method is applied to the mobile terminal, the voice content can be collected through a microphone of the mobile terminal; when the method is applied to cloud authentication equipment and a server, the authentication voice content can be collected through a microphone originally used for conversation of mobile equipment (such as a mobile phone, a smart watch and the like) in communication connection with the authentication equipment.

S23, under the condition that the authentication voice content is determined to be matched with the authentication text content, extracting a target voice segment matched with the preset text content from the authentication voice content.

S24, judging whether the sound characteristic of the target voice segment is consistent with the sound characteristic of the registered voice content corresponding to the user identification.

After the target voice segment is determined, the target voice segment may be compared with the sound features of the registered voice content corresponding to the user identifier, or compared with the sound features of a segment which is extracted from the registered voice content corresponding to the user identifier and is consistent with the preset text content, and when the sound similarity between the two is greater than the similarity threshold, the sound features of the two may be considered to be consistent.

S25, if the sound feature of the target voice segment is consistent with the sound feature of the registered voice content corresponding to the user identifier, determining that the first user is a registered user consistent with the user identifier.
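Steps S21 to S25 can be tied together as in the following Python sketch. Every callable passed in (`generate_text`, `capture`, `transcribe`, `extract_segment`, `similarity`) is a hypothetical stand-in for a component the disclosure leaves unspecified, the `registry` layout is assumed, and exact text equality is used for the content check where a similarity threshold could be used instead.

```python
def authenticate(user_id, registry, generate_text, capture, transcribe,
                 extract_segment, similarity, threshold=0.8):
    """Return True if the speaker is accepted as the registered user `user_id`.
    registry maps user_id -> (preset_text, registered_features)."""
    if user_id not in registry:
        return False
    preset_text, registered_features = registry[user_id]
    auth_text = generate_text(preset_text)                  # S21: build the challenge
    audio = capture(auth_text)                              # S22: record the reply
    if transcribe(audio) != auth_text:                      # S23: spoken text must match
        return False
    target_features = extract_segment(audio, preset_text)   # S23: target voice segment
    score = similarity(target_features, registered_features)  # S24: compare features
    return score > threshold                                # S25: accept or reject
```

Because the user identifier selects a single registration, the comparison in S24 is one-to-one rather than a search over every registered user.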

It should be noted that a registered user identity may be bound to each user identifier, and the operations each identity is permitted to perform differ. For example, an administrator identity may be bound to the user identifier a12345; when the first user is confirmed as the registered user "administrator" through the identity authentication request, the operations in the operation request that fall within the administrator's authority may be performed for the first user. A normal user identity may be bound to the user identifier a11234; when the first user is confirmed as the registered user "normal user" through the identity authentication request, an operation request sent along with it that requires the "administrator" identity will not be passed.

Fig. 3 is a block diagram illustrating an identity authentication apparatus according to an exemplary disclosed embodiment, and as shown in fig. 3, the apparatus 300 includes a providing module 310, an obtaining module 320, an extracting module 330, and a processing module 340.

The providing module 310 is configured to randomly generate an authentication text content in response to an identity authentication request of a first user, and provide the authentication text content to the first user, where the authentication text content includes a preset text content, and the preset text content is a text content matched with a registered voice content entered by a registered user.

An obtaining module 320, configured to obtain authenticated voice content input by the first user based on the authenticated text content.

An extracting module 330, configured to extract, from the authentication voice content, a target voice segment that matches the preset text content when the authentication voice content matches the authentication text content.

The processing module 340 is configured to perform identity authentication on the first user according to the target voice segment and obtain an authentication result of the identity authentication.

With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.

Fig. 4 is a block diagram illustrating an electronic device 400 according to an example embodiment. As shown in fig. 4, the electronic device 400 may include: a processor 401 and a memory 402. The electronic device 400 may also include one or more of a multimedia component 403, an input/output (I/O) interface 404, and a communications component 405.

The processor 401 is configured to control the overall operation of the electronic device 400, so as to complete all or part of the steps in the above-mentioned identity authentication method. The memory 402 is used to store various types of data to support operation at the electronic device 400, such as instructions for any application or method operating on the electronic device 400 and application-related data, such as contact data, transmitted and received messages, pictures, audio, video, and so forth. The memory 402 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk or optical disk. The multimedia component 403 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 402 or transmitted through the communication component 405. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 404 provides an interface between the processor 401 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 405 is used for wired or wireless communication between the electronic device 400 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 405 may include a Wi-Fi module, a Bluetooth module, an NFC module, and so on.

In an exemplary embodiment, the electronic device 400 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors, or other electronic components for performing the above-described identity authentication method.

In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the identity authentication method described above is also provided. For example, the computer readable storage medium may be the memory 402 comprising program instructions executable by the processor 401 of the electronic device 400 to perform the identity authentication method described above.

The preferred embodiments of the present disclosure are described in detail above with reference to the accompanying drawings. However, the present disclosure is not limited to the specific details of the above embodiments; various simple modifications may be made to the technical solution of the present disclosure within its technical idea, and these simple modifications all belong to the protection scope of the present disclosure.

It should be noted that, in the foregoing embodiments, various features described in the above embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, various combinations that are possible in the present disclosure are not described again.

In addition, any combination of various embodiments of the present disclosure may be made, and the same should be considered as the disclosure of the present disclosure, as long as it does not depart from the spirit of the present disclosure.

Claims

1. An identity authentication method, the method comprising:

responding to an identity authentication request of a first user, randomly generating authentication text content and providing the authentication text content for the first user, wherein the authentication text content comprises preset text content which is text content matched with registered voice content input by a registered user;

acquiring authentication voice content input by the first user based on the authentication text content;

under the condition that the authentication voice content is determined to be matched with the authentication text content, extracting a target voice segment matched with the preset text content from the authentication voice content;

and according to the target voice fragment, performing identity authentication on the first user and obtaining an authentication result of the identity authentication.

2. The method of claim 1, wherein randomly generating authentication text content and providing the authentication text content to the first user in response to an identity authentication request of the first user comprises:

responding to the identity authentication request of the first user, and randomly extracting a registration voice segment from the registration voice content;

recognizing the text content of the registered voice fragment, and taking the text content as the preset text content;

providing the first user with authentication text content including the preset text content.

3. The method of claim 1, further comprising:

responding to an identity registration request of a second user, and providing preset registration text content to the second user;

acquiring voice content input by the second user based on the preset registration text content;

and under the condition that the voice content input by the second user is matched with the preset registration text content, taking the voice content input by the second user as the registered voice content of the second user, and taking the preset registration text content as the preset text content.

4. The method of claim 1, further comprising:

responding to an identity registration request of a third user, and acquiring voice content input by the third user;

recognizing text content of the voice content input by the third user;

and taking the voice content input by the third user as the registered voice content, and taking the text content obtained by recognition as the preset text content.

5. The method of claim 1, further comprising:

responding to an identity registration request of a fourth user, and acquiring voice content input by the fourth user;

randomly extracting a plurality of voice segments from voice content input by the fourth user to serve as a plurality of preset voice segments, extracting a plurality of sections of text content consistent with the preset voice segments from text content of the voice content input by the fourth user to serve as a plurality of preset text contents, and correspondingly storing the preset voice segments and the preset text content consistent with the preset voice segments;

the randomly generating authentication text content and providing the authentication text content to the first user in response to the identity authentication request of the first user comprises:

responding to an identity authentication request of a first user, randomly extracting one preset text content from a plurality of preset text contents, and randomly generating an authentication text content based on the preset text content;

the performing identity authentication on the first user and obtaining an authentication result of the identity authentication according to the target voice segment includes:

determining a preset voice segment which is stored corresponding to preset text content used for generating the authentication text content from a plurality of preset voice segments, judging whether the sound characteristic of the target voice segment is consistent with the sound characteristic of the preset voice segment, and if the sound characteristic of the target voice segment is consistent with the sound characteristic of the preset voice segment, determining that the first user is a registered user;

wherein the text content of the voice content input by the fourth user is obtained by recognizing the voice content input by the fourth user, or,

and the text content of the voice content input by the fourth user is provided to the fourth user as preset registered text content before the voice content input by the fourth user is acquired.

6. The method of claim 1, wherein randomly generating authentication text content and providing the authentication text content to the first user in response to an identity authentication request of the first user comprises:

responding to an identity authentication request of the first user, determining the preset text content corresponding to the user identification according to the user identification in the identity authentication request, randomly generating authentication text content based on the preset text content, and providing the authentication text content for the first user;

the performing identity authentication on the first user and obtaining an authentication result of the identity authentication according to the target voice segment includes: judging whether the sound characteristics of the target voice fragment are consistent with the sound characteristics of the registered voice content corresponding to the user identification;

and if the sound characteristics of the target voice fragment are consistent with the sound characteristics of the registered voice content corresponding to the user identification, determining that the first user is a registered user consistent with the user identification.

7. The method of claim 1, wherein the authentication textual content further comprises randomly generated random textual content;

the method further comprises the following steps:

after the first user passes the identity authentication, recording the random text content, and randomly determining a target text content from all the random text contents of the history record as the preset text content of the next identity authentication of the first user; and

extracting a voice segment corresponding to the target text content from the authentication voice content, and using the voice segment as the registered voice content of the first user.

8. The method according to claim 1, wherein said authenticating the first user and obtaining an authentication result of the authentication according to the target voice segment comprises:

judging whether there is registered voice content whose voice similarity to the target voice segment is greater than a similarity threshold;

and if there is registered voice content whose voice similarity to the target voice segment is greater than the similarity threshold, determining that the first user is a registered user.

9. An identity authentication apparatus, the apparatus comprising:

the providing module is used for responding to an identity authentication request of a first user, randomly generating authentication text content and providing the authentication text content to the first user, wherein the authentication text content comprises preset text content, and the preset text content is text content matched with registered voice content input by a registered user;

the acquisition module is used for acquiring the authentication voice content input by the first user based on the authentication text content;

the extraction module is used for extracting a target voice segment matched with the preset text content from the authentication voice content under the condition that the authentication voice content is matched with the authentication text content;

and the processing module is used for carrying out identity authentication on the first user according to the target voice fragment and obtaining an authentication result of the identity authentication.

10. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.

11. An electronic device, comprising:

a memory having a computer program stored thereon;

a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1 to 8.