CN111883140A - Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition - Google Patents


Info

Publication number
CN111883140A
CN111883140A · CN111883140B (application CN202010723015.6A)
Authority
CN
China
Prior art keywords: reply, result, user, authentication, voiceprint
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010723015.6A
Other languages
Chinese (zh)
Other versions
CN111883140B (en)
Inventor
邹芳
李俊蓉
李沛恒
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Life Insurance Company of China Ltd
Original Assignee
Ping An Life Insurance Company of China Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Life Insurance Company of China Ltd
Priority to CN202010723015.6A
Publication of CN111883140A
Application granted
Publication of CN111883140B
Legal status: Active

Classifications

    • G06F16/367 Ontology (information retrieval; creation of semantic tools, e.g. ontology or thesauri)
    • G06N3/045 Neural networks; architecture; combinations of networks
    • G06N3/08 Neural networks; learning methods
    • G10L15/1822 Speech recognition; natural language modelling; parsing for meaning understanding
    • G10L17/02 Speaker identification or verification; preprocessing operations; pattern representation or modelling; feature selection or extraction
    • G10L17/18 Speaker identification or verification; artificial neural networks; connectionist approaches
    • G10L17/24 Speaker identification or verification; interactive procedures in which the user is prompted to utter a password or a predefined phrase
    • H04L9/3231 Cryptographic user authentication using biological data, e.g. fingerprint, voice or retina

Abstract

The invention relates to the field of artificial intelligence and provides an authentication method, apparatus, device and medium based on a knowledge graph and voiceprint recognition. The method comprises: obtaining user information from an identity verification instruction; acquiring user sample voice information and a user knowledge graph; generating a question to be confirmed with a first authentication question generation model; converting it into authentication question voice with a voice conversion model; receiving reply voice information and obtaining a voiceprint matching result output by a voiceprint recognition model from the reply voice information and the user sample voice information; performing text recognition and intention recognition on the reply voice information with a reply recognition model to obtain a reply result, then recognizing the degree to which the reply result matches the question to be confirmed to obtain a reply comprehensive result; and determining the identity authentication result for the current round of conversation. The invention realizes double authentication and enhances the security of user information; it also relates to blockchain technology.

Description

Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition
Technical Field
The invention relates to the field of artificial intelligence voice processing, in particular to an authentication method, an authentication device, authentication equipment and an authentication medium based on knowledge graph and voiceprint recognition.
Background
Voiceprint recognition technology is now widely used in authentication scenarios. However, a person's voice is variable: it is easily affected by physical condition, age, emotion and similar factors, or by external factors such as the microphone, the channel, environmental noise and overlapping speech. These variations can cause authentication to fail, greatly reducing user satisfaction. In addition, criminals can imitate a voice through impersonation attacks, replay attacks, speech-synthesis attacks, voice-conversion attacks and the like, passing identity authentication in order to carry out illegal operations and threatening the security of user information.
Disclosure of Invention
The invention provides an authentication method and apparatus, a computer device and a storage medium based on a knowledge graph and voiceprint recognition. Authentication question voice is generated from the call user's sample voice information and user knowledge graph; reply voice information is received; a voiceprint matching result and a reply comprehensive result are obtained through voiceprint recognition, text recognition and intention recognition; and the identity authentication result for the current round of conversation is determined from both results. This achieves double authentication, authenticates the identity of the call user more accurately, and enhances the security of user information.
An authentication method based on knowledge graph and voiceprint recognition comprises the following steps:
receiving an identity verification instruction of a call user, and acquiring user information in the identity verification instruction;
determining a user identification code of the call user according to the user information, and acquiring user sample voice information and a user knowledge graph associated with the user identification code; the user knowledge graph comprises graph nodes in a tree structure;
inputting the user knowledge graph into a first authentication question generation model, and acquiring a question to be confirmed, containing a ternary tree structure, generated by the first authentication question generation model; the question to be confirmed is generated by the first authentication question generation model from the graph nodes and a first node question template;
inputting the question to be confirmed into a preset voice conversion model, acquiring authentication question voice converted by the voice conversion model through a voice synthesis technology, and broadcasting the authentication question voice to the call user;
receiving reply voice information of the call user aiming at the authentication question voice, inputting the reply voice information and the user sample voice information into a voiceprint recognition model, and simultaneously inputting the reply voice information and the question to be confirmed into a reply recognition model;
extracting voiceprint features from the reply voice information through the voiceprint recognition model, and obtaining a voiceprint matching result output by the voiceprint recognition model according to the voiceprint features; the voiceprint matching result is a confidence value for the match between the voiceprint features and the user sample voice information;
performing text recognition and intention recognition on the reply voice information through the reply recognition model to obtain a reply result, and recognizing the matching degree of the question to be confirmed and the reply result through the reply recognition model to obtain a reply comprehensive result;
and determining the identity authentication result of the call user in the current conversation according to the voiceprint matching result and the reply comprehensive result.
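The final step combines the two results into one decision. A minimal sketch of one possible decision rule follows; the threshold value, function names and the simplified string form of the reply comprehensive result are illustrative assumptions, not taken from the patent:

```python
def authenticate(voiceprint_confidence: float,
                 reply_result: str,
                 voiceprint_threshold: float = 0.8) -> bool:
    """Combine the voiceprint matching result (a confidence value) with the
    reply comprehensive result (simplified here to the matched child node:
    'correct', 'incorrect' or 'unknown'). Both checks must pass -- this is
    the dual-authentication idea described in the claims."""
    voiceprint_ok = voiceprint_confidence >= voiceprint_threshold
    reply_ok = reply_result == "correct"
    return voiceprint_ok and reply_ok

# The call user passes the current round of conversation only when both match.
assert authenticate(0.93, "correct") is True
assert authenticate(0.93, "unknown") is False
assert authenticate(0.42, "correct") is False
```

A real system might instead weight the two scores or require several question rounds; the all-or-nothing rule above is only the simplest reading of "determining the identity authentication result according to the voiceprint matching result and the reply comprehensive result".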
An authentication apparatus based on knowledge-graph and voiceprint recognition, comprising:
the acquisition module is used for determining the user identification code of the call user according to the user information, and acquiring user sample voice information and a user knowledge graph associated with the user identification code; the user knowledge graph comprises graph nodes in a tree structure;
the generating module is used for inputting the user knowledge graph into a first authentication question generation model and acquiring a question to be confirmed, containing a ternary tree structure, generated by the first authentication question generation model; the question to be confirmed is generated by the first authentication question generation model from the graph nodes and a first node question template;
the conversion module is used for inputting the question to be confirmed into a preset voice conversion model, acquiring the authentication question voice converted by the voice conversion model through a voice synthesis technology, and broadcasting the authentication question voice to the call user;
the input module is used for receiving reply voice information of the call user aiming at the authentication question voice, inputting the reply voice information and the user sample voice information into a voiceprint recognition model, and simultaneously inputting the reply voice information and the question to be confirmed into a reply recognition model;
the extracting module is used for extracting the voiceprint features from the reply voice information through the voiceprint recognition model and obtaining a voiceprint matching result output by the voiceprint recognition model according to the voiceprint features; the voiceprint matching result is a confidence value for the match between the voiceprint features and the user sample voice information;
the recognition module is used for performing text recognition and intention recognition on the reply voice information through the reply recognition model to obtain a reply result, and recognizing the matching degree of the question to be confirmed and the reply result through the reply recognition model to obtain a reply comprehensive result;
and the authentication module is used for determining the identity authentication result of the call user in the current conversation according to the voiceprint matching result and the reply comprehensive result.
A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, the processor implementing the steps of the above-described authentication method based on knowledge-graph and voiceprint recognition when executing the computer program.
A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the above-mentioned authentication method based on knowledge-graph and voiceprint recognition.
According to the authentication method and apparatus, the computer device and the storage medium based on the knowledge graph and voiceprint recognition, an identity verification instruction of a call user is received and the user information in it is acquired; a user identification code of the call user is determined according to the user information, and user sample voice information and a user knowledge graph associated with the user identification code are acquired, the user knowledge graph comprising graph nodes in a tree structure; the user knowledge graph is input into a first authentication question generation model, which generates a question to be confirmed, containing a ternary tree structure, from the graph nodes and a first node question template; the question to be confirmed is input into a preset voice conversion model, which converts it into authentication question voice through a voice synthesis technology, and the authentication question voice is broadcast to the call user; reply voice information of the call user for the authentication question voice is received, the reply voice information and the user sample voice information are input into a voiceprint recognition model, and the reply voice information and the question to be confirmed are simultaneously input into a reply recognition model; the voiceprint recognition model extracts voiceprint features from the reply voice information and outputs a voiceprint matching result, a confidence value for the match between the voiceprint features and the user sample voice information; the reply recognition model performs text recognition and intention recognition on the reply voice information to obtain a reply result, and recognizes the matching degree of the question to be confirmed and the reply result to obtain a reply comprehensive result; and the identity authentication result of the call user in the current round of conversation is determined according to the voiceprint matching result and the reply comprehensive result.
In this way the invention achieves double authentication: voiceprint recognition verifies who is speaking, while the knowledge-graph question verifies what only the genuine user should know. The identity of the call user is therefore authenticated more accurately, and the security of user information is enhanced.
Drawings
In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention, and those skilled in the art can derive other drawings from them without inventive effort.
FIG. 1 is a schematic diagram of an application environment of an authentication method based on knowledge-graph and voiceprint recognition according to an embodiment of the present invention;
FIG. 2 is a flow diagram of an authentication method based on knowledge-graph and voiceprint recognition in one embodiment of the invention;
FIG. 3 is a flowchart of step S80 of the method for authentication based on knowledge-graph and voiceprint recognition in an embodiment of the invention;
FIG. 4 is a flowchart of step S20 of the method of authentication based on knowledge-graph and voiceprint recognition in one embodiment of the invention;
FIG. 5 is a flowchart of step S30 of the method for authentication based on knowledge-graph and voiceprint recognition in an embodiment of the invention;
FIG. 6 is a flowchart of step S60 of the method of authentication based on knowledge-graph and voiceprint recognition in one embodiment of the invention;
FIG. 7 is a flowchart of step S70 of the method of authentication based on knowledge-graph and voiceprint recognition in one embodiment of the invention;
FIG. 8 is a flowchart of step S90 of the method of authentication based on knowledge-graph and voiceprint recognition in one embodiment of the invention;
FIG. 9 is a functional block diagram of an authentication device based on knowledge-graph and voiceprint recognition in an embodiment of the present invention;
FIG. 10 is a schematic diagram of a computer device in an embodiment of the invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
The authentication method based on knowledge graph and voiceprint recognition provided by the invention can be applied to the application environment as shown in fig. 1, wherein a client (computer equipment) communicates with a server through a network. The client (computer device) includes, but is not limited to, various personal computers, notebook computers, smart phones, tablet computers, cameras, and portable wearable devices. The server may be implemented as a stand-alone server or as a server cluster consisting of a plurality of servers.
In an embodiment, as shown in fig. 2, an authentication method based on knowledge graph and voiceprint recognition is provided, which mainly includes the following steps S10-S80:
and S10, receiving an identity authentication instruction of a call user, and acquiring user information in the identity authentication instruction.
Understandably, the call user is a user who needs to perform identity authentication and is in a call, the identity authentication instruction is an instruction triggered by the call user who needs to perform identity authentication, and the user information is information related to the call user, such as an identity card number, a mobile phone number and the like of the call user.
S20, determining the user identification code of the call user according to the user information, and acquiring user sample voice information and a user knowledge graph associated with the user identification code; the user knowledge graph comprises graph nodes in a tree structure.
Understandably, the user identification code is a unique code identifying the call user and can be set as required. The user sample voice information is voiceprint feature data extracted from a recording the call user made of prescribed sample content, and is associated with the user identification code. The user knowledge graph is a tree-structured knowledge graph associated with the user identification code, built in triple form through knowledge fusion and relation extraction over the user data associated with that code.
In an embodiment, as shown in fig. 4, before the step S20, that is, before the acquiring of the user sample voice information and the user knowledge graph associated with the user identification code, the method includes:
s201, acquiring user data associated with the user identification code.
Understandably, the user data includes structured data and unstructured data associated with the user identification code. Structured data is information that can be represented with numbers, symbols or another uniform structure and has definite relationships that make it convenient to use; examples include credit card numbers, dates, financial amounts, telephone numbers, addresses and product names. Unstructured data conforms to no predefined model and is stored in a non-relational database; it may be textual or non-textual, such as human- or machine-generated images or videos.
S202, converting the structured data in the user data to obtain first data, and simultaneously performing text extraction on the unstructured data in the user data to obtain second data.
Understandably, the structured data is data logically expressed and realized by a two-dimensional table structure in a database on the acquisition server, stored and managed mainly in a relational database; it is converted according to preset rules to extract knowledge such as entities, events and related attributes, giving the first data. The unstructured data is the user data with the structured data removed, generally the content of, or comments on, websites visited in association with the user identification code; the second data is obtained by text extraction from it, where text extraction means entity knowledge extraction, event extraction and attribute extraction.
S203, performing knowledge fusion and relation extraction on all the first data and all the second data to obtain map nodes, constructing a user knowledge map which is associated with the user identification code and contains the map nodes according to a triple mode, and storing the user knowledge map in a block chain.
Understandably, knowledge fusion merges the same entities from different knowledge bases, that is, all identical entities in the first data and in the second data are fused (superimposed) together. Relation extraction extracts specific event or fact information from natural language text, connects two entities according to that information and establishes the relationship between them. The triple form is the RDF (Resource Description Framework) representation used in knowledge graphs, for example (Zhang San, height, 185) or (Zhang San, occupation, teacher). The user knowledge graph is then stored in a blockchain.
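The fusion of triples from the two sources can be sketched as plain (entity, attribute, value) tuples keyed by the user identification code. The function name, field names and sample data below are illustrative, not from the patent:

```python
def build_user_graph(user_id: str, first_data: list, second_data: list) -> dict:
    """Fuse triples extracted from structured data (first_data) and
    unstructured data (second_data). Identical triples are merged by set
    union -- a toy stand-in for the knowledge-fusion step."""
    triples = set(first_data) | set(second_data)   # knowledge fusion
    return {"user_id": user_id, "triples": sorted(triples)}

# RDF-style triples, e.g. (Zhang San, height, 185), (Zhang San, occupation, teacher)
graph = build_user_graph(
    "U001",
    [("Zhang San", "height", "185"), ("Zhang San", "occupation", "teacher")],
    [("Zhang San", "height", "185"), ("Zhang San", "hobby", "badminton")],
)
assert len(graph["triples"]) == 3   # the duplicate height triple is fused away
```

Real relation extraction of course involves NLP over the source text; this sketch only shows the shape of the resulting triple store.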
It is emphasized that, to further ensure the privacy and security of the user knowledge graph, the user knowledge graph may also be stored in the nodes of a blockchain.
A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms and encryption algorithms. It is essentially a decentralized database: a series of data blocks linked by cryptographic methods, each block containing a batch of network transactions and used to verify the validity (tamper resistance) of its information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer and the like. The decentralized, fully distributed DNS service provided by a blockchain can resolve domain-name queries through point-to-point data transmission between the nodes in the network, can ensure that the operating systems and firmware of important infrastructure are not tampered with, can monitor the state and integrity of software and detect tampering, and can ensure that transmitted data are not altered. Storing the user knowledge graph in a blockchain therefore protects its privacy and security.
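The tamper-evidence property described above comes from each block carrying the hash of its predecessor. A minimal, illustrative hash chain (not the patent's actual blockchain platform):

```python
import hashlib
import json

def make_block(data: dict, prev_hash: str) -> dict:
    """A block stores its payload, the previous block's hash, and its own
    hash over both -- altering any stored graph breaks every later link."""
    body = json.dumps({"data": data, "prev": prev_hash}, sort_keys=True)
    return {"data": data, "prev": prev_hash,
            "hash": hashlib.sha256(body.encode()).hexdigest()}

def chain_is_valid(chain: list) -> bool:
    """Recompute every block hash and check each back-link."""
    for i, block in enumerate(chain):
        body = json.dumps({"data": block["data"], "prev": block["prev"]},
                          sort_keys=True)
        if hashlib.sha256(body.encode()).hexdigest() != block["hash"]:
            return False
        if i > 0 and block["prev"] != chain[i - 1]["hash"]:
            return False
    return True

genesis = make_block({"graph": "user U001 knowledge graph"}, "0" * 64)
chain = [genesis,
         make_block({"graph": "user U002 knowledge graph"}, genesis["hash"])]
assert chain_is_valid(chain)
chain[0]["data"]["graph"] = "tampered"   # any edit invalidates the chain
assert not chain_is_valid(chain)
```

A production blockchain adds consensus, peer-to-peer replication and signatures on top of this linking scheme; only the linking itself is shown here.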
According to the invention, the user knowledge graph of the call user is constructed so as to extract the important and frequently used information related to the call user, which enhances the accuracy of subsequent recognition.
S30, inputting the user knowledge graph into a first authentication question generation model, and acquiring a question to be confirmed, containing a ternary tree structure, generated by the first authentication question generation model; the question to be confirmed is generated by the first authentication question generation model from the graph nodes and a first node question template.
Understandably, the first node question template is the node question template corresponding to the selected graph node; a node question template is a template, phrased as a question, generated according to the attributes of a graph node. The first authentication question generation model extracts one graph node from the user knowledge graph, determines the first node question template according to that node, and combines the node attributes with the template to generate the question to be confirmed. The question to be confirmed has a ternary tree structure formed by three child nodes, and node attributes are the contents described in triple form, such as (Zhang San, height, 185) or (Zhang San, occupation, teacher).
In an embodiment, as shown in fig. 5, the step S30, namely the acquiring of the question to be confirmed, containing the ternary tree structure, generated by the first authentication question generation model, includes:
S301, randomly acquiring a first-layer tree node from the user knowledge graph through the first authentication question generation model; the user knowledge graph includes a plurality of first-layer tree nodes.
Understandably, a first-layer tree node is a graph node directly below the user identification code in the tree structure of the knowledge graph, that is, a graph node on the first layer of the tree-structured user knowledge graph. The tree structure contains multiple levels of graph nodes, divided into first-layer tree nodes, second-layer tree nodes and so on, and the root of the user knowledge graph is the user identification code.
S302, acquiring the node question template corresponding to the node attributes in the first-layer tree node, and determining the acquired node question template as the first node question template.
Understandably, a node question template is a preset template, phrased as a question, that draws on only one or more node attributes of a tree node. By matching the node attributes with their corresponding node question template, the node question template corresponding to the node attributes in the first-layer tree node is obtained and determined as the first node question template.
S303, combining the node attributes with the first node question template through the first authentication question generation model to generate the question to be confirmed; the question to be confirmed includes three child nodes, each child node being associated with a tree node in the user knowledge graph.
Understandably, the relevant node attributes of the first-layer tree node are extracted as required by the first node question template to form the question text, and one remaining node attribute serves as the attribute answer; that node attribute is associated with the second-layer tree node corresponding to the first-layer tree node in the user knowledge graph, a second-layer tree node being a branch (a master-slave relationship) of a first-layer tree node. The three child nodes are a correct node (whose node attribute is correct), an incorrect node (whose node attribute is incorrect) and an unknown node (whose node attribute is unknown): the attribute answer is associated with the correct node, content other than the attribute answer is associated with the incorrect node, and unanswered or unknown content is associated with the unknown node.
The invention thus realizes random generation of the question to be confirmed through the first authentication question generation model, and the question to be confirmed with a ternary tree structure is formed through the creation and association of the first node question template and the three child nodes.
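The ternary structure above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the template placeholder syntax, the attribute names, and the placeholder contents of the incorrect/unknown child nodes are assumptions.

```python
from dataclasses import dataclass

@dataclass
class ChildNode:
    kind: str     # "correct", "incorrect", or "unknown"
    content: str  # content this child node is associated with

@dataclass
class QuestionToConfirm:
    text: str
    children: list

def build_question(attribute_name, attribute_value, template):
    # Fill the node question template with the node attribute name and
    # attach the three child nodes of the ternary tree structure.
    text = template.format(attr=attribute_name)
    children = [
        ChildNode("correct", attribute_value),               # the expected answer
        ChildNode("incorrect", "<anything but the answer>"),  # any other content
        ChildNode("unknown", "<no answer / don't know>"),     # unanswered content
    ]
    return QuestionToConfirm(text, children)

q = build_question("occupation", "teacher",
                   "Is your {attr} still the same as on file?")
```

Matching a later reply against `q.children` is what lets the dialogue management module decide which branch (correct/incorrect/unknown) the next round of questioning should follow.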
And S40, inputting the question to be confirmed into a preset voice conversion model, acquiring the authentication question voice converted by the voice conversion model through a voice synthesis technology, and broadcasting the authentication question voice to the call user.
Understandably, the speech synthesis technology (abbreviated TTS) refers to a technology for converting a text file into a natural-sounding Mandarin audio file in real time. The voice conversion model, a trained deep convolutional neural network model, converts the question to be confirmed into the authentication question voice through the speech synthesis technology, and the processing steps of the speech synthesis may be set as required. Preferably, the speech synthesis first performs text analysis on the input question to be confirmed, then applies a deep bidirectional long short-term memory network (abbreviated Bi-LSTM) to take semantic, syntactic, part-of-speech and other information into account, and finally obtains the authentication question voice through a vocoder, the vocoder being a speech signal codec capable of encoding the input information into sound.
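The three-stage pipeline described above (text analysis → Bi-LSTM acoustic model → vocoder) can be sketched as below. Every stage here is a stand-in: a real system would use linguistic-feature extraction, a trained Bi-LSTM acoustic model, and a neural vocoder.

```python
def text_analysis(question):
    # Tokenize and tag the question (semantics, syntax, part of speech);
    # a stand-in for real linguistic analysis.
    return [{"token": t, "pos": "UNK"} for t in question.split()]

def acoustic_model(features):
    # Map linguistic features to acoustic frames (Bi-LSTM stand-in).
    return [0.1 * len(f["token"]) for f in features]

def vocoder(frames):
    # Encode acoustic frames into an audio byte stream (stand-in codec).
    return bytes(int(f * 10) % 256 for f in frames)

def synthesize(question):
    # Full TTS pipeline: text -> features -> frames -> audio bytes.
    return vocoder(acoustic_model(text_analysis(question)))
```

The point of the sketch is the data flow, not the signal processing: the question text goes in, an audio byte stream suitable for broadcasting to the call user comes out.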
And S50, receiving reply voice information of the call user in response to the authentication question voice, inputting the reply voice information and the user sample voice information into a voiceprint recognition model, and simultaneously inputting the reply voice information and the question to be confirmed into a reply recognition model.
Understandably, after the call user's reply to the authentication question voice is received, the reply voice information is captured by recording, and the reply voice information and the user sample voice information are input into the voiceprint recognition model together. The voiceprint recognition model is a trained deep convolutional neural network model whose network structure may be set as required; for example, it may follow a GMM (Gaussian mixture model) structure, a UBM (universal background model) structure, a GMM-UBM (Gaussian mixture-universal background model) structure, or the like.
In an embodiment, before the step S50, namely before the step of inputting the reply voice information and the user sample voice information into the voiceprint recognition model, the method includes:
S501, acquiring a voice sample set; the voice sample set comprises a plurality of voice samples, each voice sample being associated with one voice tag.
Understandably, the voice sample set is a collection of voice samples, the voice samples being voice files of users collected historically; each voice sample is associated with one voice tag, the voice tag being the voiceprint feature value corresponding to the user who uttered the voice sample, and the voiceprint feature value may be information extracted manually from the voice sample.
S502, inputting the voice sample into a deep convolution neural network model containing initial parameters.
Understandably, the deep convolutional neural network model contains the initial parameters.
S503, extracting the voiceprint features in the voice sample through the deep convolutional neural network model, and obtaining a sample identification result output by the deep convolutional neural network model.
Understandably, the voiceprint features are features related to the acoustic spectrum of the voice, including timbre, duration, intensity, pitch and the like, and the sample recognition result obtained by the deep convolutional neural network model from the extracted voiceprint features is acquired.
S504, determining a loss value according to the sample recognition result and the voice tag.
Understandably, the loss value is calculated from the sample recognition result and the voice tag through the loss function of the deep convolutional neural network model; the loss function may be set as required, for example a cross-entropy loss function.
And S505, recording the converged deep convolutional neural network model as a voiceprint recognition model when the loss value reaches a preset convergence condition.
Understandably, the convergence condition may be a condition that the loss value is smaller than a set threshold, that is, when the loss value is smaller than the set threshold, the deep convolutional neural network model after convergence is recorded as a voiceprint recognition model.
In an embodiment, after the step S504, that is, after determining the loss value according to the sample recognition result and the voice tag, the method further includes:
S506, when the loss value does not reach the preset convergence condition, iteratively updating the initial parameters of the deep convolutional neural network model until the loss value reaches the preset convergence condition, and recording the converged deep convolutional neural network model as a voiceprint recognition model.
Understandably, the convergence condition may also be that the loss value becomes small and no longer decreases after 10000 iterations; that is, when the loss value is small and does not decrease further after 10000 iterations, the training is stopped and the converged deep convolutional neural network model is recorded as the voiceprint recognition model.
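The training loop of steps S501-S506 can be sketched as follows. This is a deliberately tiny stand-in: a one-parameter "model" with a mean-squared-error loss in place of the deep convolutional network and cross-entropy loss; the threshold, learning rate and iteration cap are assumed example values.

```python
def train_voiceprint_model(samples, labels, threshold=0.01, max_iters=10000):
    # S502: a stand-in "model" with a single initial parameter.
    params = 0.0
    loss = float("inf")
    for _ in range(max_iters):
        # S503: sample recognition results from the current parameters.
        preds = [params * s for s in samples]
        # S504: loss between recognition results and voice tags
        # (MSE here, standing in for the cross-entropy loss above).
        loss = sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(samples)
        # S505: stop once the convergence condition is reached.
        if loss < threshold:
            break
        # S506: otherwise iteratively update the parameters.
        grad = sum(2 * (params * s - y) * s
                   for s, y in zip(samples, labels)) / len(samples)
        params -= 0.1 * grad
    return params, loss
```

The structure mirrors the patent's steps: recognize, compute the loss against the tag, check the convergence condition, and only update the parameters when convergence has not been reached.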
S60, extracting the voiceprint features in the reply voice information through the voiceprint recognition model, and acquiring a voiceprint matching result output by the voiceprint recognition model according to the voiceprint features; the voiceprint matching result refers to a confidence value of the match between the voiceprint features and the user sample voice information.
Understandably, the voiceprint recognition model outputs a recognition result according to the voiceprint features extracted from the reply voice information, the voiceprint features being features related to the acoustic spectrum of the voice, including timbre, duration, intensity, pitch and the like. The voiceprint recognition model compares and verifies the recognition result against the user sample voice information to obtain a confidence value, the confidence value indicating the probability that the recognition result matches the user sample voice information; the voiceprint matching result is then determined from the confidence value and represents the degree of voiceprint match between the reply voice information and the user sample voice information.
In an embodiment, as shown in fig. 6, the step S60, that is, extracting the voiceprint features in the reply voice information through the voiceprint recognition model and acquiring the voiceprint matching result output by the voiceprint recognition model according to the voiceprint features, includes:
S601, obtaining the recognition result output by the voiceprint recognition model according to the extracted voiceprint features.
Understandably, the extraction process of the voiceprint recognition model includes preprocessing the reply voice information, convolving the preprocessed reply voice information according to the network structure of the voiceprint recognition model, extracting the voiceprint features through convolution, and outputting the voiceprint feature vector corresponding to those features, which constitutes the recognition result. The preprocessing may be set as required, for example voice activity detection (VAD), denoising, dereverberation, speaker segmentation, and the like.
S602, comparing and verifying the recognition result against the user sample voice information to obtain the confidence value after comparison and verification.
Understandably, the matching method used for comparison and verification may be set as required; for example, it may be a probability-statistics matching method, a vector quantization matching method, a VQ clustering matching method, or the like. Preferably, a probability-statistics matching method is used: the recognition result and the user sample voice information are compared and verified to obtain the probability value of the degree of match between them, namely the confidence value after comparison and verification.
S603, determining the voiceprint matching result according to the confidence value, wherein the voiceprint matching result represents the voiceprint matching degree between the reply voice information and the user sample voice information.
Understandably, if the confidence value is greater than or equal to a preset confidence threshold, the voiceprint matching result is determined to be a passed match; if the confidence value is less than the preset confidence threshold, the voiceprint matching result is determined to be a failed match. In both cases the voiceprint matching result carries the confidence value.
The invention thus realizes voiceprint feature extraction through the voiceprint recognition model, and compares and verifies the extracted features against the user sample voice information to obtain the voiceprint matching result, thereby improving the accuracy of voiceprint recognition.
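Steps S602-S603 can be sketched as follows. Cosine similarity between feature vectors stands in for the probability-statistics matching, and the 0.8 confidence threshold is an assumed example value; neither is specified by the patent.

```python
import math

def cosine_confidence(feature_vec, sample_vec):
    # S602 stand-in: compare the recognition result (a voiceprint feature
    # vector) with the user sample feature vector via cosine similarity.
    dot = sum(a * b for a, b in zip(feature_vec, sample_vec))
    norm = (math.sqrt(sum(a * a for a in feature_vec))
            * math.sqrt(sum(b * b for b in sample_vec)))
    return dot / norm if norm else 0.0

def voiceprint_match(confidence, threshold=0.8):
    # S603: the match passes when the confidence reaches the preset
    # confidence threshold; the result also carries the confidence value.
    return {"passed": confidence >= threshold, "confidence": confidence}
```

Carrying the raw confidence in the result (not just pass/fail) matters for step S80, where the weighted verification model needs a measurable numerical value.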
S70, performing text recognition and intention recognition on the reply voice information through the reply recognition model to obtain a reply result, and recognizing the degree of match between the question to be confirmed and the reply result through the reply recognition model to obtain a reply comprehensive result.
Understandably, the text recognition is performed through a speech recognition technology (abbreviated ASR), which analyzes the parameters of the speech features of an input voice file and recognizes a result from those parameters; the intention recognition is performed through a natural language understanding technology (abbreviated NLU), which extracts semantic features from the input voice file through lexical, syntactic and semantic analysis and derives a result from the extracted semantic features. The reply recognition model includes a reply text recognition model, a reply intention recognition model and a dialogue management module.
In an embodiment, as shown in fig. 7, the step S70, that is, performing text recognition and intention recognition on the reply voice information through the reply recognition model to obtain a reply result, and then recognizing the matching value between the question to be confirmed and the reply result through the reply recognition model to obtain a reply comprehensive result, includes:
S701, performing text recognition on the reply voice information through the reply text recognition model according to a speech recognition technology to obtain a reply text result; the reply recognition model includes a reply text recognition model, a reply intention recognition model and a dialogue management module.
Understandably, the text recognition is performed through a speech recognition technology, which analyzes the parameters of the speech features of the input voice file and recognizes a result from those parameters; the text content in the reply voice information is recognized through the reply text recognition model to obtain the reply text result, that is, the reply text result contains the textual content of the call user's answer to the authentication question voice.
S702, according to the natural language understanding technology, performing intention recognition on the reply voice information through the reply intention recognition model to obtain a reply state result.
Understandably, according to the natural language understanding technology, semantic features in the reply voice information are extracted through the reply intention recognition model, the semantic features being the textual meaning in the reply voice information and including emotion, intonation and the like; intention recognition is performed on the extracted semantic features to obtain the reply state result, which represents the emotional characteristics of the call user answering the authentication question voice, showing whether the tone of the reply voice information is positive and whether it contains impatient or bored emotion.
And S703, determining the reply text result and the reply state result as the reply result through the dialogue management module.
Understandably, the reply text result and the reply state result are combined into the reply result through the dialogue management module, the dialogue management module being a module that manages the results of multiple rounds of dialogue, including confirmation of the reply result.
S704, matching the reply text result with the child nodes in the question to be confirmed through the dialogue management module to obtain a matching value between the reply text result and the question to be confirmed.
Understandably, the dialogue management module further acquires the content in each child node and matches the reply text against that content to obtain the matching value. The matching value may be calculated by a text similarity matching algorithm, or determined by whether the reply text fully contains the matching content (that is, if the reply text includes the content in the child node, the matching value is determined to be 100%; if it does not, the matching value is determined to be 0%).
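The all-or-nothing containment rule described above is a one-liner; a text-similarity algorithm could be substituted for a graded score.

```python
def match_value(reply_text, child_content):
    # S704: 100% if the reply text contains the child node's content,
    # 0% otherwise (the simpler of the two rules the patent allows).
    return 1.0 if child_content in reply_text else 0.0
```

For example, `match_value("Yes, I am a teacher", "teacher")` yields 1.0, while a reply that never mentions the expected content yields 0.0.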
S705, obtaining the reply comprehensive result through the dialogue management module according to the matching value and the reply state result.
Understandably, the dialogue management module further evaluates the authenticity of the user identity from the reply state (including emotion and intonation) for the question to be confirmed together with the voiceprint matching result. The reply comprehensive result includes the matching value, the reply state result and a confirmation result; the confirmation result is one of correct, incorrect and unknown, and indicates whether the reply to the question to be confirmed is correct.
The invention thus realizes obtaining the reply comprehensive result through the reply recognition model based on the speech recognition technology, the natural language understanding technology and dialogue management, which can improve recognition accuracy.
And S80, determining the identity authentication result of the call user in the current conversation according to the voiceprint matching result and the reply comprehensive result.
Understandably, the voiceprint matching result and the reply comprehensive result are input into a preset weighted verification model, which converts the voiceprint matching result into a measurable numerical value and then performs weighted processing with the matching value in the reply comprehensive result to obtain a verification probability value, the verification probability value representing the probability that the call user passes verification. If the verification probability value is greater than a preset probability threshold, the identity authentication result of the call user in the current round of dialogue is determined to be that the current round of authentication is passed; otherwise, it is determined to be that the current round of authentication has failed. The identity authentication result comprises a current-round authentication identifier and a current-round authentication probability value, the current-round authentication identifier being either current round passed or current round failed.
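The weighted verification step can be sketched as below. The weights and the 0.7 probability threshold are illustrative assumptions; the patent only states that the values are preset.

```python
def weighted_verification(voiceprint_confidence, matching_value,
                          w_voice=0.6, w_reply=0.4, threshold=0.7):
    # Weighted combination of the (numerical) voiceprint matching result
    # and the matching value from the reply comprehensive result.
    probability = w_voice * voiceprint_confidence + w_reply * matching_value
    return {"round_passed": probability > threshold,
            "round_probability": probability}
```

The returned dict mirrors the identity authentication result of the round: a pass/fail identifier plus the round's authentication probability value.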
The invention thus realizes: obtaining the user information in the identity authentication instruction; determining the user identification code of the call user according to the user information, and acquiring the user sample voice information and the user knowledge graph associated with the user identification code; acquiring the question to be confirmed with a ternary tree structure generated by the first authentication question generation model according to the user knowledge graph; converting the question to be confirmed into the authentication question voice through the voice conversion model by means of a speech synthesis technology; receiving the reply voice information of the call user in response to the authentication question voice, inputting the reply voice information and the user sample voice information into the voiceprint recognition model, and simultaneously inputting the reply voice information and the question to be confirmed into the reply recognition model; extracting the voiceprint features in the reply voice information through the voiceprint recognition model to obtain the voiceprint matching result; performing text recognition and intention recognition on the reply voice information through the reply recognition model to obtain the reply result, and obtaining the reply comprehensive result through the reply recognition model; and determining the identity authentication result of the call user in the current round of dialogue according to the voiceprint matching result and the reply comprehensive result. In this way, a dual authentication effect is achieved through voiceprint recognition and knowledge-graph-based questioning, the identity of the call user is authenticated more accurately, and the security of user information is enhanced.
In an embodiment, as shown in fig. 3, after the step S80, that is, after determining the identity authentication result of the current round of dialogue according to the voiceprint matching result and the reply comprehensive result, the method includes:
and S90, inputting the question to be confirmed, the user knowledge graph and the reply result into a second authentication question generation model, and acquiring the next round of question to be confirmed generated by the second authentication question generation model through a knowledge decision method.
Understandably, the reply result includes the reply text and the reply state, and the question to be confirmed includes three child nodes. The knowledge decision method matches the reply text against each child node, obtains the tree node associated with the matched child node, and then obtains the next-layer tree node corresponding to that tree node in the user knowledge graph; a tree node is a graph node having an association relationship in the user knowledge graph, and the tree nodes include first-layer tree nodes and next-layer tree nodes.
The next round of question to be confirmed is generated by the second authentication question generation model according to the next-round graph node and a second node question template, the next-round graph node being the knowledge graph node determined from the user knowledge graph, the question to be confirmed and the reply result through the knowledge decision method.
In an embodiment, as shown in fig. 8, the step S90, namely, inputting the question to be confirmed, the user knowledge graph and the reply result into a second authentication question generation model, and acquiring a next round of question to be confirmed generated by the second authentication question generation model includes:
S901, querying the child node matched with the reply comprehensive result, and acquiring the tree node associated with the child node.
Understandably, the child node matched by the reply comprehensive result is queried according to the confirmation result in the reply comprehensive result, and then the tree node associated with that child node is acquired. In an embodiment: if the confirmation result is "unknown", the matched child node is the child node whose content is "unknown", and the associated tree node acquired may be any first-layer tree node other than the first-layer tree node corresponding to the question to be confirmed; if the confirmation result is "incorrect", the matched child node is the child node whose content is "incorrect", and the associated tree node acquired may be any first-layer tree node marked with a sensitive identifier, indicating that a more sensitive authentication question is needed to authenticate the call user; if the confirmation result is "correct", the matched child node is the child node whose content is "correct", and the associated tree node acquired may be the second-layer tree node corresponding to the first-layer tree node of the question to be confirmed, the second-layer tree node being a branch of the first-layer tree node, that is, in a master-slave relationship.
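The three-way branching of the knowledge decision rule can be sketched as below. The `graph` layout (a plain dict) and the key names are hypothetical; only the branching logic follows the description above.

```python
def next_tree_node(confirmation, current_node, graph):
    # `graph` is a hypothetical dict with "first_layer" (all first-layer
    # nodes), "sensitive" (first-layer nodes marked with the sensitive
    # identifier) and "children" (second-layer branches) entries.
    if confirmation == "correct":
        # Descend to a second-layer node under the current first-layer node.
        return graph["children"][current_node][0]
    if confirmation == "incorrect":
        # Escalate to a first-layer node marked with a sensitive identifier.
        return graph["sensitive"][0]
    # "unknown": any first-layer node other than the current one.
    return next(n for n in graph["first_layer"] if n != current_node)
```

The design choice encoded here is the patent's layer-by-layer progression: a correct answer drills deeper along the same branch, an incorrect answer escalates to a more sensitive question, and an unknown answer sidesteps to a different first-layer topic.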
S902, finding out next-layer tree nodes which are the same as the tree nodes in the user knowledge graph, determining the found next-layer tree nodes as next-round graph nodes, obtaining node attributes in the next-round graph nodes, and determining the node attributes as next-round node attributes.
Understandably, the next-layer tree node identical to the acquired tree node is found in the user knowledge graph and determined as the next-round graph node, and the node attributes in the next-round graph node are acquired and recorded as the next-round node attributes.
And S903, acquiring the node problem template corresponding to the node attribute of the next round, and determining the acquired node problem template as a second node problem template.
Understandably, the node question template is a preset template that extracts one or more node attributes from a tree node and poses them in question form; by matching the next-round node attributes with their corresponding node question template, the node question template corresponding to the next-round node attributes is acquired and determined as the second node question template.
And S904, combining the next-round node attributes with the second node question template through the second authentication question generation model to generate the next round of question to be confirmed.
Understandably, the content of the next-round node attributes is filled into the corresponding positions in the second node question template to generate the next round of question to be confirmed. For example: the question to be confirmed is "Is your occupation teacher?"; if the call user replies "Yes, I am a teacher", the confirmation result in the reply comprehensive result is "correct", the next-layer tree node obtained through the user knowledge graph is (Zhang San, occupation, junior-middle-school teacher), and the corresponding second node question template is "Is your occupation XXXX?"; the next round of question to be confirmed is then "Is your occupation a junior-middle-school teacher?".
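The template fill in the worked example reduces to a string substitution; the "XXXX" placeholder convention is taken from the example above and is not mandated by the patent.

```python
def fill_template(template, attribute_value):
    # S904: fill the next-round node attribute into the slot of the
    # second node question template.
    return template.replace("XXXX", attribute_value)

question = fill_template("Is your occupation XXXX?",
                         "a junior-middle-school teacher")
```

Running the worked example produces exactly the next round of question to be confirmed described above.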
The invention thus realizes: querying the child node matched by the reply text and acquiring the associated tree node; finding the next-layer tree node identical to that tree node in the user knowledge graph, determining it as the next-round graph node, and acquiring the next-round node attributes; acquiring the node question template corresponding to the next-round node attributes and determining it as the second node question template; and combining the next-round node attributes with the second node question template through the second authentication question generation model to generate the next round of question to be confirmed. In this way, based on the association relationships among tree nodes in the knowledge graph, the next round of question to be confirmed is generated in a layer-by-layer progressive manner, the identity authentication of the call user is performed more accurately, and recognition accuracy and reliability are improved.
And S100, inputting the reply state result in the reply comprehensive result and the next round of question to be confirmed into a preset personalized voice conversion model, acquiring the next round of authentication question voice converted by the personalized voice conversion model through a speech synthesis technology, and broadcasting the next round of authentication question voice to the call user.
Understandably, the personalized voice conversion model is a trained neural network model. It extracts the reply state from the reply result, identifies the language style, speaking speed and tone matching that reply state, and fuses them into the next round of question to be confirmed through the speech synthesis technology to synthesize the next round of authentication question voice, which is voice data personalized for the call user; the next round of authentication question voice is then broadcast to the call user.
And S110, receiving the next round of reply voice information of the call user in response to the next round of authentication question voice, inputting the next round of reply voice information and the user sample voice information into the voiceprint recognition model, and simultaneously inputting the next round of reply voice information and the next round of question to be confirmed into the reply recognition model.
Understandably, the next round of reply voice information, namely the voice information of the call user answering the next round of authentication question, is captured by recording; the next round of reply voice information and the user sample voice information are input into the voiceprint recognition model together, and the next round of reply voice information and the next round of question to be confirmed are input into the reply recognition model.
S120, acquiring the next round voiceprint matching result output by the voiceprint recognition model according to the voiceprint features extracted from the next round of reply voice information; the next round voiceprint matching result refers to the confidence value of the match between the voiceprint features in the next round of reply voice information and the user sample voice information.
Understandably, the voiceprint features in the next round of reply voice information are extracted through the voiceprint recognition model, and recognition is performed on the extracted features to obtain the next round voiceprint matching result, namely the confidence value of the match between those voiceprint features and the user sample voice information.
And S130, performing text recognition and intention recognition on the next round of reply voice information through the reply recognition model to obtain a next round of reply result, and recognizing the matching degree of the next round of questions to be confirmed and the next round of reply result through the reply recognition model to obtain a next round of reply comprehensive result.
Understandably, performing text recognition and intention recognition on the next round of reply voice information through the reply recognition model, outputting the next round of reply result, and recognizing the matching degree of the next round of questions to be confirmed and the next round of reply result through the reply recognition model to obtain the next round of reply comprehensive result.
And S140, determining the identity authentication result of the next round of conversation according to the voiceprint matching result of the next round and the reply comprehensive result of the next round.
Understandably, the next round voiceprint matching result and the next round reply comprehensive result are input into the weighted verification model, and the weighted verification model determines the identity authentication result of the next round of dialogue.
S150, when the preset round of conversation is detected to be completed, obtaining the identity authentication result of each round of conversation, and determining the final identity authentication result of the call user according to all the identity authentication results; the final identity authentication result refers to the identity authentication result of the call user determined through multiple rounds of conversations.
Understandably, when it is detected that the preset number of rounds of dialogue has not been reached, steps S90 to S140 are repeated until all rounds of the preset number are completed. The preset number of rounds may be set as required, for example three rounds, or two rounds plus additional rounds decided according to the identity authentication results of the current and next rounds of dialogue. When it is detected that all rounds of the preset number are completed, whether the final identity authentication result of the call user is a pass is determined according to the identity authentication result obtained in each round of dialogue.
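The multi-round termination and combination logic of S150 can be sketched as below. Majority-pass is an assumed combination rule; the patent only says the final result is determined from all the per-round results, and leaves the exact rule (and the preset number of rounds) to configuration.

```python
def final_authentication(round_results, preset_rounds=3):
    # Each entry in round_results is the identity authentication result of
    # one round of dialogue, e.g. {"round_passed": True}.
    if len(round_results) < preset_rounds:
        return None  # keep repeating S90-S140 until all rounds are done
    passed = sum(1 for r in round_results if r["round_passed"])
    return passed > len(round_results) / 2  # assumed majority-pass rule
```

Returning `None` models "the preset number of rounds is not yet reached, continue the loop"; a boolean models the final identity authentication result.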
The invention thus realizes: inputting the question to be confirmed, the user knowledge graph and the reply comprehensive result into the second authentication question generation model, and acquiring the next round of question to be confirmed generated by the second authentication question generation model through the knowledge decision method; inputting the reply state result in the reply comprehensive result and the next round of question to be confirmed into the preset personalized voice conversion model, acquiring the next round of authentication question voice converted by the personalized voice conversion model through the speech synthesis technology, and broadcasting it to the call user; receiving the next round of reply voice information of the call user in response to the next round of authentication question voice, inputting it together with the user sample voice information into the voiceprint recognition model, and simultaneously inputting it together with the next round of question to be confirmed into the reply recognition model; acquiring the next round voiceprint matching result output by the voiceprint recognition model according to the voiceprint features extracted from the next round of reply voice information; performing text recognition and intention recognition on the next round of reply voice information through the reply recognition model to obtain the next round of reply result, and recognizing the degree of match between the next round of question to be confirmed and the next round of reply result to obtain the next round of reply comprehensive result; determining the identity authentication result of the next round of dialogue according to the next round voiceprint matching result and the next round reply comprehensive result; and when it is detected that the preset number of rounds of dialogue is completed, obtaining the identity authentication result of each round and determining the final identity authentication result of the call user according to all the identity authentication results, the final identity authentication result being the identity authentication result of the call user determined through multiple rounds of dialogue.
According to the invention, the next round of question to be confirmed generated by the second authentication question generation model is obtained from the question to be confirmed, the user knowledge graph and the reply comprehensive result through a knowledge decision method; through a voice synthesis technology, the personalized voice conversion model converts the next round of question to be confirmed into the next round of authentication question voice; the next round of reply voice information is received, the next round of voiceprint matching result is identified through the voiceprint recognition model, and the next round of reply comprehensive result is identified through the reply recognition model; finally, the identity authentication result of the next round of conversation is determined according to the next round of voiceprint matching result and the next round of reply comprehensive result, and after the preset rounds of conversation are completed, whether the call user passes identity authentication is determined according to the identity authentication result of each round of conversation.
In one embodiment, an authentication device based on knowledge graph and voiceprint recognition is provided, which corresponds one-to-one to the authentication method based on knowledge graph and voiceprint recognition in the above embodiment. As shown in fig. 9, the authentication device based on knowledge graph and voiceprint recognition includes a receiving module 11, an obtaining module 12, a generating module 13, a converting module 14, an inputting module 15, an extracting module 16, a recognizing module 17, and an authenticating module 18. The functional modules are explained in detail as follows:
the receiving module 11 is configured to receive an authentication instruction of a call user, and acquire user information in the authentication instruction;
an obtaining module 12, configured to determine, according to the user information, a user identification code of the call user, and obtain user sample tone information and a user knowledge graph associated with the user identification code; the user knowledge graph comprises graph nodes with a tree structure;
the generating module 13 is configured to input the user knowledge graph into a first authentication problem generating model, and acquire a to-be-confirmed problem containing a ternary tree structure generated by the first authentication problem generating model; the problem to be confirmed is generated by the first authentication problem generation model according to the graph nodes and the first node problem template;
the conversion module 14 is configured to input the question to be confirmed into a preset voice conversion model, acquire, through a voice synthesis technology, the authentication question voice converted by the voice conversion model, and broadcast the authentication question voice to the call user;
an input module 15, configured to receive reply voice information of the call user for the authentication question voice, input the reply voice information and the user sample tone information into a voiceprint recognition model, and simultaneously input the reply voice information and the question to be confirmed into a reply recognition model;
an extraction module 16, configured to extract a voiceprint feature in the reply voice message through the voiceprint recognition model, and obtain a voiceprint matching result output by the voiceprint recognition model according to the voiceprint feature; the voiceprint matching result refers to a confidence value of matching of the voiceprint features and the user sample tone information;
the recognition module 17 is configured to perform text recognition and intention recognition on the reply voice message through the reply recognition model to obtain a reply result, and recognize the matching degree between the question to be confirmed and the reply result through the reply recognition model to obtain a reply comprehensive result;
and the authentication module 18 is used for determining the identity authentication result of the call user in the current conversation according to the voiceprint matching result and the reply comprehensive result.
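The authentication module's decision — combining the voiceprint matching result (a confidence value) with the reply comprehensive result (matching degree plus reply state) — could be sketched as follows. The thresholds and the conjunction of the two checks are illustrative assumptions; the patent does not fix a particular decision rule:

```python
def authenticate_round(voiceprint_confidence, reply_match, reply_state_ok,
                       voiceprint_threshold=0.8, match_threshold=0.7):
    """Determine the identity authentication result of the current
    conversation from the voiceprint matching result and the reply
    comprehensive result. Thresholds are illustrative."""
    # Voiceprint check: confidence that the reply voice matches the
    # user sample tone information must clear a threshold.
    voiceprint_ok = voiceprint_confidence >= voiceprint_threshold
    # Reply check: the intent state must be acceptable and the reply
    # must match the question to be confirmed closely enough.
    reply_ok = reply_state_ok and reply_match >= match_threshold
    return voiceprint_ok and reply_ok
```

In this reading, a round passes only when both the voiceprint and the content of the answer check out, which is what lets the scheme catch an impostor who knows the answers but has the wrong voice, and vice versa.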
In one embodiment, the authentication module 18 includes:
the generating unit is used for inputting the problem to be confirmed, the user knowledge graph and the reply result into a second authentication problem generating model and acquiring a next round of problem to be confirmed generated by the second authentication problem generating model through a knowledge decision method;
the personalized unit is used for inputting a reply state result in the reply result and the next round of questions to be confirmed into a preset personalized voice conversion model, acquiring the next round of authentication question voice obtained by the personalized voice conversion model conversion through a voice synthesis technology, and broadcasting the next round of authentication question voice to the call user;
a receiving unit, configured to receive the next round of reply voice information of the call user in reply to the next round of authentication question voice, input the next round of reply voice information and the user sample tone information into the voiceprint recognition model, and simultaneously input the next round of reply voice information and the next round of question to be confirmed into the reply recognition model;
a first obtaining unit, configured to obtain a next round of voiceprint matching result output by the voiceprint recognition model according to the voiceprint features extracted from the next round of reply voice information; the next round of voiceprint matching result refers to a confidence value of matching between the voiceprint features in the next round of reply voice information and the user sample tone information;
the recognition unit is used for performing text recognition and intention recognition on the next round of reply voice information through the reply recognition model to obtain a next round of reply result, and recognizing the matching degree of the next round of questions to be confirmed and the next round of reply result through the reply recognition model to obtain a next round of reply comprehensive result;
the determining unit is used for determining the identity authentication result of the next round of conversation according to the next round of voiceprint matching result and the next round of reply comprehensive result;
the final authentication unit is used for obtaining the identity authentication result of each round of conversation when the preset round of conversation is detected to be completed, and determining the final identity authentication result of the call user according to all the identity authentication results; the final identity authentication result refers to the identity authentication result of the call user determined through multiple rounds of conversations.
In one embodiment, the obtaining module 12 includes:
a second acquisition unit configured to acquire user data associated with the user identification code;
the conversion unit is used for converting structured data in the user data to obtain first data and extracting text from unstructured data in the user data to obtain second data;
the building unit is used for performing knowledge fusion and relation extraction on all the first data and all the second data to obtain map nodes, building a user knowledge map which is associated with the user identification code and contains the map nodes according to a triple mode, and storing the user knowledge map in a block chain.
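The building unit's triple-based construction can be sketched minimally as below. The dictionary representation and the sample attributes are illustrative; the patent only requires that knowledge fusion and relation extraction yield (subject, relation, object) triples that are assembled into a tree-structured user knowledge graph:

```python
def build_user_knowledge_graph(triples):
    """Build a tree-structured user knowledge graph from
    (subject, relation, object) triples produced by knowledge
    fusion and relation extraction."""
    graph = {}
    for subject, relation, obj in triples:
        # Each subject becomes a graph node; each relation becomes
        # a node attribute pointing at a child value.
        graph.setdefault(subject, {})[relation] = obj
    return graph

# Hypothetical user data after fusion/extraction:
triples = [
    ("user_001", "home_address", "Shenzhen"),
    ("user_001", "policy_type", "life insurance"),
    ("user_001", "beneficiary", "spouse"),
]
graph = build_user_knowledge_graph(triples)
```

In the full system the resulting graph would be keyed by the user identification code and stored in the block chain, as the unit describes.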
In one embodiment, the generating module 13 includes:
a random unit, configured to randomly obtain a first-layer tree node from the user knowledge graph through the first authentication problem model; the user knowledge-graph comprises a plurality of the first-level tree nodes;
a third obtaining unit, configured to obtain a node problem template corresponding to a node attribute in the first-layer tree node, and determine the obtained node problem template as the first node problem template;
a combining unit, configured to combine the node attribute with the first node problem template through the first authentication problem model, and generate the problem to be confirmed; the problem to be confirmed includes three child nodes, the child nodes being associated with a tree node in the user knowledge graph.
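One plausible reading of the generating module — randomly pick a first-layer tree node, look up the question template matching its attribute, and combine them into a question to be confirmed with three child nodes — is sketched below. Treating the three child nodes as the true attribute value plus two distractors is an assumption for illustration; the template text and data are likewise hypothetical:

```python
import random

def generate_question(tree_node, templates, distractors):
    """Combine a node attribute with its node question template to
    form a question to be confirmed with three child nodes: here,
    the true attribute value plus two distractor values."""
    attribute, value = tree_node
    template = templates[attribute]  # the first node question template
    options = [value] + random.sample(distractors[attribute], 2)
    random.shuffle(options)  # do not always place the answer first
    return {"text": template.format(*options),
            "children": options,   # the three child nodes
            "answer": value}

templates = {"home_address": "Is your home address {0}, {1}, or {2}?"}
distractors = {"home_address": ["Beijing", "Shanghai", "Guangzhou"]}
question = generate_question(("home_address", "Shenzhen"),
                             templates, distractors)
```

Each child node stays associated with a tree node in the user knowledge graph, which is what later allows the knowledge decision method to descend one layer for the next round.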
In one embodiment, the extraction module 16 includes:
the fourth acquisition unit is used for acquiring the recognition result output by the voiceprint recognition model according to the extracted voiceprint features;
the comparison unit is used for comparing and verifying the identification result and the user sample tone information to obtain the confidence value after comparison and verification;
and the first output unit is used for determining the voiceprint matching result according to the confidence value, where the voiceprint matching result represents the voiceprint matching degree between the reply voice information and the user sample tone information.
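A minimal sketch of the comparison step: score the extracted voiceprint features against the user sample tone information and return a confidence value in [0, 1]. Cosine similarity is one common choice for comparing speaker embeddings, but the patent does not fix a particular metric, so this is an assumption:

```python
import math

def voiceprint_confidence(reply_features, sample_features):
    """Compare voiceprint features extracted from the reply voice
    information against the user sample tone information and return
    a confidence value (cosine similarity, as an illustration)."""
    dot = sum(a * b for a, b in zip(reply_features, sample_features))
    norm = (math.sqrt(sum(a * a for a in reply_features))
            * math.sqrt(sum(b * b for b in sample_features)))
    # Guard against zero-length feature vectors.
    return dot / norm if norm else 0.0
```

The confidence value returned here is what the first output unit would then threshold into the voiceprint matching result.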
In one embodiment, the identification module 17 comprises:
the text recognition unit is used for performing text recognition on the reply voice information through the reply text recognition model according to a voice recognition technology to obtain a reply text result; the reply recognition model comprises a reply text recognition model, a reply intention recognition model and a dialogue management module;
the intention recognition unit is used for carrying out intention recognition on the reply voice information through the reply intention recognition model according to a natural language understanding technology to obtain a reply state result;
the management unit is used for determining the reply text result and the reply state result as the reply result through the conversation management module;
the matching unit is used for matching the reply text with the child nodes in the question to be confirmed through the dialogue management module to obtain a matching value of the reply text and the question to be confirmed;
and the second output unit is used for obtaining the reply comprehensive result through the dialogue management module according to the matching value and the reply state result.
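The dialogue-management step — match the reply text against the child nodes of the question to be confirmed, then combine the matching value with the reply state result — might look like the following. Substring matching and the field names are illustrative simplifications of what a dialogue management module would do:

```python
def reply_comprehensive_result(reply_text, reply_state, question_children):
    """Match the reply text with the child nodes in the question to
    be confirmed and combine the matching value with the reply
    state result into a reply comprehensive result."""
    # A child node "matches" if it appears in the recognized text.
    matched = [c for c in question_children if c in reply_text]
    match_value = 1.0 if matched else 0.0
    return {"match_value": match_value,
            "matched_child": matched[0] if matched else None,
            "reply_state": reply_state}

result = reply_comprehensive_result(
    "My address is Shenzhen", "confirm",
    ["Beijing", "Shenzhen", "Shanghai"])
```

The `matched_child` field is the hook the next-round generation relies on: it identifies which tree node in the user knowledge graph the conversation should descend from.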
In one embodiment, the generating unit includes:
the query subunit is used for querying the child nodes matched with the reply comprehensive result and acquiring the tree nodes associated with the child nodes;
the searching subunit is used for searching a next-layer tree node which is the same as the tree node in the user knowledge graph, determining the searched next-layer tree node as the next-round graph node, acquiring the node attribute in the next-round graph node, and determining the node attribute as the next-round node attribute;
the acquiring subunit is configured to acquire the node problem template corresponding to the node attribute of the next round, and determine the acquired node problem template as a second node problem template;
and the synthesizing subunit is used for combining the attribute of the next round of nodes with the second node problem template through the second authentication problem model to generate the next round of problems to be confirmed.
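The four subunits above amount to one descent in the graph: find the tree node associated with the matched child, take the same next-layer tree node in the user knowledge graph, and combine its attribute with the second node question template. A sketch under assumed data structures (the nested-dict graph and template text are hypothetical):

```python
def next_round_question(graph, matched_child, templates):
    """Knowledge-decision step: descend from the matched child node
    to its next-layer tree node and combine that node's attribute
    with the second node question template."""
    children = graph.get(matched_child, {})
    if not children:
        return None  # no deeper node: nothing left to ask
    # The next-round graph node and its node attribute.
    attribute, value = next(iter(children.items()))
    template = templates[attribute]  # the second node question template
    return template.format(value)

graph = {"Shenzhen": {"district": "Nanshan"}}
templates = {"district": "Is your district {0}?"}
question = next_round_question(graph, "Shenzhen", templates)
```

Because each round is anchored to the child node the user actually confirmed, successive questions probe progressively deeper personal knowledge, which is the point of the knowledge decision method.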
For specific limitations of the authentication device based on knowledge graph and voiceprint recognition, reference may be made to the above limitations of the authentication method based on knowledge graph and voiceprint recognition, which are not repeated here. Each module in the above authentication device based on knowledge graph and voiceprint recognition may be implemented wholly or partially by software, hardware, or a combination thereof. The modules may be embedded in hardware form in, or independent of, a processor in the computer device, or may be stored in software form in a memory in the computer device, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which may be a server, and its internal structure diagram may be as shown in fig. 10. The computer device includes a processor, a memory, a network interface, and a database connected by a system bus. Wherein the processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device comprises a nonvolatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program, and a database. The internal memory provides an environment for the operation of an operating system and computer programs in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program is executed by a processor to implement a method of authentication based on knowledge-graph and voiceprint recognition.
In one embodiment, a computer device is provided, which includes a memory, a processor, and a computer program stored on the memory and executable on the processor, and when the processor executes the computer program, the processor implements the method for authentication based on knowledge graph and voiceprint recognition in the above embodiments.
In one embodiment, a computer-readable storage medium is provided, on which a computer program is stored, which, when executed by a processor, implements the method of authentication based on knowledge-graph and voiceprint recognition in the above embodiments.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program instructing relevant hardware; the computer program can be stored in a non-volatile computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. Any reference to memory, storage, databases, or other media used in embodiments provided herein may include non-volatile and/or volatile memory. Non-volatile memory can include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-mentioned division of the functional units and modules is illustrated, and in practical applications, the above-mentioned function distribution may be performed by different functional units and modules according to needs, that is, the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-mentioned functions.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present invention, and not for limiting the same; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present invention, and are intended to be included within the scope of the present invention.

Claims (10)

1. An authentication method based on knowledge graph and voiceprint recognition is characterized by comprising the following steps:
receiving an identity verification instruction of a call user, and acquiring user information in the identity verification instruction;
determining a user identification code of the call user according to the user information, and acquiring user sample tone information and a user knowledge graph associated with the user identification code; the user knowledge graph comprises graph nodes with a tree structure;
inputting the user knowledge graph into a first authentication problem generation model, and acquiring a problem to be confirmed which is generated by the first authentication problem generation model and contains a ternary tree structure; the problem to be confirmed is generated by the first authentication problem generation model according to the graph nodes and the first node problem template;
inputting the question to be confirmed into a preset voice conversion model, acquiring authentication question voice converted by the voice conversion model through a voice synthesis technology, and broadcasting the authentication question voice to the call user;
receiving reply voice information of the call user aiming at the authentication question voice, inputting the reply voice information and the user sample voice information into a voiceprint recognition model, and simultaneously inputting the reply voice information and the question to be confirmed into a reply recognition model;
extracting voiceprint features in the reply voice message through the voiceprint recognition model, and obtaining a voiceprint matching result output by the voiceprint recognition model according to the voiceprint features; the voiceprint matching result refers to a confidence value of matching of the voiceprint features and the user sample tone information;
performing text recognition and intention recognition on the reply voice information through the reply recognition model to obtain a reply result, and recognizing the matching degree of the question to be confirmed and the reply result through the reply recognition model to obtain a reply comprehensive result;
and determining the identity authentication result of the call user in the current conversation according to the voiceprint matching result and the reply comprehensive result.
2. The method for authentication based on knowledge graph and voiceprint recognition according to claim 1, wherein after determining the identity authentication result of the call user in the current conversation according to the voiceprint matching result and the reply comprehensive result, the method comprises:
inputting the question to be confirmed, the user knowledge graph and the reply result into a second authentication question generation model, and acquiring a next round of question to be confirmed generated by the second authentication question generation model through a knowledge decision method;
inputting a reply state result in the reply result and the next round of questions to be confirmed into a preset personalized voice conversion model, acquiring the next round of authentication question voice obtained by the personalized voice conversion model conversion through a voice synthesis technology, and broadcasting the next round of authentication question voice to the call user;
receiving the next round of reply voice information of the call user in reply to the next round of authentication question voice, inputting the next round of reply voice information and the user sample tone information into the voiceprint recognition model, and simultaneously inputting the next round of reply voice information and the next round of question to be confirmed into the reply recognition model;
acquiring a next round of voiceprint matching result output by the voiceprint recognition model according to the voiceprint features extracted from the next round of reply voice information; the next round of voiceprint matching result refers to a confidence value of matching between the voiceprint features in the next round of reply voice information and the user sample tone information;
performing text recognition and intention recognition on the next round of reply voice information through the reply recognition model to obtain a next round of reply result, and recognizing the matching degree of the next round of questions to be confirmed and the next round of reply result through the reply recognition model to obtain a next round of reply comprehensive result;
determining the identity authentication result of the next round of conversation according to the voiceprint matching result of the next round and the reply comprehensive result of the next round;
when the preset number of conversations is detected to be completed, obtaining the identity authentication result of each conversation, and determining the final identity authentication result of the call user according to all the identity authentication results; the final identity authentication result refers to the identity authentication result of the call user determined through multiple rounds of conversations.
3. The method of claim 1, wherein the obtaining the user sample tone information and the user knowledge graph associated with the user identification code comprises:
acquiring user data associated with the user identification code;
converting structured data in the user data to obtain first data, and simultaneously performing text extraction on unstructured data in the user data to obtain second data;
and performing knowledge fusion and relation extraction on all the first data and all the second data to obtain map nodes, constructing a user knowledge map which is associated with the user identification code and contains the map nodes in a triple mode, and storing the user knowledge map in a block chain.
4. The method of claim 1, wherein the acquiring the question to be confirmed which is generated by the first authentication question generation model and contains a ternary tree structure comprises:
randomly acquiring a first layer of tree nodes from the user knowledge graph through the first authentication problem model; the user knowledge-graph comprises a plurality of the first-level tree nodes;
acquiring a node problem template corresponding to the node attribute in the first-layer tree node, and determining the acquired node problem template as the first node problem template;
combining the node attribute with the first node problem template through the first authentication problem model to generate the problem to be confirmed; the problem to be confirmed includes three child nodes, the child nodes being associated with a tree node in the user knowledge graph.
5. The method for authenticating based on knowledge graph and voiceprint recognition according to claim 1, wherein the extracting, by the voiceprint recognition model, the voiceprint feature in the reply voice message and obtaining the voiceprint matching result output by the voiceprint recognition model according to the voiceprint feature comprises:
acquiring a recognition result output by the voiceprint recognition model according to the extracted voiceprint features;
comparing and verifying the identification result and the user sample tone information to obtain a confidence value after comparison and verification;
and determining the voiceprint matching result according to the confidence value, wherein the voiceprint matching result represents the voiceprint matching degree between the reply voice information and the user sample tone information.
6. The method for authentication based on knowledge graph and voiceprint recognition according to claim 4, wherein the performing text recognition and intention recognition on the reply voice information through the reply recognition model to obtain a reply result, and recognizing the matching degree between the question to be confirmed and the reply result through the reply recognition model to obtain a reply comprehensive result, comprises:
according to a voice recognition technology, performing text recognition on the reply voice information through the reply text recognition model to obtain a reply text result; the reply recognition model comprises a reply text recognition model, a reply intention recognition model and a dialogue management module;
according to a natural language understanding technology, performing intention recognition on the reply voice information through the reply intention recognition model to obtain a reply state result;
determining the reply text result and the reply state result as the reply result through the dialogue management module;
matching the reply text with the child nodes in the question to be confirmed through the dialogue management module to obtain a matching value of the reply text and the question to be confirmed;
and obtaining the reply comprehensive result through the dialogue management module according to the matching value and the reply state result.
7. The method of claim 4, wherein the inputting the question to be confirmed, the user knowledge graph and the reply result into a second authentication question generation model, and obtaining a next round of question to be confirmed generated by the second authentication question generation model comprises:
inquiring the child nodes matched with the reply comprehensive result, and acquiring tree nodes associated with the child nodes;
searching a next-layer tree node which is the same as the tree node in the user knowledge graph, determining the searched next-layer tree node as the next-round graph node, acquiring a node attribute in the next-round graph node, and determining the node attribute as the next-round node attribute;
acquiring the node problem template corresponding to the next round of node attributes, and determining the acquired node problem template as a second node problem template;
and combining the attribute of the next round of nodes with the second node problem template through the second authentication problem model to generate the next round of problems to be confirmed.
8. An authentication apparatus based on knowledge-graph and voiceprint recognition, comprising:
the receiving module is used for receiving an identity authentication instruction of a call user and acquiring user information in the identity authentication instruction;
the acquisition module is used for determining the user identification code of the call user according to the user information and acquiring user sample tone information and a user knowledge graph associated with the user identification code; the user knowledge graph comprises graph nodes with a tree structure;
the generating module is used for inputting the user knowledge graph into a first authentication problem generating model and acquiring a problem to be confirmed which is generated by the first authentication problem generating model and contains a ternary tree structure; the problem to be confirmed is generated by the first authentication problem generation model according to the graph nodes and the first node problem template;
the conversion module is used for inputting the question to be confirmed into a preset voice conversion model, acquiring the authentication question voice converted by the voice conversion model through a voice synthesis technology, and broadcasting the authentication question voice to the call user;
the input module is used for receiving reply voice information of the call user aiming at the authentication question voice, inputting the reply voice information and the user sample voice information into a voiceprint recognition model, and simultaneously inputting the reply voice information and the question to be confirmed into a reply recognition model;
the extracting module is used for extracting the voiceprint features in the reply voice message through the voiceprint recognition model and obtaining a voiceprint matching result output by the voiceprint recognition model according to the voiceprint features; the voiceprint matching result refers to a confidence value of matching of the voiceprint features and the user sample tone information;
the recognition module is used for performing text recognition and intention recognition on the reply voice information through the reply recognition model to obtain a reply result, and recognizing the matching degree of the question to be confirmed and the reply result through the reply recognition model to obtain a reply comprehensive result;
and the authentication module is used for determining the identity authentication result of the call user in the current conversation according to the voiceprint matching result and the reply comprehensive result.
9. A computer device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the method of authentication based on knowledge-graph and voiceprint recognition according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium, in which a computer program is stored, which, when being executed by a processor, carries out a method of authentication based on a knowledge graph and voiceprint recognition according to any one of claims 1 to 7.
CN202010723015.6A 2020-07-24 2020-07-24 Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition Active CN111883140B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010723015.6A CN111883140B (en) 2020-07-24 2020-07-24 Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010723015.6A CN111883140B (en) 2020-07-24 2020-07-24 Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition

Publications (2)

Publication Number Publication Date
CN111883140A true CN111883140A (en) 2020-11-03
CN111883140B CN111883140B (en) 2023-07-21

Family

ID=73201347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010723015.6A Active CN111883140B (en) 2020-07-24 2020-07-24 Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition

Country Status (1)

Country Link
CN (1) CN111883140B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5897616A (en) * 1997-06-11 1999-04-27 International Business Machines Corporation Apparatus and methods for speaker verification/identification/classification employing non-acoustic and/or acoustic models and databases
CN102457845A (en) * 2010-10-14 2012-05-16 阿里巴巴集团控股有限公司 Method, equipment and system for authenticating identity by wireless service
CN102737634A (en) * 2012-05-29 2012-10-17 百度在线网络技术(北京)有限公司 Authentication method and device based on voice
WO2016015687A1 (en) * 2014-07-31 2016-02-04 腾讯科技(深圳)有限公司 Voiceprint verification method and device
CN110134795A (en) * 2019-04-17 2019-08-16 深圳壹账通智能科技有限公司 Generate method, apparatus, computer equipment and the storage medium of validation problem group
CN110164455A (en) * 2018-02-14 2019-08-23 阿里巴巴集团控股有限公司 Device, method and the storage medium of user identity identification
CN110931016A (en) * 2019-11-15 2020-03-27 深圳供电局有限公司 Voice recognition method and system for offline quality inspection
CN111046133A (en) * 2019-10-29 2020-04-21 平安科技(深圳)有限公司 Question-answering method, question-answering equipment, storage medium and device based on atlas knowledge base

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11405180B2 (en) * 2019-01-15 2022-08-02 Fisher-Rosemount Systems, Inc. Blockchain-based automation architecture cybersecurity
US11960473B2 (en) 2019-01-15 2024-04-16 Fisher-Rosemount Systems, Inc. Distributed ledgers in process control systems
CN112530438A (en) * 2020-11-27 2021-03-19 贵州电网有限责任公司 Identity authentication method based on knowledge graph assisted voiceprint recognition
CN112530438B (en) * 2020-11-27 2023-04-07 贵州电网有限责任公司 Identity authentication method based on knowledge graph assisted voiceprint recognition
CN114363277A (en) * 2020-12-31 2022-04-15 万翼科技有限公司 Intelligent chatting method and device based on social relationship and related products
CN112951215A (en) * 2021-04-27 2021-06-11 平安科技(深圳)有限公司 Intelligent voice customer service answering method and device and computer equipment
CN113314125A (en) * 2021-05-28 2021-08-27 深圳市展拓电子技术有限公司 Voiceprint identification method, system and memory for monitoring room interphone
CN113781059A (en) * 2021-11-12 2021-12-10 百融至信(北京)征信有限公司 Identity authentication anti-fraud method and system based on intelligent voice
CN114615062A (en) * 2022-03-14 2022-06-10 河南应用技术职业学院 Computer network engineering safety control system

Also Published As

Publication number Publication date
CN111883140B (en) 2023-07-21

Similar Documents

Publication Publication Date Title
CN111883140B (en) Authentication method, device, equipment and medium based on knowledge graph and voiceprint recognition
CN111858892B (en) Voice interaction method, device, equipment and medium based on knowledge graph
US10832686B2 (en) Method and apparatus for pushing information
KR101963993B1 (en) Self-learning identification system and method based on dynamic password voice
CN107395352B (en) Personal identification method and device based on voiceprint
US10777207B2 (en) Method and apparatus for verifying information
CN106373575B (en) User voiceprint model construction method, device and system
Mukhopadhyay et al. All your voices are belong to us: Stealing voices to fool humans and machines
US8812319B2 (en) Dynamic pass phrase security system (DPSS)
US9979721B2 (en) Method, server, client and system for verifying verification codes
US11127399B2 (en) Method and apparatus for pushing information
CN112256825A (en) Medical field multi-turn dialogue intelligent question-answering method and device and computer equipment
CN109428719A (en) Identity authentication method, device and equipment
CN113724695B (en) Electronic medical record generation method, device, equipment and medium based on artificial intelligence
CN109462482B (en) Voiceprint recognition method, voiceprint recognition device, electronic equipment and computer readable storage medium
JP7123871B2 (en) Identity authentication method, identity authentication device, electronic device and computer-readable storage medium
CN112468659B (en) Quality evaluation method, device, equipment and storage medium applied to telephone customer service
CN112632248A (en) Question answering method, device, computer equipment and storage medium
CN113873088B (en) Interactive method and device for voice call, computer equipment and storage medium
WO2020057014A1 (en) Dialogue analysis and evaluation method and apparatus, computer device and storage medium
CN112712793A (en) ASR error-correction method based on a pre-trained model in voice interaction, and related equipment
CN111933117A (en) Voice verification method and device, storage medium and electronic device
CN111785280A (en) Identity authentication method and device, storage medium and electronic equipment
CN114095883B (en) Fixed telephone terminal communication method, device, computer equipment and storage medium
CN115952482B (en) Medical equipment data management system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant