CN114596604A - Identity verification method, device, equipment, storage medium and computer program product - Google Patents

Identity verification method, device, equipment, storage medium and computer program product

Info

Publication number
CN114596604A
Authority
CN
China
Prior art keywords
target user
video
preset
identity verification
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011421028.4A
Other languages
Chinese (zh)
Inventor
周楠楠
于夕畔
汤耀华
杨海军
徐倩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202011421028.4A priority Critical patent/CN114596604A/en
Publication of CN114596604A publication Critical patent/CN114596604A/en
Pending legal-status Critical Current

Classifications

    • G06F18/213 Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods (Physics > Computing; calculating or counting > Electric digital data processing > Pattern recognition > Analysing > Design or setup of recognition systems or techniques)
    • G06F18/25 Fusion techniques (Physics > Computing; calculating or counting > Electric digital data processing > Pattern recognition > Analysing)
    • G06N3/045 Combinations of networks (Physics > Computing arrangements based on specific computational models > Computing arrangements based on biological models > Neural networks > Architecture, e.g. interconnection topology)
    • G06N3/084 Backpropagation, e.g. using gradient descent (Physics > Computing arrangements based on specific computational models > Computing arrangements based on biological models > Neural networks > Learning methods)

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The application discloses an identity verification method, apparatus, device, storage medium and computer program product, wherein the identity verification method includes the following steps: carrying out a video call with a target user to acquire a target face image of the target user; if the target face image is consistent with a preset face image, acquiring a first video segment captured when a first video question and answer is carried out with the target user in the video call process; determining a first credibility of the target user based on the first video segment; and dynamically determining question and answer execution texts corresponding to different rounds of question and answer based on the first credibility, to obtain the identity verification result of the target user. In the application, whether the user is lying is determined over video based on the obtained credibility, so that the identity of the user is verified in multiple dimensions and verification accuracy is improved.

Description

Identity verification method, device, equipment, storage medium and computer program product
Technical Field
The present application relates to the field of artificial intelligence, and in particular, to a method, an apparatus, a device, a storage medium, and a computer program product for identity verification.
Background
With the continuous development of financial technology, especially internet finance, more and more technologies (such as distributed computing, blockchain and artificial intelligence) are applied in the financial field; at the same time, the financial industry places higher requirements on these technologies, for example higher requirements on identity verification.
In the financial industry, and especially in banking, a robot is generally required to verify the identity of a loan applicant (the user) over video, and a loan is issued to the user only after the verification passes. Such verification typically relies on face recognition alone, so its accuracy is limited.
Disclosure of Invention
The present application mainly aims to provide an identity verification method, apparatus, device, storage medium and computer program product, so as to solve the technical problem of low user identity verification accuracy in the prior art.
To achieve the above object, the present application provides an identity verification method, including:
carrying out a video call with a target user to acquire a target face image of the target user;
if the target face image is consistent with a preset face image, acquiring a first video segment captured when a first video question and answer is carried out with the target user in the video call process;
determining a first credibility of the target user based on the first video segment;
and dynamically determining question and answer execution texts corresponding to different rounds of question and answer based on the first credibility, to obtain the identity verification result of the target user.
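Read together, these four steps describe a gated question-and-answer loop: the face comparison acts as an entry gate, and each subsequent round of video question and answer is chosen from the credibility of the previous one. The following is a minimal sketch of that control flow; every function name, the round limit and the action strings are illustrative assumptions, not interfaces defined by this application.

```python
# Minimal sketch of the claimed flow; all helpers are hypothetical stubs.
def start_video_call(user): ...           # placeholder: establish the call
def capture_face(call): ...               # placeholder: target face image
def face_matches(face, preset): ...       # placeholder: image comparison
def run_qa_round(call, text): ...         # placeholder: record one Q&A segment
def credibility_of(segment): ...          # placeholder: the step-3 model
def execution_text(score, round_no): ...  # placeholder: dynamic text choice

def verify_identity(user, preset_face, first_questions, max_rounds=3):
    call = start_video_call(user)                        # step 1
    if not face_matches(capture_face(call), preset_face):
        return "failed"                                  # possible fraud
    text = first_questions
    for round_no in range(1, max_rounds + 1):            # steps 2 to 4
        segment = run_qa_round(call, text)
        score = credibility_of(segment)                  # first credibility
        text = execution_text(score, round_no)           # dynamic Q&A text
        if text == "END_PASS":
            return "verified"
        if text == "END_FAIL":
            return "failed"
    return "manual_review"                               # out of rounds
```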
Optionally, the step of determining a first credibility of the target user based on the first video segment includes:
acquiring first mood information and first pace information when a first video question and answer is carried out with the target user;
and determining a first credibility of the target user based on the first video clip, the first mood information and the first pace information.
Optionally, the step of determining a first confidence level of the target user based on the first video segment, the first mood information, and the first pace information includes:
inputting the first video segment, the first mood information and the first pace information into a preset confidence discrimination model;
the preset confidence discrimination model is obtained by iteratively training a preset model to be trained based on a preset training data set;
and discriminating the first video segment, the first mood information and the first pace information based on the preset confidence discrimination model to obtain the first confidence of the target user.
Optionally, the step of discriminating the first video segment, the first mood information and the first pace information based on the preset confidence discrimination model to obtain the first confidence of the target user includes:
vectorizing the first video segment to obtain a first video feature vector of the first video segment;
vectorizing the first mood information and the first pace information respectively to obtain a first mood feature vector and a first pace feature vector;
splicing the first video feature vector, the first mood feature vector and the first pace feature vector to obtain a first spliced vector;
and discriminating the first spliced vector based on the fully connected layer of the preset confidence discrimination model to obtain the first confidence of the target user.
Optionally, before the step of discriminating the first spliced vector based on the fully connected layer of the preset confidence discrimination model to obtain the first confidence of the target user, the method further includes:
acquiring a preset training data set, wherein the preset training data set corresponds to a preset type label;
inputting the preset training data set into the preset model to be trained to discriminate the type of the training samples and obtain prediction labels;
calculating a model error based on the prediction labels and the preset type labels;
and updating the preset model to be trained based on the model error until the preset model to be trained meets a preset update end condition, and taking the preset model to be trained as the preset confidence discrimination model.
Optionally, the step of obtaining a preset training data set includes:
collecting a historical video data set, and determining a second video segment with credible and non-credible labels in the historical video data set;
performing picture discretization, feature vector extraction and feature vector splicing on the second video clip to obtain a second video feature vector;
acquiring second mood information and second speech rate information corresponding to the second video clip, and performing vectorization processing on the second mood information and the second speech rate information to obtain a second mood feature vector and a second speech rate feature vector;
splicing the second video feature vector, the second mood feature vector and the second pace feature vector to obtain a second spliced vector;
and obtaining the preset training data set based on the second splicing vector.
Optionally, the step of dynamically determining question-answer execution texts corresponding to different rounds of question and answer based on the first confidence level to obtain the identity verification result of the target user includes:
if the first confidence level is within a first preset confidence level interval, acquiring a second video clip acquired when a second video question and answer is carried out with a target user in the video call process;
determining a second trustworthiness of the target user based on the second video segment;
and dynamically determining sub-question-answer execution texts corresponding to subsequent different rounds of question-answers based on the second credibility, and continuously asking questions of the target user based on the sub-question-answer execution texts to obtain the identity verification result of the target user.
Optionally, the step of dynamically determining question-answer execution texts corresponding to different rounds of question and answer based on the first confidence level to obtain the identity verification result of the target user further includes:
if the first confidence level is in a second preset confidence level interval, performing preset manual label processing on the target user;
performing manual review on the target user based on the preset label to obtain a manual review result;
and determining whether to continue to perform video question answering on the target user or not based on the manual checking result so as to obtain an identity verification result of the target user.
Optionally, the step of acquiring a first video segment obtained when a first video question and answer is performed with a target user in a video call process if the target face image is consistent with a preset face image includes:
if the target face image is consistent with a preset face image, asking the target user;
and determining whether the answer of the target user is correct, and acquiring a first video segment acquired when a first video question and answer is carried out with the target user in the video call process if the answer of the target user is correct.
The present application further provides an identity verification apparatus, comprising:
the first acquisition module is used for carrying out video call with a target user so as to acquire a target face image of the target user;
the second acquisition module is used for acquiring a first video clip acquired when a first video question and answer is carried out with a target user in the video call process if the target face image is consistent with the preset face image;
a first determining module for determining a first trustworthiness of the target user based on the first video segment;
and the second determining module is used for dynamically determining question and answer execution texts corresponding to different rounds of question and answer based on the first credibility so as to obtain the identity verification result of the target user.
Optionally, the first determining module includes:
the first obtaining unit is used for obtaining first mood information and first pace information when the target user carries out first video question answering;
and the first determining unit is used for determining the first credibility of the target user based on the first video clip, the first mood information and the first pace information.
Optionally, the first determining unit includes:
the input subunit is used for inputting the first video segment, the first mood information and the first pace information into a preset confidence discrimination model;
the preset confidence discrimination model is obtained by iteratively training a preset model to be trained based on a preset training data set;
and the judging subunit is used for discriminating the first video segment, the first mood information and the first pace information based on the preset confidence discrimination model to obtain the first confidence of the target user.
Optionally, the judging subunit is configured to implement:
vectorizing the first video segment to obtain a first video feature vector of the first video segment;
vectorizing the first mood information and the first pace information respectively to obtain a first mood feature vector and a first pace feature vector;
splicing the first video feature vector, the first mood feature vector and the first pace feature vector to obtain a first spliced vector;
and discriminating the first spliced vector based on the fully connected layer of the preset confidence discrimination model to obtain the first confidence of the target user.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring a preset training data set, wherein the preset training data set corresponds to a preset type label;
the input module is used for inputting the preset training data set into the preset model to be trained to discriminate the type of the training samples and obtain prediction labels;
the calculation module is used for calculating a model error based on the prediction label and the preset type label;
and the updating module is used for updating the preset model to be trained based on the model error until the preset model to be trained meets a preset update end condition, and taking the preset model to be trained as the preset confidence discrimination model.
Optionally, the third obtaining module includes:
the acquisition unit is used for collecting a historical video data set and determining second video segments with credible and non-credible labels in the historical video data set;
the picture discretization unit is used for performing picture discretization, feature vector extraction and feature vector splicing on the second video clip to obtain a second video feature vector;
a second obtaining unit, configured to obtain second mood information and second pace information corresponding to the second video segment, and perform vectorization processing on the second mood information and the second pace information to obtain a second mood feature vector and a second pace feature vector;
the splicing unit is used for splicing the second video feature vector, the second mood feature vector and the second pace feature vector to obtain a second spliced vector;
and the third obtaining unit is used for obtaining the preset training data set based on the second splicing vector.
Optionally, the second determining module further includes:
the fourth obtaining unit is used for obtaining a second video clip acquired when a second video question and answer is carried out with the target user in the video call process if the first confidence level is in a first preset confidence level interval;
a second determining unit, configured to determine a second credibility of the target user based on the second video segment;
and the third determining unit is used for dynamically determining sub-question-answer execution texts corresponding to the questions and answers of different subsequent rounds based on the second credibility, and continuously asking questions of the target user based on the sub-question-answer execution texts to obtain the identity verification result of the target user.
Optionally, the second determining module further includes:
the tag unit is used for performing preset manual tag processing on the target user if the first confidence level is in a second preset confidence level interval;
the manual auditing unit is used for performing manual auditing on the target user based on the preset label to obtain a manual auditing result;
and the fourth determining unit is used for determining whether to continue to perform video question answering on the target user or not based on the manual auditing result so as to obtain the identity verification result of the target user.
Optionally, the first obtaining module includes:
the question asking unit is used for asking the target user if the target face image is consistent with a preset face image;
and the fifth determining unit is used for determining whether the answer of the target user is correct or not, and acquiring a first video clip acquired when a first video question and answer is carried out with the target user in the video call process if the answer of the target user is correct.
The present application further provides an identity verification apparatus, which is an entity apparatus, the identity verification apparatus including: a memory, a processor and a program of the identity verification method stored on the memory and executable on the processor, which program, when executed by the processor, is operable to carry out the steps of the identity verification method as described above.
The present application also provides a readable storage medium having stored thereon a program for implementing an identity verification method, which when executed by a processor, implements the steps of the identity verification method as described above.
The present application also provides a computer program product, comprising a computer program which, when executed by a processor, performs the steps of the identity verification method described above.
In the prior art, the user identity verification result is obtained by face recognition alone over video, so the accuracy of user identity verification is low. Compared with that, the identity verification method, apparatus, device, storage medium and computer program product of the present application carry out a video call with the target user to acquire a target face image of the target user; if the target face image is consistent with a preset face image, acquire a first video segment captured when a first video question and answer is carried out with the target user in the video call process; determine a first credibility of the target user based on the first video segment; and dynamically determine question and answer execution texts corresponding to different rounds of question and answer based on the first credibility, to obtain the identity verification result of the target user. That is, after face recognition over video, the first video segment captured during the first video question and answer with the target user is acquired, the first credibility of the target user is determined based on that segment, and the question and answer execution texts for the different rounds are dynamically determined based on the first credibility to obtain the identity verification result of the target user; the identity of the user is thus verified in multiple dimensions, and verification accuracy is improved.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below; it is obvious that other drawings can be obtained from them by those skilled in the art without inventive labor.
FIG. 1 is a schematic flow chart illustrating a first embodiment of an identity verification method according to the present application;
FIG. 2 is a flowchart illustrating a detailed process of step S30 in the first embodiment of the identity verification method of the present application;
fig. 3 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present application.
The objectives, features, and advantages of the present application will be further described with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In a first embodiment of the identity verification method of the present application, referring to fig. 1, the identity verification method includes:
step S10, making a video call with a target user to obtain a target face image of the target user;
step S20, if the target face image is consistent with a preset face image, acquiring a first video clip acquired when a first video question and answer is carried out with a target user in the video call process;
step S30, determining a first credibility of the target user based on the first video segment;
step S40, based on the first confidence level, dynamically determining question-answer execution texts corresponding to different rounds of question answers, so as to obtain an identity verification result of the target user.
The method comprises the following specific steps:
step S10, carrying out video call with a target user to obtain a target face image of the target user;
in this embodiment, it should be noted that the identity verification method is applied to identity verification equipment, which may be a core machine, and in addition, the identity verification equipment may also be an all-in-one machine, where the all-in-one machine refers to a whole body in or among different financial institutions, or a whole body in mutual connection, such as a whole body in mutual communication in a bank institution, or a whole body in mutual communication in different bank institutions, such as a whole body including a china micro-mass bank ATM machine, a china people bank ATM machine, and the like, or an all-in-one machine including a china micro-mass bank ATM machine in shenzhen region, a china micro-mass bank ATM machine in shanghai region, and the like. The all-in-one machine or the core machine includes a camera, an OCR recognition device, and the like.
In this embodiment, when a video call instruction is detected on the identity verification device, a video call is carried out with the target user to acquire the target face image of the target user; alternatively, the video call may be triggered when a loan application instruction is detected on the identity verification device. Specifically, the step of carrying out a video call with the target user to acquire the target face image when an application instruction is detected includes the following cases:
when a petty loan application instruction is detected on identity verification equipment, carrying out video call with a target user to obtain a target face image of the target user;
and when a rent loan application instruction is detected on the identity verification equipment, carrying out video call with a target user to obtain a target face image of the target user.
A video call is carried out with the target user to acquire the target face image of the target user, and while the target face image is being acquired the video call continues to be recorded, so that video segments can be collected.
After the target face image of the target user is acquired, the target face image is compared with a preset face image, where the preset face image comes from an institution with public credibility such as a credit investigation system or a public security system, so as to accurately determine whether the target user is the applicant.
Step S20, if the target face image is consistent with the preset face image, acquiring a first video clip acquired when a first video question and answer is carried out with a target user in the video call process;
if the target face image is inconsistent with the preset face image, the conversation is ended, and the target user is determined not to be the applicant, so that the possibility of fraud exists, and subsequent identification processing is not performed. If the target face image is consistent with the preset face image, namely when the target user is the applicant, acquiring a first video clip acquired when a first video question and answer is carried out with the target user in the video call process; fraud, the first video question answer may be a question answer of a name, an identification number, etc. Specifically, in the embodiment, the video image information includes information such as expressions and facial textures, the tone information includes states such as tone low, tone level and the like, and the tone information includes contents such as fast, slow and moderate speech speed.
If the target face image is consistent with the preset face image, acquiring a first video segment acquired when a first video question and answer is carried out with a target user in the video call process, wherein the step comprises the following steps of:
step S21, if the target face image is consistent with a preset face image, the target user is asked;
step S22, determining whether the answer of the target user is correct, and if the answer of the target user is correct, acquiring a first video clip acquired when a first video question and answer is made with the target user in the video call process.
In this embodiment, before the first video segment is acquired, the target user is additionally asked a question, and it is determined whether the target user's answer is correct. Determining this before acquiring the first video segment avoids wasting resources on collecting a first video segment when the answer is incorrect: if the answer of the target user is correct, the first video segment captured during the first video question and answer with the target user in the video call process is acquired; if the answer is incorrect, the acquisition step is not executed, the call is ended directly, and it is determined that the target user may be committing fraud.
Step S30, determining a first credibility of the target user based on the first video segment;
and determining a first credibility of the target user based on the first video segment, wherein the credibility refers to a credibility probability, and whether the target user lies or not can be judged through the credibility probability, for example, when the credibility is greater than the highest preset credibility, the target user is determined not to lie, and when the credibility is less than the lowest preset credibility, the target user is determined to lie.
Referring to fig. 2, the step of determining a first confidence level of the target user based on the first video segment includes:
step S31, acquiring first mood information and first pace information when performing first video question answering with the target user;
step S32, determining a first confidence level of the target user based on the first video segment, the first mood information, and the first pace information.
First tone information and first speech-rate information obtained during the first video question and answer with the target user are acquired. Specifically, the first tone information is extracted with a preset tone extraction model, and the first speech-rate information is extracted with a preset speech-rate extraction model. The first credibility of the target user is then determined based on the first video segment, the first tone information and the first speech-rate information; more specifically, based on the image information in the first video segment together with the first tone information and the first speech-rate information.
The step of determining a first confidence level of the target user based on the first video segment, the first mood information, and the first pace information includes:
step S321, inputting the first video segment, the first mood information and the first pace information into a preset confidence coefficient judgment model;
the preset confidence coefficient distinguishing model is obtained by performing iterative training on a preset model to be trained based on a preset training data set;
step S322, performing a discrimination process on the first video segment, the first mood information, and the first pace information based on the preset confidence level discrimination model to obtain a first confidence level of the target user.
In this embodiment, the first video segment, the first mood information and the first pace information are input into the preset confidence discrimination model; specifically, the first video image, the first mood information and the first pace information in the first video segment are input into the model, which discriminates them to obtain the first confidence of the target user.
The preset confidence discrimination model is obtained by iteratively training a preset model to be trained based on a preset training data set;
that is, before the preset confidence discrimination model is obtained, a preset training data set and a preset model to be trained are acquired, and the preset model to be trained is iteratively trained based on the preset training data set to obtain the preset confidence discrimination model.
The step of discriminating the first video segment, the first mood information and the first pace information based on the preset confidence discrimination model to obtain the first confidence of the target user includes:
step A1, performing vectorization processing on the first video segment to obtain a first video feature vector of the first video segment;
in this embodiment, first, vectorization processing is performed on the first video segment to obtain a first video feature vector of the first video segment, specifically, vectorization processing is performed on the first video segment through a preset convolutional neural network model to obtain the first video feature vector of the first video segment, and before the vectorization processing is performed on the first video segment, the first video segment is discretized into a picture of one frame. In this embodiment, feature extraction is performed on a picture of one frame through a preset convolutional neural network model to obtain a picture feature vector, and the extracted picture feature vectors are fused through means of averaging or splicing to obtain a first video feature vector a.
Step A2, performing vectorization processing on the first mood information and the first pace information respectively to obtain a first mood feature vector and a first pace feature vector;
the first mood information and the first pace information are respectively subjected to vectorization processing to obtain a first mood feature vector and a first pace feature vector, and specifically, the first mood information and the first pace information can be subjected to vectorization by adopting a one-hot encoding mode to obtain a first mood feature vector B and a first pace feature vector C.
Step A3, splicing the first video feature vector, the first mood feature vector and the first pace feature vector to obtain a first spliced vector;
the first video feature vector is spliced, and the first mood feature vector and the first pace feature vector are obtained as a first spliced vector, where the splicing manner may be fusion, for example, if the first video feature vector a is (a1, a2, a3), if the first mood feature vector B is (B1, B2, B3), and if the first pace feature vector C is (C1, C2, C3), the first spliced vector is (a1, a2, a3, B1, B2, B3, C1, C2, C3).
Step A4, discriminating the first spliced vector based on the fully connected layer of the preset confidence discrimination model to obtain the first confidence of the target user.
After the feature vectors A, B and C are spliced, the first spliced vector is discriminated by the fully connected layer of the preset confidence discrimination model to obtain the first confidence of the target user; that is, the spliced feature vector is fed sequentially into the fully connected neural network and the softmax function in the preset confidence discrimination model, and the resulting model prediction is taken as the first confidence of the target user.
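Putting steps A1 to A4 together, a minimal PyTorch sketch of such a confidence discrimination model is given below. The application does not specify the CNN, the layer sizes or the category counts, so the ResNet-18 backbone, the hidden width of 128 and the three-way tone and pace encodings are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torchvision.models as models

class ConfidenceDiscriminator(nn.Module):
    """Sketch of steps A1-A4: per-frame CNN features are averaged into a
    video vector A, one-hot tone (B) and pace (C) vectors are spliced onto
    it, and a fully connected head with softmax outputs the credibility."""

    def __init__(self, n_tone=3, n_pace=3):      # category counts assumed
        super().__init__()
        backbone = models.resnet18(weights=None)  # illustrative CNN choice
        backbone.fc = nn.Identity()               # keep 512-d frame features
        self.cnn = backbone
        self.head = nn.Sequential(
            nn.Linear(512 + n_tone + n_pace, 128),
            nn.ReLU(),
            nn.Linear(128, 2),                    # credible vs. not credible
        )

    def forward(self, frames, tone_onehot, pace_onehot):
        # frames: (T, 3, H, W), the first video segment discretized (A1)
        feats = self.cnn(frames)                  # (T, 512) frame vectors
        video_vec = feats.mean(dim=0)             # fuse by averaging -> A
        spliced = torch.cat([video_vec, tone_onehot, pace_onehot])  # A3
        return torch.softmax(self.head(spliced), dim=-1)            # A4
```

The first credibility of step S30 would then be read off as the "credible" component of this softmax output.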
Step S40, dynamically determining question-answer execution texts corresponding to different rounds of question answers based on the first confidence level, so as to obtain the identity verification result of the target user.
Question and answer execution texts corresponding to the different rounds of question and answer are dynamically determined based on the first credibility, to obtain the identity verification result of the target user: if, based on the first credibility, the determined execution text ends the call, the identity verification result of the target user is that verification has not passed; if it continues the question and answer, the identity verification result of the target user is obtained from the next round of question and answer.
Wherein the step of dynamically determining question-answer execution texts corresponding to different rounds of question-answers based on the first confidence level to obtain the identity verification result of the target user comprises:
acquiring a preset configuration file, wherein the preset configuration file stores a mapping relation between the first credibility, the question-and-answer round, and the corresponding dialogue execution text;
In this embodiment, another way of determining the dialogue execution text is provided. Specifically, the dialogue execution text may be determined through a preset configuration file, where the preset configuration file stores the mapping relation between the first credibility, the question-and-answer round, and the corresponding dialogue execution text.
The dialogue execution text of the next round is dynamically determined according to the preset configuration file and the first credibility of the current round, and the identity verification result of the applicant is determined based on the dynamically determined dialogue execution text.
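As an illustration of such a preset configuration file, the mapping from the current round and credibility to the next dialogue execution text could be stored as a simple table. The interval boundaries echo the 0.9 and 0.5-to-0.9 intervals used in the later embodiments; the round numbers, texts and terminal actions themselves are invented for the example.

```python
# Hypothetical preset configuration file contents:
# (round, credibility band) -> dialogue execution text or terminal action.
PRESET_CONFIG = {
    (1, "high"):   "Please describe the purpose of your loan.",  # next round
    (1, "medium"): "FLAG_FOR_MANUAL_REVIEW",
    (1, "low"):    "END_CALL_VERIFICATION_FAILED",
    (2, "high"):   "VERIFICATION_PASSED",
    (2, "medium"): "FLAG_FOR_MANUAL_REVIEW",
    (2, "low"):    "END_CALL_VERIFICATION_FAILED",
}

def credibility_band(score):
    if score > 0.9:            # first preset credibility interval
        return "high"
    if score >= 0.5:           # second preset credibility interval
        return "medium"
    return "low"

def next_execution_text(round_no, score):
    return PRESET_CONFIG[(round_no, credibility_band(score))]
```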
In the prior art, the user identity verification result is obtained by face recognition alone over video, so the accuracy of user identity verification is low. Compared with that, the identity verification method, apparatus, device, storage medium and computer program product of the present application carry out a video call with the target user to acquire a target face image of the target user; if the target face image is consistent with a preset face image, acquire a first video segment captured when a first video question and answer is carried out with the target user in the video call process; determine a first credibility of the target user based on the first video segment; and dynamically determine question and answer execution texts corresponding to different rounds of question and answer based on the first credibility, to obtain the identity verification result of the target user. That is, after face recognition over video, the first video segment captured during the first video question and answer with the target user is acquired, the first credibility of the target user is determined based on that segment, and the question and answer execution texts for the different rounds are dynamically determined based on the first credibility to obtain the identity verification result of the target user; the identity of the user is thus verified in multiple dimensions, and verification accuracy is improved.
Further, based on the first embodiment of the present application, in another embodiment of the present application, before the step of discriminating the first spliced vector based on the fully connected layer of the preset confidence discrimination model to obtain the first confidence of the target user, the method further includes:
step B1, acquiring a preset training data set, wherein the preset training data set corresponds to a preset type label;
in this embodiment, it should be noted that the preset type labels are pre-labeled lie (credible) and non-lie (untrustworthy) identifiers, and the preset model to be trained is a non-trained preset confidence degree discrimination model.
The method comprises the steps of obtaining a preset training data set and a preset model to be trained, wherein the preset training data set corresponds to a preset type label, specifically, obtaining the preset training data set and the preset model to be trained, manually labeling the preset training data set, obtaining a manually labeled preset training data set, and wherein the preset training data set can be expanded.
Step B2, inputting the preset training data set into the preset model to be trained, so as to discriminate the type of the training samples and obtain prediction labels;
The preset training data set is input into the preset model to be trained to discriminate the type of the training samples and obtain the prediction labels. Specifically, the training samples are vectorized by the vectorization network in the preset model to be trained to obtain spliced vectors; probability prediction is then performed on the spliced vectors by the fully connected network in the model to obtain probability scores, and the prediction labels are obtained from the probability scores.
Step B3, calculating a model error based on the prediction label and the preset type label;
in this embodiment, based on the predicted tag and the preset type tag, a model error is calculated, specifically, a distance between the predicted tag and the preset type tag is calculated, and a model error is obtained.
And step B4, updating the preset model to be trained based on the model error until the preset model to be trained meets a preset updating end condition, and taking the preset model to be trained as the preset confidence coefficient judgment model.
In this embodiment, the preset model to be trained is updated based on the model error until it satisfies the preset update end condition, so as to obtain the preset confidence discrimination model. Specifically, gradient information is calculated from the model error, and the model parameters of the preset model to be trained are updated according to the gradient information by back propagation, giving an updated preset model to be trained. It is then judged whether the updated model satisfies the preset update end condition: if so, the updated model is taken as the preset confidence discrimination model; if not, training and updating continue until the condition is satisfied. The preset update end condition includes reaching a maximum number of iterations, convergence of the loss function, and the like.
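A minimal training loop matching this description is sketched below, reusing the ConfidenceDiscriminator from the earlier sketch. The negative log-likelihood loss over the softmax output as the distance measure, the Adam optimizer, and the numeric limits are all assumptions; the application itself specifies only back propagation, gradient updates, a maximum iteration count and loss convergence.

```python
import torch
import torch.nn as nn

def train_discriminator(model, dataset, max_iters=100, lr=1e-3, tol=1e-4):
    """Sketch of steps B1-B4: predict, compute the model error against the
    preset type label, backpropagate, update, and stop on the preset update
    end condition (max iterations or loss convergence)."""
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    nll = nn.NLLLoss()                    # assumed form of the model error
    prev_loss = float("inf")
    for _ in range(max_iters):            # end condition 1: max iterations
        total = 0.0
        for frames, tone, pace, label in dataset:
            # label: long tensor, 0 = non-credible, 1 = credible
            probs = model(frames, tone, pace).unsqueeze(0)       # (1, 2)
            loss = nll(torch.log(probs + 1e-9), label.unsqueeze(0))
            opt.zero_grad()
            loss.backward()               # back propagation of the gradient
            opt.step()                    # update the model parameters
            total += loss.item()
        if abs(prev_loss - total) < tol:  # end condition 2: convergence
            break
        prev_loss = total
    return model                          # the preset discrimination model
```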
The step of obtaining the preset training data set comprises:
step C1, collecting a historical video data set, and determining a second video segment with credible and credible labels in the historical video data set;
in this embodiment, a specific process of obtaining a preset training data set with a preset type of tag is performed, that is, a historical video data set is first collected, and a second video segment with trusted and untrusted tags in the historical video data set is determined.
Step C2, performing picture discretization, feature vector extraction and feature vector splicing on the second video clip to obtain a second video feature vector;
step C3, acquiring second mood information and second pace information corresponding to the second video segment, and performing vectorization processing on the second mood information and the second pace information to obtain a second mood feature vector and a second pace feature vector;
step C4, splicing the second video feature vector, the second mood feature vector and the second pace feature vector to obtain a second spliced vector;
in this embodiment, the second video segment is subjected to picture discretization, feature vector extraction, and feature vector splicing to obtain a second video feature vector, the second mood information and the second pace information corresponding to the second video segment are obtained, and the second mood information and the second pace information are subjected to vectorization to obtain a second mood feature vector and a second pace feature vector.
And step C5, obtaining the preset training data set based on the second splicing vector.
And obtaining the preset training data set based on the second splicing vector, namely obtaining the preset training data set based on the second splicing vector with the label.
Or in this embodiment, the second video segment with the credible and untrustworthy labels in the historical video data set may also be directly used as a preset training data set.
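Putting steps C1 to C5 together, the construction of the preset training data set might be sketched as follows, reusing the one-hot helper and category sets assumed earlier. The functions `discretize_frames`, `extract_tone` and `extract_pace` are hypothetical stand-ins for the frame discretization and the preset tone and speech-rate extraction models, none of which the application details.

```python
def discretize_frames(clip): ...   # placeholder: video -> frame tensor
def extract_tone(clip): ...        # placeholder: preset tone model
def extract_pace(clip): ...        # placeholder: preset speech-rate model

def build_training_set(historical_videos):
    """Sketch of steps C1-C5: each second video segment labelled credible
    or non-credible becomes one (frames, tone one-hot, pace one-hot, label)
    sample; the vector splicing of step C4 happens inside the model's
    forward pass in this sketch."""
    dataset = []
    for clip, label in historical_videos:            # C1: labelled segments
        frames = discretize_frames(clip)             # C2: picture discretization
        tone = one_hot(extract_tone(clip), TONE_CATEGORIES)   # C3
        pace = one_hot(extract_pace(clip), PACE_CATEGORIES)
        dataset.append((frames, tone, pace, label))  # C4/C5: one sample
    return dataset                                   # preset training set
```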
In this embodiment, a preset training data set is obtained, where the preset training data set corresponds to a preset type tag; inputting the preset training data set into the preset model to be trained to judge the type of the training sentence to obtain a prediction label; calculating a model error based on the predicted tag and the preset type tag; and updating the preset model to be trained based on the model error until the preset model to be trained meets a preset updating end condition, and taking the preset model to be trained as the preset confidence coefficient judgment model. In this embodiment, the preset confidence level discrimination model is accurately obtained.
Further, based on the first and second embodiments of the present application, in another embodiment of the present application, the question-answer execution text includes a sub-question-answer execution text, and the step of dynamically determining question-answer execution texts corresponding to different rounds of question and answer based on the first credibility to obtain the identity verification result of the target user includes:
step S41, if the first confidence level is in a first preset confidence level interval, acquiring a second video clip acquired when a second video question and answer is carried out with a target user in the video call process;
step S42, determining a second credibility of the target user based on the second video segment;
in this embodiment, if the first confidence level is in a first preset confidence level interval, for example, in an interval (first preset confidence level interval) greater than 0.9, when a second video question and answer with the target user is obtained in the video call process, the obtained second video clip is acquired, that is, the question and answer are continuously performed on the target user to obtain the second video clip, so as to determine the second confidence level of the target user.
Step S43, based on the second confidence level, dynamically determining a sub-question-answer execution text corresponding to the question and answer in different subsequent rounds, and continuously asking the question of the target user based on the sub-question-answer execution text to obtain the identity verification result of the target user.
Based on the second credibility, it is dynamically determined whether the question-answer execution text corresponding to the subsequent rounds of question and answer continues or ends the question and answer, and the target user continues to be questioned based on that execution text to obtain the identity verification result of the target user.
The step of dynamically determining question-answer execution texts corresponding to different rounds of question-answers based on the first credibility to obtain the identity verification result of the target user comprises:
step S44, if the first confidence level is in a second preset confidence level interval, performing preset manual label processing on the target user;
in this embodiment, if the first confidence level is in a second preset confidence level interval (0.5 to 0.9), the target user is subjected to a preset manual tagging process.
Step S45, based on the preset label, carrying out manual review on the target user to obtain a manual review result;
and performing manual review on the target user based on the preset label to obtain a manual review result, wherein the manual review result is that the review is passed or the review is not passed.
Step S46, determining whether to continue performing video question answering on the target user based on the manual review result, so as to obtain an identity verification result of the target user.
Based on the manual review result, it is determined whether to continue the video question and answer with the target user: if the manual review result is that the review passes, the video question and answer with the target user continues, so as to obtain the identity verification result of the target user; if the review does not pass, the video question and answer is not continued, and the identity verification result of the target user is that verification has not passed.
In the embodiment, if the first confidence level is within a first preset confidence level interval, a second video clip acquired when a second video question and answer is carried out with a target user in the video call process is acquired; determining a second trustworthiness of the target user based on the second video segment; and dynamically determining sub-question-answer execution texts corresponding to subsequent different rounds of question-answers based on the second credibility, and continuously asking questions of the target user based on the sub-question-answer execution texts to obtain the identity verification result of the target user. The identity verification result of the target user can be accurately obtained.
Referring to fig. 3, fig. 3 is a schematic device structure diagram of a hardware operating environment according to an embodiment of the present application.
As shown in fig. 3, the identity verification apparatus may include: a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002. The communication bus 1002 is used for realizing connection communication between the processor 1001 and the memory 1005. The memory 1005 may be a high-speed RAM memory or a non-volatile memory (e.g., a magnetic disk memory). The memory 1005 may alternatively be a memory device separate from the processor 1001 described above.
Optionally, the identity verification device may further comprise a user interface, a network interface, a camera, RF (Radio Frequency) circuitry, sensors, audio circuitry, a WiFi module, and the like. The user interface may comprise a display screen (Display) and an input sub-module such as a keyboard (Keyboard), and may optionally also comprise a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (e.g., a WI-FI interface).
It will be appreciated by those skilled in the art that the configuration of the identity verification device shown in fig. 3 does not constitute a limitation of the identity verification device, and the device may include more or fewer components than those shown, a combination of some components, or a different arrangement of components.
As shown in fig. 3, the memory 1005, which is a kind of computer storage medium, may include an operating system, a network communication module, and an identity verification method program. The operating system is a program that manages and controls the hardware and software resources of the identity verification device and supports the operation of the identity verification method program as well as other software and/or programs. The network communication module is used to implement communication between the components within the memory 1005, and with other hardware and software in the identity verification method system.
In the identity verification apparatus shown in fig. 3, the processor 1001 is configured to execute an identity verification method program stored in the memory 1005 to implement the steps of the identity verification method described in any one of the above.
The specific implementation of the identity verification apparatus of the present application is substantially the same as that of each embodiment of the identity verification method, and is not described herein again.
An embodiment of the present application further provides an identity verification apparatus, where the identity verification apparatus includes:
the first acquisition module is used for carrying out video call with a target user so as to acquire a target face image of the target user;
the second acquisition module is used for acquiring a first video clip acquired when a first video question and answer is carried out with a target user in the video call process if the target face image is consistent with the preset face image;
a first determining module for determining a first trustworthiness of the target user based on the first video segment;
and the second determining module is used for dynamically determining question and answer execution texts corresponding to different rounds of question and answer based on the first credibility so as to obtain the identity verification result of the target user.
Optionally, the first determining module includes:
the first obtaining unit is used for obtaining first mood information and first pace information when the target user carries out first video question answering;
and the first determining unit is used for determining the first credibility of the target user based on the first video clip, the first mood information and the first pace information.
Optionally, the first determining unit includes:
the input subunit is used for inputting the first video segment, the first mood information and the first pace information into a preset confidence discrimination model;
the preset confidence discrimination model is obtained by iteratively training a preset model to be trained based on a preset training data set;
and the judging subunit is used for discriminating the first video segment, the first mood information and the first pace information based on the preset confidence discrimination model to obtain the first confidence of the target user.
Optionally, the judging subunit is configured to implement:
vectorizing the first video segment to obtain a first video feature vector of the first video segment;
vectorizing the first mood information and the first pace information respectively to obtain a first mood feature vector and a first pace feature vector;
splicing the first video feature vector, the first mood feature vector and the first pace feature vector to obtain a first spliced vector;
and discriminating the first spliced vector based on the fully connected layer of the preset confidence discrimination model to obtain the first confidence of the target user.
Optionally, the apparatus further comprises:
the third acquisition module is used for acquiring a preset training data set, wherein the preset training data set corresponds to a preset type label;
the input module is used for inputting the preset training data set into the preset model to be trained to discriminate the type of the training samples and obtain prediction labels;
the calculation module is used for calculating a model error based on the prediction label and the preset type label;
and the updating module is used for updating the preset model to be trained based on the model error until the preset model to be trained meets a preset update end condition, and taking the preset model to be trained as the preset confidence discrimination model.
Optionally, the third obtaining module includes:
the acquisition unit is used for collecting a historical video data set and determining second video segments with credible and non-credible labels in the historical video data set;
the picture discretization unit is used for performing picture discretization, feature vector extraction and feature vector splicing on the second video clip to obtain a second video feature vector;
a second obtaining unit, configured to obtain second mood information and second pace information corresponding to the second video segment, and perform vectorization processing on the second mood information and the second pace information to obtain a second mood feature vector and a second pace feature vector;
the splicing unit is used for splicing the second video feature vector, the second mood feature vector and the second pace feature vector to obtain a second spliced vector;
and the third acquisition unit is used for acquiring the preset training data set based on the second splicing vector.
Optionally, the second determining module further includes:
the fourth obtaining unit is used for obtaining a second video clip acquired when a second video question and answer is carried out with the target user in the video call process if the first confidence level is in a first preset confidence level interval;
a second determining unit, configured to determine a second credibility of the target user based on the second video segment;
and the third determining unit is used for dynamically determining sub-question-answer execution texts corresponding to the questions and answers of different subsequent rounds based on the second credibility, and continuously asking questions of the target user based on the sub-question-answer execution texts to obtain the identity verification result of the target user.
Optionally, the second determining module further includes:
a labeling unit, configured to apply a preset manual-review label to the target user if the first credibility falls within a second preset credibility interval;
a manual review unit, configured to manually review the target user based on the preset label to obtain a manual review result;
and a fourth determining unit, configured to determine, based on the manual review result, whether to continue the video question-and-answer with the target user, so as to obtain the identity verification result of the target user.
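Taken together, the two interval branches amount to a routing rule on the first credibility. A minimal sketch with hypothetical thresholds follows; the application only speaks of preset credibility intervals, never of concrete bounds:

    def route(credibility, low=0.4, high=0.8):
        # low/high are hypothetical; the preset credibility
        # intervals are left unspecified by the application.
        if credibility >= high:
            return "pass"           # identity verified, no further rounds
        if credibility >= low:
            return "next_round"     # dynamically choose the next question text
        return "manual_review"      # label the user for manual auditing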
Optionally, the first acquisition module includes:
a questioning unit, configured to question the target user if the target face image is consistent with a preset face image;
and a fifth determining unit, configured to determine whether the answer of the target user is correct, and if so, to acquire a first video segment captured while a first video question-and-answer round is conducted with the target user during the video call.
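End to end, the gating before the first credibility judgment could be glued together as below; session, match_faces and ask_question are hypothetical stand-ins for the units above, not names used by the application:

    def first_video_segment(session, match_faces, ask_question):
        # Gate on face comparison before any question-and-answer round.
        target_face = session.capture_face_image()
        if not match_faces(target_face, session.preset_face_image):
            return None                       # verification fails at the face check
        answer, clip = ask_question(session)  # first video question-and-answer
        if not answer.is_correct:
            return None                       # wrong answer, no segment kept
        return clip                           # segment passed on for credibility scoring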
The specific implementation of the identity verification apparatus of the present application is substantially the same as that of the above-mentioned embodiments of the identity verification method, and is not described herein again.
The present application also provides a computer program product, comprising a computer program which, when executed by a processor, performs the steps of the identity verification method described above.
The specific implementation of the computer program product of the present application is substantially the same as that of the embodiments of the identity verification method described above, and is not described herein again.
The above description is only a preferred embodiment of the present application and is not intended to limit its scope. Any equivalent structural or process modification made on the basis of the specification and drawings, whether applied directly or indirectly in other related technical fields, likewise falls within the scope of protection of the present application.

Claims (13)

1. An identity verification method, comprising:
conducting a video call with a target user to acquire a target face image of the target user;
if the target face image is consistent with a preset face image, acquiring a first video segment captured while a first video question-and-answer round is conducted with the target user during the video call;
determining a first credibility of the target user based on the first video segment;
and dynamically determining, based on the first credibility, question-and-answer execution texts corresponding to different rounds of questions and answers, to obtain an identity verification result of the target user.
2. The identity verification method of claim 1, wherein the step of determining the first credibility of the target user based on the first video segment comprises:
acquiring first tone information and first speech rate information captured while the first video question-and-answer round is conducted with the target user;
and determining the first credibility of the target user based on the first video segment, the first tone information and the first speech rate information.
3. The identity verification method of claim 2, wherein the step of determining the first credibility of the target user based on the first video segment, the first tone information and the first speech rate information comprises:
inputting the first video segment, the first tone information and the first speech rate information into a preset credibility discrimination model, wherein the preset credibility discrimination model is obtained by iteratively training a preset model to be trained on a preset training data set;
and performing discrimination processing on the first video segment, the first tone information and the first speech rate information based on the preset credibility discrimination model to obtain the first credibility of the target user.
4. The identity verification method of claim 3, wherein the step of performing discrimination processing on the first video segment, the first tone information and the first speech rate information based on the preset credibility discrimination model to obtain the first credibility of the target user comprises:
vectorizing the first video segment to obtain a first video feature vector of the first video segment;
vectorizing the first tone information and the first speech rate information respectively to obtain a first tone feature vector and a first speech rate feature vector;
concatenating the first video feature vector, the first tone feature vector and the first speech rate feature vector to obtain a first concatenated vector;
and performing discrimination processing on the first concatenated vector based on the fully connected layer of the preset credibility discrimination model to obtain the first credibility of the target user.
5. The identity verification method of claim 4, wherein before the step of performing discrimination processing on the first concatenated vector based on the fully connected layer of the preset credibility discrimination model to obtain the first credibility of the target user, the method comprises:
acquiring a preset training data set, wherein the preset training data set corresponds to preset type labels;
inputting the preset training data set into the preset model to be trained for type discrimination of the training samples, to obtain predicted labels;
calculating a model error based on the predicted labels and the preset type labels;
and updating the preset model to be trained based on the model error until the preset model to be trained meets a preset update end condition, and taking the resulting model as the preset credibility discrimination model.
6. The identity verification method of claim 5, wherein the step of acquiring the preset training data set comprises:
collecting a historical video data set, and determining, in the historical video data set, second video segments carrying credible and non-credible labels;
performing picture discretization, feature vector extraction and feature vector concatenation on the second video segment to obtain a second video feature vector;
acquiring second tone information and second speech rate information corresponding to the second video segment, and vectorizing the second tone information and the second speech rate information to obtain a second tone feature vector and a second speech rate feature vector;
concatenating the second video feature vector, the second tone feature vector and the second speech rate feature vector to obtain a second concatenated vector;
and obtaining the preset training data set based on the second concatenated vector.
7. The identity verification method of claim 1, wherein the question-and-answer execution texts comprise sub-question-and-answer execution texts, and the step of dynamically determining, based on the first credibility, the question-and-answer execution texts corresponding to different rounds of questions and answers to obtain the identity verification result of the target user comprises:
if the first credibility falls within a first preset credibility interval, acquiring a second video segment captured while a second video question-and-answer round is conducted with the target user during the video call;
determining a second credibility of the target user based on the second video segment;
and dynamically determining, based on the second credibility, the sub-question-and-answer execution texts corresponding to subsequent rounds of questions and answers, and continuing to question the target user based on the sub-question-and-answer execution texts to obtain the identity verification result of the target user.
8. The identity verification method of claim 1, wherein the step of dynamically determining, based on the first credibility, the question-and-answer execution texts corresponding to different rounds of questions and answers to obtain the identity verification result of the target user comprises:
if the first credibility falls within a second preset credibility interval, applying a preset manual-review label to the target user;
manually reviewing the target user based on the preset label to obtain a manual review result;
and determining, based on the manual review result, whether to continue the video question-and-answer with the target user, so as to obtain the identity verification result of the target user.
9. The identity verification method of claim 1, wherein the step of acquiring, if the target face image is consistent with the preset face image, the first video segment captured while the first video question-and-answer round is conducted with the target user during the video call comprises:
if the target face image is consistent with the preset face image, questioning the target user;
and determining whether the answer of the target user is correct, and if so, acquiring the first video segment captured while the first video question-and-answer round is conducted with the target user during the video call.
10. An identity verification apparatus, comprising:
a first acquisition module, configured to conduct a video call with a target user to acquire a target face image of the target user;
a second acquisition module, configured to acquire, if the target face image is consistent with a preset face image, a first video segment captured while a first video question-and-answer round is conducted with the target user during the video call;
a first determining module, configured to determine a first credibility of the target user based on the first video segment;
and a second determining module, configured to dynamically determine, based on the first credibility, question-and-answer execution texts corresponding to different rounds of questions and answers, to obtain an identity verification result of the target user.
11. An identity verification device, characterized in that the identity verification device comprises a memory and a processor,
wherein the memory is configured to store a program implementing the identity verification method;
and the processor is configured to execute the program to implement the steps of the identity verification method according to any one of claims 1 to 9.
12. A readable storage medium having stored thereon a program implementing an identity verification method, wherein the program, when executed by a processor, implements the steps of the identity verification method according to any one of claims 1 to 9.
13. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the identity verification method according to any one of claims 1 to 9.
CN202011421028.4A 2020-12-07 2020-12-07 Identity verification method, device, equipment, storage medium and computer program product Pending CN114596604A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011421028.4A CN114596604A (en) 2020-12-07 2020-12-07 Identity verification method, device, equipment, storage medium and computer program product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011421028.4A CN114596604A (en) 2020-12-07 2020-12-07 Identity verification method, device, equipment, storage medium and computer program product

Publications (1)

Publication Number Publication Date
CN114596604A true CN114596604A (en) 2022-06-07

Family

ID=81803249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011421028.4A Pending CN114596604A (en) 2020-12-07 2020-12-07 Identity verification method, device, equipment, storage medium and computer program product

Country Status (1)

Country Link
CN (1) CN114596604A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116152967A (en) * 2023-04-17 2023-05-23 成都赛力斯科技有限公司 Vehicle remote sharing method, device and system and electronic equipment


Similar Documents

Publication Publication Date Title
US20180197547A1 (en) Identity verification method and apparatus based on voiceprint
CN109815156A (en) Displaying test method, device, equipment and the storage medium of visual element in the page
CN111354237A (en) Context-based deep knowledge tracking method and computer readable medium thereof
CN111652087B (en) Car inspection method, device, electronic equipment and storage medium
CN111738041A (en) Video segmentation method, device, equipment and medium
CN110096576B (en) Method, system and storage medium for automatically segmenting text
CN111653274B (en) Wake-up word recognition method, device and storage medium
CN111858973A (en) Multimedia event information detection method, device, server and storage medium
CN112463923B (en) User fraud detection method, device, equipment and storage medium
CN112990294B (en) Training method and device of behavior discrimination model, electronic equipment and storage medium
CN114596604A (en) Identity verification method, device, equipment, storage medium and computer program product
CN113191478A (en) Training method, device and system of neural network model
CN113190444B (en) Test method, test device and storage medium
CN111767923B (en) Image data detection method, device and computer readable storage medium
CN113053395A (en) Pronunciation error correction learning method and device, storage medium and electronic equipment
KR101854804B1 (en) Apparatus for providing user authentication service and training data by determining the types of named entities associated with the given text
CN116645683A (en) Signature handwriting identification method, system and storage medium based on prompt learning
CN112085594B (en) Identity verification method, device and readable storage medium
US11551434B2 (en) Apparatus and method for retraining object detection using undetected image
CN113642443A (en) Model testing method and device, electronic equipment and storage medium
CN114625872A (en) Risk auditing method, system and equipment based on global pointer and storage medium
CN112446360A (en) Target behavior detection method and device and electronic equipment
CN112069800A (en) Sentence tense recognition method and device based on dependency syntax and readable storage medium
CN112055013A (en) Automatic authentication method, device, equipment and storage medium
CN116167727B (en) Image analysis-based flow node identification and processing system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination