CN111784358A - Identity verification method and device based on user privacy protection - Google Patents

Identity verification method and device based on user privacy protection

Info

Publication number
CN111784358A
CN111784358A
Authority
CN
China
Prior art keywords
candidate
behavior
topic
user
answer
Prior art date
Legal status
Pending
Application number
CN202010762416.2A
Other languages
Chinese (zh)
Inventor
薛琼
杨陆毅
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010762416.2A
Publication of CN111784358A
Current legal status: Pending

Classifications

    • G06Q20/4014 Identity check for transactions
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G06Q20/382 Payment protocols insuring higher security of transaction
    • G06Q20/4016 Transaction verification involving fraud or risk level assessment in transaction processing
    • G06Q20/405 Establishing or using transaction specific rules

Abstract

An embodiment of the present specification provides an identity verification method and apparatus based on user privacy protection. The method includes: acquiring a user's operation behaviors and/or the behavior additional data corresponding to those behaviors, where behavior additional data is data related to the user's operation behaviors that is not user privacy data; combining the operation behaviors and/or behavior additional data with a plurality of time windows to generate candidate question stems, and configuring corresponding answer candidates for the candidate stems to form at least one candidate question; and selecting target questions that meet a predetermined requirement from the candidate questions, and verifying the target user's identity based on them. By balancing the risk rate against the pass rate, the method builds a big-data identity verification product with user privacy protection from the user's behavior sequence and weak information, so that the user's identity can be verified without using personal privacy data.

Description

Identity verification method and device based on user privacy protection
Technical Field
The embodiments of this specification relate to the technical field of privacy data protection, and in particular to an identity verification method and device based on user privacy protection.
Background
An application that supports online payment needs to verify a user's identity online in many scenarios. One verification mode is active verification based on KYC (Know Your Customer) rules: the user actively provides information such as a bank card, an identity card, an account ID, and a mobile phone number, so that the system can establish and verify the user's identity.
However, in the early stage of a product launch, when user accounts are not yet mature, the coverage of the KYC-based active verification mode is extremely low; users may also abandon verification because the active mode is cumbersome to operate. In these scenarios, other verification modes are needed to verify user identity online.
Disclosure of Invention
This specification describes an identity verification method based on user privacy protection. Multiple-choice questions are generated from the user's historical operation behaviors and behavior additional data for the user to answer, and the answers determine whether the user is the genuine, legitimate account owner. Because the behavior additional data excludes strongly private data, the method is suitable for regions with strict regulation of user privacy data.
According to a first aspect, there is provided an identity verification method based on user privacy protection, the method comprising:
acquiring at least one operation behavior of a target user and/or at least one item of behavior additional data corresponding to the operation behavior when it occurred, where the behavior additional data is data related to the user's operation behavior that is not user privacy data; generating at least one candidate question stem by combining the at least one operation behavior and/or the at least one item of behavior additional data with a plurality of time windows, where a single candidate stem asks about one of the time window, the operation behavior, and the behavior additional data after the other one or two are determined; configuring corresponding answer candidates for each candidate stem to form at least one candidate question, where the answer candidates include a correct answer and at least one distractor; screening, from the at least one candidate question, at least one candidate question that meets a predetermined requirement as a target question; and verifying the identity of the target user based on the at least one target question.
In one embodiment, before screening out at least one candidate question that meets the predetermined requirement as a target question, the method further includes: predicting the pass rate and risk rate of each candidate question, where the pass rate is determined at least from the probability that a legitimate user selects the correct answer, and the risk rate is determined at least from the probability that an illegitimate user selects the correct answer;
and the screening includes: selecting, from the candidate questions, at least one candidate question whose risk rate is below a predetermined threshold and whose pass rate ranks within a predetermined number of top places, as a target question.
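The screening rule above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the field names, threshold, and top-N value are all invented, and the predicted rates are assumed to already exist.

```python
def screen_target_topics(candidates, risk_threshold=0.1, top_n=3):
    """candidates: list of dicts with 'stem', 'risk_rate', 'pass_rate' keys (illustrative schema)."""
    # Keep only questions an illegitimate user is unlikely to answer correctly.
    safe = [c for c in candidates if c["risk_rate"] < risk_threshold]
    # Among the safe ones, prefer those legitimate users answer best.
    safe.sort(key=lambda c: c["pass_rate"], reverse=True)
    return safe[:top_n]

candidates = [
    {"stem": "Q1", "risk_rate": 0.05, "pass_rate": 0.90},
    {"stem": "Q2", "risk_rate": 0.20, "pass_rate": 0.95},  # too risky, filtered out
    {"stem": "Q3", "risk_rate": 0.08, "pass_rate": 0.80},
]
targets = screen_target_topics(candidates, risk_threshold=0.1, top_n=2)
# Q2 is excluded despite its high pass rate; Q1 and Q3 remain, ordered by pass rate.
```

Note that the risk threshold acts as a hard constraint while the pass rate is only used for ranking, matching the "risk rate below a threshold, pass rate in the top places" formulation.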
In one embodiment, predicting the pass rates of the candidate questions includes:
predicting, with a pre-trained prediction model, the probability that the target user selects the correct answer when answering each candidate question. The prediction model is trained on first training samples, each comprising first sample features and a sample label indicating whether a legitimate user actually selected the correct answer; the first sample features include behavior features extracted from legitimate users' operation behaviors and question features extracted from the candidate questions.
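The layout of a first training sample can be sketched as below, assuming invented feature names: behavior features of a legitimate user are joined with features of one candidate question, and the label records whether that user actually answered it correctly.

```python
def build_sample(user_behavior_features, topic_features, answered_correctly):
    # Concatenate behavior features and question features into one feature dict,
    # labelled 1 if the legitimate user actually selected the correct answer.
    return {
        "features": {**user_behavior_features, **topic_features},
        "label": 1 if answered_correctly else 0,
    }

sample = build_sample(
    {"login_count_7d": 5, "payment_count_7d": 2},  # hypothetical behavior features
    {"num_options": 4, "window_days": 7},          # hypothetical question features
    answered_correctly=True,
)
```

Any classifier could then be fit on such samples; the patent does not specify the model family for the pass-rate predictor.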
In one embodiment, predicting the risk rates of the candidate questions includes:
for any first candidate question among them, obtaining the prior probability value of each answer candidate under that question; and computing, as the first risk rate of the first candidate question, the ratio of the prior probability value of the correct answer to the sum of the prior probability values of all answer candidates under that question.
In one embodiment, predicting the risk rates further includes:
obtaining the probability of account theft as a second risk rate, and taking the sum of the first risk rate and the second risk rate as the total risk rate of the first candidate question, where the probability of account theft is obtained from a pre-trained theft-risk model.
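The two risk quantities above can be computed as follows. This is an illustrative sketch: the first risk rate is the correct answer's prior probability divided by the sum of the priors of all options, and the total risk adds a theft probability that, in the described method, would come from a pre-trained theft-risk model (here it is just a made-up number).

```python
def first_risk_rate(priors, correct_index):
    """priors: prior probability value of each answer option; returns the guessing risk."""
    return priors[correct_index] / sum(priors)

def total_risk_rate(priors, correct_index, theft_probability):
    # theft_probability stands in for the output of the theft-risk model.
    return first_risk_rate(priors, correct_index) + theft_probability

priors = [0.2, 0.5, 0.2, 0.1]  # invented priors; option B is the most popular overall
risk1 = first_risk_rate(priors, correct_index=0)                  # 0.2 / 1.0 = 0.2
risk = total_risk_rate(priors, 0, theft_probability=0.01)         # 0.2 + 0.01 = 0.21
```

A uniform prior over four options would give a first risk rate of 0.25, i.e., the chance of a blind guess; skewed priors make popular options riskier correct answers.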
In one embodiment, the at least one operation behavior is performed through a specified application with payment capability and includes at least one of: payment, login, password change, card binding, active identity verification, participation in passive identity verification, participation in a marketing campaign, verification, activation, registration, and top-up.
In one embodiment, the behavior additional data includes environmental data and/or behavior object data that does not uniquely correspond to the target user.
In one embodiment, the environmental data includes device data and/or network environment data corresponding to when the operation behavior occurred; the device data includes at least one of device brand name, device category, device model, operating system category, and operating system name; the network environment data includes at least one of network operator name and network type.
In one embodiment, the behavior object data includes at least one of: the name of a marketing campaign the target user participated in, a top-up amount, a payment type, payment object attributes, and the frequency of the operation behavior.
In one embodiment, a single candidate stem asks about the behavior additional data after the time window and operation behavior are determined;
and configuring answer candidates for each candidate stem includes: obtaining the behavior additional data that corresponded to the target user when the operation behavior in the stem occurred, and using it as the correct answer to the stem;
and adding at least one distractor based on the correct answer.
In one embodiment, adding at least one distractor based on the correct answer includes: obtaining, as candidate distractors, multiple items of behavior additional data corresponding to different users performing the operation behavior in the stem; and obtaining the prior probability values of the correct answer and of each alternative, and selecting, from the alternatives, those with prior probability values higher than that of the correct answer as distractors.
In one embodiment, adding at least one distractor based on the correct answer further includes: selecting, as another distractor, an alternative whose prior probability value exceeds a specified value but is lower than that of the correct answer; and/or randomly selecting, as another distractor, an alternative whose prior probability value exceeds the specified value.
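The distractor-selection rules above can be sketched as follows, with all data and the floor value invented for illustration: first take alternatives whose prior probability exceeds the correct answer's, then optionally add one whose prior lies between a floor value and the correct answer's prior.

```python
import random

def pick_distractors(correct, alternatives, floor=0.05, rng=None):
    """correct: (answer, prior); alternatives: list of (answer, prior) from other users."""
    rng = rng or random.Random(0)  # fixed seed for a reproducible sketch
    _, p_correct = correct
    # Rule 1: alternatives more common than the correct answer make strong
    # distractors, since a guesser who picks the popular option is wrong.
    strong = [a for a, p in alternatives if p > p_correct]
    # Rule 2: one alternative with floor < prior < p_correct.
    mid = [a for a, p in alternatives if floor < p < p_correct]
    distractors = strong[:]
    if mid:
        distractors.append(rng.choice(mid))
    return distractors

alts = [("brand_a", 0.40), ("brand_b", 0.25), ("brand_c", 0.10), ("brand_d", 0.02)]
d = pick_distractors(("brand_x", 0.15), alts)
# brand_a and brand_b have priors above 0.15; brand_c falls in (0.05, 0.15);
# brand_d is below the floor and is never selected.
```

Choosing distractors more popular than the correct answer is what pushes the first risk rate down: the correct answer's prior becomes a small share of the option priors' sum.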
In a second aspect, embodiments of the present specification further provide an identity verification apparatus based on user privacy protection, where the apparatus includes:
an acquiring unit configured to acquire at least one operation behavior of a target user and/or at least one item of behavior additional data corresponding to the operation behavior when it occurred, the behavior additional data being data related to the user's operation behavior that is not user privacy data; a combination unit configured to generate at least one candidate question stem by combining the at least one operation behavior and/or the at least one item of behavior additional data with a plurality of time windows, a single candidate stem asking about one of the time window, the operation behavior, and the behavior additional data after the other one or two are determined; a confusion unit configured to configure corresponding answer candidates for each candidate stem to form at least one candidate question, the answer candidates including a correct answer and at least one distractor; a screening unit configured to screen out, from the candidate questions, at least one that meets a predetermined requirement as a target question; and a verification unit configured to verify the identity of the target user based on the at least one target question.
According to a third aspect, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory storing executable code and a processor which, when executing the executable code, implements the method of the first aspect.
With the identity verification method based on user privacy protection provided in the embodiments of this specification, operation behaviors, the behavior additional data corresponding to their occurrence, and time windows are combined to generate multiple candidate question stems; corresponding answer candidates are set to form multiple candidate questions; at least one question is screened out for the user to answer; and the user's identity is verified according to the answers. Because the behavior additional data excludes privacy data that is prohibited from being collected, the method can adapt to strictly regulated environments. The user only needs to answer a few multiple-choice questions rather than entering a bank card number, identity card number, mobile phone number, and similar information, which makes the operation more convenient. This compensates well for the shortcomings of the KYC-based active verification mode: when its coverage is extremely low, or when the user is unwilling to use it, the method provides another choice, improving user experience while protecting user privacy data.
Drawings
To illustrate the technical solutions of the embodiments disclosed in this specification more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments disclosed in this specification, and those skilled in the art can obtain other drawings based on them without creative effort.
FIG. 1 is a diagram illustrating an exemplary system flow architecture of a method for identity verification based on user privacy protection provided in the present specification;
FIG. 2 illustrates a flow diagram for one embodiment of a method for user privacy protection based identity verification provided herein;
FIG. 3 is a diagram illustrating an example of an interactive interface displaying a target question in an embodiment of the present specification;
fig. 4 is a schematic structural diagram illustrating an embodiment of an identity verification apparatus based on user privacy protection provided in the present specification.
Detailed Description
Besides active identity verification, a user's identity can also be verified passively. Passive verification does not require the user to actively provide information; instead, account-related information for identity verification is obtained through big data. KBA (Knowledge-Based Authentication), a verification mode based on the user's own knowledge, is one passive mode, and it effectively supplements the shortcomings of active verification. For example, a KBA verification mode for Alipay can verify a user's identity using passively obtained account information from the Alipay system, such as the user's address book or the name of the Wi-Fi network the user connects to, without the user actively entering a bank card number and other information, improving the self-service verification rate and the convenience of user operation.
However, in a strictly regulated environment, the user's address book, the Wi-Fi names the user connects to, and similar data are all user privacy data and are prohibited from being collected, so a verification method that relies on such privacy data cannot be applied in strictly regulated regions.
In view of this, embodiments of this specification provide an identity verification method based on user privacy protection that verifies user identity without collecting user privacy data such as address books and Wi-Fi names.
Embodiments disclosed in the present specification are described below with reference to the accompanying drawings.
Referring to fig. 1, in the method disclosed in the embodiments of this specification, the acquired behavior additional data is combined with operation behaviors and time windows to obtain multiple candidate question stems; answer candidates are then added to obtain multiple candidate questions; target questions are screened from the candidates according to the business rule of highest pass rate subject to the risk rate being below a predetermined threshold; the user answers them; and whether the user is legitimate is determined from the answers.
Specifically, in one embodiment, and referring to fig. 2, the identity verification method based on user privacy protection disclosed in this specification may include:
s201, acquiring at least one operation behavior corresponding to a target user and/or at least one behavior additional data corresponding to the operation behavior when the operation behavior occurs; s202, generating at least one candidate question stem by combining at least one operation behavior and/or at least one behavior additional data and a plurality of time windows; s203, corresponding answer candidate items are respectively configured for at least one candidate question stem to form at least one candidate question, and the answer candidate items comprise correct answers and at least one interference item; s204, screening at least one candidate topic which meets the preset requirement as a target topic from the at least one candidate topic; s205, based on at least one target topic, the identity of the target user is verified.
In S201, the operation behavior includes any user behavior that the specified application can record. In one embodiment, the operation behavior may include at least one of payment, login, password change, card binding, active identity verification, passive identity verification, participation in a marketing campaign, verification, activation, registration, top-up, and the like. For example, a user's login to Alipay, and operations after login such as changing the password, participating in a marketing campaign, paying, and topping up, can all be treated as operation behaviors.
In the embodiments of this specification, behavior additional data is any data related to the user's operation behavior other than user privacy data. It may also be called weak information: data related to operation behaviors performed through the specified application that is generally device or environment information not uniquely corresponding to the user, excluding user privacy data such as an address book. In some embodiments, the behavior additional data includes environmental data and/or behavior object data that does not uniquely correspond to the target user. The environmental data includes device data and/or network environment data at the time the operation behavior occurred; the device data includes at least one of device brand name, device category, device model (e.g., a mobile phone model), operating system category, and operating system name; the network environment data includes at least one of network operator name and network type. The behavior object data includes at least one of the name of a marketing campaign the target user participated in, a top-up amount, a payment type, payment object attributes, and operation behavior frequency.
Specifically, the behavior additional data may be at least one of the device brand name, device category, device model, payment type, payment object attributes, operating system category, operating system name, name of the marketing campaign involved, top-up amount, network operator name, network type, and operation behavior frequency corresponding to when the operation behavior was performed.
For example, the model and brand name of the terminal device the user used when logging in to Alipay, or the amount topped up, can serve as behavior additional data. Note that regulators in different regions may define privacy data differently; the behavior additional data in the embodiments of this specification should be adjusted to the specific requirements of the region where the product is deployed, to avoid using data classified as private by the local regulator. Likewise, operation behavior data should exclude behavior data whose collection the regulator prohibits.
In S202, a single candidate stem asks about the remaining item after any two of the time window, operation behavior, and behavior additional data are determined. A user generally performs multiple operation behaviors through the specified application, producing multiple items of behavior additional data; the operation behaviors, behavior additional data, and time windows are permuted and combined to generate multiple candidate stems.
For example, suppose user U1 registered on Alipay one month ago, has logged in twice a week since registering, and made a payment in the past week. U1's operation behaviors then include registration, login, and payment; the device used for all three is a certain brand of mobile phone running Android, giving two items of behavior additional data. Pairing the three behaviors with the two items of additional data yields multiple combinations, which are then combined with time windows of different lengths to obtain multiple candidate stems. Combining the registration behavior with the device model in the behavior additional data and setting the time window to one month ago gives the candidate stem: "Which model of device did you use to register one month ago?" Combining the payment behavior with the device model and setting the time window to one week gives: "You made a payment in the past week; which of the following devices did you use?" Other combinations are obtained similarly.
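The combination step in the U1 example can be sketched as a Cartesian product over the three element sets. The question template and all data values are illustrative only; actual stems would be rendered by the grammar-aware methods described later.

```python
from itertools import product

def generate_stems(behaviors, extra_data, windows):
    """Pair every behavior with every additional-data item and time window,
    asking about the additional data once behavior and window are fixed."""
    stems = []
    for behavior, (key, value), window in product(behaviors, extra_data, windows):
        stems.append(f"Which {key} did you use to {behavior} {window}?")
    return stems

behaviors = ["register", "log in", "pay"]
extra_data = [("device model", "model_x"), ("operating system", "android")]
windows = ["one month ago", "in the past week"]
stems = generate_stems(behaviors, extra_data, windows)
# 3 behaviors x 2 extra-data items x 2 windows = 12 candidate stems.
```

Some of these combinations are nonsensical (a week-long window for a one-off registration, say); the window-selection heuristic and the screening step are what remove poor candidates.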
The operation behavior, behavior additional data, and time window are three elements of a candidate stem, but not the whole of it: beyond these elements, the necessary components should be added according to the semantic rules of Chinese or another natural language to obtain a stem that reads naturally. For Chinese, for example, a corresponding subject and predicate, adverbial, or complement should be added.
The time window may be a period, for example any duration such as 1 hour, 1 day, 2 days, one week, two weeks, or one month; it may also refer to a specific moment, down to the minute, for example: "You made a payment at 1:10 pm yesterday; which of the following was the payment type?"
The time window should be chosen according to the frequency of the operation behavior. If the behavior is a low-frequency, one-time behavior such as registration, the window may be set to how long ago the registration occurred: if the user registered 3 months ago, the window is 3 months and the question asks about 3 months ago. For a high-frequency behavior such as login, the window is set according to the user's login frequency and timing: if the user logs in once a day, a window of a week or a month obviously cannot discriminate, so the window can be set to a day or a specific period within a day. A corresponding stem could be "During which period did you log in to Alipay yesterday?" or "Which device did you use to log in to Alipay between 3 pm and 5 pm yesterday?"
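The window-selection heuristic above can be sketched as follows. The cut-off frequencies and the returned window phrasings are invented for illustration: rare one-off behaviors get a long window anchored at when they happened, while high-frequency behaviors get a window narrow enough to discriminate.

```python
def choose_time_window(behavior_frequency_per_day, days_since_event):
    if behavior_frequency_per_day == 0:
        # One-time behavior such as registration: ask about how long ago it was.
        return f"{days_since_event} days ago"
    if behavior_frequency_per_day >= 1:
        # Daily behavior such as login: a week-long window cannot discriminate,
        # so narrow the question to a specific day or time period.
        return "yesterday, between 3 pm and 5 pm"
    # Occasional behavior such as a weekly payment: a week works.
    return "in the past week"

w = choose_time_window(behavior_frequency_per_day=0, days_since_event=90)
# For a registration 90 days ago, the window is "90 days ago".
```

The guiding principle is that the window length should be on the order of the behavior's inter-occurrence interval, so exactly one occurrence falls inside it.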
In addition, in the embodiments of this specification, any of the operation behavior, behavior additional data, and time window can be the subject of the question; that is, when two items are determined, the remaining one is asked about. When asking about the behavior additional data, the candidate stem has the structure: subject + time window + operation behavior, what is the behavior additional data? For example, with the time window set to one month, the stem may be "Which model of device did you use to register one month ago?" or "You logged in to the product (or product name) many times in the past month; what is the operating system of the device you used?";
when asking about the operation behavior, the candidate stem has the structure: subject + time window + behavior additional data, what is the operation behavior? For example, a stem asking about the operation behavior may be "Which of the following did you do once in the past week using a certain model of mobile phone?"; or, with operation frequency as the behavior additional data: "Which behavior did you perform at least three times in the past week?";
when asking about the time window, the candidate stem has the structure: subject + operation behavior + behavior additional data, what is the time? For example, the stem may be "When did you last register the product with a certain brand of mobile phone?" or "You recently made a payment in a Wi-Fi network environment; which of the following was the specific time?"
The examples above all fix any two of the operation behavior, behavior additional data, and time window and ask about the third. In another embodiment, just one of the operation behavior and the behavior additional data can be combined with the time window, giving stem structures such as "subject + operation behavior, what is the time?", "subject + time, what is the operation behavior?", or "subject + behavior additional data, what is the time?" — for example "When did you last log in to Alipay?", "Which operations did you perform yesterday based on the product (which may be a specific product name)?", or "When did you use a certain model of mobile phone?"
That is, a candidate stem may contain 3 or 2 elements. In some embodiments, all three elements appear in the stem and any of them can be asked about; in other embodiments, the stem involves only the operation behavior and the time window, asking about the behavior once the time is determined, or only the behavior additional data and the time window, asking about the time once the additional data is determined.
In a specific embodiment, the candidate stem is obtained from the operation behavior, the behavior additional data and the time window by randomly combining two or all three of these items and adding the vocabulary necessary for language expression, such as a subject and a predicate, so as to form the candidate stem.
A stem obtained by random combination generally does not conform to grammar or to habitual language expression. As an implementable method, therefore, the stem can be combined by a pre-trained combination model, with necessary auxiliary words added to improve the grammatical expression. The pre-trained combination model can be implemented based on various models with natural language processing capability, such as DNN (Deep Neural Network), BERT (Bidirectional Encoder Representations from Transformers), LSTM (Long Short-Term Memory), and the like. In one embodiment, the operation behavior, the behavior additional data and the time window may first be converted into corresponding feature vectors; the three feature vectors are then combined by the pre-trained combination model, and other word vectors, such as the subject "you", the predicate and necessary auxiliary particles, are added so as to output the candidate stem.
The combination model can also be a reinforcement learning model: for example, the agent outputs a candidate question stem, the environment feeds back a reward score (i.e., a reinforcement signal) and updates the environment state, and the agent readjusts its action according to the current environment state and the reward score so as to obtain a higher reward score. Various combination methods and combination models can be adopted; the embodiments of this specification do not list them one by one.
After obtaining a plurality of candidate stems, in S203, corresponding answer candidates are set for each candidate stem.
Answer candidates are set corresponding to the object the question stem asks about. For example, when the candidate stem is "Did you register with a device of a certain brand one month ago?", the corresponding answer candidates may be "A. brand 1; B. brand 2; C. brand 3; D. brand 4". When the candidate stem is "You logged in to AlipayHK many times in the past month; which of the following is the operating system of the device you used?", the corresponding answer candidates may be "A. Android Pie; B. Android 11; C. macOS 10.14 Mojave; D. macOS 10.15 Catalina". When the candidate stem is "Which operation did you perform at least three times in the past week?", the corresponding answer candidates may be "A. payment; B. recharging; C. cash withdrawal; D. participating in an activity to receive a coupon", and so on.
When the question stem asks about the behavior additional data, the answer candidates are alternatives corresponding to that behavior additional data. For example, when asking about the payment type, the corresponding answer candidates may be credit card payment, debit card payment, Huabei (credit) payment, other payment, and the like; when asking about the payment object attribute, the corresponding answer candidates may be a personal account, a personal settlement account, a personal savings account, a personal credit card account, a corporate account, a basic account, a temporary account, a special account, and the like. The specific answer candidates correspond to the respective question elements and admit various choices; this specification does not list them one by one.
Specifically, taking a question about the behavior additional data as an example, corresponding answer candidates may be configured for each candidate stem as follows: acquire the behavior additional data corresponding to the target user when the operation behavior in the candidate stem occurred, take this behavior additional data as the correct answer of the stem, and add at least one interference item based on the correct answer.
That is, in one embodiment, the added interference items are set around the correct answer. Specifically, multiple pieces of behavior additional data corresponding, for different users, to the operation behavior in the candidate stem are obtained as alternatives for the interference items; the prior probability values of the correct answer and of each alternative are obtained; an alternative whose prior probability value is higher than that of the correct answer is selected as one interference item, and an alternative whose prior probability value exceeds a specified value but is lower than that of the correct answer is selected as another interference item. Alternatively, an alternative whose prior probability value exceeds the specified value may be randomly selected from the alternatives as another interference item.
For example, when the number of answer candidates is 4, an interference item with a higher prior probability than the correct answer is selected first, then an interference item with a lower prior probability than the correct answer, and finally a randomly chosen interference item. The prior probability of every interference item is required to be greater than 0.5% (other thresholds, such as 1% or 0.6%, may also be used; the value range is 0.3%-25%). A specific example: for the stem "Which brand of mobile phone did you use to log in to AlipayHK within the past 3 months?", the answer candidates are set to: 1. brand A (correct answer); 2. brand B (highest prior probability, higher than brand A); 3. brand C (lowest prior probability, lower than brand A); 4. brand D (randomly selected from the others). Ensuring that the prior probability of every option exceeds 0.5% guarantees the plausibility of the options.
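The distractor-selection rule just described (one option with a higher prior than the correct answer, one with a lower prior, one chosen at random, all above a prior floor) can be sketched as follows. The brand names, priors and the 0.5% floor are illustrative, and the sketch assumes that suitable higher- and lower-prior alternatives exist.

```python
import random

def pick_distractors(correct, candidates, priors, floor=0.005):
    """Pick three distractors around the correct answer: the alternative
    with the highest prior (above the correct answer's), the alternative
    with the lowest prior still clearing the floor, and one more chosen
    at random from what remains."""
    pool = [c for c in candidates if c != correct and priors[c] > floor]
    higher = [c for c in pool if priors[c] > priors[correct]]
    lower = [c for c in pool if priors[c] < priors[correct]]
    d1 = max(higher, key=priors.get)   # highest prior, above correct answer
    d2 = min(lower, key=priors.get)    # lowest prior, still above the floor
    rest = [c for c in pool if c not in (d1, d2)]
    d3 = random.choice(rest)           # one more, chosen at random
    return [d1, d2, d3]
```

A real implementation would also need to handle the cases where no alternative sits above (or below) the correct answer's prior, for example by falling back to random selection.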
It should be noted that, at the initial stage of delivering the product in a certain area, the prior probability of each answer candidate may be determined according to big data statistics from other areas, that is, obtained by statistics over the product's historical data. Alternatively, when no other area's data is available or suitable for reference, the prior probabilities may be obtained based on a preset rule function or a probability model; for example, the probability model may be a Bayesian network model or a Markov random field. After the data volume grows to a certain scale, the prior probability of each candidate can be computed from the historical selections of users in the local area.
In this way, answer candidates are configured for each candidate stem. For questions about the operation behavior or the time, reference may be made to the answer-candidate setting mode used when asking about the behavior additional data; this is not repeated in this specification.
In another embodiment, the candidate stem may also be set in yes/no form, i.e., the answer candidates contain only the two options "yes" and "no". For example, the question may be "Did you log in to AlipayHK last week?" or "Did you register AlipayHK with a device of a certain brand within the past month?", and the corresponding answer candidates contain only "yes" and "no". Such questions have relatively poor risk resistance and can be applied to occasions with low auditing strictness; alternatively, the auditing accuracy can be improved by increasing the number of questions.
After a plurality of candidate topics are obtained, next, in S204, at least one of the candidate topics is screened out as a target topic. Specifically, in one embodiment, at least one candidate topic meeting a predetermined requirement is screened out as a target topic from at least one candidate topic as follows:
and predicting the passing rate and the risk rate corresponding to each candidate topic, and screening out from the candidate topics, as target topics, at least one candidate topic whose risk rate is lower than a preset threshold and whose passing rate ranks within a preset top number.
The passing rate is determined at least based on the probability that a legitimate user selects the correct answer when answering; that is, the passing rate represents the probability of answering the current question correctly when the user is the genuine, legitimate user. In practical applications, many of the questions concern the user's historical behaviors, and users differ in memory ability, so even the user himself/herself may fail to select the correct answer. Therefore, when setting the questions, candidates with a high passing rate should be preferred as target topics.
When the service data volume at the initial stage of product delivery is insufficient, the passing rate may be obtained based on preset service rules. For example, the considered dimensions of the service rules may include the time elapsed since each category of operation behavior last occurred within 90 days, the number of occurrences of each category of operation behavior within 90 days, the number of days the user has been on the specified product, and the like, and a corresponding rule function is set. As an implementable manner, the rule function may be a weighted sum function that takes the above dimensions as arguments, with the weights set according to actual measurement results.
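As one way to make this rule-based fallback concrete, a weighted-sum rule function over normalized dimensions might look like the sketch below; the feature names, weights and clamping are assumptions for illustration, not the patent's actual rule function.

```python
def rule_based_pass_rate(features, weights, bias=0.0):
    """Weighted-sum rule function: combine normalized consideration
    dimensions (e.g. how recently a behavior last occurred within 90 days,
    its 90-day frequency, days on the product) into a score and clamp it
    into [0, 1] so it can serve as a passing-rate estimate."""
    score = bias + sum(weights[k] * v for k, v in features.items())
    return max(0.0, min(1.0, score))
```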
After the data volume rises to form a scale, a prediction model obtained by pre-training can be adopted to predict the passing rate of the questions of each user.
Specifically, in one embodiment, the probability that the target user selects the correct answer when answering a corresponding candidate topic is predicted by a pre-trained prediction model. The prediction model is trained on first training samples, each comprising first sample features and a sample label indicating whether the legitimate user actually selected the correct answer. The first sample features comprise behavior features extracted from the operation behaviors of legitimate users and topic features extracted from the candidate topics; the behavior features serve as user features, i.e., they quantitatively characterize the behaviors of different users.
For example, the predictive model may be a Logistic Regression (LR) model, a Gradient Boosting Decision Tree (GBDT) model, or the like.
Taking an LR model as an example, firstly extracting topic features based on candidate topics, extracting behavior features based on user behaviors, splicing the topic features and the behavior features into a sample feature, and taking an actual pass rate as a sample label to obtain a plurality of training samples. A sigmoid function is selected as a mapping function, and the expression of the LR model can be:
$$h_\theta(x) = \frac{1}{1 + e^{-\theta^{T}x}}$$
where x is the sample feature vector and θ^T is the transpose of the parameter matrix θ. The parameter matrix θ is first initialized; then the feature vector x of the sample features is input into the model as the independent variable and h_θ(x) is calculated. The output value h_θ(x) represents the probability that the correct answer is selected given the current sample feature x, i.e., the probability that the user corresponding to the user features in the sample answers the question corresponding to the topic features correctly. Next, a loss value between the output value and the sample label is calculated based on a preset loss function, and the parameter matrix θ is iteratively adjusted by gradient descent according to the calculated loss until the loss falls below a preset value, indicating that the model has converged; training then finishes and the trained LR model is obtained. With the LR model, the passing rate of each user on each question can be predicted.
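A minimal sketch of the LR prediction and a single gradient-descent update on one sample follows. It uses plain Python rather than any particular library, and the feature layout (topic features concatenated with behavior features) is assumed.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(theta, x):
    """h_theta(x): predicted probability that this user answers this
    topic correctly, given concatenated topic + behavior features x."""
    return sigmoid(sum(t * xi for t, xi in zip(theta, x)))

def sgd_step(theta, x, y, lr=0.1):
    """One gradient-descent update on the log-loss for a single sample
    (y = 1 if the legitimate user actually selected the correct answer,
    y = 0 otherwise)."""
    err = predict(theta, x) - y
    return [t - lr * err * xi for t, xi in zip(theta, x)]
```

In practice an off-the-shelf implementation such as scikit-learn's `LogisticRegression` would replace this hand-rolled loop.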
Similarly, a decision tree model such as GBDT and a gradient boosting model (Xgboost) may also be used as the prediction model, and specifically, the training sample may be referred to train the corresponding prediction model to obtain the trained prediction model, which is not specifically listed in the embodiments of this specification.
In the embodiments of this specification, in addition to the passing rate, the risk rate should be considered when setting the topics. The risk rate is determined at least based on the probability that an illegitimate user selects the correct answer when answering; that is, it represents at least the probability that an answer guessed at random, after the user account is stolen or when an illegitimate user impersonates the real user for verification, turns out to be the correct answer, i.e., the probability that the illegitimate user can pass the question. When selecting target topics from the candidate topics, topics with a lower risk rate should be preferred.
Specifically, as an implementable manner, the first risk rate (breakthrough probability) is obtained as follows: for any first candidate topic in at least one candidate topic, namely for any candidate topic, obtaining the prior probability value of each answer candidate under the first candidate topic, and calculating the ratio of the prior probability value of the correct answer to the sum of the prior probability values of all answer candidates under the first candidate topic to serve as the first risk rate of the first candidate topic.
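The first risk rate defined above is a simple ratio; a sketch, with illustrative numbers:

```python
def first_risk_rate(correct_prior, option_priors):
    """Ratio of the correct answer's prior probability value to the sum
    of the prior probability values of all answer candidates under the
    topic: the chance that a guessing attacker who follows the population
    statistics lands on the correct answer."""
    return correct_prior / sum(option_priors)
```

With four equally likely options this reduces to 0.25, the blind-guess probability; skewed priors raise or lower the breakthrough chance accordingly.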
In one embodiment, the risk rate considered in screening topics should be combined with the occurrence probability of the theft of the user account in addition to the first risk rate. Specifically, the probability of account theft is obtained and used as the second risk rate, and the sum of the first risk rate and the second risk rate is used as the total risk rate of the first candidate topic. Wherein the probability of occurrence of the account theft, i.e. the second risk rate, is obtained based on a pre-trained theft risk model.
The first risk rate and the second risk rate represent typical risk factors faced by the scheme of the embodiment of the present disclosure, in an actual application scenario, different areas of use may be faced with other risks, and the risk factors specifically causing the risks may be set according to different areas, and are not limited to the first risk rate and the second risk rate.
In one embodiment, after the passing rate and risk rate of the topics are determined, topics with a risk rate below a predetermined threshold (for example, 5%-25%) are selected first; then, from these low-risk topics, the topics whose passing rate ranks in the first few positions (for example, the top 1-10) are further screened out as target topics. Typically 2-5 target topics are selected for one user.
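The two-stage screening just described (risk-rate threshold first, then top passing rates) can be sketched as:

```python
def screen_topics(topics, risk_threshold=0.25, top_n=3):
    """topics: list of (topic_id, pass_rate, risk_rate) triples.
    Keep topics whose risk rate is below the threshold, then return the
    top_n of those ranked by predicted passing rate. The default threshold
    and top_n are illustrative values within the ranges given above."""
    low_risk = [t for t in topics if t[2] < risk_threshold]
    low_risk.sort(key=lambda t: t[1], reverse=True)
    return [t[0] for t in low_risk[:top_n]]
```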
In the above embodiment, the passing rate and the risk rate are obtained after the candidate topics are formed, and topic screening is performed based on them. However, the method provided in this embodiment is not limited to this; the general rule for topic setting is that the risk rate be below the preset threshold and the passing rate rank high. In another embodiment, during stem generation, i.e., when the operation behavior, the behavior additional data and the time window are combined, all possible combinations and all answer candidates are traversed, the passing rate and risk rate of the topic composed of each stem and its answer candidates are calculated, and the combination with the highest passing rate is selected for combination. That is, the passing rate and the risk rate are considered at the outset of combination, so the combined stem and its answer candidates may already meet the passing-rate or risk-rate requirements and can be used directly as the target topic. Alternatively, when there are many candidate stems and answer options, a secondary screening combining the passing rate and the risk rate can be performed after combination to obtain the target topics.
The generated target topics are displayed to the user through a user interaction interface for answering; for example, a corresponding interactive interface is shown in FIG. 3. After the user answers, the user's identity is judged according to the answering result. The judgment rule can be set so that answering more than 80% of the questions correctly counts as passing, or so that all questions must be answered correctly. This is determined by the auditing purpose and strictness: if a large payment or transfer is involved, all answers must be correct before subsequent operations may continue; if the audit merely determines eligibility for a coupon, the strictness can be reduced appropriately, for example treating 70% of the questions answered correctly as a pass.
Weak information and user behavior sequences do not constitute unique user identifiers, so a big-data identity-verification product based on them cannot be verified with a single unique answer. In the method disclosed in the embodiments of this specification, multiple target topics in multiple-choice form are obtained through the key steps of stem generation and interference-item generation, combining the three kinds of data information, namely weak information, account behavior and time window, and the user's identity is verified in a personalized manner. The design of the interference items balances low risk and low confusion, making personalized identity-verification questions possible while effectively avoiding the use of personal privacy information.
In a second aspect, referring to fig. 4, an embodiment of the present specification further provides an identity verification apparatus 400 based on user privacy protection, the apparatus including:
the acquiring unit 4001 is configured to acquire at least one operation behavior corresponding to a target user and/or at least one behavior additional data corresponding to the operation behavior when the operation behavior occurs, where the behavior additional data is data associated with the operation behavior of the user, and is not user privacy data;
a combination unit 4002 configured to generate at least one candidate stem by combining the at least one operation behavior and/or the at least one behavior additional data with a plurality of time windows, wherein a single candidate stem is used to ask about the remaining one after any one or two of the time window, the operation behavior and the behavior additional data are determined;
a confusion unit 4003 configured to configure corresponding answer candidates for at least one candidate stem to compose at least one candidate topic, wherein the answer candidates include correct answers and at least one interference item;
a screening unit 4004 configured to screen out at least one candidate topic meeting a predetermined requirement as a target topic from the at least one candidate topic;
a verification unit 4005 configured to verify an identity of a target user based on at least one target topic.
Optionally, in an embodiment, the apparatus further includes a predicting unit 4006 configured to predict a passing rate and a risk rate respectively corresponding to the at least one candidate topic; the passing rate is at least determined based on the probability of selecting correct answers when legal users answer, and the risk rate is at least determined based on the probability of selecting correct answers when illegal users answer;
a screening unit 4004 configured specifically to:
and screening out, from the at least one candidate topic, at least one candidate topic whose risk rate is lower than a preset threshold and whose passing rate ranks within a preset top number, as the target topic.
Optionally, in an embodiment, the prediction unit 4006 is configured to:
predicting the probability of selecting a correct answer when the target user answers the corresponding candidate question through a pre-trained prediction model; the prediction model is obtained by training based on a first training sample, and the first training sample comprises first sample characteristics and sample labels representing whether correct answers are selected or not when each legal user actually answers; the first sample characteristics comprise behavior characteristics extracted based on the operation behaviors of legal users and theme characteristics extracted based on various candidate themes.
Optionally, in an embodiment, the prediction unit 4006 is further configured to:
for any first candidate topic in at least one candidate topic, obtaining the prior probability value of each answer candidate item under the first candidate topic; and calculating the ratio of the prior probability value of the correct answer to the sum of the prior probability values of all answer candidate items under the first candidate item to serve as the first risk rate of the first candidate item.
Optionally, in an embodiment, the prediction unit 4006 is further configured to:
acquiring the probability of account embezzlement as a second risk rate, and taking the sum of the first risk rate and the second risk rate as the total risk rate corresponding to the first candidate topic; the probability of account theft occurring is obtained based on a pre-trained theft risk model.
Optionally, in one embodiment, the at least one operational action is performed based on a designated application having payment capabilities, including at least one of payment, login, password change, binding, active identity verification, participation in passive identity verification, participation in a marketing campaign, verification, activation, registration, and top-up.
Optionally, in one embodiment, the behavior additional data includes at least one of a device brand name, a device class, a device model, an operating system class, an operating system name, a name of a participating marketing campaign, a recharge amount, a payment type, payment object attributes, a network operator name, a network type, and a frequency of operational behavior.
Optionally, in an embodiment, a single candidate stem is used for asking questions about the behavior additional data after the time window and the operation behavior are determined;
a obfuscation unit 4003 configured specifically to: acquiring behavior additional data corresponding to a target user when an operation behavior in the candidate question stem occurs, and taking the behavior additional data as a correct answer of the question stem; based on the correct answer, at least one distracter is added.
Optionally, in an embodiment, the obfuscating unit 4003 is specifically configured to: acquiring a plurality of behavior additional data respectively corresponding to operation behaviors of different users in the candidate question stem as alternatives of the interference item; and acquiring the correct answer and the prior probability value of each alternative, and selecting an alternative higher than the prior probability value of the correct answer from a plurality of alternatives as an interference item.
Optionally, in an embodiment, the obfuscating unit 4003 is further configured to: selecting one alternative item with a prior probability value exceeding a specified value and lower than the prior probability value of the correct answer from a plurality of alternative items as another interference item; and/or randomly selecting one option with the prior probability value exceeding the specified value from a plurality of options as another interference item.
As above, according to an embodiment of a further aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in any of the above embodiments.
According to an embodiment of a further aspect, there is also provided a computing device, including a memory and a processor, where the memory stores executable code, and the processor executes the executable code to implement the method described in any of the above embodiments.
In summary, the method provided in the embodiments of this specification constructs a big-data identity-verification product that protects user privacy by balancing risk and passing rate and combining user behavior sequences with weak information. The method and the corresponding product effectively mitigate the problem of insufficient user-supplied verification information and avoid the regulatory compliance issues raised by using personal privacy data, so they can be applied in strongly regulated areas.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above-mentioned embodiments, objects, technical solutions and advantages of the embodiments disclosed in the present specification are further described in detail, it should be understood that the above-mentioned embodiments are only specific embodiments of the embodiments disclosed in the present specification, and are not intended to limit the scope of the embodiments disclosed in the present specification, and any modifications, equivalent substitutions, improvements and the like made on the basis of the technical solutions of the embodiments disclosed in the present specification should be included in the scope of the embodiments disclosed in the present specification.

Claims (26)

1. An identity verification method based on user privacy protection, the method comprising:
acquiring at least one operation behavior corresponding to a target user and/or at least one behavior additional data corresponding to the operation behavior when the operation behavior occurs, wherein the behavior additional data is data which is not user privacy data and is related to the operation behavior of the user;
generating at least one candidate stem by combining the at least one operation behavior and/or the at least one behavior additional data with a plurality of time windows, wherein a single candidate stem is used to ask about the remaining one after any one or two of the time window, the operation behavior and the behavior additional data are determined;
respectively configuring corresponding answer candidate items aiming at the at least one candidate question stem to form at least one candidate question, wherein the answer candidate items comprise correct answers and at least one interference item;
screening out at least one candidate topic meeting a preset requirement from the at least one candidate topic to serve as a target topic;
and verifying the identity of the target user based on the at least one target topic.
2. The method according to claim 1, wherein before selecting at least one candidate topic from the at least one candidate topic as the target topic, the method further comprises:
predicting the passing rate and the risk rate respectively corresponding to the at least one candidate question; the passing rate is at least determined based on the probability of selecting correct answers when legal users answer, and the risk rate is at least determined based on the probability of selecting correct answers when illegal users answer;
screening out at least one candidate topic meeting a preset requirement from the at least one candidate topic as a target topic, wherein the screening comprises the following steps:
and screening out, from the at least one candidate topic, at least one candidate topic whose risk rate is lower than a preset threshold and whose passing rate ranks within a preset top number, as the target topic.
3. The method of claim 2, wherein predicting respective passage rates for the at least one candidate topic comprises:
predicting the probability of selecting a correct answer when the target user answers the corresponding candidate question through a pre-trained prediction model; the prediction model is obtained by training based on a first training sample, wherein the first training sample comprises first sample characteristics and sample labels representing whether correct answers are selected or not when each legal user actually answers; the first sample characteristics comprise behavior characteristics extracted based on the operation behaviors of legal users and question characteristics extracted based on each candidate question.
4. The method of claim 2, wherein predicting the respective risk rates for the at least one candidate topic comprises:
for any first candidate topic in the at least one candidate topic, obtaining a prior probability value of each answer candidate item under the first candidate topic;
and calculating the ratio of the prior probability value of the correct answer to the sum of the prior probability values of all answer candidate items under the first candidate item to serve as the first risk rate of the first candidate item.
5. The method of claim 4, wherein predicting respective risk rates for the at least one candidate topic further comprises:
acquiring the probability of account embezzlement as a second risk rate, and taking the sum of the first risk rate and the second risk rate as the total risk rate corresponding to the first candidate topic; the probability of the account theft occurring is obtained based on a pre-trained theft risk model.
6. The method of claim 1, wherein the at least one operational behavior is performed based on a designated application with payment functionality, including at least one of payment, login, password change, binding, active identity verification, participation in passive identity verification, participation in a marketing campaign, verification, activation, registration, and top-up.
7. The method of claim 1, wherein the behavioral additional data comprises environmental data and/or behavioral object data that does not uniquely correspond to a target user.
8. The method of claim 7, wherein the environmental data comprises device data and/or network environmental data corresponding to when the operational behavior occurs;
the device data comprises at least one of a device brand name, a device category, a device model, an operating system category and an operating system name;
the network environment data includes at least one of a network operator name and a network type.
9. The method of claim 7, wherein the behavioral object data includes at least one of a marketing campaign name, a top-up amount, a payment type, payment object attributes, and operational behavioral frequency engaged by the target user.
10. The method of claim 1, wherein a single candidate stem is used to ask additional data for action after the time window, operational action are determined;
respectively configuring corresponding answer candidates for the at least one candidate stem, including:
acquiring the behavior additional data corresponding to the target user when the operation behavior in the candidate stem occurred, and taking the behavior additional data as the correct answer of the stem;
adding at least one distracter based on the correct answer.
11. The method of claim 10, wherein adding at least one distracter based on the correct answer comprises:
acquiring a plurality of pieces of behavior additional data respectively corresponding to the operation behavior in the candidate stem when performed by different users, as alternatives for the distracter;
obtaining the prior probability values of the correct answer and of each alternative, and selecting, from the plurality of alternatives, an alternative whose prior probability value is higher than that of the correct answer, as a distracter.
12. The method of claim 11, wherein adding at least one distracter based on the correct answer further comprises:
selecting, from the plurality of alternatives, an alternative whose prior probability value exceeds a specified value and is lower than the prior probability value of the correct answer, as another distracter; and/or,
randomly selecting, from the plurality of alternatives, an alternative whose prior probability value exceeds the specified value, as another distracter.
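Claims 11 and 12 together describe a layered distracter-selection strategy over pairs of (alternative, prior probability). A hedged Python sketch of that strategy, assuming a hypothetical list-of-tuples representation and illustrative names:

```python
import random

def select_distracters(correct_prior, alternatives, threshold):
    """Sketch of the distracter selection in claims 11-12.

    alternatives: list of (value, prior_probability) pairs drawn from
    other users' behavior additional data; threshold is the claims'
    "specified value". Names and layout are assumptions.
    """
    distracters = []
    # Claim 11: an alternative a priori more plausible than the correct answer.
    higher = [a for a, p in alternatives if p > correct_prior]
    if higher:
        distracters.append(higher[0])
    # Claim 12, first option: prior above the specified value but below
    # the correct answer's prior.
    middle = [a for a, p in alternatives
              if threshold < p < correct_prior and a not in distracters]
    if middle:
        distracters.append(middle[0])
    # Claim 12, second option: a random alternative above the specified value.
    pool = [a for a, p in alternatives
            if p > threshold and a not in distracters]
    if pool:
        distracters.append(random.choice(pool))
    return distracters
```

Mixing distracters that are more plausible, comparably plausible, and randomly chosen (relative to the correct answer) is what keeps the correct option from standing out to a guessing attacker.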
13. An identity verification apparatus based on user privacy protection, the apparatus comprising:
an acquisition unit configured to acquire at least one operation behavior corresponding to a target user and/or at least one piece of behavior additional data corresponding to the operation behavior when the operation behavior occurs, wherein the behavior additional data is data that is not user privacy data and is related to an operation behavior of the user;
a combination unit configured to generate at least one candidate stem by combining the at least one operation behavior and/or the at least one piece of behavior additional data with a plurality of time windows, wherein a single candidate stem is used to ask about any one or two of the time window, the operation behavior and the behavior additional data after the remaining one or two are determined;
an obfuscation unit configured to respectively configure corresponding answer candidates for the at least one candidate stem to form at least one candidate topic, wherein the answer candidates comprise a correct answer and at least one distracter;
a screening unit configured to screen out, from the at least one candidate topic, at least one candidate topic meeting a preset requirement as a target topic;
a verification unit configured to verify an identity of the target user based on the at least one target topic.
14. The apparatus according to claim 13, wherein the apparatus further comprises a prediction unit configured to predict a respective pass rate and a respective risk rate for the at least one candidate topic; the pass rate is determined at least based on the probability that a legal user selects the correct answer when answering, and the risk rate is determined at least based on the probability that an illegal user selects the correct answer when answering;
the screening unit is specifically configured to:
and screening out, from the at least one candidate topic, at least one candidate topic whose risk rate is lower than a preset threshold and whose pass rate ranks within a preset top number, as the target topic.
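The screening rule in claim 14 (risk rate below a threshold, then the top-ranked pass rates) can be sketched as follows; the tuple layout and names are assumptions for illustration:

```python
def screen_targets(candidates, risk_threshold, top_n):
    # candidates: list of (topic, pass_rate, risk_rate) tuples.
    # Keep topics whose risk rate is below the threshold, then take
    # the top_n by pass rate (claim 14).
    safe = [c for c in candidates if c[2] < risk_threshold]
    safe.sort(key=lambda c: c[1], reverse=True)
    return [topic for topic, _, _ in safe[:top_n]]
```

The threshold filter runs first so that a high pass rate can never rescue a question that an attacker would also answer easily.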
15. The apparatus of claim 14, wherein the prediction unit is configured to:
predicting, through a pre-trained prediction model, the probability that the target user selects the correct answer when answering the corresponding candidate topic; the prediction model is trained based on first training samples, each comprising first sample features and a sample label indicating whether the correct answer was selected when a legal user actually answered; the first sample features comprise behavior features extracted from the operation behaviors of legal users and topic features extracted from each candidate topic.
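Claim 15 leaves the prediction model unspecified beyond its features and labels. As an illustrative stand-in (not the patent's actual model), a minimal hand-rolled logistic regression over combined behavior/topic feature vectors shows the shape of such a pass-rate predictor:

```python
import math

def train_pass_model(samples, labels, lr=0.1, epochs=200):
    """Toy logistic-regression sketch of claim 15's prediction model.

    samples: feature vectors combining behavior features of legal users
    and features of each candidate topic; labels: 1 if the user selected
    the correct answer when actually answering, else 0. All names and
    the choice of model are assumptions for illustration.
    """
    w = [0.0] * len(samples[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(samples, labels):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            p = 1.0 / (1.0 + math.exp(-z))     # predicted pass probability
            g = p - y                          # gradient of log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b

def predict_pass_rate(model, features):
    w, b = model
    z = sum(wi * xi for wi, xi in zip(w, features)) + b
    return 1.0 / (1.0 + math.exp(-z))
```

Any classifier emitting a probability would fit the claim; logistic regression is used here only because it is compact enough to show end to end.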
16. The apparatus of claim 14, wherein the prediction unit is further configured to:
for any first candidate topic in the at least one candidate topic, obtaining a prior probability value of each answer candidate item under the first candidate topic;
and calculating the ratio of the prior probability value of the correct answer to the sum of the prior probability values of all answer candidates under the first candidate topic, to serve as the first risk rate of the first candidate topic.
17. The apparatus of claim 16, wherein the prediction unit is further configured to:
acquiring the probability of account theft as a second risk rate, and taking the sum of the first risk rate and the second risk rate as the total risk rate corresponding to the first candidate topic; the probability of account theft is obtained based on a pre-trained theft risk model.
18. The apparatus of claim 13, wherein the at least one operation behavior is performed in a designated application having a payment function, and comprises at least one of: payment, login, password change, binding, initiating active identity verification, participating in passive identity verification, participating in a marketing campaign, verification, activation, registration, and top-up.
19. The apparatus of claim 13, wherein the behavior additional data comprises environmental data and/or behavior object data that does not uniquely correspond to the target user.
20. The apparatus of claim 19, wherein the environmental data comprises device data and/or network environment data corresponding to the time when the operation behavior occurred;
the device data comprises at least one of a device brand name, a device category, a device model, an operating system category and an operating system name;
the network environment data includes at least one of a network operator name and a network type.
21. The apparatus of claim 19, wherein the behavior object data comprises at least one of: a name of a marketing campaign the target user participated in, a top-up amount, a payment type, a payment object attribute, and an operation behavior frequency.
22. The apparatus of claim 13, wherein a single candidate stem is used to ask about the behavior additional data after the time window and the operation behavior are determined;
the obfuscation unit is specifically configured to:
acquiring the behavior additional data corresponding to the target user when the operation behavior in the candidate stem occurred, and taking the behavior additional data as the correct answer of the stem;
adding at least one distracter based on the correct answer.
23. The apparatus of claim 22, wherein the obfuscation unit is specifically configured to:
acquiring a plurality of pieces of behavior additional data respectively corresponding to the operation behavior in the candidate stem when performed by different users, as alternatives for the distracter;
obtaining the prior probability values of the correct answer and of each alternative, and selecting, from the plurality of alternatives, an alternative whose prior probability value is higher than that of the correct answer, as a distracter.
24. The apparatus of claim 23, wherein the obfuscation unit is further configured to:
selecting, from the plurality of alternatives, an alternative whose prior probability value exceeds a specified value and is lower than the prior probability value of the correct answer, as another distracter; and/or,
randomly selecting, from the plurality of alternatives, an alternative whose prior probability value exceeds the specified value, as another distracter.
25. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed in a computer, causes the computer to perform the method of any of claims 1-12.
26. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-12.
CN202010762416.2A 2020-07-31 2020-07-31 Identity verification method and device based on user privacy protection Pending CN111784358A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010762416.2A CN111784358A (en) 2020-07-31 2020-07-31 Identity verification method and device based on user privacy protection

Publications (1)

Publication Number Publication Date
CN111784358A true CN111784358A (en) 2020-10-16

Family

ID=72765507

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010762416.2A Pending CN111784358A (en) 2020-07-31 2020-07-31 Identity verification method and device based on user privacy protection

Country Status (1)

Country Link
CN (1) CN111784358A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102035649A (en) * 2009-09-29 2011-04-27 国际商业机器公司 Authentication method and device
CN105099675A (en) * 2014-04-17 2015-11-25 阿里巴巴集团控股有限公司 Method and device for generating authentication data for identity authentication and method and device for identity authentication
CN105989256A (en) * 2015-02-09 2016-10-05 阿里巴巴集团控股有限公司 User behaviour based data verification method and device
CN109428719A (en) * 2017-08-22 2019-03-05 阿里巴巴集团控股有限公司 A kind of auth method, device and equipment
CN109948038A (en) * 2017-11-24 2019-06-28 阿里巴巴集团控股有限公司 Question pushing method and device


Similar Documents

Publication Publication Date Title
US10509837B2 (en) Modeling actions for entity-centric search
Rivas An experiment on corruption and gender
US20200250511A1 (en) Artist comprehensive ability evaluation and cultivation assistant system based on artificial intelligence
US8775332B1 (en) Adaptive user interfaces
Wang et al. Methods for correcting inference based on outcomes predicted by machine learning
WO2020253109A1 (en) Resource information recommendation method and apparatus based on speech recognition, and terminal and medium
Chen et al. Real or bogus: Predicting susceptibility to phishing with economic experiments
Goode Artificial intelligence and the future of nationalism
Chatterjee Impact on addiction of online platforms on quality of life: Age and Gender as moderators
US20240062211A1 (en) User Authentication Based on Account Transaction Information in Text Field
Mahalingam et al. Predicting financial savings decisions using sigmoid function and information gain ratio
CN111598632B (en) Method and device for determining equity shares and equity share sequence
Ameen Arab Users' Acceptance and Use of Mobile Phones: A Case of Young Users in Iraq, Jordan and UAE.
Cutts et al. Did local activism really matter? Liberal Democrat campaigning and the 2001 British general election
CN111784358A (en) Identity verification method and device based on user privacy protection
US11886570B2 (en) Third party data processing for improvement of authentication questions
US20230037692A1 (en) Static Authentication Questions for Account Authentication
Ahn et al. Interpretable deep learning approach to churn management
CN111563778B (en) Information pushing method and device
Cox et al. Cultural identities and resolution of social dilemmas
Zheng et al. Delve into ppo: Implementation matters for stable rlhf
Dowpiset et al. An Investigation of Factors Affecting Intention to Comply Thailand PDPA with E-Services in Private University towards Social Media
Mbele-Sibotshiwe A study of the perceptions and adoption of Mobile Payment Platforms by entrepreneurs in Zimbabwe's informal economy
TWI837066B (en) Information processing devices, methods and program products
US20230401973A1 (en) Methods, systems, apparatuses, and devices for facilitating evaluation of user knowledge using multiple-choice questions

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40039479

Country of ref document: HK

RJ01 Rejection of invention patent application after publication

Application publication date: 20201016