WO2020015089A1

WO2020015089A1 - Identity information risk assessment method and apparatus, and computer device and storage medium

Info

Publication number: WO2020015089A1
Application number: PCT/CN2018/104806
Authority: WO
Inventors: 孙静远; 徐亮; 肖京
Original assignee: 平安科技（深圳）有限公司
Priority date: 2018-07-18
Filing date: 2018-09-10
Publication date: 2020-01-23
Also published as: CN109242740A

Abstract

Provided is an identity information risk assessment method, comprising: receiving identity data; extracting an identity feature parameter from the identity data; searching for historical verification data corresponding to the identity data, and extracting a verification time parameter from the historical verification data; inputting the identity feature parameter and the verification time parameter into a preset risk assessment model to obtain an identity risk probability; and generating a risk assessment result according to the identity risk probability.

Description

Identity information risk assessment method, device, computer equipment and storage medium

Cross-reference to related applications

This application claims priority to Chinese patent application No. 2018107914492, filed with the Chinese Patent Office on July 18, 2018 and entitled "Identity Information Risk Assessment Method, Device, Computer Equipment, and Storage Medium", the entire contents of which are incorporated herein by reference.

Technical field

The present application relates to an identity information risk assessment method, device, computer equipment, and storage medium.

Background technique

Large numbers of passengers pass through airports, ports, and other entry and exit points every day, and some of them are smugglers and other lawbreakers.

When conducting security checks on passengers at entry and exit points, security personnel usually rely on their work experience to judge whether a passenger poses a security risk. However, because of the large volume of people passing through customs every day, manual inspection alone can detect only a very limited share of passenger security risks. As a result, the accuracy of security checks at entry and exit points is low and many criminals slip through the net.

Summary of the invention

According to various embodiments disclosed in the present application, an identity information risk assessment method, apparatus, computer equipment, and storage medium are provided.

An identity information risk assessment method includes:

Receiving identity data;

Extracting identity characteristic parameters from the identity data;

Finding historical verification data corresponding to the identity data, and extracting verification time parameters from the historical verification data;

Inputting the identity characteristic parameter and the verification time parameter into a preset risk assessment model to obtain an identity risk probability; and

Generating a risk assessment result according to the identity risk probability.

An identity information risk assessment device includes:

An identity data acquisition module, configured to receive identity data;

An identity parameter extraction module, configured to extract identity characteristic parameters from the identity data;

A time parameter acquisition module, configured to find historical verification data corresponding to the identity data, and extract verification time parameters from the historical verification data;

A risk probability obtaining module, configured to input the identity characteristic parameter and the verification time parameter into a preset risk assessment model to obtain an identity risk probability; and

A risk result generating module, configured to generate a risk assessment result according to the identity risk probability.

A computer device includes a memory and one or more processors. The memory stores computer-readable instructions which, when executed by the one or more processors, cause the one or more processors to perform the following steps:

Receiving identity data;

Extracting identity characteristic parameters from the identity data;

One or more non-volatile storage media storing computer-readable instructions. When the computer-readable instructions are executed by one or more processors, the one or more processors execute the following steps:

Receiving identity data;

Extracting identity characteristic parameters from the identity data;

Details of one or more embodiments of the present application are set forth in the accompanying drawings and description below. Other features and advantages of the application will become apparent from the description, the drawings, and the claims.

BRIEF DESCRIPTION OF THE DRAWINGS

In order to explain the technical solutions in the embodiments of the present application more clearly, the drawings used in the embodiments are briefly introduced below. The drawings in the following description show only some embodiments of the present application. Those of ordinary skill in the art can obtain other drawings from these drawings without creative effort.

FIG. 1 is an application scenario diagram of an identity information risk assessment method according to one or more embodiments.

FIG. 2 is a schematic flowchart of an identity information risk assessment method according to one or more embodiments.

FIG. 3 is a schematic flowchart of a method for generating a preset risk assessment model according to one or more embodiments.

FIG. 4 is a structural block diagram of an identity information risk assessment device according to one or more embodiments.

FIG. 5 is an internal structural diagram of a computer device according to one or more embodiments.

Detailed description

In order to make the technical solutions and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are only used to explain the application and are not intended to limit it.

The identity information risk assessment method provided in this application can be applied to the application environment shown in FIG. 1. The terminal 102 communicates with the server 104 through a network. The server 104 receives the passenger identity data sent by the terminal 102, extracts identity characteristic parameters from the received identity data, finds historical verification data corresponding to the identity data, extracts verification time parameters from the historical verification data, inputs the identity characteristic parameters and the verification time parameters into a preset risk assessment model to obtain an identity risk probability, and generates a risk assessment result according to the identity risk probability. The server 104 then returns the generated risk assessment result to the terminal 102. The terminal 102 may be, but is not limited to, a personal computer, notebook computer, smart phone, tablet computer, or portable wearable device. The server 104 may be implemented as an independent server or as a server cluster composed of multiple servers.

In one embodiment, as shown in FIG. 2, an identity information risk assessment method is provided. The method is described using its application to the server 104 in FIG. 1 as an example, and includes the following steps:

Step 210: Receive identity data.

Identity data is data that can uniquely determine the identity of a passenger, such as the document type and document number of an identity card, visa, student ID, or other document.

Staff at the security terminal can collect the identity data of the passengers being checked through identity information collection equipment such as a card reader, which transmits the collected identity data to the security terminal. Staff can also enter the passenger's identity data directly. The security terminal sends the acquired identity data to the server, and the server receives the identity data sent by the security terminal.

Step 220: Extract identity characteristic parameters from the identity data.

The identity characteristic parameter is a parameter for characterizing the passenger. The identity characteristic parameter may include parameters such as passenger age, passenger origin, and passenger gender. The passenger's identity data includes the passenger's characteristic parameters, and the server extracts the identity characteristic parameters from the received identity data.

In one of the embodiments, the step of extracting identity characteristic parameters from the identity data may include: extracting a document number from the identity data; identifying the document format of the document number and finding the document type corresponding to the format recognition result; segmenting the document number to obtain segmented character strings; and finding the identity characteristic parameter corresponding to each segmented character string.

The server extracts the document number from the identity data and recognizes its document format, such as the length of the number and its alphanumeric composition. The mapping between document types and document formats is stored in the server in advance, and the server looks up the document type corresponding to the recognized document format. In this embodiment, the document types may include an identity card, a pass, a home permit, and a passport.

In the document numbers of each document type, the character string at a preset position corresponds to a particular identity characteristic parameter. The server obtains the preset positions and preset lengths of the character strings for the document type and segments the document number accordingly to obtain the segmented character strings. The server then obtains the data conversion table of each preset character string for the document type. The data conversion table stores the correspondence between the specific values of each preset character string and the identity characteristic parameter. The server looks up the identity characteristic parameter corresponding to each segmented character string in the data conversion table.

For example, if the document type is an identity card, then after the document number is segmented, its first three digits form one segmented character string, which corresponds to the passenger's origin. The server obtains the data conversion table for the first three digits of the document number. If the first three digits are "410", the passenger origin parameter found for "410" in the data conversion table is "Henan". In the same way, the server looks up the identity characteristic parameters corresponding to all the segmented character strings.
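The format recognition, segmentation, and table lookup described above can be sketched as follows. This is a minimal illustration only: the format rules, segment layout, table contents, and all names (`FORMAT_TO_TYPE`, `SEGMENTS`, `CONVERSION`, `extract_identity_features`) are assumptions made for the example. Only the "410" to "Henan" entry comes from the text.

```python
from typing import Dict, List, Tuple

# Hypothetical format rules: (number length, all digits?) -> document type.
FORMAT_TO_TYPE: Dict[Tuple[int, bool], str] = {
    (18, True): "identity_card",
    (9, False): "passport",
}

# Hypothetical segment layout per document type: (parameter name, start, length).
SEGMENTS: Dict[str, List[Tuple[str, int, int]]] = {
    "identity_card": [("origin", 0, 3)],
}

# Hypothetical data conversion tables; only "410" -> "Henan" is from the text.
CONVERSION: Dict[str, Dict[str, Dict[str, str]]] = {
    "identity_card": {"origin": {"410": "Henan"}},
}


def extract_identity_features(doc_number: str) -> Dict[str, str]:
    """Recognize the format, segment the number, and look up each segment."""
    doc_format = (len(doc_number), doc_number.isdigit())
    doc_type = FORMAT_TO_TYPE.get(doc_format)
    if doc_type is None:
        return {}  # unknown format: no features can be extracted
    features = {"document_type": doc_type}
    for name, start, length in SEGMENTS.get(doc_type, []):
        segment = doc_number[start:start + length]
        value = CONVERSION.get(doc_type, {}).get(name, {}).get(segment)
        if value is not None:
            features[name] = value
    return features
```

Keeping the layout and conversion tables as data rather than code means a new document type would only need new table entries.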

Step 230: Find historical verification data corresponding to the identity data, and extract verification time parameters from the historical verification data.

The historical verification data is the historical record data of passengers' security inspections. The historical record data may include data such as the security inspection time and security inspection results of passengers during previous security inspections. The server searches the historical verification data corresponding to the passenger according to the document number in the passenger identity data.

The verification time parameters may include the frequency with which the passenger undergoes security inspection at the entry and exit point, the time period of each security inspection, and the time period to which the current security inspection belongs. Specifically, the frequency of security inspections can be expressed as a daily, weekly, or monthly security inspection frequency, and can be set by the staff according to actual security inspection requirements. The server performs statistics on the historical verification data and derives the various verification time parameters from it.
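As a rough sketch of how such statistics might be derived, the function below counts past inspections in daily, 7-day, and 30-day windows and buckets the current time into a six-hour period. The window lengths, the bucket size, and the names are assumptions for illustration; the text leaves these choices to the staff.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def verification_time_parameters(history: List[datetime], now: datetime) -> Dict[str, int]:
    """Count past inspections in several windows ending at `now`."""
    day_start = now.replace(hour=0, minute=0, second=0, microsecond=0)
    week_start = now - timedelta(days=7)    # assumed rolling 7-day window
    month_start = now - timedelta(days=30)  # assumed rolling 30-day window
    return {
        "checks_today": sum(1 for t in history if day_start <= t <= now),
        "checks_last_7_days": sum(1 for t in history if week_start < t <= now),
        "checks_last_30_days": sum(1 for t in history if month_start < t <= now),
        # Assumed bucketing of the current inspection time: 0..3, six hours each.
        "current_period": now.hour // 6,
    }
```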

Step 240: Input the identity characteristic parameters and the verification time parameters into a preset risk assessment model to obtain an identity risk probability.

The server obtains a preset risk assessment model, and the preset risk assessment model is a preset model for assessing passenger safety risks. The input of the preset risk assessment model is various identity characteristic parameters and verification time parameters, and the output is the probability that the passenger has a security risk. The server inputs the extracted identity characteristic parameters and verification time parameters into a preset risk assessment model, and the preset risk assessment model calculates and processes the parameters to obtain the identity risk probability.

Step 250: Generate a risk assessment result according to the identity risk probability.

The server generates a risk assessment result based on the calculated identity risk probability. The risk assessment result may include information such as identity risk probability, passenger historical security information, and security deployment recommendations.

In this embodiment, the server extracts identity characteristic parameters from the received passenger identity data, finds the historical verification data corresponding to the identity data, and inputs the identity characteristic parameters together with the verification time parameters extracted from the historical verification data into the preset risk assessment model to obtain the passenger's identity risk probability. The passenger's security risk is thus calculated and evaluated systematically from the passenger's characteristics and historical data, which improves the accuracy of security inspection.

In one embodiment, as shown in FIG. 3, the method for generating a preset risk assessment model includes:

Step 201: Collect sample data, and divide the sample data into training set data and test set data.

The sample data is historical data from security inspections at real security inspection sites. The server collects historical security inspection data within a preset time range, which can be set to 1 month, 3 months, half a year, and so on. The server randomly divides the collected sample data into training set data and test set data. The training set data and the test set data may contain the same or different numbers of samples.
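A random division of this kind can be sketched in a few lines. The function name, the `train_fraction` parameter, and the fixed seed are assumptions for the example; the text does not prescribe a split ratio.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_samples(samples: Sequence[T], train_fraction: float = 0.5,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Shuffle a copy of the samples and cut it into training and test sets."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is reproducible
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]
```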

Step 203: Extract a first feature parameter and a first target category from the training set data.

According to the inspection results of the security inspection, the sample data can be divided into positive sample data and negative sample data. The positive sample data is the historical security data of passengers whose inspection results are normal, and the negative sample data is the historical security data of passengers whose inspection results are abnormal. Both the training set data and the test set data contain both positive sample data and negative sample data.

The server extracts the first feature parameter and the first target category one by one from each sample of the training set data. The first feature parameter includes an identity characteristic parameter and a verification time parameter. That is, it corresponds to the identity characteristic parameter extracted from passenger identity data in an actual security check and the verification time parameter extracted from the passenger's historical verification data. The first target category is the category of the security inspection result, and is one of two categories: normal security check and abnormal security check.

Step 205: Perform feature gain evaluation according to the first feature parameter and the first target category, perform feature selection according to the feature gain evaluation result, classify on the selected features to obtain an initial decision tree risk assessment model, and calculate, based on the training set data, the risk probability of each classification node in the initial decision tree risk assessment model.

In this embodiment, the preset risk assessment model is a decision tree model. A decision tree is a tree structure, composed of nodes and directed edges, that is used to classify instances. There are two types of nodes: internal nodes and leaf nodes. Internal nodes represent test conditions on features or attributes, and leaf nodes represent classifications. To classify an instance with the decision tree model, start from the root node, test a feature of the instance, and assign the instance to a child node according to the test result. If that child is another internal node, apply its test condition and repeat recursively until a leaf node is reached. The leaf node gives the final classification result and serves as the classification node.

In this embodiment, the ID3 algorithm is used to construct the initial decision tree risk assessment model. The ID3 algorithm evaluates the information gain of each feature and, at each step, selects the feature parameter with the largest information gain as the test condition for establishing child nodes. The server calculates the information gain of each feature corresponding to the first feature parameter, selects the feature with the largest information gain as the test condition, divides the training set data at the node into subsets according to that feature, and recursively branches on each subset to establish branch nodes until the samples at every branch node belong to the same first target category.

Specifically, the server uses the following formula (1) to calculate the information gain of each feature corresponding to the first feature parameter:

$$g(D, A) = H(D) - H(D \mid A) \tag{1}$$

Here, g(D, A) is the information gain of feature A on the training data set D, H(D) is the empirical entropy of the training data set D, and H(D | A) is the empirical conditional entropy of D given feature A.

The server uses the following formula (2) to calculate the empirical entropy H(D) of the training data set D:

$$H(D) = -\sum_{k=1}^{K} \frac{|C_k|}{|D|} \log_2 \frac{|C_k|}{|D|} \tag{2}$$

Here, |C_k| is the number of samples in D that belong to the k-th first target category, |D| is the total number of samples in D, and K is the number of first target categories. In this embodiment, the first target category is divided into two types, normal security check and abnormal security check, so K = 2.

The server uses the following formula (3) to calculate the empirical conditional entropy H(D | A) of the training data set D given feature A:

$$H(D \mid A) = \sum_{i \in \mathrm{value}(A)} \frac{|D_i|}{|D|} H(D_i) \tag{3}$$

Here, value(A) is the set of all values of feature A, i is one value of feature A, D_i is the subset of the training data set D in which feature A takes the value i, |D_i| is the number of samples in D_i, and |D| is the total number of samples before the data set is divided. For example, the feature A corresponding to the gender characteristic parameter takes the values male and female. If male is represented by 0 and female by 1, value(A) is {0, 1}.
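The information gain calculation can be illustrated with a short sketch that follows the usual ID3 definitions of empirical entropy and empirical conditional entropy, with base-2 logarithms. The function names and the toy data in the usage note are assumptions, not part of the original description.

```python
import math
from collections import Counter
from typing import Dict, Hashable, List, Sequence


def entropy(labels: Sequence[Hashable]) -> float:
    """Empirical entropy H(D) of a list of target categories."""
    total = len(labels)
    if total == 0:
        return 0.0
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def conditional_entropy(values: Sequence[Hashable],
                        labels: Sequence[Hashable]) -> float:
    """Empirical conditional entropy H(D|A): weighted entropy of each D_i."""
    total = len(labels)
    groups: Dict[Hashable, List[Hashable]] = {}
    for value, label in zip(values, labels):
        groups.setdefault(value, []).append(label)
    return sum(len(g) / total * entropy(g) for g in groups.values())


def information_gain(values: Sequence[Hashable],
                     labels: Sequence[Hashable]) -> float:
    """g(D, A) = H(D) - H(D|A)."""
    return entropy(labels) - conditional_entropy(values, labels)
```

With labels `["normal", "normal", "abnormal", "abnormal"]`, a feature with values `[0, 0, 1, 1]` separates the two categories perfectly and has a gain of 1 bit, while a feature with values `[0, 1, 0, 1]` carries no information and has a gain of 0.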

The server uses the Hunt algorithm to build the decision tree recursively. After calculating the information gain of each feature parameter and selecting the feature with the largest information gain, it partitions the training set data by that feature and applies the same feature selection to each resulting subset, so that the training data set is gradually divided into purer subsets.

The recursive definition of the Hunt algorithm is as follows. Let Dt be the subset of training data associated with node t, and let y = {y1, y2, ..., yc} be the target category labels. If all sample data in Dt belong to the same category yt, then t is a leaf node labeled yt. If Dt contains sample data belonging to multiple categories, a feature test condition is selected to divide the sample data into smaller subsets. For each outcome of the test condition, a branch node is created and the sample data in Dt is distributed to the branch nodes according to the test results. The algorithm is then called recursively on each branch node.
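A compact sketch of this recursion, with the test condition chosen by largest information gain as in ID3, might look like the following. The dictionary tree representation and all names are assumptions for illustration. The sketch also adds one stopping rule that the text does not state: when no features remain, the node becomes a leaf labeled with the majority category.

```python
import math
from collections import Counter
from typing import Any, Dict, List, Tuple

Sample = Tuple[Dict[str, Any], str]  # (feature values, target category)


def entropy(labels: List[str]) -> float:
    total = len(labels)
    return -sum(n / total * math.log2(n / total)
                for n in Counter(labels).values())


def gain(samples: List[Sample], feature: str) -> float:
    """Information gain of `feature` on `samples`."""
    groups: Dict[Any, List[str]] = {}
    for x, y in samples:
        groups.setdefault(x[feature], []).append(y)
    cond = sum(len(g) / len(samples) * entropy(g) for g in groups.values())
    return entropy([y for _, y in samples]) - cond


def build_tree(samples: List[Sample], features: List[str]) -> Dict[str, Any]:
    labels = [y for _, y in samples]
    # Leaf: all samples share one category (or, by assumption, no features left).
    if len(set(labels)) == 1 or not features:
        return {"leaf": Counter(labels).most_common(1)[0][0]}
    best = max(features, key=lambda f: gain(samples, f))
    remaining = [f for f in features if f != best]
    branches: Dict[Any, List[Sample]] = {}
    for x, y in samples:
        branches.setdefault(x[best], []).append((x, y))
    # One branch node per observed outcome of the test; recurse on each subset.
    return {"feature": best,
            "children": {v: build_tree(s, remaining) for v, s in branches.items()}}
```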

After building the initial decision tree risk assessment model, the server uses the first feature parameters and the first target category of each sample in the training data set to count the negative sample data in the training data set that matches the feature parameter combination corresponding to each classification node. It then calculates the ratio of the counted negative sample data to the total negative sample data in the training data set and uses this ratio as the risk probability of that classification node.
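The ratio described above can be sketched as follows, assuming a classification node is represented by the feature values along its decision path and that negative samples carry the label "abnormal". Both the representation and the names are assumptions for the example.

```python
from typing import Any, Dict, List, Tuple

Sample = Tuple[Dict[str, Any], str]  # (feature values, target category)


def node_risk_probability(path_conditions: Dict[str, Any],
                          samples: List[Sample],
                          negative_label: str = "abnormal") -> float:
    """Share of all negative samples that match the node's feature combination."""
    negatives = [x for x, y in samples if y == negative_label]
    if not negatives:
        return 0.0  # no negative samples at all: nothing to attribute
    matched = sum(1 for x in negatives
                  if all(x.get(f) == v for f, v in path_conditions.items()))
    return matched / len(negatives)
```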

Step 207: Extract a second feature parameter and a second target category from the test set data.

The server extracts the second feature parameter and the second target category one by one from each sample of the test set data. The second feature parameter includes an identity characteristic parameter and a verification time parameter. That is, it corresponds to the identity characteristic parameter extracted from passenger identity data in an actual security check and the verification time parameter extracted from the passenger's historical verification data. The second target category is the category of the security inspection result, and is one of two categories: normal security check and abnormal security check.

Step 209: Verify the risk probability of each classification node in the initial decision tree risk assessment model according to the second feature parameter and the second target category, and adjust the initial decision tree risk assessment model according to the verification result to generate the preset risk assessment model.

Based on the second feature parameters and the second target category of each sample in the test data set, the server counts the negative sample data in the test data set that matches the feature parameter combination corresponding to each classification node in the initial decision tree risk assessment model, calculates the proportion of the counted negative sample data in the total negative sample data in the test data set, and verifies the risk probability of each classification node against the calculated proportion. For the verification, the server can set a preset tolerance. When the absolute difference between the calculated proportion and the risk probability is less than the preset tolerance, the verification passes; when it is greater than the preset tolerance, the verification fails. When the verification fails, the server can add the sample data in the test data set to the training data set, retrain the initial decision tree risk assessment model on the enlarged sample, and adjust it to generate the preset risk assessment model.
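The tolerance check can be sketched as below. The node identifiers, the default tolerance of 0.05, and the function names are assumptions. The text does not say what happens when the difference equals the tolerance exactly; this sketch treats that case as a failure.

```python
from typing import Dict


def verify_node_probabilities(trained: Dict[str, float],
                              observed: Dict[str, float],
                              tolerance: float = 0.05) -> Dict[str, bool]:
    """Per node: True if |observed proportion - trained probability| < tolerance."""
    return {node: abs(observed.get(node, 0.0) - probability) < tolerance
            for node, probability in trained.items()}


def needs_retraining(results: Dict[str, bool]) -> bool:
    """Retrain on the enlarged sample if any node failed verification."""
    return not all(results.values())
```

For trained probabilities `{"a": 0.30, "b": 0.10}` and test-set proportions `{"a": 0.32, "b": 0.25}`, node `a` passes, node `b` fails, and the model would be retrained.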

In one embodiment, the identity information risk assessment method may further include: when the verification data update time is reached, loading the updated verification data; extracting from the verification data a third characteristic parameter and a risk target mark corresponding to the preset risk assessment model; and verifying the risk probability of each classification node in the preset risk assessment model according to the third characteristic parameter and the risk target mark, and optimizing the preset risk assessment model according to the verification result.

The server presets a verification data update time, which is the time at which the security check data of the security inspection site is updated. When the preset verification data update time is reached, the server loads the updated verification data. The verification data includes the passenger's identity data, verification time, and security check results. The security terminal can send the updated verification data to the server either actively or on request.

The server extracts the third characteristic parameter and the risk target mark from the verification data. The third characteristic parameter corresponds to the feature set in the preset risk assessment model. The risk target mark is a security check result mark, of which there are two types: a no-security-risk mark and a security-risk mark.

According to the third characteristic parameter and the risk target mark of each sample in the verification data, the server counts the negative sample data in the verification data that matches the characteristic parameter combination corresponding to each classification node in the preset risk assessment model, calculates the proportion of the counted negative sample data in the total negative sample data in the verification data, and verifies the risk probability of each classification node against the calculated proportion. For the verification, the server can set a preset deviation. When the absolute difference between the calculated proportion and the risk probability is less than the preset deviation, the verification passes; when it is greater than the preset deviation, the verification fails. When the verification fails, the server can continue to train and adjust the preset risk assessment model with the verification data, so that the model is continuously optimized as verification data accumulates and the risk assessment results it produces become increasingly accurate.

In one of the embodiments, the step of generating a risk assessment result according to the identity risk probability may include: finding, from the preset risk assessment model, the decision path corresponding to the identity risk probability with the highest probability value; obtaining the node data of the decision path; and generating and outputting a seizure path map according to the node data and the identity risk probability with the highest probability value.

After the server inputs the extracted identity characteristic parameters and verification time parameters into the preset risk assessment model, these parameters may match the feature parameters of multiple decision paths in the model, so identity risk probabilities for multiple matching classification nodes may be obtained. For example, if the characteristic parameter corresponding to the classification nodes is the clearance frequency, the input parameters may satisfy both decision path one, whose classification node is "number of security checks today" with the node feature value "more than twice", and decision path two, whose classification node is "number of security checks in the last natural week" with the node feature value "between 8 and 15". The identity risk probability corresponding to decision path one is 21%, and the identity risk probability corresponding to decision path two is 25%. The server finds the decision path corresponding to the identity risk probability with the highest probability value from the calculation results of the preset risk assessment model.

The server obtains the characteristic parameters corresponding to each node in the found decision path, where the nodes include internal branch nodes and classification leaf nodes. The server chains the characteristic parameters of all nodes together to generate a seizure path map, adds the final identity risk probability to the seizure path map, and returns the seizure path map to the security terminal for display. The output of the preset risk assessment model is thus presented visually, so that security personnel can clearly understand the characteristics of the current passenger and the potential security risks, and can decide from the seizure path map whether to inspect the current passenger further.

For example, suppose the passenger's identity characteristic and verification time parameters input to the model are "Zhang San, male, 24 years old, Chinese, born in Guangdong, third verification this month". These match the decision path "male - 20 to 30 years old - Chinese - born in South China - 3 to 5 checks per month" in the preset risk assessment model, which has a risk probability of 30% and is the highest-probability decision path among those matching all the characteristic parameters. A seizure path map is generated from this decision path and its risk probability.
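Selecting the highest-probability matching path and rendering it as a chained path map can be sketched as follows. The representation of a decision path as a list of labeled predicates plus a leaf probability, the text rendering, and all names are assumptions for illustration. The 21% and 25% values mirror the clearance frequency example in the text.

```python
from typing import Any, Callable, Dict, List, Optional, Tuple

Passenger = Dict[str, Any]
NodeTest = Tuple[str, Callable[[Passenger], bool]]  # (node label, test condition)
Path = Tuple[List[NodeTest], float]                 # (node tests, risk probability)


def best_matching_path(passenger: Passenger, paths: List[Path]) -> Optional[Path]:
    """Among paths whose tests all hold, return the one with the highest probability."""
    matching = [p for p in paths if all(test(passenger) for _, test in p[0])]
    return max(matching, key=lambda p: p[1]) if matching else None


def path_map(path: Path) -> str:
    """Chain the node labels in order and append the final risk probability."""
    nodes, probability = path
    return " -> ".join(label for label, _ in nodes) + f" -> risk {probability:.0%}"


# Hypothetical decision paths based on the clearance frequency example.
example_paths: List[Path] = [
    ([("security checks today: more than twice",
       lambda p: p["checks_today"] > 2)], 0.21),
    ([("security checks in last week: between 8 and 15",
       lambda p: 8 <= p["checks_last_week"] <= 15)], 0.25),
]
example_passenger = {"checks_today": 3, "checks_last_week": 9}
best = best_matching_path(example_passenger, example_paths)
```

Here both paths match the example passenger, and the second is selected because 25% is greater than 21%.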

In one of the embodiments, the step of generating a risk assessment result according to the identity risk probability may include: obtaining the identity risk probability with the highest probability value; obtaining current security manpower data and finding the security passenger flow threshold corresponding to the current security manpower data; obtaining preset threshold conversion data and calculating a risk probability threshold according to the security passenger flow threshold and the preset threshold conversion data; and, when the identity risk probability with the highest probability value exceeds the risk probability threshold, generating and outputting a risk check alert.

The server obtains the identity risk probability corresponding to each classification node of the preset risk assessment model, and selects the identity risk probability with the largest probability value from it.

The security passenger flow threshold is the maximum value of the passenger flow corresponding to the current security manpower that can perform security checks. The server obtains the current security manpower data. The current security manpower data may include data such as the total security manpower deployed at the current security checkpoint, and the security manpower deployed at the current security checkpoint corresponding to the security terminal. The server obtains the mapping relationship between the pre-stored security manpower data and the security passenger flow threshold, including the mapping relationship between the total security manpower and the total security passenger flow threshold, and the mapping relationship between the security manpower at the current security checkpoint and the corresponding security passenger flow threshold. The server looks up the total security passenger flow threshold corresponding to the current total security manpower, and finds the security checkpoint security passenger flow threshold corresponding to the current security manpower at the security checkpoint.

The risk probability threshold is the minimum value of the identity risk probability that can determine that the passenger has a security risk. The risk probability threshold is not fixed, but adjusted according to the security manpower. When the security manpower is sufficient, the risk probability threshold is set to be relatively small. Conversely, the risk probability threshold is set relatively large.

The server obtains preset threshold conversion data, which is used to convert between the security passenger flow threshold and the risk probability threshold. The conversion data may be a mapping table between the security passenger flow threshold and the risk probability threshold, or may be a preset conversion formula, etc. The server calculates the risk probability thresholds corresponding to the security passenger flow thresholds according to the preset threshold conversion data, including a first risk probability threshold corresponding to the total security passenger flow threshold and a second risk probability threshold corresponding to the security passenger flow threshold of the security checkpoint. The smaller of the first risk probability threshold and the second risk probability threshold is used as the risk probability threshold.
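By way of illustration and not limitation, the threshold calculation described above may be sketched as follows. The sketch assumes that the conversion data is a mapping table; all table values, dictionary names, and function names are hypothetical and are not specified in this application.

```python
from bisect import bisect_right

# Hypothetical mapping tables: security manpower -> security passenger flow threshold.
TOTAL_FLOW_BY_MANPOWER = {10: 500, 20: 1200, 30: 2000}
CHECKPOINT_FLOW_BY_MANPOWER = {1: 40, 2: 90, 3: 150}

# Hypothetical conversion tables: (flow lower bound, risk probability threshold),
# sorted by flow. More inspection capacity maps to a smaller threshold.
TOTAL_CONVERSION = [(0, 0.9), (1000, 0.7), (2000, 0.5)]
CHECKPOINT_CONVERSION = [(0, 0.9), (50, 0.8), (100, 0.6)]


def convert(flow_threshold, table):
    """Return the risk probability threshold of the last row whose bound <= flow."""
    bounds = [bound for bound, _ in table]
    index = max(bisect_right(bounds, flow_threshold) - 1, 0)
    return table[index][1]


def risk_probability_threshold(total_manpower, checkpoint_manpower):
    """Take the smaller of the first and second risk probability thresholds."""
    total_flow = TOTAL_FLOW_BY_MANPOWER[total_manpower]
    checkpoint_flow = CHECKPOINT_FLOW_BY_MANPOWER[checkpoint_manpower]
    first = convert(total_flow, TOTAL_CONVERSION)
    second = convert(checkpoint_flow, CHECKPOINT_CONVERSION)
    return min(first, second)
```

Taking the minimum means that spare capacity at either level (the site as a whole or the individual checkpoint) is enough to lower the bar for a further check. A preset conversion formula could replace the table lookup in `convert` without changing the rest of the sketch.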

The server compares the identity risk probability with the largest probability value against the calculated risk probability threshold. When the identity risk probability with the largest probability value is less than or equal to the risk probability threshold, the current passenger passes the security check, and the server can generate a security check pass notification and return it to the security terminal. When the identity risk probability with the largest probability value exceeds the risk probability threshold, the server generates a risk check warning prompt, which can carry the calculated identity risk probability of the current passenger. When the current passenger's historical verification data contains an abnormal security check record, the abnormal record information is also added to the risk check warning prompt. The server sends the generated risk check warning prompt to the security terminal to remind the staff at the security checkpoint that the passenger poses a certain security risk and requires a further security check.
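The comparison described above may be sketched as follows, purely as an illustrative example. The `CheckResult` type, its field names, and the message strings are hypothetical; the application does not prescribe a data structure for the prompt.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class CheckResult:
    passed: bool
    message: str
    risk_probability: float
    abnormal_records: List[str] = field(default_factory=list)


def generate_check_result(
    max_risk_probability: float,
    risk_threshold: float,
    abnormal_records: Optional[List[str]] = None,
) -> CheckResult:
    """Pass at or below the threshold; otherwise build a warning prompt."""
    if max_risk_probability <= risk_threshold:
        return CheckResult(True, "security check passed", max_risk_probability)
    # The warning carries the probability and any abnormal historical records.
    return CheckResult(
        False,
        "risk check warning",
        max_risk_probability,
        list(abnormal_records or []),
    )
```

Note that a probability exactly equal to the threshold passes, matching the "less than or equal to" condition in the text.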

It should be understood that although the steps in the flowcharts of FIGS. 2-3 are displayed sequentially in the direction of the arrows, these steps are not necessarily performed in the order indicated by the arrows. Unless explicitly stated in this document, the execution order of these steps is not strictly limited, and they can be performed in other orders. Moreover, at least some of the steps in FIGS. 2-3 may include multiple sub-steps or stages. These sub-steps or stages are not necessarily performed at the same time, but may be performed at different times. Nor are they necessarily performed sequentially; they may be performed in turn or alternately with other steps, or with at least some of the sub-steps or stages of other steps.

In one embodiment, as shown in FIG. 4, an identity information risk assessment device is provided, including: an identity data acquisition module 410, an identity parameter extraction module 420, a time parameter acquisition module 430, a risk probability acquisition module 440, and a risk result generation module 450, where:

The identity data obtaining module 410 is configured to receive identity data.

The identity parameter extraction module 420 is configured to extract identity characteristic parameters from the identity data.

The time parameter obtaining module 430 is configured to find historical verification data corresponding to the identity data, and extract verification time parameters from the historical verification data.

A risk probability obtaining module 440 is configured to input identity characteristic parameters and verification time parameters into a preset risk assessment model to obtain an identity risk probability.

The risk result generating module 450 is configured to generate a risk assessment result according to the identity risk probability.
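As a non-limiting sketch, the cooperation of modules 410 to 450 may be pictured as five functions chained in order. The class name and the callable parameters below are hypothetical stand-ins for the modules; their internals are left to the embodiments described elsewhere in this application.

```python
from typing import Any, Callable, Dict


class RiskAssessmentDevice:
    """Chains five callables that stand in for modules 410 to 450."""

    def __init__(
        self,
        receive: Callable[[Any], Dict],
        extract_identity: Callable[[Dict], Dict],
        extract_time: Callable[[Dict], Dict],
        model: Callable[[Dict, Dict], float],
        make_result: Callable[[float], Any],
    ) -> None:
        self.receive = receive
        self.extract_identity = extract_identity
        self.extract_time = extract_time
        self.model = model
        self.make_result = make_result

    def assess(self, raw: Any) -> Any:
        identity_data = self.receive(raw)                        # module 410
        identity_params = self.extract_identity(identity_data)   # module 420
        time_params = self.extract_time(identity_data)           # module 430
        probability = self.model(identity_params, time_params)   # module 440
        return self.make_result(probability)                     # module 450
```

Passing the modules in as callables keeps the sketch neutral as to whether each module is implemented in software, hardware, or a combination thereof.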

In one embodiment, the apparatus may further include:

A data acquisition module is used to collect sample data and divide the sample data into training set data and test set data.

A training data extraction module is used to extract a first feature parameter and a first target category from the training set data.

The initial model building module is used to perform feature gain evaluation according to the first feature parameter and the first target category, perform feature selection according to the feature gain evaluation result, classify according to the selected features to obtain an initial decision tree risk assessment model, and calculate the risk probability of each classification node in the initial decision tree risk assessment model according to the training set data.

The test data extraction module is configured to extract a second feature parameter and a second target category from the test set data.

The evaluation module generation module is used to verify the risk probability of each classification node in the initial decision tree risk assessment model according to the second feature parameter and the second target category, and to adjust the initial decision tree risk assessment model according to the verification result to generate the preset risk assessment model.
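By way of example only, the model generation steps may be sketched with a single-split decision tree. The sketch assumes that the "feature gain" is information gain (entropy reduction) and that target categories are encoded as 1 (risk) and 0 (no risk); both are assumptions, and all function names are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def group_labels(feature, rows, labels):
    """Group the labels by the value each row takes for the given feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    return groups


def information_gain(rows, labels, feature):
    groups = group_labels(feature, rows, labels)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_stump(rows, labels):
    """Select the highest-gain feature and compute each node's risk probability."""
    best = max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))
    groups = group_labels(best, rows, labels)
    node_probability = {value: sum(g) / len(g) for value, g in groups.items()}
    return best, node_probability


def verification_gaps(feature, node_probability, test_rows, test_labels):
    """Absolute gap between trained and test-set risk rates, per node."""
    groups = group_labels(feature, test_rows, test_labels)
    return {
        value: abs(node_probability[value] - sum(g) / len(g))
        for value, g in groups.items()
        if value in node_probability
    }
```

A full decision tree would apply `build_stump` recursively to each group; nodes with a large verification gap are the candidates for adjustment before the model is used as the preset risk assessment model.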

In one embodiment, the apparatus may further include:

The verification data loading module is used to load the updated verification data when the verification data update time is reached.

The verification data extraction module is used to extract from the verification data the third characteristic parameter and the risk target mark corresponding to the preset risk assessment model.

The model optimization module is used to verify the risk probability of each classification node in the preset risk assessment model according to the third characteristic parameter and the risk target mark, and optimize the preset risk assessment model according to the verification result.
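One possible form of this optimization, given only as a sketch, is to recompute the risk rate of each classification node from the updated verification data and to overwrite the stored probability when the two differ by more than a tolerance. The tolerance, the overwrite rule, and the function name are assumptions; the application does not specify how the optimization is carried out.

```python
from collections import defaultdict


def optimize_node_probabilities(node_probability, observations, tolerance=0.1):
    """observations: iterable of (node_key, risk_flag) pairs, risk_flag in {0, 1}."""
    hits = defaultdict(int)
    totals = defaultdict(int)
    for node_key, risk_flag in observations:
        hits[node_key] += risk_flag
        totals[node_key] += 1

    updated = dict(node_probability)
    for node_key, total in totals.items():
        observed = hits[node_key] / total
        stored = node_probability.get(node_key)
        # Overwrite only when the node is new or the gap exceeds the tolerance.
        if stored is None or abs(observed - stored) > tolerance:
            updated[node_key] = observed
    return updated
```

Nodes that receive no new observations keep their stored probability, so a sparse update does not erase what was learned from the original training set.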

In one embodiment, the risk result generation module 450 may include:

The path finding module is configured to find a decision path corresponding to an identity risk probability with a maximum probability value from a preset risk assessment model.

The path data acquisition module is used to acquire node data of a decision path.

The path graph generating module is configured to generate and output a seizure path map according to the node data and the identity risk probability with the largest probability value.
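As an illustrative sketch, the decision path may be found with a depth-first search over the tree. The `Node` structure and the node names are hypothetical, and the sketch assumes that risk probabilities are stored on the classification (leaf) nodes.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    name: str
    risk_probability: Optional[float] = None  # set on classification nodes only
    children: List["Node"] = field(default_factory=list)


def max_risk_path(node: Node) -> Tuple[float, List[str]]:
    """Return the largest leaf risk probability and the path from the root to it."""
    if not node.children:
        return (node.risk_probability or 0.0, [node.name])
    best_probability, best_path = max(
        (max_risk_path(child) for child in node.children),
        key=lambda item: item[0],
    )
    return best_probability, [node.name] + best_path


def render_path(probability: float, path: List[str]) -> str:
    """Flatten the node data into a one-line textual path map."""
    return " -> ".join(path) + f" (risk {probability:.2f})"
```

The returned list of node names is the node data from which a graphical path map could be drawn; `render_path` shows the simplest textual form.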

In one embodiment, the risk result generation module 450 may include:

The probability acquisition module is configured to acquire an identity risk probability with a maximum probability value.

The security threshold search module is used to obtain the current security manpower data and find the security passenger flow threshold corresponding to the current security manpower data.

The risk threshold calculation module is configured to obtain preset threshold conversion data, and calculate a risk probability threshold according to the security passenger flow threshold and the preset threshold conversion data.

An early warning prompt generating module is configured to generate and output a risk check warning prompt when the identity risk probability with the highest probability value exceeds the risk probability threshold.

In one embodiment, the identity parameter extraction module 420 may include:

The number extraction module is used for extracting the document number from the identity data.

The document type search module is used to identify the document format of the document number and find the document type corresponding to the format recognition result.

The word segmentation module is used to segment the document number according to the document type to obtain segmented character strings.

A parameter search module is used to search for identity characteristic parameters corresponding to each segmented character string.
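The following sketch illustrates the four sub-modules for one document type. It assumes, as an example not taken from this application, an 18-character resident identity number laid out as a 6-digit region code, an 8-digit birth date, a 3-digit sequence code (odd for male, even for female), and one check character. The table and function names are hypothetical, and the check character is not validated.

```python
import re
from typing import Dict, Optional, Tuple

# Document type -> format pattern (only one illustrative type is registered).
DOCUMENT_FORMATS = {"resident_id": re.compile(r"\d{17}[\dXx]")}

# Document type -> (segment name, start index, end index).
SEGMENT_LAYOUTS = {
    "resident_id": (
        ("region_code", 0, 6),
        ("birth_date", 6, 14),
        ("sequence_code", 14, 17),
        ("check_char", 17, 18),
    )
}


def identify_document_type(number: str) -> Optional[str]:
    """Return the first document type whose format matches the whole number."""
    for doc_type, pattern in DOCUMENT_FORMATS.items():
        if pattern.fullmatch(number):
            return doc_type
    return None


def segment_document_number(number: str) -> Tuple[str, Dict[str, str]]:
    doc_type = identify_document_type(number)
    if doc_type is None:
        raise ValueError("unrecognized document format")
    layout = SEGMENT_LAYOUTS[doc_type]
    return doc_type, {name: number[start:end] for name, start, end in layout}


def lookup_identity_features(segments: Dict[str, str]) -> Dict[str, object]:
    """Map each segmented string to an identity characteristic parameter."""
    return {
        "region_code": segments["region_code"],
        "birth_year": int(segments["birth_date"][:4]),
        "gender": "male" if int(segments["sequence_code"]) % 2 else "female",
    }
```

Supporting another document type under this sketch only requires adding one entry to each of the two tables, which mirrors the lookup from format recognition result to document type described above.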

For specific limitations on the identity information risk assessment device, refer to the limitations on the identity information risk assessment method above, which are not repeated here. Each module in the above identity information risk assessment device may be implemented in whole or in part by software, hardware, or a combination thereof. The above modules may be embedded in, or independent of, the processor of the computer device in hardware form, or may be stored in the memory of the computer device in software form, so that the processor can call them and perform the operations corresponding to each module.

In one embodiment, a computer device is provided. The computer device may be a server, and its internal structure diagram may be as shown in FIG. 5. The computer device includes a processor, a memory, a network interface, and a database connected through a system bus. The processor of the computer device is used to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, computer-readable instructions, and a database. The internal memory provides an environment for operating the operating system and computer-readable instructions in a non-volatile storage medium. The database of the computer equipment is used to store relevant data of identity information risk assessment. The network interface of the computer device is used to communicate with an external terminal through a network connection. The computer-readable instructions are executed by a processor to implement a method for risk assessment of identity information.

Those skilled in the art can understand that the structure shown in FIG. 5 is only a block diagram of part of the structure related to the solution of the present application, and does not constitute a limitation on the computer device to which the solution is applied. A specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.

A computer device includes a memory and one or more processors. Computer-readable instructions are stored in the memory. When the computer-readable instructions are executed by the one or more processors, the one or more processors perform the following steps: receiving identity data; extracting identity characteristic parameters from the identity data; finding historical verification data corresponding to the identity data, and extracting verification time parameters from the historical verification data; inputting the identity characteristic parameters and the verification time parameters into a preset risk assessment model to obtain an identity risk probability; and generating a risk assessment result according to the identity risk probability.

In one embodiment, when the processor executes the computer-readable instructions, the following steps are further implemented: collecting sample data, and dividing the sample data into training set data and test set data; extracting a first feature parameter and a first target category from the training set data; performing feature gain evaluation according to the first feature parameter and the first target category, performing feature selection according to the feature gain evaluation result, classifying according to the selected features to obtain an initial decision tree risk assessment model, and calculating the risk probability of each classification node in the initial decision tree risk assessment model according to the training set data; extracting a second feature parameter and a second target category from the test set data; and verifying the risk probability of each classification node in the initial decision tree risk assessment model according to the second feature parameter and the second target category, and adjusting the initial decision tree risk assessment model according to the verification result to generate the preset risk assessment model.

In one of the embodiments, when the processor executes the computer-readable instructions, the following steps are further implemented: loading updated verification data when the verification data update time is reached; extracting, from the verification data, a third characteristic parameter and a risk target mark corresponding to the preset risk assessment model; and verifying the risk probability of each classification node in the preset risk assessment model according to the third characteristic parameter and the risk target mark, and optimizing the preset risk assessment model according to the verification result.

In one embodiment, when the processor executes the computer-readable instructions to implement the step of generating a risk assessment result according to the identity risk probability, the processor is further configured to: find a decision path corresponding to the identity risk probability with the highest probability value from the preset risk assessment model; obtain the node data of the decision path; and generate and output a seizure path map according to the node data and the identity risk probability with the highest probability value.

In one embodiment, when the processor executes the computer-readable instructions to implement the step of generating a risk assessment result according to the identity risk probability, the processor is further configured to: obtain the identity risk probability with the largest probability value; obtain the current security manpower data, and find the security passenger flow threshold corresponding to the current security manpower data; obtain preset threshold conversion data, and calculate the risk probability threshold based on the security passenger flow threshold and the preset threshold conversion data; and when the identity risk probability with the largest probability value exceeds the risk probability threshold, generate and output a risk check warning prompt.

In one of the embodiments, when the processor executes the computer-readable instructions to implement the step of extracting identity characteristic parameters from the identity data, the processor is further configured to: extract a document number from the identity data; identify the document format of the document number, and find the document type corresponding to the format recognition result; segment the document number according to the document type to obtain segmented character strings; and find the identity characteristic parameters corresponding to each segmented character string.

One or more non-volatile storage media storing computer-readable instructions are provided. When the computer-readable instructions are executed by one or more processors, the one or more processors perform the following steps: receiving identity data; extracting identity characteristic parameters from the identity data; finding historical verification data corresponding to the identity data, and extracting verification time parameters from the historical verification data; inputting the identity characteristic parameters and the verification time parameters into a preset risk assessment model to obtain an identity risk probability; and generating a risk assessment result according to the identity risk probability.

In one embodiment, when the computer-readable instructions are executed by the processor, the following steps are further implemented: collecting sample data, and dividing the sample data into training set data and test set data; extracting a first feature parameter and a first target category from the training set data; performing feature gain evaluation according to the first feature parameter and the first target category, performing feature selection according to the feature gain evaluation result, classifying according to the selected features to obtain an initial decision tree risk assessment model, and calculating the risk probability of each classification node in the initial decision tree risk assessment model according to the training set data; extracting a second feature parameter and a second target category from the test set data; and verifying the risk probability of each classification node in the initial decision tree risk assessment model according to the second feature parameter and the second target category, and adjusting the initial decision tree risk assessment model according to the verification result to generate the preset risk assessment model.

In one embodiment, when the computer-readable instructions are executed by the processor, the following steps are further implemented: loading updated verification data when the verification data update time is reached; extracting, from the verification data, a third characteristic parameter and a risk target mark corresponding to the preset risk assessment model; and verifying the risk probability of each classification node in the preset risk assessment model according to the third characteristic parameter and the risk target mark, and optimizing the preset risk assessment model according to the verification result.

In one embodiment, when the computer-readable instructions are executed by the processor to implement the step of generating a risk assessment result according to the identity risk probability, the following steps are further implemented: finding a decision path corresponding to the identity risk probability with the highest probability value from the preset risk assessment model; obtaining the node data of the decision path; and generating and outputting a seizure path map according to the node data and the identity risk probability with the highest probability value.

In one embodiment, when the computer-readable instructions are executed by the processor to implement the step of generating a risk assessment result according to the identity risk probability, the following steps are further implemented: obtaining the identity risk probability with the largest probability value; obtaining the current security manpower data, and finding the security passenger flow threshold corresponding to the current security manpower data; obtaining preset threshold conversion data, and calculating the risk probability threshold based on the security passenger flow threshold and the preset threshold conversion data; and when the identity risk probability with the largest probability value exceeds the risk probability threshold, generating and outputting a risk check warning prompt.

In one of the embodiments, when the computer-readable instructions are executed by the processor to implement the step of extracting identity characteristic parameters from the identity data, the following steps are further implemented: extracting a document number from the identity data; identifying the document format of the document number, and finding the document type corresponding to the format recognition result; segmenting the document number according to the document type to obtain segmented character strings; and finding the identity characteristic parameters corresponding to each segmented character string.

Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by computer-readable instructions that instruct related hardware. The computer-readable instructions can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory can include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

The technical features of the above embodiments can be combined arbitrarily. For the sake of brevity, not all possible combinations of the technical features in the above embodiments have been described. However, as long as there is no contradiction in a combination of these technical features, it should be considered to be within the scope described in this specification.

The above embodiments express only several implementations of the present application, and their descriptions are relatively specific and detailed, but they should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art can make several modifications and improvements without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims

An identity information risk assessment method, including:

Receiving identity data;

Extracting identity characteristic parameters from the identity data;

Finding historical verification data corresponding to the identity data, and extracting verification time parameters from the historical verification data;

Inputting the identity characteristic parameter and the verification time parameter into a preset risk assessment model to obtain an identity risk probability; and

Generating a risk assessment result according to the identity risk probability.
The method according to claim 1, wherein the generating method of the preset risk assessment model comprises:

Collecting sample data, and dividing the sample data into training set data and test set data;

Extracting a first feature parameter and a first target category from the training set data;

Performing feature gain evaluation according to the first feature parameter and the first target category, performing feature selection according to the feature gain evaluation result, classifying according to the selected features to obtain an initial decision tree risk assessment model, and calculating the risk probability of each classification node in the initial decision tree risk assessment model according to the training set data;

Extracting a second feature parameter and a second target category from the test set data; and

Verifying the risk probability of each classification node in the initial decision tree risk assessment model according to the second feature parameter and the second target category, and adjusting the initial decision tree risk assessment model according to the verification result to generate the preset risk assessment model.
The method according to claim 2, further comprising:

When the verification data update time is reached, loading the updated verification data;

Extracting from the verification data a third characteristic parameter and a risk target flag corresponding to the preset risk assessment model; and

Verifying the risk probability of each classification node in the preset risk assessment model according to the third characteristic parameter and the risk target flag, and optimizing the preset risk assessment model according to the verification result.
The method according to claim 2, wherein the generating a risk assessment result according to the identity risk probability comprises:

Finding a decision path corresponding to the identity risk probability with the highest probability value from the preset risk assessment model;

Acquiring node data of the decision path; and

Generating and outputting a seizure path map according to the node data and the identity risk probability with the highest probability value.
The method according to claim 2, wherein the generating a risk assessment result according to the identity risk probability comprises:

Obtaining the identity risk probability with the largest probability value;

Acquiring current security manpower data, and searching for a security passenger flow threshold corresponding to the current security manpower data;

Obtaining preset threshold conversion data, and calculating a risk probability threshold based on the security passenger flow threshold and the preset threshold conversion data; and

When the identity risk probability with the largest probability value exceeds the risk probability threshold, generating and outputting a risk check warning prompt.
The method according to claim 1, wherein the extracting identity characteristic parameters from the identity data comprises:

Extracting a document number from the identity data;

Identifying the document format of the document number, and finding the document type corresponding to the format identification result;

Segmenting the document number according to the document type to obtain segmented character strings; and

Finding the identity characteristic parameters corresponding to each of the segmented character strings.
An identity information risk assessment device, including:

An identity data acquisition module, configured to receive identity data;

An identity parameter extraction module, configured to extract identity characteristic parameters from the identity data;

A time parameter acquisition module, configured to find historical verification data corresponding to the identity data, and extract verification time parameters from the historical verification data;

A risk probability obtaining module, configured to input the identity characteristic parameter and the verification time parameter into a preset risk assessment model to obtain an identity risk probability; and

A risk result generating module, configured to generate a risk assessment result according to the identity risk probability.
The apparatus according to claim 7, further comprising:

A data collection module, configured to collect sample data, and divide the sample data into training set data and test set data;

A training data extraction module, configured to extract a first feature parameter and a first target category from the training set data;

An initial model building module, configured to perform feature gain evaluation according to the first feature parameter and the first target category, perform feature selection according to the feature gain evaluation result, classify according to the selected features to obtain an initial decision tree risk assessment model, and calculate the risk probability of each classification node in the initial decision tree risk assessment model according to the training set data;

A test data extraction module, configured to extract a second feature parameter and a second target category from the test set data; and

An evaluation module generating module, configured to verify the risk probability of each classification node in the initial decision tree risk assessment model according to the second feature parameter and the second target category, and to adjust the initial decision tree risk assessment model according to the verification result to generate a preset risk assessment model.
A computer device, including a memory and one or more processors, the memory storing computer-readable instructions that, when executed by the one or more processors, cause the one or more processors to perform the following steps:

Receiving identity data;

Extracting identity characteristic parameters from the identity data;

Finding historical verification data corresponding to the identity data, and extracting verification time parameters from the historical verification data;

Inputting the identity characteristic parameter and the verification time parameter into a preset risk assessment model to obtain an identity risk probability; and

Generating a risk assessment result according to the identity risk probability.
The computer device according to claim 9, wherein the processor further executes the following steps when executing the computer-readable instructions:

Collecting sample data, and dividing the sample data into training set data and test set data;

Extracting a first feature parameter and a first target category from the training set data;

Performing feature gain evaluation according to the first feature parameter and the first target category, performing feature selection according to the feature gain evaluation result, classifying according to the selected features to obtain an initial decision tree risk assessment model, and calculating the risk probability of each classification node in the initial decision tree risk assessment model according to the training set data;

Extracting a second feature parameter and a second target category from the test set data; and

Verifying the risk probability of each classification node in the initial decision tree risk assessment model according to the second feature parameter and the second target category, and adjusting the initial decision tree risk assessment model according to the verification result to generate the preset risk assessment model.
The computer device according to claim 10, wherein the processor further executes the following steps when executing the computer-readable instructions:

When the verification data update time is reached, loading the updated verification data;

Extracting from the verification data a third characteristic parameter and a risk target flag corresponding to the preset risk assessment model; and

Verifying the risk probability of each classification node in the preset risk assessment model according to the third characteristic parameter and the risk target flag, and optimizing the preset risk assessment model according to the verification result.
The computer device according to claim 10, wherein, when executing the computer-readable instructions to generate a risk assessment result according to the identity risk probability, the processor further performs the following steps:

Finding a decision path corresponding to the identity risk probability with the highest probability value from the preset risk assessment model;

Acquiring node data of the decision path; and

Generating and outputting a seizure path map according to the node data and the identity risk probability with the highest probability value.
The computer device according to claim 10, wherein, when executing the computer-readable instructions to generate a risk assessment result according to the identity risk probability, the processor further performs the following steps:

Obtaining the identity risk probability with the largest probability value;

Acquiring current security manpower data, and searching for a security passenger flow threshold corresponding to the current security manpower data;

Acquiring preset threshold conversion data, and calculating a risk probability threshold based on the security passenger flow threshold and the preset threshold conversion data; and

When the identity risk probability with the largest probability value exceeds the risk probability threshold, generating and outputting a risk check warning prompt.
The computer device according to claim 9, wherein, when executing the computer-readable instructions to extract identity characteristic parameters from the identity data, the processor further performs the following steps:

Extracting a document number from the identity data;

Identifying the document format of the document number, and finding the document type corresponding to the format identification result;

Segmenting the document number according to the document type to obtain segmented character strings; and

Finding the identity characteristic parameters corresponding to each of the segmented character strings.
One or more non-transitory computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps:

Receiving identity data;

Extracting identity characteristic parameters from the identity data;

Find historical verification data corresponding to the identity data, and extract verification time parameters from the historical verification data;

Inputting the identity characteristic parameter and the verification time parameter into a preset risk assessment model to obtain an identity risk probability; and

Generating a risk assessment result according to the identity risk probability.
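The flow of these steps can be illustrated, under stated assumptions, by the sketch below. The claim does not enumerate the verification time parameters or the form of the risk assessment result, so the parameters computed here (recent check count, night-time ratio, days since the last check), the 0.5 cutoff, and the callable `model` are hypothetical placeholders.

```python
from datetime import datetime, timedelta


def verification_time_parameters(history, now, window_days=30):
    """Summarise past verification timestamps into a few numeric parameters.

    `history` is assumed to be a list of datetimes no later than `now`.
    """
    window = timedelta(days=window_days)
    recent = [t for t in history if now - t <= window]
    night = [t for t in history if t.hour >= 22 or t.hour < 6]
    return {
        "total_checks": len(history),
        "recent_checks": len(recent),
        "night_ratio": len(night) / len(history) if history else 0.0,
        "days_since_last": (now - max(history)).days if history else None,
    }


def assess(identity_features, time_params, model, high_risk_cutoff=0.5):
    """Feed both parameter groups to the model and wrap the probability."""
    probability = model({**identity_features, **time_params})
    return {
        "risk_probability": probability,
        "high_risk": probability >= high_risk_cutoff,
    }
```

Here `model` stands in for the preset risk assessment model: any callable that accepts the merged parameter dictionary and returns a probability between 0 and 1 can be passed.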
The storage medium according to claim 15, wherein when the computer-readable instructions are executed by the processor, the following steps are further performed:

Collecting sample data, and dividing the sample data into training set data and test set data;

Extracting a first feature parameter and a first target category from the training set data;

Performing feature gain evaluation according to the first feature parameter and the first target category, performing feature selection according to the feature gain evaluation result, classifying the selected features to obtain an initial decision tree risk assessment model, and calculating the risk probability of each classification node in the initial decision tree risk assessment model according to the training set data;

Extracting a second feature parameter and a second target category from the test set data; and

Verifying the risk probability of each classification node in the initial decision tree risk assessment model according to the second feature parameter and the second target category, and adjusting the initial decision tree risk assessment model according to the verification result to generate the preset risk assessment model.
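A minimal sketch of the training steps above is given below for illustration. It assumes that the feature gain evaluation is an information gain computation over categorical features (an ID3-style tree) and that the risk probability of a node is the fraction of risky training samples reaching it; the claim confirms neither choice. The adjustment step is reduced to measuring accuracy on the test set, and the toy data and all names are invented for the example.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    total = len(labels)
    return -sum(n / total * math.log2(n / total) for n in Counter(labels).values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting the samples on one feature."""
    groups = defaultdict(list)
    for row, label in zip(rows, labels):
        groups[row[feature]].append(label)
    remainder = sum(len(g) / len(labels) * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def build_tree(rows, labels, features, min_gain=1e-9):
    """Grow a tree; every node stores the share of risky samples reaching it."""
    risk = sum(labels) / len(labels)
    node = {"risk": risk, "feature": None, "children": {}}
    if not features or risk in (0.0, 1.0):
        return node
    gains = {f: information_gain(rows, labels, f) for f in features}
    best = max(gains, key=gains.get)
    if gains[best] < min_gain:
        return node
    node["feature"] = best
    remaining = [f for f in features if f != best]
    buckets = defaultdict(lambda: ([], []))
    for row, label in zip(rows, labels):
        buckets[row[best]][0].append(row)
        buckets[row[best]][1].append(label)
    for value, (sub_rows, sub_labels) in buckets.items():
        node["children"][value] = build_tree(sub_rows, sub_labels, remaining, min_gain)
    return node


def predict_risk(tree, row):
    node = tree
    while node["feature"] is not None:
        child = node["children"].get(row[node["feature"]])
        if child is None:  # unseen value: fall back to this node's probability
            break
        node = child
    return node["risk"]


def accuracy(tree, rows, labels, cutoff=0.5):
    """Verify node probabilities against held-out samples."""
    hits = sum((predict_risk(tree, r) >= cutoff) == bool(l) for r, l in zip(rows, labels))
    return hits / len(labels)


# Toy data, invented for the example: 1 marks a risky sample.
TRAIN_ROWS = [
    {"region": "A", "night": True}, {"region": "A", "night": False},
    {"region": "B", "night": True}, {"region": "B", "night": False},
    {"region": "A", "night": True}, {"region": "B", "night": True},
]
TRAIN_LABELS = [1, 1, 0, 0, 1, 0]
TEST_ROWS = [{"region": "A", "night": True}, {"region": "B", "night": False}]
TEST_LABELS = [1, 0]

TREE = build_tree(TRAIN_ROWS, TRAIN_LABELS, ["region", "night"])
```

In this toy data the `region` feature separates the classes completely, so it is selected at the root. A fuller implementation would use the test-set result to prune or re-split nodes before the tree is adopted as the preset risk assessment model.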
The storage medium according to claim 16, wherein when the computer-readable instructions are executed by the processor, the following steps are further performed:

When the verification data update time is reached, loading the updated verification data;

Extracting from the verification data a third characteristic parameter and a risk target flag corresponding to the preset risk assessment model; and

Verifying the risk probability of each classification node in the preset risk assessment model according to the third characteristic parameter and the risk target flag, and optimizing the preset risk assessment model according to the verification result.
The storage medium according to claim 16, wherein, when the computer-readable instructions are executed by the processor to generate the risk assessment result according to the identity risk probability, the following steps are further performed:

Finding, from the preset risk assessment model, a decision path corresponding to the identity risk probability with the largest probability value;

Acquiring node data of the decision path; and

Generating and outputting a seizure path map according to the node data and the identity risk probability with the largest probability value.
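For illustration only, the path steps above can be sketched as a walk from the root of a decision tree to the node reached by one passenger, with the visited nodes rendered as indented text. The tree contents, the feature names, and the text rendering are assumptions made for the example; the claim does not specify how the seizure path map is drawn.

```python
# Hypothetical model: an internal node tests one feature; a leaf has no feature.
# Every node carries the risk probability observed at that node.
TREE = {
    "feature": "region_code",
    "risk": 0.10,
    "children": {
        "R1": {"feature": None, "risk": 0.05, "children": {}},
        "R2": {
            "feature": "recent_checks",
            "risk": 0.30,
            "children": {
                "high": {"feature": None, "risk": 0.72, "children": {}},
                "low": {"feature": None, "risk": 0.12, "children": {}},
            },
        },
    },
}


def decision_path(tree, features):
    """Collect node data from the root down to the last node that matches."""
    path = []
    node = tree
    while True:
        feature = node["feature"]
        value = features.get(feature) if feature else None
        path.append({"feature": feature, "value": value, "risk": node["risk"]})
        child = node["children"].get(value) if feature else None
        if child is None:
            return path
        node = child


def render_path(path):
    """Render the node data as one indented line per visited node."""
    lines = []
    for depth, step in enumerate(path):
        indent = "  " * depth
        if step["feature"] is None:
            lines.append(f"{indent}=> risk probability {step['risk']:.2f}")
        else:
            lines.append(
                f"{indent}{step['feature']} = {step['value']} "
                f"(node risk {step['risk']:.2f})"
            )
    return "\n".join(lines)
```

Calling `render_path(decision_path(TREE, {"region_code": "R2", "recent_checks": "high"}))` lists the two tests that were applied and ends with the leaf probability of 0.72, which gives an inspector the reasons behind the score rather than the score alone.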
The storage medium according to claim 16, wherein, when the computer-readable instructions are executed by the processor to generate the risk assessment result according to the identity risk probability, the following steps are further performed:

Acquiring the identity risk probability with the largest probability value;

Acquiring current security manpower data, and searching for a security passenger flow threshold corresponding to the current security manpower data;

Acquiring preset threshold conversion data, and calculating a risk probability threshold based on the security passenger flow threshold and the preset threshold conversion data; and

Generating and outputting a risk check warning when the identity risk probability with the largest probability value exceeds the risk probability threshold.
The storage medium according to claim 15, wherein, when the computer-readable instructions are executed by the processor to extract the identity characteristic parameters from the identity data, the following steps are further performed:

Extracting a document number from the identity data;

Identifying the document format of the document number, and finding the document type corresponding to the format identification result;

Segmenting the document number according to the document type to obtain segmented character strings; and

Finding the identity characteristic parameter corresponding to each of the segmented character strings.