CN112801563B - Risk assessment method and device - Google Patents

Risk assessment method and device

Info

Publication number
CN112801563B
Authority
CN
China
Prior art keywords
features
product
risk
identified
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110397972.9A
Other languages
Chinese (zh)
Other versions
CN112801563A (en)
Inventor
李迪
刘丹丹
杨达明
向丽
沈磊
李晶莹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202110397972.9A
Publication of CN112801563A
Application granted
Publication of CN112801563B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data
    • G06Q30/0203 Market surveys; Market polls

Landscapes

  • Business, Economics & Management (AREA)
  • Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Development Economics (AREA)
  • Finance (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • Game Theory and Decision Science (AREA)
  • Data Mining & Analysis (AREA)
  • Marketing (AREA)
  • Software Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Operations Research (AREA)
  • Educational Administration (AREA)
  • Artificial Intelligence (AREA)
  • Tourism & Hospitality (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Quality & Reliability (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a risk assessment method and apparatus. In the method, the values of N features of a product to be identified are first obtained, where N is a positive integer; a dynamic risk score of the product is then determined using the weights of the N features and the values of the N features, where the weights are obtained by training a machine learning model; finally, the risk status of the product is determined using the dynamic risk score.

Description

Risk assessment method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of computer application technologies, and in particular, to a risk assessment method and apparatus.
Background
Internet products integrate functions and services built on internet technology to meet specific user needs. Risk assessment is often required at different stages before and after an internet product goes online, to avoid adverse effects on users, other people, or even society while the product is in use. A method for performing risk assessment automatically and accurately is therefore needed.
Disclosure of Invention
One or more embodiments of the present specification describe a method for risk assessment to facilitate automated and accurate risk assessment of internet products.
According to a first aspect, there is provided a method of risk assessment comprising:
obtaining the values of N features of a product to be identified, where N is a positive integer;
determining a dynamic risk score of the product to be identified using the weights of the N features and the values of the N features, where the weights of the N features are obtained by training a machine learning model; and
determining a risk status of the product to be identified using the dynamic risk score.
In one embodiment, before obtaining the values of the N features of the product to be identified, the method further includes:
screening the N features from a feature library corresponding to the product to be identified according to the correlation coefficients between features.
In another embodiment, screening the N features from the feature library corresponding to the product to be identified according to the correlation coefficients between features includes:
calculating the correlation coefficient between every pair of features in the feature library;
determining the pair of features with the highest correlation coefficient;
for each feature of that pair, calculating the average of its correlation coefficients with the other features; and
deleting the feature of the pair with the larger average and returning to the step of determining the pair of features with the highest correlation coefficient, until the correlation coefficient between every pair of the remaining N features is lower than a preset correlation coefficient threshold.
In one embodiment, before screening the N features from the feature library corresponding to the product to be identified according to the correlation coefficients between features, the method further includes: normalizing the feature values of the products in the feature library to eliminate dimensional differences;
the values of the N features are the normalized values.
In another embodiment, the product to be identified belongs to a preset reference risk category;
before determining the dynamic risk score of the product to be identified using the weights of the N features and the values of the N features, the method further includes:
taking the values of the N features of a plurality of products of the preset reference risk category as the input of a machine learning model, taking the preset reference risk category as the target output of the machine learning model, and training the machine learning model to obtain the weights of the N features.
In one embodiment, the machine learning model comprises a logistic regression model.
In another embodiment, the preset reference risk category is obtained by performing a coarse-grained risk determination on the product to be identified.
In one embodiment, determining the dynamic risk score of the product to be identified using the weights of the N features and the values of the N features includes:
determining an initial score for each of the N features from the normalized values of the N features; and
performing a weighted summation of the initial scores of the N features using the weights of the N features to obtain the dynamic risk score of the product to be identified.
In another embodiment, determining the risk status of the product to be identified using the dynamic risk score includes:
determining a static risk score of the product to be identified according to how the attribute information of the product matches preset static rules; and
combining the dynamic risk score and the static risk score to obtain the risk status of the product to be identified.
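This passage does not specify how static rules are expressed or how the two scores are combined, so the following is only a hypothetical sketch. It assumes each static rule is an (attribute, triggering value, points) triple, that the static score is the sum of the points of the matched rules, and that the two scores are merged as a convex combination with an assumed weight `alpha` and an assumed cutoff `threshold`. The attribute names are illustrative, borrowed from the coarse-grained classification examples in the detailed description.

```python
from typing import Any, Dict, List, Tuple

# Hypothetical static rule format: (attribute name, value that triggers the rule, points).
StaticRule = Tuple[str, Any, float]


def static_risk_score(attributes: Dict[str, Any], rules: List[StaticRule]) -> float:
    """Sum the points of every static rule whose attribute value the product matches."""
    return sum(points for name, value, points in rules if attributes.get(name) == value)


def risk_status(dynamic: float, static: float, alpha: float = 0.5,
                threshold: float = 5.0) -> str:
    """Combine both scores (assumed: convex combination) and map the result to a status."""
    combined = alpha * dynamic + (1 - alpha) * static
    return "high risk" if combined >= threshold else "low risk"


rules = [
    ("supports_quick_payment", True, 3.0),
    ("real_time_arrival", True, 2.0),
    ("cross_border", True, 4.0),
]
attrs = {"supports_quick_payment": True, "cross_border": True}
s = static_risk_score(attrs, rules)        # 3.0 + 4.0 = 7.0
print(s, risk_status(dynamic=4.0, static=s))  # combined = 0.5*4 + 0.5*7 = 5.5 -> "7.0 high risk"
```

The convex combination is only one plausible choice; a rule-based override (for example, any matched "hard" static rule forcing high risk) would be an equally reasonable assumption.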
According to a second aspect, there is provided a risk assessment apparatus comprising:
a feature value acquisition unit configured to acquire the values of N features of a product to be identified, where the N features are screened from a feature library corresponding to the product to be identified according to the correlation coefficients between features, and N is a positive integer;
a dynamic scoring unit configured to determine a dynamic risk score of the product to be identified using the weights of the N features and the values of the N features, where the weights of the N features are obtained by training a machine learning model; and
a risk determination unit configured to determine a risk status of the product to be identified using the dynamic risk score.
In one embodiment, the apparatus further comprises:
a feature screening unit configured to screen the N features from the feature library corresponding to the product to be identified according to the correlation coefficients between features.
In another embodiment, the feature screening unit is specifically configured to:
calculate the correlation coefficient between every pair of features in the feature library;
determine the pair of features with the highest correlation coefficient;
for each feature of that pair, calculate the average of its correlation coefficients with the other features; and
delete the feature of the pair with the larger average and return to determining the pair of features with the highest correlation coefficient, until the correlation coefficient between every pair of the remaining N features is lower than a preset correlation coefficient threshold.
In one embodiment, the apparatus further comprises:
a normalization unit configured to normalize the feature values of each product in the feature library to eliminate dimensional differences;
the values of the N features acquired by the feature value acquisition unit are the normalized values.
In another embodiment, the product to be identified belongs to a preset reference risk category;
the apparatus further includes: a weight determination unit configured to take the values of the N features of a plurality of products of the preset reference risk category as the input of a machine learning model, take the preset reference risk category as the target output of the machine learning model, and train the machine learning model to obtain the weights of the N features.
In one embodiment, the machine learning model comprises a logistic regression model.
In another embodiment, the dynamic scoring unit is specifically configured to determine an initial score for each of the N features from the normalized values of the N features, and to perform a weighted summation of these initial scores using the weights of the N features to obtain the dynamic risk score of the product to be identified.
In one embodiment, the apparatus further comprises:
a static scoring unit configured to determine a static risk score of the product to be identified according to how the attribute information of the product matches preset static rules;
the risk determination unit is specifically configured to combine the dynamic risk score and the static risk score to obtain the risk status of the product to be identified.
According to a third aspect, there is provided a computer-readable storage medium storing a computer program which, when executed in a computer, causes the computer to perform the method of the first aspect.
According to a fourth aspect, there is provided a computing device comprising a memory having stored therein executable code and a processor that, when executing the executable code, implements the method of the first aspect.
With the above technical solution, the embodiments of this specification provide a method and an apparatus for dynamic risk assessment of a product to be identified: the features used for risk identification are determined dynamically, and the risk of the product is analyzed quantitatively based on these features, so that risk assessment of the product is performed automatically and accurately.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 illustrates a main method flow diagram according to one embodiment;
FIG. 2 illustrates a detailed method flow diagram according to one embodiment;
FIG. 3 illustrates a flow diagram of a feature screening method according to one embodiment;
FIG. 4 illustrates a schematic diagram of determining a risk condition according to one embodiment;
FIG. 5 shows a schematic block diagram of a risk assessment apparatus according to one embodiment.
Detailed Description
The scheme provided by the specification is described below with reference to the accompanying drawings.
In traditional risk identification for internet products, rules are set based on manual experience, the product's attributes are checked against these rules and scored according to which rules are satisfied, and the risk status of the product is finally determined from the score. However, as internet products and user behavior keep evolving, rules set by manual experience cannot be adjusted in time, nor can they be guaranteed to suit all internet products, so the accuracy of risk identification is low.
Some existing methods identify the risks of internet products with machine learning models, but the features those models use are still chosen by manual experience; these features likewise cannot be adjusted in time or adapted to all internet products, so the accuracy of risk identification remains low.
The embodiments of this specification provide a way to dynamically determine the features used for risk identification and to analyze product risk quantitatively based on those features, thereby improving the accuracy of risk identification.
Specific implementations of the above concepts are described below.
Fig. 1 shows a flowchart of a risk assessment method according to an embodiment of this specification. The method may be performed by any apparatus, device, platform, or device cluster with computing and processing capabilities. As shown in Fig. 1, the method includes:
Step 101: Acquire N features of a product to be identified, where N is a positive integer.
Step 103: Take the values of the N features of the product to be identified as the input of a machine learning model and the reference risk category of the product to be identified as the target output, and train the machine learning model to obtain the weights of the N features.
Step 105: Determine the dynamic risk score of the product to be identified using the weights of the N features and the values of the N features.
Step 107: Determine the risk status of the product to be identified using the dynamic risk score.
The method shown in Fig. 1 performs dynamic risk assessment on a product to be identified: the features used for risk identification are determined dynamically, and the risk of the product is analyzed quantitatively based on these features, which improves the accuracy of risk identification. The product to be identified in the embodiments of this specification may be an internet product, which may be a specific computer application or a specific function module within a computer application. For example, the product may be a financial application, or an investment function module within a financial application.
The method provided in the examples of this specification is described in detail below with reference to a specific example. As shown in fig. 2, the method may specifically comprise the steps of:
Step 201: Screen N features from the feature library corresponding to the product to be identified according to the correlation coefficients between features.
Internet products have a very large number of features, on the order of tens, hundreds, or even thousands. In the embodiments of this specification, the features may be stored in a feature library, and in this step the features are first obtained from the feature library corresponding to the product to be identified.
For example, assume that a financial product category has the following features:
total number of transacting customers;
total transaction amount;
total number of transactions;
proportion of the total transaction amount from early-warning customers;
total transaction amount of early-warning customers;
proportion of the total number of transactions from early-warning customers;
total number of transactions of early-warning customers;
proportion of early-warning customers;
number of early-warning customers;
proportion of the total transaction amount from virtual-currency-related customers;
total transaction amount of virtual-currency-related customers;
proportion of the total number of transactions from virtual-currency-related customers;
total number of transactions of virtual-currency-related customers;
proportion of the total transaction amount from customers with historically reported STRs (suspicious transaction reports);
total transaction amount of customers with historically reported STRs;
proportion of the total number of transactions from customers with historically reported STRs;
total number of transactions of customers with historically reported STRs;
proportion of customers with historically reported STRs;
number of customers with historically reported STRs;
proportion of the total transaction amount from high-risk customers;
total transaction amount of high-risk customers;
proportion of the total number of transactions from high-risk customers;
total number of transactions of high-risk customers;
proportion of high-risk customers;
number of high-risk customers;
maximum single-transaction amount of a single customer;
maximum single-order transaction count of a single customer;
proportion of the total transaction amount from handled customers;
total transaction amount of handled customers;
proportion of the total number of transactions from handled customers;
total number of transactions of handled customers;
proportion of handled customers;
number of handled customers;
and so on.
As the list above shows, an internet product has numerous features, but not all of them accurately represent product risk. If all features, including those with little discriminative power, are used for product risk assessment, such features contribute nothing to the result and may introduce large deviations. It is therefore critical to select suitable features from the large feature set, that is, to find an optimal feature subset and remove irrelevant or redundant features, thereby reducing the feature dimension and improving recognition accuracy.
Feature screening may therefore be performed in step 201 based on the correlation coefficients between features. The Pearson correlation coefficient is one of the simplest measures of the linear relationship between two variables, and may therefore be used in the embodiments of this specification.
A preferred feature screening method is exemplified herein, and as shown in fig. 3, may specifically include the following steps:
Step 301: Calculate the correlation coefficient between every pair of features in the feature library.
In this step, the Pearson correlation coefficient between two features may be calculated as:

$$\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y} \tag{1}$$

where $X$ is the matrix of one feature and $Y$ is the matrix of another feature. The matrix of a feature is formed by the values of that feature across multiple products, i.e., an $m \times 1$ matrix, where $m$ is the number of products included in the statistics: the values of the same feature for all these products (including the product to be identified) form one data column. Calculating the Pearson correlation coefficient between two features therefore amounts to calculating the Pearson correlation coefficient of the two data columns. $\rho_{X,Y}$ denotes the Pearson correlation coefficient of $X$ and $Y$, $\operatorname{cov}(X,Y)$ denotes the covariance of $X$ and $Y$, and $\sigma_X$ and $\sigma_Y$ denote the standard deviations of $X$ and $Y$, respectively:

$$\operatorname{cov}(X,Y) = \frac{1}{m}\sum_{i=1}^{m}\left(X_i - \bar{X}\right)\left(Y_i - \bar{Y}\right) \tag{2}$$

$$\sigma_X = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(X_i - \bar{X}\right)^2} \tag{3}$$

$$\sigma_Y = \sqrt{\frac{1}{m}\sum_{i=1}^{m}\left(Y_i - \bar{Y}\right)^2} \tag{4}$$

where $X_i$ is the $i$-th value in $X$, $Y_i$ is the $i$-th value in $Y$, $\bar{X}$ is the mean of the values in $X$, and $\bar{Y}$ is the mean of the values in $Y$.
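Formulas (1) to (4) translate directly into code. The sketch below is a plain-Python illustration assuming the $1/m$ normalization shown above; the factor cancels in formula (1), so a $1/(m-1)$ convention would give the same coefficient. On Python 3.10 or later, `statistics.correlation` can serve as a cross-check.

```python
import math
from typing import Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two m x 1 feature columns, following formulas (1)-(4)."""
    m = len(x)
    if m != len(y) or m < 2:
        raise ValueError("columns must have the same length m >= 2")
    mean_x, mean_y = sum(x) / m, sum(y) / m
    # Formula (2): covariance of the two columns.
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / m
    # Formulas (3) and (4): standard deviation of each column.
    sigma_x = math.sqrt(sum((xi - mean_x) ** 2 for xi in x) / m)
    sigma_y = math.sqrt(sum((yi - mean_y) ** 2 for yi in y) / m)
    if sigma_x == 0 or sigma_y == 0:
        raise ValueError("a constant column has no defined correlation")
    # Formula (1): covariance divided by the product of the standard deviations.
    return cov / (sigma_x * sigma_y)


# Values of two hypothetical features across m = 4 products.
print(pearson([1, 2, 3, 4], [2, 4, 6, 8]))  # close to 1.0
print(pearson([1, 2, 3, 4], [8, 6, 4, 2]))  # close to -1.0
```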
Further, because different features may be measured in different units and on different scales, the values of the features may be normalized before step 201 to eliminate this dimensional influence.
As one possible implementation, a log function may be used for the normalization. For example, feature $X$ may be normalized with:

$$X_i' = \frac{\log X_i}{\log \max(X)} \tag{5}$$

where $X_i'$ is the normalized value of the $i$-th value $X_i$ of feature $X$, and $\max(X)$ is the maximum value of feature $X$. Normalized values for feature $Y$ are obtained in the same way, and formulas (1) to (4) are then evaluated on the normalized values.
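Formula (5) can be sketched as follows. The sketch assumes every value is at least 1 and that the maximum exceeds 1, so that the logarithms are defined and the results fall in $[0, 1]$; the text does not say how zero or fractional values are handled, so this version simply rejects them. The base of the logarithm cancels in the ratio.

```python
import math
from typing import List, Sequence


def log_normalize(column: Sequence[float]) -> List[float]:
    """Formula (5): x'_i = log(x_i) / log(max(X)) for one feature column.

    Assumption: all values are >= 1 and max(X) > 1, so results lie in [0, 1].
    """
    top = max(column)
    if min(column) < 1 or top <= 1:
        raise ValueError("this sketch assumes values >= 1 and max > 1")
    denom = math.log(top)
    return [math.log(v) / denom for v in column]


amounts = [10.0, 100.0, 1000.0]  # hypothetical transaction amounts
print(log_normalize(amounts))    # approximately [0.333, 0.667, 1.0]
```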
Step 303: Determine whether the correlation coefficient between every pair of remaining features is lower than a preset correlation coefficient threshold. If so, go to step 305; otherwise, go to step 307.
In this embodiment, the goal is to retain features with low mutual correlation. Therefore, if the correlation coefficient between every pair of remaining features is below the preset threshold, all remaining features are taken as the screened features; otherwise, screening continues with step 307.
The correlation coefficient threshold may be an empirical or experimental value, for example 0.5.
Step 305: End the flow and take the remaining features, that is, all undeleted features, as the screened features; the number of remaining features is N.
Step 307: Determine the pair of features with the highest correlation coefficient.
Step 309: For each feature of that pair, calculate the average of the correlation coefficients between that feature and the other features.
Step 311: Delete the feature of the pair with the larger average, and return to step 303 to evaluate the remaining features.
Assume that the pair of features with the highest correlation coefficient is feature 1 and feature 2. The average of the correlation coefficients between feature 1 and all other features is calculated, as is the corresponding average for feature 2. If the average for feature 1 is greater than that for feature 2, feature 1 is deleted.
This process is repeated until the correlation coefficient between every pair of remaining features is lower than the preset threshold; the remaining N features are then the screened features. When the product to be identified is assessed, the values of these N features are obtained for it.
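Steps 301 to 311 can be sketched as a greedy elimination loop. This is an illustrative sketch, not the authors' implementation. Two assumptions are made: correlations are compared by absolute value, so strongly negatively correlated features also count as redundant (the text only says "highest"), and ties in step 311 delete the first feature of the pair. The feature names and columns in the example are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant columns (1/m factors cancel)."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def mean_corr(f: str, remaining: List[str], r: Dict[Tuple[str, str], float]) -> float:
    """Step 309: average |correlation| of feature f with the other remaining features."""
    others = [g for g in remaining if g != f]
    return sum(r[(f, g)] for g in others) / len(others)


def screen_features(columns: Dict[str, Sequence[float]], threshold: float = 0.5) -> List[str]:
    names = list(columns)
    # Step 301: all pairwise correlations, computed once (absolute value is an assumption).
    r: Dict[Tuple[str, str], float] = {}
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r[(a, b)] = r[(b, a)] = abs(pearson(columns[a], columns[b]))
    remaining = list(names)
    while len(remaining) > 1:
        # Step 307: the most strongly correlated remaining pair.
        top, a, b = max((r[(p, q)], p, q)
                        for i, p in enumerate(remaining) for q in remaining[i + 1:])
        # Step 303: if even the strongest pair is below the threshold, every pair is.
        if top < threshold:
            break  # step 305
        # Steps 309 and 311: drop the member of the pair with the larger average.
        drop = a if mean_corr(a, remaining, r) >= mean_corr(b, remaining, r) else b
        remaining.remove(drop)
    return remaining


cols = {
    "a": [1, 2, 3, 4, 5],
    "b": [2, 4, 6, 8, 10.5],  # nearly a copy of "a"
    "c": [1, -1, 1, -1, 1],   # uncorrelated with "a"
}
print(screen_features(cols))  # ['a', 'c']
```

Here "b" is removed because its average correlation with the other features is slightly larger than that of "a".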
In addition to the screening method shown in Fig. 3, other screening methods based on the correlation coefficients between features may be used. For example, after the pairwise correlation coefficients are calculated, the average of each feature's correlation coefficients with the other features is calculated, and the features whose average is smaller than a preset threshold are selected as the N screened features.
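The alternative, non-iterative screening rule can be sketched in the same way. As with the previous sketch, absolute correlations are an assumption, and the columns are hypothetical. Unlike the Fig. 3 loop, each feature's average is computed once against all other features, so no feature's removal affects another's score.

```python
import math
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of two equal-length, non-constant columns."""
    m = len(x)
    mx, my = sum(x) / m, sum(y) / m
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))


def screen_by_mean_correlation(columns: Dict[str, Sequence[float]],
                               threshold: float = 0.5) -> List[str]:
    """Keep the features whose average |correlation| with all others is below threshold."""
    names = list(columns)
    selected = []
    for f in names:
        others = [g for g in names if g != f]
        mean_r = sum(abs(pearson(columns[f], columns[g])) for g in others) / len(others)
        if mean_r < threshold:
            selected.append(f)
    return selected


cols = {"a": [1, 2, 3, 4, 5], "b": [2, 4, 6, 8, 10.5], "c": [1, -1, 1, -1, 1]}
print(screen_by_mean_correlation(cols))  # ['a', 'c']
```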
Continuing with Fig. 2, in step 203, a plurality of products that belong to the same preset reference risk category as the product to be identified are determined; the values of the N features of these products are taken as the input of a machine learning model, the preset reference risk category is taken as the target output of the machine learning model, and the model is trained to obtain the weights of the N features.
This step involves the concept of a "reference risk category". Some existing techniques can make a coarse-grained judgment about the risk of a product: for example, certain models or manual policies can judge at a coarse granularity whether a product is risky, but cannot give a specific risk score. The result of such a coarse-grained determination can serve as the reference risk category in this step. In one embodiment of this specification, only the coarse-grained result is used; for example, the risk assessment method is performed on a product when the coarse-grained determination finds it risky.
The above-mentioned coarse-grained risk determination method is not limited herein, and any conventional implementation method may be adopted. Only two ways are listed here:
For example, attributes of the product to be identified, such as whether quick payment is supported, whether real-time arrival of funds is supported, and whether cross-border services are provided, may be input into a pre-trained classification model, which performs binary classification based on these attributes; the resulting classification of whether the product is risky serves as the coarse-grained risk identification result. The classification model may be a neural network model, an SVM (support vector machine), or the like.
As another example, a number of manual policies may be preset to form a tree-structured recognition model: each non-leaf node is a preset policy that points to a different next-level node depending on whether the policy is satisfied, and each leaf node is an identification result (risky or not risky). The tree-structured model is queried with the attributes of the product to be identified, matching node by node according to which policies the product satisfies, and the leaf node finally reached is taken as the coarse-grained risk identification result.
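The tree-structured policy model described above can be sketched with two node types. The class names, the predicate representation, and the example policies are hypothetical; the text only states that non-leaf nodes hold policies and leaf nodes hold risky or not-risky results.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, Union


@dataclass
class Leaf:
    risky: bool  # coarse-grained identification result


@dataclass
class PolicyNode:
    description: str
    policy: Callable[[Dict[str, Any]], bool]  # hypothetical: a predicate over attributes
    if_met: "Node"
    if_not_met: "Node"


Node = Union[Leaf, PolicyNode]


def coarse_grained_risk(node: Node, attributes: Dict[str, Any]) -> bool:
    """Walk from the root, following each policy's outcome, until a leaf is reached."""
    while isinstance(node, PolicyNode):
        node = node.if_met if node.policy(attributes) else node.if_not_met
    return node.risky


tree = PolicyNode(
    "provides cross-border services",
    lambda a: a.get("cross_border", False),
    if_met=PolicyNode(
        "supports quick payment",
        lambda a: a.get("quick_payment", False),
        if_met=Leaf(True),
        if_not_met=Leaf(False),
    ),
    if_not_met=Leaf(False),
)
print(coarse_grained_risk(tree, {"cross_border": True, "quick_payment": True}))  # True
```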
When the weights of the N features are determined in step 203, the machine learning model may be trained on the values of the N features of a plurality of products belonging to the preset reference risk category, for example a plurality of products identified as risky by the coarse-grained risk determination. The machine learning model used in the embodiments of this specification may be a classification model, for example a logistic regression model.
The values of the N features of these products are taken as samples, and the preset reference risk category (that is, risky) is taken as the target output to train the logistic regression model. After a training end condition is reached, the weight of each feature is read from the logistic regression model. The training end condition may be, for example, that the number of iterations reaches a preset threshold or that the loss function converges.
During training, the loss function may be constructed by maximum likelihood estimation, and the parameters, including each feature weight, are updated by gradient descent. Because training a logistic regression model is prior art, it is not described in detail here; the embodiments of this specification use the trained logistic regression model to obtain the feature weights.
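A minimal sketch of this training, using batch gradient descent on the mean negative log-likelihood with a fixed iteration count as the end condition. One assumption departs from a literal reading: the sketch also includes non-risky samples (label 0), because with a single class the likelihood has no finite maximum and the weights would grow without bound. The learning rate, epoch count, and data are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              lr: float = 0.5, epochs: int = 2000) -> Tuple[List[float], float]:
    """Gradient descent on the mean negative log-likelihood; returns (weights, bias)."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):  # end condition: fixed number of iterations
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            # Derivative of the log-likelihood loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / m for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


# Normalized values of N = 2 hypothetical features; 1 = risky, 0 = not risky.
X = [[0.9, 0.1], [0.8, 0.2], [0.7, 0.3], [0.2, 0.8], [0.1, 0.9], [0.3, 0.7]]
y = [1, 1, 1, 0, 0, 0]
weights, bias = train_logistic_regression(X, y)
print(weights)  # feature 0 gets a positive weight, feature 1 a negative one
```

The returned `weights` list plays the role of the N feature weights used in step 207.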
It should be noted that steps 201 to 203 may be executed in advance, or in real time when the risk assessment is performed on the product to be identified.
Step 205: Obtain the values of the N features of the product to be identified.
Step 207: Determine the dynamic risk score of the product to be identified using the weights of the N features and the values of the N features of the product to be identified.
In this embodiment, the initial scores of the N features may be determined from the normalized values of the N features of the product to be identified. Because normalized values are free of dimensional differences, they can be used directly as the initial scores of the N features of the product to be identified.
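If min-max scaling is used as the normalization (an assumption; this part of the specification does not name the method), the normalized values that serve as initial scores could be computed as below. The feature names `customers` and `volume` and their values are hypothetical.

```python
from typing import Dict, List


def min_max_normalize(library: List[Dict[str, float]], features: List[str]) -> List[Dict[str, float]]:
    """Rescale each feature to [0, 1] across all products in the feature library (min-max scaling, assumed)."""
    lo = {f: min(p[f] for p in library) for f in features}
    hi = {f: max(p[f] for p in library) for f in features}
    normalized = []
    for p in library:
        row = {}
        for f in features:
            span = hi[f] - lo[f]
            # A constant feature carries no information; map it to 0.0 instead of dividing by zero.
            row[f] = (p[f] - lo[f]) / span if span else 0.0
        normalized.append(row)
    return normalized


library = [
    {"customers": 5_000, "volume": 2.0},
    {"customers": 50_000, "volume": 8.0},
    {"customers": 95_000, "volume": 5.0},
]
# Initial scores for the third product, taken directly from its normalized values.
initial_scores = min_max_normalize(library, ["customers", "volume"])[2]
print(initial_scores)  # {'customers': 1.0, 'volume': 0.5}
```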
Other approaches may also be used. For example, the initial scores of the N features of the product to be identified can be determined by how the N feature values match set rules. For the feature "total number of transacting customers": if the total is below 10,000, the initial score of the feature is 3 points; if it is 10,000 to 100,000, the initial score is 5 points; if it is 100,000 to 300,000, the initial score is 7 points; and if it is 300,000 to 1,000,000, the initial score is 9 points.
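The rule in this example is a banded lookup. A sketch follows; treating each upper bound as exclusive is an assumption (the text does not say which band a boundary value belongs to), and because no band is defined at or above 1,000,000 customers, the hypothetical function returns `None` there.

```python
from typing import Optional

# (exclusive upper bound, score) bands for "total number of transacting customers".
# Exclusive upper bounds are an assumption; the example does not specify boundary handling.
CUSTOMER_BANDS = [(10_000, 3), (100_000, 5), (300_000, 7), (1_000_000, 9)]


def customer_count_score(total_customers: int) -> Optional[int]:
    # Return the score of the first band whose upper bound exceeds the value.
    for upper, score in CUSTOMER_BANDS:
        if total_customers < upper:
            return score
    # The example defines no band at or above 1,000,000 customers.
    return None


print(customer_count_score(50_000))  # 5
```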
Once the initial scores of the N features are determined, they are weighted and summed using the weights of the N features determined in step 203 to obtain the dynamic risk score of the product to be identified. A weighted average may be used instead of a weighted sum, but it is essentially a weighted sum divided by a constant (the total weight).
Step 209: Determine the risk status of the product to be identified using the dynamic risk score.
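A sketch of the weighted sum in step 207, with the weighted-average variant as an option. The feature names, scores and weights are hypothetical. Since logistic regression weights can be negative, the average variant assumes the weights do not sum to zero.

```python
from typing import Dict


def dynamic_risk_score(initial: Dict[str, float], weights: Dict[str, float], average: bool = False) -> float:
    """Weighted sum of per-feature initial scores.

    With average=True the sum is divided by the total weight, which only rescales it by a
    constant; this assumes the weights do not sum to zero.
    """
    total = sum(weights[f] * initial[f] for f in weights)
    return total / sum(weights.values()) if average else total


initial = {"customers": 1.0, "volume": 0.5}  # hypothetical initial scores
weights = {"customers": 2.0, "volume": 1.0}  # hypothetical feature weights from step 203
print(dynamic_risk_score(initial, weights))  # 2.5
```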
In one implementation, the risk status of the product to be identified is determined directly from the dynamic risk score. For example, the range of the dynamic risk score is divided into several intervals, each corresponding to a risk level. Alternatively, the score itself can reflect the risk level: the higher the score, the higher the risk level of the product to be identified.
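Mapping a score to an interval is a sorted-boundary lookup, which the standard `bisect` module handles directly. The boundaries and level names below are hypothetical; the specification does not fix them.

```python
import bisect

# Hypothetical interval boundaries and risk levels (len(LEVELS) == len(BOUNDARIES) + 1).
BOUNDARIES = [1.0, 2.0, 3.0]
LEVELS = ["low", "medium", "high", "very high"]


def risk_level(score: float) -> str:
    # bisect_right returns the index of the interval containing the score;
    # a score equal to a boundary falls into the higher interval.
    return LEVELS[bisect.bisect_right(BOUNDARIES, score)]


print(risk_level(2.5))  # high
```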
In a preferred implementation, the embodiments of this specification combine static and dynamic risk scoring: a static risk score is computed for the product to be identified alongside the dynamic risk score, as shown in Fig. 4, and the two scores are then combined to obtain the risk status of the product to be identified.
The static risk score is determined by how the attribute information of the product to be identified matches preset static rules. The static rules may be set from manual experience, for example: whether quick payment is supported, whether real-time billing is supported, and whether cross-border services are provided. One point is added for each static rule that the product's attribute information satisfies, and the points are summed to give the static risk score.
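The static scoring described above amounts to counting satisfied rules. A sketch using the three example rules from the text; the attribute keys are hypothetical.

```python
from typing import Any, Callable, Dict, List

Attributes = Dict[str, Any]

# Example static rules from the text; the attribute keys are hypothetical.
STATIC_RULES: List[Callable[[Attributes], bool]] = [
    lambda a: bool(a.get("supports_quick_payment")),     # quick payment supported
    lambda a: bool(a.get("supports_realtime_billing")),  # real-time billing supported
    lambda a: bool(a.get("provides_cross_border")),      # cross-border services provided
]


def static_risk_score(attrs: Attributes) -> int:
    # Add 1 point for every static rule the product's attributes satisfy.
    return sum(1 for rule in STATIC_RULES if rule(attrs))


print(static_risk_score({"supports_quick_payment": True, "provides_cross_border": True}))  # 2
```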
When combining the dynamic and static risk scores, the two can be weighted and summed to obtain the final risk score of the product to be identified, with the weights set from empirical or experimental values; the risk status of the product is then determined from the final risk score. For example, the range of the final risk score is divided into several intervals, each corresponding to a risk level, or the final score is used directly to reflect the risk level: the higher the score, the higher the risk level of the product to be identified.
Besides combining the static and dynamic risk scores, the risk status of the product to be identified may also be determined by combining the dynamic risk score with risk scores obtained from other risk identification methods, which are not enumerated here.
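A sketch of the combination step. The default weights 0.7 and 0.3 are placeholders, not values from the specification. Because the static score counts satisfied rules while the dynamic score depends on the learned weights, the two scores are on different scales, and the chosen weights have to account for that as well.

```python
def final_risk_score(dynamic: float, static: float, w_dynamic: float = 0.7, w_static: float = 0.3) -> float:
    """Weighted sum of the dynamic and static risk scores.

    The default weights are placeholders; the text says they are set from empirical
    or experimental values.
    """
    return w_dynamic * dynamic + w_static * static


score = final_risk_score(2.5, 2)
```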
As can be seen from the foregoing method embodiments, the method provided by the embodiments of this specification may have the following advantages:
1) Feature values change as an Internet product operates online. The risk assessment provided by the embodiments of this specification may therefore be performed periodically or triggered by specific events, so that product risk is analyzed based on features that are quantified dynamically once their values change.
2) The embodiments retain traditional rule-based scoring built on manual experience, which ensures that a product's risk status can still be obtained from manual experience at the initial stage after a new product goes online, when no transaction data or risk data exist yet.
3) The approach of the embodiments not only analyzes the risk status of a product but also reveals how much each feature contributes to that risk status, improving the relevance of risk identification indicators and the interpretability of risk identification.
4) The approach of the embodiments enables timely early warning for high-risk products, effectively monitoring and preventing risky behaviors such as money laundering.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
According to an embodiment of another aspect, a risk assessment device is provided. FIG. 5 shows a schematic block diagram of the risk assessment device according to one embodiment. It should be appreciated that the device can be implemented by any apparatus, device, platform or device cluster with computing and processing capabilities. As shown in FIG. 5, the apparatus 500 includes a feature value acquisition unit 501, a dynamic scoring unit 502 and a risk determination unit 503, and may further include a feature screening unit 504, a normalization unit 505, a weight determination unit 506 and a static scoring unit 507. The main functions of each unit are as follows:
The feature value acquisition unit 501 is configured to obtain the values of N features of the product to be identified, where the N features are screened from the feature library corresponding to the product to be identified according to the correlation coefficients between features, and N is a positive integer.
The dynamic scoring unit 502 is configured to determine the dynamic risk score of the product to be identified using the weights and the values of the N features, where the weights of the N features are obtained by training a machine learning model.
The risk determination unit 503 is configured to determine the risk status of the product to be identified using the dynamic risk score.
The feature screening unit 504 is configured to screen N features from a feature library corresponding to the product to be identified according to the correlation coefficient between the features.
As a preferred embodiment, the feature screening unit 504 may be specifically configured to:
calculate the correlation coefficient between every two features in the feature library;
determine the pair of features with the highest correlation coefficient;
for each feature of the pair, calculate the average of its correlation coefficients with the other features;
delete the feature of the pair with the larger average, and return to the step of determining the pair of features with the highest correlation coefficient, until the correlation coefficients between every two of the remaining N features are lower than a preset correlation coefficient threshold.
The Pearson correlation coefficient described above may be used to calculate the correlation coefficients between features.
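The iterative screening performed by unit 504 (and recited in claim 3) can be sketched in plain Python as follows. The function names are hypothetical. Taking absolute Pearson values, and averaging over the features still kept, are interpretive assumptions, since the text does not settle either point.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    # Treat a constant column as uncorrelated rather than dividing by zero.
    return cov / (sx * sy) if sx and sy else 0.0


def screen_features(columns: Dict[str, List[float]], threshold: float) -> List[str]:
    """Drop one feature of the most correlated pair at a time until every remaining pair is below threshold."""
    keep = list(columns)
    # Absolute values, so strong negative correlation also counts as redundancy (an assumption).
    corr = {frozenset(p): abs(pearson(columns[p[0]], columns[p[1]])) for p in combinations(keep, 2)}

    def mean_corr(f: str) -> float:
        # Average correlation of f with the other features still kept (an assumption about "other features").
        others = [g for g in keep if g != f]
        return sum(corr[frozenset((f, g))] for g in others) / len(others)

    while len(keep) > 1:
        a, b = max(combinations(keep, 2), key=lambda p: corr[frozenset(p)])
        if corr[frozenset((a, b))] < threshold:
            break  # every remaining pair is already below the threshold
        # Delete the member of the pair that is more correlated, on average, with the rest.
        keep.remove(a if mean_corr(a) >= mean_corr(b) else b)
    return keep


columns = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10],  # perfectly correlated with f1
    "f3": [2, 1, 2, 1, 2],   # uncorrelated with f1 and f2
}
selected = screen_features(columns, threshold=0.9)  # keeps f3 and one of f1/f2
```

Precomputing all pairwise coefficients once means each elimination round only re-reads the table rather than recomputing correlations.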
Besides the preferred embodiment above, the feature screening unit 504 may also use other screening methods based on the correlation coefficients between features. For example, after the pairwise correlation coefficients are calculated, the average of each feature's correlation coefficients with the other features is computed, and the features whose average is below a preset threshold are selected as the N features.
The normalization unit 505 is configured to normalize the feature values of the products in the feature library to eliminate dimensional differences; accordingly, the values of the N features acquired by the feature value acquisition unit 501 are normalized values.
Where the product to be identified belongs to a preset reference risk category, the weight determination unit 506 is configured to take the values of the N features of a plurality of products in the preset reference risk category as the input of the machine learning model, take the preset reference risk category as the target output of the machine learning model, and train the machine learning model to obtain the weights of the N features.
The preset reference risk category may be the result of a coarse-grained judgment (at risk or not at risk) made using existing models or manual policies. The coarse-grained judgment method is not limited here; any conventional implementation may be used, and the embodiments of this specification use only its result.
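The alternative screening rule in the preceding paragraph is a single pass: keep every feature whose mean correlation with all other features is below a threshold. A sketch, again assuming absolute Pearson coefficients; the function name and threshold are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length columns."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy) if sx and sy else 0.0


def screen_by_mean_correlation(columns: Dict[str, List[float]], threshold: float) -> List[str]:
    """Select features whose mean absolute correlation with all other features is below threshold."""
    names = list(columns)  # assumes at least two features
    corr = {frozenset(p): abs(pearson(columns[p[0]], columns[p[1]])) for p in combinations(names, 2)}
    selected = []
    for f in names:
        mean = sum(corr[frozenset((f, g))] for g in names if g != f) / (len(names) - 1)
        if mean < threshold:
            selected.append(f)
    return selected


columns = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10],
    "f3": [2, 1, 2, 1, 2],
}
print(screen_by_mean_correlation(columns, threshold=0.4))  # ['f3']
```

Unlike the iterative method, this rule can discard both members of a highly correlated pair, as it does with `f1` and `f2` here.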
The machine learning model may include, among other things, a logistic regression model.
As a preferred embodiment, the dynamic scoring unit 502 may be specifically configured to determine the initial scores of the N features from their normalized values, and to compute a weighted sum of the initial scores using the weights of the N features to obtain the dynamic risk score of the product to be identified.
The static scoring unit 507 is configured to determine the static risk score of the product to be identified according to how the attribute information of the product matches preset static rules.
Accordingly, the risk determination unit 503 may be specifically configured to combine the dynamic risk score and the static risk score to obtain the risk status of the product to be identified.
According to an embodiment of another aspect, there is also provided a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in fig. 1, fig. 2 or fig. 3.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory and a processor, the memory having stored therein executable code, the processor, when executing the executable code, implementing the method of fig. 1, 2 or 3.
As technology develops, computer-readable storage media are used ever more widely, and computer programs are no longer distributed only on tangible media; they can also be downloaded directly from a network. Any combination of one or more computer-readable storage media may be employed. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present specification, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The processors described above may include one or more single-core or multi-core processors. The processor may comprise any combination of general-purpose processors or dedicated processors (e.g., image processors, application processors, baseband processors).
The embodiments in this specification are described progressively: for identical or similar parts the embodiments may be cross-referenced, and each embodiment focuses on its differences from the others. In particular, the apparatus embodiment is described relatively briefly because it is substantially similar to the method embodiment; for relevant details, refer to the corresponding description of the method embodiment.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in this invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The specific embodiments above describe the objects, technical solutions and advantages of the present invention in further detail. It should be understood that they are merely exemplary embodiments of the present invention and are not intended to limit its scope; any modification, equivalent substitution, improvement and the like made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (15)

1. A method of risk assessment comprising:
obtaining values of N features of a product to be identified, N being a positive integer;
determining a dynamic risk score of the product to be identified by using the weights of the N features and the values of the N features, wherein the weights of the N features are obtained by a pre-trained machine learning model;
determining a risk status of the product to be identified using the dynamic risk score;
wherein,
obtaining weights of the N features through a pre-trained machine learning model, including:
performing a coarse-grained judgment on the product to be identified, and taking the result of the coarse-grained judgment as a reference risk category, wherein the coarse-grained judgment determines whether a risk exists, and its result is either at risk or risk-free;
determining at least two products belonging to the reference risk category;
determining values of the N features of the at least two products belonging to the reference risk category;
and taking the values of the N features of the at least two products belonging to the reference risk category as the input of the machine learning model, taking the reference risk category as the target output of the machine learning model, and training the machine learning model to obtain the weights of the N features.
2. The method of claim 1, further comprising, prior to obtaining the values of the N features of the product to be identified:
screening the N features from a feature library corresponding to the product to be identified according to correlation coefficients between features.
3. The method of claim 2, wherein the screening the N features from the feature library corresponding to the product to be identified according to the correlation coefficient between the features comprises:
calculating the correlation coefficient between every two features in the feature library;
determining the pair of features with the highest correlation coefficient;
for each feature of the pair, calculating the average of its correlation coefficients with the other features;
and deleting the feature of the pair with the larger average, and repeating the steps from determining the pair of features with the highest correlation coefficient through deleting the feature with the larger average, until the correlation coefficients between every two of the remaining N features are lower than a preset correlation coefficient threshold.
4. The method of claim 2, wherein before screening the N features from the feature library corresponding to the product to be identified according to the correlation coefficients between features, the method further comprises: normalizing the feature values of the products in the feature library to eliminate dimensional differences;
wherein the values of the N features are the normalized values.
5. The method of claim 1, wherein the machine learning model comprises a logistic regression model.
6. The method of claim 4, wherein determining the dynamic risk score for the product to be identified using the weights of the N features and the values of the N features comprises:
determining initial scores of the N features respectively from the normalized values of the N features;
and performing a weighted summation of the initial scores of the N features using the weights of the N features to obtain the dynamic risk score of the product to be identified.
7. The method of claim 1, wherein determining a risk status of the product to be identified using the dynamic risk score comprises:
determining a static risk score of the product to be identified according to the matching condition of the attribute information of the product to be identified and a preset static rule;
and combining the dynamic risk score and the static risk score to obtain the risk status of the product to be identified.
8. A risk assessment device comprising:
a feature value acquisition unit configured to acquire values of N features of a product to be identified, N being a positive integer;
a dynamic scoring unit configured to determine a dynamic risk score of the product to be identified using weights of the N features and values of the N features, the weights of the N features being obtained by a pre-trained machine learning model;
a risk determination unit configured to determine a risk status of the product to be identified using the dynamic risk score;
wherein obtaining the weights of the N features through a pre-trained machine learning model comprises: performing a coarse-grained judgment on the product to be identified, and taking the result of the coarse-grained judgment as a reference risk category, wherein the coarse-grained judgment determines whether a risk exists, and its result is either at risk or risk-free; determining at least two products belonging to the reference risk category; determining values of the N features of the at least two products belonging to the reference risk category; and taking the values of the N features of the at least two products belonging to the reference risk category as the input of the machine learning model, taking the reference risk category as the target output of the machine learning model, and training the machine learning model to obtain the weights of the N features.
9. The apparatus of claim 8, further comprising:
a feature screening unit configured to screen the N features from a feature library corresponding to the product to be identified according to correlation coefficients between features.
10. The apparatus according to claim 9, wherein the feature screening unit is specifically configured to:
calculate the correlation coefficient between every two features in the feature library;
determine the pair of features with the highest correlation coefficient;
for each feature of the pair, calculate the average of its correlation coefficients with the other features;
and delete the feature of the pair with the larger average, and repeat the steps from determining the pair of features with the highest correlation coefficient through deleting the feature with the larger average, until the correlation coefficients between every two of the remaining N features are lower than a preset correlation coefficient threshold.
11. The apparatus of claim 9, further comprising:
a normalization unit configured to normalize the feature values of each product in the feature library to eliminate dimensional differences;
wherein the values of the N features acquired by the feature value acquisition unit are the normalized values.
12. The apparatus according to claim 11, wherein the dynamic scoring unit is specifically configured to determine initial scores of the N features respectively from the normalized values of the N features, and to perform a weighted summation of the initial scores of the N features using the weights of the N features to obtain the dynamic risk score of the product to be identified.
13. The apparatus of claim 8, further comprising:
a static scoring unit configured to determine a static risk score of the product to be identified according to how the attribute information of the product to be identified matches a preset static rule;
wherein the risk determination unit is specifically configured to combine the dynamic risk score and the static risk score to obtain the risk status of the product to be identified.
14. A computer-readable storage medium, on which a computer program is stored which, when executed in a computer, causes the computer to carry out the method of any one of claims 1-7.
15. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that, when executed by the processor, implements the method of any of claims 1-7.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110397972.9A CN112801563B (en) 2021-04-14 2021-04-14 Risk assessment method and device

Publications (2)

Publication Number Publication Date
CN112801563A CN112801563A (en) 2021-05-14
CN112801563B CN112801563B (en) 2021-08-17

Family

ID=75811347

Country Status (1)

Country Link
CN (1) CN112801563B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8489416B2 (en) * 2003-10-22 2013-07-16 Medco Health Solutions, Inc. Computer system and method for generating healthcare risk indices using medication compliance information
CN106296195A (en) * 2015-05-29 2017-01-04 阿里巴巴集团控股有限公司 A kind of Risk Identification Method and device
CN106910078A (en) * 2015-12-22 2017-06-30 阿里巴巴集团控股有限公司 Risk identification method and device
CN108416663A (en) * 2018-01-18 2018-08-17 阿里巴巴集团控股有限公司 The method and device of the financial default risk of assessment
CN110310129A (en) * 2019-06-04 2019-10-08 阿里巴巴集团控股有限公司 Risk Identification Method and its system
CN112037009A (en) * 2020-08-06 2020-12-04 百维金科(上海)信息科技有限公司 Risk assessment method for consumption credit scene based on random forest algorithm
CN112651635A (en) * 2020-12-28 2021-04-13 长沙市到家悠享网络科技有限公司 Risk identification method and device, electronic equipment and storage medium

Family Cites Families (6)

Publication number Priority date Publication date Assignee Title
GB2537776A (en) * 2014-02-14 2016-10-26 Tidd Nathan Selecting equity investments using quantitative multi-factor models
CN109377058A (en) * 2018-10-26 2019-02-22 中电科新型智慧城市研究院有限公司 The enterprise of logic-based regression model moves outside methods of risk assessment
CN109978406A (en) * 2019-04-08 2019-07-05 上海叮诺科技有限公司 A kind of method and system of security downside risks assessment diagnosis
CN110942171A (en) * 2019-09-12 2020-03-31 中电科新型智慧城市研究院有限公司 Enterprise labor and resource dispute risk prediction method based on machine learning
CN111311085B (en) * 2020-02-10 2024-08-06 清华大学合肥公共安全研究院 Building fire dynamic risk assessment method and device based on Internet of things monitoring
CN111291816B (en) * 2020-02-17 2021-08-06 支付宝(杭州)信息技术有限公司 Method and device for carrying out feature processing aiming at user classification model

Also Published As

Publication number Publication date
CN112801563A (en) 2021-05-14

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant