CN111797994A - Risk assessment method, device, equipment and storage medium - Google Patents


Info

Publication number
CN111797994A
Authority
CN
China
Prior art keywords
data
evaluated
missing
existing
existing data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010596449.4A
Other languages
Chinese (zh)
Other versions
CN111797994B (en)
Inventor
邓继禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010596449.4A
Publication of CN111797994A
Application granted
Publication of CN111797994B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

The application discloses a risk assessment method, apparatus, electronic device, and storage medium. It relates to the field of artificial intelligence, in particular to deep learning and big data technologies, and more specifically to the use of deep learning models and big data for financial risk control and anti-fraud. The specific implementation is as follows: if a data missing event is detected from the existing data features of an object to be evaluated, the existing data features are transformed to obtain the missing data features of the object to be evaluated, where the existing data features are obtained by processing the existing data of the object to be evaluated; risk assessment is then performed on the object to be evaluated according to both its existing data features and its missing data features. The method improves the accuracy of risk assessment when the existing data of the object to be evaluated are incomplete, thereby optimizing the risk assessment scheme.

Description

Risk assessment method, device, equipment and storage medium
Technical Field
The application relates to the field of artificial intelligence, in particular to deep learning and big data technologies, and more specifically to the use of deep learning models and big data for financial risk control and anti-fraud.
Background
With the development of artificial intelligence technology, user risk assessment has been applied in more and more fields. For example, in financial risk control, a user must be risk-assessed when applying for a financial service; in anti-fraud applications, the users of an application must likewise be risk-assessed in order to detect fraud. Current risk assessment techniques generally rely on the real-time data that a user uploads when transacting business or using an application. When the uploaded data are incomplete, the user cannot be assessed accurately and comprehensively, so an improvement is urgently needed.
Disclosure of Invention
The disclosure provides a risk assessment method, apparatus, device, and storage medium.
According to an aspect of the present disclosure, there is provided a risk assessment method, including:
if a data missing event is detected from the existing data features of an object to be evaluated, transforming the existing data features to obtain the missing data features of the object to be evaluated, wherein the existing data features are obtained by processing the existing data of the object to be evaluated; and
performing risk assessment on the object to be evaluated according to its existing data features and missing data features.
According to another aspect of the present disclosure, there is provided a risk assessment apparatus comprising:
a missing feature determination module, configured to transform the existing data features of the object to be evaluated to obtain its missing data features if a data missing event is detected from those existing data features, wherein the existing data features are obtained by processing the existing data of the object to be evaluated; and
a risk assessment module, configured to perform risk assessment on the object to be evaluated according to its existing data features and missing data features.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform a risk assessment method as described in any of the embodiments of the present application.
According to another aspect of the present disclosure, there is provided a non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform a risk assessment method according to any of the embodiments of the present application.
The technology of the application solves the problem of inaccurate risk assessment results when the existing data of the object to be evaluated are incomplete, and thereby optimizes the risk assessment scheme.
It should be understood that the statements in this section do not necessarily identify key or critical features of the embodiments of the present disclosure, nor do they limit the scope of the present disclosure. Other features of the present disclosure will become apparent from the following description.
Drawings
The drawings are provided for a better understanding of the solution and do not limit the application. In the drawings:
FIG. 1 is a flow chart of a risk assessment method provided according to an embodiment of the present application;
FIG. 2A is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 2B is a schematic diagram illustrating the principle of determining missing data features according to an embodiment of the present application;
FIG. 3 is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 4 is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 5 is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 6A is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 6B is an architecture diagram of a risk assessment system provided in accordance with an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a risk assessment device according to an embodiment of the present application;
FIG. 8 is a block diagram of an electronic device for implementing a risk assessment method according to an embodiment of the present application.
Detailed Description
The following description of exemplary embodiments of the present application, taken in conjunction with the accompanying drawings, includes various details of the embodiments to aid understanding; these details are to be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present application. Likewise, descriptions of well-known functions and constructions are omitted below for clarity and conciseness.
FIG. 1 is a flowchart of a risk assessment method according to an embodiment of the present application. This embodiment is applicable to performing risk assessment on an object to be evaluated, for example, assessing the risk of a loan applicant from his or her existing data during loan approval. The embodiment may be performed by a risk assessment apparatus configured in an electronic device, and the apparatus may be implemented in software and/or hardware. As shown in FIG. 1, the method includes:
S101, if a data missing event is detected from the existing data features of the object to be evaluated, transforming the existing data features to obtain the missing data features of the object to be evaluated.
The object to be evaluated can be any user who requires risk assessment. For example, when an individual is assessed, the object to be evaluated is that individual; when an enterprise is assessed, the object to be evaluated may be the person in charge of the enterprise, and so on. The existing data features of the object to be evaluated are features extracted from its existing data, for example by analyzing the existing data with a pre-trained machine learning model, or by applying a preset feature extraction algorithm. The existing data are the reference data that can be acquired for the current risk assessment. They may be of many types, including but not limited to at least one of text data, social data, consumption data, external data, and other data. The existing data may be filled in or uploaded by the user for this assessment, or obtained by accessing various data sources. Optionally, such data sources may include, but are not limited to, at least one of: SMS messages, telecom operators, address books, credit card bills, e-commerce platforms, third-party credit bureaus, blacklist databases, and terminal behavior. In the embodiments of the present application, a data missing event is an event triggered when the acquired existing data are not comprehensive enough, i.e., some type of data is missing. The missing data features are the features of that missing data that are relevant to the assessment.
Optionally, in this embodiment, whether a data missing event exists may be detected from the existing data features of the object to be evaluated. Specifically, the existing data features determined from the acquired existing data can be checked for completeness against the requirements of the actual risk assessment service; if some required features are absent, a data missing event exists. For example, the target feature types required for risk assessment may be preset; after the existing data features of the object to be evaluated are determined, it is checked whether they cover all the target feature types, and if not, a data missing event is determined to exist. Alternatively, taking advantage of the interpretability of the scorecard model used to determine the final risk assessment result, the existing data features can be input into the scorecard model; if they are incomplete, the scorecard model outputs a data missing prompt, i.e., a data missing event is detected.
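The coverage check described above can be sketched as a set difference between the preset target feature types and the types actually present. This is a minimal sketch, assuming the existing data features arrive as a dict keyed by feature type; the type names in `REQUIRED_FEATURE_TYPES` are hypothetical placeholders taken from the data types listed earlier, not a configuration from the source.

```python
# Hypothetical target feature types; a real service would configure its own.
REQUIRED_FEATURE_TYPES = {"text", "social", "consumption", "external"}


def detect_missing_event(existing_features):
    """Return the required feature types not covered by existing_features.

    existing_features maps a feature type to its extracted value, or to None
    when extraction produced nothing. A non-empty result means a data missing
    event should be raised.
    """
    covered = {ftype for ftype, value in existing_features.items() if value is not None}
    return REQUIRED_FEATURE_TYPES - covered


missing = detect_missing_event({"text": 0.42, "social": None, "consumption": 0.17})
print(sorted(missing))  # ['external', 'social']
```

Returning the missing types, rather than a bare boolean, also tells the later transformation step which features it has to restore.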
Optionally, in this embodiment, when a data missing event exists, the missing data are not ignored (i.e., simply dropped). Instead, the missing data features of the object to be evaluated are recovered by transforming the existing data features. Specifically, because the existing data of the object to be evaluated are of many types, the features extracted from each type may not be on the same scale. The existing data features of the various types can therefore first be converted to a common scale by a preset transformation algorithm and then fused, so as to accurately determine the features of the missing data. Optionally, the method used to fuse the rescaled features depends on the scale transformation algorithm; for example, if the scale transformation algorithm is the weight of evidence (WOE) transformation, the fusion method may be weighted fusion.
S102, performing risk assessment on the object to be evaluated according to its existing data features and missing data features.
Optionally, the risk assessment in the embodiments of the present application may assess the creditworthiness of the object to be evaluated, its fraudulent behavior, whether its environment is risky, and the like. Specifically, since the missing data features were recovered in S101, risk assessment can now be performed from both the existing and the missing data features. One option is to preset a risk assessment rule that specifies the data features corresponding to each risk level; the extracted data features of the user (both existing and missing) are then matched against the features defined for each risk level to determine the risk level of the current object to be evaluated. Another option is to input the existing and missing data features into a pre-trained risk assessment model, which analyzes them with the algorithm learned during training to produce the final risk assessment result. Optionally, this model may consist of a machine learning model and a scorecard model: the machine learning model scores the existing and missing data features and feeds the resulting scores into the scorecard model, which analyzes them to produce the final risk assessment result.
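One common way a scorecard turns feature scores into a result is to sum per-feature log-odds contributions and rescale them with the "points to double the odds" (PDO) convention. The sketch below assumes that convention; the base score of 600 at 50:1 good:bad odds, the PDO of 20, the risk-level cutoffs, and the feature names are all hypothetical values chosen for illustration, not parameters from the source.

```python
import math


def scorecard_points(p_bad, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map a probability of 'bad' to scorecard points.

    Points-to-double-odds scaling: the score equals base_score when the
    good:bad odds equal base_odds, and rises by pdo each time the odds double.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log((1.0 - p_bad) / p_bad)


def risk_level(score, cutoffs=((620.0, "low"), (580.0, "medium"))):
    # Cutoffs are checked from the highest score down; anything below is "high".
    for threshold, level in cutoffs:
        if score >= threshold:
            return level
    return "high"


def assess(feature_scores, intercept=0.0):
    """feature_scores: per-feature contributions to the log-odds of 'bad',
    covering both existing and missing data features."""
    log_odds_bad = intercept + sum(feature_scores.values())
    p_bad = 1.0 / (1.0 + math.exp(-log_odds_bad))
    score = scorecard_points(p_bad)
    return score, risk_level(score)


score, level = assess({"existing_text": -2.1, "missing_social": -1.8})
print(round(score), level)  # 600 medium
```

Because each feature contributes additively to the log-odds, the points attributable to a restored missing feature can be read off separately, which is the interpretability the scorecard is chosen for.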
According to the technical solution of this embodiment, if a data missing event is detected in the risk assessment from the existing data features of the object to be evaluated, the missing data features of the object to be evaluated are determined by transforming the existing data features, and risk assessment is then performed using both the determined missing data features and the existing data features. Rather than ignoring the missing data, the solution transforms the existing data features to restore the features of the missing data before assessing risk. This addresses the problem of inaccurate risk assessment results when the existing data of the object to be evaluated are incomplete, improves the accuracy of risk assessment, and optimizes the risk assessment scheme.
FIG. 2A is a flowchart of another risk assessment method according to an embodiment of the present application, and FIG. 2B is a schematic diagram illustrating the principle of determining missing data features according to an embodiment of the present application. This embodiment further refines the above embodiments and describes in detail how the existing data features are transformed to obtain the missing data features of the object to be evaluated. As shown in FIGS. 2A and 2B, the method includes:
S201, if a data missing event is detected from the existing data features of the object to be evaluated, performing a weight of evidence (WOE) transformation on the existing data features to obtain transformation features.
Optionally, when a data missing event is detected from the existing data features of the object to be evaluated, this embodiment applies the WOE transformation to the existing data features in order to restore the features of the missing data. Specifically, the existing data features derived from data acquired from each data source are WOE-transformed separately, yielding one transformation feature per existing data feature.
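As a reminder of what the WOE transformation computes, the sketch below fits a WOE table for one binned feature from labeled history and then replaces bin labels with their WOE values. It assumes the convention WOE = ln(share of goods in the bin / share of bads in the bin) with additive smoothing; some references use the inverse ratio, and the source does not specify which convention or which binning it uses, so both are assumptions here.

```python
import math
from collections import Counter


def fit_woe(bins, labels, smoothing=0.5):
    """Fit a weight-of-evidence table for one binned feature.

    bins: bin label of each sample; labels: 1 = bad, 0 = good.
    WOE = ln(share of goods in bin / share of bads in bin), with additive
    smoothing so that an empty cell does not produce log(0).
    """
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    categories = set(bins)
    total_good = sum(good.values()) + smoothing * len(categories)
    total_bad = sum(bad.values()) + smoothing * len(categories)
    return {
        b: math.log(((good[b] + smoothing) / total_good) / ((bad[b] + smoothing) / total_bad))
        for b in categories
    }


def woe_transform(bins, table):
    # Replace each bin label with its WOE value, putting every source on one log-odds scale.
    return [table[b] for b in bins]


table = fit_woe(["A", "A", "A", "B", "B", "B"], [0, 0, 1, 0, 1, 1])
print(woe_transform(["A", "B"], table))  # roughly [0.511, -0.511]
```

Since every WOE value lives on the same log-odds scale, features from differently scaled sources become directly comparable, which is what makes the later weighted fusion meaningful.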
S202, determining the weight value of each transformation feature.
Optionally, after S201 yields the transformation features corresponding to the different types of existing data features, a weight value must be assigned to each transformation feature. This embodiment does not limit how the weights are assigned: they may be assigned by experienced staff, learned by training a machine learning model on a large amount of sample data, or determined according to other rules.
S203, determining the missing data features of the object to be evaluated according to the transformation features and their weight values.
Optionally, in this step, the transformation features are fused by weighting: the value of each transformation feature is multiplied by its weight value, the products are summed, and the result is taken as the missing data feature of the object to be evaluated. Illustratively, as shown in FIG. 2B, missing data feature = transformation feature 1 × W1 + transformation feature 2 × W2 + … + transformation feature N × WN.
S204, performing risk assessment on the object to be evaluated according to its existing data features and missing data features.
According to this embodiment, if a data missing event is detected in the current risk assessment from the existing data features of the object to be evaluated, each type of existing data feature is WOE-transformed into a corresponding transformation feature; the transformation features are assigned weights and fused by weighting to obtain the missing data features of the object to be evaluated; and risk assessment is then performed according to the determined missing data features and the existing data features. Restoring the missing data features with the WOE transformation improves the accuracy of that restoration and, in turn, the accuracy of the final risk assessment.
FIG. 2B is a specific example of step S202 in FIG. 2A. As shown in FIG. 2B, this step may assign weight value W1 to transformation feature 1, weight value W2 to transformation feature 2, and so on, up to weight value WN for transformation feature N. The transformation features themselves are obtained per source: feature extraction is performed on existing data 1 acquired through the data source 1 channel to obtain existing data feature 1, which is WOE (weight of evidence) transformed into transformation feature 1; existing data 2 acquired through the data source 2 channel likewise yields existing data feature 2 and, after WOE transformation, transformation feature 2; and existing data N acquired through the data source N channel yields existing data feature N and, after WOE transformation, transformation feature N. Optionally, in this embodiment the transformation features of existing data acquired from different data sources may be determined in parallel.
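The per-source pipeline of FIG. 2B (extract, WOE-transform, weight, sum) can be sketched end to end, with the sources processed in parallel via `concurrent.futures.ThreadPoolExecutor`. Everything source-specific here is hypothetical: the source names, the mean-based feature extraction, the pre-fitted cut points and WOE values, and the weights W1 and W2 stand in for whatever the real system would fit.

```python
import bisect
from concurrent.futures import ThreadPoolExecutor

# Hypothetical pre-fitted binning per data source: cut points and the WOE value
# of each resulting bin (len(woe) == len(cuts) + 1).
WOE_TABLES = {
    "source_1": ([0.3, 0.7], [-0.8, 0.1, 0.9]),
    "source_2": ([10.0], [-0.4, 0.6]),
}
WEIGHTS = {"source_1": 0.6, "source_2": 0.4}  # hypothetical W1, W2


def extract_feature(raw):
    # Hypothetical feature extraction: the mean of the raw numeric values.
    return sum(raw) / len(raw)


def transform(source, raw):
    # Existing data -> existing data feature -> transformation feature (WOE of its bin).
    cuts, woe = WOE_TABLES[source]
    return woe[bisect.bisect_right(cuts, extract_feature(raw))]


def missing_feature(existing_data):
    """existing_data: {source: raw values}. Returns sum of transformation feature i x Wi."""
    with ThreadPoolExecutor() as pool:
        futures = {s: pool.submit(transform, s, raw) for s, raw in existing_data.items()}
        transformed = {s: f.result() for s, f in futures.items()}
    return sum(WEIGHTS[s] * t for s, t in transformed.items())


print(round(missing_feature({"source_1": [0.8, 1.0], "source_2": [12.0, 14.0]}), 2))  # 0.78
```

Threads suit this shape when each source's extraction is I/O-bound (for example, fetching from a remote data source); CPU-heavy extraction would favor a process pool instead.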
FIG. 3 is a flowchart of another risk assessment method according to an embodiment of the present application. This embodiment further refines the above embodiments and describes in detail how the existing data features of the object to be evaluated are determined. As shown in FIG. 3, the method includes:
S301, performing data mining on the existing data of the object to be evaluated to obtain the edge relationships of at least one dimension contained in the existing data.
In this embodiment, an edge relationship is a relationship between two users, and edge relationships vary in strength: the closer the relationship between user A and user B, the stronger the edge relationship between them. The strength of an edge relationship can be represented by its weight value. Optionally, the edge relationships in this embodiment have at least one dimension (degree): a relationship established directly between two users, without involving any other user, is a first-degree edge relationship; a relationship established through a third user is a second-degree edge relationship; a relationship established through a third and a fourth user is a third-degree edge relationship; and so on. For example, if user A and user B are friends, they have a first-degree edge relationship; if, in addition, user B and user C are friends, user A and user C have a second-degree edge relationship.
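The degree of an edge relationship, as defined above, is the shortest number of hops between two users in the relationship graph, which a breadth-first search finds directly. This is a minimal sketch, assuming an undirected graph stored as adjacency sets; the `max_degree` cutoff is an illustrative parameter, not something the source specifies.

```python
from collections import deque


def build_graph(edges):
    # Undirected adjacency sets from (user, user) pairs.
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def relationship_degree(graph, source, target, max_degree=3):
    """Shortest hop count between two users (1 = first-degree, 2 = second-degree, ...),
    or None if they are not connected within max_degree hops."""
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        user, dist = queue.popleft()
        if dist == max_degree:
            continue  # do not expand beyond the allowed degree
        for friend in graph.get(user, ()):
            if friend == target:
                return dist + 1
            if friend not in visited:
                visited.add(friend)
                queue.append((friend, dist + 1))
    return None


graph = build_graph([("A", "B"), ("B", "C"), ("C", "D")])
print(relationship_degree(graph, "A", "B"), relationship_degree(graph, "A", "C"))  # 1 2
```

BFS visits users in order of hop count, so the first time the target is reached is guaranteed to be via the shortest chain, which matches the definition of first-, second- and third-degree relationships.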
Optionally, when edge relationships are mined from the existing data of the object to be evaluated, different algorithms may be used for different types of existing data to extract the edge relationships of at least one dimension that they contain. Specifically, for operator (telecom carrier) data, the closeness of two users may be measured by the number of call records between them over a recent period; for example, two users who call each other frequently have a first-degree relationship. For device data, closeness may be measured by indicators such as whether users have recently used the same device or the same Wi-Fi network; for example, users who have used the same device or Wi-Fi network have a first-degree relationship. For address book data, closeness may be measured from the remarks attached to the saved numbers; for example, if a number is saved as "Mom", the user of that number has a first-degree relationship with the owner of the terminal to which the address book belongs. For emergency contact data, the emergency contacts a user fills in are usually family members, colleagues, relatives, or friends, so each emergency contact can be treated as having a first-degree relationship with the user who listed them.
For e-commerce address data, edge relationships between users are measured by address similarity; for example, users with the same address have a first-degree edge relationship. For referral data from operational campaigns in which existing customers invite new ones, invitations usually go to family members, colleagues, relatives, and friends, so a first-degree relationship can be set between the invited new user and the inviting existing user. For bank card transfer data, the two parties to a transfer have a first-degree edge relationship; for location-based services (LBS) address data, users at the same address have a first-degree relationship; and so on.
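Two of the per-source mining rules above, frequent calls and a shared device or Wi-Fi network, can be sketched as follows. The input shapes and the `min_calls` threshold are hypothetical, since the source only says "higher frequency of call times" without giving a number; edges are stored as `frozenset` pairs so that A-B and B-A count as the same undirected edge.

```python
from collections import Counter, defaultdict
from itertools import combinations


def edges_from_calls(call_records, min_calls=5):
    """call_records: (caller, callee) pairs from a recent window.
    Pairs that called each other at least min_calls times (in either direction)
    become first-degree edges. min_calls is a hypothetical threshold."""
    counts = Counter(frozenset((a, b)) for a, b in call_records if a != b)
    return {pair for pair, n in counts.items() if n >= min_calls}


def edges_from_shared_devices(logins):
    """logins: (user, device_or_wifi_id) pairs.
    Every pair of users seen on the same id becomes a first-degree edge."""
    users_by_id = defaultdict(set)
    for user, device_id in logins:
        users_by_id[device_id].add(user)
    edges = set()
    for users in users_by_id.values():
        for a, b in combinations(sorted(users), 2):
            edges.add(frozenset((a, b)))
    return edges
```

The other sources listed (address book remarks, emergency contacts, shipping addresses, referrals, transfers, LBS) would follow the same pattern: a source-specific rule that emits a set of undirected pairs.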
Optionally, because edge relationships extracted from different types of existing data differ in closeness, different weight values may be set for them. For example, edge relationships extracted from operator data, address book data, and emergency contact data indicate higher closeness and can be given higher weight values. By contrast, bank card transfer records have low coverage, and LBS address data have large collection errors and poor timeliness, so edge relationships extracted from these two types of data can be given lower weight values.
Optionally, the acquired existing data of the object to be evaluated may include at least one of stock data and incremental data. Incremental data generally refer to data acquired in real time, from which the weight values of first-degree edge relationships can be computed directly; for example, if a user must import operator data when applying for a loan, first-degree relationships can be determined from the call records imported in real time. Because incremental data are the most recent, they better reflect the user's current risk, so when incremental data are available they are generally preferred for extracting the user's edge relationships. In some scenarios, however, stock data (i.e., data historically stored in the system) must be used. For example, when a user applies for a new loan without importing personal data this time, but that personal data was imported previously and is present in the user's historical data, the stock data still have value and are used to extract the user's edge relationships. Because incremental data and stock data correspond to different time periods, different algorithms are needed to mine them.
Optionally, depending on the actual risk assessment policy, the mined edge relationships may need to be fused; this embodiment supports at least one of feature-layer fusion and network-layer fusion. In feature-layer fusion, each sub-network independently constructs its edge relationships into graph features, and the results are fused at the feature layer; this makes edge relationship fusion convenient, intuitive, and easy to parallelize. In network-layer fusion, the various strong and weak edge relationships are fused by weighting according to their weight values; this enriches the meaning of the edge relationships and increases edge coverage. In the embodiments of the present application, feature-layer fusion may be used when the number of edges is small, and network-layer fusion when the number of edges is large.
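Network-layer fusion can be sketched as merging the per-source edge sets into one weighted graph, where each source contributes its own weight to every edge it produced. The per-source weights below are hypothetical numbers that only mirror the ordering described earlier (operator, address book and emergency contact edges high; bank transfer and LBS edges low), and summing is one simple fusion rule among several possible ones.

```python
from collections import defaultdict

# Hypothetical per-source weights reflecting closeness.
SOURCE_WEIGHTS = {
    "operator": 1.0,
    "address_book": 0.9,
    "emergency_contact": 0.9,
    "bank_transfer": 0.3,
    "lbs": 0.2,
}


def fuse_network_layer(edges_by_source):
    """edges_by_source: {source: iterable of frozenset({user_a, user_b})}.
    Returns one fused graph {edge: weight}, summing the weights of every
    source that produced that edge."""
    fused = defaultdict(float)
    for source, edges in edges_by_source.items():
        weight = SOURCE_WEIGHTS[source]
        for edge in edges:
            fused[edge] += weight
    return dict(fused)


fused = fuse_network_layer({
    "operator": [frozenset({"a", "b"})],
    "lbs": [frozenset({"a", "b"}), frozenset({"a", "c"})],
})
```

With summation, an edge confirmed by several sources ends up stronger than one seen by a single weak source (here a-b gets about 1.2, a-c only 0.2), while weak-source edges are still kept, which is the coverage gain attributed to network-layer fusion.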
S302, determining the initial features of the object to be evaluated according to the online feature table and the edge relationships of at least one dimension.
The online feature table records the features of each user. These features can be determined from the first-degree, second-degree, or even higher-degree edge relationships among users. Once determined, the table can be deployed online so that it is available for determining the initial features of the object to be evaluated each time a risk assessment is performed.
Optionally, in this step, the associated users of the object to be evaluated are first determined according to the edge relationships of at least one dimension; the features of those associated users are then looked up in the online feature table and propagated to the object to be evaluated as its initial features. For example, if the object to be evaluated has an edge relationship of at least one dimension with user A, user A is an associated user of the object to be evaluated; if the online feature table records user A as a credit defaulter, the initial features of the object to be evaluated include "credit defaulter".
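The lookup-and-propagate step can be sketched with plain dicts standing in for the edge relationships and the online feature table. This minimal sketch assumes only directly associated users are consulted, and it also reports the share of associated users found in the table, which is the coverage notion discussed in the next paragraph; the user names and feature labels are hypothetical.

```python
def initial_features(target, neighbors, feature_table):
    """Propagate features from associated users to the object to be evaluated.

    neighbors: {user: [associated users]}, e.g. built from the mined edge relationships.
    feature_table: {user: feature}, standing in for the online feature table.
    Returns the inherited features and the share of associated users found in the table.
    """
    associated = neighbors.get(target, [])
    inherited = {u: feature_table[u] for u in associated if u in feature_table}
    coverage = len(inherited) / len(associated) if associated else 0.0
    return inherited, coverage


feats, cov = initial_features("obj", {"obj": ["A", "B", "C"]}, {"A": "credit_defaulter"})
print(feats, round(cov, 2))  # {'A': 'credit_defaulter'} 0.33
```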
Optionally, the user identifiers in the online feature table are typically the user's mobile phone number, device identifier, and the like, and the user features corresponding to each identifier are constructed with preset algorithms; for example, a large number of time-slice features are produced in batch with the RFM (recency, frequency, monetary) model from customer relationship management, and business features are constructed from an understanding of the business. The number of users in the online feature table determines the coverage and accuracy of the initial features determined for the object to be evaluated. For example, if the object to be evaluated has 100 associated users but the online feature table records features for only 3 of them, its initial features can draw only on those 3 users, and coverage and accuracy are relatively low. Therefore, to improve the accuracy of the initial features of the object to be evaluated, this embodiment expands the number of users in the online feature table. Specifically, when expanding the table, as much historical stock data as possible is used so that highly stable user features can be added; for highly time-sensitive features, coverage is appropriately sacrificed to keep the features within their validity period. Optionally, when a user has multiple records at different time points in the online feature table, the most recent record can be selected so that the feature is as timely as possible; alternatively, the features at different time points can be fused by weighting with reference to a forgetting curve, and so on.
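The two options for a user with several timestamped records, keeping the latest record or weighting records by a forgetting curve, can be sketched as below. The forgetting curve is modeled here as exponential decay with a hypothetical 30-day half-life; the source does not name a specific curve or decay rate.

```python
def latest_value(records):
    """records: (timestamp_in_days, value) pairs; keep the most recent value."""
    return max(records, key=lambda r: r[0])[1]


def forgetting_weighted_value(records, now, half_life_days=30.0):
    """Weighted average in which a record's weight halves every half_life_days,
    an exponential-decay stand-in for a forgetting curve."""
    weights = [0.5 ** ((now - t) / half_life_days) for t, _ in records]
    return sum(w * v for w, (_, v) in zip(weights, records)) / sum(weights)


records = [(0, 1.0), (30, 0.0)]
print(latest_value(records), round(forgetting_weighted_value(records, now=30), 3))  # 0.0 0.333
```

The latest-record rule maximizes timeliness but discards history entirely; the decayed average keeps older evidence at a reduced weight, which is gentler when a single recent record is noisy.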
Optionally, in this embodiment, if the initial feature of the object to be evaluated that is required for risk assessment is simple, for example an address derived from an identity card number, the existing data may be mined directly to obtain that simple initial feature, without determining edge relations or consulting the online feature table.
S303, calling a preset model algorithm to extract the existing data characteristics of the object to be evaluated from the initial characteristics.
Optionally, in this embodiment, a model algorithm library may be preset that records a plurality of model algorithms for data extraction, including but not limited to the logistic regression (LR), XGBoost, random forest (RF), and gradient boosting decision tree (GBDT) algorithms. In this step, different model algorithms can be called to extract the existing data features of the object to be evaluated from different types of initial features. Specifically, the LR algorithm can be called to extract existing data features from the text features in the initial features; the XGBoost algorithm from the relation features; the RF algorithm from the RFM features; and the GBDT algorithm from the external features, and so on.
S304, if the data missing event is detected to exist according to the existing data characteristics of the object to be evaluated, the existing data characteristics are transformed to obtain the missing data characteristics of the object to be evaluated.
S305, performing risk assessment on the object to be assessed according to the existing data characteristics and the missing data characteristics of the object to be assessed.
According to the solution of this embodiment, after the existing data of the object to be evaluated is obtained, data mining is performed on it to obtain edge relations of at least one dimension; the initial features of the object are determined from the online feature table according to these edge relations; and the preset model algorithm is then called to extract more accurate existing data features from the initial features. Subsequently, if a data missing event is detected according to the existing data features, the missing data features of the object are determined by transforming the existing data features, and the object is risk-assessed according to both the missing and the existing data features. By mining edge relations from a large amount of existing data and obtaining the existing data features through two rounds of feature extraction, this embodiment improves the diversity and accuracy of the existing data features, which supports both the subsequent determination of whether a data missing event exists and accurate risk assessment.
Optionally, on the basis of the foregoing embodiment, after the edge relations of at least one dimension contained in the existing data are obtained, the method further includes optimizing and/or verifying these edge relations. Specifically, to optimize them, second-degree or even higher-degree edge relations may be added on top of the first-degree edge relations, and precise weights may be assigned to the newly added edge relations. An edge relation may be verified in two respects, discrimination and stability: for example, its discrimination is calculated with a discrimination function and compared with a discrimination threshold, its stability is calculated with a stability function and compared with a stability threshold, and the edge relation passes verification only if both values are higher than the corresponding thresholds. Optimizing the edge relations improves their coverage and the accuracy of their weights; verifying them guarantees their accuracy, laying the foundation for accurately extracting the existing data features of the object to be evaluated later.
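The application does not name its discrimination and stability functions. As an assumption, the sketch below uses two measures common in credit-risk modeling: the information value (IV) of a binary "has this edge" indicator for discrimination, and the population stability index (PSI) of the edge's coverage rate across two periods for stability. Because a lower PSI means a more stable edge, "stability above its threshold" is expressed here as PSI at or below `psi_max`; both thresholds are hypothetical.

```python
import math
from typing import Sequence


def information_value(has_edge: Sequence[int], is_bad: Sequence[int],
                      eps: float = 1e-6) -> float:
    """Discrimination: IV of a binary edge indicator (assumes both classes occur)."""
    total_bad = sum(is_bad)
    total_good = len(is_bad) - total_bad
    iv = 0.0
    for bin_value in (0, 1):
        bad = sum(1 for e, y in zip(has_edge, is_bad) if e == bin_value and y == 1)
        good = sum(1 for e, y in zip(has_edge, is_bad) if e == bin_value and y == 0)
        bad_pct = max(bad / total_bad, eps)
        good_pct = max(good / total_good, eps)
        iv += (good_pct - bad_pct) * math.log(good_pct / bad_pct)
    return iv


def population_stability_index(rate_before: float, rate_after: float,
                               eps: float = 1e-6) -> float:
    """Stability: PSI of the edge's coverage rate across two periods (lower = more stable)."""
    psi = 0.0
    for before, after in ((rate_before, rate_after), (1 - rate_before, 1 - rate_after)):
        before, after = max(before, eps), max(after, eps)
        psi += (after - before) * math.log(after / before)
    return psi


def verify_edge(has_edge: Sequence[int], is_bad: Sequence[int],
                rate_before: float, rate_after: float,
                iv_min: float = 0.02, psi_max: float = 0.1) -> bool:
    """Pass only if the edge is discriminative enough AND stable enough."""
    return (information_value(has_edge, is_bad) >= iv_min
            and population_stability_index(rate_before, rate_after) <= psi_max)


print(verify_edge([1, 1, 0, 0], [1, 1, 0, 0], 0.50, 0.52))  # True
print(verify_edge([1, 0, 1, 0], [1, 1, 0, 0], 0.50, 0.52))  # False (IV = 0)
```

In the second call the edge occurs equally often among good and bad samples, so it carries no discriminating information and fails verification even though it is stable.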
Optionally, on the basis of the above embodiment, when the existing data features of the object to be evaluated are extracted in the embodiment of the present application, the called preset model algorithm includes a GBDT algorithm; and the target loss function during the GBDT algorithm training is determined through a second-order Taylor expansion result of an original loss function of the GBDT algorithm.
Specifically, this embodiment optimizes the conventional GBDT algorithm as follows. The original loss function of the GBDT model is first approximated linearly by a first-order Taylor expansion. The linear term is then replaced by a polynomial, that is, the Taylor expansion is carried to second order, which yields a descent method of higher precision. Minimizing the resulting expression gives

$$\tilde{\mathcal{L}}^{(t)}(q) = -\frac{1}{2}\sum_{j=1}^{T}\frac{\left(\sum_{i \in I_j} g_i\right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma T$$

which is used as the target loss function when training the GBDT algorithm. Here $\tilde{\mathcal{L}}^{(t)}(q)$ is the target loss function; $T$ is the number of leaf nodes; $I_j$ is the data set assigned to leaf $j$, and $x_i$ is the $i$-th data item in $I_j$; $g_i$ and $h_i$ are the first- and second-order coefficients extracted from the second-order Taylor expansion; $\lambda$ is a regularization parameter; and $\gamma$ is the complexity penalty per leaf. When searching for the optimal split point, the optimized GBDT algorithm first lists candidate split points by percentiles and then selects the best one among them by evaluating the target loss function. In addition, a conventional GBM stops splitting a node as soon as a split yields a negative gain (loss reduction). The optimized algorithm instead splits up to the specified maximum depth (max_depth) and then prunes the tree backward, removing a split only if no positive gain follows it. The advantage is that when a negative gain (e.g., -2) is followed by a positive gain (e.g., +10), a conventional GBM stops at -2, whereas the optimized algorithm continues splitting, finds that the two splits together yield +8, and keeps both. Because only a small number of candidate split points need to be analyzed, the optimized GBDT algorithm used in this embodiment improves computational efficiency. It also accounts for sparse training data by assigning a default branch direction to missing or specified values, which further improves efficiency. Finally, it borrows column sampling from random forests, which both reduces overfitting and reduces computation.
The optimized GBDT algorithm can sort each feature column in advance and store the sorted columns in memory in block form, so that they can be reused across iterations. Although boosting iterations must run serially, the processing of individual feature columns within an iteration can be parallelized, and the optimal split point is searched column by column. When the data volume is large and memory is insufficient, the algorithm can also combine multithreading, data compression, and sharding to keep it as efficient as possible.
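The following sketch evaluates the second-order target loss described above for a set of leaves, computes the loss reduction (gain) of one candidate split, and searches only percentile candidates. It assumes a single feature with precomputed first- and second-order values g_i and h_i, and includes the per-leaf penalty as an optional `gamma` (0 by default). It illustrates the technique only; it is not the application's implementation and omits backward pruning, sparsity handling, and column blocks.

```python
from typing import List, Sequence, Tuple

Leaf = Tuple[Sequence[float], Sequence[float]]  # (g_i values, h_i values) of one leaf


def target_loss(leaves: List[Leaf], lam: float, gamma: float = 0.0) -> float:
    """-1/2 * sum_j (sum g)^2 / (sum h + lambda) + gamma * T over the T leaves."""
    score = 0.0
    for g, h in leaves:
        score -= 0.5 * sum(g) ** 2 / (sum(h) + lam)
    return score + gamma * len(leaves)


def split_gain(x: Sequence[float], g: Sequence[float], h: Sequence[float],
               threshold: float, lam: float, gamma: float = 0.0) -> float:
    """Drop in target loss when one leaf is split into x < threshold / x >= threshold."""
    left = [i for i, v in enumerate(x) if v < threshold]
    right = [i for i, v in enumerate(x) if v >= threshold]
    before = target_loss([(g, h)], lam, gamma)
    after = target_loss([([g[i] for i in left], [h[i] for i in left]),
                         ([g[i] for i in right], [h[i] for i in right])], lam, gamma)
    return before - after


def percentile_candidates(x: Sequence[float], n_quantiles: int = 4) -> List[float]:
    """Candidate split points at the interior quantiles of x."""
    xs = sorted(x)
    return sorted({xs[len(xs) * q // n_quantiles] for q in range(1, n_quantiles)})


def best_split(x, g, h, lam=1.0, gamma=0.0, n_quantiles=4) -> Tuple[float, float]:
    """Evaluate only the percentile candidates; return (gain, threshold)."""
    return max((split_gain(x, g, h, c, lam, gamma), c)
               for c in percentile_candidates(x, n_quantiles))


# Squared loss at a constant prediction of 0.5: g_i = 0.5 - y_i, h_i = 1.
x = [1.0, 2.0, 3.0, 4.0]
y = [0.0, 0.0, 1.0, 1.0]
g = [0.5 - yi for yi in y]
h = [1.0] * len(y)
print(best_split(x, g, h))  # best threshold is 3.0 with gain 1/3
```

A split is worth keeping only when its gain is positive; a nonzero `gamma` lowers every gain by the same amount, which is what makes deep, low-gain branches candidates for pruning.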
FIG. 4 is a flow chart of another risk assessment method according to an embodiment of the present application. This embodiment further optimizes the above embodiment and describes in detail how the object to be evaluated is risk-assessed according to its existing data features and missing data features. As shown in fig. 4, the method specifically includes:
S401, if a data missing event is detected according to the existing data features of the object to be evaluated, transforming the existing data features to obtain the missing data features of the object to be evaluated.
The existing data characteristics are obtained by processing existing data of the object to be evaluated.
S402, determining the existing score value and the missing score value of the object to be evaluated according to the existing data characteristics and the missing data characteristics of the object to be evaluated through a machine learning model.
Optionally, in this embodiment, the existing score value is the score determined for the existing data features, and the missing score value is the score determined for the missing data features. A pre-trained feature scoring network of the machine learning model may predict the existing score value from the existing data features of the object to be evaluated, and the missing score value from its missing data features.
Optionally, in this embodiment of the application, for different types of data features, different scoring sub-networks may be used to predict the score value, for example, for a text type data feature, a text scoring sub-network is used to predict the score value; for the relation data characteristics, predicting the scoring value by adopting a relation scoring sub-network; for the RFM data characteristics, predicting the scoring value by adopting an RFM scoring sub-network; for the external data characteristics, an external scoring sub-network is adopted for predicting scoring values and the like. The scoring subnetworks may be combined according to the type of actual existing data features and missing data features to determine the feature scoring networks needed to ultimately predict the existing scores and missing scores of this step.
Optionally, if the missing data features were obtained by WOE transformation and fusion, then in this step, when the missing score value is determined from the missing data features through the feature scoring network of the machine learning model, the machine learning model (for example, its feature scoring network) may call a logistic regression sigmoid function to process the missing data features of the object to be evaluated and obtain its missing score value. This improves the accuracy of the missing score value.
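A minimal sketch of this scoring step, assuming the fused WOE features enter a standard logistic regression. The coefficients and intercept are hypothetical fitted parameters, not values from the application.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function 1 / (1 + e^-z)."""
    return 1.0 / (1.0 + math.exp(-z))


def missing_score(fused_woe: Sequence[float], coefs: Sequence[float],
                  intercept: float) -> float:
    """Logistic-regression score for WOE-fused missing data features.

    coefs and intercept are hypothetical fitted parameters.
    """
    z = intercept + sum(c * v for c, v in zip(coefs, fused_woe))
    return sigmoid(z)


print(missing_score([0.8, -0.3], coefs=[1.2, 0.5], intercept=-0.2))
```

Because WOE values already live on a log-odds scale, a linear combination followed by the sigmoid is a natural pairing: each coefficient rescales a feature's contribution to the log-odds before it is mapped to a value in (0, 1).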
And S403, performing risk assessment on the object to be assessed according to the existing score value and the missing score value of the object to be assessed through the scoring card model.
Optionally, in this embodiment, the existing score value of the object to be evaluated and the missing score value determined in S402 may be input into a pre-trained scoring card model, which analyzes them using the algorithm learned during training. Because the score values of the missing data have now been recovered, the object to be evaluated can be assessed accurately, yielding the final risk assessment result.
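The application does not state how the scoring card model turns score values into a decision. A common convention, assumed here, is points-to-double-odds (PDO) scaling: each score value is treated as a probability of being bad and mapped to points, with `base_score` points at `base_odds` and every `pdo` extra points doubling the good:bad odds. The averaging in `scorecard_decision` and the cutoff are hypothetical.

```python
import math
from typing import Dict


def pdo_points(p_bad: float, base_score: float = 600.0, base_odds: float = 50.0,
               pdo: float = 20.0) -> float:
    """Map a bad probability to points: offset + factor * ln(good:bad odds)."""
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    return offset + factor * math.log((1 - p_bad) / p_bad)


def scorecard_decision(existing: Dict[str, float], missing: Dict[str, float],
                       cutoff: float) -> str:
    """Hypothetical aggregation: average the points of all existing and missing scores."""
    p_values = list(existing.values()) + list(missing.values())
    points = sum(pdo_points(p) for p in p_values) / len(p_values)
    return "pass" if points >= cutoff else "reject"


print(round(pdo_points(1 / 51)))   # 600 (odds 50:1)
print(round(pdo_points(1 / 101)))  # 620 (odds 100:1, doubled)
print(scorecard_decision({"text": 1 / 51}, {"social": 1 / 101}, cutoff=605))  # pass
```

The scaling is what gives the scorecard its interpretability: every point difference corresponds to a fixed change in odds, so the contribution of each data source, including the recovered missing ones, can be read off directly.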
According to the solution of this embodiment, if a data missing event is detected in the risk assessment according to the existing data features of the object to be evaluated, the existing data features are transformed to determine the missing data features; the machine learning model determines the missing score value and the existing score value from the missing and existing data features; and the scoring card model analyzes the two score values to complete the risk assessment of the object. Combining the machine learning model with the scoring card model improves the accuracy of risk assessment and optimizes the risk assessment scheme.
FIG. 5 is a flow chart of another risk assessment method according to an embodiment of the present application. This embodiment further optimizes the above embodiments and describes how to determine whether a data missing event exists. As shown in fig. 5, the method specifically includes:
S501, determining the existing rating value of the object to be evaluated according to its existing data features through a machine learning model, and inputting the existing rating value into the scoring card model.
Optionally, in this embodiment, when the existing data of the object to be evaluated is obtained, it is not yet known whether that data is complete. The existing data features are therefore determined from the obtained existing data, and a score value for these features, that is, the existing score value of the object to be evaluated, is determined with the machine learning model introduced in the above embodiment. Other approaches may also be used to determine the existing score value, for example scoring the data features according to preset scoring rules. Finally, the existing score value is input into the scoring card model. Because the scoring card model is highly interpretable, it can analyze the input existing score value and determine whether it is too high, too low, or missing: if it is too high, the output can be interpreted as the risk assessment being passed; if it is too low, as the risk assessment being failed; and if the score value is missing, as data missing.
And S502, if the output result of the scoring card model is data missing, detecting that a data missing event exists.
Optionally, if the existing score value is input into the scoring card model in S501 and the model outputs "data missing", a data missing event is detected in this embodiment, and the risk assessment is performed after the missing data features are recovered through the operation of S503 below. If the model's output in S501 is not "data missing", that output is used as the risk assessment result: the risk assessment of the object to be evaluated ends, and the subsequent steps are not executed.
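The control flow of S501 to S504 can be sketched as below. The sketch assumes, hypothetically, that a data source without a score is represented by `None`, that the scoring card model passes an object whose summed score values reach a cutoff, and that `recover_missing` stands in for the transformation that recovers the missing score values.

```python
from typing import Callable, Dict, Optional

DATA_MISSING = "data_missing"
Scores = Dict[str, Optional[float]]  # None marks a data source that produced no score


def scorecard_output(scores: Scores, pass_cutoff: float) -> str:
    """Interpret per-source score values: missing, pass (high enough) or reject."""
    if any(v is None for v in scores.values()):
        return DATA_MISSING
    total = sum(v for v in scores.values() if v is not None)
    return "pass" if total >= pass_cutoff else "reject"


def assess(existing: Scores, pass_cutoff: float,
           recover_missing: Callable[[Scores], Dict[str, float]]) -> str:
    result = scorecard_output(existing, pass_cutoff)           # S501
    if result != DATA_MISSING:                                 # S502: no data missing event
        return result                                          # first output is final
    filled: Scores = {**existing, **recover_missing(existing)}  # S503: recover missing scores
    return scorecard_output(filled, pass_cutoff)               # S504: re-assess


print(assess({"text": 300.0, "social": 320.0}, 600.0, lambda s: {}))             # pass
print(assess({"text": 300.0, "social": None}, 600.0, lambda s: {"social": 250.0}))  # reject
```

The recovery callable is only invoked when the first pass reports missing data, mirroring the early exit described for S502.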
And S503, if the data missing event is detected to exist according to the existing data characteristics of the object to be evaluated, transforming the existing data characteristics to obtain the missing data characteristics of the object to be evaluated.
And the existing data characteristics are obtained by processing the existing data of the object to be evaluated.
And S504, performing risk assessment on the object to be assessed according to the existing data characteristics and the missing data characteristics of the object to be assessed.
According to the technical solution of this embodiment, the machine learning model determines the existing score value from the existing data features of the object to be evaluated and inputs it into the scoring card model. If the scoring card model outputs "data missing", a data missing event is detected in the current risk assessment; the existing data features are then transformed to determine the missing data features of the object, and the object is risk-assessed according to both the missing and the existing data features. By exploiting the interpretability of the scoring card model and combining it with the machine learning model to judge whether a data missing event exists, this embodiment improves the accuracy of that judgment and of the overall risk assessment result, and optimizes the risk assessment scheme.
FIG. 6A is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application; FIG. 6B is an architecture diagram of a risk assessment system provided in accordance with an embodiment of the present application; on the basis of the above embodiments, the present embodiment further optimizes and gives a description of a preferred example of risk assessment. As shown in fig. 6A-6B, the method specifically includes:
S601, performing data mining on the existing data of the object to be evaluated to obtain the edge relation of at least one dimension contained in the existing data.
Optionally, as shown in fig. 6B, in this step text data may be acquired from short-message and address data sources; social data from operator and address-book data sources; consumption data from credit card bill and e-commerce data sources; external data from third-party credit reporting and blacklist database sources; and other data from device and terminal behavior data sources. The acquired text, social, consumption, external, and other data together serve as the existing data of the object to be evaluated. The existing data is then input into the data mining layer of the feature extraction network of the machine learning model, which applies a different mining algorithm to each kind of data to obtain the edge relations of at least one dimension contained in it: a natural language processing (NLP) algorithm for the text data, a graph algorithm for the social data, an RFM aggregation algorithm for the consumption data, a deep learning algorithm for the external data, and other algorithms for the other data.
S602, optimizing and verifying the edge relation of at least one dimension.
Optionally, the edge relations of all dimensions obtained in S601 are optimized and verified to improve the coverage rate and accuracy of the edge relations.
S603, determining the initial characteristics of the object to be evaluated according to the on-line characteristic table and the edge relation of at least one dimension.
Optionally, as shown in fig. 6B, in this step the edge relations optimized and verified in S602 may be input into the first feature extraction layer of the feature extraction network of the machine learning model, which extracts the initial features of the object to be evaluated with different algorithms according to the type of edge relation: a text feature extraction algorithm determines text features from the online feature table and the text-type edge relations; a relation feature extraction algorithm determines relation features from the online feature table and the relation-type edge relations; an RFM feature extraction algorithm determines consumption features from the online feature table and the consumption-type edge relations; an external feature extraction algorithm determines external features from the online feature table and the external-type edge relations; and other feature extraction algorithms determine other features from the online feature table and the other types of edge relations. Each of these feature sets forms part of the initial features of the object to be evaluated.
S604, calling a preset model algorithm to extract the existing data features of the object to be evaluated from the initial features.
Optionally, as shown in fig. 6B, after extracting the initial features of the object to be evaluated, the first feature extraction layer inputs them into the second feature extraction layer of the feature extraction network, which performs more abstract feature extraction on top of the first layer so that the extracted features are more accurate. Specifically, the LR algorithm is called to extract refined text features from the text features in the initial features; the XGBoost algorithm to extract refined relation features from the relation features; the RF algorithm to extract refined consumption features from the consumption features; the GBDT algorithm (that is, the optimized GBDT algorithm described in the above embodiment) to extract refined external features from the external features; and other algorithms to extract refined other features from the other features. Each set of refined features forms part of the existing data features of the object to be evaluated.
S605, determining the existing rating value of the object to be evaluated according to the existing data characteristics of the object to be evaluated through the machine learning model, and inputting the existing rating value into the rating card model.
Optionally, as shown in fig. 6B, the existing data features of the object to be evaluated extracted by the second feature extraction layer are input into the feature scoring network of the machine learning model. The feature scoring network consists of several scoring sub-networks, each scoring one type of existing data feature: a text scoring sub-network scores the refined text features, a relation scoring sub-network scores the refined relation features, an RFM scoring sub-network scores the refined consumption features, an external scoring sub-network scores the refined external features, and other scoring sub-networks score the refined other features. The scoring results output by each scoring sub-network are then input into the scoring card model.
And S606, judging whether the output result of the scoring card model is data missing or not, if so, executing S607, and otherwise, executing S613.
Optionally, the scoring card model may analyze each input existing score value and determine whether data is missing. If so, its output is "data missing", and the operation of S607 is performed: a data missing event is detected. If no data is missing, its output is the risk assessment result itself, and the operation of S613 is performed to obtain the final assessment result of the object to be evaluated.
And S607, if the output result of the scoring card model is data missing, detecting that a data missing event exists.
S608, performing weight of evidence (WOE) transformation on the existing data features to obtain transformation features.
And S609, determining the weight value of the transformation characteristic.
S610, determining the missing data characteristics of the object to be evaluated according to the transformation characteristics and the weight values of the transformation characteristics.
S611, calling a logistic regression sigmoid function through the machine learning model to process the missing data characteristics of the object to be evaluated, and obtaining the missing score value of the object to be evaluated.
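A minimal sketch of S608 to S610, assuming binned features and binary good/bad labels. The WOE definition is the standard one; the additive smoothing `eps` and the normalized weighted sum used to fuse the transformed features are assumptions, since the application does not specify how the weight values are determined or combined.

```python
import math
from collections import Counter
from typing import Dict, Sequence


def woe_table(bins: Sequence[str], is_bad: Sequence[int],
              eps: float = 0.5) -> Dict[str, float]:
    """S608: WOE_b = ln(share of goods in bin b / share of bads in bin b).

    eps is additive smoothing so that bins with no goods or no bads stay finite.
    """
    good = Counter(b for b, y in zip(bins, is_bad) if y == 0)
    bad = Counter(b for b, y in zip(bins, is_bad) if y == 1)
    n_bins = len(set(bins))
    total_good, total_bad = sum(good.values()), sum(bad.values())
    return {
        b: math.log(((good[b] + eps) / (total_good + eps * n_bins))
                    / ((bad[b] + eps) / (total_bad + eps * n_bins)))
        for b in set(bins)
    }


def fuse(woe_values: Sequence[float], weights: Sequence[float]) -> float:
    """S609-S610: normalized weighted sum of the transformed features."""
    return sum(w * v for w, v in zip(weights, woe_values)) / sum(weights)


table = woe_table(["low", "low", "high", "high"], [0, 0, 1, 1])
print({b: round(v, 3) for b, v in sorted(table.items())})  # {'high': -1.609, 'low': 1.609}
print(fuse([table["low"], 0.2], weights=[0.7, 0.3]))
```

The fused value can then be passed to the sigmoid of S611 to obtain the missing score value.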
Optionally, the operations of S608-S611 in this embodiment may be executed by a feature extraction network and a feature scoring network of the machine learning model in fig. 6B, or may be executed by adding a network to the machine learning model, which is not limited in this embodiment.
And S612, performing risk assessment on the object to be assessed according to the existing score value and the missing score value of the object to be assessed through the scoring card model.
Optionally, as shown in fig. 6B, after obtaining the missing score of the object to be evaluated, the missing score is also input into the score card model, and the score card model performs risk assessment on the object to be evaluated again based on the existing score determined and input in S605 and the missing score determined and input in S611, so as to obtain a final risk assessment result.
S613, obtaining the final evaluation result of the object to be evaluated.
In this embodiment of the present application, the existing data of the object to be evaluated includes: at least one of text data, social data, consumption data, and external data; and performing an operation of determining the characteristics of the existing data for each kind of existing data through the machine learning model.
According to the technical solution of this embodiment, the scoring card model is used for the final risk assessment. Because the scoring card model is interpretable, this embodiment retains interpretability at the granularity of the data source and can locate the cause when reporting missing data. Using the machine learning model to determine the existing data features for each kind of existing data lowers the barrier of hand-crafting features and improves the accuracy of the existing data features. Using the WOE transformation to handle missing data improves the accuracy of the missing data features. For data from different data sources, data mining, feature extraction, feature evaluation, and related operations are independent and can be executed in parallel, which improves processing efficiency; at the same time, features from every data source dimension are input into the main scoring card model, ensuring the diversity of its input features. In addition, each network layer of the machine learning model in this embodiment is flexible: layers can be combined according to the data sources to be processed, and suitable algorithms can be called for processing, giving strong extensibility and flexibility.
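Because each data source's pipeline is independent, the per-source work can be dispatched concurrently. The sketch below uses Python's standard `concurrent.futures` module; the per-source pipelines are hypothetical callables that stand in for mining, feature extraction, and scoring and return a single score.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical per-source pipeline: mining -> feature extraction -> scoring, returns a score.
Pipeline = Callable[[List[str]], float]


def score_sources(data: Dict[str, List[str]],
                  pipelines: Dict[str, Pipeline]) -> Dict[str, float]:
    """Run each data source's independent pipeline concurrently."""
    with ThreadPoolExecutor() as pool:
        futures = {src: pool.submit(pipelines[src], records)
                   for src, records in data.items()}
        return {src: fut.result() for src, fut in futures.items()}


pipelines = {"text": lambda recs: float(len(recs)), "social": lambda recs: 10.0}
print(score_sources({"text": ["sms1", "sms2"], "social": ["contact"]}, pipelines))
# {'text': 2.0, 'social': 10.0}
```

Threads suit pipelines dominated by I/O (for example, fetching from external credit sources); for CPU-bound Python pipelines, `ProcessPoolExecutor` would avoid the global interpreter lock at the cost of pickling inputs and callables.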
FIG. 7 is a schematic structural diagram of a risk assessment device according to an embodiment of the present application. This embodiment is applicable to risk assessment of an object to be evaluated, for example assessing the risk of a loan applicant from the applicant's existing data during loan approval. The device can implement the risk assessment method of any embodiment of the present application and may be integrated in an electronic device. The device 700 includes:
a missing feature determining module 701, configured to, if a data missing event is detected to exist according to existing data features of an object to be evaluated, transform the existing data features to obtain missing data features of the object to be evaluated; the existing data characteristics are obtained by processing the existing data of the object to be evaluated;
and the risk assessment module 702 is configured to perform risk assessment on the object to be assessed according to the existing data feature and the missing data feature of the object to be assessed.
According to the technical solution of this embodiment, if a data missing event is detected in the risk assessment according to the existing data features of the object to be evaluated, the missing data features are determined by transforming the existing data features, and the object is then risk-assessed according to both the missing and the existing data features. When a data missing event occurs during risk assessment, the missing data is not simply ignored; instead, the existing data features are transformed to accurately reconstruct the features of the missing data before risk assessment. This addresses the inaccurate risk assessment results that arise when the existing data of the object to be evaluated is incomplete, improves the accuracy of risk assessment, and optimizes the risk assessment scheme.
Optionally, the missing feature determining module 701 includes:
a feature transformation unit, configured to apply a weight of evidence (WOE) transformation to the existing data features to obtain transformed features;
a weight determination unit, configured to determine a weight value for each transformed feature; and
a missing feature determining unit, configured to determine the missing data features of the object to be evaluated from the transformed features and their weight values.
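The three units above can be illustrated with a minimal Python sketch. The WOE part follows the common credit-scoring definition (log ratio of the good share to the bad share in each bin, with additive smoothing). The patent does not say how the weight values are chosen or how they are combined, so this sketch assumes, hypothetically, that each feature's weight is its information value (IV) normalised over all features and that the missing feature is the weighted sum of the object's WOE values. The function names `woe_and_iv` and `estimate_missing_feature` are illustrative only.

```python
import math
from typing import Dict, Sequence, Tuple


def woe_and_iv(bins: Sequence[str], labels: Sequence[int], eps: float = 0.5) -> Tuple[Dict[str, float], float]:
    """Weight of evidence per bin and the feature's information value (IV).

    labels: 1 = bad (e.g. default), 0 = good.
    WOE(bin) = ln(good share in bin / bad share in bin); eps is additive
    smoothing so a bin with no goods or no bads does not produce log(0).
    """
    categories = sorted(set(bins))
    total_good = sum(1 for y in labels if y == 0)
    total_bad = sum(1 for y in labels if y == 1)
    k = len(categories)
    woe: Dict[str, float] = {}
    iv = 0.0
    for b in categories:
        good = sum(1 for x, y in zip(bins, labels) if x == b and y == 0)
        bad = sum(1 for x, y in zip(bins, labels) if x == b and y == 1)
        good_share = (good + eps) / (total_good + eps * k)
        bad_share = (bad + eps) / (total_bad + eps * k)
        woe[b] = math.log(good_share / bad_share)
        iv += (good_share - bad_share) * woe[b]
    return woe, iv


def estimate_missing_feature(woe_values: Sequence[float], ivs: Sequence[float]) -> float:
    """Hypothetical combination: IV-normalised weighted sum of the object's WOE values."""
    total = sum(ivs)
    if total == 0:
        return 0.0
    return sum(iv / total * v for iv, v in zip(ivs, woe_values))
```

For example, `woe_and_iv(["low", "low", "high", "high"], [0, 0, 1, 1])` gives a positive WOE for `"low"` and a negative one for `"high"`. IV weighting is only one plausible choice; regression coefficients or learned weights would fit the same unit structure.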
Optionally, the apparatus further comprises:
an edge relation mining module, configured to perform data mining on the existing data of the object to be evaluated to obtain edge relations in at least one dimension contained in the existing data;
an initial feature determining module, configured to determine initial features of the object to be evaluated from an online feature table and the edge relations in the at least one dimension; and
an existing feature determining module, configured to call a preset model algorithm to extract the existing data features of the object to be evaluated from the initial features.
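The patent does not define how edge relations are mined or how the online feature table is joined. As a hypothetical sketch, two objects can be linked when they share a value in one dimension (for instance, a device ID), and initial features can then be derived from the relation graph plus a lookup in a feature table, represented here as a plain dictionary. The names `mine_edges`, `initial_features`, `device`, and `overdue_rate` are all illustrative assumptions.

```python
from collections import defaultdict
from itertools import combinations
from typing import Dict, List, Set, Tuple

Edge = Tuple[str, str]


def mine_edges(records: List[Dict[str, str]], dimension: str) -> Set[Edge]:
    """Link two objects when they share a non-empty value in `dimension`."""
    groups: Dict[str, Set[str]] = defaultdict(set)
    for record in records:
        value = record.get(dimension)
        if value:
            groups[value].add(record["object_id"])
    edges: Set[Edge] = set()
    for members in groups.values():
        # store each undirected edge once, with endpoints in sorted order
        edges.update(combinations(sorted(members), 2))
    return edges


def initial_features(obj: str, edges: Set[Edge],
                     feature_table: Dict[str, Dict[str, float]], column: str) -> Dict[str, float]:
    """Degree in the relation graph plus the mean of `column` over neighbours found in the table."""
    neighbours = [b for a, b in edges if a == obj] + [a for a, b in edges if b == obj]
    values = [feature_table[n][column] for n in neighbours
              if n in feature_table and column in feature_table[n]]
    return {
        "degree": float(len(neighbours)),
        f"neighbour_mean_{column}": sum(values) / len(values) if values else 0.0,
    }
```

The resulting dictionary stands in for the initial features that a preset model algorithm (GBDT in the next paragraph) would then consume.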
Optionally, the apparatus further comprises:
an optimization and verification module, configured to optimize and/or verify the edge relations in the at least one dimension.
Optionally, the preset model algorithm includes a gradient boosting decision tree (GBDT) algorithm, and the target loss function used when training the GBDT is obtained from the second-order Taylor expansion of the GBDT's original loss function.
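To make the second-order Taylor step concrete, here is a minimal sketch in the style popularised by XGBoost. It assumes binary log loss and an L2 penalty `lam` on leaf weights, neither of which the patent specifies. Expanding the loss around the current raw score gives a quadratic in each leaf weight, whose minimiser and split gain have closed forms.

```python
import math
from typing import Sequence, Tuple

# Second-order Taylor expansion of the loss around the current raw score F:
#   l(y, F + f) ~ l(y, F) + g*f + 0.5*h*f^2,  g = dl/dF, h = d2l/dF2
# With an L2 penalty 0.5*lam*w^2 on each leaf weight w, the target loss for
# one leaf is G*w + 0.5*(H + lam)*w^2 (constants dropped), minimised by
# w* = -G / (H + lam), where G and H sum g and h over the leaf's samples.


def grad_hess_logloss(y: int, raw: float) -> Tuple[float, float]:
    """Gradient and Hessian of binary log loss w.r.t. the raw (logit) score."""
    p = 1.0 / (1.0 + math.exp(-raw))
    return p - y, p * (1.0 - p)


def leaf_weight(gs: Sequence[float], hs: Sequence[float], lam: float = 1.0) -> float:
    """Optimal leaf weight under the second-order target loss."""
    return -sum(gs) / (sum(hs) + lam)


def split_gain(g_left: float, h_left: float, g_right: float, h_right: float,
               lam: float = 1.0, gamma: float = 0.0) -> float:
    """Reduction in the target loss from splitting a node into left/right children."""
    def score(g: float, h: float) -> float:
        return g * g / (h + lam)
    parent = score(g_left + g_right, h_left + h_right)
    return 0.5 * (score(g_left, h_left) + score(g_right, h_right) - parent) - gamma
```

The practical benefit of the second-order form is that any twice-differentiable loss can be plugged in by supplying only `g` and `h`; the tree-building code stays the same.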
Optionally, the risk assessment module 702 includes:
a score value determining unit, configured to determine, through a machine learning model, an existing score value and a missing score value of the object to be evaluated from its existing data features and missing data features, respectively; and
a risk evaluation unit, configured to perform risk evaluation on the object to be evaluated from its existing score value and missing score value through a scorecard model.
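How the scorecard model combines the existing and missing score values is not specified. A common scorecard convention, assumed here, is to form log-odds from a logistic combination of inputs and scale them so that a chosen base odds maps to a base score and every doubling of the odds adds a fixed number of points (PDO). The coefficients, `base_score=600`, `base_odds=50`, and `pdo=20` below are hypothetical placeholders, not values from the patent.

```python
import math
from typing import Dict, Tuple


def scorecard(sub_scores: Dict[str, float], coefs: Dict[str, float], intercept: float,
              base_score: float = 600.0, base_odds: float = 50.0,
              pdo: float = 20.0) -> Tuple[float, Dict[str, float]]:
    """Scale a logistic combination of sub-scores into scorecard points.

    log_odds_good = intercept + sum(coefs[k] * sub_scores[k]);
    points = offset + factor * log_odds_good, with factor = pdo / ln 2 and
    offset = base_score - factor * ln(base_odds). Returns the total and the
    point contribution of each sub-score.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    contributions = {k: factor * coefs[k] * sub_scores[k] for k in coefs}
    total = offset + factor * intercept + sum(contributions.values())
    return total, contributions
```

Returning per-input contributions is what makes the scorecard interpretable: each sub-score (for example, one per data source) maps to a readable number of points.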
Optionally, when the score value determining unit determines the missing score value of the object to be evaluated according to the missing data feature of the object to be evaluated through a machine learning model, the score value determining unit is specifically configured to:
call, through the machine learning model, a logistic regression (sigmoid) function to process the missing data features of the object to be evaluated and obtain the missing score value of the object to be evaluated.
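The sigmoid step can be sketched as follows, assuming (hypothetically) that the model computes a weighted sum of the missing data features plus a bias before applying the sigmoid. The weights and bias would come from training and are not given in the patent. The `sigmoid` helper branches on the sign of its input so that very large or very small inputs do not overflow `math.exp`.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def missing_score(features: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> float:
    """Hypothetical missing score: sigmoid of a weighted sum of missing data features."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return sigmoid(z)
```

The output lies in (0, 1), so it can be fed to the scorecard alongside the existing score value.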
Optionally, the risk assessment module 702 further includes:
the score value determining unit, further configured to determine, through a machine learning model, the existing score value of the object to be evaluated from its existing data features and input the existing score value into the scorecard model; and
a missing event detection module, configured to detect that a data missing event exists if the output of the scorecard model is "data missing".
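The patent only states that a data missing event is detected when the scorecard model outputs "data missing". A hypothetical sketch of that check: the scorecard receives one existing score per data source, reports `"data missing"` if any required source has no score, and also returns which sources are absent, so the cause can be fed back. The sentinel string and the function names are assumptions.

```python
from typing import Dict, List, Optional, Tuple

DATA_MISSING = "data missing"  # hypothetical sentinel for the scorecard output


def scorecard_status(existing_scores: Dict[str, Optional[float]],
                     required_sources: List[str]) -> Tuple[str, List[str]]:
    """Report 'data missing' and list which required sources lack an existing score."""
    missing = [s for s in required_sources if existing_scores.get(s) is None]
    return (DATA_MISSING if missing else "ok"), missing


def detect_missing_event(existing_scores: Dict[str, Optional[float]],
                         required_sources: List[str]) -> Tuple[bool, List[str]]:
    """A data missing event exists exactly when the scorecard reports 'data missing'."""
    status, missing = scorecard_status(existing_scores, required_sources)
    return status == DATA_MISSING, missing
```

For instance, `detect_missing_event({"text": 0.7, "social": None}, ["text", "social", "consumption"])` flags an event and names `social` and `consumption` as the sources to reconstruct.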
Optionally, the existing data of the object to be evaluated includes at least one of text data, social data, consumption data, and external data, and the machine learning model determines the existing data features separately for each kind of existing data.
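Because feature determination is performed separately for each kind of existing data, the per-source work is independent and can run concurrently. A minimal sketch using the standard-library `concurrent.futures.ThreadPoolExecutor`; the two extractors are hypothetical stand-ins for the per-source machine learning models.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List, Sequence

Extractor = Callable[[Sequence], List[float]]


def text_features(rows: Sequence[str]) -> List[float]:
    # hypothetical: number of texts and mean text length
    return [float(len(rows)), sum(len(r) for r in rows) / len(rows) if rows else 0.0]


def consumption_features(rows: Sequence[float]) -> List[float]:
    # hypothetical: total spend and largest single spend
    return [float(sum(rows)), float(max(rows)) if rows else 0.0]


EXTRACTORS: Dict[str, Extractor] = {"text": text_features, "consumption": consumption_features}


def extract_all(data: Dict[str, Sequence],
                extractors: Dict[str, Extractor] = EXTRACTORS) -> Dict[str, List[float]]:
    """Run each source's extractor independently; sources without an extractor are skipped."""
    with ThreadPoolExecutor() as pool:
        futures = {src: pool.submit(extractors[src], rows)
                   for src, rows in data.items() if src in extractors}
        return {src: fut.result() for src, fut in futures.items()}
```

Threads suit I/O-bound extractors (for example, remote feature lookups). For CPU-heavy pure-Python models, `ProcessPoolExecutor` would be the more likely choice because of the GIL.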
According to an embodiment of the present application, an electronic device and a readable storage medium are also provided.
Fig. 8 is a block diagram of an electronic device for performing the risk assessment method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular phones, smart phones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions are meant to be examples only and are not meant to limit the implementations of the present application described and/or claimed herein.
As shown in Fig. 8, the electronic device includes one or more processors 801, a memory 802, and interfaces for connecting the components, including a high-speed interface and a low-speed interface. The components are interconnected by different buses and may be mounted on a common motherboard or in other manners as required. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output apparatus (such as a display device coupled to an interface). In other embodiments, multiple processors and/or multiple buses may be used together with multiple memories, as required. Likewise, multiple electronic devices may be connected, each providing part of the necessary operations (e.g., as a server array, a group of blade servers, or a multi-processor system). Fig. 8 takes one processor 801 as an example.
The memory 802 is the non-transitory computer-readable storage medium provided herein. The memory stores instructions executable by at least one processor to cause the at least one processor to perform the risk assessment method provided herein. The non-transitory computer-readable storage medium of the present application stores computer instructions for causing a computer to perform the risk assessment method provided herein.
Memory 802, as a non-transitory computer-readable storage medium, may be used to store non-transitory software programs, non-transitory computer-executable programs, and modules, such as program instructions/modules corresponding to the risk assessment methods in embodiments of the present application (e.g., missing feature determination module 701 and risk assessment module 702 shown in fig. 7). The processor 801 executes various functional applications of the server and data processing by running non-transitory software programs, instructions, and modules stored in the memory 802, that is, implements the risk assessment method in the above-described method embodiments.
The memory 802 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device performing the risk assessment method, and the like. Further, the memory 802 may include high-speed random access memory and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, the memory 802 optionally includes memory located remotely from the processor 801, which may be connected over a network to the electronic device performing the risk assessment method. Examples of such networks include, but are not limited to, the Internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for performing the risk assessment method may further include an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or in other ways; Fig. 8 takes connection by a bus as an example.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for performing the risk assessment method; examples include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointer, one or more mouse buttons, a track ball, a joystick, or another input device. The output device 804 may include a display device, auxiliary lighting devices (e.g., LEDs), and haptic feedback devices (e.g., vibrating motors), among others. The display device may include, but is not limited to, a liquid crystal display (LCD), a light emitting diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special- or general-purpose and which receives data and instructions from, and transmits data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: local Area Networks (LANs), Wide Area Networks (WANs), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the drawbacks of difficult management and weak service scalability found in traditional physical hosts and virtual private server (VPS) services.
According to the technical solution of the embodiments of the present application, a scorecard model performs the final risk assessment. Because the scorecard model is interpretable, the embodiments retain interpretability at the granularity of individual data sources and can pinpoint which data source is missing, so that the cause of the missing data can be fed back. Using the machine learning model to determine the existing data features for each kind of existing data lowers the barrier of manual feature engineering and improves the accuracy of the determined existing data features. Using the WOE transformation to handle missing data improves the accuracy of the determined missing data features. For data from different data sources, data mining, feature extraction, feature evaluation, and similar operations are independent of one another and can be executed in parallel, which improves processing efficiency while ensuring that every data source dimension contributes features to the main scorecard model, thereby keeping the scorecard model's input features diverse. In addition, each network layer of the machine learning model in the embodiments is flexible: layers can be combined according to the data sources being processed, and an appropriate algorithm can be called for each, giving the solution strong extensibility and flexibility.
It should be understood that various forms of the flows shown above may be used, with steps reordered, added, or deleted. For example, the steps described in the present application may be executed in parallel, sequentially, or in different orders, and the present invention is not limited thereto as long as the desired results of the technical solutions disclosed in the present application can be achieved.
The above-described embodiments should not be construed as limiting the scope of the present application. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may be made in accordance with design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present application shall be included in the protection scope of the present application.

Claims (20)

1. A method of risk assessment, comprising:
if a data missing event is detected to exist according to the existing data characteristics of the object to be evaluated, transforming the existing data characteristics to obtain the missing data characteristics of the object to be evaluated; the existing data characteristics are obtained by processing the existing data of the object to be evaluated;
and performing risk assessment on the object to be assessed according to the existing data characteristics and the missing data characteristics of the object to be assessed.
2. The method of claim 1, wherein the transforming the existing data features to obtain missing data features of the object to be evaluated comprises:
carrying out evidence weight WOE transformation on the existing data characteristics to obtain transformation characteristics;
determining a weight value of the transformation feature;
and determining the missing data characteristic of the object to be evaluated according to the transformation characteristic and the weight value of the transformation characteristic.
3. The method of claim 1, further comprising:
performing data mining on the existing data of the object to be evaluated to obtain the edge relation of at least one dimension contained in the existing data;
determining the initial characteristics of the object to be evaluated according to an online feature table and the edge relation of the at least one dimension;
and calling a preset model algorithm to extract the existing data characteristics of the object to be evaluated from the initial characteristics.
4. The method of claim 3, after obtaining the edge relation of at least one dimension included in the existing data, further comprising:
and optimizing and/or verifying the edge relation of the at least one dimension.
5. The method of claim 3, wherein the predetermined model algorithm comprises a GBDT algorithm; and the target loss function during the GBDT algorithm training is determined through a second-order Taylor expansion result of an original loss function of the GBDT algorithm.
6. The method of claim 1, wherein the performing a risk assessment on the subject to be assessed according to the existing data features and the missing data features of the subject to be assessed comprises:
determining the existing score value and the missing score value of the object to be evaluated according to the existing data characteristics and the missing data characteristics of the object to be evaluated through a machine learning model;
and performing risk assessment on the object to be assessed according to the existing score value and the missing score value of the object to be assessed through a scoring card model.
7. The method of claim 6, wherein the determining, by the machine learning model, the missing score value of the object to be evaluated from the missing data characteristics of the object to be evaluated comprises:
and calling a logistic regression sigmoid function through the machine learning model to process the missing data characteristics of the object to be evaluated to obtain the missing score value of the object to be evaluated.
8. The method of claim 1, further comprising:
determining the existing score value of the object to be evaluated according to the existing data characteristics of the object to be evaluated through a machine learning model, and inputting the existing score value into a scoring card model;
and if the output result of the scoring card model is data missing, detecting that a data missing event exists.
9. The method of any of claims 1-8, wherein the existing data of the object to be evaluated comprises: at least one of text data, social data, consumption data, and external data; and performing an operation of determining the characteristics of the existing data for each kind of existing data through the machine learning model.
10. A risk assessment device comprising:
the missing characteristic determining module is used for transforming the existing data characteristics to obtain the missing data characteristics of the object to be evaluated if a data missing event is detected according to the existing data characteristics of the object to be evaluated; the existing data characteristics are obtained by processing the existing data of the object to be evaluated;
and the risk assessment module is used for performing risk assessment on the object to be assessed according to the existing data characteristics and the missing data characteristics of the object to be assessed.
11. The apparatus of claim 10, wherein the missing feature determination module comprises:
the characteristic transformation unit is used for carrying out evidence weight WOE transformation on the existing data characteristics to obtain transformation characteristics;
a weight determination unit for determining a weight value of the transformation feature;
and the missing characteristic determining unit is used for determining the missing data characteristic of the object to be evaluated according to the transformation characteristic and the weight value of the transformation characteristic.
12. The apparatus of claim 10, further comprising:
the edge relation mining module is used for carrying out data mining on the existing data of the object to be evaluated to obtain the edge relation of at least one dimension contained in the existing data;
an initial feature determining module, configured to determine an initial feature of the object to be evaluated according to an online feature table and the edge relation of the at least one dimension;
and the existing characteristic determining module is used for calling a preset model algorithm to extract the existing data characteristics of the object to be evaluated from the initial characteristics.
13. The apparatus of claim 12, further comprising:
and the optimization and verification module is used for optimizing and/or verifying the edge relation of the at least one dimension.
14. The apparatus of claim 12, wherein the predetermined model algorithm comprises a GBDT algorithm; and the target loss function during the GBDT algorithm training is determined through a second-order Taylor expansion result of an original loss function of the GBDT algorithm.
15. The apparatus of claim 10, wherein the risk assessment module comprises:
the score value determining unit is used for determining the existing score value and the missing score value of the object to be evaluated according to the existing data characteristic and the missing data characteristic of the object to be evaluated through a machine learning model;
and the risk evaluation unit is used for carrying out risk evaluation on the object to be evaluated according to the existing score value and the missing score value of the object to be evaluated through the scoring card model.
16. The apparatus according to claim 15, wherein the score value determining unit, when determining the missing score value of the object to be evaluated according to the missing data feature of the object to be evaluated through a machine learning model, is specifically configured to:
and calling a logistic regression sigmoid function through the machine learning model to process the missing data characteristics of the object to be evaluated to obtain the missing score value of the object to be evaluated.
17. The apparatus of claim 10, wherein the risk assessment module further comprises:
the score value determination unit is configured to: determining the existing score value of the object to be evaluated according to the existing data characteristics of the object to be evaluated through a machine learning model, and inputting the existing score value into a scoring card model;
and the missing event detection module is used for detecting that a data missing event exists if the output result of the scoring card model is data missing.
18. The apparatus of any of claims 10-17, wherein the existing data of the object to be evaluated comprises: at least one of text data, social data, consumption data, and external data; and performing an operation of determining the characteristics of the existing data for each kind of existing data through the machine learning model.
19. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the risk assessment method of any one of claims 1-9.
20. A non-transitory computer readable storage medium having stored thereon computer instructions for causing a computer to perform the risk assessment method of any one of claims 1-9.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010596449.4A CN111797994B (en) 2020-06-28 2020-06-28 Risk assessment method, apparatus, device and storage medium

Publications (2)

Publication Number Publication Date
CN111797994A true CN111797994A (en) 2020-10-20
CN111797994B CN111797994B (en) 2024-04-05

Family

ID=72803194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010596449.4A Active CN111797994B (en) 2020-06-28 2020-06-28 Risk assessment method, apparatus, device and storage medium

Country Status (1)

Country Link
CN (1) CN111797994B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633030A (en) * 2017-09-04 2018-01-26 Shenzhen Huaao Data Technology Co., Ltd. Credit estimation method and device based on data model
CN107633455A (en) * 2017-09-04 2018-01-26 Shenzhen Huaao Data Technology Co., Ltd. Credit estimation method and device based on data model
US20180285886A1 (en) * 2017-04-03 2018-10-04 The Dun & Bradstreet Corporation System and method for global third party intermediary identification system with anti-bribery and anti-corruption risk assessment
CN109360084A (en) * 2018-09-27 2019-02-19 Ping An Technology (Shenzhen) Co., Ltd. Credit default risk assessment method and device, storage medium, and computer device
WO2019237523A1 (en) * 2018-06-11 2019-12-19 Ping An Technology (Shenzhen) Co., Ltd. Safety risk evaluation method and apparatus, computer device, and storage medium
CN110706095A (en) * 2019-09-30 2020-01-17 Sichuan XW Bank Co., Ltd. Target node key information filling method and system based on associated network
CN111080397A (en) * 2019-11-18 2020-04-28 Alipay (Hangzhou) Information Technology Co., Ltd. Credit evaluation method and device and electronic equipment

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Zhao Haipeng; Li Dan: "Personal credit risk evaluation modeling based on data visualization with Rattle", Financial Management Research, no. 02 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant