CN111797994B - Risk assessment method, apparatus, device and storage medium - Google Patents

Publication number
CN111797994B
Authority
CN
China
Prior art keywords
data
missing
evaluated
existing
existing data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010596449.4A
Other languages
Chinese (zh)
Other versions
CN111797994A (en
Inventor
邓继禹
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN202010596449.4A
Publication of CN111797994A
Application granted
Publication of CN111797994B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof

Abstract

The application discloses a risk assessment method, apparatus, device and storage medium, and relates to the field of artificial intelligence, in particular to deep learning and big data technologies, and more particularly to financial risk prevention and control and anti-fraud using a deep learning model and big data technology. The specific implementation scheme is as follows: if a data missing event is detected according to the existing data features of an object to be evaluated, the existing data features are transformed to obtain missing data features of the object to be evaluated, where the existing data features are obtained by processing existing data of the object to be evaluated; and risk assessment is performed on the object to be evaluated according to its existing data features and missing data features. The accuracy of risk assessment can thereby be improved when the existing data of the object to be evaluated is incomplete, so that the risk assessment scheme is optimized.

Description

Risk assessment method, apparatus, device and storage medium
Technical Field
The application relates to the field of artificial intelligence, in particular to a deep learning technology and a big data technology, and especially relates to a method for preventing and controlling financial risks and anti-fraud by using a deep learning model and a big data technology.
Background
With the development of artificial intelligence technology, risk assessment of users is being applied in more and more fields. For example, the financial risk control field requires risk assessment of users when they handle financial business, and the anti-fraud field requires risk assessment of application users in order to identify fraudsters. Current risk assessment techniques usually rely on real-time data uploaded by users when handling business or using applications; when the uploaded data is incomplete, the users cannot be assessed accurately and comprehensively, so improvement is needed.
Disclosure of Invention
The disclosure provides a risk assessment method, a risk assessment device, risk assessment equipment and a storage medium.
According to an aspect of the present disclosure, there is provided a risk assessment method including:
if a data missing event is detected according to the existing data features of the object to be evaluated, transforming the existing data features to obtain missing data features of the object to be evaluated, where the existing data features are obtained by processing existing data of the object to be evaluated;
and performing risk assessment on the object to be evaluated according to the existing data features and the missing data features of the object to be evaluated.
According to another aspect of the present disclosure, there is provided a risk assessment apparatus including:
the missing feature determining module is used for transforming the existing data features to obtain missing data features of the object to be evaluated if the data missing event is detected according to the existing data features of the object to be evaluated; the existing data characteristics are obtained by processing existing data of the object to be evaluated;
and the risk assessment module is used for carrying out risk assessment on the object to be assessed according to the existing data characteristics and the missing data characteristics of the object to be assessed.
According to another aspect of the present disclosure, there is provided an electronic device including:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the risk assessment method of any of the embodiments of the present application.
According to another aspect of the present disclosure, there is provided a non-transitory computer-readable storage medium storing computer instructions for causing a computer to perform the risk assessment method according to any of the embodiments of the present application.
The technology of the present application solves the problem that the risk assessment result is inaccurate when the existing data of the object to be evaluated is incomplete, so that the risk assessment scheme is optimized.
It should be understood that the description in this section is not intended to identify key or critical features of the embodiments of the disclosure, nor is it intended to be used to limit the scope of the disclosure. Other features of the present disclosure will become apparent from the following specification.
Drawings
The drawings are for better understanding of the present solution and do not constitute a limitation of the present application. Wherein:
FIG. 1 is a flow chart of a risk assessment method provided in accordance with an embodiment of the present application;
FIG. 2A is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 2B is a schematic diagram of missing data feature determination principles provided according to embodiments of the present application;
FIG. 3 is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 4 is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 5 is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 6A is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application;
FIG. 6B is an architecture diagram of a risk assessment system provided in accordance with an embodiment of the present application;
FIG. 7 is a schematic structural diagram of a risk assessment device according to an embodiment of the present application;
FIG. 8 is a block diagram of an electronic device for implementing a risk assessment method of an embodiment of the present application.
Detailed Description
Exemplary embodiments of the present application are described below in conjunction with the accompanying drawings, which include various details of the embodiments of the present application to facilitate understanding, and should be considered as merely exemplary. Accordingly, one of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the present application. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
Fig. 1 is a flowchart of a risk assessment method according to an embodiment of the present application. The embodiment is applicable to performing risk assessment on an object to be evaluated, for example, assessing the risk of a loan applicant based on the applicant's existing data when the loan is being approved. The embodiment may be performed by a risk assessment device configured in an electronic device, and the risk assessment device may be implemented in software and/or hardware. As shown in fig. 1, the method includes:
S101, if the data missing event is detected according to the existing data characteristics of the object to be evaluated, the existing data characteristics are transformed to obtain the missing data characteristics of the object to be evaluated.
The object to be evaluated may be any user for whom risk assessment is required. For example, if an individual is being assessed, the object to be evaluated may be that individual; if an enterprise is being assessed, the object to be evaluated may be the person in charge of the enterprise, and so on. The existing data features of the object to be evaluated are features extracted from the existing data of the object to be evaluated; for example, they may be obtained by analyzing the existing data with a pre-trained machine learning model, or extracted from the existing data by a preset feature extraction algorithm. The existing data of the object to be evaluated is the reference data acquired for the current risk assessment, and it may be of many types, including but not limited to: text data, social data, consumption data, external data, other data, and the like. The existing data may be filled in or uploaded by the user for the purpose of the current assessment, or may be obtained by accessing various data sources. Optionally, if the existing data is obtained from data sources, the data sources may include, but are not limited to, at least one of: short messages, operators, address books, credit card bills, e-commerce, third-party credit, blacklist libraries, terminal behaviors, and the like. The data missing event in the embodiments of the present application is an event triggered when the acquired existing data is not comprehensive enough and some data is missing. The missing data features are the features associated with the data that is missing from the current assessment.
Optionally, in the embodiments of the present application, whether a data missing event exists may be detected from the existing data features of the object to be evaluated. Specifically, in combination with the business for which risk assessment is actually performed, it may be judged whether the existing data features determined from the acquired existing data are comprehensive; if some features are absent, a data missing event exists. For example, the target feature types required for risk assessment are preset; after the existing data features of the object to be evaluated are determined, it is judged whether they cover all the target feature types, and if not, a data missing event exists. Alternatively, relying on the interpretability of the scoring card model used to determine the final risk assessment result, the existing data features may be input into the scoring card model; if the existing data features are not comprehensive, the scoring card model outputs a data missing prompt, that is, a data missing event is detected.
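The coverage check described above can be sketched as a set difference. This is a minimal illustration only: the feature type names in `TARGET_FEATURE_TYPES` and both function names are hypothetical, since the application does not enumerate the target feature types.

```python
from typing import Dict, Set

# Hypothetical target feature types; the application does not list them.
TARGET_FEATURE_TYPES: Set[str] = {"text", "social", "consumption", "external"}


def find_missing_feature_types(
    existing_features: Dict[str, float],
    target_types: Set[str] = TARGET_FEATURE_TYPES,
) -> Set[str]:
    """Return the target feature types not covered by the existing features."""
    return set(target_types) - set(existing_features)


def has_data_missing_event(
    existing_features: Dict[str, float],
    target_types: Set[str] = TARGET_FEATURE_TYPES,
) -> bool:
    """A data missing event exists when any target feature type is uncovered."""
    return bool(find_missing_feature_types(existing_features, target_types))


# Example: only two of the four assumed feature types are available.
existing = {"text": 0.42, "consumption": 0.77}
missing = find_missing_feature_types(existing)
```

Returning the set of uncovered types, instead of only a boolean, keeps the information about which features the later transformation step needs to recover.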
Optionally, in the embodiments of the present application, if a data missing event exists, the missing data is not simply ignored (i.e., discarded); instead, the missing data features of the object to be evaluated are recovered by transforming the existing data features. Specifically, because the existing data of the object to be evaluated is of many types, the existing data features extracted from each type may not be on the same scale. In this step, the existing data features of each type may therefore be converted to the same scale by a preset transformation algorithm and then fused, so as to accurately determine the features of the missing data. Optionally, the method of fusing the existing data features once they are on the same scale may depend on the transformation algorithm; for example, if the transformation algorithm is the weight of evidence (WOE) algorithm, the fusion method may be weighted fusion.
S102, performing risk assessment on the object to be assessed according to the existing data features and the missing data features of the object to be assessed.
Optionally, the risk assessment in the embodiments of the present application may assess credit risk, fraud risk, whether the environment is risky, and the like. Specifically, because the missing data features have been determined in S101, risk assessment can be performed on the object to be evaluated according to both its existing data features and its missing data features. One approach is to preset risk assessment rules, for example, the data features corresponding to each risk level, and to match the extracted data features of the user (including the existing data features and the missing data features) against the data features corresponding to each risk level in the rules, thereby determining the risk level of the current object to be evaluated. Another approach is to input the existing data features and the missing data features into a pre-trained risk assessment model, which analyzes them based on the algorithm learned in training to obtain the final risk assessment result. Optionally, the risk assessment model may be composed of a machine learning model and a scoring card model: the machine learning model scores the existing data features and the missing data features and inputs the scores into the scoring card model, and the scoring card model analyzes the input scores to obtain the final risk assessment result.
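The rule-matching variant can be illustrated with a small threshold table. This is a sketch under stated assumptions: the application does not specify how scores are combined or which thresholds map to which level, so the averaging step, the `RISK_RULES` thresholds and the function name `assess_risk` are all hypothetical.

```python
from typing import Dict, List, Tuple

# Hypothetical rule table: (minimum combined score, risk level),
# ordered from the highest threshold to the lowest.
RISK_RULES: List[Tuple[float, str]] = [(0.7, "high"), (0.4, "medium"), (0.0, "low")]


def assess_risk(
    existing_scores: Dict[str, float],
    missing_score: float,
    rules: List[Tuple[float, str]] = RISK_RULES,
) -> str:
    """Combine existing and recovered (missing) feature scores, then match a rule.

    Averaging is an assumption made for illustration; a real scoring card
    model would apply its own trained point allocation.
    """
    values = list(existing_scores.values()) + [missing_score]
    combined = sum(values) / len(values)
    for threshold, level in rules:
        if combined >= threshold:
            return level
    return rules[-1][1]  # fall back to the lowest level


level = assess_risk({"text": 0.9, "social": 0.8}, missing_score=0.7)
```

The point of the sketch is only that the recovered missing data feature enters the decision alongside the existing features, instead of being dropped.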
According to the technical scheme of this embodiment, if a data missing event is detected according to the existing data features of the object to be evaluated during risk assessment, the existing data features are transformed to determine the missing data features of the object to be evaluated, and risk assessment is then performed on the object to be evaluated according to the determined missing data features and the existing data features. When a data missing event occurs during risk assessment, the scheme does not ignore the missing data; instead, it transforms the existing data features to recover the features of the missing data before performing risk assessment. This solves the problem that the risk assessment result is inaccurate when the existing data of the object to be evaluated is incomplete, improves the accuracy of risk assessment, and optimizes the risk assessment scheme.
FIG. 2A is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application; fig. 2B is a schematic diagram of missing data feature determination principle provided according to an embodiment of the present application. Based on the above embodiment, the present embodiment is further optimized, and a specific description of the transformation of the existing data features to obtain the missing data features of the object to be evaluated is given. As shown in fig. 2A-2B, the method specifically includes:
S201, if a data missing event is detected according to the existing data features of the object to be evaluated, weight of evidence (WOE) transformation is performed on the existing data features to obtain transformation features.
Optionally, if a data missing event is detected according to the existing data features of the object to be evaluated, this embodiment may perform WOE transformation on the existing data features in order to recover the missing data features. Specifically, WOE transformation may be performed on the existing data features determined from the existing data acquired from each data source, so as to obtain the transformation feature corresponding to each existing data feature.
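For readers unfamiliar with WOE, the following sketch computes a WOE value per bin of a binned feature. The application does not give a formula, so the commonly used definition ln((good_i / good_total) / (bad_i / bad_total)) is assumed here (some references use the inverse ratio); the additive smoothing constant and the function name `woe_table` are also assumptions.

```python
import math
from typing import Dict, List


def woe_table(bins: List[str], labels: List[int], smoothing: float = 0.5) -> Dict[str, float]:
    """Compute the weight of evidence (WOE) for each bin of one feature.

    bins:   bin label of the feature for each sample
    labels: 1 = bad (risky) sample, 0 = good sample
    WOE_i = ln((good_i / good_total) / (bad_i / bad_total)), with additive
    smoothing so that empty good/bad counts do not cause division by zero.
    """
    good: Dict[str, float] = {}
    bad: Dict[str, float] = {}
    for b, y in zip(bins, labels):
        good.setdefault(b, 0.0)
        bad.setdefault(b, 0.0)
        if y == 1:
            bad[b] += 1
        else:
            good[b] += 1
    n_bins = len(good)
    good_total = sum(good.values()) + smoothing * n_bins
    bad_total = sum(bad.values()) + smoothing * n_bins
    return {
        b: math.log(((good[b] + smoothing) / good_total) / ((bad[b] + smoothing) / bad_total))
        for b in good
    }


# Toy sample: the "low" bin is mostly good, the "high" bin is mostly bad.
table = woe_table(["low", "low", "low", "high", "high", "high"], [0, 0, 1, 1, 1, 0])
```

Because every feature is mapped onto the same log-odds scale, features from different data sources become directly comparable, which is what the later weighted fusion relies on.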
S202, determining weight values of the transformation characteristics.
Optionally, after the transformation features corresponding to the different types of existing data features are obtained through WOE transformation in S201, a weight value needs to be assigned to each transformation feature. There are many possible assignment methods, and this embodiment does not limit them. For example, the weight values may be assigned to the different transformation features by experienced staff; the weight values may be obtained by training a machine learning model on a large amount of sample data; or the weight values may be determined according to other rules.
S203, determining missing data features of the object to be evaluated according to the transformation features and the weight values of the transformation features.
Optionally, this step may perform weighted fusion on the transformation features according to the transformation features and their weight values, and use the weighted fusion result (i.e., the sum obtained by multiplying the feature value of each transformation feature by its weight value) as the determined missing data feature of the object to be evaluated. Illustratively, as shown in fig. 2B, missing data feature = transformation feature 1 × W1 + transformation feature 2 × W2 + … + transformation feature N × WN.
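The weighted fusion formula is a plain weighted sum and can be written as follows. The function name and the example values are illustrative assumptions; how the weights are obtained is left open by the application.

```python
from typing import Sequence


def fuse_transformation_features(features: Sequence[float], weights: Sequence[float]) -> float:
    """Missing data feature = sum of (transformation feature i x weight Wi)."""
    if len(features) != len(weights):
        raise ValueError("each transformation feature needs exactly one weight")
    return sum(f * w for f, w in zip(features, weights))


# Three transformation features (e.g. WOE values) and their assumed weights.
missing_data_feature = fuse_transformation_features([0.5, -0.2, 1.0], [0.5, 0.3, 0.2])
```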
S204, performing risk assessment on the object to be assessed according to the existing data features and the missing data features of the object to be assessed.
According to the scheme of this embodiment, if a data missing event is detected according to the existing data features of the object to be evaluated during risk assessment, WOE transformation is performed on each type of existing data feature to obtain the corresponding transformation feature, a weight value is assigned to each transformation feature, and the transformation features are fused by weighting to obtain the missing data features of the object to be evaluated; risk assessment is then performed on the object to be evaluated according to the determined missing data features and the existing data features. Recovering the missing data features by WOE transformation improves the accuracy of the recovered features and, in turn, the accuracy of the final risk assessment.
Fig. 2B gives a specific example of steps S201 to S203 in fig. 2A. As shown in fig. 2B, feature extraction is performed on the existing data 1 acquired through the data source 1 channel to obtain the existing data feature 1, and WOE (weight of evidence) transformation is performed on the existing data feature 1 to obtain the transformation feature 1; feature extraction is performed on the existing data 2 acquired through the data source 2 channel to obtain the existing data feature 2, and WOE transformation is performed on the existing data feature 2 to obtain the transformation feature 2; and feature extraction is performed on the existing data N acquired through the data source N channel to obtain the existing data feature N, and WOE transformation is performed on the existing data feature N to obtain the transformation feature N. The weight value W1 is assigned to transformation feature 1, W2 to transformation feature 2, and WN to transformation feature N. Optionally, the transformation features for the existing data collected from the different data sources may be determined in parallel.
FIG. 3 is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application; the embodiment is further optimized based on the above embodiments, and a specific description is given of determining the existing data characteristics of the object to be evaluated. As shown in fig. 3, the method specifically includes:
S301, data mining is carried out on existing data of the object to be evaluated, and the edge relation of at least one dimension contained in the existing data is obtained.
In this embodiment of the present application, an edge relationship is a relationship between different users. Edge relationships in this embodiment are classified as strong or weak; for example, the closer the relationship between user A and user B, the stronger the edge relationship between them. The strength of an edge relationship can be represented by the weight value of the edge relationship. Optionally, the edge relationship in this embodiment is an edge relationship of at least one dimension (degree): a relationship established directly between two users, without passing through other users, is a one-degree edge relationship; a relationship established between two users through a third user is a two-degree edge relationship; a relationship established between two users through a third user and a fourth user is a three-degree edge relationship; and so on. For example, if user A and user B are friends, user A and user B have a one-degree edge relationship; if user A and user B are friends and user B and user C are friends, user A and user C have a two-degree edge relationship.
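The degree of an edge relationship as defined above corresponds to the shortest path length between two users in a relationship graph, which can be found with a breadth-first search. This is a minimal sketch; the adjacency-dictionary representation, the `max_degree` cutoff and the function name are assumptions made for illustration.

```python
from collections import deque
from typing import Dict, Optional, Set


def relationship_degree(
    graph: Dict[str, Set[str]], source: str, target: str, max_degree: int = 3
) -> Optional[int]:
    """Return the degree of the edge relationship between two users.

    graph maps each user to the users with whom a one-degree relationship exists.
    Returns None if the users are not connected within max_degree hops.
    """
    if source == target:
        return 0
    visited = {source}
    queue = deque([(source, 0)])
    while queue:
        user, degree = queue.popleft()
        if degree == max_degree:
            continue  # do not expand beyond the allowed degree
        for neighbor in graph.get(user, ()):
            if neighbor == target:
                return degree + 1
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, degree + 1))
    return None


# A and B are friends, B and C are friends, C and D are friends.
friends = {"A": {"B"}, "B": {"A", "C"}, "C": {"B", "D"}, "D": {"C"}}
```

With this graph, `relationship_degree(friends, "A", "C")` gives the two-degree relationship from the example in the text.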
Optionally, when edge relationships are mined from the existing data of the object to be evaluated, different algorithms may be used for different types of existing data, and the edge relationship of at least one dimension contained in each type of existing data is extracted. Specifically, for operator data in the existing data, the intimacy between two users can be measured by indexes such as the number of call records in a recent period; for example, a one-degree edge relationship exists between two users who call each other frequently. For device data in the existing data, the intimacy between users can be measured by whether they have used the same device or the same Wi-Fi in a recent period; for example, a one-degree edge relationship exists between users who use the same device or the same Wi-Fi. For address book data in the existing data, the intimacy between users can be measured by the remark stored with each number in the address book; for example, if the remark for a number is "mother", a one-degree edge relationship exists between the user of that number and the user of the terminal to which the address book belongs. For emergency contact data in the existing data, the emergency contacts filled in by a user are generally the user's family members, colleagues, relatives and friends, so a one-degree edge relationship may be set between each emergency contact and the user who filled it in.
For e-commerce address data in the existing data, the edge relationships between users can be measured by comparing address similarity; for example, a one-degree edge relationship exists between users with the same address. For referral data from operations in the existing data (existing customers bringing in new customers), referrals generally take place among family members, colleagues, relatives and friends, so a one-degree edge relationship may be set between the new user and the existing user who referred them. For bank card transfer data in the existing data, a one-degree edge relationship exists between the two parties to a transfer. For location based services (Location Based Services, LBS) address data in the existing data, a one-degree edge relationship exists between users at the same address, and so on.
Optionally, in the embodiments of the present application, considering that the edge relationships extracted from different types of existing data differ in intimacy, different weight values may be set for them. For example, the edge relationships extracted from operator data, address book data and emergency contact data reflect higher intimacy, so a higher weight value may be set for them; bank card transfer record data has lower coverage, and LBS address data has larger acquisition errors and poorer timeliness, so a lower weight value may be set for the edge relationships extracted from these two types of existing data.
Optionally, in the embodiments of the present application, the acquired existing data of the object to be evaluated may include at least one of stock data and incremental data. Incremental data generally refers to data acquired in real time, from which the weight value of a one-degree edge relationship can be calculated directly. For example, if the user must import operator data when submitting a loan application, a one-degree edge relationship may be determined from the call records imported in real time. Because incremental data is the latest data, it best reflects the user's current risk, so if incremental data is available, it is generally used in preference to extract the user's edge relationships. In some scenarios, however, stock data (i.e., data stored historically in the system) has to be used. For example, if a new user does not import personal data when applying for a loan, but such data was imported for that user in the past, the stock data is valuable and is used to extract the user's edge relationships. Incremental data and stock data necessarily correspond to different points in time, so different algorithms need to be used to mine them.
Optionally, in this embodiment of the present application, depending on the actual risk assessment policy, the mined edge relationships may need to be fused. Specifically, edge relationship fusion in this embodiment includes at least one of feature layer fusion and network layer fusion. In feature layer fusion, each sub-network independently constructs the edge relationships in its graph features, and the fusion is performed at the feature layer; this arrangement is convenient and intuitive, and the edge relationships can be fused in parallel. In network layer fusion, the various strong and weak edge relationships are fused by weighting based on their corresponding weight values, which enriches the meaning carried by each edge relationship and improves edge coverage. In the embodiments of the present application, feature layer fusion may be used when the number of edge relationships is small, and network layer fusion may be used when the number of edge relationships is large.
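Network layer fusion with per-source weights can be sketched as merging the edges from all sub-networks into one graph and accumulating a weight per user pair. The numeric weights in `SOURCE_WEIGHTS`, the choice of summation as the fusion rule and the function name `fuse_edges` are hypothetical; the application only states that higher-intimacy sources receive higher weights.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple

# Hypothetical weights: higher for operator, address book and emergency
# contact data; lower for bank card transfer and LBS address data.
SOURCE_WEIGHTS: Dict[str, float] = {
    "operator": 0.9,
    "address_book": 0.9,
    "emergency_contact": 0.9,
    "bank_transfer": 0.4,
    "lbs_address": 0.3,
}


def fuse_edges(
    edges: Iterable[Tuple[str, str, str]],
    source_weights: Dict[str, float] = SOURCE_WEIGHTS,
) -> Dict[FrozenSet[str], float]:
    """Fuse one-degree edges from several data sources into a single network.

    edges: (user_a, user_b, data_source) triples; edges are undirected.
    Summing the source weights per user pair is an assumed fusion rule.
    """
    fused: Dict[FrozenSet[str], float] = defaultdict(float)
    for user_a, user_b, source in edges:
        fused[frozenset((user_a, user_b))] += source_weights[source]
    return dict(fused)


fused = fuse_edges([
    ("A", "B", "operator"),
    ("B", "A", "lbs_address"),   # same pair, seen in another source
    ("A", "C", "bank_transfer"),
])
```

A pair that appears in several sources ends up with a larger fused weight, which is one simple way to let weak edges reinforce strong ones.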
S302, determining initial features of the object to be evaluated according to an online feature table and the edge relationship of at least one dimension.
The online feature table records the features of each user. These features may be determined according to the one-degree, two-degree or even higher-degree edge relationships between users, and once determined, the feature table may be deployed online so that it can be used to determine the initial features of an object to be evaluated each time risk assessment is performed.
Optionally, this step may determine the associated users of the object to be evaluated according to the edge relationship of at least one dimension, then look up the features of the associated users in the online feature table and propagate them to the object to be evaluated as its initial features. For example, if the object to be evaluated and user A have an edge relationship of at least one dimension, user A may be taken as an associated user of the object to be evaluated; if the feature recorded for user A in the online feature table is "dishonest user" (a user with a record of bad credit), then the initial feature of the object to be evaluated is "dishonest user".
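The lookup described above can be sketched as a join between the associated users and the online feature table. The data layout (nested dictionaries keyed by a user identifier), the `dishonest` flag and the function name are illustrative assumptions.

```python
from typing import Any, Dict


def initial_features_from_associates(
    subject: str,
    edge_relationships: Dict[str, Dict[str, int]],
    online_feature_table: Dict[str, Dict[str, Any]],
) -> Dict[str, Dict[str, Any]]:
    """Collect features of associated users that appear in the online feature table.

    edge_relationships maps a user to {associated user: degree of relationship}.
    Associated users without a record in the table contribute nothing.
    """
    collected: Dict[str, Dict[str, Any]] = {}
    for associate in edge_relationships.get(subject, {}):
        record = online_feature_table.get(associate)
        if record is not None:
            collected[associate] = record
    return collected


edges = {"applicant": {"user_a": 1, "user_b": 2}}
feature_table = {"user_a": {"dishonest": True}}  # user_b has no record
initial = initial_features_from_associates("applicant", edges, feature_table)
```

The example also shows why table coverage matters: `user_b` is an associated user but contributes no initial feature because the table holds no record for that user.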
Alternatively, in general, the user identifier in the online feature table is a mobile phone number, an equipment identifier, and the like of the user, and the user feature corresponding to the user identifier is configured by adopting a preset algorithm, for example, a large number of time slice features are produced in batch through a client relationship management (RFM) model, and service features are configured according to service understanding. The user quantity in the online feature table determines the initial feature coverage rate and the accuracy rate of the determination of the object to be evaluated. For example, assuming that there are 100 associated users of the object to be evaluated, only 3 associated users 'features are recorded in the online feature table, where the initial feature of the object to be evaluated is determined to use only three associated users' features, coverage rate and accuracy are relatively low. In order to improve the accuracy of the initial feature determination of the object to be evaluated in the embodiment of the present application, the present embodiment needs to expand the user amount in the online feature table. And particularly, when the on-line feature table is expanded, historical stock data is acquired as much as possible to determine the user features with strong stability, and the user features are added into the table. For the characteristic of strong timeliness, coverage rate is properly sacrificed, and the validity period is ensured. Optionally, in the online feature table, when a user has multiple records at different time points, the user can take the most recent guaranteed feature with the strongest timeliness; the method can also refer to forgetting curves, and perform weighted fusion on the characteristics of different time points.
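Both options for handling multiple records of one user, taking the most recent record or fusing with a forgetting curve, can be sketched as follows. The exponential decay exp(-age / stability) is one common way to model a forgetting curve and is an assumption here, as are the `stability_days` parameter and the function names.

```python
import math
from typing import List, Tuple

Record = Tuple[float, float]  # (timestamp in days, feature value)


def latest_value(records: List[Record]) -> float:
    """Take the feature value of the most recent record."""
    return max(records, key=lambda r: r[0])[1]


def forgetting_curve_fusion(records: List[Record], now: float, stability_days: float = 30.0) -> float:
    """Weighted average of feature values, weighting each record by exp(-age / stability).

    Older records decay toward zero weight, so recent behaviour dominates.
    """
    if not records:
        raise ValueError("at least one record is required")
    weighted_sum = 0.0
    weight_total = 0.0
    for timestamp, value in records:
        age = max(0.0, now - timestamp)
        weight = math.exp(-age / stability_days)
        weighted_sum += weight * value
        weight_total += weight
    return weighted_sum / weight_total


# An old record with value 0.0 and a fresh record with value 1.0.
history = [(0.0, 0.0), (100.0, 1.0)]
fused_value = forgetting_curve_fusion(history, now=100.0)
```

Compared with taking only the latest record, the weighted fusion keeps some signal from older records while still letting the newest one dominate.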
Optionally, in the embodiment of the present application, if the initial feature of the object to be evaluated that is required for risk assessment is simple, for example, determining an address from an identity card number, the existing data may be mined directly to obtain the simple initial feature, without determining an edge relationship and looking the feature up in the online feature table.
S303, calling a preset model algorithm to extract the existing data features of the object to be evaluated from the initial features.
Optionally, the embodiment of the present application may be preset with a model algorithm library, in which a plurality of model algorithms for data extraction are recorded, for example, may include, but not limited to: LR algorithms, XGBoost algorithms, RF algorithms, GBDT algorithms, and the like. When extracting the existing data features of the object to be evaluated from the initial features, the step can be to call different model algorithms to extract the existing data features of the object to be evaluated based on different kinds of initial features. Specifically, an LR algorithm may be invoked to extract existing data features of the object under evaluation from text features in the initial features; invoking an XGBoost algorithm to extract the existing data characteristics of the object to be evaluated from the relation characteristics in the initial characteristics; invoking an RF algorithm to extract the existing data features of the object to be evaluated from the RFM features in the initial features; and calling the GBDT algorithm to extract the existing data characteristics of the object to be evaluated from the external characteristics in the initial characteristics, and the like.
S304, if the data missing event is detected according to the existing data characteristics of the object to be evaluated, the existing data characteristics are transformed to obtain the missing data characteristics of the object to be evaluated.
S305, performing risk assessment on the object to be assessed according to the existing data features and the missing data features of the object to be assessed.
According to the scheme of the embodiment of the present application, after the existing data of the object to be evaluated is obtained, data mining is performed on it to obtain the edge relationship of at least one dimension, the initial features of the object to be evaluated are determined from the online feature table according to the edge relationship, and a preset model algorithm is then called to extract more accurate existing data features of the object to be evaluated from the initial features. If a data missing event is detected according to the existing data features, the missing data features of the object to be evaluated are determined by transforming the existing data features, and risk assessment is performed on the object to be evaluated according to the determined missing data features and the existing data features. By mining edge relationships from a large amount of existing data and obtaining the existing data features through two rounds of feature extraction, the embodiment of the present application improves the diversity and accuracy of the existing data features. This provides a guarantee for the subsequent judgment of whether a data missing event exists and for accurate risk assessment.
Optionally, on the basis of the foregoing embodiment, after obtaining the edge relationship of at least one dimension contained in the existing data, the method further includes: optimizing and/or verifying the edge relationship of the at least one dimension. Specifically, when the edge relationship of at least one dimension is optimized, second-degree or even higher-degree edge relationships may be added on the basis of the first-degree edge relationships, and accurate weights are assigned to the newly added edge relationships. When the edge relationship is verified, verification may be performed in terms of both the discrimination and the stability of the edge relationship; for example, the discrimination is calculated with a discrimination function and compared with a discrimination threshold, and the stability is calculated with a stability function and compared with a stability threshold. If both the discrimination and the stability are greater than their corresponding thresholds, the edge relationship passes verification. The benefit of optimizing the edge relationship in the embodiment of the present application is that the coverage of the edge relationships and the accuracy of their weights are improved. The benefit of verifying the edge relationship is that its accuracy is ensured, which provides a guarantee for the subsequent accurate extraction of the existing data features of the object to be evaluated.
Optionally, on the basis of the foregoing embodiment, when the embodiment of the present application extracts an existing data feature of an object to be evaluated, the invoked preset model algorithm includes a GBDT algorithm; the target loss function during the GBDT algorithm training is determined through the second-order Taylor expansion result of the original loss function of the GBDT algorithm.
Specifically, the conventional GBDT algorithm may be optimized in this embodiment as follows: the original loss function of the GBDT model is first approximated linearly, that is, a first-order Taylor expansion is performed on the original loss function; the linear approximation is then replaced with a polynomial one by extending the Taylor expansion to second order, which yields a higher-precision descent method, and the minimized loss function is used as the target loss function when the GBDT algorithm is trained. In the target loss function, T is the number of leaf nodes; x_i is the i-th data item in dataset I_j; g_i and h_i are variables extracted from the second-order Taylor expansion result; and λ is a parameter variable. When searching for the optimal split point, the optimized GBDT algorithm enumerates several candidate split points according to the percentile method and then calculates the optimal split point among these candidates according to the target loss function. A conventional GBM stops splitting as soon as it encounters a negative loss during a split. The optimized GBDT algorithm instead splits up to a specified maximum depth (max_depth) and then prunes backwards, removing a split if there is no positive gain after that node. The advantage is that when a negative loss (e.g., -2) is followed by a positive loss (e.g., +10), a GBM stops at -2 because it has encountered a negative value, whereas the optimized GBDT algorithm continues splitting, finds that the two splits together yield +8, and therefore keeps both. A further benefit of the optimized GBDT algorithm is that only a small number of split points need to be analyzed, which improves computational efficiency. The optimized GBDT algorithm also takes sparse training data into account and can assign a default branch direction for missing values or specified values, which further improves algorithm efficiency.
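For orientation, a second-order Taylor objective of the kind described above is commonly written as shown below. This is the standard form used in XGBoost-style gradient boosting and is given here only as an assumption about the shape of the target loss function; the symbols $l$, $\hat{y}_i^{(t-1)}$, $f_t$, $w_j$, $\gamma$, $G_j$ and $H_j$ do not appear in the text above and are introduced purely for illustration.

```latex
% Second-order Taylor approximation of the loss at boosting round t
\mathcal{L}^{(t)} \approx \sum_{i} \Big[ l\big(y_i, \hat{y}_i^{(t-1)}\big)
    + g_i f_t(x_i) + \tfrac{1}{2} h_i f_t(x_i)^2 \Big]
    + \gamma T + \tfrac{1}{2} \lambda \sum_{j=1}^{T} w_j^2,
\qquad
g_i = \frac{\partial l}{\partial \hat{y}_i^{(t-1)}}, \quad
h_i = \frac{\partial^2 l}{\partial \big(\hat{y}_i^{(t-1)}\big)^2}

% Grouping samples by leaf, with G_j = \sum_{i \in I_j} g_i and H_j = \sum_{i \in I_j} h_i,
% the optimal leaf weight and the minimized objective are
w_j^{*} = -\frac{G_j}{H_j + \lambda},
\qquad
\tilde{\mathcal{L}}^{*} = -\frac{1}{2} \sum_{j=1}^{T} \frac{G_j^{2}}{H_j + \lambda} + \gamma T
```

Under this assumed form, $T$, $x_i$, $I_j$, $g_i$, $h_i$ and $\lambda$ play the roles listed in the text: the objective depends on the data only through the per-leaf sums of $g_i$ and $h_i$, which is what makes scoring a candidate split cheap.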
The optimized GBDT algorithm can follow the random forest approach and support column sampling, which reduces both overfitting and computation. After the feature columns are sorted, the optimized GBDT algorithm can store them in memory in the form of blocks and reuse them across iterations; although boosting iterations must be serial, the feature columns can be processed in parallel. Storing the data by feature column also allows the optimized GBDT algorithm to optimize the search for the optimal split point. When the data volume is large and memory is insufficient, the optimized GBDT algorithm can additionally combine multithreading, data compression and sharding to improve the efficiency of the algorithm as much as possible.
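The percentile-based search for a split point can be illustrated with a short sketch. It assumes the common gain formula built from per-side sums of g_i and h_i with regularization λ; the function names, the `n_bins` parameter and the omission of any per-leaf penalty are assumptions made for illustration, not details confirmed by the application.

```python
from statistics import quantiles


def leaf_score(g_sum, h_sum, lam):
    """Structure score G^2 / (H + lambda) of a single leaf."""
    return g_sum * g_sum / (h_sum + lam)


def best_split(x, g, h, lam=1.0, n_bins=4):
    """Pick the best threshold among percentile candidates of feature x.

    x: feature values; g, h: first- and second-order gradient statistics.
    Returns (threshold, gain), or (None, 0.0) if no candidate has positive gain.
    """
    candidates = sorted(set(quantiles(x, n=n_bins)))  # n_bins - 1 cut points
    g_total, h_total = sum(g), sum(h)
    parent = leaf_score(g_total, h_total, lam)
    best_thr, best_gain = None, 0.0
    for thr in candidates:
        left = [xi <= thr for xi in x]
        n_left = sum(left)
        if n_left == 0 or n_left == len(x):
            continue  # a split must leave samples on both sides
        g_left = sum(gi for gi, is_left in zip(g, left) if is_left)
        h_left = sum(hi for hi, is_left in zip(h, left) if is_left)
        gain = 0.5 * (leaf_score(g_left, h_left, lam)
                      + leaf_score(g_total - g_left, h_total - h_left, lam)
                      - parent)
        if gain > best_gain:
            best_thr, best_gain = thr, gain
    return best_thr, best_gain


x = [1, 2, 3, 4, 5, 6, 7, 8]
g = [-1.0] * 4 + [1.0] * 4   # gradients change sign between 4 and 5
h = [1.0] * 8
print(best_split(x, g, h))
```

Only `n_bins - 1` thresholds are scored instead of every distinct feature value, which is the efficiency benefit attributed to the percentile method.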
FIG. 4 is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application; based on the above embodiment, the present embodiment is further optimized, and a specific description of risk assessment of an object to be assessed according to existing data features and missing data features of the object to be assessed is given. As shown in fig. 4, the method specifically includes:
S401, if the data missing event is detected according to the existing data features of the object to be evaluated, the existing data features are transformed to obtain the missing data features of the object to be evaluated.
The existing data features are obtained by processing existing data of the object to be evaluated.
S402, determining the existing grading value and the missing grading value of the object to be evaluated according to the existing data characteristics and the missing data characteristics of the object to be evaluated through a machine learning model.
Optionally, in the embodiment of the present application, the existing score value is a score value determined for an existing data feature, and the missing score value is a score value determined for a missing data feature. In this embodiment, the existing scoring values corresponding to the existing data features may be predicted according to the existing data features of the object to be evaluated through a feature scoring network of a machine learning model trained in advance, and the missing scoring values corresponding to the missing features may be predicted according to the missing data features of the object to be evaluated.
Optionally, in the embodiment of the present application, the scoring values may be predicted by using different scoring sub-networks for different types of data features, for example, for text data features, using a text scoring sub-network to predict the scoring values; for the characteristics of the relational data, a relational scoring sub-network is adopted to predict scoring values; for RFM class data characteristics, predicting a scoring value by adopting an RFM scoring sub-network; for the external data feature, the external scoring sub-network is adopted for scoring value prediction and the like. The scoring subnetworks may be combined according to the nature of the actual existing data features and missing data features to determine the feature scoring network that is ultimately required for the prediction of the existing scoring values and missing scoring values for this step.
Optionally, if the missing data features are obtained by fusion after WOE transformation, then in this step, when the missing score value of the object to be evaluated is determined from its missing data features through the feature scoring network of the machine learning model, the machine learning model (for example, its feature scoring network) may call a logistic regression sigmoid function to process the missing data features of the object to be evaluated and obtain the missing score value, which improves the accuracy of the missing score value.
S403, carrying out risk assessment on the object to be assessed according to the existing score value and the missing score value of the object to be assessed through the score card model.
Optionally, in the embodiment of the present application, the existing score value of the object to be evaluated and the missing score value determined in S402 may be input into a pre-trained score card model, where the score card model may analyze the input existing score value and missing score value based on an algorithm during training, and since the missing score value of the missing data is already retrieved at this time, accurate risk assessment may be performed on the object to be evaluated, so as to obtain a final risk assessment result.
According to the scheme of the embodiment of the application, if the risk assessment is detected to have a data missing event according to the existing data features of the object to be assessed, the existing data features are transformed to determine missing data features of the object to be assessed, the missing grading value and the existing grading value of the object to be assessed are determined according to the determined missing data features and the existing data features through a machine learning model, and then the missing grading value and the existing grading value are analyzed through a grading card model, so that the risk assessment of the object to be assessed is completed. According to the scheme, the machine learning model and the scoring card model are combined to perform risk assessment on the object to be assessed, and accuracy of risk assessment is improved. The risk assessment scheme is optimized.
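The logistic regression sigmoid step used in S402 to turn a missing data feature into a missing score value can be sketched as follows. The sketch assumes a single numeric missing data feature; the `coef` and `intercept` parameters are hypothetical placeholders for values a trained model would supply.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function mapping any real z into [0, 1]."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def missing_score(missing_feature, coef=1.0, intercept=0.0):
    """Map a fused missing data feature to a score value.

    coef and intercept are assumed to come from a fitted logistic regression.
    """
    return sigmoid(coef * missing_feature + intercept)


print(missing_score(0.0))   # a neutral feature maps to the midpoint 0.5
print(missing_score(2.0))   # larger feature values map to higher scores
```

Because the output is bounded, the missing score value is on the same scale regardless of how large the transformed feature is, which makes it straightforward to feed into the score card model alongside the existing score values.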
FIG. 5 is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application; the present embodiment is further optimized based on the above embodiments, and a specific description is given of how to determine whether a data loss event exists. As shown in fig. 5, the method specifically includes:
S501, determining an existing grading value of the object to be evaluated according to the existing data characteristics of the object to be evaluated through a machine learning model, and inputting the existing grading value into a grading card model.
Optionally, in the embodiment of the present application, after the existing data of the object to be evaluated is obtained, it is not yet known whether that data is complete. In this case, the existing data features may be determined from the obtained existing data, and the score value for the existing data features, that is, the existing score value of the object to be evaluated, may then be determined with the machine learning model described in the above embodiments. It should be noted that the existing score value may also be determined in other ways, for example by scoring the data features according to a preset scoring rule. Finally, the determined existing score value is input into the score card model. Because the score card model is highly interpretable, it can analyze the input existing score value and identify whether the score value is too high, too low, or missing: if the score value is too high, the output can be interpreted as passing the risk assessment; if it is too low, the output can be interpreted as failing the risk assessment; and if the score value is missing, the output can be interpreted as data missing.
S502, if the output result of the grading card model is data missing, detecting that a data missing event exists.
Optionally, if the output of the score card model after the existing score value is input in S501 is data missing, a data missing event is detected, and risk assessment is performed after the missing data features are recovered through the operations from S503 onward. If the output of the score card model in S501 is not data missing, that output is taken as the result of this risk assessment; that is, the risk assessment of the object to be evaluated is complete and the subsequent operations are not performed.
S503, if the data missing event is detected according to the existing data features of the object to be evaluated, transforming the existing data features to obtain missing data features of the object to be evaluated.
The existing data characteristics are obtained by processing the existing data of the object to be evaluated.
S504, performing risk assessment on the object to be assessed according to the existing data features and the missing data features of the object to be assessed.
According to the technical scheme, the existing grading value is determined according to the existing data characteristics of the object to be evaluated through the machine learning model and then is input into the grading card model, if the output result of the grading card model is data missing, the fact that the data missing event exists in the risk evaluation is detected is indicated, at the moment, the missing data characteristics of the object to be evaluated are determined through transformation of the existing data characteristics, and then the risk evaluation is carried out on the object to be evaluated according to the determined missing data characteristics and the existing data characteristics. According to the scheme, the interpretability of the scoring card model is utilized, the machine learning model and the scoring card model are combined to judge whether the data missing event exists, the accuracy of judging the data missing event is improved, the accuracy of the overall risk assessment result is further improved, and the risk assessment scheme is optimized.
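The control flow of S501 to S504 can be sketched as follows. Everything here is a toy stand-in under stated assumptions: the `score_card` rule (mean score against a threshold), the `Outcome` names, and the `recover_missing_score` callback, which collapses the transformation and scoring of missing data features into a single hypothetical function, are illustrative and not the trained models of the application.

```python
from enum import Enum


class Outcome(Enum):
    PASS = "pass"
    FAIL = "fail"
    DATA_MISSING = "data missing"


def score_card(scores, sources, threshold=0.5):
    """Toy score card: report missing scores, else compare the mean to a threshold."""
    if any(scores.get(s) is None for s in sources):
        return Outcome.DATA_MISSING
    mean = sum(scores[s] for s in sources) / len(sources)
    return Outcome.PASS if mean >= threshold else Outcome.FAIL


def assess(existing_scores, sources, recover_missing_score):
    """Run the score card once, and again only if data is reported missing."""
    first = score_card(existing_scores, sources)        # S501
    if first is not Outcome.DATA_MISSING:               # S502: no missing event
        return first
    filled = dict(existing_scores)
    for s in sources:                                   # S503: recover what is missing
        if filled.get(s) is None:
            filled[s] = recover_missing_score(s, existing_scores)
    return score_card(filled, sources)                  # S504: final assessment


sources = ["text", "social"]
print(assess({"text": 0.9, "social": None}, sources, lambda s, e: 0.7))
```

The recovery path runs only when the first pass reports data missing, so objects with complete data are assessed in a single pass.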
FIG. 6A is a flow chart of another risk assessment method provided in accordance with an embodiment of the present application; FIG. 6B is an architecture diagram of a risk assessment system provided in accordance with an embodiment of the present application; the embodiment is further optimized based on the above embodiments, and a description of a preferred example of risk assessment is given. As shown in fig. 6A-6B, the method specifically includes:
S601, data mining is carried out on existing data of an object to be evaluated, and an edge relation of at least one dimension contained in the existing data is obtained.
Optionally, as shown in fig. 6B, the text data may be obtained from a short message and an address data source in this step; acquiring social data from an operator and an address book data source; obtaining consumption data from a credit card bill and an electronic commerce data source; external data are acquired from a third-party credit investigation and blacklist library data source; and acquiring other data from the equipment and the terminal behavior data source, and taking the acquired text data, social data, consumption data, external data and other data as the existing data of the object to be evaluated. Then inputting the obtained existing data into a data mining layer in a feature extraction network of a machine learning model, wherein the data mining layer adopts different mining algorithms to carry out big data mining on the obtained existing data, for example, adopts a natural language processing (Natural Language Processing, NLP) algorithm to carry out data mining on the obtained text data, so as to obtain the edge relation of at least one dimension contained in the text data; adopting a graph algorithm to conduct data mining on the obtained social data to obtain an edge relation of at least one dimension contained in the social data; performing data mining on the acquired consumption data by adopting an RFM aggregation algorithm to obtain an edge relation of at least one dimension contained in the consumption data; performing data mining on the acquired external data by adopting a deep learning algorithm to obtain an edge relation of at least one dimension contained in the external data; and adopting other algorithms to perform data mining on the obtained other data to obtain the edge relation of at least one dimension contained in the other data.
S602, optimizing and verifying the side relation of at least one dimension.
Optionally, optimization and verification processing are performed on the edge relations of all dimensions obtained in the step S601, so that coverage rate and accuracy of the edge relations are improved.
S603, determining initial characteristics of the object to be evaluated according to the online characteristic table and the edge relation of at least one dimension.
Optionally, as shown in fig. 6B, the step may be that the edge relationship after optimization verification is input to a first feature extraction layer in a feature extraction network of the machine learning model, where the first feature extraction layer extracts initial features of the object to be evaluated by using different algorithms based on different types of edge relationships; if a text feature extraction algorithm is adopted, determining the text feature as a part of initial features of the object to be evaluated according to the on-line feature table and the edge relation of the text type of at least one dimension; adopting a relation feature extraction algorithm, and determining relation features as a part of initial features of the object to be evaluated according to the on-line feature table and the side relation of the relation type of at least one dimension; if an RFM feature extraction algorithm is adopted, determining the consumption feature as a part of initial features of the object to be evaluated according to the online feature table and the edge relation of the consumption type of at least one dimension; adopting an external feature extraction algorithm, and determining external features as a part of initial features of the object to be evaluated according to the on-line feature table and the edge relation of the external feature type of at least one dimension; and adopting other feature extraction algorithms to determine other features as part of initial features of the object to be evaluated according to the online feature table and other types of edge relations of at least one dimension.
S604, calling a preset model algorithm to extract the existing data features of the object to be evaluated from the initial features.
Optionally, as shown in fig. 6B, after extracting the initial features of the object to be evaluated, the first feature extraction layer inputs them into a second feature extraction layer of the feature extraction network, and the second feature extraction layer performs more abstract feature extraction on the basis of the first feature extraction layer, so that the extracted features are more accurate. Specifically, the LR algorithm is called to extract more accurate text features from the text features in the initial features as part of the existing data features of the object to be evaluated; the XGBoost algorithm is called to extract more accurate relationship features from the relationship features in the initial features as part of the existing data features of the object to be evaluated; the RF algorithm is called to extract more accurate consumption features from the consumption features in the initial features as part of the existing data features of the object to be evaluated; the GBDT algorithm (that is, the optimized GBDT algorithm introduced in the above embodiment) is called to extract more accurate external features from the external features in the initial features as part of the existing data features of the object to be evaluated; and other algorithms are called to extract more accurate other features from the other features in the initial features as part of the existing data features of the object to be evaluated.
S605, determining the existing grading value of the object to be evaluated according to the existing data characteristics of the object to be evaluated through a machine learning model, and inputting the existing grading value into a grading card model.
Optionally, as shown in fig. 6B, the existing data features of the object to be evaluated extracted by the second feature extraction layer of the feature extraction network of the machine learning model are input into the feature scoring network of the machine learning model. The feature scoring network is composed of a plurality of different scoring sub-networks, and the different scoring sub-networks score different types of existing data features: the text scoring sub-network scores the accurate text features in the existing data features; the relationship scoring sub-network scores the accurate relationship features; the RFM scoring sub-network scores the accurate consumption features; the external scoring sub-network scores the accurate external features; and other scoring sub-networks score the other accurate features. The scoring results output by each scoring sub-network in the feature scoring network are then input into the scoring card model.
S606, judging whether the output result of the grading card model is data missing, if so, executing S607, and if not, executing S613.
Optionally, the scoring card model analyzes the input existing scoring values to determine whether data is missing, if so, the output result is data missing, and at this time, the operation of S607 is performed, and a data missing event is detected. If not, outputting an evaluation result of the risk evaluation, and executing the operation of S613 to obtain a final evaluation result of the object to be evaluated.
S607, if the output result of the grading card model is data missing, detecting that the data missing event exists.
S608, performing evidence weight WOE transformation on the existing data features to obtain transformation features.
S609, a weight value of the transformation feature is determined.
S610, determining missing data features of the object to be evaluated according to the transformation features and the weight values of the transformation features.
S611, calling a logistic regression sigmoid function through a machine learning model to process missing data characteristics of the object to be evaluated, and obtaining a missing grading value of the object to be evaluated.
Optionally, the operations of S608 to S611 of the present embodiment may be performed by the feature extraction network and the feature scoring network of the machine learning model in fig. 6B, or may be performed by a network added to the machine learning model, which is not limited in this embodiment.
And S612, carrying out risk assessment on the object to be assessed according to the existing score value and the missing score value of the object to be assessed through the score card model.
Optionally, as shown in fig. 6B, after the missing score value of the object to be evaluated is obtained, the missing score value is also input into the score card model, and the score card model performs risk evaluation on the object to be evaluated again based on the existing score value determined and input in S605 and the missing score value determined and input in S611, so as to obtain a final risk evaluation result.
S613, obtaining a final evaluation result of the object to be evaluated.
In an embodiment of the present application, the existing data of the object to be evaluated includes: at least one of text data, social data, consumption data, and external data; and performing an operation of determining the characteristics of the existing data on each of the existing data through a machine learning model.
According to the technical scheme of the embodiment of the present application, the scoring card model is used for the final risk assessment. Because the scoring card model is interpretable, interpretability at data-source granularity is retained, and the cause can be located and fed back as data missing. Because the machine learning model performs the operation of determining existing data features on each kind of existing data, the barrier of manually constructing features is lowered and the accuracy of the determined existing data features is improved. Using the WOE transformation solves the problem of missing data and improves the accuracy of the determined missing data features. Data from different data sources undergoes data mining, feature extraction, feature scoring and so on independently, and these operations can run in parallel, which improves processing efficiency, ensures that every data source dimension contributes features to the main scoring card model, and thereby ensures the diversity of the scoring card model's input features. In addition, the structure of each network layer of the machine learning model in the embodiment of the present application is flexible: layers can be combined according to the data sources to be processed and suitable algorithms called, so extensibility and flexibility are high.
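The WOE-based recovery in S608 to S610 can be sketched as follows. This is a minimal illustration under stated assumptions: WOE is computed per bin as the log ratio of the bin's share of good samples to its share of bad samples (the opposite sign convention is also in use), additive smoothing avoids division by zero, and the weighted sum in `fuse` is an assumed way of combining transformation features with their weight values. All function and parameter names are hypothetical.

```python
import math
from collections import Counter


def woe_table(bins, labels, eps=0.5):
    """Weight of evidence per bin.

    bins: bin identifier of each sample; labels: 1 for bad, 0 for good.
    Returns {bin: ln(share of goods in bin / share of bads in bin)}.
    """
    good = Counter(b for b, y in zip(bins, labels) if y == 0)
    bad = Counter(b for b, y in zip(bins, labels) if y == 1)
    keys = set(bins)
    good_total = sum(good[k] + eps for k in keys)   # smoothed totals
    bad_total = sum(bad[k] + eps for k in keys)
    return {
        k: math.log(((good[k] + eps) / good_total) / ((bad[k] + eps) / bad_total))
        for k in keys
    }


def fuse(transformed, weights):
    """Combine WOE-transformed features into one missing data feature (weighted sum)."""
    return sum(w * v for w, v in zip(weights, transformed))


bins = ["a"] * 4 + ["b"] * 4
labels = [0, 0, 0, 1, 1, 1, 1, 0]       # bin "a" is mostly good, "b" mostly bad
table = woe_table(bins, labels)
print(table)
print(fuse([table["a"], table["b"]], [0.5, 0.5]))
```

The fused value would then be passed to the sigmoid step of S611 to obtain the missing score value.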
Fig. 7 is a schematic structural diagram of a risk assessment device according to an embodiment of the present application; the method and the device are suitable for risk assessment of the object to be assessed. For example, it may be the case that a loan user is risk-assessed based on his existing data at the time of approval of the loan. The apparatus may implement the risk assessment method according to any embodiment of the present application, the apparatus may be integrated in an electronic device, and the apparatus 700 includes:
the missing feature determining module 701 is configured to, if a data missing event is detected according to an existing data feature of an object to be evaluated, transform the existing data feature to obtain a missing data feature of the object to be evaluated; the existing data characteristics are obtained by processing existing data of the object to be evaluated;
the risk assessment module 702 is configured to perform risk assessment on the object to be assessed according to the existing data features and the missing data features of the object to be assessed.
According to the technical scheme of the embodiment of the present application, when risk assessment is performed according to the existing data features of the object to be evaluated and a data missing event is detected, the existing data features are transformed to determine the missing data features of the object to be evaluated, and risk assessment is then performed on the object to be evaluated according to the determined missing data features and the existing data features. With this scheme, when a data missing event occurs during risk assessment, the missing data is not simply ignored; instead, risk assessment is performed after the existing data features have been transformed to accurately recover the features of the missing data. This solves the problem of inaccurate risk assessment results when the existing data of the object to be evaluated is incomplete, greatly improves the accuracy of risk assessment, and optimizes the risk assessment scheme.
Optionally, the missing feature determining module 701 includes:
the feature transformation unit is used for carrying out evidence weight WOE transformation on the existing data features to obtain transformation features;
a weight determining unit for determining a weight value of the transformation feature;
and the missing feature determining unit is used for determining missing data features of the object to be evaluated according to the transformation features and the weight values of the transformation features.
Optionally, the apparatus further includes:
the edge relationship mining module is used for performing data mining on the existing data of the object to be evaluated to obtain the edge relationship of at least one dimension contained in the existing data;
the initial feature determining module is used for determining initial features of the object to be evaluated according to the online feature table and the edge relation of the at least one dimension;
and the existing feature determining module is used for calling a preset model algorithm to extract the existing data features of the object to be evaluated from the initial features.
Optionally, the apparatus further includes:
and the optimization verification module is used for optimizing and/or verifying the edge relation of the at least one dimension.
Optionally, the preset model algorithm includes a GBDT algorithm; the target loss function during the GBDT algorithm training is determined through the second-order Taylor expansion result of the original loss function of the GBDT algorithm.
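This passage does not spell out the expansion. One common form, used by XGBoost-style boosting and assumed here purely for illustration, approximates the loss at boosting round t around the previous prediction; the regularization term Ω and the symbols g_i and h_i are assumptions of this sketch rather than notation from the application:

```latex
\mathcal{L}^{(t)}
  = \sum_{i=1}^{n} l\bigl(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\bigr) + \Omega(f_t)
  \approx \sum_{i=1}^{n} \Bigl[ l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)
      + g_i\, f_t(x_i) + \tfrac{1}{2}\, h_i\, f_t(x_i)^2 \Bigr] + \Omega(f_t),
\qquad
g_i = \frac{\partial\, l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial\, \hat{y}_i^{(t-1)}},
\quad
h_i = \frac{\partial^2 l\bigl(y_i, \hat{y}_i^{(t-1)}\bigr)}{\partial\, \bigl(\hat{y}_i^{(t-1)}\bigr)^2}
```

Here f_t is the tree added at round t. The first term inside the brackets does not depend on f_t, so under this assumption the tree is fitted using only the first and second derivatives g_i and h_i of the original loss.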
Optionally, the risk assessment module 702 includes:
the scoring value determining unit is used for determining the existing scoring value and the missing scoring value of the object to be evaluated according to the existing data characteristics and the missing data characteristics of the object to be evaluated through a machine learning model;
and the risk assessment unit is used for carrying out risk assessment on the object to be assessed according to the existing score value and the missing score value of the object to be assessed through the score card model.
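How the score card model combines the two score values is not detailed in this passage. The following is a minimal sketch under stated assumptions: each score value is treated as a probability of a bad outcome, the values are combined as a weighted sum of log odds, and the result is mapped to points with the widely used scaling points = offset + factor × ln(odds). The function name, the weights and the defaults (base score 600, base odds 50, 20 points to double the odds) are all hypothetical.

```python
import math

def scorecard_points(score_values, weights, base_score=600.0, base_odds=50.0, pdo=20.0):
    """Map score values (assumed bad-outcome probabilities in (0, 1)) to scorecard points.

    Uses the common scaling points = offset + factor * ln(good:bad odds),
    where the odds double every `pdo` points. All defaults are hypothetical.
    """
    factor = pdo / math.log(2)
    offset = base_score - factor * math.log(base_odds)
    # Weighted sum of the log good:bad odds implied by each score value.
    log_odds = sum(w * math.log((1.0 - p) / p) for p, w in zip(score_values, weights))
    return offset + factor * log_odds

# Hypothetical existing score value 0.1 and missing score value 0.3.
points = scorecard_points([0.1, 0.3], weights=[0.7, 0.3])
```

With this scaling a lower probability of a bad outcome yields more points, and each term of the sum shows how much one score value contributed, which is the kind of interpretability usually attributed to scorecards.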
Optionally, the scoring value determining unit is specifically configured to, when determining, by using a machine learning model, a missing scoring value of the object to be evaluated according to missing data features of the object to be evaluated:
and calling a logistic regression sigmoid function through the machine learning model to process the missing data characteristics of the object to be evaluated, so as to obtain the missing grading value of the object to be evaluated.
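As a minimal sketch of this step, assuming the missing score value is the logistic (sigmoid) function applied to a linear combination of the missing data features; the coefficients, the intercept and the function names are hypothetical and would come from a trained model in practice:

```python
import math

def sigmoid(z):
    """Numerically stable logistic function 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def missing_score(missing_features, coefficients, intercept=0.0):
    """Logistic-regression score for the missing data features (hypothetical coefficients)."""
    z = intercept + sum(c * x for c, x in zip(coefficients, missing_features))
    return sigmoid(z)

# Hypothetical missing data features and model parameters.
score = missing_score([1.2, -0.4], coefficients=[0.8, 0.5], intercept=-0.1)
```

The result always lies between 0 and 1, so it can be passed on as a score value.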
Optionally, the risk assessment module 702 further includes:
the scoring value determining unit is used for: determining an existing grading value of the object to be evaluated according to the existing data characteristics of the object to be evaluated through a machine learning model, and inputting the existing grading value into a grading card model;
and a missing event detection module, configured to detect that a data missing event exists if the output result of the scoring card model indicates that data is missing.
Optionally, the existing data of the object to be evaluated includes at least one of text data, social data, consumption data, and external data; the operation of determining existing data features is performed on each type of existing data through a machine learning model.
According to embodiments of the present application, an electronic device and a readable storage medium are also provided.
Fig. 8 is a block diagram of an electronic device for implementing the risk assessment method according to an embodiment of the present application. Electronic devices are intended to represent various forms of digital computers, such as laptops, desktops, workstations, personal digital assistants, servers, blade servers, mainframes, and other appropriate computers. The electronic device may also represent various forms of mobile devices, such as personal digital assistants, cellular telephones, smartphones, wearable devices, and other similar computing devices. The components shown herein, their connections and relationships, and their functions, are meant to be exemplary only, and are not meant to limit implementations of the application described and/or claimed herein.
As shown in fig. 8, the electronic device includes: one or more processors 801, memory 802, and interfaces for connecting the components, including high-speed interfaces and low-speed interfaces. The various components are interconnected using different buses and may be mounted on a common motherboard or in other manners as desired. The processor may process instructions executed within the electronic device, including instructions stored in or on the memory to display graphical information of a GUI on an external input/output device, such as a display device coupled to the interface. In other embodiments, multiple processors and/or multiple buses may be used, if desired, along with multiple memories. Also, multiple electronic devices may be connected, each providing a portion of the necessary operations (e.g., as a server array, a set of blade servers, or a multiprocessor system). One processor 801 is illustrated in fig. 8.
Memory 802 is a non-transitory computer-readable storage medium provided herein. Wherein the memory stores instructions executable by the at least one processor to cause the at least one processor to perform the risk assessment method provided herein. The non-transitory computer readable storage medium of the present application stores computer instructions for causing a computer to perform the risk assessment method provided by the present application.
The memory 802 is used as a non-transitory computer readable storage medium, and may be used to store non-transitory software programs, non-transitory computer executable programs, and modules, such as program instructions/modules (e.g., the missing feature determination module 701 and the risk assessment module 702 shown in fig. 7) corresponding to the risk assessment method in the embodiments of the present application. The processor 801 executes various functional applications of the server and data processing, i.e., implements the risk assessment method in the above-described method embodiments, by running non-transitory software programs, instructions, and modules stored in the memory 802.
Memory 802 may include a program storage area and a data storage area. The program storage area may store an operating system and an application program required by at least one function; the data storage area may store data created through use of the electronic device for the risk assessment method, and the like. In addition, memory 802 may include high-speed random access memory, and may also include non-transitory memory, such as at least one magnetic disk storage device, flash memory device, or other non-transitory solid-state storage device. In some embodiments, memory 802 may optionally include memory located remotely from processor 801, which may be connected to the electronic device for the risk assessment method via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The electronic device for the risk assessment method may further include an input device 803 and an output device 804. The processor 801, the memory 802, the input device 803, and the output device 804 may be connected by a bus or in other manners; connection by a bus is taken as the example in fig. 8.
The input device 803 may receive input numeric or character information and generate key signal inputs related to user settings and function control of the electronic device for the risk assessment method. Examples of such input devices include a touch screen, a keypad, a mouse, a track pad, a touch pad, a pointing stick, one or more mouse buttons, a track ball, and a joystick. The output device 804 may include a display device, auxiliary lighting devices (e.g., LEDs), haptic feedback devices (e.g., vibration motors), and the like. The display device may include, but is not limited to, a Liquid Crystal Display (LCD), a Light Emitting Diode (LED) display, and a plasma display. In some implementations, the display device may be a touch screen.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, application-specific integrated circuits (ASICs), computer hardware, firmware, software, and/or combinations thereof. These various implementations may include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be a special-purpose or general-purpose programmable processor, and which may receive data and instructions from, and transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also referred to as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic disks, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to a user; and a keyboard and pointing device (e.g., a mouse or trackball) by which a user can provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user may be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic input, speech input, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: Local Area Networks (LANs), Wide Area Networks (WANs), and the internet.
The computer system may include a client and a server. The client and server are typically remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other. The server may be a cloud server, also called a cloud computing server or cloud host, which is a host product in a cloud computing service system and overcomes the high management difficulty and weak service scalability of traditional physical hosts and virtual private server (VPS) services.
According to the technical solution of the embodiments of the present application, a scoring card model is used for the final risk assessment. Because the scoring card model is interpretable, the embodiments retain interpretability at data-source granularity, so that the cause of a result can be located and feedback on missing data can be given. Because a machine learning model performs the operation of determining existing data features for each type of existing data, the barrier to manual feature construction is lowered and the accuracy of the determined existing data features is improved. The missing data problem is handled by WOE transformation, which improves the accuracy of the determined missing data features. Data from different data sources undergoes data mining, feature extraction, feature scoring, and so on independently, and these operations can run in parallel, which improves processing efficiency, ensures that every data source dimension contributes features to the main scoring card model, and thereby ensures the diversity of the scoring card model's input features. In addition, the structure of each network layer of the machine learning model in the embodiments is flexible: layers can be combined according to the data sources to be processed and a suitable algorithm can be called for each, giving high extensibility and flexibility.
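The independent, parallel handling of data sources described above can be sketched as follows. This is only an illustration: the per-source pipeline is a hypothetical placeholder (the share of non-empty records stands in for mining, feature extraction and scoring), and the use of a thread pool and the name `score_all_sources` are assumptions, not details from the application.

```python
from concurrent.futures import ThreadPoolExecutor

def score_source(item):
    """Placeholder for mining, feature extraction and scoring of one data source."""
    name, records = item
    if not records:
        # No data for this source: report None so the caller can treat it as missing.
        return name, None
    # Hypothetical stand-in: the share of non-empty records serves as the score.
    present = sum(1 for r in records if r)
    return name, present / len(records)

def score_all_sources(sources):
    """Run the per-source pipeline for every data source concurrently."""
    with ThreadPoolExecutor() as pool:
        return dict(pool.map(score_source, sources.items()))

# Hypothetical existing data grouped by source.
scores = score_all_sources({"text": ["a", ""], "social": []})
```

Each source produces its own entry, so a source with no data shows up explicitly as `None` rather than being silently dropped.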
It should be appreciated that steps may be reordered, added, or deleted using the various forms of flow shown above. For example, the steps described in the present application may be performed in parallel, sequentially, or in a different order, provided that the desired results of the technical solutions disclosed in the present application can be achieved; no limitation is imposed herein.
The above embodiments do not limit the scope of the application. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives are possible, depending on design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present application are intended to be included within the scope of the present application.

Claims (18)

1. A risk assessment method, comprising:
if a data missing event is detected for the existing data features of the object to be evaluated, performing weight of evidence (WOE) transformation on the existing data features to obtain transformation features;
determining a weight value of the transformation feature;
determining missing data features of the object to be evaluated according to the transformation features and the weight values of the transformation features; the existing data characteristics are obtained by processing existing data of the object to be evaluated; the existing data of the object to be evaluated comprises text data;
and carrying out risk assessment on the object to be assessed according to the existing data characteristics and the missing data characteristics of the object to be assessed.
2. The method of claim 1, further comprising:
performing data mining on the existing data of the object to be evaluated to obtain an edge relation of at least one dimension contained in the existing data;
determining initial characteristics of the object to be evaluated according to the online characteristic table and the edge relation of the at least one dimension;
and calling a preset model algorithm to extract the existing data characteristics of the object to be evaluated from the initial characteristics.
3. The method of claim 2, further comprising, after obtaining the edge relation of at least one dimension contained in the existing data:
optimizing and/or verifying the edge relation of the at least one dimension.
4. The method of claim 2, wherein the pre-set model algorithm comprises a GBDT algorithm; the target loss function during the GBDT algorithm training is determined through the second-order Taylor expansion result of the original loss function of the GBDT algorithm.
5. The method of claim 1, wherein the risk assessment of the object to be assessed based on the existing data features and the missing data features of the object to be assessed comprises:
determining the existing grading value and the missing grading value of the object to be evaluated according to the existing data characteristics and the missing data characteristics of the object to be evaluated through a machine learning model;
and carrying out risk assessment on the object to be assessed according to the existing score value and the missing score value of the object to be assessed through a score card model.
6. The method of claim 5, wherein the determining, by a machine learning model, a missing score value for the object under evaluation from missing data features of the object under evaluation comprises:
and calling a logistic regression sigmoid function through the machine learning model to process the missing data characteristics of the object to be evaluated, so as to obtain the missing grading value of the object to be evaluated.
7. The method of claim 1, further comprising:
determining an existing grading value of the object to be evaluated according to the existing data characteristics of the object to be evaluated through a machine learning model, and inputting the existing grading value into a grading card model;
and if the output result of the scoring card model is data missing, detecting that a data missing event exists.
8. The method of any of claims 1-7, wherein the existing data of the object under evaluation further comprises: at least one of social data, consumption data, and external data; and performing an operation of determining the characteristics of the existing data on each of the existing data through a machine learning model.
9. A risk assessment apparatus comprising:
a missing feature determination module comprising:
a feature transformation unit, configured to perform weight of evidence (WOE) transformation on the existing data features to obtain transformation features if a data missing event is detected for the existing data features of the object to be evaluated;
a weight determining unit for determining a weight value of the transformation feature;
the missing feature determining unit is used for determining missing data features of the object to be evaluated according to the transformation features and the weight values of the transformation features;
the existing data characteristics are obtained by processing existing data of the object to be evaluated; the existing data of the object to be evaluated comprises text data;
and the risk assessment module is used for carrying out risk assessment on the object to be assessed according to the existing data characteristics and the missing data characteristics of the object to be assessed.
10. The apparatus of claim 9, further comprising:
an edge relation mining module, configured to perform data mining on the existing data of the object to be evaluated to obtain an edge relation of at least one dimension contained in the existing data;
the initial feature determining module is used for determining initial features of the object to be evaluated according to the online feature table and the edge relation of the at least one dimension;
and the existing feature determining module is used for calling a preset model algorithm to extract the existing data features of the object to be evaluated from the initial features.
11. The apparatus of claim 10, further comprising:
and the optimization verification module is used for optimizing and/or verifying the edge relation of the at least one dimension.
12. The apparatus of claim 10, wherein the pre-set model algorithm comprises a GBDT algorithm; the target loss function during the GBDT algorithm training is determined through the second-order Taylor expansion result of the original loss function of the GBDT algorithm.
13. The apparatus of claim 9, wherein the risk assessment module comprises:
the scoring value determining unit is used for determining the existing scoring value and the missing scoring value of the object to be evaluated according to the existing data characteristics and the missing data characteristics of the object to be evaluated through a machine learning model;
and the risk assessment unit is used for carrying out risk assessment on the object to be assessed according to the existing score value and the missing score value of the object to be assessed through the score card model.
14. The apparatus according to claim 13, wherein the scoring value determining unit is configured to, when determining the missing scoring value of the object to be evaluated according to the missing data feature of the object to be evaluated by a machine learning model:
and calling a logistic regression sigmoid function through the machine learning model to process the missing data characteristics of the object to be evaluated, so as to obtain the missing grading value of the object to be evaluated.
15. The apparatus of claim 9, wherein the risk assessment module further comprises:
the scoring value determining unit is used for: determining an existing grading value of the object to be evaluated according to the existing data characteristics of the object to be evaluated through a machine learning model, and inputting the existing grading value into a grading card model;
and a missing event detection module, configured to detect that a data missing event exists if the output result of the scoring card model indicates that data is missing.
16. The apparatus of any of claims 9-15, wherein the existing data of the object under evaluation further comprises: at least one of social data, consumption data, and external data; and performing an operation of determining the characteristics of the existing data on each of the existing data through a machine learning model.
17. An electronic device, comprising:
at least one processor; and
a memory communicatively coupled to the at least one processor; wherein,
the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the risk assessment method of any one of claims 1-8.
18. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the risk assessment method of any one of claims 1-8.
CN202010596449.4A 2020-06-28 2020-06-28 Risk assessment method, apparatus, device and storage medium Active CN111797994B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010596449.4A CN111797994B (en) 2020-06-28 2020-06-28 Risk assessment method, apparatus, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010596449.4A CN111797994B (en) 2020-06-28 2020-06-28 Risk assessment method, apparatus, device and storage medium

Publications (2)

Publication Number Publication Date
CN111797994A CN111797994A (en) 2020-10-20
CN111797994B true CN111797994B (en) 2024-04-05

Family

ID=72803194

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010596449.4A Active CN111797994B (en) 2020-06-28 2020-06-28 Risk assessment method, apparatus, device and storage medium

Country Status (1)

Country Link
CN (1) CN111797994B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107633455A (en) * 2017-09-04 2018-01-26 深圳市华傲数据技术有限公司 Credit estimation method and device based on data model
CN107633030A (en) * 2017-09-04 2018-01-26 深圳市华傲数据技术有限公司 Credit estimation method and device based on data model
CN109360084A (en) * 2018-09-27 2019-02-19 平安科技(深圳)有限公司 Appraisal procedure and device, storage medium, the computer equipment of reference default risk
WO2019237523A1 (en) * 2018-06-11 2019-12-19 平安科技(深圳)有限公司 Safety risk evaluation method and apparatus, computer device, and storage medium
CN110706095A (en) * 2019-09-30 2020-01-17 四川新网银行股份有限公司 Target node key information filling method and system based on associated network
CN111080397A (en) * 2019-11-18 2020-04-28 支付宝(杭州)信息技术有限公司 Credit evaluation method and device and electronic equipment

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11386435B2 (en) * 2017-04-03 2022-07-12 The Dun And Bradstreet Corporation System and method for global third party intermediary identification system with anti-bribery and anti-corruption risk assessment


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
基于数据可视化Rattle的个人信用风险评价建模;赵海鹏;李丹;;金融管理研究(02);全文 *

Also Published As

Publication number Publication date
CN111797994A (en) 2020-10-20

Similar Documents

Publication Publication Date Title
US11397772B2 (en) Information search method, apparatus, and system
US10504029B2 (en) Personalized predictive models
JP2012118977A (en) Method and system for machine-learning based optimization and customization of document similarity calculation
CN111667056B (en) Method and apparatus for searching model structures
WO2017127325A1 (en) Dynamically optimizing user engagement
CN112559870B (en) Multi-model fusion method, device, electronic equipment and storage medium
CN107832035B (en) Voice input method of intelligent terminal
CN111311030B (en) User credit risk prediction method and device based on influence factor detection
CN112380104A (en) User attribute identification method and device, electronic equipment and storage medium
CN113780098A (en) Character recognition method, character recognition device, electronic equipment and storage medium
CN112949973A (en) AI-combined robot process automation RPA process generation method
CN112328909A (en) Information recommendation method and device, computer equipment and medium
US20210192554A1 (en) Method, apparatus, device and storage medium for judging permanent area change
CN111291192B (en) Method and device for calculating triplet confidence in knowledge graph
CN112989170A (en) Keyword matching method applied to information search, information search method and device
CN111797994B (en) Risk assessment method, apparatus, device and storage medium
US11601509B1 (en) Systems and methods for identifying entities between networks
CN112614479B (en) Training data processing method and device and electronic equipment
CN114881521A (en) Service evaluation method, device, electronic equipment and storage medium
CN111340222B (en) Neural network model searching method and device and electronic equipment
CN113722593A (en) Event data processing method and device, electronic equipment and medium
CN114281990A (en) Document classification method and device, electronic equipment and medium
CN113850072A (en) Text emotion analysis method, emotion analysis model training method, device, equipment and medium
CN113989562A (en) Model training and image classification method and device
CN111563591A (en) Training method and device for hyper network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant