CN110751400B - Risk assessment method and device - Google Patents

Risk assessment method and device

Publication number
CN110751400B
Authority
CN
China
Prior art keywords
risk assessment
sample
group
model
weak correlation
Legal status
Active
Application number
CN201911006993.2A
Other languages
Chinese (zh)
Other versions
CN110751400A (en)
Inventor
马子俊
Current Assignee
Puxin Hengye Technology Development Beijing Co ltd
Yiren Hengye Technology Development Beijing Co ltd
Original Assignee
Puxin Hengye Technology Development Beijing Co ltd
Yiren Hengye Technology Development Beijing Co ltd
Application filed by Puxin Hengye Technology Development Beijing Co ltd and Yiren Hengye Technology Development Beijing Co ltd
Priority to CN201911006993.2A
Publication of CN110751400A
Application granted
Publication of CN110751400B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Human Resources & Organizations (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Evolutionary Computation (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Tourism & Hospitality (AREA)
  • Probability & Statistics with Applications (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides a risk assessment method and device. The method comprises: grouping data sources according to the risk information amount of the data to obtain a strong correlation variable group comprising strong correlation variables and a weak correlation variable group comprising weak correlation variables; constructing a first risk assessment model according to the weak correlation variable group; performing predictive probability classification, with the first risk assessment model, on the full-scale samples containing only the weak correlation variables, to obtain the sample group with the highest negative sample proportion; constructing a second risk assessment model according to that sample group and the strong correlation variable group; and performing risk assessment using the first and second risk assessment models. The method addresses the low model prediction efficiency caused by imbalanced positive and negative samples and improves the prediction efficiency of the risk assessment model.

Description

Risk assessment method and device
Technical Field
The invention relates to the technical field of risk control, and in particular to a risk assessment method and a risk assessment device.
Background
Risk assessment quantifies risk and is a key technique in risk management. At present, risk assessment is generally performed by building a model, and model building mainly involves data extraction, feature generation, feature selection, model generation, and reasonableness evaluation.
As data source channels become richer, more and more data fields can serve as risk feature variables. Because not every risk feature field holds a valid value in every sample, missing values are unavoidable, and missingness grows more severe as the number of feature fields increases.
When the data are sparse overall, that is, when many risk feature fields have missing values, performing feature selection and subsequent modeling with conventional methods yields a model with low prediction efficiency, and risk assessments made with that model have low accuracy.
Disclosure of Invention
In view of this, the present invention provides a risk assessment method and apparatus to improve the prediction efficiency of the model.
To achieve the above purpose, the invention provides the following technical solutions:
A risk assessment method, comprising:
grouping data sources according to the risk information quantity of the data to obtain a strong correlation variable group comprising strong correlation variables and a weak correlation variable group comprising weak correlation variables;
constructing a first risk assessment model according to the weak correlation variable group;
performing predictive probability classification on the full-scale samples only containing the weak correlation variables by using the first risk assessment model to obtain a sample group with the highest negative sample proportion;
constructing a second risk assessment model according to the sample group with the highest negative sample proportion and the strong correlation variable group;
and performing risk assessment by using the first risk assessment model and the second risk assessment model.
Optionally, before constructing the first risk assessment model according to the weak correlation variable group, the method further includes:
performing noise reduction on the strong correlation variable group and the weak correlation variable group respectively.
Optionally, performing predictive probability classification on the full-scale samples containing only the weak correlation variables by using the first risk assessment model, to obtain the sample group with the highest negative sample proportion, includes:
performing predictive probability classification on the full-scale samples containing only the weak correlation variables by using the first risk assessment model, to obtain the probability that each of these samples is a negative sample;
and dividing the full-scale samples containing only the weak correlation variables into a sample group with the highest negative sample proportion and a sample group with the lowest negative sample proportion, according to a preset division point and the probability that each sample is a negative sample.
Optionally, the method further includes:
calculating the optimal value of the division point with a preset optimization algorithm, taking the highest prediction accuracy on positive and negative samples as the optimization objective.
Optionally, performing risk assessment by using the first risk assessment model and the second risk assessment model includes:
performing risk assessment by using the first risk assessment model to obtain a first risk assessment value;
performing risk assessment by using the second risk assessment model to obtain a second risk assessment value;
determining a maximum of the first risk assessment value and the second risk assessment value as a final risk assessment value.
A risk assessment device comprising:
the variable group dividing unit is used for grouping the data sources according to the risk information amount of the data to obtain a strong correlation variable group comprising strong correlation variables and a weak correlation variable group comprising weak correlation variables;
the first model building unit is used for building a first risk assessment model according to the weak correlation variable group;
the probability classification unit is used for performing prediction probability classification on the full-scale samples only containing the weak correlation variables by using the first risk assessment model to obtain a sample group with the highest negative sample proportion;
the second model building unit is used for building a second risk assessment model according to the sample group with the highest negative sample proportion and the strong correlation variable group;
and the risk assessment unit is used for performing risk assessment by using the first risk assessment model and the second risk assessment model.
Optionally, the device further comprises:
a noise reduction processing unit, used for performing noise reduction on the strong correlation variable group and the weak correlation variable group respectively.
Optionally, the probability classification unit is specifically configured to:
perform predictive probability classification on the full-scale samples containing only the weak correlation variables by using the first risk assessment model, to obtain the probability that each of these samples is a negative sample;
and divide the full-scale samples containing only the weak correlation variables into a sample group with the highest negative sample proportion and a sample group with the lowest negative sample proportion, according to a preset division point and the probability that each sample is a negative sample.
Optionally, the device further comprises:
a division point setting unit, used for calculating the optimal value of the division point with a preset optimization algorithm, taking the highest prediction accuracy on positive and negative samples as the optimization objective.
Optionally, the risk assessment unit is specifically configured to:
perform risk assessment by using the first risk assessment model to obtain a first risk assessment value;
perform risk assessment by using the second risk assessment model to obtain a second risk assessment value;
and determine the maximum of the first risk assessment value and the second risk assessment value as the final risk assessment value.
Compared with the prior art, the invention has the following beneficial effects:
The invention discloses a risk assessment method. First, data sources are grouped according to the risk information amount of the data to obtain a strong correlation variable group and a weak correlation variable group. Then, a first risk assessment model is constructed according to the weak correlation variable group and used to perform predictive probability classification on the full-scale samples containing only the weak correlation variables, yielding the sample group with the highest negative sample proportion. Next, a second risk assessment model is constructed according to that sample group and the strong correlation variable group; because these training data contain fewer missing values, the resulting second model predicts more efficiently. Finally, risk assessment is performed with both models, which improves the accuracy of the risk assessment.
Drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings used in describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person skilled in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flow chart of a risk assessment method according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a risk assessment device according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
This embodiment provides a risk assessment method that addresses sparse variables and imbalanced positive and negative samples when constructing a risk assessment model, and that is applicable to risk assessment scenarios such as loan risk assessment. Referring to Fig. 1, the risk assessment method specifically includes the following steps:
S101: grouping data sources according to the risk information amount of the data to obtain a strong correlation variable group comprising strong correlation variables and a weak correlation variable group comprising weak correlation variables;
Here, the more risk information the data carry, the more strongly the data correlate with the risk assessment object; conversely, the less risk information the data carry, the weaker the correlation. For example, if the distribution of customers' card-opening counts is concentrated within a certain range, the information carried by the card-opening count data shrinks accordingly, to the point that the data may correlate only weakly with loan risk. Note that the information amount of data is not linearly related to its statistical distribution: when the data distribution is complex but concentrated, the data can still carry a large amount of information.
The data source contains various variable data. It is grouped according to a preset grouping rule and the risk information amount of the variable data, yielding a strong correlation variable group comprising strong correlation variables and a weak correlation variable group comprising weak correlation variables. In the example above, if the card-opening counts are concentrated within that range, the card-opening count data are assigned to the weak correlation variable group; otherwise, they are assigned to the strong correlation variable group. This grouping is generally performed during exploratory data analysis.
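The description leaves both the preset grouping rule and the measure of risk information amount unspecified. A minimal sketch, assuming information value (IV), a statistic commonly used in credit scoring, as a stand-in for the risk information amount of an already-binned variable, together with a hypothetical cut-off `iv_threshold = 0.1`. The function names and column names are illustrative only and are not the patented rule:

```python
import math
from collections import defaultdict


def information_value(values, labels, eps=0.5):
    """Information value (IV) of one already-binned variable.

    labels follow this document's convention: 1 = negative sample (default),
    0 = positive sample (no default). Missing values (None) are skipped, and
    eps is additive smoothing so that empty cells do not produce log(0).
    """
    bad = defaultdict(float)   # label-1 count per bin
    good = defaultdict(float)  # label-0 count per bin
    for v, y in zip(values, labels):
        if v is None:
            continue
        if y == 1:
            bad[v] += 1
        else:
            good[v] += 1
    bins = set(bad) | set(good)
    total_bad = sum(bad.values()) + eps * len(bins)
    total_good = sum(good.values()) + eps * len(bins)
    iv = 0.0
    for b in bins:
        p_bad = (bad[b] + eps) / total_bad
        p_good = (good[b] + eps) / total_good
        iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv


def group_variables(columns, labels, iv_threshold=0.1):
    """Hypothetical grouping rule: IV >= iv_threshold -> strong group, else weak group."""
    strong, weak = [], []
    for name, values in columns.items():
        if information_value(values, labels) >= iv_threshold:
            strong.append(name)
        else:
            weak.append(name)
    return strong, weak


# Usage with made-up data (column names are hypothetical).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
columns = {
    "cards_opened": ["high", "high", "high", "low", "low", "low", "low", "high"],
    "region": ["a", "b", "a", "b", "a", "b", "a", "b"],
}
strong, weak = group_variables(columns, labels)
print(strong, weak)  # ['cards_opened'] ['region']
```

A variable whose bins separate defaulters from non-defaulters scores a high IV and lands in the strong group; a variable whose bins all carry the same label mix scores zero and lands in the weak group.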
To facilitate subsequent processing, noise reduction can be performed on the strong correlation variable group and the weak correlation variable group to increase the continuity of the variables.
Optionally, a deep learning autoencoder tool may be used to denoise the strong correlation variable group and the weak correlation variable group.
Denoising the two variable groups with an autoencoder is an encoding (encoder) and decoding (decoder) process performed by a neural network.
The neural network comprises an input layer (input), an intermediate coding layer (code), a decoding layer (decoder), and an output layer (output). Taking a variable X as an example, the network transforms X into Z, where Z is the output of the intermediate layer, and the decoder then maps Z to the output X′. Overall, the optimization objective of this neural network is:
$$\mathrm{Distance}(X, X') = \lVert X - X' \rVert^2$$
The optimization mainly uses gradient descent, which is not described in detail here.
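To make the objective concrete, here is a deliberately simplified sketch: a linear autoencoder with a single intermediate layer, trained by per-sample gradient descent on the squared reconstruction error, whose reconstruction X′ is taken as the denoised variables. This is an assumption for illustration only; a deep learning autoencoder tool would normally use deeper, nonlinear layers, and the learning rate, epoch count, and code size here are arbitrary choices.

```python
import random


def matvec(M, v):
    """Multiply matrix M (a list of rows) by vector v."""
    return [sum(m * x for m, x in zip(row, v)) for row in M]


def reconstruction_loss(W, V, data):
    """Sum over samples of ||x - x'||^2 with x' = V(Wx)."""
    total = 0.0
    for x in data:
        x_hat = matvec(V, matvec(W, x))
        total += sum((a - b) ** 2 for a, b in zip(x, x_hat))
    return total


def train_autoencoder(data, code_dim, lr=0.01, epochs=200, seed=0):
    """Linear autoencoder (encoder W: X -> Z, decoder V: Z -> X') trained by SGD."""
    rng = random.Random(seed)
    d = len(data[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(code_dim)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(code_dim)] for _ in range(d)]
    for _ in range(epochs):
        for x in data:
            z = matvec(W, x)
            x_hat = matvec(V, z)
            err = [xh - xi for xh, xi in zip(x_hat, x)]
            # Gradient of ||x - x'||^2 w.r.t. z, taken before V is updated.
            grad_z = [2 * sum(err[i] * V[i][j] for i in range(d)) for j in range(code_dim)]
            for i in range(d):
                for j in range(code_dim):
                    V[i][j] -= lr * 2 * err[i] * z[j]
            for j in range(code_dim):
                for k in range(d):
                    W[j][k] -= lr * grad_z[j] * x[k]
    return W, V


def denoise(W, V, data):
    """Return the reconstructions X', used here as the denoised variables."""
    return [matvec(V, matvec(W, x)) for x in data]


# Usage: two strongly related variables plus small noise (synthetic data).
rng = random.Random(1)
data = []
for _ in range(50):
    t = rng.uniform(-1, 1)
    data.append([t, 2 * t + rng.gauss(0, 0.05)])

W0, V0 = train_autoencoder(data, code_dim=1, epochs=0)  # untrained weights, same seed
W, V = train_autoencoder(data, code_dim=1)
print(reconstruction_loss(W0, V0, data), reconstruction_loss(W, V, data))
```

Squeezing two variables through a one-dimensional code keeps their shared direction and discards most of the independent noise, which is the sense in which the reconstruction is "denoised" here.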
S102: constructing a first risk assessment model according to the weak correlation variable group;
In practice, the algorithm used to construct the first risk assessment model can be selected as required, for example XGBoost.
S103: performing predictive probability classification on the full-scale samples only containing the weak correlation variables by using the first risk assessment model to obtain a sample group with the highest negative sample proportion;
Positive and negative samples denote risk assessment outcomes. For example, in loan risk assessment, a defaulting customer is labeled 1 and a non-defaulting customer is labeled 0; a negative sample is thus labeled 1 and a positive sample is labeled 0.
Specifically, the first risk assessment model performs predictive probability classification on the full-scale samples containing only the weak correlation variables, yielding the probability that each of these samples is a negative sample;
the full-scale samples containing only the weak correlation variables are then divided into a sample group with the highest negative sample proportion and a sample group with the lowest negative sample proportion, according to a preset division point and the probability that each sample is a negative sample.
Let K denote the division point. Samples whose probability of being a negative sample is greater than or equal to K have a higher negative sample proportion, so they are assigned to the sample group with the highest negative sample proportion; samples whose probability of being a negative sample is less than K are assigned to the sample group with the lowest negative sample proportion.
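The division rule can be sketched as follows, assuming the first model's outputs are already available as a list of negative-sample probabilities and using the labeling convention above (1 = negative). The helper names and the value K = 0.5 in the usage lines are hypothetical:

```python
def split_by_division_point(probs, k):
    """Split sample indices by division point k.

    Samples whose probability of being negative is >= k go to the group with
    the highest negative sample proportion; the rest go to the other group.
    """
    high = [i for i, p in enumerate(probs) if p >= k]
    low = [i for i, p in enumerate(probs) if p < k]
    return high, low


def negative_share(indices, labels):
    """Proportion of negative samples (label 1) within a group."""
    if not indices:
        return 0.0
    return sum(labels[i] for i in indices) / len(indices)


# Usage with made-up first-model probabilities and true labels (1 = negative).
probs = [0.9, 0.2, 0.7, 0.1, 0.4]
labels = [1, 0, 1, 0, 0]
high, low = split_by_division_point(probs, k=0.5)
print(high, low)                                                  # [0, 2] [1, 3, 4]
print(negative_share(high, labels), negative_share(low, labels))  # 1.0 0.0
```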
To make the preset division point K more reasonable and maximize the prediction accuracy on positive and negative samples, the optimal value of the division point is calculated with a preset optimization algorithm, taking the highest prediction accuracy on positive and negative samples as the optimization objective.
First, a confusion matrix is introduced, as shown in Table 1.
TABLE 1 (confusion matrix; image not reproduced)
Once the division point K is fixed, the first risk assessment model built from the weak correlation variable group divides the samples into two predicted classes: predicted positive and predicted negative. Among samples predicted negative, the proportion of true negative samples rises markedly, while among samples predicted positive, positive samples form the majority. The optimization objective is therefore:
(optimization objective formula; image not reproduced)
where a and b are coefficients that must be supplied in practice. Several optimization methods can be used to determine K: a discrete optimization algorithm, or, when the sample set is not large, a simple traversal.
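Because the objective formula survives only as an image, the sketch below assumes a plausible form consistent with the description: a times the share of true negatives among samples predicted negative, plus b times the share of true positives among samples predicted positive. It then finds K by simple traversal over the distinct predicted probabilities. Both the objective and the function names are assumptions, not the formula from the original publication.

```python
def objective(probs, labels, k, a=1.0, b=1.0):
    """Assumed objective for division point k (labels: 1 = negative, 0 = positive).

    a * (share of true negatives among samples predicted negative, p >= k)
    + b * (share of true positives among samples predicted positive, p < k)
    """
    pred_neg = [y for p, y in zip(probs, labels) if p >= k]
    pred_pos = [y for p, y in zip(probs, labels) if p < k]
    if not pred_neg or not pred_pos:
        return float("-inf")  # reject splits that leave one group empty
    neg_precision = sum(pred_neg) / len(pred_neg)
    pos_precision = pred_pos.count(0) / len(pred_pos)
    return a * neg_precision + b * pos_precision


def best_division_point(probs, labels, a=1.0, b=1.0):
    """Simple traversal over the distinct predicted probabilities as candidate K values."""
    candidates = sorted(set(probs))
    return max(candidates, key=lambda k: objective(probs, labels, k, a, b))


# Usage with made-up probabilities and labels.
probs = [0.9, 0.8, 0.6, 0.3, 0.2, 0.1]
labels = [1, 1, 0, 0, 1, 0]
k = best_division_point(probs, labels)
print(k, objective(probs, labels, k))  # 0.8 1.75
```

Traversal costs one pass over the samples per candidate, which is acceptable for the small sample sets the description mentions; larger sets would call for the discrete optimization route instead.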
S104: constructing a second risk assessment model according to the sample group with the highest negative sample proportion and the strong correlation variable group;
In practice, the algorithm used to construct the second risk assessment model can be selected as required, for example XGBoost.
The algorithms for constructing the first risk assessment model and the second risk assessment model may be the same or different.
The above process does not raise the proportion of negative samples in the data source by direct undersampling. Instead, it first divides the data source into a strong correlation variable group and a weak correlation variable group, and then divides the full-scale samples containing only the weak correlation variables into a sample group with the highest negative sample proportion and a sample group with the lowest negative sample proportion. On this basis, the probabilities output by the second risk assessment model, which is constructed from the sample group with the highest negative sample proportion and the strong correlation variable group, are natural probabilities. This avoids introducing human error to a certain extent and prevents the model overfitting caused by undersampling.
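The training set for the second model can be sketched as a filter rather than a resampling step: a row is kept only if its first-model probability is at least K, and only the strong correlation variables are retained, so the label mix inside the group is the one that occurs naturally. Representing rows as dictionaries, and all names used here, are illustrative assumptions:

```python
def second_model_training_set(rows, labels, probs, k, strong_vars):
    """Build the second model's training data without resampling.

    Keeps only samples in the high-negative group (first-model probability >= k)
    and only the strong correlation variables; labels are left as observed.
    """
    X, y = [], []
    for row, label, p in zip(rows, labels, probs):
        if p >= k:
            X.append([row[name] for name in strong_vars])
            y.append(label)
    return X, y


# Usage with made-up rows: "s1" is a strong variable, "w1" a weak one.
rows = [{"s1": 5, "w1": 0}, {"s1": 1, "w1": 1}, {"s1": 4, "w1": 0}, {"s1": 2, "w1": 1}]
labels = [1, 0, 1, 0]
probs = [0.9, 0.2, 0.7, 0.7]
X, y = second_model_training_set(rows, labels, probs, k=0.65, strong_vars=["s1"])
print(X, y)  # [[5], [4], [2]] [1, 1, 0]
```

In this toy case the negative share rises from 1/2 in the full set to 2/3 in the kept rows, without duplicating or discarding samples at random.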
S105: performing risk assessment by using the first risk assessment model and the second risk assessment model.
Specifically, risk assessment is performed with the first risk assessment model to obtain a first risk assessment value and with the second risk assessment model to obtain a second risk assessment value; the maximum of the two is taken as the final risk assessment value:
$$P_{\text{final}}(x) = \max\{P_{\text{model1}}(x), P_{\text{model2}}(x)\}$$
where $P_{\text{model1}}(x)$ is the first risk assessment value, $P_{\text{model2}}(x)$ is the second risk assessment value, and $\max$ returns the larger of the two.
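The fusion rule itself is a single max. The sketch below adds one assumption that the description does not address: if the second model produced no score for a sample (for example because the sample fell outside the high-negative group), the first model's score is used on its own.

```python
from typing import Optional


def final_risk(p_model1: float, p_model2: Optional[float]) -> float:
    """P_final(x) = max{P_model1(x), P_model2(x)}.

    Assumption not stated in the source: when the second model has no score
    for x (p_model2 is None), the first model's score is returned.
    """
    if p_model2 is None:
        return p_model1
    return max(p_model1, p_model2)


print(final_risk(0.2, 0.65))  # 0.65
print(final_risk(0.4, None))  # 0.4
```

Taking the maximum is a conservative choice: a sample is rated as risky as the more pessimistic of the two models says it is.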
In the risk assessment method disclosed in this embodiment, data sources are first grouped according to the risk information amount of the data to obtain a strong correlation variable group and a weak correlation variable group. A first risk assessment model is then constructed according to the weak correlation variable group and used to perform predictive probability classification on the full-scale samples containing only the weak correlation variables, yielding the sample group with the highest negative sample proportion. A second risk assessment model is constructed according to that sample group and the strong correlation variable group; because these training data contain fewer missing values, the resulting second model predicts more efficiently. Finally, risk assessment is performed with both models, which improves the accuracy of the risk assessment.
Based on the risk assessment method disclosed in the above embodiments, this embodiment discloses a risk assessment device. Referring to Fig. 2, the device includes:
a variable group dividing unit 201, configured to group data sources according to the risk information amount of the data to obtain a strong correlation variable group comprising strong correlation variables and a weak correlation variable group comprising weak correlation variables;
a first model building unit 202, configured to build a first risk assessment model according to the weak correlation variable group;
a probability classification unit 203, configured to perform predictive probability classification on the full-scale samples containing only the weak correlation variables by using the first risk assessment model, to obtain the sample group with the highest negative sample proportion;
a second model building unit 204, configured to build a second risk assessment model according to the sample group with the highest negative sample proportion and the strong correlation variable group;
a risk assessment unit 205 configured to perform risk assessment using the first risk assessment model and the second risk assessment model.
Optionally, the device further comprises:
a noise reduction processing unit, configured to perform noise reduction on the strong correlation variable group and the weak correlation variable group respectively.
Optionally, the probability classification unit is specifically configured to:
perform predictive probability classification on the full-scale samples containing only the weak correlation variables by using the first risk assessment model, to obtain the probability that each of these samples is a negative sample;
and divide the full-scale samples containing only the weak correlation variables into a sample group with the highest negative sample proportion and a sample group with the lowest negative sample proportion, according to a preset division point and the probability that each sample is a negative sample.
Optionally, the device further comprises:
a division point setting unit, configured to calculate the optimal value of the division point with a preset optimization algorithm, taking the highest prediction accuracy on positive and negative samples as the optimization objective.
Optionally, the risk assessment unit is specifically configured to:
perform risk assessment by using the first risk assessment model to obtain a first risk assessment value;
perform risk assessment by using the second risk assessment model to obtain a second risk assessment value;
and determine the maximum of the first risk assessment value and the second risk assessment value as the final risk assessment value.
In the risk assessment device disclosed in this embodiment, data sources are first grouped according to the risk information amount of the data to obtain a strong correlation variable group and a weak correlation variable group. A first risk assessment model is then constructed according to the weak correlation variable group and used to perform predictive probability classification on the full-scale samples containing only the weak correlation variables, yielding the sample group with the highest negative sample proportion. A second risk assessment model is constructed according to that sample group and the strong correlation variable group; because these training data contain fewer missing values, the resulting second model predicts more efficiently. Finally, risk assessment is performed with both models, which improves the accuracy of the risk assessment.
The embodiments in this specification are described progressively: each embodiment focuses on its differences from the others, and identical or similar parts of the embodiments can be cross-referenced. Because the device disclosed in the embodiment corresponds to the method disclosed in the embodiment, its description is brief; for relevant details, refer to the description of the method.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (8)

1. A risk assessment method, applied to a loan risk assessment scenario, the method comprising the following steps:
grouping data sources according to the amount of risk information carried by the data, to obtain a strong correlation variable group comprising strong correlation variables and a weak correlation variable group comprising weak correlation variables; wherein the data sources comprise variable data, and the variable data comprises data on the number of cards opened by a customer;
constructing a first risk assessment model according to the weak correlation variable group;
performing predictive probability classification, using the first risk assessment model, on the full sample set containing only the weak correlation variables, to obtain the sample group with the highest proportion of negative samples;
constructing a second risk assessment model according to the sample group with the highest proportion of negative samples and the strong correlation variable group; and
performing risk assessment using the first risk assessment model and the second risk assessment model;
wherein the performing of predictive probability classification on the full sample set containing only the weak correlation variables using the first risk assessment model, to obtain the sample group with the highest proportion of negative samples, comprises:
performing predictive probability classification on the full sample set containing only the weak correlation variables using the first risk assessment model, to obtain, for each sample in that set, the probability that the sample is a negative sample; and
dividing the full sample set containing only the weak correlation variables into a sample group with the highest proportion of negative samples and a sample group with the lowest proportion of negative samples, according to a preset division point and the probability that each sample is a negative sample.
2. The method of claim 1, wherein before the constructing of the first risk assessment model according to the weak correlation variable group, the method further comprises:
performing noise reduction processing on the strong correlation variable group and the weak correlation variable group, respectively.
3. The method of claim 1, further comprising:
calculating an optimal value of the division point using a preset optimization algorithm, with the highest prediction accuracy for positive and negative samples as the optimization target.
4. The method of claim 1, wherein the performing of risk assessment using the first risk assessment model and the second risk assessment model comprises:
performing risk assessment using the first risk assessment model to obtain a first risk assessment value;
performing risk assessment using the second risk assessment model to obtain a second risk assessment value; and
determining the maximum of the first risk assessment value and the second risk assessment value as the final risk assessment value.
5. A risk assessment apparatus, applied to a loan risk assessment scenario, the apparatus comprising:
a variable group dividing unit, configured to group data sources according to the amount of risk information carried by the data, to obtain a strong correlation variable group comprising strong correlation variables and a weak correlation variable group comprising weak correlation variables; wherein the data sources comprise variable data, and the variable data comprises data on the number of cards opened by a customer;
a first model building unit, configured to construct a first risk assessment model according to the weak correlation variable group;
a probability classification unit, configured to perform predictive probability classification, using the first risk assessment model, on the full sample set containing only the weak correlation variables, to obtain the sample group with the highest proportion of negative samples;
a second model building unit, configured to construct a second risk assessment model according to the sample group with the highest proportion of negative samples and the strong correlation variable group; and
a risk assessment unit, configured to perform risk assessment using the first risk assessment model and the second risk assessment model;
wherein the probability classification unit is specifically configured to:
perform predictive probability classification on the full sample set containing only the weak correlation variables using the first risk assessment model, to obtain, for each sample in that set, the probability that the sample is a negative sample; and
divide the full sample set containing only the weak correlation variables into a sample group with the highest proportion of negative samples and a sample group with the lowest proportion of negative samples, according to a preset division point and the probability that each sample is a negative sample.
6. The apparatus of claim 5, further comprising:
a noise reduction processing unit, configured to perform noise reduction processing on the strong correlation variable group and the weak correlation variable group, respectively.
7. The apparatus of claim 5, further comprising:
a division point setting unit, configured to calculate an optimal value of the division point using a preset optimization algorithm, with the highest prediction accuracy for positive and negative samples as the optimization target.
8. The apparatus according to claim 5, wherein the risk assessment unit is specifically configured to:
perform risk assessment using the first risk assessment model to obtain a first risk assessment value;
perform risk assessment using the second risk assessment model to obtain a second risk assessment value; and
determine the maximum of the first risk assessment value and the second risk assessment value as the final risk assessment value.
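The method of claim 1 can be sketched in Python as below. This is a minimal illustration under stated assumptions, not the patented implementation: the claim does not say how the "amount of risk information" is measured or which model type is used, so the sketch assumes information value (IV) over equal-frequency bins for the grouping step, with a hypothetical cutoff of 0.1, and a plain gradient-descent logistic regression for both models. It also reads "according to the sample group ... and the strong correlation variable group" as training the second model on the high-negative-proportion group using only the strong correlation variables. Label 1 marks a negative (bad) sample, and all function names are illustrative.

```python
import math
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Model = Callable[[Row], float]  # estimated probability that a sample is negative (bad)


def information_value(values: Sequence[float], labels: Sequence[int], bins: int = 5) -> float:
    """IV of one variable over equal-frequency bins; label 1 marks a negative sample.

    ASSUMPTION: IV stands in for the claim's "amount of risk information".
    """
    order = sorted(range(len(values)), key=lambda i: values[i])
    total_bad = sum(labels)
    total_good = len(labels) - total_bad
    iv = 0.0
    for b in range(bins):
        idx = order[b * len(order) // bins:(b + 1) * len(order) // bins]
        bad = sum(labels[i] for i in idx)
        good = len(idx) - bad
        # add 0.5 to every cell so an empty cell never produces log(0)
        p_bad = (bad + 0.5) / (total_bad + 0.5 * bins)
        p_good = (good + 0.5) / (total_good + 0.5 * bins)
        iv += (p_bad - p_good) * math.log(p_bad / p_good)
    return iv


def group_variables(rows: List[Row], labels: List[int],
                    threshold: float = 0.1) -> Tuple[List[str], List[str]]:
    """Split variable names into (strong, weak) groups; the 0.1 cutoff is hypothetical."""
    strong: List[str] = []
    weak: List[str] = []
    for name in rows[0]:
        iv = information_value([r[name] for r in rows], labels)
        (strong if iv >= threshold else weak).append(name)
    return strong, weak


def _sigmoid(z: float) -> float:
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(rows: List[Row], labels: List[int], features: List[str],
                   epochs: int = 500, lr: float = 0.1) -> Model:
    """Batch gradient-descent logistic regression that only reads `features`."""
    if not rows:
        raise ValueError("cannot train on an empty sample group")
    w = {f: 0.0 for f in features}
    b = 0.0
    n = len(rows)
    for _ in range(epochs):
        grad_w = {f: 0.0 for f in features}
        grad_b = 0.0
        for r, y in zip(rows, labels):
            err = _sigmoid(b + sum(w[f] * r[f] for f in features)) - y
            for f in features:
                grad_w[f] += err * r[f]
            grad_b += err
        for f in features:
            w[f] -= lr * grad_w[f] / n
        b -= lr * grad_b / n
    return lambda r: _sigmoid(b + sum(w[f] * r[f] for f in features))


def split_by_probability(rows: List[Row], model: Model,
                         cut: float) -> Tuple[List[int], List[int]]:
    """Row indices of the (high, low) negative-proportion groups at division point `cut`."""
    probs = [model(r) for r in rows]
    high = [i for i, p in enumerate(probs) if p >= cut]
    low = [i for i, p in enumerate(probs) if p < cut]
    return high, low


def build_two_stage(rows: List[Row], labels: List[int],
                    cut: float = 0.5) -> Tuple[Model, Model, List[str], List[str]]:
    strong, weak = group_variables(rows, labels)
    # first model: full sample set, weak correlation variables only
    first = train_logistic(rows, labels, weak)
    high, _ = split_by_probability(rows, first, cut)
    # second model: high-negative-proportion group, strong correlation variables only
    second = train_logistic([rows[i] for i in high], [labels[i] for i in high], strong)
    return first, second, strong, weak
```

The division point `cut` must leave the high-negative-proportion group non-empty; otherwise `train_logistic` raises `ValueError`. A production scorecard would typically also bin or scale variables before fitting, which this sketch omits for brevity.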
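Claims 2 to 4 add three refinements that can be sketched the same way. Claim 2 names no particular noise reduction technique and claim 3 names no particular optimization algorithm, so the Python below substitutes hypothetical choices: per-variable winsorizing (clipping each variable to its empirical quantiles) for noise reduction, and a grid search over candidate cuts that maximizes overall accuracy on positive and negative samples for the division point. Only the combination rule of claim 4, taking the larger of the two risk values, comes directly from the text. All function names are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Row = Dict[str, float]
Model = Callable[[Row], float]


def winsorize_group(rows: List[Row], names: List[str],
                    lower: float = 0.01, upper: float = 0.99) -> List[Row]:
    """Claim 2 (HYPOTHETICAL noise reduction): clip each named variable to its quantile range.

    Quantiles use a simple floor index into the sorted column, not interpolation.
    """
    out = [dict(r) for r in rows]  # copies, so the caller's rows are untouched
    for name in names:
        col = sorted(r[name] for r in rows)
        lo = col[int(lower * (len(col) - 1))]
        hi = col[int(upper * (len(col) - 1))]
        for r in out:
            r[name] = min(max(r[name], lo), hi)
    return out


def best_division_point(probs: Sequence[float], labels: Sequence[int],
                        steps: int = 100) -> Tuple[float, float]:
    """Claim 3 (ASSUMED grid search): the cut in [0, 1] with the highest overall accuracy.

    Label 1 marks a negative sample; a sample is predicted negative when its
    probability is >= the cut. Ties keep the smaller cut.
    """
    best_cut, best_acc = 0.0, -1.0
    for k in range(steps + 1):
        cut = k / steps
        correct = sum((p >= cut) == (y == 1) for p, y in zip(probs, labels))
        acc = correct / len(labels)
        if acc > best_acc:
            best_cut, best_acc = cut, acc
    return best_cut, best_acc


def assess(sample: Row, first: Model, second: Model) -> float:
    """Claim 4: the final risk value is the larger of the two models' outputs."""
    first_value = first(sample)    # first model (weak correlation variables)
    second_value = second(sample)  # second model (strong correlation variables)
    return max(first_value, second_value)
```

Per claim 2, the winsorizing would be applied to each variable group in turn, for example `winsorize_group(winsorize_group(rows, strong_names), weak_names)`, where `strong_names` and `weak_names` are hypothetical lists of variable names. Taking the maximum in `assess` is a conservative rule: a sample that either model scores as risky keeps the higher score.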
CN201911006993.2A 2019-10-22 2019-10-22 Risk assessment method and device Active CN110751400B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911006993.2A CN110751400B (en) 2019-10-22 2019-10-22 Risk assessment method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911006993.2A CN110751400B (en) 2019-10-22 2019-10-22 Risk assessment method and device

Publications (2)

Publication Number Publication Date
CN110751400A CN110751400A (en) 2020-02-04
CN110751400B CN110751400B (en) 2022-08-02

Family

ID=69279360

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911006993.2A Active CN110751400B (en) 2019-10-22 2019-10-22 Risk assessment method and device

Country Status (1)

Country Link
CN (1) CN110751400B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111816298A (en) * 2020-06-05 2020-10-23 北京先通康桥医药科技有限公司 Event prediction method and device, storage medium, terminal and cloud service system
CN116029808B (en) * 2023-03-23 2023-06-30 北京芯盾时代科技有限公司 Risk identification model training method and device and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514566A (en) * 2013-10-15 2014-01-15 国家电网公司 Risk control system and method
CN107025596A (en) * 2016-02-01 2017-08-08 腾讯科技(深圳)有限公司 A risk assessment method and system
CN108550077A (en) * 2018-04-27 2018-09-18 信雅达系统工程股份有限公司 A personal credit risk assessment method and assessment system for large-scale imbalanced credit reporting data

Also Published As

Publication number Publication date
CN110751400A (en) 2020-02-04

Similar Documents

Publication Publication Date Title
US10997492B2 (en) Automated methods for conversions to a lower precision data format
CN110659744B (en) Training event prediction model, and method and device for evaluating operation event
CN109190442B (en) Rapid face detection method based on deep cascade convolution neural network
CN111797122B (en) Method and device for predicting change trend of high-dimensional reappearance concept drift stream data
CN107122327B (en) Method and training system for training model by using training data
CN110084271B (en) Method and device for identifying picture category
Park et al. Data compression and prediction using machine learning for industrial IoT
CN108171379B (en) Power load prediction method
CN108647272A (en) A kind of small sample extending method based on data distribution
CN110751400B (en) Risk assessment method and device
CN110728313B (en) Classification model training method and device for intention classification recognition
CN112508243A (en) Training method and device for multi-fault prediction network model of power information system
CN112988840A (en) Time series prediction method, device, equipment and storage medium
CN112801712A (en) Advertisement putting strategy optimization method and device
CN115392477A (en) Skyline query cardinality estimation method and device based on deep learning
WO2019124724A1 (en) Method and system for learning sequence data association on basis of probability graph
CN112561050B (en) Neural network model training method and device
CN113011532A (en) Classification model training method and device, computing equipment and storage medium
CN106874286B (en) Method and device for screening user characteristics
CN110837853A (en) Rapid classification model construction method
CN111882046B (en) Multimedia data identification method, device, equipment and computer storage medium
CN112463964B (en) Text classification and model training method, device, equipment and storage medium
CN111626472B (en) Scene trend judgment index computing system and method based on depth hybrid cloud model
CN114547552A (en) Method and device for generating analog data, intelligent terminal and storage medium
CN111898666A (en) Random forest algorithm and module population combined data variable selection method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant