CN110738565A - Real estate finance artificial intelligence composite risk control model based on data set

Publication number
CN110738565A
Authority
CN
China
Prior art keywords
data
model
loan
logistic regression
composite
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910991857.7A
Other languages
Chinese (zh)
Inventor
陈彦佐
李羽中
卢家浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongshan Yinlu Jinke Information Technology Co Ltd
Original Assignee
Zhongshan Yinlu Jinke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
2019-10-11
Filing date
2019-10-11
Publication date
2020-01-31
Application filed by Zhongshan Yinlu Jinke Information Technology Co Ltd
Priority to CN201910991857.7A
Publication of CN110738565A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/067 Enterprise or organisation modelling

Abstract

The invention discloses a data-set-based artificial intelligence composite risk control model for real estate finance, implemented in four steps: data merging, data processing, model training, and model output. Data merging combines multiple data sources, each consisting of multidimensional raw data obtained with the client's authorization, into a single data set on which all models are built. The merged data set is then used to train multiple supervised learning models, including but not limited to logistic regression, decision trees, and random forests, and a visualized risk control result is finally output.

Description

Real estate finance artificial intelligence composite risk control model based on data set
Technical Field
The invention relates to a data-set-based artificial intelligence composite risk control model for real estate finance, and belongs to the technical field of real estate finance risk control.
Background Art
Most existing real estate finance business is conducted offline. The lender collects the applicant's credit records, default history, liabilities, and similar information, and an experienced credit review and risk control officer manually examines the application to judge whether the applicant has the ability and the willingness to repay. The outcome therefore depends heavily on the officer's experience: even a seasoned officer relies on long-accumulated personal experience to keep bad-debt losses within a reasonable range. Training such officers takes a long time and is costly, and because experience-based judgment cannot be described precisely as a process or method, it is difficult to teach.
A machine can instead fit and train a mathematical model on the collected data set, learning autonomously until it forms a risk control model capable of judging risk. Applying such models in business is therefore important, and the invention accordingly provides a data-set-based artificial intelligence composite risk control model for real estate finance.
Disclosure of Invention
In view of the shortcomings of existing practice, the invention aims to provide a data-set-based artificial intelligence composite risk control model for real estate finance, so as to solve the problems described in the background art.
To achieve this purpose, the invention provides the following technical solution. The data-set-based artificial intelligence composite risk control model for real estate finance is implemented by the following steps:
1. Data processing: the data for the real estate finance risk control model are first classified into 4 dimensions, namely real estate data, pre-loan data, anti-fraud data, and in-loan data. The data in each dimension are further subdivided, and once the data of each dimension have been determined, the raw data are processed: unstructured data are obtained through an API (application programming interface) and cleaned to produce structured data. The dimensions are as follows:
Real estate data: property information, property valuation data, property price fluctuation data, and the like for each city;
Pre-loan data: data about the applicant, including current debt status, credit history, spending capacity, repayment capacity, and the like;
Anti-fraud data: multidimensional online and offline data collected about the applicant, and the like;
In-loan data: repayment data, predictions of future property value fluctuation, loan entry data, and the like. Repayment data are built from the applicant's monthly repayment records after the loan is granted and indicate whether the loan is overdue or has become a bad debt.
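The data processing step can be sketched as follows. This is a minimal illustration, not the patent's implementation: the dimension keys, the field names, the applicant identifier, and the cleaning rule (keep only values that parse as numbers) are all assumptions made for the example.

```python
from typing import Any


def clean_record(raw: dict[str, Any]) -> dict[str, float]:
    """Keep only fields that parse as numbers (assumed cleaning rule)."""
    cleaned = {}
    for key, value in raw.items():
        try:
            cleaned[key] = float(value)
        except (TypeError, ValueError):
            continue  # missing or non-numeric values are discarded
    return cleaned


def merge_dimensions(
    sources: dict[str, dict[str, dict[str, Any]]]
) -> dict[str, dict[str, float]]:
    """Merge per-dimension records into one feature row per applicant."""
    merged: dict[str, dict[str, float]] = {}
    for dimension, records in sources.items():
        for applicant_id, raw in records.items():
            row = merged.setdefault(applicant_id, {})
            for field, value in clean_record(raw).items():
                # Prefix with the dimension so field names cannot collide.
                row[f"{dimension}.{field}"] = value
    return merged


# Hypothetical raw records for one applicant, one per dimension.
sources = {
    "real_estate": {"A001": {"valuation": "1200000", "city": "Zhongshan"}},
    "pre_loan": {"A001": {"debt": 50000, "credit_history": None}},
    "anti_fraud": {"A001": {"device_risk": "0.1"}},
    "in_loan": {"A001": {"months_overdue": 0}},
}
dataset = merge_dimensions(sources)
print(dataset["A001"])
```

Prefixing each field with its dimension keeps the merged row traceable to its source, which helps when variables are later removed during cleaning.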
2. Model training on the resulting data set: the data set merged from the 4 dimensions in step 1 is used to train multiple supervised learning models, including but not limited to logistic regression, decision trees, and random forests. The specific steps are as follows:
① Determine the number of feature variables: 300 features are obtained from the API, 100 business-generated features are added, and 30 variables are removed during cleaning and processing, leaving 370 variable dimensions in total;
② Train the model: the table of determined feature variables is fed into the model for training to obtain an output field representing the user's probability of default. Training is illustrated below using the decision tree model and logistic regression as examples:
2.1 Decision tree model
In probability theory, information entropy provides a way to measure the uncertainty of a random variable; entropy is the expected value of information. If the objects to be classified may fall into n classes x1, x2, ..., xn with probabilities p1, p2, ..., pn respectively, the entropy of X is defined as:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
From the definition: $0 \le H(X) \le \log_2 n$
When the random variable takes only two values, i.e., the distribution of X is $P(X=1) = p$, $P(X=0) = 1-p$ with $0 \le p \le 1$, the entropy is: $H(X) = -p \log_2 p - (1-p) \log_2 (1-p)$;
Using this formula, each node of the tree can be determined from the data, and the priority of the nodes yields the optimal feature selection for the final decision model. After the model makes its prediction, the output probability represents the user's credit qualification.
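One common way to turn entropy into a node-selection rule is information gain, the entropy reduction achieved by splitting on a feature. The patent does not name its exact splitting criterion, so the sketch below is an assumption, and the feature names and labels are made up for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, labels, feature):
    """Entropy reduction obtained by splitting on a categorical feature."""
    total = len(labels)
    groups = {}
    for row, label in zip(rows, labels):
        groups.setdefault(row[feature], []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def best_feature(rows, labels):
    """Feature with the highest information gain, i.e. the node to split first."""
    return max(rows[0].keys(), key=lambda f: information_gain(rows, labels, f))


# Hypothetical applicants: label 1 means the loan defaulted.
rows = [
    {"overdue_before": "yes", "city_tier": "1"},
    {"overdue_before": "yes", "city_tier": "2"},
    {"overdue_before": "no", "city_tier": "1"},
    {"overdue_before": "no", "city_tier": "2"},
]
labels = [1, 1, 0, 0]
print(best_feature(rows, labels))
```

In this toy data `overdue_before` separates the labels perfectly while `city_tier` carries no information, so the former is chosen as the first split.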
2.2 Logistic regression model
The variables determined in step ① are fed into a logistic regression model for training to obtain an output field representing the user's probability of default. The logistic regression model is expressed as follows:
Logistic regression is a generalized linear regression whose formula closely resembles that of linear regression: both contain the term w'x + b, where w and b are the parameters to be fitted, i.e., y = w'x + b. Fitting the processed and determined variables to this formula yields effective values of w and b, achieving binary classification for the business. Once the model is trained, new user data are fed into it for prediction, and the model outputs the user's probability of default, from which the user's risk can be judged.
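A minimal sketch of fitting w and b follows. It assumes the standard formulation in which the linear score w'x + b is mapped to a probability by the sigmoid function and the parameters are fitted by batch gradient descent on the log loss. The patent does not specify the optimizer, and the single feature, learning rate, and epoch count are illustrative assumptions.

```python
import math


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def train_logistic_regression(X, y, lr=0.1, epochs=2000):
    """Fit w and b by batch gradient descent on the mean log loss."""
    n, n_features = len(X), len(X[0])
    w, b = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for xi, yi in zip(X, y):
            p = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b)
            err = p - yi  # gradient of the log loss w.r.t. the linear score
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * gj / n for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_default_probability(w, b, x):
    """Probability of default for a new applicant's feature vector x."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical single feature (a debt ratio); label 1 means default.
X = [[0.1], [0.2], [0.8], [0.9]]
y = [0, 0, 1, 1]
w, b = train_logistic_regression(X, y)
p_low = predict_default_probability(w, b, [0.1])
p_high = predict_default_probability(w, b, [0.9])
print(p_low, p_high)
```

A production system would more likely use an established library, with regularization and feature scaling suited to the 370 variables.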
3. The results obtained are combined with unstructured data from the loan business and weighted into a composite, and the prediction result and decision result of the composite model are finally output. Specifically, the outputs of the two models are combined with unstructured loan-business data and two model weight factors x1 and x2, and the composite prediction is computed as follows:
$$y = x_1 \cdot (\text{logistic regression model result}) + x_2 \cdot (\text{decision tree result}) + \text{real estate valuation} + \text{loan amount} + r$$
where r is noise. The result y is then passed through the sigmoid activation function:
$$\sigma(y) = \frac{1}{1 + e^{-y}}$$
This yields the final prediction of the composite model, a value between 0 and 1, on which a business decision is then made in light of the institution's own business.
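The composite step can be sketched as below, reading x1 and x2 as multiplicative weights on the two model outputs. The sketch assumes the real estate valuation and loan amount have already been rescaled to small magnitudes, since raw currency amounts would saturate the sigmoid. That rescaling, the weight values, and all inputs are assumptions rather than details from the patent.

```python
import math


def sigmoid(y):
    """Numerically stable logistic function."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    ey = math.exp(y)
    return ey / (1.0 + ey)


def composite_score(lr_result, tree_result, valuation, loan_amount,
                    x1, x2, noise=0.0):
    """Weighted sum of model outputs and loan data, squashed into (0, 1).

    valuation and loan_amount are assumed to be rescaled beforehand.
    """
    y = x1 * lr_result + x2 * tree_result + valuation + loan_amount + noise
    return sigmoid(y)


# Hypothetical model outputs, scaled business inputs, and weights.
score = composite_score(
    lr_result=0.2, tree_result=0.3,
    valuation=-0.5, loan_amount=0.1,
    x1=0.6, x2=0.4,
)
print(round(score, 3))
```

The business decision is then a threshold or banding rule applied to `score`, which the patent leaves to the institution.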
Drawings
FIG. 1 is a flow chart of the implementation steps of the data-set-based artificial intelligence composite risk control model for real estate finance.
Detailed Description
The data-set-based composite risk control model for real estate finance, operating on multiple data sets, is implemented by the following steps:
1. Data processing: the data for the real estate finance risk control model are first classified into 4 dimensions, namely real estate data, pre-loan data, anti-fraud data, and in-loan data. The data in each dimension are further subdivided, and once the data of each dimension have been determined, the raw data are processed: unstructured data are obtained through an API (application programming interface) and cleaned to produce structured data. The dimensions are as follows:
Real estate data: property information, property valuation data, property price fluctuation data, and the like for each city;
Pre-loan data: data about the applicant, including current debt status, credit history, spending capacity, repayment capacity, and the like;
Anti-fraud data: multidimensional online and offline data collected about the applicant, and the like;
In-loan data: repayment data, predictions of future property value fluctuation, loan entry data, and the like. Repayment data are built from the applicant's monthly repayment records after the loan is granted and indicate whether the loan is overdue or has become a bad debt.
2. Model training on the resulting data set: the data set merged from the 4 dimensions in step 1 is used to train multiple supervised learning models, including but not limited to logistic regression, decision trees, and random forests. The specific steps are as follows:
① Determine the number of feature variables: 300 features are obtained from the API, 100 business-generated features are added, and 30 variables are removed during cleaning and processing, leaving 370 variable dimensions in total;
② Train the model: the table of determined feature variables is fed into the model for training to obtain an output field representing the user's probability of default. Training is illustrated below using the decision tree model and logistic regression as examples:
2.1 Decision tree model
In probability theory, information entropy provides a way to measure the uncertainty of a random variable; entropy is the expected value of information. If the objects to be classified may fall into n classes x1, x2, ..., xn with probabilities p1, p2, ..., pn respectively, the entropy of X is defined as:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
From the definition: $0 \le H(X) \le \log_2 n$
When the random variable takes only two values, i.e., the distribution of X is $P(X=1) = p$, $P(X=0) = 1-p$ with $0 \le p \le 1$, the entropy is: $H(X) = -p \log_2 p - (1-p) \log_2 (1-p)$;
Using this formula, each node of the tree can be determined from the data, and the priority of the nodes yields the optimal feature selection for the final decision model. After the model makes its prediction, the output probability represents the user's credit qualification;
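As a worked illustration of the two-valued entropy formula, consider two values of p chosen for the example (they do not come from the patent). An evenly split node is maximally uncertain, while a skewed node is less so:

$$H(0.5) = -0.5 \log_2 0.5 - 0.5 \log_2 0.5 = 0.5 + 0.5 = 1$$

$$H(0.25) = -0.25 \log_2 0.25 - 0.75 \log_2 0.75 \approx 0.5 + 0.311 = 0.811$$

With the usual convention $0 \log_2 0 = 0$, a pure node gives $H(0) = H(1) = 0$, which matches the bounds $0 \le H(X) \le \log_2 2 = 1$.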
2.2 Logistic regression model
The variables determined in step ① are fed into a logistic regression model for training to obtain an output field representing the user's probability of default. The logistic regression model is expressed as follows:
Logistic regression is a generalized linear regression whose formula closely resembles that of linear regression: both contain the term w'x + b, where w and b are the parameters to be fitted, i.e., y = w'x + b. Fitting the processed and determined variables to this formula yields effective values of w and b, achieving binary classification for the business. Once the model is trained, new user data are fed into it for prediction, and the model outputs the user's probability of default, from which the user's risk can be judged.
3. The results obtained are combined with unstructured data from the loan business and weighted into a composite, and the prediction result and decision result of the composite model are finally output. Specifically, the outputs of the two models are combined with unstructured loan-business data and two model weight factors x1 and x2, and the composite prediction is computed as follows:
$$y = x_1 \cdot (\text{logistic regression model result}) + x_2 \cdot (\text{decision tree result}) + \text{real estate valuation} + \text{loan amount} + r$$
where r is noise. The result y is then passed through the sigmoid activation function:
$$\sigma(y) = \frac{1}{1 + e^{-y}}$$
This yields the final prediction of the composite model, a value between 0 and 1, on which a business decision is then made in light of the institution's own business.
With this technical method, on the one hand, the machine can automatically generate risk control decision rules, effectively avoiding the risks and errors that subjective human judgment may introduce, improving production efficiency, and reducing the enterprise's personnel costs. On the other hand, as the machine continues to receive learning data, the model learns better and its predictions become more accurate, which can greatly reduce the loan risk of financial institutions and increase the enterprise's profit.

Claims (6)

1. A data-set-based artificial intelligence composite risk control model for real estate finance, implemented by the following steps:
Data processing: the data for the real estate finance risk control model are first classified into 4 dimensions, namely real estate data, pre-loan data, anti-fraud data, and in-loan data. The data in each dimension are further subdivided, and once the data of each dimension have been determined, the raw data are processed: unstructured data are obtained through an API (application programming interface) and cleaned to produce structured data. The dimensions are as follows:
Real estate data: property information, property valuation data, property price fluctuation data, and the like for each city;
Pre-loan data: data about the applicant, including current debt status, credit history, spending capacity, repayment capacity, and the like;
Anti-fraud data: multidimensional online and offline data collected about the applicant, and the like;
In-loan data: repayment data, predictions of future property value fluctuation, loan entry data, and the like. Repayment data are built from the applicant's monthly repayment records after the loan is granted and indicate whether the loan is overdue or has become a bad debt.
2. Model training is performed on the data set merged from the 4 dimensions in step 1 using multiple supervised learning models, including but not limited to logistic regression, decision trees, and random forests, with the following specific steps:
① Determine the number of feature variables: 300 features are obtained from the API, 100 business-generated features are added, and 30 variables are removed during cleaning and processing, leaving 370 variable dimensions in total;
② Train the model: the table of determined feature variables is fed into the model for training to obtain an output field representing the user's probability of default. Training is illustrated using the decision tree model and logistic regression as examples:
2.1 Decision tree model
In probability theory, information entropy provides a way to measure the uncertainty of a random variable; entropy is the expected value of information. If the objects to be classified may fall into n classes x1, x2, ..., xn with probabilities p1, p2, ..., pn respectively, the entropy of X is defined as:
$$H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i$$
From the definition: $0 \le H(X) \le \log_2 n$
When the random variable takes only two values, i.e., the distribution of X is $P(X=1) = p$, $P(X=0) = 1-p$ with $0 \le p \le 1$, the entropy is: $H(X) = -p \log_2 p - (1-p) \log_2 (1-p)$;
Using this formula, each node of the tree can be determined from the data, and the priority of the nodes yields the optimal feature selection for the final decision model. After the model makes its prediction, the output probability represents the user's credit qualification;
2.2 Logistic regression model
The variables determined in step ① are fed into a logistic regression model for training to obtain an output field representing the user's probability of default. The logistic regression model is expressed as follows:
Logistic regression is a generalized linear regression whose formula closely resembles that of linear regression: both contain the term w'x + b, where w and b are the parameters to be fitted, i.e., y = w'x + b. Fitting the processed and determined variables to this formula yields effective values of w and b, achieving binary classification for the business. Once the model is trained, new user data are fed into it for prediction, and the model outputs the user's probability of default, from which the user's risk can be judged.
3. The method according to claim 2, wherein the results obtained are combined with unstructured data from the loan business and subjected to composite weighting, and the prediction result and decision result of the composite model are finally output; specifically, the outputs of the two models are combined with unstructured loan-business data and two model weight factors x1 and x2, and the composite prediction is computed as follows:
$$y = x_1 \cdot (\text{logistic regression model result}) + x_2 \cdot (\text{decision tree result}) + \text{real estate valuation} + \text{loan amount} + r$$
where r is noise. The result y is then passed through the sigmoid activation function:
$$\sigma(y) = \frac{1}{1 + e^{-y}}$$
This yields the final prediction of the composite model, a value between 0 and 1, on which a business decision is then made in light of the institution's own business.
4. The data-set-based artificial intelligence composite risk control model for real estate finance according to claim 1, characterized in that: an effective data source is formed during the acquisition, processing, and updating of the business data in the 4 dimensions, effectively assisting the construction of an accurate model.
5. The processing of a data set using multiple supervised learning models according to claim 2, characterized in that: the data are modeled with machine learning models including but not limited to a decision tree model, a random forest model, and a logistic regression model; the output of each model and its weight parameter are determined by tuning the model parameters; and a risk control result that can guide the business is finally obtained by compositing the outputs of the multiple models with their weights.
6. The method according to claim 3, wherein unstructured data from the loan business are subjected to composite weighting and the prediction result and decision result of the composite model are finally output; with this technical method, on the one hand, the machine can automatically generate risk control decision rules, effectively avoiding the risks and errors that subjective human judgment may introduce, improving production efficiency, and reducing the enterprise's personnel costs; on the other hand, as the machine continues to receive learning data, the model learns better and its predictions become more accurate, which can greatly reduce the loan risk of financial institutions and thereby increase the enterprise's profit.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910991857.7A CN110738565A (en) 2019-10-11 2019-10-11 Real estate finance artificial intelligence composite risk control model based on data set

Publications (1)

Publication Number Publication Date
CN110738565A (en) 2020-01-31

Family

ID=69269268

Country Status (1)

Country Link
CN (1) CN110738565A (en)

Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information

Address after: Room 416, no.39-2, Keji East Road, Torch Development Zone, Zhongshan City, Guangdong Province, 528400

Applicant after: Zhongshan Yinlu Jinke Information Technology Co.,Ltd.

Address before: 804, building 2, ocean Plaza, 28 Boai 6th Road, East District, Zhongshan City, Guangdong Province, 528400

Applicant before: Zhongshan Yinlu Jinke Information Technology Co.,Ltd.

SE01 Entry into force of request for substantive examination