CN112817939A - Construction method of data risk control model and data risk control model
- Publication number: CN112817939A
- Application number: CN202110123940.XA
- Authority: CN (China)
- Prior art keywords: data, model, risk control, control model, training
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G06F16/212: Schema design and management with details for data modelling support
- G06F16/2462: Approximate or statistical queries
- G06F21/602: Providing cryptographic facilities or services
- G06F21/64: Protecting data integrity, e.g. using checksums, certificates or signatures
- G06Q10/0635: Risk analysis of enterprise or organisation activities
- G06Q40/03: Credit; Loans; Processing thereof
Abstract
The invention relates to a method for constructing a data risk control model, and to the data risk control model itself, and belongs to the technical field of artificial intelligence. The method comprises the following steps: constructing a customer profile and a risk data mart; setting business indicators by using a rule engine; taking the data in the customer profile and the risk data mart as a data source; dividing the data source into a training data set and a validation data set according to a preset proportion; training a plurality of AI algorithm models with the training data set, based on the business indicators, to obtain an initially trained model corresponding to each AI algorithm model; validating each initially trained model with the validation data set, based on the business indicators; taking an initially trained model whose validation result meets a preset requirement as the data risk control model; and, after the data risk control model is deployed, performing risk control only for users registered on a specific blockchain. The invention can effectively improve the efficiency, security, and accuracy of risk control approval, and facilitates cooperation between the funding party and the customer.
Description
Technical Field
The invention relates to the technical field of artificial intelligence, and in particular to a data risk control model and a method for constructing it.
Background
A risk control model (the Chinese abbreviation is sometimes rendered literally as "wind control model") keeps the various risks of the financial transaction market within a reasonable range by describing, estimating, and simulating the correlations among risk factors, thereby achieving risk control for financial services.
When faced with large volumes of financial data, conventional risk control models struggle to assess financial risk efficiently and accurately, and struggle to guarantee secure transmission of risk control results. This reduces the efficiency of cooperation between the funding party and the customer, and an inaccurate risk assessment can significantly affect the market value of both.
Therefore, how to achieve efficient and accurate risk assessment, and how to transmit assessment results securely, are problems that urgently need to be solved.
Disclosure of Invention
The construction method and the data risk control model provided by the invention combine customer profiling technology with risk data mart technology. Because the rule engine can automatically adjust the indicators at any time, the accuracy of describing, estimating, and simulating risk factor correlations is effectively improved. In addition, the risk control service is provided only to users registered on a specific blockchain, which effectively improves risk control efficiency and security.
To achieve this purpose, the invention provides the following scheme:
A method for constructing a data risk control model comprises the following steps:
constructing a customer profile and a risk data mart;
setting business indicators by using a rule engine;
taking the data in the customer profile and the risk data mart as a data source;
dividing the data source into a training data set and a validation data set according to a preset proportion;
training a plurality of AI algorithm models with the training data set, based on the business indicators, to obtain an initially trained model corresponding to each AI algorithm model;
validating the initially trained model corresponding to each AI algorithm model with the validation data set, based on the business indicators;
taking an initially trained model whose validation result meets a preset requirement as the data risk control model;
and, after the data risk control model is deployed, providing the risk control service only to users registered on a specific blockchain.
Optionally, constructing the customer profile specifically comprises:
taking data that describe the customer's basic attributes, behavior, risk indicators, and credit reporting information across multiple dimensions as the customer profile data.
Optionally, constructing the risk data mart specifically comprises:
taking business operation data, financial data, and enterprise legal-person data as the data of the risk data mart.
Optionally, the business indicators that have been set are encrypted by using a blockchain-based encryption technology.
Optionally, training the AI algorithm models with the training data set specifically comprises:
training a neural network model, a regression model, and a decision tree model with the training data set.
Optionally, taking an initially trained model whose validation result meets the preset requirement as the data risk control model specifically comprises:
evaluating the validation result of the initially trained model corresponding to each AI algorithm model on the basis of business rationality and statistical indicator performance, and taking the initially trained model with the best evaluation result as the data risk control model.
Optionally, the output of the data risk control model is retrieved through a model data interface.
Optionally, the data in the customer profile, the data in the risk data mart, the output of the data risk control model, and the number of calls to the model data interface are monitored in real time;
when the monitored data exceed a preset range, the data risk control model is reconstructed.
Optionally, the output of the data risk control model is encrypted by using a blockchain-based encryption technology before it is retrieved.
The invention also provides a data risk control model constructed by the above construction method.
According to the specific embodiments provided by the invention, the invention achieves the following technical effects:
1) by combining customer profiling technology with risk data mart technology, the invention provides rich data support for constructing the data risk control model, and indirectly improves the accuracy with which the model describes, estimates, and simulates risk factor correlations;
2) by using rule engine technology, the invention supports custom definition of the model's indicators, performance prediction, and model deployment, automates the prediction and deployment of the data risk control model, and makes the model more flexible to apply;
3) by using a blockchain-based encryption technology, the invention improves the data security and robustness of the data risk control model, and improves the accuracy of data transmission between the model and external devices;
4) by combining mainstream model deployment technology with a blockchain-based encryption technology, the invention improves the efficiency and security of data transmission between the data risk control model and external devices;
5) by regularly monitoring and updating the data risk control model, the invention further improves the accuracy with which the model describes, estimates, and simulates risk factor correlations, and further improves the robustness of the model as well as the security and efficiency of data transmission between the model and external devices.
Drawings
To illustrate the embodiments of the invention or the technical solutions of the prior art more clearly, the drawings needed in the embodiments are briefly described below. The drawings described below show only some embodiments of the invention, and those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a flowchart of the method for constructing a data risk control model in embodiment 1 of the invention;
Fig. 2 is a schematic structural diagram of the data risk control model of the invention;
Fig. 3 is a schematic diagram of the application of the data risk control model in a small and micro enterprise financing scenario according to embodiment 2 of the invention.
Detailed Description
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments that a person skilled in the art can derive from these embodiments without creative effort fall within the protection scope of the invention.
A blockchain is a new way of applying computer technologies such as distributed data storage, peer-to-peer transmission, consensus mechanisms, and encryption algorithms. The consensus mechanism is a mathematical algorithm for establishing trust and acquiring rights and interests among the different nodes of a blockchain system. A blockchain links data blocks one after another in chronological order into a chained data structure, and uses cryptography to produce a distributed ledger that cannot be tampered with or forged.
Blockchain technology uses the chained block data structure to verify and store data, uses a distributed node consensus algorithm to generate and update data, uses cryptography to secure data transmission and access, and uses smart contracts composed of automated script code to program and operate on data.
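The chained data structure described above can be illustrated with a minimal sketch. It assumes SHA-256 hashing and an in-memory list; the names Block, append_block, and is_valid are hypothetical and are not taken from the patent, which does not specify a particular blockchain implementation. Consensus, networking, and signatures are deliberately left out.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass
class Block:
    index: int      # position in the chain, which also fixes the ordering
    payload: dict   # illustrative content, e.g. a risk control result
    prev_hash: str  # digest of the preceding block

    def digest(self) -> str:
        # Canonical JSON (sorted keys) so identical content gives an identical digest.
        body = json.dumps(
            {"index": self.index, "payload": self.payload, "prev_hash": self.prev_hash},
            sort_keys=True,
        )
        return hashlib.sha256(body.encode("utf-8")).hexdigest()


def append_block(chain, payload):
    """Append a block that points at the digest of the current last block."""
    prev_hash = chain[-1].digest() if chain else "0" * 64
    chain.append(Block(len(chain), payload, prev_hash))


def is_valid(chain):
    """Return True if every block references the digest of its predecessor."""
    return all(chain[i].prev_hash == chain[i - 1].digest() for i in range(1, len(chain)))
```

Changing the payload of any block that has a successor changes its digest, so the successor's stored prev_hash no longer matches and is_valid returns False.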
The theoretical system of big data technology took shape at the end of the 20th century; distributed storage, parallel computing, and database technologies are at the core of big data applications. As computer hardware costs have fallen and algorithms have been steadily optimized, big data has become a necessity for enterprises in the data age of the 21st century. Storage and application of massive data are ubiquitous, whether in a leased cloud computing environment or in a local deployment. Solutions that fuse big data with blockchain enable cooperation and mutual trust while effectively protecting data privacy, and are therefore gradually becoming standard for enterprise-level data applications.
The concept of a mathematical model is broad. Its core comprises rigorously derived mathematical theories and applied statistical calculation models based on empirical algorithms, and the core algorithm theory of such models had matured by the 1970s. Methods such as logistic regression, decision trees, and neural networks are widely used in business scenarios such as financial risk control and customer value management because they are intuitive and interpretable.
There are many tools for implementing algorithm models, including commercial software that most licensed financial institutions adopt (such as the SAS statistical analysis system) and open-source software that is well suited to research and innovative applications (such as Python and R).
The construction method and the data risk control model provided by the invention use mainstream technologies and model algorithms from architecture design through to application interface output. In the model algorithm stage, an open-source development interface is provided so that users can configure and optimize algorithms. In the model configuration and management stages, the invention borrows the concept of a rule engine, enabling automated deployment of model configuration, online encryption, blockchain-encrypted output, and the like. Finally, in the model application stage, for scenarios such as financial credit risk control that span the customer life cycle, the invention can provide standardized business reports and customized output based on the experience of industry experts. Combined with enterprise-level data applications such as big data credit reporting, it supports a complete description of the customer, data mining of information, and strict management of risk control rules and models. This effectively improves the accuracy of describing, estimating, and simulating risk factor correlations, improves the efficiency and accuracy of the funding party's risk control approval of customers, and facilitates cooperation between the funding party and the customer.
To make the above objects, features, and advantages of the invention easier to understand, the invention is described in further detail below with reference to the drawings and specific embodiments.
Example 1:
Fig. 1 is a flowchart of the method for constructing a data risk control model in embodiment 1 of the invention; the steps of the method are denoted S1 to S8.
S1: constructing a customer profile and a risk data mart;
S2: setting business indicators by using a rule engine;
S3: taking the data in the customer profile and the data in the risk data mart as a data source;
S4: dividing the data source into a training data set and a validation data set according to a preset proportion;
S5: training a plurality of AI algorithm models with the training data set, based on the business indicators, to obtain an initially trained model corresponding to each AI algorithm model;
S6: validating the initially trained model corresponding to each AI algorithm model with the validation data set, based on the business indicators;
S7: taking an initially trained model whose validation result meets the preset requirement as the data risk control model;
S8: after the data risk control model is deployed, providing the risk control service only to users registered on a specific blockchain.
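The final step above, serving only registered users, can be sketched as a simple gate in front of the scoring function. This is an illustration under stated assumptions: the registry is an in-memory set standing in for a lookup against the specific blockchain, and make_gated_service, toy_score, and the overdue_count field are hypothetical names that do not appear in the patent.

```python
def make_gated_service(registered_users, score_fn):
    """Wrap score_fn so that it only serves users found in registered_users."""
    def serve(user_id, features):
        if user_id not in registered_users:
            return None  # unregistered users receive no risk control service
        return score_fn(features)
    return serve


# Hypothetical scoring function and registry, for illustration only.
def toy_score(features):
    return min(1.0, features["overdue_count"] / 10)


service = make_gated_service({"user-001", "user-002"}, toy_score)
```

In a real deployment the membership check would query the chain's registration records rather than a local set; the gate itself stays the same.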
The above steps are described in further detail below.
Constructing the customer profile specifically comprises:
taking data that describe the customer's basic attributes, behavior, risk indicators, and credit reporting information across multiple dimensions as the customer profile data.
Specifically, these data comprise enterprise data, personal credit data, and internet big data. Enterprise data are collected from public data and from internal information authorized by the enterprise; personal credit data are collected from credit reports authorized by the user; internet big data are compliant personal data, authorized by the customer, obtained using crawler technology.
In addition, constructing the risk data mart specifically comprises:
taking business operation data, financial data, and enterprise legal-person data as the data of the risk data mart. The enterprise legal-person data include credit reporting data of the enterprise legal person, internet behavior data of the enterprise legal person, and the like. All of these data are collected in accordance with legal and compliance principles, so that user privacy is not infringed.
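As a rough illustration of how the customer profile and the risk data mart might be combined into a single data source, the sketch below joins the two on a customer identifier. The function name build_data_source, the dictionary layout, and the field names are assumptions made for the example; the patent does not prescribe a storage format.

```python
def build_data_source(profiles, mart):
    """Join profile and mart records that share a customer id.

    Both arguments map a customer id to a dict of fields. Only customers
    present in both sources are kept, and each field is prefixed with its
    origin so that the provenance of every column stays visible.
    """
    rows = []
    for customer_id in sorted(profiles.keys() & mart.keys()):
        row = {"customer_id": customer_id}
        row.update({f"profile_{k}": v for k, v in profiles[customer_id].items()})
        row.update({f"mart_{k}": v for k, v in mart[customer_id].items()})
        rows.append(row)
    return rows
```

Keeping only customers found in both sources is one possible design choice; a system that tolerates missing mart data would use an outer join and fill the gaps explicitly.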
When the rule engine is used to set the business indicators, the indicators are encrypted by using a blockchain-based encryption technology. This desensitizes the sensitive data in the business indicators and isolates sensitive information, which effectively avoids the risk of model failure and the legal risk that would result from leaked risk control indicators.
The AI algorithm model is the core of the data risk control model. Once constructed, it must be trained on the basis of the customer profile, the risk data mart, and the business indicators. The data in the customer profile and the data in the risk data mart are taken as the data source, and the data source is then divided into a training data set and a validation data set according to a preset proportion. In other words, one part of the data source is used to train the AI algorithm model, and the other part is used to validate the output of the trained AI algorithm model and to judge whether the trained model meets the preset requirements.
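The split by a preset proportion can be sketched as follows. The 0.7 default ratio, the fixed seed, and the name split_by_ratio are assumptions for illustration; the patent leaves the proportion open.

```python
import random


def split_by_ratio(records, train_ratio=0.7, seed=0):
    """Shuffle records reproducibly and split them into (training, validation)."""
    if not 0.0 < train_ratio < 1.0:
        raise ValueError("train_ratio must be strictly between 0 and 1")
    shuffled = list(records)               # copy so the caller's data is untouched
    random.Random(seed).shuffle(shuffled)  # seeded generator gives a repeatable split
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]
```

Shuffling before cutting avoids a split that simply follows the order in which records were collected; for time-dependent credit data, a chronological split may be the more appropriate choice.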
The invention selects three mainstream model types: a neural network model, a regression model, and a decision tree model.
After the AI algorithm models have been trained with the training data set and validated with the validation data set, the validation results are used to judge whether each trained model meets the requirements, and the trained model that best meets the requirements is taken as the data risk control model.
In this process, the validation result of the initially trained model corresponding to each AI algorithm model is evaluated on the basis of business rationality and statistical indicator performance, and the initially trained model with the best evaluation result is taken as the data risk control model.
Specifically, business rationality refers to whether the data from the customer profile and the risk data mart that are used to construct the data risk control model are important from a business perspective. Statistical indicator performance refers to the K-S statistic, which is used to judge whether the output of an AI algorithm model is reasonable.
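For a binary good/bad label, the K-S statistic is commonly computed as the largest gap between the cumulative score distributions of the two classes, KS = max over s of |F_bad(s) - F_good(s)|. The sketch below implements that definition and uses it to pick among candidate models. It covers only the statistical half of the evaluation; business rationality is a judgment that code like this does not capture. The names ks_statistic and best_by_ks, the minimum_ks threshold, and the convention that label 1 means "bad" are assumptions for the example.

```python
def ks_statistic(scores, labels):
    """Maximum gap between the cumulative score distributions of the two classes.

    labels uses 1 for bad customers and 0 for good customers.
    """
    n_bad = sum(labels)
    n_good = len(labels) - n_bad
    if n_bad == 0 or n_good == 0:
        raise ValueError("both classes must be present")
    pairs = sorted(zip(scores, labels))
    cum_bad = cum_good = 0
    best = 0.0
    i = 0
    while i < len(pairs):
        j = i
        # Consume every record that shares this score before comparing the CDFs.
        while j < len(pairs) and pairs[j][0] == pairs[i][0]:
            if pairs[j][1] == 1:
                cum_bad += 1
            else:
                cum_good += 1
            j += 1
        best = max(best, abs(cum_bad / n_bad - cum_good / n_good))
        i = j
    return best


def best_by_ks(candidates, labels, minimum_ks):
    """Return (name of the best qualifying model or None, K-S value per model).

    candidates maps a model name to that model's scores on the validation set.
    """
    ks_by_model = {name: ks_statistic(s, labels) for name, s in candidates.items()}
    qualifying = {name: ks for name, ks in ks_by_model.items() if ks >= minimum_ks}
    if not qualifying:
        return None, ks_by_model
    return max(qualifying, key=qualifying.get), ks_by_model
```

A model whose scores separate the two classes completely reaches a K-S of 1, while scores that carry no information about the label give a value near 0.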
This completes the construction of the data risk control model. To apply the model in a real scenario, however, its output must also be delivered through a model data interface to the funding party that needs risk assessment and control. From the model's output, the funding party gains a comprehensive understanding of the risk situation of the customer it cooperates with, which facilitates cooperation between the funding party and the customer.
To further improve the accuracy and efficiency of the data risk control model, the invention also monitors in real time the data in the customer profile, the data in the risk data mart, the output of the data risk control model, and the number of calls to the model data interface. When the monitored data exceed a preset range, the data risk control model is constructed again using the above method.
The purpose of this process is to optimize the data risk control model in real time. The data in the customer profile and the risk data mart change continually as the customer changes. If the model were still built on the original, fixed customer profile, fixed risk data mart data, and fixed business indicators, its output would very likely diverge from the constantly changing customer information. The accuracy of the customer risk assessment would fall, and cooperation between the funding party and the customer would carry a high risk. By monitoring the four kinds of data listed above in real time and checking whether they exceed the preset range, the data risk control model can be adjusted whenever customer information changes, so that the model keeps pace with the customer and the customer's risk can be accurately assessed and effectively controlled at any time.
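The monitoring loop described above can be sketched as a range check that triggers a rebuild callback. The metric names, the ranges, and the functions find_breaches and monitor are hypothetical; the patent states only that monitored data are compared against a preset range.

```python
def find_breaches(metrics, allowed_ranges):
    """Return the names of monitored metrics that fall outside their preset range."""
    breaches = []
    for name, (low, high) in sorted(allowed_ranges.items()):
        value = metrics.get(name)
        # A metric that is missing altogether is treated as a breach (an assumption).
        if value is None or not (low <= value <= high):
            breaches.append(name)
    return breaches


def monitor(metrics, allowed_ranges, rebuild):
    """Call rebuild(breaches) when at least one metric is out of range."""
    breaches = find_breaches(metrics, allowed_ranges)
    if breaches:
        rebuild(breaches)
    return breaches
```

Passing the rebuild step in as a callback keeps the check independent of how the model is actually reconstructed, for example by rerunning the full construction method on fresh data.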
In addition, the invention addresses the security of data interaction. Before the output of the data risk control model is delivered to the funding party, it is encrypted using a blockchain-based technique, which converts the output into an efficient and secure data packet and effectively improves the security of data interaction.
The blockchain itself also provides a consensus mechanism and smart contracts. The consensus mechanism has two characteristics: "the minority obeys the majority" and "all nodes are equal". The first does not refer only to the number of nodes in the blockchain; it can also refer to each node's computing power, stake, or any other characteristic quantity that a computer can compare. The second means that, once a node satisfies the conditions, it has the right to propose a consensus result first, and that result may become the final consensus result once it is directly accepted by the other nodes.
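The "minority obeys the majority" idea, where the majority is measured by a comparable quantity such as computing power or stake rather than by node count, can be illustrated with a simplified weighted vote. This is a toy model for intuition only and does not reproduce any real consensus protocol; the function name weighted_consensus and the strict-majority rule are assumptions.

```python
from collections import defaultdict


def weighted_consensus(proposals, weights):
    """Return the value backed by a strict majority of total weight, else None.

    proposals maps a node id to the value it proposes; weights maps a node id
    to its weight (for example computing power or stake).
    """
    totals = defaultdict(float)
    for node, value in proposals.items():
        totals[value] += weights[node]
    total_weight = sum(weights[node] for node in proposals)
    winner = max(totals, key=totals.get)
    # Require strictly more than half of the weight, so a tie yields no result.
    return winner if totals[winner] * 2 > total_weight else None
```

With equal weights the outcome follows the node count; when one node holds most of the weight, its proposal prevails even if more nodes disagree.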
Taking Bitcoin as an example, it uses proof of work, and a non-existent record can be forged only by controlling a majority of the accounting power of the whole network (the so-called 51% attack). This is practically infeasible when the network has enough nodes, which effectively rules out forgery and ensures the security of blockchain data transmission.
Smart contracts operate on trusted, tamper-proof data and can automatically execute predefined rules and terms directly on the basis of that data. That is, if a partner breaches the contract, refuses to pay compensation, or refuses to continue fulfilling its obligations, the predefined rules and terms are executed automatically through the smart contract. This avoids delays caused by a party refusing to perform, and improves the efficiency of event handling. For the present method, the consensus mechanism of the blockchain effectively improves the security of the risk control service, and the smart contracts of the blockchain effectively improve the efficiency with which the service is executed.
In conclusion, a data risk control model built with this construction method can accurately describe, estimate, and simulate the correlations among a customer's risk factors, can use blockchain technology to ensure secure and efficient transmission of the model's output, and facilitates cooperation between the customer and the funding party.
Example 2:
Fig. 3 is a schematic diagram of the application of the data risk control model in a small and micro enterprise financing scenario according to embodiment 2 of the invention. The method is built on a big data platform based on blockchain technology. It starts by drawing customer profiles, designs the risk data mart according to the financial credit application scenario, uses mainstream AI algorithms in the modeling process of the data risk control model, finally deploys the data risk control model through the rule engine, and carries out technical interaction in the form of a model application interface.
Accordingly, the data source of the data risk control model in the invention is a big data platform based on blockchain encryption technology. The big data platform can be deployed on a public cloud, or deployed locally and custom-developed according to enterprise requirements. The platform comprises the following parts, which are independent of one another:
1) Customer portrait:
The basic attributes, behavior, risk indexes, credit investigation information and the like of the client are drawn from dimensions such as small and micro enterprises, personal credit, and internet big data. Enterprise data is collected from public data and from internal information authorized by the enterprise; personal data comes from credit reports authorized by the user. Web crawler technology is also used, and personal data is used only with the client's authorization. Enterprise business data requires the enterprise's permission, and the data model design and customized development of the data warehouse can follow the actual conditions of organizations such as a government big data bureau.
2) A rule engine:
Product users with some technical capability and business experience can define data wind control model indexes, predict performance, and deploy models themselves. Developing a data wind control model places high demands on a developer's theoretical and technical ability, so the path from bringing a model online to business operation is typically inefficient.
3) AI algorithm:
The algorithms for developing the data wind control model cover various mainstream standardized AI algorithm modules, which users can call freely and optimize with their own customizations. Professional wind control model developers can run a series of automated operations such as data integration and model development, which greatly improves development efficiency. Mainstream AI algorithms such as neural networks, logistic regression models and decision trees are provided through an application program interface. Model data is prepared mainly from the risk data mart, and users can also design derived indexes with the rule engine.
4) Model monitoring:
Model performance is monitored and updated regularly, and users can generate automatic reports from a standardized template or combine it with user-defined indexes as needed. For example, wind control in a financial credit scenario usually follows the customer life cycle, so the discriminating power of the data wind control model must be monitored continuously across business stages from pre-loan approval to post-loan monitoring and collection. The main monitoring indexes of a report include the percentage fluctuation of the indexes and the percentage migration of model performance.
5) Risk data mart:
Model indexes are enriched and refined based on application scenarios (credit, wind control, associated recommendation, etc.) and business topics (supply chain finance, consumer finance, etc.). The core function of the risk data mart is to provide data support for developing the data wind control model and deploying it online. Enterprise data in the risk data mart mainly comprises business operation data, financial data, enterprise legal-person data and the like; personal data comprises credit investigation data, internet behavior data and the like. All data is designed and collected on a compliance basis so that user privacy is not violated.
6) Model application interface API:
Mainstream data-exchange formats (XML, JSON and the like) are used to achieve efficient and friendly model deployment and interaction. Because plaintext exchange is taboo in data security, the output result of the model must be encrypted with a blockchain-based encryption technology and packaged into an efficient, secure data packet, so that interaction in business scenarios is both efficient and safe.
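The packaging of a model output for the interface can be sketched as follows. The source does not specify the blockchain-based encryption scheme, so this sketch deliberately swaps in something simpler: it serializes the result as JSON and attaches an HMAC-SHA256 tag so the receiver can detect tampering. It does not encrypt the payload. The function names, the field names `body` and `hmac_sha256`, and the shared key are all assumptions for illustration.

```python
import hashlib
import hmac
import json


def build_payload(result: dict, shared_key: bytes) -> str:
    """Serialize a model output and attach an integrity tag (not encryption)."""
    body = json.dumps(result, sort_keys=True, separators=(",", ":"))
    tag = hmac.new(shared_key, body.encode("utf-8"), hashlib.sha256).hexdigest()
    return json.dumps({"body": body, "hmac_sha256": tag})


def verify_payload(payload: str, shared_key: bytes) -> dict:
    """Recompute the tag and return the model output only if it matches."""
    envelope = json.loads(payload)
    expected = hmac.new(
        shared_key, envelope["body"].encode("utf-8"), hashlib.sha256
    ).hexdigest()
    if not hmac.compare_digest(expected, envelope["hmac_sha256"]):
        raise ValueError("payload failed integrity check")
    return json.loads(envelope["body"])
```

A real deployment would add confidentiality on top of this, using whatever encryption the cooperating parties agree on.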
The customer portrait, rule engine, AI algorithm, model monitoring, risk data mart, and model application interface described in 1) to 6) above are all provided in a blockchain environment. That is, the data wind control model is constructed from customer portrait data and risk data mart data stored in the blockchain environment; business indexes are set through the rule engine based on index data stored in the blockchain environment; and the AI algorithm model is trained and verified with the customer portrait data, the risk data mart data, and the business indexes so set.
After the data wind control model is deployed online, its output results can be transmitted through the model application interface to users registered in the blockchain environment. That is, only users legally registered in the blockchain environment have the right to use the data wind control model for risk assessment and control, and only they have the right to obtain the model's output results through the model application interface. This improves both the security of using the data wind control model and the security of transmitting its output results.
Moreover, because a blockchain is decentralized, transmitting output results does not depend on an additional third-party management body or hardware device and involves no central control. Point-to-point data transmission takes place directly within the blockchain environment, which keeps data acquisition secure and avoids data leakage.
Taking a data wind control process of a small micro enterprise financing scene as an example, an application process of a data wind control model is described, and the specific steps are as follows:
Step 1:
Establish a risk data mart according to the characteristics of the clients and the wind control requirements of the fund party. The indexes of the risk data mart provide data support for the subsequent AI algorithm model development and model monitoring. In this embodiment, the risk data mart is mainly drawn from the following aspects:
(1) Enterprise basic information distribution: includes the total number of enterprises, the number of enterprises with risk warnings (enterprises subject to enforcement for dishonesty, with legal violations, poor operation and the like), the distribution of enterprise age since establishment, the distribution of enterprise industries and the like.
(2) Monitoring enterprise risk information: including the statistics and tracking of various risk events (credit risk, market risk, operational risk, etc.) of enterprises and jurisdictions.
The enterprise risk information monitoring can play a role in financial risk early warning prompt, and the information can be used as an input index of a data wind control model and used for model development;
(3) Enterprise risk data mart operation monitoring: includes the technical indexes of the risk data mart (data volume, number of indexes, days of stable operation and the like) and its business indexes (source statistics for each type of data, whether the data sources meet compliance requirements and the like).
The purpose of risk data mart operation monitoring is to guarantee the stability of a data wind control model from a technical perspective.
For example, the variable of age is used as a weighting factor of the data wind control model, and the applicability of the weighting factor is considered according to actual conditions.
(4) Data wind control model operation monitoring: mainly includes the number of times partners call the model application interface and business-related statistical indexes (such as the distribution of approved credit limits, the number of risk warnings, the approval pass rate based on the data wind control model and the like).
The model operation monitoring is mainly used for verifying the application effect of the data wind control model.
For example, the output of a data wind control model may be a credit score from 0 to 100, where a higher score means better client credit. Clients with higher scores are then expected to have a higher approval pass rate. If they do not, the model is probably being misapplied, and the cause needs to be investigated and the model modified.
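The score-versus-pass-rate check described above can be sketched as follows. The band width of 20 points, the record layout `(score, approved)`, and both function names are assumptions for illustration only.

```python
from collections import defaultdict


def approval_rate_by_band(records, band_width=20):
    """Group (score, approved) records into score bands and return pass rates.

    Returns a list of (band_start, pass_rate) sorted by band_start.
    A score of exactly 100 is placed in the top band.
    """
    totals = defaultdict(int)
    approvals = defaultdict(int)
    for score, approved in records:
        band = min(int(score) // band_width * band_width, 100 - band_width)
        totals[band] += 1
        if approved:
            approvals[band] += 1
    return [(band, approvals[band] / totals[band]) for band in sorted(totals)]


def is_non_decreasing(rates):
    """True when the pass rate never drops as the score band rises."""
    values = [rate for _, rate in rates]
    return all(a <= b for a, b in zip(values, values[1:]))
```

A result of `False` from `is_non_decreasing` is the signal that, per the text, calls for investigating how the model is being used.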
Step 2:
Integrate the multidimensional data related to small and micro enterprises to form an enterprise customer portrait. The enterprise customer portrait data mainly comprises the following types:
(1) Basic information of the enterprise: mainly comprises registered capital, enterprise location, enterprise age since establishment and the like;
(2) Enterprise legal-person information: mainly comprises basic information of the enterprise's legal person, authorized enterprise credit investigation records and the like;
(3) Enterprise operation information: mainly comprises enterprise collection and payment records, tax payment records, water, electricity and coal usage data and the like;
(4) Enterprise public opinion information: mainly comprises enterprise credit investigation, public information from the police, procuratorate and courts, enterprise internet public sentiment and the like.
The purpose of obtaining enterprise customer representation data is to provide data support for subsequent steps such as AI algorithm modeling.
The risk data mart is constructed in Step 1, and the enterprise customer portrait is constructed in Step 2. The main difference between them is whether a specific application scenario exists: the business meaning of the risk data mart is generally clearer, for example focusing on a business direction such as risk or marketing.
Step 3:
Train and verify an AI algorithm model based on the customer portrait and the risk data mart. The AI algorithm model may come from a module of open source software, such as the Spark version of the statistical analysis software R, or may be developed from an innovative algorithm.
Taking a credit approval scenario as an example, the main steps of training and verifying the AI algorithm model in this embodiment are as follows:
(1) preparing training data of the data wind control model:
This mainly comprises defining the model's target variable (for example, loan repayment overdue by more than 30 days) and independent variables (multidimensional data on the small and micro enterprise), and random sampling from the risk data mart of Step 1 (a sample may be drawn at random from the full data, or the full data may be used directly).
(2) Training a plurality of models, selecting an optimal model training result:
Various models are trial-calculated and verified, including mainstream models such as a neural network model, a regression model and a decision tree model. The optimal model is selected on the basis of business rationality (whether each index of the data wind control model is meaningful to the business) and statistical performance (such as the K-S statistic), and is used as the data wind control model.
The K-S statistic (Kolmogorov-Smirnov statistic) is a test that compares an observed frequency distribution f(x) with a theoretical distribution g(x), or compares two observed distributions.
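For the two-sample case, the K-S statistic is the largest gap between the two empirical cumulative distributions. The sketch below is a minimal illustration, not the patent's implementation. In a credit setting the two inputs would typically be the model scores of overdue and non-overdue clients, which is an assumption here, and the function name `ks_statistic` is hypothetical.

```python
def ks_statistic(scores_a, scores_b):
    """Two-sample K-S statistic: max gap between the two empirical CDFs."""
    if not scores_a or not scores_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(scores_a), sorted(scores_b)
    i = j = 0
    max_gap = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        # Advance both samples past every value <= x, so ties are handled together.
        while i < len(a) and a[i] <= x:
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        max_gap = max(max_gap, abs(i / len(a) - j / len(b)))
    return max_gap
```

A value near 1 means the two groups are almost perfectly separated by the score; a value near 0 means the score does not distinguish them.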
Wherein, the main steps of training the multiple models in (2) comprise:
1) Import the prepared training data of the data wind control model into the model training software (such as SAS or R) in the format the software requires;
2) Perform a preliminary trial calculation and examine the business significance and statistical significance of the index weight coefficients of each model.
Business significance mainly refers to whether a model index is meaningful and compliant from a business standpoint. Indexes such as gender and age may be statistically significant (p-value below 0.01, the p-value being a key measure of statistical significance) but may violate the anti-discrimination requirements of regulatory compliance, in which case they cannot be included in the final model weights.
3) Verifying the model training result: after a model training result (for example, an enterprise credit score) is formed, a batch of data (for example, enterprise data from different historical periods) can be drawn at random from data outside the training set, and the model's statistical index (for example, the K-S statistic) is recalculated. If it deviates only slightly from the value obtained during training (for example, by less than 1%), the model result can be considered stable. A small-scale online test can then be tried, and after a period of time the data wind control model can be applied in full.
Verifying the training result by running the model on data from different dimensions ensures the robustness of the model.
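The verification in 3) can be sketched as drawing a holdout sample from outside the training data and comparing a statistic between the two. The source does not say whether the 1% deviation is relative or absolute; this sketch assumes a relative deviation. All function names and the fixed random seed are hypothetical.

```python
import random


def draw_holdout(all_ids, training_ids, size, seed=0):
    """Randomly pick `size` record ids that were not used for training."""
    pool = sorted(set(all_ids) - set(training_ids))
    if size > len(pool):
        raise ValueError("not enough records outside the training set")
    return random.Random(seed).sample(pool, size)


def relative_deviation(train_metric, holdout_metric):
    """Deviation of the holdout statistic relative to the training statistic."""
    if train_metric == 0:
        raise ValueError("training metric must be non-zero")
    return abs(holdout_metric - train_metric) / abs(train_metric)


def is_stable(train_metric, holdout_metric, tolerance=0.01):
    """True when the statistic moved by less than the tolerance (1% by default)."""
    return relative_deviation(train_metric, holdout_metric) < tolerance
```

For instance, a K-S statistic of 0.420 in training and 0.418 on the holdout sample is a relative deviation of about 0.5%, which this check treats as stable.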
Step 4: Monitor the data wind control model in real time after it goes online. The model is optimized by analyzing the deviation between the actual repayment behavior of small and micro enterprise owners and the behavior predicted by the model. The main flow is as follows:
1) preparation of model monitoring data set:
Extract data periodically (for example, monthly) according to the model target variable and independent variables of Step 3;
2) calculating a model monitoring index:
Typically, the PSI of the overdue rate predicted by the model is the main index observed. Taking a credit scenario as an example, a PSI below 0.1 is generally taken to mean that the model is highly stable; a PSI between 0.1 and 0.2 means stability is moderate and the model needs further study; a PSI above 0.2 means stability is poor and optimizing the model should be considered.
PSI is the Population Stability Index, defined as $\mathrm{PSI} = \sum_{i} (A_i - E_i) \ln(A_i / E_i)$, where $A_i$ is the actual proportion and $E_i$ is the expected proportion of samples in bin $i$.
For example, a trained logistic regression model outputs a probability, denoted p, for each prediction. Let p1 be the output on the test data set. Sort the test data set by p1 in ascending order and divide it into 10 equal parts with the same amount of data in each, then record the maximum and minimum predicted probability in each part. Next, use the model to predict on a new sample to obtain p2, and divide the new sample into 10 parts according to p2, using the upper and lower bounds of the 10 parts obtained on the test data set. The actual proportion is the share of new samples whose p2 falls within each pair of bounds defined by p1, and the expected proportion is the share of test-set samples in each part.
The significance of the PSI statistics is: if the prediction effect of the model is stable, the probability obtained by prediction on new data is consistent with the distribution of the initial modeling, otherwise, the prediction capability of the model is changed, and optimization adjustment needs to be carried out on the model.
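The PSI procedure just described can be sketched in a few lines. This is an illustrative sketch under stated assumptions: `baseline` plays the role of p1 (test-set probabilities) and `current` the role of p2 (new-sample probabilities); values equal to a bin's upper bound are counted in that bin; and a small `floor` replaces empty bins so the logarithm is defined. The floor and all function names are assumptions, not part of the source.

```python
import math
from bisect import bisect_left


def bin_edges(baseline, bins=10):
    """Upper bounds of the first bins-1 equal-count bins of the baseline."""
    ordered = sorted(baseline)
    n = len(ordered)
    if n < bins:
        raise ValueError("need at least one baseline score per bin")
    return [ordered[(k * n) // bins - 1] for k in range(1, bins)]


def bin_shares(values, edges):
    """Share of values per bin; bin i holds edges[i-1] < v <= edges[i]."""
    if not values:
        raise ValueError("values must not be empty")
    counts = [0] * (len(edges) + 1)
    for v in values:
        counts[bisect_left(edges, v)] += 1
    return [c / len(values) for c in counts]


def psi(baseline, current, bins=10, floor=1e-6):
    """Sum over bins of (actual - expected) * ln(actual / expected)."""
    edges = bin_edges(baseline, bins)
    expected = bin_shares(baseline, edges)
    actual = bin_shares(current, edges)
    total = 0.0
    for a, e in zip(actual, expected):
        a, e = max(a, floor), max(e, floor)  # avoid log(0) for empty bins
        total += (a - e) * math.log(a / e)
    return total


def stability_label(value):
    """Map a PSI value to the three ranges given in the text."""
    if value < 0.1:
        return "stable"
    if value <= 0.2:
        return "investigate"
    return "optimize"
```

When the new sample has the same distribution as the baseline, every term is zero and the PSI is 0; the further the distributions drift apart, the larger the sum becomes.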
3) Optimizing a data wind control model:
this step is performed on the premise that the behavior of the model is observed to change continuously over a period of time, e.g., a PSI greater than 0.2 for 3 months.
The optimization process of the data wind control model comprises business optimization and technical index optimization.
Business optimization comprises: reselecting the independent variables of the data wind control model, reconsidering the business rationality of the target variable, and then redeveloping the data wind control model;
Technical index optimization comprises: checking whether any model index can no longer be used for technical or compliance reasons, and considering extending the model with other types of risk data mart indexes.
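The trigger for optimization, a PSI above 0.2 sustained for three months, can be sketched as a simple streak check. The function name and the list-of-monthly-values input are assumptions for illustration.

```python
def needs_optimization(monthly_psi, threshold=0.2, months=3):
    """True when PSI exceeds the threshold for `months` consecutive readings."""
    streak = 0
    for value in monthly_psi:
        streak = streak + 1 if value > threshold else 0
        if streak >= months:
            return True
    return False
```

Requiring consecutive readings keeps a single noisy month from triggering a redevelopment of the model.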
Step 5: Technical interaction between a fund party (such as a bank) and the data wind control service provider is carried out through the data wind control model interface. The business indexes of the model interface are encrypted with a blockchain-based encryption technology rather than exchanged in plaintext.
In addition, step 5 is the result display and technical interaction of steps 1-4, and the cooperative parties can formulate the data interaction content of the model interface according to the business requirements and the confidentiality form.
Step 6: The data wind control service provider can perform automatic deployment and customized development of the rule engine according to the requirements of the fund party, which effectively improves the efficiency and accuracy of the fund party's wind control approval of small and micro enterprise owner clients.
The main process of the automatic deployment of the rule engine is as follows:
1) preparation of risk data mart:
Deployment of the data wind control model and debugging of the rule engine parameters are performed on the basis of the risk data mart. For example, a loan approver who needs to adjust a wind control rule has two options. One is to communicate the rule definition to a technical developer, who modifies the code in the back end. The other is to adjust, measure and launch the rule directly in the rule engine product, which requires business personnel to have some technical understanding. A skilled rule engine user can also define and develop custom data indexes in the rule engine product.
2) Modification and pre-measurement of data wind control rules:
For example, a loan approver may want to know how changing the wind control rule "age 18 to 60" to "age 18 to 50" would affect the distribution of the customer group and the pass rate. After a new age distribution index is generated in the rule engine product, the system can display the change in the customer group distribution and the effect on the pass rate after the rule is adjusted.
3) Automatic report monitoring of the data wind control model:
This step is the concrete technical implementation of the monitoring process described in Step 4. The indexes of the data wind control model are updated automatically (for example, a customer's loan repayment and overdue status changes daily and can be refreshed by automatically run code), and the latest model verification indexes (such as PSI and K-S) are then recalculated, so that the model is monitored automatically.
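The rule pre-measurement in 2) can be sketched as comparing the pass rate of the same applicant group under the current and the proposed age rule. The applicant layout (a dict with an `age` key) and the function names are assumptions for illustration; a real rule engine would evaluate many rules together.

```python
def pass_rate(applicants, min_age, max_age):
    """Share of applicants whose age falls inside the inclusive range."""
    if not applicants:
        raise ValueError("applicants must not be empty")
    passed = sum(1 for a in applicants if min_age <= a["age"] <= max_age)
    return passed / len(applicants)


def compare_rules(applicants, current=(18, 60), proposed=(18, 50)):
    """Pass rate before and after the rule change, plus the difference."""
    before = pass_rate(applicants, *current)
    after = pass_rate(applicants, *proposed)
    return {"before": before, "after": after, "change": after - before}
```

Running the comparison on historical applicants before launching the rule shows the approver how many clients the tighter range would exclude.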
In conclusion, the construction method of the data wind control model and the data wind control model provided by the invention effectively improve the accuracy with which the correlations among risk factors are described, estimated and simulated. The blockchain-based encryption technology effectively improves the security and efficiency of risk control with the data wind control model and of transmitting the model's output results. This reduces the risk posed by price changes in the financial market and facilitates cooperation between the fund party and the client.
The principle and the implementation of the present invention are explained in the present text by applying specific examples, and the above description of the examples is only used to help understanding the method and the core idea of the present invention; meanwhile, for a person skilled in the art, according to the idea of the present invention, the specific embodiments and the application range may be changed. In view of the above, the present disclosure should not be construed as limiting the invention.
Claims (10)
1. A method for constructing a data wind control model is characterized by comprising the following steps:
constructing a client portrait and risk data mart;
setting a business index by using a rule engine;
taking the customer portraits and the data in the risk data marts as data sources;
dividing the data source into a training data set and a verification data set according to a preset proportion;
training a plurality of AI algorithm models by using the training data set based on the service index to obtain an initial training model corresponding to each AI algorithm model;
verifying the initial training model corresponding to each AI algorithm model by using the verification data set based on the service index;
taking an initial training model with a verification result meeting a preset requirement as a data wind control model;
and after the data wind control model is deployed, performing wind control service only on the users registered in the specific block chain.
2. The method of claim 1, wherein constructing a client portrait specifically comprises:
and taking data describing basic attributes, behavior, risk indexes and credit investigation information of the client in multiple dimensions as client portrait data.
3. The method for constructing a data wind control model according to claim 1, wherein constructing a risk data mart specifically comprises:
and taking business operation data, financial data and enterprise legal person data as the data of the risk data mart.
4. The method according to claim 1, wherein the set service indicator is encrypted by using a blockchain-based encryption technique.
5. The method for constructing a data wind control model according to claim 1, wherein training an AI algorithm model using the training data set specifically comprises:
and training a neural network model, a regression model and a decision tree model by using the training data set.
6. The method for constructing the data wind control model according to claim 1, wherein the step of using the initial training model with the verification result meeting the preset requirement as the data wind control model specifically comprises:
and evaluating the verification result of the initial training model corresponding to each AI algorithm model based on the service reasonableness and the statistical index performance, and taking the initial training model corresponding to the AI algorithm model with the optimal evaluation result as a data wind control model.
7. The method for constructing the data wind control model according to claim 1, wherein the output result of the data wind control model is called by using a model data interface.
8. The method of constructing a data wind control model according to claim 7,
monitoring data in the customer portrait, data in the risk data mart, an output result of the data wind control model and the called times of the model data interface in real time;
and when the real-time monitored data exceed the preset range, reconstructing the data wind control model.
9. The method for constructing the data wind control model according to claim 7, wherein before the output result of the data wind control model is called, encryption is performed by using a block chain-based encryption technology.
10. A data wind control model constructed using the method of any one of claims 1 to 9.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110123940.XA CN112817939A (en) | 2021-01-29 | 2021-01-29 | Construction method of data wind control model and data wind control model |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112817939A true CN112817939A (en) | 2021-05-18 |
Family
ID=75860164
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110123940.XA Pending CN112817939A (en) | 2021-01-29 | 2021-01-29 | Construction method of data wind control model and data wind control model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112817939A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||