CN108446291A - Real-time scoring method and scoring system for user credit - Google Patents

Info

Publication number
CN108446291A
Authority
CN
China
Prior art keywords
data
user
real
basic data
flow computation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711444140.8A
Other languages
Chinese (zh)
Inventor
刘杰
徐磊
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Samoye Internet Nationwide Financial Services Inc
Original Assignee
Shenzhen Samoye Internet Nationwide Financial Services Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Samoye Internet Nationwide Financial Services Inc
Priority to CN201711444140.8A
Publication of CN108446291A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • Strategic Management (AREA)
  • Human Resources & Organizations (AREA)
  • Economics (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Marketing (AREA)
  • Databases & Information Systems (AREA)
  • General Business, Economics & Management (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Game Theory and Decision Science (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Technology Law (AREA)
  • Educational Administration (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The present invention proposes a real-time scoring method for user credit, comprising the following steps. A data acquisition step obtains the basic data of a user over the Internet. A data processing step imports the acquired basic data into a data-flow computation cluster for real-time data processing. A scoring step imports the processed basic data into one or several scoring models for scoring, where the scoring models are built from existing data. A data storage and feedback step saves the basic data, the processed basic data and the score to a database and obtains feedback information. A model evaluation and optimization step evaluates and optimizes the scoring models and the data-flow computation cluster according to the saved basic data, processed basic data, score and feedback information. An update step updates the scoring models and data-flow computation cluster in use with the optimized scoring models and data-flow computation cluster. The present invention also proposes a real-time scoring system for user credit.

Description

Real-time scoring method and scoring system for user credit
Technical field
The present invention relates to Internet technology, and more specifically to user evaluation technology in the field of Internet finance.
Background
Internet finance has developed rapidly along with Internet technology. Unlike the traditional financial industry, most Internet finance operations are carried out online rather than on site. Internet finance is fast and convenient, operates entirely online and needs no on-site processing, which greatly improves ease of use; it has therefore been widely welcomed and has grown quickly.
However, Internet finance operated online also faces difficulties. Risk control is the lifeline of finance, and different forms of finance place different requirements on risk control. Traditional finance leans toward offline risk control, auditing on the basis of customer data collected on site, credit inquiries, central bank credit reports, educational background and similar information. Traditional offline risk control is slow and its review cycle is long, but it identifies and controls risk relatively well. Internet finance, which takes an online form, is characterized by fast and convenient operation, so both users and enterprises pursue a faster pace; the traditional offline audit is clearly too slow to meet the needs of Internet finance. In the prior art, some Internet finance companies have moved the original offline audit online, but in essence the offline audit pattern remains. The only difference is that users submit data over the Internet; the enterprise collects the data, audits it manually in batches, and then returns the audit result over the Internet. Although this pattern is faster than a fully offline audit, it still has obvious defects:
First, the user cannot obtain the audit result in real time (that is, within a very short period) after submitting data online, but must still wait for the manual audit result of the processing batch; the pace required by Internet finance is not actually achieved.
Second, the content audited in this mode is essentially the same as in a traditional offline audit; only the audit time is compressed. Because the audited content is largely unchanged while the audit time is reduced, the ability to identify potential risks may decline, exposing the enterprise to hidden risks.
Summary of the invention
The present invention proposes a big-data-based technique that can score a user's credit in real time.
According to an embodiment of the present invention, a real-time scoring method for user credit is proposed, comprising:
a data acquisition step, in which the basic data of a user is obtained over the Internet;
a data processing step, in which the acquired basic data is imported into a data-flow computation cluster for real-time data processing;
a scoring step, in which the processed basic data is imported into one or several scoring models for scoring, wherein the scoring models are built from existing data;
a data storage and feedback step, in which the basic data, the processed basic data and the score are saved to a database, and feedback information is obtained;
a model evaluation and optimization step, in which the scoring models and the data-flow computation cluster are evaluated and optimized according to the saved basic data, processed basic data, score and feedback information;
an update step, in which the scoring models and data-flow computation cluster in use are updated with the optimized scoring models and data-flow computation cluster.
In one embodiment, the data acquisition step includes: obtaining the identity information of the user; and obtaining the basic information of the user, in which the basic information is obtained over the Internet from one or several third parties according to the user's identity information.
In one embodiment, the data processing step includes: importing the user's basic data into a data-flow computation framework, the framework being the Spark data-flow computation framework; classifying the user's basic data according to a data classification model, the data classification model corresponding to calculation dimensions; calculating each calculation dimension in real time in the data-flow computation framework using the basic data of the corresponding class; and saving the calculation results and supplying them to each scoring model.
In one embodiment, a user portrait is obtained from the user's basic data; basic data of different attributes in the user portrait corresponds to different calculation dimensions; the user portraits of several users are calculated along the same calculation dimension to obtain the segmented customer group data corresponding to that dimension.
In one embodiment, the feedback information includes the user's subsequent actual operating behavior.
In one embodiment, the model evaluation and optimization step includes: user-level evaluation and optimization, in which the scoring models and the data-flow computation cluster are evaluated and optimized according to the basic data, processed basic data, score and feedback information of a single user; and customer-group-level evaluation and optimization, in which, for a given calculation dimension, the scoring models and the data-flow computation cluster are evaluated and optimized according to the basic data, processed basic data, scores and feedback information of the users in one segmented customer group.
In one embodiment, each time the scoring models and the data-flow computation cluster have been optimized, the scoring models and data-flow computation cluster currently in use are updated in real time.
In one embodiment, the scoring model is built from existing data by logistic regression, random forest, GBDT or XGBoost.
In one embodiment, the database includes the unstructured database HBase and the relational database MySQL, and the data transmission middleware Kafka is used to access the database.
According to an embodiment of the present invention, a real-time scoring system for user credit is proposed, comprising: a data access port, a data-flow computation cluster, one or several scoring models, a database, and a model evaluation and optimization device. The data access port obtains the basic data of a user over the Internet. The acquired basic data is imported into the data-flow computation cluster, which performs real-time data processing. The processed basic data is imported into the scoring models for scoring, wherein the scoring models are built from existing data. The basic data, the processed basic data and the score are saved to the database. The model evaluation and optimization device obtains feedback information, evaluates and optimizes the scoring models and the data-flow computation cluster according to the saved basic data, processed basic data, score and feedback information, and updates the scoring models and data-flow computation cluster with the optimized versions.
In one embodiment, the data access port includes a data acquisition device, which obtains the identity information of the user and, according to that identity information, obtains the basic information of the user over the Internet from one or several third parties.
In one embodiment, the data-flow computation framework is the Spark data-flow computation framework. The data-flow computation framework classifies the user's basic data according to a data classification model that corresponds to calculation dimensions, calculates each calculation dimension in real time using the basic data of the corresponding class, saves the calculation results and supplies them to each scoring model.
In one embodiment, the feedback information includes the user's subsequent actual operating behavior. The model evaluation and optimization device also obtains a user portrait from the user's basic data; basic data of different attributes in the user portrait corresponds to different calculation dimensions; the user portraits of several users are calculated along the same calculation dimension to obtain the segmented customer group data corresponding to that dimension.
In one embodiment, the model evaluation and optimization performed by the model evaluation and optimization device includes: user-level evaluation and optimization, in which the scoring models and the data-flow computation cluster are evaluated and optimized according to the basic data, processed basic data, score and feedback information of a single user; and customer-group-level evaluation and optimization, in which, for a given calculation dimension, the scoring models and the data-flow computation cluster are evaluated and optimized according to the basic data, processed basic data, scores and feedback information of the users in one segmented customer group.
In one embodiment, each time the model evaluation and optimization device has optimized the scoring models and the data-flow computation cluster, the scoring models and data-flow computation cluster currently in use are updated in real time.
In one embodiment, the scoring model is built from existing data by logistic regression, random forest, GBDT or XGBoost. The database includes the unstructured database HBase and the relational database MySQL, and the data transmission middleware Kafka is used to access the database.
The real-time scoring method and real-time scoring system for user credit proposed by the present invention can obtain the basic information of a user over the Internet and use big data and data-flow techniques to score the user in real time along multiple dimensions; the score is made available to subsequent processing. The present invention also takes the user's subsequent actual operations as feedback to evaluate and optimize the models and the data flow, so that the scoring system evolves continuously on the principle of machine self-learning. The present invention can provide strong data and theoretical support for real-time user credit granting and for the management and control of lending risk in Internet finance.
Description of the drawings
The above and other features, properties and advantages of the present invention will become more apparent from the following description taken with the accompanying drawings and embodiments, in which the same reference numerals always denote the same features, wherein:
Fig. 1 shows the implementation process of the real-time scoring method for user credit according to an embodiment of the present invention.
Fig. 2 shows a structural block diagram of the real-time scoring system for user credit according to an embodiment of the present invention.
Detailed description
With the development of big data technology, more comprehensive information about a user can be obtained through big data, so that the user's risk can be assessed from real information about the user, which is more efficient than the traditional audit of background information. At the same time, thanks to its fast data processing capability, big data technology can complete the computation and processing of massive data within a very short time and can meet the user's "real-time" demand. As a result, online audit methods based on big data are gradually emerging and becoming an important means of audit in the field of Internet finance.
The present invention proposes a real-time scoring method for user credit. Fig. 1 shows the implementation process of the real-time scoring method for user credit according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
102, data acquisition step: the basic data of the user is obtained over the Internet. In one embodiment, the data acquisition step includes a step of obtaining the identity information of the user and a step of obtaining the basic information of the user. In the step of obtaining the basic information, the basic information of the user is obtained over the Internet from one or several third parties according to the user's identity information. The identity information is usually provided by the user and typically includes the ID card number, name, ID card validity period and ID card photo. In some embodiments, the user is also required to provide a mobile phone number as part of the identity information. With big data technology and means such as web crawlers, other information about the user can be obtained on the Internet from the user's identity information; this information is called the basic information of the user. The basic information may come from one or more third parties, such as telecom operators, banking systems, other Internet finance systems, credit reporting systems, social software, online trading software and online functional applications. The basic information may include: address book and call detail records, educational background, whether the user is on a blacklist, online shopping behavior, air travel information, online borrowing behavior, the relationship network of social accounts, online community behavior, credit reports, repayment status and so on.
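The acquisition step described above can be pictured with a minimal Python sketch. It assumes that each third-party source can be represented as a function taking the identity information and returning a dictionary; the functions `fetch_telecom` and `fetch_credit_bureau` and all field names are hypothetical stand-ins that return canned data, not real interfaces named in the patent.

```python
from typing import Callable, Dict, List

Identity = Dict[str, str]
Source = Callable[[Identity], Dict[str, object]]

def fetch_telecom(identity: Identity) -> Dict[str, object]:
    # Hypothetical stand-in for a telecom operator lookup (canned data).
    return {"call_record_count": 120, "phone": identity.get("phone", "")}

def fetch_credit_bureau(identity: Identity) -> Dict[str, object]:
    # Hypothetical stand-in for a credit reporting lookup (canned data).
    return {"on_blacklist": False, "overdue_count": 0}

def acquire_basic_data(identity: Identity, sources: List[Source]) -> Dict[str, object]:
    """Merge the basic information returned by each third-party source."""
    basic_data: Dict[str, object] = {"id_number": identity["id_number"]}
    for source in sources:
        basic_data.update(source(identity))
    return basic_data

identity = {"id_number": "ID-0001", "name": "Example User", "phone": "000"}
basic_data = acquire_basic_data(identity, [fetch_telecom, fetch_credit_bureau])
```

Keeping each source behind the same function signature means a new third party could be added by appending one more function to the list.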
104, data processing step: the acquired basic data is imported into the data-flow computation cluster for real-time data processing. In one embodiment, the data processing step includes the following process:
The user's basic data is imported into a data-flow computation framework, which is the Spark data-flow computation framework. The Spark data-flow computation framework is a distributed in-memory computing framework; it computes quickly and can process data continuously and in real time in the form of a data stream. In one embodiment, data is transferred using the data transmission middleware Kafka: for example, data is imported into the Spark data-flow computation framework through Kafka, and the results computed by the Spark data-flow computation framework are supplied to the scoring models or saved to the database through Kafka. Kafka is a high-throughput distributed publish-subscribe messaging system; subscribing to messages through it allows messages to be shared and notifies the relevant systems which data to receive. Kafka is suited to data transmission with large data volumes and low latency requirements.
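The flow of records through middleware into a streaming computation and out again can be sketched as follows. This is only an illustration of the pattern: in-memory `queue.Queue` objects stand in for Kafka topics and a plain function stands in for the Spark job, so none of the names below are Kafka or Spark APIs.

```python
import queue

# In-memory queues stand in for Kafka topics in this sketch.
raw_topic = queue.Queue()
processed_topic = queue.Queue()

def process_record(record: dict) -> dict:
    # Placeholder for the per-record computation done by the framework:
    # here it only counts the data fields other than the user id.
    return {"user_id": record["user_id"], "field_count": len(record) - 1}

def run_stream(source: queue.Queue, sink: queue.Queue) -> int:
    """Process each queued record in arrival order and publish the result."""
    handled = 0
    while True:
        try:
            record = source.get_nowait()
        except queue.Empty:
            break
        sink.put(process_record(record))
        handled += 1
    return handled

raw_topic.put({"user_id": "u1", "age": 21, "city": "Shenzhen"})
raw_topic.put({"user_id": "u2", "age": 35})
handled = run_stream(raw_topic, processed_topic)
first = processed_topic.get_nowait()
```

In a real deployment the loop would block on new messages rather than stop when the queue is empty; the early exit here just keeps the sketch finite.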
The user's basic data is classified according to a data classification model, which corresponds to calculation dimensions. The data classification model maps the user's basic data to the calculation dimensions required by the scoring models. A user portrait is obtained from the user's basic data, and basic data of different attributes in the user portrait corresponds to different calculation dimensions. For example, the user's basic data may include: gender, age, city, residential address, industry, company name, position, educational level, educational background, credit report, online shopping behavior statistics, online borrowing behavior statistics, repayment status, air travel data, address book and call detail records, social media accounts, social media relationship network, social media activity and so on. Among these:
Gender and age can be classed as basic attributes (corresponding to the age or gender calculation dimension);
City and residential address can be classed as regional attributes (corresponding to the region calculation dimension);
Industry, company name and position can be classed as work attributes (corresponding to the work calculation dimension);
Educational level and educational background can be classed as education attributes (corresponding to the education calculation dimension);
Credit report, online borrowing behavior statistics and repayment status can be classed as credit attributes (corresponding to the credit calculation dimension);
Online shopping behavior statistics and air travel data can be classed as behavior attributes (corresponding to the behavior calculation dimension);
Address book and call detail records, social media accounts, social media relationship network and social media activity can be classed as social relationship attributes (corresponding to the relationship network calculation dimension).
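The attribute grouping listed above can be expressed as a simple lookup table. The sketch below assumes the data classification model is no more than a field-to-dimension mapping, which the patent does not state; the field names and dimension labels are hypothetical.

```python
# Hypothetical field names mapped to the calculation dimensions listed above.
FIELD_TO_DIMENSION = {
    "gender": "basic", "age": "basic",
    "city": "region", "home_address": "region",
    "industry": "work", "company": "work", "position": "work",
    "education": "education",
    "credit_report": "credit", "online_borrowing": "credit", "repayment": "credit",
    "online_shopping": "behavior", "air_travel": "behavior",
    "contacts": "social", "call_details": "social", "social_accounts": "social",
}

def classify(basic_data: dict) -> dict:
    """Group a user's basic data by calculation dimension; unknown fields are skipped."""
    grouped: dict = {}
    for field, value in basic_data.items():
        dimension = FIELD_TO_DIMENSION.get(field)
        if dimension is not None:
            grouped.setdefault(dimension, {})[field] = value
    return grouped

grouped = classify({"age": 21, "gender": "M", "city": "Shenzhen", "nickname": "x"})
```

Here `nickname` is dropped because it has no entry in the mapping, while `age` and `gender` land together under the basic dimension.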
The data-flow computation framework calculates each calculation dimension in real time using the basic data of the corresponding class. As described above, when a given calculation dimension needs to be calculated, the basic data of the corresponding attribute can be selected for the computation.
The calculation results are saved and supplied to each scoring model.
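One way to picture the per-dimension calculation is a table of functions, one per calculation dimension, each reading only the data classed under that dimension. The two features below (on-time repayment share and purchases per month) and their field names are invented for illustration; the patent does not specify what is computed for each dimension.

```python
def credit_dimension(data: dict) -> float:
    # Hypothetical feature: share of past loans repaid on time.
    total = data.get("loans_total", 0)
    return data.get("loans_repaid_on_time", 0) / total if total else 0.0

def behavior_dimension(data: dict) -> float:
    # Hypothetical feature: online purchases per observed month.
    months = data.get("months_observed", 0)
    return data.get("purchase_count", 0) / months if months else 0.0

CALCULATORS = {"credit": credit_dimension, "behavior": behavior_dimension}

def compute_dimensions(grouped: dict) -> dict:
    """Compute one value per dimension using only that dimension's data."""
    return {
        dim: CALCULATORS[dim](data)
        for dim, data in grouped.items()
        if dim in CALCULATORS
    }

features = compute_dimensions({
    "credit": {"loans_total": 4, "loans_repaid_on_time": 3},
    "behavior": {"purchase_count": 12, "months_observed": 6},
})
```

The resulting dictionary is the kind of calculation result that would then be saved and handed to each scoring model.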
106, scoring step: the processed basic data is imported into one or several scoring models for scoring, wherein the scoring models are built from existing data. In one embodiment, the scoring model is obtained by modeling the existing data in the database with logistic regression, random forest, GBDT or XGBoost. What each scoring model scores, and how it scores, is decided by strategy. The specific types of strategy and scoring model and the modeling process of the scoring model are outside the scope of this invention; the present invention directly uses scoring models whose modeling has already been completed.
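As a hedged illustration of how a finished logistic regression model, one of the model types named above, turns processed data into a score, the sketch below applies the standard logistic function to a weighted sum of features. The weights, bias and feature names are made up for the example and are not fitted values or anything disclosed in the patent.

```python
import math

def logistic_score(features: dict, weights: dict, bias: float) -> float:
    """Return a probability-like score in (0, 1) from a logistic model."""
    z = bias + sum(weights[name] * features.get(name, 0.0) for name in weights)
    return 1.0 / (1.0 + math.exp(-z))

# Weights and bias are invented for illustration, not fitted values.
weights = {"credit": 2.0, "behavior": 0.5}
score = logistic_score({"credit": 0.75, "behavior": 2.0}, weights, bias=-2.5)
better = logistic_score({"credit": 1.0, "behavior": 2.0}, weights, bias=-2.5)
```

With these numbers the weighted sum for the first user is exactly zero, so the score sits at the midpoint of the logistic curve, and a higher credit feature pushes the score up.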
108, data storage and feedback step: the basic data, the processed basic data and the score are saved to the database, and feedback information is obtained. In one embodiment, the database used in the present invention includes the unstructured database HBase and the relational database MySQL, and the data transmission middleware Kafka is used to access the database. HBase is a distributed column-oriented unstructured database with fast queries; saving the basic data, the processed basic data and the score in HBase meets the requirement of real-time queries. MySQL is a relational database used to store some structured configuration information. The feedback information mainly refers to the actual behavior of the user. The score produced by the user credit scoring method is "estimated data" derived from the user's basic data and existing historical data; to verify whether the "estimated data" is correct, subsequent actual data must be obtained for verification. For example, in user credit scoring, the "estimated data" of the score represents an assessment of the user's willingness and ability to repay, but whether the user really repays must be judged from the user's actual behavior. Therefore, in one embodiment, the feedback information includes the user's subsequent actual operating behavior.
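The relationship between the stored "estimated data" and the later feedback can be sketched as below. A plain dictionary stands in for the database, and the record layout and function names are assumptions made for the example rather than the HBase or MySQL schema used by the invention.

```python
store = {}  # in-memory stand-in for the database

def save_score(user_id: str, basic_data: dict, features: dict, score: float) -> None:
    """Save the basic data, processed data and estimated score for one user."""
    store[user_id] = {
        "basic_data": basic_data,
        "features": features,
        "score": score,    # estimated probability of repayment
        "repaid": None,    # feedback, unknown until observed
    }

def add_feedback(user_id: str, repaid: bool) -> float:
    """Attach the observed outcome and return the absolute estimation error."""
    record = store[user_id]
    record["repaid"] = repaid
    return abs(record["score"] - (1.0 if repaid else 0.0))

save_score("u1", {"age": 21}, {"credit": 0.75}, score=0.8)
error = add_feedback("u1", repaid=True)
```

The returned error is one simple way to compare an estimate with the actual behavior; other comparison measures would fit the same structure.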
110, model evaluation and optimization step: the scoring models and the data-flow computation cluster are evaluated and optimized according to the saved basic data, processed basic data, score and feedback information. In one embodiment, the evaluation and optimization of the scoring models and the data-flow computation cluster comprises two levels: user-level evaluation and optimization, and customer-group-level evaluation and optimization. User-level evaluation and optimization evaluates and optimizes the scoring models and the data-flow computation cluster according to the basic data, processed basic data, score and feedback information of a single user. Customer-group-level evaluation and optimization, for a given calculation dimension, evaluates and optimizes the scoring models and the data-flow computation cluster according to the basic data, processed basic data, scores and feedback information of the users in one segmented customer group. A customer group is obtained as follows: a user portrait is obtained from the user's basic data; basic data of different attributes in the user portrait corresponds to different calculation dimensions; the user portraits of several users are calculated along the same calculation dimension to obtain the segmented customer group data corresponding to that dimension. Returning to the earlier example, the user's basic data may include: gender, age, city, residential address, industry, company name, position, educational level, educational background, credit report, online shopping behavior statistics, online borrowing behavior statistics, repayment status, air travel data, address book and call detail records, social media accounts, social media relationship network, social media activity and so on.
Gender and age can be classed as basic attributes (corresponding to the age or gender calculation dimension);
City and residential address can be classed as regional attributes (corresponding to the region calculation dimension);
Industry, company name and position can be classed as work attributes (corresponding to the work calculation dimension);
Educational level and educational background can be classed as education attributes (corresponding to the education calculation dimension);
Credit report, online borrowing behavior statistics and repayment status can be classed as credit attributes (corresponding to the credit calculation dimension);
Online shopping behavior statistics and air travel data can be classed as behavior attributes (corresponding to the behavior calculation dimension);
Address book and call detail records, social media accounts, social media relationship network and social media activity can be classed as social relationship attributes (corresponding to the relationship network calculation dimension).
Using the age calculation dimension, a customer group whose age falls within a certain range can be selected, for example young customers aged 20 to 22. Alternatively, by combining age with gender, young male customers aged 20 to 22 can be selected.
As another example, by combining the education calculation dimension with the age calculation dimension, well-educated young customers aged 20 to 22 with a bachelor's degree or above can be selected.
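The segment examples above (ages 20 to 22, optionally male, optionally with a bachelor's degree or above) can be sketched as a filter over user portraits. The portrait fields and the ordering of education levels in `EDUCATION_RANK` are assumptions made for this example.

```python
users = [
    {"user_id": "u1", "age": 21, "gender": "M", "education": "bachelor"},
    {"user_id": "u2", "age": 22, "gender": "F", "education": "high_school"},
    {"user_id": "u3", "age": 30, "gender": "M", "education": "master"},
]

# Hypothetical ordering of education levels, lowest to highest.
EDUCATION_RANK = {"high_school": 1, "associate": 2, "bachelor": 3, "master": 4}

def segment(users, min_age, max_age, gender=None, min_education=None):
    """Select users matching every supplied calculation-dimension filter."""
    selected = []
    for user in users:
        if not min_age <= user["age"] <= max_age:
            continue
        if gender is not None and user["gender"] != gender:
            continue
        if (min_education is not None
                and EDUCATION_RANK[user["education"]] < EDUCATION_RANK[min_education]):
            continue
        selected.append(user)
    return selected

young = segment(users, 20, 22)
young_male = segment(users, 20, 22, gender="M")
young_educated = segment(users, 20, 22, min_education="bachelor")
```

Each optional argument corresponds to one more calculation dimension combined into the segment definition.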
By combining different calculation dimensions, segmented customer groups with different attributes can be obtained. By comparing the feedback data of each segmented customer group with the "estimated data" obtained earlier by scoring, the scoring models and the data-flow computation cluster (mainly the data classification model within it) can be evaluated. This evaluation can effectively reveal where the scoring models and the data-flow computation cluster are ill-suited to a specific segmented customer group: for example, when the "estimated data" for a segmented customer group differs considerably from the actual feedback data, the scoring models and the data-flow computation cluster have a blind spot for that group and do not fit its characteristics. The scoring models and the data-flow computation cluster (mainly the data classification model within it) are then optimized according to the actual feedback, focusing on the points where the "estimated data" differs significantly from the actual feedback data. The optimization can combine the model with strategy, analyzing the best way to apply the scoring models together with strategy so that the scoring models are reasonably effective and stable in every segmented customer group. Specific schemes for optimizing the scoring models are outside the scope of this invention.
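The group-level comparison described above can be illustrated by measuring, for each segment, the gap between the mean estimated score and the observed repayment rate. The gap measure and the threshold of 0.2 are assumptions chosen for this sketch; the patent does not specify how the difference is quantified.

```python
from collections import defaultdict

def segment_gaps(records, threshold=0.2):
    """Return segments whose mean score differs from the observed
    repayment rate by more than the threshold."""
    totals = defaultdict(lambda: [0.0, 0.0, 0])  # score sum, outcome sum, count
    for record in records:
        entry = totals[record["segment"]]
        entry[0] += record["score"]
        entry[1] += 1.0 if record["repaid"] else 0.0
        entry[2] += 1
    gaps = {}
    for name, (score_sum, outcome_sum, count) in totals.items():
        gap = abs(score_sum / count - outcome_sum / count)
        if gap > threshold:
            gaps[name] = gap
    return gaps

records = [
    {"segment": "young", "score": 0.9, "repaid": False},
    {"segment": "young", "score": 0.7, "repaid": True},
    {"segment": "mid", "score": 0.8, "repaid": True},
    {"segment": "mid", "score": 0.9, "repaid": True},
]
gaps = segment_gaps(records)
```

In this toy data the "young" segment is flagged because its estimates run well above the observed outcomes, which is the kind of blind spot the evaluation is meant to surface.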
112, update step: the scoring models and data-flow computation cluster in use are updated with the optimized scoring models and data-flow computation cluster. In one embodiment, each time the scoring models and the data-flow computation cluster have been optimized, the scoring models and data-flow computation cluster currently in use are updated in real time. Because the present invention uses data-flow processing, the data of each incoming user is processed in real time. Therefore, once a user-level or customer-group-level evaluation and optimization has been completed and the scoring models and data-flow computation cluster have been optimized, the scoring models and data-flow computation cluster currently in use are updated immediately, so that the data of the next user is processed with the optimized scoring models and data-flow computation cluster.
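One possible way to replace a model that is in use so that the next request sees the new version is to keep the model behind a small holder object guarded by a lock. This is an assumed design for illustration, not the update mechanism disclosed in the patent; the class and method names are hypothetical.

```python
import threading

class ModelHolder:
    """Holds the scoring function in use and lets it be replaced between requests."""

    def __init__(self, model):
        self._model = model
        self._lock = threading.Lock()

    def score(self, features: dict) -> float:
        # Read the current model under the lock, then score outside it.
        with self._lock:
            model = self._model
        return model(features)

    def update(self, new_model) -> None:
        # Swap in the optimized model; later calls to score() use it.
        with self._lock:
            self._model = new_model

holder = ModelHolder(lambda features: 0.5)
before = holder.score({})
holder.update(lambda features: 0.9)
after = holder.score({})
```

A request already being scored finishes with the old model, while every request that starts after `update` returns is handled by the new one.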
It should be noted that the real-time scoring method for user credit of the present invention uses real-time data-flow processing. From the point of view of a single user, the data acquisition step, data processing step, scoring step, data storage and feedback step, model evaluation and optimization step and update step are executed in sequence. From the point of view of the method as a whole, many users are being processed at the same moment and each user is at a different stage, so the data acquisition step, data processing step, scoring step, data storage and feedback step, model evaluation and optimization step and update step may be executed alternately or simultaneously. Therefore, although the steps are numbered in the above description, the numbering is for convenience only and does not limit the order in which the steps are executed.
Present invention further teaches a kind of real-time points-scoring systems of user credit, refering to what is shown in Fig. 2, Fig. 2 is disclosed according to this The structure diagram of the real-time points-scoring system of the user credit of one embodiment of invention.The real-time points-scoring system packet of the user credit It includes:Data access mouth 202, data-flow computation cluster 204, one or several Rating Models 206, database 208 and model evaluation And optimization device 210.
Data access mouth 202 obtains the basic data of user by internet.In one embodiment, data access mouth 202 include data acquisition facility, and data acquisition facility obtains the identity information of user and the identity information according to user, by mutual Networking obtains the basic information of the user from one or several third parties.Data access mouth 202 executes data acquisition step above-mentioned Rapid 102, detail is not repeated to describe herein.
The acquired basic data is imported into the data stream computing cluster 204, which processes the data in real time. In one embodiment, the data stream computing framework 204 is the Spark data stream computing framework. The data stream computing framework 204 classifies the user's basic data according to a data classification model, where the data classification model corresponds to the calculation dimensions. The framework calculates each calculation dimension in real time using the basic data of the corresponding category, saves the calculation results, and provides them to each scoring model. The data stream computing cluster 204 executes the aforementioned data processing step 104; the details are not repeated here.
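The classification and per-dimension calculation described above can be sketched in plain Python. This is not Spark code: a generator stands in for the stream, and the field names, dimension names, and formulas in `FIELD_TO_DIMENSION` and `DIMENSION_CALCULATORS` are assumptions made purely for illustration.

```python
from collections import defaultdict

# Hypothetical data classification model: maps each basic-data field to a calculation dimension.
FIELD_TO_DIMENSION = {
    "monthly_income": "repayment_capacity",
    "existing_debt": "repayment_capacity",
    "overdue_count": "credit_history",
    "on_time_payments": "credit_history",
}

# Hypothetical per-dimension calculations applied to the fields of the matching category.
DIMENSION_CALCULATORS = {
    "repayment_capacity": lambda d: d.get("monthly_income", 0) - d.get("existing_debt", 0),
    "credit_history": lambda d: d.get("on_time_payments", 0) - 2 * d.get("overdue_count", 0),
}


def classify(record):
    """Group a user's basic data by calculation dimension; unknown fields are ignored."""
    grouped = defaultdict(dict)
    for field, value in record.items():
        dimension = FIELD_TO_DIMENSION.get(field)
        if dimension is not None:
            grouped[dimension][field] = value
    return dict(grouped)


def process_stream(records):
    """Process records one at a time as they arrive, yielding one result per user."""
    for record in records:
        grouped = classify(record)
        yield {dim: DIMENSION_CALCULATORS[dim](fields) for dim, fields in grouped.items()}


incoming = [
    {"monthly_income": 8000, "existing_debt": 3000,
     "overdue_count": 1, "on_time_payments": 10, "nickname": "x"},
]
results = list(process_stream(incoming))
```

Each yielded dictionary is what would be saved and handed to the scoring models; in a Spark deployment the same per-record logic would run inside the streaming job rather than a local generator.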
The processed basic data is imported into one or several scoring models 206 for scoring, where the scoring models are built from existing data. In one embodiment, the scoring model 206 is built from existing data using logistic regression, random forest, GBDT, or XGBoost. The scoring model 206 executes the aforementioned scoring step 106; the details are not repeated here.
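Of the modeling techniques listed, logistic regression is the simplest to show. The sketch below assumes the weights have already been fitted on existing data; the feature names, the weights, and the mapping of the predicted default probability onto a 300 to 850 score range are all assumptions for illustration and are not specified in the patent.

```python
import math


def logistic_probability(features, weights, bias):
    """Predicted probability of default from a fitted logistic regression."""
    z = bias + sum(weights[name] * features.get(name, 0.0) for name in weights)
    return 1.0 / (1.0 + math.exp(-z))


def credit_score(features, weights, bias, low=300, high=850):
    """Map the default probability linearly onto a score range (higher is better)."""
    p_default = logistic_probability(features, weights, bias)
    return round(high - (high - low) * p_default)


# Hypothetical fitted parameters: overdue payments raise risk, on-time payments lower it.
weights = {"overdue_count": 0.8, "on_time_payments": -0.1}
bias = -1.0

score_good = credit_score({"overdue_count": 0, "on_time_payments": 10}, weights, bias)
score_risky = credit_score({"overdue_count": 3, "on_time_payments": 10}, weights, bias)
```

With these parameters the user with three overdue payments receives a lower score than the user with none, which is the only behavior the sketch is meant to demonstrate.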
The database 208 stores the basic data, the processed basic data, and the scores. In one embodiment, the database includes the unstructured database HBase and the relational database MySQL, and the data transmission middleware Kafka is used to read from and write to the database. The database 208 executes the data storage part of the aforementioned data storage and feedback step 108; the details are not repeated here.
The model evaluation and optimization device 210 obtains feedback information, evaluates and optimizes the scoring model and the data stream computing cluster according to the stored basic data, processed basic data, scores, and feedback information, and updates the scoring model and data stream computing cluster according to the optimized scoring model and data stream computing cluster. In one embodiment, the feedback information includes the user's subsequent actual behavior. The model evaluation and optimization device 210 obtains a user profile from the user's basic data; basic data with different attributes in the user profile corresponds to different calculation dimensions. The user profiles of several users are calculated along the same calculation dimension to obtain segmented user group data for that calculation dimension. After the segmented user groups have been obtained, the model evaluation and optimization performed by the device 210 takes place at two levels: user-level evaluation and optimization, and user-group-level evaluation and optimization. User-level evaluation and optimization evaluates and optimizes the scoring model and data stream computing cluster according to the basic data, processed basic data, score, and feedback information of a single user. User-group-level evaluation and optimization, based on a calculation dimension, evaluates and optimizes the scoring model and data stream computing cluster according to the basic data, processed basic data, scores, and feedback information of the users in one segmented user group. In one embodiment, each time the model evaluation and optimization device 210 optimizes the scoring model and data stream computing cluster, the scoring model and data stream computing cluster currently in use are updated in real time. The model evaluation and optimization device 210 executes the feedback part of the aforementioned data storage and feedback step 108, the model evaluation and optimization step 110, and the update step 112; the details are not repeated here.
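The two evaluation levels can be sketched as follows. The patent does not state which evaluation metric is used; this sketch assumes the squared error between the predicted default probability and the observed outcome (the Brier score), and the record fields `p_default`, `defaulted`, and the profile attribute `age_band` are hypothetical.

```python
from collections import defaultdict


def squared_error(predicted, actual):
    """Per-user error between the predicted default probability and the observed outcome."""
    return (predicted - actual) ** 2


def evaluate(records, segment_key):
    """Return user-level errors and mean errors per segmented user group."""
    # User-level evaluation: one error value per user.
    per_user = {r["user_id"]: squared_error(r["p_default"], r["defaulted"]) for r in records}

    # User-group-level evaluation: group users by one profile attribute (calculation dimension).
    by_segment = defaultdict(list)
    for r in records:
        by_segment[r["profile"][segment_key]].append(per_user[r["user_id"]])
    per_segment = {segment: sum(errors) / len(errors) for segment, errors in by_segment.items()}
    return per_user, per_segment


# Feedback information: whether each user actually defaulted after being scored.
records = [
    {"user_id": "u1", "p_default": 0.1, "defaulted": 0, "profile": {"age_band": "20-30"}},
    {"user_id": "u2", "p_default": 0.5, "defaulted": 1, "profile": {"age_band": "20-30"}},
    {"user_id": "u3", "p_default": 0.5, "defaulted": 0, "profile": {"age_band": "30-40"}},
]
per_user, per_segment = evaluate(records, "age_band")
```

A segment whose mean error is noticeably higher than the others would be a natural candidate for group-level optimization, for example by refitting the model on that segment's data.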
The real-time user credit scoring method and real-time user credit scoring system proposed by the present invention obtain the user's basic information through the Internet and use big data and data stream technology to score the user in real time across multiple dimensions; the score is provided for use in subsequent processing. The present invention also uses the user's subsequent actual behavior as feedback to evaluate and optimize the modeling and the data stream, so that the scoring system evolves continuously through machine self-learning. The present invention can provide strong data and theoretical support for real-time user credit granting and lending risk control in Internet finance.
The above embodiments are provided to enable those skilled in the art to implement or use the present invention. Those skilled in the art may make various modifications or changes to the above embodiments without departing from the inventive concept of the present invention. The scope of protection of the present invention is therefore not limited by the above embodiments, but should be the broadest scope consistent with the inventive features recited in the claims.

Claims (16)

1. A real-time user credit scoring method, characterized by comprising:
a data acquisition step of obtaining a user's basic data through the Internet;
a data processing step of importing the acquired basic data into a data stream computing cluster for real-time data processing;
a scoring step of importing the processed basic data into one or several scoring models for scoring, wherein the scoring models are built from existing data;
a data storage and feedback step of saving the basic data, the processed basic data, and the score to a database, and obtaining feedback information;
a model evaluation and optimization step of evaluating and optimizing the scoring model and the data stream computing cluster according to the stored basic data, processed basic data, score, and feedback information;
an update step of updating the scoring model and data stream computing cluster in use according to the optimized scoring model and data stream computing cluster.
2. The real-time user credit scoring method according to claim 1, characterized in that the data acquisition step comprises:
obtaining the user's identity information;
obtaining the user's basic information: according to the user's identity information, obtaining the user's basic information from one or several third parties through the Internet.
3. The real-time user credit scoring method according to claim 1, characterized in that the data processing step comprises:
importing the user's basic data into a data stream computing framework, the data stream computing framework being the Spark data stream computing framework;
classifying the user's basic data according to a data classification model, the data classification model corresponding to calculation dimensions;
calculating, by the data stream computing framework, each calculation dimension in real time using the basic data of the corresponding category;
saving the calculation results and providing the calculation results to each scoring model.
4. The real-time user credit scoring method according to claim 3, characterized in that a user profile is obtained from the user's basic data, basic data with different attributes in the user profile corresponds to different calculation dimensions, and the user profiles of several users are calculated along the same calculation dimension to obtain segmented user group data for that calculation dimension.
5. The real-time user credit scoring method according to claim 4, characterized in that the feedback information includes the user's subsequent actual behavior.
6. The real-time user credit scoring method according to claim 5, characterized in that the model evaluation and optimization step comprises:
user-level evaluation and optimization: evaluating and optimizing the scoring model and the data stream computing cluster according to the basic data, processed basic data, score, and feedback information of a single user;
user-group-level evaluation and optimization: based on a calculation dimension, evaluating and optimizing the scoring model and the data stream computing cluster according to the basic data, processed basic data, scores, and feedback information of the users in one segmented user group.
7. The real-time user credit scoring method according to claim 6, characterized in that each time the scoring model and the data stream computing cluster are optimized, the scoring model and data stream computing cluster currently in use are updated in real time.
8. The real-time user credit scoring method according to claim 1, characterized in that the scoring model is built from existing data using logistic regression, random forest, GBDT, or XGBoost.
9. The real-time user credit scoring method according to claim 1, characterized in that the database includes the unstructured database HBase and the relational database MySQL, and the data transmission middleware Kafka is used to read from and write to the database.
10. A real-time user credit scoring system, characterized by comprising:
a data access port, which obtains a user's basic data through the Internet;
a data stream computing cluster, into which the acquired basic data is imported and which processes the data in real time;
one or several scoring models, into which the processed basic data is imported for scoring, wherein the scoring models are built from existing data;
a database, in which the basic data, the processed basic data, and the score are saved;
a model evaluation and optimization device, which obtains feedback information, evaluates and optimizes the scoring model and the data stream computing cluster according to the stored basic data, processed basic data, score, and feedback information, and updates the scoring model and the data stream computing cluster according to the optimized scoring model and data stream computing cluster.
11. The real-time user credit scoring system according to claim 10, characterized in that the data access port includes a data acquisition device, which obtains the user's identity information and, according to the user's identity information, obtains the user's basic information from one or several third parties through the Internet.
12. The real-time user credit scoring system according to claim 10, characterized in that the data stream computing framework is the Spark data stream computing framework;
the data stream computing framework classifies the user's basic data according to a data classification model, the data classification model corresponding to calculation dimensions; the data stream computing framework calculates each calculation dimension in real time using the basic data of the corresponding category, saves the calculation results, and provides the calculation results to each scoring model.
13. The real-time user credit scoring system according to claim 12, characterized in that
the feedback information includes the user's subsequent actual behavior;
the model evaluation and optimization device further obtains a user profile from the user's basic data, basic data with different attributes in the user profile corresponds to different calculation dimensions, and the user profiles of several users are calculated along the same calculation dimension to obtain segmented user group data for that calculation dimension.
14. The real-time user credit scoring system according to claim 13, characterized in that the model evaluation and optimization performed by the model evaluation and optimization device comprises:
user-level evaluation and optimization: evaluating and optimizing the scoring model and the data stream computing cluster according to the basic data, processed basic data, score, and feedback information of a single user;
user-group-level evaluation and optimization: based on a calculation dimension, evaluating and optimizing the scoring model and the data stream computing cluster according to the basic data, processed basic data, scores, and feedback information of the users in one segmented user group.
15. The real-time user credit scoring system according to claim 10, characterized in that each time the model evaluation and optimization device optimizes the scoring model and the data stream computing cluster, the scoring model and data stream computing cluster currently in use are updated in real time.
16. The real-time user credit scoring system according to claim 10, characterized in that
the scoring model is built from existing data using logistic regression, random forest, GBDT, or XGBoost;
the database includes the unstructured database HBase and the relational database MySQL, and the data transmission middleware Kafka is used to read from and write to the database.
CN201711444140.8A 2017-12-27 2017-12-27 The real-time methods of marking and points-scoring system of user credit Pending CN108446291A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711444140.8A CN108446291A (en) 2017-12-27 2017-12-27 The real-time methods of marking and points-scoring system of user credit


Publications (1)

Publication Number Publication Date
CN108446291A true CN108446291A (en) 2018-08-24

Family

ID=63190740

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711444140.8A Pending CN108446291A (en) 2017-12-27 2017-12-27 The real-time methods of marking and points-scoring system of user credit

Country Status (1)

Country Link
CN (1) CN108446291A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109360085A (en) * 2018-09-27 2019-02-19 中国银行股份有限公司 A kind of bank client responsible investigation method and system
CN109472439A (en) * 2018-09-13 2019-03-15 深圳市买买提信息科技有限公司 Credit estimation method, device, equipment and system
CN109815257A (en) * 2019-01-16 2019-05-28 四川驹马科技有限公司 Scalable real-time High Availabitity portrait algorithm service method and its system
CN110399988A (en) * 2019-07-31 2019-11-01 中国工商银行股份有限公司 Equipment portrait generation method and system
CN112084486A (en) * 2020-09-08 2020-12-15 中国平安财产保险股份有限公司 User information verification method and device, electronic equipment and storage medium
CN112258314A (en) * 2020-10-19 2021-01-22 天元大数据信用管理有限公司 Financial wind-control credit investigation system and method based on flow calculation technology
CN112347343A (en) * 2020-09-25 2021-02-09 北京淇瑀信息科技有限公司 Customized information pushing method and device and electronic equipment
CN112446555A (en) * 2021-01-26 2021-03-05 支付宝(杭州)信息技术有限公司 Risk identification method, device and equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1680953A (en) * 2004-07-05 2005-10-12 中国银行股份有限公司 Risk analyzing system and method for customer of financial enterprise
CN101493913A (en) * 2008-01-23 2009-07-29 阿里巴巴集团控股有限公司 Method and system for assessing user credit in internet
CN105894336A (en) * 2016-05-25 2016-08-24 北京比邻弘科科技有限公司 Mobile Internet-based big data mining method and system
CN107194715A (en) * 2017-04-07 2017-09-22 广东精点数据科技股份有限公司 The construction method of social action data model
CN107330785A (en) * 2017-07-10 2017-11-07 广州市触通软件科技股份有限公司 A kind of petty load system and method based on the intelligent air control of big data


Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109472439A (en) * 2018-09-13 2019-03-15 深圳市买买提信息科技有限公司 Credit estimation method, device, equipment and system
CN109360085A (en) * 2018-09-27 2019-02-19 中国银行股份有限公司 A kind of bank client responsible investigation method and system
CN109815257A (en) * 2019-01-16 2019-05-28 四川驹马科技有限公司 Scalable real-time High Availabitity portrait algorithm service method and its system
CN110399988A (en) * 2019-07-31 2019-11-01 中国工商银行股份有限公司 Equipment portrait generation method and system
CN112084486A (en) * 2020-09-08 2020-12-15 中国平安财产保险股份有限公司 User information verification method and device, electronic equipment and storage medium
CN112347343A (en) * 2020-09-25 2021-02-09 北京淇瑀信息科技有限公司 Customized information pushing method and device and electronic equipment
CN112347343B (en) * 2020-09-25 2024-05-28 北京淇瑀信息科技有限公司 Custom information pushing method and device and electronic equipment
CN112258314A (en) * 2020-10-19 2021-01-22 天元大数据信用管理有限公司 Financial wind-control credit investigation system and method based on flow calculation technology
CN112446555A (en) * 2021-01-26 2021-03-05 支付宝(杭州)信息技术有限公司 Risk identification method, device and equipment
CN112446555B (en) * 2021-01-26 2021-05-25 支付宝(杭州)信息技术有限公司 Risk identification method, device and equipment


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180824
