CN111667356A - Multidimensional big data intelligent risk screening system - Google Patents


Info

Publication number
CN111667356A
Authority
CN
China
Prior art keywords
data
module
risk
screening
file
Legal status
Pending
Application number
CN202010477432.7A
Other languages
Chinese (zh)
Inventor
陈建
龙泳先
Current Assignee
Beijing Ruizhi Tuyuan Technology Co ltd
Original Assignee
Beijing Ruizhi Tuyuan Technology Co ltd
Application filed by Beijing Ruizhi Tuyuan Technology Co ltd
Priority to CN202010477432.7A
Publication of CN111667356A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F21/6245 Protecting personal data, e.g. for financial or medical purposes

Landscapes

  • Engineering & Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • General Health & Medical Sciences (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • Economics (AREA)
  • General Business, Economics & Management (AREA)
  • Technology Law (AREA)
  • Strategic Management (AREA)
  • Marketing (AREA)
  • Development Economics (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computer Hardware Design (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a multidimensional big data intelligent risk screening system in the technical field of data processing, intended for risk screening based on big data. The system comprises a data source module, a data preprocessing module, a data modeling module, a rating module, a risk screening module and an interaction module, wherein the data source module comprises a data collector and dealer service data. The data source module obtains relevant data from multiple parties, from which various derived variables, scores and risk levels are calculated; throughout this process, the data preprocessing module and the data modeling module ensure the accuracy, integrity and consistency of the data. A bank or other organization compiles the matching key information of applicants into a file, and an operator places the file in a specified directory to request early-warning level information. The risk screening module performs data matching, the interaction module returns the applicant's characteristic data in accordance with the data service contract, and files stored on disk are automatically deleted at scheduled intervals after transmission.

Description

Multidimensional big data intelligent risk screening system
Technical Field
The invention relates to the technical field of data processing, and in particular to a multidimensional big data intelligent risk screening system.
Background
With the rapid development of credit business in recent years, changes in the policy environment and intensifying market competition, customers' circumstances change quickly and post-loan inspection has become increasingly important. To better prevent the deterioration of credit risk, further improve the quality of post-loan management and strengthen risk prevention and control, companies provide post-loan risk screening services, and banks, small loan companies and consumer finance institutions all need such screening. A risk screening system can identify high-risk groups through scoring services such as customer credit scores, quick lookup of score segments, customer profiles, customer early-warning levels, online time, online status, offline risk early warning and anti-fraud indicators. It is used to assess the level of fraud risk in advance and provides risk early warning throughout the whole process before, during and after the loan.
A search of the prior art found Chinese patent application No. 201510620821.X, which discloses a screening system and a screening method for big data criminal partners. In that system, an acquisition module acquires signaling data, a preprocessing module performs correlation analysis on the signaling data, a summarizing module summarizes the correlation analysis data, and a page display module displays results according to a query. The corresponding method comprises collecting the signaling data of a designated user, performing correlation analysis on and summarizing the signaling data, and displaying the criminal partnering trajectory through a geographic information platform. The screening system and method of that patent have the following disadvantage: they cannot perform early warning and risk screening based on the acquired big data.
Disclosure of Invention
The invention aims to overcome the shortcomings of the prior art by providing a multidimensional big data intelligent risk screening system.
To achieve this object, the invention adopts the following technical solution:
A multidimensional big data intelligent risk screening system comprises a data source module, a data preprocessing module, a data modeling module, a rating module, a risk screening module and an interaction module. The data source module comprises a data collector, dealer service data, partner data and a third-party data market. The data preprocessing module applies data cleaning, data reduction, data integration and data transformation techniques. The data modeling module builds a mathematical model using logistic regression to predict customer risk. Based on the output of the data modeling module, the rating module identifies people with a low probability of repayment; specifically, people are divided into seven risk levels: A, B, C, D, E, F and G. The risk screening module is communicatively connected to the rating module and comprises a data matching module. The interaction module is communicatively connected to the risk screening module.
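The rating step can be illustrated with a minimal Python sketch. The patent does not state how model outputs are mapped onto levels A to G, so the equal-width probability bands below, and the function name `risk_level`, are hypothetical assumptions for illustration only.

```python
# Hypothetical mapping: the patent does not specify cut points between levels,
# so seven equal-width bands of predicted non-repayment probability are assumed.
LEVELS = ["A", "B", "C", "D", "E", "F", "G"]


def risk_level(p_default: float) -> str:
    """Map a predicted probability of non-repayment in [0, 1] to a level.

    A is the lowest-risk level and G the highest.
    """
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("probability must lie in [0, 1]")
    # Each band has width 1/7; p = 1.0 is clamped into the last band (G).
    index = min(int(p_default * len(LEVELS)), len(LEVELS) - 1)
    return LEVELS[index]
```

In practice the band boundaries would presumably be calibrated on historical repayment data rather than fixed at equal widths.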
Preferably, the data collector gathers customer behavior information on PC or mobile terminals through software means such as APIs, SDKs and JavaScript (JS).
Preferably, the securities dealer service data mainly comprises centralized bidding transaction information for securities traded publicly and centrally, including block trades, negotiated transfers and after-hours trades, as well as data from the investment systems through which users buy and sell, such as securities dealers' online investment platforms and investment analysis and decision systems.
Preferably, the partner data mainly consists of information provided by organizations that cooperate with the software developer and reflects customers' behavioral preferences, consumption and other relevant circumstances; it includes public account data, e-commerce site data and media data.
Preferably, the third-party data market comprises blacklist data providers, telecommunication consumption data providers, financial consumption data providers and other data providers.
Preferably, the data cleaning technique removes noise from the data and corrects inconsistencies; the data reduction technique reduces the data volume by sampling, deleting redundant features or clustering; the data integration technique consolidates data from multiple sources into a coherent data store, such as a data warehouse; and the data transformation technique scales data into a smaller range, such as 0.0 to 1.0.
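Scaling data into the range 0.0 to 1.0 is commonly done with min-max normalization. The sketch below assumes that technique (the patent names only the target range), and the function `min_max_scale` is hypothetical.

```python
from typing import List


def min_max_scale(values: List[float], new_min: float = 0.0,
                  new_max: float = 1.0) -> List[float]:
    """Linearly rescale values into [new_min, new_max] (min-max normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant attribute has no spread; map every value to new_min.
        return [new_min for _ in values]
    scale = (new_max - new_min) / (hi - lo)
    return [new_min + (v - lo) * scale for v in values]
```

Because the minimum and maximum are taken from the data itself, a single extreme value compresses all other values toward one end, which is one reason cleaning is usually applied before transformation.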
Preferably, the function L in the logistic regression is generally the sigmoid function
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
The loss function of logistic regression is
$$L(y_1, y_2) = -\left[y_2 \log y_1 + (1 - y_2)\log(1 - y_1)\right]$$
where $y_1$ is the predicted probability and $y_2$ is the true label. The cost function is defined as the average of the loss over the m training samples:
$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L\left(y_1^{(i)}, y_2^{(i)}\right)$$
It measures the average error between the predicted and true results. The optimization objective is to minimize the cost function J(w, b), which optimizes the model, and this minimization can be carried out by gradient descent.
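A minimal Python sketch of the sigmoid, the cross-entropy loss and the cost J(w, b) defined above, for feature vectors x with weights w. The function names, the list-based data layout and the clipping constant `eps` (which keeps log() finite) are assumptions, not part of the patent.

```python
import math
from typing import Sequence


def sigmoid(z: float) -> float:
    """Numerically stable logistic function 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)  # avoids overflow of exp(-z) for large negative z
    return e / (1.0 + e)


def loss(y1: float, y2: int, eps: float = 1e-12) -> float:
    """Cross-entropy L(y1, y2); y1 is the predicted probability, y2 the 0/1 label."""
    y1 = min(max(y1, eps), 1.0 - eps)  # clip so log() stays finite
    return -(y2 * math.log(y1) + (1 - y2) * math.log(1.0 - y1))


def cost(w: Sequence[float], b: float,
         X: Sequence[Sequence[float]], y: Sequence[int]) -> float:
    """J(w, b): mean loss over the m training samples."""
    m = len(X)
    total = 0.0
    for xi, yi in zip(X, y):
        z = sum(wj * xij for wj, xij in zip(w, xi)) + b
        total += loss(sigmoid(z), yi)
    return total / m
```

With all parameters at zero every prediction is 0.5, so the cost equals log 2 regardless of the labels, a convenient sanity check.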
Preferably, a client compiles the matching key information of applicants into a file, and an operator places the file in a specified directory to request early-warning level information. The interaction module passes the request to the risk screening module, obtains the relevant information from the rating module, and returns the applicant's characteristic data in accordance with the data service contract.
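The file-based exchange described above might look like the following Python sketch. It assumes a CSV request file with one applicant per line, a caller-supplied `lookup` function standing in for the risk screening and rating modules, and a `contract_fields` list standing in for the fields permitted by the data service contract; all of these names and formats are hypothetical.

```python
import csv
from pathlib import Path
from typing import Callable, Dict, List, Optional, Sequence

# Hypothetical signature: given one request row, return the applicant's
# characteristic data, or None when no match is found.
Lookup = Callable[[List[str]], Optional[Dict[str, str]]]


def process_request_file(request_path: Path, response_path: Path,
                         lookup: Lookup, contract_fields: Sequence[str]) -> int:
    """Answer each request line with one response line; return the match count."""
    matched = 0
    with request_path.open(newline="", encoding="utf-8") as src, \
            response_path.open("w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        for row in csv.reader(src):
            if not row:
                continue  # skip blank lines
            record = lookup(row)
            if record is None:
                writer.writerow(row + ["NO_MATCH"] + [""] * len(contract_fields))
                continue
            matched += 1
            # Return only the fields covered by the data service contract.
            writer.writerow(row + ["MATCH"] + [record.get(f, "") for f in contract_fields])
    return matched
```

Filtering by `contract_fields` means any extra attributes held internally are never written to the response file.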
Preferably, while providing the screening service, the risk screening module generates monitoring service data (concurrency, error counts, slow queries and the like) and sends it to a monitoring center. The monitoring center raises alarms according to monitoring rules and generates system reports; alarms are delivered by email, SMS, phone call and similar channels through a message center. The monitoring center can also configure the service system and perform dynamic configuration of the system's service capacity.
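The alarm logic of the monitoring center could be sketched as simple threshold rules over the monitoring service data. The metric names, thresholds, channel names and the `MonitoringRule` type below are hypothetical; the patent does not specify the rule format, and actual delivery through the message center is left out.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class MonitoringRule:
    metric: str          # e.g. "concurrency", "error_count", "slow_queries"
    threshold: float     # raise an alarm when the observed value exceeds this
    channels: List[str]  # e.g. ["mail", "sms", "call"]


def evaluate_rules(metrics: Dict[str, float], rules: List[MonitoringRule]) -> List[str]:
    """Return one alarm message per channel for every rule that is violated."""
    alerts: List[str] = []
    for rule in rules:
        value = metrics.get(rule.metric)
        if value is not None and value > rule.threshold:
            for channel in rule.channels:
                alerts.append(f"{channel}: {rule.metric}={value} exceeds {rule.threshold}")
    return alerts
```

The returned messages would then be handed to whatever component plays the role of the message center.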
Preferably, the inputs and outputs of the interaction module are files in which each line holds the complete input or output data of a single call, and the interaction module includes a scheduled cleanup unit.
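A scheduled cleanup unit of this kind can be approximated by periodically deleting files older than a retention period. The Python sketch below assumes file modification time as the age criterion; the function names, the retention value and the polling loop are illustrative assumptions rather than the patent's design.

```python
import time
from pathlib import Path
from typing import List, Optional


def purge_expired_files(directory: Path, max_age_seconds: float,
                        now: Optional[float] = None) -> List[Path]:
    """Delete regular files in `directory` whose mtime is older than max_age_seconds."""
    current = time.time() if now is None else now
    removed: List[Path] = []
    for path in directory.iterdir():
        if path.is_file() and current - path.stat().st_mtime > max_age_seconds:
            path.unlink()
            removed.append(path)
    return removed


def run_cleanup_loop(directory: Path, max_age_seconds: float,
                     interval_seconds: float) -> None:
    """Hypothetical timer loop: purge expired files every interval_seconds."""
    while True:
        purge_expired_files(directory, max_age_seconds)
        time.sleep(interval_seconds)
```

The `now` parameter exists only to make the function testable; the loop could equally be replaced by an external scheduler such as cron calling `purge_expired_files`.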
Preferably, the data cleaning technique removes noise from the data and corrects inconsistencies through the following steps:
A1: Determine the customer behavior information.
The collected customer behavior information is organized into a training set S:
$$S = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
where $x_{ij}$ is the i-th training value of the j-th attribute of the customer behavior information, i ranges from 1 to m, m is the number of training values collected for the customer behavior information, j ranges from 1 to n, and n is the number of attributes contained in the customer behavior information.
A2: Calculate the cleaning threshold.
Figure BDA0002516262370000042
where $\beta_j$ is the cleaning threshold of the j-th attribute, k is a preset correction parameter, and E is the dynamic range of the standard deviation.
A3: Screen the noise in the data.
Figure BDA0002516262370000051
where $\lambda_{ij}$ is the screening result: $\lambda_{ij} = 1$ means that the i-th training value $x_{ij}$ of the j-th attribute of the customer behavior information needs no correction, and $\lambda_{ij} = 0$ means that $x_{ij}$ must be corrected in step A4.
A4: Correct the inconsistent data.
$$t_{ij} = \mathrm{MEDIAN}(x_{1j}:x_{mj}), \quad \lambda_{ij} = 0$$
where $t_{ij}$ is the corrected value of $x_{ij}$ and $\mathrm{MEDIAN}(x_{1j}:x_{mj})$ is the median of the j-th attribute's training values.
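Steps A1 to A4 can be sketched in Python as below. Only step A4 (replacing flagged values with the column median) is fully specified in the text; the formulas for the threshold β_j and the screening rule survive only as equation images. This sketch therefore assumes β_j = k · E · σ_j, with σ_j the population standard deviation of attribute j, and flags a value when it deviates from the column mean by more than β_j. Both are hypothetical stand-ins, and `clean_attributes` is an illustrative name.

```python
import statistics
from typing import List


def clean_attributes(S: List[List[float]], k: float, E: float) -> List[List[float]]:
    """Apply steps A1-A4 to an m x n matrix S, where S[i][j] is the i-th
    training value of attribute j. Returns a corrected copy of S."""
    m, n = len(S), len(S[0])
    T = [row[:] for row in S]  # corrected values t_ij; S itself is left unchanged
    for j in range(n):
        column = [S[i][j] for i in range(m)]
        mean_j = statistics.fmean(column)
        sigma_j = statistics.pstdev(column)
        beta_j = k * E * sigma_j  # ASSUMED form of the cleaning threshold (step A2)
        median_j = statistics.median(column)  # MEDIAN(x_1j : x_mj)
        for i in range(m):
            # ASSUMED screening rule (step A3): lambda_ij = 1 keeps the value.
            lam = 1 if abs(S[i][j] - mean_j) <= beta_j else 0
            if lam == 0:
                T[i][j] = median_j  # step A4: replace with the column median
    return T
```

Using the median for correction, as step A4 prescribes, has the useful property that the replacement value is barely affected by the outlier being replaced.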
The beneficial effects of the invention are as follows. The data source module acquires relevant data from multiple parties, from which various derived variables, scores and risk levels are calculated, and the data preprocessing module and the data modeling module ensure the accuracy, integrity and consistency of the data. A bank or other organization compiles the matching key information of applicants into a file, an operator places the file in a designated directory to request early-warning level information, the risk screening module performs data matching, and the interaction module returns the applicant's characteristic data in accordance with the data service contract. After transmission, a scheduled automatic deletion mechanism removes the files stored on disk, so the sensitive data in rating requests submitted by banks is not retained on disk. Data obtained from each data partner is used only to calculate early-warning levels and generate risk indicator variables, and personal sensitive data is never persisted to disk.
Drawings
Fig. 1 is a schematic flow diagram of the structure of the multidimensional big data intelligent risk screening system provided by the present invention;
Fig. 2 is a graph of the sigmoid function used in the multidimensional big data intelligent risk screening system provided by the present invention.
Detailed Description
The technical solution of the present patent will be described in further detail with reference to the following embodiments.
Reference will now be made in detail to embodiments of the present patent, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present patent and are not to be construed as limiting the present patent.
In the description of this patent, it is to be understood that the terms "center," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in the orientations and positional relationships indicated in the drawings for the convenience of describing the patent and for the simplicity of description, and are not intended to indicate or imply that the referenced devices or elements must have a particular orientation, be constructed and operated in a particular orientation, and are not to be considered limiting of the patent.
In the description of this patent, it should be noted that, unless otherwise expressly specified or limited, the terms "mounted," "connected," and "disposed" are to be construed broadly; for example, a connection or arrangement may be fixed, detachable, or integral. Those of ordinary skill in the art can understand the specific meanings of these terms in this patent according to the specific circumstances.
Example 1:
As shown in Fig. 1 and Fig. 2, a multidimensional big data intelligent risk screening system comprises a data source module, a data preprocessing module, a data modeling module, a rating module, a risk screening module and an interaction module. The data source module comprises a data collector, dealer service data, partner data and a third-party data market. The data preprocessing module applies data cleaning, data reduction, data integration and data transformation techniques. The data modeling module builds a mathematical model using logistic regression to predict customer risk. Based on the output of the data modeling module, the rating module identifies people with a low probability of repayment; specifically, people are divided into seven risk levels: A, B, C, D, E, F and G. The risk screening module is communicatively connected to the rating module and comprises a data matching module. The interaction module is communicatively connected to the risk screening module.
The data collector gathers customer behavior information on PC or mobile terminals through software means such as APIs, SDKs and JavaScript (JS).
The securities dealer service data mainly comprises centralized bidding transaction information for securities traded publicly and centrally, including block trades, negotiated transfers and after-hours trades, as well as data from the investment systems through which users buy and sell, such as securities dealers' online investment platforms and investment analysis and decision systems.
The partner data mainly consists of information provided by organizations that cooperate with the software developer and reflects customers' behavioral preferences, consumption and other relevant circumstances; it includes public account data, e-commerce site data, media data and the like.
The third-party data market comprises blacklist data providers, telecommunication consumption data providers, financial consumption data providers and other data providers.
The data cleaning technique removes noise from the data and corrects inconsistencies; the data reduction technique reduces the data volume by sampling, deleting redundant features or clustering; the data integration technique consolidates data from multiple sources into a coherent data store, such as a data warehouse; and the data transformation technique scales data into a smaller range, such as 0.0 to 1.0, which can improve the accuracy and efficiency of mining algorithms that involve distance measurements.
In logistic regression, w and b are the parameters to be solved. Logistic regression maps w x + b to a hidden state P through a function L, that is, P = L(w x + b), and then determines the value of the dependent variable by comparing P with 1 - P. When L is the logistic function, the model is logistic regression.
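When L is the sigmoid function, comparing P with 1 - P reduces to checking the sign of w x + b. The short derivation below makes this explicit, assuming the dependent variable is set to 1 when P > 1 - P:

```latex
\begin{aligned}
P > 1 - P &\iff P > \tfrac{1}{2} \\
          &\iff \frac{1}{1 + e^{-(w x + b)}} > \frac{1}{2} \\
          &\iff 1 + e^{-(w x + b)} < 2 \\
          &\iff e^{-(w x + b)} < 1 \\
          &\iff w x + b > 0
\end{aligned}
```

The decision boundary is therefore the hyperplane w x + b = 0.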
The function L in the logistic regression is generally the sigmoid function
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
The loss function of logistic regression is
$$L(y_1, y_2) = -\left[y_2 \log y_1 + (1 - y_2)\log(1 - y_1)\right]$$
where $y_1$ is the predicted probability and $y_2$ is the true label. The cost function is defined as the average of the loss over the m training samples:
$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L\left(y_1^{(i)}, y_2^{(i)}\right)$$
It measures the average error between the predicted and true results. The optimization objective is to minimize the cost function J(w, b), which optimizes the model, and this minimization can be carried out by gradient descent.
In gradient descent, w and b are updated as
$$w := w - \alpha \frac{\partial J(w, b)}{\partial w}$$
$$b := b - \alpha \frac{\partial J(w, b)}{\partial b}$$
where α is the learning rate, representing the step size of each move. The gradient $\frac{\partial J(w, b)}{\partial w}$ (and likewise $\frac{\partial J(w, b)}{\partial b}$), that is, the slope at the current point, specifies the direction of movement; to find the minimum, gradient descent moves in the negative direction of the gradient. Graphically, consider the curve of the cost function J with w or b on the abscissa. When the gradient (slope) is positive, the update rule moves w to the left, toward the lowest point of the curve where the gradient is 0; when the gradient (slope) is negative, the update rule moves w to the right, again toward the lowest point. This continues until the gradient reaches 0 and J attains its minimum, yielding the optimal parameters w and b.
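A minimal batch gradient descent sketch of these update rules for logistic regression, in plain Python. It uses the standard gradients of the cross-entropy cost, ∂J/∂w = (1/m) Σ (ŷ - y) x and ∂J/∂b = (1/m) Σ (ŷ - y), where ŷ is the sigmoid output; the learning rate α, the iteration count, the zero initialization and the function names are assumed values, not taken from the patent.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic_regression(X: Sequence[Sequence[float]], y: Sequence[int],
                              alpha: float = 0.1,
                              iterations: int = 1000) -> Tuple[List[float], float]:
    """Fit w and b by batch gradient descent on the cross-entropy cost J(w, b)."""
    m, n = len(X), len(X[0])
    w = [0.0] * n
    b = 0.0
    for _ in range(iterations):
        dw = [0.0] * n
        db = 0.0
        for xi, yi in zip(X, y):
            # Prediction error (y_hat - y) for this sample.
            error = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                dw[j] += error * xi[j]
            db += error
        # Simultaneous update: w := w - alpha * dJ/dw, b := b - alpha * dJ/db.
        w = [wj - alpha * dwj / m for wj, dwj in zip(w, dw)]
        b -= alpha * db / m
    return w, b
```

A fixed iteration count keeps the sketch simple; a stopping rule based on the gradient norm approaching 0 would match the description above more closely.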
Risk levels F and G are extension levels; users may decide for themselves whether these two levels are needed.
The data matching module matches data from the data source module using the applicant's matching key.
A client compiles the matching key information of applicants into a file, and an operator places the file in a specified directory to request early-warning level information. The interaction module passes the request to the risk screening module, obtains the relevant information from the rating module, and returns the applicant's characteristic data in accordance with the data service contract.
Further, the clients include banks, small loan companies, internet finance companies and other financial service institutions.
Further, the applicant's matching key comprises the personal identification number, name, commonly used mobile phone number and the bank account associated with the loan.
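Matching on these key fields usually requires normalizing them first so that records from different data sources line up. The normalization rules below (trimming whitespace, upper-casing the ID number's check character, keeping only digits in phone and account numbers) and the function names are hypothetical assumptions, not rules stated in the patent.

```python
from typing import Any, Dict, Tuple

# (ID number, name, mobile phone number, loan-associated bank account)
MatchKey = Tuple[str, str, str, str]


def _digits_only(text: str) -> str:
    """Drop spaces, dashes and other separators, keeping only digits."""
    return "".join(ch for ch in text if ch.isdigit())


def normalize_key(id_number: str, name: str, mobile: str, account: str) -> MatchKey:
    """Build a canonical matching key from raw applicant fields."""
    return (id_number.strip().upper(), name.strip(),
            _digits_only(mobile), _digits_only(account))


def match_sources(key: MatchKey,
                  sources: Dict[str, Dict[MatchKey, Dict[str, Any]]]) -> Dict[str, Dict[str, Any]]:
    """Collect the record each data source holds for the applicant, if any."""
    return {source: records[key] for source, records in sources.items() if key in records}
```

Sources that hold no record for the applicant are simply omitted from the result.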
While providing the screening service, the risk screening module generates monitoring service data (concurrency, error counts, slow queries and the like) and sends it to a monitoring center. The monitoring center raises alarms according to monitoring rules and generates system reports; alarms are delivered by email, SMS, phone call and similar channels through a message center. The monitoring center can also configure the service system and perform dynamic configuration of the system's service capacity.
The inputs and outputs of the interaction module are files in which each line holds the complete input or output data of a single call. The interaction module includes a scheduled cleanup unit that automatically deletes the files on disk at regular intervals, so no personal sensitive data is retained, whether it is input data from a credit institution or output data from a data partner.
When the system is used, the data source module acquires relevant data from multiple parties, from which various derived variables, scores and risk levels are calculated, and the data preprocessing module and the data modeling module ensure the accuracy, integrity and consistency of the data. A bank or other organization compiles the matching key information of applicants into a file, an operator places the file in a designated directory to request early-warning level information, the risk screening module performs data matching, and the interaction module returns the applicant's characteristic data in accordance with the data service contract. After transmission, a scheduled automatic deletion mechanism removes the files stored on disk, so the sensitive data in rating requests submitted by banks is not retained on disk. Data obtained from each data partner is used only to calculate early-warning levels and generate risk indicator variables, and personal sensitive data is never persisted to disk.
Example 2:
As shown in Fig. 1 and Fig. 2, a multidimensional big data intelligent risk screening system comprises a data source module, a data preprocessing module, a data modeling module, a rating module, a risk screening module and an interaction module. The data source module comprises a data collector, dealer service data, partner data and a third-party data market. The data preprocessing module applies data cleaning, data reduction, data integration and data transformation techniques. The data modeling module builds a mathematical model using logistic regression to predict customer risk. Based on the output of the data modeling module, the rating module identifies people with a low probability of repayment; specifically, people are divided into seven risk levels: A, B, C, D, E, F and G. The risk screening module is communicatively connected to the rating module and comprises a data matching module. The interaction module is communicatively connected to the risk screening module.
The data collector gathers customer behavior information on PC or mobile terminals through software means such as APIs, SDKs and JavaScript (JS).
The securities dealer service data mainly comprises centralized bidding transaction information for securities traded publicly and centrally, including block trades, negotiated transfers and after-hours trades, as well as data from the investment systems through which users buy and sell, such as securities dealers' online investment platforms and investment analysis and decision systems.
The partner data mainly consists of information provided by organizations that cooperate with the software developer and reflects customers' behavioral preferences, consumption and other relevant circumstances; it includes public account data, e-commerce site data, media data and the like.
The third-party data market comprises blacklist data providers, telecommunication consumption data providers, financial consumption data providers and other data providers.
The data cleaning technique removes noise from the data and corrects inconsistencies; the data reduction technique reduces the data volume by sampling, deleting redundant features or clustering; the data integration technique consolidates data from multiple sources into a coherent data store, such as a data warehouse; and the data transformation technique scales data into a smaller range, such as 0.0 to 1.0, which can improve the accuracy and efficiency of mining algorithms that involve distance measurements.
In logistic regression, w and b are the parameters to be solved. Logistic regression maps w x + b to a hidden state P through a function L, that is, P = L(w x + b), and then determines the value of the dependent variable by comparing P with 1 - P. When L is the logistic function, the model is logistic regression.
The function L in the logistic regression is generally the sigmoid function
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
The loss function of logistic regression is
$$L(y_1, y_2) = -\left[y_2 \log y_1 + (1 - y_2)\log(1 - y_1)\right]$$
where $y_1$ is the predicted probability and $y_2$ is the true label. The cost function is defined as the average of the loss over the m training samples:
$$J(w, b) = \frac{1}{m}\sum_{i=1}^{m} L\left(y_1^{(i)}, y_2^{(i)}\right)$$
It measures the average error between the predicted and true results. The optimization objective is to minimize the cost function J(w, b), which optimizes the model, and this minimization can be carried out by gradient descent.
In gradient descent, w and b are updated as
$$w := w - \alpha \frac{\partial J(w, b)}{\partial w}$$
$$b := b - \alpha \frac{\partial J(w, b)}{\partial b}$$
where α is the learning rate, representing the step size of each move. The gradient $\frac{\partial J(w, b)}{\partial w}$ (and likewise $\frac{\partial J(w, b)}{\partial b}$), that is, the slope at the current point, specifies the direction of movement; to find the minimum, gradient descent moves in the negative direction of the gradient. Graphically, consider the curve of the cost function J with w or b on the abscissa. When the gradient (slope) is positive, the update rule moves w to the left, toward the lowest point of the curve where the gradient is 0; when the gradient (slope) is negative, the update rule moves w to the right, again toward the lowest point. This continues until the gradient reaches 0 and J attains its minimum, yielding the optimal parameters w and b.
Risk levels F and G are extension levels; users may decide for themselves whether these two levels are needed.
The data matching module matches data from the data source module using the applicant's matching key.
A client compiles the matching key information of applicants into a file, and an operator places the file in a specified directory to request early-warning level information. The interaction module passes the request to the risk screening module, obtains the relevant information from the rating module, and returns the applicant's characteristic data in accordance with the data service contract.
Further, the applicant's matching key comprises the personal identification number, name, commonly used mobile phone number and the bank account associated with the loan.
While providing the screening service, the risk screening module generates monitoring service data (concurrency, error counts, slow queries and the like) and sends it to a monitoring center. The monitoring center raises alarms according to monitoring rules and generates system reports; alarms are delivered by email, SMS, phone call and similar channels through a message center. The monitoring center can also configure the service system and perform dynamic configuration of the system's service capacity.
The inputs and outputs of the interaction module are files in which each line holds the complete input or output data of a single call. The interaction module includes a scheduled cleanup unit that automatically deletes the files on disk at regular intervals, so no personal sensitive data is retained, whether it is input data from a credit institution or output data from a data partner.
When the system is used, the data source module obtains relevant data from multiple parties, from which various derived variables, scores and risk levels are calculated; throughout this process, the data preprocessing module and the data modeling module ensure the accuracy, integrity and consistency of the data. A bank or other organization compiles the matching key information of applicants into a file, an operator places the file in a specified directory to request early-warning level information, the risk screening module performs data matching, the interaction module returns the applicant's characteristic data in accordance with the data service contract, and files stored on disk are automatically deleted at scheduled intervals after transmission.
Example 3:
In the above embodiment, the data cleaning technique removes noise from the data and corrects inconsistencies through the following steps:
A1. Determine the customer behavior information:
the collected customer behavior information is organized into a training set S, which can be expressed as:
$$S = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
where $x_{ij}$ is the i-th training value of the j-th attribute of the customer behavior information, i ranges from 1 to m, m is the number of training values of the customer behavior information, j ranges from 1 to n, and n is the number of attributes contained in the customer behavior information; the attributes include at least customer preferences, consumption behaviors, and lifestyle;
A2. Calculate the cleaning threshold:
[Formula image for $\beta_j$ not reproduced in the source text]
where $\beta_j$ is the cleaning threshold of the j-th attribute, k is a preset correction parameter (generally $0 < k < 1$), and E is the dynamic range of the standard deviation;
A3. Screen noise in the data:
[Formula image for $\lambda_{ij}$ not reproduced in the source text]
where $\lambda_{ij}$ is the screening result: $\lambda_{ij} = 1$ indicates that the i-th training value $x_{ij}$ of the j-th attribute needs no correction, and $\lambda_{ij} = 0$ indicates that $x_{ij}$ must be corrected in step A4;
A4. Correct inconsistent data:
$$t_{ij} = \mathrm{MEDIAN}(x_{1j} : x_{mj}), \quad \lambda_{ij} = 0$$
where $t_{ij}$ is the corrected value of $x_{ij}$, and $\mathrm{MEDIAN}(x_{1j} : x_{mj})$ is the median of the j-th attribute over all m training values.
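Steps A1 to A4 can be sketched end to end on an m × n matrix. Because the exact formulas for $\beta_j$ and $\lambda_{ij}$ are not reproduced in the source, this sketch substitutes a hypothetical threshold $\beta_j = k \cdot E \cdot \sigma_j$ (with $\sigma_j$ the population standard deviation of attribute j) and flags a value ($\lambda_{ij} = 0$) when its distance from the attribute mean exceeds $\beta_j$. Only step A4, replacement by the attribute median, follows the patent's stated formula directly; the defaults `k=0.9` and `E=3.0` are arbitrary illustrations.

```python
import statistics
from typing import List


def clean_attribute_matrix(S: List[List[float]], k: float = 0.9,
                           E: float = 3.0) -> List[List[float]]:
    """Sketch of steps A1-A4 on S (rows = training values i, columns = attributes j).

    ASSUMPTION: beta_j = k * E * pstdev(column j) is a hypothetical stand-in for
    the patent's unreproduced threshold formula.
    """
    m, n = len(S), len(S[0])
    T = [row[:] for row in S]  # corrected copy; S itself is not modified
    for j in range(n):
        column = [S[i][j] for i in range(m)]
        mean_j = statistics.fmean(column)
        beta_j = k * E * statistics.pstdev(column)  # step A2 (hypothetical form)
        median_j = statistics.median(column)        # MEDIAN(x_1j : x_mj)
        for i in range(m):
            lam = 1 if abs(S[i][j] - mean_j) <= beta_j else 0  # step A3
            if lam == 0:
                T[i][j] = median_j                  # step A4: t_ij = median
    return T
```

For a single attribute with values 10, 11, 9, 10, 12, 10, 11, 9, 10, 100, only the value 100 falls outside the assumed threshold and is replaced by the column median, 10.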
Beneficial effects: in this technical solution, the collected customer behavior information is first organized into training data, a cleaning threshold is then calculated to screen out noisy data, and finally the screened inconsistent data are corrected.
The above describes only preferred embodiments of the present invention, but the scope of protection of the invention is not limited thereto. Any equivalent substitution or modification that a person skilled in the art could make within the technical scope disclosed herein, according to the technical solution and inventive concept of the present invention, shall fall within the scope of protection of the present invention.

Claims (11)

1. A multidimensional big data intelligent risk screening system, comprising a data source module, a data preprocessing module, a data modeling module, a rating module, a risk screening module and an interaction module, characterized in that: the data source module comprises a data collector, dealer service data, partner data and a third-party data market; the data preprocessing module employs a data cleaning technique, a data reduction technique, a data integration technique and a data transformation technique; the data modeling module establishes a mathematical model using logistic regression to predict client risk; the rating module identifies people with a low probability of repayment according to the data obtained by the data modeling module, specifically by dividing them into seven risk levels: A, B, C, D, E, F and G; the risk screening module is communicatively connected to the rating module and comprises a data matching module; and the interaction module is communicatively connected to the risk screening module.
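The seven-level rating in claim 1 can be illustrated with a small mapping from a model's predicted default probability to levels A through G. The cut-off values below are purely hypothetical; the patent does not state how the levels are bounded. Level A is assigned to the lowest predicted default risk and G to the highest.

```python
from bisect import bisect_right

LEVELS = "ABCDEFG"
# Hypothetical exclusive upper bounds on default probability for levels A..F;
# any probability at or above the last cut-off falls into level G.
CUTOFFS = [0.02, 0.05, 0.10, 0.20, 0.35, 0.50]


def risk_level(p_default: float) -> str:
    """Map a predicted default probability in [0, 1] to one of seven levels."""
    if not 0.0 <= p_default <= 1.0:
        raise ValueError("p_default must be in [0, 1]")
    return LEVELS[bisect_right(CUTOFFS, p_default)]
```

Using `bisect_right` on a sorted cut-off list keeps the mapping monotone and makes it easy to recalibrate the bands without touching the lookup logic.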
2. The system according to claim 1, wherein the data collector collects client behavior information at the PC end or mobile end by software means such as an API, an SDK, or JS (JavaScript) code.
3. The system according to claim 2, wherein the dealer business data includes centralized transaction data from securities firms' trading, such as open centralized bidding, block trading, negotiated transfers, and after-hours trading, as well as investment system data from users' trading on securities firms' online investment platforms, investment analysis and decision systems, and the like.
4. The system according to claim 3, wherein the partner data is mainly data reflecting client behavior preferences, consumption status, and other relevant conditions, provided by organizations that cooperate with the software developer, and includes official-account data, e-commerce site data, and media data.
5. The multidimensional big data intelligent risk screening system according to claim 4, wherein the third-party data market comprises blacklist data providers, telecom consumption data providers, financial consumption data providers, and other data providers.
6. The multidimensional big data intelligent risk screening system according to claim 5, wherein the data cleaning technique removes noise from the data and corrects inconsistencies; the data reduction technique reduces data size by aggregation, elimination of redundant features, or clustering; the data integration technique merges data from multiple data sources into a coherent data store, such as a data warehouse; and the data transformation technique scales data into a smaller interval, such as 0.0 to 1.0.
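One common way to scale data into an interval such as 0.0 to 1.0 is min-max normalization, sketched below. The patent does not name the transformation it uses, so min-max scaling is an assumed, swapped-in example; mapping a constant column to the lower bound is likewise an arbitrary choice made here to avoid division by zero.

```python
from typing import List


def min_max_scale(values: List[float], new_min: float = 0.0,
                  new_max: float = 1.0) -> List[float]:
    """Linearly rescale `values` so their minimum maps to new_min and maximum to new_max."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant input: no spread to scale, so map every value to new_min.
        return [new_min for _ in values]
    span = hi - lo
    return [new_min + (v - lo) * (new_max - new_min) / span for v in values]
```

For instance, the values 10, 20, 30 map to 0.0, 0.5, 1.0.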
7. The multidimensional big data intelligent risk screening system according to claim 1, wherein the logistic regression generally uses the sigmoid function
$$\sigma(z) = \frac{1}{1 + e^{-z}}$$
so that the predicted value is $y_1 = \sigma(w^{\top}x + b)$; the loss function of the logistic regression is
$$L(y_1, y_2) = -\left[\, y_2 \log y_1 + (1 - y_2) \log (1 - y_1) \,\right]$$
where $y_1$ is the predicted value and $y_2$ is the true label; the cost function is defined as the average of the loss function over the m training samples,
$$J(w, b) = \frac{1}{m} \sum_{i=1}^{m} L\big(y_1^{(i)}, y_2^{(i)}\big)$$
and measures the average error cost between the predicted results and the true results. The optimization goal is to minimize the cost function $J(w, b)$, which optimizes the model, and this minimization can be carried out by gradient descent.
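The model in claim 7 can be sketched as plain batch gradient descent on $J(w, b)$, using the fact that for the sigmoid with cross-entropy loss, $\partial L / \partial z = y_1 - y_2$, so the gradients are $\frac{1}{m}\sum (y_1 - y_2)\,x$ for $w$ and $\frac{1}{m}\sum (y_1 - y_2)$ for $b$. This is a minimal, dependency-free illustration, not the authors' implementation; the learning rate, epoch count, and the small clipping constant inside the logarithm are assumptions.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable sigmoid 1 / (1 + e^(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> float:
    """Predicted value y1 = sigmoid(w . x + b)."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def cost(w: Sequence[float], b: float, X: List[List[float]], y: List[int]) -> float:
    """J(w, b): mean cross-entropy loss over the m training samples."""
    eps = 1e-12  # keeps log() away from 0
    total = 0.0
    for xi, yi in zip(X, y):
        y1 = min(max(predict(w, b, xi), eps), 1.0 - eps)
        total += -(yi * math.log(y1) + (1 - yi) * math.log(1 - y1))
    return total / len(X)


def train(X: List[List[float]], y: List[int], lr: float = 0.1,
          epochs: int = 1000) -> Tuple[List[float], float]:
    """Minimize J(w, b) by batch gradient descent, starting from zeros."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        dw, db = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi  # dL/dz for sigmoid + cross-entropy
            for j in range(n):
                dw[j] += err * xi[j]
            db += err
        w = [wj - lr * g / m for wj, g in zip(w, dw)]
        b -= lr * db / m
    return w, b
```

On a toy set with inputs 0, 1, 2, 3 and labels 0, 0, 1, 1, the trained model assigns a probability below 0.5 to input 0 and above 0.5 to input 3, and its cost falls below the starting value of ln 2.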
8. The system according to claim 7, wherein, for the interaction module, a client compiles the key matching information of its applicants into a single file and an operator places the file in a specified directory to request early-warning level information; the interaction module forwards the file to the risk screening module, which obtains the relevant information from the rating module and returns the applicants' feature data in accordance with the data service contract.
9. The system according to claim 8, wherein, during the screening service, the risk screening module generates monitoring service data (concurrency, error count, slow queries, etc.) and sends it to the monitoring center; the monitoring center raises alarms according to monitoring rules and generates system reports; the message center delivers alarms by e-mail, SMS, phone call, and the like; and the monitoring center can further configure the service system to achieve dynamic configuration of the system's service capacity.
10. The multidimensional big data intelligent risk screening system according to claim 9, wherein the input and output data of the interaction module are files, each line of a file holding the complete input or output data of one call, and the interaction module is provided with a timed clearing unit.
11. The multidimensional big data intelligent risk screening system according to claim 6, wherein the data cleaning technique removes noise from the data and corrects inconsistencies through the following steps:
A1. Determine the customer behavior information:
the collected customer behavior information is organized into a training set S, which can be expressed as:
$$S = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1n} \\ x_{21} & x_{22} & \cdots & x_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ x_{m1} & x_{m2} & \cdots & x_{mn} \end{pmatrix}$$
where $x_{ij}$ is the i-th training value of the j-th attribute of the customer behavior information, i ranges from 1 to m, m is the number of training values of the customer behavior information, j ranges from 1 to n, and n is the number of attributes contained in the customer behavior information;
A2. Calculate the cleaning threshold:
[Formula image for $\beta_j$ not reproduced in the source text]
where $\beta_j$ is the cleaning threshold of the j-th attribute, k is a preset correction parameter, and E is the dynamic range of the standard deviation;
A3. Screen noise in the data:
[Formula image for $\lambda_{ij}$ not reproduced in the source text]
where $\lambda_{ij}$ is the screening result: $\lambda_{ij} = 1$ indicates that the i-th training value $x_{ij}$ of the j-th attribute needs no correction, and $\lambda_{ij} = 0$ indicates that $x_{ij}$ must be corrected in step A4;
A4. Correct inconsistent data:
$$t_{ij} = \mathrm{MEDIAN}(x_{1j} : x_{mj}), \quad \lambda_{ij} = 0$$
where $t_{ij}$ is the corrected value of $x_{ij}$, and $\mathrm{MEDIAN}(x_{1j} : x_{mj})$ is the median of the j-th attribute over all m training values.
CN202010477432.7A 2020-05-29 2020-05-29 Multidimensional big data intelligent risk screening system Pending CN111667356A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010477432.7A CN111667356A (en) 2020-05-29 2020-05-29 Multidimensional big data intelligent risk screening system

Publications (1)

Publication Number Publication Date
CN111667356A 2020-09-15

Family

ID=72385310

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010477432.7A Pending CN111667356A (en) 2020-05-29 2020-05-29 Multidimensional big data intelligent risk screening system

Country Status (1)

Country Link
CN (1) CN111667356A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114862284A (en) * 2022-07-06 2022-08-05 南通思普信息科技有限公司 Business intelligent module system based on cloud real-time semantic analysis

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109165840A (en) * 2018-08-20 2019-01-08 平安科技(深圳)有限公司 Risk profile processing method, device, computer equipment and medium
CN109272396A (en) * 2018-08-20 2019-01-25 平安科技(深圳)有限公司 Customer risk method for early warning, device, computer equipment and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
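J. SAUVOLA et al.: "Adaptive document image binarization", Pattern Recognition *
YU Zhaohui et al.: System Protection: A Practical Handbook of Network Security and Hacker Attack and Defense (Revised Edition), 31 May 2014, China Railway Publishing House *
WANG Rui: "A repayment probability prediction model for overdue microloan customers", China Master's Theses Full-text Database, Economics and Management Science *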

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination