CN114092222A - Threshold model establishing method, device, equipment and medium based on financial risk - Google Patents

Publication number
CN114092222A
Authority
CN
China
Prior art keywords
data
financial
model
group
financial institutions
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111326988.7A
Other languages
Chinese (zh)
Inventor
周玮理
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CCB Finetech Co Ltd
Original Assignee
CCB Finetech Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CCB Finetech Co Ltd
Priority to CN202111326988.7A
Publication of CN114092222A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00 Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03 Credit; Loans; Processing thereof
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Business, Economics & Management (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Software Systems (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Marketing (AREA)
  • Economics (AREA)
  • Development Economics (AREA)
  • Strategic Management (AREA)
  • Technology Law (AREA)
  • General Business, Economics & Management (AREA)
  • Medical Informatics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The embodiments of the invention disclose a threshold model establishing method, device, equipment and medium based on financial risk, relating to the technical field of financial risk management. The method comprises the following steps: grouping different financial institutions to determine at least two groups of financial institutions; removing heterogeneous data from the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data; and selecting a threshold of the threshold model separately for each group of sample loss data, so as to establish the threshold model corresponding to each group of financial institutions. Because the threshold can be selected according to the extreme-value scale interval of the sample loss data of each group, the threshold deviation of each group's threshold model is reduced. Because heterogeneous data is removed, heterogeneous data that exceeds the threshold is not used to build the threshold model. Both effects improve the stability of the threshold model and reduce its model risk.

Description

Threshold model establishing method, device, equipment and medium based on financial risk
Technical Field
The embodiment of the invention relates to the technical field of financial risk management, in particular to a threshold model establishing method, device, equipment and medium based on financial risk.
Background
In financial risk management, the measurement of risk is essential. Calculating the value at risk (VaR) and the expected loss (ES) is one of the most important means of financial risk measurement. In recent years, because extreme financial events have occurred frequently, extreme value theory has increasingly been adopted to calculate the value at risk and the expected loss. The threshold model, also called the peaks-over-threshold (POT) model, is a common extreme value theory model. Based on the generalized Pareto distribution (GPD), it models all observations that exceed a sufficiently large threshold, thereby characterizing the tail of the distribution, and the value at risk and the expected loss can then be calculated from the POT model. Before the POT model is used, its threshold must be selected appropriately: a threshold that is too high leaves too few observations, so the variance of the parameter estimates is too high, whereas a threshold that is too low yields biased or unreliable estimates.
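The calculation described above can be illustrated with the standard closed-form POT estimators. The following is a minimal sketch, not the method of this application: it assumes the GPD shape `xi` and scale `beta` have already been fitted to the excesses over the threshold `u`, and the function name and all numeric inputs are hypothetical.

```python
import math


def pot_var_es(u, xi, beta, n, n_exceed, q):
    """Value at risk and expected loss at confidence level q from a GPD tail.

    u        -- threshold of the POT model
    xi, beta -- GPD shape and scale fitted to the excesses over u (assumed given)
    n        -- total number of loss observations
    n_exceed -- number of observations greater than u
    q        -- confidence level, which must exceed 1 - n_exceed / n
    """
    if xi == 0 or xi >= 1:
        raise ValueError("this sketch covers the case xi != 0 and xi < 1 only")
    tail_ratio = (n / n_exceed) * (1.0 - q)
    var_q = u + (beta / xi) * (tail_ratio ** (-xi) - 1.0)
    es_q = var_q / (1.0 - xi) + (beta - xi * u) / (1.0 - xi)
    return var_q, es_q


# Hypothetical inputs: 2000 losses, 100 of them above a threshold of 1,000,000.
var_99, es_99 = pot_var_es(u=1_000_000.0, xi=0.2, beta=300_000.0,
                           n=2000, n_exceed=100, q=0.99)
print(f"VaR(99%) = {var_99:,.0f}, ES(99%) = {es_99:,.0f}")
```

Both estimates depend directly on `u` and on the number of exceedances, which is why the choice of threshold matters so much for the resulting risk figures.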
When the threshold of the POT model is selected, all loss data of the financial institution are first collected and then analyzed according to the size of the loss data, the transformed volatility, the degree of dispersion and the like; one loss value is selected as the threshold, and the loss data greater than the threshold are used to establish the POT model. Because all loss data of the financial institution include loss data caused by sudden events, the model risk of the POT model increases when such loss data exceed the threshold and are used to establish the POT model, and errors arise when the value at risk and the expected loss are calculated with the POT model. For example, for a financial institution a fraud event is sudden, and the loss data it causes are unstable. When loss data caused by a fraud event exceed the threshold and are used to build the POT model, the model becomes defective because of its data, which increases its model risk. Moreover, financial institutions differ in qualification and risk management level, so different financial institutions have different sensitivities to the same loss data and define and understand their loss data differently; the same loss may be an extreme event for one type of financial institution but not for another. Therefore, when different financial institutions use the same threshold to establish the POT model, threshold deviation occurs, which also increases the model risk of the POT model.
Illustratively, consider two financial institutions of similar type, A and B. In the same period, financial institution A issues loans of 50 million and incurs a loss of 2 million, while financial institution B issues loans of 2 million and incurs a loss of 2 million. Suppose the threshold of the POT model is selected from all loss data. If the threshold is greater than 2 million, the 2 million loss is below the threshold in terms of its value and therefore cannot be used to build the POT model. For financial institution B, however, the loss rate is 100%, which is a serious risk event, so the 2 million loss should be used to build the POT model; excluding it gives the POT model a larger model risk for financial institution B. If the threshold is less than 2 million, the threshold is too low for financial institution A and tends to produce biased or unsuitable estimates, which gives the POT model a larger model risk for financial institution A.
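The arithmetic behind this example can be checked in a few lines. The sketch below reads the amounts as 50 million and 2 million for institution A and 2 million and 2 million for institution B, which is an interpretation consistent with the stated 100% loss rate; the names are hypothetical.

```python
def loss_rate(loss, loan_amount):
    """Share of the issued loan amount that was lost."""
    return loss / loan_amount


# Amounts as read from the example above (an interpretation, not new data).
institutions = {
    "A": {"loan": 50_000_000, "loss": 2_000_000},
    "B": {"loan": 2_000_000, "loss": 2_000_000},
}

rates = {name: loss_rate(d["loss"], d["loan"]) for name, d in institutions.items()}
for name, rate in rates.items():
    print(f"institution {name}: loss rate {rate:.0%}")
```

The same absolute loss is 4% of A's lending but all of B's, so a single threshold defined on the absolute amount cannot treat the loss as ordinary for A and extreme for B at the same time.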
Disclosure of Invention
The embodiment of the invention provides a threshold model establishing method, device, equipment and medium based on financial risk, so as to reduce model risk of a threshold model.
In a first aspect, an embodiment of the present invention provides a threshold model establishing method based on financial risk, including:
grouping different financial institutions to determine at least two groups of financial institutions;
removing heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data;
and respectively selecting a threshold value of a threshold value model according to each group of the sample loss data so as to establish a threshold value model corresponding to each group of financial institutions.
Optionally, grouping different financial institutions includes:
selecting attribute data of at least two dimensions of the financial institution; wherein the attribute data of the financial institution includes at least two of: industrial and commercial information data, financial statement data, credit data, risk index data and financial product data;
calculating, for the attribute data of each dimension of the financial institution, a fluctuation degree characterization parameter of the attribute data of that dimension according to a first time slice to form an attribute data sample set; wherein the fluctuation degree characterization parameters of the financial institution attribute data include at least one of the following: standard deviation, volatility, coefficient of variation, maximum value, minimum value, mean value, year-on-year ratio, period-on-period ratio, and numerical ranking of the same characterization parameter;
and clustering based on a set clustering algorithm according to the attribute data sample set, and grouping the financial institutions according to clustering results.
Optionally, the set clustering algorithm is a K-means algorithm.
Optionally, clustering based on a set clustering algorithm according to the attribute data sample set, and grouping the financial institutions according to a clustering result, including:
selecting k samples in the attribute data sample set as initial center points;
grouping according to Euclidean distances from data points in the attribute data sample set to the initial central point;
and updating the initial central point according to the central point of each group for iteration until the central point of each group is unchanged, and determining samples in each group as a group of financial institutions.
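The three clustering steps above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions: the first k samples are used as the initial center points (the text only says that k samples are selected), each sample is a tuple of numeric features, and the function name and sample values are hypothetical.

```python
import math


def kmeans(samples, k, max_iter=100):
    """Group samples by Euclidean distance to iteratively updated center points."""
    centers = [tuple(s) for s in samples[:k]]  # assumption: first k samples as initial centers
    labels = []
    for _ in range(max_iter):
        # Assign every sample to its nearest center point.
        labels = [min(range(k), key=lambda j: math.dist(s, centers[j])) for s in samples]
        # Recompute each center as the mean of the samples assigned to it.
        new_centers = []
        for j in range(k):
            members = [s for s, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty group
        if new_centers == centers:  # stop when the center points no longer change
            break
        centers = new_centers
    return labels, centers


# Hypothetical two-feature samples, one row per financial institution.
samples = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (8.8, 9.2)]
labels, centers = kmeans(samples, k=2)
print(labels)
```

In practice the features would usually be standardized first, because Euclidean distance is dominated by whichever attribute has the largest numeric scale.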
Optionally, the slicing unit of the first time slice includes at least one of year, quarter and month.
Optionally, culling the heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model, including:
identifying the fraudulent customers of each group of financial institutions according to the anti-fraud model;
determining the historical loss data caused to the financial institution by the fraudulent customers as the heterogeneous data;
and removing the heterogeneous data from the historical loss data of the financial institution.
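The removal steps above amount to a filter over the loss records. In the sketch below the anti-fraud model is represented only by a dictionary of fraud scores and a cutoff, both of which are hypothetical stand-ins, as are the record layout and the values.

```python
def flag_fraud_customers(scores, cutoff=0.5):
    """Return the ids of customers whose fraud score reaches the cutoff.

    `scores` stands in for the output of an anti-fraud model (assumption).
    """
    return {cid for cid, score in scores.items() if score >= cutoff}


def cull_heterogeneous(loss_records, fraud_ids):
    """Keep only loss records that were not caused by flagged customers."""
    return [rec for rec in loss_records if rec["customer_id"] not in fraud_ids]


# Hypothetical scores and historical loss records for one group of institutions.
scores = {"c1": 0.05, "c2": 0.92, "c3": 0.10}
loss_records = [
    {"customer_id": "c1", "loss": 120_000},
    {"customer_id": "c2", "loss": 3_500_000},
    {"customer_id": "c3", "loss": 80_000},
]

fraud_ids = flag_fraud_customers(scores)
sample_loss_data = cull_heterogeneous(loss_records, fraud_ids)
print(sample_loss_data)
```

The large loss caused by the flagged customer is dropped before any threshold is chosen, so it can no longer end up among the exceedances.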
Optionally, before culling the heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model, the method further includes:
and establishing an anti-fraud model according to the customer data of the financial institution.
Optionally, establishing an anti-fraud model based on customer data of the financial institution, comprising:
calculating network parameters according to the fund flow network of the customers of the financial institution; wherein the customers include historical fraudulent customers and historical non-fraudulent customers, and the network parameters include at least one of: centrality, vertex degree, betweenness and average distance;
forming a data set according to the network parameters and attribute data of at least two dimensions of the client; wherein the attribute data of the client includes at least two of: industrial and commercial information data, tax data, financial statement data, guarantee data, multi-lender borrowing data, complex network indexes and financial product data;
and establishing an anti-fraud model according to the data set.
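As an illustration of the first step, the sketch below derives two network parameters from a list of fund transfers: the degree of each node and the average shortest-path distance. It assumes an undirected, unweighted network in which each customer is a node, which is a simplification, and the customer ids are hypothetical.

```python
from collections import defaultdict, deque


def build_graph(transfers):
    """Undirected graph: one edge per (payer, payee) pair."""
    graph = defaultdict(set)
    for payer, payee in transfers:
        graph[payer].add(payee)
        graph[payee].add(payer)
    return graph


def degrees(graph):
    """Number of distinct counterparties of every node."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def average_distance(graph):
    """Mean shortest-path length over all connected ordered node pairs (BFS)."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in graph[node]:
                if neighbour not in dist:
                    dist[neighbour] = dist[node] + 1
                    queue.append(neighbour)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical transfers forming the chain c1 - c2 - c3 - c4.
transfers = [("c1", "c2"), ("c2", "c3"), ("c3", "c4")]
graph = build_graph(transfers)
node_degrees = degrees(graph)
avg_dist = average_distance(graph)
print(node_degrees, round(avg_dist, 3))
```

Per-customer values such as the degree can be joined to the customer's other attribute data to form one row of the data set.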
Optionally, forming a data set according to the network parameters and the attribute data of at least two dimensions of the client comprises:
selecting attribute data of at least two dimensions of the client;
calculating, for the attribute data of each dimension of the client, a fluctuation degree characterization parameter of the attribute data of that dimension according to a second time slice; wherein the fluctuation degree characterization parameters of the client attribute data include at least one of: standard deviation, volatility, coefficient of variation, maximum value, minimum value, mean value, year-on-year ratio, period-on-period ratio, and numerical ranking of the same characterization parameter;
and forming the data set according to the network parameters, the attribute data of at least two dimensions of the client and the fluctuation degree characterization parameters of the attribute data of each dimension.
Optionally, the slicing unit of the second time slice comprises at least one of a year, a quarter, a month, and a week.
Optionally, establishing an anti-fraud model from the data set includes:
selecting part of the data set as a training set; wherein the proportion of the data set occupied by the training set is greater than three fifths and less than four fifths;
and taking the variables in the data set as explanatory variables, taking a first variable corresponding to fraudulent customers and a second variable corresponding to non-fraudulent customers as explained (dependent) variables, and training a model on the data in the training set with a machine learning algorithm to establish the anti-fraud model.
Optionally, after the anti-fraud model is built according to the data set, the method further includes:
selecting another part of the data set as a validation set; wherein the validation set and the training set together make up the data set;
and validating the anti-fraud model according to the validation set.
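The split described above can be sketched as follows. The 70% default, the fixed seed and the function name are illustrative assumptions; the only constraint taken from the text is that the training share lies strictly between three fifths and four fifths and that the two parts together make up the data set.

```python
import random


def split_dataset(rows, train_fraction=0.7, seed=0):
    """Shuffle the rows and split them into a training part and the remainder."""
    if not 0.6 < train_fraction < 0.8:
        raise ValueError("training share must be between three fifths and four fifths")
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed only for reproducibility
    cut = round(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Hypothetical data set of 100 customer rows, represented by their indices.
rows = list(range(100))
train_rows, valid_rows = split_dataset(rows)
print(len(train_rows), len(valid_rows))
```

Because fraudulent customers are usually a small minority, a stratified split that preserves the fraud ratio in both parts may be preferable in practice.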
In a second aspect, an embodiment of the present invention further provides a threshold model establishing apparatus based on financial risk, including:
the grouping module is used for grouping different financial institutions to determine at least two groups of financial institutions;
the data removing module is used for removing heterogeneous data in the historical loss data of each group of financial institutions according to the anti-fraud model to form at least two groups of sample loss data;
and the threshold model establishing module is used for respectively selecting the threshold of the threshold model according to each group of the sample loss data so as to establish the threshold model corresponding to each group of the financial institutions.
In a third aspect, an embodiment of the present invention further provides an apparatus, where the apparatus includes:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the financial risk based threshold model establishing method described above.
In a fourth aspect, the embodiment of the present invention further provides a computer-readable storage medium, on which a computer program is stored, which when executed by a processor implements the method for establishing a threshold model based on financial risk as described above.
According to the embodiment of the invention, different financial institutions are grouped to determine at least two groups of financial institutions, and then heterogeneous data in the historical loss data of each group of financial institutions is removed according to an anti-fraud model to form at least two groups of sample loss data. When the threshold value of the threshold value model corresponding to each group of financial institutions is selected, the threshold value can be selected according to the extreme value scale interval of the sample loss data corresponding to each group of financial institutions, so that the threshold value deviation of the threshold value model corresponding to each group of financial institutions can be reduced, the threshold value stability of the threshold value model is improved, and the model risk of the threshold value model is reduced. Moreover, heterogeneous data is removed from the sample loss data of each group, so that the situation that the heterogeneous data is used for establishing a threshold model when the heterogeneous data is larger than a threshold value can be avoided, the stability of the threshold model is improved, and the model risk of the threshold model is reduced.
Drawings
FIG. 1 is a flowchart of a method for establishing a threshold model based on financial risk according to an embodiment of the present invention;
FIG. 2 is a flow chart of another method for establishing a threshold model based on financial risk according to an embodiment of the present invention;
FIG. 3 is a flow chart of another method for establishing a threshold model based on financial risk according to an embodiment of the present invention;
FIG. 4 is a flowchart of another method for establishing a threshold model based on financial risk according to an embodiment of the present invention;
FIG. 5 is a flowchart of another method for establishing a threshold model based on financial risk according to an embodiment of the present invention;
FIG. 6 is a schematic structural diagram of a threshold model establishing apparatus based on financial risk according to an embodiment of the present invention;
FIG. 7 is a schematic structural diagram of a terminal according to an embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
The acquisition, storage and/or processing of data involved in all embodiments of the present invention comply with the relevant national laws and regulations.
FIG. 1 is a flowchart of a method for establishing a threshold model based on financial risk according to an embodiment of the present invention. This embodiment is applicable to the case where a threshold model is used to assess financial risk. The method may be executed by a device for establishing a threshold model based on financial risk, which may be implemented in software and/or hardware and may be configured in an electronic device such as a server or a terminal device; typical terminal devices include mobile terminals, specifically a mobile phone, a computer or a tablet computer. As shown in FIG. 1, the method specifically comprises the following steps:
S110, grouping different financial institutions to determine at least two groups of financial institutions;
A financial institution is a financial intermediary engaged in the financial services industry and is part of the financial system. Illustratively, financial institutions include banks, securities companies, insurance companies, trust companies, fund management companies, loan companies and the like, covering the banking, securities and insurance industries. Financial institutions may be classified into different types according to different standards, and financial institutions of the same type may be taken as one group. For example, financial institutions may be classified into different levels according to their strength and qualification; because institutions of different levels have different sensitivities to the same loss data, institutions of different levels may be placed in different groups.
S120, removing heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data;
The historical loss data of each group of financial institutions is the historical loss data of all financial institutions in the group, including the loss data caused by all customers of each financial institution in the group. The historical loss data of different groups may have different extreme-value scale intervals. For example, when financial institutions are divided into groups according to strength and qualification, the extreme-value scale intervals of the groups may differ in order of magnitude, for example millions for one group and a different scale for another. The historical period covered by the historical loss data may be set as needed, for example 10 years or 20 years. Heterogeneous data is loss data resulting from a sudden event; for example, heterogeneous data may be loss data resulting from a fraud event. Heterogeneous data is bursty and tends to make the extreme loss data of financial institutions unstable. The anti-fraud model is a model for identifying fraud, including transaction fraud, phishing, telephone fraud, card theft and account theft. Fraudulent customers can be identified through the anti-fraud model, and the loss data they cause is taken as heterogeneous data. Illustratively, the overdue amount of a fraudulent customer may be regarded as heterogeneous data. After the heterogeneous data in the historical loss data of each group of financial institutions is determined, it can be removed from each group's historical loss data to form at least two groups of sample loss data.
It should be noted that, after fraudulent customers are identified through the anti-fraud model, they can be further confirmed by methods such as document checks and field investigation, which improves the accuracy of identifying fraudulent customers.
S130, selecting a threshold of the threshold model separately according to each group of sample loss data, so as to establish the threshold model corresponding to each group of financial institutions.
After the sample loss data of each group is determined, the threshold model threshold corresponding to each group of financial institutions is selected according to the sample loss data of each group, and therefore the threshold model corresponding to each group of financial institutions can be established. The sample loss data of each group can have different extreme value scale intervals, and when the threshold value model threshold value corresponding to each group of financial institutions is selected, the threshold value can be selected according to the extreme value scale interval of the sample loss data of each group, so that the threshold value deviation of the threshold value model corresponding to each group of financial institutions can be reduced, the threshold value stability of the threshold value model is improved, and the model risk of the threshold value model is reduced. Moreover, heterogeneous data is removed from the sample loss data of each group, so that the situation that the heterogeneous data is used for establishing a threshold model when the heterogeneous data is larger than a threshold value can be avoided, the stability of the threshold model is improved, and the model risk of the threshold model is reduced.
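The per-group selection can be sketched as below. The text does not fix a particular selection rule, so an empirical quantile is used here purely as a stand-in; the quantile level, group names and loss values are hypothetical.

```python
def select_threshold(losses, quantile=0.8):
    """Pick the loss at the given empirical quantile as the threshold.

    The quantile rule is an illustrative stand-in, not the rule of the application.
    """
    ordered = sorted(losses)
    index = min(int(quantile * len(ordered)), len(ordered) - 1)
    return ordered[index]


def exceedances(losses, threshold):
    """Excess amounts over the threshold, i.e. the data a POT model is fitted to."""
    return [x - threshold for x in losses if x > threshold]


# Hypothetical sample loss data for two groups with different loss scales.
sample_loss_data = {
    "large_institutions": [1_000_000 * i for i in range(1, 11)],
    "small_institutions": [100_000 * i for i in range(1, 11)],
}

thresholds = {group: select_threshold(losses) for group, losses in sample_loss_data.items()}
excess = {group: exceedances(losses, thresholds[group])
          for group, losses in sample_loss_data.items()}
print(thresholds)
```

Each group receives a threshold on its own scale, whereas a single threshold computed from the pooled data would either exclude every loss of the smaller group or sit too low for the larger one.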
According to the technical scheme, at least two groups of financial institutions are determined by grouping different financial institutions, and then heterogeneous data in the historical loss data of each group of financial institutions is removed according to an anti-fraud model to form at least two groups of sample loss data. When the threshold value of the threshold value model corresponding to each group of financial institutions is selected, the threshold value can be selected according to the extreme value scale interval of the sample loss data corresponding to each group of financial institutions, so that the threshold value deviation of the threshold value model corresponding to each group of financial institutions can be reduced, the threshold value stability of the threshold value model is improved, and the model risk of the threshold value model is reduced. Moreover, heterogeneous data is removed from the sample loss data of each group, so that the situation that the heterogeneous data is used for establishing a threshold model when the heterogeneous data is larger than a threshold value can be avoided, the stability of the threshold model is improved, and the model risk of the threshold model is reduced.
Fig. 2 is a flowchart of another method for establishing a threshold model based on financial risk according to an embodiment of the present invention, and the embodiment further optimizes the method for establishing the threshold model based on financial risk on the basis of the foregoing embodiment. Correspondingly, as shown in fig. 2, the method specifically includes:
S210, selecting attribute data of at least two dimensions of a financial institution; wherein the attribute data of the financial institution includes at least two of: industrial and commercial information data, financial statement data, credit data, risk index data and financial product data;
The attribute data of different dimensions of a financial institution are the data needed to assess the institution's attributes from different perspectives. Illustratively, industrial and commercial information data is needed when the institution's attributes are assessed from the perspective of its qualification to engage in market operations, and financial statement data is needed when they are assessed from a financial perspective. Because attribute data of at least two dimensions is selected when the financial institutions are grouped, the institutions can be grouped from at least two perspectives, which can improve the grouping accuracy.
Optionally, attribute data of different dimensions of the financial institutions can be selected as much as possible, and when the financial institutions are grouped, the financial institutions can be grouped more comprehensively from different thinking angles, so that the accuracy of grouping the financial institutions can be guaranteed. For example, the attribute data of the financial institution may include attribute data of multiple dimensions such as industry and commerce information data, financial statement data, credit data, risk index data and financial product data, so that the financial institutions may be grouped according to multiple thinking angles such as qualification, financial status, credit status, risk level and financial product of the financial institution engaged in market operation activity, thereby improving the grouping accuracy of the financial institution.
S220, for the attribute data of each dimension of the financial institution, calculating fluctuation degree characterization parameters of the attribute data of that dimension according to a first time slice, so as to form an attribute data sample set; wherein the fluctuation degree characterization parameters of the financial institution attribute data comprise at least one of the following: standard deviation, volatility, coefficient of variation, maximum value, minimum value, mean value, year-on-year ratio, period-on-period ratio, and numerical ranking of the same characterization parameter;
wherein the attribute data of each dimension of the financial institution may be divided along the time dimension, according to the first time slice, into a plurality of pieces of sub-attribute data. The first time slice cuts time into periods of a given length, and that period is the slicing unit of the first time slice. For example, the slicing unit of the first time slice may be a year, a quarter, a month, a week, and the like. After the attribute data of each dimension has been divided into sub-attribute data according to the first time slice, the fluctuation degree characterization parameters of that dimension are calculated from the sub-attribute data under the same dimension. The attribute data of the different dimensions and their fluctuation degree characterization parameters then together form the attribute data sample set. In this way, more statistical indexes are available to characterize the attributes of the financial institution, and from a wider range of perspectives, so that the financial institutions can be grouped more comprehensively in the subsequent grouping step, which further improves the accuracy of the grouping. For example, as many fluctuation degree characterization parameters as possible may be calculated from the sub-attribute data of each dimension.
For example, the fluctuation degree characterization parameters of the financial institution attribute data may include all of the standard deviation, volatility, coefficient of variation, maximum value, minimum value, mean value, year-on-year ratio, period-on-period ratio, and numerical ranking of the same characterization parameter. The attributes of the financial institution are then characterized by all of these statistical indexes at once, and thus from a wider range of perspectives.
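Illustratively, the calculation of several of these characterization parameters for one dimension can be sketched as follows. This is a minimal sketch under stated assumptions, not the method of the embodiment: the function names, the use of the population standard deviation, and the definitions of the two ratios as relative changes are all assumptions, and volatility is omitted because the text does not define it.

```python
from statistics import mean, pstdev

def fluctuation_parameters(values, slices_per_year=4):
    """Illustrative fluctuation parameters for one attribute dimension.

    values: observations in time order, one per time slice (e.g. per quarter).
    slices_per_year: number of slices in a year, used for the year-on-year ratio.
    """
    avg = mean(values)
    std = pstdev(values)  # population standard deviation (an assumption)
    prev = values[-2] if len(values) >= 2 else None
    year_ago = values[-1 - slices_per_year] if len(values) > slices_per_year else None
    return {
        "std": std,
        "cv": std / avg if avg else None,  # coefficient of variation
        "max": max(values),
        "min": min(values),
        "mean": avg,
        # latest slice relative to the slice immediately before it
        "period_on_period": values[-1] / prev - 1 if prev else None,
        # latest slice relative to the same slice one year earlier
        "year_on_year": values[-1] / year_ago - 1 if year_ago else None,
    }

def rank_parameter(param_by_institution):
    """Rank institutions on one parameter; rank 1 is the largest value."""
    ordered = sorted(param_by_institution, key=param_by_institution.get, reverse=True)
    return {name: i + 1 for i, name in enumerate(ordered)}

# Made-up quarterly values for a single institution and dimension.
quarterly_values = [100, 110, 120, 130, 150]
params = fluctuation_parameters(quarterly_values, slices_per_year=4)
```

The ranking helper corresponds to the "numerical ranking of the same characterization parameter": it compares one parameter, such as the coefficient of variation, across institutions.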
When the attribute data of different dimensions of the financial institution is divided into sub-attribute data along the time dimension according to the first time slice, the slicing units of the first time slice used for the different dimensions may be the same or different. Optionally, the same slicing unit is used for all dimensions.
Optionally, the slicing unit of the first time slice includes at least one of a year, a quarter, and a month.
The statistical time units of the attribute data of different dimensions of the financial institution may differ; for example, they may be a year, a quarter, or a month. When they differ, the statistical time unit of each dimension can be used as the slicing unit of the first time slice for that dimension. This improves the accuracy of the first time slice and of the fluctuation degree characterization parameters of each dimension, and allows the attributes of the financial institution to be characterized by statistical indexes corresponding to different time slices, further widening the range of perspectives from which the attributes are characterized. Moreover, when the statistical time unit of the attribute data is small, for example a month, a plurality of first time slices may be set according to that statistical time unit, for example a year, a quarter, and a month. The fluctuation degree characterization parameters of the attribute data are then calculated separately for each time slice, further widening the range of perspectives from which the attributes of the financial institution are characterized.
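Illustratively, setting several time slices on data whose statistical time unit is a month can be sketched as follows. The function name and the choice of summation as the aggregation rule are assumptions; summation suits flow-type data such as monthly amounts, and other data may call for a different rule.

```python
from collections import defaultdict

def slice_monthly(monthly, unit):
    """Aggregate {(year, month): value} into coarser time slices by summation.

    unit is "month", "quarter", or "year". Returns the slice totals in
    chronological order.
    """
    totals = defaultdict(float)
    for (year, month), value in monthly.items():
        if unit == "month":
            key = (year, month)
        elif unit == "quarter":
            key = (year, (month - 1) // 3 + 1)
        elif unit == "year":
            key = (year,)
        else:
            raise ValueError(f"unsupported slicing unit: {unit}")
        totals[key] += value
    return [totals[key] for key in sorted(totals)]

# Made-up monthly data for one dimension: the value equals the month number.
monthly = {(2020, m): float(m) for m in range(1, 13)}
by_quarter = slice_monthly(monthly, "quarter")  # [6.0, 15.0, 24.0, 33.0]
by_year = slice_monthly(monthly, "year")        # [78.0]
```

Each resulting series can then be passed to the same parameter calculation, giving one set of characterization parameters per time slice.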
S230, clustering the attribute data sample set based on a set clustering algorithm, and grouping the financial institutions according to the clustering result.
Various clustering algorithms may be used, for example partitioning, hierarchical, density-based, graph-theoretic, grid-based, and model-based algorithms. After the attribute data sample set is determined, it is clustered according to the set clustering algorithm into at least two clusters, and the financial institutions corresponding to the attribute data samples in each cluster are taken as one group of financial institutions, thereby grouping the financial institutions.
S240, removing heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data;
and S250, respectively selecting a threshold value of the threshold value model according to the loss data of each group of samples so as to establish the threshold value model corresponding to each group of financial institutions.
Illustratively, when the attribute data sample set is clustered based on a set clustering algorithm, the set clustering algorithm is the K-means algorithm.
The K-means algorithm is a clustering analysis algorithm that is solved iteratively. Through the K-means algorithm, the attribute data sample set is formed into at least two clusters, and the financial institutions are then grouped according to the financial institutions corresponding to the attribute data samples in each cluster.
Exemplarily, clustering is performed based on a set clustering algorithm according to the attribute data sample set, and the financial institutions are grouped according to a clustering result, including:
selecting k samples in the attribute data sample set as initial center points;
wherein the value of k can be determined according to the number of groups required. Illustratively, if the financial institutions are to be divided into three groups, the value of k may be 3. When the k samples are selected as the initial center points, k samples that are far apart from one another in Euclidean distance may be selected.
Grouping according to Euclidean distances from data points in the attribute data sample set to the initial central point;
after the k initial center points are determined, the Euclidean distance from a data point in the attribute data sample set to each initial center point is calculated, and the data point is assigned to the cluster of the initial center point to which its Euclidean distance is smallest. This process is repeated until every data point in the attribute data sample set has been assigned to the cluster of its nearest initial center point, and the samples in each cluster are taken as one group of samples.
And updating the initial central point according to the central point of each group for iteration until the central point of each group is unchanged, and determining the samples in each group as a group of financial institutions.
wherein, after the k groups of samples are determined, the center point of each group is computed as the arithmetic mean, in each dimension, of the data points in the group. The initial center point of each group is then replaced by this center point, the grouping by Euclidean distance from the data points to the center points is repeated, and the center point of each group is calculated again. The iteration continues until the center point of every group no longer changes, at which point the financial institutions corresponding to the samples in each group are placed in one group, thereby grouping the financial institutions.
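Illustratively, the three steps above (selecting initial center points, grouping by Euclidean distance, and iterating until the center points no longer change) can be sketched as follows. This is a minimal sketch rather than the implementation of the embodiment; the greedy farthest-first rule is only one assumed way of choosing initial center points that are far apart, and the function names and the iteration cap are likewise assumptions.

```python
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def farthest_first_centers(samples, k):
    """Pick k initial centers that are far apart (greedy farthest-first)."""
    centers = [tuple(samples[0])]
    while len(centers) < k:
        # next center: the sample whose distance to its nearest center is largest
        nxt = max(samples, key=lambda s: min(euclidean(s, c) for c in centers))
        centers.append(tuple(nxt))
    return centers

def kmeans(samples, k, max_iter=100):
    """Return (labels, centers) after iterating until the centers stop changing."""
    centers = farthest_first_centers(samples, k)
    labels = []
    for _ in range(max_iter):
        # assignment step: each sample joins the cluster of its nearest center
        labels = [min(range(k), key=lambda j: euclidean(s, centers[j])) for s in samples]
        # update step: each center becomes the per-dimension arithmetic mean
        new_centers = []
        for j in range(k):
            members = [s for s, label in zip(samples, labels) if label == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep the center of an empty cluster
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers

# Made-up two-dimensional samples, one per financial institution.
samples = [(1.0, 1.0), (1.2, 0.8), (5.0, 5.0), (5.2, 4.8), (9.0, 1.0), (9.1, 1.2)]
labels, centers = kmeans(samples, k=3)
```

Institutions whose samples share a label form one group. In practice the dimensions would usually be rescaled first so that no single dimension dominates the Euclidean distance; that step is not described in the text and is left out here.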
Fig. 3 is a flowchart of another method for establishing a threshold model based on financial risk according to an embodiment of the present invention, and the present embodiment further optimizes the method for establishing the threshold model based on financial risk on the basis of the foregoing embodiments. Correspondingly, as shown in fig. 3, the method specifically includes:
S310, grouping different financial institutions to determine at least two groups of financial institutions;
S320, identifying the fraudulent clients of each group of financial institutions according to the anti-fraud model;
wherein individuals and private enterprises are the key targets of credit fraud risk prevention and control; judging from the cases of historical risk events, most borrowers involved in those events are individuals or private enterprises. To identify the fraudulent clients of each group of financial institutions, fraudulent behavior must first be identified. Illustratively, in the field of personal consumer credit, fraudulent behavior is characterized by the client becoming overdue at the first or second repayment installment, losing contact, and having supplied credit qualification data that is not authentic. Account age analysis is then performed on the individual clients, in combination with data verification, to screen out such lost-contact samples. After fraudulent behavior has been identified, the client exhibiting it can be identified as a likely fraudulent client.
S330, determining that the historical loss data of the fraudulent client on the financial institution is heterogeneous data;
wherein the historical loss data caused by the fraudulent behavior of a fraudulent client is sudden in nature. Once the historical loss data caused by the fraudulent behavior of a fraudulent client has been determined, that data is directly determined to be heterogeneous data.
S340, removing the heterogeneous data from the historical loss data of the financial institutions.
After the heterogeneous data is determined, the heterogeneous data can be removed from the historical loss data of each group of financial institutions, so that model risks of the threshold model caused by the heterogeneous data are avoided when the threshold model is established subsequently for the historical loss data of each group of financial institutions, and the stability of the threshold model is improved.
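Illustratively, the removal of heterogeneous data from the historical loss data of each group can be sketched as follows. The record fields `group`, `client_id`, and `loss` and the `is_fraud` callable are hypothetical names introduced for illustration; `is_fraud` merely stands in for the decision of the anti-fraud model.

```python
def remove_heterogeneous(loss_records, is_fraud):
    """Split historical loss records into sample loss data and heterogeneous data.

    loss_records: iterable of dicts with the hypothetical keys
        "group", "client_id", and "loss".
    is_fraud: callable standing in for the anti-fraud model; returns True
        when the client is identified as fraudulent.
    Returns two dicts mapping each group to a list of loss amounts.
    """
    sample, heterogeneous = {}, {}
    for record in loss_records:
        target = heterogeneous if is_fraud(record["client_id"]) else sample
        target.setdefault(record["group"], []).append(record["loss"])
    return sample, heterogeneous

# Made-up records; client "c2" is treated as fraudulent for the example.
records = [
    {"group": "g1", "client_id": "c1", "loss": 12.0},
    {"group": "g1", "client_id": "c2", "loss": 950.0},
    {"group": "g2", "client_id": "c3", "loss": 30.0},
]
sample_loss, removed = remove_heterogeneous(records, is_fraud=lambda cid: cid == "c2")
```

Keeping the removed records in a separate structure, rather than discarding them, makes it possible to check afterwards how much loss was excluded from each group.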
S350, respectively selecting the threshold value of the threshold value model according to the loss data of each group of samples so as to establish the threshold value model corresponding to each group of financial institutions.
Optionally, before culling the heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model, the method further includes:
an anti-fraud model is established based on customer data for the financial institution.
When the anti-fraud model is established, it can be built specifically from the customer data of the financial institution, which improves the accuracy of the anti-fraud model.
Fig. 4 is a flowchart of another method for establishing a threshold model based on financial risk according to an embodiment of the present invention, and the embodiment further optimizes the method for establishing the threshold model based on financial risk on the basis of the foregoing embodiment. Correspondingly, as shown in fig. 4, the method specifically includes:
S410, calculating network parameters according to the fund flow network of the clients of the financial institution; wherein the clients include historical fraudulent clients and historical non-fraudulent clients, and the network parameters include at least one of: intermediary degree, vertex degree, betweenness, and average distance;
wherein, when the anti-fraud model is established, historical fraudulent clients and non-fraudulent clients are selected, and a graph database method is used to build an association network of the clients from their fund flows, forming the fund flow network of the clients. A client may be an enterprise, and the objects associated with a client include individuals and enterprises. The network is examined for whether the funds borrowed by the client flow into the real estate industry, the investment industry, or related parties, and for the number of accounts the loan passes through on the way to these destinations, the time intervals, the ratio of inflow to outflow, and the like. In addition, the graph database method can build a large complex network from the various association relations among the different data of the clients, in which super nodes and the clustering coefficients of nodes can then be mined. A super node is a "hub" of the network and is a client with an elevated probability of fraud; nodes with a large clustering coefficient tend to lie in a small community, and such a community has a high probability of consisting of fraudulent clients. The community structure of the network can also be mined, and the internal statistical characteristics of each community analyzed to assess the likelihood that the community consists of fraudulent clients. Analyzing the network in this way can reveal fraudulent clients and fraud events that are hard to detect by conventional methods. In fraud events, forged data, forged documents, and forged contracts are very common; a complex network makes it possible to determine more intuitively and accurately whether a transaction really exists and whether the contract counterparties are related, so that fraudulent behavior can be effectively identified.
After the fund flow network of the clients is formed, the network parameters can be calculated from it. This enables in-depth analysis and mining of the fund flow network, increases the number of variable dimensions available for establishing the anti-fraud model, and improves the accuracy of the anti-fraud model. Illustratively, the network parameters may include at least one of intermediary degree, vertex degree, betweenness, and average distance. The more network parameters are used, the more variable dimensions are available for establishing the anti-fraud model, and the higher its accuracy.
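Illustratively, the calculation of vertex degree, betweenness, and average distance on a small fund flow network can be sketched as follows. The sketch assumes the network is given as a list of (payer, payee) transfers and treats it as undirected and unweighted; both are simplifications, and the function names are assumptions. Betweenness is computed with Brandes' algorithm, and the average distance is taken over reachable pairs only.

```python
from collections import deque

def build_adjacency(transfers):
    """Undirected adjacency sets from (payer, payee) pairs."""
    adj = {}
    for payer, payee in transfers:
        adj.setdefault(payer, set()).add(payee)
        adj.setdefault(payee, set()).add(payer)
    return adj

def vertex_degree(adj):
    return {v: len(neighbors) for v, neighbors in adj.items()}

def average_distance(adj):
    """Mean shortest-path length over all reachable ordered pairs."""
    total = pairs = 0
    for source in adj:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            v = queue.popleft()
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0

def betweenness(adj):
    """Unnormalized betweenness of each node (Brandes' algorithm)."""
    score = dict.fromkeys(adj, 0.0)
    for s in adj:
        order, preds = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:  # v lies on a shortest path to w
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in preds[w]:
                delta[v] += sigma[v] / sigma[w] * (1 + delta[w])
            if w != s:
                score[w] += delta[w]
    return {v: b / 2 for v, b in score.items()}  # each pair was counted twice

# Made-up transfers: account "B" sits between three other accounts.
adj = build_adjacency([("A", "B"), ("B", "C"), ("B", "D")])
```

In this example "B" has the highest degree and betweenness, which is the pattern the text associates with a super node acting as a hub.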
S420, forming a data set according to the network parameters and the attribute data of at least two dimensions of the client; wherein the attribute data of the client includes at least two of: business registration information data, tax data, financial statement data, guarantee data, multi-lender loan data, complex network indexes, and financial product data;
wherein the attribute data of the client in different dimensions is the data needed to consider the attributes of the client from different perspectives. Illustratively, business registration information data is needed when the attributes of the client are considered from the perspective of its qualification to conduct business activities, and tax data is needed when they are considered from the perspective of taxation. Forming the data set from the network parameters together with the attribute data of at least two dimensions of the client improves the accuracy of the data set.
S430, establishing an anti-fraud model according to the data set.
Because the data set comprises the network parameters and the attribute data of at least two dimensions, when the anti-fraud model is established according to the data set, the accuracy of the anti-fraud model can be improved, and the accuracy of the anti-fraud model for identifying the fraudulent client can be improved.
S440, grouping different financial institutions to determine at least two groups of financial institutions;
S450, removing heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data;
and S460, respectively selecting the threshold value of the threshold value model according to the loss data of each group of samples so as to establish the threshold value model corresponding to each group of financial institutions.
Fig. 5 is a flowchart of another method for establishing a threshold model based on financial risk according to an embodiment of the present invention, and the embodiment further optimizes the method for establishing the threshold model based on financial risk on the basis of the foregoing embodiment. Correspondingly, as shown in fig. 5, the method specifically includes:
S510, calculating network parameters according to the fund flow network of the clients of the financial institution; wherein the clients include historical fraudulent clients and historical non-fraudulent clients, and the network parameters include at least one of: intermediary degree, vertex degree, betweenness, and average distance;
S520, selecting attribute data of at least two dimensions of the client;
wherein, when the attribute data of the client is selected, as many different dimensions as possible may be selected, so that the accuracy of the anti-fraud model established from the network parameters and the attribute data of the different dimensions is improved. Illustratively, the attribute data of the client may include business registration information data, tax data, financial statement data, guarantee data, multi-lender loan data, complex network indexes, and financial product data, so that the anti-fraud model is established from multiple dimensions of the client's attributes, which improves its accuracy.
S530, for the attribute data of each dimension of the client, calculating fluctuation degree characterization parameters of the attribute data of that dimension according to a second time slice; wherein the fluctuation degree characterization parameters of the client attribute data comprise at least one of the following: standard deviation, volatility, coefficient of variation, maximum value, minimum value, mean value, year-on-year ratio, period-on-period ratio, and numerical ranking of the same characterization parameter;
wherein the attribute data of each dimension of the client can be divided along the time dimension, according to the second time slice, into a plurality of pieces of sub-attribute data. The second time slice cuts time into periods of a given length, and that period is the slicing unit of the second time slice. For example, the slicing unit of the second time slice may be a year, a quarter, a month, a week, and the like. After the attribute data of each dimension of the client has been divided into sub-attribute data according to the second time slice, the fluctuation degree characterization parameters of that dimension are calculated from the sub-attribute data under the same dimension, so that more statistical indexes are available to characterize the attributes of the client, and from a wider range of perspectives. Applying these parameters to the establishment of the anti-fraud model can improve the accuracy of the anti-fraud model. For example, as many fluctuation degree characterization parameters as possible may be calculated from the sub-attribute data of each dimension.
For example, the fluctuation degree characterization parameters of the client attribute data may include all of the standard deviation, volatility, coefficient of variation, maximum value, minimum value, mean value, year-on-year ratio, period-on-period ratio, and numerical ranking of the same characterization parameter. The attributes of the client are then characterized by all of these statistical indexes at once, and thus from a wider range of perspectives.
When the attribute data of different dimensions of the client is divided into sub-attribute data along the time dimension according to the second time slice, the slicing units of the second time slice used for the different dimensions may be the same or different. Optionally, the same slicing unit is used for all dimensions.
Optionally, the slicing unit of the second time slice includes at least one of a year, a quarter, a month, and a week.
The statistical time units of the attribute data of different dimensions of the client may differ; for example, they may be a year, a quarter, a month, or a week. When they differ, the statistical time unit of each dimension can be used as the slicing unit of the second time slice for that dimension. This improves the accuracy of the second time slice and of the fluctuation degree characterization parameters of each dimension, and allows the attributes of the client to be characterized by statistical indexes corresponding to different time slices, further widening the range of perspectives from which the attributes are characterized. Moreover, when the statistical time unit of the client's attribute data is small, for example a month, a plurality of second time slices may be set according to that statistical time unit, for example a year, a quarter, and a month. The fluctuation degree characterization parameters of the attribute data are then calculated separately for each time slice, further widening the range of perspectives from which the attributes of the client are characterized.
S540, forming a data set according to the network parameters, the attribute data of at least two dimensions of the client, and the fluctuation degree characterization parameters of the attribute data of each dimension.
Including the fluctuation degree characterization parameters of the attribute data of each dimension in the data set increases the number of data dimensions in the data set and improves its accuracy. This in turn can improve the accuracy of the anti-fraud model subsequently established from the data set.
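Illustratively, forming one record of the data set for a single client can be sketched as follows. The function name, the key prefixes, and the example field names are hypothetical and serve only to show the three sources of data being combined into one flat record.

```python
def build_feature_row(network_params, attribute_data, fluctuation_params):
    """Merge the three kinds of data for one client into a flat record.

    network_params: {parameter name: value}
    attribute_data: {dimension name: value}
    fluctuation_params: {dimension name: {parameter name: value}}
    """
    row = {f"net_{name}": value for name, value in network_params.items()}
    for dim, value in attribute_data.items():
        row[f"attr_{dim}"] = value
    for dim, params in fluctuation_params.items():
        for name, value in params.items():
            row[f"fluct_{dim}_{name}"] = value
    return row

# Made-up values for one client.
row = build_feature_row(
    network_params={"degree": 3, "average_distance": 1.5},
    attribute_data={"tax": 120.0, "guarantee": 40.0},
    fluctuation_params={"tax": {"std": 8.2, "cv": 0.07}},
)
```

Prefixing the keys keeps a network parameter and an attribute dimension with the same name from overwriting each other.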
S550, establishing an anti-fraud model according to the data set.
S560, grouping different financial institutions to determine at least two groups of financial institutions;
S570, removing heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data;
and S580, respectively selecting the threshold value of the threshold value model according to the loss data of each group of samples so as to establish the threshold value model corresponding to each group of financial institutions.
On the basis of the above technical solution, establishing the anti-fraud model according to the data set includes:
selecting a part of the data set as a training set; wherein the training set accounts for more than three fifths and less than four fifths of the data set;
wherein one part of the data set may be used for training and another part for validation. Selecting more than 50% of the data set as the training set when training the anti-fraud model helps ensure the accuracy of the anti-fraud model. Illustratively, the training set may be 70% of the data set.
taking the variables in the data set as explanatory variables, taking a first variable corresponding to fraudulent clients and a second variable corresponding to non-fraudulent clients as the explained variable, and completing model training on the data in the training set according to a machine learning algorithm, so as to establish the anti-fraud model.
When the anti-fraud model is established according to the data set, fraudulent clients are labeled 1 and non-fraudulent clients are labeled 0, and this label serves as the explained variable, while the data in the data set serves as the explanatory variables. Part of the data set is selected as the training set, variables with strong explanatory power for fraudulent behavior are screened out using machine learning algorithms such as logistic regression and XGBoost, and the training of the model is completed.
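Illustratively, selecting 70% of the data set as the training set and fitting a logistic regression on the 1/0 labels can be sketched as follows. This is a minimal sketch in plain Python, assuming batch gradient descent with a hypothetical learning rate and epoch count; it does not perform the variable screening mentioned above, and a production model would normally rely on an established library.

```python
import math
import random

def split_dataset(rows, train_fraction=0.7, seed=0):
    """Shuffle and split rows into disjoint training and validation parts."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    cut = int(len(rows) * train_fraction)
    return rows[:cut], rows[cut:]

def sigmoid(z):
    z = max(-30.0, min(30.0, z))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-z))

def train_logistic(rows, lr=0.1, epochs=500):
    """rows: list of (features, label); label 1 = fraudulent, 0 = non-fraudulent."""
    n_features = len(rows[0][0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n_features, 0.0
        for x, y in rows:
            error = sigmoid(bias + sum(w * xi for w, xi in zip(weights, x))) - y
            for i, xi in enumerate(x):
                grad_w[i] += error * xi
            grad_b += error
        weights = [w - lr * g / len(rows) for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / len(rows)
    return weights, bias

def predict_proba(model, x):
    weights, bias = model
    return sigmoid(bias + sum(w * xi for w, xi in zip(weights, x)))

# Made-up one-feature data set: larger feature values are labeled fraudulent.
dataset = [((i / 10,), 1 if i >= 10 else 0) for i in range(20)]
train_set, validation_set = split_dataset(dataset, train_fraction=0.7)
model = train_logistic(train_set)
```

The 70% fraction falls inside the range of more than three fifths and less than four fifths stated above; the fixed seed only makes the split reproducible.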
On the basis of the above technical solution, after the anti-fraud model is established according to the data set, the method further includes:
selecting another part of the data set as a validation set; wherein the union of the validation set and the training set is the data set;
wherein, when the anti-fraud model is established, the data set can be divided into a training set and a validation set whose intersection is empty and whose union is the whole data set, so that the data set is fully used. Illustratively, the validation set may be 30% of the data set.
validating the anti-fraud model according to the validation set.
After the training of the anti-fraud model is finished, the other part of the data set is used as the validation set to validate the anti-fraud model, so that the discrimination and stability of the anti-fraud model can be checked and its accuracy ensured.
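Illustratively, checking the discrimination of the anti-fraud model on the validation set can be sketched as follows. The text does not name a metric; the area under the ROC curve (AUC) is used here as an assumption, computed directly as the share of (fraudulent, non-fraudulent) pairs that the model orders correctly.

```python
def auc(scores, labels):
    """Probability that a fraudulent client scores above a non-fraudulent one.

    scores: model outputs, higher meaning more likely fraudulent.
    labels: 1 for fraudulent clients, 0 for non-fraudulent clients.
    Ties between a positive and a negative score count as half.
    """
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("both classes are needed to compute AUC")
    wins = sum((p > n) + 0.5 * (p == n) for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))

# Made-up validation scores and labels.
validation_auc = auc([0.9, 0.2, 0.8, 0.1], [1, 1, 0, 0])  # 0.75
```

Under the same assumption, stability could be examined by comparing this value with the AUC obtained on the training set; a large gap would suggest that the model does not generalize well.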
The embodiment of the invention also provides a device for establishing the threshold model based on the financial risk. Fig. 6 is a schematic structural diagram of a threshold model establishing apparatus based on financial risk according to an embodiment of the present invention. The present embodiments may be applicable to situations where threshold models are employed for assessing financial risk. The financial risk-based threshold model establishing device provided by the embodiment of the invention can execute the financial risk-based threshold model establishing method provided by any embodiment of the invention, and has corresponding functional modules and beneficial effects of the executing method.
The device comprises a grouping module 10, a data eliminating module 20 and a threshold value model establishing module 30, wherein: the grouping module 10 is configured to group different financial institutions to determine at least two groups of financial institutions, the data removing module 20 is configured to remove heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data, and the threshold model establishing module 30 is configured to respectively select a threshold of a threshold model according to each group of sample loss data to establish a threshold model corresponding to each group of financial institutions.
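Illustratively, the cooperation of the three modules can be sketched as follows. The class name and the callables passed to it are hypothetical; in particular, the use of `max` as the threshold rule is only a placeholder and is not the threshold selection of the embodiment.

```python
class ThresholdModelBuilder:
    """Chains a grouping module, a data removal module, and a threshold module."""

    def __init__(self, grouping_module, culling_module, threshold_module):
        self.grouping_module = grouping_module    # institutions -> {group: members}
        self.culling_module = culling_module      # (groups, history) -> {group: losses}
        self.threshold_module = threshold_module  # losses -> threshold

    def build(self, institutions, loss_history):
        groups = self.grouping_module(institutions)
        sample_loss = self.culling_module(groups, loss_history)
        # one threshold, and hence one threshold model, per group
        return {group: self.threshold_module(losses) for group, losses in sample_loss.items()}

# Made-up history: (loss amount, flagged as fraud) per institution.
history = {
    "bank_a": [(120.0, False), (900.0, True)],
    "bank_b": [(80.0, False), (95.0, False)],
}
builder = ThresholdModelBuilder(
    grouping_module=lambda insts: {"group_1": ["bank_a"], "group_2": ["bank_b"]},
    culling_module=lambda groups, hist: {
        g: [loss for inst in members for loss, is_fraud in hist[inst] if not is_fraud]
        for g, members in groups.items()
    },
    threshold_module=max,  # placeholder rule, an assumption for illustration only
)
thresholds = builder.build(["bank_a", "bank_b"], history)
```

Passing the modules in as callables keeps the sketch independent of how grouping, removal, and threshold selection are actually realized.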
An embodiment of the invention also provides a device. Fig. 7 is a schematic structural diagram of a device according to an embodiment of the present invention. FIG. 7 illustrates a block diagram of an exemplary device 412 suitable for use in implementing embodiments of the present invention.
The device 412 shown in fig. 7 is only an example and should not impose any limitation on the functionality or scope of use of embodiments of the present invention.
As shown in fig. 7, the device 412 is in the form of a general purpose device. The components of device 412 may include, but are not limited to: one or more processors 416, a storage device 428, and a bus 418 that couples the various system components including the storage device 428 and the processors 416.
Bus 418 represents one or more of any of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, such architectures include, but are not limited to, the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Device 412 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by device 412 and includes both volatile and nonvolatile media, removable and non-removable media.
Storage device 428 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 430 and/or cache 432. The device 412 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 434 may be used to read from and write to non-removable, nonvolatile magnetic media (not shown in FIG. 7, commonly referred to as a "hard drive"). Although not shown in FIG. 7, a magnetic disk drive for reading from and writing to a removable, nonvolatile magnetic disk (e.g., a "floppy disk") and an optical disk drive for reading from or writing to a removable, nonvolatile optical disk such as a Compact Disc Read-Only Memory (CD-ROM), Digital Video Disc Read-Only Memory (DVD-ROM) or other optical media may be provided. In these cases, each drive may be connected to bus 418 by one or more data media interfaces. Storage device 428 may include at least one program product having a set (e.g., at least one) of program modules that are configured to carry out the functions of embodiments of the invention.
A program/utility 440 having a set (at least one) of program modules 442 may be stored, for instance, in storage device 428. Such program modules 442 include, but are not limited to, an operating system, one or more application programs, other program modules, and program data; each of these examples, or some combination thereof, may include an implementation of a network environment. The program modules 442 generally perform the functions and/or methodologies of the described embodiments of the invention.
The device 412 may also communicate with one or more external devices 414 (e.g., a keyboard, a pointing device, a display 424, etc.), with one or more devices that enable a user to interact with the device 412, and/or with any devices (e.g., a network card, a modem, etc.) that enable the device 412 to communicate with one or more other computing devices. Such communication may occur via input/output (I/O) interfaces 422. Further, the device 412 may also communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 420. As shown in FIG. 7, the network adapter 420 communicates with the other modules of the device 412 via the bus 418. It should be appreciated that, although not shown in the figures, other hardware and/or software modules may be used in conjunction with the device 412, including but not limited to: microcode, device drivers, redundant processors, external disk drive arrays, Redundant Arrays of Independent Disks (RAID) systems, tape drives, and data backup storage systems, among others.
The processor 416 executes various functional applications and data processing by executing programs stored in the storage device 428, for example, implementing a threshold model establishing method based on financial risk provided by the embodiment of the present invention, the method includes:
grouping different financial institutions to determine at least two groups of financial institutions;
removing heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data;
and selecting, for each group, a threshold of the threshold model according to that group's sample loss data, so as to establish the threshold model corresponding to each group of financial institutions.
Embodiments of the present invention further provide a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements a threshold model establishing method based on financial risk according to an embodiment of the present invention, where the method includes:
grouping different financial institutions to determine at least two groups of financial institutions;
removing heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data;
and selecting, for each group, a threshold of the threshold model according to that group's sample loss data, so as to establish the threshold model corresponding to each group of financial institutions.
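The three steps recited above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the names LossRecord, empirical_quantile and build_group_thresholds are hypothetical, the grouping is assumed to be available as a mapping group_of, the anti-fraud model is assumed to be exposed as a predicate is_fraud, and the threshold is assumed to be a high empirical quantile of each group's sample loss data, a rule chosen here only for illustration.

```python
import math
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List


@dataclass(frozen=True)
class LossRecord:
    institution: str  # financial institution that booked the loss
    customer: str     # customer who caused the loss
    amount: float     # loss amount


def empirical_quantile(values: List[float], q: float) -> float:
    """Nearest-rank empirical quantile (an assumed threshold rule)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]


def build_group_thresholds(
    records: Iterable[LossRecord],
    group_of: Dict[str, int],         # institution -> group id (assumed given)
    is_fraud: Callable[[str], bool],  # stand-in for the anti-fraud model
    q: float = 0.95,                  # assumed quantile level
) -> Dict[int, float]:
    """Return one threshold per group of financial institutions."""
    samples: Dict[int, List[float]] = defaultdict(list)
    for rec in records:
        if is_fraud(rec.customer):
            continue  # heterogeneous data: losses caused by fraudulent customers
        samples[group_of[rec.institution]].append(rec.amount)
    return {group: empirical_quantile(v, q) for group, v in samples.items()}
```

Each group keeps its own sample loss data, so a single fraud-driven loss in one group does not shift the threshold selected for another group.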
Computer storage media for embodiments of the invention may employ any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or terminal. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.

Claims (15)

1. A threshold model building method based on financial risk is characterized by comprising the following steps:
grouping different financial institutions to determine at least two groups of financial institutions;
removing heterogeneous data in the historical loss data of each group of financial institutions according to an anti-fraud model to form at least two groups of sample loss data;
and respectively selecting a threshold value of a threshold value model according to each group of the sample loss data so as to establish a threshold value model corresponding to each group of financial institutions.
2. The financial risk based threshold model building method of claim 1, wherein grouping different financial institutions comprises:
selecting attribute data of at least two dimensions of the financial institution; wherein the attribute data of the financial institution includes at least two of: industrial and commercial information data, financial statement data, credit data, risk index data and financial product data;
calculating, for the attribute data of each dimension, a fluctuation degree characterization parameter of the attribute data of that dimension of the financial institution according to a first time slice, to form an attribute data sample set; wherein the fluctuation degree characterization parameters of the financial institution attribute data include at least one of the following: standard deviation, volatility, coefficient of variation, maximum value, minimum value, average value, year-over-year ratio, period-over-period ratio, and ranking by the same characterization parameter;
and clustering based on a set clustering algorithm according to the attribute data sample set, and grouping the financial institutions according to clustering results.
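By way of illustration only, several of the fluctuation degree characterization parameters listed in this claim could be computed per time slice as sketched below. The function name fluctuation_parameters and its arguments are hypothetical. The sketch assumes one observation per time slice ordered from oldest to newest, uses the population standard deviation, and computes the year-over-year and period-over-period figures (the two ratio parameters in the list) as growth rates; these choices are assumptions and are not specified by the claim.

```python
import statistics
from typing import Dict, List


def fluctuation_parameters(series: List[float], period: int) -> Dict[str, float]:
    """Summarise one attribute dimension of one institution.

    series: one value per time slice, oldest first.
    period: number of time slices per year (e.g. 12 for monthly slices).
    """
    if len(series) <= period:
        raise ValueError("need more than one full year of time slices")
    mean = statistics.fmean(series)
    std = statistics.pstdev(series)  # population standard deviation (assumed)
    latest, previous, year_ago = series[-1], series[-2], series[-1 - period]
    return {
        "std": std,
        "cv": std / mean,                # coefficient of variation; mean must be non-zero
        "max": max(series),
        "min": min(series),
        "mean": mean,
        "yoy": latest / year_ago - 1.0,  # change versus the same slice one year earlier
        "pop": latest / previous - 1.0,  # change versus the immediately preceding slice
    }
```

The resulting dictionary for each institution and each dimension can be flattened into one row of the attribute data sample set that is passed to the clustering step.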
3. The financial risk based threshold model building method of claim 2, wherein the set clustering algorithm is a K-means algorithm.
4. The method for establishing a threshold model based on financial risk according to claim 2, wherein clustering is performed based on a set clustering algorithm according to the attribute data sample set, and the financial institutions are grouped according to a clustering result, comprising:
selecting k samples in the attribute data sample set as initial center points;
grouping according to Euclidean distances from data points in the attribute data sample set to the initial central point;
and updating the initial central point according to the central point of each group for iteration until the central point of each group is unchanged, and determining samples in each group as a group of financial institutions.
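The iteration recited in this claim can be sketched as follows: k samples are taken as initial center points, every sample is assigned to its nearest center by Euclidean distance, and the centers are recomputed until they no longer change. The function name kmeans and the parameters init, seed and max_iter are hypothetical; the optional init argument (indices of the samples used as initial centers) and the iteration cap are conveniences added for this sketch.

```python
import math
import random
from typing import List, Optional, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(
    samples: Sequence[Point],
    k: int,
    init: Optional[Sequence[int]] = None,
    seed: int = 0,
    max_iter: int = 100,
) -> List[int]:
    """Return a group label (0..k-1) for every sample."""
    if init is None:
        init = random.Random(seed).sample(range(len(samples)), k)
    centers = [tuple(samples[i]) for i in init]  # k samples as initial center points
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each sample to the nearest center by Euclidean distance.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centers[j])) for p in samples
        ]
        # Recompute each center as the mean of the samples assigned to it.
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(samples, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centers.append(centers[j])  # empty group: keep its center
        if new_centers == centers:
            break  # centers unchanged, so the grouping is final
        centers = new_centers
    return labels
```

Here each sample would be the attribute data sample of one financial institution, and institutions sharing a label form one group.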
5. The financial risk based threshold model building method of claim 2, wherein the unit of the first time slice comprises at least one of a year, a quarter and a month.
6. The financial risk based threshold model building method according to claim 1, wherein the removing of the heterogeneous data in the historical loss data of each group of financial institutions according to the anti-fraud model comprises:
identifying fraudulent customers of each group of financial institutions according to the anti-fraud model;
determining the historical loss data that the fraudulent customers caused to the financial institution as the heterogeneous data;
and removing the heterogeneous data from the historical loss data of the financial institution.
7. The financial risk based threshold model building method of claim 1, further comprising, before culling the heterogeneous data from the historical loss data of each group of financial institutions according to an anti-fraud model:
and establishing an anti-fraud model according to the customer data of the financial institution.
8. The financial risk based threshold model building method of claim 7, wherein building an anti-fraud model based on customer data of the financial institution comprises:
calculating network parameters according to the capital flow network of the customers of the financial institution; wherein the customers include historical fraudulent customers and historical non-fraudulent customers, and the network parameters include at least one of: medium degree, fixed point degree, medium number and average distance;
forming a data set according to the network parameters and attribute data of at least two dimensions of the customer; wherein the attribute data of the customer includes at least two of: industrial and commercial information data, tax data, financial statement data, guarantee data, multi-head loan data, complex network indexes and financial product data;
and establishing an anti-fraud model according to the data set.
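As an illustration of how network parameters might be derived from a capital flow network, the sketch below computes two of them: the degree of each node and the average distance. It is a sketch under assumptions: the network is treated as an undirected, unweighted graph with one edge per pair of customers between whom funds have moved, "fixed point degree" is read as the vertex degree, and the average distance is taken as the mean shortest-path length over connected pairs. The names build_graph, vertex_degree and average_distance are hypothetical.

```python
from collections import deque
from typing import Dict, Hashable, Iterable, Set, Tuple

Graph = Dict[Hashable, Set[Hashable]]


def build_graph(transfers: Iterable[Tuple[Hashable, Hashable]]) -> Graph:
    """Build an undirected graph from (payer, payee) pairs."""
    graph: Graph = {}
    for payer, payee in transfers:
        graph.setdefault(payer, set()).add(payee)
        graph.setdefault(payee, set()).add(payer)
    return graph


def vertex_degree(graph: Graph) -> Dict[Hashable, int]:
    """Number of distinct counterparties of each customer."""
    return {node: len(neighbours) for node, neighbours in graph.items()}


def average_distance(graph: Graph) -> float:
    """Mean shortest-path length over ordered pairs of distinct, connected nodes."""
    total, pairs = 0, 0
    for source in graph:
        dist = {source: 0}
        queue = deque([source])
        while queue:  # breadth-first search from the source node
            node = queue.popleft()
            for nxt in graph[node]:
                if nxt not in dist:
                    dist[nxt] = dist[node] + 1
                    queue.append(nxt)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0
```

The per-customer degree can be joined to that customer's attribute data as one more column of the data set, while the average distance describes the network as a whole.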
9. The financial risk based threshold model building method of claim 8, wherein forming a data set from the network parameters and the attribute data of at least two dimensions of the customer comprises:
selecting attribute data of at least two dimensions of the customer;
calculating, for the attribute data of each dimension of the customer, a fluctuation degree characterization parameter of the attribute data of that dimension according to a second time slice; wherein the fluctuation degree characterization parameters of the customer attribute data include at least one of: standard deviation, volatility, coefficient of variation, maximum value, minimum value, average value, year-over-year ratio, period-over-period ratio, and ranking by the same characterization parameter;
and forming the data set according to the network parameters, the attribute data of at least two dimensions of the customer and the fluctuation degree characterization parameters of the attribute data of each dimension.
10. The financial risk based threshold model building method of claim 9, wherein the second time slice comprises at least one of a year, a quarter, a month and a week.
11. The financial risk based threshold model building method according to claim 8 or 9, wherein building an anti-fraud model from the data set comprises:
selecting part of the data set as a training set; wherein the training set accounts for more than three fifths and less than four fifths of the data set;
and taking variables in the data set as explanatory variables, taking a first variable corresponding to fraudulent customers and a second variable corresponding to non-fraudulent customers as explained variables, and training a model on the data in the training set with a machine learning algorithm to establish the anti-fraud model.
12. The financial risk based threshold model building method of claim 11, further comprising, after building an anti-fraud model from the data set:
selecting another part of the data set as a validation set; wherein the validation set and the training set together make up the data set;
and validating the anti-fraud model according to the validation set.
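The division of the data set recited in claims 11 and 12 can be sketched as follows: the training set takes a share strictly between three fifths and four fifths of the data set, and the remainder forms the other set. The function name split_data_set, the default ratio of 0.7 and the seeded shuffle are assumptions made for this sketch.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_data_set(
    data: Sequence[T], train_ratio: float = 0.7, seed: int = 0
) -> Tuple[List[T], List[T]]:
    """Split the data set into a training set and its complement."""
    if not 3 / 5 < train_ratio < 4 / 5:
        raise ValueError("train_ratio must lie strictly between 3/5 and 4/5")
    shuffled = list(data)
    random.Random(seed).shuffle(shuffled)  # seeded shuffle for reproducibility
    cut = round(train_ratio * len(shuffled))
    return shuffled[:cut], shuffled[cut:]
```

For very small data sets the rounding of the cut point can move the realised share outside the stated bounds, so the share actually obtained may need to be checked after the split.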
13. A financial risk based threshold model building apparatus, comprising:
the grouping module is used for grouping different financial institutions to determine at least two groups of financial institutions;
the data removing module is used for removing heterogeneous data in the historical loss data of each group of financial institutions according to the anti-fraud model to form at least two groups of sample loss data;
and the threshold model establishing module is used for respectively selecting the threshold of the threshold model according to each group of the sample loss data so as to establish the threshold model corresponding to each group of the financial institutions.
14. An apparatus, characterized in that the apparatus comprises:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the financial risk based threshold model building method as recited in any one of claims 1-12.
15. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out a financial risk based threshold model building method according to any one of claims 1-12.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111326988.7A CN114092222A (en) 2021-11-10 2021-11-10 Threshold model establishing method, device, equipment and medium based on financial risk

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111326988.7A CN114092222A (en) 2021-11-10 2021-11-10 Threshold model establishing method, device, equipment and medium based on financial risk

Publications (1)

Publication Number Publication Date
CN114092222A (en) 2022-02-25

Family

ID=80299575

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111326988.7A Pending CN114092222A (en) 2021-11-10 2021-11-10 Threshold model establishing method, device, equipment and medium based on financial risk

Country Status (1)

Country Link
CN (1) CN114092222A (en)


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination