Detailed Description
In order to make the objects, technical solutions and effects of the present application clearer and more specific, the present application will be described in further detail below with reference to the accompanying drawings and examples. In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, interfaces, techniques, etc., in order to provide a thorough understanding of the present application.
The term "and/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may represent: A exists alone, A and B exist together, or B exists alone. In addition, the character "/" herein generally indicates an "or" relationship between the preceding and following associated objects. Further, "a plurality" herein means two or more. In addition, the term "at least one" herein means any one of a plurality or any combination of at least two of a plurality; for example, including at least one of A, B and C may mean including any one or more elements selected from the group consisting of A, B and C.
The risk identification method described in the present application may be performed by an electronic device, which may be any device having processing capability, for example, a mobile phone, a computer, etc., and may also be referred to as an executing device.
Referring to fig. 1, fig. 1 is a flowchart of an embodiment of a risk identification method according to the present application.
Specifically, the method may include:
Step S110: obtaining target business data.
The target business data may be finance- and funds-related business data, e.g., business data about travel reimbursement. The target business data may include several pieces of business data.
Specifically, the target business data may be business data generated by a financial management system, from which the executing device may obtain it.
Step S120: dividing the target business data into risk business data and non-risk business data by using the risk identification model.
It should be noted that the risk identification model may be trained in advance and used to perform risk identification on the business data input to it. For each piece of business data, the model determines whether it is of a risk data type or a non-risk data type, so that the business data is divided into risk business data and non-risk business data. The risk data type indicates that the business data is determined to carry a funds risk, and the non-risk data type indicates that the business data is determined to carry no funds risk.
Specifically, the risk business data may include the business data in the target business data identified as the risk data type, and the non-risk business data may include the business data in the target business data identified as the non-risk data type.
The risk identification model may be obtained in advance by training on manually labeled business data.
Step S130: determining the risk category of the risk business data by using the risk classification model.
It should be noted that the risk classification model may be trained in advance and used to perform risk classification on the business data input to it, determining the risk category of each piece of business data so as to assign it to the corresponding risk category. The risk categories may include several preset risk categories. In some embodiments, the risk categories include four types: expense monitoring, budget monitoring, in-process monitoring, and financing monitoring.
The risk classification model may be obtained in advance by training on manually labeled business data.
Of course, the risk categories into which the risk classification model can classify data are not limited to those in the above embodiment and may be adjusted according to actual application needs.
The risk business data input to the risk classification model may be the output of the risk identification model, i.e., the result of the risk identification model performing risk identification on the target business data.
Step S140: performing enhanced training on the risk identification model based on sample business data generated from the risk business data, and performing enhanced training on the risk classification model based on sample to-be-classified data generated from the risk business data whose risk category has been determined.
The risk business data is obtained by the risk identification model performing risk identification on the target business data. It is used to generate sample business data, which in turn is used for enhanced training of the risk identification model. The risk business data of the determined risk category is obtained by the risk classification model classifying the risk business data. It is used to generate sample to-be-classified data, which in turn is used for enhanced training of the risk classification model.
In the above scheme, risk business data is identified from the target business data by the risk identification model. The risk business data is used to generate sample business data for enhanced training of the risk identification model, and the risk business data of the determined risk category is used to generate sample to-be-classified data for enhanced training of the risk classification model. On the one hand, this increases the number of training samples and the amount of information available for model learning, which improves model accuracy and hence the accuracy of risk identification. On the other hand, the models' processing mines information about financial risks from the unordered target business data and feeds it back into training. This likewise increases the information available for learning and improves the accuracy of risk identification.
Referring to fig. 2, fig. 2 is a flowchart of step S120 according to another embodiment of the present application. Specifically, step S120 includes:
Step S221: sequentially identifying direct risk data, indirect risk data and rare risk data from the target business data by using the risk identification model.
Wherein, the direct risk data represents the business data which is directly generated by the financial management system and has financial risk, the indirect risk data represents the business data which is indirectly generated by the financial management system and has financial risk, and the rare risk data represents the business data which has low occurrence frequency and has financial risk.
In a specific application scenario, for suspected duplicate payment monitoring, business data satisfying all of the following conditions may be identified as direct risk data: 1. for the latest transaction flow record monitored on the same day, payment records within the preceding 30 days (inclusive) whose payer/payee name, account number, amount and summary are identical to it are counted, and the record is flagged if the repeat count is greater than 0 (the monitored record itself is not counted); 2. transaction direction = payout; 3. transaction records whose payment summary contains the words "submitted", "batch return", "actual return", "return set" or "fund pool member" are excluded.
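The three conditions can be read as a single rule over transaction records. The sketch below is a minimal illustration only. The `Txn` record layout, its field names and the function `is_suspected_duplicate` are hypothetical, and the 30-day window is assumed to be counted in calendar days back from the monitored record.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


# Hypothetical record layout; the field names are illustrative assumptions.
@dataclass(frozen=True)
class Txn:
    txn_id: str
    day: date
    direction: str          # "payout" or "receipt"
    counterparty_name: str
    account_no: str
    amount: float
    summary: str


# Words that exclude a record from the check (condition 3).
EXCLUDED_WORDS = ("submitted", "batch return", "actual return",
                  "return set", "fund pool member")


def is_suspected_duplicate(latest: Txn, history: List[Txn],
                           window_days: int = 30) -> bool:
    """Return True if `latest` meets conditions 1-3 of the rule."""
    # Condition 2: only payouts are considered.
    if latest.direction != "payout":
        return False
    # Condition 3: records whose summary contains an excluded word are dropped.
    if any(word in latest.summary for word in EXCLUDED_WORDS):
        return False
    start = latest.day - timedelta(days=window_days)
    key = (latest.counterparty_name, latest.account_no,
           latest.amount, latest.summary)
    # Condition 1: count payouts in the window (inclusive) with the same name,
    # account number, amount and summary, excluding the record itself.
    repeats = sum(
        1 for t in history
        if t.txn_id != latest.txn_id
        and t.direction == "payout"
        and start <= t.day <= latest.day
        and (t.counterparty_name, t.account_no, t.amount, t.summary) == key
    )
    return repeats > 0
```

The exclusion words are matched as plain substrings of the summary, which is the simplest reading of "contains".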
In a specific application scenario, for the receivable bill budget execution service, the business data may include the receivable bill execution deviation rate x, which is generated indirectly by the financial management system. With A denoting the actual execution number and B the secondary plan number of the receivable bill scale:
$$x=\frac{|A-B|}{B}\times 100\%$$
Here x is taken as 100% when the result exceeds 100%. When the denominator is zero and the numerator is non-zero, x is 100%. When both the denominator and the numerator are zero, x is 0%. When this business data satisfies the risk judgment condition, it may be identified as indirect risk data.
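The deviation rate and its three special cases translate directly into a small function. This is a sketch assuming the planned number is non-negative. The function name is hypothetical, and the risk judgment threshold is not specified in the text, so it is left to the caller.

```python
def execution_deviation_rate(actual: float, planned: float) -> float:
    """Receivable bill execution deviation rate x, in percent.

    Assumes the planned number is non-negative.
    """
    numerator = abs(actual - planned)
    if planned == 0:
        # Zero denominator: 100% if the numerator is non-zero, else 0%.
        return 100.0 if numerator != 0 else 0.0
    # Cap the rate at 100%.
    return min(numerator * 100.0 / planned, 100.0)
```

For example, an actual number of 120 against a planned number of 100 gives a rate of 20.0, while 350 against 100 is capped at 100.0.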
In a specific application scenario, for account status anomaly monitoring, business data meeting the following conditions may be identified as rare risk data: 1. the bank account's attachment status = "unattached" and "unable-to-attach record filed" = "no"; 2. the bank account's monitoring status = "unauthorized" and "no-monitoring record filed" = "no".
Step S222: taking the direct risk data, the indirect risk data and the rare risk data together as the risk business data, and taking the remaining business data in the target business data as the non-risk business data.
The direct risk data, the indirect risk data and the rare risk data all carry financial risk and are therefore taken as risk business data. The remaining business data in the target business data is determined to belong to none of these three kinds of risk data, i.e., to carry no funds risk, and is taken as the non-risk business data.
In some embodiments, the kinds of risk data included in the risk business data may be adjusted according to actual application requirements during risk identification. For example, the risk identification model may sequentially identify only direct risk data and indirect risk data from the target business data. Alternatively, further kinds of risk data may be added.
Referring to fig. 3, fig. 3 is a flowchart of step S221 according to another embodiment of the present application. Specifically, step S221 includes:
Step S3211: acquiring a first probability that the target business data belongs to the direct risk data.
Specifically, direct risk data, indirect risk data and rare risk data are identified in different ways. Direct risk data is first identified from the target business data. Indirect risk data is then identified from the part of the target business data other than the direct risk data. Finally, rare risk data is identified from the part other than the direct risk data and the indirect risk data, so that the target business data is divided into risk business data and non-risk business data.
In some embodiments, the first probability that each piece of business data belongs to the direct risk data may be calculated by the following formula (Bayes' theorem):
$$\Pr(S\mid W)=\frac{\Pr(W\mid S)\,\Pr(S)}{\Pr(W\mid S)\,\Pr(S)+\Pr(W\mid H)\,\Pr(H)}$$
The terms are defined as follows:
Pr(S|W): the first probability that the piece of business data belongs to the direct risk data, given the risk-indicating content W observed in it.
Pr(S): the overall probability that any piece of business data is risk data.
Pr(W|S): the probability that W appears in risk data.
Pr(H): the probability that any piece of business data is not risk data.
Pr(W|H): the probability that W appears in non-risk data.
Pr(S), Pr(W|S), Pr(H) and Pr(W|H) are calculated based on the target business data.
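The text does not say how W is represented or how the four terms are estimated. The sketch below therefore assumes a record is a set of string features plus a risk label, and estimates each term by plain frequency counting without smoothing. Both `estimate_terms` and `first_probability` are hypothetical names, and the sketch illustrates the formula rather than the patented procedure.

```python
from typing import Iterable, Set, Tuple


def estimate_terms(records: Iterable[Tuple[Set[str], bool]],
                   feature: str) -> Tuple[float, float, float, float]:
    """Estimate Pr(W|S), Pr(S), Pr(W|H), Pr(H) by counting.

    Each record is a (features, is_risk) pair (a hypothetical layout).
    """
    n_risk = n_safe = w_risk = w_safe = 0
    for features, is_risk in records:
        hit = feature in features
        if is_risk:
            n_risk += 1
            w_risk += hit
        else:
            n_safe += 1
            w_safe += hit
    if n_risk == 0 or n_safe == 0:
        raise ValueError("need both risk and non-risk records")
    pr_s = n_risk / (n_risk + n_safe)
    return w_risk / n_risk, pr_s, w_safe / n_safe, 1.0 - pr_s


def first_probability(pr_w_s: float, pr_s: float,
                      pr_w_h: float, pr_h: float) -> float:
    """Bayes' theorem: Pr(S|W)."""
    numerator = pr_w_s * pr_s
    denominator = numerator + pr_w_h * pr_h
    if denominator == 0:
        raise ValueError("W never observed; Pr(S|W) is undefined")
    return numerator / denominator
```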
Step S3212: direct risk data is determined from the target business data based on the first probability.
Specifically, whether a piece of business data belongs to direct risk data may be determined according to whether its first probability meets a first preset condition.
In a specific application scenario, the first preset condition may be that the first probability reaches a preset first probability threshold, in which case the business data whose first probability reaches that threshold is taken as direct risk data. Of course, the first preset condition may be adjusted according to actual application requirements.
Step S3213: acquiring a second probability that the business data in the target business data other than the direct risk data belongs to the indirect risk data.
A piece of business data may include a plurality of pieces of sub-data. Whether that piece of business data belongs to indirect risk data depends on the risk probabilities of its sub-data.
In some embodiments, the second probability that each piece of business data in the target business data other than the direct risk data belongs to the indirect risk data may be calculated by the following formula:
$$p=\frac{p_1p_2\cdots p_n}{p_1p_2\cdots p_n+(1-p_1)(1-p_2)\cdots(1-p_n)}$$
The terms are defined as follows:
p: the second probability that the business data is indirect risk data.
p_i = Pr(S|W_i) (i = 1, 2, …, n): the probability that the i-th piece of sub-data of the business data corresponds to risk data.
p_1, p_2, …, p_n are obtained for the respective pieces of sub-data of the business data.
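The combining formula can be evaluated as written. The sketch below instead uses an algebraically equivalent log-domain form, so that the product of many small p_i does not underflow; this is an implementation choice, not something the text states. Dividing the numerator and denominator by p_1⋯p_n gives p = 1 / (1 + exp(η)) with η = Σ(ln(1 − p_i) − ln p_i). The name `second_probability` is hypothetical.

```python
import math
from typing import Sequence


def second_probability(sub_probs: Sequence[float]) -> float:
    """Combine per-sub-data probabilities p_1..p_n into p."""
    if not sub_probs:
        raise ValueError("at least one sub-data probability is required")
    eta = 0.0
    for p in sub_probs:
        if not 0.0 < p < 1.0:
            raise ValueError("each p_i must lie strictly between 0 and 1")
        # ln(1 - p_i) - ln(p_i), accumulated in the log domain.
        eta += math.log1p(-p) - math.log(p)
    # Numerically stable evaluation of 1 / (1 + exp(eta)).
    if eta >= 0:
        e = math.exp(-eta)
        return e / (1.0 + e)
    return 1.0 / (1.0 + math.exp(eta))
```

Two sub-data probabilities of 0.9 combine to 81/82, while 0.2 and 0.8 cancel out to 0.5.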
Step S3214: determining indirect risk data from the business data in the target business data other than the direct risk data, based on the second probability.
Specifically, whether a piece of business data belongs to indirect risk data may be determined according to whether its second probability meets a second preset condition.
In a specific application scenario, the second preset condition may be that the second probability reaches a preset second probability threshold, in which case the business data whose second probability reaches that threshold is taken as indirect risk data. Of course, the second preset condition may be adjusted according to actual application requirements.
Step S3215: acquiring a third probability that the business data in the target business data other than the direct risk data and the indirect risk data belongs to rare risk data.
It should be noted that rare risk data is business data that occurs relatively infrequently and carries financial risk. The third probability of each piece of business data in the target business data other than the direct risk data and the indirect risk data may be calculated by the following formula:
$$\Pr'(S\mid W)=\frac{s\cdot\Pr(S)+n\cdot\Pr(S\mid W)}{s+n}$$
The above formula can be extended to the case where n equals zero (where the first probability is undefined), in which case it evaluates to Pr(S).
The terms are defined as follows:
Pr'(S|W): the third probability that the business data is rare risk data. It can also be understood as the corrected probability of the business data, i.e., a probability updated on the basis of the first probability.
s: the strength given to the background information about the input risk data.
Pr(S): the probability that any input data is risk data.
n: the number of times this data occurred in the learning phase.
Pr(S|W): the first probability that the piece of business data belongs to direct risk data.
In some embodiments, Pr(S) may again be taken as 0.5 to avoid being overly suspicious of incoming business data. A value of 3 is a good choice for s: the learning data must then contain the data more than 3 times before its computed risk value is trusted more than the default value.
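The corrected probability is a weighted average of the background value Pr(S) and the first probability, with weights s and n. The sketch below uses the defaults suggested above (Pr(S) = 0.5, s = 3). It treats the n = 0 case explicitly, since the first probability is then undefined. The function name is hypothetical.

```python
from typing import Optional


def third_probability(first_prob: Optional[float], n: int,
                      pr_s: float = 0.5, s: float = 3.0) -> float:
    """Corrected probability Pr'(S|W) = (s*Pr(S) + n*Pr(S|W)) / (s + n)."""
    if n < 0 or s <= 0:
        raise ValueError("n must be >= 0 and s must be > 0")
    if n == 0:
        # No occurrences: the first probability is undefined and the
        # formula reduces to the background value Pr(S).
        return pr_s
    if first_prob is None:
        raise ValueError("first_prob is required when n > 0")
    return (s * pr_s + n * first_prob) / (s + n)
```

With a single occurrence, a first probability of 1.0 is pulled down to 0.625. As n grows, the result approaches the first probability itself.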
Step S3216: based on the third probability, rare risk data is determined from the portion of the target business data other than the direct risk data and the indirect risk data.
Specifically, whether a piece of business data belongs to rare risk data may be determined according to whether its third probability meets a third preset condition.
In a specific application scenario, the third preset condition may be that the third probability reaches a preset third probability threshold, in which case the business data whose third probability reaches that threshold is taken as rare risk data. Of course, the third preset condition may be adjusted according to actual application requirements.
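Steps S3211 to S3216 together form a cascade: each stage only considers records not already claimed by an earlier stage. The sketch below shows that control flow under stated assumptions:
- The three probability functions are passed in, so any of the estimators above could be plugged in.
- The thresholds of 0.9 are placeholder values, not values from the text.
- "Reaching" a threshold is read as `>=`.
- `split_by_risk` is a hypothetical name.

```python
from typing import Callable, Dict, List, TypeVar

R = TypeVar("R")


def split_by_risk(records: List[R],
                  first_prob: Callable[[R], float],
                  second_prob: Callable[[R], float],
                  third_prob: Callable[[R], float],
                  t1: float = 0.9, t2: float = 0.9, t3: float = 0.9
                  ) -> Dict[str, List[R]]:
    """Sequentially split records into direct/indirect/rare/non-risk groups."""
    out: Dict[str, List[R]] = {"direct": [], "indirect": [],
                               "rare": [], "non_risk": []}
    for r in records:
        # Later probabilities are only computed for records that an
        # earlier stage did not claim.
        if first_prob(r) >= t1:
            out["direct"].append(r)
        elif second_prob(r) >= t2:
            out["indirect"].append(r)
        elif third_prob(r) >= t3:
            out["rare"].append(r)
        else:
            out["non_risk"].append(r)
    # Step S222: the three kinds together form the risk business data.
    out["risk"] = out["direct"] + out["indirect"] + out["rare"]
    return out
```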
Referring to fig. 4, fig. 4 is a flowchart of another embodiment of the risk identification method of the present application.
Specifically, the method may comprise the steps of:
Step S410: obtaining target business data.
In some embodiments, the executing device may acquire the target business data from the financial management system once every second preset duration.
In a specific application scenario, the second preset duration may be 5 minutes. Of course, the second preset duration may also be set according to the actual volume of business data.
In some embodiments, acquisition of the target business data may instead be triggered by the amount of unprocessed business data generated in the financial management system. For example, when the amount of unprocessed business data reaches a first preset amount, the executing device may acquire that unprocessed business data as the target business data.
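The two acquisition triggers (a fixed interval and a backlog of unprocessed records) can be combined into one check. In this minimal sketch, the 300-second default mirrors the 5-minute example. `count_threshold=1000` stands in for the unspecified first preset amount and, like the function name, is purely hypothetical.

```python
def should_acquire(now_s: float, last_acquired_s: float, pending_count: int,
                   interval_s: float = 300.0,
                   count_threshold: int = 1000) -> bool:
    """Decide whether to pull a new batch of target business data."""
    # Time-based trigger: the second preset duration (300 s here).
    due_by_time = now_s - last_acquired_s >= interval_s
    # Count-based trigger: the first preset amount of unprocessed records.
    due_by_count = pending_count >= count_threshold
    return due_by_time or due_by_count
```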
In some embodiments, after the target business data is obtained, it may also be preprocessed. The preprocessing may include, but is not limited to, desensitization and cleaning.
In some embodiments, after the target business data is obtained, it may also be stored in a database in the executing device for subsequent invocation.
Business data generated by the financial management system is acquired periodically for risk identification and risk classification. Each batch of acquired target business data may include business data of ongoing business and business data of completed business; for example, reimbursement A may still be in the reimbursement flow while reimbursement B has been completed. Performing risk identification and risk classification on the business data of completed business evaluates funds risks that have already occurred. Doing so on the business data of ongoing business predicts impending funds risks. The method can therefore identify both funds risks that have occurred and those that are impending, achieving full coverage of funds risk identification and identifying funds risks accurately and comprehensively.
Step S420: dividing the target business data into risk business data and non-risk business data by using the risk identification model.
In a specific application scenario, a data set of the target business data is input into the risk identification model. The model performs risk identification on each piece of business data to obtain a risk type prediction result indicating whether that business data belongs to the risk data type or the non-risk data type. Based on the prediction results, the target business data can be divided into two data sets: risk business data and non-risk business data.
In some embodiments, the executing device may store the two data sets of risk business data and non-risk business data obtained by the risk identification model in a database for subsequent use, for example, for subsequent model invocation.
Step S430: determining the detail category of the risk business data.
The risk business data includes a plurality of pieces of business data. Specifically, this step may determine which preset detail category each piece belongs to. There are more detail categories than risk categories.
In some embodiments, the detail categories may include, but are not limited to, the following preset categories: inefficient account monitoring, over-standard account monitoring, unaccounted item monitoring, off-book account monitoring, electric charge account abnormal expenditure monitoring, account status anomaly monitoring, electric charge account unpaid data monitoring, bank account validity monitoring, electronic payment monitoring, suspected duplicate payment monitoring, reserve fund balance monitoring, externally borrowed funds monitoring, payable balance anomaly monitoring, large-amount account transfer monitoring, refund monitoring, payment flow rollback monitoring, payment document transfer timeliness monitoring, large-amount payment to private account monitoring, "balance" monitoring, large-amount cash balance monitoring, internal closed settlement monitoring, funds bidirectional transaction anomaly monitoring, special payment monitoring, MAC address repeated interception and return, supplier blacklist interception and return, signature flow mismatch interception and return, external independent account large-amount payment interception, external supplier large-amount payment interception, private large-amount payment instruction interception, extra-large payment interception, loan repayment timeliness monitoring, financing ledger standardization monitoring, bill payoff timeliness monitoring, newly added bill collection unit monitoring, stand-by-order, account regulation execution timeliness, bank account regulation accuracy, non-acceptance ticket execution timeliness, and non-payment acceptance ticket monitoring.
Each detail category indicates a different risk, specifically as follows:
inefficient account monitoring: monitoring accounts whose number of transactions is below a set standard under each account classification;
over-standard account monitoring: monitoring, for each account classification in each unit, the number of accounts that exceeds the classification's control standard;
unaccounted item monitoring: monitoring each unit's outstanding (unreconciled) deposit items at the end of the month;
off-book account monitoring: in fund balance monitoring transactions, monitoring transaction records whose account name is a unit within the system but whose account number has not been included in the monitoring scope as required;
electric charge account abnormal expenditure monitoring: monitoring transaction records of fund expenditures from the electric charge account other than fund payments and commission fees;
bank account validity monitoring: checking all accounts for the previous day against a balance rule to judge the state of each bank account (the previous day's opening balance, income and expenditure are compared with the previous day's closing balance, and any inconsistency is displayed as an account difference);
electric charge account three-party reconciliation real-time monitoring: monitoring records in the fund center's three-party reconciliation results for electric charge accounts that contain at least one unknown, uncleared or unread entry;
electronic payment monitoring: monitoring the electronic payment ratio and the non-electronic payments of each unit;
suspected duplicate payment monitoring: monitoring payment records within the last 30 days that have the same payee unit and account number, payment amount and payment summary;
reserve fund balance monitoring: monitoring the details of each unit's reserve fund stock;
externally borrowed funds monitoring: monitoring detailed payment records of externally borrowed funds;
payable balance anomaly monitoring: monitoring detailed payment records of externally borrowed funds;
large-amount account transfer monitoring: monitoring records of each unit in which a single transfer of more than 10 million yuan, or cumulative transfers of more than 20 million yuan, are made from an electric financial account to an external bank account;
refund monitoring: monitoring records of refunded fund payments;
payment flow rollback monitoring: monitoring rollbacks of the centralized payment service;
payment document transfer timeliness monitoring: monitoring records in which the interval between the time of transfer to the centralized payment center and the rolling schedule date exceeds 3 days;
large-amount payment to private account monitoring: monitoring payments to private accounts from an expense account whose single transaction amount exceeds 5,000 yuan;
"balance" monitoring: monitoring bank account income, expenditure and balance as reflected in the financial institution's transaction details, and recording differences from the income, expenditure and balance reflected in the accounting books;
large-amount cash balance monitoring: monitoring units whose monthly accumulated debit and credit amounts on the cash account exceed 5,000 yuan;
large cash balance monitoring: monitoring units with a large cash balance at the end of the month;
internal closed settlement monitoring: monitoring units that do not carry out internal closed settlement;
funds bidirectional transaction anomaly monitoring: monitoring false transactions (temporary loans and bridge funds) of each unit;
special payment monitoring: monitoring payment records whose transaction summary contains words such as "hospitality", "tobacco", "alcohol", "tea", "gift" or "reception";
MAC address repeated interception and return: monitoring repeated interception and return by MAC address;
supplier blacklist interception and return: monitoring interception and return due to the supplier blacklist;
signature flow mismatch interception and return: monitoring interception and return due to a mismatched signature flow;
external independent account large-amount payment interception: monitoring interception early warnings for large-amount payments to external independent accounts;
external supplier large-amount payment interception: monitoring interception early warnings for large-amount payments to external suppliers;
private large-amount payment interception early warning: monitoring interception early warnings for large-amount payments to private accounts;
abnormal-time payment instruction interception early warning: monitoring interception early warnings for payment instructions issued at abnormal times;
extra-large payment interception early warning: monitoring how extra-large payments are handled;
loan repayment timeliness monitoring: monitoring records of overdue unrepaid loans;
financing ledger standardization monitoring: monitoring cases where the closing balance of each unit's financing ledger differs from the account balance;
bill payoff timeliness monitoring: monitoring payable bills that have not been paid off (bill records that have reached the expiration date without the payable bill being processed);
newly added bill collection unit monitoring: monitoring units that newly start collecting bills;
bill ledger standardization monitoring: monitoring cases where each unit's accumulated face amount in the receivable bill ledger differs from the accumulated bill account balance;
receivables budget execution accuracy: monitoring, for each unit at month end, the deviation between the monthly planned number and the actual execution number of receivable bills;
payable budget execution accuracy: monitoring, for each unit, the deviation between the monthly planned number and the actual execution number of payable bills;
bill ledger issue-date standardization monitoring: monitoring bill records whose ledger registration information is inconsistent with the bill issue date;
acceptance bank standardization monitoring: monitoring bill records whose acceptor field in the bill ledger does not contain "Agricultural Bank", "Industrial and Commercial Bank", "Construction Bank" or "Bank of China";
non-bank acceptance draft monitoring: monitoring bill records whose bill type in the bill ledger is not "bank acceptance draft";
non-electronic bill monitoring: monitoring bill records whose bill medium in the bill ledger is not "electronic bill".
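Several of these detail categories reduce to simple threshold or keyword rules. The sketch below encodes three of them: large-amount account transfer, large-amount payment to a private account, and special payment. It rests on the following assumptions:
- The thresholds come from the descriptions above.
- The English keyword list is a translation of the original terms.
- The aggregation period for the cumulative transfer amount is not specified, so the caller passes in the amounts for whatever period applies.
- All function names are hypothetical.

```python
from typing import Sequence

LARGE_SINGLE_TRANSFER_YUAN = 10_000_000      # single transfer threshold
LARGE_CUMULATIVE_TRANSFER_YUAN = 20_000_000  # cumulative transfer threshold
LARGE_PRIVATE_PAYMENT_YUAN = 5_000
SPECIAL_PAYMENT_WORDS = ("hospitality", "tobacco", "alcohol",
                         "tea", "gift", "reception")


def is_large_account_transfer(amounts: Sequence[float]) -> bool:
    """Transfers from one unit's electric financial accounts to external
    bank accounts within a caller-chosen period."""
    single = any(a > LARGE_SINGLE_TRANSFER_YUAN for a in amounts)
    cumulative = sum(amounts) > LARGE_CUMULATIVE_TRANSFER_YUAN
    return single or cumulative


def is_large_private_payment(amount: float, from_expense_account: bool,
                             to_private_account: bool) -> bool:
    return (from_expense_account and to_private_account
            and amount > LARGE_PRIVATE_PAYMENT_YUAN)


def is_special_payment(summary: str) -> bool:
    # Plain substring match, mirroring "contains words such as"; note that
    # in English "tea" would also match inside "team" or "steam".
    text = summary.lower()
    return any(word in text for word in SPECIAL_PAYMENT_WORDS)
```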
Different detail categories present different risks, while the risks within the same detail category have features in common. Classifying by detail category is thus a way of further mining the risk-related information contained in the target business data on top of risk identification. In some embodiments, other processing methods may be used instead of detail classification to mine this information.
In some embodiments, the execution device may store the risk business data as risk detail data in a database for subsequent use after classifying the risk business data according to the detail class.
Step S440: determining, by using the risk classification model, the risk category of the risk business data whose detail category has been determined.
The related description of step S440 may refer to the related content about step S130 in the foregoing embodiment.
Further, the risk classification model maps the risk business data whose detail category has been determined into the classification-function dimension and determines the separating hyperplanes corresponding to the respective risk categories. In the classification-function dimension, this risk business data is classified into the corresponding risk categories based on the separating hyperplanes. In this way, risk business data whose risk category has been determined is obtained.
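The text does not name the model, but separating hyperplanes in a mapped feature space are characteristic of support vector machines. The sketch below shows only the decision step, under several assumptions:
- Each hyperplane (w, b) has already been trained.
- The feature map φ is explicit.
- A one-vs-rest rule picks the category with the largest w·φ(x) + b.

A kernel SVM would evaluate these scores through kernel functions instead of an explicit map. `classify_risk_category` and the `Hyperplane` alias are hypothetical names.

```python
from typing import Callable, Dict, Sequence, Tuple

# (w, b): normal vector and offset of one separating hyperplane.
Hyperplane = Tuple[Sequence[float], float]


def classify_risk_category(
        x: Sequence[float],
        feature_map: Callable[[Sequence[float]], Sequence[float]],
        hyperplanes: Dict[str, Hyperplane]) -> str:
    """One-vs-rest decision: pick the category with the largest w.phi(x) + b."""
    z = feature_map(x)
    best_cat, best_score = None, float("-inf")
    for category, (w, b) in hyperplanes.items():
        if len(w) != len(z):
            raise ValueError(f"dimension mismatch for {category!r}")
        score = sum(wi * zi for wi, zi in zip(w, z)) + b
        if score > best_score:
            best_cat, best_score = category, score
    if best_cat is None:
        raise ValueError("no hyperplanes given")
    return best_cat
```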
In a specific application scenario, each piece of business data in the risk business data whose detail category has been determined already has a definite detail category, and such data can be further grouped into several broad risk categories. For example:
The detail categories inefficient account monitoring, over-standard account monitoring, unaccounted item monitoring, off-book account monitoring, electric charge account abnormal expenditure monitoring, account status anomaly monitoring, electric charge account unpaid data monitoring, bank account validity monitoring, and electric charge account three-party reconciliation real-time monitoring may be assigned to the risk category of account monitoring.
The detail categories electronic payment monitoring, suspected duplicate payment monitoring, reserve fund balance monitoring, externally borrowed funds monitoring, payable balance anomaly monitoring, large-amount account transfer monitoring, refund monitoring, payment flow rollback monitoring, payment document transfer timeliness monitoring, large-amount payment to private account monitoring, "balance" monitoring, large-amount cash balance monitoring, internal closed settlement monitoring, funds bidirectional transaction anomaly monitoring, and special payment monitoring may be assigned to the risk category of balance monitoring.
The detail categories MAC address repeated interception and return, supplier blacklist interception and return, signature flow mismatch interception and return, external independent account large-amount payment interception, external supplier large-amount payment interception, private large-amount payment interception early warning, abnormal-time payment instruction interception early warning, and extra-large payment interception early warning may be assigned to the risk category of in-process monitoring.
The detail categories loan repayment timeliness monitoring, financing ledger standardization monitoring, bill payoff timeliness monitoring, newly added bill collection unit monitoring, bill ledger standardization monitoring, receivables budget execution accuracy, payable budget execution accuracy, bill ledger issue-date standardization monitoring, acceptance bank standardization monitoring, non-bank acceptance draft monitoring, and non-electronic bill monitoring may be assigned to the risk category of financing monitoring.
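The grouping of detail categories into risk categories can be pictured as a lookup table. This sketch covers only the grouping itself, not the hyperplane-based classification model of step S440. The mapping is deliberately partial (two entries per risk category), and the category strings are illustrative. Both `DETAIL_TO_RISK` and `group_by_risk_category` are hypothetical names.

```python
from collections import defaultdict

# Partial, illustrative mapping from detail category to risk category.
DETAIL_TO_RISK = {
    "inefficient account monitoring": "account monitoring",
    "bank account validity monitoring": "account monitoring",
    "suspected duplicate payment monitoring": "balance monitoring",
    "special payment monitoring": "balance monitoring",
    "supplier blacklist interception and return": "in-process monitoring",
    "MAC address repeated interception and return": "in-process monitoring",
    "loan repayment timeliness monitoring": "financing monitoring",
    "non-electronic bill monitoring": "financing monitoring",
}


def group_by_risk_category(detail_labels):
    """Group (record_id, detail_category) pairs by risk category."""
    groups = defaultdict(list)
    for record_id, detail in detail_labels:
        # Unmapped detail categories are kept aside rather than guessed.
        groups[DETAIL_TO_RISK.get(detail, "unmapped")].append(record_id)
    return dict(groups)
```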
By defining detail classes and risk classes, the fund risks that may arise throughout the entire fund operation process can be identified, which improves the accuracy and comprehensiveness of risk prediction.
In some embodiments, different feature transfer functions may be used for risk business data of different detail classes; that is, risk business data of different detail classes may be mapped into the feature space of the classification function in different ways.
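The following is a minimal sketch of per-detail-class feature transfer functions. The source does not specify which mappings are used, so the three functions (`identity_map`, `quadratic_map`, `log_amount_map`) and their assignment to detail classes are hypothetical; the point is only the dispatch on detail class.

```python
import math


def identity_map(x):
    # Leave the features unchanged.
    return list(x)


def quadratic_map(x):
    # Explicit degree-2 feature map: original features followed by all pairwise products.
    feats = list(x)
    for i in range(len(x)):
        for j in range(i, len(x)):
            feats.append(x[i] * x[j])
    return feats


def log_amount_map(x):
    # Compress large monetary amounts; assumes non-negative inputs.
    return [math.log1p(v) for v in x]


# Hypothetical assignment of feature transfer functions to detail classes.
FEATURE_MAPS = {
    "refund monitoring": identity_map,
    "large-amount to private payment monitoring": log_amount_map,
    "suspected duplicate payment monitoring": quadratic_map,
}


def transfer(detail_class, x):
    """Map a record's numeric features with the function assigned to its detail class."""
    return FEATURE_MAPS.get(detail_class, identity_map)(x)


print(transfer("suspected duplicate payment monitoring", [1.0, 2.0]))
# [1.0, 2.0, 1.0, 2.0, 4.0]
```

Unknown detail classes fall back to the identity mapping in this sketch.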
Step S450: perform enhanced training on the risk identification model based on sample business data generated from the risk business data, and perform enhanced training on the risk classification model based on sample data to be classified generated from the risk business data of the determined risk category.
It should be noted that the enhanced training of the risk identification model and that of the risk classification model may be independent of each other, and each may be performed multiple times.
In some embodiments, the execution device may perform the steps related to enhanced training once every first preset time period.
In some embodiments, the execution device may also repeat the steps related to the enhanced training of the risk identification model and those related to the enhanced training of the risk classification model at different time intervals.
Because target business data is continuously acquired and used to generate sample business data and sample data to be classified, the enhanced training of both models is performed periodically and the training data is updated on a rolling basis. This significantly increases the number of samples available for model training and the amount of information the models can learn from, which improves model accuracy and, in turn, the accuracy of risk identification.
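The periodic, independent retraining of the two models can be simulated as below. This is a sketch under assumptions not in the source: time is counted in integer steps (for example, days), the intervals are arbitrary, and the "rolling update" is modeled as a bounded buffer (`deque` with `maxlen`) that keeps only the most recent samples. The function name `simulate` and the record format are hypothetical.

```python
from collections import deque


def simulate(days, id_interval, cls_interval, daily_batches, max_samples=1000):
    """Return (day, model, pool_size) for every enhanced-training run.

    daily_batches maps a day to a list of (record, risk_class) pairs, where
    risk_class is None for business data not identified as risk data.
    """
    id_pool = deque(maxlen=max_samples)   # sample business data (risk and non-risk)
    cls_pool = deque(maxlen=max_samples)  # sample data to be classified (risk data only)
    log = []
    for day in range(1, days + 1):
        for record, risk_class in daily_batches.get(day, []):
            id_pool.append(record)
            if risk_class is not None:
                cls_pool.append((record, risk_class))
        # The two models are retrained on independent schedules.
        if day % id_interval == 0:
            log.append((day, "identification", len(id_pool)))
        if day % cls_interval == 0:
            log.append((day, "classification", len(cls_pool)))
    return log


batches = {1: [("r1", "account monitoring"), ("r2", None)],
           3: [("r3", "financing monitoring")]}
for entry in simulate(days=4, id_interval=2, cls_interval=3, daily_batches=batches):
    print(entry)
# (2, 'identification', 2)
# (3, 'classification', 2)
# (4, 'identification', 3)
```

In a real deployment, the appended log entries would be replaced by calls that actually retrain each model on its current pool.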
Specifically, the execution device may further generate sample business data from the risk business data of the determined detail class. The sample business data is labeled with a real risk tag indicating whether it belongs to the risk data type or the non-risk data type, and is used for enhanced training of the risk identification model.
In a specific application scenario, since the sample business data is generated from the risk business data of the determined detail class, the business data it contains comes from the risk business data set; accordingly, the real risk tag of each such piece of business data indicates that it belongs to the risk data type.
In some embodiments, the sample business data may also be generated from the risk business data of the determined detail class together with other business data, for example, sample data previously used to train the risk identification model.
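Building the sample business data can be sketched as follows, assuming a hypothetical record layout (`features`, `detail_class`) and a hypothetical encoding of the real risk tag (1 for the risk data type, 0 for the non-risk data type). Every record taken from the risk business data receives the risk tag, its detail class is retained, and previously used samples are appended as they are.

```python
RISK, NON_RISK = 1, 0  # hypothetical encoding of the real risk tag


def build_sample_business_data(risk_records, previous_samples=()):
    """Build sample business data from risk business data of a determined detail class.

    Records drawn from the risk business data are tagged RISK and keep their
    detail class; previously used samples (either tag) are appended unchanged.
    """
    samples = [
        {"features": r["features"], "risk_tag": RISK, "detail_class": r["detail_class"]}
        for r in risk_records
    ]
    samples.extend(previous_samples)
    return samples


risk_records = [
    {"features": [0.9, 0.8], "detail_class": "refund monitoring"},
    {"features": [0.7, 1.0], "detail_class": "overdraft account monitoring"},
]
previous = [{"features": [0.1, 0.2], "risk_tag": NON_RISK, "detail_class": None}]
samples = build_sample_business_data(risk_records, previous)
print([s["risk_tag"] for s in samples])  # [1, 1, 0]
```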
Specifically, the execution device may use the risk identification model to classify the sample business data and obtain a sample risk type prediction result, which indicates whether each piece of sample business data is predicted to belong to the risk data type or the non-risk data type. Based on this result, the execution device determines sample risk data and sample non-risk data, and then adjusts the parameters of the risk identification model based on a first difference between the sample risk type prediction result and the real risk tag.
Sample business data predicted to belong to the risk data type is taken as sample risk data, and sample business data predicted to belong to the non-risk data type is taken as sample non-risk data. The sample risk type prediction result is the risk identification model's prediction of whether the sample business data belongs to the risk data type or the non-risk data type, whereas the real risk tag is the predetermined ground-truth label for the same question. Comparing the two yields the first difference, which is used to adjust the parameters of the risk identification model and thereby improve its accuracy.
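The source does not fix the architecture of the risk identification model. Purely as an illustration, the sketch below swaps in a logistic-regression classifier trained by stochastic gradient descent: the "first difference" is taken to be the predicted risk probability minus the real risk tag, which is the gradient of the cross-entropy loss with respect to the logit. All function names are hypothetical.

```python
import math


def predict_proba(w, b, x):
    """Predicted probability that x belongs to the risk data type."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train_identification_model(samples, epochs=200, lr=0.5):
    """samples: list of (features, real_risk_tag) with tag 1 = risk, 0 = non-risk."""
    dim = len(samples[0][0])
    w, b = [0.0] * dim, 0.0
    for _ in range(epochs):
        for x, tag in samples:
            diff = predict_proba(w, b, x) - tag  # first difference: prediction vs. real tag
            w = [wi - lr * diff * xi for wi, xi in zip(w, x)]
            b -= lr * diff
    return w, b


def split_risk(w, b, samples, threshold=0.5):
    """Divide sample business data into sample risk data and sample non-risk data."""
    risk, non_risk = [], []
    for x, _ in samples:
        (risk if predict_proba(w, b, x) >= threshold else non_risk).append(x)
    return risk, non_risk


samples = [([0.9, 0.8], 1), ([0.8, 1.0], 1), ([0.1, 0.2], 0), ([0.2, 0.0], 0)]
w, b = train_identification_model(samples)
risk, non_risk = split_risk(w, b, samples)
print(risk, non_risk)
```

Any model that outputs a risk/non-risk prediction and can be updated from the prediction error would fit the same training loop.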
In some embodiments, since the sample business data is generated from risk business data of a determined detail class, the real risk tag may further contain detail class information, and the risk identification model may be further configured to learn this information, which further improves its accuracy.
Specifically, the execution device may further generate sample data to be classified from the risk business data of the determined risk category. The sample data to be classified is labeled with a real category label that indicates its risk category.
In some embodiments, the sample data to be classified may also be generated from the risk business data of the determined risk category together with other business data, for example, sample data previously used to train the risk classification model.
Specifically, the execution device may use the risk classification model to determine the risk category of the sample data to be classified, obtaining a sample prediction classification result, and adjust the parameters of the risk classification model based on a second difference between the sample prediction classification result and the real category label.
The sample prediction classification result is the risk category of the sample data to be classified as predicted by the risk classification model, and the real category label is the ground-truth risk category of that data. The second difference between the two is used to adjust the parameters of the risk classification model, improving its accuracy.
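To make the "second difference" concrete without committing to the document's specific model, the sketch below swaps in a multi-class perceptron (a different technique from the one described next): whenever the predicted risk category disagrees with the real category label, the weight vector of the true category is moved toward the sample and that of the wrongly predicted category away from it. The feature vectors (with a trailing constant 1 acting as a bias) and function names are hypothetical.

```python
def predict_category(weights, x):
    """Return the risk category whose weight vector gives x the highest score."""
    return max(weights, key=lambda c: sum(wi * xi for wi, xi in zip(weights[c], x)))


def train_classification_model(samples, categories, epochs=20):
    """samples: list of (features, real_category_label)."""
    dim = len(samples[0][0])
    weights = {c: [0.0] * dim for c in categories}
    for _ in range(epochs):
        for x, label in samples:
            predicted = predict_category(weights, x)
            if predicted != label:  # second difference: prediction disagrees with the label
                weights[label] = [wi + xi for wi, xi in zip(weights[label], x)]
                weights[predicted] = [wi - xi for wi, xi in zip(weights[predicted], x)]
    return weights


categories = ["account monitoring", "balance monitoring", "financing monitoring"]
samples = [
    ([1.0, 0.0, 0.0, 1.0], "account monitoring"),
    ([0.0, 1.0, 0.0, 1.0], "balance monitoring"),
    ([0.0, 0.0, 1.0, 1.0], "financing monitoring"),
]
weights = train_classification_model(samples, categories)
print(predict_category(weights, [0.8, 0.1, 0.0, 1.0]))  # account monitoring
```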
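In a specific application scenario, the training process of the risk classification model is as follows. First, the decision function is determined:
$$f(x)=\operatorname{sign}\left(w^{*T}\Phi(x)+b^{*}\right)$$
where $\Phi(x)$ is the feature transfer function that maps the data into the feature space, $x$ denotes the sample data to be classified, which includes $n$ pieces of business data $x_1,\dots,x_n$, $y$ denotes the risk classification data set corresponding to $x$, which includes the $n$ labels $y_1,\dots,y_n$ (taking values $\pm 1$), and $\operatorname{sign}(\cdot)$ is the sign function. Next, $\alpha^{*}=(\alpha_1^{*},\dots,\alpha_n^{*})$, the optimal solution for $\alpha_i$, is obtained by solving:
$$\min_{\alpha}\ \frac{1}{2}\sum_{i=1}^{n}\sum_{j=1}^{n}\alpha_i\alpha_j y_i y_j\,\Phi(x_i)^{T}\Phi(x_j)-\sum_{i=1}^{n}\alpha_i\quad\text{s.t.}\quad\sum_{i=1}^{n}\alpha_i y_i=0,\qquad \alpha_i\ge 0,\ i=1,2,\dots,n$$
Then, $w^{*}$ and $b^{*}$, the optimal solutions for $w$ and $b$, are calculated as:
$$w^{*}=\sum_{i=1}^{n}\alpha_i^{*}y_i\Phi(x_i),\qquad b^{*}=y_j-\sum_{i=1}^{n}\alpha_i^{*}y_i\,\Phi(x_i)^{T}\Phi(x_j)$$
where $j$ is any index with $\alpha_j^{*}>0$. Finally, the separating hyperplane is:
$$w^{*T}\Phi(x)+b^{*}=0$$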
The separating hyperplane is then used to predict the risk category of the sample data to be classified; the prediction is compared with the real category label of the sample data to be classified, and the difference between them is used to adjust the parameters of the risk classification model.
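The step from the dual solution to the decision function can be checked on a tiny worked example. This sketch assumes the standard hard-margin SVM relations $w^{*}=\sum_i \alpha_i^{*}y_i\Phi(x_i)$ and $b^{*}=y_j-\sum_i \alpha_i^{*}y_i\,\Phi(x_i)^{T}\Phi(x_j)$ for a support vector $j$, a linear kernel ($\Phi(x)=x$), and labels $\pm 1$. For the two points $(1,1)$ with label $+1$ and $(-1,-1)$ with label $-1$, the dual objective reduces to $2\alpha-4\alpha^2$, which is maximized at $\alpha_1^{*}=\alpha_2^{*}=0.25$; the code takes that $\alpha^{*}$ as given rather than solving for it.

```python
def linear_kernel(u, v):
    # Phi(x) = x, so Phi(u)^T Phi(v) is the ordinary dot product.
    return sum(a * b for a, b in zip(u, v))


def svm_parameters(X, y, alpha):
    """Compute w* and b* from an optimal dual solution alpha* (linear kernel)."""
    n, dim = len(X), len(X[0])
    w = [sum(alpha[i] * y[i] * X[i][d] for i in range(n)) for d in range(dim)]
    j = next(i for i, a in enumerate(alpha) if a > 0)  # any support vector
    b = y[j] - sum(alpha[i] * y[i] * linear_kernel(X[i], X[j]) for i in range(n))
    return w, b


def f(w, b, x):
    """Decision function sign(w*^T x + b*); returns 0 on the hyperplane itself."""
    s = linear_kernel(w, x) + b
    return 1 if s > 0 else (-1 if s < 0 else 0)


X = [[1.0, 1.0], [-1.0, -1.0]]
y = [1, -1]
w, b = svm_parameters(X, y, alpha=[0.25, 0.25])
print(w, b)                          # [0.5, 0.5] 0.0
print(f(w, b, [2.0, 0.0]), f(w, b, [0.0, -3.0]))  # 1 -1
```

Both training points satisfy $y_i(w^{*T}x_i+b^{*})=1$, i.e. they lie exactly on the margin, as expected for support vectors.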
In some embodiments, the execution device may further store the risk related data obtained from the target business data in a database, so that, in response to a data call request sent by a target object, it can send the data matching the request to the target object. The risk related data includes risk business data and non-risk business data.
In some embodiments, the risk related data may further include risk business data grouped by risk category, and the like.
Referring to fig. 5, fig. 5 is a flowchart of a risk identification method according to another embodiment of the application.
In this embodiment, the execution device may run a risk monitoring system and includes two databases: a first database, which may be used to store target business data obtained from the production environment (the financial management system), and a second database, which may be used to store risk related data as well as the sample business data, sample data to be classified, and other data required for enhanced training.
The execution device may communicate with devices in the production environment, obtain the business data generated there, and store it in the first database, from which the risk monitoring system obtains the target business data. The risk monitoring system uses the risk identification model and the risk classification model to identify the risk business data in the target business data and determine its risk category. The data produced by the two models may be stored in the second database as risk related data for later enhanced training: the risk monitoring system retrieves this processed data set (DB data) from the second database and uses it for enhanced training, thereby correcting the models. An end user can communicate with the execution device through a client running on a terminal device and send a data call request through the client; the execution device then returns the data stored in the second database to the client, which displays the relevant data, for example, the business data of each risk category.
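The data flow between the two databases can be sketched with Python's built-in `sqlite3` module. Everything concrete here is an assumption for illustration: two in-memory databases stand in for the first and second databases, the table and column names are hypothetical, and the two models are replaced by placeholder threshold functions on the amount.

```python
import sqlite3

# In-memory stand-ins for the first database (target business data) and the
# second database (risk related data); schema names are hypothetical.
first_db = sqlite3.connect(":memory:")
second_db = sqlite3.connect(":memory:")
first_db.execute("CREATE TABLE business_data (id INTEGER PRIMARY KEY, amount REAL)")
second_db.execute(
    "CREATE TABLE risk_related_data (id INTEGER PRIMARY KEY, is_risk INTEGER, risk_class TEXT)"
)


def ingest_from_production(rows):
    """Store business data obtained from the production environment."""
    first_db.executemany("INSERT INTO business_data (id, amount) VALUES (?, ?)", rows)


def run_monitoring(identify, classify):
    """Apply both models to the target business data and store risk related data."""
    rows = first_db.execute("SELECT id, amount FROM business_data").fetchall()
    for row_id, amount in rows:
        is_risk = identify(amount)
        risk_class = classify(amount) if is_risk else None
        second_db.execute(
            "INSERT INTO risk_related_data VALUES (?, ?, ?)", (row_id, int(is_risk), risk_class)
        )


def handle_data_call_request(risk_class):
    """Return the ids of stored records belonging to one risk category."""
    cur = second_db.execute(
        "SELECT id FROM risk_related_data WHERE risk_class = ? ORDER BY id", (risk_class,)
    )
    return [r[0] for r in cur.fetchall()]


ingest_from_production([(1, 50.0), (2, 2_000_000.0), (3, 900_000.0)])
# Placeholder "models": simple amount thresholds, not the trained models.
run_monitoring(
    identify=lambda amount: amount > 500_000,
    classify=lambda amount: "balance monitoring" if amount > 1_000_000 else "account monitoring",
)
print(handle_data_call_request("balance monitoring"))  # [2]
print(handle_data_call_request("account monitoring"))  # [3]
```

Keeping raw production data and model outputs in separate stores lets the enhanced training read only processed, labeled data.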
Referring to fig. 6, fig. 6 is a schematic diagram of a frame of an electronic device according to an embodiment of the application.
In this embodiment, the electronic device 60 includes a memory 61 and a processor 62, where the memory 61 is coupled to the processor 62. Specifically, the components of the electronic device 60 may be coupled together through a bus, or the processor 62 may be connected to each of the other components individually. The electronic device 60 may be any device having processing capability, such as a computer, a tablet, or a mobile phone.
The memory 61 is used to store the program instructions executed by the processor 62, the data generated while the processor 62 is processing (such as risk business data and sample business data), and the like. The memory 61 includes a non-volatile storage portion for storing the program instructions.
The processor 62 controls the operation of the electronic device 60 and may also be referred to as a CPU (Central Processing Unit). The processor 62 may be an integrated circuit chip with signal processing capability. It may also be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component. A general-purpose processor may be a microprocessor or any conventional processor. In addition, the processor 62 may be jointly implemented by a plurality of integrated circuit chips.
The processor 62 is configured to execute instructions by invoking the program instructions stored in the memory 61, so as to implement any of the risk identification methods described above.
Referring to fig. 7, fig. 7 is a schematic diagram of a frame of an embodiment of a computer readable storage medium according to the present application.
In this embodiment, the computer readable storage medium 70 stores processor executable program instructions 71, where the program instructions 71 are capable of being executed to implement any of the risk identification methods described above.
The computer readable storage medium 70 may be a medium capable of storing program instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc. It may also be a server storing the program instructions, which may either send the stored program instructions to another device for execution or execute them itself.
In some embodiments, the computer readable storage medium 70 may also be a memory as shown in FIG. 6.
The foregoing describes only embodiments of the present application and is not intended to limit its scope. Any equivalent structure or equivalent process transformation made using the description and drawings of the present application, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of protection of the present application.