WO2019165673A1 - Reimbursement form risk prediction method, apparatus, terminal device, and storage medium - Google Patents

Reimbursement form risk prediction method, apparatus, terminal device, and storage medium

Info

Publication number
WO2019165673A1
Authority
WO
WIPO (PCT)
Prior art keywords
reimbursement
model
prediction
success rate
risk level
Prior art date
Application number
PCT/CN2018/081527
Other languages
English (en)
French (fr)
Inventor
袁军
陆源
魏尧东
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019165673A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/04Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/03Credit; Loans; Processing thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00Administration; Management
    • G06Q10/06Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063Operations research, analysis or management
    • G06Q10/0635Risk analysis of enterprise or organisation activities
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06QINFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q40/00Finance; Insurance; Tax strategies; Processing of corporate or income taxes
    • G06Q40/12Accounting

Definitions

  • the present application relates to the field of computer technology, and in particular, to a method, a device, a terminal device, and a storage medium for predicting a risk of a reimbursement.
  • the embodiments of the present application provide a reimbursement form risk prediction method, so as to solve the problem that current prediction models for the risk level of reimbursement forms have low accuracy.
  • an embodiment of the present application provides a method for predicting a reimbursement risk, including:
  • Regression analysis is performed on the model parameters, the prediction success rate, the test time, and the total predicted success rate to obtain a target prediction model.
  • an embodiment of the present application provides a reimbursement risk prediction apparatus, including:
  • a sample data collection module configured to obtain historical reimbursement information, and use the historical reimbursement information as sample data
  • a first dividing module configured to divide the sample data into training samples and test samples according to a preset ratio
  • a risk level preset module configured to determine the reimbursement risk level of each of the training samples according to preset definitions of N reimbursement risk levels, wherein N is a positive integer;
  • an initial prediction model obtaining module configured to perform model training using an association rule algorithm on the training samples in each of the reimbursement risk levels to obtain an initial prediction model, wherein the initial prediction model includes the association rules in each reimbursement risk level that satisfy preset model parameter requirements, and the model parameters include a support degree and a confidence degree;
  • an initial prediction model test module configured to perform model prediction on the test samples using the initial prediction model and to calculate, for each combination obtained by selecting one set of model parameters from each of the reimbursement risk levels, the prediction success rate of each reimbursement risk level, and the total prediction success rate and test time of that combination;
  • the target prediction model acquisition module is configured to perform regression analysis on the model parameters, the prediction success rate, the test time, and the total prediction success rate to obtain a target prediction model.
  • an embodiment of the present application provides a terminal device, including a memory, a processor, and computer readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the reimbursement form risk prediction method when executing the computer readable instructions.
  • embodiments of the present application provide one or more non-volatile readable storage media storing computer readable instructions, wherein the computer readable instructions, when executed by one or more processors, cause the one or more processors to implement the steps of the reimbursement form risk prediction method.
  • FIG. 1 is a flowchart of the reimbursement form risk prediction method provided in Embodiment 1 of the present application;
  • FIG. 2 is a flowchart of the implementation of step S4 in the reimbursement form risk prediction method provided in Embodiment 1 of the present application;
  • FIG. 3 is a flowchart of the implementation of step S5 in the reimbursement form risk prediction method provided in Embodiment 1 of the present application;
  • FIG. 4 is a flowchart of the implementation of step S6 in the reimbursement form risk prediction method provided in Embodiment 1 of the present application;
  • FIG. 5 is a flowchart showing an implementation of testing a target prediction model using a cross-validation method in a reimbursement risk estimation method provided in Embodiment 1 of the present application;
  • FIG. 6 is a schematic diagram of a reimbursement claim risk prediction apparatus provided in Embodiment 2 of the present application.
  • FIG. 7 is a schematic diagram of a terminal device according to Embodiment 4 of the present application.
  • FIG. 1 shows an implementation flow of a method for predicting a reimbursement risk provided by an embodiment of the present application.
  • the reimbursement form risk prediction method is applied to the reimbursement systems of enterprises and institutions to identify the risk level of reimbursement forms and to improve the accuracy of the predicted risk level.
  • the reimbursement risk prediction method includes steps S1 to S6, which are detailed as follows:
  • the sample data is collected from the historical reimbursement form of the reimbursement database, and the historical reimbursement information is obtained.
  • the historical reimbursement form is the data stored in the reimbursement database in the process of production and operation by enterprises and institutions.
  • Each historical reimbursement information includes information obtained from the reimbursement form and information generated during the processing of the reimbursement form.
  • the historical reimbursement form information includes but is not limited to the reimbursement order number, the reimbursement name, and the Chinese name of the operator.
  • the Hadoop big data platform is used to collect sample data from the historical reimbursement form stored in the reimbursement database.
  • Hadoop is a distributed system infrastructure that implements a Hadoop Distributed File System (HDFS).
  • HDFS provides high-throughput data access and is ideal for applications on large-scale data sets.
  • data processing is performed using the distributed file system HDFS and the data warehouse tool Hive.
  • Hive is a Hadoop-based data warehouse tool for storing, querying, and analyzing large-scale data stored in Hadoop, so collecting sample data with the Hadoop big data platform has the advantage of high collection efficiency.
  • S2 The sample data is divided into training samples and test samples according to a preset ratio.
  • the ratio for dividing the sample data is set in advance.
  • the preset ratio may be obtained from historical experience or by analyzing the sample data; the specific value may be set according to actual application requirements and is not limited herein.
  • the training samples are the data set used for machine learning: the data in the training samples is used for feature learning to train the machine learning model and determine its parameters. The test samples are used to test the discriminating ability of the trained machine learning model, such as the prediction success rate for the reimbursement risk level.
  • the sample data is divided into training samples and test samples according to a preset ratio.
  • For example, the sample data is divided at a ratio of 9:1, that is, 90% of the sample data is used as training samples and the remaining 10% as test samples. If 6.05 million samples are collected in total, then at the 9:1 ratio 5.445 million samples are used as training samples for feature learning, and the remaining 605,000 samples are used as test samples to predict the risk level of the reimbursement forms and verify the prediction success rate of the model.
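The 9:1 division can be sketched in a few lines of Python. This is only an illustration: the function name `split_samples`, the seeded shuffle, and the scaled-down sample count are assumptions, since the text does not say how individual samples are assigned to each part.

```python
import random


def split_samples(samples, train_ratio=0.9, seed=0):
    """Shuffle the samples and split them into (training, test) by a preset ratio.

    The shuffle is an assumption; the source only specifies the ratio.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# Scaled-down stand-in for the 6.05 million records in the example above.
training_samples, test_samples = split_samples(range(6050), train_ratio=0.9)
print(len(training_samples), len(test_samples))
```

With 6,050 records the split sizes mirror the 5.445 million / 605,000 proportions of the example.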
  • S3 Determine the risk level of the reimbursement list of each training sample according to the definition of the preset N reimbursement risk levels, where N is a positive integer.
  • the definitions of the N reimbursement risk levels are set in advance to distinguish the degree of risk of a reimbursement form, wherein N is a positive integer. The definitions can be set according to the needs of the actual application and are not limited here.
  • the reimbursement risk level of each training sample is determined according to the preset definitions of the reimbursement risk levels, and each training sample is labeled with identification information of the corresponding reimbursement risk level.
  • Table 1 shows the classification criteria for the risk levels of the reimbursement orders divided into four risk levels: 0, 1, 2, and 3.
  • S4 For the training samples in each reimbursement risk level, model training is performed using an association rule algorithm to obtain an initial prediction model, wherein the initial prediction model includes the association rules in each reimbursement risk level that satisfy preset model parameter requirements, and the model parameters include support and confidence.
  • the training samples obtained by the collection are grouped according to the criteria of the preset reimbursement risk level classification, and the association rule algorithm is used for machine learning respectively.
  • the model parameter requirements include, but are not limited to, a preset support threshold and a preset confidence threshold. According to the model parameter requirements, the model parameters that satisfy the support threshold and the confidence threshold, together with their corresponding association rules, are screened out, and the initial prediction model is constructed from these model parameters and association rules.
  • one set or multiple sets of support thresholds and confidence thresholds may be preset in the model parameter requirements. The preset thresholds may be chosen based on historical experience or on the distribution of the data; no limitation is imposed here.
  • the risk level of the reimbursement order is preset to four levels of 0, 1, 2, and 3, the specific grouping is as follows:
  • S5 Perform model prediction on the test samples using the initial prediction model, and, for each combination obtained by selecting one set of model parameters from each reimbursement risk level, calculate the prediction success rate of each reimbursement risk level and the total prediction success rate and test time of the combination.
  • data mining is performed on the training samples under each reimbursement risk level, with one or more sets of model parameters preset for each reimbursement risk level to screen out the association rules that satisfy the preset model parameter requirements. For each combination obtained by selecting one set of model parameters from each reimbursement risk level, the initial prediction model is used to perform model prediction on the test samples, and the prediction success rate of each reimbursement risk level, the total prediction success rate, and the test time t needed to complete the reimbursement risk level prediction of all test samples under that combination are calculated.
  • S6 Perform regression analysis on model parameters, prediction success rate, test time and total prediction success rate to obtain a target prediction model.
  • regression analysis is performed on the discrete data obtained in step S5 for each combination, such as the prediction success rate, the test time, and the total prediction success rate, to determine the quantitative relationship between the variables and obtain a continuous relationship.
  • the larger the support threshold and the confidence threshold, the more accurate the obtained association rules.
  • the target prediction model is constructed from the resulting model parameters and association rules and is used to predict the risk level of reimbursement forms, improving the accuracy of the reimbursement risk prediction model.
  • in this embodiment, historical reimbursement form information is obtained as sample data, and the sample data is divided into training samples and test samples according to a preset ratio, so that the quality of the model trained on the training samples can be evaluated with the test samples.
  • model training is then performed for each reimbursement risk level, the target association rules that satisfy the preset model parameter requirements in each reimbursement risk level are obtained, and the initial prediction model is constructed. Training the model separately for each reimbursement risk level makes it possible to learn the characteristics of reimbursement data that makes up only a small proportion of the sample data, which prevents this data from being discarded as noise and thereby improves the accuracy of the model.
  • finally, the initial prediction model is used to perform model prediction on the test samples. For each combination obtained by selecting one set of model parameters from each reimbursement risk level, the prediction success rate of each reimbursement risk level, the total prediction success rate, and the test time are calculated, and regression analysis is performed on these discrete data to obtain the target prediction model. Model prediction and regression analysis yield accurate model configuration parameters, so that the target prediction model can help staff identify the risk level of reimbursement forms accurately and efficiently and effectively improve the accuracy of the predicted risk level.
  • the following describes in detail, through a specific embodiment, how the model training using the association rule algorithm mentioned in step S4 is performed on the training samples in each reimbursement risk level to obtain the initial prediction model.
  • FIG. 2 shows a specific implementation process of step S4 provided by the embodiment of the present application, which is described in detail as follows:
  • S41 Perform data preprocessing on the training samples in each reimbursement risk level to obtain a to-be-processed data set in each reimbursement risk level.
  • the process of data pre-processing includes data cleaning, data integration, and data conversion on the training samples.
  • Data cleaning is to select the attribute information needed in the training sample as the feature value for training learning.
  • Data integration is the integration of data from training samples for each reimbursement risk level into a data file as a data set.
  • Data conversion is to convert the data type of the training sample in the data set into a unified format.
  • because association rule algorithms are generally suited to mining Boolean data, all data types are converted into Boolean data.
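One common way to obtain Boolean data is to turn every attribute value into an "attribute=value" item that is either present or absent in a record. The sketch below assumes this encoding; the function name, the field names, and the amount bins are hypothetical and do not come from the patent.

```python
def to_boolean_items(record, numeric_bins=None):
    """Turn one record (a dict) into a set of Boolean 'attribute=value' items.

    numeric_bins maps an attribute name to a list of (upper_bound, label)
    pairs in ascending order; a numeric value is replaced by the label of
    the first bin whose upper bound is not exceeded.
    """
    numeric_bins = numeric_bins or {}
    items = set()
    for name, value in record.items():
        if name in numeric_bins:
            for upper, label in numeric_bins[name]:
                if value <= upper:
                    value = label
                    break
        items.add(f"{name}={value}")
    return items


# Hypothetical reimbursement record and bins, for illustration only.
record = {"expense_type": "travel", "amount": 1200}
bins = {"amount": [(500, "low"), (5000, "mid"), (float("inf"), "high")]}
print(sorted(to_boolean_items(record, bins)))
```

Each resulting set can then serve as one transaction for the association rule algorithm.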
  • S42 The data set to be processed is processed by using an association rule algorithm, and multiple item sets in each risk level of the reimbursement form are obtained.
  • an association rule algorithm is used to perform data mining on each data set to be processed. Each reimbursement form training sample is a transaction, denoted T, and each training sample is labeled with corresponding transaction identification information. The set of transactions is the transaction set, denoted D.
  • each attribute in the reimbursement form is an item, denoted w, and each transaction includes multiple attributes.
  • a set of items is an item set, $W = \{w_1, w_2, \ldots, w_j\}$, where j is the number of items in the item set.
  • S43 For each reimbursement risk level, select the target item sets that satisfy the model parameter requirements from the item sets in that reimbursement risk level, and establish association rules from the target item sets.
  • for each reimbursement risk level, one or more sets of support thresholds and confidence thresholds are preset. From each data set, the target item sets whose support is greater than or equal to the support threshold are taken as frequent item sets. Preliminary rules are generated from the frequent item sets, the confidence of each preliminary rule is calculated, and the rules whose confidence is greater than or equal to the confidence threshold are kept as association rules.
  • the support is the percentage of transactions in the transaction set D that include both item set A and item set B.
  • the confidence is the ratio of the number of transactions in D that include both A and B to the number of transactions in D that include A. A rule can be written as $A \Rightarrow B$, reflecting the association between A and B.
  • the support can be calculated according to formula (1): $\mathrm{support}(A \Rightarrow B) = P(A \cup B)$
  • the confidence can be calculated according to formula (2): $\mathrm{confidence}(A \Rightarrow B) = P(B \mid A) = \mathrm{support}(A \Rightarrow B) / \mathrm{support}(A)$
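The verbal definitions of support and confidence above can be checked on a toy transaction set. The following is a minimal sketch using the standard definitions (fraction of transactions containing both sides of a rule, and that fraction divided by the fraction containing the left side); the function names and the tiny data set are illustrative assumptions, not the patent's implementation.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Fraction of transactions containing the antecedent that also contain the consequent."""
    base = support(transactions, antecedent)
    both = support(transactions, set(antecedent) | set(consequent))
    return both / base if base else 0.0


def is_association_rule(transactions, antecedent, consequent,
                        min_support, min_confidence):
    """Keep a rule only if it meets both preset thresholds."""
    rule_support = support(transactions, set(antecedent) | set(consequent))
    rule_confidence = confidence(transactions, antecedent, consequent)
    return rule_support >= min_support and rule_confidence >= min_confidence


# Toy transaction set D; each transaction is a set of Boolean items.
D = [{"a", "b"}, {"a", "b", "c"}, {"a"}, {"c"}]
print(support(D, {"a", "b"}), confidence(D, {"a"}, {"b"}))
```

In this toy set, two of the four transactions contain both `a` and `b`, and two of the three transactions containing `a` also contain `b`.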
  • the association rule algorithm is used to perform data mining on the training sample to obtain the association rule, and the preset support degree threshold and the confidence threshold are used as model parameters.
  • the association rules are aggregated to generate an initial prediction model that is used to predict the risk level of the claims in the test sample.
  • in this embodiment, data preprocessing improves the quality of the data used to train the machine learning model, and the support threshold and confidence threshold preset for each reimbursement risk level are used as model parameters.
  • the association rule algorithm is used to perform data mining on the training samples of each reimbursement risk level, mining the associations in the data to obtain association rules, which are combined with the model parameters to generate the initial prediction model used to predict the risk level of a reimbursement form.
  • training the model separately for each reimbursement risk level makes it possible to learn the characteristics of reimbursement data that makes up only a small proportion of the sample data and prevents that data from being discarded as noise, thereby improving the accuracy of the model.
  • the following describes in detail, through a specific embodiment, how in step S5 the initial prediction model is used to predict the test samples and how, for each combination obtained by selecting one set of model parameters from each reimbursement risk level, the prediction success rate of each reimbursement risk level and the total prediction success rate and test time of the combination are calculated.
  • FIG. 3 shows a specific implementation process of step S5 provided by the embodiment of the present application, which is described in detail as follows:
  • S51 Determine the risk level of the reimbursement form of each test sample and the number of test samples of each reimbursement risk level according to the definition of the preset N reimbursement risk levels.
  • the reimbursement risk level of each test sample is determined according to the definitions of the N reimbursement risk levels preset in step S3, and each test sample is labeled with identification information of the corresponding reimbursement risk level. The number of test samples of each reimbursement risk level is obtained from the identification information.
  • the reimbursement risk level of the test samples is then predicted, and the rules generated during model generation are checked and corrected.
  • the probability of each reimbursement risk level in the test samples is calculated according to formula (3): $P_i = R_i / S$, where $P_i$ is the probability of the i-th reimbursement risk level in the test samples, $R_i$ is the number of test samples of the i-th reimbursement risk level, and $S$ is the total number of test samples.
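Given the symbol definitions above, the natural reading is that the probability of a level is its share of the test samples, P_i = R_i / S. The sketch below computes these shares and the descending order in which the levels would then be processed; the function name and the ten-sample label list are assumptions for illustration.

```python
from collections import Counter


def level_probabilities(labels):
    """Return ({level: R_i / S}, levels sorted by descending probability)."""
    counts = Counter(labels)          # R_i for each level i
    total = len(labels)               # S
    probs = {level: n / total for level, n in counts.items()}
    order = sorted(probs, key=probs.get, reverse=True)
    return probs, order


# Hypothetical labels of ten test samples: eight at level 0, one each at levels 1 and 3.
probs, order = level_probabilities([0] * 8 + [1] + [3])
print(probs, order)
```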
  • S53 Select a set of model parameters from each of the reimbursement risk levels to combine and obtain L combinations, where L is a positive integer.
  • the association rule mining is performed on the training samples of each reimbursement risk level according to the preset one or more sets of model parameter requirements in each reimbursement risk level, and the model parameters include the support threshold and the confidence. Degree threshold.
  • the model parameters include the support threshold and the confidence. Degree threshold.
  • the corresponding association rules satisfying the parameters of the model can be selected.
  • a set of model parameters are selected from each reimbursement risk level to be combined, and L different combinations are obtained, wherein L is a positive integer.
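Selecting one parameter set per risk level and enumerating every such selection is a Cartesian product, so L equals the product of the number of parameter sets preset for each level. A minimal sketch follows; the function name and the (support threshold, confidence threshold) values are hypothetical placeholders, not values from the patent.

```python
from itertools import product


def parameter_combinations(params_by_level):
    """Enumerate every way of picking one parameter set per risk level.

    params_by_level maps a risk level to a list of
    (support_threshold, confidence_threshold) pairs.
    """
    levels = sorted(params_by_level)
    choices = (params_by_level[level] for level in levels)
    return [dict(zip(levels, choice)) for choice in product(*choices)]


# Hypothetical thresholds: two sets for level 0, one for level 1, two for level 2.
params = {
    0: [(0.30, 0.80), (0.20, 0.70)],
    1: [(0.10, 0.60)],
    2: [(0.05, 0.50), (0.02, 0.40)],
}
combos = parameter_combinations(params)
print(len(combos))  # L = 2 * 1 * 2
```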
  • the model parameters of each reimbursement risk level are preset to:
  • for each combination, the initial prediction model is used to predict the reimbursement risk level of the test samples in descending order of probability, yielding the prediction result of each test sample and the test time for the reimbursement risk level prediction under that combination.
  • specifically, using the probability of each reimbursement risk level in the test samples calculated according to formula (3), for each combination the trained initial prediction model predicts the reimbursement risk level of the test samples in descending order of probability. This yields the prediction result of each test sample and the test time needed to complete the prediction for all test samples under that combination. In total, the prediction results of each test sample and the corresponding test times under the L combinations are obtained and are used to further analyze the accuracy of the initial prediction model.
  • the reimbursement risk level predicted for each test sample in step S54 is compared with the reimbursement risk level given by the identification information of that test sample. If the two risk levels are the same, the prediction for the test sample is confirmed as successful; if they differ, the prediction is confirmed as failed.
  • the number of successful test sample predictions under each reimbursement risk level is calculated and used to calculate the predicted success rate for each reimbursement risk level for each combination.
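  • the prediction success rate of each reimbursement risk level is $\mathrm{hitrate}_i = M_i / R_i$, where $\mathrm{hitrate}_i$ is the prediction success rate of the i-th reimbursement risk level, $M_i$ is the number of successfully predicted test samples under the i-th reimbursement risk level, and $R_i$ is the number of test samples of the i-th reimbursement risk level.
  • the total prediction success rate is $\mathrm{hitRate} = \sum_{i} M_i / S$, where $\mathrm{hitRate}$ is the total prediction success rate, $M_i$ is the number of successfully predicted test samples under the i-th reimbursement risk level, and $S$ is the total number of test samples.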
  • For example, 605,790 reimbursement form test samples are collected for reimbursement risk level prediction, and the number of test samples of each reimbursement risk level is counted according to the preset definitions of the reimbursement risk levels: 561,627 samples at risk level 0, 34,818 at risk level 1, 13 at risk level 2, and 9,332 at risk level 3.
  • the reimbursement risk level of the test samples is predicted, and the prediction result of each test sample is compared with the reimbursement risk level given by its identification information. The numbers of successful predictions are: 561,527 at risk level 0, 30,821 at risk level 1, 1 at risk level 2, and 1,532 at risk level 3.
  • the total number of successfully predicted reimbursement forms is 593,881.
  • the prediction success rate for risk level 2 is $\mathrm{hitrate}_2 = 1/13 \approx 7.6923\%$.
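The figures in this example can be reproduced with a short calculation that divides the successes per level by the samples per level, and the total successes by the total samples. The function name is an assumption; the counts are the ones quoted in the example.

```python
def hit_rates(sample_counts, success_counts):
    """Return ({level: M_i / R_i}, sum(M_i) / S) from per-level counts."""
    per_level = {
        level: success_counts[level] / sample_counts[level]
        for level in sample_counts
    }
    total = sum(success_counts.values()) / sum(sample_counts.values())
    return per_level, total


# Counts quoted in the example above.
samples = {0: 561_627, 1: 34_818, 2: 13, 3: 9_332}
successes = {0: 561_527, 1: 30_821, 2: 1, 3: 1_532}

per_level, total = hit_rates(samples, successes)
for level in sorted(per_level):
    print(f"level {level}: {per_level[level]:.4%}")
print(f"total: {total:.4%}")
```

The per-level counts sum to 605,790 samples and 593,881 successes, matching the totals stated in the example.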
  • in this embodiment, one set of model parameters is selected from each reimbursement risk level, and the initial prediction model predicts the reimbursement risk level of the test samples in descending order of probability, which verifies the recognition rate of the initial prediction model and improves the efficiency of model testing.
  • the prediction success rate of each reimbursement risk level and the total prediction success rate are calculated, so that the accuracy of the initial prediction model can be further analyzed from the prediction success rate, the test time, and the total prediction success rate, and the rules generated during model generation can be checked and corrected. This optimizes the initial prediction model and yields an accurate target prediction model that can help staff identify the risk level of reimbursement forms accurately and efficiently and effectively improve the accuracy of the predicted risk level.
  • the following describes in detail, through a specific embodiment, how the regression analysis of the model parameters, the prediction success rate, the test time, and the total prediction success rate mentioned in step S6 is performed to obtain the target prediction model.
  • FIG. 4 shows a specific implementation process of step S6 provided by the embodiment of the present application, which is described in detail as follows:
  • the model parameters in each reimbursement risk level, the prediction success rate, and the test time are used as design variables, the total prediction success rate is used as the target variable, and function fitting is performed on the design variables and the target variable.
  • the result of the test sample prediction in each combination mode is used as a set of data, and the L group result data obtained in step S53 is fitted, and the fitting manner can be specifically expressed as:
  • n is the number of reimbursement risk levels
  • t is the test time for completing the reimbursement risk level prediction of all test samples in each combination
  • the running configuration parameter is a constant preset according to the system software and hardware configuration; it can be set according to the needs of the actual application and is not limited here.
  • the execution efficiency of the functional modules of the fitting process can be adjusted through the combination of the parameter t and the running configuration parameter.
  • the function fitting can be carried out with tools such as spreadsheet software (Microsoft Excel) or mathematical software (MATLAB, Matrix Laboratory). Nonlinear regression analysis is performed on the discrete data, including the model parameters (support and confidence), the prediction success rate, and the total prediction success rate, to find the relationship between the design variables and the target variable. The expression f(x) of the fitting function is determined from this relationship, so that the fitted function agrees with the discrete data.
  • the fitting function f(x) is solved, and according to the solution the set of design variables with the highest total prediction success rate and the largest model parameter values is used as the model configuration parameters, where the model parameters are the support threshold and the confidence threshold. With these parameters the model prediction accuracy of the target prediction model is the highest.
  • the total prediction success rate is the standard for evaluating model quality: the higher the total prediction success rate, the higher the model accuracy.
  • the model configuration parameters improve the accuracy of the association rules, and the target prediction model is constructed from the model configuration parameters and the association rules that satisfy the model parameter requirements, improving the accuracy of the target prediction model.
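The selection criterion (highest total prediction success rate, then largest model parameter values) can be illustrated without the fitting step. The sketch below deliberately swaps the solved fitting function for a direct search over the measured combinations, and breaking ties by the sum of all thresholds is an assumption; the function name, the record layout, and the numbers are hypothetical.

```python
def select_configuration(results):
    """Pick the measured combination with the highest total success rate.

    results is a list of dicts with keys:
      'params': {level: (support_threshold, confidence_threshold)}
      'total_hit_rate': float
    Ties are broken by the larger sum of thresholds (an assumption).
    """
    def ranking_key(result):
        threshold_sum = sum(s + c for s, c in result["params"].values())
        return (result["total_hit_rate"], threshold_sum)

    return max(results, key=ranking_key)


# Hypothetical measurements for three parameter combinations.
results = [
    {"params": {0: (0.2, 0.7), 1: (0.1, 0.6)}, "total_hit_rate": 0.97},
    {"params": {0: (0.3, 0.8), 1: (0.1, 0.6)}, "total_hit_rate": 0.98},
    {"params": {0: (0.2, 0.8), 1: (0.1, 0.6)}, "total_hit_rate": 0.98},
]
print(select_configuration(results)["params"])
```

A fitted f(x) would additionally allow parameter values between the measured combinations to be considered, which this shortcut does not attempt.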
  • after the regression analysis of the model parameters, the prediction success rate, the test time, and the total prediction success rate mentioned in step S6 has been performed and the target prediction model obtained, cross-validation can further be used to select a reasonable model.
  • the reimbursement form risk prediction method further includes:
  • S71 Segment the sample data into K subsample data.
  • cross-validation is used to test the accuracy of the target prediction model after fitting optimization. The collected reimbursement sample data is randomly segmented into K subsample data sets, multiple target prediction models are constructed by machine learning, and the constructed target prediction models are evaluated for accuracy so as to avoid over-fitting of the trained model, wherein K is a positive integer.
  • over-fitting means that the fitting function agrees closely with the training samples, but the success rate of predicting the reimbursement risk level of the test samples with the solved model configuration parameters is not high.
  • the cross-validation may adopt a holdout cross validation (holdout), a k-fold cross validatio or a leave-one-out cross validation (loocv).
  • holdout holdout
  • laocv leave-one-out cross validation
  • S72: From the K subsamples, select one as the test sample and use the remaining K-1 as training samples; perform model training, model prediction and regression analysis to obtain K target prediction models and the model accuracy of each, where K is a positive integer.
  • In this embodiment, one of the K subsamples is selected as the test sample for verifying the model and the other K-1 are used as training samples for feature learning. Steps S3 to S6 (model training, model prediction and regression analysis) are then carried out to build one target prediction model and obtain its model accuracy.
  • Repeating this with each of the K subsamples in turn as the test sample gives K results: K target prediction models and the model accuracy of each.
  • S73: The K target prediction models and their model accuracies are compared, and the target prediction model with the highest model accuracy is taken as the reasonable model, giving a reliable and stable model.
  • On the one hand the reasonable model fits the sample data; on the other it predicts the risk level of new reimbursement data with high accuracy. The predicted reimbursement form risk level is stored in the reimbursement database together with the reimbursement data.
  • Further, at a preset time interval (for example 1 month, 2 months or another period), historical reimbursement form information is randomly obtained from the reimbursement database and steps S1 to S6 are repeated. This completes a round of autonomous machine learning and yields an updated target prediction model, further improving the model accuracy and the success rate of reimbursement form risk level prediction.
  • In the embodiment corresponding to FIG. 5, model accuracy is tested by cross-validation: randomly split subsamples are used for repeated training and verification so that the trained target prediction model is not poorly fitted, and the model with the highest accuracy among the verification results is selected as the reasonable model. That model both fits the sample data and predicts new reimbursement data with high accuracy, which improves the accuracy of reimbursement form risk level prediction.
  • Corresponding to the method of Embodiment 1, FIG. 6 shows a reimbursement form risk prediction apparatus in one-to-one correspondence with that method; for ease of description, only the parts relevant to this embodiment are shown.
  • As shown in FIG. 6, the reimbursement form risk prediction apparatus includes a sample data collection module 61, a first division module 62, a risk level preset module 63, an initial prediction model acquisition module 64, an initial prediction model test module 65 and a target prediction model acquisition module 66.
  • Each functional module is described in detail as follows:
  • the sample data collection module 61 is configured to obtain historical reimbursement form information and use it as sample data;
  • the first division module 62 is configured to divide the sample data into training samples and test samples according to a preset ratio;
  • the risk level preset module 63 is configured to determine the reimbursement form risk level of each training sample according to preset definitions of N reimbursement form risk levels, where N is a positive integer;
  • the initial prediction model acquisition module 64 is configured to perform model training with an association rule algorithm on the training samples in each reimbursement form risk level to obtain an initial prediction model, where the initial prediction model includes the association rules in each risk level that satisfy preset model parameter requirements, and the model parameters include support and confidence;
  • the initial prediction model test module 65 is configured to perform model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one set of model parameters from each reimbursement form risk level, to calculate the prediction success rate of each risk level together with the total prediction success rate and the test time of that combination;
  • the target prediction model acquisition module 66 is configured to perform regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.
  • Further, the initial prediction model acquisition module 64 includes:
  • a data preprocessing unit 641, configured to preprocess the training samples in each reimbursement form risk level to obtain a to-be-processed data set for each risk level;
  • a training sample mining unit 642, configured to mine the to-be-processed data sets with an association rule algorithm to obtain multiple item sets in each reimbursement form risk level;
  • an association rule acquisition unit 643, configured, for each reimbursement form risk level, to filter from the item sets of that risk level the target item sets that satisfy the model parameter requirements and to establish association rules from those target item sets;
  • an initial prediction model construction unit 644, configured to construct the initial prediction model from the association rules and the model parameter requirements corresponding to them.
  • Further, the initial prediction model test module 65 includes:
  • a first statistics unit 651, configured to determine, according to the preset definitions of the N reimbursement form risk levels, the reimbursement form risk level of each test sample and the number of test samples in each risk level;
  • a first calculation unit 652, configured to calculate the probability of each reimbursement form risk level in the test samples as $P_i = R_i / S$, where $i \in [1, N]$, $P_i$ is the probability of the i-th reimbursement form risk level in the test samples, $R_i$ is the number of test samples in the i-th risk level, and $S$ is the total number of test samples;
  • a prediction mode combination unit 653, configured to select one set of model parameters from each reimbursement form risk level and combine them, giving L combinations, where L is a positive integer;
  • a test sample prediction unit 654, configured, for each combination, to predict the reimbursement form risk level of the test samples with the initial prediction model in order of probability from high to low, to obtain the prediction result of each test sample, and to obtain the test time of the risk level prediction under that combination;
  • a second statistics unit 655, configured to compare the prediction result of each test sample with the reimbursement form risk level of that test sample, to confirm that the test sample is predicted successfully if the two are the same, and to count, for each combination, the number of successfully predicted test samples in each risk level;
  • a second calculation unit 656, configured to calculate the prediction success rate of each reimbursement form risk level under each combination as $\mathrm{hitrate}_i = M_i / R_i$, where $\mathrm{hitrate}_i$ is the prediction success rate of the i-th risk level and $M_i$ is the number of successfully predicted test samples in the i-th risk level;
  • a third calculation unit 657, configured to calculate the total prediction success rate under each combination as $\mathrm{hitRate} = \sum_{i=1}^{N} M_i / S$, where hitRate is the total prediction success rate.
  • Further, the target prediction model acquisition module 66 includes:
  • a data fitting unit 661, configured to take the model parameters in each reimbursement form risk level, the prediction success rates and the test time as design variables and the total prediction success rate as the target variable, and to perform function fitting on the design variables and the target variable to obtain a fitting function;
  • a target prediction model construction unit 662, configured to solve the fitting function, to take the set of design variables with the highest total prediction success rate and the highest model parameter values as the model configuration parameters according to the solution, and to construct the target prediction model from the model configuration parameters, where the model accuracy of the target prediction model is the highest total prediction success rate.
  • Further, the reimbursement form risk prediction apparatus also includes:
  • a second division module 67, configured to split the sample data into K subsamples;
  • a cross-validation module 68, configured to select one of the K subsamples as the test sample and use the remaining K-1 subsamples as training samples, and to perform model training, model prediction and regression analysis to obtain K target prediction models and the model accuracy of each, where K is a positive integer;
  • a reasonable model acquisition module 69, configured to take the target prediction model with the highest model accuracy as the reasonable model.
  • This embodiment provides one or more non-volatile readable storage media storing computer-readable instructions.
  • When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the reimbursement form risk prediction method of Embodiment 1, or to implement the functions of the modules/units of the reimbursement form risk prediction apparatus of Embodiment 2; the details are not repeated here.
  • The non-volatile readable storage media storing the computer-readable instructions may include: any entity or device capable of carrying the computer-readable instructions, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signals and telecommunication signals.
  • FIG. 7 is a schematic diagram of a terminal device according to an embodiment of the present application.
  • As shown in FIG. 7, the terminal device 7 of this embodiment includes a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and executable on the processor 70.
  • When the processor 70 executes the computer-readable instructions 72, the steps of the reimbursement form risk prediction method embodiments described above are implemented, for example steps S1 to S6 shown in FIG. 1.
  • Alternatively, when the processor 70 executes the computer-readable instructions 72, the functions of the modules/units of the apparatus embodiments described above are implemented, for example the functions of modules 61 to 66 shown in FIG. 6.

Abstract

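This application discloses a reimbursement form risk prediction method, apparatus, device and medium. The method includes: obtaining historical reimbursement form information as sample data and dividing it into training samples and test samples according to a preset ratio; determining the risk level of each training sample according to preset reimbursement form risk levels; for the training samples in each risk level, performing model training with an association rule algorithm to obtain an initial prediction model; using the initial prediction model to predict the test samples and, for each combination obtained by selecting one set of model parameters from each risk level, calculating the prediction success rate of each risk level together with the total prediction success rate and the test time of that combination; and performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model. The target prediction model helps staff identify the risk level of reimbursement forms efficiently and improves the accuracy of risk level prediction.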

Description

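Reimbursement form risk prediction method, apparatus, terminal device and storage medium
This application is based on, and claims priority to, Chinese invention patent application No. 201810161565.6, filed on February 27, 2018 and entitled "Reimbursement form risk prediction method, apparatus, terminal device and storage medium".
Technical Field
This application relates to the field of computer technology, and in particular to a reimbursement form risk prediction method, apparatus, terminal device and storage medium.
Background
Routine expense reimbursement is subject to malicious and fraudulent claims. To strengthen risk management, most current approaches build a reimbursement form risk level prediction model with an association rule mining algorithm and use it to predict the risk level of each form. When the risk levels are unevenly distributed, however, forms with low-probability risk levels make up only a small share of the training data. A conventional association rule mining algorithm treats that data as noise and discards it, so the model never learns the features of forms with low-probability risk levels, and its accuracy is low when it predicts the risk level of new forms.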
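Summary
Embodiments of this application provide a reimbursement form risk prediction method to solve the problem that current reimbursement form risk level prediction models predict risk levels with low accuracy.
In a first aspect, an embodiment of this application provides a reimbursement form risk prediction method, including:
obtaining historical reimbursement form information and using the historical reimbursement form information as sample data;
dividing the sample data into training samples and test samples according to a preset ratio;
determining the reimbursement form risk level of each training sample according to preset definitions of N reimbursement form risk levels, where N is a positive integer;
for the training samples in each reimbursement form risk level, performing model training with an association rule algorithm to obtain an initial prediction model, where the initial prediction model includes the association rules in each reimbursement form risk level that satisfy preset model parameter requirements, and the model parameters include support and confidence;
performing model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one set of the model parameters from each reimbursement form risk level, calculating the prediction success rate of each reimbursement form risk level together with the total prediction success rate and the test time of that combination;
performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.
In a second aspect, an embodiment of this application provides a reimbursement form risk prediction apparatus, including:
a sample data collection module, configured to obtain historical reimbursement form information and use the historical reimbursement form information as sample data;
a first division module, configured to divide the sample data into training samples and test samples according to a preset ratio;
a risk level preset module, configured to determine the reimbursement form risk level of each training sample according to preset definitions of N reimbursement form risk levels, where N is a positive integer;
an initial prediction model acquisition module, configured to perform model training with an association rule algorithm on the training samples in each reimbursement form risk level to obtain an initial prediction model, where the initial prediction model includes the association rules in each reimbursement form risk level that satisfy preset model parameter requirements, and the model parameters include support and confidence;
an initial prediction model test module, configured to perform model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one set of the model parameters from each reimbursement form risk level, to calculate the prediction success rate of each reimbursement form risk level together with the total prediction success rate and the test time of that combination;
a target prediction model acquisition module, configured to perform regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.
In a third aspect, an embodiment of this application provides a terminal device including a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, where the processor implements the steps of the reimbursement form risk prediction method when executing the computer-readable instructions.
In a fourth aspect, an embodiment of this application provides one or more non-volatile readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to implement the steps of the reimbursement form risk prediction method.
Details of one or more embodiments of this application are set out in the drawings and description below. Other features and advantages of this application will become apparent from the description, the drawings and the claims.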
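Brief Description of the Drawings
To explain the technical solutions of the embodiments of this application more clearly, the drawings used in describing the embodiments are briefly introduced below. The drawings described below are only some embodiments of this application; a person of ordinary skill in the art can derive other drawings from them without creative effort.
FIG. 1 is a flowchart of the reimbursement form risk prediction method provided in Embodiment 1 of this application;
FIG. 2 is a flowchart of the implementation of step S4 of the reimbursement form risk prediction method provided in Embodiment 1 of this application;
FIG. 3 is a flowchart of the implementation of step S5 of the reimbursement form risk prediction method provided in Embodiment 1 of this application;
FIG. 4 is a flowchart of the implementation of step S6 of the reimbursement form risk prediction method provided in Embodiment 1 of this application;
FIG. 5 is a flowchart of testing the accuracy of the target prediction model by cross-validation in the reimbursement form risk prediction method provided in Embodiment 1 of this application;
FIG. 6 is a schematic diagram of the reimbursement form risk prediction apparatus provided in Embodiment 2 of this application;
FIG. 7 is a schematic diagram of the terminal device provided in Embodiment 4 of this application.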
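Detailed Description
The technical solutions in the embodiments of this application are described clearly and completely below with reference to the drawings. The described embodiments are some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of this application without creative effort fall within the scope of protection of this application.
Embodiment 1
Referring to FIG. 1, FIG. 1 shows the implementation flow of the reimbursement form risk prediction method provided by an embodiment of this application. The method is applied in the reimbursement review systems of enterprises and public institutions to identify the risk level of reimbursement forms and improve the accuracy of risk level prediction. As shown in FIG. 1, the method includes steps S1 to S6, detailed as follows:
S1: Obtain historical reimbursement form information and use the historical reimbursement form information as sample data.
In this embodiment, the sample data is collected from the historical reimbursement forms in a reimbursement database.
Historical reimbursement forms are data stored in the reimbursement database by an enterprise or institution in the course of its operations. Each item of historical reimbursement form information includes information taken from the form itself and information generated while the form was processed. Specifically, it includes, but is not limited to, attributes such as the form number, form name, Chinese name of the handler, Chinese name of the claimant, department name, reimbursement amount, total amount and number of attached documents. The historical reimbursement form information is used as sample data for mining and learning.
Specifically, when the sample data is collected, stored and processed, the Hadoop big data platform is used to collect the sample data from the historical reimbursement forms stored in the reimbursement database.
Hadoop is a distributed system infrastructure that implements a distributed file system, the Hadoop Distributed File System (HDFS). HDFS provides high-throughput data access and is well suited to applications on large data sets. During collection, the data is processed with HDFS and the data warehouse tool Hive. Hive is a Hadoop-based data warehouse tool for storing, querying and analyzing large-scale data held in Hadoop, so collecting the sample data on the Hadoop platform is efficient.
S2: Divide the sample data into training samples and test samples according to a preset ratio.
In this embodiment, the ratio used to divide the sample data is set in advance.
The preset ratio may be taken from historical experience or derived from an analysis of the sample data; it can be set according to the needs of the actual application and is not limited here.
The training samples are the data set used for machine learning: the data in them is used to learn features, that is, to train the machine learning model and determine its parameters. The test samples are used to test the discriminating ability of the trained model, for example its prediction success rate for reimbursement form risk levels.
Specifically, the sample data is divided into training samples and test samples according to the preset ratio. For example, with a ratio of 9:1, 90% of the sample data is used as training samples and the remaining 10% as test samples. If 6.05 million samples have been collected, 5.445 million of them are used as training samples for feature learning, and the remaining 605,000 are used as test samples whose risk levels are predicted to verify the prediction success rate of the model.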
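The 9:1 split described above can be sketched as follows. This is a minimal illustration rather than the implementation used in the application: the function name `split_samples`, the random shuffle and the fixed seed are assumptions, since the text only states that a preset ratio is applied.

```python
import random


def split_samples(samples, train_ratio=0.9, seed=0):
    """Shuffle a copy of the samples and split it by the preset ratio.

    Shuffling and the seed are assumptions made for this sketch; the
    source only says that the data is divided by a preset ratio.
    """
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


# 1000 stand-in sample identifiers split 9:1.
train, test = split_samples(range(1000))
print(len(train), len(test))
```

With the 6.05 million samples of the example, the same call would return 5,445,000 training samples and 605,000 test samples.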
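S3: Determine the reimbursement form risk level of each training sample according to preset definitions of N reimbursement form risk levels, where N is a positive integer.
In this embodiment, definitions of N reimbursement form risk levels are set in advance to distinguish the risk of reimbursement forms, where N is a positive integer. The definitions can be set according to the needs of the actual application and are not limited here. The higher the risk level of a reimbursement form, the greater the risk it carries.
Specifically, the reimbursement form risk level of each training sample is determined according to the preset definitions, and each training sample is labeled with identification information for its risk level.
To make this step easier to understand, a specific classification of reimbursement form risk levels is used as an example. Table 1 shows the classification criteria for dividing reimbursement forms into the four risk levels 0, 1, 2 and 3.
Table 1
[Table 1 is provided only as image PCTCN2018081527-appb-000001 in the original and is not reproduced here.]
S4: For the training samples in each reimbursement form risk level, perform model training with an association rule algorithm to obtain an initial prediction model, where the initial prediction model includes the association rules in each reimbursement form risk level that satisfy preset model parameter requirements, and the model parameters include support and confidence.
Specifically, according to the risk level identification information attached to each training sample in step S3, the collected training samples are grouped by the preset risk level classification criteria, and machine learning is performed on each group with the association rule algorithm. Model parameter requirements are preset for each group of training samples; they include, but are not limited to, a preset support threshold and a preset confidence threshold. According to these requirements, the model parameters that satisfy the support threshold and the confidence threshold, and the corresponding association rules, are filtered out, and the initial prediction model is built from those model parameters and association rules.
The preset model parameter requirements may contain one set or several sets of support and confidence thresholds. The thresholds may be chosen from historical experience or from the distribution of the data, and are not limited here.
For example, when the reimbursement form risk levels are preset as the four levels 0, 1, 2 and 3, the groups are:
$P_0$: $\mathrm{sup}_0 = x_0$, $\mathrm{confid}_0 = y_0$
$P_1$: $\mathrm{sup}_1 = x_1$, $\mathrm{confid}_1 = y_1$
$P_2$: $\mathrm{sup}_2 = x_2$, $\mathrm{confid}_2 = y_2$
$P_3$: $\mathrm{sup}_3 = x_3$, $\mathrm{confid}_3 = y_3$
Here $P_0$, $P_1$, $P_2$ and $P_3$ are the groups of training samples classified into risk levels 0, 1, 2 and 3, $\mathrm{sup}_i$ is the support threshold and $\mathrm{confid}_i$ is the confidence threshold, with $x_i \in [0, 1]$, $y_i \in [0, 1]$ and $y_i \ge x_i$ for $i = 0, 1, 2, 3$. For example, the values may be $x_0 = 0.6$, $y_0 = 0.8$; $x_1 = 0.1$, $y_1 = 0.7$; $x_2 = 0.6$, $y_2 = 0.95$; $x_3 = 0.1$, $y_3 = 0.7$, or $x_0 = 0.8$, $y_0 = 0.95$; $x_1 = 0.2$, $y_1 = 0.7$; $x_2 = 0.8$, $y_2 = 0.9$; $x_3 = 0.4$, $y_3 = 0.7$, and so on.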
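S5: Perform model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one set of model parameters from each reimbursement form risk level, calculate the prediction success rate of each risk level together with the total prediction success rate and the test time of that combination.
In this embodiment, data mining is performed on the training samples in each reimbursement form risk level, with one or more sets of model parameter requirements preset for each risk level to filter the association rules that satisfy them. For each combination obtained by selecting one set of model parameters from each risk level, the initial prediction model is used to predict the test samples; the prediction success rate of each risk level and the total prediction success rate are calculated for that combination, and the test time t needed to predict the risk level of all test samples under that combination is recorded.
S6: Perform regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.
In this embodiment, regression analysis is performed on the discrete data obtained in step S5 for each combination (prediction success rates, test time and total prediction success rate). The analysis determines the quantitative dependence among the variables and yields a continuous function, or a denser discrete equation, that agrees with the discrete data. The function or equation is solved and analyzed, and the set of discrete data with the highest total prediction success rate and the highest model parameter values is taken as the optimal model configuration parameters; the larger the support and confidence thresholds, the more accurate the resulting association rules. The target prediction model is then built from the optimal configuration parameters and the association rules that satisfy them. It is used to predict reimbursement form risk levels and improves the accuracy of the risk prediction model.
In the embodiment corresponding to FIG. 1, historical reimbursement form information is obtained as sample data and divided into training samples and test samples according to a preset ratio, so that the quality of the model trained on the training samples can be evaluated with the test samples. After the risk levels are defined and the risk level of each training sample is determined, model training is performed with an association rule algorithm on the training samples in each risk level, the target association rules in each risk level that satisfy the preset model parameter requirements are obtained, and the initial prediction model is built. Training separately for each risk level lets the model learn the features of reimbursement data that makes up only a small share of the sample data, instead of discarding that data as noise, which improves model accuracy. Finally, the initial prediction model is used to predict the test samples; for each combination obtained by selecting one set of model parameters from each risk level, the prediction success rate of each risk level, the total prediction success rate and the test time are calculated, and regression analysis on these discrete data yields the target prediction model. Model prediction and regression analysis give precise model configuration parameters, so the target prediction model can help staff identify the risk level of reimbursement forms accurately and efficiently and effectively improve the accuracy of risk level prediction.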
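Next, on the basis of the embodiment corresponding to FIG. 1, a specific embodiment is used to describe in detail how step S4 (performing model training with an association rule algorithm on the training samples in each reimbursement form risk level to obtain the initial prediction model) is implemented.
Referring to FIG. 2, FIG. 2 shows the specific implementation flow of step S4 provided by an embodiment of this application, detailed as follows:
S41: Preprocess the training samples in each reimbursement form risk level to obtain a to-be-processed data set for each risk level.
In this embodiment, data preprocessing includes data cleaning, data integration and data conversion of the training samples.
Data cleaning selects the attribute information needed from the training samples as feature values for training. Data integration gathers the training sample data of each risk level into one data file as a data set. Data conversion converts the data types of the training samples in the data set into a uniform format; for example, association rule algorithms are generally suited to mining Boolean data, so all data is converted to Boolean type.
Preprocessing the training samples in each reimbursement form risk level yields the to-be-processed data set for each risk level and improves the data quality of the training samples.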
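One way to picture the data conversion step is to turn each record into a set of Boolean items of the form `field=value`, where an item is either present or absent. The sketch below is an assumption about how such a conversion could look, not the procedure of the application; the field names, the bin boundaries and the function `to_items` are hypothetical.

```python
def to_items(record, numeric_bins):
    """Turn one record into a set of Boolean items such as 'department=Sales'.

    numeric_bins maps a field name to (upper_bound, label) pairs so that
    numeric values are replaced by a coarse label before becoming items.
    """
    items = set()
    for field, value in record.items():
        if field in numeric_bins:
            # Replace the numeric value by the label of the first bin it fits in.
            for upper, label in numeric_bins[field]:
                if value <= upper:
                    value = label
                    break
        items.add(f"{field}={value}")
    return frozenset(items)


# Hypothetical bins and record, for illustration only.
bins = {"amount": [(1000, "low"), (10000, "mid"), (float("inf"), "high")]}
record = {"department": "Sales", "amount": 2500, "attachments": 3}
print(sorted(to_items(record, bins)))
```

Each resulting item set can then be treated as one transaction by the mining step.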
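S42: Mine the to-be-processed data sets with an association rule algorithm to obtain multiple item sets in each reimbursement form risk level.
In this embodiment, each to-be-processed data set is mined with an association rule algorithm. Each reimbursement training sample is one transaction, denoted T, and each training sample is labeled with its transaction identification information. The collection of transactions is the transaction set, denoted D. Each attribute of a reimbursement form is an item, denoted w, and each transaction contains several attributes. A collection of items is an item set, $W = \{w_1, w_2, \ldots, w_j\}$, where j is the number of items in the item set. After every training sample in the to-be-processed data set has been labeled, the identification information of each transaction corresponds to one item set, which gives multiple item sets in each risk level.
S43: For each reimbursement form risk level, filter from the item sets of that risk level the target item sets that satisfy the model parameter requirements, and establish association rules from those target item sets.
In this embodiment, one or more sets of support and confidence thresholds are preset for the training of each risk level. From each data set, the target item sets whose support is greater than or equal to the support threshold are selected as frequent item sets; preliminary rules are generated from the frequent item sets and their confidence is calculated; the rules whose confidence is greater than or equal to the confidence threshold are taken as association rules.
Support is the percentage of the transactions in the transaction set D that contain both A and B. Confidence is the ratio of the number of transactions in D that contain both A and B to the number of transactions that contain A. A rule can be written as
$$A \Rightarrow B$$
which reflects the association between A and B.
Specifically, support can be calculated according to formula (1):
$$\mathrm{sup}(A \Rightarrow B) = \frac{\|A \cup B\|}{\|D\|} \tag{1}$$
where sup is the support,
$\|A \cup B\|$
is the number of transactions in the transaction set D that contain both A and B, and $\|D\|$ is the number of transactions in D.
Specifically, confidence can be calculated according to formula (2):
$$\mathrm{confid}(A \Rightarrow B) = \frac{\|A \cup B\|}{\|A\|} \tag{2}$$
where
$\mathrm{confid}(A \Rightarrow B)$
is the confidence of the rule
$A \Rightarrow B$,
and
$\|A\|$
is the number of transactions in the transaction set D that contain A.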
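The definitions of support and confidence in formulas (1) and (2) can be checked on a toy transaction set. The sketch below is illustrative only: the transactions, the item names and the helper `rule_passes` are made up for this example, and A and B are treated as item sets.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of A and B together divided by the support of A alone."""
    both = support(transactions, set(antecedent) | set(consequent))
    base = support(transactions, antecedent)
    return both / base if base else 0.0


def rule_passes(transactions, antecedent, consequent, min_sup, min_conf):
    """Keep the rule A => B only if it meets both preset thresholds."""
    joint = set(antecedent) | set(consequent)
    return (support(transactions, joint) >= min_sup
            and confidence(transactions, antecedent, consequent) >= min_conf)


# Toy transaction set D with four transactions (made-up items).
D = [frozenset("ab"), frozenset("abc"), frozenset("ac"), frozenset("bc")]
print(support(D, "ab"))          # 2 of 4 transactions contain both a and b
print(confidence(D, "a", "b"))   # 2 of the 3 transactions containing a also contain b
print(rule_passes(D, "a", "b", min_sup=0.4, min_conf=0.6))
```

Running the filter once per risk level, each with its own thresholds, mirrors the per-level grouping described in step S4.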
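S44: Construct the initial prediction model from the association rules and the model parameter requirements corresponding to them.
Specifically, once the association rules have been obtained by mining the training samples with the association rule algorithm under the preset support and confidence thresholds, those thresholds are used as the model parameters, and the association rules are aggregated to generate the initial prediction model, which is used to predict the risk level of the reimbursement forms in the test samples.
In the embodiment corresponding to FIG. 2, preprocessing the training samples in each risk level improves the quality of the data used to train the machine learning model. Support and confidence thresholds are then preset for each risk level as the model parameter requirements, and the training samples of each risk level are mined with the association rule algorithm to find the associations in the data and obtain the association rules. Combined with the preset model parameters, these rules form the initial prediction model used to predict the risk level of reimbursement forms. Training separately for each risk level lets the model learn the features of reimbursement data that makes up only a small share of the sample data, instead of discarding that data as noise, which improves model accuracy.
On the basis of the embodiment corresponding to FIG. 1 or FIG. 2, a specific embodiment is used below to describe in detail how step S5 is implemented, that is, how the initial prediction model is used to predict the test samples and how, for each combination obtained by selecting one set of the model parameters from each risk level, the prediction success rate of each risk level and the total prediction success rate and test time of that combination are calculated.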
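Referring to FIG. 3, FIG. 3 shows the specific implementation flow of step S5 provided by an embodiment of this application, detailed as follows:
S51: According to the preset definitions of the N reimbursement form risk levels, determine the risk level of each test sample and the number of test samples in each risk level.
In this embodiment, the risk level of each reimbursement form in the test samples is determined according to the definitions of the N risk levels preset in step S3, each test sample is labeled with identification information for its risk level, and the number of test samples in each risk level is counted from that identification information.
Predicting the risk level of the test samples with the trained initial prediction model makes it possible to check and correct the rules produced during model generation.
S52: Calculate the probability of each reimbursement form risk level in the test samples according to formula (3):
$$P_i = \frac{R_i}{S} \tag{3}$$
where $i \in [1, N]$, $P_i$ is the probability of the i-th reimbursement form risk level in the test samples, $R_i$ is the number of test samples in the i-th risk level, and $S$ is the total number of test samples.
S53: Select one set of model parameters from each reimbursement form risk level and combine them, giving L combinations, where L is a positive integer.
In this embodiment, association rule mining is performed on the training samples of each risk level according to the one or more sets of model parameter requirements preset for that level; the model parameters include a support threshold and a confidence threshold. In the training samples of each risk level, every set of model parameters yields the association rules that satisfy its requirements.
Specifically, from the multiple sets of model parameters of the N risk levels, one set is selected from each risk level and the selections are combined, giving L different combinations, where L is a positive integer.
For example, when the risk levels are preset as the four levels 0, 1, 2 and 3, and the model parameters of each level are preset as:
$P_0$: $(\mathrm{sup}_0, \mathrm{confid}_0) = \{(x_{01}, y_{01}), (x_{02}, y_{02}), (x_{03}, y_{03})\}$
$P_1$: $(\mathrm{sup}_1, \mathrm{confid}_1) = \{(x_{11}, y_{11}), (x_{12}, y_{12})\}$
$P_2$: $(\mathrm{sup}_2, \mathrm{confid}_2) = \{(x_{21}, y_{21})\}$
$P_3$: $(\mathrm{sup}_3, \mathrm{confid}_3) = \{(x_{31}, y_{31}), (x_{32}, y_{32})\}$
the combinations include:
$L_1$: $\{(x_{01}, y_{01}), (x_{11}, y_{11}), (x_{21}, y_{21}), (x_{31}, y_{31})\}$
$L_2$: $\{(x_{01}, y_{01}), (x_{12}, y_{12}), (x_{21}, y_{21}), (x_{31}, y_{31})\}$
$L_3$: $\{(x_{01}, y_{01}), (x_{11}, y_{11}), (x_{21}, y_{21}), (x_{32}, y_{32})\}$
and so on, for a total of 3 × 2 × 1 × 2 = 12 combinations.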
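Enumerating the combinations of step S53 is a Cartesian product over the per-level candidate lists. The following sketch assumes concrete threshold values purely for illustration (the source gives only the symbols $x_{ij}$ and $y_{ij}$); the variable names are likewise hypothetical.

```python
from itertools import product

# Candidate (support, confidence) threshold pairs per risk level.
# The numeric values are placeholders chosen for this sketch.
candidates = {
    0: [(0.6, 0.8), (0.7, 0.9), (0.8, 0.95)],
    1: [(0.1, 0.7), (0.2, 0.7)],
    2: [(0.6, 0.95)],
    3: [(0.1, 0.7), (0.4, 0.7)],
}

levels = sorted(candidates)
# One combination picks exactly one threshold pair for every risk level.
combos = [
    dict(zip(levels, choice))
    for choice in product(*(candidates[level] for level in levels))
]

print(len(combos))  # 3 * 2 * 1 * 2 = 12
print(combos[0])
```

Each dictionary in `combos` plays the role of one combination $L_k$ that is then evaluated on the test samples.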
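S54: For each combination, predict the reimbursement form risk level of the test samples with the initial prediction model in order of probability from high to low, obtain the prediction result of each test sample, and obtain the test time of the risk level prediction under that combination.
In this embodiment, using the probability of each risk level in the test samples calculated with formula (3), the trained initial prediction model predicts the risk level of the test samples for each combination in order of probability from high to low. This gives the prediction result of each test sample and the test time needed to predict the risk level of all test samples under that combination. In total, the prediction results of every test sample and the corresponding test times are obtained for all L combinations and are used to analyze the accuracy of the initial prediction model further.
S55: Compare the prediction result of each test sample with the reimbursement form risk level of that test sample; if the two are the same, confirm that the test sample is predicted successfully, and count the number of successfully predicted test samples in each risk level.
Specifically, the predicted risk level of each test sample obtained in step S54 is compared with the risk level in the identification information of that test sample. If the two risk levels are the same, the prediction for the test sample is confirmed as successful; if they differ, it is confirmed as failed.
The number of successfully predicted test samples in each risk level is counted and used to calculate the prediction success rate of each risk level under each combination.
S56: Calculate the prediction success rate of each reimbursement form risk level under each combination according to formula (4):
$$\mathrm{hitrate}_i = \frac{M_i}{R_i} \tag{4}$$
where $\mathrm{hitrate}_i$ is the prediction success rate of the i-th risk level, $M_i$ is the number of successfully predicted test samples in the i-th risk level, and $R_i$ is the number of test samples in the i-th risk level.
S57: Calculate the total prediction success rate under each combination according to formula (5):
$$\mathrm{hitRate} = \frac{\sum_{i=1}^{N} M_i}{S} \tag{5}$$
where hitRate is the total prediction success rate, $M_i$ is the number of successfully predicted test samples in the i-th risk level, and $S$ is the total number of test samples.
For example, with the risk levels preset as 0, 1, 2 and 3, risk level prediction is performed on 605,790 collected test samples. The test samples are labeled and counted according to the preset risk level definitions: 561,627 samples have risk level 0, 34,818 have risk level 1, 13 have risk level 2, and 9,332 have risk level 3.
With $\mathrm{sup}_0 = 0.8$, $\mathrm{confid}_0 = 0.95$, $\mathrm{sup}_1 = 0.4$, $\mathrm{confid}_1 = 0.7$, $\mathrm{sup}_2 = 0.4$, $\mathrm{confid}_2 = 0.95$, $\mathrm{sup}_3 = 0.4$, $\mathrm{confid}_3 = 0.7$ as the preset model parameter requirements, the risk level of the test samples is predicted and each prediction is compared with the risk level in the sample's identification information. The numbers of successful predictions are 561,527 for risk level 0, 30,821 for risk level 1, 1 for risk level 2 and 1,532 for risk level 3, for a total of 593,881 successfully predicted reimbursement forms.
Formula (4) gives $\mathrm{hitrate}_0$ = 561527/561627 ≈ 99.98219% for risk level 0, $\mathrm{hitrate}_1$ = 30821/34818 ≈ 88.52031% for risk level 1, $\mathrm{hitrate}_2$ = 1/13 ≈ 7.69231% for risk level 2, and $\mathrm{hitrate}_3$ = 1532/9332 ≈ 16.41663% for risk level 3. Formula (5) gives a total prediction success rate hitRate of 593881/605790 ≈ 98.03414%.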
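The worked example can be reproduced directly from formulas (3), (4) and (5). The sketch below uses the sample counts quoted in the text; the variable names are chosen for this illustration and are not from the source.

```python
# Test samples per risk level (R_i) and successful predictions (M_i),
# taken from the worked example in the text.
R = {0: 561627, 1: 34818, 2: 13, 3: 9332}
M = {0: 561527, 1: 30821, 2: 1, 3: 1532}

S = sum(R.values())                      # total number of test samples
P = {i: R[i] / S for i in R}             # formula (3): probability of each level
order = sorted(P, key=P.get, reverse=True)  # levels from most to least probable

hitrate = {i: M[i] / R[i] for i in R}    # formula (4): per-level success rate
hit_rate_total = sum(M.values()) / S     # formula (5): total success rate

print(S)      # 605790
print(order)  # [0, 1, 3, 2]
for i in sorted(hitrate):
    print(i, f"{hitrate[i]:.5%}")
print(f"{hit_rate_total:.5%}")
```

The output shows how a high total success rate can coexist with low success rates on rare risk levels, which is why the per-level rates are kept as separate design variables in the later regression step.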
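In the embodiment corresponding to FIG. 3, the probability of each risk level in the test samples is calculated, one set of model parameters is selected from each risk level and combined, and the initial prediction model predicts the risk level of the test samples in order of probability from high to low. This checks the recognition rate of the initial prediction model and makes model testing more efficient. Comparing each prediction with the pre-labeled risk level gives the number of successfully predicted test samples in each risk level, from which the prediction success rate of each risk level and the total prediction success rate under each combination are calculated. The prediction success rates, test time and total prediction success rate are then used to analyze the accuracy of the initial prediction model further and to check and correct the rules produced during model generation. This optimizes the initial prediction model into a precise target prediction model that can help staff identify the risk level of reimbursement forms accurately and efficiently and effectively improves the accuracy of risk level prediction.
On the basis of the embodiment corresponding to FIG. 3, a specific embodiment is used below to describe in detail how step S6 (performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain the target prediction model) is implemented.
Referring to FIG. 4, FIG. 4 shows the specific implementation flow of step S6 provided by an embodiment of this application, detailed as follows:
S61: Take the model parameters in each reimbursement form risk level, the prediction success rates and the test time as design variables and the total prediction success rate as the target variable, and perform function fitting on the design variables and the target variable to obtain a fitting function.
In this embodiment, the prediction results on the test samples under each combination are treated as one group of data, and the L groups of result data obtained for the combinations of step S53 are fitted. The fitting can be expressed as:
[The fitting formula is provided only as image PCTCN2018081527-appb-000012 in the original and is not reproduced here.]
where n is the number of reimbursement form risk levels, t is the test time needed to predict the risk level of all test samples under each combination, and δ is a run configuration parameter. δ is a constant preset according to the software and hardware configuration of the system; it can be set according to the needs of the actual application and is not limited here.
The combination of the parameters t and δ can be used to tune the execution efficiency of the functional modules in the fitting process.
Specifically, the function fitting can be done with tools such as the office software Microsoft Excel or the mathematical software MATLAB (Matrix Laboratory). Nonlinear regression analysis is performed on discrete data such as the model parameters (support and confidence), the prediction success rates and the total prediction success rate to find the relationship between the design variables and the target variable. The expression f(x) of the fitting function is determined from that relationship, giving a discrete equation that agrees with the discrete data.
S62: Solve the fitting function, take the set of design variables with the highest total prediction success rate and the highest model parameter values as the model configuration parameters according to the solution, and construct the target prediction model from the model configuration parameters, where the model accuracy of the target prediction model is the highest total prediction success rate.
Specifically, the fitting function f(x) is solved and, according to the solution, the set of design variables with the highest total prediction success rate and the highest model parameter values is taken as the model configuration parameters; the larger the support and confidence thresholds, the more accurate the resulting association rules. The target prediction model is built from the model configuration parameters and the association rules that satisfy them.
When the target prediction model is used to predict reimbursement form risk levels, its model accuracy is the highest total prediction success rate, which serves as the standard for evaluating model quality: the higher the total prediction success rate of the model, the higher its accuracy.
In the embodiment corresponding to FIG. 4, the model parameters in each risk level, the prediction success rates and the test time are taken as design variables and the total prediction success rate as the target variable. Nonlinear regression analysis is used for function fitting to find the relationship between the design variables and the target variable and to obtain the expression of the fitting function. The fitting function is solved and, according to the solution, the set of design variables with the highest total prediction success rate and the highest model parameter values is taken as the model configuration parameters, which improves the accuracy of the association rules. The target prediction model is built from the model configuration parameters and the association rules that satisfy the model parameter requirements, which improves the accuracy of its predictions.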
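The selection rule of step S62 (highest total prediction success rate, then highest model parameter values) can be illustrated on measured results. The sketch below is a simplification: it skips the function-fitting step and picks directly among the measured combinations. The result rows are invented for illustration, the use of the sum of thresholds as "highest parameter values" is one possible reading, and the final tie-break on shorter test time is an assumption not stated in the source.

```python
def pick_configuration(rows):
    """Pick the row with the highest total success rate.

    Ties are broken by the larger sum of support and confidence thresholds,
    then (as an assumption of this sketch) by the shorter test time.
    """
    def rank(row):
        param_total = sum(sup + conf for sup, conf in row["params"].values())
        return (row["hit_rate"], param_total, -row["test_time"])

    return max(rows, key=rank)


# Invented measurements for three combinations over two risk levels.
rows = [
    {"params": {0: (0.6, 0.80), 1: (0.1, 0.7)}, "hit_rate": 0.975, "test_time": 41.0},
    {"params": {0: (0.8, 0.95), 1: (0.4, 0.7)}, "hit_rate": 0.980, "test_time": 38.0},
    {"params": {0: (0.6, 0.80), 1: (0.2, 0.7)}, "hit_rate": 0.980, "test_time": 35.0},
]

best = pick_configuration(rows)
print(best["params"])
```

Here the second and third rows tie on total success rate, and the second wins because its thresholds are higher, which matches the stated preference for larger support and confidence values.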
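On the basis of the embodiment corresponding to FIG. 4, after the regression analysis of step S6 on the model parameters, the prediction success rates, the test time and the total prediction success rate has produced the target prediction model, cross-validation can further be used to select a reasonable model.
As shown in FIG. 5, the reimbursement form risk prediction method further includes:
S71: Split the sample data into K subsamples.
In this embodiment, the target prediction model obtained after fitting and optimization is verified with a cross-validation accuracy test. The collected reimbursement sample data is randomly split into K subsamples, target prediction models are built several times by machine learning, and the accuracy of each is evaluated, so that the trained model does not overfit. K is a positive integer.
Overfitting means that the fitting function agrees closely with the training samples, yet the model configuration parameters obtained from it give a low success rate when predicting the risk levels of the test samples.
The cross-validation may be holdout validation, k-fold cross-validation or leave-one-out cross-validation (LOOCV): the sample data is cut into smaller subsamples, most of which are used to build the model while the remaining few are used to test it.
S72: From the K subsamples, select one as the test sample and use the remaining K-1 as training samples; perform model training, model prediction and regression analysis to obtain K target prediction models and the model accuracy of each, where K is a positive integer.
In this embodiment, one of the K subsamples is selected as the test sample for verifying the model and the other K-1 are used as training samples for feature learning. Steps S3 to S6 (model training, model prediction and regression analysis) are carried out to build one target prediction model and obtain its model accuracy. Repeating this with each of the K subsamples in turn as the test sample gives K results: K target prediction models and the model accuracy of each.
S73: Take the target prediction model with the highest model accuracy as the reasonable model.
Specifically, the K target prediction models and their model accuracies are compared, and the target prediction model with the highest model accuracy is taken as the reasonable model, giving a reliable and stable model.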
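Steps S71 to S73 follow the usual k-fold pattern. The sketch below shows the fold construction and the selection of the most accurate model; `build_and_score` is a hypothetical callback standing in for steps S3 to S6, and the dummy used in the demonstration is not a real model.

```python
import random


def k_fold_indices(n, k, seed=0):
    """Yield (train_indices, test_indices) for K random, disjoint folds."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def select_reasonable_model(samples, k, build_and_score):
    """Build one model per fold and keep the one with the highest accuracy.

    build_and_score(train, test) is a hypothetical stand-in for steps S3
    to S6 and must return a (model, accuracy) pair.
    """
    best = None
    for train_idx, test_idx in k_fold_indices(len(samples), k):
        model, accuracy = build_and_score(
            [samples[j] for j in train_idx],
            [samples[j] for j in test_idx],
        )
        if best is None or accuracy > best[1]:
            best = (model, accuracy)
    return best


# Dummy callback for demonstration: the "accuracy" is just max(test).
samples = list(range(10))
best = select_reasonable_model(samples, k=5, build_and_score=lambda tr, te: ("model", max(te)))
print(best)
```

A common alternative is to average the K accuracies to judge a configuration rather than keep a single fold's model; the source describes keeping the most accurate of the K models, which is what the sketch does.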
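On the one hand the reasonable model fits the sample data; on the other it predicts the risk level of new reimbursement data with high accuracy. The reasonable model predicts an accurate reimbursement form risk level, and the reimbursement data is stored in the reimbursement database together with its risk level.
Further, at a preset time interval (for example 1 month, 2 months or another period), historical reimbursement form information is randomly obtained from the reimbursement database and steps S1 to S6 are repeated. This completes a round of autonomous machine learning and yields an updated target prediction model, further improving the model accuracy and the success rate of risk level prediction, so that reimbursement form risk levels are predicted precisely.
In the embodiment corresponding to FIG. 5, model accuracy is tested by cross-validation: randomly split subsamples are used for repeated training and verification so that the trained target prediction model is not poorly fitted, and the model with the highest accuracy among the verification results is selected as the reasonable model. That model both fits the sample data and predicts new reimbursement data with high accuracy, which improves the accuracy of reimbursement form risk level prediction.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution. The order in which the processes are executed is determined by their functions and internal logic, and the numbering does not limit the implementation of the embodiments of this application in any way.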
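Embodiment 2
Corresponding to the reimbursement form risk prediction method of Embodiment 1, FIG. 6 shows a reimbursement form risk prediction apparatus in one-to-one correspondence with that method. For ease of description, only the parts relevant to this embodiment are shown.
As shown in FIG. 6, the reimbursement form risk prediction apparatus includes a sample data collection module 61, a first division module 62, a risk level preset module 63, an initial prediction model acquisition module 64, an initial prediction model test module 65 and a target prediction model acquisition module 66. The functional modules are described in detail as follows:
The sample data collection module 61 is configured to obtain historical reimbursement form information and use it as sample data;
the first division module 62 is configured to divide the sample data into training samples and test samples according to a preset ratio;
the risk level preset module 63 is configured to determine the reimbursement form risk level of each training sample according to preset definitions of N reimbursement form risk levels, where N is a positive integer;
the initial prediction model acquisition module 64 is configured to perform model training with an association rule algorithm on the training samples in each reimbursement form risk level to obtain an initial prediction model, where the initial prediction model includes the association rules in each risk level that satisfy preset model parameter requirements, and the model parameters include support and confidence;
the initial prediction model test module 65 is configured to perform model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one set of model parameters from each risk level, to calculate the prediction success rate of each risk level together with the total prediction success rate and the test time of that combination;
the target prediction model acquisition module 66 is configured to perform regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.
Further, the initial prediction model acquisition module 64 includes:
a data preprocessing unit 641, configured to preprocess the training samples in each reimbursement form risk level to obtain a to-be-processed data set for each risk level;
a training sample mining unit 642, configured to mine the to-be-processed data sets with an association rule algorithm to obtain multiple item sets in each risk level;
an association rule acquisition unit 643, configured, for each risk level, to filter from the item sets of that risk level the target item sets that satisfy the model parameter requirements and to establish association rules from those target item sets;
an initial prediction model construction unit 644, configured to construct the initial prediction model from the association rules and the model parameter requirements corresponding to them.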
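Further, the initial prediction model test module 65 includes:
a first statistics unit 651, configured to determine, according to the preset definitions of the N reimbursement form risk levels, the risk level of each test sample and the number of test samples in each risk level;
a first calculation unit 652, configured to calculate the probability of each reimbursement form risk level in the test samples according to the following formula:
$$P_i = \frac{R_i}{S}$$
where $i \in [1, N]$, $P_i$ is the probability of the i-th reimbursement form risk level in the test samples, $R_i$ is the number of test samples in the i-th risk level, and $S$ is the total number of test samples;
a prediction mode combination unit 653, configured to select one set of model parameters from each risk level and combine them, giving L combinations, where L is a positive integer;
a test sample prediction unit 654, configured, for each combination, to predict the risk level of the test samples with the initial prediction model in order of probability from high to low, to obtain the prediction result of each test sample, and to obtain the test time of the risk level prediction under that combination;
a second statistics unit 655, configured to compare the prediction result of each test sample with the reimbursement form risk level of that test sample, to confirm that the test sample is predicted successfully if the two are the same, and to count, for each combination, the number of successfully predicted test samples in each risk level;
a second calculation unit 656, configured to calculate the prediction success rate of each risk level under each combination according to the following formula:
$$\mathrm{hitrate}_i = \frac{M_i}{R_i}$$
where $\mathrm{hitrate}_i$ is the prediction success rate of the i-th risk level and $M_i$ is the number of successfully predicted test samples in the i-th risk level;
a third calculation unit 657, configured to calculate the total prediction success rate under each combination according to the following formula:
$$\mathrm{hitRate} = \frac{\sum_{i=1}^{N} M_i}{S}$$
where hitRate is the total prediction success rate.
Further, the target prediction model acquisition module 66 includes:
a data fitting unit 661, configured to take the model parameters in each risk level, the prediction success rates and the test time as design variables and the total prediction success rate as the target variable, and to perform function fitting on the design variables and the target variable to obtain a fitting function;
a target prediction model construction unit 662, configured to solve the fitting function, to take the set of design variables with the highest total prediction success rate and the highest model parameter values as the model configuration parameters according to the solution, and to construct the target prediction model from the model configuration parameters, where the model accuracy of the target prediction model is the highest total prediction success rate.
Further, the reimbursement form risk prediction apparatus also includes:
a second division module 67, configured to split the sample data into K subsamples;
a cross-validation module 68, configured to select one of the K subsamples as the test sample and use the remaining K-1 subsamples as training samples, and to perform model training, model prediction and regression analysis to obtain K target prediction models and the model accuracy of each, where K is a positive integer;
a reasonable model acquisition module 69, configured to take the target prediction model with the highest model accuracy as the reasonable model.
For the way each module of the reimbursement form risk prediction apparatus of this embodiment implements its function, refer to the description of method Embodiment 1 above; the details are not repeated here.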
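Embodiment 3
This embodiment provides one or more non-volatile readable storage media storing computer-readable instructions. When executed by one or more processors, the computer-readable instructions cause the one or more processors to perform the reimbursement form risk prediction method of Embodiment 1, or to implement the functions of the modules/units of the reimbursement form risk prediction apparatus of Embodiment 2; to avoid repetition, the details are not given again here.
The one or more non-volatile readable storage media storing the computer-readable instructions may include: any entity or device capable of carrying the computer-readable instructions, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, computer memory, Read-Only Memory (ROM), Random Access Memory (RAM), electrical carrier signals, telecommunication signals and the like.
Embodiment 4
FIG. 7 is a schematic diagram of the terminal device provided by an embodiment of this application. As shown in FIG. 7, the terminal device 7 of this embodiment includes a processor 70, a memory 71, and computer-readable instructions 72 stored in the memory 71 and executable on the processor 70. When the processor 70 executes the computer-readable instructions 72, the steps of the reimbursement form risk prediction method embodiments described above are implemented, for example steps S1 to S6 shown in FIG. 1. Alternatively, when the processor 70 executes the computer-readable instructions 72, the functions of the modules/units of the apparatus embodiments described above are implemented, for example the functions of modules 61 to 66 shown in FIG. 6.
A person skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional units and modules is only an example. In practice, the functions may be assigned to different functional units or modules as needed; that is, the internal structure of the apparatus may be divided into different functional units or modules to perform all or part of the functions described above.
The above embodiments are only intended to illustrate the technical solutions of this application, not to limit them. Although this application has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions recorded in those embodiments may still be modified, or some of their technical features replaced with equivalents. Such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of this application, and shall all fall within the scope of protection of this application.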

Claims (20)

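  1. A reimbursement form risk prediction method, characterized in that the reimbursement form risk prediction method comprises:
    obtaining historical reimbursement form information and using the historical reimbursement form information as sample data;
    dividing the sample data into training samples and test samples according to a preset ratio;
    determining the reimbursement form risk level of each of the training samples according to preset definitions of N reimbursement form risk levels, where N is a positive integer;
    for the training samples in each of the reimbursement form risk levels, performing model training with an association rule algorithm to obtain an initial prediction model, wherein the initial prediction model comprises the association rules in each of the reimbursement form risk levels that satisfy preset model parameter requirements, and the model parameters comprise support and confidence;
    performing model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one set of the model parameters from each of the reimbursement form risk levels, calculating the prediction success rate of each of the reimbursement form risk levels, and the total prediction success rate and test time of each combination;
    performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.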
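  2. The reimbursement form risk prediction method according to claim 1, characterized in that performing model training with an association rule algorithm on the training samples in each of the reimbursement form risk levels to obtain an initial prediction model comprises:
    preprocessing the training samples in each of the reimbursement form risk levels to obtain a to-be-processed data set for each of the reimbursement form risk levels;
    mining the to-be-processed data sets with an association rule algorithm to obtain multiple item sets in each of the reimbursement form risk levels;
    for each of the reimbursement form risk levels, filtering from the item sets of that risk level the target item sets that satisfy the model parameter requirements, and establishing association rules from those target item sets;
    constructing the initial prediction model from the association rules and the model parameter requirements corresponding to the association rules.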
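  3. The reimbursement form risk prediction method according to claim 1 or 2, characterized in that performing model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one set of the model parameters from each of the reimbursement form risk levels, calculating the prediction success rate of each of the reimbursement form risk levels and the total prediction success rate and test time of that combination comprises:
    determining, according to the preset definitions of the N reimbursement form risk levels, the reimbursement form risk level of each of the test samples and the number of test samples in each of the reimbursement form risk levels;
    calculating the probability of each reimbursement form risk level in the test samples according to the following formula:
    $$P_i = \frac{R_i}{S}$$
    where $i \in [1, N]$, $P_i$ is the probability of the i-th reimbursement form risk level in the test samples, $R_i$ is the number of test samples in the i-th reimbursement form risk level, and $S$ is the total number of the test samples;
    selecting one set of the model parameters from each of the reimbursement form risk levels and combining them to obtain L combinations, where L is a positive integer;
    for each of the combinations, predicting the reimbursement form risk level of the test samples with the initial prediction model in order of the probability from high to low, obtaining the prediction result of each of the test samples, and obtaining the test time of the risk level prediction under that combination;
    comparing the prediction result of each of the test samples with the reimbursement form risk level of that test sample, confirming that the test sample is predicted successfully if the two are the same, and counting, for each of the combinations, the number of successfully predicted test samples in each of the reimbursement form risk levels;
    calculating the prediction success rate of each of the reimbursement form risk levels under each of the combinations according to the following formula:
    $$\mathrm{hitrate}_i = \frac{M_i}{R_i}$$
    where $\mathrm{hitrate}_i$ is the prediction success rate of the i-th reimbursement form risk level and $M_i$ is the number of successfully predicted test samples in the i-th reimbursement form risk level;
    calculating the total prediction success rate under each of the combinations according to the following formula:
    $$\mathrm{hitRate} = \frac{\sum_{i=1}^{N} M_i}{S}$$
    where hitRate is the total prediction success rate.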
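  4. The reimbursement form risk prediction method according to claim 3, characterized in that performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model comprises:
    taking the model parameters in each of the reimbursement form risk levels, the prediction success rates and the test time as design variables and the total prediction success rate as the target variable, and performing function fitting on the design variables and the target variable to obtain a fitting function;
    solving the fitting function, taking the set of design variables with the highest total prediction success rate and the highest values of the model parameters as the model configuration parameters according to the solution, and constructing the target prediction model from the model configuration parameters, wherein the model accuracy of the target prediction model is the highest total prediction success rate.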
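  5. The reimbursement form risk prediction method according to claim 4, characterized in that, after performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model, the reimbursement form risk prediction method further comprises:
    splitting the sample data into K subsamples;
    selecting one of the K subsamples as the test sample and using the remaining K-1 subsamples as the training samples, and performing the model training, the model prediction and the regression analysis to obtain K target prediction models and the model accuracy of each of the target prediction models, where K is a positive integer;
    taking the target prediction model with the highest model accuracy as the reasonable model.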
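  6. A reimbursement form risk prediction apparatus, characterized in that the reimbursement form risk prediction apparatus comprises:
    a sample data collection module, configured to obtain historical reimbursement form information and use the historical reimbursement form information as sample data;
    a first division module, configured to divide the sample data into training samples and test samples according to a preset ratio;
    a risk level preset module, configured to determine the reimbursement form risk level of each of the training samples according to preset definitions of N reimbursement form risk levels, where N is a positive integer;
    an initial prediction model acquisition module, configured to perform model training with an association rule algorithm on the training samples in each of the reimbursement form risk levels to obtain an initial prediction model, wherein the initial prediction model comprises the association rules in each of the reimbursement form risk levels that satisfy preset model parameter requirements, and the model parameters comprise support and confidence;
    an initial prediction model test module, configured to perform model prediction on the test samples with the initial prediction model and, for each combination obtained by selecting one set of the model parameters from each of the reimbursement form risk levels, to calculate the prediction success rate of each of the reimbursement form risk levels, and the total prediction success rate and test time of each combination;
    a target prediction model acquisition module, configured to perform regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.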
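  7. The reimbursement form risk prediction apparatus according to claim 6, characterized in that the initial prediction model acquisition module comprises:
    a data preprocessing unit, configured to preprocess the training samples in each of the reimbursement form risk levels to obtain a to-be-processed data set for each of the reimbursement form risk levels;
    a training sample mining unit, configured to mine the to-be-processed data sets with an association rule algorithm to obtain multiple item sets in each of the reimbursement form risk levels;
    an association rule acquisition unit, configured, for each of the reimbursement form risk levels, to filter from the item sets of that risk level the target item sets that satisfy the model parameter requirements and to establish association rules from those target item sets;
    an initial prediction model construction unit, configured to construct the initial prediction model from the association rules and the model parameter requirements corresponding to the association rules.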
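  8. The reimbursement form risk prediction apparatus according to claim 6 or 7, characterized in that the initial prediction model test module comprises:
    a first statistics unit, configured to determine, according to the preset definitions of the N reimbursement form risk levels, the reimbursement form risk level of each test sample and the number of test samples in each reimbursement form risk level;
    a first calculation unit, configured to calculate the probability of each reimbursement form risk level in the test samples according to the following formula:
    $$P_i = \frac{R_i}{S}$$
    where $i \in [1, N]$, $P_i$ is the probability of the i-th reimbursement form risk level in the test samples, $R_i$ is the number of test samples in the i-th reimbursement form risk level, and $S$ is the total number of test samples;
    a prediction mode combination unit, configured to select one set of model parameters from each reimbursement form risk level and combine them to obtain L combinations, where L is a positive integer;
    a test sample prediction unit, configured, for each combination, to predict the reimbursement form risk level of the test samples with the initial prediction model in order of probability from high to low, to obtain the prediction result of each test sample, and to obtain the test time of the risk level prediction under that combination;
    a second statistics unit, configured to compare the prediction result of each test sample with the reimbursement form risk level of that test sample, to confirm that the test sample is predicted successfully if the two are the same, and to count, for each combination, the number of successfully predicted test samples in each reimbursement form risk level;
    a second calculation unit, configured to calculate the prediction success rate of each reimbursement form risk level under each combination according to the following formula:
    $$\mathrm{hitrate}_i = \frac{M_i}{R_i}$$
    where $\mathrm{hitrate}_i$ is the prediction success rate of the i-th reimbursement form risk level and $M_i$ is the number of successfully predicted test samples in the i-th reimbursement form risk level;
    a third calculation unit, configured to calculate the total prediction success rate under each combination according to the following formula:
    $$\mathrm{hitRate} = \frac{\sum_{i=1}^{N} M_i}{S}$$
    where hitRate is the total prediction success rate.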
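  9. The reimbursement form risk prediction apparatus according to claim 8, characterized in that the target prediction model acquisition module comprises:
    a data fitting unit, configured to take the model parameters in each reimbursement form risk level, the prediction success rates and the test time as design variables and the total prediction success rate as the target variable, and to perform function fitting on the design variables and the target variable to obtain a fitting function;
    a target prediction model construction unit, configured to solve the fitting function, to take the set of design variables with the highest total prediction success rate and the highest model parameter values as the model configuration parameters according to the solution, and to construct the target prediction model from the model configuration parameters, wherein the model accuracy of the target prediction model is the highest total prediction success rate.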
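  10. The reimbursement form risk prediction apparatus according to claim 9, characterized in that the reimbursement form risk prediction apparatus further comprises:
    a second division module, configured to split the sample data into K subsamples;
    a cross-validation module, configured to select one of the K subsamples as the test sample and use the remaining K-1 subsamples as the training samples, and to perform the model training, the model prediction and the regression analysis to obtain K target prediction models and the model accuracy of each of the target prediction models, where K is a positive integer;
    a reasonable model acquisition module, configured to take the target prediction model with the highest model accuracy as the reasonable model.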
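  11. A terminal device, comprising a memory, a processor, and computer-readable instructions stored in the memory and executable on the processor, wherein the processor, when executing the computer-readable instructions, implements the following steps:
    acquiring historical reimbursement form information and using the historical reimbursement form information as sample data;
    dividing the sample data into training samples and test samples according to a preset ratio;
    determining the reimbursement form risk level of each training sample according to preset definitions of N reimbursement form risk levels, where N is a positive integer;
    performing model training on the training samples in each reimbursement form risk level using an association rule algorithm to obtain an initial prediction model, where the initial prediction model comprises the association rules in each reimbursement form risk level that satisfy preset model parameter requirements, and the model parameters comprise support and confidence;
    performing model prediction on the test samples using the initial prediction model and, under each combination obtained by selecting one set of the model parameters from each reimbursement form risk level and combining them, calculating the prediction success rate of each reimbursement form risk level, as well as the total prediction success rate and the test time under each combination;
    performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.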
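The first steps of claim 11, splitting the samples by a preset ratio and assigning each sample a risk level, might look like the Python sketch below. The claims do not say how the N risk levels are defined, so the rule used here (bucketing a hypothetical rejection count with `bisect_right`) is only a placeholder, as are the function names and default bounds.

```python
import random
from bisect import bisect_right

def split_samples(samples, train_ratio, seed=0):
    """Shuffle reproducibly, then cut into training and test samples."""
    shuffled = samples[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def risk_level(rejections, bounds=(1, 3)):
    """Assumed definition: N = len(bounds) + 1 levels, bucketed by rejection count."""
    return bisect_right(bounds, rejections)

# Hypothetical sample data: 20 records identified by index, split 75% / 25%.
train, test = split_samples(list(range(20)), train_ratio=0.75)
levels = [risk_level(r) for r in (0, 1, 2, 3, 7)]
```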
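  12. The terminal device according to claim 11, wherein performing model training on the training samples in each reimbursement form risk level using an association rule algorithm to obtain an initial prediction model comprises:
    preprocessing the training samples in each reimbursement form risk level to obtain a to-be-processed data set for each reimbursement form risk level;
    performing data mining on the to-be-processed data sets using the association rule algorithm to obtain a plurality of itemsets in each reimbursement form risk level;
    for each reimbursement form risk level, selecting from the itemsets in that risk level the target itemsets that satisfy the model parameter requirements, and establishing association rules according to the target itemsets;
    constructing the initial prediction model according to the association rules and the model parameter requirements corresponding to the association rules.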
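The preprocessing step in claim 12 is not detailed in the claims. One common way to prepare records for association rule mining is to turn each record into a set of discrete items, and the Python sketch below does that under assumed field names (`amount`, `category`, `has_invoice`) and assumed amount bands.

```python
def to_items(record, amount_bounds=(1000, 5000)):
    """Convert one reimbursement record (a dict) into a set of discrete items."""
    low, high = amount_bounds
    if record["amount"] < low:
        band = "amount:low"
    elif record["amount"] < high:
        band = "amount:mid"
    else:
        band = "amount:high"
    items = {band, "category:" + record["category"]}
    # A missing invoice is recorded as its own item; the field is hypothetical.
    if not record.get("has_invoice", True):
        items.add("no_invoice")
    return frozenset(items)

# Hypothetical records.
first = to_items({"amount": 6200, "category": "travel", "has_invoice": False})
second = to_items({"amount": 300, "category": "meals"})
```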
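  13. The terminal device according to claim 11 or 12, wherein performing model prediction on the test samples using the initial prediction model and, under each combination obtained by selecting one set of the model parameters from each reimbursement form risk level and combining them, calculating the prediction success rate of each reimbursement form risk level, as well as the total prediction success rate and the test time under that combination, comprises:
    determining, according to the preset definitions of the N reimbursement form risk levels, the reimbursement form risk level of each test sample and the number of test samples in each reimbursement form risk level;
    calculating the probability of each reimbursement form risk level in the test samples according to the following formula: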
    $$P_i = \frac{R_i}{S}$$
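    where i ∈ [1, N], P_i is the probability of the i-th reimbursement form risk level in the test samples, R_i is the number of test samples in the i-th reimbursement form risk level, and S is the total number of test samples;
    selecting one set of the model parameters from each reimbursement form risk level and combining them to obtain L combinations, where L is a positive integer;
    for each combination, predicting the reimbursement form risk level of the test samples using the initial prediction model in descending order of the probabilities, obtaining a prediction result for each test sample, and obtaining the test time taken for the risk level prediction under that combination;
    comparing the prediction result of each test sample with the reimbursement form risk level of that test sample, confirming that the test sample is successfully predicted if the two are the same, and counting, under each combination, the number of successfully predicted test samples in each reimbursement form risk level;
    calculating the prediction success rate of each reimbursement form risk level under each combination according to the following formula: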
    $$\mathrm{hitrate}_i = \frac{M_i}{R_i}$$
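    where hitrate_i is the prediction success rate of the i-th reimbursement form risk level, and M_i is the number of successfully predicted test samples in the i-th reimbursement form risk level;
    calculating the total prediction success rate under each combination according to the following formula: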
    $$\mathrm{hitRate} = \sum_{i=1}^{N} P_i \cdot \mathrm{hitrate}_i$$
    where hitRate is the total prediction success rate.
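The L combinations in claim 13 arise from picking one (support, confidence) pair per risk level, which is a Cartesian product. The Python sketch below enumerates them and times an evaluation of each. The grid values, the function name `evaluate_combinations`, and the stand-in `evaluate` callback are assumptions; in a real run the callback would perform the prediction and return the total prediction success rate.

```python
import time
from itertools import product

def evaluate_combinations(param_grid, evaluate):
    """Try every way of choosing one parameter set per risk level."""
    levels = sorted(param_grid)
    results = []
    for choice in product(*(param_grid[lvl] for lvl in levels)):
        config = dict(zip(levels, choice))
        start = time.perf_counter()
        total_rate = evaluate(config)            # hypothetical prediction run
        elapsed = time.perf_counter() - start    # test time for this combination
        results.append((config, total_rate, elapsed))
    return results

# Hypothetical candidate (support, confidence) pairs for each risk level.
param_grid = {
    "high": [(0.1, 0.8), (0.2, 0.9)],
    "mid": [(0.2, 0.7), (0.3, 0.8)],
    "low": [(0.3, 0.7)],
}
# Stand-in evaluation: the mean confidence threshold, not a real success rate.
results = evaluate_combinations(
    param_grid, lambda cfg: sum(conf for _, conf in cfg.values()) / len(cfg)
)
```

Here L is 2 × 2 × 1 = 4, the product of the number of candidate parameter sets per level.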
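  14. The terminal device according to claim 13, wherein performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model comprises:
    taking the model parameters in each reimbursement form risk level, together with the prediction success rates and the test time, as design variables, taking the total prediction success rate as the target variable, and performing function fitting using the design variables and the target variable to obtain a fitted function;
    solving the fitted function, taking, according to the solution, the set of design variables with the highest total prediction success rate and the highest model parameter values as model configuration parameters, and constructing the target prediction model according to the model configuration parameters, where the model accuracy of the target prediction model is the highest total prediction success rate.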
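The selection rule at the end of claim 14 (highest total prediction success rate, and among those the highest model parameter values) can be expressed as a compound sort key. How ties between parameter sets are ordered is not specified in the claim, so the lexicographic comparison in this Python sketch is an assumption, as are the function name `best_config` and the candidate values.

```python
def best_config(candidates):
    """Pick the candidate with the highest rate; break ties by larger parameters."""
    # Each candidate is (params, total_rate); params is a tuple of
    # (support, confidence) pairs, compared lexicographically (assumed).
    return max(candidates, key=lambda c: (c[1], c[0]))

# Hypothetical candidates for a single risk level.
candidates = [
    (((0.1, 0.8),), 0.90),
    (((0.2, 0.9),), 0.90),
    (((0.3, 0.9),), 0.85),
]
params, accuracy = best_config(candidates)
```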
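  15. The terminal device according to claim 14, wherein, after performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model, the processor, when executing the computer-readable instructions, further implements the following steps:
    splitting the sample data into K sub-sample data sets;
    selecting one of the K sub-sample data sets as the test samples and the remaining K-1 sub-sample data sets as the training samples, and performing the model training, the model prediction and the regression analysis, to obtain K target prediction models and the model accuracy of each target prediction model, where K is a positive integer;
    taking the target prediction model with the highest model accuracy as the reasonable model.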
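  16. One or more non-volatile readable storage media storing computer-readable instructions, wherein the computer-readable instructions, when executed by one or more processors, cause the one or more processors to perform the following steps:
    acquiring historical reimbursement form information and using the historical reimbursement form information as sample data;
    dividing the sample data into training samples and test samples according to a preset ratio;
    determining the reimbursement form risk level of each training sample according to preset definitions of N reimbursement form risk levels, where N is a positive integer;
    performing model training on the training samples in each reimbursement form risk level using an association rule algorithm to obtain an initial prediction model, where the initial prediction model comprises the association rules in each reimbursement form risk level that satisfy preset model parameter requirements, and the model parameters comprise support and confidence;
    performing model prediction on the test samples using the initial prediction model and, under each combination obtained by selecting one set of the model parameters from each reimbursement form risk level and combining them, calculating the prediction success rate of each reimbursement form risk level, as well as the total prediction success rate and the test time under each combination;
    performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model.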
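  17. The non-volatile readable storage medium according to claim 16, wherein performing model training on the training samples in each reimbursement form risk level using an association rule algorithm to obtain an initial prediction model comprises:
    preprocessing the training samples in each reimbursement form risk level to obtain a to-be-processed data set for each reimbursement form risk level;
    performing data mining on the to-be-processed data sets using the association rule algorithm to obtain a plurality of itemsets in each reimbursement form risk level;
    for each reimbursement form risk level, selecting from the itemsets in that risk level the target itemsets that satisfy the model parameter requirements, and establishing association rules according to the target itemsets;
    constructing the initial prediction model according to the association rules and the model parameter requirements corresponding to the association rules.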
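  18. The non-volatile readable storage medium according to claim 16 or 17, wherein performing model prediction on the test samples using the initial prediction model and, under each combination obtained by selecting one set of the model parameters from each reimbursement form risk level and combining them, calculating the prediction success rate of each reimbursement form risk level, as well as the total prediction success rate and the test time under that combination, comprises:
    determining, according to the preset definitions of the N reimbursement form risk levels, the reimbursement form risk level of each test sample and the number of test samples in each reimbursement form risk level;
    calculating the probability of each reimbursement form risk level in the test samples according to the following formula: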
    $$P_i = \frac{R_i}{S}$$
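    where i ∈ [1, N], P_i is the probability of the i-th reimbursement form risk level in the test samples, R_i is the number of test samples in the i-th reimbursement form risk level, and S is the total number of test samples;
    selecting one set of the model parameters from each reimbursement form risk level and combining them to obtain L combinations, where L is a positive integer;
    for each combination, predicting the reimbursement form risk level of the test samples using the initial prediction model in descending order of the probabilities, obtaining a prediction result for each test sample, and obtaining the test time taken for the risk level prediction under that combination;
    comparing the prediction result of each test sample with the reimbursement form risk level of that test sample, confirming that the test sample is successfully predicted if the two are the same, and counting, under each combination, the number of successfully predicted test samples in each reimbursement form risk level;
    calculating the prediction success rate of each reimbursement form risk level under each combination according to the following formula: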
    $$\mathrm{hitrate}_i = \frac{M_i}{R_i}$$
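    where hitrate_i is the prediction success rate of the i-th reimbursement form risk level, and M_i is the number of successfully predicted test samples in the i-th reimbursement form risk level;
    calculating the total prediction success rate under each combination according to the following formula: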
    $$\mathrm{hitRate} = \sum_{i=1}^{N} P_i \cdot \mathrm{hitrate}_i$$
    where hitRate is the total prediction success rate.
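Claim 18 says prediction proceeds in descending order of the risk-level probabilities without spelling out the matching procedure. One plausible reading, sketched in Python below, is to test a sample against each level's rules starting with the most probable level and to return the first level whose rule antecedent is contained in the sample. The function name `predict_level`, the rule representation, and the example data are assumptions.

```python
def predict_level(items, rules_by_level, level_prob):
    """Return the first level, in descending probability, with a matching rule."""
    for level in sorted(level_prob, key=level_prob.get, reverse=True):
        antecedents = rules_by_level.get(level, [])
        if any(antecedent <= items for antecedent in antecedents):
            return level
    return None  # no rule matched; how the patent handles this case is not stated

# Hypothetical probabilities (P_i) and rule antecedents per risk level.
level_prob = {"low": 0.6, "mid": 0.3, "high": 0.1}
rules_by_level = {
    "low": [frozenset({"amount:low"})],
    "high": [frozenset({"amount:high", "no_invoice"})],
}
first = predict_level({"amount:high", "no_invoice", "taxi"}, rules_by_level, level_prob)
second = predict_level({"amount:low", "no_invoice"}, rules_by_level, level_prob)
third = predict_level({"taxi"}, rules_by_level, level_prob)
```

Checking the most probable level first means the majority of samples are resolved after testing a single rule set, which may be why the claim ties the order to the probabilities.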
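  19. The non-volatile readable storage medium according to claim 18, wherein performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model comprises:
    taking the model parameters in each reimbursement form risk level, together with the prediction success rates and the test time, as design variables, taking the total prediction success rate as the target variable, and performing function fitting using the design variables and the target variable to obtain a fitted function;
    solving the fitted function, taking, according to the solution, the set of design variables with the highest total prediction success rate and the highest model parameter values as model configuration parameters, and constructing the target prediction model according to the model configuration parameters, where the model accuracy of the target prediction model is the highest total prediction success rate.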
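  20. The non-volatile readable storage medium according to claim 19, wherein, after performing regression analysis on the model parameters, the prediction success rates, the test time and the total prediction success rate to obtain a target prediction model, the computer-readable instructions, when executed by the one or more processors, further cause the one or more processors to perform the following steps:
    splitting the sample data into K sub-sample data sets;
    selecting one of the K sub-sample data sets as the test samples and the remaining K-1 sub-sample data sets as the training samples, and performing the model training, the model prediction and the regression analysis, to obtain K target prediction models and the model accuracy of each target prediction model, where K is a positive integer;
    taking the target prediction model with the highest model accuracy as the reasonable model.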
PCT/CN2018/081527 2018-02-27 2018-04-02 Reimbursement form risk prediction method and apparatus, terminal device, and storage medium WO2019165673A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810161565.6 2018-02-27
CN201810161565.6A CN108364106A (zh) 2018-02-27 2018-02-27 Reimbursement form risk prediction method and apparatus, terminal device, and storage medium

Publications (1)

Publication Number Publication Date
WO2019165673A1 true WO2019165673A1 (zh) 2019-09-06

Family

ID=63003052

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/081527 WO2019165673A1 (zh) 2018-02-27 2018-04-02 Reimbursement form risk prediction method and apparatus, terminal device, and storage medium

Country Status (2)

Country Link
CN (1) CN108364106A (zh)
WO (1) WO2019165673A1 (zh)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
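CN111191871A (zh) * 2019-11-21 2020-05-22 深圳壹账通智能科技有限公司 Project baseline data generation method and apparatus, computer device, and storage medium
CN111652746A (zh) * 2020-05-29 2020-09-11 泰康保险集团股份有限公司 Information generation method and apparatus, electronic device, and storage medium
CN112308170A (zh) * 2020-11-10 2021-02-02 维沃移动通信有限公司 Modeling method and apparatus, and electronic device
CN113313279A (zh) * 2020-02-27 2021-08-27 北京沃东天骏信息技术有限公司 Document review method and apparatus
CN113723800A (zh) * 2021-08-27 2021-11-30 上海幻电信息科技有限公司 Risk identification model training method and apparatus, and risk identification method and apparatus
CN114629797A (zh) * 2022-03-11 2022-06-14 阿里巴巴(中国)有限公司 Bandwidth prediction method, model generation method, and device
CN115481929A (zh) * 2022-10-17 2022-12-16 四川大学华西医院 Method and apparatus for evaluating the validity of improvement measures, terminal device, and storage medium
CN117094184A (zh) * 2023-10-19 2023-11-21 上海数字治理研究院有限公司 Modeling method, system, and medium for a risk prediction model based on an intranet platform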

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
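CN109544385B (zh) * 2018-11-07 2023-06-02 平安医疗健康管理股份有限公司 Method and system for detecting the authenticity of diagnosis and treatment based on reimbursement data
CN109493245A (zh) * 2018-11-07 2019-03-19 平安医疗健康管理股份有限公司 Risk management and control of medical insurance reimbursement data and related apparatus
CN109522304B (zh) * 2018-11-23 2021-05-18 中国联合网络通信集团有限公司 Abnormal object identification method and apparatus, and storage medium
CN109903165B (zh) * 2018-12-14 2020-10-16 阿里巴巴集团控股有限公司 Model merging method and apparatus
CN109816158A (zh) * 2019-01-04 2019-05-28 平安科技(深圳)有限公司 Prediction model combination method, apparatus, device, and readable storage medium
CN109784343B (zh) * 2019-01-25 2023-05-12 上海深杳智能科技有限公司 Resource allocation method and terminal based on a deep learning model
CN110046229B (zh) * 2019-04-18 2021-07-23 北京百度网讯科技有限公司 Method and apparatus for acquiring information
CN112084106B (zh) * 2019-06-14 2023-08-01 中国移动通信集团浙江有限公司 Test data selection method and apparatus, computing device, and computer storage medium
CN111160662A (zh) * 2019-12-31 2020-05-15 咪咕文化科技有限公司 Risk prediction method, electronic device, and storage medium
CN113254919B (zh) * 2021-07-14 2021-10-12 杭州云信智策科技有限公司 Abnormal device identification method, electronic device, and computer-readable storage medium
CN113656558B (zh) * 2021-08-25 2023-07-21 平安科技(深圳)有限公司 Method and apparatus for evaluating association rules based on machine learning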

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130317889A1 (en) * 2012-05-11 2013-11-28 Infosys Limited Methods for assessing transition value and devices thereof
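CN105022829A (zh) * 2015-07-30 2015-11-04 四川长虹电器股份有限公司 Data processing system and method
CN105718490A (zh) * 2014-12-04 2016-06-29 阿里巴巴集团控股有限公司 Method and apparatus for updating a classification model
CN106228441A (zh) * 2016-08-03 2016-12-14 北京天职信息技术有限公司西安分公司 Network-based method for uploading and reviewing financial invoices for reimbursement
CN106934586A (zh) * 2015-12-31 2017-07-07 远光软件股份有限公司 Method and apparatus for assisted approval of reimbursement documents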

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104376221B (zh) * 2014-11-21 2018-06-15 环境保护部南京环境科学研究所 Method for predicting the skin permeability coefficient of organic chemicals
US11530448B2 (en) * 2015-11-13 2022-12-20 Biotheranostics, Inc. Integration of tumor characteristics with breast cancer index
CN105740984A (zh) * 2016-02-01 2016-07-06 北京理工大学 Product concept performance evaluation method based on performance prediction
CN107104978B (zh) * 2017-05-24 2019-12-24 赖洪昌 Network risk early-warning method based on deep learning

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
HAN, YING ET AL.: "Study on Disease Risk of Rural Residents in a County Based on Association Rules", CHINA'S RURAL HEALTH MANAGEMENT, vol. 32, no. 9, 30 September 2012 (2012-09-30), pages 895-898 *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
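CN111191871A (zh) * 2019-11-21 2020-05-22 深圳壹账通智能科技有限公司 Project baseline data generation method and apparatus, computer device, and storage medium
CN113313279A (zh) * 2020-02-27 2021-08-27 北京沃东天骏信息技术有限公司 Document review method and apparatus
CN111652746A (zh) * 2020-05-29 2020-09-11 泰康保险集团股份有限公司 Information generation method and apparatus, electronic device, and storage medium
CN111652746B (zh) * 2020-05-29 2023-08-29 泰康保险集团股份有限公司 Information generation method and apparatus, electronic device, and storage medium
CN112308170A (zh) * 2020-11-10 2021-02-02 维沃移动通信有限公司 Modeling method and apparatus, and electronic device
CN113723800A (zh) * 2021-08-27 2021-11-30 上海幻电信息科技有限公司 Risk identification model training method and apparatus, and risk identification method and apparatus
CN114629797A (zh) * 2022-03-11 2022-06-14 阿里巴巴(中国)有限公司 Bandwidth prediction method, model generation method, and device
CN114629797B (zh) * 2022-03-11 2024-03-08 阿里巴巴(中国)有限公司 Bandwidth prediction method, model generation method, and device
CN115481929A (zh) * 2022-10-17 2022-12-16 四川大学华西医院 Method and apparatus for evaluating the validity of improvement measures, terminal device, and storage medium
CN115481929B (zh) * 2022-10-17 2023-11-24 四川大学华西医院 Method and apparatus for evaluating the validity of improvement measures, terminal device, and storage medium
CN117094184A (zh) * 2023-10-19 2023-11-21 上海数字治理研究院有限公司 Modeling method, system, and medium for a risk prediction model based on an intranet platform
CN117094184B (zh) * 2023-10-19 2024-01-26 上海数字治理研究院有限公司 Modeling method, system, and medium for a risk prediction model based on an intranet platform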

Also Published As

Publication number Publication date
CN108364106A (zh) 2018-08-03

Similar Documents

Publication Publication Date Title
WO2019165673A1 (zh) Reimbursement form risk prediction method and apparatus, terminal device, and storage medium
US7885915B2 (en) Analytical system for discovery and generation of rules to predict and detect anomalies in data and financial fraud
US11650968B2 (en) Systems and methods for predictive early stopping in neural network training
US20200286095A1 (en) Method, apparatus and computer programs for generating a machine-learning system and for classifying a transaction as either fraudulent or genuine
US20230050193A1 (en) Probabilistic feature engineering technique for anomaly detection
CN110335168B (zh) Method and system for optimizing a fault prediction model for electricity consumption information collection terminals based on GRU
US20220253856A1 (en) System and method for machine learning based detection of fraud
CN110930038A (zh) Loan demand identification method, apparatus, terminal, and storage medium
CN110263827A (zh) Abnormal transaction detection method and apparatus based on transaction pattern recognition
CN108629375B (zh) Power customer classification method, system, terminal, and computer-readable storage medium
Basiri et al. A hybrid approach to predict churn
US9852390B2 (en) Methods and systems for intelligent evolutionary optimization of workflows using big data infrastructure
Thakkar et al. Clairvoyant: AdaBoost with cost-enabled cost-sensitive classifier for customer churn prediction
WO2020024444A1 (zh) Group performance level identification method and apparatus, storage medium, and computer device
CN115358481A (zh) Method, system, and apparatus for early-warning identification of enterprise relocation
US20190139144A1 (en) System, method and computer-accessible medium for efficient simulation of financial stress testing scenarios with suppes-bayes causal networks
Yahaya et al. An enhanced bank customers churn prediction model using a hybrid genetic algorithm and k-means filter and artificial neural network
KR102406375B1 (ko) Electronic device including a method for evaluating source technology
CN112836750A (zh) System resource allocation method, apparatus, and device
CN117196630A (zh) Transaction risk prediction method and apparatus, terminal device, and storage medium
Lezcano et al. A multi-objective approach for designing optimized operation sequence on binary image processing
CN110955811B (zh) Power data classification method and system based on the naive Bayes algorithm
US11113652B2 (en) System and method for a recommendation mechanism regarding store remodels
CN111654853A (zh) Data analysis method based on user information
Guan et al. Constructing interdependent risks network of project portfolio based on bayesian network

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18907694

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 30/11/2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18907694

Country of ref document: EP

Kind code of ref document: A1