CN113159175B - Data prediction method, device, equipment and storage medium - Google Patents

Data prediction method, device, equipment and storage medium

Info

Publication number
CN113159175B
CN113159175B (application CN202110432977.0A)
Authority
CN
China
Prior art keywords
data
node
verification
model
decision tree
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110432977.0A
Other languages
Chinese (zh)
Other versions
CN113159175A (en)
Inventor
叶向荣
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202110432977.0A priority Critical patent/CN113159175B/en
Publication of CN113159175A publication Critical patent/CN113159175A/en
Application granted granted Critical
Publication of CN113159175B publication Critical patent/CN113159175B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/243 Classification techniques relating to the number of classes
    • G06F 18/24323 Tree-organised classifiers
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Financial Or Insurance-Related Operations Such As Payment And Settlement (AREA)

Abstract

The invention relates to intelligent decision-making technology and discloses a data prediction method, device, equipment and storage medium. The method comprises the following steps: collecting sample data of a verification decision tree, wherein the sample data comprises request data, verification result data and processing result data; extracting a plurality of characteristic data from the request data; calculating an important coefficient of each characteristic data in each verification decision tree; selecting target characteristic data from the plurality of characteristic data according to the important coefficients; inputting the target characteristic data, the verification result data and the processing result data into the verification decision tree for training; fitting the plurality of trained verification decision trees to obtain a weak model sequence; combining the weak model sequence to obtain an aggregate model; obtaining request data to be verified; and predicting the request data to be verified with the aggregate model to obtain verification result data and processing result data corresponding to the request data to be verified. The invention improves the accuracy with which a model predicts verification data.

Description

Data prediction method, device, equipment and storage medium
Technical Field
The present invention relates to the field of artificial intelligence technologies, and in particular, to a data prediction method, apparatus, device, and storage medium.
Background
When a user handles a certain service, the user generally needs to submit request data, and the service organization then verifies the request data. Data verification refers to the process of checking the authenticity and/or legal compliance of the request data; after verification, the service organization provides the relevant services to the user according to the verification result. This scenario is widely applied across industries, such as insurance claims in the insurance industry and loan approval in the financial industry.
At present, data verification is performed either manually or with artificial intelligence. Manual verification easily misses information and is time-consuming, labor-intensive, highly subjective and of low accuracy. Artificial intelligence can solve the problems of manual auditing, but if the extracted verification elements are incomplete, or a large number of element features are removed for dimension reduction, the accuracy of the model is low.
Disclosure of Invention
The invention aims to provide a data prediction method, a data prediction device, data prediction equipment and a storage medium that improve the accuracy with which a model predicts verification data.
The invention provides a data prediction method, which comprises the following steps:
collecting sample data of a verification decision tree, wherein the sample data comprises request data, verification result data obtained by verifying the request data and processing result data obtained by performing corresponding post-processing on the verification result data;
extracting a plurality of characteristic data in the request data, and calculating an important coefficient of each characteristic data in each verification decision tree according to a preset coefficient calculation method;
selecting target characteristic data from the plurality of characteristic data according to the important coefficient;
inputting the target characteristic data, the verification result data and the processing result data into a verification decision tree for training, and fitting a plurality of trained verification decision trees to obtain a weak model sequence;
combining the weak model sequences according to a preset combination mode to obtain a set model;
and obtaining request data to be verified, and predicting the request data to be verified by utilizing the set model to obtain verification result data and processing result data corresponding to the request data to be verified.
The invention also provides a data prediction device, which comprises:
the acquisition module is used for acquiring sample data of the verification decision tree, wherein the sample data comprises request data, verification result data obtained by verifying the request data and processing result data obtained by performing corresponding post-processing on the verification result data;
the computing module is used for extracting a plurality of characteristic data in the request data and computing important coefficients of each characteristic data in each verification decision tree according to a preset coefficient computing method;
the selecting module is used for selecting target characteristic data from the plurality of characteristic data according to the important coefficient;
the training module is used for inputting the target characteristic data, the verification result data and the processing result data into a verification decision tree for training, and fitting a plurality of trained verification decision trees to obtain a weak model sequence;
the combination module is used for combining the weak model sequences according to a preset combination mode to obtain a set model;
the prediction module is used for obtaining the request data to be verified, and predicting the request data to be verified by utilizing the set model to obtain verification result data and processing result data corresponding to the request data to be verified.
The invention also provides a computer device comprising a memory and a processor connected to the memory, wherein the memory stores a computer program that can run on the processor, and the processor implements the above data prediction steps when executing the computer program.
The present invention also provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the data prediction method described above.
The beneficial effects of the invention are as follows: during training of the decision trees, the method calculates the important coefficient of each characteristic data in each verification decision tree, first selects a predetermined number of characteristic data, and then selects high-quality characteristic data from them as training data, according to the important coefficients, to train the verification decision trees. Because the characteristic data is not pruned, the resulting aggregate model predicts verification data and processing result data with high accuracy and good generalization.
Drawings
FIG. 1 is a flowchart of a data prediction method according to a first embodiment of the present invention;
FIG. 2 is a detailed flowchart illustrating the step of calculating the importance coefficients of each feature data in each verification decision tree according to the predetermined coefficient calculation method in FIG. 1;
FIG. 3 is a schematic diagram of an embodiment of a data prediction apparatus according to the present invention;
fig. 4 is a schematic diagram of a hardware architecture of an embodiment of a computer device according to the present invention.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the descriptions of "first", "second", etc. in this disclosure are for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the embodiments may be combined with each other, provided that the combination can be realized by those skilled in the art; when the combined technical solutions contradict each other or cannot be realized, the combination should be considered not to exist and not to fall within the scope of protection claimed by the present invention.
Referring to fig. 1, a flow chart of an embodiment of a data prediction method according to the present invention is shown. The data prediction method comprises the following steps:
step S1, collecting sample data of a verification decision tree, wherein the sample data comprises request data, verification result data obtained by verifying the request data and processing result data obtained by performing corresponding post-processing on the verification result data;
wherein collecting sample data of the verification decision tree comprises: randomly sampling, with replacement, from a predetermined data set, a quantity of data equal to the size of the data set, to serve as sample data of the verification decision tree.
When the sample data of each verification decision tree is collected, if the size of the data set is N, then for each verification decision tree N training samples are randomly drawn with replacement from the data set to form the training set of that verification decision tree. Each training set is used to train one verification decision tree model, so k training sets yield k verification decision tree models.
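As an illustration of this bootstrap sampling step (not the patented implementation itself), the following Python sketch draws N rows with replacement for each of k verification decision trees; the function name bootstrap_training_sets and the use of a NumPy array as the data set are assumptions made for the example.

```python
import numpy as np

def bootstrap_training_sets(dataset, k, seed=0):
    """Draw, for each of k verification decision trees, N samples with replacement."""
    rng = np.random.default_rng(seed)
    n = len(dataset)
    training_sets = []
    for _ in range(k):
        idx = rng.integers(0, n, size=n)      # N row indices, drawn with replacement
        training_sets.append(dataset[idx])
    return training_sets
```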
In a preferred embodiment of the present invention, the request data is, for example, policy information; the verification result data is the underwriting result obtained by verifying the policy information according to a predetermined verification decision; and the processing result data is the data of the corresponding claim settlement performed according to that result. When underwriting is performed, the result is constrained by the various indexes of the policy information filled in at application time and falls into one of three outcomes: pass, fail (an issuance exists), or refuse. A response model for the claim result is generated under these three underwriting results, with the underwriting decision acting as one of the influencing factors. Sample data collection specifically comprises:
1. Policy information for the same product is acquired from the entry database in which the online system records insurance applications, with application dates within a certain range; considering that a policy may have been amended, the latest policy data is used for every policy meeting the conditions. The stored quantitative indexes are collected as characteristic data, including the ages and birthplaces of the applicant and the insured, the type of the insured subject matter, the insurance type, liability, sum insured, premium, rate, insurance term and similar information.
2. The underwriting data of the policy information is then retrieved, including whether the policy passed, failed or was refused. The policy information acquired in step 1 is labeled to form the correspondence between application and underwriting, so that the set of quantitative application indexes of each policy corresponds to one underwriting state. Considering changes to amended data, statistical deviation needs to be reduced, so the issued state of the policy is excluded and collection focuses on the three verification states of the latest policy: pass, fail and refuse.
3. For the policies whose underwriting data has been retrieved, the claim data, such as loss type, payout amount and number of claim occurrences, is then retrieved. The sample data from step 2 is labeled so as to generate data representing the correspondence of the three stages of application, underwriting and claim settlement. For each policy one can thus obtain, in order, the quantitative indexes from the start of the application, the underwriting data given by the underwriter, and the final claim settlement result, i.e. the influence of underwriting on the policy under the determined underwriting conditions.
All the acquired sample data is stored in a newly established correspondence data table.
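A minimal sketch of how such a correspondence table might be assembled is given below; the tables, column names (policy_id, uw_result, payout and so on) and sample values are illustrative assumptions rather than the schema used by the patent, and pandas merely stands in for whatever database tooling the online system actually uses.

```python
import pandas as pd

# Hypothetical, minimal stand-ins for the policy, underwriting and claim tables.
policies = pd.DataFrame({"policy_id": [1, 2, 3], "age": [34, 51, 28],
                         "premium": [1200.0, 800.0, 450.0], "term_years": [10, 5, 1]})
underwriting = pd.DataFrame({"policy_id": [1, 2, 3],
                             "uw_result": ["pass", "fail", "refuse"]})
claims = pd.DataFrame({"policy_id": [1], "loss_type": ["fire"],
                       "payout": [5000.0], "claim_count": [1]})

# Label each policy's quantitative indexes with its underwriting state and claim outcome,
# producing the application -> underwriting -> claim-settlement correspondence table.
samples = (policies
           .merge(underwriting, on="policy_id", how="inner")
           .merge(claims, on="policy_id", how="left"))
print(samples)
```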
S2, extracting a plurality of characteristic data in the request data, and calculating an important coefficient of each characteristic data in each verification decision tree according to a preset coefficient calculation method;
In order to ensure that important features are not lost during feature screening for the verification decision tree, the importance of the feature data needs to be analyzed so that highly important feature data can be selected and some redundant features removed.
In one embodiment, as shown in fig. 2, the step of calculating the important coefficient of each feature data in each verification decision tree according to a predetermined coefficient calculation method specifically includes:
Step S21, calculating, with the Gini coefficient calculation formula, the variation mean of each feature data at node n of the verification decision tree, the variation mean of the node before node n branches, and the variation mean of the node after node n branches:
G_n = Σ_{k=1}^{K} μ_nk (1 − μ_nk)
where the Gini coefficient is denoted G and the sequence of feature data is X_1, X_2, …, X_i. The Gini coefficient formula gives the variation mean G_n of the i-th feature data split at node n of the verification decision tree model, K is the number of classes, and μ_nk is the proportion of class-k sample data in node n. The variation mean of the node before node n branches and the variation mean of the node after node n branches are calculated with the same formula.
Step S22, inputting the variation mean of node n, the variation mean of the node before node n branches and the variation mean of the node after node n branches into a predetermined first formula to obtain the important coefficient of the feature data at node n, where the first formula is:
W_in = G_n − G_p − G_q
where W_in is the important coefficient of feature data X_i at node n of the verification decision tree, i is the sequence number of feature data X_i in the feature sequence, G_n is the variation mean of node n, G_p is the variation mean of node p before node n branches, and G_q is the variation mean of node q after node n branches.
Step S23, inputting the important coefficient of the node n into a preset second formula for calculation to obtain the important coefficient of the characteristic data in the verification decision tree, wherein the important coefficient is used as a basis for selecting the characteristic data subsequently, and the second formula is as follows:
R = W_in / Σ_{j=1}^{c} W_j
where R is the important coefficient of feature data X_i in the verification decision tree, W_in is the important coefficient of feature data X_i at node n of the verification decision tree, W_j is the important coefficient of feature data X_i at node n of the j-th verification decision tree, c is the number of verification decision trees, and j is the sequence number of a verification decision tree.
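The three quantities above can be illustrated with the following sketch. The Gini-style variation mean and the reading of the second formula (normalising the node-n importance by its sum over all c trees) are assumptions made here for illustration, and the function names are hypothetical.

```python
import numpy as np

def gini(labels):
    """Variation mean of a node, taken here as the Gini value
    G = sum_k mu_k * (1 - mu_k), where mu_k is the class-k proportion in the node."""
    _, counts = np.unique(labels, return_counts=True)
    mu = counts / counts.sum()
    return float(np.sum(mu * (1.0 - mu)))

def node_importance(g_n, g_p, g_q):
    """First formula: W_in = G_n - G_p - G_q."""
    return g_n - g_p - g_q

def tree_importance(w_in, w_all_trees):
    """Second formula as read here (an assumption): the node-n importance of the feature
    normalised by the same feature's node-n importance summed over all c trees."""
    return w_in / float(np.sum(w_all_trees))
```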
S3, selecting target characteristic data from a plurality of characteristic data according to the important coefficient;
If the dimension of the feature data of each sample is M, a predetermined constant m < M is specified and, at each split of the verification decision tree, m features are randomly selected from the M features; the optimal feature data is then chosen from those m features, i.e., features are selected as target feature data in descending order of the important coefficient R. In this process each tree grows to the greatest possible extent and no feature data is completely excluded, i.e., no pruning is performed.
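A sketch of this per-split feature selection, under the assumption that an important coefficient R is already available for every one of the M features; the function name select_split_features is hypothetical.

```python
import numpy as np

def select_split_features(importance_r, m, rng=None):
    """Pick m of the M features at random (m < M), then order the candidates by
    their important coefficient R, largest first, as the split's target features."""
    rng = rng or np.random.default_rng()
    M = len(importance_r)
    subset = rng.choice(M, size=m, replace=False)          # random feature subset
    order = np.argsort(np.asarray(importance_r)[subset])[::-1]
    return subset[order]                                   # sorted by R, descending
```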
S4, inputting the target characteristic data, the verification result data and the processing result data into a verification decision tree for training, and fitting a plurality of trained verification decision trees to obtain a weak model sequence;
The target feature data, the verification result data and the processing result data are input into the corresponding verification decision trees for training. In one embodiment, a vector sequence S_i (i = 1, 2, …, k) is established from the target feature data, the verification result data and the processing result data, where i is the vector number and k is the number of vectors in the sequence. Each S_i (i = 1, 2, …, k) is used to train an individual verification decision tree model u(X, S_i), i = 1, 2, …, k, where X in the model is the verification decision variable and serves as the independent variable; after k fittings the weak model sequence {u_1(X), u_2(X), …, u_k(X)} is obtained. The weak models are a characteristic of the random forest model: because the random forest uses multiple decision trees to predict individually, the combination of these trees' predictions jointly determines the final result.
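For illustration, the sketch below fits one scikit-learn DecisionTreeClassifier per training vector sequence; this is an assumed stand-in for the patent's verification decision tree, and packing each S_i as a (features, labels) pair is likewise an assumption.

```python
from sklearn.tree import DecisionTreeClassifier

def fit_weak_models(vector_sequences):
    """Fit one verification decision tree u_i(X) per training vector sequence S_i,
    giving the weak model sequence {u_1(X), ..., u_k(X)}."""
    weak_models = []
    for X_i, y_i in vector_sequences:        # S_i assumed to pair features with labels
        tree = DecisionTreeClassifier()      # grown fully: no pruning of feature splits
        weak_models.append(tree.fit(X_i, y_i))
    return weak_models
```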
S5, combining the weak model sequences according to a preset combination mode to obtain a set model;
The weak model sequence {u_1(X), u_2(X), …, u_k(X)} is combined in a predetermined manner, which specifically comprises: inputting the weak model sequence into a maximum-value function for operation and establishing the aggregate model. The aggregate model is a classification model, and the maximum-value function is used for the combination so that the final classification result is selected from the multiple classes:
U(X) = arg max_Z Σ_{i=1}^{k} L(u_i(X) = Z)
where U(X) is the aggregate model, X is the predetermined verification decision, u_i(X) is the i-th weak model in the weak model sequence, i is the sequence number of the weak model, k is the length of the weak model sequence, Z is the response variable, L is the collective indication (indicator) function, and arg max is the maximum-value function.
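A sketch of this maximum-value (majority vote) combination, assuming the weak models expose a scikit-learn-style predict method; the function name aggregate_predict is hypothetical.

```python
import numpy as np

def aggregate_predict(weak_models, X):
    """U(X) = arg max_Z sum_i L(u_i(X) = Z): each weak model votes and the class Z
    with the most votes becomes the aggregate model's prediction for each sample."""
    votes = np.stack([model.predict(X) for model in weak_models])   # shape (k, n_samples)
    predictions = []
    for column in votes.T:
        values, counts = np.unique(column, return_counts=True)
        predictions.append(values[np.argmax(counts)])
    return np.array(predictions)
```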
And S6, obtaining request data to be verified, and predicting the request data to be verified by utilizing the set model to obtain verification result data and processing result data corresponding to the request data to be verified.
In this embodiment the aggregate model is used to predict underwriting and claim settlement for policies, making full use of the large amount of data stored across the whole flow of the online product. A prediction aggregate model is trained and used to quickly give underwriting indicators and obtain decision suggestions, saving the time spent auditing data while avoiding the problems of manual underwriting, such as partial omission of information and the intrusion of subjective factors. The algorithm of this embodiment ensures the comprehensiveness of feature data collection as far as possible; the aggregate model has high accuracy and good generalization, does not require GPU training, and meets the high timeliness requirement of intelligent verification.
As can be seen from the above description, during training of the decision trees this embodiment calculates the important coefficient of each feature data in each verification decision tree, first selects a predetermined number of features and then, in descending order of the important coefficients, selects high-quality feature data from them as training data for the verification decision trees. Because the feature data is not pruned, the resulting aggregate model predicts the verification data and the processing result data with high accuracy and good generalization.
In an embodiment, on the basis of the above embodiment, before the step S6, the method further includes the following steps:
adaptively adjusting the number of the verification decision trees and the maximum tree depth in the set model by adopting a verification curve, and adjusting the number of sample data of each verification decision tree in the set model by adopting a learning curve;
and testing the adjusted aggregate model, and if the accuracy rate of the adjusted aggregate model obtained by testing is greater than or equal to the preset accuracy rate, using the adjusted aggregate model for prediction.
After the aggregate model is obtained, it may be under-fitted or over-fitted. The classification effect of the aggregate model can then be evaluated with a verification curve, which essentially shows the influence of a hyperparameter on the training score and the verification score, so that the optimal parameter is obtained. Specifically, this includes adaptively adjusting the number of verification decision trees and the maximum tree depth in the aggregate model with the verification curve, and adjusting the training set size of each verification decision tree with a learning curve to obtain the optimal training set size and improve the generalization of the aggregate model.
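As an illustration, scikit-learn's validation_curve and learning_curve can play the roles of the verification curve and learning curve described here; the RandomForestClassifier and the synthetic data below are assumed stand-ins for the aggregate model and the policy samples, not the patented implementation.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import validation_curve, learning_curve

# Synthetic stand-in for the policy feature matrix and underwriting labels.
X_train, y_train = make_classification(n_samples=600, n_features=12, n_classes=3,
                                       n_informative=6, random_state=0)

# Verification (validation) curve over the number of trees; max tree depth can be swept likewise.
train_scores, valid_scores = validation_curve(
    RandomForestClassifier(random_state=0), X_train, y_train,
    param_name="n_estimators", param_range=[50, 100, 200], cv=3)

# Learning curve over the amount of sample data used for training.
sizes, lc_train, lc_valid = learning_curve(
    RandomForestClassifier(n_estimators=100, random_state=0), X_train, y_train,
    train_sizes=[0.4, 0.6, 0.8, 1.0], cv=3)
```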
Further, only part of the data is used when training the verification decision trees and the remaining data is not used. Policy data from the remaining data can be supplied as parameters and, under the corresponding verification decision, predicted with the aggregate model to obtain the corresponding verification result and claim settlement result. If the predicted verification and claim settlement results are close to the actual ones and the accuracy of the overall claim settlement results reaches a preset threshold (for example 85%), the aggregate model is valid and can subsequently be applied to the policies to be verified.
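A minimal sketch of this acceptance check; the held-out labels and predictions are fabricated placeholders, and 0.85 simply mirrors the example threshold mentioned above.

```python
import numpy as np
from sklearn.metrics import accuracy_score

# Hypothetical held-out underwriting results and the aggregate model's predictions for them.
y_holdout = np.array(["pass", "fail", "pass", "refuse", "pass"])
y_pred    = np.array(["pass", "fail", "pass", "pass",   "pass"])

if accuracy_score(y_holdout, y_pred) >= 0.85:
    print("Aggregate model accepted for predicting policies to be verified")
else:
    print("Aggregate model rejected; retune and retrain")
```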
In one embodiment, the present invention provides a data prediction device, where the data prediction device corresponds to the method in the above embodiment one by one. As shown in fig. 3, the data prediction apparatus includes:
the acquisition module 101 is configured to acquire sample data of a verification decision tree, where the sample data includes request data, verification result data obtained by verifying the request data, and processing result data obtained by performing corresponding post-processing on the verification result data;
a calculating module 102, configured to extract a plurality of feature data in the request data, and calculate an important coefficient of each feature data in each verification decision tree according to a predetermined coefficient calculating method;
a selecting module 103, configured to select target feature data from a plurality of feature data according to the importance coefficient;
the training module 104 is configured to input the target feature data, the verification result data, and the processing result data into a verification decision tree for training, and fit a plurality of trained verification decision trees to obtain a weak model sequence;
a combining module 105, configured to combine the weak model sequences in a predetermined combination manner to obtain a set model;
and the prediction module 106 is configured to obtain the request data to be verified, and predict the request data to be verified by using the set model to obtain verification result data and processing result data corresponding to the request data to be verified.
For specific limitations of the data prediction apparatus, reference may be made to the limitations of the data prediction method above, which are not repeated here. The modules in the above data prediction apparatus may be implemented in whole or in part by software, hardware or a combination thereof. The modules may be embedded in hardware in, or independent of, a processor of the computer device, or stored in software in a memory of the computer device, so that the processor can call and execute the operations corresponding to the modules.
In one embodiment, a computer device is provided, which is a device capable of automatically performing numerical calculation and/or information processing according to instructions set or stored in advance. The computer device may be a PC (Personal Computer), a smart phone, a tablet computer, a single network server, a server group formed by a plurality of network servers, or a cloud based on cloud computing, where cloud computing is a kind of distributed computing in which a super virtual computer is formed by a group of loosely coupled computers.
As shown in fig. 4, the computer device may include, but is not limited to, a memory 11, a processor 12, and a network interface 13, which may be communicatively connected to each other through a system bus, the memory 11 storing a computer program executable on the processor 12. It should be noted that FIG. 4 only shows a computer device having components 11-13, but it should be understood that not all of the illustrated components are required to be implemented and that more or fewer components may be implemented instead.
Wherein the memory 11 may be non-volatile and/or volatile memory. The nonvolatile memory can include Read Only Memory (ROM), programmable ROM (PROM), electrically Programmable ROM (EPROM), electrically Erasable Programmable ROM (EEPROM), or flash memory. Volatile memory can include Random Access Memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in a variety of forms such as Static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double Data Rate SDRAM (DDRSDRAM), enhanced SDRAM (ESDRAM), synchronous Link DRAM (SLDRAM), memory bus direct RAM (RDRAM), direct memory bus dynamic RAM (DRDRAM), and memory bus dynamic RAM (RDRAM), among others. In this embodiment, the readable storage medium of the memory 11 is typically used for storing an operating system and various application software installed on a computer device, for example, for storing program codes of a computer program in an embodiment of the present invention. Further, the memory 11 may be used to temporarily store various types of data that have been output or are to be output.
The processor 12 may in some embodiments be a central processing unit (Central Processing Unit, CPU), controller, microcontroller, microprocessor, or other data processing chip for executing program code stored in the memory 11 or for processing data, such as executing a computer program or the like.
The network interface 13 may comprise a standard wireless network interface, a wired network interface, which network interface 13 is typically used to establish communication connections between the computer device and other electronic devices.
The computer program is stored in the memory 11 and comprises at least one computer readable instruction stored in the memory 11, the at least one computer readable instruction being executable by the processor 12 to implement the method of embodiments of the present application, comprising:
collecting sample data of a verification decision tree, wherein the sample data comprises request data, verification result data obtained by verifying the request data and processing result data obtained by performing corresponding post-processing on the verification result data;
wherein collecting sample data for each of the verification decision trees comprises: randomly sampling, with replacement, from the predetermined data set, a quantity of data equal to the size of the data set, to serve as the sample data of each verification decision tree.
When the sample data of each verification decision tree is collected, if the size of the data set is N, then for each verification decision tree N training samples are randomly drawn with replacement from the data set to form the training set of that verification decision tree. Each training set is used to train one verification decision tree model, so k training sets yield k verification decision tree models.
In a preferred embodiment of the present invention, the request data is, for example, policy information; the verification result data is the underwriting result obtained by verifying the policy information according to a predetermined verification decision; and the processing result data is the data of the corresponding claim settlement performed according to that result. When underwriting is performed, the result is constrained by the various indexes of the policy information filled in at application time and falls into one of three outcomes: pass, fail (an issuance exists), or refuse. A response model for the claim result is generated under these three underwriting results, with the underwriting decision acting as one of the influencing factors. Sample data collection specifically comprises:
1. Policy information for the same product is acquired from the entry database in which the online system records insurance applications, with application dates within a certain range; considering that a policy may have been amended, the latest policy data is used for every policy meeting the conditions. The stored quantitative indexes are collected as characteristic data, including the ages and birthplaces of the applicant and the insured, the type of the insured subject matter, the insurance type, liability, sum insured, premium, rate, insurance term and similar information.
2. The underwriting data of the policy information is then retrieved, including whether the policy passed, failed or was refused. The policy information acquired in step 1 is labeled to form the correspondence between application and underwriting, so that the set of quantitative application indexes of each policy corresponds to one underwriting state. Considering changes to amended data, statistical deviation needs to be reduced, so the issued state of the policy is excluded and collection focuses on the three verification states of the latest policy: pass, fail and refuse.
3. For the policies whose underwriting data has been retrieved, the claim data, such as loss type, payout amount and number of claim occurrences, is then retrieved. The sample data from step 2 is labeled so as to generate data representing the correspondence of the three stages of application, underwriting and claim settlement. For each policy one can thus obtain, in order, the quantitative indexes from the start of the application, the underwriting data given by the underwriter, and the final claim settlement result, i.e. the influence of underwriting on the policy under the determined underwriting conditions.
All the acquired sample data is stored in a newly established correspondence data table.
Extracting a plurality of characteristic data in the request data, and calculating an important coefficient of each characteristic data in each verification decision tree according to a preset coefficient calculation method;
In order to ensure that important features are not lost during feature screening for the verification decision tree, the importance of the feature data needs to be analyzed so that highly important feature data can be selected and some redundant features removed.
In one embodiment, the step of calculating the importance coefficient of each feature data in each verification decision tree according to a predetermined coefficient calculation method specifically includes:
calculating, with the Gini coefficient calculation formula, the variation mean of each characteristic data at node n of the verification decision tree, the variation mean of the node before node n branches, and the variation mean of the node after node n branches:
G_n = Σ_{k=1}^{K} μ_nk (1 − μ_nk)
where the Gini coefficient is denoted G and the sequence of characteristic data is X_1, X_2, …, X_i. The Gini coefficient formula gives the variation mean G_n of the i-th characteristic data split at node n of the verification decision tree model, K is the number of classes, and μ_nk is the proportion of class-k sample data in node n.
Inputting the variation mean of node n, the variation mean of the node before node n branches and the variation mean of the node after node n branches into a predetermined first formula to obtain the important coefficient of the characteristic data at node n, where the first formula is:
W_in = G_n − G_p − G_q
where W_in is the important coefficient of characteristic data X_i at node n of the verification decision tree, i is the sequence number of characteristic data X_i in the feature sequence, G_n is the variation mean of node n, G_p is the variation mean of node p before node n branches, and G_q is the variation mean of node q after node n branches.
Inputting the important coefficient of the node n into a preset second formula to calculate, so as to obtain the important coefficient of the characteristic data in the verification decision tree, wherein the important coefficient is used as a basis for selecting the characteristic data subsequently, and the second formula is as follows:
R = W_in / Σ_{j=1}^{c} W_j
where R is the important coefficient of characteristic data X_i in the verification decision tree, W_in is the important coefficient of characteristic data X_i at node n of the verification decision tree, W_j is the important coefficient of characteristic data X_i at node n of the j-th verification decision tree, c is the number of verification decision trees, and j is the sequence number of a verification decision tree.
Selecting target characteristic data from a plurality of characteristic data according to the important coefficient;
If the dimension of the feature data of each sample is M, a predetermined constant m < M is specified and, at each split of the verification decision tree, m features are randomly selected from the M features; the optimal feature data is then chosen from those m features, i.e., features are selected as target feature data in descending order of the important coefficient R. In this process each tree grows to the greatest possible extent and no feature data is completely excluded, i.e., no pruning is performed.
Inputting the target characteristic data, the verification result data and the processing result data into a verification decision tree for training, and fitting a plurality of trained verification decision trees to obtain a weak model sequence;
Then, the target feature data, the verification result data and the processing result data are input into the corresponding verification decision trees for training. In one embodiment, a vector sequence S_i (i = 1, 2, …, k) is established from the target feature data, the verification result data and the processing result data, where i is the vector number and k is the number of vectors in the sequence. Each S_i (i = 1, 2, …, k) is used to train an individual verification decision tree model u(X, S_i), i = 1, 2, …, k, where X in the model is the verification decision variable and serves as the independent variable; after k fittings the weak model sequence {u_1(X), u_2(X), …, u_k(X)} is obtained. The weak models are a characteristic of the random forest model: because the random forest uses multiple decision trees to predict individually, the combination of these trees' predictions jointly determines the final result.
Combining the weak model sequences according to a preset combination mode to obtain a set model;
The weak model sequence {u_1(X), u_2(X), …, u_k(X)} is combined in a predetermined manner, which specifically comprises: inputting the weak model sequence into a maximum-value function for operation and establishing the aggregate model. The aggregate model is a classification model, and the maximum-value function is used for the combination so that the final classification result is selected from the multiple classes:
U(X) = arg max_Z Σ_{i=1}^{k} L(u_i(X) = Z)
where U(X) is the aggregate model, X is the predetermined verification decision, u_i(X) is the i-th weak model in the weak model sequence, i is the sequence number of the weak model, k is the length of the weak model sequence, Z is the response variable, L is the collective indication (indicator) function, and arg max is the maximum-value function.
And obtaining request data to be verified, and predicting the request data to be verified by utilizing the set model to obtain verification result data and processing result data corresponding to the request data to be verified.
In this embodiment the aggregate model is used to predict underwriting and claim settlement for policies, making full use of the large amount of data stored across the whole flow of the online product. A prediction aggregate model is trained and used to quickly give underwriting indicators and obtain decision suggestions, saving the time spent auditing data while avoiding the problems of manual underwriting, such as partial omission of information and the intrusion of subjective factors. The algorithm of this embodiment ensures the comprehensiveness of feature data collection as far as possible; the aggregate model has high accuracy and good generalization, does not require GPU training, and meets the high timeliness requirement of intelligent verification.
In an embodiment, before the step of predicting the request data to be verified, the method further includes the following steps:
adaptively adjusting the number of the verification decision trees and the maximum tree depth in the set model by adopting a verification curve, and adjusting the number of sample data of each verification decision tree in the set model by adopting a learning curve;
and testing the adjusted aggregate model, and if the accuracy rate of the adjusted aggregate model obtained by testing is greater than or equal to the preset accuracy rate, using the adjusted aggregate model for prediction.
After the aggregate model is obtained, it may be under-fitted or over-fitted. The classification effect of the aggregate model can then be evaluated with a verification curve, which essentially shows the influence of a hyperparameter on the training score and the verification score, so that the optimal parameter is obtained. Specifically, this includes adaptively adjusting the number of verification decision trees and the maximum tree depth in the aggregate model with the verification curve, and adjusting the training set size of each verification decision tree with a learning curve to obtain the optimal training set size and improve the generalization of the aggregate model.
Further, only part of the data is used when training the verification decision trees and the remaining data is not used. Policy data from the remaining data can be supplied as parameters and, under the corresponding verification decision, predicted with the aggregate model to obtain the corresponding verification result and claim settlement result. If the predicted verification and claim settlement results are close to the actual ones and the accuracy of the overall claim settlement results reaches a preset threshold (for example 85%), the aggregate model is valid and can subsequently be applied to the policies to be verified.
In one embodiment, the present invention provides a computer readable storage medium, which may be a nonvolatile and/or volatile memory, having stored thereon a computer program, which when executed by a processor, implements the steps of the data prediction method in the above embodiment, such as steps S1 to S6 shown in fig. 1. Alternatively, the computer program when executed by the processor implements the functions of the respective modules/units of the data prediction apparatus in the above embodiments, such as the functions of the modules 101 to 106 shown in fig. 3. In order to avoid repetition, a description thereof is omitted.
Those skilled in the art will appreciate that all or part of the processes of the above embodiment methods may be implemented by instructing the relevant hardware through a computer program, and the computer program, when executed, may include the flows of the embodiments of the methods described above.
The foregoing embodiment numbers of the present invention are merely for the purpose of description, and do not represent the advantages or disadvantages of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method.
The foregoing description is only of preferred embodiments of the present invention and is not intended to limit the scope of the invention; any equivalent structure or equivalent process transformation made using the contents of this description, whether applied directly or indirectly in other related technical fields, is likewise included within the scope of patent protection of the present invention.

Claims (6)

1. A method of data prediction, comprising:
collecting sample data of a verification decision tree, wherein the sample data comprises request data, verification result data obtained by verifying the request data and processing result data obtained by performing corresponding post-processing on the verification result data;
extracting a plurality of characteristic data in the request data, and calculating an important coefficient of each characteristic data in each verification decision tree according to a preset coefficient calculation method;
selecting target characteristic data from the plurality of characteristic data according to the important coefficient;
inputting the target characteristic data, the verification result data and the processing result data into a verification decision tree for training, and fitting a plurality of trained verification decision trees to obtain a weak model sequence;
combining the weak model sequences according to a preset combination mode to obtain a set model;
acquiring request data to be verified, and predicting the request data to be verified by utilizing the set model to obtain verification result data and processing result data corresponding to the request data to be verified;
the step of calculating the important coefficient of each feature data in each verification decision tree according to a predetermined coefficient calculation method specifically comprises the following steps: calculating the variation average value of each characteristic data at the node n of the verification decision tree, the variation average value of the node before the node n branches and the variation average value of the node after the node n branches by adopting a coefficient calculation formula; inputting a change amount mean value of the node n, a change amount mean value of the node before the node n branches and a change amount mean value of the node after the node n branches into a preset first formula for calculation to obtain an important coefficient of the characteristic data at the node n; inputting the important coefficient of the node n into a preset second formula for calculation to obtain the important coefficient of the characteristic data in the verification decision tree;
the first formula includes: w (W) in =G n -G P -G q ,W in Is characteristic data X i The important coefficient at the node n of the verification decision tree, i is the characteristic data X i Sequence number in the signature sequence, G n G is the variation mean value of the node n P For the change quantity average value of the node p before the node n branches, G q The mean value of the variation of the node q after the node n branches is obtained;
the second formula includes: R = W_in / Σ_{j=1}^{c} W_j, where R is the important coefficient of the characteristic data X_i in the verification decision tree, W_in is the important coefficient of the characteristic data X_i at node n of the verification decision tree, W_j is the important coefficient of the characteristic data X_i at node n of the j-th verification decision tree, c is the number of verification decision trees, and j is the sequence number of a verification decision tree;
the step of combining the weak model sequences according to a predetermined combination mode to obtain a set model specifically comprises the following steps: inputting the weak model sequence into a maximum function for operation to obtain the aggregate model, wherein the aggregate model is
U(X) = arg max_Z Σ_{i=1}^{k} L(u_i(X) = Z)
where U(X) is the aggregate model, X is the predetermined verification decision, u_i(X) is the i-th weak model in the weak model sequence, i is the sequence number of the weak model in the weak model sequence, k is the length of the weak model sequence, Z is a response variable, L is a collective indication function, and arg max is the maximum function.
2. The method for predicting data according to claim 1, wherein the step of obtaining the request data to be verified, predicting the request data to be verified by using the set model, and obtaining verification result data and processing result data corresponding to the request data to be verified further comprises:
adaptively adjusting the number of the verification decision trees and the maximum tree depth in the set model by adopting a verification curve, and adjusting the number of sample data of each verification decision tree in the set model by adopting a learning curve;
and testing the adjusted aggregate model, and if the accuracy rate of the adjusted aggregate model obtained by testing is greater than or equal to the preset accuracy rate, using the adjusted aggregate model for prediction.
3. The method of claim 1, wherein the collecting sample data of a verification decision tree comprises: and randomly and repeatedly extracting data with the same data quantity as the data quantity of the data set from the preset data set each time to serve as sample data of each verification decision tree.
4. A data prediction apparatus, comprising:
the acquisition module is used for acquiring sample data of the verification decision tree, wherein the sample data comprises request data, verification result data obtained by verifying the request data and processing result data obtained by performing corresponding post-processing on the verification result data;
the computing module is used for extracting a plurality of characteristic data in the request data and computing important coefficients of each characteristic data in each verification decision tree according to a preset coefficient computing method;
the selecting module is used for selecting target characteristic data from the plurality of characteristic data according to the important coefficient;
the training module is used for inputting the target characteristic data, the verification result data and the processing result data into a verification decision tree for training, and fitting a plurality of trained verification decision trees to obtain a weak model sequence;
the combination module is used for combining the weak model sequences according to a preset combination mode to obtain a set model;
the prediction module is used for obtaining request data to be verified, and predicting the request data to be verified by utilizing the set model to obtain verification result data and processing result data corresponding to the request data to be verified;
the step of calculating the important coefficient of each feature data in each verification decision tree according to a predetermined coefficient calculation method specifically comprises the following steps: calculating the variation average value of each characteristic data at the node n of the verification decision tree, the variation average value of the node before the node n branches and the variation average value of the node after the node n branches by adopting a coefficient calculation formula; inputting a change amount mean value of the node n, a change amount mean value of the node before the node n branches and a change amount mean value of the node after the node n branches into a preset first formula for calculation to obtain an important coefficient of the characteristic data at the node n; inputting the important coefficient of the node n into a preset second formula for calculation to obtain the important coefficient of the characteristic data in the verification decision tree;
the first formula includes: w (W) in =G n -G P -G q ,W in Is characteristic data X i The important coefficient at the node n of the verification decision tree, i is the characteristic data X i Sequence number in the signature sequence, G n G is the variation mean value of the node n P For the change quantity average value of the node p before the node n branches, G q The mean value of the variation of the node q after the node n branches is obtained;
the second formula includes: R = W_in / Σ_{j=1}^{c} W_j, where R is the important coefficient of the characteristic data X_i in the verification decision tree, W_in is the important coefficient of the characteristic data X_i at node n of the verification decision tree, W_j is the important coefficient of the characteristic data X_i at node n of the j-th verification decision tree, c is the number of verification decision trees, and j is the sequence number of a verification decision tree;
the step of combining the weak model sequences according to a predetermined combination mode to obtain a set model specifically comprises the following steps: inputting the weak model sequence into a maximum function for operation to obtain the aggregate model, wherein the aggregate model is
U(X) = arg max_Z Σ_{i=1}^{k} L(u_i(X) = Z)
where U(X) is the aggregate model, X is the predetermined verification decision, u_i(X) is the i-th weak model in the weak model sequence, i is the sequence number of the weak model in the weak model sequence, k is the length of the weak model sequence, Z is a response variable, L is a collective indication function, and arg max is the maximum function.
5. A computer device comprising a memory and a processor connected to the memory, the memory storing a computer program executable on the processor, wherein the processor, when executing the computer program, implements the steps of the data prediction method according to any one of claims 1 to 3.
6. A computer readable storage medium having stored thereon a computer program, characterized in that the computer program when executed by a processor implements the steps of the data prediction method according to any of claims 1 to 3.
CN202110432977.0A 2021-04-21 2021-04-21 Data prediction method, device, equipment and storage medium Active CN113159175B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110432977.0A CN113159175B (en) 2021-04-21 2021-04-21 Data prediction method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110432977.0A CN113159175B (en) 2021-04-21 2021-04-21 Data prediction method, device, equipment and storage medium

Publications (2)

Publication Number Publication Date
CN113159175A CN113159175A (en) 2021-07-23
CN113159175B true CN113159175B (en) 2023-06-06

Family

ID=76868071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110432977.0A Active CN113159175B (en) 2021-04-21 2021-04-21 Data prediction method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN113159175B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108549954A (en) * 2018-03-26 2018-09-18 平安科技(深圳)有限公司 Risk model training method, risk identification method, device, equipment and medium
RU2017140974A3 (en) * 2017-11-24 2019-05-24
RU2017140969A (en) * 2017-11-24 2019-05-27 Общество С Ограниченной Ответственностью "Яндекс" Method and system for creating a forecast quality parameter for a predictive model performed in a machine learning algorithm
CN110264342A (en) * 2019-06-19 2019-09-20 深圳前海微众银行股份有限公司 A kind of business audit method and device based on machine learning
CN111562965A (en) * 2020-04-27 2020-08-21 深圳木成林科技有限公司 Page data verification method and device based on decision tree
WO2020249125A1 (en) * 2019-06-14 2020-12-17 第四范式(北京)技术有限公司 Method and system for automatically training machine learning model
CN112330471A (en) * 2020-11-17 2021-02-05 中国平安财产保险股份有限公司 Service data processing method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN113159175A (en) 2021-07-23


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant