WO2020041998A1

WO2020041998A1 - Systems and methods for establishing optimized prediction model and obtaining prediction result

Info

Publication number: WO2020041998A1
Application number: PCT/CN2018/102897
Authority: WO
Inventors: 罗惟正; 陈宥宏; 钟舜宇
Original assignee: 财团法人交大思源基金会
Priority date: 2018-08-29
Filing date: 2018-08-29
Publication date: 2020-03-05

Abstract

Systems and methods for establishing an optimized prediction model and obtaining a prediction result based on machine learning. In the procedure for establishing the optimized prediction model, a plurality of pieces of training data input by a user and at least one machine learning algorithm selected by the user are received, and the received training data is uniformly converted to a relay format. Automatic feature value screening and machine learning algorithm parameter optimization are performed, followed by iterative prediction model optimization. A prediction model and corresponding accuracy evaluation data are then output. In the procedure for obtaining the prediction result, the data to be predicted is converted to the relay format, and an automated program performs iterative prediction to generate and output the prediction result and the accuracy evaluation data.

Description

System and method for establishing optimal prediction model and obtaining prediction result

Technical field

The present invention relates to a system and method for establishing a prediction model and obtaining a prediction result, and in particular, to a system and method for establishing an optimized prediction model based on machine learning and obtaining a prediction result.

Background technique

In recent years, with the great progress of artificial intelligence (AI) technology, the application field of artificial intelligence has been continuously extended. Through artificial intelligence, human life will be more advanced and convenient.

Machine learning is part of artificial intelligence. The purpose of machine learning is to give computers the ability to learn. For a computer to be able to identify and judge, it must use existing data in two procedures: training and prediction. The overall process includes the steps of obtaining data, analyzing data, building models, and predicting the future.

Building a computer with artificial intelligence often requires a great deal of expertise. For example, because operating the related software, acquiring data, and integrating algorithms are not easy, the personnel involved must understand the principles of machine learning well and need good programming skills to complete the machine learning training and prediction procedures. In addition, because current model training lacks automation and modular design, the selection of feature values, the determination of algorithm parameters, the integration of algorithms, and the optimization of accuracy must rely on the experience of the personnel involved. This leads to unstable quality of the output model and to bias in the learning and prediction of the overall system.

In view of this, if a machine learning mechanism could automate and modularize the acquisition of data, the selection of feature values, the determination of algorithm parameters, the integration of algorithms, and the optimization of accuracy, the efficiency, convenience, and accuracy of machine learning training would be greatly improved.

Summary of the Invention

The invention provides a system and method for establishing an optimized prediction model based on machine learning and obtaining a prediction result, in which an automated and modular design can be used for machine learning model training and prediction, thereby obtaining more efficient machine learning training procedures and more accurate predictions.

In the method for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention: a) a user provides training data having a data format, and selects a plurality of machine learning algorithms to be used, an operation value, and a target prediction value; b) a conversion program converts the data format of the training data to a relay format to obtain formatted raw data, and the machine learning algorithms are set with a first feature value and parameter setting group; c) the data values of the formatted raw data are divided into a sub-training set and a sub-test set; d) a first sub-prediction model is established from the machine learning algorithms and the data values contained in the sub-training set; e) the data values contained in the sub-test set are substituted into the first sub-prediction model, and a first accuracy is obtained through a plurality of prediction algorithms; f) if all the data values of the formatted raw data have been used as the sub-training set and the sub-test set, or the number of repetitions satisfies the operation value, the n-th feature value and parameter setting group is modified according to the n-th accuracy to obtain an (n+1)-th feature value and parameter setting group; otherwise, steps c) to e) are repeated; g) the machine learning algorithms are reset with the n-th feature value and parameter setting group, and a first prediction model is established from the machine learning algorithms and the data values contained in the formatted raw data; h) if the n-th accuracy meets the target prediction value or the number of repetitions satisfies the operation value, an n-th prediction model is provided as the optimized prediction model; otherwise, the n-th feature value and parameter setting group is modified according to the accuracy to obtain an (n+1)-th feature value and parameter setting group with which the machine learning algorithms are set, and steps c) to e) are repeated; and i) the optimized prediction model and the n-th accuracy are displayed.

The system for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention includes at least a storage unit and a processing unit. The storage unit includes training data having a data format, and a plurality of machine learning algorithms. The processing unit is coupled to the storage unit and configured to perform the following method steps: a) receiving an operation value and a target prediction value; b) using a conversion program to convert the data format of the training data to a relay format to obtain formatted raw data, and setting the machine learning algorithms with a first feature value and parameter setting group; c) dividing the data values of the formatted raw data into a sub-training set and a sub-test set; d) establishing a first sub-prediction model from the machine learning algorithms and the data values contained in the sub-training set; e) substituting the data values contained in the sub-test set into the first sub-prediction model, and obtaining a first accuracy through a plurality of prediction algorithms; f) if all the data values of the formatted raw data have been used as the sub-training set and the sub-test set, or the number of repetitions satisfies the operation value, modifying the n-th feature value and parameter setting group according to the n-th accuracy to obtain an (n+1)-th feature value and parameter setting group; otherwise, repeating steps c) to e); g) resetting the machine learning algorithms with the n-th feature value and parameter setting group, and establishing a first prediction model from the machine learning algorithms and the data values contained in the formatted raw data; h) if the n-th accuracy meets the target prediction value or the number of repetitions satisfies the operation value, providing an n-th prediction model as the optimized prediction model; otherwise, modifying the n-th feature value and parameter setting group according to the accuracy to obtain an (n+1)-th feature value and parameter setting group with which the machine learning algorithms are set, and repeating steps c) to e); and i) displaying the optimized prediction model and the n-th accuracy.

In some embodiments, step h further includes the following steps: h1) storing the (n+1)-th feature value and parameter setting group in a data temporary storage area; and h2) if the number of repetitions satisfies the operation value, selecting the entry with the highest accuracy from the data temporary storage area and resetting the machine learning algorithms accordingly.

In some embodiments, step c further includes the following step: c1) after the data values of the formatted raw data are divided into a training set and a test set, dividing the data values of the training set into the sub-training set and the sub-test set; and step g further includes the following steps: g1) establishing the first prediction model from the machine learning algorithms and the data values of the training set; g2) substituting the data values of the test set into the first prediction model, and obtaining a first test accuracy through the plurality of prediction algorithms; and g3) replacing the first test accuracy with the first accuracy.

In some embodiments, step a further includes the following step: selecting a classification sample balance cardinality (n) to be used; and step d further includes the following steps: d1) dividing the data values contained in the sub-training set into a plurality of sampling categories, wherein the machine learning algorithms have different sampling categories; d2) sampling, from each of the sampling categories, a number of data values equal to the classification sample balance cardinality to establish a sample combination, wherein in some embodiments step d2) may sample with repetition; d3) establishing a first sample prediction model from the data values contained in the sample combination; and d4) repeating steps d2) to d3) until the operation value (t) is reached, to obtain a plurality of sample prediction models, and merging the plurality of sample prediction models to form the first sub-prediction model.

In some embodiments, step e further includes the following steps: eap1) obtaining a plurality of first sample accuracies, one from each of the plurality of prediction algorithms; and eap2) selecting, by a voting mode or an average mode, the one of the first sample accuracies with the highest confidence index as the first prediction result.
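The voting mode and the average mode can be illustrated with a minimal Python sketch. The text does not define how the confidence index is computed, so the sketch assumes it is the vote share in voting mode and the mean class probability in average mode; the function names and sample values are hypothetical.

```python
from collections import Counter

def combine_by_vote(labels):
    """Voting mode: the most frequent label wins; confidence is its vote share."""
    winner, count = Counter(labels).most_common(1)[0]
    return winner, count / len(labels)

def combine_by_average(probabilities):
    """Average mode: average the per-class probabilities, then take the top class."""
    classes = probabilities[0].keys()
    mean = {c: sum(p[c] for p in probabilities) / len(probabilities) for c in classes}
    winner = max(mean, key=mean.get)
    return winner, mean[winner]

# Hypothetical outputs of three prediction algorithms for one record.
votes = ["gout", "diabetes", "gout"]
vote_label, vote_confidence = combine_by_vote(votes)

probs = [
    {"gout": 0.9, "diabetes": 0.1},
    {"gout": 0.2, "diabetes": 0.8},
    {"gout": 0.7, "diabetes": 0.3},
]
avg_label, avg_confidence = combine_by_average(probs)
print(vote_label, vote_confidence, avg_label, avg_confidence)
```

Voting only needs hard labels, while averaging needs each algorithm to report a probability per class, so the choice between the two modes may depend on what the selected algorithms can output.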

In some embodiments, step e further includes the following step: e1) comparing the first accuracy with a known result to obtain a first accuracy index; and step f further includes the following step: f1) modifying the n-th feature value and parameter setting group according to the n-th accuracy and the n-th accuracy index. In some embodiments, the accuracy index includes accuracy (the number of correctly predicted samples divided by the total number of samples), AUC (Area Under the Receiver Operating Characteristic Curve), and MCC (Matthews Correlation Coefficient).
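The three accuracy indices named above can be computed for a binary case as in the following Python sketch. It uses the standard textbook definitions (MCC from the confusion matrix, AUC as the fraction of positive/negative pairs ranked correctly, with ties counted as one half); the 0.5 decision threshold and the sample data are assumptions for illustration only.

```python
import math

def confusion(y_true, y_pred):
    """Return (tp, tn, fp, fn) for binary labels encoded as 1 and 0."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, tn, fp, fn

def accuracy(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    return (tp + tn) / len(y_true)  # correctly predicted samples / total samples

def mcc(y_true, y_pred):
    tp, tn, fp, fn = confusion(y_true, y_pred)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0

def auc(y_true, scores):
    """Probability that a random positive is scored above a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

# Hypothetical known results and predicted probabilities.
y_true = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.6, 0.3, 0.1]
y_pred = [1 if s >= 0.5 else 0 for s in scores]
print(accuracy(y_true, y_pred), mcc(y_true, y_pred), auc(y_true, scores))
```

Accuracy and MCC depend on the chosen threshold, whereas AUC is computed from the raw scores and is threshold-free, which is one reason to report more than one index.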

In some embodiments, in step b, the data format is compared against a plurality of conversion programs in turn, and the corresponding conversion program is selected.
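One way to picture this selection is a list of (format check, conversion program) pairs tried in order, as in the Python sketch below. The relay format is not specified in the text; the sketch assumes a list of dictionaries keyed by column name, and the plain text layout (whitespace-separated with a header line) and all function names are hypothetical.

```python
import csv
import io

def convert_csv(text):
    """Conversion program for csv input: header row, then data rows."""
    rows = list(csv.reader(io.StringIO(text)))
    header, body = rows[0], rows[1:]
    return [dict(zip(header, row)) for row in body]

def convert_plain_text(text):
    """Conversion program for an assumed whitespace-separated plain text layout."""
    lines = [line.split() for line in text.splitlines() if line.strip()]
    header, body = lines[0], lines[1:]
    return [dict(zip(header, row)) for row in body]

def looks_like_csv(text):
    first_line = text.splitlines()[0] if text else ""
    return "," in first_line

# Each entry pairs a format check with its conversion program; plain text is the fallback.
CONVERTERS = [
    (looks_like_csv, convert_csv),
    (lambda text: True, convert_plain_text),
]

def to_relay_format(text):
    """Try each format check in turn and apply the first matching conversion program."""
    for matches, convert in CONVERTERS:
        if matches(text):
            return convert(text)
    raise ValueError("no conversion program matches this data format")

csv_relay = to_relay_format("age,label\n63,heart\n41,none\n")
txt_relay = to_relay_format("age label\n63 heart\n41 none\n")
print(csv_relay == txt_relay)
```

Because both inputs end up in the same relay structure, every later step can be written once against that structure, regardless of the original file format.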

In some embodiments, the data format is a csv file or a plain text file.

In the method for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention: a) a user provides data to be predicted having a data format, and selects an optimized prediction model and a plurality of prediction algorithms to be used; b) a conversion program converts the data format of the data to be predicted to a relay format to obtain formatted raw data; and c) the data values contained in the formatted raw data are substituted into the optimized prediction model, and an optimized prediction result and an optimized accuracy index are obtained through the plurality of prediction algorithms.

The system for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention includes at least a storage unit and a processing unit. The storage unit includes data to be predicted having a data format, an optimized prediction model, and a plurality of prediction algorithms. The processing unit is coupled to the storage unit and configured to perform the following method steps: a) selecting the optimized prediction model and the prediction algorithms; b) using a conversion program to convert the data format of the data to be predicted to a relay format to obtain formatted raw data; and c) substituting the data values contained in the formatted raw data into the optimized prediction model, and obtaining an optimized prediction result and an optimized accuracy index through the plurality of prediction algorithms.

In some embodiments, step a further includes the following step: a1) selecting an operation value; and step c further includes the following steps: c1) the formatted raw data is taken as a first formatted to-be-predicted data, the data values contained in the first formatted to-be-predicted data are substituted into the optimized prediction model, and a first prediction result is obtained through the prediction algorithms; and c2) an n-th formatted to-be-predicted data is combined with an n-th prediction result to obtain an (n+1)-th formatted to-be-predicted data, step c1) is repeated until the number of repetitions satisfies the operation value, and an (n+1)-th prediction result is provided as the optimized prediction result.
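Steps c1) and c2) amount to feeding each round's prediction back in as an extra input column. The Python sketch below illustrates only that loop; the stand-in predictor (the mean of the row), the function names, and the sample rows are hypothetical, and a real optimized prediction model would presumably apply the model trained for each generation.

```python
def iterative_predict(rows, predict, operation_value):
    """Repeat prediction, appending each round's result to the data as a new column."""
    current = [list(row) for row in rows]  # first formatted to-be-predicted data
    result = None
    for _ in range(operation_value):
        result = [predict(row) for row in current]                    # c1: n-th prediction result
        current = [row + [p] for row, p in zip(current, result)]      # c2: (n+1)-th data
    return result

def mean_predict(row):
    """Stand-in for the optimized prediction model (an assumption, not the patented model)."""
    return sum(row) / len(row)

final = iterative_predict([[0.2, 0.4], [0.8, 0.6]], mean_predict, operation_value=3)
print(final)
```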

In some embodiments, step c1 further includes the following step: c1p1) obtaining a first accuracy through the plurality of prediction algorithms, and comparing the first accuracy with a known result to obtain a first accuracy index; and step c2 further includes the following step: c2p1) providing an (n+1)-th accuracy index as the optimized accuracy index. In some embodiments, the accuracy index includes accuracy, AUC, and MCC.

The above methods of the present invention may be embodied as program code. When the program code is loaded and executed by a machine, the machine becomes a device for practicing the present invention.

In order to make the above-mentioned objects, features, and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram showing a system for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention;

FIG. 2 is a flowchart showing a method for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention;

FIG. 3 is a flowchart showing a method for establishing an optimized prediction model based on machine learning according to another embodiment of the present invention;

FIG. 4 is a flowchart showing a method for automatic feature value screening and machine learning algorithm parameter optimization according to an embodiment of the present invention;

FIGS. 5A and 5B are flowcharts showing a method for modularly establishing a prediction model according to an embodiment of the present invention;

FIGS. 6A and 6B are flowcharts showing a balanced data sampling mode and a random forest prediction model training method according to an embodiment of the present invention;

FIGS. 7A and 7B are flowcharts showing a method for optimizing prediction accuracy according to an embodiment of the present invention;

FIG. 8 is a schematic diagram showing a system for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention;

FIG. 9 is a flowchart showing a method for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention;

FIG. 10 is a flowchart showing a method for obtaining an optimized prediction result based on machine learning according to another embodiment of the present invention;

FIG. 11 is a flowchart showing an iterative prediction method according to an embodiment of the present invention;

FIG. 12 is a flowchart showing a method for predicting modular data according to an embodiment of the present invention;

FIG. 13 is a flowchart showing a random forest type data prediction method according to an embodiment of the present invention.

DESCRIPTION OF SYMBOLS: 1000 system for establishing an optimized prediction model based on machine learning; 1100 electronic device; 1110 data input unit; 1120 storage unit; 1122 training data; 1124 machine learning algorithm; 1130 processing unit; S2002, S2004, ..., S2010 steps; S3002, S3004a, S3004b, S3004n, ..., S3014 steps; S4002, S4004, ..., S4012 steps; S5002, S5004, ..., S5024 steps; S6002, S6004, ..., S6018 steps; C1, C2, Cn categories; S7002, S7004, ..., S7018 steps; TRD training set; TED test set; 8000 system for obtaining an optimized prediction result based on machine learning; 8100 electronic device; 8110 data input unit; 8120 storage unit; 8122 data to be predicted; 8124 prediction model; 8130 processing unit; S9002, S9004, ..., S9008 steps; S10002, S10004a, S10004b, S10004n, ..., S10012 steps; S11002, S11004, ..., S11016 steps; S12002, S12004, ..., S12016 steps; S13002, S13004, ..., S13014 steps.

Detailed description

FIG. 1 shows a system 1000 for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention. The system 1000 may be applied to an electronic device 1100, such as a single-core or multi-core computing device, in either a stand-alone environment or a cluster environment. The electronic device 1100 includes a data input unit 1110, a storage unit 1120, and a processing unit 1130. The data input unit 1110 may be used to receive a plurality of training data. The storage unit 1120 may store the training data 1122 received by the data input unit 1110 and a plurality of machine learning algorithms 1124. It is worth noting that, in some embodiments, the data format is a csv file or a plain text file. In addition, the system can receive an advanced system configuration through the data input unit 1110 for system settings, such as the size of the random forest, the voting mechanism used to determine prediction results, and the detailed parameters of each algorithm. The processing unit 1130 can control the related software and hardware operations in the electronic device 1100 and perform the method of the present invention for establishing an optimized prediction model based on machine learning, the details of which will be described below.

FIG. 2 shows a method for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention. The method is applicable to the electronic device shown in FIG. 1.

First, in step S2002, a plurality of training data input by a user and at least one selected machine learning algorithm are received. It is worth noting that, in some embodiments, an advanced system configuration can also be received for system settings. Next, in step S2004, the received training data is uniformly converted into a relay format of the system. It should be noted that the received training data may have different data formats. In step S2004, the training data in different formats are respectively converted into the relay format for subsequent processing. Then, in step S2006, algorithm M is performed to carry out automatic feature value screening and machine learning algorithm parameter optimization. In step S2008, algorithm O is performed to carry out iterative prediction model optimization. Finally, in step S2010, a prediction model and corresponding accuracy evaluation data are output. Algorithm M and algorithm O will be described in detail below.

FIG. 3 shows a method for establishing an optimized prediction model based on machine learning according to another embodiment of the present invention. The method is applicable to the electronic device shown in FIG. 1.

First, in step S3002, a plurality of training data input by a user and at least one selected machine learning algorithm are received. Similarly, in some embodiments, an advanced system configuration may also be received for system settings. Then, in steps S3004a, S3004b, ..., S3004n, the training data in different formats are uniformly converted into a relay format of the system by the conversion programs corresponding to the respective formats, and in step S3006, the training data in the relay format, called "formatted raw data", is output. Next, in step S3008, the feature values of the training data are combined with the adjustable parameters of the selected machine learning algorithms, that is, the parameters that control the specific behavior of each algorithm, such as the number of layers of an artificial neural network and the number of nodes in each layer, to form a "feature value and parameter setting group". Then, in step S3010, algorithm M is performed to carry out automatic feature value screening and machine learning algorithm parameter optimization. In step S3012, algorithm O is performed to carry out iterative prediction model optimization. Finally, in step S3014, a prediction model and corresponding accuracy evaluation data are output. Algorithm M and algorithm O will be described in detail below.

FIG. 4 shows an automatic feature value screening and machine learning algorithm parameter optimization method (algorithm M) according to an embodiment of the present invention. In this embodiment, "feature value screening" and "algorithm parameter optimization" can be performed by an automated program.

In step S4002, a "feature value and parameter setting group" is obtained. In step S4004, the feature values are screened and the parameters of each algorithm are adjusted programmatically; in other words, the "feature value and parameter setting group" is adjusted programmatically. In step S4006, an algorithm T is performed to establish a prediction model and test its accuracy according to the "feature value and parameter setting group". It is worth noting that, in some embodiments, step S4004 may be a simple random screening and adjustment. In some embodiments, step S4004 may be performed using a Monte Carlo algorithm, a genetic algorithm, and/or a derivative algorithm thereof. Algorithm T will be described later. After that, in step S4008, the "feature value and parameter setting group" and the corresponding accuracy data are temporarily stored. In step S4010, it is determined whether the accuracy data has reached a specific standard or the number of cycles has reached an upper limit. Note that the specific standard and the upper limit on the number of cycles may be system defaults or set by a user. When the accuracy data has not reached the specific standard and the number of cycles has not reached the upper limit (NO in step S4010), the number of cycles is increased by 1 in step S4014, and the flow returns to step S4004. When the accuracy data has reached the specific standard or the number of cycles has reached the upper limit (YES in step S4010), the temporarily stored "feature value and parameter setting group" and/or the corresponding accuracy data are output in step S4012.
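The loop of algorithm M can be sketched in Python for the "simple random screening and adjustment" case. This is an illustration under stated assumptions, not the patented implementation: the evaluation function standing in for algorithm T, the feature names, the parameter grid, and the choice to return the best stored entry are all hypothetical.

```python
import random

def algorithm_m(features, param_choices, evaluate, target, max_cycles, seed=0):
    """Randomly screen features and adjust parameters until a stop condition holds."""
    rng = random.Random(seed)
    stored = []  # temporary storage of (accuracy, features, params), as in S4008
    cycles = 0
    while True:
        # S4004: simple random screening of features and adjustment of parameters.
        selected = [f for f in features if rng.random() < 0.5] or [rng.choice(features)]
        params = {name: rng.choice(values) for name, values in param_choices.items()}
        accuracy = evaluate(selected, params)            # S4006: stand-in for algorithm T
        stored.append((accuracy, selected, params))      # S4008
        cycles += 1                                      # counts completed evaluations
        if accuracy >= target or cycles >= max_cycles:   # S4010
            return max(stored, key=lambda entry: entry[0])  # S4012: best stored entry

def toy_evaluate(selected, params):
    """Hypothetical accuracy: rewards two useful features and one parameter value."""
    useful = {"age", "bmi"}
    hits = len(useful & set(selected))
    noise = len(set(selected) - useful)
    bonus = 0.1 if params["depth"] == 3 else 0.0
    return max(0.0, 0.4 * hits - 0.1 * noise + bonus)

features = ["age", "bmi", "zip_code", "favorite_color"]
param_choices = {"depth": [1, 2, 3]}
best_accuracy, best_features, best_params = algorithm_m(
    features, param_choices, toy_evaluate, target=0.95, max_cycles=20
)
print(best_accuracy, best_features, best_params)
```

Replacing the random draw in the S4004 step with a mutation and crossover step over the stored entries would turn the same loop into the genetic algorithm variant mentioned above.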

FIGS. 5A and 5B show a method (algorithm T) for modularly establishing a prediction model according to an embodiment of the present invention. In this embodiment, a prediction model can be established by a modular program.

In step S5002, training data and a "feature value and parameter setting group" are obtained. In step S5004, it is determined whether a test is required. When a test is required (YES in step S5004), in step S5006, the training data is divided into a "training set TRD" and a "test set TED". It is worth noting that step S5006 can be implemented in different ways. In some embodiments, the division may be based on N-fold cross-validation, random grouping, or a combination of N-fold cross-validation and random grouping. It should be noted that these division methods are only examples, and the present invention is not limited thereto. In steps S5008a, S5008b, ..., S5008n, the training set TRD is put into a modular program, and a prediction model is established for each selected machine learning algorithm. It is worth noting that algorithm AT is used to implement the modular program, the details of which will be described below. After that, in step S5010, the prediction models of all the machine learning algorithms are integrated, and in step S5012, an accuracy test is performed on the integrated prediction model according to the test set TED. It is worth noting that algorithm P is used to implement the accuracy test, the details of which will be described below. Next, in step S5014, it is determined whether all the training data has been used for model establishment and accuracy testing, or the number of cycles has reached a number of tests. Note that the number of tests may be a system default or set by a user. When not all the training data has been used for model establishment and accuracy testing and the number of cycles has not reached the number of tests (NO in step S5014), the number of cycles is increased by 1 in step S5016, and the flow returns to step S5006. When all the training data has been used for model establishment and accuracy testing or the number of cycles has reached the number of tests (YES in step S5014), a prediction accuracy is computed and output in step S5018, and the integrated prediction model is output in step S5024. When a test is not required (NO in step S5004), in steps S5020a, S5020b, ..., S5020n, all the formatted raw data FOD is put into the modular program, and a prediction model is established for each selected machine learning algorithm. Similarly, algorithm AT is used as the modular program, the details of which will be described later. Then, in step S5022, the prediction models of all the machine learning algorithms are integrated, and in step S5024, the integrated prediction model is output.
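The test branch of algorithm T can be sketched in Python with N-fold cross-validation as the division method. The three trainers below are simple stand-ins for the selected machine learning algorithms, integration is assumed to be a majority vote, and the data set is hypothetical; none of these choices is specified by the text.

```python
from collections import Counter

def n_fold_splits(data, n_folds):
    """Yield (training set TRD, test set TED) pairs so each record is tested once."""
    for k in range(n_folds):
        ted = data[k::n_folds]
        trd = [row for i, row in enumerate(data) if i % n_folds != k]
        yield trd, ted

def train_majority(trd):
    """Stand-in algorithm: always predict the most common training label."""
    label = Counter(y for _, y in trd).most_common(1)[0][0]
    return lambda x: label

def train_nearest(trd):
    """Stand-in algorithm: label of the nearest training value."""
    return lambda x: min(trd, key=lambda row: abs(row[0] - x))[1]

def train_centroid(trd):
    """Stand-in algorithm: label of the nearest class mean."""
    groups = {}
    for x, y in trd:
        groups.setdefault(y, []).append(x)
    means = {y: sum(xs) / len(xs) for y, xs in groups.items()}
    return lambda x: min(means, key=lambda y: abs(means[y] - x))

def algorithm_t(data, trainers, n_folds):
    """Return the cross-validated accuracy of the integrated (voted) model."""
    correct = 0
    for trd, ted in n_fold_splits(data, n_folds):                  # S5006
        models = [train(trd) for train in trainers]                # S5008a..n
        def integrated(x, models=models):                          # S5010
            return Counter(m(x) for m in models).most_common(1)[0][0]
        correct += sum(1 for x, y in ted if integrated(x) == y)    # S5012
    return correct / len(data)                                     # S5018

data = [(1.0, "a"), (1.2, "a"), (0.8, "a"), (3.0, "b"), (3.1, "b"), (2.9, "b")]
cv_accuracy = algorithm_t(data, [train_majority, train_nearest, train_centroid], n_folds=3)
print(cv_accuracy)
```

With three voters and two classes no tie can occur; other combinations would need an explicit tie-breaking rule.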

FIGS. 6A and 6B show a balanced data sampling mode and a random forest prediction model training method (algorithm AT) according to an embodiment of the present invention. In this embodiment, the bias and overfitting of the prediction system can be effectively reduced.

In step S6002, training data, a sampling number t, and a balanced sample number n are obtained. It is worth noting that in this procedure, sampling is performed t times and t sub-prediction models are established. In step S6004, the training data is grouped according to known categories to generate a category 1, a category 2, ..., a category n (C1, C2, ..., Cn). For example, if there are four known correct answers (heart disease, diabetes, gout, and none of the above), the training data can be divided into four groups according to the correct answers. In step S6006, the number of cycles s is initially set to 0 (s = 0), and in step S6008, the number of cycles s is increased by 1 (s = s + 1). In step S6010, n pieces of data are drawn from each group at random, with repetition allowed, and together they form a sample s. In step S6012, a sample prediction model s is established from the sample s. In step S6014, it is determined whether the number of cycles s is less than t. When the number of cycles s is less than t (YES in step S6014), the flow returns to step S6008. When the number of cycles s is not less than t (NO in step S6014), the t sub-prediction models obtained above are combined into the final random forest type prediction model in step S6016, and the final random forest type prediction model is output in step S6018.
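The balanced sampling loop can be sketched in Python as follows. The sub-model here is a nearest-class-mean classifier used purely as a stand-in (a random forest would normally train a decision tree per sample), combination is assumed to be a majority vote, and the imbalanced data set is hypothetical.

```python
import random
from collections import Counter, defaultdict

def train_centroid(sample):
    """Stand-in sub-model: nearest class mean on a single numeric feature."""
    groups = defaultdict(list)
    for x, label in sample:
        groups[label].append(x)
    means = {label: sum(xs) / len(xs) for label, xs in groups.items()}
    return lambda x: min(means, key=lambda label: abs(means[label] - x))

def algorithm_at(training_data, t, n, train_one, seed=0):
    """Build t sub-models, each from n records drawn per category with replacement."""
    rng = random.Random(seed)
    groups = defaultdict(list)                         # S6004: group by known category
    for x, label in training_data:
        groups[label].append((x, label))
    sub_models = []
    s = 0                                              # S6006
    while True:
        s += 1                                         # S6008
        sample = []
        for members in groups.values():                # S6010: n draws per group
            sample.extend(rng.choices(members, k=n))
        sub_models.append(train_one(sample))           # S6012
        if not s < t:                                  # S6014
            break
    def forest(x):                                     # S6016: combine by majority vote
        return Counter(model(x) for model in sub_models).most_common(1)[0][0]
    return forest, sub_models

# Hypothetical imbalanced data: eight "none" records and two "gout" records.
data = [(v, "none") for v in (0.8, 0.9, 1.0, 1.0, 1.1, 1.2, 0.9, 1.1)]
data += [(4.8, "gout"), (5.2, "gout")]
forest, sub_models = algorithm_at(data, t=5, n=3, train_one=train_centroid)
print(forest(5.0), forest(1.0))
```

Drawing the same number of records from every category means each sub-model sees the rare category as often as the common one, which is how the procedure counters the bias of an imbalanced training set.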

FIG. 7A and FIG. 7B show a prediction accuracy optimization method (Algorithm O) according to an embodiment of the present invention. In this embodiment, “iterative prediction model optimization” can be performed with an automated program.

In step S7002, training data is obtained and divided into a "training set TRD" and a "test set TED". In step S7004, the latest generation of the "feature value and parameter setting group" is obtained. In step S7006, a prediction model is established according to the test set TED and the algorithm T of the embodiment of FIGS. 5A and 5B, a prediction result such as a probability value and/or a confidence index is calculated, and the accuracy is tested. Next, in step S7008, the "feature value and parameter setting group" and the prediction result of step S7006 are integrated to form a new generation of the "feature value and parameter setting group". In other words, the prediction data can be added as a new feature value to the "feature value and parameter setting group". In step S7010, the latest generation of the "feature value and parameter setting group" and its accuracy data are temporarily stored. After that, in step S7012, the number of completed generations is incremented, and in step S7014, it is determined whether the accuracy data has reached a specific standard or the number of cycles has reached the upper limit on the number of generations. Note that the specific standard and the upper limit on the number of generations may be system defaults or set by a user. When the accuracy data has not reached the specific standard and the number of cycles has not reached the upper limit on the number of generations (NO in step S7014), the number of cycles is increased by 1 in step S7016, and the flow returns to step S7004. When the accuracy data has reached the specific standard or the number of cycles has reached the upper limit on the number of generations (YES in step S7014), the "feature value and parameter setting group" with the highest accuracy so far is output in step S7018.
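The generation loop of algorithm O can be sketched in Python as below. The sketch assumes that "integrating" the prediction result means appending it to each record as a new feature column, and that each stored generation is paired with the accuracy measured in the same pass; the scoring function standing in for algorithm T and the sample data are hypothetical.

```python
def mean_score_model(rows, labels):
    """Stand-in for algorithm T: score is the row mean, predict 1 when score >= 0.5."""
    scores = [sum(row) / len(row) for row in rows]
    predictions = [1 if s >= 0.5 else 0 for s in scores]
    accuracy = sum(p == y for p, y in zip(predictions, labels)) / len(labels)
    return scores, accuracy

def algorithm_o(rows, labels, build_and_test, target, max_generations):
    """Feed each generation's predictions back in as a new feature until a stop condition."""
    stored = []                                   # temporary storage, as in S7010
    generation = 0
    current = [list(row) for row in rows]         # S7004: latest generation
    while True:
        scores, accuracy = build_and_test(current, labels)          # S7006
        current = [row + [s] for row, s in zip(current, scores)]    # S7008
        stored.append((accuracy, current))                          # S7010
        generation += 1                                             # S7012
        if accuracy >= target or generation >= max_generations:     # S7014
            return max(stored, key=lambda entry: entry[0])          # S7018

rows = [[0.9, 0.7], [0.6, 0.8], [0.2, 0.1], [0.4, 0.3]]
labels = [1, 1, 0, 0]
best_accuracy, best_rows = algorithm_o(
    rows, labels, mean_score_model, target=0.99, max_generations=5
)
print(best_accuracy, len(best_rows[0]))
```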

It must be noted that in some embodiments, algorithm M and algorithm O may be implemented as two successive steps, one upstream and one downstream, as shown in the embodiment of FIG. 3. In some embodiments, algorithm M and algorithm O can also be integrated into a single step by nesting one within the other. For example, the algorithm T used in algorithm O may be replaced with algorithm M, or the algorithm T used in algorithm M may be replaced with algorithm O.

FIG. 8 shows a system for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention. The system 8000 may be applied to an electronic device 8100, such as a single-core or multi-core computing device, in either a stand-alone environment or a cluster environment. The electronic device 8100 includes a data input unit 8110, a storage unit 8120, and a processing unit 8130. The data input unit 8110 may be used to receive data to be predicted. The storage unit 8120 may store the to-be-predicted data 8122 received by the data input unit 8110 and the prediction model 8124. It is worth noting that, in some embodiments, the system may receive an advanced system configuration through the data input unit 8110 for system settings. The processing unit 8130 can control the related software and hardware operations in the electronic device 8100 and perform the method of the present invention for obtaining an optimized prediction result based on machine learning, the details of which will be described below.

FIG. 9 shows a method for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention. The method is applicable to the electronic device shown in FIG. 8.

First, in step S9002, data to be predicted and a prediction model are received. Note that, in some embodiments, the prediction model may be generated according to the embodiment of FIG. 2 or FIG. 3. It is worth noting that, in some embodiments, an advanced system configuration can also be received for system setting. In step S9004, the data to be predicted is converted into a relay format of the system. Because the received to-be-predicted data may arrive in different data formats, step S9004 converts each format into the relay format for subsequent processing. After that, in step S9006, an algorithm IP is executed so that an automated program performs "iterative prediction", and in step S9008, the prediction result and accuracy evaluation data are output. The algorithm IP will be described later.

FIG. 10 shows a method for obtaining an optimized prediction result based on machine learning according to another embodiment of the present invention. The method is applicable to the electronic device shown in FIG. 8.

First, in step S10002, data to be predicted input by a user and a prediction model are received. Note that, in some embodiments, the prediction model may be generated according to the embodiment of FIG. 2 or FIG. 3. It is worth noting that, in some embodiments, an advanced system configuration can also be received for system setting. Then, in steps S10004a, S10004b, ..., S10004n, the data to be predicted in different formats are uniformly converted into a relay format of the system through the corresponding format conversion procedures, and in step S10006, the converted data to be predicted is referred to as "formatted data to be predicted". Next, in step S10008, the content of the prediction model is confirmed, and an algorithm adaptation operation is performed. After that, in step S10010, an algorithm IP is executed so that an automated program performs "iterative prediction", and in step S10012, the prediction result and accuracy evaluation data are output. The algorithm IP will be described later.
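The format conversion of steps S10004a to S10004n can be illustrated with a small sketch. The relay format itself is not specified in this description, so the sketch assumes it is a list of records, one dictionary per row. The converter names `from_csv` and `from_plain_text`, the `CONVERTERS` list and the detection rules are likewise hypothetical; they only illustrate the idea of trying each conversion procedure until one matches the input.

```python
import csv
import io
from typing import Callable, Dict, List, Optional

Relay = List[Dict[str, str]]  # assumed relay format: one dict per record


def from_csv(text: str) -> Optional[Relay]:
    """Convert comma-separated text with a header row, or return None."""
    lines = text.splitlines()
    if not lines or "," not in lines[0]:
        return None
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def from_plain_text(text: str) -> Optional[Relay]:
    """Convert whitespace-separated text with a header row, or return None."""
    table = [line.split() for line in text.splitlines() if line.strip()]
    if len(table) < 2:
        return None
    header, rows = table[0], table[1:]
    if any(len(row) != len(header) for row in rows):
        return None
    return [dict(zip(header, row)) for row in rows]


# Each conversion procedure is tried in turn until one recognises the input.
CONVERTERS: List[Callable[[str], Optional[Relay]]] = [from_csv, from_plain_text]


def to_relay(text: str) -> Relay:
    """Convert input of any supported format into the assumed relay format."""
    for convert in CONVERTERS:
        result = convert(text)
        if result is not None:
            return result
    raise ValueError("no conversion procedure recognised the input format")


csv_input = "age,score\n30,0.7\n41,0.2\n"
txt_input = "age score\n30 0.7\n41 0.2\n"
relay_from_csv = to_relay(csv_input)
relay_from_txt = to_relay(txt_input)
```

Both inputs end up as the same list of records, which is the point of a relay format: the later steps only need to handle one representation.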

FIG. 11 shows an iterative prediction method (algorithm IP) according to an embodiment of the present invention.

In step S11002, the data to be predicted and an iterative prediction model are obtained. In step S11004, the total number of generations (g) included in the iterative prediction model is determined. In step S11006, the latest generation of "data to be predicted" is obtained, and in step S11008, a prediction result is obtained according to the "data to be predicted". It is worth noting that, in some embodiments, the "data to be predicted" of the current generation can be input into an algorithm P to perform prediction and obtain a prediction result. The algorithm P will be described below. It should be noted that the model used for prediction is extracted from the above iterative prediction model and must match the generation of the current data. After that, as in step S11010, the number of remaining generations is decreased by 1 (g = g - 1). In step S11012, it is determined whether g is greater than 0 (g > 0). When g is greater than 0 (YES in step S11012), as in step S11014, the prediction result obtained in step S11008 is integrated into the to-be-predicted data of the current generation as feature values, forming the new generation of to-be-predicted data. After that, the flow returns to step S11006. When g is not greater than 0 (NO in step S11012), in other words, when every generation model in the iterative prediction model has been used in sequence, as in step S11016, the prediction result is output.
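A minimal sketch of the algorithm IP loop is given below, under assumptions that are not part of the description: the iterative prediction model is represented as an ordered list of per-generation models, each model is a plain callable returning one numeric prediction per row, and the names `iterative_predict`, `gen0_model` and `gen1_model` are hypothetical.

```python
from typing import Callable, List, Sequence

Row = List[float]
Model = Callable[[Sequence[Row]], List[float]]


def iterative_predict(rows: Sequence[Row], generation_models: Sequence[Model]) -> List[float]:
    """Apply each generation's model in order, feeding predictions forward."""
    if not generation_models:
        raise ValueError("the iterative prediction model contains no generations")
    data = [list(row) for row in rows]  # copy so the caller's rows stay unchanged
    g = len(generation_models)          # S11004: total number of generations
    index = 0
    while True:
        model = generation_models[index]  # model matching the current data generation
        result = model(data)              # S11008: predict on the current generation
        g -= 1                            # S11010: g = g - 1
        if g <= 0:                        # S11012: NO branch
            return result                 # S11016: output the prediction result
        # S11014: append each prediction to its row as a new feature value.
        data = [row + [pred] for row, pred in zip(data, result)]
        index += 1


def gen0_model(data: Sequence[Row]) -> List[float]:
    """Toy first-generation model: mean of the features in each row."""
    return [sum(row) / len(row) for row in data]


def gen1_model(data: Sequence[Row]) -> List[float]:
    """Toy second-generation model: doubles the feature added by generation 0."""
    return [2.0 * row[-1] for row in data]


original_rows = [[1.0, 3.0], [2.0, 4.0]]
final_prediction = iterative_predict(original_rows, [gen0_model, gen1_model])
```

The second-generation model sees rows that are one feature wider than the input, which mirrors the requirement that each model match the generation of the data it receives.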

FIG. 12 shows a modular data prediction method (algorithm P) according to an embodiment of the present invention.

In step S12002, data to be predicted and a prediction model are obtained. It is worth noting that, in some embodiments, a known result corresponding to the to-be-predicted data may also be received. In step S12004, each machine learning algorithm is adapted according to the prediction model, and in steps S12006a, S12006b, ..., S12006n, the data to be predicted is fed into a modular program, and prediction is performed with each of the initially selected machine learning methods. In some embodiments, the modular program may be executed using an algorithm AP. The algorithm AP will be explained later. In step S12008, the prediction results of all the machine learning algorithms are merged. It is worth noting that, in some embodiments, the merging method may be to average, for each piece of data, the prediction data of all the machine learning algorithms used. In step S12010, it is determined whether a known result is available for verifying the prediction accuracy and whether verification is required. When no known result is available or no verification is required (NO in step S12010), as in step S12012, the prediction result is output. When a known result is available and verification is required (YES in step S12010), as in step S12014, the prediction result is compared with the known result, and various accuracy indicators are calculated; then, as in step S12016, the prediction result and/or the accuracy indicators are output. In some embodiments, the accuracy indicators include accuracy, AUC, and MCC.
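The averaging merge of step S12008 and the accuracy indicators of step S12014 can be illustrated as follows. This sketch assumes a binary classification task with labels 0 and 1, predictions given as probabilities of class 1, and a 0.5 decision threshold; none of these choices is stated in the description. The function names are hypothetical, and the indicators use the standard textbook definitions of accuracy, AUC (as the fraction of correctly ordered positive/negative pairs) and the Matthews correlation coefficient (MCC).

```python
import math
from typing import List, Sequence


def merge_by_average(per_algorithm: Sequence[Sequence[float]]) -> List[float]:
    """S12008: average the predictions of all algorithms for each piece of data."""
    return [sum(column) / len(column) for column in zip(*per_algorithm)]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the known results."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient for binary labels (0.0 if undefined)."""
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Two hypothetical algorithms scoring the same four pieces of data.
per_algorithm = [[0.9, 0.2, 0.6, 0.4], [0.7, 0.4, 0.2, 0.2]]
known = [1, 0, 1, 0]

merged = merge_by_average(per_algorithm)
labels = [1 if p >= 0.5 else 0 for p in merged]  # assumed 0.5 threshold
acc_value = accuracy(known, labels)
mcc_value = mcc(known, labels)
auc_value = auc(known, merged)
```

The pairwise AUC is quadratic in the number of samples, which is acceptable for a sketch; a rank-based formulation would be the usual choice for larger data.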

FIG. 13 shows a random forest type data prediction method (algorithm AP) according to an embodiment of the present invention. In this embodiment, random forest type data prediction can be performed. With the algorithm AP and the algorithm AT, the "preference" (bias) and "over-adaptation" (overfitting) of the prediction system can be effectively reduced.

In step S13002, the data to be predicted and a random forest type prediction model are obtained, and the machine learning method to be used is configured according to the settings in the random forest type prediction model. In step S13004, the data to be predicted is imported into the sub-prediction programs of all the sub-models in the random forest type prediction model, and in steps S13006a, S13006b, ..., S13006t, each sub-prediction program in the random forest type prediction model makes predictions according to the data to be predicted, so as to obtain prediction results and probability values. Assuming that there are t sub-models in the prediction model, there are t sub-prediction programs. In step S13008, it is determined whether the prediction result integration mode is a voting mode or an average mode. When the prediction result integration mode is the voting mode, as in step S13010, the number of sub-prediction programs supporting each category is tallied for each piece of data to be predicted. The category with the highest number of votes is the prediction result, and the percentage of votes obtained by each category is its confidence index. After that, in step S13014, the prediction result and the confidence index are output. When the prediction result integration mode is the average mode, as in step S13012, the confidence index of each category is computed for each piece of data to be predicted from the outputs of the sub-prediction programs. The confidence index of each category is the average of the probability values given to that category by all the sub-prediction programs, and the category with the highest confidence index is the prediction result. After that, in step S13014, the prediction result and the confidence index are output.
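The two integration modes of steps S13010 and S13012 can be sketched for a single piece of data as follows. The function name `integrate`, the dictionary representation of each sub-prediction program's output, and the example values are assumptions made for illustration only.

```python
from collections import Counter
from typing import Dict, List, Tuple


def integrate(sub_outputs: List[Dict[str, float]], mode: str) -> Tuple[str, Dict[str, float]]:
    """Combine per-category probabilities from t sub-prediction programs.

    sub_outputs holds one {category: probability} dict per sub-prediction
    program, all for the same piece of data to be predicted.
    """
    categories = sorted(sub_outputs[0])
    t = len(sub_outputs)
    if mode == "voting":
        # S13010: each sub-program votes for its most probable category;
        # the share of votes per category is that category's confidence index.
        votes = Counter(max(categories, key=lambda c: out[c]) for out in sub_outputs)
        confidence = {c: votes[c] / t for c in categories}
    elif mode == "average":
        # S13012: the confidence index is the mean probability per category.
        confidence = {c: sum(out[c] for out in sub_outputs) / t for c in categories}
    else:
        raise ValueError("mode must be 'voting' or 'average'")
    # The category with the highest confidence index is the prediction result.
    return max(categories, key=lambda c: confidence[c]), confidence


# Three hypothetical sub-prediction programs (t = 3) scoring one piece of data.
sub_outputs = [
    {"A": 0.90, "B": 0.10},
    {"A": 0.40, "B": 0.60},
    {"A": 0.45, "B": 0.55},
]
vote_result, vote_confidence = integrate(sub_outputs, "voting")
avg_result, avg_confidence = integrate(sub_outputs, "average")
```

With these example values the two modes disagree: two of the three sub-programs vote for "B", while the single confident vote for "A" dominates the average. This is why the choice of integration mode is worth exposing as a setting.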

Therefore, with the system and method of the present invention for establishing an optimized prediction model based on machine learning and obtaining prediction results, machine learning model training and prediction can be performed with an automated and modular design, thereby obtaining a more efficient machine learning training program and more accurate prediction results.

The method of the present invention, or a specific form or part thereof, may exist in the form of program code. The program code may be contained in a physical medium, such as a floppy disk, a compact disc, a hard disk, or any other machine-readable (such as computer-readable) storage medium, or in a computer program product that is not limited to any external form. When the program code is loaded and executed by a machine, such as a computer, the machine becomes a device for practicing the present invention. The program code can also be transmitted through a transmission medium, such as wires or cables, optical fibers, or any other form of transmission. When the program code is received, loaded, and executed by a machine, such as a computer, the machine becomes a device for practicing the present invention. When implemented in a general-purpose processing unit, the program code in combination with the processing unit provides a unique device that operates similarly to application-specific logic circuits.

Although the present invention has been described above with preferred embodiments, these descriptions are not intended to limit the present invention. Any person skilled in the art can make modifications and refinements to the present invention without departing from its spirit and scope. Therefore, the protection scope of the present invention shall be defined by the appended claims.

Claims

A method for establishing an optimized prediction model based on machine learning, comprising the following steps:

a) a user provides training data having a data format, and selects a plurality of machine learning algorithms to be used, a calculation value, and a target prediction value;

b) using a conversion program to convert the training data from the data format into a relay format to obtain formatted raw data, and setting the plurality of machine learning algorithms with a first feature value and parameter setting group;

c) dividing the data values of the formatted raw data into a sub-training set and a sub-test set;

d) establishing a first sub-prediction model by using the plurality of machine learning algorithms and the data values contained in the sub-training set;

e) substituting the data values contained in the sub-test set into the first sub-prediction model, and obtaining a first accuracy through a plurality of prediction algorithms;

f) if the data values of the formatted raw data have been used as the sub-training set and the sub-test set, or the number of repetitions satisfies the calculation value, modifying the n-th feature value and parameter setting group according to the n-th accuracy to obtain an (n+1)-th feature value and parameter setting group; otherwise, repeating steps c) to e);

g) resetting the plurality of machine learning algorithms with the n-th feature value and parameter setting group, and establishing a first prediction model by using the plurality of machine learning algorithms and the data values contained in the formatted raw data;

h) if the n-th accuracy meets the target prediction value or the number of repetitions satisfies the calculation value, providing an n-th prediction model as an optimized prediction model; otherwise, modifying the n-th feature value and parameter setting group according to the accuracy to obtain an (n+1)-th feature value and parameter setting group, setting the plurality of machine learning algorithms therewith, and repeating steps c) to e); and

i) displaying the optimized prediction model and the n-th accuracy.
The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step h further comprises the following steps:

h1) storing the (n+1)-th feature value and parameter setting group in a data temporary storage area; and

h2) if the number of repetitions satisfies the calculation value, selecting the highest accuracy from the data temporary storage area, and resetting the plurality of machine learning algorithms.
The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step c further comprises the following steps:

c1) after the data values of the formatted raw data are divided into a training set and a test set, dividing the data values of the training set into a sub-training set and a sub-test set;

Moreover, step g further includes the following steps:

g1) establishing a first prediction model by using the plurality of machine learning algorithms and the data values contained in the training set;

g2) substituting the data values of the test set into the first prediction model, and obtaining a first test accuracy through the plurality of prediction algorithms; and

g3) replacing the first test accuracy with the first accuracy.
The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step a further comprises the following steps:

a1) Select a balanced sample cardinality (n) to be used;

Moreover, step d further includes the following steps:

d1) the plurality of machine learning algorithms divide the data values contained in the sub-training set into multiple sampling categories, wherein the plurality of machine learning algorithms have different sampling categories;

d2) sampling, from the multiple sampling categories, a number of samples equal to the balanced sample cardinality to establish a sample combination;

d3) establishing a first sample prediction model using the data values contained in the sample combination; and

d4) Repeat steps d2) to d3) until the operation value (t) is satisfied, obtain a plurality of sample prediction models, and merge the plurality of sample prediction models to form a first sub-prediction model.
The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step e further comprises the following steps:

eap1) the plurality of prediction algorithms respectively obtain a plurality of first sample accuracies; and

eap2) selecting, in a voting mode or an average mode, the highest confidence index among the plurality of first sample accuracies as the first prediction result.
The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step e further comprises the following steps:

e1) comparing the first accuracy with a known result to obtain a first accuracy index;

Moreover, step f also includes the following steps:

f1) Modify the nth feature value and parameter setting group according to the nth accuracy and the nth accuracy index.
The method for establishing an optimized prediction model based on machine learning according to claim 6, wherein the accuracy index includes accuracy, AUC, and MCC.
The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein in step b, the data format is repeatedly compared via a plurality of conversion programs, and a corresponding conversion program is selected.
The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein the data format is a csv file or a plain text file.
A method for obtaining an optimized prediction result based on machine learning, comprising the following steps:

a) a user provides data to be predicted having a data format, and selects an optimized prediction model according to claim 1 and a plurality of prediction algorithms to be used;

b) using a conversion program to convert the data to be predicted from the data format into a relay format to obtain formatted raw data; and

c) substituting the data values of the formatted raw data into the optimized prediction model, and obtaining an optimized prediction result and an optimized accuracy index through the plurality of prediction algorithms.
The method for obtaining an optimized prediction result based on machine learning according to claim 10, wherein step a further comprises the following steps:

a1) selecting an operation value;

Moreover, step c further includes:

c1) taking the formatted raw data as a first formatted to-be-predicted data, substituting the data values contained in the first formatted to-be-predicted data into the optimized prediction model, and obtaining a first prediction result through the plurality of prediction algorithms;

c2) combining an n-th formatted to-be-predicted data with the n-th prediction result to obtain an (n+1)-th formatted to-be-predicted data, repeating step c1) until the number of repetitions satisfies the operation value, and providing an (n+1)-th prediction result as the optimized prediction result.
The method for obtaining an optimized prediction result based on machine learning according to claim 11, wherein step c1 further comprises the following steps:

c1p1) obtaining a first accuracy through the plurality of prediction algorithms, and comparing the first accuracy with a known result to obtain a first accuracy index;

In addition, step c2 includes the following steps:

c2p1) providing an (n+1)-th accuracy index as the optimized accuracy index.
The method for obtaining an optimized prediction result based on machine learning according to claim 12, wherein the accuracy index includes accuracy, AUC, and MCC.
A system for establishing an optimized prediction model based on machine learning, comprising:

a storage unit configured to store training data having a data format and a plurality of machine learning algorithms; and

a processing unit coupled to the storage unit and configured to perform the following method steps:

a) receiving a calculation value and a target prediction value;

b) using a conversion program to convert the training data from the data format into a relay format to obtain formatted raw data, and setting the plurality of machine learning algorithms with a first feature value and parameter setting group;

c) dividing the data values of the formatted raw data into a sub-training set and a sub-test set;

d) establishing a first sub-prediction model by using the plurality of machine learning algorithms and the data values of the sub-training set;

e) substituting the data values contained in the sub-test set into the first sub-prediction model, and obtaining a first accuracy through a plurality of prediction algorithms;

f) if the data values of the formatted raw data have been used as the sub-training set and the sub-test set, or the number of repetitions satisfies the calculation value, modifying the n-th feature value and parameter setting group according to the n-th accuracy to obtain an (n+1)-th feature value and parameter setting group; otherwise, repeating steps c) to e);

g) resetting the plurality of machine learning algorithms with the n-th feature value and parameter setting group, and establishing a first prediction model by using the plurality of machine learning algorithms and the data values contained in the formatted raw data;

h) if the n-th accuracy meets the target prediction value or the number of repetitions satisfies the calculation value, providing an n-th prediction model as an optimized prediction model; otherwise, modifying the n-th feature value and parameter setting group according to the accuracy to obtain an (n+1)-th feature value and parameter setting group, setting the plurality of machine learning algorithms therewith, and repeating steps c) to e); and

i) displaying the optimized prediction model and the n-th accuracy.
A system for obtaining an optimized prediction result based on machine learning, comprising:

a storage unit configured to store to-be-predicted data having a data format, an optimized prediction model, and a plurality of prediction algorithms; and

a processing unit coupled to the storage unit and configured to perform the following method steps:

a) selecting the optimized prediction model and the plurality of prediction algorithms;

b) using a conversion program to convert the data to be predicted from the data format into a relay format to obtain formatted raw data; and

c) substituting the data values contained in the formatted raw data into the optimized prediction model, and obtaining an optimized prediction result and an optimized accuracy index through the plurality of prediction algorithms.