WO2020041998A1 - Systems and methods for establishing optimized prediction model and obtaining prediction result - Google Patents

Info

Publication number
WO2020041998A1
Authority
WO
WIPO (PCT)
Prior art keywords
data
prediction
prediction model
accuracy
optimized
Application number
PCT/CN2018/102897
Other languages
French (fr)
Chinese (zh)
Inventor
罗惟正
陈宥宏
钟舜宇
Original Assignee
财团法人交大思源基金会
Application filed by 财团法人交大思源基金会
Priority to PCT/CN2018/102897
Publication of WO2020041998A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning

Definitions

  • The present invention relates to systems and methods for establishing a prediction model and obtaining a prediction result, and in particular to a system and method for establishing an optimized prediction model based on machine learning and obtaining a prediction result.
  • Artificial intelligence (AI).
  • Machine learning is part of artificial intelligence.
  • The purpose of machine learning is to give computers the ability to learn.
  • For a computer to be able to identify and judge, it must use existing data in two procedures: training and prediction. The overall procedure includes obtaining data, analyzing the data, building models, and predicting future outcomes.
  • If a machine learning mechanism can automate and modularize data acquisition, feature value selection, algorithm parameter determination, algorithm integration, and accuracy optimization, it will greatly improve the efficiency, convenience, and accuracy of machine learning training.
  • The invention provides a system and method for establishing an optimized prediction model based on machine learning and obtaining a prediction result, in which an automated and modular design is used for machine learning model training and prediction, yielding a more efficient machine learning training procedure and more accurate predictions.
  • The method for establishing an optimized prediction model based on machine learning includes the following steps: a) a user provides training data having a data format, and selects several machine learning algorithms to be used, an operation value, and a target prediction value; b) a conversion program converts the data format of the training data into a relay format to obtain formatted raw data, and the several machine learning algorithms are set with a first feature value and parameter setting group; c) the data values of the formatted raw data are divided into a sub-training set and a sub-test set; d) a first sub-prediction model is established from the several machine learning algorithms and the data values in the sub-training set; e) the data values in the sub-test set are substituted into the first sub-prediction model, and a first accuracy is obtained through several prediction algorithms; f) if the data values of the formatted raw data have all been used as the sub-training set and the sub-test set, or the number of repetitions satisfies the calculation …
  • The system for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention includes at least a storage unit and a processing unit.
  • The storage unit stores training data having a data format, and several machine learning algorithms.
  • The processing unit is coupled to the storage unit and is configured to perform the following method steps: a) receiving an operation value and a target prediction value; b) using a conversion program to convert the data format of the training data into a relay format to obtain formatted raw data, and setting the machine learning algorithms with a first feature value and parameter setting group; c) dividing the data values of the formatted raw data into a sub-training set and a sub-test set; d) establishing a first sub-prediction model from the several machine learning algorithms and the data values in the sub-training set; e) substituting the data values in the sub-test set into the first sub-prediction model and obtaining a first accuracy through several prediction algorithms; f) if the data values of the formatted raw data have been used as the sub- …
  • Step h further includes: h1) storing the (n + 1)th feature value and parameter setting group in a temporary data storage area; and h2) if the number of repetitions meets the operation value, selecting the highest accuracy from the temporary data storage area and resetting the several machine learning algorithms.
  • Step c further includes: c1) after the data values of the formatted raw data are divided into a training set and a test set, dividing the data values of the training set into the sub-training set and the sub-test set. Step g further includes: g1) establishing the first prediction model from the several machine learning algorithms and the data values of the training set; g2) substituting the data values of the test set into the first prediction model and obtaining a first test accuracy through the several prediction algorithms; and g3) replacing the first accuracy with the first test accuracy.
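Step c1) describes a two-level split: the formatted raw data are first divided into a training set and a test set, and the training set is then divided again into a sub-training set and a sub-test set. A minimal Python sketch, assuming a shuffled hold-out split; the function name `nested_split` and the 20% / 25% fractions are illustrative assumptions, not values from the text:

```python
import random

def nested_split(rows, test_fraction=0.2, sub_test_fraction=0.25, seed=0):
    """Hypothetical helper: returns (sub_training, sub_test, test).

    The training set is sub_training + sub_test; the test set is held out
    until the final check in step g2).
    """
    rng = random.Random(seed)
    shuffled = list(rows)
    rng.shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    training, test = shuffled[n_test:], shuffled[:n_test]  # c1: training / test
    n_sub_test = int(len(training) * sub_test_fraction)
    sub_training, sub_test = training[n_sub_test:], training[:n_sub_test]
    return sub_training, sub_test, test
```

With 100 rows and the default fractions this yields 60 sub-training, 20 sub-test and 20 test rows, and every row lands in exactly one of the three sets.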
  • Step a further includes selecting a classification sample balance cardinality (n) to be used; and step d further includes: d1) the several machine learning algorithms load the data values of the sub-training set and divide them into multiple sampling categories, where the several machine learning algorithms have different sampling categories; d2) sampling the balanced cardinality (n) of data values from each of the sampling categories to establish a sample combination.
  • In step d2), the classification sample balance cardinality may be sampled repeatedly (with replacement); d3) establishing a first sample prediction model from the data values in the sample combination; d4) repeating steps d2) and d3) until the operation value (t) is reached, to obtain several sample prediction models, and merging the several sample prediction models into the first sub-prediction model.
  • Step e further includes: eap1) obtaining several first sample accuracies from each of the several prediction algorithms; and eap2) selecting, in a voting mode or an average mode, the one of the several first sample accuracies with the highest confidence index as the first prediction result.
  • Step e further includes: e1) comparing the first accuracy with a known result to obtain a first accuracy index; and step f further includes: f1) modifying the n-th feature value and parameter setting group according to the n-th accuracy and the n-th accuracy index.
  • The accuracy index includes accuracy (all correctly predicted samples / total samples), AUC (Area Under the Receiver Operating Characteristic Curve), and MCC (Matthews Correlation Coefficient).
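The three accuracy indices can be computed as follows. This is a minimal pure-Python sketch for a binary task with labels 0 and 1 (an assumption; the text does not restrict the number of categories). AUC uses the rank-based definition: the probability that a randomly chosen positive sample scores higher than a randomly chosen negative one, with ties counted as one half. Returning 0.0 for MCC when its denominator is zero is a common convention, not something the text specifies.

```python
import math

def accuracy(y_true, y_pred):
    """Correctly predicted samples / total samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)

def mcc(y_true, y_pred):
    """Matthews correlation coefficient for binary labels 0/1."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom

def auc(y_true, scores):
    """ROC AUC: chance that a random positive outscores a random negative (ties = 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

For `y_true = [1, 0, 1, 1, 0]`, `y_pred = [1, 0, 0, 1, 1]` and `scores = [0.9, 0.2, 0.4, 0.8, 0.6]`, these give an accuracy of 0.6, an MCC of 1/6 and an AUC of 5/6.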
  • In step b, several conversion programs are repeatedly compared against the data format, and the corresponding conversion program is selected.
  • The data format is a CSV file or a plain text file.
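Step b) selects a conversion program by checking the input against each available format in turn. The sketch below assumes, purely for illustration, that the relay format is a dict of column names and numeric rows and that the format is recognized by file extension; the text specifies neither, and all function names are hypothetical.

```python
import csv
import io

def csv_to_relay(text):
    """Hypothetical converter: comma-separated text with a header row."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    rows = [[float(v) for v in row] for row in reader if row]
    return {"columns": header, "rows": rows}

def plain_text_to_relay(text):
    """Hypothetical converter: whitespace-separated text with a header row."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header, body = lines[0], lines[1:]
    return {"columns": header, "rows": [[float(v) for v in row] for row in body]}

# Each entry pairs a format check (here: file suffix) with its conversion program.
CONVERTERS = [(".csv", csv_to_relay), (".txt", plain_text_to_relay)]

def to_relay_format(name, text):
    """Compare the input against each conversion program and use the first match."""
    for suffix, convert in CONVERTERS:
        if name.lower().endswith(suffix):
            return convert(text)
    raise ValueError(f"no conversion program for {name!r}")
```

Keeping the converters in a list means a new input format only needs a new entry, which matches the modular intent of steps S3004a to S3004n.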
  • The method for obtaining an optimized prediction result includes: a) a user provides data to be predicted having a data format, and selects an optimized prediction model and several prediction algorithms to be used; b) a conversion program converts the data format of the data to be predicted into a relay format to obtain formatted raw data; and c) the data values in the formatted raw data are substituted into the optimized prediction model, and an optimized prediction result and an optimized accuracy index are obtained through the several prediction algorithms.
  • The system for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention includes at least a storage unit and a processing unit.
  • The storage unit stores data to be predicted having a data format, an optimized prediction model, and several prediction algorithms.
  • The processing unit is coupled to the storage unit and is configured to perform the following method steps: a) selecting the optimized prediction model and the prediction algorithms; b) using a conversion program to convert the data format of the data to be predicted into a relay format to obtain formatted raw data; and c) substituting the data values in the formatted raw data into the optimized prediction model, and obtaining an optimized prediction result and an optimized accuracy index through the several prediction algorithms.
  • Step a further includes: a1) selecting an operation value; and step c further includes: c1) taking the formatted raw data as the first formatted data to be predicted, substituting its data values into the optimized prediction model, and obtaining a first prediction result through the prediction algorithms; c2) combining the n-th formatted data to be predicted with the n-th prediction result to obtain the (n + 1)th formatted data to be predicted, repeating step c1) until the number of repetitions satisfies the operation value, and providing the (n + 1)th prediction result as the optimized prediction result.
  • Step c1 further includes: c1p1) obtaining a first accuracy through the several prediction algorithms and comparing the first accuracy with a known result to obtain a first accuracy index; and step c2 further includes: c2p1) providing the (n + 1)th accuracy index as the optimized accuracy index.
  • The accuracy indices include accuracy, AUC, and MCC.
  • The above methods of the present invention may exist in the form of program code.
  • When the program code is loaded and executed by a machine, the machine becomes an apparatus for practicing the present invention.
  • FIG. 1 is a schematic diagram showing a system for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention.
  • FIG. 2 is a flowchart showing a method for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention.
  • FIG. 3 is a flowchart showing a method for establishing an optimized prediction model based on machine learning according to another embodiment of the present invention.
  • FIG. 4 is a flowchart showing a method for automatic feature value selection and machine learning algorithm parameter optimization according to an embodiment of the present invention.
  • FIGS. 5A and 5B are flowcharts showing a method for modularly establishing a prediction model according to an embodiment of the present invention.
  • FIGS. 6A and 6B are flowcharts showing a balanced data sampling mode and a random forest prediction model training method according to an embodiment of the present invention.
  • FIGS. 7A and 7B are flowcharts showing a method for optimizing prediction accuracy according to an embodiment of the present invention.
  • FIG. 8 is a schematic diagram showing a system for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention.
  • FIG. 9 is a flowchart showing a method for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention.
  • FIG. 10 is a flowchart showing a method for obtaining an optimized prediction result based on machine learning according to another embodiment of the present invention.
  • FIG. 11 is a flowchart showing an iterative prediction method according to an embodiment of the present invention.
  • FIG. 12 is a flowchart showing a modular data prediction method according to an embodiment of the present invention.
  • FIG. 13 is a flowchart showing a random forest-type data prediction method according to an embodiment of the present invention.
  • Reference signs: 1000 system for establishing an optimized prediction model based on machine learning; 1100 electronic device; 1110 data input unit; 1120 storage unit; 1122 training data; 1124 machine learning algorithms; 1130 processing unit; S2002, S2004, ...
  • FIG. 1 shows a system 1000 for building an optimized prediction model based on machine learning according to an embodiment of the present invention.
  • The system 1000 may be applied to an electronic device 1100, such as a single-core or multi-core computing device, in either a stand-alone or a cluster environment.
  • The electronic device 1100 includes a data input unit 1110, a storage unit 1120, and a processing unit 1130.
  • The data input unit 1110 may be used to receive training data.
  • The storage unit 1120 may store the training data 1122 received by the data input unit 1110 and several machine learning algorithms 1124.
  • The data format of the training data is a CSV file or a plain text file.
  • The system can also receive an advanced system configuration through the data input unit 1110 for system settings, such as the size of the random forest, the voting mechanism for prediction results, and the detailed parameters of each algorithm.
  • The processing unit 1130 controls the related software and hardware operations of the electronic device 1100 and performs the method of the present invention for establishing an optimized prediction model based on machine learning, as detailed below.
  • FIG. 2 shows a method for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention.
  • The method is applicable to the electronic device shown in FIG. 1.
  • In step S2002, training data input by a user and at least one selected machine learning algorithm are received. In some embodiments, an advanced system configuration can also be received for system setting.
  • In step S2004, the received training data are uniformly converted into the system's relay format. Because the received training data may come in different data formats, each format is converted into the relay format separately for subsequent processing.
  • In step S2006, algorithm M is performed for automatic feature value screening and machine learning algorithm parameter optimization.
  • In step S2008, algorithm O is performed to optimize the iterative prediction model.
  • In step S2010, a prediction model and the corresponding accuracy evaluation data are output. Algorithms M and O are described in detail below.
  • FIG. 3 shows a method for establishing an optimized prediction model based on machine learning according to another embodiment of the present invention.
  • The method is applicable to the electronic device shown in FIG. 1.
  • In step S3002, training data input by a user and at least one selected machine learning algorithm are received; in some embodiments, an advanced system configuration may also be received for system settings. Then, in steps S3004a, S3004b, ..., S3004n, training data in different formats are uniformly converted into the system's relay format by the conversion program corresponding to each format, and in step S3006, the training data in the relay format are output.
  • In step S3008, the feature values of the training data are combined with the adjustable parameters of the selected machine learning algorithms, which control the specific behavior of each algorithm (for example, the number of layers of an artificial neural network and the number of nodes in each layer), to form a "feature value and parameter setting group".
  • In step S3010, algorithm M is performed for automatic feature value screening and machine learning algorithm parameter optimization.
  • In step S3012, algorithm O is performed to optimize the iterative prediction model.
  • In step S3014, a prediction model and the corresponding accuracy evaluation data are output.
  • Algorithms M and O are described in detail below.
  • FIG. 4 shows an automatic feature value screening and machine learning algorithm parameter optimization method (algorithm M) according to an embodiment of the present invention.
  • In step S4002, a "feature value and parameter setting group" is obtained.
  • In step S4004, the feature values are screened programmatically and the parameters of each algorithm are adjusted.
  • After the "feature value and parameter setting group" has been adjusted programmatically, algorithm T is performed in step S4006 to establish a prediction model according to that group and test its accuracy.
  • Step S4004 may be a simple random screening and adjustment.
  • Alternatively, step S4004 may be performed using a Monte Carlo algorithm, a genetic algorithm, and/or a derivative thereof. Algorithm T is described later.
  • In step S4008, the "feature value and parameter setting group" and its accuracy data are temporarily stored.
  • In step S4010, it is determined whether the accuracy data have reached a specific standard or the number of cycles has reached an upper limit. The specific standard and the cycle limit may be system defaults or set by a user.
  • If the accuracy data have not reached the standard and the number of cycles has not reached the upper limit ("No" in step S4010), the number of cycles is increased by 1 in step S4014, and the flow returns to step S4004.
  • If the accuracy data reach the standard or the number of cycles reaches the upper limit ("Yes" in step S4010), the temporarily stored "feature value and parameter setting group" and/or its accuracy data are output in step S4012.
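Algorithm M can be sketched as a search loop with temporary storage and two stopping conditions. This version uses the simple random screening that step S4004 permits (a Monte Carlo or genetic search could replace it). The dict layout of the setting group, the function name and the `evaluate` callable standing in for algorithm T are all assumptions.

```python
import random

def algorithm_m(features, param_space, evaluate, target_accuracy, max_cycles, seed=0):
    """Random-screening sketch of algorithm M.

    features    : list of candidate feature value names
    param_space : {parameter name: list of allowed values} (hypothetical layout)
    evaluate    : stand-in for algorithm T, evaluate(setting) -> accuracy
    """
    rng = random.Random(seed)
    storage = []  # S4008: temporarily stored (setting group, accuracy) pairs
    cycle = 0
    while True:
        # S4004: randomly screen feature values and choose algorithm parameters.
        k = rng.randint(1, len(features))
        setting = {
            "features": sorted(rng.sample(features, k)),
            "params": {name: rng.choice(values) for name, values in param_space.items()},
        }
        accuracy = evaluate(setting)  # S4006: build a model and test its accuracy
        storage.append((setting, accuracy))
        # S4010: stop once the standard is met or the cycle limit is reached.
        if accuracy >= target_accuracy or cycle + 1 >= max_cycles:
            break
        cycle += 1  # S4014
    # S4012: output the stored setting group with the highest accuracy.
    return max(storage, key=lambda item: item[1])
```

For example, with `evaluate` rewarding the fraction of the informative features `{"age", "bmi"}` that a setting keeps, the loop stops as soon as a setting containing both is drawn, and `max_cycles` caps the number of evaluations otherwise.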
  • FIGS. 5A and 5B show a method for modularly building a prediction model (algorithm T) according to an embodiment of the present invention.
  • With algorithm T, a prediction model can be established by a modular program.
  • In step S5002, training data and a "feature value and parameter setting group" are obtained.
  • In step S5004, it is determined whether testing is required.
  • If testing is required ("Yes" in step S5004), the training data are divided in step S5006 into a "training set TRD" and a "test set TED".
  • Step S5006 can be implemented in different ways.
  • For example, the segmentation may use N-fold cross-validation, random grouping, or a combination of the two. These segmentation methods are only examples, and the present invention is not limited thereto.
  • In steps S5008a, S5008b, ..., S5008n, the training set TRD is put into a modular program, and a prediction model is established with each selected machine learning algorithm.
  • The modular program is implemented by algorithm AT, described below.
  • In step S5010, the prediction models of all the machine learning algorithms are integrated, and in step S5012 the integrated prediction model is tested for accuracy against the test set TED.
  • The accuracy test is implemented by algorithm P, described below.
  • In step S5014, it is determined whether all the training data have been used for model building and accuracy testing, or whether the number of cycles has reached the number of tests. The number of tests may be a system default or set by a user.
  • If not ("No" in step S5014), the number of cycles is increased by 1 in step S5016, and the flow returns to step S5006.
  • Otherwise ("Yes" in step S5014), the prediction accuracy is computed and output in step S5018, and all results are output in step S5024.
  • When testing is not required ("No" in step S5004), in steps S5020a, S5020b, ..., S5020n all the formatted raw data FOD are put into a modular program, and a prediction model is established with each selected machine learning algorithm. Again, algorithm AT serves as the modular program, as described later. Then, in step S5022, the prediction models of all the machine learning algorithms are merged, and in step S5024 the integrated prediction model is output.
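The testing branch of algorithm T can be sketched with N-fold cross-validation, one of the segmentation methods named for step S5006. Assumptions not fixed by the text: each learner is a hypothetical callable `train(rows) -> predict(x)`, rows are `(x, y)` pairs, the per-algorithm models are merged by majority vote, and the model returned after testing is refit on all rows. `n_folds` must not exceed the number of rows.

```python
from collections import Counter

def n_fold_splits(rows, n_folds):
    """Yield (TRD, TED) pairs; each row is placed in TED exactly once."""
    for k in range(n_folds):
        ted = rows[k::n_folds]
        trd = [row for i, row in enumerate(rows) if i % n_folds != k]
        yield trd, ted

def integrate(learners, rows):
    """Train one model per selected algorithm and merge them by majority vote (assumed)."""
    models = [train(rows) for train in learners]
    def predict(x):
        return Counter(model(x) for model in models).most_common(1)[0][0]
    return predict

def algorithm_t(rows, learners, n_folds=None, max_tests=None):
    """Sketch of algorithm T. Returns (integrated model, mean test accuracy or None)."""
    if not n_folds:  # "No" in S5004: build from all formatted raw data (S5020 to S5024)
        return integrate(learners, rows), None
    accuracies = []
    for cycle, (trd, ted) in enumerate(n_fold_splits(rows, n_folds)):
        if max_tests is not None and cycle >= max_tests:
            break  # S5014: the configured number of tests has been reached
        model = integrate(learners, trd)  # S5008a..n and S5010
        correct = sum(1 for x, y in ted if model(x) == y)  # S5012: test on TED
        accuracies.append(correct / len(ted))
    # S5018: mean accuracy over the tests; the returned model is refit on all rows.
    return integrate(learners, rows), sum(accuracies) / len(accuracies)
```

Because every fold serves once as TED, the loop ends naturally when "all the training data have been used", while `max_tests` models the alternative stopping rule in step S5014.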
  • FIGS. 6A and 6B illustrate a balanced data sampling mode and a random forest prediction model training method (algorithm AT) according to an embodiment of the present invention.
  • With algorithm AT, the "bias" and "overfitting" of the prediction system can be effectively reduced.
  • In step S6002, training data, a sampling number t, and a balanced sample number n are obtained. In this procedure, t samples are drawn and t sub-prediction models are established.
  • The training data are grouped by known category into category 1, category 2, ..., category n (C1, C2, ..., Cn). For example, if the correct answers are known to fall into four types (heart disease, diabetes, gout, and none of these diseases), the training data can be divided into four groups accordingly.
  • In step S6010, n pieces of data are drawn from each group randomly and with replacement to form sample s.
  • In step S6012, sample prediction model s is established from sample s.
  • In step S6014, it is determined whether the number of cycles s is less than t. If it is ("Yes" in step S6014), the flow returns to step S6008.
  • When s is not less than t ("No" in step S6014), the t sub-prediction models obtained above are combined in step S6016 into the final random forest-type prediction model, which is output in step S6018.
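The balanced sampling of algorithm AT can be sketched as follows. Drawing the same number n from every category with replacement means a rare category (for example, gout) is represented as often as a common one in each sub-sample, which is how the procedure counteracts bias. The `train` callable is a hypothetical learner, rows are assumed to be `(features, category)` pairs, and the function name is illustrative.

```python
import random
from collections import defaultdict

def algorithm_at(rows, train, t, n, seed=0):
    """Sketch of algorithm AT: t balanced, with-replacement samples, one sub-model each.

    rows  : list of (features, category) pairs (training data)
    train : hypothetical learner, train(sample_rows) -> predict(features)
    t     : sampling number (number of sub-prediction models)
    n     : balanced sample number drawn from every category
    """
    rng = random.Random(seed)
    groups = defaultdict(list)  # categories C1 ... Cn, keyed by the known answer
    for x, y in rows:
        groups[y].append((x, y))
    sub_models = []
    for s in range(t):  # S6014: repeat until t samples have been drawn
        # S6010: n random draws, repeats allowed, from every category form sample s.
        sample = [row for members in groups.values() for row in rng.choices(members, k=n)]
        sub_models.append(train(sample))  # S6012: sample prediction model s
    # S6016: together, the t sub-models form the random forest-type prediction model.
    return sub_models
```

With 90 "none" rows and 10 "gout" rows, t = 4 and n = 5, every one of the four samples contains exactly five rows of each category.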
  • FIGS. 7A and 7B show a prediction accuracy optimization method (algorithm O) according to an embodiment of the present invention.
  • With algorithm O, "iterative prediction model optimization" can be performed by an automated program.
  • In step S7002, training data are obtained and divided into a "training set TRD" and a "test set TED".
  • In step S7004, the latest generation of the "feature value and parameter setting group" is obtained.
  • In step S7006, a prediction model is established by algorithm T of FIGS. 5A and 5B and tested with the test set TED; a prediction result, such as a probability value and/or a confidence index, is calculated, and the accuracy is tested.
  • In step S7008, the "feature value and parameter setting group" and the prediction result of step S7006 are integrated into a new generation of the "feature value and parameter setting group". In other words, the prediction is added to the group as a new feature value.
  • In step S7010, the latest generation of the "feature value and parameter setting group" and its accuracy data are temporarily stored.
  • In step S7012, the count of completed generations is incremented, and in step S7014 it is determined whether the accuracy data have reached a specific standard or the number of cycles has reached the upper limit on generations. The specific standard and the generation limit may be system defaults or set by a user.
  • If the accuracy data have not reached the standard and the number of cycles has not reached the generation limit ("No" in step S7014), the number of cycles is increased by 1 in step S7016, and the flow returns to step S7004.
  • If the accuracy data reach the standard or the number of cycles reaches the generation limit ("Yes" in step S7014), the "feature value and parameter setting group" with the highest accuracy so far is output in step S7018.
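A minimal sketch of the generation loop in algorithm O, assuming each row is a `(feature_list, label)` pair, `fit` stands in for algorithm T and returns a model whose output is a probability value, and `score` computes the accuracy on TED; these names are hypothetical. Each generation's prediction is appended to every row as a new feature value before the next generation is trained. Returning the models up to the best generation, so that they can be replayed in order at prediction time, is also an assumption.

```python
def algorithm_o(trd, ted, fit, score, target_accuracy, max_generations):
    """Sketch of algorithm O (iterative prediction model optimization).

    trd, ted : lists of (feature_list, label) pairs (training set, test set)
    fit      : stand-in for algorithm T, fit(rows) -> model(features) -> probability
    score    : score(model, rows) -> accuracy
    Returns (best accuracy, list of per-generation models up to the best one).
    """
    generations = []  # S7010: temporary storage of (model, accuracy) per generation
    for _ in range(max_generations):  # S7012/S7016: generation counter
        model = fit(trd)                  # S7006: build the model ...
        accuracy = score(model, ted)      # ... and test it against TED
        generations.append((model, accuracy))
        if accuracy >= target_accuracy:   # S7014: specific standard reached
            break
        # S7008: the prediction becomes a new feature value for the next generation.
        trd = [(x + [model(x)], y) for x, y in trd]
        ted = [(x + [model(x)], y) for x, y in ted]
    # S7018: keep the generations up to the most accurate one.
    best = max(range(len(generations)), key=lambda i: generations[i][1])
    return generations[best][1], [m for m, _ in generations[: best + 1]]
```

If the first generation already meets the standard, the result is a single-generation model; otherwise the returned list grows by one model per generation that was needed.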
  • Algorithm M and algorithm O may be implemented as two steps, upstream and downstream, as in the embodiment of FIG. 3.
  • Alternatively, algorithm M and algorithm O can be integrated into a single step by nesting one within the other.
  • That is, the algorithm T step used in algorithm O is replaced with algorithm M, or the algorithm T step used in algorithm M is replaced with algorithm O.
  • FIG. 8 shows a system for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention.
  • The system 8000 may be applied to an electronic device 8100, such as a single-core or multi-core computing device, in either a stand-alone or a cluster environment.
  • The electronic device 8100 includes a data input unit 8110, a storage unit 8120, and a processing unit 8130.
  • The data input unit 8110 may be used to receive data to be predicted.
  • The storage unit 8120 may store the data to be predicted 8122 received by the data input unit 8110 and the prediction model 8124.
  • The system may also receive an advanced system configuration through the data input unit 8110 for system settings.
  • The processing unit 8130 controls the related software and hardware operations of the electronic device 8100 and performs the method of the present invention for obtaining an optimized prediction result based on machine learning, as detailed below.
  • FIG. 9 shows a method for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention.
  • The method is applicable to the electronic device shown in FIG. 8.
  • In step S9002, data to be predicted and a prediction model are received.
  • The prediction model may be generated according to the embodiment of FIG. 2 or FIG. 3.
  • An advanced system configuration can also be received for system setting.
  • In step S9004, the data to be predicted are converted into the system's relay format. Because the received data to be predicted may come in different data formats, each format is converted into the relay format separately for subsequent processing.
  • Next, algorithm IP is performed so that an automated program carries out "iterative prediction", and in step S9008 the prediction result and accuracy evaluation data are output. Algorithm IP is described later.
  • FIG. 10 shows a method for obtaining an optimized prediction result based on machine learning according to another embodiment of the present invention.
  • The method is applicable to the electronic device shown in FIG. 8.
  • In step S10002, data to be predicted input by a user and a prediction model are received.
  • The prediction model may be generated according to the embodiment of FIG. 2 or FIG. 3.
  • An advanced system configuration can also be received for system setting.
  • In steps S10004a, S10004b, ..., S10004n, data to be predicted in different formats are uniformly converted into the system's relay format by the corresponding conversion programs, and in step S10006 the converted data are obtained as "formatted data to be predicted".
  • In step S10008, the content of the prediction model is checked, and the algorithms are adapted accordingly.
  • In step S10010, algorithm IP is performed so that an automated program carries out "iterative prediction", and in step S10012 the prediction result and accuracy evaluation data are output.
  • Algorithm IP is described later.
  • FIG. 11 shows an iterative prediction method (algorithm IP) according to an embodiment of the present invention.
  • In step S11002, the data to be predicted and an iterative prediction model are obtained.
  • In step S11004, the total number of generations (g) included in the iterative prediction model is determined.
  • In step S11006, the latest generation of "data to be predicted" is obtained, and in step S11008 a prediction result is obtained from it.
  • The "data to be predicted" of the current generation can be input into algorithm P to obtain the prediction result.
  • In step S11012, it is determined whether g is greater than 0 (g > 0).
  • If g is greater than 0 ("Yes" in step S11012), the prediction result obtained in step S11008 is integrated in step S11014 into the current generation's data to be predicted as a feature value, forming the next generation of data to be predicted, and the flow returns to step S11006.
  • If g is not greater than 0 ("No" in step S11012), every generation model in the iterative prediction model has been used in sequence, and the prediction result is output in step S11016.
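Algorithm IP replays the generations in order. This sketch assumes the iterative prediction model is a list of per-generation models, each taking a feature list (each model stands in for one call to algorithm P), and that g is decremented once per generation used; the text does not show where that decrement happens. Each intermediate result is appended as a feature value, mirroring how training builds each new generation, so that generation k sees the same number of features it was trained with.

```python
def algorithm_ip(x, iterative_model):
    """Sketch of algorithm IP (iterative prediction) for one piece of data.

    x               : feature list of the formatted data to be predicted
    iterative_model : list of per-generation models (assumed representation)
    """
    g = len(iterative_model)          # S11004: total number of generations
    data = list(x)                    # copy, so the caller's list is untouched
    result = None
    for model in iterative_model:     # generations are used in order
        result = model(data)          # S11006/S11008: predict with this generation
        g -= 1                        # assumption: one generation consumed per pass
        if g > 0:                     # S11012
            data = data + [result]    # S11014: result becomes a new feature value
    return result                     # S11016: all generations used
```

For instance, with generation models `lambda d: 0.5` and `lambda d: d[0] + d[-1]`, the input `[0.25]` becomes `[0.25, 0.5]` for the second generation, which returns 0.75.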
  • FIG. 12 shows a modular data prediction method (algorithm P) according to an embodiment of the present invention.
  • In step S12002, data to be predicted and a prediction model are obtained. In some embodiments, known results for the data to be predicted may also be received.
  • In step S12004, each machine learning algorithm is adapted according to the prediction model, and in steps S12006a, S12006b, ..., S12006n the data to be predicted are put into a modular program, and prediction is performed with each of the machine learning methods originally selected.
  • The modular program may be executed using algorithm AP, explained later.
  • In step S12008, the prediction results of all the machine learning algorithms are merged.
  • The merging may average the predictions of all the machine learning algorithms used for the same piece of data.
  • In step S12010, it is determined whether known results are available to verify the prediction accuracy and whether verification is required. If not ("No" in step S12010), the prediction result is output in step S12012. If so ("Yes" in step S12010), the prediction result is compared with the known results and various accuracy indices are calculated in step S12014, and the prediction result and/or the accuracy indices are output in step S12016.
  • The accuracy indices include accuracy, AUC, and MCC.
  • FIG. 13 shows a random forest type data prediction method (algorithm AP) according to an embodiment of the present invention.
  • With this method, random-forest-type data prediction can be performed.
  • Together, algorithm AP and algorithm AT effectively reduce the "preference" and "over-adaptation" of the prediction system.
  • In step S13002, the data to be predicted and a random-forest-type prediction model are obtained, and the machine learning methods to be used are configured according to the settings in the random-forest-type prediction model.
  • In step S13004, the data to be predicted is imported into the sub-prediction programs of all the sub-models in the corresponding random forest prediction model. In steps S13006a, S13006b, ..., S13006t, each sub-prediction program in the random forest prediction model makes a prediction on the data to be predicted, yielding a prediction result and probability values. If the prediction model contains t sub-models, there are t sub-prediction programs.
  • In step S13008, it is determined whether the prediction result integration mode is a voting mode or an average mode.
  • When the integration mode is the voting mode, the category with the most votes is the prediction result, and the percentage of votes obtained by each category is its confidence index. The prediction result and the confidence indices are then output.
  • When the integration mode is the average mode, the confidence index of each category is computed for each piece of data to be predicted from the probability values of all sub-prediction programs: a category's confidence index is the average probability value of all sub-prediction programs for that category, and the category with the highest confidence index is the prediction result. The prediction result and the confidence indices are then output.
  • The method of the present invention may exist in the form of program code.
  • The program code may be contained in a physical medium, such as a floppy disk, an optical disc, a hard disk, or any other machine-readable (e.g., computer-readable) storage medium, or in a computer program product, without limitation to any particular external form.
  • When the program code is loaded and executed by a machine such as a computer, the machine becomes a device for practicing the present invention.
  • The program code may also be transmitted over a transmission medium, such as wires or cables, optical fibers, or any other form of transmission.
  • When the program code is received, loaded, and executed by a machine such as a computer, the machine becomes a device for practicing the present invention.
  • When implemented on a general-purpose processing unit, the program code combined with the processing unit provides a unique device that operates similarly to application-specific logic circuits.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Systems and methods for establishing an optimized prediction model and obtaining a prediction result based on machine learning. In the procedure for establishing the optimized prediction model, a plurality of pieces of training data input by a user and at least one machine learning algorithm selected by the user are received, and the received training data is uniformly converted into a relay format. Automatic feature value screening and machine learning algorithm parameter optimization are performed, followed by iterative prediction model optimization. A prediction model and corresponding accuracy evaluation data are then output. In the procedure for obtaining the prediction result, the data to be predicted is converted into the relay format, and iterative prediction is performed by automated programs to generate and output the prediction result and the accuracy evaluation data.

Description

System and Method for Establishing an Optimized Prediction Model and Obtaining a Prediction Result
Background
The present invention relates to a system and method for establishing a prediction model and obtaining a prediction result, and in particular to a system and method for establishing an optimized prediction model based on machine learning and obtaining a prediction result.
Background
In recent years, with the great progress of artificial intelligence (AI) technology, the application fields of AI have continuously expanded, and AI is expected to make human life more advanced and convenient.
Machine learning is a branch of artificial intelligence whose purpose is to give computers the ability to learn. For a computer to be able to identify and judge, it must use existing data in two procedures: training and prediction. The overall process includes obtaining data, analyzing data, building a model, and predicting the future.
Building a computer with artificial intelligence usually requires a high degree of expertise. For example, because operating the related software, acquiring data, and integrating algorithms are all difficult, practitioners must understand the principles of machine learning well and need good programming skills to complete the machine learning training and prediction procedures. In addition, because current model training lacks automated and modular design, feature value selection, algorithm parameter determination, algorithm integration, and accuracy optimization all depend on the experience of the practitioners. This makes the quality of the output models unstable and introduces preferences into the learning and prediction of the overall system.
In view of this, if data acquisition, feature value selection, algorithm parameter determination, algorithm integration, and accuracy optimization can be implemented with an automated and modular design within the machine learning mechanism, the efficiency, ease of use, and prediction accuracy of machine learning training can be greatly improved.
Summary of the Invention
The present invention provides a system and method for establishing an optimized prediction model based on machine learning and obtaining a prediction result, in which machine learning model training and prediction are performed with an automated and modular design, yielding a more efficient machine learning training procedure and more accurate prediction results.
In the method for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention: a) a user provides training data having a data format and selects the machine learning algorithms to be used, a computation quantity, and a target prediction value; b) a conversion program converts the data format of the training data into a relay format to obtain formatted raw data, and the machine learning algorithms are configured with a first feature value and parameter setting group; c) the data values of the formatted raw data are divided into a sub-training set and a sub-test set; d) a first sub-prediction model is built from the machine learning algorithms and the data values in the sub-training set; e) the data values in the sub-test set are substituted into the first sub-prediction model, and a first accuracy is obtained through a plurality of prediction algorithms; f) if all data values of the formatted raw data have served as both the sub-training set and the sub-test set, or the number of repetitions satisfies the computation quantity, the n-th feature value and parameter setting group is modified according to the n-th accuracy to obtain an (n+1)-th feature value and parameter setting group; otherwise, steps c) to e) are repeated; g) the machine learning algorithms are reset with the n-th feature value and parameter setting group, and a first prediction model is built from the machine learning algorithms and the data values in the formatted raw data; h) if the n-th accuracy satisfies the target prediction value or the number of repetitions satisfies the computation quantity, an n-th prediction model is provided as the optimized prediction model; otherwise, the n-th feature value and parameter setting group is modified according to the accuracy to obtain an (n+1)-th feature value and parameter setting group, which is used to configure the machine learning algorithms, and steps c) to e) are repeated; and i) the optimized prediction model and the n-th accuracy are displayed.
The system for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention includes at least a storage unit and a processing unit. The storage unit stores training data having a data format and a plurality of machine learning algorithms. The processing unit is coupled to the storage unit and is configured to perform the following steps: a) receiving a computation quantity and a target prediction value; b) converting, with a conversion program, the data format of the training data into a relay format to obtain formatted raw data, and configuring the machine learning algorithms with a first feature value and parameter setting group; c) dividing the data values of the formatted raw data into a sub-training set and a sub-test set; d) building a first sub-prediction model from the machine learning algorithms and the data values in the sub-training set; e) substituting the data values in the sub-test set into the first sub-prediction model and obtaining a first accuracy through a plurality of prediction algorithms; f) if all data values of the formatted raw data have served as both the sub-training set and the sub-test set, or the number of repetitions satisfies the computation quantity, modifying the n-th feature value and parameter setting group according to the n-th accuracy to obtain an (n+1)-th feature value and parameter setting group; otherwise, repeating steps c) to e); g) resetting the machine learning algorithms with the n-th feature value and parameter setting group, and building a first prediction model from the machine learning algorithms and the data values in the formatted raw data; h) if the n-th accuracy satisfies the target prediction value or the number of repetitions satisfies the computation quantity, providing an n-th prediction model as the optimized prediction model; otherwise, modifying the n-th feature value and parameter setting group according to the accuracy to obtain an (n+1)-th feature value and parameter setting group, configuring the machine learning algorithms with it, and repeating steps c) to e); and i) displaying the optimized prediction model and the n-th accuracy.
In some embodiments, step h) further includes: h1) storing the (n+1)-th feature value and parameter setting group in a temporary data storage area; and h2) if the number of repetitions satisfies the computation quantity, selecting the group with the highest accuracy from the temporary data storage area and resetting the machine learning algorithms with it.
In some embodiments, step c) further includes: c1) after the data values of the formatted raw data are divided into a training set and a test set, dividing the data values of the training set into the sub-training set and the sub-test set; and step g) further includes: g1) building the first prediction model from the machine learning algorithms and the data values in the training set; g2) substituting the data values in the test set into the first prediction model and obtaining a first test accuracy through the prediction algorithms; and g3) using the first test accuracy in place of the first accuracy.
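Steps c) and c1) describe a nested split: the formatted raw data is divided into a training set and a test set, and the training set is then repeatedly divided into sub-training and sub-test sets until every data value has served in both roles. The following is a minimal Python sketch, assuming a shuffled hold-out split followed by k-fold partitioning; the function names, the 20% test fraction, and k are illustrative assumptions, not values from the patent.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def holdout_split(rows: Sequence[T], test_fraction: float = 0.2,
                  seed: int = 0) -> Tuple[List[T], List[T]]:
    """Step c1 sketch: shuffle, then split into a training set and a test set."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


def k_fold(rows: Sequence[T], k: int = 5) -> List[Tuple[List[T], List[T]]]:
    """Split rows into k (sub-training set, sub-test set) pairs.

    Each row appears in exactly one sub-test fold, so after k rounds every
    data value has served in both a sub-training set and a sub-test set.
    """
    folds = [list(rows[i::k]) for i in range(k)]
    pairs = []
    for i in range(k):
        sub_train = [r for j, fold in enumerate(folds) if j != i for r in fold]
        pairs.append((sub_train, folds[i]))
    return pairs
```

A caller would build and score one sub-prediction model per `(sub_train, sub_test)` pair from `k_fold(train)`, keeping `test` aside for steps g1) to g3).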
In some embodiments, step a) further includes selecting a classification sample balance base number (n) to be used, and step d) further includes: d1) the machine learning algorithms divide the data values in the sub-training set into a plurality of sampling categories, where different machine learning algorithms may have different sampling categories; d2) sampling n data values from each of the sampling categories to build a sample combination (in some embodiments, step d2) may sample the same data values repeatedly); d3) building a first sample prediction model from the data values in the sample combination; and d4) repeating steps d2) and d3) until the computation quantity (t) is satisfied, obtaining a plurality of sample prediction models, and merging them into the first sub-prediction model.
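Steps d1) to d4) amount to class-balanced bagging: each of the t sample combinations draws the same number n of rows from every category, and one model is trained per combination. A minimal Python sketch follows, assuming that the sampling categories are the class labels and that "repeatedly sample" means sampling with replacement; `Row`, `balanced_samples`, `train_forest`, and `train_fn` are hypothetical names.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Sequence, Tuple

# Assumed row layout: (feature vector, class label).
Row = Tuple[Sequence[float], str]


def balanced_samples(rows: Sequence[Row], n: int, t: int,
                     seed: int = 0) -> List[List[Row]]:
    """Build t sample combinations, each with n rows drawn from every class.

    Sampling is with replacement, so a minority class with fewer than n rows
    still contributes n samples to every combination.
    """
    rng = random.Random(seed)
    by_class: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        by_class[row[1]].append(row)
    combos: List[List[Row]] = []
    for _ in range(t):
        combo: List[Row] = []
        for members in by_class.values():
            combo.extend(rng.choices(members, k=n))
        combos.append(combo)
    return combos


def train_forest(rows: Sequence[Row], n: int, t: int,
                 train_fn: Callable[[List[Row]], object],
                 seed: int = 0) -> List[object]:
    """Train one sample prediction model per combination; together they form the sub-prediction model."""
    return [train_fn(combo) for combo in balanced_samples(rows, n, t, seed)]
```

Balancing every combination this way keeps a majority class from dominating each individual sample model, which is one plausible reading of how the patent aims to reduce "preference".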
In some embodiments, step e) further includes: eap1) obtaining a plurality of first sample accuracies through the respective prediction algorithms; and eap2) selecting, in a voting mode or an average mode, the result with the highest confidence index among the first sample accuracies as the first prediction result.
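The two integration modes follow the rules given for the random-forest-type prediction (FIG. 13): in voting mode, a category's confidence index is its share of votes; in average mode, it is the mean probability that the sub-models assign to it. In both modes, the category with the highest confidence index is the prediction. A minimal Python sketch with hypothetical function names; ties are broken by taking the first category encountered, which the patent does not specify.

```python
from collections import Counter
from typing import Dict, List, Tuple


def integrate_vote(labels: List[str]) -> Tuple[str, Dict[str, float]]:
    """Voting mode: the most-voted class wins; each class's vote share is its confidence index."""
    counts = Counter(labels)
    confidence = {cls: c / len(labels) for cls, c in counts.items()}
    winner = max(confidence, key=confidence.get)
    return winner, confidence


def integrate_average(probs: List[Dict[str, float]]) -> Tuple[str, Dict[str, float]]:
    """Average mode: a class's confidence index is its mean probability over all sub-models."""
    classes = sorted({cls for p in probs for cls in p})
    confidence = {cls: sum(p.get(cls, 0.0) for p in probs) / len(probs)
                  for cls in classes}
    winner = max(confidence, key=confidence.get)
    return winner, confidence
```

For example, `integrate_vote(["yes", "no", "yes", "yes"])` returns `"yes"` with a confidence index of 0.75.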
In some embodiments, step e) further includes: e1) comparing the first accuracy with a known result to obtain a first accuracy index; and step f) further includes: f1) modifying the n-th feature value and parameter setting group according to the n-th accuracy and the n-th accuracy index. In some embodiments, the accuracy indices include accuracy (the number of correctly predicted samples divided by the total number of samples), AUC (Area Under the Receiver Operating Characteristic Curve), and MCC (Matthews Correlation Coefficient).
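The three accuracy indices named above can be computed directly from the predictions and the known results. Below is a minimal Python sketch for binary labels (1 = positive, 0 = negative). The binary setting and the function names are assumptions. AUC is computed in its rank form: the probability that a random positive sample scores higher than a random negative one, with ties counted as one half.

```python
import math
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Correctly predicted samples / total samples."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def mcc(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Matthews correlation coefficient; returns 0.0 when a marginal count is zero."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == 1 and p == 1 for t, p in pairs)
    tn = sum(t == 0 and p == 0 for t, p in pairs)
    fp = sum(t == 0 and p == 1 for t, p in pairs)
    fn = sum(t == 1 and p == 0 for t, p in pairs)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC via pairwise comparison; needs at least one positive and one negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

The pairwise AUC is quadratic in the number of samples. That cost is acceptable for a sketch, but a sort-based implementation scales better on large test sets.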
In some embodiments, in step b), the data format is repeatedly compared against a plurality of conversion programs, and the matching conversion program is selected.
In some embodiments, the data format is a CSV file or a plain text file.
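Step b) tries a series of conversion programs against the input until one matches, then uses it to produce the relay format. The patent does not define the relay format, so the Python sketch below assumes a hypothetical one: a header row plus data rows kept as strings. The three detectors (comma, tab, whitespace) and all function names are likewise illustrative assumptions.

```python
import csv
import io
from typing import Callable, Dict, List, Tuple

# Hypothetical relay format: column names plus data rows, all kept as strings.
Relay = Dict[str, List]


def _first_line(text: str) -> str:
    lines = [ln for ln in text.splitlines() if ln.strip()]
    return lines[0] if lines else ""


def _delimited(text: str, delimiter: str) -> Relay:
    rows = [r for r in csv.reader(io.StringIO(text), delimiter=delimiter) if r]
    return {"columns": rows[0], "rows": rows[1:]}


def _whitespace(text: str) -> Relay:
    rows = [ln.split() for ln in text.splitlines() if ln.strip()]
    return {"columns": rows[0], "rows": rows[1:]}


# Candidate conversion programs, tried in order: (name, matches?, convert).
CONVERTERS: List[Tuple[str, Callable[[str], bool], Callable[[str], Relay]]] = [
    ("csv", lambda t: "," in _first_line(t), lambda t: _delimited(t, ",")),
    ("tab-separated text", lambda t: "\t" in _first_line(t), lambda t: _delimited(t, "\t")),
    ("whitespace-separated text", lambda t: bool(_first_line(t)), _whitespace),
]


def to_relay(text: str) -> Tuple[str, Relay]:
    """Compare the input against each conversion program and use the first match."""
    for name, matches, convert in CONVERTERS:
        if matches(text):
            return name, convert(text)
    raise ValueError("no conversion program matches this data format")
```

Keeping the converters in an ordered list means that supporting a new input format only requires adding one entry; the downstream steps see only the relay format.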
In the method for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention: a) a user provides data to be predicted having a data format and selects an optimized prediction model and the prediction algorithms to be used; b) a conversion program converts the data format of the data to be predicted into a relay format to obtain formatted raw data; and c) the data values in the formatted raw data are substituted into the optimized prediction model, and an optimized prediction result and an optimized accuracy index are obtained through the prediction algorithms.
The system for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention includes at least a storage unit and a processing unit. The storage unit stores data to be predicted having a data format, an optimized prediction model, and a plurality of prediction algorithms. The processing unit is coupled to the storage unit and is configured to perform the following steps: a) selecting the optimized prediction model and the prediction algorithms; b) converting, with a conversion program, the data format of the data to be predicted into a relay format to obtain formatted raw data; and c) substituting the data values in the formatted raw data into the optimized prediction model and obtaining an optimized prediction result and an optimized accuracy index through the prediction algorithms.
In some embodiments, step a) further includes: a1) selecting a computation quantity; and step c) further includes: c1) taking the formatted raw data as first formatted raw data, substituting its data values into the optimized prediction model, and obtaining a first prediction result through the prediction algorithms; and c2) merging the n-th formatted to-be-predicted data with the n-th prediction result to obtain (n+1)-th formatted to-be-predicted data, and repeating step c1) until the number of repetitions satisfies the computation quantity, at which point the (n+1)-th prediction result is provided as the optimized prediction result.
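Steps c1) and c2) describe an iterative prediction: each generation's prediction is appended to the to-be-predicted data as a new feature before the next generation model runs. The Python sketch below assumes hypothetical names (`Model`, `iterative_predict`), that each generation model maps one feature row to one numeric prediction, and that each successive model expects a feature vector that has grown by one column. The number of generation models plays the role of the computation quantity.

```python
from typing import Callable, List, Sequence

# Hypothetical model type: one feature row -> one numeric prediction.
Model = Callable[[Sequence[float]], float]


def iterative_predict(rows: List[List[float]], generations: List[Model]) -> List[float]:
    """Apply generation models in order, feeding each prediction back as an extra feature."""
    data = [list(r) for r in rows]                      # first formatted to-be-predicted data
    preds: List[float] = []
    for model in generations:                           # one pass per generation
        preds = [model(r) for r in data]                # n-th prediction result
        data = [r + [p] for r, p in zip(data, preds)]   # merged (n+1)-th data
    return preds                                        # result of the last generation
```

For example, with a first model `lambda r: sum(r)` and a second model `lambda r: r[-1] * 2` that reads the appended feature, the rows `[[1.0, 2.0], [3.0, 4.0]]` yield `[6.0, 14.0]`.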
In some embodiments, step c1) further includes: c1p1) obtaining a first accuracy through the prediction algorithms and comparing it with a known result to obtain a first accuracy index; and step c2) further includes: c2p1) providing an (n+1)-th accuracy index as the optimized accuracy index. In some embodiments, the accuracy indices include accuracy, AUC, and MCC.
The above method of the present invention may exist in the form of program code. When the program code is loaded and executed by a machine, the machine becomes a device for practicing the present invention.
To make the above objects, features, and advantages of the present invention more comprehensible, embodiments are described in detail below with reference to the accompanying drawings.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of a system for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention;
FIG. 2 is a flowchart of a method for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention;
FIG. 3 is a flowchart of a method for establishing an optimized prediction model based on machine learning according to another embodiment of the present invention;
FIG. 4 is a flowchart of a method for automatic feature value screening and machine learning algorithm parameter optimization according to an embodiment of the present invention;
FIGS. 5A and 5B together show a flowchart of a method for modularly building a prediction model according to an embodiment of the present invention;
FIGS. 6A and 6B together show a flowchart of a balanced data sampling mode and random forest prediction model training method according to an embodiment of the present invention;
FIGS. 7A and 7B together show a flowchart of a method for optimizing prediction accuracy according to an embodiment of the present invention;
FIG. 8 is a schematic diagram of a system for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention;
FIG. 9 is a flowchart of a method for obtaining an optimized prediction result based on machine learning according to an embodiment of the present invention;
FIG. 10 is a flowchart of a method for obtaining an optimized prediction result based on machine learning according to another embodiment of the present invention;
FIG. 11 is a flowchart of an iterative prediction method according to an embodiment of the present invention;
FIG. 12 is a flowchart of a modular data prediction method according to an embodiment of the present invention;
FIG. 13 is a flowchart of a random forest type data prediction method according to an embodiment of the present invention.
Reference signs: 1000 system for establishing an optimized prediction model based on machine learning; 1100 electronic device; 1110 data input unit; 1120 storage unit; 1122 training data; 1124 machine learning algorithms; 1130 processing unit; S2002, S2004, ..., S2010 steps; S3002, S3004a, S3004b, S3004n, ..., S3014 steps; S4002, S4004, ..., S4012 steps; S5002, S5004, ..., S5024 steps; S6002, S6004, ..., S6018 steps; C1, C2, Cn categories; S7002, S7004, ..., S7018 steps; TRD training set; TED test set; 8000 system for obtaining an optimized prediction result based on machine learning; 8100 electronic device; 8110 data input unit; 8120 storage unit; 8122 data to be predicted; 8124 prediction model; 8130 processing unit; S9002, S9004, ..., S9008 steps; S10002, S10004a, S10004b, S10004n, ..., S10012 steps; S11002, S11004, ..., S11016 steps; S12002, S12004, ..., S12016 steps; S13002, S13004, ..., S13014 steps.
Detailed Description
FIG. 1 shows a system 1000 for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention. The system 1000 may be applied to an electronic device 1100, such as a single-core or multi-core computing device, in a stand-alone or cluster environment. The electronic device 1100 includes a data input unit 1110, a storage unit 1120, and a processing unit 1130. The data input unit 1110 may receive a plurality of pieces of training data. The storage unit 1120 may store the training data 1122 received by the data input unit 1110 and a plurality of machine learning algorithms 1124. In some embodiments, the data format of the training data is a CSV file or a plain text file. In addition, the system may receive an advanced system configuration through the data input unit 1110 for system settings, such as the size of the random forest, the voting mechanism for prediction results, and the detailed parameters of each algorithm. The processing unit 1130 controls the related software and hardware operations in the electronic device 1100 and performs the method of the present invention for establishing an optimized prediction model based on machine learning, as detailed below.
FIG. 2 shows a method for establishing an optimized prediction model based on machine learning according to an embodiment of the present invention. The method is applicable to the electronic device shown in FIG. 1.
First, in step S2002, a plurality of pieces of training data input by a user and at least one selected machine learning algorithm are received. In some embodiments, an advanced system configuration may also be received for system settings. Next, in step S2004, the received training data is uniformly converted into the relay format of the system. Because the received training data may have different data formats, training data in each format is converted separately into the relay format for subsequent processing. Then, in step S2006, algorithm M is performed for automatic feature value screening and machine learning algorithm parameter optimization. In step S2008, algorithm O is performed for iterative prediction model optimization. Finally, in step S2010, a prediction model and the corresponding accuracy evaluation data are output. Algorithm M and algorithm O are described in detail below.
FIG. 3 shows a method for establishing an optimized prediction model based on machine learning according to another embodiment of the present invention. This method is also applicable to the electronic device shown in FIG. 1.
First, in step S3002, a plurality of pieces of training data input by a user and at least one selected machine learning algorithm are received. Similarly, in some embodiments, an advanced system configuration may also be received for system settings. Next, in steps S3004a, S3004b, ..., S3004n, conversion programs corresponding to the different formats uniformly convert the training data into the relay format of the system, and in step S3006, training data in the relay format, called "formatted raw data", is produced. Next, in step S3008, the feature values of the training data are combined with the adjustable parameters of the selected machine learning algorithms to form a "feature value and parameter setting group". The adjustable parameters control the specific behavior of each algorithm, such as the number of layers of an artificial neural network and the number of nodes in each layer. Then, in step S3010, algorithm M is performed for automatic feature value screening and machine learning algorithm parameter optimization. In step S3012, algorithm O is performed for iterative prediction model optimization. Finally, in step S3014, a prediction model and the corresponding accuracy evaluation data are output. Algorithm M and algorithm O are likewise described in detail below.
FIG. 4 shows a method for automatic feature value screening and machine learning algorithm parameter optimization (algorithm M) according to an embodiment of the present invention. In this embodiment, "feature value screening" and "algorithm parameter optimization" are performed by an automated program.
In step S4002, a "feature value and parameter setting group" is obtained. In step S4004, the feature values are screened and the parameters of each algorithm are adjusted programmatically; in other words, the program adjusts the "feature value and parameter setting group". In step S4006, Algorithm T is executed to build a prediction model from the "feature value and parameter setting group" and to test its accuracy. In some embodiments, step S4004 may be simple random screening and adjustment; in other embodiments, step S4004 may use a Monte Carlo algorithm, a genetic algorithm, and/or a derivative thereof. Algorithm T is described later. Next, in step S4008, the "feature value and parameter setting group" and the corresponding accuracy data are temporarily stored. In step S4010, it is determined whether the accuracy data has reached a specific standard or whether the cycle count has reached an upper limit. The specific standard and the cycle limit may be system defaults or set by a user. If neither condition is met (NO in step S4010), the cycle count is incremented by 1 in step S4014 and the flow returns to step S4004. If either condition is met (YES in step S4010), the temporarily stored "feature value and parameter setting group" and/or the corresponding accuracy data are output in step S4012.
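The loop of steps S4002 to S4014 can be sketched in Python as a plain random search, the simplest variant the text allows for step S4004. This is a hypothetical illustration, not the patented implementation: the dictionary layout of the setting group, the 50% feature-keep probability, and the `evaluate` callback (standing in for Algorithm T) are all assumptions.

```python
import random
from typing import Callable, Dict, List, Optional, Tuple

# Hypothetical layout of a "feature value and parameter setting group".
Setting = Dict[str, object]


def algorithm_m(
    features: List[str],
    param_space: Dict[str, List[object]],
    evaluate: Callable[[Setting], float],
    target_accuracy: float,
    max_cycles: int,
    seed: Optional[int] = None,
) -> Tuple[Setting, float, List[Tuple[Setting, float]]]:
    """Random screening of features and parameters (steps S4002-S4014).

    `evaluate` stands in for Algorithm T: it builds a model from a setting
    group and returns that model's accuracy.
    """
    rng = random.Random(seed)
    history: List[Tuple[Setting, float]] = []  # temporary store (S4008)
    for _ in range(max_cycles):  # cycle counter (S4014) and upper limit (S4010)
        # S4004: keep each feature with probability 0.5 (never an empty set)...
        chosen = [f for f in features if rng.random() < 0.5] or [rng.choice(features)]
        # ...and draw one candidate value for every hyperparameter.
        params = {name: rng.choice(values) for name, values in param_space.items()}
        setting: Setting = {"features": chosen, "params": params}
        accuracy = evaluate(setting)  # S4006: run Algorithm T
        history.append((setting, accuracy))
        if accuracy >= target_accuracy:  # S4010: specific standard reached
            break
    # S4012: output the best stored setting group and its accuracy.
    best_setting, best_accuracy = max(history, key=lambda item: item[1])
    return best_setting, best_accuracy, history
```

A genetic-algorithm variant would instead derive new candidates by mutating and recombining the best setting groups stored so far, rather than drawing each one independently.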
FIGS. 5A and 5B show a method (Algorithm T) for building a prediction model modularly according to an embodiment of the present invention. In this embodiment, the prediction model can be built by a modular program.
In step S5002, training data and a "feature value and parameter setting group" are obtained. In step S5004, it is determined whether testing is required. If testing is required (YES in step S5004), the training data is split in step S5006 into a "training set TRD" and a "test set TED". Step S5006 can be implemented in different ways: in some embodiments, the split may use N-fold cross-validation, random grouping, or a combination of the two. These splitting methods are examples only, and the present invention is not limited to them. In steps S5008a, S5008b, ..., S5008n, the training set TRD is fed into a modular program, which builds a prediction model for each selected machine learning algorithm. Algorithm AT implements this modular program and is described below. Then, in step S5010, the prediction models of all the machine learning algorithms are merged, and in step S5012, an accuracy test is performed on the merged prediction model using the test set TED. Algorithm P implements this accuracy test and is described below. Next, in step S5014, it is determined whether all of the training data has been used for model building and accuracy testing, or whether the cycle count has reached a set number of tests. The number of tests may be a system default or set by a user. If neither condition is met (NO in step S5014), the cycle count is incremented by 1 in step S5016 and the flow returns to step S5006. If either condition is met (YES in step S5014), a prediction accuracy is computed and output in step S5018, and the merged prediction model is output in step S5024. If testing is not required (NO in step S5004), then in steps S5020a, S5020b, ..., S5020n, all of the formatted raw data FOD is fed into the modular program, which builds a prediction model for each selected machine learning algorithm; here too, Algorithm AT serves as the modular program. Then, in step S5022, the prediction models of all the machine learning algorithms are merged, and in step S5024, the merged prediction model is output.
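As a rough illustration of steps S5006 to S5024, the Python sketch below performs an N-fold split (one of the splitting options named above), trains one model per algorithm on each training set TRD, merges them, and scores the merged model on each test set TED. The two toy learners, the majority-vote merge rule, and refitting the final merged model on all of the data are assumptions made for this example; in the patent, Algorithm AT and Algorithm P fill these roles.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

Row = Sequence[float]
Model = Callable[[Row], str]
Learner = Callable[[List[Row], List[str]], Model]


def centroid_learner(X: List[Row], y: List[str]) -> Model:
    """Toy learner: predict the class whose mean vector is nearest."""
    centroids: Dict[str, List[float]] = {}
    for label in set(y):
        rows = [r for r, t in zip(X, y) if t == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]

    def predict(row: Row) -> str:
        return min(centroids, key=lambda c: sum((a - b) ** 2 for a, b in zip(row, centroids[c])))

    return predict


def majority_learner(X: List[Row], y: List[str]) -> Model:
    """Toy learner: always predict the most frequent training label."""
    label = Counter(y).most_common(1)[0][0]
    return lambda row: label


def merge(models: List[Model]) -> Model:
    """S5010 (assumed rule): majority vote; ties go to the earliest model,
    because Counter orders equal counts by first occurrence."""
    return lambda row: Counter(m(row) for m in models).most_common(1)[0][0]


def algorithm_t(X: List[Row], y: List[str], learners: List[Learner], n_folds: int) -> Tuple[Model, float]:
    correct = 0
    for k in range(n_folds):  # S5014/S5016: loop until every row has been tested once
        test_idx = set(range(k, len(X), n_folds))  # S5006: fold k becomes TED
        tr_X = [r for i, r in enumerate(X) if i not in test_idx]
        tr_y = [t for i, t in enumerate(y) if i not in test_idx]
        merged = merge([learn(tr_X, tr_y) for learn in learners])  # S5008a..n, S5010
        correct += sum(merged(X[i]) == y[i] for i in test_idx)  # S5012
    accuracy = correct / len(X)  # S5018: pooled cross-validated accuracy
    final_model = merge([learn(X, y) for learn in learners])  # S5024 (assumed: refit on all rows)
    return final_model, accuracy
```

Listing the stronger learner first matters here only because of the tie-breaking rule; a real merge step would more likely average probabilities, as Algorithm P does.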
FIGS. 6A and 6B show a balanced data sampling scheme and a random-forest-style prediction model training method (Algorithm AT) according to an embodiment of the present invention. This embodiment can effectively reduce the "bias" and "overfitting" of the prediction system.
In step S6002, the training data, a sampling count t, and a class-balancing sample size n are obtained. In this procedure, sampling is performed t times and t sub-prediction models are built. In step S6004, the training data is grouped by known category into category 1, category 2, and so on (C1, C2, ...). For example, if the correct answers are known to fall into four categories (heart disease, diabetes, gout, and none of these diseases), the training data can be divided into four groups according to the correct answer. In step S6006, the cycle count s is initialized to 0 (s = 0), and in step S6008, s is incremented by 1 (s = s + 1). In step S6010, n records are drawn randomly, with replacement, from each group, together forming a sample s, and in step S6012, a sub-prediction model s is built from sample s. In step S6014, it is determined whether s is less than t. If so (YES in step S6014), the flow returns to step S6008. Otherwise (NO in step S6014), the t sub-prediction models obtained above are merged into the final random-forest-style prediction model in step S6016, and that model is output in step S6018.
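Steps S6004 to S6016 amount to class-balanced bootstrap sampling. The minimal Python sketch below assumes that records are feature lists with a parallel list of class labels and that `learner` is any function that fits a sub-model to one sample; both are illustrative assumptions. Drawing "randomly, with replacement" is done with `random.Random.choices`.

```python
import random
from collections import defaultdict
from typing import Callable, Dict, List, Optional, Sequence, TypeVar

M = TypeVar("M")  # whatever object the (hypothetical) learner returns as a sub-model


def balanced_forest(
    X: List[Sequence[float]],
    y: List[str],
    t: int,
    n: int,
    learner: Callable[[List[Sequence[float]], List[str]], M],
    seed: Optional[int] = None,
) -> List[M]:
    rng = random.Random(seed)
    groups: Dict[str, List[Sequence[float]]] = defaultdict(list)
    for row, label in zip(X, y):  # S6004: group records by known category
        groups[label].append(row)
    sub_models: List[M] = []
    for _ in range(t):  # S6006-S6014: s runs from 1 to t
        sample_X: List[Sequence[float]] = []
        sample_y: List[str] = []
        for label, rows in groups.items():  # S6010: n draws per category, with replacement
            sample_X.extend(rng.choices(rows, k=n))
            sample_y.extend([label] * n)
        sub_models.append(learner(sample_X, sample_y))  # S6012: sub-prediction model s
    return sub_models  # S6016: the t sub-models form the random-forest-style model
```

Because every sample contains exactly n records per category, a category that is rare in the training data carries the same weight in each sub-model as a common one, which is how the scheme counters bias toward majority categories.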
FIGS. 7A and 7B show a prediction accuracy optimization method (Algorithm O) according to an embodiment of the present invention. In this embodiment, "iterative prediction model optimization" can be performed by an automated program.
In step S7002, training data is obtained and split into a "training set TRD" and a "test set TED". In step S7004, the latest generation of the "feature value and parameter setting group" is obtained. In step S7006, a prediction model is built according to the test set TED and Algorithm T of the embodiment of FIG. 5; prediction results, such as probability values and/or confidence indices, are calculated; and the accuracy is tested. Next, in step S7008, the "feature value and parameter setting group" and the prediction results of step S7006 are combined to form a new generation of the "feature value and parameter setting group"; in other words, the predicted data can be added to the "feature value and parameter setting group" as new feature values. In step S7010, the latest generation of the "feature value and parameter setting group" and its accuracy data are temporarily stored. Then, in step S7012, the number of completed generations is incremented by 1, and in step S7014, it is determined whether the accuracy data has reached a specific standard or whether the cycle count has reached an upper limit on the number of generations. The specific standard and the generation limit may be system defaults or set by a user. If neither condition is met (NO in step S7014), the cycle count is incremented by 1 in step S7016 and the flow returns to step S7004. If either condition is met (YES in step S7014), the "feature value and parameter setting group" with the highest accuracy so far is output in step S7018.
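The generation loop of Algorithm O resembles iterative stacking: each generation's predictions become an extra feature column for the next. In this hypothetical Python sketch, `train_fn` stands in for Algorithm T and returns a scoring function plus its test accuracy; the function keeps every generation's model so that the chain up to the best generation can later be replayed in order. The names and data layout are assumptions.

```python
from typing import Callable, List, Tuple

Rows = List[List[float]]
Scorer = Callable[[List[float]], float]
# Hypothetical stand-in for Algorithm T: (train_X, train_y, test_X, test_y) -> (scorer, accuracy)
TrainFn = Callable[[Rows, List[int], Rows, List[int]], Tuple[Scorer, float]]


def algorithm_o(
    train_X: Rows, train_y: List[int], test_X: Rows, test_y: List[int],
    train_fn: TrainFn, target: float, max_generations: int,
) -> Tuple[List[Scorer], float, List[Tuple[Scorer, float]]]:
    cur_train = [list(r) for r in train_X]
    cur_test = [list(r) for r in test_X]
    history: List[Tuple[Scorer, float]] = []  # S7010: one entry per generation
    for _ in range(max_generations):  # S7012/S7016: generation counter and limit
        model, acc = train_fn(cur_train, train_y, cur_test, test_y)  # S7006
        history.append((model, acc))
        if acc >= target:  # S7014: specific standard reached
            break
        # S7008: append this generation's prediction as a new feature value.
        cur_train = [r + [model(r)] for r in cur_train]
        cur_test = [r + [model(r)] for r in cur_test]
    best = max(range(len(history)), key=lambda i: history[i][1])  # S7018
    chain = [m for m, _ in history[: best + 1]]  # models needed to replay generation `best`
    return chain, history[best][1], history
```

Appending in-sample predictions to the training rows, as done here for brevity, can leak label information; stacking implementations commonly use out-of-fold predictions instead.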
It should be noted that, in some embodiments, Algorithm M and Algorithm O may be implemented as two steps, one upstream and one downstream, as in the embodiment of FIG. 3. In other embodiments, Algorithm M and Algorithm O may be integrated into a single step by nesting one within the other, for example by replacing the Algorithm T step used in Algorithm O with Algorithm M, or by replacing the Algorithm T step used in Algorithm M with Algorithm O.
FIG. 8 shows a system for obtaining optimized prediction results based on machine learning according to an embodiment of the present invention. The system 8000 may be applied to an electronic device 8100, such as a single-core or multi-core computing device, in either a standalone or a clustered environment. The electronic device 8100 includes a data input unit 8110, a storage unit 8120, and a processing unit 8130. The data input unit 8110 may receive data to be predicted. The storage unit 8120 may store the to-be-predicted data 8122 received by the data input unit 8110, as well as a prediction model 8124. In some embodiments, the system may also receive an advanced system configuration through the data input unit 8110 for configuring the system. The processing unit 8130 may control the operation of the related software and hardware in the electronic device 8100 and perform the machine-learning-based method of the present invention for obtaining optimized prediction results, the details of which are described below.
FIG. 9 shows a machine-learning-based method for obtaining optimized prediction results according to an embodiment of the present invention. The method is applicable to the electronic device shown in FIG. 8.
First, in step S9002, the data to be predicted and a prediction model are received. In some embodiments, the prediction model may be generated according to the embodiment of FIG. 2 or FIG. 3, and an advanced system configuration may also be received for configuring the system. In step S9004, the data to be predicted is converted into the system's relay format. Because the received data may come in different formats, data in each format is converted into the relay format separately for subsequent processing. Then, in step S9006, Algorithm IP is executed to perform "iterative prediction" by an automated program, and in step S9008, the prediction results and accuracy evaluation data are output. Algorithm IP is described later.
FIG. 10 shows a machine-learning-based method for obtaining optimized prediction results according to another embodiment of the present invention. This method is also applicable to the electronic device shown in FIG. 8.
First, in step S10002, data to be predicted, input by a user, and a prediction model are received. In some embodiments, the prediction model may be generated according to the embodiment of FIG. 2 or FIG. 3, and an advanced system configuration may also be received for configuring the system. Next, in steps S10004a, S10004b, ..., S10004n, data to be predicted in different formats is uniformly converted into the system's relay format by format-specific conversion programs, and in step S10006, the data to be predicted in the relay format, called "formatted data to be predicted", is produced. Next, in step S10008, the content of the prediction model is checked and the algorithms are configured to match it. Then, in step S10010, Algorithm IP is executed to perform "iterative prediction" by an automated program, and in step S10012, the prediction results and accuracy evaluation data are output. Algorithm IP is described later.
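One way to realize the format-specific conversion programs of steps S10004a to S10004n is to try each converter in turn and keep the first one that recognizes the input. The Python sketch below is hypothetical: the patent does not define the relay format, so a (header, rows) pair of strings is assumed here, and only comma-separated and whitespace-separated text converters are shown.

```python
import csv
import io
from typing import Callable, List, Optional, Tuple

Relay = Tuple[List[str], List[List[str]]]  # assumed relay format: (header, rows)


def from_csv(text: str) -> Optional[Relay]:
    """Comma-separated values; declines input whose first line has no comma."""
    lines = text.splitlines()
    if not lines or "," not in lines[0]:
        return None
    rows = [r for r in csv.reader(io.StringIO(text)) if r]  # skip blank lines
    return rows[0], rows[1:]


def from_whitespace(text: str) -> Optional[Relay]:
    """Whitespace-separated plain text with a header line of at least two columns."""
    rows = [line.split() for line in text.splitlines() if line.strip()]
    if not rows or len(rows[0]) < 2:
        return None
    return rows[0], rows[1:]


CONVERTERS: List[Callable[[str], Optional[Relay]]] = [from_csv, from_whitespace]


def to_relay(text: str) -> Relay:
    for convert in CONVERTERS:  # compare the input against each conversion program
        relay = convert(text)
        if relay is not None:
            return relay
    raise ValueError("no conversion program matches this data format")
```

Ordering matters: the stricter CSV check runs first so that comma-separated input is not mistaken for single-column whitespace text.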
FIG. 11 shows an iterative prediction method (Algorithm IP) according to an embodiment of the present invention.
In step S11002, the data to be predicted and an iterative prediction model are obtained. In step S11004, the iterative prediction model is parsed to determine the total number of generations (g) it contains. In step S11006, the latest generation of the "data to be predicted" is obtained, and in step S11008, a prediction result is obtained from it. In some embodiments, the current generation of the "data to be predicted" may be fed into Algorithm P, described below, to obtain the prediction result. The model used for prediction is extracted from the iterative prediction model and must match the current data generation. Then, in step S11010, the remaining generation count is decreased by 1 (g = g - 1). In step S11012, it is determined whether g is greater than 0 (g > 0). If so (YES in step S11012), the prediction result obtained in step S11008 is merged into the current generation's data as feature values in step S11014, yielding the next generation of data to be predicted, and the flow returns to step S11006. If g is not greater than 0 (NO in step S11012), that is, once every generation model in the iterative prediction model has been used in order, the prediction result is output in step S11016.
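Replaying the generation chain at prediction time (steps S11004 to S11016) can be sketched as follows. The sketch assumes, hypothetically, that the iterative prediction model is simply an ordered list of per-generation scoring functions; in the patent, each prediction step would instead call Algorithm P.

```python
from typing import Callable, List, Sequence

Scorer = Callable[[List[float]], float]


def iterative_predict(rows: Sequence[Sequence[float]], chain: List[Scorer]) -> List[float]:
    current = [list(r) for r in rows]
    g = len(chain)  # S11004: total number of generations
    preds: List[float] = []
    for model in chain:  # each generation model is used once, in order
        preds = [model(r) for r in current]  # S11008: predict with this generation
        g -= 1  # S11010
        if g > 0:  # S11012: more generations remain
            # S11014: the prediction becomes a feature of the next generation's data
            current = [r + [p] for r, p in zip(current, preds)]
    return preds  # S11016: output of the last generation
```

The appended columns must line up with what each generation model saw during training, which is why the text requires the model to match the current data generation.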
FIG. 12 shows a modular data prediction method (Algorithm P) according to an embodiment of the present invention.
In step S12002, the data to be predicted and a prediction model are obtained. In some embodiments, known results corresponding to the data to be predicted may also be received. In step S12004, each machine learning algorithm is configured according to the prediction model, and in steps S12006a, S12006b, ..., S12006n, the data to be predicted is fed into a modular program, which makes predictions with each of the originally selected machine learning methods. In some embodiments, the modular program may be executed by Algorithm AP, described later. In step S12008, the prediction results of all the machine learning algorithms are merged; in some embodiments, merging consists of averaging the predictions that all of the machine learning algorithms produce for the same record. In step S12010, it is determined whether known results are available to verify the prediction accuracy and whether verification is requested. If not (NO in step S12010), the prediction results are output in step S12012. If both conditions hold (YES in step S12010), the prediction results are compared with the known results and various accuracy indicators are calculated in step S12014, and the prediction results and/or accuracy indicators are output in step S12016. In some embodiments, the accuracy indicators include accuracy, AUC, and MCC.
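For a binary task, the merge-by-averaging of step S12008 and the three indicators of step S12014 can be sketched in plain Python. This is an illustration under assumptions: each model is taken to return the probability of class 1, the 0.5 decision threshold is arbitrary, AUC is computed by pairwise comparison of positive and negative scores (ties count one half), and MCC is reported as 0 when its denominator is zero, a common convention.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple

ProbModel = Callable[[Sequence[float]], float]  # assumed: returns P(class = 1)


def indicators(y_true: List[int], y_pred: List[int], scores: List[float]) -> Dict[str, float]:
    tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))
    tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))
    fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))
    fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    pos = [s for s, t in zip(scores, y_true) if t == 1]
    neg = [s for s, t in zip(scores, y_true) if t == 0]
    # AUC: chance that a random positive outscores a random negative.
    wins = [1.0 if a > b else 0.5 if a == b else 0.0 for a in pos for b in neg]
    return {
        "accuracy": (tp + tn) / len(y_true),
        "auc": sum(wins) / len(wins) if wins else float("nan"),
        "mcc": (tp * tn - fp * fn) / denom if denom else 0.0,
    }


def algorithm_p(
    rows: List[Sequence[float]], models: List[ProbModel],
    known: Optional[List[int]] = None, threshold: float = 0.5,
) -> Tuple[List[int], List[float], Optional[Dict[str, float]]]:
    # S12006a..n and S12008: average the per-algorithm probabilities for each record.
    probs = [sum(m(r) for m in models) / len(models) for r in rows]
    labels = [1 if p >= threshold else 0 for p in probs]
    if known is None:  # S12010 "NO": nothing to verify against
        return labels, probs, None  # S12012
    return labels, probs, indicators(known, labels, probs)  # S12014, S12016
```

Library implementations of these metrics (for example in scikit-learn) could replace the hand-written `indicators` helper; it is spelled out here only to keep the sketch dependency-free.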
FIG. 13 shows a random-forest-style data prediction method (Algorithm AP) according to an embodiment of the present invention. Together, Algorithm AP and Algorithm AT can effectively reduce the "bias" and "overfitting" of the prediction system.
In step S13002, the data to be predicted and a random-forest-style prediction model are obtained, and the machine learning method to be used is configured according to the settings in that model. In step S13004, the data to be predicted is passed to the sub-prediction programs of all sub-models in the random-forest-style prediction model, and in steps S13006a, S13006b, ..., S13006t, each sub-prediction program makes a prediction on the data, yielding prediction results and probability values. If the prediction model has t sub-models, there are t sub-prediction programs. In step S13008, it is determined whether the result-combination mode is voting mode or average mode. In voting mode, step S13010 counts, for each record to be predicted, how many sub-prediction programs support each category; the category with the most votes is the prediction result, and each category's share of the votes is its confidence index. In average mode, step S13012 computes, for each record to be predicted, the confidence index of each category as the average of the probability values that all sub-prediction programs assign to that category; the category with the highest confidence index is the prediction result. In either mode, the prediction result and the confidence indices are then output in step S13014.
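The two result-combination modes of steps S13008 to S13012 can be contrasted with a short Python sketch. It assumes, hypothetically, that each sub-prediction program returns a dictionary mapping every category to a probability; ties are broken by alphabetical category order, which the text does not specify.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence, Tuple

SubModel = Callable[[Sequence[float]], Dict[str, float]]


def algorithm_ap(row: Sequence[float], sub_models: List[SubModel], mode: str) -> Tuple[str, Dict[str, float]]:
    outputs = [m(row) for m in sub_models]  # S13006a..t: one output per sub-model
    categories = sorted({c for out in outputs for c in out})
    if mode == "vote":  # S13010: each sub-model votes for its most probable category
        votes = Counter(max(out, key=out.get) for out in outputs)
        confidence = {c: votes[c] / len(outputs) for c in categories}
    elif mode == "average":  # S13012: mean probability per category
        confidence = {c: sum(out.get(c, 0.0) for out in outputs) / len(outputs) for c in categories}
    else:
        raise ValueError(f"unknown combination mode: {mode!r}")
    prediction = max(categories, key=lambda c: confidence[c])  # first maximum wins ties
    return prediction, confidence  # S13014
```

With three sub-models that output {"a": 0.6, "b": 0.4}, {"a": 0.55, "b": 0.45}, and {"a": 0.0, "b": 1.0}, voting picks "a" (two votes of three, confidence 2/3), while averaging picks "b" (mean probability about 0.62 versus 0.38), so the choice of mode can change the result.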
Therefore, the machine-learning-based systems and methods of the present invention for establishing an optimized prediction model and obtaining prediction results carry out machine learning model training and prediction with an automated, modular design, yielding a more efficient training procedure and more accurate prediction results.
The method of the present invention, or particular forms or portions thereof, may exist in the form of program code. The program code may be contained in a physical medium, such as a floppy disk, an optical disc, a hard disk, or any other machine-readable (e.g., computer-readable) storage medium, or in a computer program product regardless of its external form. When the program code is loaded into and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the present invention. The program code may also be transmitted over a transmission medium, such as wire or cable, optical fiber, or any other form of transmission; when the program code is received, loaded, and executed by a machine, such as a computer, the machine becomes an apparatus for practicing the present invention. When implemented on a general-purpose processing unit, the program code combines with the processing unit to provide a unique apparatus that operates analogously to application-specific logic circuits.
Although the present invention has been described above by way of preferred embodiments, these descriptions are not intended to limit the invention. Those skilled in the art may make modifications and refinements without departing from the spirit and scope of the present invention. The scope of protection of the present invention is therefore defined by the appended claims.

Claims (15)

  1. A method for establishing an optimized prediction model based on machine learning, comprising the following steps:
    a) providing, by a user, training data having a data format, and selecting a plurality of machine learning algorithms to be used, a computation amount value, and a target prediction value;
    b) converting, by a conversion program, the data format of the training data into a relay format to obtain formatted raw data, and configuring the plurality of machine learning algorithms with a first feature value and parameter setting group;
    c) dividing the data values of the formatted raw data into a sub-training set and a sub-test set;
    d) building a first sub-prediction model from the plurality of machine learning algorithms and the data values in the sub-training set;
    e) substituting the data values in the sub-test set into the first sub-prediction model and obtaining a first accuracy through a plurality of prediction algorithms;
    f) if every data value of the formatted raw data has been used in both the sub-training set and the sub-test set, or the number of repetitions satisfies the computation amount value, modifying the n-th feature value and parameter setting group according to the n-th accuracy to obtain an (n+1)-th feature value and parameter setting group; otherwise, repeating steps c) to e);
    g) resetting the plurality of machine learning algorithms with the n-th feature value and parameter setting group, and building a first prediction model from the plurality of machine learning algorithms and the data values in the formatted raw data;
    h) if the n-th accuracy meets the target prediction value or the number of repetitions satisfies the computation amount value, providing an n-th prediction model as an optimized prediction model; otherwise, modifying the n-th feature value and parameter setting group according to the accuracy to obtain an (n+1)-th feature value and parameter setting group, configuring the plurality of machine learning algorithms with it, and repeating steps c) to e); and
    i) displaying the optimized prediction model and the n-th accuracy.
  2. The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step h further comprises the following steps:
    h1) storing the (n+1)-th feature value and parameter setting group in a temporary data storage area; and
    h2) if the number of repetitions satisfies the computation amount value, selecting the entry with the highest accuracy from the temporary data storage area and resetting the plurality of machine learning algorithms with it.
  3. The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step c further comprises the following step:
    c1) dividing the data values of the formatted raw data into a training set and a test set, and then dividing the data values of the training set into the sub-training set and the sub-test set;
    and step g further comprises the following steps:
    g1) building the first prediction model from the plurality of machine learning algorithms and the data values in the training set;
    g2) substituting the data values in the test set into the first prediction model and obtaining a first test accuracy through the plurality of prediction algorithms; and
    g3) using the first test accuracy in place of the first accuracy.
  4. The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step a further comprises the following step:
    a1) selecting a class-balancing sample size (n) to be used;
    and step d further comprises the following steps:
    d1) dividing, by the plurality of machine learning algorithms, the data values in the sub-training set into a plurality of sampling categories, wherein the plurality of machine learning algorithms have different sampling categories;
    d2) sampling, from each of the plurality of sampling categories, a number of records equal to the class-balancing sample size, to form a sample combination;
    d3) building a first sample prediction model from the data values in the sample combination; and
    d4) repeating steps d2) to d3) until the computation amount value (t) is satisfied, thereby obtaining a plurality of sample prediction models, and merging the plurality of sample prediction models to form the first sub-prediction model.
  5. The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step e further comprises the following steps:
    eap1) obtaining, by the plurality of prediction algorithms, a plurality of first sample accuracies; and
    eap2) selecting, in a voting mode or an average mode, the one of the plurality of first sample accuracies with the highest confidence index as the first prediction result.
  6. The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein step e further comprises the following step:
    e1) comparing the first accuracy with a known result to obtain a first accuracy index;
    and step f further comprises the following step:
    f1) modifying the n-th feature value and parameter setting group according to the n-th accuracy and the n-th accuracy index.
  7. The method for establishing an optimized prediction model based on machine learning according to claim 6, wherein the accuracy index includes accuracy, AUC, and MCC.
  8. The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein, in step b, the data format is repeatedly compared against a plurality of conversion programs and a matching conversion program is selected.
  9. The method for establishing an optimized prediction model based on machine learning according to claim 1, wherein the data format is a CSV file or a plain text file.
  10. A method for obtaining an optimized prediction result based on machine learning, comprising the following steps:
    a) providing, by a user, data to be predicted having a data format, and selecting an optimized prediction model according to claim 1 and a plurality of prediction algorithms to be used;
    b) converting, by a conversion program, the data format of the data to be predicted into a relay format to obtain formatted raw data; and
    c) substituting the data values in the formatted raw data into the optimized prediction model, and obtaining an optimized prediction result and an optimized accuracy index through the prediction algorithms.
  11. The method for obtaining an optimized prediction result based on machine learning according to claim 10, wherein step a) further comprises the following step:
    a1) additionally selecting a computation amount value;
    and step c) further comprises:
    c1) taking the formatted raw data as first formatted raw data, substituting the data values contained in the first formatted raw data into the optimized prediction model, and obtaining a first prediction result through the plurality of prediction algorithms;
    c2) combining the nth formatted data to be predicted with the nth prediction result to obtain (n+1)th formatted data to be predicted, and repeating step c1) until the number of repetitions satisfies the computation amount value, then providing the (n+1)th prediction result as the optimized prediction result.
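Claim 11 feeds each prediction back into the input before predicting again. The sketch below is one reading of that loop and rests on three assumptions:

- The optimized model is any callable that maps a list of values to one value.
- "Combining" means appending the nth prediction to the input, as in recursive multi-step forecasting.
- The computation amount value counts the total number of model calls.

All names are hypothetical.

```python
from typing import Callable, List


def iterative_predict(model: Callable[[List[float]], float],
                      data: List[float],
                      operation_value: int) -> float:
    """Feed each prediction back into the input; operation_value model calls in total."""
    if operation_value < 1:
        raise ValueError("operation_value must be at least 1")
    current = list(data)              # first formatted data to be predicted (copy, input untouched)
    prediction = model(current)       # first prediction result (step c1)
    for _ in range(operation_value - 1):
        current = current + [prediction]  # combine nth data with nth result (step c2)
        prediction = model(current)       # (n+1)th prediction result
    return prediction
```

Take the toy trend model `lambda xs: xs[-1] + 1.0`, the input `[1.0, 2.0, 3.0]` and a computation amount value of 3. The input grows to `[1.0, 2.0, 3.0, 4.0, 5.0]`, and the returned prediction is 6.0.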
  12. The method for obtaining an optimized prediction result based on machine learning according to claim 11, wherein step c1) further comprises the following step:
    c1p1) obtaining a first accuracy through the plurality of prediction algorithms, and comparing the first accuracy with a known result to obtain a first accuracy index;
    and step c2) further comprises the following step:
    c2p1) providing an (n+1)th accuracy index as the optimized accuracy index.
  13. The method for obtaining an optimized prediction result based on machine learning according to claim 12, wherein the accuracy index includes accuracy, AUC, and MCC.
  14. A system for establishing an optimized prediction model based on machine learning, comprising:
    a storage unit configured to store training data having a data format and a plurality of machine learning algorithms; and
    a processing unit coupled to the storage unit and configured to perform the following method steps:
    a) receiving a computation amount value and a target prediction value;
    b) converting, using a conversion program, the data format of the training data into a relay format to obtain formatted raw data, and configuring the machine learning algorithms with a first feature value and parameter setting group;
    c) dividing the data values of the formatted raw data into a sub-training set and a sub-test set;
    d) building a first sub-prediction model from the plurality of machine learning algorithms and the data values contained in the sub-training set;
    e) substituting the data values contained in the sub-test set into the first sub-prediction model, and obtaining a first accuracy through the plurality of prediction algorithms;
    f) if every data value of the formatted raw data has served in both the sub-training set and the sub-test set, or the number of repetitions satisfies the computation amount value, modifying the nth feature value and parameter setting group according to the nth accuracy to obtain an (n+1)th feature value and parameter setting group; otherwise, repeating steps c) to e);
    g) resetting the plurality of machine learning algorithms with the nth feature value and parameter setting group, and building a first prediction model from the plurality of machine learning algorithms and the data values contained in the formatted raw data;
    h) if the nth accuracy satisfies the target prediction value or the number of repetitions satisfies the computation amount value, providing the nth prediction model as the optimized prediction model; otherwise, modifying the nth feature value and parameter setting group according to the accuracy to obtain an (n+1)th feature value and parameter setting group, configuring the plurality of machine learning algorithms with it, and repeating steps c) to e); and
    i) displaying the optimized prediction model and the nth accuracy.
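Claim 14 combines rotating sub-training and sub-test splits, a target accuracy and an iteration budget. The sketch below is one possible reading, not the patented implementation. It makes these substitutions:

- k-nearest neighbours stands in for the machine learning algorithms.
- The neighbour count k is the only tunable parameter; there is no feature selection.
- Fold-based cross-validation covers steps c) to f).
- The unspecified "modify according to accuracy" rule becomes a walk over a hypothetical list of candidate k values that keeps the best result so far.

Labels are assumed to be 0/1. The target prediction value and the computation amount value appear as `target` and `budget`.

```python
import math
from typing import List, Sequence


def knn_predict(train_x: List[Sequence[float]], train_y: List[int],
                x: Sequence[float], k: int) -> int:
    """Majority vote (labels 0/1) among the k nearest training rows."""
    order = sorted(range(len(train_x)), key=lambda i: math.dist(train_x[i], x))
    nearest = order[:k]
    votes = sum(train_y[i] for i in nearest)
    return 1 if 2 * votes > len(nearest) else 0


def cross_val_accuracy(xs, ys, k, folds):
    """Steps c)-f): rotate the sub-test set until every row has been in both sets."""
    n = len(xs)
    correct = 0
    for f in range(folds):
        test_idx = set(range(f, n, folds))  # sub-test set for this round
        train_x = [xs[i] for i in range(n) if i not in test_idx]
        train_y = [ys[i] for i in range(n) if i not in test_idx]
        for i in test_idx:
            correct += int(knn_predict(train_x, train_y, xs[i], k) == ys[i])
    return correct / n


def build_optimized_model(xs, ys, candidate_ks, target, budget, folds=2):
    """Steps g)-h): try settings until the target accuracy or the budget is reached."""
    if not candidate_ks:
        raise ValueError("need at least one candidate setting")
    best_k, best_acc = None, -1.0
    for n_iter, k in enumerate(candidate_ks, start=1):
        acc = cross_val_accuracy(xs, ys, k, folds)
        if acc > best_acc:
            best_k, best_acc = k, acc
        if best_acc >= target or n_iter >= budget:
            break
    chosen_k = best_k

    def model(x):
        # Refit on all formatted data with the chosen setting (step g).
        return knn_predict(xs, ys, x, chosen_k)

    return model, best_k, best_acc
```

Consider the eight-point toy set `[[0.0], [0.1], [0.2], [0.3], [1.0], [1.1], [1.2], [1.3]]` with labels `[0, 0, 0, 0, 1, 1, 1, 1]`. Here k = 1 already reaches a cross-validated accuracy of 1.0, so a target of 0.9 stops the search at the first candidate. With a budget of 1, the search likewise stops after one candidate, whatever its accuracy.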
  15. A system for obtaining an optimized prediction result based on machine learning, comprising:
    a storage unit configured to store data to be predicted having a data format, an optimized prediction model, and a plurality of prediction algorithms; and
    a processing unit coupled to the storage unit and configured to perform the following method steps:
    a) selecting the optimized prediction model and the prediction algorithms;
    b) converting, using a conversion program, the data format of the data to be predicted into a relay format to obtain formatted raw data; and
    c) substituting the data values contained in the formatted raw data into the optimized prediction model, and obtaining an optimized prediction result and an optimized accuracy index through the plurality of prediction algorithms.

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2018/102897 WO2020041998A1 (en) 2018-08-29 2018-08-29 Systems and methods for establishing optimized prediction model and obtaining prediction result

Publications (1)

Publication Number Publication Date
WO2020041998A1 (en) 2020-03-05

Family

ID=69644798

Country Status (1)

Country Link
WO (1) WO2020041998A1 (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8700550B1 (en) * 2007-11-30 2014-04-15 Intellectual Assets Llc Adaptive model training system and method
CN106779064A (en) * 2016-11-25 2017-05-31 电子科技大学 Deep neural network self-training method based on data characteristics
CN107492043A (en) * 2017-09-04 2017-12-19 国网冀北电力有限公司电力科学研究院 stealing analysis method and device
CN108108583A (en) * 2016-11-24 2018-06-01 南京理工大学 A kind of adaptive SVM approximate models parameter optimization method
CN108375808A (en) * 2018-03-12 2018-08-07 南京恩瑞特实业有限公司 Dense fog forecasting procedures of the NRIET based on machine learning

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112949948A (en) * 2021-04-28 2021-06-11 北京理工大学 Integrated learning method and system for electric vehicle power conversion demand interval prediction in time-sharing mode
CN112949948B (en) * 2021-04-28 2022-06-21 北京理工大学 Integrated learning method and system for electric vehicle power conversion demand interval prediction in time-sharing mode

Similar Documents

Publication Publication Date Title
TWI676940B (en) Machine learning based systems and methods for creating an optimal prediction model and obtaining optimal prediction results
Sawatsky et al. Partial least squares regression in the social sciences
JP6979527B2 (en) Chinese medicine production process knowledge system
WO2017143921A1 (en) Multi-sampling model training method and device
US9852390B2 (en) Methods and systems for intelligent evolutionary optimization of workflows using big data infrastructure
CN110046279A (en) Prediction technique, medium, device and the calculating equipment of video file feature
CN114169505A (en) Method and system for training yield prediction model and related equipment
WO2020041998A1 (en) Systems and methods for establishing optimized prediction model and obtaining prediction result
CN112070559A (en) State acquisition method and device, electronic equipment and storage medium
CN111552796A (en) Volume assembling method, electronic device and computer readable medium
CN114519508A (en) Credit risk assessment method based on time sequence deep learning and legal document information
CN111008299A (en) Quality evaluation method and device of voice database and computer storage medium
US20210390263A1 (en) System and method for automated decision making
Aldea et al. Managing information to support the decision making process
Najdi et al. A novel predictive modeling system to analyze students at risk of academic failure
TWI836791B (en) System and method for hearing tests
Zhu et al. Advanced crowdsourced test report prioritization based on adaptive strategy
CN115713441A (en) Teaching quality evaluation method and system based on AHP-Fuzzy algorithm and neural network
CN111654853B (en) Data analysis method based on user information
Parkhi et al. Machine Learning Based Prediction Model for College Admission
Wirawan et al. Application of data mining to prediction of timeliness graduation of students (a case study)
Poolwan et al. An Architecture for Simplified and Automated Machine Learning.
CN111695989A (en) Modeling method and platform of wind-control credit model
US20240184812A1 (en) Distributed active learning in natural language processing for determining resource metrics
US20240135455A1 (en) System and method for the dynamic allocation of funds

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 18931593; Country of ref document: EP; Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 18931593; Country of ref document: EP; Kind code of ref document: A1