WO2021139465A1 - Backward model selection method and device, and readable storage medium - Google Patents

Backward model selection method and device, and readable storage medium

Publication number
WO2021139465A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
trained
features
training
feature
Prior art date
Application number
PCT/CN2020/134736
Other languages
French (fr)
Chinese (zh)
Inventor
唐兴兴
黄启军
陈瑞钦
林冰垠
李诗琦
Original Assignee
深圳前海微众银行股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳前海微众银行股份有限公司
Publication of WO2021139465A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 - Machine learning

Definitions

  • This application relates to the artificial intelligence technology field of Fintech, and in particular to a backward model selection method, device and readable storage medium.
  • Backward selection is an important model selection strategy. Compared with adding all the features to model training, it can effectively prevent the model from overfitting.
  • However, current implementations of the backward selection mode usually require the modeler to have strong code development capabilities and can only run on a single machine. That is, the current backward selection mode sets a high threshold for modelers, and because it can only run on a single machine, the modeling time is long and the modeling efficiency is low. Therefore, the prior art has the technical problems of a high modeling threshold and low efficiency in the backward selection mode.
  • the main purpose of this application is to provide a backward model selection method, device and readable storage medium, aiming to solve the technical problems of high modeling threshold and low efficiency of backward selection mode in the prior art.
  • The present application provides a backward model selection method. The backward model selection method is applied to a server and includes:
  • calculating the first saliency corresponding to each of the features to be trained, and, based on each first saliency, removing from the features to be trained the features to be removed that meet the preset removal saliency requirement, so as to perform cyclic training on the first initial training model based on the remaining features to be trained and obtain a cyclic training model set;
  • The cyclic training model set includes one or more model elements, and each of the model elements includes a second initial training model.
  • The step of removing, from the features to be trained, the features to be removed that meet the preset removal saliency requirement, so as to perform cyclic training on the first initial training model based on the remaining features to be trained and obtain the cyclic training model set, includes:
  • The second initial training model is cyclically trained to obtain one or more of the model elements, until no feature to be removed remains among the features to be trained.
  • the step of selecting the feature to be removed from the features to be trained based on each of the first saliency and the preset removal saliency requirement includes:
  • If the target saliency of the target feature is less than the preset removal saliency threshold, it is determined that the target feature meets the preset removal saliency requirement, and the target feature is taken as the feature to be removed.
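The threshold check described above can be sketched as follows. This is a minimal illustration, not the application's implementation: the function name `select_feature_to_remove`, the feature names, and the threshold values are assumptions made for the example, and saliency is treated as a score where larger means more significant.

```python
from typing import Dict, Optional


def select_feature_to_remove(saliency: Dict[str, float], threshold: float) -> Optional[str]:
    """Return the least salient feature if its saliency is below the threshold.

    Returns None when no feature meets the removal requirement.
    """
    if not saliency:
        return None
    # The target feature is the one with the lowest saliency.
    target = min(saliency, key=saliency.get)
    # It is removed only if its saliency falls below the preset threshold.
    return target if saliency[target] < threshold else None


# Hypothetical saliency scores for three features.
saliency = {"deposits": 100.0, "card_spend": 1.0, "loans": 0.2}
print(select_feature_to_remove(saliency, threshold=0.5))  # loans
print(select_feature_to_remove(saliency, threshold=0.1))  # None
```

Returning `None` gives the caller a simple signal that the removal loop can stop.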
  • the step of calculating the first saliency corresponding to each of the features to be trained includes:
  • Each first saliency is calculated based on the Wald chi-square value and the degrees of freedom of each of the features to be trained.
  • the configuration parameter includes a training completion determination condition, and the feature to be trained includes one or more pieces of feature data;
  • the step of training a preset model to be trained based on each of the features to be trained and the configuration parameters, and obtaining a first initial training model includes:
  • If the updated preset model to be trained does not satisfy the training completion judgment condition, iterative training updates of the preset model to be trained continue until the updated preset model to be trained satisfies the training completion judgment condition.
  • the step of selecting a target training model from the first initial training model and the cyclic training model set based on the configuration parameters includes:
  • The model selection strategy includes an AUC (Area Under Curve, the area enclosed by the receiver operating characteristic curve and the coordinate axis) value and an AIC (Akaike information criterion) value;
  • If the model selection strategy is the AUC value, the AUC values of the elements in the cyclic training model set are compared, and the element corresponding to the largest AUC value is selected as the target training model;
  • If the model selection strategy is the AIC value, the AIC values of the elements in the cyclic training model set are compared, and the element corresponding to the smallest AIC value is selected as the target training model.
  • the client includes a visual interface
  • the step of generating visualization data corresponding to the target training model and feeding back the visualization data to the client includes:
  • The present application also provides a backward model selection method. The backward model selection method is applied to a client and includes:
  • receiving a model selection task and sending the configuration parameters corresponding to the model selection task to the server associated with the client, so that the server performs model selection based on the configuration parameters and the acquired features to be trained to obtain a target training model, obtains visualization data corresponding to the target training model, and sends the visualization data to the client;
  • the visualization data fed back by the server is received, and the visualization data is displayed on a preset visualization interface.
  • the present application also provides a backward model selection device, which is applied to a backward model selection device, and the backward model selection device includes:
  • the first training module is configured to receive the configuration parameters sent by the client associated with the server, acquire the features to be trained, and train a preset model to be trained based on each of the features to be trained and the configuration parameters to obtain the first initial training model;
  • the second training module is configured to calculate the first saliency corresponding to each of the features to be trained, and, based on each first saliency, remove from the features to be trained the features to be removed that meet the preset removal saliency requirement, so as to perform cyclic training on the first initial training model based on the remaining features to be trained and obtain a cyclic training model set;
  • the selection module is configured to select a target training model from the first initial training model and the cyclic training model set based on the configuration parameters;
  • the feedback module is used for generating the visualization data corresponding to the target training model, and feeding back the visualization data to the client.
  • the second training module includes:
  • the first culling sub-module is configured to select the feature to be removed among the features to be trained based on each first saliency and the preset removal saliency requirement, and to remove the feature to be removed;
  • the training sub-module is configured to train the first initial training model based on the remaining features to be trained to obtain the second initial training model;
  • the second culling sub-module is configured to calculate the second saliency of each remaining feature to be trained, and, based on each second saliency, again remove from the remaining features to be trained any other features to be removed that meet the preset removal saliency requirement;
  • the cyclic training sub-module is configured to perform cyclic training on the second initial training model based on the features to be trained that remain after the further removal, to obtain one or more of the model elements, until no feature to be removed remains among the features to be trained.
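The cooperation of the sub-modules above amounts to a backward elimination loop, sketched below. The callables `train` and `saliency_of`, the `Model` alias, the toy weights, and the rule of always keeping at least one feature are all assumptions introduced for illustration; the source does not specify these interfaces.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = Dict[str, float]  # hypothetical stand-in for a trained model


def backward_select(
    features: Sequence[str],
    train: Callable[[List[str]], Model],
    saliency_of: Callable[[Model, List[str]], Dict[str, float]],
    threshold: float,
) -> List[Tuple[List[str], Model]]:
    """Return every (feature list, model) pair produced during elimination.

    The first entry plays the role of the first initial training model; the
    later entries play the role of the cyclic training model set.
    """
    remaining = list(features)
    model = train(remaining)
    history = [(list(remaining), model)]
    while len(remaining) > 1:  # assumption: keep at least one feature
        scores = saliency_of(model, remaining)
        target = min(scores, key=scores.get)  # least salient feature
        if scores[target] >= threshold:
            break  # no feature meets the removal requirement
        remaining.remove(target)
        model = train(remaining)  # retrain on the remaining features
        history.append((list(remaining), model))
    return history


# Toy stand-ins: "training" looks up fixed weights, saliency is |weight|.
toy_weights = {"deposits": 2.0, "card_spend": 0.05, "loans": 0.4, "age": 0.01}


def toy_train(cols: List[str]) -> Model:
    return {c: toy_weights[c] for c in cols}


def toy_saliency(model: Model, cols: List[str]) -> Dict[str, float]:
    return {c: abs(model[c]) for c in cols}


history = backward_select(list(toy_weights), toy_train, toy_saliency, threshold=0.1)
for cols, _ in history:
    print(cols)
# ['deposits', 'card_spend', 'loans', 'age']
# ['deposits', 'card_spend', 'loans']
# ['deposits', 'loans']
```

Keeping the whole history, rather than only the last model, is what later allows a target model to be chosen among all candidates.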
  • the selection submodule includes:
  • the first comparison unit is configured to compare each first saliency and select the feature with the lowest saliency among the features to be trained as the target feature;
  • the second comparison unit is configured to compare the target saliency of the target feature with a preset removal saliency threshold;
  • the determining unit is configured to determine, if the target saliency is less than the preset removal saliency threshold, that the target feature meets the preset removal saliency requirement, and to use the target feature as the feature to be removed.
  • the second training module further includes:
  • the first calculation sub-module is configured to calculate the Wald chi-square value of each of the features to be trained;
  • the second calculation sub-module is configured to calculate each first saliency based on each Wald chi-square value and the degrees of freedom of each of the features to be trained.
  • the first training module includes:
  • a training update sub-module for inputting the feature data corresponding to each of the features to be trained into the preset model to be trained, so as to train and update the preset model to be trained;
  • the first judging sub-module is configured to judge whether the updated preset model to be trained satisfies the training completion judgment condition, and, if it does, to obtain the first initial training model;
  • the second judging sub-module is configured to continue iterative training updates of the preset model to be trained if the updated preset model to be trained does not satisfy the training completion judgment condition, until the updated preset model to be trained satisfies the training completion judgment condition.
  • the selection module includes:
  • the first obtaining sub-module is configured to obtain the model selection strategy in the parameter configuration, wherein the model selection strategy includes an AUC value and an AIC value;
  • the first comparison sub-module is configured to compare, if the model selection strategy is the AUC value, the AUC values of the elements in the cyclic training model set, and to select the element corresponding to the largest AUC value as the target training model;
  • the second comparison sub-module is configured to compare, if the model selection strategy is the AIC value, the AIC values of the elements in the cyclic training model set, and to select the element corresponding to the smallest AIC value as the target training model.
  • the feedback module includes:
  • the second acquisition sub-module is used to acquire the candidate feature data, selection summary data, and training process data corresponding to the backward model selection process of the target training model;
  • a generating sub-module is used to generate the visualization data corresponding to the candidate feature data, the selection summary data, and the training process data, and feed back the visualization data to the visualization interface in real time.
  • the present application also provides a backward model selection device.
  • the backward model selection device is applied to a client, and the backward selection device includes:
  • the sending module is configured to receive the model selection task and send the configuration parameters corresponding to the model selection task to the server associated with the client, so that the server performs model selection based on the configuration parameters and the acquired features to be trained to obtain a target training model, obtains visualization data corresponding to the target training model, and sends the visualization data to the client;
  • the receiving module is configured to receive the visualization data fed back by the server, and display the visualization data on a preset visualization interface.
  • the present application also provides a backward model selection device.
  • The backward model selection device includes a memory, a processor, and a program of the backward model selection method that is stored on the memory and can run on the processor. When the program of the backward model selection method is executed by the processor, the steps of the backward model selection method described above can be realized.
  • The present application also provides a readable storage medium. The readable storage medium stores a program for implementing the backward model selection method, and when the program is executed by a processor, the steps of the backward model selection method described above are implemented.
  • This application receives the configuration parameters sent by the client associated with the server, acquires the features to be trained, and trains a preset model to be trained based on each of the features to be trained and the configuration parameters to obtain the first initial training model. It then calculates the first saliency corresponding to each of the features to be trained and, based on each first saliency, removes from the features to be trained the features to be removed that meet the preset removal saliency requirement. Based on the remaining features to be trained, the first initial training model is cyclically trained to obtain a cyclic training model set. Based on the configuration parameters, the target training model is then selected from the first initial training model and the cyclic training model set, the visualization data corresponding to the target training model is generated, and the visualization data is fed back to the client.
  • That is, this application first receives the configuration parameters sent by the client associated with the server and acquires the features to be trained, and trains the preset model to be trained based on each of the features to be trained and the configuration parameters to obtain the first initial training model. It then calculates the first saliency corresponding to each of the features to be trained, removes from the features to be trained, based on each first saliency, the features to be removed that meet the preset removal saliency requirement, and cyclically trains the first initial training model based on the remaining features to be trained to obtain the cyclic training model set. Finally, based on the configuration parameters, a target training model is selected from the first initial training model and the cyclic training model set, the visualization data corresponding to the target training model is generated, and the visualization data is fed back to the client.
  • In other words, this application provides a backward-selection model selection method based on code-free, distributed, and visual modeling.
  • The user only needs to set the necessary configuration parameters and send them to the server through the client, and the server can feed back the visualization data and the result of the corresponding backward model selection process. That is, modeling is carried out over the communication connection between the client and the server, which realizes distributed modeling and, compared with backward selection performed on a single machine, improves the modeling efficiency of the backward selection mode.
  • Visual modeling is also realized, which lowers the capability threshold required of modelers and further improves the modeling efficiency of the backward selection mode.
  • The user only needs to enter the necessary model parameters in the visual interface of the client to obtain the corresponding backward model selection results, without needing strong code development ability. This realizes no-code modeling and further lowers the capability threshold required of modelers. Therefore, the technical problems of a high modeling threshold and low efficiency of the backward selection mode in the prior art are solved.
  • FIG. 1 is a schematic flowchart of the first embodiment of the backward model selection method of this application
  • FIG. 2 is a schematic diagram of a visual interface for configuring the parameters in the backward model selection method of this application;
  • FIG. 3 is a schematic flowchart of a second embodiment of the backward model selection method of this application.
  • FIG. 4 is a schematic diagram of the process of performing backward model selection in combination with the first embodiment in the second embodiment of the backward model selection method of this application;
  • FIG. 5 is a schematic flowchart of a third embodiment of a backward model selection method according to this application.
  • FIG. 6 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the application.
  • An embodiment of the present application provides a backward model selection method. The backward model selection method is applied to a server and includes:
  • Step S10 receiving configuration parameters sent by the client associated with the server and acquiring features to be trained, and training a preset model to be trained based on each of the features to be trained and the configuration parameters to obtain a first initial training model.
  • the client includes a visualization interface, and the user can configure parameters of a preset model to be trained on the visualization interface for model training, as shown in FIG.
  • Parameters such as the maximum number of iterations, the minimum convergence error, and the category weights all need to be set before model training.
  • The backward model selection mode includes a backward selection mode and a stepwise selection mode.
  • The features to be trained include one or more features, and each feature includes one or more pieces of feature data.
  • the preset model to be trained includes a logistic regression model.
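To make the configuration parameters concrete, the sketch below models them as a small data class that a client could serialize and send to the server. Every field name, default value, and the JSON encoding are assumptions for illustration; the source only names the kinds of parameters (maximum number of iterations, minimum convergence error, category weights, selection mode, and selection strategy).

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class SelectionConfig:
    """Hypothetical container for the parameters a user sets in the client."""

    max_iterations: int = 100
    min_convergence_error: float = 1e-6
    category_weights: Dict[str, float] = field(default_factory=dict)
    selection_mode: str = "backward"  # "backward" or "stepwise"
    selection_strategy: str = "AIC"   # "AUC" or "AIC"

    def validate(self) -> None:
        """Reject values that cannot describe a valid model selection task."""
        if self.max_iterations <= 0:
            raise ValueError("max_iterations must be positive")
        if self.min_convergence_error <= 0:
            raise ValueError("min_convergence_error must be positive")
        if self.selection_mode not in ("backward", "stepwise"):
            raise ValueError("unknown selection mode")
        if self.selection_strategy not in ("AUC", "AIC"):
            raise ValueError("unknown selection strategy")


config = SelectionConfig(
    max_iterations=200,
    category_weights={"0": 1.0, "1": 2.0},
    selection_strategy="AUC",
)
config.validate()
payload = json.dumps(asdict(config))  # what a client might send to the server
print(payload)
```

Validating on the client side before sending keeps malformed tasks from reaching the server.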
  • The configuration parameters sent by the client associated with the server are received and the features to be trained are acquired, and a preset model to be trained is trained based on each of the features to be trained and the configuration parameters to obtain a first initial training model. Specifically, the configuration parameters sent by the client are received, and the training completion judgment condition is extracted from the configuration parameters. Each feature to be trained is then obtained from the local database of the backward model selection server, and the feature data corresponding to each feature to be trained is input into the preset model to be trained to perform iterative training updates, until the preset model to be trained reaches the preset training completion judgment condition. The iterative training is then complete, and the updated preset model to be trained, that is, the first initial training model, is obtained. The preset training completion judgment condition includes reaching the minimum convergence error, reaching the maximum number of iterations, and so on.
  • the configuration parameters include training completion judgment conditions, and the features to be trained include one or more pieces of feature data;
  • the step of training a preset model to be trained based on each of the features to be trained and the configuration parameters, and obtaining a first initial training model includes:
  • Step S11 input the feature data corresponding to each of the features to be trained into the preset model to be trained, so as to train and update the preset model to be trained;
  • The preset model to be trained is updated once, wherein the methods for training and updating the preset model to be trained include the gradient descent method and the like.
  • Step S12 Determine whether the updated preset to-be-trained model satisfies the training completion judgment condition, and if the updated preset to-be-trained model satisfies the training completion judgment condition, obtain the first initial training model;
  • the training completion judgment condition includes reaching the minimum convergence error, reaching the maximum number of iterations, and so on.
  • Whether the updated preset model to be trained satisfies the training completion judgment condition is determined, and if it does, the first initial training model is obtained. Specifically, if the updated preset model to be trained satisfies the training completion judgment condition, the updated preset model to be trained obtained in this round of training is used as the first initial training model.
  • Step S13 If the updated preset to-be-trained model does not meet the training completion judgment condition, continue to perform iterative training updates on the preset to-be-trained model until the updated preset to-be-trained model satisfies The training completion judgment condition.
  • If the updated preset model to be trained does not satisfy the training completion judgment condition, iterative training updates of the preset model to be trained continue until it does. Specifically, failing the condition indicates that the updated preset model to be trained obtained in this round of training cannot be used as the first initial training model. The feature data corresponding to each of the features to be trained is then input into the updated preset model to be trained again, and iterative training updates continue until the updated preset model to be trained satisfies the training completion judgment condition.
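Steps S11 to S13 can be illustrated with a small logistic regression trained by batch gradient descent that stops on either completion condition. This is a sketch under stated assumptions: the function names, the learning rate, and the choice to measure the "convergence error" as the largest parameter update in an iteration are all illustrative and are not taken from the source.

```python
import math
from typing import List, Sequence, Tuple


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(
    rows: Sequence[Sequence[float]],
    labels: Sequence[int],
    max_iterations: int = 1000,
    min_convergence_error: float = 1e-6,
    learning_rate: float = 0.5,
) -> Tuple[List[float], int, str]:
    """Return (weights, iterations used, stop reason); the last weight is the intercept."""
    n, k = len(rows), len(rows[0])
    weights = [0.0] * (k + 1)
    for iteration in range(1, max_iterations + 1):
        grad = [0.0] * (k + 1)
        for x, y in zip(rows, labels):
            # zip stops after the k feature weights; the intercept is added separately.
            z = sum(w * v for w, v in zip(weights, x)) + weights[-1]
            err = sigmoid(z) - y
            for j in range(k):
                grad[j] += err * x[j]
            grad[-1] += err
        step = [learning_rate * g / n for g in grad]
        weights = [w - s for w, s in zip(weights, step)]
        # Completion condition 1: the update is smaller than the minimum error.
        if max(abs(s) for s in step) < min_convergence_error:
            return weights, iteration, "converged"
    # Completion condition 2: the maximum number of iterations was reached.
    return weights, max_iterations, "max_iterations"


rows = [[0.0], [1.0], [2.0], [3.0]]
labels = [0, 1, 0, 1]
weights, used, reason = train_logistic(rows, labels, max_iterations=10000)
print(used, reason, weights)
```

Returning the stop reason alongside the weights makes it easy to tell which of the two completion conditions ended the training.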
  • Step S20 Calculate the first saliency corresponding to each of the features to be trained, and, based on each first saliency, remove from the features to be trained the features to be removed that meet the preset removal saliency requirement, so as to perform cyclic training on the first initial training model based on the remaining features to be trained and obtain a cyclic training model set;
  • Specifically, based on each of the features to be trained and the feature data corresponding to each of the features to be trained, the Wald chi-square value of each feature to be trained is calculated by the preset Wald chi-square calculation formula. The first saliency corresponding to each feature to be trained is then calculated from each Wald chi-square value and the degrees of freedom of each feature to be trained. Based on each first saliency, the feature to be removed is found among the features to be trained and removed, and the first initial training model is then retrained based on the features to be trained that remain after the removal.
  • In step S20, the step of calculating the first saliency corresponding to each of the features to be trained includes:
  • Step S21 Calculate the Wald chi-square value of each of the features to be trained;
  • Specifically, the feature data representation matrix corresponding to each feature to be trained is substituted into the preset Wald chi-square calculation formula, and the value for each of the features is calculated in parallel.
  • Here, S is the Wald chi-square value. The feature set X to be trained includes n pieces of data, each piece of data includes k values, and X can be represented by a feature data representation matrix in which each column is a piece of data and corresponds to a feature to be trained. The model parameter obtained by training the preset model to be trained on X is β, where β is a k-dimensional vector (β_1, β_2, ..., β_{k-1}, β_k).
  • The feature set X to be trained can be divided into a first model feature set and a second model feature set, whose feature data representation matrices are X_0 and X_1 respectively. X_0 includes n pieces of data, each piece of data includes (k-t) values, and the model parameter obtained by training the preset model to be trained on X_0 is β_0, where β_0 is a (k-t)-dimensional vector (β_1, β_2, ..., β_{k-t}). X_1 includes n pieces of data, and each piece of data
  • Non-salient features are those features to be trained whose saliency is less than a preset saliency threshold. The saliency can be obtained based on the Wald chi-square value and the degree of freedom of the feature to be trained, where the degree of freedom is related to the number of values of the feature to be trained. For example, if the features to be trained include bank deposits, credit card consumption records, and loan records, then the features to be trained include 3 variables, and the degree of freedom is 2.
  • Step S22 Calculate each of the first saliency based on each of the chi-square value wald and the degrees of freedom of each of the features to be trained.
  • The first saliency can be determined based on the Pearson correlation value. When the Pearson correlation value is less than or equal to the preset Pearson correlation threshold, it is determined that the feature corresponding to the first saliency does not meet the preset removal saliency requirement, that is, the feature is significant; when the Pearson correlation value is greater than the preset Pearson correlation threshold, it is determined that the feature meets the preset removal saliency requirement, that is, the feature is not significant.
  • The degree of freedom is related to the number of pieces of feature data corresponding to the feature. For example, assuming that there are 100 different pieces of data in the feature data, the degree of freedom is 99.
  • The Pearson correlation value of each feature to be trained is calculated by a preset Pearson correlation value calculation formula, and the saliency of each feature to be trained is then calculated from each Pearson correlation value. For example, if the Pearson correlation values are 0.0001, 0.01, and 0.05, the corresponding measurement values used to determine the saliency are 100, 1, and 0.2, respectively. The larger the measurement value, the higher the saliency.
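The application's own calculation formulas are not reproduced in this text, so the following is only a textbook-style sketch of how a Wald chi-square value can be turned into a tail probability. It assumes a single coefficient with one degree of freedom, a known standard error, and the standard chi-square distribution; the function names are hypothetical, and whether this tail probability corresponds to the value the text calls the Pearson correlation value is itself an assumption.

```python
import math


def wald_chi_square(coefficient: float, standard_error: float) -> float:
    """Wald statistic for one coefficient: (estimate / standard error) squared."""
    return (coefficient / standard_error) ** 2


def chi_square_tail_df1(statistic: float) -> float:
    """Upper-tail probability of a chi-square distribution with 1 degree of freedom.

    For 1 degree of freedom the tail equals erfc(sqrt(statistic / 2)).
    """
    return math.erfc(math.sqrt(statistic / 2.0))


w = wald_chi_square(coefficient=1.0, standard_error=0.5)
print(w)                                       # 4.0
print(chi_square_tail_df1(w))                  # a little under 0.05
print(chi_square_tail_df1(3.841458820694124))  # approximately 0.05
```

A smaller tail probability means stronger evidence that the coefficient is non-zero, which is consistent with the text's example where smaller values map to larger saliency measurements.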
  • Step S30 based on the configuration parameters, select a target training model from the first initial training model and the cyclic training model set;
  • the configuration parameters include a model selection strategy.
  • Based on the configuration parameters, a target training model is selected from the first initial training model and the cyclic training model set. Specifically, based on the model selection strategy, the model that best meets the model selection strategy is selected, from the first initial training model and each element of the cyclic training model set, as the target training model.
  • the step of selecting a target training model from the first initial training model and cyclic training model set based on the configuration parameters includes:
  • Step S31 Obtain a model selection strategy in the parameter configuration, where the model selection strategy includes an AUC value and an AIC value;
  • The AUC value is a criterion for evaluating the training model, and the larger the AUC value is, the better the training model is.
  • The AUC value is the area enclosed by the coordinate axis under the ROC (receiver operating characteristic) curve, and the value of this area is never greater than 1. The ROC curve is drawn from a series of different binary classification cutoffs (cutoff values or decision thresholds), with the true positive rate (sensitivity) as the ordinate and the false positive rate (1 - specificity) as the abscissa. The AIC value is calculated based on the AIC criterion, which is a standard for measuring the goodness of fit of a statistical model.
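As a concrete illustration of the AUC value, the sketch below computes it with the pairwise-ranking formulation, which is equivalent to the area under the ROC curve: the fraction of (positive, negative) pairs in which the positive sample receives the higher score, with ties counted as half. The function name and sample data are assumptions for the example.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of scores."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in positives:
        for q in negatives:
            if p > q:
                wins += 1.0   # positive ranked above negative
            elif p == q:
                wins += 0.5   # ties count as half
    return wins / (len(positives) * len(negatives))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

The quadratic pairwise loop is chosen for clarity; a sort-based version would be preferable for large data sets.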
  • Step S32 if the model selection strategy is the AUC value, compare the AUC values of the elements in the cyclic training model set, and select the element corresponding to the largest AUC value as the target training model.
  • Specifically, if the model selection strategy is the AUC value, the AUC values are compared to obtain the maximum AUC value, and the training model corresponding to the maximum AUC value is used as the target training model, wherein the training models include the first initial training model and each element in the cyclic training model set.
  • Step S33 if the model selection strategy is the AIC value, compare the AIC values of the elements in the cyclic training model set, and select the element corresponding to the smallest AIC value as the target training model.
  • Specifically, if the model selection strategy is the AIC value, the AIC values are compared to obtain the minimum AIC value, and the training model corresponding to the minimum AIC value is used as the target training model, wherein the training models include the first initial training model and each element in the cyclic training model set.
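Steps S31 to S33 reduce to picking a maximum or a minimum over the candidate models, as sketched below. The `Candidate` class, the strategy strings, and the sample metric values are invented for illustration. The `aic` helper uses the standard definition AIC = 2k - 2 ln(L), where k is the number of parameters and L is the maximized likelihood; the source does not state which form it uses.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """Hypothetical record for one trained model and its evaluation metrics."""

    name: str
    auc: float
    aic: float


def aic(num_parameters: int, log_likelihood: float) -> float:
    """Standard Akaike information criterion: 2k - 2 ln(L)."""
    return 2 * num_parameters - 2 * log_likelihood


def select_target_model(candidates: List[Candidate], strategy: str) -> Candidate:
    """Pick the largest AUC or the smallest AIC, depending on the strategy."""
    if not candidates:
        raise ValueError("no candidate models")
    if strategy == "AUC":
        return max(candidates, key=lambda c: c.auc)
    if strategy == "AIC":
        return min(candidates, key=lambda c: c.aic)
    raise ValueError(f"unknown model selection strategy: {strategy}")


# Illustrative values only: the first initial model plus two cyclic models.
candidates = [
    Candidate("initial", auc=0.81, aic=aic(4, -120.0)),
    Candidate("cycle_1", auc=0.83, aic=aic(3, -120.5)),
    Candidate("cycle_2", auc=0.80, aic=aic(2, -125.0)),
]
print(select_target_model(candidates, "AUC").name)  # cycle_1
print(select_target_model(candidates, "AIC").name)  # cycle_1
```

With these sample numbers both strategies happen to agree; in general they can select different models, which is why the strategy is left as a configuration parameter.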
  • Step S40 Generate visualization data corresponding to the target training model, and feed back the visualization data to the client.
  • the visualization data includes candidate feature visualization data, model selection summary visualization data, and training process visualization data, where the candidate feature is a feature in the feature set to be trained
  • the model selection summary data includes summary data for model selection of the first initial training model and the model elements in the cyclic training model set.
  • Specifically, visualization data is generated for the process by which the target training model was obtained, where that process includes the feature selection process, the model training process, the model selection process, and so on. The visualization data is then fed back to the visualization interface of the client for display to the user. The feature selection process is the process of selecting features from the feature set to be trained; the model training process is the process of training the models involved, including the preset model to be trained, the first initial training model, and the model elements; and the model selection process is the process of selecting the target training model based on the preset model selection strategy.
  • The client includes a visualization interface, and the step of generating visualization data corresponding to the target training model and feeding back the visualization data to the client includes:
  • Step S41 Obtain candidate feature data, selection summary data, and training process data corresponding to the model selection process of the target training model;
  • The model selection process of the target training model includes a model iterative training process, a feature selection process, a model selection process, and so on, where the feature selection process is the process of removing the features to be removed, and the model selection process is the process of selecting the target training model based on the preset model selection strategy.
  • Specifically, the candidate feature data of the feature selection process, the selection summary data of the model selection process, and the training process data of the model iterative training process are acquired in real time.
  • Step S42 Generate visualization data corresponding to the candidate feature data, the selection summary data, and the training process data, and feed back the visualization data to the visualization interface in real time.
  • the visualization data includes graphic data, table data, and the like.
  • Specifically, the visualization data corresponding to the candidate feature data, the selection summary data, and the training process data is generated and fed back to the visualization interface in real time. The time interval of this real-time feedback can be set by the user of the backward model selection server, and the user of the client can query the visualization data in real time on the client.
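  • To make the three kinds of visualization data concrete, the following is a hedged sketch of how a server might package them for the client as table and graph data. The function name, the JSON layout, and every key are assumptions made for illustration; the application does not define a wire format.

```python
import json


def build_visualization_payload(candidate_features, selection_summary, training_process):
    """Bundle the three data groups into one JSON document (layout is hypothetical)."""
    payload = {
        # One row per candidate feature, e.g. its name and saliency.
        "candidate_features": {"type": "table", "rows": candidate_features},
        # One row per candidate model, e.g. its feature set and AUC/AIC.
        "selection_summary": {"type": "table", "rows": selection_summary},
        # One point per training iteration, e.g. the convergence error.
        "training_process": {"type": "graph", "points": training_process},
    }
    return json.dumps(payload, sort_keys=True)


# Made-up values, for illustration only.
message = build_visualization_payload(
    candidate_features=[{"feature": "income", "saliency": 0.02}],
    selection_summary=[{"model": "element_1", "aic": 118.5}],
    training_process=[{"iteration": 1, "error": 0.4}, {"iteration": 2, "error": 0.1}],
)
decoded = json.loads(message)
```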
  • In this embodiment, the configuration parameters sent by the client associated with the server are first received and the features to be trained are acquired. The preset model to be trained is trained based on each of the features to be trained and the configuration parameters to obtain the first initial training model, and the first saliency corresponding to each feature to be trained is calculated. Based on each first saliency, the features to be removed that meet the preset removal saliency requirement are removed from the features to be trained, and the first initial training model is cyclically trained based on the features to be trained that remain after removal to obtain the cyclic training model set. Based on the configuration parameters, a target training model is then selected from the first initial training model and the cyclic training model set, and visualization data corresponding to the target training model is generated and fed back to the client.
  • That is, this embodiment provides a model selection method for the backward selection mode with codeless distributed modeling and visual modeling. The user only needs to set the necessary configuration parameters and send them to the server through the client in order to receive the visualization data corresponding to the backward model selection process and the backward model selection result. Because the client and the server communicate with each other to carry out the modeling, distributed modeling is realized, which improves the modeling efficiency of the backward selection mode compared with modeling on a single machine. Because the user only needs to input the necessary model parameters in the visualization interface of the client to obtain the corresponding backward model selection results, visual modeling is realized, which lowers the skill threshold required of modelers and further improves the modeling efficiency of the backward selection mode.
  • The cyclic training model set includes one or more model elements, each of which includes a second initial training model, and the step of removing, based on each first saliency, the features to be removed that meet the preset removal saliency requirement from the features to be trained, so as to perform cyclic training on the first initial training model based on the features to be trained that remain after removal and obtain the cyclic training model set, includes:
  • Step C10 based on each of the first saliency and the preset removal saliency requirements, select the feature to be removed among the features to be trained, and remove the feature to be removed;
  • The first saliency can be determined based on the Pearson correlation value. When the Pearson correlation value is less than or equal to the preset Pearson correlation threshold, the feature corresponding to the first saliency is determined not to meet the preset removal saliency requirement, that is, the feature is significant. When the Pearson correlation value is greater than the preset Pearson correlation threshold, the feature corresponding to the first saliency is determined to meet the preset removal saliency requirement, that is, the feature is not significant.
  • Specifically, the first saliencies are compared, and the feature with the lowest saliency among the features to be trained is selected as the target feature. It is then judged whether the target feature meets the preset removal saliency requirement: if it does, the target feature is used as the feature to be removed and is removed; if it does not, the current cyclic training is ended.
  • the step of selecting the feature to be removed among the features to be trained based on each of the first saliency and the preset removal saliency requirement includes:
  • Step C11 comparing each of the first saliency, and selecting the feature with the lowest saliency among the features to be trained as the target feature;
  • Specifically, the first saliencies are compared to find the least significant of the features to be trained, that is, the feature with the highest Pearson correlation value, and that feature is selected as the target feature.
  • Step C12 comparing the target saliency of the target feature with a preset removal saliency threshold;
  • Step C13 If the target saliency is less than the preset removal saliency threshold, it is determined that the target feature meets the preset removal saliency requirement, and the target feature is taken as the feature to be removed.
  • Specifically, the target saliency of the target feature, that is, the first saliency of the target feature, is compared with the preset removal saliency threshold. If the target saliency is lower than the threshold, the target feature meets the preset removal saliency requirement, that is, the target feature is not significant, and it is used as the feature to be removed. If the target saliency is higher than or equal to the threshold, the target feature does not meet the preset removal saliency requirement, that is, the target feature is significant, and this cyclic training is ended.
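  • The choice of the feature to be removed described above can be sketched as follows. The sketch follows the text literally and treats saliency as a score for which lower means less significant; if saliency is instead expressed by a value for which larger means less significant, the `min` and the `<` comparison would both flip. The function and parameter names are hypothetical.

```python
def pick_feature_to_remove(saliency_by_feature, removal_threshold):
    """Return the feature to remove, or None if the cyclic training should end.

    saliency_by_feature maps a feature name to its saliency score
    (assumption: lower score means less significant).
    """
    if not saliency_by_feature:
        return None
    # Select the least significant feature as the target feature.
    target = min(saliency_by_feature, key=saliency_by_feature.get)
    # Compare the target saliency with the removal threshold.
    if saliency_by_feature[target] < removal_threshold:
        # Below the threshold: the target feature is the feature to be removed.
        return target
    # Otherwise every remaining feature is significant; nothing is removed.
    return None


# Made-up scores, for illustration only.
to_remove = pick_feature_to_remove({"age": 0.80, "income": 0.02, "zip": 0.30}, 0.05)
nothing = pick_feature_to_remove({"age": 0.80, "zip": 0.30}, 0.05)
```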
  • Step C20 training the first initial training model based on the eliminated features to be trained to obtain the second initial training model.
  • Specifically, the feature data of the features to be trained that remain after removal is input into the first initial training model to iteratively train and update it until the updated first initial training model satisfies the preset training completion judgment condition; the updated first initial training model is the second initial training model. The preset training completion judgment condition includes reaching the maximum number of iterations and reaching the minimum convergence error.
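  • The training completion judgment condition can be illustrated with a small loop. This is a sketch under stated assumptions: training stops as soon as either the convergence error falls to the minimum convergence error or the maximum number of iterations is reached, and `update_step` is a hypothetical callable that performs one training update and reports the resulting error. The toy update below minimises (w - 3)^2 and is not the model from this application.

```python
def train_until_done(update_step, state, max_iterations, min_convergence_error):
    """Run update_step until the error is small enough or the iteration cap is hit."""
    for iteration in range(1, max_iterations + 1):
        state, error = update_step(state)
        if error <= min_convergence_error:
            return state, iteration, "converged"
    return state, max_iterations, "max_iterations"


def toy_update(w):
    """One gradient step on (w - 3)^2 with learning rate 0.25 (toy example)."""
    new_w = w - 0.25 * 2.0 * (w - 3.0)
    # Use the size of the parameter change as the convergence error.
    return new_w, abs(new_w - w)


final_w, used_iterations, reason = train_until_done(toy_update, 0.0, 100, 1e-3)
capped_w, capped_iterations, capped_reason = train_until_done(toy_update, 0.0, 5, 1e-3)
```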
  • Step C30 Calculate the second saliency of each feature to be trained that remains after removal, and based on each second saliency, again remove from the remaining features to be trained any other features to be removed that meet the preset removal saliency requirement;
  • Specifically, the Wald chi-square value of each remaining feature to be trained is recalculated, and the second saliency of each remaining feature is calculated from the recalculated Wald chi-square value and the degrees of freedom of that feature. Based on each second saliency, it is then determined whether any remaining feature is a feature to be removed that meets the preset removal saliency requirement.
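  • The application does not spell out how the Wald chi-square value and the degrees of freedom are turned into a saliency. One common reading, assumed here, is the Wald test used in logistic regression: the statistic is the squared ratio of a coefficient to its standard error, and the p-value is the upper tail probability of a chi-square distribution. The sketch below handles only one degree of freedom per feature, for which the tail probability reduces to erfc(sqrt(W/2)); the function names are hypothetical.

```python
import math


def wald_chi_square(coefficient, standard_error):
    """Wald statistic for a single coefficient: (beta / SE)^2."""
    return (coefficient / standard_error) ** 2


def p_value_one_degree_of_freedom(wald):
    """Upper tail of a chi-square distribution with 1 degree of freedom.

    For 1 degree of freedom, P(X > w) = erfc(sqrt(w / 2)).
    """
    return math.erfc(math.sqrt(wald / 2.0))


# Made-up coefficient and standard error, for illustration only.
wald = wald_chi_square(coefficient=1.96, standard_error=1.0)
p_value = p_value_one_degree_of_freedom(wald)
```

  • Under this reading, a larger p-value indicates a less significant feature. Features with more than one degree of freedom would need a general chi-square tail function, for example from a statistics library.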
  • Step C40 Perform cyclic training on the second initial training model based on the features to be trained that remain after the repeated removal, to obtain one or more of the model elements, until no feature to be removed exists among the features to be trained.
  • Specifically, based on the features to be trained that remain after the repeated removal, the second initial training model is iteratively trained and updated until it satisfies the training completion judgment condition, and the updated second initial training model is obtained, that is, one of the model elements is obtained. The search for and removal of features to be removed is then repeated, together with the iterative training update of the cyclically updated second initial training model, to obtain one or more model elements. When no feature among the features to be trained meets the preset removal saliency requirement, this cyclic training ends and the cyclic training model set is obtained. Figure 4 is a schematic flow diagram of backward model selection in this embodiment combined with the first embodiment, in which the features in the model are the features to be trained.
  • In this embodiment, the feature to be removed is selected from the features to be trained and removed; the first initial training model is trained based on the features to be trained that remain after removal to obtain the second initial training model; the second saliency of each remaining feature to be trained is calculated, and based on each second saliency, other features to be removed that meet the preset removal saliency requirement are removed again; and the second initial training model is cyclically trained based on the features to be trained that remain after the repeated removal to obtain one or more of the model elements, until no feature to be removed exists among the features to be trained.
  • That is, in this embodiment the features to be removed are removed from the features to be trained one by one, and after each removal the model is trained and updated based on the remaining features to be trained, until no feature to be removed exists among the features to be trained and the cyclic training model set is obtained. Model selection in the backward selection mode can then be performed based on the cyclic training model set. This lays the foundation for model selection in the backward selection mode with distributed modeling and visual modeling, and thus for solving the technical problems of high threshold and low efficiency of backward selection mode modeling in the prior art.
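  • The overall loop of this embodiment can be sketched as below. To stay self-contained, the sketch takes a hypothetical `fit` callable that trains a model on a feature list and returns the model together with a saliency per feature; it refits on the remaining features rather than continuing training of the previous model, treats lower saliency as less significant, and always keeps at least one feature. All three points are assumptions of the sketch, not statements of the application.

```python
def backward_selection(features, fit, removal_threshold):
    """Return (first_initial_model, cyclic_model_set).

    fit(feature_list) -> (model, {feature: saliency}) is a hypothetical callable.
    """
    remaining = list(features)
    first_initial_model, saliency = fit(remaining)
    cyclic_model_set = []
    while len(remaining) > 1:  # assumption: never remove the last feature
        # Least significant remaining feature is the target feature.
        target = min(remaining, key=lambda f: saliency[f])
        if saliency[target] >= removal_threshold:
            break  # no feature meets the removal requirement
        remaining.remove(target)
        # Train on the remaining features and recompute the saliencies.
        model, saliency = fit(remaining)
        cyclic_model_set.append(model)
    return first_initial_model, cyclic_model_set


# A fake fit for illustration: the "model" is just the tuple of features used,
# and the saliencies are fixed made-up numbers.
FAKE_SALIENCY = {"a": 0.90, "b": 0.01, "c": 0.20, "d": 0.03}


def fake_fit(feature_list):
    return tuple(feature_list), {f: FAKE_SALIENCY[f] for f in feature_list}


first_model, model_set = backward_selection(["a", "b", "c", "d"], fake_fit, 0.05)
```

  • With the fake saliencies above, the feature `b` is removed first and `d` second, after which the least significant remaining feature is at or above the threshold and the loop ends with two model elements.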
  • the backward model selection method is applied to the client, and the backward model selection method includes:
  • Step A10 Receive a model selection task, and send configuration parameters corresponding to the model selection task to the server associated with the client, so that the server can perform model selection based on the configuration parameters to obtain a target training model, and obtain the visualization data corresponding to the target training model, so as to send the visualization data to the client;
  • The model selection task includes target model requirements, the target model requirements are determined by the configuration parameters, and the configuration parameters include parameters such as the maximum number of iterations, the minimum convergence error, and the model selection mode.
  • Specifically, a model selection task is received, and the configuration parameters corresponding to the model selection task are either matched in a preset local database or set by the user based on the model selection task. The configuration parameters are then sent to the server associated with the client, so that the server, based on the configuration parameters, trains and updates a preset initial model to obtain a model to be trained, performs cyclic training updates on the model to be trained to obtain one or more models to be selected, that is, the cyclic training model set, and selects from the models to be selected the model that conforms to the preset model selection strategy as the target training model. The server converts the process data corresponding to the target training model into the visualization data and feeds it back to the client. The visualization data includes candidate feature visualization data, model selection summary visualization data, and model training process visualization data, where the candidate features are the features to be trained and the model selection summary data is the summary data for model selection among the model elements in the cyclic training model set based on the preset model selection strategy.
  • Step A20 Receive the visualization data fed back by the server, and display the visualization data on a preset visualization interface.
  • The client can query, in real time on the preset visualization interface, the visualization data corresponding to the process data of the server; the query can be made while model selection is in progress or after model selection has completed. The client is in communication with the server.
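  • The client side of this exchange can be sketched as follows. The transport is abstracted behind a hypothetical `send_to_server` callable so the sketch runs without a network, and the configuration keys and the response layout are assumptions chosen to mirror the parameters named above; the application does not define them.

```python
import json


def build_config(max_iterations, min_convergence_error, model_selection_strategy):
    """Collect the configuration parameters for one model selection task."""
    return {
        "max_iterations": max_iterations,
        "min_convergence_error": min_convergence_error,
        "model_selection_strategy": model_selection_strategy,
    }


def run_model_selection_task(config, send_to_server, display):
    """Send the configuration, then display the visualization data fed back."""
    response_text = send_to_server(json.dumps(config))
    visualization = json.loads(response_text)
    display(visualization)
    return visualization


def fake_server(request_text):
    """Stand-in for the server: echoes the strategy with an empty summary."""
    config = json.loads(request_text)
    return json.dumps({
        "strategy": config["model_selection_strategy"],
        "selection_summary": [],
    })


shown = []  # stands in for the preset visualization interface
result = run_model_selection_task(build_config(100, 1e-4, "AIC"), fake_server, shown.append)
```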
  • In this embodiment, a model selection task is received and the configuration parameters corresponding to the model selection task are sent to the server associated with the client, so that the server performs model selection based on the configuration parameters to obtain a target training model and obtains the visualization data corresponding to the target training model to send to the client; the visualization data fed back by the server is then received and displayed on a preset visualization interface.
  • That is, this embodiment provides a model selection method with codeless distributed modeling and visual modeling. It implements distributed modeling, which improves modeling efficiency during model selection, and the model selection process places no code development requirements on users, which lowers the skill threshold for modelers. Because the server converts the process data corresponding to the target training model into visualization data and feeds it back to the client, the skill threshold is lowered further; the visualization data is also easy for modelers to read and understand, which further improves their modeling efficiency. The technical problems of high threshold and low efficiency of backward selection mode modeling in the prior art are thereby solved.
  • FIG. 6 is a schematic diagram of the device structure of the hardware operating environment involved in the solution of the embodiment of the present application.
  • the backward model selection device may include a processor 1001, such as a CPU, a memory 1005, and a communication bus 1002.
  • the communication bus 1002 is used to implement connection and communication between the processor 1001 and the memory 1005.
  • the memory 1005 may be a high-speed RAM memory, or a non-volatile memory, such as a magnetic disk memory.
  • the memory 1005 may also be a storage device independent of the aforementioned processor 1001.
  • the backward model selection device may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, a sensor, an audio circuit, a WiFi module, and so on.
  • the rectangular user interface may include a display screen (Display) and an input sub-module such as a keyboard (Keyboard), and optionally may also include a standard wired interface and a wireless interface.
  • the network interface can optionally include a standard wired interface and a wireless interface (such as a WI-FI interface).
  • the structure of the backward model selection device shown in FIG. 6 does not constitute a limitation on the backward model selection device, which may include more or fewer components than shown in the figure, a combination of certain components, or a different arrangement of components.
  • the memory 1005 which is a computer-readable storage medium, may include an operating system, a network communication module, and a backward model selection program.
  • the operating system is a program that manages and controls the hardware and software resources of the backward model selection device, and supports the operation of the backward model selection program and other software and/or programs.
  • the network communication module is used to realize the communication between the components in the memory 1005 and the communication with other hardware and software in the backward model selection system.
  • the processor 1001 is configured to execute the backward model selection program stored in the memory 1005 to implement the steps of the backward model selection method described in any one of the foregoing items.
  • the specific implementation of the backward model selection device of the present application is basically the same as the foregoing embodiments of the backward model selection method, and will not be repeated here.
  • An embodiment of the present application also provides a backward model selection device.
  • the backward model selection device is applied to a server, and the backward model selection device includes:
  • the first training module is configured to receive the configuration parameters sent by the client associated with the server and obtain the features to be trained, and to train a preset model to be trained based on each of the features to be trained and the configuration parameters, to obtain the first initial training model;
  • the second training module is used to calculate the first saliency corresponding to each of the features to be trained, and based on each first saliency, to remove from the features to be trained the features to be removed that meet the preset removal saliency requirement, so as to perform cyclic training on the first initial training model based on the features to be trained that remain after removal, to obtain a cyclic training model set;
  • a selection module for selecting a target training model from the first initial training model and a set of cyclic training models based on the configuration parameters
  • the feedback module is used for generating the visualization data corresponding to the target training model, and feeding back the visualization data to the client.
  • the second training module includes:
  • the first culling sub-module is configured to select the feature to be removed among the features to be trained based on each of the first saliency and the preset saliency removal requirement, and to remove the feature to be removed;
  • a training sub-module configured to train the first initial training model based on the eliminated features to be trained to obtain the second initial training model
  • the second culling sub-module is used to calculate the second saliency of each feature to be trained that remains after removal, and based on each second saliency, to again remove from the remaining features to be trained any other features to be removed that meet the preset removal saliency requirement;
  • the cyclic training sub-module is used to perform cyclic training on the second initial training model based on the features to be trained that remain after the repeated removal, to obtain one or more of the model elements, until no feature to be removed exists among the features to be trained.
  • the selection submodule includes:
  • the first comparison unit is configured to compare each of the first saliency, and select the feature with the lowest saliency among the features to be trained as the target feature;
  • the second comparison unit is used to compare the target saliency of the target feature with a preset removal saliency threshold;
  • the determining unit is configured to determine, if the target saliency is less than the preset removal saliency threshold, that the target feature meets the preset removal saliency requirement, and to use the target feature as the feature to be removed.
  • the second training module further includes:
  • the first calculation sub-module is used to calculate the Wald chi-square value of each of the features to be trained;
  • the second calculation sub-module is used to calculate each first saliency based on the Wald chi-square value and the degrees of freedom of each of the features to be trained.
  • the first training module includes:
  • a training update sub-module for inputting the feature data corresponding to each of the features to be trained into the preset model to be trained, so as to train and update the preset model to be trained;
  • the first judging sub-module is used to judge whether the updated preset model to be trained satisfies the training completion judgment condition, and if the updated preset model to be trained satisfies the training completion judgment condition, to obtain the first initial training model;
  • the second judging sub-module is configured to continue to perform iterative training updates on the preset model to be trained if the updated preset model to be trained does not satisfy the training completion judgment condition, until the updated preset model to be trained satisfies the training completion judgment condition.
  • the selection module includes:
  • the first obtaining sub-module is configured to obtain the model selection strategy in the configuration parameters, wherein the model selection strategy includes an AUC value and an AIC value;
  • the first comparison sub-module is configured to compare, if the model selection strategy is the AUC value, the AUC values of the elements in the cyclic training model set, and to select the element with the largest AUC value as the target training model;
  • the second comparison sub-module is used to compare, if the model selection strategy is the AIC value, the AIC values of the elements in the cyclic training model set, and to select the element with the smallest AIC value as the target training model.
  • the feedback module includes:
  • the second acquisition sub-module is used to acquire the candidate feature data, selection summary data, and training process data corresponding to the backward model selection process of the target training model;
  • a generating sub-module is used to generate the visualization data corresponding to the candidate feature data, the selection summary data, and the training process data, and feed back the visualization data to the visualization interface in real time.
  • the specific implementation of the backward model selection device of the present application is basically the same as the foregoing embodiments of the backward model selection method, and will not be repeated here.
  • an embodiment of the present application also provides a backward model selection device, the backward model selection device is applied to a client, and the backward model selection device includes:
  • the sending module is configured to receive the model selection task, and send the configuration parameters corresponding to the model selection task to the server associated with the client, so that the server performs model selection based on the configuration parameters and the acquired features to be trained, obtains a target training model, and obtains visualization data corresponding to the target training model, so as to send the visualization data to the client;
  • the receiving module is configured to receive the visualization data fed back by the server, and display the visualization data on a preset visualization interface.
  • the specific implementation of the backward model selection device of the present application is basically the same as the foregoing embodiments of the backward model selection method, and will not be repeated here.
  • the embodiments of the present application provide a readable storage medium, the readable storage medium stores one or more programs, and the one or more programs may be executed by one or more processors to implement the steps of the backward model selection method described in any one of the above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A backward model selection method and device, and a readable storage medium. The backward model selection method comprises: receiving configuration parameters sent by a client associated with a server and acquiring features to be trained, and training, on the basis of said features and the configuration parameters, a preset model to be trained to obtain a first initial training model (S10); calculating first significances corresponding to said features, and removing, on the basis of the first significances, features to be removed satisfying a preset removal significance requirement from the features to be trained, so as to perform loop training on the first initial training model on the basis of the removed features to be trained to obtain a loop training model set (S20); selecting a target training model from the first initial training model and the loop training model set on the basis of the configuration parameters (S30); and generating visual data corresponding to the target training model, and feeding back the visual data to the client (S40).

Description

Backward model selection method, device, and readable storage medium
Cross-reference to related applications
This application claims priority to Chinese patent application No. 202010024439.3, entitled "Backward model selection method, device, and readable storage medium" and filed with the Chinese Patent Office on January 9, 2020, the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of artificial intelligence in financial technology (Fintech), and in particular to a backward model selection method, device, and readable storage medium.
Background
With the continuous development of financial technology, especially Internet-based finance, more and more technologies (such as distributed computing, blockchain, and artificial intelligence) are being applied in the financial field. The financial industry in turn places higher demands on these technologies, for example on the distribution of to-do items in the financial industry.
With the continuous development of computer software and artificial intelligence, machine learning modeling is applied ever more widely. In the prior art, scenarios such as financial risk control and medical modeling usually use logistic regression models. In logistic regression modeling, the backward selection mode is an important model selection strategy: compared with training the model on all features, it can effectively prevent overfitting. However, current implementations of the backward selection mode usually require modelers to have strong coding skills and can only run on a single machine. In other words, the backward selection mode sets a high entry threshold for modelers, and because it can only run on a single machine, modeling takes a long time and is inefficient. The prior art therefore has the technical problems of a high modeling threshold and low efficiency for the backward selection mode.
Summary
The main purpose of this application is to provide a backward model selection method, device, and readable storage medium that solve the technical problems of a high modeling threshold and low efficiency of the backward selection mode in the prior art.
To achieve the above purpose, this application provides a backward model selection method applied to a server, the backward model selection method including:
receiving configuration parameters sent by a client associated with the server and acquiring features to be trained, and training a preset model to be trained based on the features to be trained and the configuration parameters to obtain a first initial training model;
calculating a first significance corresponding to each feature to be trained, and, based on the first significances, removing from the features to be trained any feature to be removed that meets a preset removal significance requirement, so as to perform loop training on the first initial training model based on the remaining features to be trained and obtain a loop training model set;
selecting a target training model from the first initial training model and the loop training model set based on the configuration parameters;
generating visualization data corresponding to the target training model, and feeding the visualization data back to the client.
In one embodiment, the loop training model set includes one or more model elements, and each model element includes a second initial training model;
the step of removing, based on the first significances, the feature to be removed that meets the preset removal significance requirement from the features to be trained, so as to perform loop training on the first initial training model based on the remaining features to be trained and obtain the loop training model set, includes:
selecting the feature to be removed from the features to be trained based on the first significances and the preset removal significance requirement, and removing the feature to be removed;
training the first initial training model based on the remaining features to be trained to obtain the second initial training model;
calculating a second significance of each remaining feature to be trained, and, based on the second significances, again removing from the remaining features to be trained any further feature to be removed that meets the preset removal significance requirement;
performing loop training on the second initial training model based on the features to be trained that remain after this further removal to obtain one or more of the model elements, until no feature to be removed exists among the features to be trained.
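The removal loop described above can be sketched as follows. This is only an illustrative outline under stated assumptions, not the implementation of this application: the names backward_selection, fit, and significances are hypothetical, the model type is left open, and the guard that keeps at least one feature is an added assumption that the text does not state.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Model = object  # placeholder: the text does not fix a concrete model type


def backward_selection(
    features: Sequence[str],
    fit: Callable[[List[str]], Model],
    significances: Callable[[Model, List[str]], Dict[str, float]],
    removal_threshold: float,
) -> List[Tuple[List[str], Model]]:
    """Return [(feature subset, model), ...].

    Entry 0 plays the role of the first initial training model; the
    remaining entries play the role of the loop training model set.
    """
    remaining = list(features)
    model = fit(remaining)
    history = [(list(remaining), model)]
    # Assumption: never remove the last remaining feature.
    while len(remaining) > 1:
        scores = significances(model, remaining)
        weakest = min(remaining, key=lambda name: scores[name])
        if scores[weakest] >= removal_threshold:
            break  # no feature meets the removal requirement any more
        remaining.remove(weakest)
        model = fit(remaining)  # retrain on the reduced feature set
        history.append((list(remaining), model))
    return history
```

Passing the training routine and the significance calculation in as callables keeps the loop independent of how a model is actually fitted, which matches the way the steps above treat training and significance calculation as separate operations.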
In one embodiment, the step of selecting the feature to be removed from the features to be trained based on the first significances and the preset removal significance requirement includes:
comparing the first significances with one another to select, from the features to be trained, the feature with the lowest significance as a target feature;
comparing the target significance of the target feature with a preset removal significance threshold;
if the target significance is less than the preset removal significance threshold, determining that the target feature meets the preset removal significance requirement, and taking the target feature as the feature to be removed.
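A minimal sketch of this two-stage comparison is given below. The function name pick_feature_to_remove and the dictionary representation of the significances are assumptions made for illustration; the significance is treated as a score for which a lower value means a less significant feature, as the wording of the steps implies.

```python
from typing import Dict, Optional


def pick_feature_to_remove(
    significance: Dict[str, float], removal_threshold: float
) -> Optional[str]:
    """Return the feature to remove, or None if no feature qualifies."""
    if not significance:
        return None
    # Stage 1: the feature with the lowest significance is the target feature.
    target = min(significance, key=significance.get)
    # Stage 2: the target is removed only if it falls below the threshold.
    if significance[target] < removal_threshold:
        return target
    return None
```

Only one feature is considered per round, so a single call removes at most one feature even when several features are below the threshold.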
In one embodiment, the step of calculating the first significance corresponding to each feature to be trained includes:
calculating the Wald chi-square value of each feature to be trained;
calculating each first significance based on the corresponding Wald chi-square value and the degrees of freedom of the corresponding feature to be trained.
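The text does not give the formulas for these two calculations. The sketch below uses the standard Wald statistic for a single coefficient, (coefficient / standard error) squared, and the upper-tail probability of the chi-square distribution. Defining the significance as one minus that tail probability is an assumption, chosen so that a lower value means a less significant feature; the function names are hypothetical.

```python
import math


def wald_chi_square(coefficient: float, standard_error: float) -> float:
    """Standard Wald statistic for one coefficient."""
    if standard_error <= 0:
        raise ValueError("standard_error must be positive")
    return (coefficient / standard_error) ** 2


def chi2_sf(x: float, df: int) -> float:
    """Upper-tail probability P(X > x) for a chi-square variable, integer df >= 1."""
    if x < 0 or df < 1:
        raise ValueError("x must be >= 0 and df must be >= 1")
    y = x / 2.0
    # Base cases: Q(1/2, y) = erfc(sqrt(y)) for odd df, Q(1, y) = exp(-y) for even df.
    if df % 2 == 1:
        a, q = 0.5, math.erfc(math.sqrt(y))
    else:
        a, q = 1.0, math.exp(-y)
    # Recurrence Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    while a < df / 2.0:
        q += y ** a * math.exp(-y) / math.gamma(a + 1.0)
        a += 1.0
    return q


def significance(coefficient: float, standard_error: float, df: int = 1) -> float:
    """Assumed definition: 1 - p-value, so lower means less significant."""
    return 1.0 - chi2_sf(wald_chi_square(coefficient, standard_error), df)
```

In practice a statistics library would normally supply the chi-square tail probability; the hand-written recurrence is used here only to keep the sketch free of third-party dependencies.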
In one embodiment, the configuration parameters include a training completion judgment condition, and each feature to be trained includes one or more pieces of feature data;
the step of training the preset model to be trained based on the features to be trained and the configuration parameters to obtain the first initial training model includes:
inputting the feature data corresponding to the features to be trained into the preset model to be trained, so as to train and update the preset model to be trained;
judging whether the updated preset model to be trained meets the training completion judgment condition, and, if it does, obtaining the first initial training model;
if the updated preset model to be trained does not meet the training completion judgment condition, continuing to iteratively train and update the preset model to be trained until the updated preset model to be trained meets the training completion judgment condition.
In one embodiment, the step of selecting the target training model from the first initial training model and the loop training model set based on the configuration parameters includes:
obtaining a model selection strategy from the configuration parameters, where the model selection strategy includes an AUC (Area Under Curve, the area enclosed by the receiver operating characteristic curve and the coordinate axis) value and an AIC (Akaike information criterion) value;
if the model selection strategy is the AUC value, comparing the AUC values of the elements in the loop training model set, and selecting the element with the largest AUC value as the target training model;
if the model selection strategy is the AIC value, comparing the AIC values of the elements in the loop training model set, and selecting the element with the smallest AIC value as the target training model.
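The selection rule can be illustrated with the short sketch below. The CandidateModel record, the strategy strings "AUC" and "AIC", and the function names are hypothetical. The helper aic uses the standard definition AIC = 2k - 2 ln(L), where k is the number of estimated parameters and L is the maximized likelihood. The candidate list may hold the first initial training model together with the elements of the loop training model set.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass(frozen=True)
class CandidateModel:
    """Hypothetical record for one trained model and its metrics."""
    features: Tuple[str, ...]
    auc: float
    aic: float


def aic(log_likelihood: float, n_parameters: int) -> float:
    """Akaike information criterion: 2k - 2 ln(L)."""
    return 2.0 * n_parameters - 2.0 * log_likelihood


def select_target_model(
    candidates: Sequence[CandidateModel], strategy: str
) -> CandidateModel:
    """Pick the largest AUC or the smallest AIC, depending on the strategy."""
    if not candidates:
        raise ValueError("no candidate models to select from")
    if strategy == "AUC":
        return max(candidates, key=lambda m: m.auc)
    if strategy == "AIC":
        return min(candidates, key=lambda m: m.aic)
    raise ValueError(f"unknown model selection strategy: {strategy!r}")
```

The two strategies can disagree: AUC rewards discrimination only, while AIC also penalizes the number of parameters, so AIC tends to favor the smaller feature subsets produced later in the loop.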
In one embodiment, the client includes a visualization interface;
the step of generating the visualization data corresponding to the target training model and feeding the visualization data back to the client includes:
acquiring candidate feature data, selection summary data, and training process data corresponding to the backward model selection process of the target training model;
generating visualization data that jointly corresponds to the candidate feature data, the selection summary data, and the training process data, and feeding the visualization data back to the visualization interface in real time.
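The text does not specify the format of the visualization data. As one possible illustration, the three kinds of data could be bundled into a single JSON document before being sent to the client. Every field name below is hypothetical and chosen only for this sketch.

```python
import json
from typing import Any, Dict, List


def build_visualization_data(
    candidate_features: List[Dict[str, Any]],
    selection_summary: List[Dict[str, Any]],
    training_process: List[Dict[str, Any]],
) -> Dict[str, Any]:
    """Bundle the three kinds of data named in the step into one structure."""
    return {
        "candidate_features": candidate_features,  # e.g. per-feature significance
        "selection_summary": selection_summary,    # e.g. which feature each round removed
        "training_process": training_process,      # e.g. per-iteration convergence error
    }


def to_json(visualization_data: Dict[str, Any]) -> str:
    """Serialize for transfer to the client; non-ASCII feature names are kept as-is."""
    return json.dumps(visualization_data, ensure_ascii=False, sort_keys=True)
```

Keeping the three parts under separate keys lets a client render each view independently as new rounds of the selection process arrive.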
To achieve the above purpose, this application further provides a backward model selection method applied to a client, the backward model selection method including:
receiving a model selection task, and sending configuration parameters corresponding to the model selection task to a server associated with the client, so that the server performs model selection based on the configuration parameters and acquired features to be trained to obtain a target training model, acquires visualization data corresponding to the target training model, and sends the visualization data to the client;
receiving the visualization data fed back by the server, and displaying the visualization data on a preset visualization interface.
This application further provides a backward model selection apparatus applied to a backward model selection device, the backward model selection apparatus including:
a first training module, configured to receive configuration parameters sent by a client associated with the server and acquire features to be trained, and to train a preset model to be trained based on the features to be trained and the configuration parameters to obtain a first initial training model;
a second training module, configured to calculate a first significance corresponding to each feature to be trained, and, based on the first significances, to remove from the features to be trained any feature to be removed that meets a preset removal significance requirement, so as to perform loop training on the first initial training model based on the remaining features to be trained and obtain a loop training model set;
a selection module, configured to select a target training model from the first initial training model and the loop training model set based on the configuration parameters;
a feedback module, configured to generate visualization data corresponding to the target training model and feed the visualization data back to the client.
In one embodiment, the second training module includes:
a first removal sub-module, configured to select the feature to be removed from the features to be trained based on the first significances and the preset removal significance requirement, and to remove the feature to be removed;
a training sub-module, configured to train the first initial training model based on the remaining features to be trained to obtain the second initial training model;
a second removal sub-module, configured to calculate a second significance of each remaining feature to be trained, and, based on the second significances, to again remove from the remaining features to be trained any further feature to be removed that meets the preset removal significance requirement;
a loop training sub-module, configured to perform loop training on the second initial training model based on the features to be trained that remain after this further removal to obtain one or more of the model elements, until no feature to be removed exists among the features to be trained.
In one embodiment, the selection sub-module includes:
a first comparison unit, configured to compare the first significances with one another to select, from the features to be trained, the feature with the lowest significance as a target feature;
a second comparison unit, configured to compare the target significance of the target feature with a preset removal significance threshold;
a determination unit, configured to determine, if the target significance is less than the preset removal significance threshold, that the target feature meets the preset removal significance requirement, and to take the target feature as the feature to be removed.
In one embodiment, the second training module further includes:
a first calculation sub-module, configured to calculate the Wald chi-square value of each feature to be trained;
a second calculation sub-module, configured to calculate each first significance based on the corresponding Wald chi-square value and the degrees of freedom of the corresponding feature to be trained.
In one embodiment, the first training module includes:
a training update sub-module, configured to input the feature data corresponding to the features to be trained into the preset model to be trained, so as to train and update the preset model to be trained;
a first judgment sub-module, configured to judge whether the updated preset model to be trained meets the training completion judgment condition, and, if it does, to obtain the first initial training model;
a second judgment sub-module, configured to continue, if the updated preset model to be trained does not meet the training completion judgment condition, to iteratively train and update the preset model to be trained until the updated preset model to be trained meets the training completion judgment condition.
In one embodiment, the selection module includes:
a first acquisition sub-module, configured to obtain the model selection strategy from the configuration parameters, where the model selection strategy includes an AUC value and an AIC value;
a first comparison sub-module, configured to compare, if the model selection strategy is the AUC value, the AUC values of the elements in the loop training model set, and to select the element with the largest AUC value as the target training model;
a second comparison sub-module, configured to compare, if the model selection strategy is the AIC value, the AIC values of the elements in the loop training model set, and to select the element with the smallest AIC value as the target training model.
In one embodiment, the feedback module includes:
a second acquisition sub-module, configured to acquire the candidate feature data, selection summary data, and training process data corresponding to the backward model selection process of the target training model;
a generation sub-module, configured to generate visualization data that jointly corresponds to the candidate feature data, the selection summary data, and the training process data, and to feed the visualization data back to the visualization interface in real time.
To achieve the above purpose, this application further provides a backward model selection apparatus applied to a client, the backward model selection apparatus including:
a sending module, configured to receive a model selection task and send configuration parameters corresponding to the model selection task to a server associated with the client, so that the server performs model selection based on the configuration parameters and acquired features to be trained to obtain a target training model, acquires visualization data corresponding to the target training model, and sends the visualization data to the client;
a receiving module, configured to receive the visualization data fed back by the server and display the visualization data on a preset visualization interface.
This application further provides a backward model selection device, including a memory, a processor, and a program of the backward model selection method that is stored in the memory and executable on the processor, where the program, when executed by the processor, implements the steps of the backward model selection method described above.
This application further provides a readable storage medium storing a program for implementing the backward model selection method, where the program, when executed by a processor, implements the steps of the backward model selection method described above.
In this application, the server receives the configuration parameters sent by the associated client and acquires the features to be trained; trains the preset model to be trained based on the features to be trained and the configuration parameters to obtain the first initial training model; calculates the first significance corresponding to each feature to be trained; removes, based on the first significances, the features to be removed that meet the preset removal significance requirement; performs loop training on the first initial training model based on the remaining features to be trained to obtain the loop training model set; selects the target training model from the first initial training model and the loop training model set based on the configuration parameters; and generates the visualization data corresponding to the target training model and feeds it back to the client.
In other words, this application provides a model selection method for the backward selection mode that supports code-free, distributed, and visual modeling. The user only needs to set the necessary configuration parameters on the client and send them to the server, and the server returns the visualization data of the backward model selection process together with the backward model selection result. Because modeling is carried out over a communication connection between the client and the server, distributed modeling is achieved, which improves the modeling efficiency of the backward selection mode compared with single-machine modeling. Generating the visualization data corresponding to the target training model and feeding it back to the client achieves visual modeling, which lowers the skill threshold for modelers and further improves modeling efficiency. Moreover, the user only needs to enter the necessary model parameters in the visualization interface of the client to obtain the backward model selection result, so no coding ability is required of the user; this achieves code-free modeling and further lowers the skill threshold. The technical problems of a high modeling threshold and low efficiency of the backward selection mode in the prior art are thereby solved.
Brief description of the drawings
The drawings here are incorporated into and form part of this specification; they show embodiments consistent with this application and, together with the specification, serve to explain its principles.
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, a person of ordinary skill in the art can derive other drawings from these drawings without creative effort.
FIG. 1 is a schematic flowchart of a first embodiment of the backward model selection method of this application;
FIG. 2 is a schematic diagram of the visualization interface used for parameter configuration in the backward model selection method of this application;
FIG. 3 is a schematic flowchart of a second embodiment of the backward model selection method of this application;
FIG. 4 is a schematic flowchart of backward model selection in which the second embodiment of the backward model selection method of this application is combined with the first embodiment;
FIG. 5 is a schematic flowchart of a third embodiment of the backward model selection method of this application;
FIG. 6 is a schematic structural diagram of a device in the hardware operating environment involved in the embodiments of this application.
The realization of the purpose, the functional features, and the advantages of this application are further described below with reference to the embodiments and the drawings.
Detailed description
It should be understood that the specific embodiments described here are only intended to explain this application and are not intended to limit it.
An embodiment of this application provides a backward model selection method applied to a server. In a first embodiment of the backward model selection method of this application, referring to FIG. 1, the backward model selection method includes:
Step S10: receiving configuration parameters sent by a client associated with the server and acquiring features to be trained, and training a preset model to be trained based on the features to be trained and the configuration parameters to obtain a first initial training model.
In this embodiment, it should be noted that the client includes a visualization interface on which the user can configure the parameters of the preset model to be trained before model training. FIG. 2 shows the visualization interface used for this parameter configuration, in which parameters such as the maximum number of iterations, the minimum convergence error, and the class weight all need to be set before model training. Backward model selection modes include the backward selection mode, the stepwise selection mode, and the like. The features to be trained include one or more features, and each feature includes one or more pieces of feature data. The preset model to be trained includes a logistic regression model.
Specifically, the server receives the configuration parameters sent by the client and extracts the training completion judgment condition from them. It then obtains the features to be trained from the local database of the backward model selection server and inputs the feature data corresponding to the features to be trained into the preset model to be trained, so as to iteratively train and update the preset model to be trained. Once the preset model to be trained meets the preset training completion judgment condition, the iterative training is complete and the updated preset model to be trained is taken as the first initial training model. The preset training completion judgment condition includes reaching the minimum convergence error, reaching the maximum number of iterations, and the like.
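The format in which the client transmits the configuration parameters is not specified. The sketch below assumes, purely for illustration, that they arrive as a JSON object. The key names max_iterations, min_convergence_error, and selection_strategy are hypothetical, as are the class and function names.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    """Hypothetical container for the parameters set on the client."""
    max_iterations: int
    min_convergence_error: float
    selection_strategy: str  # "AUC" or "AIC"


def parse_config(raw: str) -> TrainingConfig:
    """Parse and validate configuration parameters received as JSON text."""
    data = json.loads(raw)
    config = TrainingConfig(
        max_iterations=int(data["max_iterations"]),
        min_convergence_error=float(data["min_convergence_error"]),
        selection_strategy=str(data["selection_strategy"]).upper(),
    )
    if config.max_iterations < 1:
        raise ValueError("max_iterations must be at least 1")
    if config.min_convergence_error <= 0:
        raise ValueError("min_convergence_error must be positive")
    if config.selection_strategy not in ("AUC", "AIC"):
        raise ValueError("selection_strategy must be AUC or AIC")
    return config


def training_complete(config: TrainingConfig, iteration: int, error: float) -> bool:
    """Training completion judgment: either limit being reached is sufficient."""
    return (
        iteration >= config.max_iterations
        or error <= config.min_convergence_error
    )
```

Validating the parameters on receipt means that a malformed request fails before any training time is spent.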
其中,所述配置参数包括训练完成判定条件,所述待训练特征包括一条或者多条特征数据;Wherein, the configuration parameters include training completion judgment conditions, and the features to be trained include one or more pieces of feature data;
所述基于各所述待训练特征和所述配置参数对预设待训练模型进行训练,获得第一初始训练模型的步骤包括:The step of training a preset model to be trained based on each of the features to be trained and the configuration parameters, and obtaining a first initial training model includes:
步骤S11,将各所述待训练特征对应的所述特征数据输入所述预设待训练模型,以对所述预设待训练模型进行训练更新;Step S11, input the feature data corresponding to each of the features to be trained into the preset model to be trained, so as to train and update the preset model to be trained;
在本实施例中,需要说明的是,每对所述预设待训练模型进行一次训练,则对所述预设待训练模型进行一次更新,其中,对所述预设待训练模型进行训练更新的梯度下降法等。In this embodiment, it should be noted that each time the preset model to be trained is trained once, the preset model to be trained is updated once, wherein the preset model to be trained is trained and updated The gradient descent method and so on.
步骤S12,判断更新后的所述预设待训练模型是否满足所述训练完成判定条件,若更新后的所述预设待训练模型满足所述训练完成判定条件,则获得所述第一初始训练模型;Step S12: Determine whether the updated preset to-be-trained model satisfies the training completion judgment condition, and if the updated preset to-be-trained model satisfies the training completion judgment condition, obtain the first initial training model;
在本实施例中,需要说明的是,所述训练完成判定条件包括达到最小收敛误差、达到最大迭代次数等。In this embodiment, it should be noted that the training completion judgment condition includes reaching the minimum convergence error, reaching the maximum number of iterations, and so on.
判断更新后的所述预设待训练模型是否满足所述训练完成判定条件,若更新后的所述预设待训练模型满足所述训练完成判定条件,则获得所述第一初始训练模型,具体地,判断更新后的所述预设待训练模型是否满足所述训练完成判定条件,若更新后的所述预设待训练模型满足所述训练完成判定条件,则将本次训练获得的更新后的所述预设待训练模型作为所述第一初始训练模型,也即,获得所述第一初始训练模型。It is judged whether the updated preset to-be-trained model satisfies the training completion judgment condition, and if the updated preset to-be-trained model satisfies the training completion judgment condition, the first initial training model is obtained, specifically To determine whether the updated preset to-be-trained model satisfies the training completion judgment condition, and if the updated preset to-be-trained model satisfies the training completion judgment condition, the updated model obtained in this training The preset model to be trained is used as the first initial training model, that is, the first initial training model is obtained.
步骤S13,若更新后的所述预设待训练模型不满足所述训练完成判定条件,则继续对所述预设待训练模型进行迭代训练更新,直至更新后的所述预设待训练模型满足所述训练完成判定条件。Step S13: If the updated preset to-be-trained model does not meet the training completion judgment condition, continue to perform iterative training updates on the preset to-be-trained model until the updated preset to-be-trained model satisfies The training completion judgment condition.
In this embodiment, specifically, if the updated preset model to be trained does not satisfy the training completion judgment condition, the updated model obtained in this round of training cannot serve as the first initial training model. The feature data corresponding to each of the features to be trained is therefore input into the updated preset model to be trained, and the model is iteratively trained and updated until the updated preset model to be trained satisfies the training completion judgment condition.
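Steps S11 to S13 can be sketched as a loop that updates the model once per pass and stops when a training completion judgment condition is met. The sketch below is only an illustration under stated assumptions: the model is taken to be a logistic regression trained by batch gradient descent, and the names `train_until_done`, `max_iter`, `min_error` and `lr` are hypothetical and do not come from this application.

```python
import math


def train_until_done(rows, labels, max_iter=1000, min_error=1e-6, lr=0.1):
    """Gradient-descent training that stops on either completion condition.

    Assumption: a logistic-regression model; rows is a list of feature
    vectors (one per piece of data) and labels is a list of 0/1 targets.
    Returns (theta, iterations_used, stop_reason).
    """
    k = len(rows[0])
    n = len(rows)
    theta = [0.0] * k
    prev_loss = float("inf")
    for iteration in range(1, max_iter + 1):
        # S11: one training pass; loss and gradient use the current theta.
        grad = [0.0] * k
        loss = 0.0
        for x, y in zip(rows, labels):
            z = sum(t * v for t, v in zip(theta, x))
            p = 1.0 / (1.0 + math.exp(-z))
            loss -= y * math.log(p + 1e-12) + (1 - y) * math.log(1 - p + 1e-12)
            for j in range(k):
                grad[j] += (p - y) * x[j]
        loss /= n
        theta = [t - lr * g / n for t, g in zip(theta, grad)]
        # S12: completion check on the change in loss (minimum convergence error).
        if abs(prev_loss - loss) < min_error:
            return theta, iteration, "converged"
        # S13: condition not met, so keep iterating.
        prev_loss = loss
    # The maximum number of iterations was reached.
    return theta, max_iter, "max_iter"
```

Either stop reason yields what the text calls the first initial training model; which condition fires first depends on the configuration parameters.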
步骤S20,计算各所述待训练特征对应的第一显著性,并基于各所述第一显著性,在各所述待训练特征中剔除符合预设剔除显著性要求的待剔除特征,以基于剔除后的各所述待训练特征,对所述第一初始训练模型进行循环训练,获得循环训练模型集;Step S20: Calculate the first saliency corresponding to each of the features to be trained, and based on each of the first saliency, remove the features to be removed that meet the preset saliency removal requirements from the features to be trained, so as to be based on After removing each of the features to be trained, performing cyclic training on the first initial training model to obtain a cyclic training model set;
In this embodiment, specifically, the chi-square value wald of each feature to be trained is calculated with the preset chi-square value wald calculation formula, based on each of the features to be trained and the model training result corresponding to each of them. The first saliency corresponding to each feature to be trained is then calculated from its chi-square value wald and its degrees of freedom. Based on each first saliency, the feature to be removed is found among the features to be trained and removed, and the first initial training model is retrained and updated on the remaining features to be trained; the updated first initial training model is one of the model elements of the cyclic training model set. The search for a feature to be removed and the training of the updated model are then repeated on the remaining features to be trained, each round yielding a further model element, until none of the features to be trained is a feature to be removed. At that point one or more model elements have been obtained, that is, the cyclic training model set is obtained.
其中,在步骤S20中,所述计算各所述待训练特征对应的第一显著性的步骤包括:Wherein, in step S20, the step of calculating the first saliency corresponding to each of the features to be trained includes:
步骤S21,计算各所述待训练特征的卡方值wald;Step S21: Calculate the chi-square value wald of each of the features to be trained;
在本实施例中,计算各所述待训练特征的卡方值wald,具体地,将各所述待训练特征对应的特征数据表示矩阵代入预设卡方值wald计算公式,分布式并行计算各所述待训练特征对应的卡方值wald,其中,所述预设卡方值wald计算公式如下所示:In this embodiment, the chi-square value wald of each feature to be trained is calculated, specifically, the feature data representation matrix corresponding to each feature to be trained is substituted into the preset chi-square value wald calculation formula, and each of the features is calculated in parallel. The chi-square value wald corresponding to the feature to be trained, wherein the preset chi-square value wald calculation formula is as follows:
Figure PCTCN2020134736-appb-000001
where
Figure PCTCN2020134736-appb-000002
Here, S is the chi-square value wald. The feature data corresponding to the features to be trained is denoted X, where X includes n pieces of data and each piece of data includes k values. X can be represented by a feature data representation matrix, in which each column is a piece of data and corresponds to one of the features to be trained. The model parameter obtained by training the preset model to be trained on X is θ, a k-dimensional vector (θ_1, θ_2, …, θ_{k-1}, θ_k). The feature set X to be trained can be divided into a first model feature set and a second model feature set, whose feature data representation matrices are X_0 and X_1 respectively. X_0 includes n pieces of data, each including (k-t) values, and the model parameter obtained by training the preset model to be trained on X_0 is θ_0, a (k-t)-dimensional vector (θ_1, θ_2, …, θ_{k-t}). X_1 includes n pieces of data, each including t values. The data set corresponding to the target output of the model to be trained is Y, which includes n pieces of data, and Y has a corresponding predicted probability P, which includes n probabilities (p_1, p_2, …, p_{n-1}, p_n). The null hypothesis H_0: Cθ = h is then tested, where C is a t×k matrix and h is a t×1 vector whose values are all 0.
Further, based on each chi-square value wald, the non-salient features among the features to be trained are removed to obtain the second features to be trained. A non-salient feature is a feature to be trained whose saliency is lower than a preset saliency threshold. The saliency can be obtained from the chi-square value wald and the degrees of freedom of the feature to be trained, and the degrees of freedom are related to the values the feature to be trained can take. For example, if the features to be trained include bank deposits, card consumption records and loan records, the features to be trained include 3 variables and the degrees of freedom are 2.
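The two formula images are not reproduced in this text. For orientation only, the textbook form of a Wald chi-square statistic for the null hypothesis H_0: Cθ = h is written out below using the symbols defined above. It is an assumption that the preset formula has this form; the logistic-regression covariance estimate, the hat notation, the weight matrix W and the convention that the rows of X are the n pieces of data are likewise assumptions, not content taken from this application.

```latex
% Textbook Wald statistic for H_0 : C\theta = h (assumed form, not the patent's figure)
S = (C\hat{\theta} - h)^{\top}
    \left[ C \, \widehat{V}(\hat{\theta}) \, C^{\top} \right]^{-1}
    (C\hat{\theta} - h)

% Assumed covariance estimate for a logistic-regression model,
% with X taken as n x k (rows are the n pieces of data)
\widehat{V}(\hat{\theta}) = \left( X^{\top} W X \right)^{-1},
\qquad
W = \operatorname{diag}\bigl( p_1(1 - p_1), \ldots, p_n(1 - p_n) \bigr)
```

Under H_0 this statistic is asymptotically chi-square distributed with t degrees of freedom, which is why the saliency is derived from the chi-square value wald together with the degrees of freedom.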
步骤S22,基于各所述卡方值wald和各所述待训练特征的自由度,计算各所述第一显著性。Step S22: Calculate each of the first saliency based on each of the chi-square value wald and the degrees of freedom of each of the features to be trained.
In this embodiment, it should be noted that the first saliency can be judged from a Pearson correlation value. When the Pearson correlation value is less than or equal to a preset Pearson correlation threshold, the feature corresponding to the first saliency is judged not to meet the preset removal saliency requirement, that is, the feature is salient. When the Pearson correlation value is greater than the preset Pearson correlation threshold, the feature is judged to meet the preset removal saliency requirement, that is, the feature is not salient. The degrees of freedom are related to the amount of feature data corresponding to the feature; for example, if the feature data contains 100 different pieces of data, the degrees of freedom are 99.
Specifically, the Pearson correlation value of each feature to be trained is calculated from its chi-square value wald and its degrees of freedom using a preset Pearson correlation value calculation formula, and the saliency of each feature to be trained is then calculated from its Pearson correlation value. For example, if the Pearson correlation values are 0.0001, 0.01 and 0.05, the corresponding measures of saliency are 100, 1 and 0.2, where a larger measure indicates a more salient feature.
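Step S22 can be sketched as follows. The preset Pearson correlation value calculation formula is not given in the text, so this sketch assumes that the value behaves like the upper-tail probability of a chi-square distribution evaluated at the chi-square value wald. The example measures 100, 1 and 0.2 are consistent with dividing 0.01 by the values 0.0001, 0.01 and 0.05, and that reading is also an assumption. The names `chi2_sf`, `saliency_measure` and `reference` are hypothetical.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability of a chi-square variable with integer df.

    Uses the closed-form finite sums for integer degrees of freedom,
    so only the standard library is needed.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    df = int(df)
    if x <= 0:
        return 1.0
    half = x / 2.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{i < df/2} (x/2)^i / i!
        term, total = 1.0, 1.0
        for i in range(1, df // 2):
            term *= half / i
            total += term
        return math.exp(-half) * total
    # Odd df: normal tail term plus a finite correction sum.
    tail = math.erfc(math.sqrt(half))
    term = math.sqrt(2.0 * x / math.pi) * math.exp(-half)
    total = 0.0
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return tail + total


def saliency_measure(value, reference=0.01):
    """Assumed mapping from the tail value to the saliency measure.

    reference / value reproduces the example in the text
    (0.0001 -> 100, 0.01 -> 1, 0.05 -> 0.2); larger means more salient.
    """
    return reference / value
```

For instance, `saliency_measure(chi2_sf(wald, df))` would give one number per feature that can be compared against a threshold, with larger numbers meaning more salient features.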
步骤S30,基于所述配置参数,从所述第一初始训练模型和循环训练模型集之中选取目标训练模型;Step S30, based on the configuration parameters, select a target training model from the first initial training model and the cyclic training model set;
在本实施例中,需要说明的是,所述配置参数包括模型选择策略。In this embodiment, it should be noted that the configuration parameters include a model selection strategy.
基于所述配置参数,在所述第一初始训练模型和所述循环训练模型集之中选取目标训练模型,具体地,基于所述模型选择策略,在所述第一初始训练模型和所述循环训练模型集的各元素中选择最符合所述模型选择策略的模型作为所述目标训练模型。Based on the configuration parameters, a target training model is selected from the first initial training model and the cyclic training model set. Specifically, based on the model selection strategy, the first initial training model and the cyclic training model From each element of the training model set, a model that best meets the model selection strategy is selected as the target training model.
其中,所述基于所述配置参数,从所述第一初始训练模型和循环训练模型集之中选取目标训练模型的步骤包括:Wherein, the step of selecting a target training model from the first initial training model and cyclic training model set based on the configuration parameters includes:
Step S31: Obtain the model selection strategy in the configuration parameters, where the model selection strategy includes an AUC value and an AIC value;
In this embodiment, it should be noted that the AUC value is a criterion for evaluating the training model: the larger the AUC value, the better the training model. The AUC value is the area enclosed between the ROC (receiver operating characteristic) curve and the coordinate axis, and this area is never greater than 1. The ROC curve is drawn over a series of different binary classification cutoffs (cutoff values or decision thresholds), with the true positive rate (sensitivity) as the ordinate and the false positive rate (1 - specificity) as the abscissa. The AIC value is calculated according to the AIC criterion, which is a standard for measuring the goodness of fit of a statistical model.
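The two criteria can be illustrated with a short sketch. The AUC is computed here through its rank interpretation (the fraction of positive/negative pairs in which the positive sample receives the higher score, with ties counted as one half), which equals the area under the ROC curve. The AIC uses the standard definition 2k - 2 ln L. The function names and argument names are hypothetical, and the text does not say how the server actually computes either value.

```python
def auc_from_scores(labels, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5  # ties count as half a correct ranking
    return wins / (len(pos) * len(neg))


def aic(num_params, log_likelihood):
    """Akaike information criterion: 2k - 2 ln L (smaller is better)."""
    return 2 * num_params - 2 * log_likelihood
```

The pairwise loop is quadratic in the number of samples; it is chosen here for clarity rather than speed.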
Step S32: If the model selection strategy is the AUC value, compare the AUC values of the elements in the cyclic training model set, and select the element with the largest AUC value as the target training model.
在本实施例中,若所述模型选择策略为所述AUC值,则将所述循环训练模型集中各元素的所述AUC值进行对比,以选取最大的所述AUC值对应的元素作为所述目标训练模型,具体地,若所述模型选择策略为所述AUC值,则将各所述AUC值进行对比,获得最大AUC值,并将所述最大AUC值对应的训练模型作为所述目标训练模型,其中,所述训练模型包括第一初始训练模型和所述循环训练模型集中的各元素。In this embodiment, if the model selection strategy is the AUC value, the AUC values of the elements in the cyclic training model set are compared, and the element corresponding to the largest AUC value is selected as the The target training model, specifically, if the model selection strategy is the AUC value, compare the AUC values to obtain the maximum AUC value, and use the training model corresponding to the maximum AUC value as the target training A model, wherein the training model includes a first initial training model and each element in the cyclic training model set.
步骤S33,若所述模型选择策略为所述AIC值,则将所述循环训练模型集中各元素的所述AIC值进行对比,以选取最小的所述AIC值对应的元素作为所述目标训练模型。Step S33, if the model selection strategy is the AIC value, compare the AIC values of the elements in the cyclic training model set, and select the element corresponding to the smallest AIC value as the target training model .
在本实施例中,若所述模型选择策略为所述AIC值,则将所述循环训练模型集中各元素的所述AIC值进行对比,以选取最小的所述AIC值对应的元素作为所述目标训练模型,具体地,若所述模型选择策略为所述AIC值,则将各所述AIC值进行对比,获得最小AIC值,并将所述最小AIC值对应的训练模型作为所述目标训练模型,其中,所述训练模型包括第一初始训练模型和所述循环训练模型集中的各元素。In this embodiment, if the model selection strategy is the AIC value, the AIC value of each element in the cyclic training model set is compared, and the element corresponding to the smallest AIC value is selected as the The target training model, specifically, if the model selection strategy is the AIC value, the AIC values are compared to obtain the minimum AIC value, and the training model corresponding to the minimum AIC value is used as the target training A model, wherein the training model includes a first initial training model and each element in the cyclic training model set.
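Steps S31 to S33 amount to picking the candidate with the largest AUC value or the smallest AIC value. A minimal sketch, assuming each candidate (the first initial training model and every element of the cyclic training model set) is represented as a dictionary with hypothetical keys `name`, `auc` and `aic`:

```python
def select_target_model(candidates, strategy):
    """Pick the target training model according to the selection strategy.

    candidates: list of dicts with hypothetical keys "name", "auc", "aic".
    strategy:   "AUC" (largest value wins) or "AIC" (smallest value wins).
    """
    if not candidates:
        raise ValueError("no candidate models to choose from")
    if strategy == "AUC":
        return max(candidates, key=lambda m: m["auc"])
    if strategy == "AIC":
        return min(candidates, key=lambda m: m["aic"])
    raise ValueError("unknown model selection strategy: %r" % (strategy,))
```

The two strategies need not agree: a model with more features may score a higher AUC while being penalized by a larger AIC.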
步骤S40,生成所述目标训练模型对应的可视化数据,并将所述可视化数据反馈至所述客户端。Step S40: Generate visualization data corresponding to the target training model, and feed back the visualization data to the client.
在本实施例中,需要说明的是,所述可视化数据包括备选特征可视化数据、模型选择汇总可视化数据和训练过程可视化数据,其中,所述备选特征为所述待训练特征集中的特征,所述模型选择汇总数据包括对第一初始训练模型和所述循环训练模型集中的模型元素进行模型选择的汇总数据。In this embodiment, it should be noted that the visualization data includes candidate feature visualization data, model selection summary visualization data, and training process visualization data, where the candidate feature is a feature in the feature set to be trained, The model selection summary data includes summary data for model selection of the first initial training model and the model elements in the cyclic training model set.
生成所述目标训练模型对应的可视化数据,并将所述可视化数据反馈至所述客户端,具体地,生成所述目标训练模型对应的获取过程对应的可视化数据,其中,所述获取过程包括特征选择过程、模型训练过程和模型选择过程等,进而将所述可视化数据反馈至所述客户端的可视化界面以向客户进行展示,其中,所述特征选择过程为在所述待训练特征集中选择特征的过程,所述模型训练过程为对目标模型训练的过程,其中,所述目标模型包括预设待训练模型、第一初始训练模型和模型元素等,所述模型选择过程为基于预设模型选择策略选择目标训练模型的过程。Generate visualization data corresponding to the target training model, and feed back the visualization data to the client, specifically, generate visualization data corresponding to the acquisition process corresponding to the target training model, wherein the acquisition process includes features Selection process, model training process, model selection process, etc., and then feedback the visualization data to the visualization interface of the client for display to the customer, wherein the feature selection process is the process of selecting features in the feature set to be trained The model training process is a process of training a target model, wherein the target model includes a preset model to be trained, a first initial training model, model elements, etc., and the model selection process is based on a preset model selection strategy The process of selecting the target training model.
其中,所述客户端包括可视化界面,Wherein, the client includes a visual interface,
所述生成所述目标训练模型对应的可视化数据,并将所述可视化数据反馈至所述客户端的步骤包括:The step of generating visualization data corresponding to the target training model and feeding back the visualization data to the client includes:
步骤S41,获取所述目标训练模型的模型选择过程对应的备选特征数据、选择汇总数据和训练过程数据;Step S41: Obtain candidate feature data, selection summary data, and training process data corresponding to the model selection process of the target training model;
在本实施例中,所述目标训练模型的模型选择过程包括模型迭代训练过程、特征选取过程和模型选取过程等,其中,特征选取过程为剔除所述待剔除特征的过程,所述模型选取过程为基于预设模型选择策略选取目标训练模型的过程。In this embodiment, the model selection process of the target training model includes a model iterative training process, a feature selection process, a model selection process, etc., wherein the feature selection process is a process of removing the feature to be removed, and the model selection process The process of selecting a target training model based on a preset model selection strategy.
获取所述目标训练模型的模型选择过程对应的备选特征数据、选择汇总数据和训练过程数据,具体地,实时获取所述特征选取过程的备选特征数据、所述模型选取过程的选择汇总数据和所述模型迭代训练过程的训练过程数据。Obtain candidate feature data, selection summary data, and training process data corresponding to the model selection process of the target training model, specifically, acquire candidate feature data of the feature selection process and selection summary data of the model selection process in real time And training process data of the model iterative training process.
步骤S42,生成所述备选特征数据、所述选择汇总数据和所述训练过程数据共同对应的可视化数据,并将所述可视化数据实时反馈至所述可视化界面。Step S42: Generate visualization data corresponding to the candidate feature data, the selection summary data, and the training process data, and feed back the visualization data to the visualization interface in real time.
在本实施例中,需要说明的是,所述可视化数据包括图文数据、表格数据等。In this embodiment, it should be noted that the visualization data includes graphic data, table data, and the like.
Specifically, the visualization data jointly corresponding to the candidate feature data, the selection summary data and the training process data is generated in real time and fed back to the visual interface in real time. The time interval at which the visualization data is fed back to the visual interface can be set by the user of the backward model selection server, and the client user can query the visualization data on the client in real time.
In this embodiment, the configuration parameters sent by the client associated with the server are received and the features to be trained are acquired; a preset model to be trained is trained based on each of the features to be trained and the configuration parameters to obtain a first initial training model; the first saliency corresponding to each feature to be trained is calculated and, based on each first saliency, the features to be removed that meet the preset removal saliency requirement are removed from the features to be trained; the first initial training model is cyclically trained on the remaining features to be trained to obtain a cyclic training model set; a target training model is selected from the first initial training model and the cyclic training model set based on the configuration parameters; and visualization data corresponding to the target training model is generated and fed back to the client. That is, this embodiment provides a model selection method in the backward selection mode with codeless, distributed and visual modeling. The user only needs to set the necessary configuration parameters on the client and send them to the server, and the server feeds back the visualization data of the backward model selection process together with the backward model selection result. Because the client and the server communicate to carry out the modeling, distributed modeling is realized, which improves the modeling efficiency of the backward selection mode compared with performing it on a single machine. Generating the visualization data corresponding to the target training model and feeding it back to the client realizes visual modeling, which lowers the skill threshold required of modelers and further improves the modeling efficiency. In addition, the user only needs to enter the necessary model parameters in the visual interface of the client to obtain the corresponding backward model selection result, so no code development ability is required of the user; codeless modeling is thereby realized and the skill threshold for modelers is lowered further. The technical problems of a high modeling threshold and low efficiency of the backward selection mode in the prior art are therefore solved.
进一步地,参照图3,基于本申请中第一实施例,在向后模型选择方法的另一实施例中,在步骤S20中,所述循环训练模型集包括一个或者多个模型元素,各所述模型元素中包括第二初始训练模型,Further, referring to FIG. 3, based on the first embodiment of the present application, in another embodiment of the backward model selection method, in step S20, the cyclic training model set includes one or more model elements, each of which The model element includes the second initial training model,
所述基于各所述第一显著性,在各所述待训练特征中剔除符合预设剔除显著性要求的待剔除特征,以基于剔除后的各所述待训练特征,对所述第一初始训练模型进行循环训练,获得循环训练模型集的步骤包括:According to each of the first saliency, the feature to be removed that meets the preset removal saliency requirement is removed from the features to be trained, so as to compare the first initial saliency based on the removed features to be trained. The training model performs cyclic training, and the steps to obtain the cyclic training model set include:
步骤C10,基于各所述第一显著性和所述预设剔除显著性要求,选取各所述待训练特征中的所述待剔除特征,并剔除所述待剔除特征;Step C10, based on each of the first saliency and the preset removal saliency requirements, select the feature to be removed among the features to be trained, and remove the feature to be removed;
在本实施例中,需要说明的是,所述第一显著性可基于皮尔逊相关性值进行判定,当所述皮尔逊相关性值小于或者等于预设皮尔逊相关性阀值,则判定所述第一显著性对应的特征不满足预设剔除显著性要求,也即,所述第一显著性对应的特征表现为显著,当所述皮尔逊相关性值大于预设皮尔逊相关性阀值时,则判定所述第一显著性对应的特征满足预设剔除显著性要求,也即,所述第一显著性对应的特征表现为不显著。In this embodiment, it should be noted that the first significance can be determined based on the Pearson correlation value, and when the Pearson correlation value is less than or equal to the preset Pearson correlation threshold, the determination is The feature corresponding to the first saliency does not meet the preset saliency removal requirement, that is, the feature corresponding to the first saliency appears to be significant, when the Pearson correlation value is greater than the preset Pearson correlation threshold When, it is determined that the feature corresponding to the first saliency satisfies the preset saliency removal requirement, that is, the feature corresponding to the first saliency is not significant.
基于各所述第一显著性和所述预设剔除显著性要求,选取各所述待训练特征中的所述待剔除特征,并剔除所述待剔除特征,具体地,将各所述第一显著性进行对比,以在各所述待训练特征中选取显著性最低的特征作为目标特征,并判断所述目标特征是否满足预设剔除显著性要求,若所述目标特征满足所述预设剔除显著性要求,则将所述目标特征作为所述待剔除特征,并剔除所述待剔除特征,若所述目标特征不满足所述预设剔除显著性要求,则结束本次循环训练。Based on each of the first saliency and the preset removal saliency requirements, select the feature to be removed among the features to be trained, and remove the feature to be removed, specifically, combine each of the first The saliency is compared, the feature with the lowest saliency among the features to be trained is selected as the target feature, and it is judged whether the target feature satisfies the pre-determined saliency requirement, if the target feature meets the pre-determined removal If the saliency requirement is required, the target feature is used as the feature to be eliminated, and the feature to be eliminated is eliminated. If the target feature does not meet the pre-determined saliency requirement for elimination, the current cycle training is ended.
其中,所述基于各所述第一显著性和所述预设剔除显著性要求,选取各所述待训练特征中的所述待剔除特征的步骤包括:Wherein, the step of selecting the feature to be removed among the features to be trained based on each of the first saliency and the preset removal saliency requirement includes:
步骤C11,将各所述第一显著性进行比对,以在各所述待训练特征中选取显著性最低的特征作为目标特征;Step C11, comparing each of the first saliency, and selecting the feature with the lowest saliency among the features to be trained as the target feature;
在本实施例中,将各所述第一显著性进行比对,以在各所述待训练特征中选取显著性最低的特征作为目标特征,具体地,将各所述第一显著性进行一一比对,以获取各所述显著性对应的各所述待训练特征中最不显著的特征,也即,获取皮尔逊相关性值最高的特征,也即,在各所述待训练特征中选取显著性最低的特征作为目标特征。In this embodiment, each of the first saliency is compared, and the feature with the lowest saliency is selected as the target feature among the features to be trained. Specifically, the first saliency is selected as a target feature. A comparison to obtain the least significant feature of each of the features to be trained corresponding to each of the saliency, that is, to obtain the feature with the highest Pearson correlation value, that is, in each of the features to be trained The least significant feature is selected as the target feature.
步骤C12,将所述目标特征的目标显著性与预设剔除显著性阀值进行比对;Step C12, comparing the target saliency of the target feature with a preset saliency rejection threshold;
步骤C13,若所述目标显著性小于所述预设剔除显著性阀值,则判定所述目标特征满足所述预设剔除显著性要求,并将所述目标特征作为所述待剔除特征。Step C13: If the target significance is less than the preset rejection significance threshold, it is determined that the target feature meets the preset rejection significance requirement, and the target feature is taken as the feature to be rejected.
在本实施例中,将所述目标特征的目标显著性与预设剔除显著性阀值进行比对,若所述目标显著性小于所述预设剔除显著性阀值,则判定所述目标特征满足所述预设剔除显著性要求,并将所述目标特征作为所述待剔除特征,具体地,将所述目标特征的目标显著性与预设显著性阀值进行对比,其中,所述目标显著性为所述目标特征的第一显著性,若所述目标显著性低于所述预设显著性阀值,则所述目标特征满足所述预设剔除显著性要求,也即,所述目标特征是不显著的,进而将所述目标特征作为所述待剔除特征,若所述目标显著性高于或者等于所述预设显著性阀值,则所述目标特征不满足所述预设剔除显著性要求,也即,所述目标特征是显著的,则结束本次循环训练。In this embodiment, the target saliency of the target feature is compared with a preset rejection saliency threshold, and if the target saliency is less than the preset rejection saliency threshold, the target feature is determined Meet the preset saliency requirement for rejection, and use the target feature as the feature to be rejected. Specifically, the target saliency of the target feature is compared with a preset saliency threshold, wherein the target The saliency is the first saliency of the target feature. If the target saliency is lower than the preset saliency threshold, the target feature meets the preset saliency removal requirement, that is, the The target feature is not significant, and then the target feature is used as the feature to be eliminated. If the target significance is higher than or equal to the preset significance threshold, the target feature does not satisfy the preset Excluding the significance requirement, that is, the target feature is significant, then this cycle training is ended.
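Steps C11 to C13 can be sketched as a single function: find the feature with the lowest saliency, then compare it with the preset removal saliency threshold. The function name and the dictionary representation of the saliencies are hypothetical; the saliency is assumed to be a number for which larger means more salient.

```python
def pick_feature_to_remove(saliency, threshold):
    """Return the feature to be removed, or None if nothing qualifies.

    saliency:  dict mapping feature name -> saliency (larger = more salient).
    threshold: preset removal saliency threshold.
    """
    if not saliency:
        return None
    # C11: the target feature is the one with the lowest saliency.
    target = min(saliency, key=saliency.get)
    # C12/C13: it is removed only if its saliency is below the threshold.
    if saliency[target] < threshold:
        return target
    # Otherwise every remaining feature is salient and the cycle ends.
    return None
```

Returning `None` corresponds to the case in which the current cyclic training ends because the least salient feature is still salient.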
步骤C20,基于剔除后的各所述待训练特征,对所述第一初始训练模型进行训练,获得所述第二初始训练模型。Step C20, training the first initial training model based on the eliminated features to be trained to obtain the second initial training model.
在本实施例中,需要说明的是,所述循环训练模型集包括一个或者多个模型元素。In this embodiment, it should be noted that the cyclic training model set includes one or more model elements.
基于剔除后的各所述待训练特征,对所述第一初始训练模型进行训练,获得所述第二初始训练模型,具体地,将剔除后的各所述待训练特征的特征数据输入所述第一初始训练模型,以对所述第一初始训练模型进行迭代训练更新,直至更新后的所述第一初始训练模型满足预设训练完成判定条件,获得更新后的所述第一初始训练模型,也即,获得所述第二初始训练模型,其中,所述预设训练完成判定条件包括达到最大迭代次数和达到最小收敛误差等。Based on the eliminated features to be trained, the first initial training model is trained to obtain the second initial training model. Specifically, the feature data of the eliminated features to be trained is input into the A first initial training model to perform an iterative training update on the first initial training model until the updated first initial training model satisfies a preset training completion judgment condition to obtain the updated first initial training model That is, the second initial training model is obtained, wherein the preset training completion judgment condition includes reaching the maximum number of iterations and reaching the minimum convergence error.
步骤C30,计算剔除后的各所述待训练特征的第二显著性,并基于各所述第二显著性,在剔除后的各所述待训练特征中再次剔除符合所述预设剔除显著性要求的其他所述待剔除特征;Step C30: Calculate the second saliency of each feature to be trained after culling, and based on each of the second saliency, remove again from each feature to be trained after culling that meets the preset removal saliency Other required features to be removed;
在本实施例中,计算剔除后的各所述待训练特征的第二显著性,并基于各所述第二显著性,在剔除后的各所述待训练特征中再次剔除符合所述预设剔除显著性要求的其他所述待剔除特征,具体地,重新计算剔除后的各所述待训练特征的卡方值wald,并基于重新计算的各所述卡方值wald和剔除后的各所述待训练特征的自由度,计算剔除后的各所述待训练特征的第二显著性,进而基于各所述第二显著性,判断剔除后的各所述待训练特征中是否存在满足预设剔除显著性要求的待剔除特征,若剔除后的各所述待训练特征中存在满足预设剔除显著性要求的其他待剔除特征,则再次剔除所述其他待剔除特征,若剔除后的各所述待训练特征中不存在满足预设剔除显著性要求的其他待剔除特征,则结束本次循环训练。In this embodiment, the second saliency of each feature to be trained after being removed is calculated, and based on each of the second saliency, the removal of each feature to be trained after removal is again consistent with the preset The other features to be removed that require saliency are removed, specifically, the chi-square value wald of each feature to be trained after removal is recalculated, and based on the recalculated chi-square value wald and each removed feature. The degrees of freedom of the features to be trained are calculated, and the second saliency of each feature to be trained after being removed is calculated, and based on each of the second saliency, it is determined whether there is any feature that satisfies the preset after being removed. Remove the feature to be removed that requires saliency. If there are other features to be removed that meet the preset requirement of removal saliency among the removed features to be trained, the other features to be removed will be removed again. If there are no other features to be eliminated that meet the pre-determined saliency requirement for elimination among the features to be trained, the current cycle training is ended.
Step C40: cyclically train the second initial training model on the features to be trained that remain after the further removal, obtaining one or more of the model elements, until no feature to be removed remains among the features to be trained.
In this embodiment, the second initial training model is iteratively trained and updated on the features to be trained that remain after the further removal, until it satisfies the training-completion condition. The updated second initial training model is one of the model elements. The search for and removal of features to be removed, and the iterative training update of the cyclically updated second initial training model, are then repeated to obtain one or more model elements. When no remaining feature to be trained meets the preset removal significance requirement, the cyclic training ends and the cyclic training model set is obtained. FIG. 4 is a schematic flowchart of backward model selection in this embodiment combined with the first embodiment, in which the features in the model are the features to be trained, the training model is the preset model to be trained or a trained version of it (for example, the first initial training model or another model element), and the threshold is the preset removal significance threshold.
In this embodiment, the feature to be removed is selected from the features to be trained based on the first significances and the preset removal significance requirement, and is removed. The first initial training model is then trained on the remaining features to obtain the second initial training model. Next, the second significance of each remaining feature is calculated and, based on the second significances, any other feature that meets the preset removal significance requirement is removed. Finally, the second initial training model is cyclically trained on the features that remain, yielding one or more model elements, until no feature to be removed remains. In other words, this embodiment calculates the significance of each feature to be trained, removes the features to be removed one at a time, and retrains the first initial training model on the remaining features after each removal; once no feature to be removed remains, the cyclic training model set is obtained, and model selection in the backward selection mode can be performed on it. This lays the foundation for code-free, distributed, and visual model selection in the backward selection mode, and hence for solving the technical problems of the high modeling threshold and low efficiency of the backward selection mode in the prior art.
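The removal loop of steps C10 to C40 can be sketched as follows. This is an illustration under stated assumptions, not the implementation of the application: `fit` and `score` are hypothetical callbacks standing in for the training and significance calculations, a lower score is assumed to mean a less significant feature, and the guard that keeps at least one feature is an added assumption. For convenience the returned list also contains the first initial training model as its first entry.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def backward_select(
    features: Sequence[str],
    fit: Callable[[List[str]], object],
    score: Callable[[object, List[str]], Dict[str, float]],
    threshold: float,
) -> List[Tuple[List[str], object]]:
    """Repeatedly remove the least significant feature and retrain.

    Returns one (feature list, trained model) pair per training round.
    """
    remaining = list(features)
    model = fit(remaining)  # first initial training model
    trained = [(list(remaining), model)]
    while len(remaining) > 1:  # assumption: never remove the last feature
        scores = score(model, remaining)
        weakest = min(remaining, key=lambda name: scores[name])
        if scores[weakest] >= threshold:
            break  # no feature meets the removal requirement
        remaining.remove(weakest)
        model = fit(remaining)  # retrain on the remaining features
        trained.append((list(remaining), model))
    return trained
```

Each pass removes exactly one feature, which matches the one-at-a-time removal described above; the list of trained models is what a later selection step would compare.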
Further, referring to FIG. 5, based on the first embodiment of the present application, in another embodiment of the backward model selection method, the method is applied to a client and includes:
Step A10: receive a model selection task and send the configuration parameters corresponding to the model selection task to the server associated with the client, so that the server performs model selection based on the configuration parameters, obtains a target training model, obtains the visualization data corresponding to the target training model, and sends the visualization data to the client;
In this embodiment, it should be noted that the model selection task includes target model requirements, which are determined by the configuration parameters. The configuration parameters include the maximum number of iterations, the minimum convergence error, the model selection mode, and other parameters.
Specifically, the client receives the model selection task and either matches the corresponding configuration parameters in a preset local database or lets the user set the configuration parameters for the task. The client then sends the configuration parameters to the associated server. Based on the configuration parameters, the server trains and updates a preset initial model to obtain a model to be trained, and cyclically trains and updates that model to obtain one or more candidate models, that is, the cyclic training model set. The server selects, from the candidate models, the model that meets the preset model selection strategy as the target training model, converts the process data corresponding to the target training model into visualization data, and feeds the visualization data back to the client. The visualization data includes candidate feature visualization data, model selection summary visualization data, and model training process visualization data. The candidate features are the features to be trained, and the model selection summary data summarizes the model selection performed on the model elements of the cyclic training model set under the preset model selection strategy.
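As an illustration of the configuration parameters the client sends, the following sketch serializes them to JSON. The application does not specify a wire format, so the use of JSON, the class name, every field name, and every default value here are hypothetical.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ModelSelectionConfig:
    """Hypothetical container for the configuration parameters."""

    max_iterations: int = 100
    min_convergence_error: float = 1e-6
    selection_mode: str = "backward"
    selection_strategy: str = "AIC"  # "AUC" or "AIC"
    removal_threshold: float = 0.95

    def validate(self) -> None:
        if self.max_iterations <= 0:
            raise ValueError("max_iterations must be positive")
        if self.min_convergence_error <= 0:
            raise ValueError("min_convergence_error must be positive")
        if self.selection_strategy not in ("AUC", "AIC"):
            raise ValueError("selection_strategy must be 'AUC' or 'AIC'")


def to_payload(config: ModelSelectionConfig) -> str:
    """Validate the configuration and serialize it for sending to the server."""
    config.validate()
    return json.dumps(asdict(config), sort_keys=True)
```

Validating on the client before sending keeps malformed tasks from reaching the server, which fits the stated goal of requiring no coding from the user.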
Step A20: receive the visualization data fed back by the server and display it on a preset visualization interface.
In this embodiment, it should be noted that the client can query, in real time on the preset visualization interface, the visualization data corresponding to the process data on the server, and can do so either during model selection or after model selection has finished. The client and the server are communicatively connected.
In this embodiment, a model selection task is received and the corresponding configuration parameters are sent to the server associated with the client, so that the server performs model selection based on the configuration parameters, obtains the target training model, obtains the corresponding visualization data, and sends it to the client; the client then receives the visualization data and displays it on the preset visualization interface. In other words, this embodiment provides a code-free, distributed, and visual model selection method: the user only needs to set the necessary configuration parameters on the client and send them to the server, and the server feeds back the corresponding visualization data. Distributed modeling improves modeling efficiency during model selection, and because the process requires no coding ability from the user, the skill threshold for modelers is lowered. Because the server converts the process data corresponding to the target training model into visualization data and feeds it back to the client, the threshold is lowered further; the visualization data is also easy for modelers to read and understand, which further improves their modeling efficiency. The technical problems of the high modeling threshold and low efficiency of the backward selection mode in the prior art are thereby solved.
Referring to FIG. 6, FIG. 6 is a schematic structural diagram of the device in the hardware operating environment involved in the embodiments of the present application.
As shown in FIG. 6, the backward model selection device may include a processor 1001 (for example, a CPU), a memory 1005, and a communication bus 1002. The communication bus 1002 implements the connection and communication between the processor 1001 and the memory 1005. The memory 1005 may be high-speed RAM or non-volatile memory, such as disk storage. Optionally, the memory 1005 may also be a storage device independent of the processor 1001.
In an embodiment, the backward model selection device may further include a rectangular user interface, a network interface, a camera, an RF (Radio Frequency) circuit, sensors, an audio circuit, a WiFi module, and so on. The rectangular user interface may include a display and an input sub-module such as a keyboard, and may optionally also include a standard wired interface and a wireless interface. The network interface may optionally include a standard wired interface and a wireless interface (such as a Wi-Fi interface).
Those skilled in the art will understand that the structure shown in FIG. 6 does not limit the backward model selection device, which may include more or fewer components than shown, combine certain components, or arrange the components differently.
As shown in FIG. 6, the memory 1005, as a computer-readable storage medium, may include an operating system, a network communication module, and a backward model selection program. The operating system is a program that manages and controls the hardware and software resources of the backward model selection device and supports the running of the backward model selection program and other software and/or programs. The network communication module implements communication between the components inside the memory 1005, and communication with other hardware and software in the backward model selection system.
In the backward model selection device shown in FIG. 6, the processor 1001 executes the backward model selection program stored in the memory 1005 to implement the steps of the backward model selection method described in any of the foregoing embodiments.
The specific implementation of the backward model selection device of the present application is essentially the same as the embodiments of the backward model selection method described above and is not repeated here.
An embodiment of the present application further provides a backward model selection apparatus, which is applied to a server and includes:
a first training module, configured to receive the configuration parameters sent by the client associated with the server, acquire the features to be trained, and train a preset model to be trained based on the features to be trained and the configuration parameters to obtain a first initial training model;
a second training module, configured to calculate the first significance corresponding to each feature to be trained and, based on the first significances, remove from the features to be trained the feature to be removed that meets the preset removal significance requirement, so as to cyclically train the first initial training model on the remaining features to be trained and obtain a cyclic training model set;
a selection module, configured to select a target training model from the first initial training model and the cyclic training model set based on the configuration parameters;
a feedback module, configured to generate the visualization data corresponding to the target training model and feed the visualization data back to the client.
In an embodiment, the second training module includes:
a first removal sub-module, configured to select the feature to be removed from the features to be trained based on the first significances and the preset removal significance requirement, and to remove the feature to be removed;
a training sub-module, configured to train the first initial training model on the remaining features to be trained to obtain the second initial training model;
a second removal sub-module, configured to calculate the second significance of each remaining feature to be trained and, based on the second significances, remove from the remaining features any other feature to be removed that meets the preset removal significance requirement;
a cyclic training sub-module, configured to cyclically train the second initial training model on the features to be trained that remain after the further removal, obtaining one or more of the model elements, until no feature to be removed remains among the features to be trained.
In an embodiment, the selection sub-module includes:
a first comparison unit, configured to compare the first significances and select, from the features to be trained, the feature with the lowest significance as the target feature;
a second comparison unit, configured to compare the target significance of the target feature with the preset removal significance threshold;
a determination unit, configured to determine, if the target significance is less than the preset removal significance threshold, that the target feature meets the preset removal significance requirement, and to take the target feature as the feature to be removed.
In an embodiment, the second training module further includes:
a first calculation sub-module, configured to calculate the Wald chi-square value of each feature to be trained;
a second calculation sub-module, configured to calculate each first significance based on the Wald chi-square value and the degrees of freedom of the corresponding feature to be trained.
In an embodiment, the first training module includes:
a training update sub-module, configured to input the feature data corresponding to each feature to be trained into the preset model to be trained, so as to train and update the preset model to be trained;
a first judgment sub-module, configured to judge whether the updated preset model to be trained satisfies the training-completion condition and, if it does, to obtain the first initial training model;
a second judgment sub-module, configured to continue the iterative training update of the preset model to be trained if the updated model does not satisfy the training-completion condition, until the updated preset model to be trained satisfies the training-completion condition.
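The train-then-check cycle performed by these sub-modules can be sketched as a loop with two stopping conditions. This is a minimal illustration: `step` is a hypothetical callback that performs one training update and reports the current convergence error, and the parameters are represented by a single number only to keep the example small.

```python
from typing import Callable, Tuple


def train_until_done(
    params: float,
    step: Callable[[float], Tuple[float, float]],
    max_iterations: int,
    min_convergence_error: float,
) -> Tuple[float, int]:
    """Apply training updates until the error is small enough or the budget runs out.

    Returns the final parameters and the number of updates performed.
    """
    iterations = 0
    while iterations < max_iterations:
        params, error = step(params)  # one training update
        iterations += 1
        if error <= min_convergence_error:
            break  # minimum convergence error reached
    return params, iterations
```

For example, with a toy update that halves the parameter and reports its magnitude as the error, `train_until_done(1.0, lambda x: (x / 2, abs(x / 2)), 100, 0.1)` stops after four updates.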
In an embodiment, the selection module includes:
a first acquisition sub-module, configured to acquire the model selection strategy in the configuration parameters, where the model selection strategy includes an AUC value and an AIC value;
a first comparison sub-module, configured to compare, if the model selection strategy is the AUC value, the AUC values of the elements in the cyclic training model set and select the element with the largest AUC value as the target training model;
a second comparison sub-module, configured to compare, if the model selection strategy is the AIC value, the AIC values of the elements in the cyclic training model set and select the element with the smallest AIC value as the target training model.
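The two selection strategies amount to a maximum over AUC values or a minimum over AIC values. The sketch below assumes each candidate model is represented by a dictionary with precomputed `"auc"` and `"aic"` entries; that representation and the function name are hypothetical.

```python
from typing import Dict, List


def select_target_model(candidates: List[Dict], strategy: str) -> Dict:
    """Pick the candidate with the largest AUC or the smallest AIC."""
    if not candidates:
        raise ValueError("candidates must not be empty")
    if strategy == "AUC":
        return max(candidates, key=lambda model: model["auc"])
    if strategy == "AIC":
        return min(candidates, key=lambda model: model["aic"])
    raise ValueError("strategy must be 'AUC' or 'AIC'")
```

The two strategies can disagree: AUC rewards discrimination alone, while AIC also penalizes the number of parameters, so the AIC choice tends toward models with fewer features.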
In an embodiment, the feedback module includes:
a second acquisition sub-module, configured to acquire the candidate feature data, selection summary data, and training process data corresponding to the backward model selection process of the target training model;
a generation sub-module, configured to generate the visualization data jointly corresponding to the candidate feature data, the selection summary data, and the training process data, and to feed the visualization data back to the visualization interface in real time.
The specific implementation of the backward model selection apparatus of the present application is essentially the same as the embodiments of the backward model selection method described above and is not repeated here.
To achieve the foregoing objective, an embodiment of the present application further provides a backward model selection apparatus, which is applied to a client and includes:
a sending module, configured to receive a model selection task and send the configuration parameters corresponding to the model selection task to the server associated with the client, so that the server performs model selection based on the configuration parameters and the acquired features to be trained, obtains a target training model, obtains the visualization data corresponding to the target training model, and sends the visualization data to the client;
a receiving module, configured to receive the visualization data fed back by the server and display it on a preset visualization interface.
The specific implementation of the backward model selection apparatus of the present application is essentially the same as the embodiments of the backward model selection method described above and is not repeated here.
An embodiment of the present application provides a readable storage medium that stores one or more programs, and the one or more programs can be executed by one or more processors to implement the steps of the backward model selection method described in any of the foregoing embodiments.
The specific implementation of the readable storage medium of the present application is essentially the same as the embodiments of the backward model selection method described above and is not repeated here.
The above are only preferred embodiments of the present application and do not limit its patent scope. Any equivalent structure or equivalent process transformation made using the content of the description and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included in the scope of patent protection of the present application.

Claims (20)

  1. A backward model selection method, wherein the backward model selection method is applied to a server and includes:
    receiving configuration parameters sent by a client associated with the server, acquiring features to be trained, and training a preset model to be trained based on the features to be trained and the configuration parameters to obtain a first initial training model;
    calculating a first significance corresponding to each of the features to be trained and, based on the first significances, removing from the features to be trained a feature to be removed that meets a preset removal significance requirement, so as to cyclically train the first initial training model on the remaining features to be trained and obtain a cyclic training model set;
    selecting, based on the configuration parameters, a target training model from the first initial training model and the cyclic training model set;
    generating visualization data corresponding to the target training model and feeding the visualization data back to the client.
  2. The backward model selection method according to claim 1, wherein the cyclic training model set includes one or more model elements, and the model elements include a second initial training model,
    and the step of removing, based on the first significances, from the features to be trained the feature to be removed that meets the preset removal significance requirement, so as to cyclically train the first initial training model on the remaining features to be trained and obtain the cyclic training model set, includes:
    selecting the feature to be removed from the features to be trained based on the first significances and the preset removal significance requirement, and removing the feature to be removed;
    training the first initial training model on the remaining features to be trained to obtain the second initial training model;
    calculating a second significance of each remaining feature to be trained and, based on the second significances, removing from the remaining features to be trained any other feature to be removed that meets the preset removal significance requirement;
    cyclically training the second initial training model on the features to be trained that remain after the further removal, obtaining one or more of the model elements, until no feature to be removed remains among the features to be trained.
  3. The backward model selection method according to claim 2, wherein the step of selecting the feature to be removed from the features to be trained based on the first significances and the preset removal significance requirement includes:
    comparing the first significances, and selecting from the features to be trained the feature with the lowest significance as a target feature;
    comparing the target significance of the target feature with a preset removal significance threshold;
    if the target significance is less than the preset removal significance threshold, determining that the target feature meets the preset removal significance requirement, and taking the target feature as the feature to be removed.
  4. The backward model selection method according to claim 1, wherein the step of calculating the first significance corresponding to each of the features to be trained includes:
    calculating the Wald chi-square value of each of the features to be trained;
    calculating each first significance based on the Wald chi-square value and the degrees of freedom of the corresponding feature to be trained.
  5. The backward model selection method according to claim 1, wherein the configuration parameters include a training-completion condition, and the features to be trained include one or more pieces of feature data;
    the step of training the preset model to be trained based on the features to be trained and the configuration parameters to obtain the first initial training model includes:
    inputting the feature data corresponding to each of the features to be trained into the preset model to be trained, so as to train and update the preset model to be trained;
    judging whether the updated preset model to be trained satisfies the training-completion condition and, if it does, obtaining the first initial training model;
    if the updated preset model to be trained does not satisfy the training-completion condition, continuing the iterative training update of the preset model to be trained until the updated preset model to be trained satisfies the training-completion condition.
  6. The backward model selection method according to claim 1, wherein the step of selecting, based on the configuration parameters, the target training model from the first initial training model and the cyclic training model set includes:
    acquiring a model selection strategy in the configuration parameters, wherein the model selection strategy includes an AUC value and an AIC value;
    if the model selection strategy is the AUC value, comparing the AUC values of the elements in the cyclic training model set and selecting the element with the largest AUC value as the target training model;
    if the model selection strategy is the AIC value, comparing the AIC values of the elements in the cyclic training model set and selecting the element with the smallest AIC value as the target training model.
  7. The backward model selection method according to claim 1, wherein the client includes a visualization interface,
    the step of generating visualization data corresponding to the target training model and feeding the visualization data back to the client comprises:
    acquiring candidate feature data, selection summary data, and training process data corresponding to the model selection process of the target training model;
    generating visualization data jointly corresponding to the candidate feature data, the selection summary data, and the training process data, and feeding the visualization data back to the visualization interface in real time.
  8. The backward model selection method according to claim 5, wherein the training completion judgment condition includes reaching a minimum convergence error and reaching a maximum number of iterations.
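The iterative training of claim 5 with the completion conditions of claim 8 can be illustrated as follows. This is a sketch under stated assumptions, not the applicant's implementation: it assumes a one-feature logistic regression trained by plain gradient descent, measures the "convergence error" as the change in average log loss between iterations, and treats the two conditions as alternatives (whichever is reached first). The names `train_until_done`, `lr`, `min_error`, and `max_iter` are hypothetical.

```python
import math


def train_until_done(xs, ys, lr=0.5, min_error=1e-6, max_iter=1000):
    """Train a one-feature logistic regression until a stop condition holds.

    Returns (weight, bias, iterations_used, reason), where reason is
    "converged" or "max_iter".
    """
    w, b = 0.0, 0.0
    n = len(xs)
    prev_loss = float("inf")
    for iteration in range(1, max_iter + 1):
        # Forward pass: predicted probabilities and average log loss.
        ps = [1.0 / (1.0 + math.exp(-(w * x + b))) for x in xs]
        loss = -sum(
            y * math.log(p) + (1 - y) * math.log(1 - p) for p, y in zip(ps, ys)
        ) / n
        # Completion condition 1: the loss has stopped changing.
        if abs(prev_loss - loss) < min_error:
            return w, b, iteration, "converged"
        prev_loss = loss
        # Otherwise update the parameters and keep iterating.
        grad_w = sum((p - y) * x for p, y, x in zip(ps, ys, xs)) / n
        grad_b = sum(p - y for p, y in zip(ps, ys)) / n
        w -= lr * grad_w
        b -= lr * grad_b
    # Completion condition 2: the iteration budget is used up.
    return w, b, max_iter, "max_iter"


# Small, non-separable toy data set (illustrative only).
xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 1, 0, 1, 1]
```

Calling `train_until_done(xs, ys, max_iter=3)` stops on the iteration budget, while a looser `min_error` lets the loop stop on convergence well before the budget is reached.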
  9. The backward model selection method according to claim 4, wherein the non-significant feature refers to a feature, among the features to be trained, whose significance is lower than a preset significance threshold, wherein the significance is obtained based on the Wald chi-square value and the degrees of freedom of the feature to be trained, and the degrees of freedom are related to the values taken by the feature to be trained.
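One common way to turn a Wald chi-square value and its degrees of freedom into a significance measure is the upper-tail probability (p-value) of the chi-square distribution. The sketch below assumes integer degrees of freedom and uses only the Python standard library; the function name `chi2_sf` is hypothetical. The claims do not state whether the "significance" is this p-value or a quantity derived from it (for example its complement), so that mapping is left open here.

```python
import math


def chi2_sf(x, df):
    """Upper-tail probability P(X >= x) of a chi-square variable.

    Valid for integer degrees of freedom df >= 1. Uses the recurrence
    Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1) for the
    regularized upper incomplete gamma function, with z = x / 2.
    """
    if df < 1 or int(df) != df:
        raise ValueError("df must be a positive integer")
    if x <= 0:
        return 1.0
    z = x / 2.0
    if df % 2 == 0:
        a = 1.0
        q = math.exp(-z)                 # Q(1, z)
    else:
        a = 0.5
        q = math.erfc(math.sqrt(z))      # Q(1/2, z)
    while a < df / 2.0:
        q += math.exp(a * math.log(z) - z - math.lgamma(a + 1.0))
        a += 1.0
    return q
```

For a feature with one degree of freedom, a Wald chi-square value near 3.84 corresponds to an upper-tail probability of about 0.05; larger chi-square values give smaller tail probabilities.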
  10. The backward model selection method according to claim 1, wherein the first significance is determined based on a Pearson correlation value, and when the Pearson correlation value is less than or equal to a preset Pearson correlation threshold, it is determined that the feature corresponding to the first significance does not meet the preset removal significance requirement.
  11. The backward model selection method according to claim 4, wherein calculating the Wald chi-square value of each of the features to be trained comprises:
    substituting the feature data representation matrix corresponding to each of the features to be trained into a preset Wald chi-square calculation formula, and computing the Wald chi-square value of each of the features to be trained in a distributed, parallel manner, wherein the preset Wald chi-square calculation formula is as follows:
    Figure PCTCN2020134736-appb-100001
    wherein,
    Figure PCTCN2020134736-appb-100002
    S is the Wald chi-square value, and X is the feature data corresponding to the features to be trained, where X includes n records and each record includes k values; the feature set X to be trained is divided into a first model feature set X_0 and a second model feature set X_1, where X_0 includes n records of (k-t) values each and X_1 includes n records of t values each; θ denotes the model parameters obtained by training the preset model to be trained on X; Y is the data set corresponding to the target output of the model to be trained and includes n records; P is the predicted probability corresponding to Y and includes n probabilities (p_1, p_2, ..., p_{n-1}, p_n); C is a t*k matrix; and h is a k*1 vector.
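The two formula figures referenced in claim 11 are images and are not reproduced in this text. For orientation only, the textbook Wald statistic for testing a linear hypothesis Cθ = h on a fitted logistic regression is shown below. This is an assumption about the general form of such a statistic, not a reconstruction of the application's figures; the weight matrix W and the estimate \hat{\theta} are notation introduced here, and in this textbook form h has t entries.

```latex
S = (C\hat{\theta} - h)^{\top}
    \left[ C \,\bigl(X^{\top} W X\bigr)^{-1} C^{\top} \right]^{-1}
    (C\hat{\theta} - h),
\qquad
W = \operatorname{diag}\bigl(p_1(1-p_1),\, \dots,\, p_n(1-p_n)\bigr)
```

Under the null hypothesis, and with C of full row rank, S is asymptotically chi-square distributed with t degrees of freedom, which is why a chi-square value and a degrees-of-freedom count together determine a significance level.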
  12. The backward model selection method according to claim 4, wherein calculating the second significance of each of the features to be trained remaining after removal comprises:
    recalculating the Wald chi-square value of each of the features to be trained remaining after removal, and calculating the second significance of each remaining feature to be trained based on the recalculated Wald chi-square values and the degrees of freedom of the remaining features to be trained.
  13. The backward model selection method according to claim 6, wherein the AUC value is the area enclosed by the ROC curve and the coordinate axes, and the value of the area is less than or equal to 1.
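The AUC value of claim 13 can be computed without drawing the ROC curve, because the area under the curve equals the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The sketch below uses this pairwise form; the function name `auc_score` and the 0/1 label encoding are assumptions made for the example.

```python
def auc_score(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores.

    labels: iterable of 0 (negative) or 1 (positive)
    scores: predicted scores, higher meaning "more likely positive"
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0      # correctly ordered pair
            elif p == q:
                wins += 0.5      # tie counts as half
    return wins / (len(pos) * len(neg))
```

Because the result is a fraction of pairs, it always lies between 0 and 1, consistent with the bound stated in the claim. The pairwise loop is quadratic in the sample count, so a sort-based variant would be preferable for large data sets.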
  14. A backward model selection method, wherein the backward model selection method is applied to a client, and the backward model selection method comprises:
    receiving a model selection task, and sending the configuration parameters corresponding to the model selection task to a server associated with the client, so that the server performs model selection based on the configuration parameters and acquired features to be trained, obtains a target training model, and acquires visualization data corresponding to the target training model so as to send the visualization data to the client;
    receiving the visualization data fed back by the server, and displaying the visualization data on a preset visualization interface.
  15. A backward model selection device, wherein the backward model selection device comprises a memory, a processor, and a program stored in the memory for implementing the backward model selection method,
    the memory is configured to store the program for implementing the backward model selection method;
    the processor is configured to execute the program for implementing the backward model selection method, so as to implement the backward model selection method, the backward model selection method comprising: receiving configuration parameters sent by a client associated with the server and acquiring features to be trained, and training a preset model to be trained based on the features to be trained and the configuration parameters to obtain a first initial training model;
    calculating a first significance corresponding to each of the features to be trained, and, based on the first significances, removing from the features to be trained a feature to be removed that meets a preset removal significance requirement, so as to cyclically train the first initial training model based on the remaining features to be trained and obtain a cyclic training model set;
    selecting a target training model from the first initial training model and the cyclic training model set based on the configuration parameters;
    generating visualization data corresponding to the target training model, and feeding the visualization data back to the client.
  16. The backward model selection device according to claim 15, wherein the processor is configured to execute the program for implementing the backward model selection method, so as to implement the following steps: the cyclic training model set includes one or more model elements, and each of the model elements includes a second initial training model,
    the step of removing, based on the first significances, from the features to be trained a feature to be removed that meets the preset removal significance requirement, so as to cyclically train the first initial training model based on the remaining features to be trained and obtain the cyclic training model set comprises:
    selecting the feature to be removed from the features to be trained based on the first significances and the preset removal significance requirement, and removing the feature to be removed;
    training the first initial training model based on the remaining features to be trained to obtain the second initial training model;
    calculating a second significance of each of the remaining features to be trained, and, based on the second significances, again removing from the remaining features to be trained any other feature to be removed that meets the preset removal significance requirement;
    cyclically training the second initial training model based on the features to be trained remaining after the further removal, to obtain one or more of the model elements, until no feature to be removed remains among the features to be trained.
  17. The backward model selection device according to claim 16, wherein the processor is configured to execute the program for implementing the backward model selection method, so as to implement the following steps:
    the step of selecting the feature to be removed from the features to be trained based on the first significances and the preset removal significance requirement comprises:
    comparing the first significances, so as to select the feature with the lowest significance among the features to be trained as a target feature;
    comparing the target significance of the target feature with a preset removal significance threshold;
    if the target significance is less than the preset removal significance threshold, determining that the target feature meets the preset removal significance requirement, and taking the target feature as the feature to be removed.
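The elimination loop described in claims 16 and 17 can be sketched as follows. This is an illustrative outline under stated assumptions rather than the claimed implementation: `fit` is a hypothetical callable that trains a model on a feature list and returns the model together with a per-feature significance mapping, the name `backward_select` is invented for the example, and the guard that keeps at least one feature is an added assumption.

```python
def backward_select(features, fit, threshold):
    """Backward elimination driven by per-feature significance.

    fit(features) -> (model, {feature: significance}) is supplied by the
    caller. Returns a list of (features, model) pairs: the first entry plays
    the role of the first initial training model, the rest the role of the
    cyclic training model set.
    """
    remaining = list(features)
    model, significance = fit(remaining)
    history = [(tuple(remaining), model)]
    while len(remaining) > 1:  # assumption: never remove the last feature
        # Pick the feature with the lowest significance as the target.
        target = min(remaining, key=lambda f: significance[f])
        # Stop when even the weakest feature clears the removal threshold.
        if significance[target] >= threshold:
            break
        remaining.remove(target)
        # Retrain on the remaining features and recompute significances.
        model, significance = fit(remaining)
        history.append((tuple(remaining), model))
    return history


# Toy stand-in for training: fixed, made-up significance values.
TOY_SIGNIFICANCE = {"age": 0.9, "income": 0.8, "noise_a": 0.01, "noise_b": 0.2}


def toy_fit(feats):
    """Return a placeholder model and the toy significances for feats."""
    return {"features": tuple(feats)}, {f: TOY_SIGNIFICANCE[f] for f in feats}
```

Only one feature is removed per round, so the significances are always recomputed on the current feature set before the next removal decision; in a real model the significance of a feature can change once a correlated feature has been dropped.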
  18. The backward model selection device according to claim 15, wherein the processor is configured to execute the program for implementing the backward model selection method, so as to implement the following steps:
    the step of calculating the first significance corresponding to each of the features to be trained comprises:
    calculating the Wald chi-square value of each of the features to be trained;
    calculating each of the first significances based on the Wald chi-square values and the degrees of freedom of the features to be trained.
  19. The backward model selection device according to claim 15, wherein the processor is configured to execute the program for implementing the backward model selection method, so as to implement the following steps:
    the configuration parameters include a training completion judgment condition, and the features to be trained include one or more pieces of feature data;
    the step of training the preset model to be trained based on the features to be trained and the configuration parameters to obtain the first initial training model comprises:
    inputting the feature data corresponding to each of the features to be trained into the preset model to be trained, so as to train and update the preset model to be trained;
    determining whether the updated preset model to be trained satisfies the training completion judgment condition, and if the updated preset model to be trained satisfies the training completion judgment condition, obtaining the first initial training model;
    if the updated preset model to be trained does not satisfy the training completion judgment condition, continuing to iteratively train and update the preset model to be trained until the updated preset model to be trained satisfies the training completion judgment condition.
  20. A readable storage medium, wherein a program for implementing the backward model selection method is stored on the readable storage medium, and the program for implementing the backward model selection method is executed by a processor to implement the steps of the backward model selection method according to any one of claims 1 to 13 or claim 14.
PCT/CN2020/134736 2020-01-09 2020-12-09 Backward model selection method and device, and readable storage medium WO2021139465A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010024439.3 2020-01-09
CN202010024439.3A CN111210022B (en) 2020-01-09 2020-01-09 Backward model selecting method, apparatus and readable storage medium

Publications (1)

Publication Number Publication Date
WO2021139465A1 (en) 2021-07-15

Family

ID=70786101

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/134736 WO2021139465A1 (en) 2020-01-09 2020-12-09 Backward model selection method and device, and readable storage medium

Country Status (2)

Country Link
CN (1) CN111210022B (en)
WO (1) WO2021139465A1 (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111210022B (en) * 2020-01-09 2024-05-17 深圳前海微众银行股份有限公司 Backward model selecting method, apparatus and readable storage medium
CN111241746B (en) * 2020-01-09 2024-01-26 深圳前海微众银行股份有限公司 Forward model selection method, apparatus, and readable storage medium
CN112434620B (en) * 2020-11-26 2024-03-01 新奥新智科技有限公司 Scene text recognition method, device, equipment and computer readable medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108830380A (en) * 2018-04-11 2018-11-16 开放智能机器(上海)有限公司 A kind of training pattern generation method and system based on cloud service
US20180374104A1 (en) * 2017-06-26 2018-12-27 Sap Se Automated learning of data aggregation for analytics
CN110298389A (en) * 2019-06-11 2019-10-01 上海冰鉴信息科技有限公司 More wheels circulation feature selection approach and device when training pattern
CN110543946A (en) * 2018-05-29 2019-12-06 百度在线网络技术(北京)有限公司 method and apparatus for training a model
CN111210022A (en) * 2020-01-09 2020-05-29 深圳前海微众银行股份有限公司 Backward model selection method, device and readable storage medium

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875289B (en) * 2017-05-08 2021-12-14 腾讯科技(深圳)有限公司 Algorithm debugging method, client, background server and system
US10600005B2 (en) * 2018-06-01 2020-03-24 Sas Institute Inc. System for automatic, simultaneous feature selection and hyperparameter tuning for a machine learning model
CN110378472A (en) * 2019-07-24 2019-10-25 苏州浪潮智能科技有限公司 A kind of data parallel training method, device and the equipment of deep neural network model

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180374104A1 (en) * 2017-06-26 2018-12-27 Sap Se Automated learning of data aggregation for analytics
CN108830380A (en) * 2018-04-11 2018-11-16 开放智能机器(上海)有限公司 A kind of training pattern generation method and system based on cloud service
CN110543946A (en) * 2018-05-29 2019-12-06 百度在线网络技术(北京)有限公司 method and apparatus for training a model
CN110298389A (en) * 2019-06-11 2019-10-01 上海冰鉴信息科技有限公司 More wheels circulation feature selection approach and device when training pattern
CN111210022A (en) * 2020-01-09 2020-05-29 深圳前海微众银行股份有限公司 Backward model selection method, device and readable storage medium

Also Published As

Publication number Publication date
CN111210022B (en) 2024-05-17
CN111210022A (en) 2020-05-29

Similar Documents

Publication Publication Date Title
WO2021139465A1 (en) Backward model selection method and device, and readable storage medium
WO2021139462A1 (en) Stepwise model selection method and device, and readable storage medium
WO2019233421A1 (en) Image processing method and device, electronic apparatus, and storage medium
US20190378044A1 (en) Processing dynamic data within an adaptive oracle-trained learning system using curated training data for incremental re-training of a predictive model
WO2022077646A1 (en) Method and apparatus for training student model for image processing
WO2019214344A1 (en) System reinforcement learning method and apparatus, electronic device, and computer storage medium
WO2020007177A1 (en) Quotation method executed by computer, quotation device, electronic device and storage medium
US11687804B2 (en) Latent feature dimensionality bounds for robust machine learning on high dimensional datasets
Hvarfner et al. Joint entropy search for maximally-informed Bayesian optimization
CN113326852A (en) Model training method, device, equipment, storage medium and program product
WO2021139483A1 (en) Forward model selection method and device, and readable storage medium
CN113222149A (en) Model training method, device, equipment and storage medium
US20220309779A1 (en) Neural network training and application method, device and storage medium
Maire et al. Adaptive incremental mixture markov chain monte carlo
Lin et al. Plug-in performative optimization
US11847599B1 (en) Computing system for automated evaluation of process workflows
CN111161238A (en) Image quality evaluation method and device, electronic device, and storage medium
US20220351533A1 (en) Methods and systems for the automated quality assurance of annotated images
US11762562B2 (en) Performance analysis apparatus and performance analysis method
CN113793298A (en) Pulmonary nodule detection model construction optimization method, equipment, storage medium and product
CN112070162A (en) Multi-class processing task training sample construction method, device and medium
CN113361402B (en) Training method of recognition model, method, device and equipment for determining accuracy
US20230195838A1 (en) Discovering distribution shifts in embeddings
Liu et al. A fuzzy density peak optimization initial centers selection for k-medoids clustering algorithm
CN114581751B (en) Training method of image recognition model, image recognition method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20911889

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20911889

Country of ref document: EP

Kind code of ref document: A1