CN117436029A - Method and system for serial collaborative fusion of multiple large models - Google Patents
- Publication number
- CN117436029A (application CN202311443875.4A)
- Authority
- CN
- China
- Prior art keywords
- model
- models
- data set
- training
- fusion
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
- G06F18/15—Statistical pre-processing, e.g. techniques for normalisation or restoring missing data
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/217—Validation; Performance evaluation; Active pattern learning techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/25—Fusion techniques
- G06F18/259—Fusion by voting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/27—Regression, e.g. linear or logistic regression
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/01—Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
Abstract
The invention discloses a method for the serial collaborative fusion of a plurality of large models, which adopts a weighted-average approach to realize the serial collaborative fusion of the large models and comprises steps S1 to S6. Even if one model performs poorly in a particular situation, the other models can still provide reliable prediction results; the complexity of the model can be reduced step by step, which lowers the risk of overfitting and improves the generalization ability of the model; and multi-model serial collaborative fusion provides richer combinations of models and feature-selection modes, improving the interpretability of the model.
Description
Technical Field
The invention relates to the technical field of multi-model serial collaborative fusion, and in particular to a method and system for the serial collaborative fusion of a plurality of large models.
Background
Multi-model serial collaborative fusion technology is used for modeling and predicting complex problems. Based on the idea of serial collaboration, a plurality of large models collaborate and are fused sequentially in a certain order, and the final prediction result is obtained through continuous iteration and interactive updating, thereby improving the accuracy and generalization ability of the models.
In the prior art, multi-model fusion requires several models to be trained and optimized at the same time, which increases the complexity of training and optimization, and running several models simultaneously consumes more computing resources and memory. Moreover, dependencies exist among the models: the output of one model can influence the input of a subsequent model, which may cause transfer errors to accumulate between models and in turn degrade overall predictive performance. A method and system for the serial collaborative fusion of a plurality of large models is therefore needed.
Disclosure of Invention
The invention aims to provide a method and system for the serial collaborative fusion of a plurality of large models. By connecting different types of models in series, multi-model serial collaborative fusion can fully exploit the advantages of each model, compensate for the shortcomings of any single model, reduce the influence of an individual model's occasional errors on the result through the joint effort of multiple models, and improve the robustness of the overall model. Even if one model performs poorly in a particular situation, the other models can still provide reliable prediction results; the complexity of the model can be reduced step by step, which lowers the risk of overfitting and improves generalization; and multi-model serial collaborative fusion provides richer combinations of models and feature-selection modes, increasing the interpretability of the model and thereby solving the problems described in the background section.
In order to achieve the above purpose, the present invention provides the following technical solution: a method that adopts a weighted-average approach to realize the serial collaborative fusion of a plurality of large models, comprising the following steps:
S1, preparing a training data set and a test data set;
S2, selecting a plurality of basic models and assigning a weight to each basic model;
S3, training each basic model with the training data set to obtain its prediction results;
S4, computing the weighted average of the basic models' prediction results according to the weights to obtain the final prediction result;
S5, predicting on the test data set with the fused model to obtain the final prediction result;
S6, evaluating the fused model using accuracy, recall, and F1 value as evaluation metrics, and adjusting and optimizing the model according to the evaluation results.
Preferably, in step S1, preparing the training data set and the test data set comprises the following steps:
A1, data collection: first, enough data must be collected for training and testing;
A2, data cleaning and preprocessing: cleaning and preprocessing the collected data, including removing duplicate data, handling missing values, and handling outliers;
A3, data division: dividing the cleaned and preprocessed data set into a training data set and a test data set;
A4, data set balancing: if the data set suffers from class imbalance, i.e. some classes have only a small number of samples, undersampling and oversampling methods can be used to address the imbalance.
Preferably, in step S2, in selecting the base model, the following factors are considered:
model type: selecting different types of models, and modeling data from different angles;
model performance: evaluating according to the accuracy of the model on a training data set or other evaluation indexes, and selecting a model with good performance;
model variability: the selected models have differences and are used for providing diversified prediction results;
model complexity: models of different complexity are selected, and a trade-off is made between the diversity of model predictions and the computational complexity of the model.
Preferably, in step S2, weights may be assigned to each basic model by any one of the following schemes: uniform assignment, performance-based assignment, confidence-based assignment, or experience-based assignment;
wherein, evenly distributing weights: adopting an arithmetic average mode to distribute equal weight to each basic model;
weight is assigned based on performance: distributing weights to the models according to the performance of each basic model on the training data set;
weights are assigned based on confidence: according to the confidence level or reliability of each basic model, a weight is distributed to the models;
weights are assigned based on experience: each base model is assigned a weight based on experience or domain knowledge.
Preferably, in step S3, the following detailed steps are included:
b1, selecting a proper basic model according to the properties of tasks and data;
b2, training a basic model: for each basic model, training by using a training data set, training the model by using a corresponding training method and an optimization algorithm according to the type and algorithm of the model, inputting the input characteristics and the target variable of the training data set into the model in the training process, and updating the parameters of the model according to the difference between the output of the model and the target variable;
b3, obtaining a prediction result of the basic model: predicting a training data set by using the trained basic models, inputting the input features of the training data set into each basic model, and obtaining a prediction result of the model;
and B4, repeating the step B2 and the step B3 until all the basic models are trained and the prediction results are obtained.
Preferably, in step S4, the prediction result of each basic model is multiplied by the weight corresponding to the prediction result, and all the weighted results are added to obtain a final prediction result;
the final prediction result equals Σ(weight × base-model prediction), where Σ denotes summation over all base models.
Preferably, in step S5, the following detailed steps are included:
c1, performing feature scaling, standardization and coding feature processing on a test data set;
and C2, inputting the test data set subjected to the feature processing into a final model to obtain a prediction result.
The invention also provides a system for the serial collaborative fusion of a plurality of large models, which comprises a main control module together with a data processing module, a model training module, a model prediction module, a model fusion module, a performance evaluation module, and a visualization and result display module, each of which is connected to the main control module.
Preferably, the data processing module: the method comprises the steps of preprocessing original data and characteristic processing, wherein the preprocessing comprises data cleaning, repeated data removal, missing value processing and abnormal value processing, and the characteristic processing comprises characteristic selection, characteristic transformation, characteristic coding and data standardization;
the model training module: for training a plurality of independent basic models of different types, including linear models, tree models, and neural networks;
the model prediction module: the method comprises the steps of predicting test data, predicting the test data by using a trained basic model, and generating a prediction result of the basic model;
the model fusion module: the prediction results of the plurality of basic models are fused;
the performance evaluation module: for evaluating the performance of the fused model, calculating the model's accuracy, recall, and F1 value, and helping the user select an optimal model fusion strategy;
the visualization and result display module: the method is used for visually displaying the processes and results of data processing, model training and model fusion, providing visual charts and visual tools, and helping users understand and analyze the performance and results of the models.
Compared with the prior art, the invention has the beneficial effects that:
the multi-model serial collaborative fusion can fully exert the advantages of each model by connecting different types of models in series, make up the defect of a single model, reduce the influence of accidental errors of individual models on results through the common effort of a plurality of models, and improve the robustness of the whole model. Even if a certain model performs poorly in a particular situation, other models can still provide reliable prediction results; the complexity of the model can be gradually reduced, the risk of overfitting is reduced, and the generalization capability of the model can be improved; the multi-model serial collaboration fusion can provide richer model combination and feature selection modes, and the interpretability of the model is improved.
Drawings
FIG. 1 is a flow chart of the present invention;
FIG. 2 is a flow chart of preparing the training data set and the test data set according to the present invention;
FIG. 3 is a block flow diagram of the model training of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
Referring to figs. 1-3, the present invention provides a technical solution: a method that adopts a weighted-average approach to realize the serial collaborative fusion of a plurality of large models, comprising the following steps:
S1, preparing a training data set and a test data set;
S2, selecting a plurality of basic models and assigning a weight to each basic model;
S3, training each basic model with the training data set to obtain its prediction results;
S4, computing the weighted average of the basic models' prediction results according to the weights to obtain the final prediction result;
S5, predicting on the test data set with the fused model to obtain the final prediction result;
S6, evaluating the fused model using accuracy, recall, and F1 value as evaluation metrics, and adjusting and optimizing the model according to the evaluation results.
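As a hedged illustration, steps S1 to S5 can be sketched in Python as below; the toy data set, the two stub "large models", and the uniform weights are all illustrative assumptions and are not prescribed by the patent:

```python
def run_pipeline():
    # S1: prepare a toy data set and split it into training and test parts
    data = [(x, 2.0 * x) for x in range(10)]
    train, test = data[:8], data[8:]  # train is unused because the models below are stubs

    # S2/S3: two already-"trained" base models, represented here as simple stubs
    model_a = lambda x: 2.0 * x + 0.1
    model_b = lambda x: 2.0 * x - 0.1
    weights = [0.5, 0.5]  # S2: uniformly assigned weights

    # S4: weighted average of the base models' predictions per test sample
    fused = [weights[0] * model_a(x) + weights[1] * model_b(x) for x, _ in test]
    return fused  # S5: fused predictions on the test set

fused = run_pipeline()
```

With uniform weights the two stubs' opposite biases cancel, so the fused predictions land on the true values 16 and 18 for the two test samples.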
Alternatively, a stacking method can be used to realize the serial collaborative fusion of a plurality of large models, comprising the following steps:
d1, preparing a training data set and a test data set;
d2, selecting a plurality of basic models, and dividing the training data set into a plurality of subsets;
D3, training a different basic model on each subset to obtain the prediction results of the basic models;
D4, taking a prediction result of the basic model as a new characteristic, and constructing a meta model;
and D5, predicting the test data set by using the meta model.
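The core of step D4 — turning the basic models' predictions into meta-model features — can be sketched as follows (an illustrative minimal sketch; the function name and toy numbers are assumptions, not from the patent):

```python
def stack_features(base_preds):
    # D4: each base model's predictions become one feature column for the meta model
    return [list(row) for row in zip(*base_preds)]

base_preds = [
    [0.2, 0.8, 0.5],  # predictions of base model 1 on three samples
    [0.4, 0.6, 0.5],  # predictions of base model 2 on the same samples
]
meta_X = stack_features(base_preds)  # one row of stacked features per sample
```

`meta_X` would then serve as the input on which the meta model of step D4 is trained.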
A voting method can likewise be used to realize the serial collaborative fusion of a plurality of large models, comprising the following steps:
e1, preparing a training data set and a test data set;
e2, selecting a plurality of basic models;
e3, training each basic model by using a training data set to obtain a prediction result of the basic model;
e4, voting the predicted result of the basic model, and selecting the category with the largest occurrence number as the final predicted result;
and E5, predicting the test data set by using the final prediction result.
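Step E4's majority vote can be sketched in a few lines of Python (the class labels here are illustrative assumptions):

```python
from collections import Counter

def majority_vote(labels):
    # E4: choose the class predicted most often by the base models
    return Counter(labels).most_common(1)[0][0]

winner = majority_vote(["cat", "dog", "cat"])  # two of three models predict "cat"
```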
A learning method, in which a meta model learns how to fuse the basic models, can also be used to realize the serial collaborative fusion of a plurality of large models, comprising the following steps:
z1, preparing a training data set and a test data set;
z2, selecting a plurality of basic models, and dividing the training data set into a plurality of subsets;
z3, training each subset by using different basic models respectively to obtain a prediction result of the basic model;
z4, learning and fusing the prediction result of the basic model by using a meta model to obtain a final prediction result;
and Z5, predicting the test data set by using the final prediction result.
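One way to realize step Z4 is for the meta model to learn the fusion itself; the sketch below uses a deliberately tiny "meta model" — a grid search over a single blend weight — as an assumption for illustration, since the patent does not specify the meta model's form:

```python
def learn_fusion_weight(preds_a, preds_b, targets, steps=101):
    # Z4: learn the blend weight w that minimizes squared error of w*a + (1-w)*b
    best_w, best_err = 0.0, float("inf")
    for i in range(steps):
        w = i / (steps - 1)
        err = sum((w * a + (1 - w) * b - t) ** 2
                  for a, b, t in zip(preds_a, preds_b, targets))
        if err < best_err:
            best_w, best_err = w, err
    return best_w

# Model A's predictions match the targets exactly, so the learned weight favors it fully
w = learn_fusion_weight([1.0, 2.0], [3.0, 4.0], [1.0, 2.0])
```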
In step S1, preparing the training data set and the test data set comprises the following steps:
A1, data collection: first, enough data must be collected for training and testing; the data may be obtained from a public data set or collected by the practitioner according to the characteristics of the task.
A2, data cleaning and preprocessing: cleaning and preprocessing the collected data, including removing duplicate data, handling missing values, and handling outliers; preprocessing steps such as feature selection, feature encoding, and data standardization can also be carried out.
A3, data division: dividing the cleaned and preprocessed data set into a training data set and a test data set; common partitioning methods include random partitioning, proportional partitioning, cross-validation, and the like. Typically, a large portion of the dataset is used for training and a small portion is used for testing.
Training data set: the training data set is a data set for training a model. It contains the input features and corresponding target variables (tags). The training data set should be representative and capable of covering various situations of the task.
Test dataset: the test dataset is a dataset for evaluating the performance of the model. It also contains input features and corresponding target variables (labels), but the target variables are typically not used by the model for training. The test data set should be independent of the training data set, and reflect the performance of the model in practical applications.
A4, data set balancing: if the data set suffers from class imbalance, i.e. some classes have only a small number of samples, undersampling and oversampling methods can be used to address the imbalance.
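A minimal sketch of the oversampling option in step A4, assuming a naive duplicate-until-balanced strategy (more sophisticated resampling is equally compatible with the patent's wording):

```python
def oversample_minority(rows, labels, minority_label):
    # A4: naive oversampling — duplicate minority-class rows until class counts match
    majority_count = sum(1 for y in labels if y != minority_label)
    minority = [r for r, y in zip(rows, labels) if y == minority_label]
    out_rows, out_labels = list(rows), list(labels)
    i = 0
    while out_labels.count(minority_label) < majority_count:
        out_rows.append(minority[i % len(minority)])
        out_labels.append(minority_label)
        i += 1
    return out_rows, out_labels

# Three majority samples (label 0) and one minority sample (label 1)
rows, labels = oversample_minority([[1], [2], [3], [4]], [0, 0, 0, 1], minority_label=1)
```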
In step S2, in selecting the base model, the following factors are considered:
model type: selecting different types of models, such as a linear model, a tree model, a neural network and the like, and modeling data from different angles;
model performance: evaluating according to the accuracy of the model on a training data set or other evaluation indexes, and selecting a model with good performance;
model variability: the selected models have differences and are used for providing diversified prediction results;
model complexity: models of different complexity are selected, and a trade-off is made between the diversity of model predictions and the computational complexity of the model.
In step S2, weights may be assigned to each basic model by any one of the following schemes: uniform assignment, performance-based assignment, confidence-based assignment, or experience-based assignment;
wherein, evenly distributing weights: adopting an arithmetic average mode to distribute equal weight to each basic model;
weight is assigned based on performance: distributing weights to the models according to the performance of each basic model on the training data set; performance may be measured in terms of accuracy, F1 score, etc.
Weights are assigned based on confidence: according to the confidence level or reliability of each basic model, a weight is distributed to the models; confidence may be calculated based on the probability of model predictions or other confidence measures.
Weights are assigned based on experience: each base model is assigned a weight based on experience or domain knowledge. This may require some preliminary evaluation and comparison of the model.
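The performance-based assignment above can be sketched by normalizing each basic model's training accuracy into a weight (one plausible reading of the description; the exact normalization is not specified by the patent):

```python
def performance_weights(scores):
    # Performance-based assignment: normalize training accuracies into fusion weights
    total = sum(scores)
    return [s / total for s in scores]

# A model with 0.9 accuracy receives a larger weight than one with 0.6
w = performance_weights([0.9, 0.6])
```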
In step S3, the following detailed steps are included:
b1, selecting a proper basic model according to the properties of tasks and data; different types of models may be selected, such as linear models, tree models, neural networks, etc.
B2, training a basic model: for each basic model, training by using a training data set, training the model by using a corresponding training method and an optimization algorithm according to the type and algorithm of the model, inputting the input characteristics and the target variable of the training data set into the model in the training process, and updating the parameters of the model according to the difference between the output of the model and the target variable;
b3, obtaining a prediction result of the basic model: predicting a training data set by using the trained basic models, inputting the input features of the training data set into each basic model, and obtaining a prediction result of the model; the prediction result may be a classification prediction result of a classification model or a numerical prediction result of a regression model.
And B4, repeating the step B2 and the step B3 until all the basic models are trained and the prediction results are obtained.
Through the steps, the prediction result of each basic model on the training data set can be obtained. These predictions will be used for subsequent model fusion or performance evaluation.
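Steps B2 and B3 can be sketched with a deliberately minimal basic model — a one-parameter linear model trained by gradient descent — purely for illustration; the patent itself allows any model type and training algorithm:

```python
def train_linear(xs, ys, lr=0.01, epochs=500):
    # B2: fit y = w * x by gradient descent, updating w from the output/target difference
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        w -= lr * grad
    return w

# The training data follows y = 2x, so w should converge to 2
w = train_linear([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
preds = [w * x for x in [4.0, 5.0]]  # B3: predictions of the trained base model
```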
In step S4, multiplying the prediction result of each basic model by the weight corresponding to the prediction result, and adding all weighted results to obtain a final prediction result;
the final prediction result equals Σ(weight × base-model prediction), where Σ denotes summation over all base models.
And carrying out post-processing on the final prediction result according to task requirements. For example, the classification problem may be probability normalized, thresholded for classification, etc.
Through the steps, the prediction result of the basic model can be weighted and averaged according to the weight, and the final prediction result is obtained. The weighted average method can integrate the prediction capability of a plurality of basic models, and improve the performance and the robustness of the whole model.
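The thresholding post-processing mentioned above for classification problems can be sketched as follows (the 0.5 threshold is an illustrative assumption):

```python
def threshold_classify(fused_probs, threshold=0.5):
    # Post-processing for classification: map fused probabilities to class labels
    return [1 if p >= threshold else 0 for p in fused_probs]

labels = threshold_classify([0.2, 0.7, 0.5])  # the 0.5 sample falls on the boundary
```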
In step S5, the following detailed steps are included:
C1, performing feature scaling, standardization, and encoding on the test data set; specifically, the same feature processing that was applied to the training data set is applied here.
And C2, inputting the test data set subjected to the feature processing into a final model to obtain a prediction result. For classification problems, classification can be performed according to the final prediction result; for regression problems, the final prediction result may be directly used as a prediction value.
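The feature-scaling part of step C1 can be sketched with min-max scaling, where the scaling bounds are fitted on the training data and reused on the test data (the specific scaler is an assumption; the patent names scaling only generically):

```python
def min_max_scale(values, lo, hi):
    # C1: min-max scaling with bounds (lo, hi) fitted on the training data
    return [(v - lo) / (hi - lo) for v in values]

train_feature = [0.0, 5.0, 10.0]
# Apply the training-set bounds to the test set, as step C1 requires
scaled_test = min_max_scale([2.5, 7.5], min(train_feature), max(train_feature))
```

Reusing the training-set bounds matters: fitting the scaler on the test set would leak test statistics into preprocessing.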
Through serial fusion of multiple models, the advantages of different models can be fully utilized, and the performance and the robustness of the models are improved.
The invention also provides a system for the serial collaborative fusion of a plurality of large models, which comprises a main control module together with a data processing module, a model training module, a model prediction module, a model fusion module, a performance evaluation module, and a visualization and result display module, each of which is connected to the main control module.
And a data processing module: the method comprises the steps of preprocessing original data and characteristic processing, wherein the preprocessing comprises data cleaning, repeated data removal, missing value processing and abnormal value processing, and the characteristic processing comprises characteristic selection, characteristic transformation, characteristic coding and data standardization;
model training module: for training a plurality of independent basis models, including linear models, tree models, different types of models of neural networks; the module generally comprises functions of model selection, super-parameter tuning, model training and the like.
Model prediction module: the method comprises the steps of predicting test data, predicting the test data by using a trained basic model, and generating a prediction result of the basic model;
model fusion module: the prediction results of the plurality of basic models are fused; the fusion may be performed using voting, averaging, weighted averaging, stacking, etc. The module can provide different fusion strategies and parameter configurations to meet different requirements.
Performance evaluation module: for evaluating the performance of the fused model, calculating the model's accuracy, recall, and F1 value, and helping the user select an optimal model fusion strategy;
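The metrics computed by the performance evaluation module (precision, recall, and F1 value, as used in step S6) follow the standard definitions and can be sketched directly:

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    # Standard binary-classification metrics for the performance evaluation module
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p == positive)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t != positive and p == positive)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# One true positive, one false positive, one false negative, one true negative
p, r, f = precision_recall_f1([1, 1, 0, 0], [1, 0, 1, 0])
```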
and a visualization and result display module: the method is used for visually displaying the processes and results of data processing, model training and model fusion, providing visual charts and visual tools, and helping users understand and analyze the performance and results of the models.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (9)
1. A method for serial collaborative fusion of a plurality of large models is characterized by adopting a weighted average method to realize serial collaborative fusion of a plurality of large models, and comprises the following steps:
S1, preparing a training data set and a test data set;
s2, selecting a plurality of basic models, and distributing weights for each basic model;
s3, training each basic model by using a training data set respectively to obtain a prediction result of the basic model;
s4, carrying out weighted average on the prediction result of the basic model according to the weight to obtain a final prediction result;
s5, predicting the test data set by using a final prediction result;
and S6, evaluating the fused model by using evaluation indexes of accuracy, recall and F1 values, and adjusting and optimizing the model according to the evaluation result.
2. The method for serial collaborative fusion of multiple large models according to claim 1, wherein: in step S1, preparing the training data set and the test data set comprises the following steps:
a1, data collection: first, enough data needs to be collected for training and testing;
a2, data cleaning and pretreatment: cleaning and preprocessing the collected data, including removing repeated data, processing missing values and processing abnormal values;
a3, data division: dividing the cleaned and preprocessed data set into a training data set and a test data set;
A4, data set balancing: if the data set suffers from class imbalance, i.e. some classes have only a small number of samples, undersampling and oversampling methods can be used to address the imbalance.
3. The method for serial collaborative fusion of multiple large models according to claim 1, wherein: in step S2, in selecting the base model, the following factors are considered:
model type: selecting different types of models, and modeling data from different angles;
model performance: evaluating according to the accuracy of the model on a training data set or other evaluation indexes, and selecting a model with good performance;
model variability: the selected models have differences and are used for providing diversified prediction results;
model complexity: models of different complexity are selected, and a trade-off is made between the diversity of model predictions and the computational complexity of the model.
4. The method for serial collaborative fusion of multiple large models according to claim 1, wherein: in step S2, weights may be assigned to each basic model by any one of the following schemes: uniform assignment, performance-based assignment, confidence-based assignment, or experience-based assignment;
wherein, evenly distributing weights: adopting an arithmetic average mode to distribute equal weight to each basic model;
weight is assigned based on performance: distributing weights to the models according to the performance of each basic model on the training data set;
weights are assigned based on confidence: according to the confidence level or reliability of each basic model, a weight is distributed to the models;
weights are assigned based on experience: each base model is assigned a weight based on experience or domain knowledge.
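Two of the four schemes, uniform and performance-based, admit a direct sketch; using training accuracy as the performance metric and normalizing the weights to sum to 1 are assumptions of this illustration, as the claim leaves both unspecified:

```python
def assign_weights(accuracies, strategy="performance"):
    """Assign a weight to each base model.

    accuracies: dict mapping base-model name -> training accuracy.
    """
    n = len(accuracies)
    if strategy == "uniform":
        # Arithmetic-mean weighting: every model gets 1/n.
        return {name: 1.0 / n for name in accuracies}
    if strategy == "performance":
        # Weight proportional to training accuracy, normalized to sum to 1.
        total = sum(accuracies.values())
        return {name: acc / total for name, acc in accuracies.items()}
    raise ValueError(f"unknown strategy: {strategy}")
```

Confidence-based and experience-based assignment would follow the same shape, with the accuracy dict replaced by per-model confidence scores or hand-set weights.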
5. The method for serial collaborative fusion of multiple large models according to claim 1, wherein: step S3 comprises the following detailed steps:
B1, selecting suitable base models according to the nature of the task and the data;
B2, training the base models: each base model is trained on the training data set, using the training method and optimization algorithm appropriate to its type; during training, the input features and the target variable of the training data set are fed into the model, and the model parameters are updated according to the difference between the model output and the target variable;
B3, obtaining the prediction results of the base models: the trained base models predict on the training data set; the input features of the training data set are fed into each base model to obtain its prediction result;
B4, repeating step B2 and step B3 until all the base models are trained and their prediction results obtained.
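The B2-B4 loop amounts to fitting each model and collecting its training-set predictions. The sketch below uses a trivial `MeanModel` placeholder of my own invention where the patent intends real linear, tree, or neural-network learners:

```python
class MeanModel:
    """Trivial stand-in base model: predicts the training-target mean."""
    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]

def train_base_models(models, X_train, y_train):
    """Steps B2-B4: fit every base model on the training set and
    collect its predictions on that same set."""
    predictions = {}
    for name, model in models.items():
        model.fit(X_train, y_train)                 # B2: train
        predictions[name] = model.predict(X_train)  # B3: predict
    return models, predictions                      # B4: loop complete
```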
6. The method for serial collaborative fusion of multiple large models according to claim 1, wherein: in step S4, the prediction result of each base model is multiplied by its corresponding weight, and all the weighted results are added to obtain the final prediction result;
the final prediction equals Σ (weight × base model prediction), where Σ denotes summation over all the base models.
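Computed per sample, the weighted sum above reads as follows; the dict-of-lists layout for the predictions is an assumption of this sketch:

```python
def fuse_predictions(preds, weights):
    """Step S4: final_j = sum_i weights[i] * preds[i][j], i.e. the
    weighted sum of the base-model predictions, per sample j.

    preds:   dict mapping model name -> list of per-sample predictions
    weights: dict mapping model name -> scalar weight
    """
    names = list(preds)
    n_samples = len(preds[names[0]])
    return [sum(weights[name] * preds[name][j] for name in names)
            for j in range(n_samples)]
```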
7. The method for serial collaborative fusion of multiple large models according to claim 1, wherein: step S5 comprises the following detailed steps:
C1, applying feature scaling, standardization and feature encoding to the test data set;
C2, inputting the feature-processed test data set into the final model to obtain the prediction result.
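For the standardization part of C1, a key practical detail (not spelled out in the claim, but standard practice) is that the test data must be transformed with statistics fitted on the training data, to avoid information leakage:

```python
def standardize(train_col, test_col):
    """Z-score standardization of one feature column: fit mean/std on
    the training column, apply them to the test column (step C1)."""
    mean = sum(train_col) / len(train_col)
    std = (sum((x - mean) ** 2 for x in train_col) / len(train_col)) ** 0.5 or 1.0
    return [(x - mean) / std for x in test_col]
```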
8. A system for serial collaborative fusion of multiple large models, based on the method for serial collaborative fusion of multiple large models according to any one of claims 1-7, characterized in that: the system comprises a main control module, a data processing module, a model training module, a model prediction module, a model fusion module, a performance evaluation module, and a visualization and result display module, wherein the data processing, model training, model prediction, model fusion, performance evaluation, and visualization and result display modules are all connected to the main control module.
9. The system for serial collaborative fusion of multiple large models according to claim 8, wherein: the data processing module is used for preprocessing and feature processing of the raw data, the preprocessing including data cleaning, duplicate removal, missing-value handling and outlier handling, and the feature processing including feature selection, feature transformation, feature encoding and data standardization;
the model training module is used for training a plurality of independent base models of different types, including linear models, tree models and neural networks;
the model prediction module is used for predicting on the test data: the trained base models predict on the test data and generate the base-model prediction results;
the model fusion module is used for fusing the prediction results of the plurality of base models;
the performance evaluation module is used for evaluating the performance of the fused model, computing indexes such as the accuracy, recall and F1 value of the model, and helping the user select the optimal model fusion strategy;
the visualization and result display module is used for visually displaying the processes and results of data processing, model training and model fusion, providing intuitive charts and visualization tools to help users understand and analyze model performance and results.
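The claim-8 topology, where every functional module attaches to a main control module that drives the pipeline, can be rendered as a minimal skeleton; the module interfaces (plain callables taking and returning `data`) and the fixed invocation order are illustrative assumptions:

```python
class MasterControl:
    """Minimal sketch of the claim-8 architecture: every functional
    module registers with, and is invoked through, the main control
    module."""
    PIPELINE = ("data_processing", "model_training", "model_prediction",
                "model_fusion", "performance_evaluation", "visualization")

    def __init__(self):
        self.modules = {}

    def register(self, name, module):
        """Connect a functional module to the main control module."""
        self.modules[name] = module

    def run_pipeline(self, data):
        """Drive the registered modules in the fixed pipeline order,
        threading the data through each stage."""
        for name in self.PIPELINE:
            data = self.modules[name](data)
        return data
```

A real system would replace the plain callables with module classes exposing richer interfaces (e.g. the model fusion module holding the weights from claim 4), but the control flow stays the same.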
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311443875.4A CN117436029A (en) | 2023-11-01 | 2023-11-01 | Method and system for serial collaborative fusion of multiple large models |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117436029A true CN117436029A (en) | 2024-01-23 |
Family
ID=89545845
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311443875.4A Pending CN117436029A (en) | 2023-11-01 | 2023-11-01 | Method and system for serial collaborative fusion of multiple large models |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117436029A (en) |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |