CN111325291A - Entity object classification method for selectively integrating heterogeneous models and related equipment - Google Patents


Info

Publication number
CN111325291A
Authority
CN
China
Prior art keywords
base classifier
base
combination
weight
classifiers
Prior art date
Legal status
Granted
Application number
CN202010409750.XA
Other languages
Chinese (zh)
Other versions
CN111325291B (en)
Inventor
Zhang Yalin (张雅淋)
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202010409750.XA priority Critical patent/CN111325291B/en
Publication of CN111325291A publication Critical patent/CN111325291A/en
Application granted granted Critical
Publication of CN111325291B publication Critical patent/CN111325291B/en

Classifications

    • G06F18/285 Selection of pattern recognition techniques, e.g. of classifiers in a multi-classifier system
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques


Abstract

The entity object classification system for selectively integrating heterogeneous models provided by one or more embodiments of the present specification offers a solution for the selective integration of heterogeneous models: heterogeneous base classifiers are included in ensemble learning; in a learning stage, each type of base classifier is trained under different parameter combinations to obtain a plurality of models; and in a selection stage, one or more of those models are selected as components of the final model. In this way, the respective strengths of the different models can be fully exploited and made complementary, the robustness and effectiveness of the overall model are improved, and entity object classification can be completed well.

Description

Entity object classification method for selectively integrating heterogeneous models and related equipment
Technical Field
The present disclosure relates to the field of data processing technologies, and in particular, to an entity object classification method and related devices for selectively integrating heterogeneous models.
Background
In Internet application scenarios, a large amount of data needs to be analyzed every day, and machine learning, as a technical means, is playing a role in more and more scenarios. For a given task, ensemble learning is often a good choice for achieving good deployment results, and it is often feasible to improve overall generalization performance by integrating multiple different models.
However, conventional model integration simply averages the outputs of all trained base classifiers to obtain the final prediction result, which often fails to achieve a good effect and suffers from large storage overhead and long prediction time. Selective integration is a way to alleviate this problem: by selecting from and reasonably combining all candidate models, a better overall effect can often be achieved while model storage overhead and prediction time are greatly reduced. There is therefore a need for a faster or more reliable model integration scheme.
Disclosure of Invention
In view of the above, an object of one or more embodiments of the present disclosure is to provide a method and related apparatus for entity object classification with selective integration of heterogeneous models, so as to solve the above problems.
In view of the above, one or more embodiments of the present specification provide an entity object classification method for selectively integrating heterogeneous models, including:
acquiring a training data set and a verification data set; the training dataset and the validation dataset comprise entity object data;
training to obtain at least two groups of heterogeneous base classifiers by using the training data set;
and cyclically executing, for a specified number of rounds, the following steps of generating and scoring base classifier combinations:
generating a plurality of base classifier combinations; each base classifier combination is obtained by assigning a weight to each base classifier through an evolutionary algorithm, using the previous round's base classifier combinations and their scores together with the weights assigned to the base classifiers they include, and then selecting at least one base classifier from each group of base classifiers in descending order of weight; in the first round, the weights are assigned by random generation;
predicting the data in the verification data set by using each base classifier combination together with the weights assigned to the base classifiers it includes, and calculating the score of each base classifier combination according to the prediction result;
and determining the highest-scoring base classifier combination across all rounds, and combining it with the weights corresponding to the base classifiers it includes to obtain the selectively integrated heterogeneous model for entity object classification prediction.
One or more embodiments of the present specification further provide an entity object classification apparatus selectively integrating heterogeneous models, including:
an acquisition module for acquiring a training data set and a verification data set; the training dataset and the validation dataset comprise entity object data;
the training module is used for training to obtain at least two groups of heterogeneous base classifiers by utilizing the training data set;
the base classifier combination generation and scoring module is used for cyclically executing, for a specified number of rounds, the following steps of generating and scoring base classifier combinations:
generating a plurality of base classifier combinations; each base classifier combination is obtained by assigning a weight to each base classifier through an evolutionary algorithm, using the previous round's base classifier combinations and their scores together with the weights assigned to the base classifiers they include, and then selecting at least one base classifier from each group of base classifiers in descending order of weight;
predicting the data in the verification data set by using each base classifier combination together with the weights assigned to the base classifiers it includes, and calculating the score of each base classifier combination according to the prediction result;
and the classification module is used for determining the highest-scoring base classifier combination across all rounds and combining it with the weights corresponding to the base classifiers it includes to obtain the selectively integrated heterogeneous model for entity object classification prediction.
One or more embodiments of the present specification also provide an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method when executing the program.
One or more embodiments of the present specification also provide a non-transitory computer-readable storage medium storing computer instructions for causing the computer to perform the method.
As can be seen from the above, the entity object classification method for selectively integrating heterogeneous models and the related apparatus provided in one or more embodiments of the present specification propose a solution for the selective integration of heterogeneous models: heterogeneous base classifiers are included in ensemble learning; in a learning stage, each type of base classifier is trained under different parameter combinations to obtain multiple models; and in a selection stage, for each type of model, one or more of the models are selected as components of the final model. In this way, the respective strengths of the different models can be fully exploited and made complementary, the robustness and effectiveness of the overall model are improved, and entity object classification can be completed better.
Drawings
In order to more clearly illustrate one or more embodiments or prior art solutions of the present specification, the drawings that are needed in the description of the embodiments or prior art will be briefly described below, and it is obvious that the drawings in the following description are only one or more embodiments of the present specification, and that other drawings may be obtained by those skilled in the art without inventive effort from these drawings.
FIG. 1 is a schematic diagram of an entity object classification system for selectively integrating heterogeneous models provided in one or more embodiments of the present description;
FIG. 2 is a flowchart of a method for entity object classification of selectively integrated heterogeneous models according to one or more embodiments of the present disclosure;
FIG. 3 is another schematic flow diagram of a method for entity object classification for selectively integrating heterogeneous models according to one or more embodiments of the present disclosure;
FIG. 4 is a block diagram of an entity object classification apparatus selectively integrating heterogeneous models according to one or more embodiments of the present disclosure;
fig. 5 is a schematic diagram of a hardware structure of an electronic device according to one or more embodiments of the present disclosure.
Detailed Description
For the purpose of promoting a better understanding of the objects, aspects and advantages of the present disclosure, reference is made to the following detailed description taken in conjunction with the accompanying drawings.
It is to be noted that unless otherwise defined, technical or scientific terms used in one or more embodiments of the present specification should have the ordinary meaning as understood by those of ordinary skill in the art to which this disclosure belongs. The use of "first," "second," and similar terms in one or more embodiments of the specification is not intended to indicate any order, quantity, or importance, but rather is used to distinguish one element from another. The word "comprising" or "comprises", and the like, means that the element or item listed before the word covers the element or item listed after the word and its equivalents, but does not exclude other elements or items. The terms "connected" or "coupled" and the like are not restricted to physical or mechanical connections, but may include electrical connections, whether direct or indirect.
Supervised learning: a research area of machine learning. Given data comprising a large number of labeled samples, models are built based on such training data to predict test samples. Each sample is represented as a feature vector describing its features, and all samples carry labeling information (e.g., labeled as positive or negative) representing their attributes.
Ensemble learning: a research area of machine learning that combines multiple base learners in an attempt to achieve generalization performance superior to that of a single learner.
Homogeneous model: when a plurality of base classifiers in ensemble learning belong to the same type of classifier (such as neural network models), the models are said to be homogeneous.
Heterogeneous model: when a plurality of base classifiers in ensemble learning belong to different types of classifiers (such as a support vector machine, a neural network, a random forest, and the like), the models are said to be heterogeneous.
As an embodiment of ensemble learning, a plurality of homogeneous base classifiers (e.g., 5) may be trained based on the same learning algorithm (e.g., a neural network), and the average of their prediction results is used as the final prediction result of the model. However, integration schemes based on homogeneous models, limited by the characteristics of that model type, may not be advantageous on certain tasks. Meanwhile, simply averaging the prediction results of the various models without screening them may lead to an undesirable overall effect because some individual models perform poorly.
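As a concrete illustration of this baseline (and of its limitation of treating all members equally), the following is a minimal sketch in Python with scikit-learn; the dataset, model type, and hyperparameters are illustrative assumptions, not taken from this specification:

```python
# Homogeneous-ensemble baseline: train several models of the same type
# and average their predictions without any screening or weighting.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.neural_network import MLPClassifier

X, y = make_classification(n_samples=600, random_state=0)
X_train, y_train, X_test = X[:500], y[:500], X[500:]

# Five base classifiers of the same type (a homogeneous ensemble).
models = [MLPClassifier(max_iter=300, random_state=seed).fit(X_train, y_train)
          for seed in range(5)]

# The final prediction is the simple average of all members' outputs.
avg_proba = np.mean([m.predict_proba(X_test)[:, 1] for m in models], axis=0)
```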
FIG. 1 illustrates a schematic diagram of an entity object classification system for selectively integrating heterogeneous models provided by one or more embodiments of the present specification.
As shown in FIG. 1, the entity object classification system for selectively integrating heterogeneous models trains at least two groups of heterogeneous base classifiers based on different learning algorithms, using the training data in a training data set. Here, heterogeneity may mean that at least one of the at least two groups of base classifiers is of a different type from the other groups; that is, heterogeneous base classifiers exist among the at least two groups. For example, if three groups of base classifiers are obtained through training and at least one group (e.g., neural networks) is of a different type from the other two groups (e.g., decision trees), then in the finally obtained target classification model (the selectively integrated heterogeneous model) the different types of base classifiers can each contribute their own characteristics, so that the target classification model as a whole is suitable for more application scenarios.
For example, it is assumed that three groups of base classifiers (the types of the base classifiers in the same group of base classifiers are the same) are respectively trained based on three learning algorithms (e.g., support vector machine, neural network, random forest), and the groups in the three groups of base classifiers are heterogeneous to each other. Therefore, the target classification model obtained after final selective integration can have the characteristics of three types of base classifiers, and is suitable for more scenes.
In one or more embodiments of the present description, after at least two groups of base classifiers are obtained through training, the following steps of generating base classifier combinations and scoring them are performed for a specified number of rounds:
generating a plurality of base classifier combinations; each base classifier combination is obtained by assigning a weight to each base classifier through an evolutionary algorithm, using the previous round's base classifier combinations and their scores together with the weights assigned to the base classifiers they include, and then selecting at least one base classifier from each group of base classifiers in descending order of weight;
and predicting the data in the verification data set by using each base classifier combination together with the weights assigned to the base classifiers it includes, and calculating the score of each base classifier combination according to the prediction result.
The specified number of rounds is set as required and may be, for example, 10, 15, or 20.
A base classifier combination is obtained by selecting a certain number of base classifiers from each group according to a certain rule and combining them. For one base classifier combination, the generation process may include the following (see the sketch after this paragraph). First, each base classifier in the at least two groups of trained heterogeneous base classifiers is assigned a weight; the weights may be assigned through an evolutionary algorithm using the previous round's base classifier combinations and their scores together with the weights assigned to the base classifiers they include, while for the combinations of the first round the weights are obtained by random generation. Second, at least one base classifier is selected from each group in descending order of weight. For example, suppose a group contains 4 base classifiers with weights 0.1, 0.2, 0.3, and 0.4; if one base classifier is to be selected from the group in descending order of weight, the classifier with weight 0.4 is selected; if two are to be selected, those with weights 0.3 and 0.4 are selected, and so on. Finally, the base classifiers selected from each group are collected into the base classifier combination.
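The per-group selection just described can be sketched as follows; build_combination is a hypothetical helper, and the group contents and weights are illustrative:

```python
import numpy as np

def build_combination(groups, weights, n_select=1):
    """For each group of base classifiers, keep the n_select classifiers
    with the largest assigned weights; return (classifier, weight) pairs."""
    combination = []
    for group, w in zip(groups, weights):
        top = np.argsort(w)[::-1][:n_select]  # indices of the largest weights
        combination.extend((group[j], w[j]) for j in top)
    return combination

# Example: a group of 4 classifiers with weights 0.1, 0.2, 0.3, 0.4
# (strings stand in for trained models); the 0.4 classifier is kept.
combo = build_combination([["clf_a", "clf_b", "clf_c", "clf_d"]],
                          [np.array([0.1, 0.2, 0.3, 0.4])], n_select=1)
```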
In one or more embodiments of the present description, when a plurality of base classifier combinations are generated, the weights assigned to the base classifiers may differ across combinations. For example, when weights are assigned to the base classifiers (by an evolutionary algorithm or by random generation), several groups of weights are generated, e.g., 10 groups; for each group of weights a base classifier combination is generated according to the foregoing method, so that the several groups of weights finally yield several base classifier combinations, in which both the selected base classifiers and their assigned weights may differ.
In one or more embodiments of the present disclosure, after the plurality of base classifier combinations are generated, each combination, together with the weights assigned to the base classifiers it includes (a weighted base classifier combination at this point already corresponds to a selectively integrated heterogeneous model), is used to predict the data in the verification data set, and the score of each combination is calculated from the prediction results. After the scores are obtained, in the next round of generation and scoring, the previous round's combinations and their scores, together with the weights assigned to the base classifiers they include, are used to assign new weights to each base classifier through an evolutionary algorithm (again possibly several groups of weights); at least one base classifier is then selected from each group in descending order of weight to obtain new combinations, which are again scored by predicting the data in the verification data set. The generation and scoring steps are executed cyclically until the specified number of rounds is reached.
Finally, the base classifier combinations of all rounds each have a corresponding score. The highest-scoring combination, together with the weights corresponding to the base classifiers it includes, yields the final selectively integrated heterogeneous model, which can be used for classification prediction.
In one or more embodiments of the present description, the aforementioned weights assigned to the base classifiers may be regarded as weight vectors. For example, the entity object classification system of the selectively integrated heterogeneous model may determine a first predetermined number (e.g., 10) of weight vectors (each including the weight assigned to every base classifier) for the at least two groups of base classifiers according to a combination strategy commonly used in ensemble learning. Here, a weight vector is the vector formed by combining the weights of all base classifiers in the at least two groups. For one group of base classifiers, the corresponding part of the vector may be called a sub-weight vector. For example, if three groups of base classifiers are obtained by training, with the first group corresponding to the first sub-weight vector, the second group to the second, and the third group to the third, then the weight vector is the combination of the three sub-weight vectors. More than one weight vector may be determined: a first predetermined number of weight vectors, e.g., 10 groups, is determined according to a preset value. Optionally, in the first round, the first predetermined number of weight vectors is obtained by random generation.
After the first predetermined number of weight vectors is obtained, the entity object classification system for selectively integrating heterogeneous models selectively integrates the base classifiers in the at least two groups according to the weight vectors. Specifically, for each weight vector, a second predetermined number (e.g., 1) of weights is selected from each sub-weight vector according to the weight values, and the values of the remaining weights in the sub-weight vector are set to 0, yielding a first predetermined number of corrected weight vectors. The second predetermined number is a value chosen as needed; for example, if it is 1, one base classifier is selected from each group. The selection may, for example, keep the weight with the largest value in each sub-weight vector and set the remaining weights to 0, where a weight set to zero means the corresponding base classifier is not selected and the base classifiers whose weights are not zeroed are the selected ones; the corrected sub-weight vectors are then combined into a corrected weight vector. Processing every weight vector in this way yields a first predetermined number (e.g., 10 groups) of corrected weight vectors.
Then, the entity object classification system predicts the data in the verification data set using the first predetermined number of corrected weight vectors in combination with the at least two groups of base classifiers, and calculates the score of each corrected weight vector according to the prediction results. The scoring may employ a model performance evaluation method commonly used in machine learning, for example the receiver operating characteristic (ROC) curve or the area under the ROC curve (AUC). Optionally, the corrected weight vectors may first be normalized and then used for prediction, so that the performance of the different corrected weight vectors is more comparable. A minimal sketch of this scoring step follows.
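The sketch assumes AUC as the evaluation index and scikit-learn-style models that expose predict_proba (an assumption of this sketch, not a requirement of the specification):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def score_combination(classifiers, weights, X_val, y_val):
    """Score one weighted base classifier combination on the validation
    set by AUC; the weights are normalized first for comparability."""
    w = np.asarray(weights, dtype=float)
    w = w / w.sum()  # the normalization mentioned above
    proba = sum(wi * clf.predict_proba(X_val)[:, 1]
                for wi, clf in zip(w, classifiers))
    return roc_auc_score(y_val, proba)
```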
Then, the entity object classification system of the selectively integrated heterogeneous model may regenerate a first predetermined number of new weight vectors using an evolutionary algorithm in combination with the corrected weight vectors and their scores, and repeat the steps from computing the corrected weight vectors through computing their scores with the regenerated weight vectors, obtaining a new round of corrected weight vectors and scores; the previous steps are repeated until the specified number of rounds is reached, finally yielding every corrected weight vector and its score across all rounds.
After the specified number of scoring rounds is completed, the corrected weight vector with the highest score among all rounds is determined, and a second predetermined number of base classifiers is selected from each group of base classifiers based on that highest-scoring corrected weight vector.
Finally, the entity object classification system of the selectively integrated heterogeneous model combines the selected base classifiers according to the corresponding corrected weight vector to obtain the target classification model (the selectively integrated heterogeneous model) for classification prediction. The target classification model comprises the selected base classifiers and their weights; when classifying data, the weighted average of the base classifiers' classification prediction results directly gives the final classification prediction result.
Optionally, the evolutionary algorithm employs at least one of genetic algorithms, genetic programming, evolution strategies, and evolutionary programming.
In the entity object classification system for selectively integrating heterogeneous models provided in one or more embodiments of the present specification, a solution for the selective integration of heterogeneous models is proposed: heterogeneous base classifiers (such as support vector machines, neural networks, random forests, gradient boosting decision trees (GBDTs), etc.) are included in ensemble learning; each type of base classifier is trained under different parameter combinations in a learning stage to obtain multiple models; and in a selection stage one or more of them are selected as components of the final model. In this way, the respective strengths of the different models are fully exploited and made complementary, and the robustness and effectiveness of the overall model are improved. Moreover, by training multiple models for each type of base classifier, the best effect of each type of model under different parameter settings can be fully explored, further improving overall performance.
In one or more embodiments of the present description, the entity object classification system that selectively integrates heterogeneous models can be used to classify various entity objects. The entity object in one or more embodiments of the present specification may be, for example, any one of a user, a device, or an account of a user (which may also be referred to simply as an account).
For example, for a user, user properties (e.g., legal or illegal), user status (e.g., risky or non-risky), and the like may be classified. Similarly, account properties (e.g., legal or illegal), account status (e.g., risky or no risk), etc. may also be classified for the user's account, and device properties (e.g., legal or illegal), device status (e.g., risky or no risk), etc. may also be classified for the device.
In one or more embodiments of the present description, the entity object classification system for selectively integrating heterogeneous models can be used to classify user properties (e.g., classify users as legitimate or illegitimate); the training data set and the verification data set comprise at least one of user basic information, user dynamic information, and user relationship information. The user basic information comprises at least one of gender, age, and educational background; the user dynamic information comprises at least one of the user's browsing records and consumption records within a preset period; the user relationship information comprises at least one of the number of friends and the basic information of friends, where the basic information of a friend comprises at least one of the friend's gender, age, and educational background.
It can be seen that the user basic information, user dynamic information, and user relationship information contain features of different types in the training data. Data such as age and consumption information are usually continuous features, while data such as gender and educational background are usually discrete features, and different feature types suit different base classifiers. For example, continuous features are better trained with tree models (e.g., GBDT, random forest), while discrete features are better trained with neural network models. Therefore, with heterogeneous models selectively integrated for the different feature types in the training data, the finally obtained target classification model can complete the task better.
Similarly, for the account and the device, the training data set and the verification data set may be obtained by collecting the account/device basic information, the account/device dynamic information, and the account/device relationship information, which are not described herein again.
Fig. 2 is a flowchart illustrating an entity object classification method for selectively integrating heterogeneous models according to one or more embodiments of the present disclosure.
As shown in fig. 2, the entity object classification method for selectively integrating heterogeneous models includes:
step 102: a training dataset and a validation dataset are acquired.
Optionally, the data in the training data set and the validation data set are both provided with classification labels. For example, if the entity object classification method of the selective integration heterogeneous model is used for classifying user properties, the classification labels are user property labels, such as legal users or illegal users.
Step 104: training to obtain at least two groups of heterogeneous base classifiers by using the training data set; wherein at least one of the at least two sets of base classifiers has a type different from the other sets of base classifiers; that is, there are heterogeneous basis classifiers in the at least two sets of basis classifiers.
Optionally, the base classifier comprises at least one of a logistic regression model, a support vector machine model, a decision tree model, a gradient boosting decision tree (GBDT) model, a random forest model, and a neural network model.
For example, if three groups of base classifiers are obtained through training and at least one group (for example, neural networks) is of a different type from the other two groups (for example, decision trees), then in the finally obtained target classification model the different types of base classifiers can each contribute their own characteristics, so that the target classification model as a whole is suitable for more application scenarios.
Optionally, each of the at least two groups of base classifiers is of a different type from the base classifiers of the other groups.
For example, it is assumed that three groups of base classifiers (the base classifiers within one group being of the same type) are respectively trained based on three learning algorithms (e.g., support vector machine, neural network, random forest), and the groups are heterogeneous to each other. The target classification model obtained after the final selective integration can then have the characteristics of all three types of base classifiers and is suitable for more scenarios; a minimal sketch of training such heterogeneous groups follows.
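The sketch assumes scikit-learn models; the two model types (GBDT and neural network) and the parameter values are illustrative choices, not requirements of the method:

```python
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.neural_network import MLPClassifier

def train_heterogeneous_groups(X_train, y_train):
    """Train two mutually heterogeneous groups of base classifiers:
    a GBDT (tree-based) group and a neural-network group."""
    gbdt_group = [GradientBoostingClassifier(n_estimators=n).fit(X_train, y_train)
                  for n in (50, 100, 200)]
    mlp_group = [MLPClassifier(hidden_layer_sizes=(h,), max_iter=300,
                               random_state=0).fit(X_train, y_train)
                 for h in (16, 64, 128)]
    return [gbdt_group, mlp_group]
```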
The following base classifier combination generation step 106 and scoring step 108 are executed in a loop for a specified number of rounds. The specified number of rounds is set as required and may be, for example, 10, 15, or 20 rounds.
Step 106: several combinations of base classifiers are generated.
In this step, each base classifier combination is obtained by assigning a weight to each base classifier through an evolutionary algorithm, using the previous round's base classifier combinations and their scores together with the weights assigned to the base classifiers they include, and then selecting at least one base classifier from each group of base classifiers in descending order of weight.
A base classifier combination is obtained by selecting a certain number of base classifiers from each group according to a certain rule and combining them. For one base classifier combination, the generation process may include the following. First, each base classifier in the at least two groups of trained heterogeneous base classifiers is assigned a weight; the weights may be assigned through an evolutionary algorithm using the previous round's base classifier combinations and their scores together with the weights assigned to the base classifiers they include, while for the combinations of the first round the weights are obtained by random generation. Second, at least one base classifier is selected from each group in descending order of weight. For example, suppose a group contains 4 base classifiers with weights 0.1, 0.2, 0.3, and 0.4; if one base classifier is to be selected from the group in descending order of weight, the classifier with weight 0.4 is selected; if two are to be selected, those with weights 0.3 and 0.4 are selected, and so on. Finally, the base classifiers selected from each group are collected into the base classifier combination.
In one or more embodiments of the present description, when a plurality of base classifier combinations are generated, the weights assigned to the base classifiers may differ across combinations. For example, when weights are assigned to the base classifiers (by an evolutionary algorithm or by random generation), several groups of weights are generated, e.g., 10 groups; for each group of weights a base classifier combination is generated according to the foregoing method, so that the several groups of weights finally yield several base classifier combinations, in which both the selected base classifiers and their assigned weights may differ.
In one or more embodiments of the present description, the aforementioned weights assigned to the base classifiers may be regarded as weight vectors. Optionally, a first predetermined number (e.g., 10) of weight vectors (each including the weight assigned to every base classifier) may be determined for the at least two groups of base classifiers according to a combination strategy commonly used in ensemble learning.
In this step, a weight vector is the vector formed by combining the weights of all base classifiers in the at least two groups. For one group of base classifiers, the corresponding part of the vector may be called a sub-weight vector. For example, if three groups of base classifiers are obtained by training, with the first group corresponding to the first sub-weight vector, the second group to the second, and the third group to the third, then the weight vector is the combination of the three sub-weight vectors. More than one weight vector may be determined: a first predetermined number, e.g., 10 groups, is determined according to a preset value. Optionally, in the first round, the first predetermined number of weight vectors is obtained by random generation.
After the first predetermined number of weight vectors is obtained, the entity object classification system selectively integrates the base classifiers in the at least two groups according to the weight vectors. Specifically, for each weight vector, a second predetermined number (e.g., 1) of weights is selected from each sub-weight vector according to the weight values, and the values of the remaining weights in the sub-weight vector are set to 0, yielding a first predetermined number of corrected weight vectors. The second predetermined number is a value chosen as needed; for example, if it is 1, one base classifier is selected from each group. The selection may, for example, keep the weight with the largest value in each sub-weight vector and set the remaining weights to 0, where a weight set to zero means the corresponding base classifier is not selected and the base classifiers whose weights are not zeroed are the selected ones; the corrected sub-weight vectors are then combined into a corrected weight vector. Processing every weight vector in this way yields a first predetermined number (e.g., 10 groups) of corrected weight vectors.
Step 108: predict the data in the verification data set by using each base classifier combination together with the weights assigned to the base classifiers it includes, and calculate the score of each base classifier combination according to the prediction result.
In one or more embodiments of the present disclosure, after the plurality of base classifier combinations are generated, each combination, together with the weights assigned to the base classifiers it includes (a weighted base classifier combination at this point already corresponds to a selectively integrated heterogeneous model), is used to predict the data in the verification data set, and the score of each combination is calculated from the prediction results. After the scores are obtained, in the next round of generation and scoring, the previous round's combinations and their scores, together with the weights assigned to the base classifiers they include, are used to assign new weights to each base classifier through an evolutionary algorithm (again possibly several groups of weights); at least one base classifier is then selected from each group in descending order of weight to obtain new combinations, which are again scored by predicting the data in the verification data set. The generation and scoring steps are executed cyclically until the specified number of rounds is reached.
Alternatively, the scoring may employ a model performance evaluation method commonly used in machine learning, for example the receiver operating characteristic (ROC) curve or the area under the ROC curve (AUC).
In one or more embodiments of the present specification, when the weights assigned to the base classifiers are regarded as weight vectors, the data in the verification data set may be predicted using the first predetermined number of corrected weight vectors in combination with the at least two groups of base classifiers, and the score of each corrected weight vector calculated from the prediction results. Optionally, the corrected weight vectors used for prediction may first be normalized, so that the performance of the different corrected weight vectors is more comparable.
After steps 106 and 108 have been performed for the specified number of rounds, the following step may be performed.
Step 110: determine the highest-scoring base classifier combination across all rounds, and combine it with the weights corresponding to the base classifiers it includes to obtain the selectively integrated heterogeneous model for classification prediction.
In this step, the base classifier combinations of all rounds each have a corresponding score; the highest-scoring combination, together with the weights corresponding to its base classifiers, yields the final selectively integrated heterogeneous model, which can be used for classification prediction.
In this step, the selectively integrated heterogeneous model comprises the selected base classifiers and their weights (the weights assigned to the base classifiers when the combination was generated); when classifying data, the weighted average of the base classifiers' classification prediction results directly gives the final classification prediction result.
In the entity object classification method for selectively integrating heterogeneous models provided in one or more embodiments of the present specification, a solution for the selective integration of heterogeneous models is proposed: heterogeneous base classifiers are included in ensemble learning; each type of base classifier is trained under different parameter combinations in a learning stage to obtain a plurality of models; and in a selection stage one or more of the models are selected as components of the final model. In this way, the respective strengths of the different models are fully exploited and made complementary, and the robustness and effectiveness of the overall model are improved. Moreover, by training multiple models for each type of base classifier, the best effect of each type of model under different parameter settings can be fully explored, further improving overall performance.
In one or more embodiments of the present description, the entity object classification method of the selectively integrated heterogeneous model may be used to classify various entity objects. The entity object in one or more embodiments of the present specification may be, for example, any one of a user, a device, or an account of a user (which may also be referred to simply as an account).
For example, for a user, user properties (e.g., legal or illegal), user status (e.g., risky or non-risky), and the like may be classified. Similarly, account properties (e.g., legal or illegal), account status (e.g., risky or no risk), etc. may also be classified for the user's account, and device properties (e.g., legal or illegal), device status (e.g., risky or no risk), etc. may also be classified for the device.
In one or more embodiments of the present description, the entity object classification method of the selectively integrated heterogeneous model can be used to classify user properties (e.g., classify a user as a legitimate user or an illegitimate user); the training data set and the verification data set comprise at least one of user basic information, user dynamic information, and user relationship information. The user basic information comprises at least one of gender, age, and educational background; the user dynamic information comprises at least one of the user's browsing records and consumption records within a preset period; the user relationship information comprises at least one of the number of friends and the basic information of friends, where the basic information of a friend comprises at least one of the friend's gender, age, and educational background.
It can be seen that the user basic information, user dynamic information, and user relationship information contain features of different types in the training data. Data such as age and consumption information are usually continuous features, while data such as gender and educational background are usually discrete features, and different feature types suit different base classifiers. For example, continuous features are better trained with tree models (e.g., GBDT, random forest), while discrete features are better trained with neural network models. Therefore, with heterogeneous models selectively integrated for the different feature types in the training data, the finally obtained target classification model can complete the task better.
Similarly, for the account and the device, the training data set and the verification data set may be obtained by collecting the account/device basic information, the account/device dynamic information, and the account/device relationship information, which are not described herein again.
Fig. 3 illustrates another flow diagram of an entity object classification method for selectively integrating heterogeneous models according to one or more embodiments of the present disclosure.
As shown in fig. 3, the entity object classification method for selectively integrating heterogeneous models includes:
step 202: obtaining a training data set DTAnd validating the data set DV
Optionally, the data in the training data set and the validation data set are both provided with classification labels.
Step 204: training by using the training data set to obtain at least two groups of base classifiers; wherein at least one of the at least two groups of base classifiers has a type different from the other groups of base classifiers.
Optionally, the base classifier comprises at least one of a logistic regression model (LR), a decision tree model, a random forest model, a neural network model.
Optionally, each of the at least two groups of base classifiers is of a different type from the base classifiers of the other groups.
For example, select n types of base classifiers M_1, M_2, …, M_n. Referring to FIG. 1, M_1 represents a logistic regression model, M_2 a random forest model, and M_n a neural network model. The parameter k is the number of base classifiers trained in each group of base classifiers.
For example, for each group of base classifiers, k candidate models are trained based on different parameters, denoted M_11, M_12, …, M_1k, M_21, M_22, …, M_2k, …, M_n1, M_n2, …, M_nk, where M_ij denotes the sub-model of base classifier M_i obtained under the j-th group of parameters, giving n × k candidate models in total.
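The n × k grid of candidate models can be laid out so that groups[i][j] corresponds to M_ij; the model types and parameter grids in this sketch are assumptions, not taken from the specification:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.neural_network import MLPClassifier

# n base classifier types, each trained under k parameter settings,
# yielding the n * k candidate models M_ij.
PARAM_GRIDS = [
    (LogisticRegression, [{"C": 0.1, "max_iter": 1000},
                          {"C": 1.0, "max_iter": 1000}]),
    (RandomForestClassifier, [{"n_estimators": 50}, {"n_estimators": 200}]),
    (MLPClassifier, [{"hidden_layer_sizes": (32,), "max_iter": 300},
                     {"hidden_layer_sizes": (128,), "max_iter": 300}]),
]

def train_candidates(X_train, y_train):
    """groups[i][j] = M_ij, the sub-model of base classifier type i
    trained under the j-th group of parameters."""
    return [[cls(**params).fit(X_train, y_train) for params in grid]
            for cls, grid in PARAM_GRIDS]
```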
Step 206: determine a first predetermined number (e.g., 10) of weight vectors ω for the at least two groups of base classifiers, where each term in the weight vector ω is a number between 0 and 1, and each weight vector comprises, for each group of base classifiers i, a sub-weight vector ω_i1, ω_i2, …, ω_ik.
Step 208: for each of said weight vectors, according to each sub-weight vector ωi1i2,…,ωikSelecting a second predetermined number of weights and setting the values of the rest weights in the sub-weight vector as 0 to obtain a first predetermined number of corrected weight vectors.
For example, for each sub-weight vector the single weight with the largest value may be selected. Suppose the largest term in the first sub-weight vector is ω_11; after the processing of step 208, the sub-weight vector ω_11, ω_12, …, ω_1k becomes ω_11, 0, …, 0, in which only one term is non-zero (meaning that only one base classifier is selected from each class). Suppose the largest term in the second sub-weight vector is ω_22, so that ω_21, ω_22, …, ω_2k becomes 0, ω_22, 0, …, 0. By analogy, suppose the largest term in the n-th sub-weight vector is ω_nk, so that ω_n1, ω_n2, …, ω_nk becomes 0, …, 0, ω_nk. The resulting corrected weight vector is then: ω_11, 0, …, 0, 0, ω_22, 0, …, 0, …, 0, …, 0, ω_nk.
As another example, for each sub-weight vector the two weights with the largest values may be selected. Suppose the two largest terms in the first sub-weight vector are ω_11 and ω_12; after the processing of step 208, the sub-weight vector ω_11, ω_12, …, ω_1k becomes ω_11, ω_12, 0, …, 0, in which only two terms are non-zero (meaning that two base classifiers are selected from each class). Suppose the two largest terms in the second sub-weight vector are ω_22 and ω_23, so that ω_21, ω_22, …, ω_2k becomes 0, ω_22, ω_23, 0, …, 0. And so on; suppose the two largest terms in the n-th sub-weight vector are ω_n(k-1) and ω_nk, so that ω_n1, ω_n2, …, ω_nk becomes 0, …, 0, ω_n(k-1), ω_nk. The resulting corrected weight vector is then: ω_11, ω_12, 0, …, 0, 0, ω_22, ω_23, 0, …, 0, …, 0, …, 0, ω_n(k-1), ω_nk.
Other examples refer to the principles of the previous embodiments and are not described in detail herein.
Step 210: normalize the corrected weight vector so that the sum of all weight values in the corrected weight vector is 1.
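Steps 208 and 210 together can be sketched as follows; correct_and_normalize is a hypothetical helper operating on a flat weight vector whose per-group layout is given by group_sizes:

```python
import numpy as np

def correct_and_normalize(weight_vector, group_sizes, n_keep=1):
    """Step 208: within each sub-weight vector, keep the n_keep largest
    weights and zero the rest; step 210: normalize so the sum is 1."""
    corrected = np.zeros_like(weight_vector, dtype=float)
    start = 0
    for size in group_sizes:                  # one sub-weight vector per group
        sub = weight_vector[start:start + size]
        top = np.argsort(sub)[::-1][:n_keep]  # the n_keep largest weights
        corrected[start + top] = sub[top]
        start += size
    return corrected / corrected.sum()

# Example: two groups of 3 weights each, keeping the largest per group.
w = np.array([0.2, 0.7, 0.1, 0.3, 0.1, 0.6])
print(correct_and_normalize(w, group_sizes=[3, 3]))  # [0, 0.538.., 0, 0, 0, 0.461..]
```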
Step 212: predict the data in the verification data set using the first predetermined number of normalized corrected weight vectors in combination with the at least two groups of base classifiers, and calculate the score of each corrected weight vector according to the prediction results.
For example, let y denote the labels of the validation data set D_V (a vector whose length equals the number of prediction samples), and let P denote the matrix obtained by concatenating the prediction results p_ij of the individual models (a matrix whose length equals the number of prediction samples and whose width equals the number n × k of candidate models). With ω denoting a corrected weight vector, the product Pω, which can be calculated for different values of ω, represents the final result of integrating the prediction results of the different models according to the weight vector ω. Based on y and Pω, various evaluation indexes can be calculated to evaluate the quality of the current weight vector. The optimization goal of this embodiment is to obtain a suitable ω that yields a better evaluation index.
Optionally, predicting the data in the verification data set using the first predetermined number of normalized corrected weight vectors in combination with the at least two groups of base classifiers, and calculating the score of each corrected weight vector according to the prediction results, includes the following (see the sketch after this list):
selecting, from the at least two groups of base classifiers, the base classifiers corresponding to the non-zero weights in the normalized corrected weight vector, so as to form a classification model;
inputting the data in the verification data set into the classification model and performing weighted prediction according to the normalized corrected weight vector to obtain a prediction result;
and obtaining the score corresponding to the corrected weight vector from the prediction result according to a preset model performance evaluation method.
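The sketch below assumes AUC as the preset model performance evaluation method; P stacks each candidate model's validation predictions column-wise, so the integrated prediction is the product Pω as described above:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def score_weight_vector(candidates, omega, X_val, y_val):
    """candidates: flat list of the n*k models; omega: a normalized
    corrected weight vector. Only non-zero-weight models are consulted."""
    P = np.zeros((len(y_val), len(candidates)))  # samples x candidate models
    for j, (clf, w) in enumerate(zip(candidates, omega)):
        if w > 0:                                # zero weight: not selected
            P[:, j] = clf.predict_proba(X_val)[:, 1]
    integrated = P @ omega                       # P times omega, the ensemble output
    return roc_auc_score(y_val, integrated)
```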
Step 214: regenerate a first predetermined number of weight vectors using an evolutionary algorithm in combination with the corrected weight vectors and their scores, and repeat the steps from computing the corrected weight vectors through computing their scores with the regenerated weight vectors, obtaining a new round of corrected weight vectors and scores.
In this step, the corrected weight vectors and scores calculated in the preceding steps are combined through an evolutionary algorithm to regenerate the first predetermined number of weight vectors, so that the evolutionary algorithm can optimize the weight vectors. Optionally, when the weight vectors are to be regenerated by an evolutionary algorithm, in the foregoing step 206, i.e., in the first round, the first predetermined number of weight vectors of the at least two groups of base classifiers may be determined by random generation.
Optionally, the evolutionary algorithm employs at least one of Genetic Algorithms, Genetic Programming, Evolution Strategies, and Evolutionary Programming.
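One possible regeneration step, sketched as a toy genetic algorithm with fitness-proportional selection, uniform crossover, and Gaussian mutation, is shown below; this is only one of the evolutionary algorithms the embodiment permits, and it assumes the scores are non-negative (as AUC values would be).

```python
import numpy as np

rng = np.random.default_rng(0)

def regenerate(population, scores, mutation_scale=0.05):
    """Regenerate a population of weight vectors from the previous
    round's corrected vectors and their scores."""
    pop = np.asarray(population, dtype=float)
    fit = np.asarray(scores, dtype=float)
    probs = fit / fit.sum()  # fitness-proportional selection
    children = []
    for _ in range(len(pop)):
        pa, pb = pop[rng.choice(len(pop), size=2, p=probs)]
        mask = rng.random(pa.shape) < 0.5             # uniform crossover
        child = np.where(mask, pa, pb)
        child += rng.normal(0.0, mutation_scale, child.shape)  # mutation
        children.append(np.clip(child, 0.0, None))    # keep weights non-negative
    return np.stack(children)
```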
Step 216: the previous steps are repeated until a specified number of rounds (e.g. 10 rounds) is reached.
Step 218: determine the correction weight vector with the highest score among all the correction weight vectors obtained over all rounds, and select a second predetermined number of base classifiers from each group of base classifiers based on that highest-scoring correction weight vector.
Step 220: combine the selected base classifiers to obtain a target classification model for classification prediction. For a new sample to be classified, a weighted prediction is made simply by combining the prediction results of the models in the target classification model with their respective weights.
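Tying steps 206 to 220 together, a minimal end-to-end sketch of the selection stage might read as follows. It reuses the helper functions sketched above (correct_weights, score_weight_vector, regenerate), and the population size, round count, and other names are illustrative assumptions.

```python
import numpy as np

def selective_ensemble(classifiers, n_groups, k, X_val, y_val,
                       pop_size=20, rounds=10, keep=2):
    """Score corrected weight vectors on the verification data, evolve
    them for a fixed number of rounds, and keep the base classifiers
    with non-zero weight in the best vector found."""
    # prediction matrix P: one column per candidate model
    P = np.column_stack([c.predict_proba(X_val)[:, 1] for c in classifiers])
    population = np.random.default_rng(1).random((pop_size, n_groups * k))
    best_omega, best_score = None, -np.inf
    for _ in range(rounds):
        corrected = np.stack([correct_weights(w, n_groups, k, keep)
                              for w in population])
        scores = np.array([score_weight_vector(P, y_val, w) for w in corrected])
        if scores.max() > best_score:                 # track the best so far
            best_score, best_omega = scores.max(), corrected[scores.argmax()]
        population = regenerate(corrected, scores)    # step 214
    # the target classification model: base classifiers with non-zero weight
    return [(c, w) for c, w in zip(classifiers, best_omega) if w > 0]
```

For a new sample, the returned (classifier, weight) pairs are combined exactly as described in step 220.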
As an embodiment, the entity object classification method of the selectively integrated heterogeneous model is used for classifying user properties. The training data set and the verification data set include at least one of user basic information, user dynamic information, and user relationship information. The user basic information includes at least one of gender, age, and educational background; the user dynamic information includes at least one of the user's browsing records and consumption records within a preset period; and the user relationship information includes at least one of the number of friends and the friends' basic information, where a friend's basic information includes at least one of the friend's gender, age, and educational background.
In the entity object classification method for selectively integrating heterogeneous models provided by one or more embodiments of the present specification, multiple heterogeneous base classifiers are introduced, so that the strengths of each base classifier are fully exploited and the overall model is made more robust. For each type of base classifier, a selective-integration scheme is adopted, which better exploits the performance of each base classifier, improves the overall effect, and reduces the strong dependence of the data on any specific model. Moreover, the method is simple to implement: in the candidate-model selection process, an evolutionary algorithm with weight-vector correction is adopted, which can efficiently obtain a good solution to the complex, heavily constrained optimization problem of this embodiment.
It should be noted that the method of one or more embodiments of the present disclosure may be performed by a single device, such as a computer or server. The method of the embodiment can also be applied to a distributed scene and completed by the mutual cooperation of a plurality of devices. In such a distributed scenario, one of the devices may perform only one or more steps of the method of one or more embodiments of the present disclosure, and the devices may interact with each other to complete the method.
Fig. 4 is a block diagram illustrating a classification apparatus for selectively integrating heterogeneous models according to one or more embodiments of the present disclosure.
As shown in fig. 4, the classification apparatus for selectively integrating heterogeneous models includes:
an obtaining module 301, configured to obtain a training data set and a verification data set;
a training module 302, configured to train to obtain at least two sets of heterogeneous base classifiers by using the training data set;
a base classifier combination generation and scoring module 303, configured to cyclically execute the following steps of generating and scoring a base classifier combination according to a specified number of rounds:
generating a plurality of base classifier combinations, wherein each base classifier combination is obtained by assigning a weight to each base classifier through an evolutionary algorithm that uses the previous round's base classifier combinations, their scores, and the weights assigned to the base classifiers they include, and then selecting at least one base classifier from each group of base classifiers in descending order of weight;
predicting the data in the verification data set by using the base classifier combination and the weight given to the base classifier included in the base classifier combination, and calculating the score of each base classifier combination according to the prediction result;
the classification module 304 is configured to determine a combination of the base classifiers with the highest scores in all the rounds, and obtain the selective integration heterogeneous model based on the combination of the base classifiers with the highest scores in combination with weights corresponding to the base classifiers included in the combination of the base classifiers, so as to perform classification prediction.
The classification apparatus for selectively integrating heterogeneous models provided by one or more embodiments of the present specification implements the selective integration of heterogeneous models: heterogeneous base classifiers are included in the ensemble learning, each type of base classifier is trained under different parameter combinations in the learning stage to obtain a plurality of models, and in the selection stage one or more of these models are selected as components of the final model. In this way, the strengths of the different models are fully utilized and complement one another, improving the robustness and effectiveness of the overall model.
Optionally, the base classifier combination generation and scoring module is configured to:
generating a plurality of base classifier combinations, wherein each base classifier combination is obtained by assigning each base classifier a randomly generated weight and selecting at least one base classifier from each group of base classifiers in descending order of weight;
and predicting the data in the verification data set by using the base classifier combination and the weight given to the base classifier included in the base classifier combination, and calculating the score of each base classifier combination according to the prediction result.
Optionally, the evolutionary algorithm employs at least one of a genetic algorithm, genetic programming, an evolution strategy, and evolutionary programming.
Optionally, the data in the training data set and the validation data set are both provided with classification labels.
Optionally, the base classifier comprises at least one of a logistic regression model, a support vector machine model, a decision tree model, a gradient boosting decision tree model, a random forest model, and a neural network model.
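As an illustration of the learning stage that produces these groups of candidate models, the following sketch trains three assumed model types under several parameter combinations each; the chosen types and parameter grids are examples only, not a prescription of this apparatus.

```python
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier

def train_candidate_pool(X_train, y_train):
    """Train n groups of k candidate models (here n = 3 groups of k = 3),
    returning a flat, group-major list aligned with the weight vectors."""
    grids = [
        (LogisticRegression, [{"C": c, "max_iter": 1000} for c in (0.1, 1.0, 10.0)]),
        (RandomForestClassifier, [{"n_estimators": n} for n in (50, 100, 200)]),
        (GradientBoostingClassifier, [{"learning_rate": r} for r in (0.05, 0.1, 0.2)]),
    ]
    pool = []
    for model_cls, params in grids:
        for p in params:
            pool.append(model_cls(**p).fit(X_train, y_train))
    return pool
```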
Optionally, the classification apparatus of the selectively integrated heterogeneous model is used for classifying user properties. The training data set and the verification data set include at least one of user basic information, user dynamic information, and user relationship information. The user basic information includes at least one of gender, age, and educational background; the user dynamic information includes at least one of the user's browsing records and consumption records within a preset period; and the user relationship information includes at least one of the number of friends and the friends' basic information, where a friend's basic information includes at least one of the friend's gender, age, and educational background.
For convenience of description, the above apparatus is described as being divided into various modules by function, which are described separately. Of course, when implementing one or more embodiments of the present specification, the functionality of these modules may be implemented in one or more of the same software and/or hardware components.
The apparatus of the foregoing embodiment is used to implement the corresponding method in the foregoing embodiment, and has the beneficial effects of the corresponding method embodiment, which are not described herein again.
Fig. 5 is a schematic diagram illustrating a more specific hardware structure of an electronic device according to this embodiment, where the electronic device may include: a processor 401, a memory 402, an input/output interface 403, a communication interface 404, and a bus 405. Wherein the processor 401, the memory 402, the input/output interface 403 and the communication interface 404 are communicatively connected to each other within the device by a bus 405.
The processor 401 may be implemented by a general-purpose CPU (Central Processing Unit), a microprocessor, an Application-Specific Integrated Circuit (ASIC), or one or more integrated circuits, and is configured to execute relevant programs to implement the technical solutions provided by the embodiments of the present specification.
The memory 402 may be implemented in the form of a ROM (Read-Only Memory), a RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 402 may store an operating system and other application programs; when the technical solutions provided by the embodiments of the present specification are implemented by software or firmware, the relevant program code is stored in the memory 402 and is called and executed by the processor 401.
The input/output interface 403 is used for connecting an input/output module to realize information input and output. The i/o module may be configured as a component in a device (not shown) or may be external to the device to provide a corresponding function. The input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, etc., and the output devices may include a display, a speaker, a vibrator, an indicator light, etc.
The communication interface 404 is used to connect a communication module (not shown in the figure) to implement communication interaction between the present device and other devices. The communication module can realize communication in a wired mode (such as USB, network cable and the like) and also can realize communication in a wireless mode (such as mobile network, WIFI, Bluetooth and the like).
The bus 405 includes a path that transfers information between the various components of the device, such as the processor 401, memory 402, input/output interface 403, and communication interface 404.
It should be noted that although the above-mentioned device only shows the processor 401, the memory 402, the input/output interface 403, the communication interface 404 and the bus 405, in a specific implementation, the device may also include other components necessary for normal operation. In addition, those skilled in the art will appreciate that the above-described apparatus may also include only those components necessary to implement the embodiments of the present description, and not necessarily all of the components shown in the figures.
Computer-readable media of the present embodiments, including permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
The foregoing description has been directed to specific embodiments of this disclosure. Other embodiments are within the scope of the following claims. In some cases, the actions or steps recited in the claims may be performed in a different order than in the embodiments and still achieve desirable results. In addition, the processes depicted in the accompanying figures do not necessarily require the particular order shown, or sequential order, to achieve desirable results. In some embodiments, multitasking and parallel processing may also be possible or may be advantageous.
Those of ordinary skill in the art will understand that: the discussion of any embodiment above is meant to be exemplary only, and is not intended to intimate that the scope of the disclosure, including the claims, is limited to these examples; within the spirit of the present disclosure, features from the above embodiments or from different embodiments may also be combined, steps may be implemented in any order, and there are many other variations of different aspects of one or more embodiments of the present description as described above, which are not provided in detail for the sake of brevity.
In addition, well-known power/ground connections to Integrated Circuit (IC) chips and other components may or may not be shown in the provided figures, for simplicity of illustration and discussion, and so as not to obscure one or more embodiments of the disclosure. Furthermore, devices may be shown in block diagram form in order to avoid obscuring the understanding of one or more embodiments of the present description, and this also takes into account the fact that specifics with respect to implementation of such block diagram devices are highly dependent upon the platform within which the one or more embodiments of the present description are to be implemented (i.e., specifics should be well within purview of one skilled in the art). Where specific details (e.g., circuits) are set forth in order to describe example embodiments of the disclosure, it should be apparent to one skilled in the art that one or more embodiments of the disclosure can be practiced without, or with variation of, these specific details. Accordingly, the description is to be regarded as illustrative instead of restrictive.
While the present disclosure has been described in conjunction with specific embodiments thereof, many alternatives, modifications, and variations of these embodiments will be apparent to those of ordinary skill in the art in light of the foregoing description. For example, other memory architectures (e.g., dynamic RAM (DRAM)) may use the discussed embodiments.
It is intended that the one or more embodiments of the present specification embrace all such alternatives, modifications and variations as fall within the broad scope of the appended claims. Therefore, any omissions, modifications, substitutions, improvements, and the like that may be made without departing from the spirit and principles of one or more embodiments of the present disclosure are intended to be included within the scope of the present disclosure.

Claims (14)

1. An entity object classification method for selectively integrating heterogeneous models comprises the following steps:
acquiring a training data set and a verification data set, the training data set and the verification data set comprising entity object data;
training to obtain at least two groups of heterogeneous base classifiers by using the training data set;
cyclically executing, for a specified number of rounds, the following steps of generating and scoring base classifier combinations:
generating a plurality of base classifier combinations, wherein each base classifier combination is obtained by assigning a weight to each base classifier through an evolutionary algorithm that uses the previous round's base classifier combinations, their scores, and the weights assigned to the base classifiers they include, and then selecting at least one base classifier from each group of base classifiers in descending order of weight;
predicting the data in the verification data set by using the base classifier combination and the weight given to the base classifier included in the base classifier combination, and calculating the score of each base classifier combination according to the prediction result;
and determining the highest-scoring base classifier combination over all the rounds, and obtaining the selectively integrated heterogeneous model based on that combination together with the weights corresponding to the base classifiers it includes, for performing entity object classification prediction.
2. The method of claim 1, wherein the first round of the steps of generating and scoring base classifier combinations specifically comprises:
generating a plurality of base classifier combinations, wherein each base classifier combination is obtained by assigning each base classifier a randomly generated weight and selecting at least one base classifier from each group of base classifiers in descending order of weight;
and predicting the data in the verification data set by using the base classifier combination and the weight given to the base classifier included in the base classifier combination, and calculating the score of each base classifier combination according to the prediction result.
3. The method of claim 1, wherein the evolutionary algorithm employs at least one of a genetic algorithm, genetic programming, an evolution strategy, and evolutionary programming.
4. The method of claim 1, wherein the data in the training dataset and the validation dataset are each labeled with a classification.
5. The method of claim 1, wherein the base classifier comprises at least one of a logistic regression model, a support vector machine model, a decision tree model, a gradient boosting decision tree model, a random forest model, and a neural network model.
6. The method according to any one of claims 1-5, wherein the method is used for classifying user properties; the training data set and the verification data set comprise at least one of user basic information, user dynamic information, and user relationship information; the user basic information comprises at least one of gender, age, and educational background; the user dynamic information comprises at least one of the user's browsing records and consumption records within a preset period; and the user relationship information comprises at least one of the number of friends and the friends' basic information, wherein a friend's basic information comprises at least one of the friend's gender, age, and educational background.
7. An entity object classification apparatus selectively integrating heterogeneous models, comprising:
an acquisition module for acquiring a training data set and a verification data set, the training data set and the verification data set comprising entity object data;
the training module is used for training to obtain at least two groups of heterogeneous base classifiers by utilizing the training data set;
the base classifier combination generation and scoring module is used for circularly executing the following steps of generating and scoring the base classifier combination according to the specified number of rounds:
generating a plurality of base classifier combinations, wherein each base classifier combination is obtained by assigning a weight to each base classifier through an evolutionary algorithm that uses the previous round's base classifier combinations, their scores, and the weights assigned to the base classifiers they include, and then selecting at least one base classifier from each group of base classifiers in descending order of weight;
predicting the data in the verification data set by using the base classifier combination and the weight given to the base classifier included in the base classifier combination, and calculating the score of each base classifier combination according to the prediction result;
and the classification module is used for determining the highest-scoring base classifier combination over all the rounds and obtaining the selectively integrated heterogeneous model based on that combination together with the weights corresponding to the base classifiers it includes, for performing entity object classification prediction.
8. The apparatus of claim 7, wherein the base classifier combination generation and scoring module is to:
generating a plurality of base classifier combinations, wherein each base classifier combination is obtained by assigning each base classifier a randomly generated weight and selecting at least one base classifier from each group of base classifiers in descending order of weight;
and predicting the data in the verification data set by using the base classifier combination and the weight given to the base classifier included in the base classifier combination, and calculating the score of each base classifier combination according to the prediction result.
9. The apparatus of claim 7, wherein the evolutionary algorithm employs at least one of a genetic algorithm, genetic programming, an evolution strategy, and evolutionary programming.
10. The apparatus of claim 7, wherein the data in the training data set and the validation data set are each labeled with a classification.
11. The apparatus of claim 7, wherein the base classifier comprises at least one of a logistic regression model, a support vector machine model, a decision tree model, a gradient boosting decision tree model, a random forest model, and a neural network model.
12. The apparatus according to any one of claims 7-11, wherein the apparatus is configured to classify user properties; the training data set and the verification data set comprise at least one of user basic information, user dynamic information, and user relationship information; the user basic information comprises at least one of gender, age, and educational background; the user dynamic information comprises at least one of the user's browsing records and consumption records within a preset period; and the user relationship information comprises at least one of the number of friends and the friends' basic information, wherein a friend's basic information comprises at least one of the friend's gender, age, and educational background.
13. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 6 when executing the program.
14. A non-transitory computer readable storage medium storing computer instructions for causing a computer to perform the method of any one of claims 1 to 6.