CN112819079A - Model sampling algorithm matching method and device and electronic equipment - Google Patents
- Publication number
- CN112819079A (CN202110159651.5A)
- Authority
- CN
- China
- Prior art keywords
- sampling
- sampling algorithm
- model
- target
- sample set
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/22—Matching criteria, e.g. proximity measures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Image Analysis (AREA)
Abstract
The application discloses a sampling algorithm matching method and device of a model and electronic equipment, and belongs to the technical field of model sampling. The method comprises the following steps: acquiring a sampling algorithm set, wherein the sampling algorithm set comprises a plurality of sampling algorithms; respectively sampling samples through each sampling algorithm in the sampling algorithm set to obtain a target training sample set corresponding to each sampling algorithm; training model parameters of the target model through each target training sample set respectively to obtain a plurality of trained models; and according to the evaluation index value of each trained model, determining a sampling algorithm matched with the target model in the sampling algorithm set, so that the model detection performance can be improved.
Description
Technical Field
The application belongs to the technical field of model sampling, and particularly relates to a sampling algorithm matching method and device of a model and electronic equipment.
Background
Data sampling is a common data processing step in model construction: a model is trained on the sampled data. However, imbalanced data may be encountered during sampling. Imbalanced data degrades the model and weakens its generalization capability, so that the model's predictions are inaccurate. To deal with data imbalance, the prior art balances the data either by traditional random oversampling (ROS) or by synthesizing minority-class samples through interpolation (SMOTE).
However, the prior art has at least the following problems: ROS sampling and SMOTE sampling simply make the numbers of positive and negative samples equal and do not consider the real distribution of the data. For example, ROS sampling obtains sample points by random oversampling and only repeatedly adds minority-class samples around the original minority-class samples, so it improves the model detection effect little, and the result may even be worse than without sampling. SMOTE sampling generates minority-class samples among the majority-class samples, which adds noise to the original data and affects the discriminability of the features and the performance of the model.
Therefore, it is necessary to select the sampling algorithm that best matches the different models.
Summary of the Application
The embodiment of the application aims to provide a sampling algorithm matching method and device of a model and electronic equipment, and the problem that the detection performance of the model is not high due to the fact that a sampling technology is not adaptive can be solved.
In order to solve the technical problem, the present application is implemented as follows:
in a first aspect, an embodiment of the present application provides a sampling algorithm matching method for a model, where the method includes:
acquiring a sampling algorithm set, wherein the sampling algorithm set comprises a plurality of sampling algorithms;
respectively sampling samples through each sampling algorithm in the sampling algorithm set to obtain a target training sample set corresponding to each sampling algorithm;
respectively training the target models through each target training sample set to obtain a plurality of trained models;
and determining a sampling algorithm matched with the target model in the sampling algorithm set according to the evaluation index value of each trained model.
In a second aspect, an embodiment of the present application provides a sampling algorithm matching apparatus for a model, including:
the algorithm acquisition module is used for acquiring a sampling algorithm set, and the sampling algorithm set comprises a plurality of sampling algorithms;
the sampling module is used for sampling samples through each sampling algorithm in the sampling algorithm set respectively to obtain a target training sample set corresponding to each sampling algorithm;
the training module is used for training the target model through each target training sample set respectively to obtain a plurality of trained models;
and the matching module is used for determining a sampling algorithm matched with the target model in the sampling algorithm set according to the evaluation index value of each trained model.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In the embodiments of the application, the total training sample set is divided into as many sub-training sample sets as there are sampling algorithms to be matched, and each sampling algorithm samples its own sub-training sample set. The directly sampled data are combined with data fitted from them, so that the sampled data fit the feature distribution of the original data more closely and the balance of the sample data is improved. Each trained model is tested with the target test sample sets to obtain an evaluation index value, which measures the performance of the model trained on the sample data produced by the corresponding sampling algorithm. The optimal sampling algorithm is thereby matched to the model, so that the model performs better.
Drawings
FIG. 1 is a flowchart of a model sampling algorithm matching method provided in this embodiment;
FIG. 2 is a block diagram of a model sampling algorithm matching device provided in this embodiment;
FIG. 3 is a schematic diagram of the hardware structure of an electronic device provided in this embodiment.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first", "second" and the like in the description and in the claims of the present application are used to distinguish between similar elements and not necessarily to describe a particular sequential or chronological order. It is to be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the application can operate in sequences other than those illustrated or described herein. In addition, "and/or" in the specification and claims means at least one of the connected objects, and the character "/" generally indicates that the objects before and after it are in an "or" relationship.
The following describes in detail a model sampling algorithm matching method provided in the embodiments of the present application with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Referring to FIG. 1, the sampling algorithm matching method for a model provided in this embodiment includes the following steps:
Step S110: acquire a sampling algorithm set.
In this embodiment, the sampling algorithm set includes a plurality of sampling algorithms, for example sampling algorithm 1 (conventional SMOTE sampling), sampling algorithm 2 (cluster sampling), sampling algorithm 3 (density oversampling), sampling algorithm 4 (boundary 1 oversampling), and sampling algorithm 5 (boundary 2 oversampling). The sampling algorithms in the set are acquired so that data can be sampled from the original data set with them.
The original data may be the set of all sample data required in the model creation process. During model creation the original data is divided into a total training sample set train_data and a total test sample set test_data. The total training sample set serves as training data to obtain a model with recognition capability, and the total test sample set serves as test data for the trained model to check its recognition performance.
Step S120: perform sample sampling with each sampling algorithm in the sampling algorithm set to obtain a target training sample set corresponding to each sampling algorithm.
To let the model learn the characteristics of the data better, data sampling is often performed during model training: a random event is simulated according to a given probability distribution, and simulating the random phenomenon improves the model's learning. Because most real-world data sets are huge, the overall distribution may contain countless sample points, and the model cannot be built directly on such massive data. A subset is therefore usually drawn from the overall sample by sample sampling, so that a small number of sample points approximates the overall distribution and characterizes its uncertainty. This subset is called the training set. The goal of model training is to minimize the loss function on the training set; after training is completed, another data set is needed as the test set to evaluate the performance of the model.
In this embodiment, each sampling algorithm in the sampling algorithm set performs sample sampling to obtain its corresponding target training sample set. The model is then trained with the target training sample set of each sampling algorithm, and the performance of the trained model determines whether that sampling algorithm is the optimal one for the model.
The corresponding target training sample sets are obtained as follows: acquire the total training sample set of the target model; allocate the samples in the total training sample set by sampling with replacement to obtain sub-training sample sets in one-to-one correspondence with the sampling algorithms; and, with each sampling algorithm, sample from its corresponding sub-training sample set to obtain the corresponding target training sample set.
In a feasible embodiment, the number of sampling algorithms in the sampling algorithm set is obtained, and data are drawn from the total training sample set by sampling with replacement to obtain as many equal-sized portions as there are sampling algorithms. Each sampling algorithm then performs sample sampling on its portion to obtain the target training sample set corresponding to that algorithm. For example, if the sampling algorithm set includes 5 sampling algorithms, 5 draws from the total training sample set yield 5 sub-training sample sets, each denoted train_data_j (0 < j ≤ J, where J is the number of sampling algorithms to be selected, here J = 5). The sub-training sample set corresponding to sampling algorithm 1 is train_data_1, and that corresponding to sampling algorithm 2 is train_data_2. Sampling algorithm 1 samples from train_data_1 to obtain the target training sample set corresponding to sampling algorithm 1. In this embodiment sampling is with replacement, that is, each extracted sample is returned to the total training sample set before the next draw, which can to some extent prevent data imbalance within the sub-training sample sets.
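The allocation of sub-training sample sets by sampling with replacement can be sketched as follows. This is a minimal illustration only: the function name `draw_sub_training_sets`, the `subset_size` parameter, and the toy `(feature, label)` data are assumptions that do not appear in the application.

```python
import random

def draw_sub_training_sets(train_data, num_algorithms, subset_size, seed=0):
    """Draw one equal-sized sub-training set per sampling algorithm.

    Hypothetical helper: rng.choices samples WITH replacement, so every draw
    sees the complete total training sample set.
    """
    rng = random.Random(seed)
    return [rng.choices(train_data, k=subset_size) for _ in range(num_algorithms)]

# Toy total training sample set of (feature, label) pairs; 10% positives.
train_data = [(i, 1 if i % 10 == 0 else 0) for i in range(100)]

# J = 5 sampling algorithms, so 5 sub-training sample sets train_data_1..train_data_5.
sub_sets = draw_sub_training_sets(train_data, num_algorithms=5, subset_size=40)
```

Because each draw is made from the full set, the same sample may appear in several sub-training sample sets, or more than once in one of them.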
In this embodiment, from the perspective of sample distribution, the distribution characteristics of the original samples are restored by fitting the sample data, which addresses the uneven sample distribution in the sample data to be trained. Specifically, each sampling algorithm samples from its corresponding sub-training sample set to obtain first sample sets in one-to-one correspondence with the sampling algorithms; data fitting is performed on each first sample set to obtain second sample sets in one-to-one correspondence with the sampling algorithms; and, for each sampling algorithm, the corresponding first and second sample sets are combined to obtain the corresponding target training sample set.
That is, each sampling algorithm samples its corresponding sub-training sample set to obtain a first sample set, data fitting on the first sample set yields a second sample set, the two sets are combined into a target training sample set, and the model to be generated is trained on that target training sample set.
For example, sampling algorithm 1 samples the sub-training sample set train_data_1 directly to obtain the first sample set corresponding to sampling algorithm 1, and data fitting on the first sample set then gives the fitted data $x_{f1} = x_{e1} + \beta (x_{g1} - x_{e1})$, where $x_{e1}$ and $x_{g1}$ are original data points in the first sample set and $\beta$ is the fitting coefficient. The fitted data $x_{f1}$ form the second sample set, and the first and second sample sets are combined to obtain the target training sample set smote_train_data_1 sampled by sampling algorithm 1.
Similarly, sampling algorithm 2 samples the sub-training sample set train_data_2 directly to obtain the first sample set corresponding to sampling algorithm 2, and data fitting on the first sample set then gives the fitted data $x_{ff2} = \mathrm{kmeans}(x_{ee2}) + \beta (\mathrm{kmeans}(x_{gg2}) - \mathrm{kmeans}(x_{ee2}))$, where $\mathrm{kmeans}(x_{gg2})$ and $\mathrm{kmeans}(x_{ee2})$ are the data after k-means clustering and $\beta$ is the fitting coefficient. The fitted data $x_{ff2}$ form the second sample set, and the first and second sample sets are combined to obtain the target training sample set smote_train_data_2 sampled by sampling algorithm 2.
Sampling algorithm 3 samples the sub-training sample set train_data_3 directly to obtain the first sample set corresponding to sampling algorithm 3, and data fitting on the first sample set then gives the fitted data $x_{ff3} = x_{ee3} + \beta (x_{gg3} - x_{ee3})$, where $x_{gg3}$ and $x_{ee3}$ are data density values in the first sample set corresponding to sampling algorithm 3 and $\beta$ is the fitting coefficient, which can be calculated with a Gaussian density formula. The fitted data $x_{ff3}$ form the second sample set, and the first and second sample sets are combined to obtain the target training sample set smote_train_data_3 sampled by sampling algorithm 3.
Sampling algorithm 4 samples the sub-training sample set train_data_4 directly to obtain the first sample set corresponding to sampling algorithm 4, and data fitting on the first sample set then gives the fitted data $x_{ff4} = x_{ee4} + \beta (x_{gg4} - x_{ee4})$, where $x_{gg4}$ and $x_{ee4}$ are minority-class sample points at the boundary and $\beta$ is the fitting coefficient. The fitted data $x_{ff4}$ form the corresponding second sample set, and the first and second sample sets are combined to obtain the target training sample set smote_train_data_4 sampled by sampling algorithm 4.
Sampling algorithm 5 samples the sub-training sample set train_data_5 directly to obtain the first sample set corresponding to sampling algorithm 5, and data fitting on the first sample set then gives the fitted data $x_{ff5} = x_{ee5} + \beta (x_{gg5} - x_{ee5})$, where $x_{gg5}$ and $x_{ee5}$ are any sample points at the boundary and $\beta$ is the fitting coefficient. The fitted data $x_{ff5}$ form the corresponding second sample set, and the first and second sample sets are combined to obtain the target training sample set smote_train_data_5 sampled by sampling algorithm 5. This embodiment takes only these 5 sampling algorithms as examples; other algorithms are calculated in a similar way and are not described again here.
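The shared pattern in the examples above (direct sampling gives a first sample set, interpolation of the form x_f = x_e + β(x_g − x_e) gives a second sample set, and the two are combined) might be sketched as below. The sketch assumes that β is drawn uniformly from [0, 1) and that point pairs are picked at random. The application does not fix either choice, and the k-means, density, and boundary variants would select x_e, x_g, and β differently. All function names are hypothetical.

```python
import random

def fit_points(first_set, n_new, seed=0):
    """Build a second sample set by interpolating between pairs of points.

    Each fitted point is x_f = x_e + beta * (x_g - x_e), applied per coordinate.
    Assumption: beta ~ Uniform[0, 1); the source only calls it a fitting coefficient.
    """
    rng = random.Random(seed)
    second_set = []
    for _ in range(n_new):
        x_e, x_g = rng.sample(first_set, 2)   # two distinct points of the first set
        beta = rng.random()
        second_set.append(tuple(e + beta * (g - e) for e, g in zip(x_e, x_g)))
    return second_set

def build_target_training_set(first_set, n_new, seed=0):
    """Combine the first (directly sampled) and second (fitted) sample sets."""
    return first_set + fit_points(first_set, n_new, seed)

# Toy first sample set of 2-D feature vectors.
first_set = [(1.0, 2.0), (2.0, 4.0), (3.0, 3.0)]
target_set = build_target_training_set(first_set, n_new=4)
```

With β restricted to [0, 1), every fitted point lies on the segment between the two points it was built from, so the second sample set stays inside the region spanned by the first.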
In this embodiment, the sampling result (that is, the target training sample set) obtained when the sampling algorithm is selected reveals the distribution characteristics of the positive samples under real conditions, which helps to optimize parameters in the later model training and verification stages.
Step S130: train the model parameters of the target model with each target training sample set to obtain a plurality of trained models.
In this embodiment, the model parameters of the target model are trained with each target training sample set smote_train_data_j to obtain the trained model corresponding to smote_train_data_j. That is, after the model to be trained is trained on the target training sample set smote_train_data_j obtained through sampling algorithm j, the corresponding model_j is obtained. For example, the target model trained on smote_train_data_1 is model_1.
After the trained model_j is obtained, model_j needs to be evaluated; see the following steps for details.
Step S140: determine, in the sampling algorithm set, the sampling algorithm matched with the target model according to the evaluation index value of each trained model.
In this embodiment, before the evaluation index values of the trained models are obtained, the test samples of the model need to be processed, specifically: acquire the total test sample set; split the total test sample set into a plurality of sub-test sample sets in one-to-one correspondence with the sampling algorithms; and redistribute each sub-test sample set by sampling without replacement to obtain a plurality of target test sample sets for each sub-test sample set. Because the sub-test sample sets correspond one-to-one to the sampling algorithms, a plurality of target test sample sets is obtained for each sampling algorithm.
In a feasible embodiment, the number of sampling algorithms in the sampling algorithm set is obtained, and the total test sample set test_data is divided into as many equal parts as there are sampling algorithms, giving a plurality of sub-test sample sets of equal size. The test data of each sub-test sample set are then redistributed by sampling without replacement to obtain a plurality of target test sample sets, the number of target test sample sets being the same for every sub-test sample set. Because the test sample data are used to check the performance of the trained model, traversing every test sample allows the trained model to be checked more comprehensively; sampling without replacement ensures that every test sample is input into model_j. Each sub-test sample set is therefore divided into a plurality of target test sample sets by sampling without replacement. The number of target test sample sets can be set according to the precision requirement of the model, and is preferably 10 or 20.
For example, if the number of sampling algorithms is 5, the data of the total test sample set are divided into 5 equal parts to obtain 5 sub-test sample sets, denoted test_data_j (0 < j ≤ J, where J is the number of sampling algorithms), one for each sampling algorithm. Each test_data_j is divided into 10 equal parts by sampling without replacement, denoted test_data_ij, where 0 < i ≤ 10 is the index of the part and test_data_ij is a target test sample set. For example, the target test sample sets corresponding to sampling algorithm 1 are test_data_11, test_data_21, ..., test_data_i1.
It should be noted that, in order to obtain an evaluation index value for the trained model, every test sample set produced when the total test sample set and the sub-test sample sets are split must contain a positive sample; that is, every target test sample set contains a positive sample. The splitting can be performed by a classifier or manually.
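One way to carry out the two-level split without replacement while keeping a positive sample in every part is sketched below. Dealing the positives round-robin before the negatives is an assumption made for illustration; the application only states that the split may be done by a classifier or manually. The function names and toy data are hypothetical.

```python
import random

def split_without_replacement(samples, parts, seed=0):
    """Partition (id, label) samples into `parts` disjoint folds.

    Every sample lands in exactly one fold. Positives are dealt first, so each
    fold receives a positive whenever there are at least `parts` positives.
    """
    rng = random.Random(seed)
    positives = [s for s in samples if s[1] == 1]
    negatives = [s for s in samples if s[1] != 1]
    rng.shuffle(positives)
    rng.shuffle(negatives)
    folds = [[] for _ in range(parts)]
    for k, s in enumerate(positives + negatives):
        folds[k % parts].append(s)
    return folds

def build_target_test_sets(test_data, num_algorithms, parts_per_subset=10, seed=0):
    """Return target_sets[j][i]: the i-th target test set for sampling algorithm j."""
    sub_sets = split_without_replacement(test_data, num_algorithms, seed)
    return [split_without_replacement(sub, parts_per_subset, seed + j + 1)
            for j, sub in enumerate(sub_sets)]

# Toy total test sample set: 1000 (id, label) pairs with 10% positives.
test_data = [(i, 1 if i % 10 == 0 else 0) for i in range(1000)]
target_sets = build_target_test_sets(test_data, num_algorithms=5, parts_per_subset=10)
```

Because nothing is returned to the pool between draws, the union of all target test sample sets is exactly the total test sample set, so every test sample is seen by some model.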
In this embodiment, for each trained model, the model identification label of every test sample in each target test sample set is obtained through the corresponding model; the evaluation index value of each trained model is then obtained from the model identification labels and the actual labels of the test samples in the target test sample sets.
In this embodiment, the model identification label is the output obtained by inputting a test sample of a target test sample set into model_j. The model identification label may be identical to the actual label of the test sample, similar to it, or inconsistent with it; the performance of model_j can be checked by comparing the two. The consistency between the model identification labels and the actual labels of the test samples is quantified to obtain the evaluation index value of each trained model.
In this embodiment, the evaluation index values of all the trained models are compared, and the sampling algorithm corresponding to the trained model with the maximum evaluation index value is selected from the sampling algorithm set as the sampling algorithm matched with the target model. In other words, after testing on the target test sample sets, the sampling algorithm j corresponding to the model_j with the maximum evaluation index value is the sampling algorithm that matches the target model.
For example, the evaluation index value of each trained model is calculated from the precision and recall of the model. Precision is the proportion of correctly retrieved samples among all samples actually retrieved, and recall is the proportion of correctly retrieved samples among all samples that should have been retrieved. The target test sample sets test_data_ij are input into model_j to obtain, for sampling algorithm j, the precision precision_ij and the recall recall_ij on each test_data_ij. The weighted average op_precision_j of the recall and precision of model_j is then calculated as the evaluation index value of the model.
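Precision and recall as defined above can be computed from the model identification labels and the actual labels as in the following sketch. The function name, the convention of returning 0.0 when a denominator is zero, and the toy label lists are assumptions for illustration.

```python
def precision_recall(actual, predicted, positive=1):
    """Return (precision, recall) for one target test sample set."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if a == positive and p == positive)  # correctly retrieved
    fp = sum(1 for a, p in pairs if a != positive and p == positive)  # wrongly retrieved
    fn = sum(1 for a, p in pairs if a == positive and p != positive)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Toy labels: 3 true positives, 1 false positive, 1 false negative.
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 1, 0, 0, 0, 1]
precision, recall = precision_recall(actual, predicted)  # both equal 0.75
```

The zero-denominator convention is one reason every target test sample set needs at least one positive sample: without one, recall is undefined.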
The weighted average op_precision_j of the recall and precision for sampling algorithm j is calculated as follows:
where 0 < i ≤ 10, i is the index of the data part; α is the weight of the recall, 0 ≤ α ≤ 1; recall_ij is the recall, on test_data_ij, of the model trained after sampling with sampling algorithm j; and precision_ij is the precision of that model on test_data_ij.
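The formula itself is not reproduced in this text. One form consistent with the symbol definitions given here, offered as an assumption rather than as the application's own expression, averages the α-weighted combination of recall and precision over the 10 target test sample sets:

```latex
\mathrm{op\_precision}_j
  = \frac{1}{10} \sum_{i=1}^{10}
    \left[ \alpha \, \mathrm{recall}_{ij} + (1 - \alpha) \, \mathrm{precision}_{ij} \right],
  \qquad 0 \le \alpha \le 1
```

Under this reading, α = 1 scores a model by recall alone and α = 0 by precision alone.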
The evaluation index value of the model in this embodiment is calculated from the precision and recall. The larger the weighted average of recall and precision of a model, the less data noise remains after sampling with the corresponding sampling algorithm and the more distinct the feature distribution of the data, which can improve the fitting effect and generalization capability of the model. Therefore, the sampling algorithm j corresponding to the maximum op_precision_j is selected as the sampling algorithm matched with the target model.
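The selection step can be sketched as follows. The sketch assumes that op_precision_j is the mean, over the target test sample sets, of α·recall + (1 − α)·precision, which is one reading of "weighted average" and is not confirmed by the text. The algorithm names and metric values are made-up illustrative inputs, not results.

```python
def op_precision(fold_metrics, alpha=0.5):
    """Evaluation index for one sampling algorithm.

    fold_metrics: list of (precision_ij, recall_ij), one pair per target test set.
    Assumed form: mean of alpha * recall + (1 - alpha) * precision.
    """
    scores = [alpha * r + (1 - alpha) * p for p, r in fold_metrics]
    return sum(scores) / len(scores)

def match_sampling_algorithm(metrics_by_algorithm, alpha=0.5):
    """Return the algorithm with the largest evaluation index, plus all indices."""
    scores = {name: op_precision(m, alpha) for name, m in metrics_by_algorithm.items()}
    return max(scores, key=scores.get), scores

# Hypothetical per-fold (precision, recall) pairs for three sampling algorithms.
metrics_by_algorithm = {
    "smote": [(0.5, 0.5), (0.5, 0.5)],
    "cluster": [(0.75, 0.25), (0.75, 0.75)],
    "density": [(0.25, 0.25), (0.5, 0.5)],
}
best, scores = match_sampling_algorithm(metrics_by_algorithm, alpha=0.5)
```

With these toy inputs the indices are 0.5, 0.625, and 0.375, so `best` is `"cluster"`. Raising α favors algorithms whose models miss fewer positives.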
In this method, the total training sample set is divided into as many sub-training sample sets as there are sampling algorithms to be matched, and each sampling algorithm samples its own sub-training sample set. The directly sampled data are combined with data fitted from them, so that the sampled data fit the feature distribution of the original data more closely and the balance of the sample data is improved. Each trained model is tested with the target test sample sets to obtain an evaluation index value, which measures the performance of the model trained on the sample data produced by the corresponding sampling algorithm. The optimal sampling algorithm is thereby matched to the model, so that the model performs better.
It should be noted that the execution subject of the sampling algorithm matching method for a model provided in the embodiments of the present application may be a sampling algorithm matching device for a model, or a control module in that device for executing the method. In the embodiments of the present application, the method is described by taking a sampling algorithm matching device for a model executing the method as an example.
Referring to FIG. 2, the sampling algorithm matching device for a model in this embodiment includes:
an algorithm obtaining module 201, configured to obtain a sampling algorithm set, where the sampling algorithm set includes a plurality of sampling algorithms;
the sampling module 202 is configured to perform sample sampling through each sampling algorithm in the sampling algorithm set respectively to obtain a target training sample set corresponding to each sampling algorithm;
a training module 203, configured to train the target model through each target training sample set, respectively, to obtain a plurality of trained models;
and the matching module 204 is configured to determine, in the sampling algorithm set, a sampling algorithm matched with the target model according to the evaluation index value of each trained model.
In this embodiment, the sampling module 202 is further configured to: acquire the total training sample set of the target model; allocate the samples in the total training sample set by sampling with replacement to obtain sub-training sample sets in one-to-one correspondence with the sampling algorithms; and, with each sampling algorithm, sample from its corresponding sub-training sample set to obtain the corresponding target training sample set.
In this embodiment, the sampling module 202 is further configured to: respectively sampling samples from the corresponding sub-training sample sets through each sampling algorithm to obtain first sample sets corresponding to the sampling algorithms one by one; respectively performing data fitting on each first sample set to obtain second sample sets corresponding to each sampling algorithm one by one; and for each sampling algorithm, combining the corresponding first sample set and the second sample set to obtain a corresponding target training sample set.
In this embodiment, the matching module 204 is further configured to: acquire the total test sample set; split the total test sample set into a plurality of sub-test sample sets in one-to-one correspondence with the sampling algorithms; redistribute each sub-test sample set by sampling without replacement to obtain a plurality of target test sample sets for each sub-test sample set; for each trained model, obtain the model identification label of every test sample in each target test sample set through the corresponding model; and obtain the evaluation index value of each trained model from the model identification labels and the actual labels of the test samples in the target test sample sets.
In this embodiment, the matching module 204 is further configured to: obtain the precision and recall of each trained model from the model identification labels and the actual labels of the test samples in each target test sample set; and obtain the evaluation index value of each trained model from the precision and recall.
And, the matching module 204 is further configured to: comparing the evaluation index values of all the trained models; and selecting the sampling algorithm corresponding to the trained model with the maximum evaluation index value from the sampling algorithm set as the sampling algorithm matched with the target model.
The specific functions performed by the modules of the apparatus are described in the steps S110 to S140 of the method, and are not described herein again.
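The division of work among modules 201 to 204 might be organized as in the sketch below. It is only an illustration of the structure: the class name, the callables passed in, the threshold "model", and the use of plain accuracy as a stand-in evaluation index are all assumptions, chosen to keep the example short and runnable.

```python
class SamplingAlgorithmMatcher:
    """Hypothetical wiring of the four modules described above."""

    def __init__(self, samplers, train, evaluate):
        self.samplers = samplers    # algorithm acquisition module 201: name -> sampler
        self.train = train          # builds a model from a target training sample set
        self.evaluate = evaluate    # returns an evaluation index value for a model

    def sample(self, train_data):   # sampling module 202
        return {name: fn(train_data) for name, fn in self.samplers.items()}

    def train_all(self, target_sets):  # training module 203
        return {name: self.train(data) for name, data in target_sets.items()}

    def match(self, models, test_data):  # matching module 204
        scores = {name: self.evaluate(m, test_data) for name, m in models.items()}
        return max(scores, key=scores.get), scores

# Toy stand-ins: the "model" is a threshold equal to the mean feature value.
samplers = {
    "none": lambda d: list(d),
    "ros": lambda d: list(d) + [s for s in d if s[1] == 1] * 2,  # repeat positives
}
train = lambda data: sum(x for x, _ in data) / len(data)
evaluate = lambda thr, test: sum((x > thr) == (y == 1) for x, y in test) / len(test)

matcher = SamplingAlgorithmMatcher(samplers, train, evaluate)
train_data = [(0.1, 0), (0.2, 0), (0.3, 0), (0.9, 1)]
test_data = [(0.4, 0), (0.5, 0), (0.8, 1), (0.95, 1)]
models = matcher.train_all(matcher.sample(train_data))
best, scores = matcher.match(models, test_data)
```

In this toy setup the oversampled data pull the threshold upward, so the "ros" model classifies all four test samples correctly and is the one selected.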
The sampling algorithm matching device of the model in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The sampling algorithm matching device of the model in the embodiment of the present application may be a device with an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, and the embodiments of the present application are not specifically limited in this respect.
The sampling algorithm matching device for a model provided in the embodiment of the present application can implement each process implemented by the sampling algorithm matching device for a model in the method embodiment of fig. 1, and is not described herein again to avoid repetition.
In the embodiments of the present application, the total training sample set is divided into the same number of training sample sets as there are sampling algorithms to be matched, and each sampling algorithm is used to sample its training sample set. Combining the directly sampled data with data fitted to it brings the sampled data closer to the feature distribution of the original data and improves the balance of the sample data. Each trained model is then tested on a target test sample set to obtain an evaluation index value, which reflects the performance of the model trained on the sample data produced by the corresponding sampling algorithm. The best-performing sampling algorithm can thereby be matched to the model, so that the model performs better.
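The combination of direct sampling and fitted data summarized above could look roughly like the sketch below. Everything specific in it is an assumption made for illustration: the helper names, the use of random undersampling as the example sampling algorithm, the one-dimensional features, and in particular the per-class Gaussian used for "data fitting", which this passage does not specify.

```python
import random
import statistics

def bootstrap_subsets(total_train, algorithm_names, size, seed=0):
    """One sub-training set per sampling algorithm, drawn with replacement."""
    rng = random.Random(seed)
    # rng.choices draws with replacement, so subsets may share samples.
    return {name: rng.choices(total_train, k=size) for name in algorithm_names}

def random_undersample(samples, rng):
    """Example sampling algorithm (hypothetical): shrink every class to the
    size of the smallest class."""
    by_label = {}
    for x, y in samples:
        by_label.setdefault(y, []).append((x, y))
    n = min(len(group) for group in by_label.values())
    return [s for group in by_label.values() for s in rng.sample(group, n)]

def build_target_set(sub_train, sampler, n_fitted, seed=0):
    """First sample set (direct sampling) plus second sample set (fitted data)."""
    rng = random.Random(seed)
    first = sampler(sub_train, rng)
    second = []
    for label in sorted({y for _, y in first}):
        xs = [x for x, y in first if y == label]
        # Assumed fitting step: a Gaussian per class over the 1-D feature.
        mu, sigma = statistics.fmean(xs), statistics.pstdev(xs)
        second += [(rng.gauss(mu, sigma), label) for _ in range(n_fitted)]
    return first + second  # merged target training sample set

# Toy imbalanced data: 20 samples of class 0, 5 samples of class 1.
total_train = [(float(i), 0) for i in range(20)] + [(100.0 + i, 1) for i in range(5)]
subsets = bootstrap_subsets(total_train, ["random_under"], size=25)
target = build_target_set(subsets["random_under"], random_undersample, n_fitted=3)
```

In this sketch every class contributes the same number of directly sampled and fitted points, so the merged target training sample set is balanced by construction.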
Optionally, an electronic device is further provided in this embodiment of the present application, and includes a processor 1010, a memory 1009, and a program or an instruction stored in the memory 1009 and capable of running on the processor 1010, where the program or the instruction is executed by the processor 1010 to implement each process of the above embodiment of the sampling algorithm matching method for a model, and can achieve the same technical effect, and details are not described here to avoid repetition.
It should be noted that the electronic devices in the embodiments of the present application include the mobile electronic devices and the non-mobile electronic devices described above.
Fig. 3 is a schematic diagram of the hardware structure of an electronic device for implementing an embodiment of the present application.
The electronic device 1000 includes, but is not limited to: a radio frequency unit 1001, a network module 1002, an audio output unit 1003, an input unit 1004, a sensor 1005, a display unit 1006, a user input unit 1007, an interface unit 1008, a memory 1009, and a processor 1010.
Those skilled in the art will appreciate that the electronic device 1000 may further comprise a power source (e.g., a battery) for supplying power to the various components, and the power source may be logically connected to the processor 1010 through a power management system, so that charging, discharging, and power consumption are managed through the power management system. The electronic device structure shown in Fig. 3 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than those shown, combine some components, or arrange the components differently, and details are not repeated here.
The processor 1010 is configured to obtain a sampling algorithm set; respectively sampling samples through each sampling algorithm in the sampling algorithm set to obtain a corresponding target training sample set; training model parameters of the target model through each target training sample set respectively to obtain a plurality of trained models; and determining a sampling algorithm matched with the target model in the sampling algorithm set according to the evaluation index value of each trained model.
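The four steps performed by the processor can be sketched end to end as below. This is a toy under stated assumptions, not the application's implementation: the two samplers, the threshold "model", and the use of plain accuracy as the evaluation index are hypothetical stand-ins chosen to keep the example short.

```python
import random

def match_pipeline(samplers, train_data, test_data, train_fn, score_fn, seed=0):
    """Sample, train, evaluate, and match, once per sampling algorithm."""
    rng = random.Random(seed)
    scores = {}
    for name, sampler in samplers.items():
        target_train = sampler(train_data, rng)    # target training sample set
        model = train_fn(target_train)             # one trained model per algorithm
        scores[name] = score_fn(model, test_data)  # evaluation index value
    best = max(scores, key=scores.get)             # matched sampling algorithm
    return best, scores

def keep_all(samples, rng):
    """Hypothetical baseline sampler: no resampling."""
    return list(samples)

def oversample_minority(samples, rng):
    """Hypothetical sampler: repeat minority samples up to the majority size."""
    groups = {}
    for x, y in samples:
        groups.setdefault(y, []).append((x, y))
    n = max(len(g) for g in groups.values())
    out = []
    for g in groups.values():
        out += g + rng.choices(g, k=n - len(g))
    return out

def train_threshold(samples):
    """Toy model: midpoint between the two class means of a 1-D feature."""
    means = []
    for label in (0, 1):
        xs = [x for x, y in samples if y == label]
        means.append(sum(xs) / len(xs))
    return (means[0] + means[1]) / 2

def accuracy(threshold, samples):
    """Stand-in evaluation index; a precision/recall-based index also fits."""
    return sum((x > threshold) == bool(y) for x, y in samples) / len(samples)

train = [(i, 0) for i in range(10)] + [(20, 1), (22, 1)]
test = [(3, 0), (8, 0), (19, 1), (25, 1)]
samplers = {"keep_all": keep_all, "oversample_minority": oversample_minority}
best, scores = match_pipeline(samplers, train, test, train_threshold, accuracy)
```

Passing the sampler, training, and scoring functions as arguments keeps the matching loop independent of any particular target model or evaluation index.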
It should be understood that, in the embodiment of the present application, the input unit 1004 may include a Graphics Processing Unit (GPU) 10041 and a microphone 10042, and the graphics processing unit 10041 processes image data of still pictures or videos obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1006 may include a display panel 10061, and the display panel 10061 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1007 includes a touch panel 10071 and other input devices 10072. The touch panel 10071 is also referred to as a touch screen and may include two parts, a touch detection device and a touch controller. The other input devices 10072 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 1009 may be used to store software programs as well as various data, including but not limited to application programs and operating systems. The processor 1010 may integrate an application processor, which mainly handles the operating system, user interfaces, and applications, and a modem processor, which mainly handles wireless communications. It will be appreciated that the modem processor may also not be integrated into the processor 1010.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the embodiment of the sampling algorithm matching method for a model, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to run a program or an instruction to implement each process of the embodiment of the sampling algorithm matching method for a model, and the same technical effect can be achieved, and in order to avoid repetition, the details are not repeated here.
It should be understood that the chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present application.
While the present embodiments have been described with reference to the accompanying drawings, it is to be understood that the invention is not limited to the precise embodiments described above, which are meant to be illustrative and not restrictive, and that various changes may be made therein by those skilled in the art without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (10)
1. A method for matching a sampling algorithm of a model, the method comprising:
acquiring a sampling algorithm set, wherein the sampling algorithm set comprises a plurality of sampling algorithms;
respectively sampling samples through each sampling algorithm in the sampling algorithm set to obtain a target training sample set corresponding to each sampling algorithm;
respectively training the target models through each target training sample set to obtain a plurality of trained models;
and determining a sampling algorithm matched with the target model in the sampling algorithm set according to the evaluation index value of each trained model.
2. The method of claim 1, wherein the separately sampling samples by each sampling algorithm in the set of sampling algorithms to obtain a corresponding set of target training samples comprises:
acquiring a total training sample set;
distributing the samples in the total training sample set by sampling with replacement to obtain sub-training sample sets in one-to-one correspondence with the sampling algorithms;
and respectively sampling samples from the corresponding sub-training sample sets through each sampling algorithm to obtain corresponding target training sample sets.
3. The method of claim 2, wherein the separately sampling samples from the corresponding sub-training sample set by each sampling algorithm to obtain a corresponding target training sample set comprises:
respectively sampling samples from the corresponding sub-training sample sets through each sampling algorithm to obtain first sample sets corresponding to the sampling algorithms one by one;
respectively performing data fitting on each first sample set to obtain second sample sets corresponding to each sampling algorithm one by one;
and for each sampling algorithm, combining the corresponding first sample set and the second sample set to obtain a corresponding target training sample set.
4. The method of claim 1, wherein before the determining, in the sampling algorithm set, a sampling algorithm matched with the target model according to the evaluation index value of each trained model, the method further comprises:
acquiring a total test sample set;
splitting the total test sample set into a plurality of sub-test sample sets which correspond to each sampling algorithm one by one;
redistributing each sub-test sample set by sampling without replacement to obtain a plurality of target test sample sets corresponding to each sub-test sample set;
for each trained model, obtaining, through the corresponding model, a model-identified label for each test sample in each target test sample set;
and obtaining an evaluation index value of each trained model according to the model-identified labels and the actual labels of the test samples in the target test sample sets.
5. The method of claim 4, wherein the obtaining an evaluation index value of each trained model according to the model-identified labels and the actual labels of the test samples in the target test sample sets comprises:
obtaining the precision and the recall of the trained model according to the model-identified label and the actual label of each test sample in each target test sample set;
and obtaining an evaluation index value of each trained model according to the precision and the recall.
6. The method of claim 1, wherein determining a sampling algorithm matching the target model in the set of sampling algorithms according to the evaluation index value of each of the trained models comprises:
comparing the evaluation index values of all the trained models;
and selecting the sampling algorithm corresponding to the trained model with the maximum evaluation index value from the sampling algorithm set as the sampling algorithm matched with the target model.
7. An apparatus for matching a sampling algorithm of a model, comprising:
the algorithm acquisition module is used for acquiring a sampling algorithm set, and the sampling algorithm set comprises a plurality of sampling algorithms;
the sampling module is used for sampling samples through each sampling algorithm in the sampling algorithm set respectively to obtain a target training sample set corresponding to each sampling algorithm;
the training module is used for training the target model through each target training sample set respectively to obtain a plurality of trained models;
and the matching module is used for determining a sampling algorithm matched with the target model in the sampling algorithm set according to the evaluation index value of each trained model.
8. The apparatus of claim 7, wherein the sampling module is further configured to:
acquiring a total training sample set;
distributing the samples in the total training sample set by sampling with replacement to obtain sub-training sample sets in one-to-one correspondence with the sampling algorithms;
and respectively sampling samples from the corresponding sub-training sample sets through each sampling algorithm to obtain corresponding target training sample sets.
9. An electronic device comprising a processor, a memory, and a program or instructions stored on the memory and executable on the processor, the program or instructions, when executed by the processor, implementing the steps of the sampling algorithm matching method of a model according to any one of claims 1-6.
10. A readable storage medium, on which a program or instructions are stored, which program or instructions, when executed by a processor, carry out the steps of the sampling algorithm matching method of a model according to any one of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110159651.5A CN112819079A (en) | 2021-02-04 | 2021-02-04 | Model sampling algorithm matching method and device and electronic equipment |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110159651.5A CN112819079A (en) | 2021-02-04 | 2021-02-04 | Model sampling algorithm matching method and device and electronic equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112819079A true CN112819079A (en) | 2021-05-18 |
Family
ID=75861657
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110159651.5A Pending CN112819079A (en) | 2021-02-04 | 2021-02-04 | Model sampling algorithm matching method and device and electronic equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112819079A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113537510A (en) * | 2021-07-13 | 2021-10-22 | 中国工商银行股份有限公司 | Machine learning model data processing method and device based on unbalanced data set |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111860573B (en) | Model training method, image category detection method and device and electronic equipment | |
CN108197652B (en) | Method and apparatus for generating information | |
CN110766080B (en) | Method, device and equipment for determining labeled sample and storage medium | |
CN109460793A (en) | A kind of method of node-classification, the method and device of model training | |
CN113067653B (en) | Spectrum sensing method and device, electronic equipment and medium | |
CN107729081B (en) | Application management method and device, storage medium and electronic equipment | |
CN116108393B (en) | Power sensitive data classification and classification method and device, storage medium and electronic equipment | |
CN112818888B (en) | Video auditing model training method, video auditing method and related devices | |
CN113140012B (en) | Image processing method, device, medium and electronic equipment | |
KR20230165085A (en) | Method and system for quantitatively evaluating alignment between multimodal feature vectors | |
CN112766402A (en) | Algorithm selection method and device and electronic equipment | |
CN112241761A (en) | Model training method and device and electronic equipment | |
CN110276404B (en) | Model training method, device and storage medium | |
CN111062440A (en) | Sample selection method, device, equipment and storage medium | |
CN111222558A (en) | Image processing method and storage medium | |
CN112819079A (en) | Model sampling algorithm matching method and device and electronic equipment | |
CN115994839A (en) | Prediction method, device, equipment and medium for answer accuracy | |
CN116959059A (en) | Living body detection method, living body detection device and storage medium | |
WO2024031332A1 (en) | Stock trend analysis method and apparatus based on machine learning | |
CN112966272B (en) | Internet of things Android malicious software detection method based on countermeasure network | |
CN112182382B (en) | Data processing method, electronic device, and medium | |
CN116467153A (en) | Data processing method, device, computer equipment and storage medium | |
CN111835541B (en) | Method, device, equipment and system for detecting aging of flow identification model | |
CN114519520B (en) | Model evaluation method, device and storage medium | |
CN110728615B (en) | Steganalysis method based on sequential hypothesis testing, terminal device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||