CN116304932B - Sample generation method, device, terminal equipment and medium - Google Patents
- Publication number: CN116304932B
- Application number: CN202310566164.XA
- Authority
- CN
- China
- Prior art keywords
- sample set
- evolution
- population
- disease
- feature
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/20—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for computer-aided diagnosis, e.g. based on medical expert systems
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02A—TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
- Y02A90/00—Technologies having an indirect contribution to adaptation to climate change
- Y02A90/10—Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation
Abstract
The application belongs to the technical field of data processing and provides a sample generation method, a device, terminal equipment and a medium. A diagnosis sample set is divided to obtain a disease sample set and a normal sample set; covariance and entropy are added into a feature matrix of the disease sample set, and a gradient lifting tree model is constructed according to the feature matrix; according to the gradient lifting tree model, combined with the entropy of the disease sample set, the split contribution degree and entropy difference of each feature are calculated respectively to obtain the feature weight of each feature; an initial population corresponding to the disease sample set is constructed; a first evolution probability corresponding to the initial population is obtained according to the covariance, the entropy and the feature weights; the initial population is evolved according to the first evolution probability to obtain an intermediate population, and a second evolution probability corresponding to the intermediate population is calculated; and the intermediate population meeting the evolution termination condition is taken as a new disease sample set based on the first evolution probability and the second evolution probability. The application can improve the quality of the generated samples.
Description
Technical Field
The present application belongs to the technical field of data processing, and in particular, relates to a sample generation method, a device, a terminal device, and a medium.
Background
An unbalanced data set is one in which the number of samples varies greatly between categories. Taking binary classification as an example, if the number of negative-class samples far exceeds the number of positive-class samples, the classification result is biased towards the negative class and the misclassification rate of the positive class is high. In practice, once the imbalance ratio of a data set exceeds 4:1, the classifier will be biased towards the majority class; in super-unbalanced data sets, the proportion of the positive class is typically less than one percent, and such problems are particularly pronounced in the medical field.
In the medical field, the proportion of disease samples (positive samples) is often far lower than that of normal samples (negative samples), which greatly impairs subsequent identification of the disease and thus endangers the health of patients.
To address such problems, those skilled in the art have employed the Synthetic Minority Oversampling Technique (SMOTE) to increase the number of minority-class (positive) samples and balance the data set, thereby avoiding the above-described problems. However, the quality of the samples generated by current sample generation methods is low, which introduces uncertainty and randomness into subsequent classification results and degrades the effect of the classifier.
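As background, the core SMOTE idea — interpolating between a minority sample and one of its nearest minority neighbors — can be sketched as follows (a minimal illustration; the function name and parameters are chosen here for exposition and are not from the patent):

```python
import numpy as np

def smote_sketch(minority, n_new, k=5, seed=0):
    """Generate n_new synthetic samples by interpolating each chosen
    minority sample with one of its k nearest minority neighbors."""
    rng = np.random.default_rng(seed)
    minority = np.asarray(minority, dtype=float)
    m = len(minority)
    synthetic = []
    for _ in range(n_new):
        i = rng.integers(m)
        # Euclidean distances from sample i to all minority samples
        d = np.linalg.norm(minority - minority[i], axis=1)
        neighbors = np.argsort(d)[1:k + 1]   # skip the sample itself
        j = rng.choice(neighbors)
        gap = rng.random()                   # interpolation factor in [0, 1)
        synthetic.append(minority[i] + gap * (minority[j] - minority[i]))
    return np.array(synthetic)
```

Because each synthetic point is a convex combination of two existing minority samples, it always lies on the line segment between them — which is precisely the source of the randomness the application criticizes.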
Disclosure of Invention
The embodiment of the application provides a sample generation method, a sample generation device, terminal equipment and a sample generation medium, which can solve the problem of low quality of samples generated by the current sample generation method.
In a first aspect, an embodiment of the present application provides a sample generating method, including:
step 1, dividing an unbalanced diagnosis sample set to obtain a disease sample set and a normal sample set;
step 2, respectively calculating covariance and entropy of the disease sample set, adding the covariance and entropy into a feature matrix of the disease sample set, and constructing a gradient lifting tree model according to the feature matrix; the gradient lifting tree model comprises a plurality of decision trees, and leaf nodes of the decision trees are in one-to-one correspondence with features in the feature matrix;
step 3, according to the gradient lifting tree model, combining entropy of the disease sample set, calculating split contribution degree and entropy difference of each feature in the plurality of features respectively, and obtaining feature weights of the features; the split contribution degree and the entropy difference are used for representing the importance of the feature;
step 4, constructing an initial population corresponding to the disease sample set; wherein, population individuals of the initial population are in one-to-one correspondence with disease samples in the disease sample set;
Step 5, obtaining a first evolution probability corresponding to the initial population according to the covariance, the entropy and the characteristic weight; the first evolution probability is used for representing the importance of population individuals in the initial population;
step 6, evolving the initial population according to the first evolution probability to obtain an intermediate population, and calculating a second evolution probability corresponding to the intermediate population;
step 7, judging whether the middle population meets a preset evolution termination condition according to the first evolution probability and the second evolution probability;
step 8, if the intermediate population meets the preset evolution termination condition, taking the intermediate population as a new disease sample set; otherwise, taking the intermediate population as the initial population in step 6 and returning to execute step 6.
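Taken together, steps 1-8 form a loop that can be outlined in pseudocode (every helper name below is a placeholder for the operation the corresponding step describes, not an implementation disclosed by the patent):

```python
def generate_samples(diagnosis_set, max_generations, loss_threshold):
    """High-level outline of steps 1-8; each helper stands for one step."""
    disease_set, normal_set = split_by_label(diagnosis_set)        # step 1
    features = augment_with_cov_and_entropy(disease_set)           # step 2
    model = build_gradient_lifting_trees(features)
    weights = feature_weights(model, entropy(disease_set))         # step 3
    population = init_population(disease_set)                      # step 4
    p_first = evolution_probability(population, weights)           # step 5
    for generation in range(max_generations):
        population = evolve(population, p_first)                   # step 6
        p_second = evolution_probability(population, weights)
        if terminated(p_first, p_second, generation,               # step 7
                      max_generations, loss_threshold):
            break                                                  # step 8
    return population    # the new disease sample set
```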
Optionally, step 3 includes:
obtaining the split contribution $C_{ij}$ of each feature on each decision tree by the split-contribution calculation formula; wherein $C_{ij}$ denotes the split contribution of the $i$-th feature on the $j$-th decision tree, $i=1,2,\dots,n$, $n$ represents the total number of features in the disease sample set, $T_j$ denotes the leaf nodes on the $j$-th decision tree, $x_{k,i}$ denotes the $i$-th feature of the $k$-th disease sample, $k=1,2,\dots,m$, $m$ represents the total number of disease samples in the disease sample set, $s_{ij}$ denotes the optimal split point of the $i$-th feature on the $j$-th decision tree, $x_k$ denotes the $k$-th disease sample, and $x_{k'}$, $k'\ne k$, denotes the other disease samples;
obtaining the entropy difference by the calculation formula $\Delta H_i = H(D) - H(D_{\setminus i})$, with $H(D) = -\sum_{k=1}^{m} p_k \log p_k$; wherein $\Delta H_i$ denotes the entropy difference of the $i$-th feature, $H(D)$ represents the entropy of the disease sample set $D$, $H(D_{\setminus i})$ represents the entropy after the $i$-th feature is removed from the disease sample set, and $p_k$ denotes the probability of the $k$-th disease sample;
for each feature, the following steps are performed:
obtaining the variance contribution degree of the feature according to the split contribution degree and the covariance; the variance contribution is used to characterize the importance of the feature;
and obtaining the feature weight of the feature according to the entropy difference and the variance contribution degree.
Optionally, obtaining the variance contribution of the feature according to the split contribution and the covariance includes:
obtaining the variance contribution $V_i$ by the variance-contribution calculation formula; wherein $V_i$ denotes the variance contribution of the $i$-th feature and $\Sigma$ represents the covariance.
Optionally, obtaining the feature weight of the feature according to the entropy difference and the variance contribution includes:
obtaining the feature weight by the calculation formula $w_i = \alpha\,\Delta H_i + (1-\alpha)\,V_i$; wherein $w_i$ denotes the feature weight of the $i$-th feature, and $\alpha$ represents a hyper-parameter used to control the weight of said entropy difference and the weight of said variance contribution, $0 < \alpha < 1$.
Optionally, step 5 includes:
obtaining the first evolution probability $P_1$ by the evolution-probability calculation formula; wherein $Z$ represents the normalization coefficient, $W=\{w_1,w_2,\dots,w_n\}$ represents the set of feature weights, $f$ represents the probability density, and $\beta$ represents an exponential term used to adjust the shape of the exponential function so that the trend of the probability density function across different dimensions is smoother, $\beta>0$.
Optionally, step 6 includes:
obtaining new population individuals $x'_k$ by the individual-update calculation formula;
obtaining the fitness $F(x'_k)$ of each new population individual by the fitness calculation formula;
sorting all new population individuals in descending order of fitness, and adding the top $q$ new individuals to the disease sample set $D$ to obtain a new disease sample set $D'$;
obtaining an intermediate population from the new disease sample set $D'$ according to a genetic algorithm;
and calculating the second evolution probability $P_2$ corresponding to the intermediate population.
Optionally, step 7 includes:
obtaining the evolution loss $L_t$ of the intermediate population by the evolution-loss calculation formula; wherein $L_t$ represents the evolution loss of the $t$-th evolution of the intermediate population and is used to characterize the superiority of the intermediate population, $t=1,2,\dots,T$, and $T$ represents a preset maximum number of population evolutions;
counting the execution times of the step 6, and taking the times as evolution times of the intermediate population;
If the evolution loss is smaller than a preset threshold value and the evolution frequency is larger than or equal to the maximum population evolution frequency, determining that the intermediate population meets a preset evolution termination condition; otherwise, determining that the intermediate population does not meet the preset evolution termination condition.
In a second aspect, an embodiment of the present application provides a sample generating device, including:
the sample dividing module is used for dividing an unbalanced diagnosis sample set to obtain a disease sample set and a normal sample set;
the gradient lifting tree module is used for respectively calculating covariance and entropy of the disease sample set, adding the covariance and entropy into a feature matrix of the disease sample set, and constructing a gradient lifting tree model according to the feature matrix; the gradient lifting tree model comprises a plurality of decision trees, and leaf nodes of the decision trees are in one-to-one correspondence with features in the feature matrix;
the feature weight module is used for respectively calculating the splitting contribution degree and entropy difference of each feature in the plurality of features according to the gradient lifting tree model and the entropy of the disease sample set to obtain feature weights of the features; the split contribution degree and the entropy difference are used for representing the importance of the feature;
the genetic algorithm module is used for constructing an initial population corresponding to the disease sample set; wherein, population individuals of the initial population are in one-to-one correspondence with disease samples in the disease sample set;
The first evolution probability module is used for obtaining a first evolution probability corresponding to the initial population according to the covariance, the entropy and the feature weight; the first evolution probability is used for representing the importance of population individuals in the initial population;
the second evolution probability module is used for evolving the initial population according to the first evolution probability to obtain an intermediate population, and calculating a second evolution probability corresponding to the intermediate population;
the judging module is used for judging whether the middle population meets the preset evolution termination condition according to the first evolution probability and the second evolution probability;
the sample generation module is used for taking the intermediate population as a new disease sample set if the intermediate population meets the preset evolution termination condition; otherwise, taking the intermediate population as the initial population in the second evolution probability module and returning to execute the second evolution probability module.
In a third aspect, an embodiment of the present application provides a terminal device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the sample generation method described above when executing the computer program.
In a fourth aspect, an embodiment of the present application provides a computer readable storage medium storing a computer program, which when executed by a processor, implements the above-described sample generation method.
The scheme of the application has the following beneficial effects:
in some embodiments of the application, the covariance and entropy of the disease sample set are calculated and added into its feature matrix, a gradient lifting tree model is constructed according to the feature matrix, and feature weights are then obtained according to the model; this expands the features of the disease sample set and gives larger weights to important features, improving the diversity and robustness of the samples and thus their quality. The evolution probability of the population is obtained according to the feature weights, which prevents excellent individuals in the population from being evolved away, so that excellent samples are retained and sample quality is improved.
Other advantageous effects of the present application will be described in detail in the detailed description section which follows.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the embodiments or the description of the prior art will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings can be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flowchart of a sample generation method according to an embodiment of the present application;
FIG. 2 is a schematic diagram of a sample generating device according to an embodiment of the present application;
fig. 3 is a schematic structural diagram of a terminal device according to an embodiment of the present application.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth such as the particular system architecture, techniques, etc., in order to provide a thorough understanding of the embodiments of the present application. It will be apparent, however, to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It should be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
It should also be understood that the term "and/or" as used in the present specification and the appended claims refers to any and all possible combinations of one or more of the associated listed items, and includes such combinations.
As used in the present description and the appended claims, the term "if" may be interpreted as "when" or "upon" or "in response to determining" or "in response to detecting", depending on the context. Similarly, the phrase "if it is determined" or "if [a described condition or event] is detected" may be interpreted, depending on the context, as "upon determining" or "in response to determining" or "upon detecting [the described condition or event]" or "in response to detecting [the described condition or event]".
Furthermore, the terms "first," "second," "third," and the like in the description of the present specification and in the appended claims, are used for distinguishing between descriptions and not necessarily for indicating or implying a relative importance.
Reference in the specification to "one embodiment" or "some embodiments" or the like means that a particular feature, structure, or characteristic described in connection with the embodiment is included in one or more embodiments of the application. Thus, appearances of the phrases "in one embodiment," "in some embodiments," "in other embodiments," and the like in the specification are not necessarily all referring to the same embodiment, but mean "one or more but not all embodiments" unless expressly specified otherwise. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
Aiming at the problem of low sample quality in current sample generation methods, the application provides a sample generation method, a device, a terminal device and a medium. By calculating the covariance and entropy of the disease sample set, adding them into its feature matrix, constructing a gradient lifting tree model according to the feature matrix, and obtaining feature weights according to the model, the features of the disease sample set are expanded and important features are given larger weights, which ensures that excellent samples are subsequently generated and helps improve the quality of the generated samples; the evolution probability of the population is obtained according to the feature weights, which prevents excellent individuals in the population from being evolved away, retains excellent samples, and improves sample quality.
As shown in fig. 1, the sample generating method provided by the application comprises the following steps:
step 1, dividing an unbalanced diagnosis sample set to obtain a disease sample set and a normal sample set.
In some embodiments of the present application, the diagnosis sample set is further preprocessed before performing step 1 to ensure the validity of the diagnosis sample set. Illustratively, the preprocessing includes data denoising, missing-value filling, and the like.
The above-mentioned division of the unbalanced diagnosis sample set is based on the class labels of the diagnosis samples in the set. Illustratively, let the diagnosis sample set be $S=\{s_1,s_2,\dots,s_N\}$, where $N$ represents the total number of diagnosis samples. The diagnosis samples labeled as diseased are divided into a disease sample set $D=\{x_1,x_2,\dots,x_m\}$, where $x_k$ represents the $k$-th disease sample and $m$ represents the total number of disease samples. Based on the same division principle, a normal sample set $H=\{h_1,h_2,\dots,h_l\}$ can be obtained accordingly, where $h_r$ represents the $r$-th normal sample and $l$ represents the total number of normal samples.
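Under the notation above, the label-based split of step 1 can be sketched as follows (the array and function names are illustrative, not from the patent):

```python
import numpy as np

def split_by_label(samples, labels, disease_label=1):
    """Divide an unbalanced diagnosis set into a disease sample set and
    a normal sample set according to each sample's class label."""
    samples = np.asarray(samples)
    labels = np.asarray(labels)
    disease_set = samples[labels == disease_label]
    normal_set = samples[labels != disease_label]
    return disease_set, normal_set
```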
And 2, respectively calculating covariance and entropy of the disease sample set, adding the covariance and entropy into a feature matrix of the disease sample set, and constructing a gradient lifting tree model according to the feature matrix.
The gradient lifting tree model comprises a plurality of decision trees, and leaf nodes of the decision trees are in one-to-one correspondence with features in the feature matrix.
It should be noted that the feature matrix of the disease sample set includes a plurality of features, each feature representing one dimension of the disease sample set. Illustratively, in one embodiment of the present application, any disease sample $x_k$ corresponds to the features $x_k=(x_{k,1},x_{k,2},\dots,x_{k,n})$, where $n$ represents the total number of features of the disease sample.
The gradient lifting tree model is common knowledge to those skilled in the art; any common construction method can be adopted to build it, and the construction manner is not limited herein.
It is worth mentioning that the covariance and entropy are added into the feature matrix of the disease sample set, so that the features of the disease sample set are expanded, and the robustness of the disease sample is enhanced.
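As a sketch of the feature-matrix extension in step 2, the snippet below appends one covariance-derived column and one entropy-derived column to the disease feature matrix. The particular derivations shown (sample-wise covariance against the mean profile, entropy over normalized feature magnitudes) are one plausible reading for illustration only; the patent does not preserve its exact definitions:

```python
import numpy as np

def augment_features(disease, eps=1e-12):
    """Append covariance- and entropy-derived columns to the disease
    feature matrix (an assumed reading of step 2's augmentation)."""
    disease = np.asarray(disease, dtype=float)
    # covariance-like column: co-deviation of each sample from the
    # per-feature mean profile and from its own row mean
    mean_profile = disease.mean(axis=0)
    cov_col = ((disease - mean_profile) *
               (disease - disease.mean(axis=1, keepdims=True))).mean(axis=1)
    # entropy column: Shannon entropy of each sample's normalized
    # absolute feature magnitudes
    p = np.abs(disease) + eps
    p = p / p.sum(axis=1, keepdims=True)
    ent_col = -(p * np.log(p)).sum(axis=1)
    return np.hstack([disease, cov_col[:, None], ent_col[:, None]])
```

The gradient lifting tree model is then fit on this widened matrix, so the two extra dimensions participate in splitting like any other feature.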
And 3, respectively calculating the split contribution degree and entropy difference of each feature in the plurality of features according to the gradient lifting tree model and by combining the entropy of the disease sample set, so as to obtain the feature weight of the feature.
Wherein both the split contribution and entropy difference are used to characterize the importance of the feature.
And 4, constructing an initial population corresponding to the disease sample set.
Wherein, the population individuals of the initial population are in one-to-one correspondence with the disease samples in the disease sample set.
In some embodiments of the application, genetic algorithms may be used to construct an initial population for a disease sample set.
And step 5, obtaining a first evolution probability corresponding to the initial population according to the covariance, the entropy and the characteristic weight.
The first evolution probability is used for representing the importance of population individuals in the initial population.
Specifically, the first evolution probability $P_1$ is obtained by the evolution-probability calculation formula; wherein $Z$ represents the normalization coefficient, $W=\{w_1,w_2,\dots,w_n\}$ represents the set of feature weights, $w_i$ represents the $i$-th feature weight of the disease sample set, $f$ represents the probability density, and $\beta$ represents an exponential term used to adjust the shape of the exponential function so that the trend of the probability density function across different dimensions is smoother, $\beta>0$.
And 6, evolving the initial population according to the first evolution probability to obtain an intermediate population, and calculating a second evolution probability corresponding to the intermediate population.
The second evolution probability corresponding to the intermediate population is calculated to facilitate the subsequent calculation of the similarity between the intermediate population and the initial population, so as to avoid the newly generated samples differing too much from the original samples, which would affect the effect of the classifier.
The second evolution probability is used for representing the importance of population individuals in the middle population.
And 7, judging whether the middle population meets a preset evolution termination condition according to the first evolution probability and the second evolution probability.
Step 8, if the intermediate population meets the preset evolution termination condition, taking the intermediate population as a new disease sample set; otherwise, the middle population is used as the initial population in the step 6, and the step 6 is executed again.
In some embodiments of the present application, after obtaining the intermediate population that meets the evolution termination condition, a corresponding new disease sample set is generated according to the number of population individuals in the intermediate population and the disease sample type corresponding to each population individual; the new disease sample set and the normal sample set then form a balanced sample set, with which the classifier is trained to achieve accurate identification and classification of the disease.
In embodiments of the present application, the number of new disease samples to be generated, $q$, may be obtained by the calculation formula $q=\lambda\,(l-m)$; wherein $l$ represents the total number of normal samples in the normal sample set obtained in step 1, $m$ represents the total number of disease samples in the disease sample set obtained in step 1, and $\lambda$ represents a manually set control parameter, $0<\lambda\le 1$.
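Under the reconstructed balancing form $q=\lambda\,(l-m)$ (an assumption, since the original formula image is not preserved here), a quick numeric check:

```python
def new_sample_count(l, m, lam=1.0):
    """q = lam * (l - m): number of new disease samples needed to move
    the disease class toward the normal-class count; lam in (0, 1]."""
    assert 0 < lam <= 1 and l >= m
    return int(lam * (l - m))

# e.g. 950 normal vs 50 disease samples with lam = 1 -> 900 new samples
```

With $\lambda=1$ the resulting set is fully balanced; smaller $\lambda$ values only partially close the gap.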
The following describes an exemplary procedure of step 3 (calculating the split contribution degree and entropy difference of each of the plurality of features according to the gradient lifting tree model, in combination with the entropy of the disease sample set, to obtain the feature weight of the feature).
Step 3.1, obtain the split contribution $C_{ij}$ by the split-contribution calculation formula.
wherein $C_{ij}$ denotes the split contribution of the $i$-th feature on the $j$-th decision tree, $i=1,2,\dots,n$, $n$ represents the total number of features in the disease sample set, $T_j$ denotes the leaf nodes on the $j$-th decision tree, $x_{k,i}$ denotes the $i$-th feature of the $k$-th disease sample, $k=1,2,\dots,m$, $m$ represents the total number of disease samples in the disease sample set, $s_{ij}$ denotes the optimal split point of the $i$-th feature on the $j$-th decision tree, $x_k$ denotes the $k$-th disease sample, and $x_{k'}$, $k'\ne k$, denotes the other disease samples.
Step 3.2, obtain the entropy difference by the calculation formula $\Delta H_i = H(D) - H(D_{\setminus i})$, with $H(D) = -\sum_{k=1}^{m} p_k \log p_k$.
wherein $\Delta H_i$ denotes the entropy difference of the $i$-th feature, $H(D)$ represents the entropy of the disease sample set $D$, $H(D_{\setminus i})$ represents the entropy of the disease sample set after the $i$-th feature is removed, and $p_k$ denotes the probability of the $k$-th disease sample.
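Step 3.2 can be sketched under the reconstructed form $\Delta H_i = H(D) - H(D_{\setminus i})$. The choice of probability source below — normalized per-sample feature mass — is an illustrative assumption, since the patent's exact probability definition is not preserved:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy H = -sum(p * log p) of a probability vector."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()
    return float(-(p * np.log(p + eps)).sum())

def entropy_difference(disease, i):
    """Delta H_i = H(D) - H(D without feature i); probabilities are
    taken from normalized per-sample feature mass (assumed definition)."""
    disease = np.abs(np.asarray(disease, dtype=float)) + 1e-12
    full = entropy(disease.sum(axis=1))
    reduced = entropy(np.delete(disease, i, axis=1).sum(axis=1))
    return full - reduced
```

A large positive $\Delta H_i$ means removing feature $i$ noticeably flattens the sample distribution, marking the feature as important.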
For each feature, the following steps are performed:
step 3.3, obtaining the variance contribution degree of the features according to the division contribution degree and the covariance; the variance contribution is used to characterize the importance of the feature.
Specifically, the variance contribution $V_i$ is obtained by the variance-contribution calculation formula; wherein $V_i$ denotes the variance contribution of the $i$-th feature and $\Sigma$ represents the covariance.
Step 3.4, obtaining the feature weight of the feature according to the entropy difference and the variance contribution degree.
Specifically, the feature weight is obtained by the calculation formula $w_i = \alpha\,\Delta H_i + (1-\alpha)\,V_i$; wherein $w_i$ denotes the feature weight of the $i$-th feature, and $\alpha$ represents a hyper-parameter used to control the weight of the entropy difference and the weight of the variance contribution, $0<\alpha<1$.
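Step 3.4 under the reconstructed weighted-sum form (the $\alpha$-combination is inferred from the hyper-parameter's stated role of balancing the two importance signals, not reproduced from the patent's formula image); both signals are min-max normalized first so their scales match:

```python
import numpy as np

def feature_weights(entropy_diff, variance_contrib, alpha=0.5):
    """w_i = alpha * dH_i + (1 - alpha) * V_i, with each importance
    signal min-max normalized to [0, 1] before combining."""
    def norm(v):
        v = np.asarray(v, dtype=float)
        span = v.max() - v.min()
        return (v - v.min()) / span if span > 0 else np.zeros_like(v)
    return alpha * norm(entropy_diff) + (1 - alpha) * norm(variance_contrib)
```

Setting $\alpha$ near 1 lets the entropy difference dominate; near 0, the variance contribution dominates.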
The following describes an exemplary procedure of step 6 (evolve the initial population according to the first evolution probability, obtain the intermediate population, and calculate the second evolution probability corresponding to the intermediate population).
Step 6.1, obtain new population individuals $x'_k$ by the individual-update calculation formula.
It should be noted that the greater the evolution probability, the more important the population individual is, and the more it should be retained, so as to avoid evolving away excellent population individuals.
Step 6.2, obtain the fitness $F(x'_k)$ of each new population individual by the fitness calculation formula.
Step 6.3, sort all new population individuals in descending order of fitness, and add the top $q$ new individuals to the disease sample set $D$ to obtain a new disease sample set $D'$.
Step 6.4, obtain an intermediate population from the new disease sample set $D'$ according to a genetic algorithm.
Step 6.5, calculate the second evolution probability $P_2$ corresponding to the intermediate population.
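The fitness-ranked selection at the heart of steps 6.1-6.5 can be sketched as follows (the array names are illustrative; the fitness values would come from the patent's unspecified fitness formula):

```python
import numpy as np

def select_top_q(new_individuals, fitness, q):
    """Sort candidate individuals by fitness (descending) and keep the
    top q, as in step 6.3."""
    new_individuals = np.asarray(new_individuals)
    fitness = np.asarray(fitness, dtype=float)
    order = np.argsort(-fitness)   # indices in descending fitness order
    return new_individuals[order[:q]]
```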
The following describes an exemplary procedure of step 7 (determining whether the intermediate population satisfies the preset evolution termination condition according to the first evolution probability and the second evolution probability).
Step 7.1, obtain the evolution loss $L_t$ of the intermediate population by the evolution-loss calculation formula.
wherein $L_t$ represents the evolution loss of the $t$-th evolution of the intermediate population and is used to characterize the superiority of the intermediate population, $t=1,2,\dots,T$, and $T$ represents the preset maximum number of population evolutions.
Step 7.2, the number of times step 6 has been executed is counted and taken as the number of evolutions of the intermediate population.
Step 7.3, if the evolution loss is smaller than a preset threshold and the number of evolutions is greater than or equal to the maximum number of population evolutions, the intermediate population is determined to satisfy the preset evolution termination condition; otherwise, it is determined not to satisfy the condition.
It should be noted that when the evolution loss is greater than the preset threshold, the intermediate population differs too much from the initial population to be suitable for subsequent classifier training. Requiring the number of evolutions to reach the maximum number of population evolutions improves the precision of the intermediate population and prevents accidental factors from skewing the result.
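The two-part termination condition of step 7.3 is small enough to state directly in code. This is a minimal sketch; the names are illustrative, and how the evolution loss itself is computed is left to the patent's (unreproduced) formula:

```python
def should_terminate(evolution_loss, n_evolutions, loss_threshold, max_evolutions):
    """Preset evolution termination condition from step 7.3: stop only when the
    intermediate population is close enough to the initial one (small loss)
    AND has evolved at least the preset maximum number of times."""
    return evolution_loss < loss_threshold and n_evolutions >= max_evolutions
```

Note that both conditions must hold: a low loss alone does not terminate the loop, which is what guards against accidental early agreement between the populations.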
As the above steps show, the sample generation method provided by this application calculates the covariance and entropy of the disease sample set, adds them to its feature matrix, builds a gradient lifting tree model from the feature matrix, and derives feature weights from that model. This expands the features of the disease sample set and assigns larger weights to important features, which improves the diversity and robustness of the samples and thus their quality. Deriving the evolution probability of the population from the feature weights prevents excellent individuals in the population from being evolved away, so that excellent samples are retained and sample quality improves further.
The sample generating device provided by the present application is exemplified below.
As shown in fig. 2, the sample generation apparatus 200 includes:
the sample dividing module 201 is configured to divide an unbalanced diagnostic sample set to obtain a disease sample set and a normal sample set;
the gradient lifting tree module 202 is configured to calculate covariance and entropy of the disease sample set, add the covariance and entropy to a feature matrix of the disease sample set, and construct a gradient lifting tree model according to the feature matrix; the gradient lifting tree model comprises a plurality of decision trees, and leaf nodes of the decision trees are in one-to-one correspondence with features in the feature matrix;
the feature weight module 203 is configured to calculate a split contribution degree and an entropy difference of each feature in the plurality of features according to the gradient lifting tree model and in combination with entropy of the disease sample set, so as to obtain feature weights of the features; the split contribution degree and the entropy difference are used for representing the importance of the feature;
the genetic algorithm module 204 is configured to construct an initial population corresponding to the disease sample set; wherein, population individuals of the initial population are in one-to-one correspondence with disease samples in the disease sample set;
the first evolution probability module 205 is configured to obtain a first evolution probability corresponding to the initial population according to the covariance, the entropy and the feature weight; the first evolution probability is used for representing the importance of population individuals in the initial population;
The second evolution probability module 206 is configured to evolve the initial population according to the first evolution probability, obtain an intermediate population, and calculate a second evolution probability corresponding to the intermediate population;
a judging module 207, configured to judge whether the intermediate population meets a preset evolution termination condition according to the first evolution probability and the second evolution probability;
the sample generation module 208 is configured to take the intermediate population as a new disease sample set if the intermediate population meets the preset evolution termination condition; otherwise, the intermediate population is used as the initial population of the second evolution probability module, and execution returns to the second evolution probability module.
It should be noted that, because the information interaction and execution processes between the above devices/units are based on the same concept as the method embodiments of the present application, their specific functions and technical effects can be found in the method embodiment section and are not repeated here.
It will be apparent to those skilled in the art that, for convenience and brevity of description, only the above-described division of the functional units and modules is illustrated, and in practical application, the above-described functional distribution may be performed by different functional units and modules according to needs, i.e. the internal structure of the apparatus is divided into different functional units or modules to perform all or part of the above-described functions. The functional units and modules in the embodiment may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit, where the integrated units may be implemented in a form of hardware or a form of a software functional unit. In addition, the specific names of the functional units and modules are only for distinguishing from each other, and are not used for limiting the protection scope of the present application. The specific working process of the units and modules in the above system may refer to the corresponding process in the foregoing method embodiment, which is not described herein again.
As shown in fig. 3, an embodiment of the present application provides a terminal device. The terminal device 300 of this embodiment includes: at least one processor 301 (only one is shown in fig. 3), a memory 302, and a computer program 303 stored in the memory 302 and executable on the at least one processor 301; the processor 301 implements the steps of any of the method embodiments described above when executing the computer program 303.
Specifically, when the processor 301 executes the computer program 303, the diagnostic sample set is divided into a disease sample set and a normal sample set; the covariance and entropy of the disease sample set are added to its feature matrix and a gradient lifting tree model is built; combining the entropy of the disease sample set, the split contribution degree and entropy difference of each feature are calculated to obtain the feature weights of the features; an initial population corresponding to the disease sample set is constructed; a first evolution probability of the initial population is obtained from the covariance, the entropy, and the feature weights; the initial population is evolved according to this probability to obtain an intermediate population, and a second evolution probability of the intermediate population is calculated; based on the first and second evolution probabilities, an intermediate population satisfying the evolution termination condition is taken as the new disease sample set. By calculating the covariance and entropy of the disease sample set, adding them to its feature matrix, building a gradient lifting tree model from the feature matrix, and then deriving feature weights from the model, the features of the disease sample set are expanded and important features receive larger weights, improving the diversity, robustness, and quality of the samples. Deriving the evolution probability of the population from the feature weights prevents excellent individuals from being evolved away, so that excellent samples are retained and sample quality improves.
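The outer evolve-until-termination loop that the processor executes can be sketched as a small driver. The `evolve` and `evolution_loss` callables stand in for the patent's calculation formulas and are assumptions, not APIs defined by the patent:

```python
def generate_disease_samples(initial_population, evolve, evolution_loss,
                             loss_threshold, max_evolutions):
    """Outer loop of steps 6-8: keep evolving until the termination condition
    (loss below threshold AND at least max_evolutions rounds) holds.

    `evolve(population)` returns the next intermediate population and
    `evolution_loss(population, rounds)` returns its loss; both are
    caller-supplied placeholders for the patent's formulas.
    """
    population = initial_population
    rounds = 0
    while True:
        population = evolve(population)
        rounds += 1
        if evolution_loss(population, rounds) < loss_threshold and rounds >= max_evolutions:
            # intermediate population accepted as the new disease sample set
            return population
```

If the condition fails, the intermediate population simply becomes the next round's input, which mirrors step 8's "return to step 6" behavior.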
The processor 301 may be a central processing unit (CPU); it may also be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor or any conventional processor.
The memory 302 may in some embodiments be an internal storage unit of the terminal device 300, such as a hard disk or a memory of the terminal device 300. The memory 302 may also be an external storage device of the terminal device 300 in other embodiments, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card) or the like, which are provided on the terminal device 300. Further, the memory 302 may also include both an internal storage unit and an external storage device of the terminal device 300. The memory 302 is used to store an operating system, application programs, boot loader (BootLoader), data, and other programs, such as program code for the computer program. The memory 302 may also be used to temporarily store data that has been output or is to be output.
Embodiments of the present application also provide a computer readable storage medium storing a computer program which, when executed by a processor, implements steps for implementing the various method embodiments described above.
Embodiments of the present application provide a computer program product enabling a terminal device to carry out the steps of the method embodiments described above when the computer program product is run on the terminal device.
The integrated units, if implemented as software functional units and sold or used as stand-alone products, may be stored in a computer-readable storage medium. Based on this understanding, the present application may implement all or part of the flow of the methods of the above embodiments by instructing the relevant hardware through a computer program; the computer program may be stored in a computer-readable storage medium and, when executed by a processor, implements the steps of each of the method embodiments described above. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, some intermediate form, or the like. The computer-readable medium may include at least: any entity or device capable of carrying the computer program code to the sample generation apparatus/terminal device, a recording medium, computer memory, read-only memory (ROM), random access memory (RAM), electrical carrier signals, telecommunication signals, and software distribution media, such as a USB flash drive, removable hard disk, magnetic disk, or optical disk. In some jurisdictions, in accordance with legislation and patent practice, computer-readable media may not include electrical carrier signals and telecommunication signals.
Each of the foregoing embodiments is described with its own emphasis; for parts not detailed or illustrated in a particular embodiment, reference may be made to the related descriptions of the other embodiments.
Those of ordinary skill in the art will appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus/network device and method may be implemented in other manners. For example, the apparatus/network device embodiments described above are merely illustrative, e.g., the division of the modules or units is merely a logical functional division, and there may be additional divisions in actual implementation, e.g., multiple units or components may be combined or integrated into another system, or some features may be omitted, or not performed. Alternatively, the coupling or direct coupling or communication connection shown or discussed may be an indirect coupling or communication connection via interfaces, devices or units, which may be in electrical, mechanical or other forms.
The units described as separate units may or may not be physically separate, and units shown as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
The sample generation method provided by the application has the following advantages:
the quality of the data can be improved, and the shortcomings of the traditional SMOTE method can be overcome. When generating synthetic samples, feature weights must be calculated to ensure similarity between new samples and original samples and thereby reduce noise; a gradient lifting tree method is therefore proposed to solve for the feature weights of the disease sample set. This method combines the entropy and covariance of the data set, offers good interpretability and robustness, and is well suited to large-scale data sets and high-dimensional feature spaces. Sample information is added through a probability generation model that takes the entropy and covariance of the original data set into account, and the feature-weight evolution probability of the population is calculated from the variation between the generated synthetic samples and the covariance and entropy of the original data set, in order to judge whether the generated synthetic samples are representative and reliable. If the feature-weight evolution probability of a generated sample is too large, the sample can be considered harmful to the quality of the data set and must be regenerated. This method reduces noise interference, improves the diversity and robustness of the generated data, and has important application value in fields such as disease diagnosis and credit card fraud detection.
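For contrast with the weighted approach above, the traditional SMOTE step that the paragraph criticizes interpolates between a minority-class sample and one of its nearest neighbors with a uniform random factor, treating every feature equally. A minimal sketch (neighbor search omitted; names illustrative):

```python
import random

def smote_sample(x, neighbors):
    """Classic SMOTE interpolation: pick a random minority-class neighbor and
    form x_new = x + u * (neighbor - x) with u ~ U(0, 1).

    `neighbors` is assumed to be a precomputed list of nearest minority-class
    neighbors of x; all features are weighted equally, which is the weakness
    the feature-weighted method described above addresses.
    """
    nn = random.choice(neighbors)
    u = random.random()
    return [xi + u * (ni - xi) for xi, ni in zip(x, nn)]
```

Because the same interpolation factor is applied to every dimension regardless of feature importance, noisy features distort the synthetic sample as much as informative ones.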
While the foregoing is directed to the preferred embodiments of the present application, it will be appreciated by those skilled in the art that various modifications and adaptations can be made without departing from the principles of the present application, and such modifications and adaptations are intended to be comprehended within the scope of the present application.
Claims (4)
1. A method of generating a sample, comprising:
step 1, dividing an unbalanced diagnosis sample set to obtain a disease sample set and a normal sample set; the diagnosis sample set comprises a plurality of diagnosis samples aiming at a disease, and a label corresponding to each diagnosis sample in the plurality of diagnosis samples, wherein the label comprises the disease and the normal;
step 2, respectively calculating covariance and entropy of the disease sample set, adding the covariance and the entropy into a feature matrix of the disease sample set, and constructing a gradient lifting tree model according to the feature matrix; the gradient lifting tree model comprises a plurality of decision trees, and leaf nodes of the decision trees are in one-to-one correspondence with features in the feature matrix; the separately calculating covariance and entropy of the disease sample set comprises:
obtaining the covariance by a calculation formula defined over the disease samples in the disease sample set, the formula ranging over the total number of disease samples in the disease sample set;
obtaining the entropy by a calculation formula defined over the probability of each disease sample in the disease sample set;
the feature matrix of the disease sample set includes a plurality of features, each feature representing a dimension of the disease sample set; each disease sample has a corresponding feature vector whose length is the total number of features of the disease sample;
step 3, according to the gradient lifting tree model, combining the entropy of the disease sample set, respectively calculating the split contribution degree and entropy difference of each feature in the plurality of features to obtain the feature weight of the feature; wherein the split contribution and the entropy difference are both used to characterize the importance of the feature; the step 3 comprises the following steps:
obtaining the split contribution degree by a calculation formula; wherein the split contribution degree of a feature on a decision tree is computed from the leaf nodes of the decision tree, the values of that feature over the disease samples in the disease sample set, and the optimal split point of the feature on the decision tree;
obtaining the entropy difference by a calculation formula; wherein the entropy difference of a feature is the difference between the entropy of the disease sample set and the entropy of the disease sample set after the feature is removed;
for each feature, the following steps are performed:
obtaining the variance contribution of the feature according to the split contribution degree and the covariance; the variance contribution is used to characterize the importance of the feature;
obtaining the feature weight of the feature according to the entropy difference and the variance contribution degree; wherein the feature weight is obtained by a calculation formula in which a hyper-parameter controls the weight of the entropy difference and the weight of the variance contribution degree;
Step 4, constructing an initial population corresponding to the disease sample set; wherein, the population individuals of the initial population are in one-to-one correspondence with the disease samples in the disease sample set;
step 5, obtaining a first evolution probability corresponding to the initial population according to the covariance, the entropy and the feature weight; the first evolution probability is used for representing the importance of population individuals in the initial population; the step 5 comprises the following steps:
obtaining the first evolution probability by a calculation formula; wherein the formula involves a normalization coefficient, the set of feature weights, a probability density, and an exponential term for adjusting the shape of the exponential function so that the probability density function varies more smoothly across different dimensions;
step 6, evolving the initial population according to the first evolution probability to obtain an intermediate population, and calculating a second evolution probability corresponding to the intermediate population; the step 6 comprises the following steps:
obtaining new population individuals by a calculation formula;
obtaining the fitness of the new population individuals by a calculation formula;
sorting all new population individuals in descending order of fitness, and adding the top-ranked new individuals to the disease sample set to obtain a new disease sample set;
obtaining the intermediate population from the new disease sample set according to a genetic algorithm;
calculating the second evolution probability corresponding to the intermediate population;
Step 7, judging whether the intermediate population meets a preset evolution termination condition according to the first evolution probability and the second evolution probability; the step 7 comprises the following steps:
obtaining the evolution loss of the intermediate population by a calculation formula; wherein the evolution loss of the intermediate population at a given evolution is used to characterize the superiority of the intermediate population, and a preset maximum number of population evolutions is defined;
counting the number of times step 6 has been executed, and taking this count as the number of evolutions of the intermediate population;
if the evolution loss is smaller than a preset threshold and the number of evolutions is greater than or equal to the maximum number of population evolutions, determining that the intermediate population satisfies the preset evolution termination condition; otherwise, determining that the intermediate population does not satisfy the preset evolution termination condition;
step 8, if the intermediate population meets the preset evolution termination condition, taking the intermediate population as a new disease sample set; otherwise, the intermediate population is used as the initial population in the step 6, and the step 6 is executed again.
2. A sample generation apparatus, comprising:
the sample dividing module is used for dividing an unbalanced diagnosis sample set to obtain a disease sample set and a normal sample set; the diagnosis sample set comprises a plurality of diagnosis samples aiming at a disease, and a label corresponding to each diagnosis sample in the plurality of diagnosis samples, wherein the label comprises the disease and the normal;
the gradient lifting tree module is used for respectively calculating covariance and entropy of the disease sample set, adding the covariance and the entropy into a feature matrix of the disease sample set, and constructing a gradient lifting tree model according to the feature matrix; the gradient lifting tree model comprises a plurality of decision trees, and leaf nodes of the decision trees are in one-to-one correspondence with features in the feature matrix; the separately calculating covariance and entropy of the disease sample set comprises:
obtaining the covariance by a calculation formula defined over the disease samples in the disease sample set, the formula ranging over the total number of disease samples in the disease sample set;
obtaining the entropy by a calculation formula defined over the probability of each disease sample in the disease sample set;
the feature matrix of the disease sample set includes a plurality of features, each feature representing a dimension of the disease sample set; each disease sample has a corresponding feature vector whose length is the total number of features of the disease sample;
the feature weight module is configured to calculate the split contribution degree and entropy difference of each of the plurality of features according to the gradient lifting tree model in combination with the entropy of the disease sample set, to obtain the feature weight of the feature; wherein the split contribution degree and the entropy difference are both used to characterize the importance of the feature; the feature weight module obtains the split contribution degree by a calculation formula, wherein the split contribution degree of a feature on a decision tree is computed from the leaf nodes of the decision tree, the values of that feature over the disease samples in the disease sample set, and the optimal split point of the feature on the decision tree; and obtains the entropy difference by a calculation formula, wherein the entropy difference of a feature is the difference between the entropy of the disease sample set and the entropy of the disease sample set after the feature is removed;
for each feature, the following steps are performed:
obtaining the variance contribution of the feature according to the split contribution degree and the covariance; the variance contribution is used to characterize the importance of the feature;
obtaining the feature weight of the feature according to the entropy difference and the variance contribution degree; wherein the feature weight is obtained by a calculation formula in which a hyper-parameter controls the weight of the entropy difference and the weight of the variance contribution degree;
The genetic algorithm module is used for constructing an initial population corresponding to the disease sample set; wherein, the population individuals of the initial population are in one-to-one correspondence with the disease samples in the disease sample set;
the first evolution probability module is configured to obtain a first evolution probability corresponding to the initial population according to the covariance, the entropy, and the feature weights; the first evolution probability is used to characterize the importance of population individuals in the initial population; the first evolution probability module obtains the first evolution probability by a calculation formula involving a normalization coefficient, the set of feature weights, a probability density, and an exponential term for adjusting the shape of the exponential function so that the probability density function varies more smoothly across different dimensions;
the second evolution probability module is configured to evolve the initial population according to the first evolution probability to obtain an intermediate population, and to calculate a second evolution probability corresponding to the intermediate population; the second evolution probability module obtains new population individuals by a calculation formula; obtains the fitness of the new population individuals by a calculation formula; sorts all new population individuals in descending order of fitness and adds the top-ranked new individuals to the disease sample set to obtain a new disease sample set; obtains the intermediate population from the new disease sample set according to a genetic algorithm; and calculates the second evolution probability corresponding to the intermediate population;
the judging module is configured to judge whether the intermediate population satisfies a preset evolution termination condition according to the first evolution probability and the second evolution probability; the judging module obtains the evolution loss of the intermediate population by a calculation formula, wherein the evolution loss of the intermediate population at a given evolution is used to characterize the superiority of the intermediate population and a preset maximum number of population evolutions is defined; counts the number of times the second evolution probability module has been executed and takes this count as the number of evolutions of the intermediate population; and if the evolution loss is smaller than a preset threshold and the number of evolutions is greater than or equal to the maximum number of population evolutions, determines that the intermediate population satisfies the preset evolution termination condition, otherwise determines that it does not;
the sample generation module is configured to take the intermediate population as a new disease sample set if the intermediate population satisfies the preset evolution termination condition; otherwise, the intermediate population is used as the initial population of the second evolution probability module, and execution returns to the second evolution probability module.
3. A terminal device comprising a memory, a processor and a computer program stored in the memory and executable on the processor, characterized in that the processor implements the sample generation method of claim 1 when executing the computer program.
4. A computer readable storage medium storing a computer program, which when executed by a processor implements the sample generation method of claim 1.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310566164.XA CN116304932B (en) | 2023-05-19 | 2023-05-19 | Sample generation method, device, terminal equipment and medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310566164.XA CN116304932B (en) | 2023-05-19 | 2023-05-19 | Sample generation method, device, terminal equipment and medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN116304932A CN116304932A (en) | 2023-06-23 |
CN116304932B true CN116304932B (en) | 2023-09-05 |
Family
ID=86796372
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310566164.XA Active CN116304932B (en) | 2023-05-19 | 2023-05-19 | Sample generation method, device, terminal equipment and medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN116304932B (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP3767392A1 (en) * | 2019-07-17 | 2021-01-20 | ASML Netherlands B.V. | Method and apparatus for determining feature contribution to performance |
CN112307472A (en) * | 2020-11-03 | 2021-02-02 | 平安科技(深圳)有限公司 | Abnormal user identification method and device based on intelligent decision and computer equipment |
US11080617B1 (en) * | 2017-11-03 | 2021-08-03 | Paypal, Inc. | Preservation of causal information for machine learning |
CN113470816A (en) * | 2021-06-30 | 2021-10-01 | 中国人民解放军总医院第一医学中心 | Machine learning-based diabetic nephropathy prediction method, system and prediction device |
CN114334036A (en) * | 2021-11-25 | 2022-04-12 | 腾讯科技(深圳)有限公司 | Model training method, related device, equipment and storage medium |
CN114613497A (en) * | 2022-03-24 | 2022-06-10 | 北京交通大学 | Intelligent medical auxiliary diagnosis method of patient sample based on GBDT sample level |
WO2022126448A1 (en) * | 2020-12-16 | 2022-06-23 | 华为技术有限公司 | Neural architecture search method and system based on evolutionary learning |
CN115510981A (en) * | 2022-09-29 | 2022-12-23 | 中银金融科技(苏州)有限公司 | Decision tree model feature importance calculation method and device and storage medium |
CN115577357A (en) * | 2022-10-08 | 2023-01-06 | 重庆邮电大学 | Android malicious software detection method based on stacking integration technology |
CN115937698A (en) * | 2022-09-29 | 2023-04-07 | 华中师范大学 | Self-adaptive tailing pond remote sensing deep learning detection method |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
EP1590469A4 (en) * | 2002-11-12 | 2005-12-28 | Becton Dickinson Co | Diagnosis of sepsis or sirs using biomarker profiles |
CN107622801A (en) * | 2017-02-20 | 2018-01-23 | 平安科技(深圳)有限公司 | The detection method and device of disease probability |
US20210241140A1 (en) * | 2020-02-05 | 2021-08-05 | The Florida International University Board Of Trustees | Hybrid methods and systems for feature selection |
CN112052875A (en) * | 2020-07-30 | 2020-12-08 | 华控清交信息科技(北京)有限公司 | Method and device for training tree model |
US20220374783A1 (en) * | 2021-05-11 | 2022-11-24 | Paypal, Inc. | Feature selection using multivariate effect optimization models |
2023-05-19: CN application CN202310566164.XA filed; granted as CN116304932B (status: Active)
Non-Patent Citations (1)
Title |
---|
Application of multi-layer gradient boosting trees in drug identification; Du Shishuai et al.; Journal of Frontiers of Computer Science and Technology (《计算机科学与探索》); Vol. 14, No. 02; pp. 260-273 *
Also Published As
Publication number | Publication date |
---|---|
CN116304932A (en) | 2023-06-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111181939B (en) | Network intrusion detection method and device based on ensemble learning | |
CN102024455B (en) | Speaker recognition system and method | |
CN106653001A (en) | Baby crying identifying method and system | |
WO2019200782A1 (en) | Sample data classification method, model training method, electronic device and storage medium | |
CN109918498B (en) | Problem warehousing method and device | |
CN111967392A (en) | Face recognition neural network training method, system, equipment and storage medium | |
JP7024515B2 (en) | Learning programs, learning methods and learning devices | |
CN111785384A (en) | Abnormal data identification method based on artificial intelligence and related equipment | |
WO2022121163A1 (en) | User behavior tendency identification method, apparatus, and device, and storage medium | |
CN110276621A (en) | Data card is counter to cheat recognition methods, electronic device and readable storage medium storing program for executing | |
CN109656878B (en) | Health record data generation method and device | |
CN111259924A (en) | Boundary synthesis, mixed sampling, anomaly detection algorithm and data classification method | |
CN109858518A (en) | A kind of large data clustering method based on MapReduce | |
CN105224954B (en) | It is a kind of to remove the topic discovery method that small topic influences based on Single-pass | |
CN116304932B (en) | Sample generation method, device, terminal equipment and medium | |
CN114494263B (en) | Medical image lesion detection method, system and equipment integrating clinical information | |
Parmar et al. | A review on data balancing techniques and machine learning methods | |
CN113569957A (en) | Object type identification method and device of business object and storage medium | |
CN112989284A (en) | SAMME algorithm-based data noise detection method, system and equipment | |
CN111723700A (en) | Face recognition method and device and electronic equipment | |
Adeniyi et al. | Automatic Classification of Breast Cancer Histopathological Images Based on a Discriminatively Fine-Tuned Deep Learning Model | |
CN112328787B (en) | Text classification model training method and device, terminal equipment and storage medium | |
CN115358157B (en) | Prediction analysis method and device for litter size of individual litters and electronic equipment | |
CN117708569B (en) | Identification method, device, terminal and storage medium for pathogenic microorganism information | |
CN113361497B (en) | Intelligent tail box application method and device based on training sample fingerprint identification |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||