CN110298450A - Virtual sample generation method based on a generative adversarial network - Google Patents

Virtual sample generation method based on a generative adversarial network

Info

Publication number
CN110298450A
Authority
CN
China
Prior art keywords
sample
data
generator
adversarial network
generation method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910424679.XA
Other languages
Chinese (zh)
Inventor
卢剑
何良华
李旭升
颜野
朱学华
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tongji University
Peking University Third Hospital Peking University Third Clinical Medical College
Original Assignee
Tongji University
Peking University Third Hospital Peking University Third Clinical Medical College
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tongji University, Peking University Third Hospital Peking University Third Clinical Medical College
Priority to CN201910424679.XA
Publication of CN110298450A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G06N20/10 Machine learning using kernel methods, e.g. support vector machines [SVM]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Engineering & Computer Science (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Medical Informatics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a virtual sample generation method based on a generative adversarial network (GAN), comprising: pre-training an SVM classifier on the input samples of the generator, based on an improved WGAN-GP model; obtaining the position of the decision surface from the SVM classifier, and simulating the generation of minority class samples located near the decision surface; setting a position constraint on the generated samples according to the geometric distance between the generated samples and the decision surface, so as to control the sample generation range; establishing a PCGAN model according to the position constraint, and expanding the minority class samples based on the PCGAN model; and generating, by the PCGAN model, samples near the SVM decision surface that conform to the original distribution. The present invention can improve the stability and practicability of generative adversarial networks.

Description

Virtual sample generation method based on a generative adversarial network
Technical field
The present invention relates to the technical field of deep learning neural networks, and in particular to a virtual sample generation method based on a generative adversarial network.
Background
Owing to the nature of the data, the complexity of acquisition conditions, and economic factors, data sets are prone to imbalanced distributions. Sample imbalance shifts the decision surface of a classification model, so that good classification results cannot be obtained. Taking the SVM classifier as an example, its classification performance on imbalanced samples declines as the imbalance ratio increases. To address sample imbalance, a generative adversarial network can currently be used: through the game between the generator and the discriminator, it generates the required samples, free of manual intervention, whose distribution closely approximates that of the original samples. However, the original GAN suffers from unstable training and poor classification performance on imbalanced problems.
Summary of the invention
The present invention provides a virtual sample generation method based on a generative adversarial network, which addresses the unstable training of existing GAN models and their poor classification performance on imbalanced problems, and can improve the stability and practicability of generative adversarial networks.
To achieve the above object, the present invention provides the following technical solution:
A virtual sample generation method based on a generative adversarial network, comprising:
pre-training an SVM classifier on the input samples of the generator, based on an improved WGAN-GP model;
obtaining the position of the decision surface from the SVM classifier, and simulating the generation of minority class samples located near the decision surface;
setting a position constraint on the generated samples according to the geometric distance between the generated samples and the decision surface, so as to control the sample generation range;
establishing a PCGAN model according to the position constraint, and expanding the minority class samples based on the PCGAN model;
generating, by the PCGAN model, samples near the SVM decision surface that conform to the original distribution.
Preferably, the method further comprises:
performing data screening on the generated samples;
selecting the best generated data to retrain the SVM classifier, thereby obtaining a new decision surface.
Preferably, performing data screening on the generated samples comprises:
using full-set screening: for each generated sample, calculating its Euclidean distance and cosine similarity to the original samples and the neighbor samples, and judging whether the set conditions are met; if so, selecting it as an expanded sample.
Preferably, the full-set screening comprises:
for each $x_j \in S_{min}$, determining the set $S_{Neighbor}$ formed by its $k$ nearest-neighbor samples and, for each $x_j \in S_{min}$, traversing $x_i \in S_{gen}$ and $x_k \in S_{Neighbor}$ to calculate the distance from the generated sample to the minority class sample and the distance from the neighbor sample to the minority class sample;
if $\|x_i - x_j\| < \|x_k - x_j\|$, further calculating the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$, $\cos\theta = \frac{(x_i - x_j) \cdot (x_k - x_j)}{\|x_i - x_j\| \, \|x_k - x_j\|}$;
if the cosine similarity is greater than a threshold $C$, including $x_i$ in the expanded data set $S_{exp}$;
wherein $x_i$, $x_j$, $x_k$ are respectively a generated sample, a minority class sample, and the $k$-th nearest-neighbor sample, and $S_{min}$, $S_{gen}$, $S_{Neighbor}$ are respectively the minority class sample set, the generated sample set, and the neighbor sample set.
Preferably, performing data screening on the generated samples further comprises:
using screening based on a Danger set, so as to select generated samples that conform to the sample distribution and are located near the Danger set, wherein the Danger set comprises the minority class samples whose neighbor sample sets contain majority class samples.
Preferably, the screening based on the Danger set comprises:
for each minority class sample $x_i$, determining its nearest-neighbor sample set $S_{i:m\text{-}NN}$, then counting the samples in that set that belong to the majority class, i.e. $|S_{i:m\text{-}NN} \cap S_{maj}|$; the samples $x_i$ whose count satisfies the prescribed inequality form $S_{Danger}$;
for each such sample, determining the data set $S_{Neighbor}$ formed by its nearest-neighbor samples, and traversing the generated samples and the neighbor samples to calculate the respective distances;
if $\|x_i - x_j\| < \|x_k - x_j\|$, further calculating the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$;
if the cosine similarity is greater than a threshold $C$, including $x_i$ in the expanded data set $S_{exp}$;
wherein $S_{Danger}$ is the set of minority class samples whose nearest-neighbor sets contain majority class samples, $S_{i:m\text{-}NN}$ is the nearest-neighbor sample set, and $S_{maj}$ is the majority class sample set.
Preferably, performing data screening on the generated samples further comprises:
mapping the samples into a high-dimensional separable space by a kernel method and, according to the distance from a generated sample to the hyperplane, adding the generated sample to the expanded sample set when the distance is less than a set threshold, wherein the mapping function uses an RBF kernel.
Preferably, setting the position constraint on the generated samples comprises:
setting a SmoothLoss restrictive condition, in which $x$ is the generation range of the sample and $y$ is the cut-off value of the smooth piecewise function.
Preferably, establishing the PCGAN model according to the position constraint comprises:
defining a loss function $L_G$ of the generator and a loss function $L_D$ of the discriminator,
wherein $E$ is the expectation, $c$ is the class label, $x$ is an original sample, $\tilde{x}$ is a sample generated by the generator, $\hat{x}$ is an interpolated sample between $x \sim p_{data}$ and $\tilde{x} \sim p_g$, $\varepsilon \sim U[0,1]$, $p_g$ is the distribution of generated samples, $p_{data}$ is the real sample distribution, $p_{penalty}$ is the distribution of $\hat{x}$, $\lambda$ is a hyperparameter, $\nabla$ is the gradient, $L$ is the objective of the position-constrained generative adversarial network, $L_{smooth}$ is the position constraint function of the generated samples, $D$ is the discriminant function for real samples, $G$ is the sample generating function, $w$ is the weight vector of the decision surface, and $b$ is the bias term of the model.
Preferably, expanding the minority class samples based on the PCGAN model comprises:
(1) initializing the designed hyperparameters, mainly: gradient penalty coefficient $\lambda_1 = 10$, decision surface constraint coefficient $\lambda_2 = 5$, number of discriminator training iterations per round of adversarial optimization $n_{critic} = 3$, number of generator training iterations per round of adversarial optimization $n_{gen} = 1$, number of samples per training batch $m = 10$, and Adam optimizer hyperparameters $\alpha = 0.0001$, $\beta_1 = 0.9$, $\beta_2 = 0.99$; initializing the discriminator parameters and the generator parameters $\theta_0$; then training a linear SVM to obtain the parameters $(w, b)$ of its separating hyperplane;
(2) for each sample of each batch in discriminator training, first sampling $x \sim p_{data}$ and its class label $c$ from the real sample set, then sampling $z \sim p(z)$ from the noise distribution and mapping the noise into the generated sample space; then computing the interpolated sample between the two, where $\varepsilon \sim U[0,1]$;
(3) evaluating the discriminator loss function, then updating the discriminator parameters;
(4) repeating steps (2) to (3), stopping when this two-level loop ends; the loop runs $n_{critic} \times m$ times in total;
(5) for each sample of each batch in generator training, first sampling $z \sim p(z)$ from the noise distribution, then calculating the geometric distance between the generated data and the decision surface;
(6) calculating the generator loss function according to the SmoothL1Loss, then updating the generator parameters;
(7) repeating steps (5) to (6), stopping when this two-level loop ends; the loop runs $n_{gen} \times m$ times in total;
(8) repeating steps (2) to (7), stopping training when the data generated by the generator network meet the preset data requirements.
The present invention provides a virtual sample generation method based on a generative adversarial network. A limit on the distance between samples and the decision surface, based on SVM theory, is added to the WGAN-GP model so as to impose a position constraint on the generated samples. After the PCGAN model generates samples, data screening is performed, and the best generated samples are selected to retrain the SVM and obtain a new decision surface. This improves the stability of the generative adversarial network model when processing discrete data and gives good results in imbalanced classification problems.
Brief description of the drawings
To illustrate the specific embodiments of the present invention more clearly, the drawings used in the embodiments are briefly introduced below.
Fig. 1 is a flow chart of the virtual sample generation method based on a generative adversarial network provided by the present invention;
Fig. 2 is a schematic flow diagram of minority class sample expansion based on the generative adversarial network in an embodiment of the present invention;
Fig. 3 shows the performance evaluation results of the above technical solution compared with other methods on the cmc and pima data sets in an embodiment of the present invention;
Fig. 4 shows the performance evaluation results of the above technical solution compared with other methods on the robot and satlog data sets in an embodiment of the present invention;
Fig. 5 shows the performance evaluation results of the above technical solution compared with other methods on the haberman and semeion data sets in an embodiment of the present invention;
Fig. 6 shows the performance evaluation results of the above technical solution compared with other methods on the yeast and yeast_2 data sets in an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the embodiments of the present invention, the embodiments are described in further detail below with reference to the drawings and implementations.
The original GAN suffers from unstable training and poor classification performance on imbalanced problems. To address this, the present invention provides a virtual sample generation method based on a generative adversarial network. A limit on the distance between samples and the decision surface, based on SVM theory, is added to the WGAN-GP model so as to impose a position constraint on the generated samples. After the PCGAN model generates samples, data screening is performed, and the best generated samples are selected to retrain the SVM and obtain a new decision surface. This addresses the unstable training of existing GAN models and their poor classification performance on imbalanced problems, and improves the practicability of generative adversarial networks.
As shown in Fig. 1, a virtual sample generation method based on a generative adversarial network comprises:
Step 1: pre-training an SVM classifier on the input samples of the generator, based on an improved WGAN-GP model;
Step 2: obtaining the position of the decision surface from the SVM classifier, and simulating the generation of minority class samples located near the decision surface;
Step 3: setting a position constraint on the generated samples according to the geometric distance between the generated samples and the decision surface, so as to control the sample generation range;
Step 4: establishing a PCGAN model according to the position constraint, and expanding the minority class samples based on the PCGAN model;
Step 5: generating, by the PCGAN model, samples near the SVM decision surface that conform to the original distribution.
Further, the method also comprises:
Step 6: performing data screening on the generated samples;
Step 7: selecting the best generated data to retrain the SVM classifier, thereby obtaining a new decision surface.
Specifically, in a first aspect, the method for establishing the improved PCGAN based on WGAN-GP comprises the following steps:
Step S1: pre-train the SVM in the generation stage to obtain the position of the decision surface.
In SVM theory, only the points located near the decision surface influence changes in the decision surface. The main goal of the present invention is therefore to use a generative adversarial network to simulate the generation of minority class samples located near the decision surface.
Step S2: during generator training, measure the geometric distance between the generated samples and the decision surface;
Step S3: add a SmoothLoss restrictive condition to control the sample generation range.
In the SmoothLoss restrictive condition, $x$ is the generation range of the sample and $y$ is the cut-off value of the smooth piecewise function.
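Steps S2 and S3 can be illustrated with a short Python sketch that measures the geometric distance from a sample to a linear decision surface and applies a smooth piecewise penalty to it. The function names, the `cutoff` parameter, and the specific penalty form (quadratic below the cut-off value, linear above it) are assumptions borrowed from the commonly used Smooth L1 loss, not the formula of this embodiment; the distance $|w \cdot x + b| / \|w\|$ is the standard point-to-hyperplane distance.

```python
import math


def geometric_distance(x, w, b):
    """Distance from point x to the hyperplane w.x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w


def smooth_l1(d, cutoff=1.0):
    """Assumed Smooth L1 form: quadratic below the cutoff, linear above it."""
    d = abs(d)
    if d < cutoff:
        return 0.5 * d * d / cutoff
    return d - 0.5 * cutoff


# Hypothetical decision surface 3*x1 + 4*x2 - 5 = 0
w, b = [3.0, 4.0], -5.0
near = geometric_distance([1.0, 1.0], w, b)  # close to the surface
far = geometric_distance([5.0, 5.0], w, b)   # far from the surface
print(near, smooth_l1(near))
print(far, smooth_l1(far))
```

A penalty of this kind grows gently for samples close to the decision surface and linearly for samples far from it, which is one way a generator loss could discourage samples from drifting away from the surface.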
Further, the structure and objective of the PCGAN are expressed through a loss function $L_G$ of the generator and a loss function $L_D$ of the discriminator.
In these loss functions, $E$ is the expectation, $c$ is the class label, $x$ is an original sample, $\tilde{x}$ is a sample generated by the generator, $\hat{x}$ is an interpolated sample between $x \sim p_{data}$ and $\tilde{x} \sim p_g$, $\varepsilon \sim U[0,1]$, $p_g$ is the distribution of generated samples, $p_{data}$ is the real sample distribution, $p_{penalty}$ is the distribution of $\hat{x}$, $\lambda$ is a hyperparameter, $\nabla$ is the gradient, $L$ is the objective of the position-constrained generative adversarial network, $L_{smooth}$ is the position constraint function of the generated samples, $D$ is the discriminant function for real samples, $G$ is the sample generating function, $w$ is the weight vector of the decision surface, and $b$ is the bias term of the model.
In a second aspect, three screening methods applied after sample generation are proposed. Method 1: for each generated sample, calculate its Euclidean distance and cosine similarity to the original samples and the neighbor samples, and select it as an expanded sample if specific conditions are met. Method 2: select the generated samples that conform to the sample distribution and are located near the "Danger set", where the "Danger set" comprises the minority class samples whose neighbor sample sets contain majority class samples. Method 3: map the samples into a high-dimensional separable space by a kernel method and, according to the distance from a generated sample to the hyperplane, add the generated sample to the expanded sample set when the distance is less than a set threshold.
Specifically, performing data screening on the generated samples comprises: using full-set screening, in which, for each generated sample, its Euclidean distance and cosine similarity to the original samples and the neighbor samples are calculated and checked against the set conditions; if the conditions are met, the sample is selected as an expanded sample.
Further, the full-set screening comprises:
For each $x_j \in S_{min}$, determine the set $S_{Neighbor}$ formed by its $k$ nearest-neighbor samples and, for each $x_j \in S_{min}$, traverse $x_i \in S_{gen}$ and $x_k \in S_{Neighbor}$ to calculate the distance from the generated sample to the minority class sample and the distance from the neighbor sample to the minority class sample. If $\|x_i - x_j\| < \|x_k - x_j\|$, further calculate the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$, $\cos\theta = \frac{(x_i - x_j) \cdot (x_k - x_j)}{\|x_i - x_j\| \, \|x_k - x_j\|}$. If the cosine similarity is greater than a threshold $C$, include $x_i$ in the expanded data set $S_{exp}$. Here $x_i$, $x_j$, $x_k$ are respectively a generated sample, a minority class sample, and the $k$-th nearest-neighbor sample, and $S_{min}$, $S_{gen}$, $S_{Neighbor}$ are respectively the minority class sample set, the generated sample set, and the neighbor sample set.
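The full-set screening rule above can be sketched in Python as follows. This is a minimal illustration, not the implementation of the embodiment: the function names, the default values of `k` and `c`, and the choice of drawing the neighbors of each minority sample from the minority set itself are assumptions made for the example.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either vector is zero)."""
    nu = math.sqrt(sum(p * p for p in u))
    nv = math.sqrt(sum(q * q for q in v))
    if nu == 0.0 or nv == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(u, v)) / (nu * nv)


def k_nearest(x_j, pool, k):
    """The k samples of pool closest to x_j, excluding x_j itself."""
    others = [p for p in pool if p is not x_j]
    return sorted(others, key=lambda p: dist(p, x_j))[:k]


def accepts(x_i, s_min, k, c):
    """True if some (x_j, x_k) pair satisfies both screening conditions."""
    for x_j in s_min:
        for x_k in k_nearest(x_j, s_min, k):  # assumption: neighbors from s_min
            if dist(x_i, x_j) < dist(x_k, x_j):
                u = [p - q for p, q in zip(x_i, x_j)]
                v = [p - q for p, q in zip(x_k, x_j)]
                if cosine(u, v) > c:
                    return True
    return False


def screen_generated(s_gen, s_min, k=2, c=0.9):
    """Keep the generated samples that pass the distance and cosine tests."""
    return [x_i for x_i in s_gen if accepts(x_i, s_min, k, c)]


s_min = [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0]]
s_gen = [[1.0, 0.05], [5.0, 5.0], [-1.0, -1.0]]
print(screen_generated(s_gen, s_min))
```

In this toy data only the first generated point lies between a minority sample and one of its neighbors and points in nearly the same direction, so it is the only one kept.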
Performing data screening on the generated samples further comprises: using screening based on a Danger set, so as to select generated samples that conform to the sample distribution and are located near the Danger set, wherein the Danger set comprises the minority class samples whose neighbor sample sets contain majority class samples.
The screening based on the Danger set comprises: for each minority class sample $x_i$, determine its nearest-neighbor sample set $S_{i:m\text{-}NN}$, then count the samples in that set that belong to the majority class, i.e. $|S_{i:m\text{-}NN} \cap S_{maj}|$; the samples $x_i$ whose count satisfies the prescribed inequality form $S_{Danger}$. For each such sample, determine the data set $S_{Neighbor}$ formed by its nearest-neighbor samples, and traverse the generated samples and the neighbor samples to calculate the respective distances. If $\|x_i - x_j\| < \|x_k - x_j\|$, further calculate the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$; if the cosine similarity is greater than a threshold $C$, include $x_i$ in the expanded data set $S_{exp}$. Here $S_{Danger}$ is the set of minority class samples whose nearest-neighbor sets contain majority class samples, $S_{i:m\text{-}NN}$ is the nearest-neighbor sample set, and $S_{maj}$ is the majority class sample set.
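The construction of a Danger set can be sketched as below. The inequality used here, $m/2 \le |S_{i:m\text{-}NN} \cap S_{maj}| < m$, is an assumption borrowed from the Borderline-SMOTE algorithm, because the inequality of this embodiment is not reproduced in the text; the function names and the value of `m` are likewise hypothetical.

```python
import math


def dist(a, b):
    """Euclidean distance between two points."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def danger_set(s_min, s_maj, m=3):
    """Minority samples whose m nearest neighbors are mostly, but not all, majority.

    Assumed rule (Borderline-SMOTE style): m/2 <= majority count < m.
    """
    labelled = [(p, 0) for p in s_min] + [(p, 1) for p in s_maj]
    danger = []
    for x_i in s_min:
        others = [(p, lab) for p, lab in labelled if p is not x_i]
        others.sort(key=lambda item: dist(item[0], x_i))
        n_maj = sum(lab for _, lab in others[:m])  # |S_i:m-NN ∩ S_maj|
        if m / 2 <= n_maj < m:
            danger.append(x_i)
    return danger


s_min = [[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [3.0, 3.0], [3.0, 2.5]]
s_maj = [[4.0, 4.0], [4.0, 3.0], [5.0, 5.0], [5.0, 4.0]]
print(danger_set(s_min, s_maj))
```

With this rule, the three minority samples near the origin have no majority neighbors and are left out, while the two minority samples next to the majority cluster form the Danger set.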
Performing data screening on the generated samples further comprises: mapping the samples into a high-dimensional separable space by a kernel method and, according to the distance from a generated sample to the hyperplane, adding the generated sample to the expanded sample set when the distance is less than a set threshold, wherein the mapping function uses an RBF kernel. The screening methods use Euclidean distance and cosine similarity to measure the similarity between generated samples and real samples.
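The kernel-based screening can be sketched as follows, assuming an RBF-kernel SVM has already been trained. The support vectors, the signed dual coefficients $\alpha_i y_i$, the bias, `gamma`, and the threshold below are hypothetical inputs chosen for illustration. The sketch uses the standard kernel SVM decision value $f(x) = \sum_i \alpha_i y_i K(s_i, x) + b$ and the feature-space distance $|f(x)| / \|w\|$ with $\|w\|^2 = \sum_{i,j} \alpha_i y_i \alpha_j y_j K(s_i, s_j)$.

```python
import math


def rbf(a, b, gamma):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))


def decision_value(x, svs, coefs, bias, gamma):
    """f(x) = sum_i coef_i * K(sv_i, x) + bias, with coef_i = alpha_i * y_i."""
    return sum(c * rbf(sv, x, gamma) for sv, c in zip(svs, coefs)) + bias


def feature_space_norm(svs, coefs, gamma):
    """||w|| in the kernel feature space."""
    sq = sum(ci * cj * rbf(si, sj, gamma)
             for si, ci in zip(svs, coefs)
             for sj, cj in zip(svs, coefs))
    return math.sqrt(sq)


def screen_by_margin(s_gen, svs, coefs, bias, gamma, threshold):
    """Keep generated samples closer to the hyperplane than the threshold."""
    norm_w = feature_space_norm(svs, coefs, gamma)
    return [x for x in s_gen
            if abs(decision_value(x, svs, coefs, bias, gamma)) / norm_w < threshold]


# Hypothetical trained model: two support vectors with opposite signs
svs = [[0.0, 0.0], [2.0, 0.0]]
coefs = [1.0, -1.0]
s_gen = [[1.0, 0.0], [0.0, 0.0], [1.2, 0.0]]
print(screen_by_margin(s_gen, svs, coefs, bias=0.0, gamma=0.5, threshold=0.3))
```

In practice the support vectors and coefficients would come from a trained SVM library model rather than being written by hand.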
It should be noted that, although the steps of the above sample screening embodiments are described in the above order, those skilled in the art will understand that, to achieve the effect of this embodiment, the steps need not be executed in that order; they may be executed simultaneously (in parallel) or in reverse order, and such simple variations all fall within the protection scope of the present invention.
Further, setting the position constraint on the generated samples comprises:
setting a SmoothLoss restrictive condition, in which $x$ is the generation range of the sample and $y$ is the cut-off value of the smooth piecewise function.
Establishing the PCGAN model according to the position constraint comprises:
defining a loss function $L_G$ of the generator and a loss function $L_D$ of the discriminator,
wherein $E$ is the expectation, $c$ is the class label, $x$ is an original sample, $\tilde{x}$ is a sample generated by the generator, $\hat{x}$ is an interpolated sample between $x \sim p_{data}$ and $\tilde{x} \sim p_g$, $\varepsilon \sim U[0,1]$, $p_g$ is the distribution of generated samples, $p_{data}$ is the real sample distribution, $p_{penalty}$ is the distribution of $\hat{x}$, $\lambda$ is a hyperparameter, $\nabla$ is the gradient, $L$ is the objective of the position-constrained generative adversarial network, $L_{smooth}$ is the position constraint function of the generated samples, $D$ is the discriminant function for real samples, $G$ is the sample generating function, $w$ is the weight vector of the decision surface, and $b$ is the bias term of the model.
In one embodiment, the minority class sample expansion process based on the generative adversarial network may comprise the following steps:
(1) Initialize the designed hyperparameters, mainly: gradient penalty coefficient $\lambda_1 = 10$, decision surface constraint coefficient $\lambda_2 = 5$, number of discriminator training iterations per round of adversarial optimization $n_{critic} = 3$, number of generator training iterations per round of adversarial optimization $n_{gen} = 1$, number of samples per training batch $m = 10$, and Adam optimizer hyperparameters $\alpha = 0.0001$, $\beta_1 = 0.9$, $\beta_2 = 0.99$. Initialize the discriminator parameters and the generator parameters $\theta_0$, then train a linear SVM to obtain the parameters $(w, b)$ of its separating hyperplane.
(2) For each sample of each batch in discriminator training, first sample $x \sim p_{data}$ and its class label $c$ from the real sample set, then sample $z \sim p(z)$ from the noise distribution and map the noise into the generated sample space; then compute the interpolated sample between the two, where $\varepsilon \sim U[0,1]$.
(3) Evaluate the discriminator loss function, then update the discriminator parameters.
(4) Repeat steps (2) to (3), stopping when this two-level loop ends. The loop runs $n_{critic} \times m$ times in total.
(5) For each sample of each batch in generator training, first sample $z \sim p(z)$ from the noise distribution, then calculate the geometric distance between the generated data and the decision surface.
(6) Calculate the generator loss function according to the SmoothL1Loss, then update the generator parameters.
(7) Repeat steps (5) to (6), stopping when this two-level loop ends. The loop runs $n_{gen} \times m$ times in total.
(8) Repeat steps (2) to (7), stopping training when the data generated by the generator network meet the preset data requirements.
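The alternation between discriminator and generator updates in the steps above can be sketched without any deep learning library. The constants reuse the values listed in step (1); the function names are hypothetical, and the interpolation $\hat{x} = \varepsilon x + (1 - \varepsilon)\tilde{x}$ is the standard WGAN-GP form, assumed here because the formula of the embodiment is not reproduced in the text. The sketch shows only the schedule and the interpolation, not the network updates themselves.

```python
import random

N_CRITIC, N_GEN, BATCH_SIZE = 3, 1, 10  # n_critic, n_gen, m from step (1)


def interpolate(x, x_tilde, eps):
    """Assumed WGAN-GP interpolation: eps * x + (1 - eps) * x_tilde."""
    return [eps * a + (1.0 - eps) * b for a, b in zip(x, x_tilde)]


def round_schedule(n_critic=N_CRITIC, n_gen=N_GEN):
    """Order of network updates within one round of adversarial optimization."""
    return ["discriminator"] * n_critic + ["generator"] * n_gen


def updates_per_round(n_critic=N_CRITIC, n_gen=N_GEN, m=BATCH_SIZE):
    """Per-sample loop counts: (n_critic * m, n_gen * m)."""
    return n_critic * m, n_gen * m


rng = random.Random(0)
eps = rng.random()                       # eps ~ U[0, 1]
x_real, x_fake = [1.0, 2.0], [3.0, 6.0]  # toy real and generated samples
print(round_schedule())
print(updates_per_round())
print(interpolate(x_real, x_fake, eps))
```

A full implementation would replace the printed schedule with gradient steps on the discriminator and generator losses, using an optimizer such as Adam with the listed hyperparameters.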
In one embodiment, the above sample generation process is evaluated as follows.
In the embodiment, the target data sets are some of the imbalanced data sets in the UCI database. Each target data set is randomly divided into a training set and a test set at a ratio of 8:2, and each reported value is the average of the results of 10 experiments. F1-score and G-mean are used as evaluation criteria. The algorithms compared are: the traditional SVM algorithm, the SVM algorithm with an RBF kernel, the SMOTE algorithm, and the technical solution proposed in this embodiment.
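The two evaluation criteria named above can be computed from a confusion matrix as in the following sketch. It assumes the usual definitions, F1 as the harmonic mean of precision and recall and G-mean as the geometric mean of sensitivity and specificity, with the minority class treated as the positive class; the function names and the toy labels are illustrative only.

```python
import math


def confusion(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) with the minority class as the positive class."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall (0.0 if there are no true positives)."""
    tp, fp, _, fn = confusion(y_true, y_pred, positive)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity and specificity."""
    tp, fp, tn, fn = confusion(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if tp + fn else 0.0
    specificity = tn / (tn + fp) if tn + fp else 0.0
    return math.sqrt(sensitivity * specificity)


y_true = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 0, 0, 0, 1, 0]
print(round(f1_score(y_true, y_pred), 4), round(g_mean(y_true, y_pred), 4))
```

Both measures penalize a classifier that ignores the minority class, which is why they are preferred to plain accuracy on imbalanced data.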
Referring to Fig. 3, the figure shows the performance evaluation results of the above technical solution compared with other methods on the cmc and pima data sets.
Specifically, on the cmc data set, the data generated by the PCGAN proposed in this embodiment help shift the separating hyperplane, improving the classification result by 12.5% compared with the RBF SVM. On the pima data set, the PCGAN and screening methods proposed in this embodiment form a model built with the goal of correcting the SVM separating hyperplane; the generated samples come from PCGAN simulating the original sample distribution rather than from linear interpolation after simple clustering, which is a more reasonable treatment of such data, and the result improves by 5.2% over the RBF SVM.
Referring to Fig. 4, the figure shows the performance evaluation results of the above technical solution compared with other methods on the robot and satlog data sets.
Specifically, on the robot data set, the data generated by the PCGAN proposed in this embodiment contribute little to shifting the separating hyperplane, and the result improves only slightly, by 0.7%. On the satlog data set, the classification result of this embodiment is close to that of the RBF SVM, but the method is still the best of the group.
Referring to Fig. 5, the figure shows the performance evaluation results of the above technical solution compared with other methods on the haberman and semeion data sets.
Specifically, on the haberman data set, the PCGAN proposed in this embodiment simulates the data distribution better, improving the classification result by 16.3%. On the semeion data set, this embodiment performs stably and is comparable to the RBF SVM.
Referring to Fig. 6, the figure shows the performance evaluation results of the above technical solution compared with other methods on the yeast and yeast_2 data sets.
Specifically, on the yeast and yeast_2 data sets, the samples generated by the PCGAN and screening methods proposed in this embodiment are overly concentrated and lack diversity, so the improvement is limited.
Experiments on 8 UCI data sets using the above sample generation process verify the performance of the PCGAN method. Its F1-score and G-mean are outstanding on most data sets: 14 of the 16 metrics rank first, outperforming the traditional SVM algorithm, the SVM algorithm with an RBF kernel, and the SMOTE algorithm.
Thus, the present invention provides a virtual sample generation method based on a generative adversarial network. A limit on the distance between samples and the decision surface, based on SVM theory, is added to the WGAN-GP model so as to impose a position constraint on the generated samples; then, after PCGAN generates samples, three screening methods based on Euclidean distance and cosine similarity are proposed. Experiments show that the constructed minority class sample expansion process improves the stability of the generative adversarial network model when processing discrete data and gives good results in imbalanced classification problems.
The structure, features, and effects of the present invention have been described in detail above with reference to the embodiments shown in the drawings. The above are only preferred embodiments of the present invention, and the scope of implementation is not limited to what is shown in the drawings. Any change made according to the concept of the present invention, or any equivalent embodiment modified as an equivalent variation, that does not go beyond the spirit covered by the description and drawings shall fall within the protection scope of the present invention.

Claims (10)

1. A virtual sample generation method based on a generative adversarial network, characterized by comprising:
pre-training an SVM classifier on the input samples of the generator, based on an improved WGAN-GP model;
obtaining the position of the decision surface from the SVM classifier, and simulating the generation of minority class samples located near the decision surface;
setting a position constraint on the generated samples according to the geometric distance between the generated samples and the decision surface, so as to control the sample generation range;
establishing a PCGAN model according to the position constraint, and expanding the minority class samples based on the PCGAN model;
generating, by the PCGAN model, samples near the SVM decision surface that conform to the original distribution.
2. The virtual sample generation method based on a generative adversarial network according to claim 1, characterized by further comprising:
performing data screening on the generated samples;
selecting the best generated data to retrain the SVM classifier, thereby obtaining a new decision surface.
3. The virtual sample generation method based on a generative adversarial network according to claim 2, characterized in that performing data screening on the generated samples comprises:
using full-set screening: for each generated sample, calculating its Euclidean distance and cosine similarity to the original samples and the neighbor samples, and judging whether the set conditions are met; if so, selecting it as an expanded sample.
4. The virtual sample generation method based on a generative adversarial network according to claim 3, characterized in that the full-set screening comprises:
for each $x_j \in S_{min}$, determining the set $S_{Neighbor}$ formed by its $k$ nearest-neighbor samples and, for each $x_j \in S_{min}$, traversing $x_i \in S_{gen}$ and $x_k \in S_{Neighbor}$ to calculate the distance from the generated sample to the minority class sample and the distance from the neighbor sample to the minority class sample;
if $\|x_i - x_j\| < \|x_k - x_j\|$, further calculating the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$, $\cos\theta = \frac{(x_i - x_j) \cdot (x_k - x_j)}{\|x_i - x_j\| \, \|x_k - x_j\|}$;
if the cosine similarity is greater than a threshold $C$, including $x_i$ in the expanded data set $S_{exp}$;
wherein $x_i$, $x_j$, $x_k$ are respectively a generated sample, a minority class sample, and the $k$-th nearest-neighbor sample, and $S_{min}$, $S_{gen}$, $S_{Neighbor}$ are respectively the minority class sample set, the generated sample set, and the neighbor sample set.
5. The virtual sample generation method based on a generative adversarial network according to claim 4, characterized in that performing data screening on the generated samples further comprises:
Using screening based on the Danger set, so as to select the generated samples that lie near the Danger set and conform to the sample distribution, wherein the Danger set comprises the minority-class samples whose neighbor sample sets contain majority-class samples.
6. The virtual sample generation method based on a generative adversarial network according to claim 5, characterized in that using screening based on the Danger set comprises:
For each $x_i \in S_{min}$, determining its nearest-neighbor sample set $S_{i:m\text{-}NN}$, then counting the number of samples in that set that belong to the majority class, i.e., $|S_{i:m\text{-}NN} \cap S_{maj}|$; the $x_i$ that satisfy the set inequality on this count form $S_{Danger}$;
For each $x_j \in S_{Danger}$, determining the data set $S_{Neighbor}$ composed of its nearest-neighbor samples, and for each $x_i \in S_{gen}$, traversing $x_j \in S_{Danger}$ and $x_k \in S_{Neighbor}$ to compute the respective distances;
If $\|x_i - x_j\| < \|x_k - x_j\|$, further computing the cosine similarity between $(x_i - x_j)$ and $(x_k - x_j)$;
If the cosine similarity is greater than the threshold $C$, including $x_i$ in the expanded data set $S_{exp}$;
Where $S_{Danger}$ is the set of minority-class samples whose nearest-neighbor sets contain majority-class samples, $S_{i:m\text{-}NN}$ is the nearest-neighbor sample set, and $S_{maj}$ is the majority-class sample set.
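A minimal sketch of building a Danger set is given below. The inequality used in the claim is not preserved in this text, so the sketch substitutes the rule commonly used in Borderline-SMOTE, m/2 <= (number of majority neighbors) < m; that rule, the function names, and the toy data are assumptions made for illustration.

```python
import math

def dist(a, b):
    return math.sqrt(sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def danger_set(s_min, s_maj, m):
    """Minority samples whose m nearest neighbors are mostly, but not all,
    majority-class samples (assumed Borderline-SMOTE style rule)."""
    labelled = [(p, 0) for p in s_min] + [(p, 1) for p in s_maj]
    danger = []
    for x_i in s_min:
        others = [(p, lab) for p, lab in labelled if p is not x_i]
        others.sort(key=lambda item: dist(item[0], x_i))
        n_maj = sum(lab for _, lab in others[:m])  # majority count among m-NN
        if m / 2 <= n_maj < m:
            danger.append(x_i)
    return danger

s_min = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]]
s_maj = [[0.2, 0.0], [0.3, 0.0], [9.0, 9.0]]
print(danger_set(s_min, s_maj, m=3))
```

Under the assumed rule, the isolated point `[5.0, 5.0]` is excluded because all of its neighbors are majority-class samples, while the two points near the class boundary are kept.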
7. The virtual sample generation method based on a generative adversarial network according to claim 6, characterized in that performing data screening on the generated samples further comprises:
Mapping the samples into a high-dimensional separable space by the kernel method and, according to the distance from a generated sample to the hyperplane, adding the generated sample to the expanded sample set when the distance is less than a set threshold, wherein the mapping function uses the RBF kernel.
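The kernel-space distance test can be sketched as follows, assuming an already trained kernel SVM described by hypothetical support vectors `support`, signed dual coefficients `coef` (alpha_i times y_i), bias `b`, and RBF width `gamma`. The distance to the hyperplane in feature space is |f(x)| / ||w||, where ||w||^2 is computed from the kernel matrix of the support vectors. None of these names or values come from the claims.

```python
import math

def rbf(a, b, gamma):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((ai - bi) ** 2 for ai, bi in zip(a, b)))

def decision_value(x, support, coef, b, gamma):
    """f(x) = sum_i coef_i * K(sv_i, x) + b."""
    return sum(c * rbf(sv, x, gamma) for sv, c in zip(support, coef)) + b

def weight_norm(support, coef, gamma):
    """||w|| in feature space: sqrt(sum_ij coef_i coef_j K(sv_i, sv_j))."""
    sq = sum(ci * cj * rbf(si, sj, gamma)
             for si, ci in zip(support, coef)
             for sj, cj in zip(support, coef))
    if sq <= 0.0:
        raise ValueError("degenerate model: ||w|| is zero")
    return math.sqrt(sq)

def screen_by_hyperplane(s_gen, support, coef, b, gamma, threshold):
    """Keep generated samples closer than threshold to the hyperplane."""
    w_norm = weight_norm(support, coef, gamma)
    return [x for x in s_gen
            if abs(decision_value(x, support, coef, b, gamma)) / w_norm < threshold]

support = [[-1.0, 0.0], [1.0, 0.0]]  # hypothetical support vectors
coef = [-1.0, 1.0]                   # hypothetical alpha_i * y_i values
print(screen_by_hyperplane([[0.0, 0.0], [1.0, 0.0]], support, coef,
                           b=0.0, gamma=1.0, threshold=0.5))
```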
8. The virtual sample generation method based on a generative adversarial network according to claim 1, characterized in that setting the position constraint on the generated samples comprises:
Setting the constraint condition of the SmoothLoss, the constraint condition being:
Where x is the generation range of the samples and y is the cut-off value of the smooth piecewise function.
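The exact piecewise expression of the claim is not preserved in this text. As a point of reference only, the widely used Smooth L1 (Huber-style) function with a cut-off value can be sketched as below; the quadratic and linear branches shown here are the standard textbook form and are an assumption, not the claimed formula.

```python
def smooth_l1(x, y=1.0):
    """Standard Smooth L1: quadratic for |x| < y, linear beyond the cut-off y."""
    if y <= 0.0:
        raise ValueError("cut-off y must be positive")
    ax = abs(x)
    if ax < y:
        return 0.5 * ax * ax / y
    return ax - 0.5 * y

for value in (0.0, 0.5, 1.0, 2.0):
    print(value, smooth_l1(value))
```

The two branches meet at |x| = y with the same value and slope, which is what makes the function smooth at the cut-off and avoids the gradient jump of a plain absolute-value penalty.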
9. The virtual sample generation method based on a generative adversarial network according to claim 1, characterized in that establishing the PCGAN model according to the position constraint comprises:
The loss function of the generator:
The loss function of the discriminator:
Where $E$ is the expectation, $c$ is the class label, $x$ is an original sample, $\tilde{x}$ is a sample produced by the generator, $\hat{x}$ is an interpolated sample between $x \sim p_{data}$ and $\tilde{x} \sim p_g$, $\varepsilon \sim U[0,1]$, $p_g$ is the generated-sample distribution, $p_{data}$ is the true sample distribution, $p_{penalty}$ is the distribution of $\hat{x}$, $\lambda$ is a hyperparameter, $\nabla$ is the gradient, $L$ is the expression of the position-constrained generative adversarial network, $L_{smooth}$ is the position constraint function of the generated samples, $L_D$ is the loss function of the discriminator, $L_G$ is the loss function of the generator, $D$ is the discriminant function for real samples, $G$ is the generating function of samples, $w$ is the weight vector of the decision surface, and $b$ is the bias term of the model.
10. The virtual sample generation method based on a generative adversarial network according to claim 9, characterized in that performing minority-class sample expansion based on the PCGAN model comprises:
(1) Initializing the designed hyperparameters, mainly: gradient penalty coefficient $\lambda_1 = 10$, decision-surface constraint coefficient $\lambda_2 = 5$, number of discriminator training iterations per round of adversarial optimization $n_{critic} = 3$, number of generator training iterations per round of adversarial optimization $n_{gen} = 1$, number of samples per training batch $m = 10$, and Adam optimizer hyperparameters $\alpha = 0.0001$, $\beta_1 = 0.9$, $\beta_2 = 0.99$; initializing the discriminator parameters and the generator parameters $\theta_0$; then training a linear SVM to obtain its separating-plane parameters $(w, b)$;
(2) For each sample of each batch in discriminator training, first sampling $x \sim p_{data}$ and the class label $c$ from the real sample set, then sampling $z \sim p(z)$ from the noise distribution and mapping the noise into the generated-sample space, then computing the interpolated sample between the real and generated samples, where $\varepsilon \sim U[0,1]$;
(3) Computing the discriminator loss function $L_D$, then updating the discriminator parameters with the Adam optimizer;
(4) Repeating steps (2) to (3), and stopping when both nested loops have finished, for a total of $n_{critic} \times m$ iterations;
(5) For each sample of each batch in generator training, first sampling $z \sim p(z)$ from the noise distribution, then computing the geometric distance between the generated data and the decision surface;
(6) Computing the generator loss function $L_G$ according to the SmoothL1Loss formulation, then updating the generator parameters, i.e., $\theta \leftarrow \mathrm{Adam}(\nabla_\theta L_G, \theta, \alpha, \beta_1, \beta_2)$;
(7) Repeating steps (5) to (6), and stopping when both nested loops have finished, for a total of $n_{gen} \times m$ iterations;
(8) Repeating steps (2) to (7), and stopping training when the data produced by the generator network meet the preset data requirements.
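The alternating schedule of steps (1) to (8) can be sketched as a loop skeleton. The hyperparameter values are those stated in step (1); everything else (the `Hyper` dataclass, the callback names `discriminator_step`, `generator_step`, `is_good_enough`, and the `max_rounds` safeguard) is hypothetical scaffolding, and the callbacks stand in for the actual loss computation and Adam updates, which are not implemented here.

```python
from dataclasses import dataclass

@dataclass
class Hyper:
    lambda_gp: float = 10.0   # gradient penalty coefficient (lambda_1)
    lambda_pos: float = 5.0   # decision-surface constraint coefficient (lambda_2)
    n_critic: int = 3         # discriminator iterations per round
    n_gen: int = 1            # generator iterations per round
    batch_size: int = 10      # samples per batch (m)
    alpha: float = 1e-4       # Adam learning rate
    beta1: float = 0.9
    beta2: float = 0.99

def train(hp, discriminator_step, generator_step, is_good_enough, max_rounds=1000):
    """Run adversarial rounds until the stop criterion holds; return round count."""
    rounds = 0
    while rounds < max_rounds:
        for _ in range(hp.n_critic):        # steps (2)-(4): n_critic * m steps
            for _ in range(hp.batch_size):
                discriminator_step(hp)
        for _ in range(hp.n_gen):           # steps (5)-(7): n_gen * m steps
            for _ in range(hp.batch_size):
                generator_step(hp)
        rounds += 1
        if is_good_enough():                # step (8): preset data requirement
            break
    return rounds

counts = {"d": 0, "g": 0, "checks": 0}

def d_step(hp):
    counts["d"] += 1

def g_step(hp):
    counts["g"] += 1

def stop_after_two_rounds():
    counts["checks"] += 1
    return counts["checks"] >= 2

print(train(Hyper(), d_step, g_step, stop_after_two_rounds), counts)
```

With the stated values, each round performs 30 discriminator steps and 10 generator steps, which matches the $n_{critic} \times m$ and $n_{gen} \times m$ totals given in steps (4) and (7).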
CN201910424679.XA 2019-05-21 2019-05-21 A kind of virtual sample generation method based on production confrontation network Pending CN110298450A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910424679.XA CN110298450A (en) 2019-05-21 2019-05-21 A kind of virtual sample generation method based on production confrontation network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910424679.XA CN110298450A (en) 2019-05-21 2019-05-21 A kind of virtual sample generation method based on production confrontation network

Publications (1)

Publication Number Publication Date
CN110298450A true CN110298450A (en) 2019-10-01

Family

ID=68027023

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910424679.XA Pending CN110298450A (en) 2019-05-21 2019-05-21 A kind of virtual sample generation method based on production confrontation network

Country Status (1)

Country Link
CN (1) CN110298450A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046755A (en) * 2019-11-27 2020-04-21 上海眼控科技股份有限公司 Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN111062310A (en) * 2019-12-13 2020-04-24 哈尔滨工程大学 Few-sample unmanned aerial vehicle image identification method based on virtual sample generation
CN111310791A (en) * 2020-01-17 2020-06-19 电子科技大学 Dynamic progressive automatic target identification method based on small sample number set
CN112091727A (en) * 2020-08-12 2020-12-18 上海交通大学 Cutter damage identification method and device based on virtual sample generation and terminal
CN113095446A (en) * 2021-06-09 2021-07-09 中南大学 Abnormal behavior sample generation method and system
CN114036356A (en) * 2021-10-13 2022-02-11 中国科学院信息工程研究所 Unbalanced traffic classification method and system based on confrontation generation network traffic enhancement
EP4033317A1 (en) 2021-01-26 2022-07-27 Sedapta S.r.l. Method and system for managing a cyber-physical production system with predictive capabilities of anomalous operating conditions
IT202100029405A1 (en) 2021-11-22 2023-05-22 Genera Ip B V A contrast enhancement medium for diagnostic imaging methods and systems

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111046755A (en) * 2019-11-27 2020-04-21 上海眼控科技股份有限公司 Character recognition method, character recognition device, computer equipment and computer-readable storage medium
CN111062310A (en) * 2019-12-13 2020-04-24 哈尔滨工程大学 Few-sample unmanned aerial vehicle image identification method based on virtual sample generation
CN111062310B (en) * 2019-12-13 2022-07-29 哈尔滨工程大学 Few-sample unmanned aerial vehicle image identification method based on virtual sample generation
CN111310791A (en) * 2020-01-17 2020-06-19 电子科技大学 Dynamic progressive automatic target identification method based on small sample number set
CN112091727A (en) * 2020-08-12 2020-12-18 上海交通大学 Cutter damage identification method and device based on virtual sample generation and terminal
EP4033317A1 (en) 2021-01-26 2022-07-27 Sedapta S.r.l. Method and system for managing a cyber-physical production system with predictive capabilities of anomalous operating conditions
CN113095446A (en) * 2021-06-09 2021-07-09 中南大学 Abnormal behavior sample generation method and system
CN114036356A (en) * 2021-10-13 2022-02-11 中国科学院信息工程研究所 Unbalanced traffic classification method and system based on confrontation generation network traffic enhancement
IT202100029405A1 (en) 2021-11-22 2023-05-22 Genera Ip B V A contrast enhancement medium for diagnostic imaging methods and systems
WO2023089589A1 (en) 2021-11-22 2023-05-25 Genera Ip B.V A contrast enhancing agent for diagnostic imaging methods and systems

Similar Documents

Publication Publication Date Title
CN110298450A (en) A kind of virtual sample generation method based on production confrontation network
CN108564592A (en) Based on a variety of image partition methods for being clustered to differential evolution algorithm of dynamic
CN110110802A (en) Airborne laser point cloud classification method based on high-order condition random field
CN108009509A (en) Vehicle target detection method
CN107392919B (en) Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method
CN109613006A (en) A kind of fabric defect detection method based on end-to-end neural network
CN110334580A (en) The equipment fault classification method of changeable weight combination based on integrated increment
CN109461025A (en) A kind of electric energy substitution potential customers' prediction technique based on machine learning
Obayashi et al. Niching and elitist models for mogas
Yi et al. An improved initialization center algorithm for K-means clustering
CN105005789B (en) A kind of remote sensing images terrain classification method of view-based access control model vocabulary
CN106374465B (en) Short-term wind power forecast method based on GSA-LSSVM model
CN103996029B (en) Expression method for measuring similarity and device
CN104899607B (en) A kind of automatic classification method of traditional moire pattern
CN106529574A (en) Image classification method based on sparse automatic encoder and support vector machine
CN110097060A (en) A kind of opener recognition methods towards trunk image
CN109886464A (en) The low information loss short-term wind speed forecasting method of feature set is generated based on optimization singular value decomposition
CN110362997A (en) A kind of malice URL oversampler method based on generation confrontation network
CN103714577A (en) Three-dimensional model simplification method suitable for model with textures
CN109035289A (en) Purple soil image segmentation extracting method based on Chebyshev inequality H threshold value
CN108717497A (en) Imitative stichopus japonicus place of production discrimination method based on PCA-SVM
CN109271427A (en) A kind of clustering method based on neighbour's density and manifold distance
CN109635140A (en) A kind of image search method clustered based on deep learning and density peaks
CN107680099A (en) A kind of fusion IFOA and F ISODATA image partition method
CN109614520A (en) One kind is towards the matched parallel acceleration method of multi-mode figure

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination