CN110298450A - Virtual sample generation method based on a generative adversarial network - Google Patents
- Publication number
- CN110298450A (application CN201910424679.XA)
- Authority
- CN
- China
- Prior art keywords
- sample
- data
- generator
- adversarial network
- generation method
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
- G06F18/241—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
- G06F18/2411—Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N20/00—Machine learning
- G06N20/10—Machine learning using kernel methods, e.g. support vector machines [SVM]
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Evolutionary Computation (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Software Systems (AREA)
- Physics & Mathematics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- General Engineering & Computer Science (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Medical Informatics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Biology (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a virtual sample generation method based on a generative adversarial network (GAN), comprising: performing SVM classification pre-training on the input samples of the generator, based on an improved WGAN-GP model; obtaining the position of the decision surface from the SVM classification, and generating by simulation minority-class samples located near the decision surface; setting a position constraint on the generated samples according to the geometric distance between the generated samples and the decision surface, so as to control the sample generation range; establishing a PCGAN model according to the position constraint, and expanding the minority-class samples based on the PCGAN model; and generating, by the PCGAN model, samples near the SVM decision surface that conform to the original distribution. The invention can improve the stability and practicality of generative adversarial networks.
Description
Technical field
The present invention relates to the technical field of deep learning neural networks, and in particular to a virtual sample generation method based on a generative adversarial network.
Background
Owing to the nature of the data, the complexity of acquisition conditions, and economic factors, data sets are prone to imbalanced distributions. Sample imbalance causes the decision surface of a classification model to shift, so that good classification results cannot be obtained. Taking the SVM classifier as an example, its classification performance on imbalanced samples declines as the imbalance ratio increases. To address sample imbalance, a generative adversarial network can currently be used: through the game between the generator and the discriminator, it produces the required sample sequence, free of manual interference, whose distribution closely approximates that of the original samples. However, the original GAN suffers from drawbacks such as unstable training and poor classification performance on imbalanced problems.
Summary of the invention
The present invention provides a virtual sample generation method based on a generative adversarial network. It addresses two problems of existing GAN models, unstable training and poor classification performance on imbalanced problems, and can improve the stability and practicality of generative adversarial networks.
To achieve the above object, the present invention provides the following technical solution:
A virtual sample generation method based on a generative adversarial network, comprising:
performing SVM classification pre-training on the input samples of the generator, based on an improved WGAN-GP model;
obtaining the position of the decision surface from the SVM classification, and generating by simulation minority-class samples located near the decision surface;
setting a position constraint on the generated samples according to the geometric distance between the generated samples and the decision surface, so as to control the sample generation range;
establishing a PCGAN model according to the position constraint, and expanding the minority-class samples based on the PCGAN model;
generating, by the PCGAN model, samples near the SVM decision surface that conform to the original distribution.
Preferably, the method further comprises:
performing data screening on the generated samples;
selecting the best generated data to retrain the SVM classifier, thereby obtaining a new decision surface.
Preferably, performing data screening on the generated samples comprises:
using full-set screening: for each generated sample, computing its Euclidean distance and cosine similarity with respect to the original samples and the neighbor samples, and judging whether the set conditions are met; if so, selecting it as an expanded sample.
Preferably, the full-set screening comprises:
for each $x_j \in S_{min}$, determining the set $S_{Neighbor}$ formed by its $k$ nearest-neighbor samples, and, for each $x_j \in S_{min}$, traversing $x_i \in S_{gen}$ and $x_k \in S_{Neighbor}$ and computing the distances from the minority-class sample to the generated sample and to the neighbor sample, respectively;
if $\|x_i - x_j\| < \|x_k - x_j\|$, further computing the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$, $\cos\theta = \dfrac{(x_i - x_j) \cdot (x_k - x_j)}{\|x_i - x_j\| \, \|x_k - x_j\|}$;
if the cosine similarity is greater than the threshold $C$, including $x_i$ in the expanded data set $S_{exp}$;
wherein $x_i$, $x_j$, $x_k$ are respectively a generated sample, a minority-class sample, and the $k$-th nearest-neighbor sample, and $S_{min}$, $S_{gen}$, $S_{Neighbor}$ are respectively the minority-class sample set, the generated sample set, and the neighbor sample set.
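By way of illustration only, the full-set screening described above can be sketched as follows. This is a minimal sketch under stated assumptions, not the implementation of the disclosure: the function names (`euclidean`, `cosine_similarity`, `k_nearest`, `screen_generated`) are hypothetical, the neighbors of each minority-class sample are assumed to be drawn from the minority-class set itself, and a generated sample is assumed to be accepted as soon as one pair $(x_j, x_k)$ satisfies both conditions.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v (0.0 if either is the zero vector)."""
    norm_u = math.sqrt(sum(p * p for p in u))
    norm_v = math.sqrt(sum(q * q for q in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return sum(p * q for p, q in zip(u, v)) / (norm_u * norm_v)

def k_nearest(x_j, pool, k):
    """The k members of pool closest to x_j, excluding x_j itself."""
    others = [p for p in pool if p is not x_j]
    return sorted(others, key=lambda p: euclidean(p, x_j))[:k]

def screen_generated(s_gen, s_min, k, c_threshold):
    """Return S_exp: generated samples that pass the distance and cosine tests."""
    # ASSUMPTION: S_Neighbor of each x_j is taken from the minority set S_min.
    neighbours = [k_nearest(x_j, s_min, k) for x_j in s_min]
    s_exp = []
    for x_i in s_gen:
        keep = False
        for x_j, s_neighbor in zip(s_min, neighbours):
            d_ij = euclidean(x_i, x_j)
            u = [a - b for a, b in zip(x_i, x_j)]
            for x_k in s_neighbor:
                # Condition 1: x_i is closer to x_j than the neighbour x_k is.
                if d_ij < euclidean(x_k, x_j):
                    v = [a - b for a, b in zip(x_k, x_j)]
                    # Condition 2: x_i lies in roughly the same direction as x_k.
                    if cosine_similarity(u, v) > c_threshold:
                        keep = True
        if keep:
            s_exp.append(x_i)
    return s_exp
```

With three minority-class points at `(0, 0)`, `(1, 0)` and `(0, 1)`, a generated point such as `(0.5, 0.05)` lies between a sample and its neighbor and is kept, while a distant point or a point lying in the opposite direction is discarded.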
Preferably, performing data screening on the generated samples further comprises:
using screening based on the Danger set, to select generated samples that lie near the Danger set and conform to the sample distribution, wherein the Danger set comprises the minority-class samples whose neighbor sample sets contain majority-class samples.
Preferably, the screening based on the Danger set comprises:
for each $x_i \in S_{min}$, determining its nearest-neighbor sample set $S_{i:m-NN}$, then counting, for each sample, the number of majority-class samples in its nearest-neighbor set, i.e., $|S_{i:m-NN} \cap S_{maj}|$; the samples $x_i$ whose count satisfies the specified inequality (formula image not reproduced) form $S_{Danger}$;
for each $x_j \in S_{Danger}$, determining the data set $S_{Neighbor}$ formed by its nearest-neighbor samples, and, for each such $x_j$, traversing $x_i \in S_{gen}$ and $x_k \in S_{Neighbor}$ and computing the respective distances;
if $\|x_i - x_j\| < \|x_k - x_j\|$, further computing the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$;
if the cosine similarity is greater than the threshold $C$, including $x_i$ in the expanded data set $S_{exp}$;
wherein $S_{Danger}$ is the set of minority-class samples whose nearest-neighbor sets contain majority-class samples, $S_{i:m-NN}$ is the nearest-neighbor sample set, and $S_{maj}$ is the majority-class sample set.
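As a hedged illustration of how a Danger set of this kind might be built, the sketch below counts the majority-class members among the $m$ nearest neighbors of each minority-class sample. The membership inequality is not legible in this text, so the sketch ASSUMES the Borderline-SMOTE convention $m/2 \le |S_{i:m-NN} \cap S_{maj}| < m$; that choice, like the function names `euclidean` and `danger_set`, is an assumption and not taken from the disclosure.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def danger_set(s_min, s_maj, m):
    """Minority samples whose m nearest neighbours are mostly, but not all, majority."""
    labelled = [(p, False) for p in s_min] + [(p, True) for p in s_maj]
    danger = []
    for x_i in s_min:
        # m nearest neighbours of x_i among all samples, excluding x_i itself.
        others = [(p, is_maj) for p, is_maj in labelled if p is not x_i]
        others.sort(key=lambda item: euclidean(item[0], x_i))
        n_maj = sum(1 for _, is_maj in others[:m] if is_maj)
        # ASSUMED inequality (Borderline-SMOTE convention), not from the source.
        if m / 2 <= n_maj < m:
            danger.append(x_i)
    return danger
```

The resulting set can then play the role of the minority-class set in the distance and cosine-similarity screening, so that only generated samples near borderline minority points are retained.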
Preferably, performing data screening on the generated samples further comprises:
mapping the samples into a high-dimensional separable space by a kernel method and, according to the distance from each generated sample to the hyperplane, adding the generated sample to the expanded sample set when that distance is less than a set threshold, wherein the mapping function uses an RBF kernel.
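One way to read the kernel-based screening is sketched below, as an illustration under assumptions rather than the method of the disclosure. It assumes an already trained kernel SVM described by hypothetical inputs `support_vectors`, `dual_coefs` (the products $\alpha_i y_i$), `bias` and `gamma`, and it uses the standard feature-space distance $|f(x)| / \|w\|$ with $f(x) = \sum_i \alpha_i y_i K(s_i, x) + b$ and $\|w\|^2 = \sum_{i,j} \alpha_i y_i \alpha_j y_j K(s_i, s_j)$.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i (alpha_i * y_i) * K(s_i, x) + b."""
    return sum(c * rbf_kernel(s, x, gamma)
               for s, c in zip(support_vectors, dual_coefs)) + bias

def weight_norm(support_vectors, dual_coefs, gamma):
    """||w|| in feature space, computed from the kernel matrix of support vectors."""
    total = sum(ci * cj * rbf_kernel(si, sj, gamma)
                for si, ci in zip(support_vectors, dual_coefs)
                for sj, cj in zip(support_vectors, dual_coefs))
    return math.sqrt(total)

def screen_by_margin(s_gen, support_vectors, dual_coefs, bias, gamma, threshold):
    """Keep generated samples whose feature-space distance to the hyperplane is small."""
    norm = weight_norm(support_vectors, dual_coefs, gamma)
    return [x for x in s_gen
            if abs(decision_value(x, support_vectors, dual_coefs, bias, gamma)) / norm
            < threshold]
```

In practice the support vectors and dual coefficients would come from an SVM trained on the real data; the values used to exercise this sketch are hand-picked and purely illustrative.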
Preferably, setting the position constraint on the generated samples comprises:
setting the SmoothLoss constraint, which is a smooth piecewise function (formula image not reproduced),
wherein $x$ is the generation range of the sample and $y$ is the cut-off value of the smooth piecewise function.
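Because the constraint formula itself is not legible in this text, the following sketch only illustrates the general shape of a smooth piecewise loss with a cut-off. It ASSUMES the widely used Smooth L1 form (quadratic below the cut-off, linear above it, continuous at the cut-off); the exact expression in the disclosure may differ, and the function name `smooth_l1` is hypothetical.

```python
def smooth_l1(x, y=1.0):
    """Smooth L1 penalty on a distance x with cut-off y (ASSUMED standard form).

    Quadratic for |x| < y, linear otherwise; both pieces equal 0.5 * y at |x| == y.
    """
    if y <= 0:
        raise ValueError("cut-off y must be positive")
    ax = abs(x)
    if ax < y:
        return 0.5 * ax * ax / y
    return ax - 0.5 * y
```

Under this assumed form, samples close to the decision surface incur a small quadratic penalty, while distant samples are penalized linearly, which keeps the gradient bounded for outlying generated samples.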
Preferably, establishing the PCGAN model according to the position constraint comprises defining:
the loss function of the generator, $L_G$;
the loss function of the discriminator, $L_D$ (the formula images are not reproduced in this text);
wherein $E$ denotes expectation, $c$ is the class label, $x$ is an original sample, $\tilde{x}$ is a sample produced by the generator, $\hat{x}$ is a sample interpolated between $x \sim p_{data}$ and $\tilde{x} \sim p_g$, $\varepsilon \sim U[0,1]$, $p_g$ is the distribution of generated samples, $p_{data}$ is the true sample distribution, $p_{penalty}$ is the distribution of $\hat{x}$, $\lambda$ is a hyperparameter, $\nabla$ denotes the gradient, $L$ is the objective of the position-constrained generative adversarial network, $L_{smooth}$ is the position constraint function on generated samples, $L_D$ is the loss function of the discriminator, $L_G$ is the loss function of the generator, $D$ is the discriminant function for real samples, $G$ is the sample generating function, $w$ is the weight vector of the decision surface, and $b$ is the bias term of the model.
Preferably, expanding the minority-class samples based on the PCGAN model comprises:
(1) Initialize the designed hyperparameters, mainly: gradient penalty coefficient $\lambda_1 = 10$; decision-surface constraint coefficient $\lambda_2 = 5$; number of discriminator training iterations per round of adversarial optimization $n_{critic} = 3$; number of generator training iterations per round of adversarial optimization $n_{gen} = 1$; number of samples per training batch $m = 10$; Adam optimizer hyperparameters $\alpha = 0.0001$, $\beta_1 = 0.9$, $\beta_2 = 0.99$. Initialize the discriminator parameters and the generator parameters $\theta_0$, then train a linear SVM to obtain its separating-plane parameters $(w, b)$;
(2) For each sample of each batch in discriminator training, first sample $x \sim p_{data}$ and its class label $c$ from the real sample set, then sample $z \sim p(z)$ from the noise distribution and map the noise into the generated-sample space to obtain the generated sample $\tilde{x}$; then compute the interpolated sample $\hat{x} = \varepsilon x + (1 - \varepsilon)\tilde{x}$, where $\varepsilon \sim U[0,1]$;
(3) Evaluate the discriminator loss function $L_D$, then update the discriminator parameters with the Adam optimizer;
(4) Repeat steps (2) to (3), stopping when this double loop finishes; the loop runs $n_{critic} \times m$ times in total;
(5) For each sample of each batch in generator training, first sample $z \sim p(z)$ from the noise distribution, then compute the geometric distance between the generated data and the decision surface, $|w^{\top}\tilde{x} + b| / \|w\|$;
(6) Compute the generator loss function $L_G$ according to the SmoothL1Loss formulation, then update the generator parameters with the Adam optimizer;
(7) Repeat steps (5) to (6), stopping when this double loop finishes; the loop runs $n_{gen} \times m$ times in total;
(8) Repeat steps (2) to (7), stopping training when the data produced by the generator network meet the preset data requirements.
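Two quantities used in the procedure above can be written down directly, as the following illustrative sketch shows. The point-to-hyperplane distance $|w^{\top}x + b| / \|w\|$ is the standard geometric distance for a linear decision surface with parameters $(w, b)$. The interpolation between a real sample and a generated sample is ASSUMED to follow the usual WGAN-GP form $\hat{x} = \varepsilon x + (1 - \varepsilon)\tilde{x}$; the function names `geometric_distance` and `interpolate` are hypothetical.

```python
import math

def geometric_distance(x, w, b):
    """Distance from point x to the hyperplane w . x + b = 0."""
    norm_w = math.sqrt(sum(wi * wi for wi in w))
    if norm_w == 0.0:
        raise ValueError("w must be a non-zero vector")
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm_w

def interpolate(x_real, x_gen, eps):
    """Interpolated sample eps * x_real + (1 - eps) * x_gen, with eps in [0, 1].

    ASSUMPTION: this is the standard WGAN-GP interpolation used for the
    gradient penalty; eps would normally be drawn from U[0, 1].
    """
    return tuple(eps * r + (1.0 - eps) * g for r, g in zip(x_real, x_gen))
```

In a training loop, `geometric_distance` would be evaluated on each generated sample with the $(w, b)$ obtained from the pre-trained linear SVM, and its value fed to the position constraint term of the generator loss.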
The present invention provides a virtual sample generation method based on a generative adversarial network. A constraint on the distance between samples and the decision surface, based on SVM theory, is added to the WGAN-GP model so as to constrain the position of the generated samples. After the PCGAN generates samples, data screening is performed; once samples have been generated by the PCGAN model and screened, the best generated samples are chosen to retrain the SVM and obtain a new decision surface. This improves the stability of the generative adversarial network model when handling discrete data and gives good results on imbalanced classification problems.
Brief description of the drawings
To explain the specific embodiments of the present invention more clearly, the drawings needed for the embodiments are briefly introduced below.
Fig. 1 is a flowchart of the virtual sample generation method based on a generative adversarial network provided by the present invention;
Fig. 2 is a flow diagram of minority-class sample expansion based on the adversarial generative network in an embodiment of the present invention;
Fig. 3 shows the performance evaluation results of the above technical solution compared with other methods on the cmc and pima data sets in an embodiment of the present invention;
Fig. 4 shows the performance evaluation results of the above technical solution compared with other methods on the robot and satlog data sets in an embodiment of the present invention;
Fig. 5 shows the performance evaluation results of the above technical solution compared with other methods on the haberman and semeion data sets in an embodiment of the present invention;
Fig. 6 shows the performance evaluation results of the above technical solution compared with other methods on the yeast and yeast_2 data sets in an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solutions of the embodiments, the embodiments of the present invention are described in further detail below with reference to the drawings and implementations.
In view of the drawbacks of the current original GAN, namely unstable training and poor classification performance on imbalanced problems, the present invention provides a virtual sample generation method based on a generative adversarial network. A constraint on the distance between samples and the decision surface, based on SVM theory, is added to the WGAN-GP model so as to constrain the position of the generated samples. After the PCGAN generates samples, data screening is performed; once samples have been generated by the PCGAN model and screened, the best generated samples are chosen to retrain the SVM and obtain a new decision surface. This addresses the unstable training and poor imbalanced-classification performance of existing GAN models and can improve the practicality of generative adversarial networks.
As shown in Fig. 1, a virtual sample generation method based on a generative adversarial network comprises:
Step 1: performing SVM classification pre-training on the input samples of the generator, based on an improved WGAN-GP model;
Step 2: obtaining the position of the decision surface from the SVM classification, and generating by simulation minority-class samples located near the decision surface;
Step 3: setting a position constraint on the generated samples according to the geometric distance between the generated samples and the decision surface, so as to control the sample generation range;
Step 4: establishing a PCGAN model according to the position constraint, and expanding the minority-class samples based on the PCGAN model;
Step 5: generating, by the PCGAN model, samples near the SVM decision surface that conform to the original distribution.
Further, the method also comprises:
Step 6: performing data screening on the generated samples;
Step 7: selecting the best generated data to retrain the SVM classifier, thereby obtaining a new decision surface.
Specifically, in a first aspect, the method for establishing the improved PCGAN based on WGAN-GP comprises the following steps:
Step S1: pre-train the SVM in the generation part to obtain the position of the decision surface.
In SVM theory, only the points located near the decision surface influence changes of the decision surface; the main goal of the present invention is therefore to generate, by simulation with a generative adversarial network, minority-class samples located near the decision surface.
Step S2: during generator training, measure the geometric distance between the generated samples and the decision surface;
Step S3: add the SmoothLoss constraint to control the sample generation range;
wherein the SmoothLoss constraint is a smooth piecewise function (formula image not reproduced),
in which $x$ is the generation range of the sample and $y$ is the cut-off value of the smooth piecewise function.
Further, the structure and objective of the PCGAN are expressed through the loss function of the generator, $L_G$, and the loss function of the discriminator, $L_D$ (the formula images are not reproduced in this text),
wherein $E$ denotes expectation, $c$ is the class label, $x$ is an original sample, $\tilde{x}$ is a sample produced by the generator, $\hat{x}$ is a sample interpolated between $x \sim p_{data}$ and $\tilde{x} \sim p_g$, $\varepsilon \sim U[0,1]$, $p_g$ is the distribution of generated samples, $p_{data}$ is the true sample distribution, $p_{penalty}$ is the distribution of $\hat{x}$, $\lambda$ is a hyperparameter, $\nabla$ denotes the gradient, $L$ is the objective of the position-constrained generative adversarial network, $L_{smooth}$ is the position constraint function on generated samples, $L_D$ is the loss function of the discriminator, $L_G$ is the loss function of the generator, $D$ is the discriminant function for real samples, $G$ is the sample generating function, $w$ is the weight vector of the decision surface, and $b$ is the bias term of the model.
In a second aspect, three screening methods applied after sample generation are proposed. Method 1: for each generated sample, compute its Euclidean distance and cosine similarity with respect to the original samples and the neighbor samples, and select it as an expanded sample if specific conditions are met. Method 2: select the generated samples that lie near the "Danger set" and conform to the sample distribution, where the "Danger set" comprises the minority-class samples whose neighbor sample sets contain majority-class samples. Method 3: map the samples into a high-dimensional separable space by a kernel method and, according to the distance from each generated sample to the hyperplane, add the generated sample to the expanded sample set when that distance is less than a set threshold.
Specifically, performing data screening on the generated samples comprises: using full-set screening, in which, for each generated sample, its Euclidean distance and cosine similarity with respect to the original samples and the neighbor samples are computed and it is judged whether the set conditions are met; if so, it is selected as an expanded sample.
Further, the full-set screening comprises:
for each $x_j \in S_{min}$, determining the set $S_{Neighbor}$ formed by its $k$ nearest-neighbor samples, and, for each $x_j \in S_{min}$, traversing $x_i \in S_{gen}$ and $x_k \in S_{Neighbor}$ and computing the distances from the minority-class sample to the generated sample and to the neighbor sample, respectively. If $\|x_i - x_j\| < \|x_k - x_j\|$, the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$ is further computed, $\cos\theta = \dfrac{(x_i - x_j) \cdot (x_k - x_j)}{\|x_i - x_j\| \, \|x_k - x_j\|}$. If the cosine similarity is greater than the threshold $C$, $x_i$ is included in the expanded data set $S_{exp}$. Here $x_i$, $x_j$, $x_k$ are respectively a generated sample, a minority-class sample, and the $k$-th nearest-neighbor sample, and $S_{min}$, $S_{gen}$, $S_{Neighbor}$ are respectively the minority-class sample set, the generated sample set, and the neighbor sample set.
Performing data screening on the generated samples further comprises: using screening based on the Danger set, to select generated samples that lie near the Danger set and conform to the sample distribution, wherein the Danger set comprises the minority-class samples whose neighbor sample sets contain majority-class samples.
The screening based on the Danger set comprises: for each $x_i \in S_{min}$, determining its nearest-neighbor sample set $S_{i:m-NN}$, then counting, for each sample, the number of majority-class samples in its nearest-neighbor set, i.e., $|S_{i:m-NN} \cap S_{maj}|$; the samples $x_i$ whose count satisfies the specified inequality (formula image not reproduced) form $S_{Danger}$. For each $x_j \in S_{Danger}$, the data set formed by its nearest-neighbor samples is determined as $S_{Neighbor}$, and, for each such $x_j$, $x_i \in S_{gen}$ and $x_k \in S_{Neighbor}$ are traversed and the respective distances computed. If $\|x_i - x_j\| < \|x_k - x_j\|$, the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$ is further computed; if the cosine similarity is greater than the threshold $C$, $x_i$ is included in the expanded data set $S_{exp}$. Here $S_{Danger}$ is the set of minority-class samples whose nearest-neighbor sets contain majority-class samples, $S_{i:m-NN}$ is the nearest-neighbor sample set, and $S_{maj}$ is the majority-class sample set.
Performing data screening on the generated samples further comprises: mapping the samples into a high-dimensional separable space by a kernel method and, according to the distance from each generated sample to the hyperplane, adding the generated sample to the expanded sample set when that distance is less than a set threshold, wherein the mapping function uses an RBF kernel. The sample screening methods use Euclidean distance and cosine similarity to measure the similarity between generated samples and real samples.
It should be noted that although the steps of the above sample screening embodiments are described in the order given, those skilled in the art will appreciate that, to achieve the effect of this embodiment, the different steps need not be executed in that order; they may be executed simultaneously (in parallel) or in reverse order, and such simple variations all fall within the protection scope of the present invention.
Further, setting the position constraint on the generated samples comprises:
setting the SmoothLoss constraint, which is a smooth piecewise function (formula image not reproduced),
wherein $x$ is the generation range of the sample and $y$ is the cut-off value of the smooth piecewise function.
Establishing the PCGAN model according to the position constraint comprises defining:
the loss function of the generator, $L_G$;
the loss function of the discriminator, $L_D$ (the formula images are not reproduced in this text);
wherein $E$ denotes expectation, $c$ is the class label, $x$ is an original sample, $\tilde{x}$ is a sample produced by the generator, $\hat{x}$ is a sample interpolated between $x \sim p_{data}$ and $\tilde{x} \sim p_g$, $\varepsilon \sim U[0,1]$, $p_g$ is the distribution of generated samples, $p_{data}$ is the true sample distribution, $p_{penalty}$ is the distribution of $\hat{x}$, $\lambda$ is a hyperparameter, $\nabla$ denotes the gradient, $L$ is the objective of the position-constrained generative adversarial network, $L_{smooth}$ is the position constraint function on generated samples, $L_D$ is the loss function of the discriminator, $L_G$ is the loss function of the generator, $D$ is the discriminant function for real samples, $G$ is the sample generating function, $w$ is the weight vector of the decision surface, and $b$ is the bias term of the model.
In one embodiment, the minority-class sample expansion process based on the adversarial generative network may comprise the following steps:
(1) Initialize the designed hyperparameters, mainly: gradient penalty coefficient $\lambda_1 = 10$; decision-surface constraint coefficient $\lambda_2 = 5$; number of discriminator training iterations per round of adversarial optimization $n_{critic} = 3$; number of generator training iterations per round of adversarial optimization $n_{gen} = 1$; number of samples per training batch $m = 10$; Adam optimizer hyperparameters $\alpha = 0.0001$, $\beta_1 = 0.9$, $\beta_2 = 0.99$. Initialize the discriminator parameters and the generator parameters $\theta_0$, then train a linear SVM to obtain its separating-plane parameters $(w, b)$.
(2) For each sample of each batch in discriminator training, first sample $x \sim p_{data}$ and its class label $c$ from the real sample set, then sample $z \sim p(z)$ from the noise distribution and map the noise into the generated-sample space to obtain the generated sample $\tilde{x}$; then compute the interpolated sample $\hat{x} = \varepsilon x + (1 - \varepsilon)\tilde{x}$, where $\varepsilon \sim U[0,1]$.
(3) Evaluate the discriminator loss function $L_D$, then update the discriminator parameters with the Adam optimizer.
(4) Repeat steps (2) to (3), stopping when this double loop finishes; the loop runs $n_{critic} \times m$ times in total.
(5) For each sample of each batch in generator training, first sample $z \sim p(z)$ from the noise distribution, then compute the geometric distance between the generated data and the decision surface, $|w^{\top}\tilde{x} + b| / \|w\|$.
(6) Compute the generator loss function $L_G$ according to the SmoothL1Loss formulation, then update the generator parameters with the Adam optimizer.
(7) Repeat steps (5) to (6), stopping when this double loop finishes; the loop runs $n_{gen} \times m$ times in total.
(8) Repeat steps (2) to (7), stopping training when the data produced by the generator network meet the preset data requirements.
In one embodiment, the above sample generation process is described in detail.
The target data sets in the embodiment are a selection of imbalanced data sets from the UCI repository. Each target data set is randomly split into a training set and a test set at a ratio of 8:2, and each reported value is the average over 10 experimental runs. F1-score and G-mean are used as the evaluation criteria. The compared algorithms are: the conventional SVM algorithm, the SVM algorithm with an RBF kernel, the SMOTE algorithm, and the technical solution proposed in this embodiment.
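For reference, the two evaluation criteria named above can be computed from a confusion matrix as in the following minimal sketch. It assumes binary labels with the minority class treated as the positive class (label `1`), and it uses the common definition of G-mean as the geometric mean of sensitivity and specificity; the text does not state which variant was used, and the function names are hypothetical.

```python
import math

def confusion_counts(y_true, y_pred, positive=1):
    """Return (tp, fp, tn, fn) for a binary problem."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        else:
            if t == positive:
                fn += 1
            else:
                tn += 1
    return tp, fp, tn, fn

def f1_score(y_true, y_pred, positive=1):
    """F1 = 2 * TP / (2 * TP + FP + FN); 0.0 when undefined."""
    tp, fp, _, fn = confusion_counts(y_true, y_pred, positive)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def g_mean(y_true, y_pred, positive=1):
    """Geometric mean of sensitivity (TPR) and specificity (TNR)."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    return math.sqrt(sensitivity * specificity)
```

Both measures are insensitive to the large number of correctly classified majority-class samples, which is why they are commonly preferred over plain accuracy on imbalanced data.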
Referring to Fig. 3, Fig. 3 shows the performance evaluation results of the above technical solution compared with other methods on the cmc and pima data sets in this embodiment.
Specifically, on the cmc data set, the data generated by the PCGAN proposed in this embodiment help shift the separating plane, improving classification performance by 12.5% (compared with RBF SVM). On the pima data set, the PCGAN and screening methods proposed in this embodiment form a model built with the goal of correcting the SVM separating plane; the generated samples follow the original sample distribution simulated by the PCGAN rather than being linear interpolations after simple clustering, which is a more reasonable treatment of such data, and the result improves on RBF SVM by 5.2%.
Referring to Fig. 4, Fig. 4 shows the performance evaluation results of the above technical solution compared with other methods on the robot and satlog data sets in this embodiment.
Specifically, on the robot data set, the data generated by the PCGAN proposed in this embodiment contribute little to shifting the separating plane, so the result improves only slightly, by 0.7%. On the satlog data set, this embodiment performs close to RBF SVM but is still the best of the compared methods.
Referring to Fig. 5, Fig. 5 shows the performance evaluation results of the above technical solution compared with other methods on the haberman and semeion data sets in this embodiment.
Specifically, on the haberman data set, the PCGAN proposed in this embodiment models the data distribution better, so classification performance improves, with the improvement reaching 16.3%. On the semeion data set, this embodiment performs stably and is comparable to RBF SVM.
Referring to Fig. 6, Fig. 6 shows the performance evaluation results of the above technical solution compared with other methods on the yeast and yeast_2 data sets in this embodiment.
Specifically, on the yeast and yeast_2 data sets, the samples generated by the PCGAN and screening methods proposed in this embodiment are overly concentrated and lack diversity, so the improvement is limited.
Experiments on 8 UCI data sets using the above sample generation process verify the performance of the PCGAN method. Its F1-score and G-mean are outstanding on most data sets: it ranks first on 14 of the 16 indicators, outperforming the conventional SVM algorithm, the SVM algorithm with an RBF kernel, and the SMOTE algorithm.
It can be seen that the present invention provides a virtual sample generation method based on a generative adversarial network. A constraint on the distance between samples and the decision surface, based on SVM theory, is added to the WGAN-GP model so as to constrain the position of the generated samples; then, after the PCGAN generates samples, three screening methods based on Euclidean distance and cosine similarity are proposed. Experiments show that the constructed minority-class sample expansion process improves the stability of the generative adversarial network model when handling discrete data and gives good results on imbalanced classification problems.
The structure, features, and effects of the present invention have been described in detail above with reference to the embodiments shown in the drawings. The above are only preferred embodiments of the present invention, and the scope of implementation is not limited to what the drawings show. Any change made according to the concept of the present invention, or any modification into an equivalent embodiment, that does not depart from the spirit covered by the description and the drawings shall fall within the protection scope of the present invention.
Claims (10)
1. A virtual sample generation method based on a generative adversarial network, characterized by comprising:
performing SVM classification pre-training on the input samples of the generator, based on an improved WGAN-GP model;
obtaining the position of the decision surface from the SVM classification, and generating by simulation minority-class samples located near the decision surface;
setting a position constraint on the generated samples according to the geometric distance between the generated samples and the decision surface, so as to control the sample generation range;
establishing a PCGAN model according to the position constraint, and expanding the minority-class samples based on the PCGAN model;
generating, by the PCGAN model, samples near the SVM decision surface that conform to the original distribution.
2. The virtual sample generation method based on a generative adversarial network according to claim 1, characterized by further comprising:
performing data screening on the generated samples;
selecting the best generated data to retrain the SVM classifier, thereby obtaining a new decision surface.
3. The virtual sample generation method based on a generative adversarial network according to claim 2, characterized in that performing data screening on the generated samples comprises:
using full-set screening: for each generated sample, computing its Euclidean distance and cosine similarity with respect to the original samples and the neighbor samples, and judging whether the set conditions are met; if so, selecting it as an expanded sample.
4. The virtual sample generation method based on a generative adversarial network according to claim 3, characterized in that the full-set screening comprises:
for each $x_j \in S_{min}$, determining the set $S_{Neighbor}$ formed by its $k$ nearest-neighbor samples, and, for each $x_j \in S_{min}$, traversing $x_i \in S_{gen}$ and $x_k \in S_{Neighbor}$ and computing the distances from the minority-class sample to the generated sample and to the neighbor sample, respectively;
if $\|x_i - x_j\| < \|x_k - x_j\|$, further computing the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$, $\cos\theta = \dfrac{(x_i - x_j) \cdot (x_k - x_j)}{\|x_i - x_j\| \, \|x_k - x_j\|}$;
if the cosine similarity is greater than the threshold $C$, including $x_i$ in the expanded data set $S_{exp}$;
wherein $x_i$, $x_j$, $x_k$ are respectively a generated sample, a minority-class sample, and the $k$-th nearest-neighbor sample, and $S_{min}$, $S_{gen}$, $S_{Neighbor}$ are respectively the minority-class sample set, the generated sample set, and the neighbor sample set.
5. The virtual sample generation method based on a generative adversarial network according to claim 4, characterized in that performing data screening on the generated samples further comprises:
using screening based on the Danger set, to select generated samples that lie near the Danger set and conform to the sample distribution, wherein the Danger set comprises the minority-class samples whose neighbor sample sets contain majority-class samples.
6. The virtual sample generation method based on a generative adversarial network according to claim 5, characterized in that using the screening based on the Danger set comprises:
for each minority-class sample $x_i$, determining its nearest-neighbor sample set $S_{i:m\text{-}NN}$, then determining, for each sample, the number of samples in the nearest-neighbor set that belong to the majority class, i.e. the number of elements of $S_{i:m\text{-}NN}$ that are in $S_{maj}$; the $x_i$ whose count satisfies the inequality form $S_{Danger}$;
for each sample of $S_{Danger}$, determining the data set $S_{Neighbor}$ composed of its nearest-neighbor samples and, for each generated sample, traversing these samples to calculate the respective distances;
if $\|x_i - x_j\| < \|x_k - x_j\|$, further calculating the cosine similarity of $(x_i - x_j)$ and $(x_k - x_j)$;
if the cosine similarity is greater than a threshold $C$, including $x_i$ in the expanded data set $S_{exp}$;
wherein $S_{Danger}$ is the set of minority-class samples whose nearest-neighbor sets contain majority-class samples, $S_{i:m\text{-}NN}$ is the nearest-neighbor sample set, and $S_{maj}$ is the majority-class sample set.
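The construction of the Danger set can be sketched as below. The inequality used in the claim is not legible in this text, so the membership rule is passed in as a parameter; the default (at least one majority-class neighbor) is an assumption taken from the wording of claim 5, and all function names are hypothetical.

```python
import math

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))

def majority_neighbor_count(x_i, s_min, s_maj, m):
    """Number of majority-class samples among the m nearest neighbors of x_i."""
    pool = [(p, "min") for p in s_min if p is not x_i]
    pool += [(p, "maj") for p in s_maj]
    nearest = sorted(pool, key=lambda t: euclidean(t[0], x_i))[:m]
    return sum(1 for _, label in nearest if label == "maj")

def danger_set(s_min, s_maj, m, is_danger=lambda n_maj, m: n_maj > 0):
    """Minority samples whose m-NN set satisfies the membership rule.

    Assumption: the default rule (n_maj > 0) follows the wording of claim 5;
    the exact inequality of claim 6 is not reproduced here.
    """
    return [
        x_i for x_i in s_min
        if is_danger(majority_neighbor_count(x_i, s_min, s_maj, m), m)
    ]
```

A stricter rule, such as the one commonly used in Borderline-SMOTE, can be supplied as `is_danger=lambda n_maj, m: m / 2 <= n_maj < m`; whether the claim uses that rule cannot be confirmed from this text.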
7. The virtual sample generation method based on a generative adversarial network according to claim 6, characterized in that carrying out data screening on the generated samples further comprises:
mapping the samples into a high-dimensional separable space by a kernel method and, according to the distance from each generated sample to the hyperplane,
adding the generated sample to the expansion sample set when the distance is less than a set threshold, wherein the mapping function uses an RBF kernel.
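The kernel-space screening can be illustrated with a small sketch. It assumes an already trained kernel SVM described by its support vectors, signed dual coefficients ($\alpha_i y_i$) and bias; these inputs, the `gamma` parameter and the function names are illustrative assumptions, since the claim does not specify how the hyperplane is obtained. The distance to the hyperplane in feature space is taken as $|f(x)| / \|w\|$ with $\|w\|^2 = \sum_{p,q} a_p a_q K(s_p, s_q)$.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel K(a, b) = exp(-gamma * ||a - b||^2)."""
    return math.exp(-gamma * sum((p - q) ** 2 for p, q in zip(a, b)))

def decision_value(x, support_vectors, dual_coefs, bias, gamma):
    """f(x) = sum_i a_i K(s_i, x) + b, with a_i the signed dual coefficients."""
    return sum(
        a * rbf_kernel(s, x, gamma) for s, a in zip(support_vectors, dual_coefs)
    ) + bias

def weight_norm(support_vectors, dual_coefs, gamma):
    """||w|| in feature space, computed from the kernel matrix."""
    total = 0.0
    for s_p, a_p in zip(support_vectors, dual_coefs):
        for s_q, a_q in zip(support_vectors, dual_coefs):
            total += a_p * a_q * rbf_kernel(s_p, s_q, gamma)
    return math.sqrt(total)

def kernel_distance_screening(s_gen, support_vectors, dual_coefs, bias,
                              gamma, threshold):
    """Keep generated samples whose feature-space distance is below threshold."""
    norm = weight_norm(support_vectors, dual_coefs, gamma)
    return [
        x for x in s_gen
        if abs(decision_value(x, support_vectors, dual_coefs, bias, gamma)) / norm
        < threshold
    ]
```

In this sketch a sample lying exactly midway between two opposite-class support vectors has a decision value of zero and is kept, while a sample sitting on a support vector is far from the hyperplane and is dropped.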
8. The virtual sample generation method based on a generative adversarial network according to claim 1, characterized in that setting the position constraint of the generated samples comprises:
setting a restrictive condition of SmoothLoss, the restrictive condition being:
wherein $x$ is the generation range of the samples and $y$ is the cut-off value of the smooth piecewise function.
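The formula of the restrictive condition is not legible in this text. For orientation only, the sketch below shows the widely used smooth L1 piecewise function with a cut-off value, which is quadratic below the cut-off and linear above it. Treating this common form as the claim's SmoothLoss is an assumption, as are the function name and the default cut-off.

```python
def smooth_l1(x, y=1.0):
    """Common smooth L1 function with cut-off y (assumed form, not the claim's).

    Quadratic for |x| < y, linear otherwise; the two branches meet
    continuously at |x| == y.
    """
    if y <= 0:
        raise ValueError("cut-off y must be positive")
    ax = abs(x)
    if ax < y:
        return 0.5 * ax * ax / y
    return ax - 0.5 * y
```

The quadratic branch keeps the gradient small close to zero, and the linear branch limits the influence of large values, which is the usual reason for choosing a smooth piecewise penalty.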
9. The virtual sample generation method based on a generative adversarial network according to claim 1, characterized in that establishing the PCGAN model according to the position constraint comprises:
the loss function of the generator:
the loss function of the discriminator:
wherein $E$ is the expectation, $c$ is the class label, $x$ is an original sample, $\tilde{x}$ is a sample generated by the generator, $\hat{x}$ is an interpolated sample between $x \sim p_{data}$ and $\tilde{x}$, $\varepsilon \sim U[0,1]$, $p_g$ is the generated sample distribution, $p_{data}$ is the true sample distribution, $p_{penalty}$ is the distribution of $\hat{x}$, $\lambda$ is a hyperparameter, $\nabla$ is the gradient, $L$ is the expression of the position-constrained generative adversarial network, $L_{smooth}$ is the position constraint function of the generated samples, $L_D$ is the loss function of the discriminator, $L_G$ is the loss function of the generator, $D$ is the discriminant function for real samples, $G$ is the sample generating function, $w$ is the weight vector of the decision surface, and $b$ is the bias term of the model.
10. The virtual sample generation method based on a generative adversarial network according to claim 9, characterized in that carrying out the minority-class sample expansion based on the PCGAN model comprises:
(1) initializing the designed hyperparameters, mainly including the gradient penalty coefficient $\lambda_1 = 10$, the decision-surface constraint coefficient $\lambda_2 = 5$, the number of discriminator training iterations per round of adversarial optimization $n_{critic} = 3$, the number of generator training iterations per round of adversarial optimization $n_{gen} = 1$, the number of samples per training batch $m = 10$, and the Adam optimizer hyperparameters $\alpha = 0.0001$, $\beta_1 = 0.9$, $\beta_2 = 0.99$; initializing the discriminator parameters and the generator parameters $\theta_0$; then training a linear SVM to obtain its separating-plane parameters $(w, b)$;
(2) for each sample of each batch in discriminator training, first sampling $x \sim p_{data}$ and the class label $c$ from the real sample set, then sampling $z \sim p(z)$ from the noise distribution and mapping the noise into the generated sample space,
and then calculating the interpolated sample, where $\varepsilon \sim U[0,1]$;
(3) expression formula of arbiter loss function isThen it updates and sentences
Other device parameter, i.e.,
(4) repeat step (2)~step (3) and when execute this two recirculate at the end of deconditioning, recycle total ncritic
× m times;
(5) for each sample of each batch in generator training, first sampling $z \sim p(z)$ from the noise distribution, then calculating the geometric distance between the generated data and the decision surface;
(6) calculating the generator loss function according to the SmoothL1Loss theory and then updating the generator parameters, i.e. $\theta \leftarrow \mathrm{Adam}(\nabla_\theta L_G, \theta, \alpha, \beta_1, \beta_2)$;
(7) repeating steps (5) to (6) and stopping training when this double loop ends, the loop running $n_{gen} \times m$ times in total;
(8) repeating steps (2) to (7) and stopping training when the data generated by the generator network meet the preset data requirements.
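The loop structure of steps (2) to (7), together with two of the quantities they use, can be sketched as follows. This is only a skeleton under stated assumptions: the actual loss expressions and parameter updates are not legible in this text, so they are represented by caller-supplied callbacks; the interpolation is assumed to be the convex combination $\varepsilon x + (1 - \varepsilon)\tilde{x}$, and the geometric distance is assumed to be the usual point-to-hyperplane distance $|w \cdot x + b| / \|w\|$ for the linear SVM plane $(w, b)$. All function names are hypothetical.

```python
import math

def geometric_distance(x, w, b):
    """Distance from x to the plane w.x + b = 0 (assumed form of step (5))."""
    norm = math.sqrt(sum(v * v for v in w))
    return abs(sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

def interpolate(x_real, x_fake, eps):
    """Convex combination eps * x + (1 - eps) * x_fake, with eps in [0, 1]."""
    return [eps * r + (1.0 - eps) * f for r, f in zip(x_real, x_fake)]

def adversarial_round(critic_step, gen_step, n_critic=3, n_gen=1, m=10):
    """One round of adversarial optimization.

    critic_step and gen_step are placeholders for steps (2)-(3) and (5)-(6);
    they run n_critic * m and n_gen * m times respectively.
    """
    for _ in range(n_critic):
        for _ in range(m):
            critic_step()
    for _ in range(n_gen):
        for _ in range(m):
            gen_step()
```

With the default values from step (1), one round performs 30 discriminator iterations and 10 generator iterations; step (8) would repeat such rounds until the generated data meet the preset requirement.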
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910424679.XA CN110298450A (en) | 2019-05-21 | 2019-05-21 | A kind of virtual sample generation method based on production confrontation network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110298450A (en) | 2019-10-01 |
Family
ID=68027023
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910424679.XA | Pending | CN110298450A (en) | 2019-05-21 | 2019-05-21 | A kind of virtual sample generation method based on production confrontation network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110298450A (en) |
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111046755A (en) * | 2019-11-27 | 2020-04-21 | 上海眼控科技股份有限公司 | Character recognition method, character recognition device, computer equipment and computer-readable storage medium |
CN111062310A (en) * | 2019-12-13 | 2020-04-24 | 哈尔滨工程大学 | Few-sample unmanned aerial vehicle image identification method based on virtual sample generation |
CN111062310B (en) * | 2019-12-13 | 2022-07-29 | 哈尔滨工程大学 | Few-sample unmanned aerial vehicle image identification method based on virtual sample generation |
CN111310791A (en) * | 2020-01-17 | 2020-06-19 | 电子科技大学 | Dynamic progressive automatic target identification method based on small sample number set |
CN112091727A (en) * | 2020-08-12 | 2020-12-18 | 上海交通大学 | Cutter damage identification method and device based on virtual sample generation and terminal |
EP4033317A1 (en) | 2021-01-26 | 2022-07-27 | Sedapta S.r.l. | Method and system for managing a cyber-physical production system with predictive capabilities of anomalous operating conditions |
CN113095446A (en) * | 2021-06-09 | 2021-07-09 | 中南大学 | Abnormal behavior sample generation method and system |
CN114036356A (en) * | 2021-10-13 | 2022-02-11 | 中国科学院信息工程研究所 | Unbalanced traffic classification method and system based on confrontation generation network traffic enhancement |
IT202100029405A1 (en) | 2021-11-22 | 2023-05-22 | Genera Ip B V | A contrast enhancement medium for diagnostic imaging methods and systems |
WO2023089589A1 (en) | 2021-11-22 | 2023-05-25 | Genera Ip B.V | A contrast enhancing agent for diagnostic imaging methods and systems |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110298450A (en) | A kind of virtual sample generation method based on production confrontation network | |
CN108564592A (en) | Based on a variety of image partition methods for being clustered to differential evolution algorithm of dynamic | |
CN110110802A (en) | Airborne laser point cloud classification method based on high-order condition random field | |
CN108009509A (en) | Vehicle target detection method | |
CN107392919B (en) | Adaptive genetic algorithm-based gray threshold acquisition method and image segmentation method | |
CN109613006A (en) | A kind of fabric defect detection method based on end-to-end neural network | |
CN110334580A (en) | The equipment fault classification method of changeable weight combination based on integrated increment | |
CN109461025A (en) | A kind of electric energy substitution potential customers' prediction technique based on machine learning | |
Obayashi et al. | Niching and elitist models for mogas | |
Yi et al. | An improved initialization center algorithm for K-means clustering | |
CN105005789B (en) | A kind of remote sensing images terrain classification method of view-based access control model vocabulary | |
CN106374465B (en) | Short-term wind power forecast method based on GSA-LSSVM model | |
CN103996029B (en) | Expression method for measuring similarity and device | |
CN104899607B (en) | A kind of automatic classification method of traditional moire pattern | |
CN106529574A (en) | Image classification method based on sparse automatic encoder and support vector machine | |
CN110097060A (en) | A kind of opener recognition methods towards trunk image | |
CN109886464A (en) | The low information loss short-term wind speed forecasting method of feature set is generated based on optimization singular value decomposition | |
CN110362997A (en) | A kind of malice URL oversampler method based on generation confrontation network | |
CN103714577A (en) | Three-dimensional model simplification method suitable for model with textures | |
CN109035289A (en) | Purple soil image segmentation extracting method based on Chebyshev inequality H threshold value | |
CN108717497A (en) | Imitative stichopus japonicus place of production discrimination method based on PCA-SVM | |
CN109271427A (en) | A kind of clustering method based on neighbour's density and manifold distance | |
CN109635140A (en) | A kind of image search method clustered based on deep learning and density peaks | |
CN107680099A (en) | A kind of fusion IFOA and F ISODATA image partition method | |
CN109614520A (en) | One kind is towards the matched parallel acceleration method of multi-mode figure | |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||